Updating Data with SSIS

Working with an ETL design that does a straight insert operation is the most simple, but often ETL processes are required to update data as well. There are several ways to go about updating data with SSIS, one of which (the staging table method) requires a bit more setup but can also perform significantly better.

Updating Data with SSIS

I see a lot of questions on the forums about updating existing data in SSIS.  When the update is dynamic and is based on elements within your data flow, a common solution is to use the Ole DB Command within the data flow pane:

Updating Data with SSIS

While there are cases in which this would work fine, the fact is that this solution does not scale well. The above update has us retrieving data from some source, and then running an update on our target database based on one or more values in the source.  The problem occurs when you touch more than a handful of rows.  When you bring in the Ole Db command into the picture in a data flow, you’ll be firing the statement in this control once for every row of data in your pipeline.  So let’s say our source retrieves a million rows into the data flow: the UPDATE command inside the OLEDB Command component will be executed one million times!  Such operations could bring the most capable server to its knees.

A better performing solution involves staging your data.  Using this method, you’ll retrieve the data from the source and write it into a staging table in the destination database.  You can then use an Execute SQL task to run your update in a more organic manner.

figure2

Set up your data flow as shown above to pull in the data to a staging table, then you can run a single UPDATE statement:

UPDATE i
SET InvoiceAmount = st.UpdatedInvoiceAmount
FROM Invoices i
INNER JOIN StagedData st ON i.InvoiceID = st.InvoiceID

The advantage here is that you’re executing the expensive UPDATE statement once for each table rather than once for each row affected.

Of course, there are some situations that explicitly disallow the use of staging tables in destination systems.  If storage or access restrictions keep you from using this method, you may have to use the row-by-row insert, so be aware that it’s going to be a bottleneck.

Conclusion

Although row-by-row updates using the OLEDB Command component can work, this solution does not work for large sets of data. When updating data with SSIS, consider instead using a staging table to process the update operation.

About the Author

Tim Mitchell
Tim Mitchell is a data architect and consultant who specializes in getting rid of data pain points. Need help with data warehousing, ETL, reporting, or training? If so, contact Tim for a no-obligation 30-minute chat.

Be the first to comment on "Updating Data with SSIS"

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.