A few years back, I wrote a blog post about using an SSIS object variable as a data flow source. In that post, I described how you could load a set of query results into an object-typed variable in SQL Server Integration Services and then use that in-memory data as a source within a data flow. In the comments and the feedback I got on that post, the same question kept coming up: what is the process for reusing a recordset in an SSIS object variable in the same package?
In this post, I’ll show how you can modify the scripts within your SSIS package to allow reprocessing of the same set of results in an object variable.
Reusing a Recordset in an SSIS Object Variable
First, a couple of quick definitions: A result set is any set of data retrieved from a relational database. For our purposes, a recordset (one word, not two) is stylized to indicate that we’re referring to an ADO recordset object rather than a generic set of results. The legacy ADO recordset is the default structure in which an SSIS object variable stores a set of results from a relational query. Also, a datatable is a class of object used for storing a set of rows and columns. The older recordset shares some of the same attributes as its newer cousin, but the latter is much easier to work with in code.
As I demonstrated in the previously linked post, you can use an Execute SQL Task in SSIS to populate an object-typed variable with the results from the query in that task. When you do so, that object variable contains an ADO recordset which can be used as a data flow source or an enumerator in a foreach loop.
Using the recordset object in an SSIS script does have one notable weakness, though: it maintains a persistent pointer to the row of data currently being read, and when it reaches the end of the data, that pointer doesn’t automatically reset. If you try to read the same recordset again, it’ll behave as if the recordset is empty since the pointer has reached the end of the data. It is possible to reset this pointer manually, but you’d need to add several steps to be able to do so (including the addition of another .dll reference in each of your script components).
Where is This Useful?
Valid question. Processing a recordset via script in SSIS isn’t something you’ll need to use on an everyday basis, and having to process the same recordset more than once in the same package is even less common. However, it’s often useful to cycle through the same recordset object in SSIS twice in the same package when several of the following apply to your load:
- The data source (or just the query, view, or stored procedure you’re using) returns results slowly, or the data retrieval significantly impacts performance on the source system
- You need to clean or reshape the data in a way that isn’t easily handled through normal data flow components
- The same data needs to be processed in more than one different way; for example, you need to identify a parent transaction and separately roll up all the child transactions for each parent
- You can’t create staging tables on the destination server, thus preventing these operations from being loaded and then transformed (ELT as opposed to ETL)
Again, this is an edge case, but this scenario was the most commonly referenced scenario in the feedback I got from my last post on this topic.
Converting the Recordset to a Data Table
There are some ways to work directly with the ADO recordset object, including using the Recordset.MoveFirst() method to reset that read pointer. However, if you go that route, you have to do this for every time you need to reprocess that recordset, and working with the recordset object isn’t exactly straightforward. What I’ve found is that it’s much easier to convert that recordset object into a datatable object, which is far simpler to work with in SSIS and does not suffer from the same row pointer limitation as the recordset.
To accomplish this, I’ll take the SSIS object variable loaded with the recordset as shown in the first step of this post, and run that variable through a new script task to explicitly convert it from a recordset to a datatable object. In that script task, I need to first add the SSIS object variable to the list of ReadWriteVariables as shown below.
Inside the script, I go through a type conversion process similar to that I wrote about in the previous blog post. I’m creating an OleDbDataAdapter for extracting the recordset data and writing it into a new datatable object. However, in this example, I won’t immediately use this datatable as I did in the prior post. Instead, I’m going to write that datatable object back to the SSIS object-typed variable. This will replace the serialized recordset object with a serialized datatable object containing exactly the same data.
The change I just made won’t be detectable to the naked eye, since we’re dealing with serialized object data. However, when I use that SSIS object variable in a script component in the subsequent data flow tasks, this change will be obvious. No longer will I need to use the OleDbDataAdapter to convert the recordset into a data table. As shown below in the code from the script component, I can now simply cast the variable directly to a data table and start using that data.
As shown, we’re just looping through the rows in the datatable object. Since there’s no persistent pointer to reset, we’re always dealing with the full set of data in this datatable. I can run the same simple, one-line type conversion from serialized object to datatable each time I need to reuse this result set in the same package.
This design has the advantage of making my code simpler, and avoiding the record pointer reset required when using the recordset object. In fact, using a datatable object is so much easier than dealing with the legacy ADO recordset, you may want to use this method for handling in-memory data sets even if you don’t need to process the result set multiple times.
For those infrequent occasions when you may need to process an SSIS object variable recordset object more than once, you can save yourself some time and trouble by converting that recordset to a datatable object. The latter is more modern, easier to work with, and requires less code.