Using the SSIS Object Variable as a Data Flow Source

Object variables in SSIS are incredibly versatile, allowing the storage of almost any type of data (even .NET objects). In my last post on this topic, I demonstrated how an SSIS object variable containing a .NET DataSet object could be used by the Foreach Loop container as an iterator. In this post, I’ll continue the discussion by showing how SSIS object-typed variables can also be used in the data flow by the script component to generate source data.

The last post described how to use the Execute SQL Task to execute a query that would generate a result set. As a refresher, the ResultSet setting specifies that the results of the query should be written to a variable.
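As an aside, any statement returning a single result set will work for this. A hypothetical example (the table and column names below assume the AdventureWorks sample database, purely for illustration):

```sql
-- Hypothetical query; any statement that returns a single result set works here
SELECT TOP (20) SalesOrderID, OrderDate, TotalDue
FROM Sales.SalesOrderHeader
ORDER BY OrderDate DESC;
```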

[Image: Execute SQL Task editor showing the ResultSet setting]

The object variable intended to store the result set of this query is specified in the Result Set tab. Note that this must be an object-typed variable, or the operation will fail.

[Image: Result Set tab mapping the query results to an object-typed variable]

When the package executes, the SSIS variable will be set to an ADO recordset object which contains the results of the query. From here, the variable can be consumed via script – by either the script component or the script task – in SSIS. In this example, I’ll show how to use it on the SSIS data flow through a script component configured as a source.

To read the data from this object variable, I’ve created a data flow, added a script component source, and set the script component to use the variable (User::vResults) as a read-only variable. I have also configured the default output with the expected column metadata from this query, with the expected column names and data types set up in the Inputs and Outputs tab.

[Image: script component Inputs and Outputs tab with the output columns defined]

In the script itself, the SSIS variable containing the ADO recordset can be read by the System.Data.OleDb.OleDbDataAdapter, and then written to a System.Data.DataTable object. It’s relatively easy to extract the results from there, since a DataTable generally has a rows-and-columns structure that is easily navigable. As shown below, the rows from the extracted DataTable object are then, one by one, written to the output buffer of the script component, which will send them down the data flow pipeline and eventually into a relational database.
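The body of that script might look like the following minimal sketch. The variable name (User::vResults), the output name (SalesOutput), and the column names are assumptions for illustration; they must match whatever is configured on the script component and in the Inputs and Outputs tab.

```csharp
// Minimal sketch of the script component source. Assumptions:
//  - the object variable User::vResults is listed as a ReadOnlyVariable
//  - the output is named SalesOutput (SSIS generates the SalesOutputBuffer
//    class from that name)
//  - the columns OrderID, OrderDate, and TotalDue are hypothetical and must
//    match the metadata defined in the Inputs and Outputs tab
public override void CreateNewOutputRows()
{
    // Load the ADO recordset stored in the object variable into a DataTable
    var adapter = new System.Data.OleDb.OleDbDataAdapter();
    var results = new System.Data.DataTable();
    adapter.Fill(results, Variables.vResults);

    // Write each row, one by one, to the script component's output buffer
    foreach (System.Data.DataRow row in results.Rows)
    {
        SalesOutputBuffer.AddRow();
        SalesOutputBuffer.OrderID = int.Parse(row["OrderID"].ToString());
        SalesOutputBuffer.OrderDate = DateTime.Parse(row["OrderDate"].ToString());
        SalesOutputBuffer.TotalDue = decimal.Parse(row["TotalDue"].ToString());
    }
}
```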

To be clear, this script would have to be configured with all of the column-level metadata defined at design time. SSIS is very picky about metadata in the data flow, which must be defined at design time and not inferred at runtime.

After adding a destination component to write this data to a table, I can test the package and confirm that the results sent to that table are as expected. The data flow visualization shows the 20 rows originally written to the object variable, and the data viewer confirms that the values have been properly extracted from the object variable and formatted for the data flow.

[Image: data flow execution showing the 20 rows extracted from the object variable]

On its own, this example would be impractical. After all, the only thing we’ve done here is taken a result set from a relational database and run it through an object-typed variable, just to extract it and write it out to another database. However, there are cases where using SSIS object variables in a design pattern similar to this would be very helpful:

  • Processing data that requires extensive reshaping or transformation, where the data cannot be transformed in the source (such as data delivered from static stored procedures that cannot be modified)
  • Complex processing not ideally suited for the native SSIS components, such as rolling aggregates or heavy string manipulation
  • Processing the output of stored procedures that return more than one result set

I’m going to spend time in my next post going over the last bullet in that list, to demonstrate how this method behaves with multiple result sets from a single query.

Conclusion

In this post, I’ve demonstrated how to use an object-typed variable containing an ADO recordset to provide source data for the SSIS data flow. Although this is not something the average SSIS developer will use on a daily basis, it’s a very handy design pattern to have in one’s tool belt.

Update: Using the Resultset More Than Once in a Package (07/30/2020)

In the 5+ years since I originally posted this, I’ve been asked several times about how one could process the same ADO recordset object variable more than once, since processing it a second time returns no results. I’ve written a follow-up post describing why this happens and offering a handy workaround.

About the Author

Tim Mitchell
Tim Mitchell is a data architect and consultant who specializes in getting rid of data pain points. Need help with data warehousing, ETL, reporting, or training? If so, contact Tim for a no-obligation 30-minute chat.

38 Comments on "Using the SSIS Object Variable as a Data Flow Source"

  1. Craigmeister | April 30, 2015 at 8:25 am | Reply

    This is very helpful info. Its purpose seems mostly to allow custom work against a dataset outside of a Data Flow. As you noted, the Script Transformation can allow you to do the same process as your example within a Data Flow. However, this opens up a nearly unlimited number of options.

    The follow up question then is – what are the limits of having the object, the dataset, in memory on the device (the SQL Server/dedicated SSIS server/PC/whatever) running the package? A Data flow uses a pipeline buffer to process large amounts of data, but I am not clear where the SSIS Script task processing memory usage resides.
    Is there an optimum dataset size limit?
    At what point does the customizability become less effective?

    Thanks

    • Tim Mitchell | May 1, 2015 at 9:39 am | Reply

      Craig, thanks for the comment. Memory management with the script component is certainly a consideration here, and would almost certainly merit its own blog post (or ten). The short answer is that you can have two different memory spaces – the .NET memory space you use in processing the data in the script, and the data flow memory used when rows are added to the data flow buffer. The SSIS execution engine manages data flow memory in a (mostly) automated fashion, while using scripting to process data outside of the data flow pipeline has a bit more flexibility.

  2. Thank you Tim. Great demo!!!

  3. Matthew Brook | May 27, 2015 at 2:17 pm | Reply

    This is great information! I came across your article by searching for “SSIS variables memory management.” I have a question that is related to Craig’s but more specific.

    I just finished working on a SSIS Package and discovered an unexpected behavior (at least to me). The Package had a SSIS variable declared as Object, a Control Flow that created a Recordset Destination in a Data Flow, then processed the resulting Recordset in a ForEach Loop Container, that contained Script Tasks that got executed for each row of the Recordset.

    In “Script Task A”, I initialized the variable as such:
    List<string> myList = new List<string>();
    Dts.Variables["User::ValuesList"].Value = myList;

    In “Script Task B”, I ran checks on conditions and added values:
    List<string> myList = (List<string>)Dts.Variables["User::ValuesList"].Value;
    myList.Add(aValue);

    I logged the resulting List Contents and it worked beautifully. Then I looked at the code and realized that in “Script Task B”, I didn’t save the results of myList back to Dts.Variables[“User::ValuesList”] at the end of “Script Task B.” But yet it kept adding Items to myList.

    In this instance, when referencing SSIS Variables in “Another Script Task”, they act ByRef instead of ByVal. That was unexpected.

    So, my questions are as follows:
    1. Are the SSIS Variables ByRef or ByVal? Do they behave this way across the Entire Package, within a Container Scope or within individual Script Tasks?
    2. If the SSIS Package is executed by a SQL Job, will the package be run as multi-thread or single-thread?
    3. When do the SSIS variables revert back to their Default Values?
    4. If a SSIS variable is declared as Object and in “Script Task B” is cast into a List that has many values added to it, should I have “Script Task C” that sets the Object variable back to an Empty List?

  4. Hi Matthew,

    When dealing with variables in the script task or script component, variables are passed by reference. There are some cases where variables are passed by value (such as when using the Execute Package Task) but typically they are passed by ref.

    Variables don’t arbitrarily revert back to their default values during a single execution. However, any change to a variable value is only valid for a single execution, so when you execute the same package again, the variables will start with whatever default values are assigned.

  5. A great article. I would like to add that this is a great alternative to MERGE JOIN transformations and LOOKUPS. The downside with merge joins is that you have to sort both data sets, and it is a semi-blocking transformation, so performance is very likely to take a hit. The problem with lookups is that they will not process multiple matching rows and generate duplicates, and therefore may not yield the expected results in some cases.

  6. Hi, I landed on your page while searching for how to use the results of a Foreach Loop variable as a data set outside the loop. My Foreach Loop runs almost 5000 times in 8 seconds and, as a result, contains 5000 records. After that I would like to do a lookup on the resulting data set. Your article is great and I tried your method. For some reason, the data viewer after the script shows 1 record at a time, even though my script component is outside the Foreach Loop container. Your screenshot shows 20 records in one window, but for me it shows only 1 record every time I debug.

    • Hi Raj – most likely the issue lies in the For Each Loop processing. If it’s running 5000 times (and loading a variable each time), it’s overwriting – not appending to – that variable each time the loop runs.

  8. Perhaps related to two of the other comments, I’ve found that the object variable loses its data after the first use. In my case I am populating them at the start of the package and attempting to use in more than one data flow. But after the first data flow is done looping through the var contents, the next data flow finds no rows in the object variable (count=0). I am using it within a script component of type ‘transformation’ and doing a lookup function with it. The rows are loaded exactly as in the article (da.Fill, ForEach row, etc.), the object var is marked as read-only in the SC properties. Is this a limitation of the object variable, or am I doing something wrong?

    • Hi Juan – there is no reason the value of the object variable should be gone after it is used for the first time. The value can be overwritten, but once the object value is set up it will remain static in that value until it is either set to something different, falls out of scope, or the package ends.

      I’d check to make sure that your object variable is set to package scope.

  9. Tim, you are correct in that the object variable doesn’t lose its value, but for whatever reason I can’t load it back into a DataTable via a data adapter. I can, however, cast the variable as an ADODB.Recordset and traverse it with that object model. I tried this just before giving up on using this object variable as a data source method for a complex lookup. For this to work I have to add the ADO 2.7 reference. There must be something going on with the status of the ADODB recordset that prevents it from loading again via the ADO.NET data adapter. I may look into that later; for now I have to keep making progress in this project. Thanks for the ideas, will keep circling back to your blog posts often.

    • Juan, when you mentioned traversing the object, it reminded me that there is a pointer (either on the data set or on the data adapter – not certain right offhand) that lets you traverse the records from top to bottom. If you don’t reset that pointer, it will appear as if there are no other rows to process. I don’t recall the syntax but there is a way to programmatically reset that pointer. I suspect that’s probably the source of the issue.

  10. I usually don’t post thanks, but really “Thanks” Sir 🙂

  11. Hi… in the last step, when you populate the columns, you have (int, datetime) data types. Can you please tell me how it would look if I have a varchar data type? For example, I tried to write it as string, and after I put the (.) I tried to type Parse, but it doesn’t look like the right format… can you please lead me to the right format?

  12. Thanks for such great articles… but the scenario I am working with uses the varchar data type. I tried to define it like this (myOutputBuffer.Name = string.Parse), but when I put string followed by Parse it throws an error. Can you please tell me what is the right way to do it?

  13. Hi Tim.

    When I was trying to replicate your package, on execution I’m getting this error.

    Run time error:

    at System.Data.OleDb.OleDbDataAdapter.FillFromADODB(Object data, Object adodb, String srcTable, Boolean multipleResults)
    at System.Data.OleDb.OleDbDataAdapter.Fill(DataTable dataTable, Object ADODBRecordSet)
    at ScriptMain.CreateNewOutputRows()
    at UserComponent.PrimeOutput(Int32 Outputs, Int32[] OutputIDs, PipelineBuffer[] Buffers, OutputNameMap OutputMap)
    at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.PrimeOutput(Int32 outputs, Int32[] outputIDs, PipelineBuffer[] buffers).

    Error:
    [Script Component [35]] Error: System.ArgumentException: Object is not an ADODB.RecordSet or an ADODB.Record.
    Parameter name: adodb
    at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.HandleUserException(Exception e)
    at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.PrimeOutput(Int32 outputs, Int32[] outputIDs, PipelineBuffer[] buffers)
    at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPrimeOutput(IDTSManagedComponentWrapper100 wrapper, Int32 outputs, Int32[] outputIDs, IDTSBuffer100[] buffers, IntPtr ppBufferWirePacket)

    Please help me out with this issue.

  14. Hello Tim,

    I have a SQL table which holds 5 queries to be run against an Oracle source. I use SSIS to read and store each query to a variable, and I have confirmed the variable’s contents during runtime. Now, how do I execute the code in the variable against the Oracle DB? I tried a Data Flow Task, but that failed as the variable did not have any columns (metadata) at design time, and hence I was unable to define the source-to-destination mapping.

    Thanks in advance for your valuable help.

  15. Hi Tim,
    I inherited a package where a Script Task populates the object variable and passes it to the script component (it looks exactly like yours, except for the variable name). The package ran successfully from the SSIS Catalog, but failed when run from a SQL Agent job (I created a proxy account for this job). I am using VS2015 and SQL Server 2014 on the same Windows 2012 R2 machine. Here is the error:
    Object is not an ADODB.RecordSet or an ADODB.Record.
    Parameter name: adodb
    at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.HandleUserException(Exception e)
    at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.PrimeOutput(Int32 outputs, Int32[] outputIDs, PipelineBuffer[] buffers)
    at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPrimeOutput(IDTSManagedComponentWrapper100 wrapper, Int32 outputs, Int32[] outputIDs, IDTSBuffer100[] buffers, IntPtr ppBufferWirePacket)
    I am new to VB, but have experience with many programming languages. Any suggestions?
    Thank you
    tlf

    • Lan, if it is running successfully when manually executed from the catalog but failing when invoked from SQL Server agent, it’s most likely a permissions issue. That’s where I’d start troubleshooting.

      • Hi Tim,
        Thank you for your response. I’ve ruled out permissions as the issue, since my job is executed by a proxy account that has admin privileges (Windows and SQL). It might be an incompatibility between System.Object and ADODB.Recordset (Script Task to Script Component). It also failed when I converted the object to a string in the OleDbDataAdapter.Fill method. Any suggestions?
        Thank you,
        tlf

        • Hmm, that’s interesting. Sounds like you’ve checked all of the common things related to permissions. Right offhand nothing else stands out to me. You’ll probably have to do a line-by-line debug execution to drill down to the root cause.

  17. Arun Kumaravel | May 23, 2018 at 6:56 am | Reply

    Hi Tim.

    Thanks for the very useful info. I am a newbie at script creation. Do you have the script for the above still? I managed to pull my varchar(max) into the object variable, but need the script to put that into a string and use it as my data flow source.

    Regards
    Arun

  18. Hi Tim
    Many thanks for demystifying the script component, and how it accesses SSIS variables. Very useful indeed!

    Regards
    Scott

  19. Tim, thank you for the illustrations and explanation. Very helpful stuff!

    One question: In your example, you populate an Object and then use the Script component to read the Object and flow the data outwards. If I had a stored procedure that output data, would I use a similar design pattern with the script component?

    I have a proc that I can’t use as an OLEDB Source, because it uses temp tables and so SSIS can’t read the metadata to build the output. I can define the output myself, using the methods you’re describing above.

    So, rather than use the Execute SQL task to create an Object variable, and then use a script component for processing, I’d prefer to execute the proc in the script component.

    Am I on the right track? Or, are there reasons why one might want to execute the proc separately and then process the object variable as you’ve done here?

    Cheers! And thanks again for your article.

  20. Thank you for the script. I updated it to use reflection to save some time mapping each column and its data type in code. Still, output columns must be defined and output column names must match those within the custom object for this to work.

    See gist: https://gist.github.com/janis-veinbergs/df95b0198c972fbd42e2dc7b7eb70153

  21. Hi Tim, I’ve hit your site a number of times recently and have found some very useful information, so thank you! I do have a question though: A long while back, you told a person named Juan about a pointer – “Juan, when you mentioned traversing the object, it reminded me that there is a pointer (either on the data set or on the data adapter – not certain right offhand) that lets you traverse the records from top to bottom. If you don’t reset that pointer, it will appear as if there are no other rows to process. I don’t recall the syntax but there is a way to programmatically reset that pointer.” I saw that as an answer to a problem I am having, so went on a quest to find this pointer. Sadly, my google-foo is not up to the task and the internet has no idea what I’m asking about. So I thought I’d ask you to see if you ever remembered. How does one address the pointer in the data set or data adapter or ???

  22. Thanks for the post Tim! Well written and saved me heaps of time and resources struggling with huge raw files over my LAN.

  23. I am getting an Error
    The name ‘SalesOutputBuffer’ does not exist in the current context

    Have we defined this variable/class anywhere?

    • Paul Sherrill | March 13, 2023 at 1:39 pm | Reply

      So if you name the “output” as shown in picture 3, the script will have it available automatically.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.