Using the SSIS Object Variable as a Data Flow Source

TOPICS:ETL Scripting SSIS SSIS Object Variable

Posted By: Tim Mitchell April 20, 2015

Object variables in SSIS are incredibly versatile, allowing the storage of almost any type of data (even .NET objects). In my last post on this topic, I demonstrated how an SSIS object variable containing a .NET DataSet object could be used by the for each loop container as an iterator. In this post, I’ll continue the discussion by showing how SSIS object-typed variables can also be used in the data flow by the script component to generate source data.

The last post described how to use the execute SQL task to execute a query that would generate a result set. As a refresher, the ResultSet setting specifies that the results of the query should be written to a variable.

The object variable intended to store the result set of this query is specified in the Result Set tab. Note that this must be an object-typed variable, or the operation will fail.

When the package executes, the SSIS variable will be set to an ADO recordset object which contains the results of the query. From here, the variable can be consumed via script – by either the script component or the script task – in SSIS. In this example, I’ll show how to use it on the SSIS data flow through a script component configured as a source.

To read the data from this object variable, I’ve created a data flow, added a script component source, and set the script component to use the variable (User::vResults) as a read-only variable. I have also configured the default output with the expected column metadata from this query, with the expected column names and data types set up in the Inputs and Outputs tab.

In the script itself, the SSIS variable containing the ADO recordset can be read by the System.Data.OleDb.OleDbDataAdapter, and then written to a System.Data.DataTable object. It’s relatively easy to extract the results from there, since a DataTable generally has a rows-and-columns structure that is easily navigable. As shown below, the rows from the extracted DataTable object are then, one by one, written to the output buffer of the script component, which will send them down the data flow pipeline and eventually into a relational database.

To be clear, this script would have to be configured with all of the column-level metadata defined at design time. SSIS is very picky about metadata in the data flow, which must be defined at design time and not inferred at runtime.

After adding a destination component to write this data to a table, I can test the package and confirm that the results sent to that table are as expected. The data flow visualization shows the 20 rows originally written to the object variable, and the data viewer confirms that the values have been properly extracted from the object variable and formatted for the data flow.

On its own, this example would be impractical. After all, the only thing we’ve done here is taken a result set from a relational database and run it through an object-typed variable, just to extract it and write it out to another database. However, there are cases where using SSIS object variables in a design pattern similar to this would be very helpful:

Processing data that requires extensive reshaping or transformation, where the data cannot be transformed in the source (such as data delivered from static stored procedures that cannot be modified)
Complex processing not ideally suited for the native SSIS components, such as rolling aggregates or heavy string manipulation
Processing the output of stored procedures that return more than one result set

I’m going to spend time in my next post going over the last bullet in that list, to demonstrate how this method behaves with multiple result sets from a single query.

Conclusion

In this post, I’ve demonstrated how to use an object-typed variable containing an ADO recordset to provide source data for the SSIS data flow. Although this is not something the average SSIS developer will use on a daily basis, it’s a very handy design pattern to have in one’s tool belt.

Update: Using the Resultset More Than Once in a Package (07/30/2020)

In the 5+ years since I originally posted this, I’ve been asked several times about how one could process the same ADO recordset object variable more than once, since processing it a second time returns no results. I’ve written a follow-up post describing why this happens and offering a handy workaround.

About the Author

Tim Mitchell

Tim Mitchell is a data architect and consultant who specializes in getting rid of data pain points. Need help with data warehousing, ETL, reporting, or training? If so, contact Tim for a no-obligation 30-minute chat.

38 Comments on "Using the SSIS Object Variable as a Data Flow Source"

Craigmeister | April 30, 2015 at 8:25 am | Reply

This is very helpful info. Its purpose seems mostly to allow custom work against a dataset outside of a Data Flow. As you noted, the Script Transformation can allow you to do the same process as your example within a Data Flow. However, this opens up a nearly unlimited number of options.

The follow up question then is – what are the limits of having the object, the dataset, in memory on the device (the SQL Server/dedicated SSIS server/PC/whatever) running the package? A Data flow uses a pipeline buffer to process large amounts of data, but I am not clear where the SSIS Script task processing memory usage resides.
Is there an optimum dataset size limit?
Where is the point when the custom-izableness become less effective?

Thanks
- Tim Mitchell | May 1, 2015 at 9:39 am | Reply
  
  Craig, thanks for the comment. Memory management with the script component is certainly a consideration here, and would almost certainly merit its own blog post (or ten). The short answer is that you can have two different memory spaces – the .NET memory space you use in processing the data in the script, and the data flow memory used when rows are added to the data flow buffer. The SSIS execution engine manages data flow memory in a (mostly) automated fashion, while using scripting to process data outside of the data flow pipeline has a bit more flexibility.
Jing Huang | April 30, 2015 at 8:33 am | Reply

Thank you Tim. Great demo!!!
- Tim Mitchell | May 1, 2015 at 9:36 am | Reply
  
  Thanks Jing!
Matthew Brook | May 27, 2015 at 2:17 pm | Reply

This is great information! I came across your article by searching for “SSIS variables memory management.” I have a question that is related to Craig’s but more specific.

I just finished working on a SSIS Package and discovered an unexpected behavior (at least to me). The Package had a SSIS variable declared as Object, a Control Flow that created a Recordset Destination in a Data Flow, then processed the resulting Recordset in a ForEach Loop Container, that contained Script Tasks that got executed for each row of the Recordset.

In “Script Task A”, I initialized the variable as such:
List myList = new List();
Dts.Variables[“User::ValuesList”].Value = myList;

In “Script Task B”, I ran checks on conditions and added values:
List myList = (List)Dts.Variables[“User::ValuesList”].Value;
myList.Add(aValue);

I logged the resulting List Contents and it worked beautifully. Then I looked at the code and realized that in “Script Task B”, I didn’t save the results of myList back to Dts.Variables[“User::ValuesList”] at the end of “Script Task B.” But yet it kept adding Items to myList.

In this instance, when referencing SSIS Variables in “Another Script Task”, they act ByRef instead of ByVal. That was unexpected.

So, my questions are as follows:
1. Are the SSIS Variables ByRef or ByVal? Do they behave this way across the Entire Package, within a Container Scope or within individual Script Tasks?
2. If the SSIS Package is executed by a SQL Job, will the package be run as multi-thread or single-thread?
3. When do the SSIS variables revert back to their Default Values?
4. If a SSIS variable is declared as Object and in “Script Task B” is cast into a List that has many values added to it, should I have “Script Task C” that sets the Object variable back to an Empty List?
Tim Mitchell | May 28, 2015 at 9:01 am | Reply

Hi Matthew,

When dealing with variables in the script task or script component, variables are passed by reference. There are some cases where variables are passed by value (such as when using the Execute Package Task) but typically they are passed by ref.

Variables don’t arbitrarily revert back to their default values during a single execution. However, any change to a variable value is only valid for a single execution, so when you execute the same package again, the variables will start with whatever default values are assigned.
- Matthew Brook | May 28, 2015 at 11:37 am | Reply
  
  Tim, thank you for the quick response. I appreciate it!
Samir | June 17, 2015 at 12:16 pm | Reply

A great article. I would like to add that this is a great alternative to MERGE JOIN transformations and LOOKUPS. The downside with merge joins is that you have to sort both data sets, and it is a semi-blocking transformation, so performance is very likely to take a hit. The problem with lookups is that they will not process multiple matching rows and generate duplicates, and therefore may not yield the expected results in some cases.
Raj | July 14, 2015 at 9:18 am | Reply

Hi I landed up on your page as i was searching for how to use the results of ForEachLoop variable as a data set outside the foreach loop. My foreachloop runs almost 5000 times in 8 seconds and as a result contains 5000 records. After which I would like to do a look up on the result of the data set. You article is great and I tried your method. For some reason, the data viewer after the script is showing 1 record at a time based on the above script even though my script component is outside the foreach loop container. But your screenshot shows 20records in one window. For me it shows only 1 record everytime i debug.
- Tim Mitchell | July 14, 2015 at 10:24 am | Reply
  
  Hi Raj – most likely the issue lies in the For Each Loop processing. If it’s running 5000 times (and loading a variable each time), it’s overwriting – not appending to – that variable each time the loop runs.
Ron Cash | June 24, 2016 at 3:21 pm | Reply

SSIS Engineering Group
Juan | August 17, 2016 at 5:54 pm | Reply

Perhaps related to two of the other comments, I’ve found that the object variable loses its data after the first use. In my case I am populating them at the start of the package and attempting to use in more than one data flow. But after the first data flow is done looping through the var contents, the next data flow finds no rows in the object variable (count=0). I am using it within a script component of type ‘transformation’ and doing a lookup function with it. The rows are loaded exactly as in the article (da.Fill, ForEach row, etc.), the object var is marked as read-only in the SC properties. Is this a limitation of the object variable, or am I doing something wrong?
- Tim Mitchell | August 19, 2016 at 6:14 pm | Reply
  
  Hi Juan – there is no reason the value of the object variable should be gone after it is used for the first time. The value can be overwritten, but once the object value is set up it will remain static in that value until it is either set to something different, falls out of scope, or the package ends.
  
  I’d check to make sure that your object variable is set to package scope.
Juan | August 20, 2016 at 11:09 pm | Reply

Tim, you are correct in that the object variable doesn’t lose its value, but for whatever reason I can’t load it back into a datatable via a dataAdapter. I can however cast the variable as an ADODB.Recordset and traverse it with that object model. I tried this just before giving up on using this object var as a data source method for a complex lookup. For this to work I have to add the ADO 2.7 reference. There must be something going onwith the status of the ADODB recordset that prevents it from loading again via the ADO.Net data adapter. I may look into that later, for now I have to keep making progress in this project. Thanks for the ideas, will keep circling back to your blog posts often.
- Tim Mitchell | August 21, 2016 at 10:46 am | Reply
  
  Juan, when you mentioned traversing the object, it reminded me that there is a pointer (either on the data set or on the data adapter – not certain right offhand) that lets you traverse the records from top to bottom. If you don’t reset that pointer, it will appear as if there are no other rows to process. I don’t recall the syntax but there is a way to programmatically reset that pointer. I suspect that’s probably the source of the issue.
Subrat | August 30, 2016 at 6:30 am | Reply

I usually don’t post thanks , But really “Thanks” Sir 🙂
- Tim Mitchell | September 17, 2016 at 1:52 pm | Reply
  
  Thanks so much! Glad you found the post helpful.
ali | February 28, 2017 at 11:26 pm | Reply

hi… the last steps when you populate the column you have (int,datatime) data type can you please tell me how will be if i have varchare datat type for example i tried to write it like that string. and after i put (.) i try to type parse but look like not write format… can you please lead me to write formate
ali | March 1, 2017 at 9:26 am | Reply

thanks for such greate articles… but the scenario i working with use the datat type varchar.. i tried to defined it like that (myOutputBuffer.Name = string.Parse) but when i put string follwed with parse through me error.. can you please tell me whate is the right way to do it
- Tim Mitchell | March 2, 2017 at 8:27 am | Reply
  
  There could be any number of things causing this. I’ve found that running script tasks/components in debug (line-by-line execution) mode helps to identify the source of the error. https://msdn.microsoft.com/en-us/library/ms140033.aspx
Venkat | March 1, 2017 at 8:24 pm | Reply

Hi Tim.

When I was trying to replicate your package. on execution i’m getting this error.

Run time error:

at System.Data.OleDb.OleDbDataAdapter.FillFromADODB(Object data, Object adodb, String srcTable, Boolean multipleResults)
at System.Data.OleDb.OleDbDataAdapter.Fill(DataTable dataTable, Object ADODBRecordSet)
at ScriptMain.CreateNewOutputRows()
at UserComponent.PrimeOutput(Int32 Outputs, Int32[] OutputIDs, PipelineBuffer[] Buffers, OutputNameMap OutputMap)
at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.PrimeOutput(Int32 outputs, Int32[] outputIDs, PipelineBuffer[] buffers).

Error:
[Script Component [35]] Error: System.ArgumentException: Object is not an ADODB.RecordSet or an ADODB.Record.
Parameter name: adodb
at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.HandleUserException(Exception e)
at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.PrimeOutput(Int32 outputs, Int32[] outputIDs, PipelineBuffer[] buffers)
at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPrimeOutput(IDTSManagedComponentWrapper100 wrapper, Int32 outputs, Int32[] outputIDs, IDTSBuffer100[] buffers, IntPtr ppBufferWirePacket)

Please help me out with this issue.
- Tim Mitchell | March 2, 2017 at 8:25 am | Reply
  
  The error message indicates that the variable type is incorrect.
  - Bhumika | July 26, 2018 at 2:51 am | Reply
    
    Hi Tim,
    
    I am also getting the same issue. Can you please tell me how to correct this issue as i got stuck in this issue.
Vinodh | June 20, 2017 at 2:46 pm | Reply

Hello Tim,

I have a SQL table which holds 5 queries to be run against an Oracle source. I use SSIS to read and store each query to a Variable and I have confirmed the variable’s contents during runtime. Now, how do I execute the code in the variable against the Oracle DB ? I tried a Data flow task, but that failed as the variable did not have any columns (metadata) during design time and hence, was unable to define the Source – destination mapping.

Thanks in advance for your valuable help.
Lan | October 12, 2017 at 7:35 pm | Reply

Hi Tim,
I inherit my package where, Script task populate the object var, and passes to the script component (look exactly like your, except var name). The package ran successfully from the SSIS Catalog, but failed running from a sql agent job (I created a proxy account for this job). I am using VS2015 and SQL Server 2014 on the same Windows 2012 R2. Here is the error:
Object is not an ADODB.RecordSet or an ADODB.Record.
Parameter name: adodb
at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.HandleUserException(Exception e)
at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.PrimeOutput(Int32 outputs, Int32[] outputIDs, PipelineBuffer[] buffers)
at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPrimeOutput(IDTSManagedComponentWrapper100 wrapper, Int32 outputs, Int32[] outputIDs, IDTSBuffer100[] buffers, IntPtr ppBufferWirePacket)
I am new to VB, but have experience on many programing languages. Any suggestions?
Thank you
tlf
- Tim Mitchell | October 15, 2017 at 9:26 am | Reply
  
  Lan, if it is running successfully when manually executed from the catalog but failing when invoked from SQL Server agent, it’s most likely a permissions issue. That’s where I’d start troubleshooting.
  - Lan | October 16, 2017 at 12:04 pm | Reply
    
    Hi Tim,
    Thank you for your response. I eliminate permission is the issue since my job is executing by a proxy account and has admin privilege (Windows and SQL). It might be incompatibility between system.object and ADODB.recordset (Script Task to Script Component). It also failed when I converted the object to string in the OleDbDataAdapter.Fill method. Any suggestions?
    Thank you,
    tlf
    - Tim Mitchell | October 17, 2017 at 9:49 am | Reply
      
      Hmm, that’s interesting. Sounds like you’ve checked all of the common things related to permissions. Right offhand nothing else stands out to me. You’ll probably have to do a line-by-line debug execution to drill down to the root cause.
Lan | October 16, 2017 at 11:03 am | Reply

Hi Tim,
Thank you for your response. I eliminate permission is the issue since my job is executing by a proxy account and has admin privilege (Windows and SQL). It might be incompatibility between system.object and ADODB.recordset (Script Task to Script Component). It also failed when I converted the object to string in the OleDbDataAdapter.Fill method. Any suggestions?
Thank you,
tlf
Arun Kumaravel | May 23, 2018 at 6:56 am | Reply

Hi Tim.

Thanks for the very useful info. I am a newbie in script creation. Do u have the script for the above still?? I managed to pull my varchar(max) to the object variable but need the script to put that into a string and use it as my data flow source.

Regards
Arun
Scott | September 13, 2018 at 4:15 am | Reply

Hi Tim
Many thanks for demystifying the script component, and how it accesses SSIS variables. Very useful indeed!

Regards
Scott
Chris Carson | December 18, 2018 at 1:22 pm | Reply

Tim, thank you for the illustrations and explanation. Very helpful stuff!

One question: In your example, you populate an Object and then use the Script component to read the Object and flow the data outwards. If I had a stored procedure that output data, would I use a similar design pattern with the script component?

I have a proc that I can’t use as an OLEDB Source, because it uses temp tables and so SSIS can’t read the metadata to build the output. I can define the output myself, using the methods you’re describing above.

So, rather than use the Execute SQL task to create an Object variable, and then use a script component for processing, I’d prefer to execute the proc in the script component.

Am I on the right track? Or, are there reasons why one might want to execute the proc separately and then process the object variable as you’ve done here?

Cheers! And thanks again for your article.
jveinbergs | December 3, 2019 at 9:30 am | Reply

Thank you for script. I updated it to use reflection to save some time mapping each column and it’s data type in code. Still, output columns must be defined & output column names must match those within custom object for this to work.

See gist: https://gist.github.com/janis-veinbergs/df95b0198c972fbd42e2dc7b7eb70153
Craig | July 20, 2020 at 3:58 pm | Reply

Hi Tim, I’ve hit your site a number of times recently and have found some very useful information, so thank you! I do have a question though: A long while back, you told a person named Juan about a pointer – “Juan, when you mentioned traversing the object, it reminded me that there is a pointer (either on the data set or on the data adapter – not certain right offhand) that lets you traverse the records from top to bottom. If you don’t reset that pointer, it will appear as if there are no other rows to process. I don’t recall the syntax but there is a way to programmatically reset that pointer.” I saw that as an answer to a problem I am having, so went on a quest to find this pointer. Sadly, my google-foo is not up to the task and the internet has no idea what I’m asking about. So I thought I’d ask you to see if you ever remembered. How does one address the pointer in the data set or data adapter or ???
- Tim Mitchell | July 30, 2020 at 1:37 pm | Reply
  
  Hey Craig, I was typing up a response to your question, and it kept getting longer and longer so I turned it into a blog post 🙂 Here you go: https://www.timmitchell.net/post/2020/07/30/reusing-a-recordset-in-an-ssis-object-variable/
Tom | January 6, 2021 at 9:32 pm | Reply

Thanks for the post Tim! Well written and saved me heaps of time and resources struggling with huge raw files over my LAN.
Chintan | July 31, 2022 at 10:33 pm | Reply

I am getting an Error
The name ‘SalesOutputBuffer’ does not exist in the current context

Have we defined this variable/class anywhere ?
- Paul Sherrill | March 13, 2023 at 1:39 pm | Reply
  
  So if you name the “output” as shown in picture 3, the script will have it available automatically.