In the first post in this series, I covered the basics of object typed variables in SQL Server Integration Services, along with a brief examination of some potential use cases. In this installment, I’m going to illustrate the most common use of object typed variables in SSIS: using an object variable as an ADO recordset within a loop container to perform iterative logic.
Before we examine the how, let’s talk about the why. Although this is not a design pattern you’ll have to use every day, there are any number of cases that would lend themselves to building and using an ADO recordset enumerator:
- You need to create a series of export files – one per client – showing that client’s charges for a given period.
- You’re dealing with a very large set of data, and/or your processing hardware has limited resources. You want to explore breaking up the workload into smaller chunks to be processed serially.
- You are performing a data load operation, and want to design the package in such a way that the loaded data can be immediately used as a validation source in the same package execution.
For cases such as these (among others), using this design pattern can be an excellent way to address your ETL needs.
SSIS Object Variable Result Set
At a high level, this design pattern will have three moving parts:
- A relational query used to populate the object variable (thus transforming its internal type into an ADO recordset)
- A For Each Loop container to loop through the list stored in this variable
- Some business logic for each value (or set of values) in each row of the object variable
Note that while the first two moving parts I mentioned will be relatively consistent from one package to another, the business logic component will, by nature, vary greatly from one package to another. For the purposes of this post, I’m purposefully keeping my business logic piece simple so as to not distract from the larger design pattern.
For my sample data, I’m going to deal with a data domain that is near and dear to my heart: baseball. In this case I want to get a list of all postseason baseball games, and for each game, create an export file detailing the at-bat statistics for that game. Because I don’t know at design time how many games will be played in the postseason, I can’t simply hard-code my outputs – I need design the ETL in such a way that the data will dictate, at runtime, the number of output files and their respective filenames.
Configuring and Populate the Object Variable
The first thing I’ll do in my demo package is set up an SSIS variable, giving it the data type of Object. As shown below, I’m using the SSIS variable named [GameList] as the object typed variable, which will store the ADO recordset list of playoff game IDs that should be processed. Also included is a variable specifying the directory to which the output files will be written, as well as a variable to store the individual game ID for each iteration of the loop.
Next up, I’m going to add an instance of the Execute SQL Task to the control flow of my package, typing in my query to select the IDs of the playoff games from the database. In the settings for this task shown below, you’ll also see in the highlighted portion that I’ve changed the behavior of the Result Set to use Full result set (remember the default is None, which would expect no data rows to be returned). By setting this behavior, I’m configuring the task to expect a result set to be returned.
When I configure the Result Set setting in this way, I also need to indicate where those results should end up – specifically, I have to indicate which object typed variable will store these results. In the Result Set tab of the same task, I’ll set the variable name to use the [GameList] variable I set up in the previous step. Also note that the result set name should always be 0 in this case.
What I’ve done here is quite simple, and required no code (other than the SQL statement, of course). What’s happening behind the scenes is a little more complex, however. At runtime when the Execute SQL Task is executed, the [GameList] variable will be instantiated as a new object of type ADO recordset. Note that this action will not change the data type shown in SSIS; even though the in-memory object will be configured as an ADO recordset, it will still show up as an object type variable in the designer. This ADO recordset object will then be loaded with the resulting records, if any, from the query I used in the Execute SQL Task.
Using the SSIS Object Variable Result Set as an Enumerator
My next step will be to consume that list, processing each game ID in turn to extract the data I need. To handle this, I’ll add a For Each Loop container to the control flow, and connect the previously configured instance of Execute SQL Task to this new container. When I configure the properties for the loop container, in the Collection tab I’m presented with several different options for the enumerator (the list that controls how many times the logic within the loop will be executed). Since I’m working from the ADO recordset list created in the previous step, I’m going to select Foreach ADO Enumerator, and use the variable drop down list to select the [GameList] object variable. I also set the Enumeration Mode to use Rows in the first table, which is the only option I can use when working with a ADO recordset (note that we have more options when working with an ADO.NET recordset, which I plan to cover in a future post).
With the collection tab set to use my SSIS object variable result set as an enumerator, I’ll next jump over to the Variable Mappings tab. It is on this tab where I will align fields in the record set with variables in the package. As shown below, I’m only expecting one column to be returned, and for each iteration of the loop, this value will be stored in the variable named [ThisGameID]. As you can see, I’m using index  to indicate the position of this value; if the record set is expected to return more than one column, I could add those in as additional column/variable mappings, using the ordinal position of each column to map to the proper SSIS variable.
With that done, I’ll add a new data flow task to the for each loop container configured above, which will complete the work on the control flow:
Configure the Business Logic in the Data Flow
Within that data flow, I’ll add a new OLE DB Connection component, the purpose of which will be to retrieve the at-bat statistics for each playoff game. To the output of that source, I will attach an instance of the Flat File Destination, which will be used to send each game’s data to the respective output file.
Within the data source, I need to configure the query such that it retrieves data for one and only one game at a time. Since the current game ID value is stored in the [ThisGameID] SSIS variable, I can simply map that variable as a query parameter, so that each execution of this SELECT query will limit the results to only include statistics for the current game ID. As shown below, the placeholder (the question mark in the query) indicates the use of a parameter:
… and when I click the Parameters… button, I can map the SSIS variable containing the game ID to that query parameter:
I have already configured an instance of the Flat File Destination (and by extension, set up the Flat File Connection Manager) to allow me to write out the results to a file, but how will I create a separate file per game? It’s actually quite easy: by using a simple SSIS expression on the ConnectionString property of the Flat File Connection Manager, I can configure the output file name to change on each iteration of the loop by using the game ID value as part of the file name. As shown below, I’m accessing the Expressions collection within my Flat File Connection manager, overriding the static value of the ConnectionString property with an interpreted value using the amalgamation of two variables – the directory location I specified earlier, along with the current game ID. Remember that since SSIS variables are evaluated at runtime, the value of the variables can change during execution, thus allowing the use of a single Flat File Connection Manager to write out multiple files during each package execution.
Finally, when I execute the configured package, I end up with a few dozen output files – one per playoff game. As shown below, each output file is distinctified with the game ID as part of the file name.
Use of the SSIS object typed variable can be a very powerful tool, but it need not be complex. As shown in this example, we can easily leverage the object variable for iteration over a result set without writing a single line of programmatic code.
In the next post in this series, I’ll dig further into object typed SSIS variables, and will explore how to use and manipulate other types of objects not natively represented in SSIS.
Very nice post.
But i’m trying to reproduce your package and when i execute i have the following error message:
The type of the value being assigned to variable “User::ThisGameID” differs from the current variable type. Variables may not change type during execution.
The error occurs on the for each loop.
Do you have a clue of what can be happening?
Herick, this is caused by a data type error. If you’re using the sample I’ve shown here, you’ll need to set the data type to String for the variable named ThisGameID.
Tim I am having the same type of problem as Herick. My object is populated by by a data flow task that outputs a single record with two columns: A string (DT_WSTR) and an integer (DT_I4).
When I build the foreach loop and set the source to my object variable, I can assign the string to a string variable but no matter what datatype I choose for the variable for the integer column I am getting the same error message.
I am using 2008 R2
null value is the issue, use isnull in the query.
very good post.Let me know how to get the column names in all the files
Nidhin, if you’re talking about adding the column names to your output file, you can use the checkbox on the flat file connection manager to set the column names in the first row of the file.
Great post, more clearly explained than almost any other similar posts I’ve seen…and I’ve seen a lot of them. One question though..I have a lot of columns that I have to map in the foreach loop container. Is there a way to do this without having to manually add all the columns? In the past I’ve only needed one or 2, so its never been an issue. Now its looking like its going to be a huge PITA.
Jon, thanks for the kind words!
For the mapping of columns in the for each loop container, are you talking about mapping values from the object variable into other variables, or the source-to-target mappings in the data flow itself? If it’s the data flow, it’s a little easier because SSIS will infer the mappings when the names and data types are the same. Mapping the extracted values from the object variable into other discrete variables is certainly more tedious, and there’s not a good way (in SSIS, anyway) to automate this.
However, there are ways to make it easier. Have you looked into Biml? I’m using Biml a lot these days to generate SSIS packages, and it definitely helps with some of the more tedious and error-prone work – such as column mappings – of SSIS.
Yeah, I’m referring to the more tedious stuff. I’ve never heard of Biml…Going to check it out now! Thanks!
Very helpful – thank you!
Thanks, CS! Glad you found it useful.
Hi This is a great explanation Thanks, but I have a problem that my for each loop cycles through the records but overwrites the file each time instead of creating a new file?
If you are writing to a file within the loop, you’ll most likely need to use a dynamic filename in your flat file connection manager.
If I have a variable of Object type it interferes with my ability to create a configuration file. Try to generate a configuration file from your package and you’ll see what I mean. You get this message:
Property “Value” with type “Object” of variable “User::dataset” can not be exported to the configuration file.
Thank you so much this article is helping me do a task at work very similar to this.
I missed your Richmond pre-con sadly.
Thanks for the feedback! Glad to hear it’s useful. Hopefully we’ll catch up at a future event.
Hi, I’ve been working on something similar to this where I fetch a list of names and pass them into a foreach loop to be used as variables in a child process. I’m having an interesting issue where everything works perfectly as long as there are 2 or more rows returned to be looped through. But if only one row is returned, the loop seems to fail out and the process in the loop is not run. Is this a known issue of looping through a result set in this fashion? If so, how can I get around it? Thank you!
Thanks for an excellent article. Do you make the sample data available so I can reproduce what you did? I looked for a link but did not see one.
Wonderful article. This helps for something that I am currently working on lately. Thank you so much!
Thanks! I’m glad you found it useful.
this polished off a days worth of learning, and following my error codes to each and every destination ty for the finishing touch!
Thanks for the comment! Glad you found it useful.
Great post! Exactly what I needed.
Thank you Tim.
Thanks! Glad it was helpful to you.
I have question over here . I am trying to do almost same stuff but Data flow is different .What I need is I am passing the variable from fullresult set of execute sql task to the expression of another variable and that variable expression is excuting for ado.net source.but I am getting the error while passing the full resultset variable i.e. msid to another variable .because @msid is of object type and while trying to evaluate it in expression builder it gives an datatype incorrect error and its not working even after casting it to dtstr or dtwstr .
Please provide your suggestions.
Since your object variable would contain a dataset rather than just a string, int, or other type, you can’t just assign the object variable to another variable. You can use the For Each Loop to extract each row into a set of variables, or programmatically (using the script task and the System.Data namespace) extract the values.
Hi. I have an exact setup as yours the difference is with the variable mappings I have 2. The output of the object variable brings a list of IDs, so the foreach loop takes in the IDs, but in the data flow it only brings through one ID in place of all the ID. I am trying to understand where is it getting this wrong.
Thank! Knowing that the Enumerator type on the collection page of the Foreach Loop Editor is a “Foreach ADO Enumerator” is what helped me. I was stuck trying to enumerate through a variable and it didn’t work.
I have two questions about using a loop container as opposed to calling the package multiple times as steps in a SQL Job. We have a package that gets a list of databases to for which 8 stored procedures must be run. Originally there was a loop container to loop through the 8 stored procedures and a loop container to loop through multiple databases. The select to get the databases originally pulled only the databases for which we needed to execute the stored procedure in based on a where clause. This resulted 300 records. However the DBA said that we should not use a where clause and to instead return all 1500 databases and then do a check in the loop as to whether the stored procedure should be executed. The DBA also said that rather than have a loop container to loop through the 8 stored procedure it would be more optimal to have a step in the SQL job for each stored procedure and pass the procedure name in as a parameter to the SSIS package.
I am wondering how it is more optimal if we are selecting 1500 records each time the SSIS package is executed and checking for the database to use in the loop container rather than have “Where” clause in the SQL statement that returns the record set of databases to process in. Also why would looping through the stored procedure names be less optimal than calling the package 8 times from the SQL Job. I would appreciate any insight that you have.
Jeannine, thanks for the comment. The short answer is, as always, “It Depends.”
My longer answer is this: I prefer to keep SSIS packages relatively small (see this post: https://www.timmitchell.net/post/2016/11/04/etl-atomicity/) and to use a parent package as an orchestrator to call the child packages as needed. Using SSIS to execute packages – in a loop or otherwise – is much more flexible than using job steps in the SQL Server Agent. You also have better control over what happens upon failure when using an SSIS package to call other packages.
Although there are some valid cases where SQL Server Agent job steps are the right way to handle execution flow, I usually go with an SSIS package as an orchestrator.
A data type object has a length limit as a string – 255 (in your case GameList). Is it possible to increase it or walk around?
I am sorry Tim for my previous comment.
It was a classic problem with Send Mail task (SSIS Send Mail task limited to 255 characters in address).
Thank you soooo much for this great post, but could I know the databases that you’re using for this post, from which site you get it?
Is it from this site, if so, which year, please?
Nazarh, this data set was pulled from an MLB feed that is unfortunately no longer available.
Thanks for sharing this great example. However I am trying to do this similar thing in Excel and wanted to create 2 excel worksheet in a similar way.
Can we do this in the similar way by using excel file destination instead of flat file destination?
Very Useful Post! I had same use case in my organization and you have explained each step with best explanations and screenshots!
Excellent! I am busy trying to compare two folders and copy over missing files (jpg), renaming them at the time. This walked me through really nicely – especially the whole object description. Thanks.