SQL Saturday 223 OKC

okc This past weekend I made the relatively short (about 3 hours) trip to the Oklahoma City area to attend and speak at the third annual SQL Saturday event there.  I’ve been invited to speak at each of their three events, and I’ve been impressed with each one of them so far.  The folks in the OKC and surrounding groups have an excellent community, and the leadership team did a great job again of putting on a well-organized event.

sqlsat223_TimAndRyan This weekend was a first for me in that I brought along my oldest son Ryan to the event.  At age 9, he’s already showing highly analytical tendencies, and I’m trying to expose him to more technology-related happenings.  He was very excited to attend the event – I was out of town with a client all week, and when I called him each night last week, the upcoming trip was always one of the first things he wanted to talk about.  During the event he did get bored frequently, but that’s to be expected when deeply technical content and a short attention span intersect.  He still enjoyed the event, but it was more about doing something different and getting to see a bit of what I do for a living (he sat on the front row during my presentation) that made it fun for him.  I think it was a great experience for him, and I’m hoping to bring him along to some more events in the future.

For my part in the event, I presented “Handling Errors and Data Anomalies in SSIS”.  Had a moderate sized group, maybe 35-40 folks, with some good discussion during and after the presentation.  For those who attended and are interested in the presentation materials, you can download them here.

Thanks again to the OKC-area folks who organized this event.  I look forward to coming back again next year.

Upcoming SQL Saturday Precons

SQLSatI’m happy to announce that I’ll be delivering three, one-day preconference seminars this summer prior to three different SQL Saturday events:

For each of these events, I’ll be delivering a full day of content entitled “Real World SSIS: A Survival Guide.”  This day of content consists of many lessons that I’ve learned – many of which were learned the hard way – through my decade or so in this business.  I’ve got lots of demos to illustrate the concepts we’ll be covering.

If you are able to make it to any of these SQL Saturday events, I’d be honored if you’d join me for one of these talks.  Registration is open for all three seminars, as well as each of the SQL Saturday events.  I hope to see you there!

Using the SSIS Object Variable as a Result Set Enumerator

tmitch2In the first post in this series, I covered the basics of object typed variables in SQL Server Integration Services, along with a brief examination of some potential use cases.  In this installment, I’m going to illustrate the most common use of object typed variables in SSIS: using an object variable as an ADO recordset within a loop container to perform iterative logic.

Before we examine the how, let’s talk about the why.  Although this is not a design pattern you’ll have to use every day, there are any number of cases that would lend themselves to building and using an ADO recordset enumerator:

  • You need to create a series of export files – one per client – showing that client’s charges for a given period.
  • You’re dealing with a very large set of data, and/or your processing hardware has limited resources.  You want to explore breaking up the workload into smaller chunks to be processed serially.
  • You are performing a data load operation, and want to design the package in such a way that the loaded data can be immediately used as a validation source in the same package execution.

For cases such as these (among others), using this design pattern can be an excellent way to address your ETL needs.

Design Pattern

At a high level, this design pattern will have three moving parts:

  • A relational query used to populate the object variable (thus transforming its internal type into an ADO recordset)
  • A For Each Loop container to loop through the list stored in this variable
  • Some business logic for each value (or set of values) in each row of the object variable

Note that while the first two moving parts I mentioned will be relatively consistent from one package to another, the business logic component will, by nature, vary greatly from one package to another.  For the purposes of this post, I’m purposefully keeping my business logic piece simple so as to not distract from the larger design pattern.

For my sample data, I’m  going to deal with a data domain that is near and dear to my heart: baseball.  In this case I want to get a list of all postseason baseball games, and for each game, create an export file detailing the at-bat statistics for that game.  Because I don’t know at design time how many games will be played in the postseason, I can’t simply hard-code my outputs – I need design the ETL in such a way that the data will dictate, at runtime, the number of output files and their respective filenames.

Configuring and Populate the Object Variable

The first thing I’ll do in my demo package is set up an SSIS variable, giving it the data type of Object.  As shown below, I’m using the SSIS variable named [GameList] as the object typed variable, which will store the ADO recordset list of playoff game IDs that should be processed.  Also included is a variable specifying the directory to which the output files will be written, as well as a variable to store the individual game ID for each iteration of the loop.

variables

Next up, I’m going to add an instance of the Execute SQL Task to the control flow of my package, typing in my query to select the IDs of the playoff games from the database.  In the settings for this task shown below, you’ll also see in the highlighted portion that I’ve changed the behavior of the Result Set to use Full result set (remember the default is None, which would expect no data rows to be returned).  By setting this behavior, I’m configuring the task to expect a result set to be returned.

fullresultset

When I configure the Result Set setting in this way, I also need to indicate where those results should end up – specifically, I have to indicate which object typed variable will store these results.  In the Result Set tab of the same task, I’ll set the variable name to use the [GameList] variable I set up in the previous step.  Also note that the result set name should always be 0 in this case.

resultsetmapping

What I’ve done here is quite simple, and required no code (other than the SQL statement, of course).  What’s happening behind the scenes is a little more complex, however.  At runtime when the Execute SQL Task is executed, the [GameList] variable will be instantiated as a new object of type ADO recordset.  Note that this action will not change the data type shown in SSIS; even though the in-memory object will be configured as an ADO recordset, it will still show up as an object type variable in the designer.  This ADO recordset object will then be loaded with the resulting records, if any, from the query I used in the Execute SQL Task.

Using the SSIS Variable as an Enumerator

My next step will be to consume that list, processing each game ID in turn to extract the data I need.  To handle this, I’ll add a For Each Loop container to the control flow, and connect the previously configured instance of Execute SQL Task to this new container.  When I configure the properties for the loop container, in the Collection tab I’m presented with several different options for the enumerator (the list that controls how many times the logic within the loop will be executed).  Since I’m working from the ADO recordset list created in the previous step, I’m going to select Foreach ADO Enumerator, and use the variable drop down list to select the [GameList] object variable.  I also set the Enumeration Mode to use Rows in the first table, which is the only option I can use when working with a ADO recordset (note that we have more options when working with an ADO.NET recordset, which I plan to cover in a future post).

foreachconfig

With the collection tab set to use my object variable as an enumerator, I’ll next jump over to the Variable Mappings tab.  It is on this tab where I will align fields in the record set with variables in the package.  As shown below, I’m only expecting one column to be returned, and for each iteration of the loop, this value will be stored in the variable named [ThisGameID].  As you can see, I’m using index [0] to indicate the position of this value; if the record set is expected to return more than one column, I could add those in as additional column/variable mappings, using the ordinal position of each column to map to the proper SSIS variable.

variablemapping

With that done, I’ll add an instance of the Data Flow Task to the loop container configured above, which will complete the work on the control flow:

controlflow

Configure the Business Logic in the Data Flow

Now it’s time to dive into the data flow I just created.  Within that data flow, I’ll add a new OLE DB Connection component, the purpose of which will be to retrieve the at-bat statistics for each playoff game. To the output of that source, I will attach an instance of the Flat File Destination, which will be used to send each game’s data to the respective output file.

dataflow

Within the data source, I need to configure the query such that it retrieves data for one and only one game at a time.  Since the current game ID value is stored in the [ThisGameID] SSIS variable, I can simply map that variable as a query parameter, so that each execution of this SELECT query will limit the results to only include statistics for the current game ID.  As shown below, I’m using a parameter placeholder (the question mark in the query) to indicate the use of a parameter:

query

… and when I click the Parameters… button, I can map the SSIS variable containing the game ID to that query parameter:

queryparams

I have already configured an instance of the Flat File Destination (and by extension, set up the Flat File Connection Manager) to allow me to write out the results to a file, but how will I create a separate file per game?  It’s actually quite easy: by using a simple SSIS expression on the ConnectionString property of the Flat File Connection Manager, I can configure the output file name to change on each iteration of the loop by using the game ID value as part of the file name.  As shown below, I’m accessing the Expressions collection within my Flat File Connection manager, overriding the static value of the ConnectionString property with an interpreted value using the amalgamation of two variables – the directory location I specified earlier, along with the current game ID.  Remember that since SSIS variables are evaluated at runtime, the value of the variables can change during execution, thus allowing the use of a single Flat File Connection Manager to write out multiple files during each package execution.

fileconnstr

Finally, when I execute the configured package, I end up with a few dozen output files – one per playoff game.  As a side note, my Texas Rangers were only represented in one of those playoff games from last year.  We’ll get ‘em this year.  As shown below, each output file is distinctified with the game ID as part of the file name.

outputfiles

Conclusion

Use of the SSIS object typed variable can be a very powerful tool, but it need not be complex.  As shown in this example, we can easily leverage the object variable for iteration over a result set without writing a single line of programmatic code.

In the next post in this series, I’ll dig further into object typed SSIS variables, and will explore how to use and manipulate other types of objects not natively represented in SSIS.

Speaking at PASS Summit 2013

I’m happy to announce that I have been selected to presentlogo_header
at the SQL PASS Summit in Charlotte, North Carolina this October.   I’ll be delivering a session entitled “Data Cleansing in SQL Server Integration Services”, in which I’ll cover various ways to detect and cleanse dirty data using tools built into (or accessible from) SQL Server Integration Services.

This will be my third year to present at the PASS Summit.  As always, I’m happy to be able to participate and humbled to be selected to speak.

On a side note, this is the first time in several years that the Summit has been scheduled to be held outside of Seattle, and the first time it will have been held in Charlotte. Although I have to admit that I’m going to miss Seattle, I’m happy that PASS was open to community feedback requesting that the Summit be occasionally held in locations outside of Seattle.

Now we just need to task someone with finding us a karaoke dive bar in Charlotte.

Webinar: Scripting and SSIS

Tomorrow at 10am (11am EDT), I’ll be joining together with my good friend and SSIS Design Patterns coauthor Andy Leonard for a free one hour webinar to discuss scripting in SQL Server Integration Services:

Join SQL Server MVP Tim Mitchell and Andy Leonard as they discuss and demonstrate scripting in SSIS! In this demo-packed session, two co-authors of the book SSIS Design Patterns share their experience using the Script Task and Script Component to accomplish difficult transformations and improve data integration.

You can register online here.  We look forward to seeing you tomorrow!