SSIS Parent-Child Architecture in Catalog Deployment Mode

This is the third in a series of posts about SSIS parent-child architecture.  You can find the index page here.

In my previous posts on SSIS parent-child package architecture, I described the benefits of the parent-child package design pattern and demonstrated the implementation of such a pattern in package deployment mode.  In this post, I will demonstrate the parent-child pattern for implementations using catalog deployment mode, which is the default design in SSIS 2012 and 2014.

Catalog deployment mode in SSIS

If you’re reading this post and find yourself asking, “What is catalog deployment mode?”, here it is in a nutshell: Starting with SQL Server 2012, there were significant changes to the architecture of SSIS, most notably the move to a deployment/storage structure called the catalog deployment model (which is also frequently referred to as the project deployment model).  In this model, SSIS code is more project-centric than package-centric; packages are deployed as an entire project instead of individually (though each package can still be executed independently).  Catalog deployment mode also brought the addition of parameters, which can be used to externally set runtime values for package executions, as well as project connections, which can be used to easily share connection settings across all packages in a project.  Many other changes were introduced, including a simpler logging model and a dedicated SSIS database.

Among the many changes brought about by the catalog deployment model, the one that had the most impact on the parent-child design pattern is the addition of parameters.  In older versions of SSIS, it was possible to pass runtime values to a package, but the process was clunky at best.  When using SSIS 2012 or 2014 in catalog deployment mode, setting runtime values for a child package (or a standalone package, for that matter) is much easier and more straightforward than performing the same task in previous versions.

It is also worth noting that you don’t have to use the catalog deployment model in recent versions of SSIS.  Although catalog deployment model is the default setting in SSIS 2012 and 2014, you can set your project to use the package deployment model.  You’ll lose many of the new features (including parameterization and simplified logging) by choosing package deployment model, but this might be practical if your organization has made a significant investment in SSIS architecture that would be broken by moving to catalog deployment model.

Parent-child package execution

At the heart of parent-child architecture is the collection of parameters.  In catalog deployment mode, we can set up parameters at the package level or at the project level.  For values that would affect just a single package, using a package parameter would make the most sense.  However, if a value might need to be shared among several (or all) packages in a particular project, a project parameter would allow you to create the parameter once for the entire project rather than one for each package.
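As a point of reference, the two scopes are referenced differently in SSIS expressions.  Assuming a hypothetical project parameter named pEnvironment and a package parameter named pSourceFileName (both names are mine, for illustration only), the references would look like this:

```
@[$Project::pEnvironment]
@[$Package::pSourceFileName]
```

Project parameters are visible to every package in the project; package parameters are visible only within the package that defines them.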

Execute package task

When executing a child package, the simplest method is still the execute package task.  In SSIS 2012, the execute package task was updated to include a dropdown list (shown below, on the Package tab) that allows the SSIS developer to specify the target package.

[Image: Execute Package Task editor, Package tab]

There are a few limitations with this approach.  Most notably, the dropdown list selection only works when calling a package that exists in the same project.  You’ll notice that the selection above the package name, labeled ReferenceType, is set to Project Reference.  Though you can change ReferenceType to use a project located elsewhere, oddly enough you can’t use it to execute a package in a different project deployed to the SSIS catalog (you can read more about that limitation, as well as upvote the issue on Connect here).  I’ll discuss a couple of workarounds for this momentarily.

Clicking over to the Parameter bindings tab, we can specify which values to pass into the child package.  For each child package parameter, we specify exactly one value to be supplied at runtime.  Remember, like the dropdown list for package selection, this piece only works when executing packages in the same project (using the Project Reference setting on the ReferenceType from the Package tab).

[Image: Execute Package Task editor, Parameter bindings tab]

Keep in mind that you have to use a parameter or variable (again, choosing from the dropdown list) to map to the child parameter.  You can’t simply type in a static value in the Binding parameter or value field.  Also, remember that you will only see package parameters (not project parameters) in the list of child package parameters that may be mapped.  This is by design – it wouldn’t make sense to map a value to a project parameter when executing a package in the same project, since that child package would already implicitly have access to all of the project parameters.

Another distinct advantage of using the execute package task is the process for handling errors in the child package.  In the event that a child package fails, the execute package task will fail as well.  This is a good thing, because if the child package does fail, in almost all cases we would want the parent package to fail to prevent dependent tasks from improperly executing.  Even better, error messages from the child package would be bubbled up to the parent package, allowing you to collect error messages from all child packages within the parent package.  Consolidated error handling and logging means less development time upfront, and less maintenance effort down the road.

If you have the option of using the execute package task for starting packages stored in the SSIS catalog, I recommend sticking with this method.

Execute SQL task

Another method for executing one package from another is by using the T-SQL stored procedures in the SSIS catalog itself.  Executing a package in the SSIS catalog via T-SQL is actually a three-step process:

  • Create the execution entry in the catalog
  • Add in any parameter values
  • Execute the package

Catalog package execution via T-SQL, another new addition in SSIS 2012, allows us to overcome the limitation in the execute package task I mentioned earlier.  Using a T-SQL command (via the execute SQL task in SSIS), we can execute a package in any project.  It’s certainly more difficult to do so, because we lose the convenience of having the list of available packages and parameters exposed in a dropdown list in the GUI.  Here there be typing.  However, being able to execute packages in other projects – and for that matter, on other SSIS catalog servers entirely – somewhat makes up for the manual nature of this method.

To execute a child package using this method, you’d create an execute SQL task and drop in the appropriate commands, which might look something like the following:

DECLARE @execution_id BIGINT

EXEC [SSISDB].[catalog].[create_execution] @package_name = N'ChildPkgRemote.dtsx'
	,@execution_id = @execution_id OUTPUT
	,@folder_name = N'SSIS Parent-Child'
	,@project_name = N'SSIS Parent-Child Catalog Deployment - Child'
	,@use32bitruntime = False
	,@reference_id = NULL

-- Set user parameter value for filename
DECLARE @filename SQL_VARIANT = N'E:\Dropbox\Presentations\_sampleData\USA_small1.txt'

EXEC [SSISDB].[catalog].[set_execution_parameter_value] @execution_id
	,@object_type = 30
	,@parameter_name = N'pSourceFileName'
	,@parameter_value = @filename

-- Set execution parameter for logging level
DECLARE @loggingLevel SMALLINT = 1

EXEC [SSISDB].[catalog].[set_execution_parameter_value] @execution_id
	,@object_type = 50
	,@parameter_name = N'LOGGING_LEVEL'
	,@parameter_value = @loggingLevel

-- Set execution parameter for synchronized
DECLARE @synchronous SMALLINT = 1

EXEC [SSISDB].[catalog].[set_execution_parameter_value] @execution_id
	,@object_type = 50
	,@parameter_name = N'SYNCHRONIZED'
	,@parameter_value = @synchronous

-- Now execute the package
EXEC [SSISDB].[catalog].[start_execution] @execution_id

-- Show status
SELECT [execution_id]
	,[status] AS [execution_status]
FROM [SSISDB].[catalog].[executions]
WHERE [execution_id] = @execution_id

There are two things in particular to point out here.  First, by default, a package executed via T-SQL is started asynchronously.  This means that when you call the stored procedure [SSISDB].[catalog].[start_execution], the T-SQL command will return immediately (assuming you passed in a valid package name and parameters), giving no indication of either success or failure.  That’s why, in this example, I’m setting the execution parameter named SYNCHRONIZED to force the T-SQL command to wait until the package has completed execution before returning.  (Note: For additional information about execution parameters, check out this post by Phil Brammer.)  Second, regardless of whether you set the SYNCHRONIZED parameter, the T-SQL command will not return an error even if the package fails.  That’s why I’ve added the last query in this example, which returns the execution status.  I can use this to check the execution status of the child package before starting any subsequent dependent tasks.

[Image: Execute SQL Task editor, with the SQLStatement and ResultSet settings highlighted]

As shown, I’ve set the SQLStatement value to the T-SQL code block I listed above, and set the ResultSet value to Single row, the latter of which will allow me to capture the output status of the executed package.  Below, I’ve set that execution status value to a new package variable.

[Image: Result Set tab, mapping the execution status to a package variable]

To round out this design pattern, I set up my control flow as shown below.  Using precedence constraints coupled with SSIS expressions, I execute the package and then check the return value: a successful catalog execution returns a value of 7, and my parent package handles any return value other than a 7 as a failure.
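For reference, the expressions on those two precedence constraints are simple equality checks against the captured status value.  Assuming the status was stored in a package variable named ExecutionStatus (that variable name is mine, for illustration), the success path and the failure-handling path would use expressions along these lines:

```
@[User::ExecutionStatus] == 7
@[User::ExecutionStatus] != 7
```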

[Image: Control flow using precedence constraints to check the return value]

You may also have to give special consideration to errors in child packages when using T-SQL for package execution – especially when running packages interactively in the BIDS/SSDT designer.  Since the T-SQL command does not report the failure of a package by default, it also doesn’t “bubble up” errors in the traditional SSIS manner.  Therefore, you’ll need to rely on capturing any child package error messages from the SSIS catalog logging tables, especially when developing and testing packages in Visual Studio.
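One way to do that is to query the catalog’s event message log after the execution completes.  The sketch below uses the standard SSISDB catalog view [catalog].[event_messages]; the execution ID value is hypothetical and would be the one returned by create_execution:

```sql
-- Retrieve error messages logged for a single catalog execution.
-- message_type 120 = error (110 = warning).
DECLARE @execution_id BIGINT = 12345; -- hypothetical; use the ID returned by create_execution

SELECT [message_time]
	,[message_source_name]
	,[message]
FROM [SSISDB].[catalog].[event_messages]
WHERE [operation_id] = @execution_id
	AND [message_type] = 120
ORDER BY [message_time];
```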

Script Task

It is also possible to execute SSIS packages programmatically from the script task.  This method is significantly more complicated, but also offers a great deal of flexibility.  A fellow SQL Server MVP and acquaintance of mine, Joost van Rossum, has a detailed blog post on how to programmatically execute SSIS packages from the SSIS script task.  I won’t restate what he has already covered in his comprehensive and well-written post on the topic, but if you need to programmatically fire child packages in the SSIS catalog, check out his write-up.

Conclusion

In this post, I’ve covered the essentials of executing child packages in the SSIS catalog, including provisioning for errors in the child package.  Though it can be quirky, especially when executing child packages in a different project or on a different server, there are several different ways to address this parent-child design pattern.

In my next post, I’ll talk a little more about passing values in a parent-child package, and will illustrate how to pass values from child packages back to the parent package.

Speaking at the SQL PASS 2014 Summit

I’m happy to announce that I will be speaking at the SQL PASS Summit this fall.  The summit will be held during the first week of November in Seattle, Washington.  This will be my seventh year attending the PASS Summit, and my fourth year as a speaker.

This year I will be delivering a new presentation entitled “Building Bullet-Resistant SSIS Packages”:

It’s 2:30 a.m., and you’ve just gotten that call no ETL developer wants to get: an SSIS package has failed unexpectedly. Immediately, you start asking yourself: “What could have caused this?”, “What do I do to fix the data from this failed execution?”, and, most importantly, “Could I design the package in such a way that this doesn’t happen again?”

In this demo-packed session, we will examine the design patterns and logistics of building error-resistant SSIS packages. Starting with the control flow, we’ll work through various tools and patterns that can be used to prevent, or at least handle appropriately, task-level errors. We’ll then move to the data flow, and discuss and demonstrate how to proactively address problem data to prevent unexpected package failures.

I’m looking forward to presenting this session, but more importantly, I’m excited about meeting new community members and catching up with #sqlfamily.  I hope to see you in Seattle in November!

SQL Saturday 223 OKC

This past weekend I made the relatively short (about 3 hours) trip to the Oklahoma City area to attend and speak at the third annual SQL Saturday event there.  I’ve been invited to speak at each of their three events, and I’ve been impressed with each one of them so far.  The folks in the OKC and surrounding groups have an excellent community, and the leadership team did a great job again of putting on a well-organized event.

This weekend was a first for me in that I brought along my oldest son Ryan to the event.  At age 9, he’s already showing highly analytical tendencies, and I’m trying to expose him to more technology-related happenings.  He was very excited to attend the event – I was out of town with a client all week, and when I called him each night last week, the upcoming trip was always one of the first things he wanted to talk about.  During the event he did get bored frequently, but that’s to be expected when deeply technical content and a short attention span intersect.  He still enjoyed the event, but it was more about doing something different and getting to see a bit of what I do for a living (he sat in the front row during my presentation) that made it fun for him.  I think it was a great experience for him, and I’m hoping to bring him along to some more events in the future.

For my part in the event, I presented “Handling Errors and Data Anomalies in SSIS”.  I had a moderate-sized group, maybe 35-40 folks, with some good discussion during and after the presentation.  For those who attended and are interested in the presentation materials, you can download them here.

Thanks again to the OKC-area folks who organized this event.  I look forward to coming back again next year.

Upcoming SQL Saturday Precons

I’m happy to announce that I’ll be delivering three one-day preconference seminars this summer prior to three different SQL Saturday events:

For each of these events, I’ll be delivering a full day of content entitled “Real World SSIS: A Survival Guide.”  This day of content consists of many lessons that I’ve learned – many of which were learned the hard way – through my decade or so in this business.  I’ve got lots of demos to illustrate the concepts we’ll be covering.

If you are able to make it to any of these SQL Saturday events, I’d be honored if you’d join me for one of these talks.  Registration is open for all three seminars, as well as each of the SQL Saturday events.  I hope to see you there!

Using the SSIS Object Variable as a Result Set Enumerator

In the first post in this series, I covered the basics of object typed variables in SQL Server Integration Services, along with a brief examination of some potential use cases.  In this installment, I’m going to illustrate the most common use of object typed variables in SSIS: using an object variable as an ADO recordset within a loop container to perform iterative logic.

Before we examine the how, let’s talk about the why.  Although this is not a design pattern you’ll have to use every day, there are any number of cases that would lend themselves to building and using an ADO recordset enumerator:

  • You need to create a series of export files – one per client – showing that client’s charges for a given period.
  • You’re dealing with a very large set of data, and/or your processing hardware has limited resources.  You want to explore breaking up the workload into smaller chunks to be processed serially.
  • You are performing a data load operation, and want to design the package in such a way that the loaded data can be immediately used as a validation source in the same package execution.

For cases such as these (among others), using this design pattern can be an excellent way to address your ETL needs.

Design Pattern

At a high level, this design pattern will have three moving parts:

  • A relational query used to populate the object variable (thus transforming its internal type into an ADO recordset)
  • A For Each Loop container to loop through the list stored in this variable
  • Some business logic for each value (or set of values) in each row of the object variable

Note that while the first two moving parts I mentioned will be relatively consistent from one package to another, the business logic component will, by nature, vary greatly from one package to another.  For the purposes of this post, I’m purposefully keeping my business logic piece simple so as to not distract from the larger design pattern.

For my sample data, I’m going to deal with a data domain that is near and dear to my heart: baseball.  In this case, I want to get a list of all postseason baseball games, and for each game, create an export file detailing the at-bat statistics for that game.  Because I don’t know at design time how many games will be played in the postseason, I can’t simply hard-code my outputs – I need to design the ETL in such a way that the data will dictate, at runtime, the number of output files and their respective filenames.

Configuring and Populating the Object Variable

The first thing I’ll do in my demo package is set up an SSIS variable, giving it the data type of Object.  As shown below, I’m using the SSIS variable named [GameList] as the object typed variable, which will store the ADO recordset list of playoff game IDs that should be processed.  Also included is a variable specifying the directory to which the output files will be written, as well as a variable to store the individual game ID for each iteration of the loop.

[Image: Package variables, including the GameList object variable]

Next up, I’m going to add an instance of the Execute SQL Task to the control flow of my package, typing in my query to select the IDs of the playoff games from the database.  In the settings for this task shown below, you’ll also see in the highlighted portion that I’ve changed the behavior of the Result Set to use Full result set (remember the default is None, which would expect no data rows to be returned).  By setting this behavior, I’m configuring the task to expect a result set to be returned.
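The query itself simply returns the single-column list of game IDs to process.  Mine looks something like the following (the table and column names here are specific to my sample baseball database, so treat them as illustrative):

```sql
-- Return one row per postseason game; this list will drive the loop.
SELECT [GameID]
FROM [dbo].[PostseasonGame]
ORDER BY [GameID];
```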

[Image: Execute SQL Task settings, with the Result Set behavior set to Full result set]

When I configure the Result Set setting in this way, I also need to indicate where those results should end up – specifically, I have to indicate which object typed variable will store these results.  In the Result Set tab of the same task, I’ll set the variable name to use the [GameList] variable I set up in the previous step.  Also note that the result set name should always be 0 in this case.

[Image: Result Set tab, mapping result name 0 to the GameList variable]

What I’ve done here is quite simple, and required no code (other than the SQL statement, of course).  What’s happening behind the scenes is a little more complex, however.  At runtime when the Execute SQL Task is executed, the [GameList] variable will be instantiated as a new object of type ADO recordset.  Note that this action will not change the data type shown in SSIS; even though the in-memory object will be configured as an ADO recordset, it will still show up as an object type variable in the designer.  This ADO recordset object will then be loaded with the resulting records, if any, from the query I used in the Execute SQL Task.

Using the SSIS Variable as an Enumerator

My next step will be to consume that list, processing each game ID in turn to extract the data I need.  To handle this, I’ll add a For Each Loop container to the control flow, and connect the previously configured instance of the Execute SQL Task to this new container.  When I configure the properties for the loop container, the Collection tab presents several different options for the enumerator (the list that controls how many times the logic within the loop will be executed).  Since I’m working from the ADO recordset list created in the previous step, I’m going to select Foreach ADO Enumerator, and use the variable dropdown list to select the [GameList] object variable.  I also set the Enumeration Mode to use Rows in the first table, which is the only option available when working with an ADO recordset (note that we have more options when working with an ADO.NET recordset, which I plan to cover in a future post).

[Image: Foreach Loop Collection tab, configured with the Foreach ADO Enumerator]

With the collection tab set to use my object variable as an enumerator, I’ll next jump over to the Variable Mappings tab.  It is on this tab where I will align fields in the record set with variables in the package.  As shown below, I’m only expecting one column to be returned, and for each iteration of the loop, this value will be stored in the variable named [ThisGameID].  As you can see, I’m using index [0] to indicate the position of this value; if the record set is expected to return more than one column, I could add those in as additional column/variable mappings, using the ordinal position of each column to map to the proper SSIS variable.

[Image: Variable Mappings tab, mapping index 0 to the ThisGameID variable]

With that done, I’ll add an instance of the Data Flow Task to the loop container configured above, which will complete the work on the control flow:

[Image: Completed control flow]

Configure the Business Logic in the Data Flow

Now it’s time to dive into the data flow I just created.  Within that data flow, I’ll add a new OLE DB Source component, the purpose of which will be to retrieve the at-bat statistics for each playoff game.  To the output of that source, I will attach an instance of the Flat File Destination, which will be used to send each game’s data to the respective output file.

[Image: Data flow with an OLE DB source and a Flat File destination]

Within the data source, I need to configure the query such that it retrieves data for one and only one game at a time.  Since the current game ID value is stored in the [ThisGameID] SSIS variable, I can simply map that variable as a query parameter, so that each execution of this SELECT query will limit the results to only include statistics for the current game ID.  As shown below, I’m using a parameter placeholder (the question mark in the query) to indicate the use of a parameter:
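The source query might look something like the one below (again, the table and column names reflect my sample database and are illustrative only).  The question mark is the parameter placeholder, which gets mapped to the [ThisGameID] variable:

```sql
-- Retrieve at-bat statistics for a single game; the ? placeholder
-- is mapped to the [User::ThisGameID] variable in the source editor.
SELECT [GameID]
	,[BatterName]
	,[Inning]
	,[Result]
FROM [dbo].[AtBat]
WHERE [GameID] = ?;
```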

[Image: Source query using a parameter placeholder]

… and when I click the Parameters… button, I can map the SSIS variable containing the game ID to that query parameter:

[Image: Query parameter mapping dialog]

I have already configured an instance of the Flat File Destination (and by extension, set up the Flat File Connection Manager) to allow me to write out the results to a file, but how will I create a separate file per game?  It’s actually quite easy: by using a simple SSIS expression on the ConnectionString property of the Flat File Connection Manager, I can configure the output file name to change on each iteration of the loop by using the game ID value as part of the file name.  As shown below, I’m accessing the Expressions collection within my Flat File Connection manager, overriding the static value of the ConnectionString property with an interpreted value using the amalgamation of two variables – the directory location I specified earlier, along with the current game ID.  Remember that since SSIS variables are evaluated at runtime, the value of the variables can change during execution, thus allowing the use of a single Flat File Connection Manager to write out multiple files during each package execution.
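The expression itself is just string concatenation.  Assuming the directory variable is named OutputDirectory (my variable names here; yours may differ), the ConnectionString expression would look something like this:

```
@[User::OutputDirectory] + "AtBats_" + @[User::ThisGameID] + ".txt"
```

If the game ID variable were numeric rather than a string, it would also need an explicit cast in the expression, such as (DT_WSTR, 12) @[User::ThisGameID].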

[Image: ConnectionString expression on the Flat File Connection Manager]

Finally, when I execute the configured package, I end up with a few dozen output files – one per playoff game.  As a side note, my Texas Rangers were only represented in one of those playoff games from last year.  We’ll get ‘em this year.  As shown below, each output file is uniquely named using the game ID.

[Image: Output files, one per playoff game]

Conclusion

Use of the SSIS object typed variable can be a very powerful tool, but it need not be complex.  As shown in this example, we can easily leverage the object variable for iteration over a result set without writing a single line of programmatic code.

In the next post in this series, I’ll dig further into object typed SSIS variables, and will explore how to use and manipulate other types of objects not natively represented in SSIS.

Speaking at PASS Summit 2013

I’m happy to announce that I have been selected to present at the SQL PASS Summit in Charlotte, North Carolina this October.  I’ll be delivering a session entitled “Data Cleansing in SQL Server Integration Services”, in which I’ll cover various ways to detect and cleanse dirty data using tools built into (or accessible from) SQL Server Integration Services.

This will be my third year to present at the PASS Summit.  As always, I’m happy to be able to participate and humbled to be selected to speak.

On a side note, this is the first time in several years that the Summit has been scheduled to be held outside of Seattle, and the first time it will have been held in Charlotte. Although I have to admit that I’m going to miss Seattle, I’m happy that PASS was open to community feedback requesting that the Summit be occasionally held in locations outside of Seattle.

Now we just need to task someone with finding us a karaoke dive bar in Charlotte.

Webinar: Scripting and SSIS

Tomorrow at 10am (11am EDT), I’ll be joining my good friend and SSIS Design Patterns coauthor Andy Leonard for a free one-hour webinar to discuss scripting in SQL Server Integration Services:

Join SQL Server MVP Tim Mitchell and Andy Leonard as they discuss and demonstrate scripting in SSIS! In this demo-packed session, two co-authors of the book SSIS Design Patterns share their experience using the Script Task and Script Component to accomplish difficult transformations and improve data integration.

You can register online here.  We look forward to seeing you tomorrow!