Parent-Child SSIS Architecture

This is the first in a series of technical posts on using parent-child architectures in SQL Server Integration Services.  The index page for all posts can be found here.

In this post, I will provide an overview of the architecture and describe the benefits of implementing a parent-child design pattern in SSIS structures.

Definition

The simplest definition of SSIS parent-child architecture is that it consists of packages executing other packages.  In SSIS, the package is the base executable; it is the most granular component that can be executed independently1.  Every version of SSIS includes the ability for one package to execute another package through the use of one of the following:

  • The Execute Package Task
  • T-SQL commands
  • The Execute Process Task (using the dtexec.exe utility)

Though the reference to parent-child architecture implies that there are exactly two layers to this design, that does not have to be the case.  You may have a design where a package executes a package which executes a package, and so forth.  Although there may be a hard limit to how deeply nested a parent-child architecture may go, I have never encountered such a limitation.  I have found it useful on a few occasions to go deeper than two levels in this type of architecture, particularly when designing a formal ETL framework (to be discussed further in a future post in this series).  In cases where greater than two levels exist, finding the right terminology for those layers is important.  You can refer to them by patriarchy (grandparent/parent/child) or by cardinality (level 1, level 2, level n), as long as you remain consistent – especially in your documentation – with those references.

Conceptually, a parent-child architecture is a form of code abstraction.  By encapsulating ETL actions into discrete units of work (packages), we’re creating a network of moving parts that can be developed, tested, and executed independently or as part of a larger collection.

Benefits

As I mentioned in my introductory post, there are several benefits to using parent-child structures in SSIS.

Reusability.  In any ETL environment of significant size or complexity, it’s quite normal to discover common ETL behaviors that are reusable across different implementations.  For a concrete example of this: In my spare time, I’m working on an ETL application that downloads XML files from a Major League Baseball web service.  There are files of various formats, and each file format is processed a different way, but with respect to the download of the files, I always perform the same set of operations: create a log entry for the file; attempt to download the file to the local server; log the result (success or failure) of the download operation; if the download has failed, set the HasErrors variable on the main package.  If I were to load this behavior into a group of tasks in the package for each XML format, I’d have five different copies of the same logic.  However, by building a parameterized child package that performs all of these core functions, I only have to build the file download/logging logic once, and execute the resulting package with the appropriate parameters each time I need to download a file.

Easier development.  Working with large and complex SSIS packages can be a pain.  The larger the SSIS packages, the longer it takes for the BIDS or SSDT environment to do its validation checks when the package is opened or modified.  Further, when multiple ETL developers are working on the same project, it is much easier to break apart the project into discrete units of work when using numerous smaller SSIS packages.

Easier testing and debugging.  When working through the test and debug cycles during and after initial development, it’s almost always easier to test and debug smaller packages.  To test a single task that resides in a large SSIS package would require either running the task by itself manually in the Visual Studio designer, or disabling all of the other tasks and redeploying the package.  When working with packages that each perform one unit of work, one can often simply execute the package to be tested through the normal scheduling/execution mechanism.

Clarity of purpose. An architecture that uses small, single-operation packages lends itself to clarity of purpose by virtue of naming.  When browsing a list of deployed packages, it is much more clear to see package names such as “Load Customers Table”, “Merge Product Table”, and “Remove Duplicates in Vehicle Table” than to find do-everything packages with names like “Load Production DB”, “Update DW”, etc.

Performance. In some cases, breaking out multi-step SSIS package can bring some performance gains.  One distinct case that comes to mind is using a distributed architecture, where packages within a single execution group are executed on multiple servers.  By distributing packages across different SQL Server machines (either physical or virtual), it may be possible to improve performance in cases where the processing load on a single SSIS server has become a bottleneck.  I want to emphasize that using a parent-child architecture does not arbitrarily improve performance, so this should not be used as a silver bullet to improve a poorly performing group of packages.

The Tools

As I mentioned earlier, there are three tools that can be used to execute a package from within another package.

The execute package task.  This is the easiest and most common means of executing a package from within another.  This task can trigger the execution of a package stored on the file system, deployed to MSDB or the SSIS catalog, or residing in the same project.  If using SSIS 2012 with catalog deployment mode, you can also use the execute package task to pass parameter values from the parent package to the child package.  It is important to note that the execute package task behaves differently in SSIS 2012 than it does in older versions.

T-SQL commands (via the execute SQL task).  For SSIS projects using project deployment model in SSIS 2012, the built-in stored procedures in the SSIS catalog can be used to execute packages.  This method for executing packages, like the execute package task, allows you to specify runtime parameters via T-SQL code.  One significant advantage of using T-SQL commands to execute packages is that, unlike the execute package task, you can use expressions to set at runtime the name of the package to be executed.  This is useful in cases where you are iterating over a list of packages that may not be known at runtime, such as a pattern found in ETL frameworks.

dtexec.exe (via the execute process task).  Using this method allows you to trigger package execution via the command-line application dtexec.exe.  Although this method is typically used to execute packages in a standalone environment – for example, when using third-party scheduling tools to orchestrate package execution – but dtexec can also be used within SSIS by way of the execute process task.  As an aside, I rarely use dtexec to execute child packages – in most cases, it’s easier to use either the execute package task or T-SQL commands to execute one package from within another.

I’ll also briefly mention dtexecui.exe.  This is a graphical tool that serves the same purpose as dtexec.exe, except that the former exposes functionality via a graphical user interface rather than forcing the user to use command-line parameters for configuration.  Except for this brief mention, I’ll not cover dtexecui.exe in this discussion of parent-child architecture, as that tool is intended for interactive (manual) execution of packages and is not a suitable tool for executing one package from within another.

Parent-Child architecture in the real world

To illustrate how this can work, let’s model out a realistic example.  Imagine that we have charge over the development of a sizeable healthcare database.  In addition to our production data, we’ve got multiple environments – test, development, and training – to support the development life cycle and education needs.  As is typical for these types of environments, these databases need to be refreshed from the production database from time to time.

The refresh processes for each of these environments will look similar to the others.  In each of them, we will extract any necessary data for that environment, retrieve and restore the backup from production, and import the previously extracted data back into that environment.  Since we are dealing with sensitive healthcare data, the information in the training database needs to be sufficiently anonymized to avoid an inappropriate disclosure of data.  In addition, our test database needs to be loaded with some test cases to facilitate testing for potential vulnerabilities.  Even though there are some differences in the way each environment is refreshed, there are several points of shared – and duplicate – behavior, as shown below (with the duplicates in blue).

ParentChild-Duplicate

Instead of using duplicate static elements, we can eliminate some code redundancy and maintenance overhead by encapsulating those shared behavior into their own container – specifically, a parameterized package.  In doing so, we can avoid having multiple points of administration when (not if) we need to make adjustments to those common elements of the refresh process.  The updated architecture uses parameters (or package configurations, if using package deployment mode in SSIS 2012 or any older version of SSIS) to pass in the name of the database environment to refresh.

ParentChild-HighLevel

As shown, we’ve moved those shared behaviors into a separate package (RefreshDB), the behavior of which is driven by the parameters passed into it.  The duplicate code is gone.  We now have just one SSIS package, instead of three, that need to be altered when those common behaviors change.  Further, we can individually test and debug the child package containing those common behaviors, without the additional environment-specific operations.

Note that we haven’t reduced the number of packages using this architecture.  The goal isn’t fewer packages.  We’re aiming for a modularized, easy-to-maintain design, which typically results in a larger number of packages that each perform just a few (and sometimes just one) functions.  In fact, in the parent-child architecture shown above, we could even further refine this pattern by breaking out the individual operations in the RefreshDB package into packages of their own, which would be practical for cases in which those tasks might be executed apart from the others.

Exceptions to the rule

Are there cases in which parent-child structures do not add value?  Certainly.  A prime example of such a case is a small, simple package developed for a single execution with no expectation that its logic will be reused.  I call these throwaway packages.  Because of their single-use nature, there is likely little value in going through the effort to building a parent-child architecture around their business logic.

Up Next

In my next post in this series, I’ll work through the mechanics of using a parent-child pattern in SSIS 2005 or SSIS 2008.

1 Technically, there are lower-level items in the SSIS infrastructure that can be executed independently.  For example, from the BIDS or SSDT design surface, one can manually execute a single task or container within a package.  However, when deploying or scheduling the execution of some ETL behavior, the package is the lowest level of granularity that can be addressed.

Edit: Corrected typo on one of the graphics.

Tagged . Bookmark the permalink.

9 Responses to Parent-Child SSIS Architecture

  1. Lee Everest says:

    Hi Tim,
    I like the refresh environment example.. Nice.

    Question: When have you found that a parent-child is best to use, and when maybe not an avenue to explore? You can gain a sort of abstraction in the method that the package is called as well e.g. Let’s say I have an sftp package that may be called either before the main package in step 1 (for import), or afterwards (export) of a SQL Agent job.. If I were doing an import and the files were already downloaded to the server I would simply call step 2 of the job, and keep separation without the parent-child relationship.

    Thanks,
    Lee

  2. Tim Mitchell says:

    Lee, I’ve found that when creating logic that may be reused, it’s a no-brainer to wrap it in a child package that can be accessed by multiple parent packages. Same thing if there are multiple developers working on an initiative – farming out tasks into individual packages is much easier than everyone working on a single package.

    If neither of these apply, it’s a judgement call. There’s some overhead (sometimes a little, sometimes a lot) in breaking work into multiple packages. If breaking those tasks in to multiple package can ease the pain of development, troubleshooting, or deployment, I usually break it up. As a rule, when in question I break up those tasks into multiple packages.

  3. Pingback: (SFTW) SQL Server Links 14/02/14 • John Sansom

  4. Nick Ryan says:

    Hi Tim

    I’ve read your first post with interest as I have implemented a framework using a Parent-Child pattern I found in a book. I’m keen to see your suggestions for a framework – the one I have works quite well with standardised error reporting and logging but I haven’t found an easy way to run packages in parallel, yet.

    Regards,
    Nick

  5. you give us best informative blog about Parent-Child SSIS Architecture

  6. Tiago Moreiras says:

    Hi Tim,
    Just wanted to share some thoughts about an interesting use case of SSIS Architecture using Parent-Child I designed a couple years ago for a client and still use to this day.
    Applying the parent-child architecture to manage all of the “package” workflow logic is very useful but sometimes it can get a bit messy. For instance, when we create a couple of new packages, usually we end up having to alter all the parents where they are executed and update both packages to server. In bigger solutions and bigger teams this can get very error prone. The approach we took at the time was to try to separate all the packages that actually “do stuff” from those responsible for controlling all of their workflow logic.
    We ended up developing a metadata-driven solution where we have a core package table and a single package responsible for controlling all the package execution flow. The core package table is the place where we specify all package names, parents, server path, and all the detail associated with a single package, and even execution flags like “active” (that may be update to zero to avoid execution). This table only contains packages that actually extract, clean and conform data and it was designed to represent a parent-child relationship where we may have packages that execute child packages. This “parent packages” only exist has an entry in this table. To actually make this work we developed a very simple package that reads ahead all the information in this table and it is responsible for the execution of all the “active” packages in the table. Additionally we ended up adding some checkpoints to control execution after failure and other similar features.
    Would love to hear some thoughts/comments about this type of approaches and suggestions of other ways to implement this type of logic.
    Regards,
    Tiago

  7. Pingback: New Blog Series: Parent-Child Architecture in SSIS - Tim Mitchell

Leave a Reply