Tim Mitchell
Follow Tim Mitchell on Twitter  Like Tim Mitchell on Facebook  Subscribe on YouTube  Connect on LinkedIn  Subscribe to the Data Geek Newsletter

SSIS Data Taps

SSIS Data TapsOne of my favorite testing features of SSIS is also one of the most underutilized. SSIS data taps were introduced with the SSIS catalog in SQL Server 2012 as a way to capture data within one leg of a data flow task and write it out to a file for testing or auditing purposes. In this brief post, I’ll show you what SSIS data taps are and how to use them.

SSIS Data Taps

SSIS data taps are to a scheduled execution what the data flow data viewers are to interactive executions in SSDT. Creating a data tap allows you to write to a file all of the data passing through one segment of the data flow. Unlike data viewers, SSIS data taps do not require any modification to the package itself; rather, the data tap is set up at the execution level, and is only valid for that one execution. The use of SSIS data taps requires that the package being executed is deployed to the SSIS catalog, and that the package is executed via T-SQL.

As you may recall, there are two steps – each a call to an SSIS catalog stored procedure – required to execute a package from the SSIS catalog using T-SQL:

As I demonstrated in a recent blog post, step 1 above exposes the execution instance (via the BIGINT value execution_id) to allow additional properties to be set. Parameters can be added to the execution once it has been created but before it is actually executed in the second step above. It is in this same space between those two steps that we would add the SSIS data taps. As noted, data taps are execution-specific; they do not modify the package itself, and must be set up for each execution for which you want to capture the data. Thus the workflow for using a data tap on a package execution looks something like this:

The call to catalog.add_data_tap uses the following parameters:

  • @execution_id: The BIGINT value of the execution instance. This value is supplied as an output parameter from the previously-invoked catalog.create_execution .
  • @task_package_path: The path to the data flow task to be tapped. This value is shown in the Properties window of the data flow task in question, in the PackagePath property.
  • @dataflow_path_id_string: The string identifying the data path within the above data flow task to be tapped. This value can be found by clicking the data flow path to be tapped, then checking the Properties window in the IdentificationString property.
  • @data_filename: The name of the output file to which the data tap results will be written. Note that this parameter expects only the relative filename; unfortunately, the output directory is not configurable, and will always write out to the local DataDumps directory (on my SQL Server 2016 machine, this directory is C:\Program Files\Microsoft SQL Server\130\DTS\DataDumps). The output of this will be formatted as CSV, so it is best to use either .txt or .csv as an extension for the output file.
  • @max_rows (optional): If used, limits the number of rows written to the data tap output file.

I have set up an example of a T-SQL execution using a data tap. As shown below, I’ll create the execution instance, add a couple of system parameters, and then set up the data tap to write out to a file called Failed.csv.

When executing the package, it will write to the output file all of the data (including all columns) that passed through the specified path on that data flow. Note that the data tap will create the output file even if no rows of data passed through that path. When the output files are created, it will not overwrite existing files with the same name; as shown below, if the output filename already exists in the destination directory, the new file will be appended with an ascending numbering sequence.

image

Remember that the call to catalog.add_data_tap  is specific to just that one execution instance. If you execute the package again, it will not use a data tap unless you explicitly set one up for that second execution. It is also worth noting that you are not limited to a single data tap per package; you can set up a data tap for each data path if needed.

Conclusion

In this post, I have shown how SSIS data taps can be easily implemented in package executions for testing, validation, and auditing data loads in SQL Server Integration Services.

About the Author

Tim Mitchell
Tim Mitchell is a business intelligence and SSIS consultant who specializes in getting rid of data pain points. Need help with data warehousing, ETL, reporting, or SSIS training? Contact Tim here: TimMitchell.net/contact

4 Comments on "SSIS Data Taps"

  1. Cristhian Neumann | December 13, 2016 at 6:41 am | Reply

    Great post. I wasn’t aware of data taps in SSIS.

  2. Steven Neumersky | December 13, 2016 at 11:32 am | Reply

    Always wondered if this was like the solution created to measure data flow throughput in Project Real in 2005. Is it done in a similar way under the covers?

    • I didn’t work with Project Real, but I doubt if there’s a lot of crossover there. Data taps aren’t designed to measure throughput, but rather to allow easier access for testing and validation.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: