One of my favorite testing features of SSIS is also one of the most underutilized. SSIS data taps were introduced with the SSIS catalog in SQL Server 2012 as a way to capture data within one leg of a data flow task and write it out to a file for testing or auditing purposes. In this brief post, I’ll show you what SSIS data taps are and how to use them.
SSIS Data Taps
SSIS data taps are to a scheduled execution what the data flow data viewers are to interactive executions in SSDT. Creating a data tap allows you to write to a file all of the data passing through one segment of the data flow. Unlike data viewers, SSIS data taps do not require any modification to the package itself; rather, the data tap is set up at the execution level, and is only valid for that one execution. The use of SSIS data taps requires that the package being executed is deployed to the SSIS catalog, and that the package is executed via T-SQL.
As you may recall, there are two steps – each a call to an SSIS catalog stored procedure – required to execute a package from the SSIS catalog using T-SQL:
- catalog.create_execution sets up the execution instance(including the name and path of the package to be executed), and then
- catalog.start_execution kicks off the execution instance of said package
As I demonstrated in a recent blog post, step 1 above exposes the execution instance (via the BIGINT value execution_id) to allow additional properties to be set. Parameters can be added to the execution once it has been created but before it is actually executed in the second step above. It is in this same space between those two steps that we would add the SSIS data taps. As noted, data taps are execution-specific; they do not modify the package itself, and must be set up for each execution for which you want to capture the data. Thus the workflow for using a data tap on a package execution looks something like this:
- Use catalog.create_execution to create the execution instance
- Optionally, use catalog.set_execution_parameter to specify parameter values
- Set up the SSIS data tap(s) using catalog.add_data_tap
- Execute the package using catalog.start_execution
The call to catalog.add_data_tap uses the following parameters:
- @execution_id: The BIGINT value of the execution instance. This value is supplied as an output parameter from the previously-invoked catalog.create_execution .
- @task_package_path: The path to the data flow task to be tapped. This value is shown in the Properties window of the data flow task in question, in the PackagePath property.
- @dataflow_path_id_string: The string identifying the data path within the above data flow task to be tapped. This value can be found by clicking the data flow path to be tapped, then checking the Properties window in the IdentificationString property.
- @data_filename: The name of the output file to which the data tap results will be written. Note that this parameter expects only the relative filename; unfortunately, the output directory is not configurable, and will always write out to the local DataDumps directory (on my SQL Server 2016 machine, this directory is C:\Program Files\Microsoft SQL Server\130\DTS\DataDumps). The output of this will be formatted as CSV, so it is best to use either .txt or .csv as an extension for the output file.
- @max_rows (optional): If used, limits the number of rows written to the data tap output file.
I have set up an example of a T-SQL execution using a data tap. As shown below, I’ll create the execution instance, add a couple of system parameters, and then set up the data tap to write out to a file called Failed.csv.
When executing the package, it will write to the output file all of the data (including all columns) that passed through the specified path on that data flow. Note that the data tap will create the output file even if no rows of data passed through that path. When the output files are created, it will not overwrite existing files with the same name; as shown below, if the output filename already exists in the destination directory, the new file will be appended with an ascending numbering sequence.
Remember that the call to catalog.add_data_tap is specific to just that one execution instance. If you execute the package again, it will not use a data tap unless you explicitly set one up for that second execution. It is also worth noting that you are not limited to a single data tap per package; you can set up a data tap for each data path if needed.
In this post, I have shown how SSIS data taps can be easily implemented in package executions for testing, validation, and auditing data loads in SQL Server Integration Services.
Great post. I wasn’t aware of data taps in SSIS.
Thanks Cristhian! Glad you found it useful.
Always wondered if this was like the solution created to measure data flow throughput in Project Real in 2005. Is it done in a similar way under the covers?
I didn’t work with Project Real, but I doubt if there’s a lot of crossover there. Data taps aren’t designed to measure throughput, but rather to allow easier access for testing and validation.
My fault. It’s looks like a data viewer for run time. I really wish the write path was configurable, but I absolutely understand the rationale for why it is not.
We recently had a situation where 4 packages from one project were physically taken out of one project and put into another project. It created “debug time only” issues when trying to run these 4 packages in the SSDT in the new project. Validation would not happen.
At the end of the day we KNEW one of these packages was pulling some bad data based off run time messages. However, when we went to try and run the package (after adding a data viewer to the appropriate data flow task path), we were not able to do some due to validation errors with the project level connection managers that did not come along for the ride when one of the developers moved the 4 packages! So annoying.
So, we added a data tap to the package at runtime. Unfortunately, we do not have access to the output path of the data tap! Crazy, eh?
I’m with you. I wish the data taps output path could be set as an option.