One of the more common questions I find in SQL Server SSIS forums is how to split apart a single text file into multiple outputs. Files extracted from foreign sources often arrive in nonstandard formats that cannot be changed at the source, so they must be parsed during the import process. For unconventional custom transformations such as these, the SSIS script component with multiple outputs is often the best tool for shredding and importing data.
In this post, I will demonstrate a relatively simple way to address complex input files using the SQL Server Integration Services script component. In particular, I'll show how to use multiple outputs to process differing record types within a single file.
Script Component With Multiple Outputs
This example uses a file layout I refer to as the “record type” format. A single file may contain many different record types, with the type identified by the first element in each record (line of text) in the file. You may find other formats, including those that are identified by their position in the file, a particular string of text, or number of elements; this example could be easily modified to handle these types of nonstandard formats. I’ve spent a good part of my career working in and around healthcare data, and I’ve found this record type layout is quite common in EDI (electronic data interchange) situations.
As shown in the example below, the number of elements varies from one record to the next within this file source, which would pose a challenge if you were to use only the off-the-shelf transformations.
1|202|Scott,Michael|District Manager|Scranton PA
3|241|9582|Bob Vance Refrigeration
3|189|2792|State of Pennsylvania
3|189|4929|Office Systems of America
3|493|2555|Ace Printing Center
In the snippet for this demonstration, record type 1 contains manager details, record type 2 (not shown in this snippet) lists the employee information, and record type 3 shows the list of clients. Since the number of columns and their respective data types differ from one record to the next, I can't just use a conditional split to create multiple paths for data flow; this is a case for the script component.
To get started, I created a data flow task and added a new script component to it. Since I’m using the script component as a source, I must select that configuration when prompted (shown below).
Opening the UI for this script component reveals the Inputs and Outputs window. When I selected that I wanted to use this component as a source, it created one default output without any inputs, as shown below.
This example requires not one but three outputs, one each for Managers, Employees, and Clients. To create the additional outputs, I’ve used the Add Output function. For each output, I have also configured the column names and data types. As shown, I’ve used meaningful names (rather than the default Output 0, Column 1, etc.) for the outputs as well as the columns – this will come in handy shortly.
Next up: it’s time to write some code. I’ve configured the script to parse the input file using a System.IO.StreamReader object. The file name is defined in the SSIS package as the variable named vInputFilename, which has already been populated with the path to the source file.
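Assuming the variable has been added to the script's ReadOnlyVariables list (which makes it available through the Variables collection), the setup might look something like the sketch below. This is an illustrative fragment of the SSIS-generated script class, not a standalone program:

```csharp
using System.IO;

public override void CreateNewOutputRows()
{
    // vInputFilename must be listed in the script component's
    // ReadOnlyVariables for it to appear in the Variables collection.
    string inputFile = Variables.vInputFilename;

    using (StreamReader reader = new StreamReader(inputFile))
    {
        // Each line of the file will be read and parsed here, one at a time.
    }
}
```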
I used a C# while loop to read each line of the file into a variable and process each line in turn. The test data is pipe-delimited, so I use the C# string function Split() to break apart each line into atomic elements (stored in a zero-based array I'll refer to as items), and evaluate the first element of each line to send each row of data to the correct output.
The script logic follows one of three paths depending on the first element of the row, with each path leading to one of the buffers to be wired up to one of our intended destinations. The buffer objects, which follow the &lt;OutputName&gt;Buffer naming convention, are objects that are automagically created for each output defined in the editor in the previous step. In each of the paths, I've programmatically created a new output row to send to the data stream by calling the Buffer.AddRow() method. Note that you must call the AddRow() method before attempting to assign values to any of the output variables for that row. After adding the new output row for the appropriate path, the script will assign the corresponding value from the data file to each of the output variables.
Using the Multiple Outputs
After saving the changes to the script component with multiple outputs configured, I built a data destination for each output type. In this example, I've used a Flat File Destination to send each of the three data streams to a delimited file. After creating the output destinations, I connected the output (green arrow) from the script component to each of the three destinations. When using multiple outputs, the SSIS UI will prompt to select the proper output as shown below. Be sure to match the output from the script component to the corresponding destination.
After all three of the destinations are configured and properly connected to their respective outputs, the package is ready to execute. The output paths are labeled by default with the name of the output; this is where the use of meaningful and descriptive names becomes valuable. The package practically documents itself!
When executing the package, the UI shows the rows of data flowing from the monolithic file into their respective outputs.
A final review of the three output files confirms that the rows were directed to the correct destinations.
In this brief demonstration, I showed how the SSIS script component with multiple outputs can be used to extract data from source files that have varying metadata. The record type format is just one example of such files, but this same design pattern could be used for differing or more complex ETL needs.
Author’s note: Large portions of this post were previously published. This reposted material has been updated to reflect changes in SSIS since the last publication.