SQL Server Integration Services does a great job of retrieving and processing data on the fly, directly in the data flow pipeline. However, there are circumstances that occasionally require the persistence of result sets in SSIS for use during package execution. For these such cases, one option is to use SSIS raw files. In this SSIS Basics post, I will briefly demonstrate how to use raw files in SSIS and in what situations they are useful.
Using Raw Files in SSIS
The raw file in SSIS is a special kind of binary file, optimized for and intended for use only through SSIS. This file type stores temporary result sets on disk to allow reuse of data across multiple packages and multiple executions.
SSIS raw files are helpful in a number of scenarios:
- The package reuses the same result set several times as part of the different data flow (or even different packages)
- Due to size or query complexity, retrieving a particular set of reference data takes a significant amount of time
- Due to business rules, one or more interim result set is required during the ETL process
In cases such as this, using raw files to store result sets can be a handy solution.
To create and use raw files, SSIS comes with two components to allow reading from and writing to these files. I’ll start by showing the raw file destination (ordinarily I would start with the source, but since we need to create a raw file with the destination component before we can use the source component, I’m changing the typical demonstration order), and will move to the raw file source next.
Raw File Destination
As shown below, this example uses a data flow with an OleDB source – connected to AdventureWorks – for sending sales data into a raw file destination.
On the raw file destination, we’ve got a handful of configuration options, including:
- The type of file name (either direct or from an SSIS variable)
- A selector to choose the file name or the variable supplying that file name
- An option to select the write method (create, overwrite, append, etc.)
- A column pane to allow source-to-target mappings
With the Write Option selection, there are limitations with respect to whether the specified output file already exists. For example, if you choose the Create Once option, you’ll see a design-time error if the file already exists. Similarly, if you choose either the Append or the Truncate and Append option, you’ll see an error if the file does not yet exist. To resolve this, use the button in the UI of the destination to create the initial empty raw file.
Raw File Source
The raw file source is the simpler of the two SSIS raw file components. It has just a couple of configuration options: the type of file name, a selector to choose the file name source, and a data grid for output column selection.
As shown below, the raw file source feeds the data flow pipeline, sending its data into the OleDB destination.
Keep in mind is that there is no automatic emptying or removal of the SSIS raw files after use. Consider adding follow-on process for deleting deletes the raw files after the process completes.
Raw files in SSIS provide an efficient way to temporarily store a result set during package execution. The raw file source and destination can be used to connect to the resulting raw file for interim processing. Although not an everyday set of tools, this source and destination are useful tools to know how to use in cases where persisting interim result sets is needed.
Leave a Reply