SQL Server Integration Services is well designed for retrieving and processing data on the fly, directly in the data flow pipeline. However, there are circumstances that occasionally require the persistence of result sets in SSIS for use during package execution. For these such cases, one option is to use SSIS raw files.
In this SSIS Basics post, I will briefly demonstrate how to use raw files in SSIS and in what situations they are useful.
Using Raw Files in SSIS
The raw file in SSIS is a special kind of binary file, optimized for and intended for use only through SSIS. This special file is intended to store temporary result sets on disk to allow reuse of data across multiple packages and multiple executions.
Although these are not tools one would likely use on a daily basis, the raw file tools in SSIS are helpful in a number of scenarios:
- The same result set is reused several times as part of the different data flow (or even different packages)
- Due to size or query complexity, retrieving a particular set of reference data takes a significant amount of time
- Due to business rules, one or more interim result set is required during the ETL process
In cases such as this, using raw files to store result sets can be a handy solution.
To create and use raw files, SSIS comes with two components – a raw file source and a raw file destination – to allow reading from and writing to these files. I’ll start by showing the destination (ordinarily I would start with the source, but since we need to create a raw file with the destination component before we can use the source component, I’m changing the typical demonstration order). As shown below, this example uses a data flow with an OleDB source – connected to AdventureWorks – for sending sales data into a raw file destination.
On the raw file destination, we’ve got a handful of configuration options, including:
- The type of file name (either direct or from an SSIS variable)
- A selector to choose the file name or the variable supplying that file name
- An option to select the write method (create, overwrite, append, etc.)
- A column pane to allow source-to-target mappings
Note that on the Write Option selection, there are some limitations with respect to whether the specified output file already exists. For example, if you choose the Create Once option, you’ll see a design-time error if the file already exists. Similarly, if you choose either the Append or the Truncate and Append option, you’ll see an error if the file does not yet exist. To make the design-time experience easier, there is a button in the UI of the destination to create the initial raw file in case you want to use either of the Append options without having previously created the file.
The raw file source, the simpler of the two raw file components, has just a couple of configuration options: the type of file name, a selector to choose the file name or the variable supplying that file name, and a data grid to allow the selection of the columns to be included from the raw file.
As shown in use below, the raw file source feeds the data flow pipeline, ultimately sending its data into the OleDB destination.
One thing you’ll want to keep in mind is that there is no automatic purge process to empty or remove the SSIS raw files after use. Therefore, if there is a concern about persisting the files, either because of disk space or data sensitivity, there should be a follow-on process that truncates or deletes the raw files after they are no longer needed. Also, if you want to track how recently the raw file was loaded, you could either check the update stamp of the file (using a script task), or use a date stamp column in the process that loads the raw file to allow storage of the load date in the data within the file.
Raw files in SSIS provide an efficient way to temporarily store a result set during package execution. The raw file source and destination in SSIS can be used to connect to the resulting raw file for interim processing. Although not an everyday set of tools, this source and destination are useful tools to know how to use in cases where persisting interim result sets is needed.