When building an ETL pipeline to import data from a text file, it’s very common to have the incoming data spread across multiple files. For example, if you are ingesting files generated on a periodic basis (per day, per hour, etc.), you could have dozens or hundreds of files with identical structure. This is an ideal setup for building a modular solution rather than separate import process for each file.
To make this process easier, SQL Server Integration Services include a Multiple Flat Files connection manager, allowing you to iterate through a list of identically-structured flat files in a data flow.
Using the SSIS Multiple Flat Files connection manager
The multiple flat files connection manager in SSIS works very much like the familiar flat file connection manager. The former lives in relative obscurity, hiding in the list of other miscellaneous connections accessible only by right-clicking the connection managers area and selecting New Connection.
The multiple flat file connection manager, labeled as MULTIFLATFILE, appears as shown in the list of generic connections.
When you add a new MULTIFLATFILE connection, you’ll see the similarity between this and its flat file connection manager cousin. In fact, the only obvious visual differences are the title bar text and the slightly different icon for the multiple file variant of this connection manager.
Configuring the multiple flat file connection manager works exactly the same as the single file connection, with the exception of the File Name configuration. Instead of specifying a single file, you’ll use this setting to specify multiple file paths (separated by a vertical pipe character), or by specifying a given directory with a file name wildcard (such as *.txt). The only limitations here are that all of the files processed by the multiple flat file connection manager must have the exact same structure, and that the MULTIFLATFILE connection can only be used as a source and not a destination.
Loading data with the multiple flat file connection manager
Once configured, the multiple flat file connection manager will transparently loop through the source files specified in the File Names property. The data flow UI will show the total number of rows processed. For this example, I chose to use the wildcard approach, such as “C:\MyDataDirectory\*.txt”.
If you want to see how many rows of data were processed from each file, you can see that information in the Progress window when executing from SSDT…
… or by querying the [catalog].[event_messages] table for packages executed in the SSIS catalog.
If you want to include the file name from which each row originated, you can use the FileNameColumnName property that I described in yesterday’s post.
In both of the above loads, the files were processed in alphabetical order, but I don’t see that there is any guarantee of the order of processing. If you need to process files in a specific order, you’ll probably want to use a different method, such as the for each loop.
The multiple flat file connection manager (MULTIFLATFILE) is an obscure but useful tool to allow you to easily process identically-structured files using a single connection manager.
For more information about the multiple flat file connection manager, check out the official documentation on MSDN.