Tim Mitchell
Follow Tim Mitchell on Twitter  Like Tim Mitchell on Facebook  Subscribe on YouTube  Connect on LinkedIn  Subscribe to the Data Geek Newsletter

Using the SSIS Multiple Flat Files Connection Manager

When building an ETL pipeline to import data from a text file, it’s very common to have the incoming data spread across multiple files. For example, if you are ingesting files generated on a periodic basis (per day, per hour, etc.), you could have dozens or hundreds of files with identical structure. This is an ideal setup for building a modular solution rather than separate import process for each file.

To make this process easier, SQL Server Integration Services include a Multiple Flat Files connection manager, allowing you to iterate through a list of identically-structured flat files in a data flow.

Using the SSIS Multiple Flat Files connection manager

The multiple flat files connection manager in SSIS works very much like the familiar flat file connection manager. The former lives in relative obscurity, hiding in the list of other miscellaneous connections accessible only by right-clicking the connection managers area and selecting New Connection.

image

The multiple flat file connection manager, labeled as MULTIFLATFILE, appears as shown in the list of generic connections.

MULTIFLATFILE

When you add a new MULTIFLATFILE connection, you’ll see the similarity between this and its flat file connection manager cousin. In fact, the only obvious visual differences are the title bar text and the slightly different icon for the multiple file variant of this connection manager.

Configuring the multiple flat file connection manager works exactly the same as the single file connection, with the exception of the File Name configuration. Instead of specifying a single file, you’ll use this setting to specify multiple file paths (separated by a vertical pipe character), or by specifying a given directory with a file name wildcard (such as *.txt). The only limitations here are that all of the files processed by the multiple flat file connection manager must have the exact same structure, and that the MULTIFLATFILE connection can only be used as a source and not a destination.

Loading data with the multiple flat file connection manager

Once configured, the multiple flat file connection manager will transparently loop through the source files specified in the File Names property. The data flow UI will show the total number of rows processed. For this example, I chose to use the wildcard approach, such as “C:\MyDataDirectory\*.txt”.

image

If you want to see how many rows of data were processed from each file, you can see that information in the Progress window when executing from SSDT…

image

… or by querying the [catalog].[event_messages] table for packages executed in the SSIS catalog.

image

If you want to include the file name from which each row originated, you can use the FileNameColumnName property that I described in yesterday’s post.

In both of the above loads, the files were processed in alphabetical order, but I don’t see that there is any guarantee of the order of processing. If you need to process files in a specific order, you’ll probably want to use a different method, such as the for each loop.

Conclusion

The multiple flat file connection manager (MULTIFLATFILE) is an obscure but useful tool to allow you to easily process identically-structured files using a single connection manager.

For more information about the multiple flat file connection manager, check out the official documentation on MSDN.

About the Author

Tim Mitchell
Tim Mitchell is a business intelligence and SSIS consultant who specializes in getting rid of data pain points. Need help with data warehousing, ETL, reporting, or SSIS training? Contact Tim here: TimMitchell.net/contact

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: