Yesterday I wrote about the little-known but still useful multiple flat file connection manager. In this post, I will briefly show a more commonly used alternative approach for processing multiple data files: the foreach loop container.
The SSIS foreach loop container
The foreach loop container is used to iterate through a discrete list of items at runtime. That list could be a static list of items coded at design time, rows in a result set, nodes in an XML file, or of interest in this blog post, a list of files in a directory. The foreach loop container will execute its contents from zero to [n] times, with [n] being the number of items inventoried when the foreach loop starts its execution.
Processing multiple files in SSIS with the foreach loop
The SSIS foreach loop is configured to allow you to easily ingest multiple data files in a single data flow task. For this to work, all of the files would need to reside in the same directory structure (either locally or on the network), and they must all have the same structure and metadata.
In this design, the data flow is contained within the foreach container, which will execute the contents of that data flow task once for each file found in the specified directory.
As shown above, the foreach loop (indicated with the naming convention prefix FELC) contains only one task, the data flow (prefaced with DFT) that will be used to load the data files. To configure the foreach loop to iterate through the files in the source directory, we need to specify the source directory and filename pattern as shown below.
Configured as such, the foreach loop container will execute its contents – in this case, the single data flow task shown earlier – once for each file in the ~\_data\ directory above. To make this work properly, though, we need to write each filename in turn to a variable value, which will be used as the source filename inside that data flow. The Variable Mappings tab can be used as shown to write the full file path into a variable.
This setup will write each filename to the variable named vThisFile. After setting up the data flow task with a flat file source (using one of the files in the specified directory as an example to set up the metadata), you’ll replace the static filename in the flat file connection manager with the vThisFile variable. This will cause the flat file to use the current file path as it iterates through the list of files.
You can use the FileNameColumnName property of the flat file source to add the full filename as an additional column in the data flow for easy data lineage tracking.
Multiple flat file connection manager versus foreach loop
The method I describe above and the multiple flat file connection manager I outlined in yesterday’s post are two different ways to solve the same problem. Neither method is superior to the other – they’re just different approaches. Using the multiple flat file connection manager has the advantage of being simpler to configure, and does not require an extra variable storing the current file to process. The foreach loop container takes a tiny bit more effort to set up, but has the added benefit of being more flexible, especially in cases when you might want to allow for single-file load failures without failing the rest of the load.
Using the For Each file loop, the variable does not seem to get populated. I get this error:
CSV, SSIS.Pipeline: Pre-Execute phase is beginning.
Information: 0x402090DC at Load User to Entitlement Certification Completion Report from CSV, Flat File Source : The processing of file “” has started.
Warning: 0x80070003 at Load User to Entitlement Certification Completion Report from CSV, Flat File Source : The system cannot find the path specified.
Error: 0xC020200E at Load User to Entitlement Certification Completion Report from CSV, Flat File Source : Cannot open the datafile “”.