I’ve fielded a number of requests recently asking how to interrogate a file within SSIS and change the processing rules based on the metadata of said file. A recent forum poster specifically asked about using the foreach loop to iterate through the files in a directory and, on a per-file basis, either process the file or skip the file if it was updated after a specific date. I’ll use that most recent request to illustrate one method to implement conditional processing in a foreach loop container in SSIS.
Conditional Processing in a Foreach Loop Container
For this demonstration, our SSIS package will require the following:
- A foreach loop to process each file in a given directory
- A script task to interrogate each file and, based on the timestamp, mark it to be either processed or skipped.
- Two SSIS variables:
- @vFilename (String) – stores the current filename for each cycle of the foreach loop
- @vProcessFile (Boolean) – a flag to indicate whether the current file should be processed
- Two SSIS package parameters:
- @pSourceDirectory (String) – stores the directory to loop through
- @pMinDateStamp (DateTime) – indicates the earliest date to process
- A precedence constraint which will be configured to validate an expression and task outcome.
- A data flow task to process the validated files.
As a side note, I used both parameters and variables here because the function of each of these is different. I used package parameters for values that are to be specified by the user or process running the package. For values that will be internal to the operation of the package and will be modified during the execution of said package, I used package variables.
Set Up the Loop
The foreach loop is configured to work as a Foreach File Enumerator, which will loop through each of the files in the directory specified in the package parameter.
Since there is no native SSIS task designed to interrogate file metadata, we’re going to need to use a script task to do this. After dropping a script task from the toolbox into the foreach loop container, we’ll edit the script to create a FileInfo object as a logical hook to the file. After confirming that the file exists, we’ll compare its LastWriteTime property to the earliest acceptable cut-off date (defined by the @pMinDateStamp parameter value) to determine if the timestamp meets the criteria for processing. Based on the results of that comparison, we will set the @vProcessFile value to either True or False. You can see the resulting code logic in the snippet below:
So with each iteration of the foreach loop, the @vProcessFile value will indicate whether the current file should be processed or skipped. After adding a data flow task containing the necessary components to process the flat file, our next step would be to add a precedence constraint connecting the script task to the new data flow task. This precedence constraint will be configured to use an expression and a constraint, and will confirm that the current file is to be processed by interrogating the value of the @vProcessFile variable. If that value is true, then program flow continues to the data flow task; otherwise the loop starts again with the next file in turn. The precedence constraint would be configured as such:
After configuring all of the necessary tasks for this operation, the data flow pane should look similar to the following:
Now, when the SSIS package is executed, the timestamp of each file in the specified directory will be checked, and only those that meet the date criteria will be processed in the data flow task. Note that you could replace the timestamp in our example to some other file criteria; for example, you could check the file size, type, attributes, or other settings to determine if the file should be processed.
Although SSIS does not include a native component to conditionally process files, you can see from this example that a simple script can easily solve this ETL challenge.