When it comes to ETL, I’m a pessimist. Until proven otherwise, I assume that all data is bad, all connections are volatile, and all transformation logic is suspect. As such, I spend a lot of time thinking about how to prepare for and handle errors in the ETL pipeline with SSIS. Building packages to expect and properly handle errors is a key component in a well-performing, durable, trustworthy ETL infrastructure.
SQL Server Integration Services has a number of built-in tools for handling errors and other anomalies. Among these:
- Event handlers, defined at the package, container, or task level
- Precedence constraints in the control flow to change the execution path in the event of an error
- Error outputs on sources, transformations, and destinations in the data flow to redirect failed rows on a row-by-row basis
- Failure of the task (and its ancestors), which is the default behavior
All of these options provide a way to handle or otherwise report the error. But what if your scenario requires that any error is simply ignored? SSIS can do that, too. In this post, I’ll show you how to use a particular attribute of event handlers to prevent the propagation of errors in SSIS executions.
Let’s say that we have an SSIS package that processes a series of flat files using a ForEach Loop Container. The expected behavior is that we specify the directory of files to be processed, and the loop container will process each matching file in that directory. If a file in that directory fails to process – perhaps its security settings prevent it from being read by the account executing the SSIS package, or the file is improperly formatted – the default behavior is that the loop container fails, which in turn fails the package and any parent package that executes it. This is known as propagation (remember that word), which can be visualized as a “bubbling up” of errors. In that way, propagation of errors is not unlike the way exceptions bubble up in C# or VB.NET programming.
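The parallel is close enough to sketch directly. In the Python snippet below (the file names, functions, and error type are all hypothetical), an exception raised while loading one file propagates up through the loop and fails the entire run, just as an unhandled SSIS error propagates from task to container to package:

```python
def load_file(path):
    # Hypothetical per-file load; assume this file cannot be read.
    raise PermissionError(f"cannot read {path}")

def process_directory(paths):
    # No error handling here, so any failure escapes the loop...
    for path in paths:
        load_file(path)

try:
    process_directory(["a.txt", "b.txt", "c.txt"])
except PermissionError:
    # ...and, left uncaught, it fails the whole run at the top level.
    print("entire run failed after the first bad file")
```

No file after the first bad one is ever attempted – the failure of one iteration takes down everything above it.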
Now in this same scenario, let’s assume that our business needs dictate that we don’t have to successfully process every file for the ETL load to be considered a success, and that a partial load of data has greater business value than failing the entire load because one file failed. In that case, we’ll want to override the default behavior of our package to allow certain elements to fail without affecting the outcome of the package.
As shown below, I’ve set up a package for our scenario in which we truncate the output staging table, and then loop through a specified directory to process each text file in that directory.
Notice that I’ve got precedence constraints set up after the data flow task in the loop container. This will allow me to log either a success or failure of a load operation on a file-by-file basis. The data flow task, which will be executed once per file in the source directory, will attempt to process each text file in that directory as shown below.
In this updated scenario, I don’t want a single failed file load to interrupt the load of the remaining files, so I’m going to make a couple of modifications. First, I’ll create an OnError event handler for the data flow task.
You’ll notice that there are no tasks in the OnError event handler for the data flow task (DFT Load File). Although you can add custom logic here to execute upon error, you don’t necessarily need to do so. However, creating the event handler exposes a particular setting we’ll need to prevent the propagation of errors.
While the error event handler shown above is still in view, open the Variables window. Note that you’ll also have to set the window to show system variables, which are hidden by default. In the list of variables, scroll down until you find one named Propagate. This Boolean value specifies whether errors within a given executable bubble up to the ancestor tasks, containers, and packages. To prevent these errors from bubbling up, I’ll change the value of the Propagate variable from True to False.
The net result is that any error in this data flow task will still be reported as an error on the task itself, but it will stop there and will not be passed along to any executables further up the chain.
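In code terms, setting Propagate to False is like catching the error at the level where it occurs instead of letting it escape. A rough Python analogy (the file names, the failure condition, and the row counts are all invented for illustration):

```python
def load_file(path):
    # Hypothetical per-file load; files ending in ".bad" fail to parse.
    if path.endswith(".bad"):
        raise ValueError(f"improperly formatted: {path}")
    return 100  # pretend this many rows were loaded

def process_directory(paths):
    results = []
    for path in paths:
        try:
            rows = load_file(path)
            results.append((path, "success", rows))
        except ValueError:
            # The error is handled right here and not re-raised, so it
            # never reaches the caller; the loop moves to the next file.
            results.append((path, "failure", 0))
    return results

# Partial load: a.txt and c.txt still succeed even though b.bad fails.
results = process_directory(["a.txt", "b.bad", "c.txt"])
```

Note that the failure is recorded rather than silently discarded, which is the same discipline the On Failure path in the control flow provides.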
You’ll notice that in the control flow of this package, I included two different paths from the data flow task – one with the On Success constraint, and the other with the On Failure constraint. I’ve done this to allow the package to log the file-by-file results (success or failure, along with row count) separately. I’ve shown this because I want to emphasize that, in most situations, you should log any error such as this – even if your package is built not to fail when an error is encountered. In my opinion, almost any operation that is attempted, even if it doesn’t affect the outcome of the package, should be logged – especially in the event of failure.
In this post I’ve demonstrated how the built-in components of SSIS can be used to allow continued operation after the failure of one executable in a package. By modifying the Propagate system variable in the error event handler of a volatile executable, you can prevent the automatic failure of its parent tasks, containers, and packages, allowing continued execution after the failure of noncritical operations.