Question from someone in one of my recent classes: “What tool do you use to send email from ETL processes?”
My response: “I don’t.”
The tl;dr version of this post is that I let my extract-transform-load processes do just ETL, and leave notifications to the scheduling system where they belong.
Get Your Email Out of my ETL
Before you read any further, you should know that I am a recovering ETL emailer. Not so many years ago, I would put everything – and I mean everything – in my ETL processes (typically using SSIS). Even things that didn’t really have anything to do with data movement. Of course, that included email notification tasks. However, in the years since, I have learned a very valuable lesson: There is a proper tool for every job. Quick fixes bring rapid relief, but often at the expense of sustainability and total cost of ownership.
Let’s start by reviewing the reasons you might consider including email notification logic in your ETL processes in the first place. By far the most common reason is to notify someone (or a group of someones) when something goes wrong in an ETL process. Less common reasons include the need for notifications when each process completes (a bad idea in itself, but I’ll save that discussion for another day), or a desire to send a subset or aggregate of the data (such as a row count or dollar sum) for validation. Every major ETL vendor includes email capabilities that are temptingly easy to slap onto existing ETL loads. And let’s make no mistake: some notifications are critical, especially those designed to get the attention of the folks who should respond to data load failures or anomalies. However, I’ve found that mixing the domains of ETL and exception alerts is easy to do but hard to support.
What’s The Problem?
What’s wrong with using the ETL load processes to send email directly? Let me count the ways.
It’s another potential point of failure for the ETL process. If the ETL load runs fine but something goes wrong with the email notification, it’s going to appear that the data load operation is broken when in fact it’s only the email step. Even worse, if the email notification happens in the middle of the load with other steps behind it, you’re now left with a partially completed load – all because of a failure in an operation that isn’t even central to the data load.
With some low-level failures, you’ll never get a notification. My ETL tool of choice is SSIS, so here’s how this would work there: Some part of the source or destination data structure changes and the SSIS package detects that it needs a metadata refresh. Because that validation step occurs before any of the actual work begins, it is possible that no part of the package – not even the error handling or notification logic – would be triggered. The tree would fall in the forest and no one would hear it.
It lengthens the development and testing process. ETL processes are often complex and require weeks or months to develop. Why unnecessarily clutter up the dev and testing process with more moving parts that are not a core part of data movement?
Inconsistency. In the most recent post in my ETL Best Practices series, I talked about code modularity. One of the benefits of modular processes is consistency. If you are inserting numerous ad-hoc exception notifications in ETL packages, it’s difficult to keep those consistent. Sure, you could have a generic process similar to what I describe in the modularity post, but that requires added complexity which is peripheral to the main purpose of the ETL process.
It’s not possible to include all required debugging information in a notification email. When I advise clients and course attendees against sending notifications directly from ETL, a common argument is that they want to include as much debugging information as possible in the notification email to give the responding tech or engineer everything they need to start working the problem. Including all of this information requires runtime access to error messages – thus the argument for sending email directly from the ETL load. My retort is this: How realistic is it that you’ll include in the email all the information you need to actually diagnose the problem? Anyone who has reviewed logs from any ETL system can attest to the fact that every error begets five others, and so on. Identifying the true root cause isn’t something you can automate, which means sending via notification email all of the data needed to troubleshoot the problem would require sending a novel through an SMTP connection.
It’s clunky. Perhaps there’s a better mousetrap somewhere out there, but the ETL systems I’ve worked with have email interfaces that are rudimentary at best. Email tasks in ETL software are limited in functionality and aren’t exactly designed to be robust.
A Better Way
As I mentioned earlier, there is a tool for every job. Email notification is more of a scheduling concern than an ETL function. Therefore, my recommendation is to always consider your scheduling tool as the principal vehicle for generating exception notifications for ETL loads. Every enterprise scheduling tool, from the SQL Server Agent included with SQL Server up to commercial products such as Control-M or JAMS, has email notification built in. Offload your exception notifications to whatever tool is scheduling the job.
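In SQL Server Agent, for example, failure notifications are a property of the job itself, configured through the msdb stored procedures. The sketch below shows the general shape; the operator name, email address, and job name are placeholders for your own environment.

```sql
USE msdb;

-- Create an operator to receive failure notifications.
-- 'ETL On-Call' and the address are hypothetical examples.
EXEC dbo.sp_add_operator
    @name = N'ETL On-Call',
    @enabled = 1,
    @email_address = N'etl-oncall@example.com';

-- Attach the notification to the Agent job that runs the ETL package.
-- @notify_level_email = 2 means "notify on failure" only, so a clean
-- run stays quiet and the package itself contains no email logic.
EXEC dbo.sp_update_job
    @job_name = N'Nightly DW Load',
    @notify_level_email = 2,
    @notify_email_operator_name = N'ETL On-Call';
```

Because the notification lives in the scheduler, it fires even when the package fails validation before any task runs – exactly the low-level failure scenario described above.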
To make the troubleshooting process faster, include a URL or other shortcut to the reports containing your logging information. As noted above, trying to cram all of the related errors into a notification email is futile, and in the end the responder will probably just navigate to the logging tables anyway. Use the notification email to provide them that shortcut.
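If your scheduler’s built-in notification body can’t carry a custom link, one common pattern is a final “on failure” job step that sends a short pointer email via Database Mail. This is a minimal sketch – the mail profile, recipient, and report URL are all placeholders:

```sql
-- Runs as a failure-handling step in the Agent job, not inside the package.
-- EXEC parameters must be constants or variables, so build the body first.
DECLARE @body nvarchar(max) =
      N'The Nightly DW Load failed. Execution details and error log: '
    + N'http://reports.example.com/ETL/ExecutionLog';

EXEC msdb.dbo.sp_send_dbmail
    @profile_name = N'ETL Mail Profile',
    @recipients   = N'etl-oncall@example.com',
    @subject      = N'Nightly DW Load failed',
    @body         = @body;
```

The email stays deliberately small: a statement of what failed and a link to where the real diagnostic detail lives.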
Finally, if you are convinced that you really do need to include error log information in the email notifications, you can lean on your reporting tools for that. For example, SQL Server Reporting Services lets you use data-driven subscriptions to send customized emails containing filtered log data to the right recipients.
Using email notifications from your ETL tool might seem like a quick and easy way to add alerts to your data loads, but ultimately this design pattern turns out to be a maintenance headache. Use the right tool for the job and let your scheduling tool handle notifications.