ETL Antipattern: Skipping The Documentation

Documentation is an asset that is both loathed and loved. Creating technical and business documentation is often looked upon as a tedious chore, something that really ought to be done for every project but is often an easy candidate to push until later (or skip entirely). On the other hand, good documentation – particularly around data movement and ETL processes…


ETL Antipattern: Load Processes that Don’t Scale

One of the most significant design considerations in ETL process development is the volume of data to be processed. Most ETL processes have time constraints that require them to complete their load operations within a given window, and the time required to process data will often dictate the design of the load. One of the more common mistakes I’ve seen…


ETL Antipattern: Failure to Test and Validate

“If it compiles, it works.” – An unemployed developer Building ETL processes is quite easy. Building ETL processes that deliver accurate results as quickly as possible is substantially more difficult. Modern ETL tools (including my personal favorite, SQL Server Integration Services) make it deceptively easy to create simple load process. That’s a good thing, because an easy-to-understand front end shortens…


ETL Antipattern: Failing to Treat ETL Logic as Source Code

In most data projects, building the extract-transform-load (ETL) logic takes a significant amount of time. Enterprise ETL processes must do several things well: retrieve enough data to satisfy the business needs, apply any needed transformations to that data, and load it to the destination(s) without interruption to any other business processes. The work that goes into building and validating that…


ETL Antipattern: No Error Handling Logic

I usually avoid talking about technology in absolutes, but here’s one that I can share without reservation: On a long enough timeline, every single ETL process will eventually fail. Is your ETL design built to handle a failure? I see far too many SSIS packages, ADF data factories, and other data movement applications built with the assumption of 100% success,…


ETL Antipattern: Lazy Metadata

If data is a train, then metadata is the track on which it travels. A good metadata definition in ETL processes will help to ensure that the flow of the data is predictable, robust, and is properly constrained to avoid errors. However, many ETL processes take a hands-off approach when it comes to metadata. In some cases, this laissez-faire design…


ETL Antipattern: Performing Full Loads Instead of Incremental Loads

In my last post in the ETL Antipatterns series, I wrote about the common antipattern of ingesting or loading more data than necessary. This brief post covers one specific case of loading more data than necessary by performing a full data load rather than using a smaller incremental load. ETL Antipattern: performing full loads instead of incremental loads Earlier this…


ETL Antipattern: Processing Too Much Data

In my continuing series on ETL Antipatterns, I’ll discuss the problem of loading too much data in extract-transform-load processes. ETL Antipattern: processing too much data A common design flaw in enterprise ETL processes is that they are processing too much data. Having access to a great breadth and depth of data opens up lots of options for historical reporting and…


ETL Antipattern: Start With Writing Code

In this first post in my series on ETL Antipatterns, I’m going to discuss one of the most common missteps when building an extract-transform-load (ETL) process: jumping straight into writing code as a first step. ETL Antipattern: start with writing code Most data architects and developers are intensely curious folks. When we see a set of data, we want to…


The Eleven Days of Festivus 2020

We’re rounding the corner to the second half of December, which means it’s time for my favorite holiday: Festivus! Like many of you, I enjoy gathering around the Festivus pole and sharing the time-honored traditions such as the Feats Of Strength and the Airing Of Grievances. But my favorite Festivus tradition takes place right here on this blog: the Eleven…