ETL

ETL Antipattern: Ignore the Logging

In my last ETL Antipatterns post, I wrote about the unexciting but very necessary work of documenting ETL processes. The logging of ETL operations is just as (un)captivating as documentation, but it is equally important in supporting data movement and transformation processes. In this post, I’ll discuss a common misstep in ETL process management: ignoring the logs. What…


ETL Antipattern: Skipping The Documentation

Documentation is an asset that is both loathed and loved. Creating technical and business documentation is often looked upon as a tedious chore, something that really ought to be done for every project but is often an easy candidate to push until later (or skip entirely). On the other hand, good documentation – particularly around data movement and ETL processes…


ETL Antipattern: Load Processes that Don’t Scale

One of the most significant design considerations in ETL process development is the volume of data to be processed. Most ETL processes have time constraints that require them to complete their load operations within a given window, and the time required to process data will often dictate the design of the load. One of the more common mistakes I’ve seen…


ETL Antipattern: Failure to Test and Validate

“If it compiles, it works.” – An unemployed developer

Building ETL processes is quite easy. Building ETL processes that deliver accurate results as quickly as possible is substantially more difficult. Modern ETL tools (including my personal favorite, SQL Server Integration Services) make it deceptively easy to create a simple load process. That’s a good thing, because an easy-to-understand front end shortens…


ETL Antipattern: Failing to Treat ETL Logic as Source Code

In most data projects, building the extract-transform-load (ETL) logic takes a significant amount of time. Enterprise ETL processes must do several things well: retrieve enough data to satisfy the business needs, apply any needed transformations to that data, and load it to the destination(s) without interruption to any other business processes. The work that goes into building and validating that…


ETL Antipattern: Lazy Metadata

If data is a train, then metadata is the track on which it travels. A good metadata definition in ETL processes will help to ensure that the flow of the data is predictable, robust, and properly constrained to avoid errors. However, many ETL processes take a hands-off approach when it comes to metadata. In some cases, this laissez-faire design…


ETL Antipattern: Performing Full Loads Instead of Incremental Loads

In my last post in the ETL Antipatterns series, I wrote about the common antipattern of ingesting or loading more data than necessary. This brief post covers one specific case of loading more data than necessary by performing a full data load rather than using a smaller incremental load. Earlier this…


ETL Antipattern: Start With Writing Code

In this first post in my series on ETL Antipatterns, I’m going to discuss one of the most common missteps when building an extract-transform-load (ETL) process: jumping straight into writing code as a first step. Most data architects and developers are intensely curious folks. When we see a set of data, we want to…


The Eleven Days of Festivus 2020

We’re rounding the corner to the second half of December, which means it’s time for my favorite holiday: Festivus! Like many of you, I enjoy gathering around the Festivus pole and sharing the time-honored traditions such as the Feats Of Strength and the Airing Of Grievances. But my favorite Festivus tradition takes place right here on this blog: the Eleven…


The What, Why, When, and How of Incremental Loads

When moving data in an extraction, transformation, and loading (ETL) process, the most efficient design pattern is to touch only the data you must, copying just the data that was newly added or modified since the last load was run. This pattern of incremental loads usually presents the least amount of risk, takes less time to run, and preserves the…