ETL

ETL Antipattern: Processing Too Much Data

In my continuing series on ETL Antipatterns, I’ll discuss the problem of loading too much data in extract-transform-load processes. ETL Antipattern: processing too much data A common design flaw in enterprise ETL processes is that they are processing too much data. Having access to a great breadth and depth of data opens up lots of options for historical reporting and…


ETL Antipattern: Start With Writing Code

In this first post in my series on ETL Antipatterns, I’m going to discuss one of the most common missteps when building an extract-transform-load (ETL) process: jumping straight into writing code as a first step. ETL Antipattern: start with writing code Most data architects and developers are intensely curious folks. When we see a set of data, we want to…


The Eleven Days of Festivus 2020

We’re rounding the corner to the second half of December, which means it’s time for my favorite holiday: Festivus! Like many of you, I enjoy gathering around the Festivus pole and sharing the time-honored traditions such as the Feats Of Strength and the Airing Of Grievances. But my favorite Festivus tradition takes place right here on this blog: the Eleven…


The What, Why, When, and How of Incremental Loads

When moving data in an extraction, transformation, and loading (ETL) process, the most efficient design pattern is to touch only the data you must, copying just the data that was newly added or modified since the last load was run. This pattern of incremental loads usually presents the least amount of risk, takes less time to run, and preserves the…


Secure Your Data Prep Area

I’ve been building ETL processes for many years, and I’ve learned two universal truths: data preparation is messy, and you must always secure your data prep area. A data prep area is very similar to a commercial kitchen, and in the same way that customers aren’t allowed in the kitchen, so should data consumers be kept from accessing in-process data…


Temp Tables in SSIS

Temp tables are very handy when you have the need to store and manipulate an interim result set during ETL or other data processing operations. However, if you use SQL Server Integration Services as your ETL tool, you may find some challenges when trying to work with temp tables in SSIS packages, especially in the SSIS data flow. In this…


What is ETL?

In my ongoing series on ETL Best Practices, I am illustrating a collection of extract-transform-load design patterns that have proven to be highly effective. In the interest of comprehensive coverage on the topic, I am adding to the list an introductory prequel to address the fundamental question: What is ETL? What Is ETL? ETL is shorthand for the extraction, transformation,…


ETL Error Handling

In designing a proper ETL architecture, there are two key questions that must be answered. The first is, “What should this process do?” Defining the data start and end points, transformations, filtering, and other steps must be done before any other work can proceed. The second question that must be answered is “What should happen when the process fails?” Too…


Get Your Email Out of my ETL

Question from someone in one my recent classes: “What tool do you use to send email from ETL processes?” My response: “I don’t.” The tl;dr version of this post is I let my extract-transform-load processes do just ETL, and leave notifications to the scheduling system where they belong. Get Your Email Out of my ETL Before you read any further,…


ETL Modularity

Imagine for a moment that you’ve built a software thing. In fact, we’ll call it The Thing. You put a lot of work into The Thing, and it does exactly what you wanted it to. You put The Thing into play as part of a larger solution and, after a couple of revisions, its behavior is verified and it is…