ETL

Bad Data Can Kill

Bad data can kill. Literally. I’m not talking about the impact of bad data on the bottom line of business, where estimates of total losses are usually gauged in the hundreds of billions of dollars per year. Nor am I talking about the inconvenience we all face when presented with a real-world situation resulting from incorrect or stale data. The…


Using Change Tracking in SSIS

Recently, I wrote about how to get started with SQL Server change tracking, and I demonstrated a design pattern I use with change tracking in incremental load scenarios. In this post, I’ll round out the topic by showing how using change tracking in SSIS packages can add more flexibility to ETL processes. Using Change Tracking in SSIS In my last post I…


Using SQL Server Change Tracking for Incremental Loads

Earlier this week I wrote about the basics of change tracking in SQL Server, and showed how to get started using this technology for change detection. In this post, I’ll continue what I started by demonstrating how change tracking fits into a larger design pattern for end-to-end incremental load ETL processes. Incremental Load Overview ETL processes fall into one of…


Getting Started with Change Tracking in SQL Server

Change tracking for SQL Server is a flexible and easy-to-use technology for monitoring tables for inserts, updates, and deletes. In this post, I’ll discuss the basics of change tracking in SQL Server and show an example of how to begin using it. Change Tracking in SQL Server Change tracking is a lightweight mechanism for tracking which rows have…


How To Get Fired from an ETL Developer Job

Through the course of my 8-some-odd years of building and fixing ETL processes, I’ve had the opportunity to see a lot of ETL code. Some of that code was really good: well thought out and carefully executed. Other load processes were – well, let’s just say that they provide plenty of consulting opportunities (and I include much of my early code in…


The SSIS Object Variable and Multiple Result Sets

In my most recent post in this series, I talked about how to use the SSIS object variable as an ADO recordset as a source in a data flow. By loading the result set of a query into this variable, the contents of the variable can be read by an SSIS script component and sent out through the SSIS pipeline….


Using the SSIS Object Variable as a Data Flow Source

Object variables in SSIS are incredibly versatile, allowing the storage of almost any type of data (even .NET objects). In my last post on this topic, I demonstrated how an SSIS object variable containing a .NET DataSet object could be used by the for each loop container as an iterator. In this post, I’ll continue the discussion by showing how…


Null, empty string, or zero?

The answer: It Depends. One of the more common problems I encounter when managing data quality, especially in an ETL process, is the proper handling of null, empty string, or zero values. When I put on my preaching shoes to talk about bad data, this is one of the areas I have to spend a lot of time covering because it…


Using the SSIS Object Variable as a Result Set Enumerator

In the first post in this series, I covered the basics of object-typed variables in SQL Server Integration Services, along with a brief examination of some potential use cases. In this installment, I’m going to illustrate the most common use of object-typed variables in SSIS: using an object variable as an ADO recordset within a loop container to…


Skipping Items in a Foreach Loop

Recently, my friend Jack Corbett asked a question on Twitter: In a nutshell, the SSIS foreach loop will enumerate a given list of items (files in a directory, nodes in an XML file, static list of values, etc.) and will perform some operation for each of the items in the collection. This behavior is similar to foreach loop constructs that…