ETL

Bad Data Can Kill

Bad data can kill. Literally. I’m not talking about the impact of bad data on the bottom line of business, where total losses are usually estimated in the hundreds of billions of dollars per year. Nor am I talking about the inconvenience we all face when presented with a real-world situation resulting from incorrect or stale data. The…


ETL Data Lineage

Before I began my technical career over a decade and a half ago, I spent several years working in law enforcement. In that field, one of the things one must learn quickly is the concept of the chain of custody of evidence. There were numerous procedures we had to follow to ensure that evidence was not just gathered and preserved,…


SSIS Training Classes for Summer 2016

If you are looking for affordable, high-quality training on SQL Server Integration Services, you may be interested in one of the two full-day workshops I have scheduled for August of this year. I’ll be taking my popular course Building Better SSIS Packages on the road to Baton Rouge, Louisiana and San Antonio, Texas. This course is designed for the data…


ETL Auditing

It happens far too often: Once an ETL process has been tested and executes successfully, there are no further checks to ensure that the operation actually did what it was supposed to do. Sometimes it takes a day, other times it takes a year, but eventually that call comes from a client, coworker, or boss: “What’s wrong with this data?”…


ETL Logging

If you were to poll data professionals on which tasks they enjoy working on the most, ETL logging would probably not make the list. However, it is essential to the success of any ETL architecture to establish an appropriate logging strategy. I like to compare a good logging infrastructure to the plumbing of a house: it is not outwardly visible,…


Using Custom File Delimiters in SSIS

File-based ETL is usually dull. Most systems generate (or expect to consume) files that are delimited, with a common field separator such as comma, tab, or pipe. However, occasionally you’ll get an oddly formatted file with an unusual delimiter. Although it’s not obvious in the Visual Studio designer, SSIS is capable of consuming and generating files with custom delimiters. In…


Using Change Tracking in SSIS

Recently, I wrote about how to get started with SQL Server change tracking, and I demonstrated a design pattern I use with change tracking in incremental load scenarios. In this post, I’ll round out the topic by showing how using change tracking in SSIS packages can add more flexibility to ETL processes. Using Change Tracking in SSIS: In my last post I…


Using SQL Server Change Tracking for Incremental Loads

Earlier this week I wrote about the basics of change tracking in SQL Server, and showed how to get started using this technology for change detection. In this post, I’ll continue what I started by demonstrating how change tracking fits into a larger design pattern for end-to-end incremental load ETL processes. Incremental Load Overview: ETL processes fall into one of…


Getting Started with Change Tracking in SQL Server

Change tracking for SQL Server is a flexible and easy-to-use technology for monitoring tables for inserts, updates, and deletes. In this post, I’ll introduce change tracking in SQL Server and walk through an example of how to get started with it. Change Tracking in SQL Server: Change tracking is a lightweight mechanism for tracking which rows have…


We Don’t Trust This Data

“Learning to trust is one of life’s most difficult tasks.” – Isaac Watts

As data professionals, there are times when our jobs are relatively easy. Back up the databases. Create the dashboard report. Move the data from flat files to the database. Create documentation. There are lots of cogs in those machines, but an experienced technologist will have little trouble…