Data Quality

Bad Data Can Kill

Bad data can kill. Literally. I’m not talking about the impact of bad data on the bottom line of business, where estimates of total losses are usually gauged in the hundreds of billions of dollars per year. Nor am I talking about the inconvenience we all face when presented with a real-world situation resulting from incorrect or stale data. The…


We Don’t Trust This Data

“Learning to trust is one of life’s most difficult tasks.” – Isaac Watts

As data professionals, there are times when our jobs are relatively easy. Back up the databases. Create the dashboard report. Move the data from flat files to the database. Create documentation. There are lots of cogs in those machines, but an experienced technologist will have little trouble…


Null, empty string, or zero?

The answer: It Depends. One of the more common problems I encounter when managing data quality, especially in an ETL process, is the proper handling of null, empty string, or zero values. When I put on my preaching shoes to talk about bad data, this is one of the areas I have to spend a lot of time covering because it…
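To make the distinction concrete, here is a minimal Python sketch of how an ETL step might tell the three cases apart before deciding what to do with each one. The function name `classify_missing` and the choice to treat whitespace-only strings as empty are assumptions made for illustration, not something taken from the post.

```python
from typing import Any


def classify_missing(value: Any) -> str:
    """Label a raw field as 'null', 'empty', 'zero', or 'value'.

    Hypothetical helper: the labels and rules are illustrative only.
    """
    if value is None:
        return "null"
    # Assumption: a whitespace-only string counts as an empty string.
    if isinstance(value, str) and value.strip() == "":
        return "empty"
    # bool is a subclass of int in Python, so exclude it explicitly;
    # False is a real answer, not a numeric zero.
    if (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and value == 0
    ):
        return "zero"
    return "value"


if __name__ == "__main__":
    for raw in (None, "", "   ", 0, 0.0, "0", False, 42):
        print(repr(raw), "->", classify_missing(raw))
```

Keeping the three cases separate at the classification stage leaves the "it depends" decision (store as NULL, default to zero, reject the row) to whatever business rule applies to that column.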


Fix Inconsistent Line Terminators in SSIS

When processing data files using SQL Server Integration Services, it is not uncommon to find files with different end-of-line markers for each line in the file. In this post, I will demonstrate how to fix inconsistent line terminators in SSIS to avoid ETL errors. In every text file, there are unprintable characters called line…
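Outside of SSIS, the general idea of normalizing mixed line terminators can be sketched in a few lines of Python. This is not the technique the post demonstrates (the excerpt does not show it); it is only an illustration of the underlying problem, and the function name `normalize_line_endings` is a hypothetical one.

```python
import re

# Match CRLF first so a Windows line ending is not treated as two terminators.
_LINE_END = re.compile(r"\r\n|\r|\n")


def normalize_line_endings(text: str, newline: str = "\r\n") -> str:
    """Rewrite every CRLF, bare CR, or bare LF as a single chosen terminator."""
    # A function replacement avoids any escape processing of `newline`.
    return _LINE_END.sub(lambda _match: newline, text)


if __name__ == "__main__":
    mixed = "id,name\r\n1,alpha\n2,beta\r3,gamma"
    print(repr(normalize_line_endings(mixed)))
```

When applying this to a file, open it with `newline=""` (or in binary mode) so Python does not translate the terminators before the function sees them.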


How to burn down your house while frying a turkey

It’s an odd query, yes, but while preparing to write this post I actually typed the above phrase into my browser. No, I’m certainly not looking to burn down my house. In fact, wait here while I clear my search history, just in case. For the sake of argument, let’s say you’re planning to fry a turkey over the upcoming…


Mile High Tech Con

There’s a brand new business intelligence conference launching next month in Denver, Colorado.  The Mile High Tech Con is a three-day event taking place July 24-26, 2014, and is aimed at business intelligence practitioners, data analysts, and information managers/CIOs. From the event website: Featuring three days of sessions and events focusing on Data Warehousing and Business Intelligence. Mile High Tech…


Upcoming SQL Saturday Precons

I’m happy to announce that I’ll be delivering three one-day preconference seminars this summer prior to three different SQL Saturday events:

Iowa City, Iowa – Friday, July 26th (before SQL Saturday 239 – East Iowa)
Orlando, Florida – Friday, September 13th (before SQL Saturday 232 – Orlando)
Denver, Colorado – Friday, September 27th (before SQL Saturday 190 – Denver)

For…


Speaking at PASS Summit 2013

I’m happy to announce that I have been selected to present at the SQL PASS Summit in Charlotte, North Carolina this October. I’ll be delivering a session titled “Data Cleansing in SQL Server Integration Services”, in which I’ll cover various ways to detect and cleanse dirty data using tools built into (or accessible from) SQL Server Integration Services. This will…


DQS Composite Domains and Value Combinations

As I’ve been working with Data Quality Services over the past few weeks, I’ve spent a lot of time working with data domains, DQS composite domains, and rules. In that exploration, I’ve found some behavior that might not be expected when performing cleansing operations against a knowledge base containing a composite domain. In this post, I’ll outline the expected data…


DQS Validation Rules on Composite Domains

In Data Quality Services, composite domains can be created to associate two or more natural domains within a knowledge base. Like natural domains, composite domains can also contain one or more validation rules to govern which domain values are valid. In my last post, I discussed the use of validation rules against natural domains. In this post, I’ll continue…