Data Quality

Bad Data Can Kill

Bad data can kill. Literally. I’m not talking about the impact of bad data on the bottom line of business, where estimates of total losses are usually gauged in the hundreds of billions of dollars per year. Nor am I talking about the inconvenience we all face when presented with a real-world situation resulting from incorrect or stale data. The…


We Don’t Trust This Data

“Learning to trust is one of life’s most difficult tasks.” – Isaac Watts

As data professionals, there are times when our jobs are relatively easy. Back up the databases. Create the dashboard report. Move the data from flat files to the database. Create documentation. There are lots of cogs in those machines, but an experienced technologist will have little trouble…


Null, empty string, or zero?

The answer: It Depends. One of the more common problems I encounter when managing data quality, especially in an ETL process, is the handling of null values, blanks, and zeroes. When I put on my preaching shoes to talk about bad data, this is one of the areas I have to spend a lot of time covering because it is so…
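To make the distinction concrete, here is a minimal Python sketch that labels a raw field as null, blank, zero, or an actual value before any business rule decides what to do with it. The function name `classify_missing` and the four labels are illustrative assumptions, not something taken from the post, and the right handling for each label still depends on the data.

```python
def classify_missing(value):
    """Label a raw field as 'null', 'blank', 'zero', or 'value'.

    Hypothetical helper for illustration; the labels are assumptions.
    """
    # A missing value (NULL in a database) is not the same as an empty string.
    if value is None:
        return "null"
    # Empty or whitespace-only strings are blanks, not nulls and not zeroes.
    if isinstance(value, str):
        return "blank" if value.strip() == "" else "value"
    # A numeric zero is a real measurement; bool is excluded on purpose.
    if isinstance(value, (int, float)) and not isinstance(value, bool) and value == 0:
        return "zero"
    return "value"


for raw in (None, "", "   ", 0, 0.0, "0", 42):
    print(repr(raw), "->", classify_missing(raw))
```

Keeping the three cases apart at load time leaves the "it depends" decision to a later, explicit step rather than silently collapsing them.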


Fix Inconsistent Line Terminators in SSIS

There is a flat file processing issue I’ve run into a number of times over the years, and it’s come up again several times recently. The issue relates to the line terminators used in data files. Occasionally, changes to the systems generating these data files, or perhaps even manual edits, can change the way the file marks the end of…
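As a rough illustration of the problem (not the SSIS approach the post goes on to describe), the Python sketch below rewrites a file's contents so that every line ends with the same terminator. The function name `normalize_line_endings` and the default of CRLF are assumptions chosen for the example.

```python
def normalize_line_endings(data: bytes, terminator: bytes = b"\r\n") -> bytes:
    """Return data with every CRLF, bare CR, or bare LF replaced by terminator.

    Hypothetical preprocessing helper; it works on bytes so no encoding is assumed.
    """
    # Collapse CRLF first so it is not counted as two separate line breaks,
    # then treat any remaining bare CR as a line break as well.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    # Expand every line break to the terminator the downstream loader expects.
    return unified.replace(b"\n", terminator)


mixed = b"id,name\r\n1,Ann\n2,Bob\r3,Cy"
print(normalize_line_endings(mixed))
```

Running a step like this before the load means the flat file parser sees one consistent row delimiter, regardless of which system or editor last touched the file.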


How to burn down your house while frying a turkey

It’s an odd query, yes, but in preparation to write this post I actually typed the above phrase into my browser.  No, I’m certainly not looking to burn down my house.  In fact, wait here while I clear my search history, just in case. For the sake of argument, let’s say you’re planning to fry a turkey over the upcoming…


Mile High Tech Con

There’s a brand new business intelligence conference launching next month in Denver, Colorado.  The Mile High Tech Con is a three-day event taking place July 24-26, 2014, and is aimed at business intelligence practitioners, data analysts, and information managers/CIOs. From the event website: Featuring three days of sessions and events focusing on Data Warehousing and Business Intelligence. Mile High Tech…


Upcoming SQL Saturday Precons

I’m happy to announce that I’ll be delivering three one-day preconference seminars this summer prior to three different SQL Saturday events: Iowa City, Iowa – Friday, July 26th (before SQL Saturday 239 – East Iowa); Orlando, Florida – Friday, September 13th (before SQL Saturday 232 – Orlando); Denver, Colorado – Friday, September 27th (before SQL Saturday 190 – Denver). For…


Speaking at PASS Summit 2013

I’m happy to announce that I have been selected to present at the SQL PASS Summit in Charlotte, North Carolina this October.   I’ll be delivering a session entitled “Data Cleansing in SQL Server Integration Services”, in which I’ll cover various ways to detect and cleanse dirty data using tools built into (or accessible from) SQL Server Integration Services. This will…


DQS Composite Domains and Value Combinations

As I’ve been working with Data Quality Services over the past few weeks, I’ve spent a lot of time working with data domains, composite domains, and rules.  In that exploration, I’ve found some behavior that might not be expected when performing cleansing operations against a knowledge base containing a composite domain. In this post, I’ll outline the expected data cleansing…


DQS Validation Rules on Composite Domains

In Data Quality Services, composite domains can be created to associate two or more natural domains within a knowledge base. Like natural domains, composite domains can also contain one or more validation rules to govern which domain values are valid. In my last post, I discussed the use of validation rules against natural domains. In this post, I’ll continue…