Search Results for data quality

Data Quality: The Discovery

I wrote a post a few months back about a healthcare data conversion project that I’ve been working on for the better part of two years. My task on this project is to convert data from an old UNIX-based Universe database to a SQL Server-based application; the database we are extracting from is quite old, both in terms of technology…


Using the SSIS Error Output On the Data Flow

When working in the SSIS data flow, you’ll notice that many sources and transformations, and some destinations, have a built-in output to handle errors. The error output allows the SSIS developer to create a separate path through which error rows can be directed. In this SSIS Basics post, we’ll briefly discuss the essentials and design patterns for using SSIS error…
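The idea behind an error output is easy to show outside of SSIS. The Python sketch below is an illustration only, not SSIS code and not the method from the post: it converts one column to an integer and, rather than failing the whole load, redirects any row that cannot be converted to a separate error list along with a description of the problem, loosely mirroring the “Redirect row” setting on an SSIS error output. The names `convert_with_error_output`, `error_column`, and `error_description` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FlowResult:
    """Rows that passed the conversion, and rows redirected to the error path."""
    good: list[dict[str, Any]] = field(default_factory=list)
    errors: list[dict[str, Any]] = field(default_factory=list)


def convert_with_error_output(rows: list[dict[str, Any]], column: str) -> FlowResult:
    """Convert `column` to int; redirect failing rows instead of failing the load.

    Hypothetical helper that loosely mimics 'Redirect row': the bad row is kept
    intact and tagged with diagnostic details for later triage.
    """
    result = FlowResult()
    for row in rows:
        try:
            converted = dict(row)  # copy so the input row is never mutated
            converted[column] = int(row[column])
            result.good.append(converted)
        except (ValueError, TypeError, KeyError) as exc:
            # Keep the original values plus what went wrong and where.
            result.errors.append(
                {**row, "error_column": column, "error_description": str(exc)}
            )
    return result
```

Keeping the untouched original values on the error path is the design choice that matters here: whoever triages the bad rows later sees exactly what arrived, not a half-converted version of it.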


Metadata Hygiene

Those who follow my blog know that I write a lot about data quality. Measuring and improving the quality of data is an important part of any data initiative, especially in the data warehousing space. While data quality does get its share of attention, there is a concept that is equally important but is sadly overlooked during most data projects:…


Managing Bad Data in ETL

In the last post in my ongoing series about ETL best practices, I discussed the importance of error handling in ETL processes, reviewing best practices for application flow to prevent or gracefully recover from a systematic error or data anomaly. In this post, I’ll dig a bit further into that topic to explore the design patterns for managing bad data in ETL…


We Don’t Trust This Data

“Learning to trust is one of life’s most difficult tasks.” – Isaac Watts

As data professionals, there are times when our jobs are relatively easy. Back up the databases. Create the dashboard report. Move the data from flat files to the database. Create documentation. There are lots of cogs in those machines, but an experienced technologist will have little trouble…


Finally, a Universal Data Integration Utility

Earlier today, the fine folks at the F. Oobar Corporation released a revolutionary product: a universal data integration utility. This software component, known as the Baseline Ongoing Generic Utility for Synergy, will run on any platform and can convert data to and from almost any format automatically. It also reads the semantics of the data to determine exactly how it…


Never Delete Data?

Should you ever delete data? Data quality is important, and completeness is a measure of the quality of data. Whether you refer to it as data integrity, permanent retention, or simply maintaining a complete audit trail, it can be effectively argued that purging old data clouds the big picture that the data presents. After all, any data that is worthy…


Managing Business Logic

Encapsulating business logic into data movement and presentation is a critical part of a stable information management strategy. Too often, though, business logic is built and added late in the process, forcing it into whatever nooks and crannies are available. While this duct-tape approach sometimes works, it makes the resulting system difficult to maintain when the business logic is spread…


ETL Error Handling

In designing a proper ETL architecture, there are two key questions that must be answered. The first is, “What should this process do?” Defining the data start and end points, transformations, filtering, and other steps must be done before any other work can proceed. The second question that must be answered is, “What should happen when the process fails?” Too…
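As a rough illustration of answering that second question in code, here is a minimal Python sketch (not taken from the post) in which each step is flagged as critical or not. A failed critical step stops the run and marks the remaining steps as skipped, while a failed non-critical step is logged and the run continues. The `Step` type, its `critical` flag, and `run_pipeline` are hypothetical names chosen for illustration.

```python
import logging
from typing import Callable, NamedTuple

logger = logging.getLogger("etl")


class Step(NamedTuple):
    name: str
    action: Callable[[], object]
    critical: bool  # hypothetical flag: does a failure here stop the whole run?


def run_pipeline(steps: list[Step]) -> dict[str, str]:
    """Run steps in order and record an outcome for each one.

    A failing critical step halts the run (later steps are marked 'skipped');
    a failing non-critical step is logged and the run continues.
    """
    outcomes: dict[str, str] = {}
    halted = False
    for step in steps:
        if halted:
            outcomes[step.name] = "skipped"
            continue
        try:
            step.action()
            outcomes[step.name] = "succeeded"
        except Exception:
            # Log the full traceback so the failure can be diagnosed later.
            logger.exception("Step %s failed", step.name)
            outcomes[step.name] = "failed"
            if step.critical:
                halted = True
    return outcomes
```

Returning an explicit outcome per step, rather than just raising, makes it possible to tell afterward which steps never ran, which is often the first thing you need to know when deciding how to recover.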


Row Numbers and Running Totals in SSIS

During data load or transformation processes, capturing a distinct row number for incoming data can be beneficial for the ETL process itself, as well as for use in the destination database. Having an arbitrary, incrementing row number assigned to each row can help to determine the order in which the rows of data were processed, and can provide a unique…
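The underlying logic is small. In SSIS it would typically live in something like a Script Component; the Python sketch below is an illustration under stated assumptions rather than the post’s implementation. It numbers each row in the order it is processed and keeps a running total of a hypothetical amount column, using `Decimal` so the totals do not accumulate floating-point drift. The function name and the `row_number` and `running_total` output columns are hypothetical.

```python
from decimal import Decimal
from typing import Any, Iterable, Iterator


def add_row_number_and_running_total(
    rows: Iterable[dict[str, Any]], amount_column: str, start: int = 1
) -> Iterator[dict[str, Any]]:
    """Yield each row with an incrementing row number and a running total.

    Hypothetical helper: rows are numbered in processing order starting at
    `start`, and `amount_column` is summed cumulatively as Decimal.
    """
    running_total = Decimal("0")
    for row_number, row in enumerate(rows, start=start):
        # str() first so float inputs like 2.5 become Decimal("2.5") exactly.
        running_total += Decimal(str(row[amount_column]))
        yield {**row, "row_number": row_number, "running_total": running_total}
```

Because the function is a generator, rows are numbered strictly in the order they flow through it, which is the property that makes the row number useful for reconstructing the processing order later.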