Never Delete Data?

Should you ever delete data?

Data quality is important, and completeness is one measure of that quality. Whether you refer to it as data integrity, permanent retention, or simply maintaining a complete audit trail, a strong case can be made that purging old data clouds the big picture that the data presents. After all, any data that is worth storing, backing up, optimizing, and mining is worth storing permanently. Deleting data limits the ability to thoroughly research historical activity, and can skew reports and aggregations on the data that remains. Storing only the rolled-up data, such as end-of-year financial reports, is often not sufficient, because auditors or financial personnel may need to drill down to the lowest level of detail.

In other cases, the value in persisting even very old data can be measured in other ways. Think about healthcare data that tracks diagnoses, treatments, and outcomes. To effectively study the long-term impact of health decisions and courses of treatment, having years – even decades – worth of data is essential. Information that is used today to manage billing and insurance might also be able to provide insights into which treatment paths are most effective. It’s entirely possible that data retention, along with effectively mining that data, could save lives.

Why delete data?

The need to routinely purge data was far more critical when storage was more expensive, in terms of dollars and system time.  Purchasing disks for storage has never been cheaper, and with modern 15000 RPM drives and solid state disks, data access times continue to improve.  Removing data simply for the sake of saving bytes on a platter is not as critical as it was just a few years ago.  Data can be retained indefinitely, in the original store or in a separate archive (another table or a different database altogether).
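As a minimal sketch of the "separate archive" option, the snippet below moves old rows into an archive table instead of discarding them. The table names (`orders`, `orders_archive`), the columns, and the function name are hypothetical and chosen only for illustration; SQLite is used simply because it ships with Python, and a real system would adapt this to its own schema and database.

```python
import sqlite3

# Hypothetical schema for illustration: a live table and an archive table
# with the same columns. Dates are ISO-8601 strings, which sort correctly
# when compared as text.
SCHEMA = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, placed_on TEXT, amount REAL);
CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, placed_on TEXT, amount REAL);
"""


def archive_old_orders(conn, cutoff_date):
    """Copy orders placed before cutoff_date into the archive table, then
    remove them from the live table. Returns the number of rows moved."""
    # "with conn" commits if both statements succeed and rolls back if
    # either one raises, so a row is never removed without being archived.
    with conn:
        conn.execute(
            "INSERT INTO orders_archive (id, placed_on, amount) "
            "SELECT id, placed_on, amount FROM orders WHERE placed_on < ?",
            (cutoff_date,),
        )
        moved = conn.execute(
            "DELETE FROM orders WHERE placed_on < ?", (cutoff_date,)
        ).rowcount
    return moved


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "2009-03-01", 10.0), (2, "2012-07-15", 20.0), (3, "2024-01-05", 30.0)],
    )
    conn.commit()
    print(archive_old_orders(conn, "2015-01-01"), "rows archived")
```

The live table stays small for day-to-day queries, while the detail rows remain available for anyone who needs to drill down later.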

To be clear, I’m not taking on technical professionals who remove data as part of their day-to-day jobs. Rather, this deals with the broader issue of data retention, which is typically governed by corporate policy (and sometimes by law). A proper data retention policy would involve all levels of an organization, from the CXOs to the technical staff and end users. And a competent retention policy doesn’t have to mandate that data remain in the RDBMS – information can be stored in database backups, the filesystem, cold storage in the cloud, or a combination of several of these. The specifics of permanent data storage should be dictated by how frequently and how quickly the data would need to be accessed.

There are times when deleting data is a best practice. Deleting sensitive data that would never be reported on or reused is expected for the protection of customers or clients – removing credit card numbers after a charge has successfully posted falls into this category.
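One way to picture the credit card case is as a targeted scrub: the sensitive field is removed while the transaction record itself is kept. The following is only a sketch under assumed names; the `payments` table, its `status` values, and the `scrub_card_number` function are hypothetical, not taken from any particular system.

```python
import sqlite3

# Hypothetical payments table for illustration only.
PAYMENTS_SCHEMA = """
CREATE TABLE payments (
    id INTEGER PRIMARY KEY,
    amount REAL,
    status TEXT,          -- assumed values: 'pending' or 'posted'
    card_number TEXT
);
"""


def scrub_card_number(conn, payment_id):
    """Clear the stored card number once the charge has posted, leaving the
    rest of the payment row intact. Returns True if a row was scrubbed."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE payments SET card_number = NULL "
            "WHERE id = ? AND status = 'posted' AND card_number IS NOT NULL",
            (payment_id,),
        )
    return cur.rowcount == 1
```

The amount, status, and identifier remain available for reporting; only the value that should never be reused is gone.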

Defining data retention

The decision about what to purge and when does not reside with database administrators alone, or even with their employing organizations. Some vendor applications routinely delete older, less frequently used data as part of a purge to improve performance or reduce storage requirements. I recently experienced this with a healthcare vendor during a conversion from their product to a newer system. During the planning phase of the conversion project, we discovered that this vendor’s system was hard-coded to purge the detail data from old accounts. Although we were able to reconstruct some of the data using other means, the ability to thoroughly report on that historical data has been permanently diminished.

Never delete data?

The bottom line is that you should ask yourself whether you could ever need the data you are deleting.  You shouldn’t just ask whether it is likely that you will need the data again – approaching from this angle will eventually come back to bite you.  A more appropriate question would be whether you can imagine any scenario in which the data would provide value in the future. If the data was worth collecting in the first place, it’s likely worth preserving.

About the Author

Tim Mitchell
Tim Mitchell is a data architect and consultant who specializes in getting rid of data pain points. Need help with data warehousing, ETL, reporting, or training? If so, contact Tim for a no-obligation 30-minute chat.
