“We have all of that information. It’s in a database in my office.”
This phrase was music to my ears. I was working on one of my first-ever data reporting projects, and I had been searching in vain for a way to access historical point-in-time data from one of our apps in these new reports. Even though this data set was not critical to the project, it would have dovetailed nicely with the rest of the information to add even more insight into this part of our business.
At the time, I had limited experience with melding together disparate sets of data. However, my technical curiosity and general understanding of data integration gave me confidence that I could make use of this data with a bit of effort. When the department director said that the information was already in a database, I was all but certain that I would prevail in this data integration quest.
Analyzing the “Database”
Eager to get started, I scheduled a meeting with the director. When I arrived at her office, she was almost as excited as I was about this reporting initiative. Her database contained years of data, she explained, so she welcomed the prospect of year-over-year performance reporting that the application (the original source of this data) could not provide. However, our collective hopes were quickly dashed when I asked about the details of this database:
Me: “How do we go about accessing this database? Is it Access, SQL Server, or some other system?”
Her: “I’m not sure what system it is. It’s right over there.” [Gestures toward the corner of her office]
Me: “So it’s on a disk stored in that file cabinet?”
Her: “No, it’s on top of the file cabinet, in those manila folders.”
Me: “It’s amongst all those papers?”
Her: “It IS the papers. At the end of each year since I’ve been here, I have printed out the closing numbers into that database.”
Me: [Polite smile as I realize this initiative isn’t going to be the win I hoped it would be] “OK, let’s talk about our options.”
I spent a few minutes gently clarifying what is typically considered to be a database (from a systems integration perspective), and talked through the possible options for getting those years’ worth of printed forms into an electronic system that could then be queried. Later, after investigating the options, including using OCR to extract the data, the director and her departmental team decided that the business value would not justify the effort of importing this hard-copy information into an electronic system. Since these year-over-year comparisons were a small part of their overall data needs, they opted to keep the existing sneakernet processes in place.
The Dead Tree Database
In my professional travels since leaving that job, I have learned that there are a lot of “databases” stored on paper (hence the dead tree reference) or in other media that cannot be easily accessed electronically. Often there is a lot of valuable information wrapped up in those dead tree databases, but it has limited business value if it cannot be easily accessed. Sometimes, as in the case I recounted above, the cost of mobilizing that information outweighs the potential business value of that data.
Over the years I’ve collected the following truisms with respect to the accessibility and value of data:
- Not every “database” is a database. The fact that data is available does not mean that it is easy to access or understand.
- There is a strong correlation between how accessible and understandable a data set is and its potential business value. Data on a hard-copy piece of paper (or other difficult-to-query medium) may contain valuable insights, but its value is degraded if it cannot be easily accessed. The same is true of data that is accessible but is structurally or semantically inconsistent or difficult to understand.
- For some data sets, the potential business value does not justify the work required to connect to or otherwise restructure the data. Always weigh the cost (in dollars, time, and effort) against the value that data would add.
As one who has made his living connecting data systems, the last point is a hard one to concede, but the bottom line should always be how the business benefits from the data.