The Numbers Don’t Lie… Except When They Do

There are few things more reassuring for a data professional than having clean, consistent data to back up critical business decisions.  The numbers don’t lie, or so they say.  But can the right data lead to wrong conclusions?  Sadly, yes, and I suspect that it happens more often than we’d like to admit.

Recently, as part of a large hospital project I’ve been working on, I’ve been addressing questions around the all-important census, the magic number of patients bedded in a given facility.  This facility converted from an outdated software package to a more modern, SQL Server-based product about a year ago, and one of the key goals with the new system was to “get the census right”.  See, the old system had at least a few dozen census reports, most of which disagreed with the others about the true count of in-house patients.  Because the software package was dated, the most common complaint was that “the system” was reporting incorrect information.  However, during my review of the archived reports, it quickly became clear that the reports from the old system were all correct.  The malfunction was not in the answers, but in the questions.

Speaking Different Languages
The root of these problems is the inconsistent definition of a business’s key performance metrics.  In our census example, one report might include only admitted patients, while another could include those still in triage in the ER.  One report shows the count of inpatients as of the previous midnight, while another provides the same information in real time.  In isolation, each of these metrics is correct, but when held side-by-side with the others, the output appears to be wrong.  In my experience, this problem is growing along with the movement toward self-service reporting: more end users are querying their information systems than ever before, and many of them are basing critical decisions on loosely defined standards and definitions.
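To make this concrete, below is a minimal sketch of how two equally “correct” census queries can disagree.  The patient_visit table and its columns are hypothetical stand-ins for illustration, not taken from any actual system:

    -- Hypothetical table: patient_visit (visit_id, patient_id, admit_time,
    -- discharge_time, status); discharge_time is NULL while the patient is in-house.

    -- Report A: real-time count of formally admitted, in-house patients.
    SELECT COUNT(*) AS census
    FROM dbo.patient_visit
    WHERE status = 'Admitted'
      AND discharge_time IS NULL;

    -- Report B: count as of the previous midnight, including ER triage.
    -- (SQL Server 2008 syntax for the inline DECLARE and the DATE type.)
    DECLARE @midnight DATETIME = CAST(CAST(GETDATE() AS DATE) AS DATETIME);

    SELECT COUNT(*) AS census
    FROM dbo.patient_visit
    WHERE status IN ('Admitted', 'Triage')
      AND admit_time < @midnight
      AND (discharge_time IS NULL OR discharge_time >= @midnight);

Both queries do exactly what they were asked to do; they simply answer different questions, and each is a reasonable reading of “how many patients do we have?”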

To avoid these assumptions and half-truths with data, I offer the following preferred practices:

  • Clearly define the metrics, entities, and standards that are critical to your business intelligence, and share them with all principals.  Often this involves answering what appear to be silly questions: “What is a day?”, “What is a patient?”, “What is a sale?”, “What is a billable hour?”, etc.  By clarifying these elemental questions, your downstream metrics will be improved because everyone understands what is and is not included in these definitions.  Once those terms are defined, be dogmatic in their use.
  • Involve all business units in those standards-setting conversations.  Regardless of your industry, you should include principals from each major facet of your business – sales, marketing, decision support, executives, customer service – to ensure not only a comprehensive understanding of the reporting needs but also a sense of ownership in the process.  If they believe in it, they’ll support it.
  • Ask leading questions.  Don’t simply give everyone ad-hoc access to your raw data; use the technical tools available to enable managed self-service reporting (security controls, data warehousing, denormalized views, or more encompassing tools such as PowerPivot) to limit the open-endedness of most user queries, as in the sketch after this list.  The data should be versatile enough to answer the important business questions without creating a free-for-all where the results could be made to show almost anything.
  • Validate your output.  There should be a formal, mandatory validation process for each new output created, whether it’s a simple report or an entire volume of data.  Having a validation process that crosses business units is highly desirable, as this lends itself to more accurate and versatile (reusable) reports.  Part of this review process should be to confirm that the data retrieved is not already provided by existing outputs.
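To show what dogmatic definitions and managed self-service might look like together on SQL Server, here’s a rough sketch – the schema, view, and role names are hypothetical, the census rule is just an example definition, and it continues the made-up patient_visit table from the earlier sketch.  The idea is to encode the agreed-upon definition exactly once, in a view, and point every report and every ad-hoc user at the view rather than at the base tables:

    -- A schema to hold the sanctioned reporting objects.
    CREATE SCHEMA rpt;
    GO

    -- The one agreed-upon definition of "the census", encoded exactly once.
    CREATE VIEW rpt.official_census
    AS
    SELECT COUNT(*) AS census
    FROM dbo.patient_visit
    WHERE status = 'Admitted'       -- per the agreed definition: admitted patients only
      AND discharge_time IS NULL;   -- and still in-house
    GO

    -- Managed self-service: report writers query the view, never the raw tables.
    CREATE ROLE report_readers;
    GRANT SELECT ON rpt.official_census TO report_readers;
    DENY SELECT ON dbo.patient_visit TO report_readers;

If the business later refines its definition of the census, the view changes in one place and every downstream report inherits the correction automatically.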

With the growing trend toward self-service reporting, it’s more important than ever for everyone in a business to stay on the same page.  It will still be a challenge, but following these few practices should make the process a little easier.

Upcoming Speaking Engagements for December 2009

Next week, I have the honor of presenting two different sessions on SSIS.  These two events are the last speaking engagements on my calendar for this year:

On Monday, December 14 at 11:30 CST, I’m presenting a SQL Lunch session to discuss looping logic in SSIS using the For Loop and the Foreach Loop.  This will be a working session, consisting almost exclusively of demos.  Thanks to Patrick LeBlanc for yet another opportunity to present to this group.

The following day, Tuesday, December 15th at 11:00 CST, I’ll present “Looping, Moving Files, and Splitting Data Streams: Intermediate SSIS Tasks for the DBA” for the PASS DW/BI virtual chapter.

See you there!

Electronic Health Records – What’s the Big Deal? (Part 3)

In the previous post in this series, I discussed the obstacles to implementing electronic health data systems.  Because of these obstacles, many providers are resistant to replacing their paper-based “databases” with true EHR systems.  But assuming the best-case scenario – that all healthcare providers and vendors convert from paper (or quasi-paper) to digital – that still doesn’t fully solve the problem: you’ve still got a lot of electronic data that exists in silos.

Why Integrate?

For maximum benefit to the patient and, in the long run, to the industry as a whole, this information should be visible across every platform and every organization.  Properly authenticated users should be able to access a patient’s health record instantaneously, including even very recent visits to other providers.  Non-clinical decision-making personnel should be able to aggregate the same information to analyze disease trends and patient outcomes, helping to identify the factors that lead to illness and the success/failure rates of different treatment paths.  This information would need to be available in real time, or very close to it, in anticipation of large-scale, rapidly evolving health situations (avian flu, H1N1, etc.).

With this information, front-line personnel could quickly make treatment decisions based on previous outcomes and the patient’s own treatment history.  Family physicians could immediately access their patients’ healthcare history from hospitals, specialists, and other caregivers, allowing them to make diagnoses and medication/lifestyle changes based on the big picture of the patient’s health rather than simply addressing acute issues as they emerge.  Insurers and other entities with a financial stake could better plan for anticipated treatment costs, which could become more predictable as the volume of data available for this analysis grows.  Simply put, the best use of healthcare data would include a large-scale integration plan to ensure consistency of treatment and improve patient outcomes.

It’s Not So Easy

Healthcare data integration presents multifaceted challenges, including technical, administrative, and strategic obstacles:

Technical issues include the questions of responsibility and availability: which entity (or entities) will be responsible for ensuring the data is always available and properly protected?  Is the data to be centrally located for common access, and if so, who pays for this storage?  Some initiatives currently underway include RHIOs (regional health information organizations), which seek to share information on a relatively local level.  Properly administered, a national system based on the RHIO model might be a good solution.

Administrative issues include budgeting for data integration, allocating personnel, and incorporating the integrated information into the daily workflow.  It can’t be overstated that healthcare organizations, and especially front-line providers, can ill afford distractions, especially those that don’t result in an immediate payoff.  From the technical staff who handle the nuts and bolts of implementation, to the providers who must learn to access and update the integrated data, to the back-office personnel who validate and audit the information, there is an investment in human capital, which always comes with at least a soft cost.

Strategic issues certainly exist – after all, healthcare data integration violates a cardinal rule of business by simply giving away your most sensitive information.  Providers and vendors are under increasing pressure to remain competitive, and sharing patient and treatment information pulls back the curtain a bit, perhaps too much for the comfort of some.  Certainly, some safeguards would need to be implemented to allow the sharing of information for everyone’s benefit without unnecessarily harming smaller companies.

Hopefully the picture I’ve painted isn’t a bleak one.  The good news is that there have been strides over the past few years, and we are closer to true healthcare data integration than we were a decade ago.  I think the effort will get a big push from the legislative changes coming down the pike, and while I don’t believe that decisions by Congress will completely address the obstacles, those changes could help establish standards and safeguards to make the process a little less painful for everyone.

BIDN.com Launch Today

For those interested in the SQL Server BI space, there is a new online resource launching today.  Brian Knight and my friends over at Pragmatic Works have released the Business Intelligence Developer Network, a virtual community of business intelligence professionals.  You’ll find discussion forums, blogs, articles, a file/script sharing repository, and a community event calendar.  All resources are free, though you’ll need to create an account (also free) to access some of the content.

Want to become an author? BIDN welcomes submissions from the community, and even offers a little compensation for articles published on the site.  This is an excellent way to help others, build a new community, and strengthen your own street cred.