It wasn’t so long ago that the first day of the month was the most common trigger event for updating key metrics. Indicators such as profit, efficiency, bonuses owed, and other markers would be published monthly after that month’s data was tabulated (which might be days or even weeks into the new month). In some organizations, the work required to calculate these metrics took the entire following month to complete, making the process of preparing month-end data a never-ending cycle. In such cases, reporting data could be refreshed only monthly because of the amount of work required to crunch the numbers.
Fortunately, automation and better data handling tools have eased much of that burden. These days, the process of making data available on a more frequent basis leans much more on technology and less on manual work. As a result, it is more common to find reporting data current as of yesterday than to find business decisions resting on days- or weeks-old data. Aided by automated business rule processing and data validation, the value of having the prior day’s reporting data available usually justifies the cost of producing it.
Do You Really Need Real-Time?
With the new de facto standard of having reporting data current as of a day ago, it makes sense to ask the question, “Can we get it faster?” After all, a pipeline that processes data multiple times per day would reuse a lot of the plumbing already built for the daily load process. However, loading data at less-than-daily intervals is rarely as simple as just increasing the frequency of the load process. When the loading of reporting or analytical data becomes an ongoing process rather than a nightly batch, issues such as OLTP data contention, accuracy in change detection logic, overlapping data loads, transactional consistency, and constant load monitoring require more attention than they do with a single daily batch load run during off-peak hours. As such, the ETL logic and supporting processes used for real-time or near-real-time reporting and analytics will usually look different from those used for a daily load.
Moving to real-time or near-real-time reporting and analytics can be worth the investment. If part of your analytics workflow uses data from earlier today to make decisions today, then it may be worth weighing the total cost (up-front development as well as ongoing maintenance and monitoring) against the business value. Do the cost-benefit analysis to confirm that the business value is there. Just because you can load reporting data more frequently doesn’t mean you should.