Data warehouse projects are among the most visible and expensive initiatives an organization can undertake. Sadly, they are also among the most likely to fail. At one time, Gartner reported that more than 50% of data warehouses would fail to make it to user acceptance. Because of the size of the investment (both time and money) required, the success of such a project can make or break careers. Therefore, it is important to understand why data warehouse projects fail.
Why Data Warehouse Projects Fail
In my years as a data warehouse consultant, I’ve been called in to rescue a few stalled (or even failed) data warehouse projects. Although the postmortem of any two failed DW initiatives will never be identical, I find that there are some common themes in those projects that never make it over the finish line. Understanding why data warehouse projects fail is critical so you can avoid these common mistakes.
Below, I have assembled ten of the most common attributes I have found in unsuccessful data warehouse initiatives.
Not answering the big question: Why?
A surprising number of technical projects, including data warehouse initiatives, are undertaken without a clear vision as to why they are needed. Sometimes it’s because the project deliverable is the industry buzzword of the year. Other times, it is just assumed that the organization needs the thing it is building because “everyone else has one.” The answer to the “why?” question is even more important than the answer to “how?”
Data warehouse projects are time-consuming and expensive, and require a great deal of support at every level of the organization. In every such initiative, there is at least one point in the middle of the project where a C-level executive asks, “Remind me again why we’re doing this…?” It is a valid question, and one that should have a codified answer long before the project begins.
Further, the answer to the question of “why?” should be known by everyone involved, not just the CxOs and those signing the checks to pay for it. I’ve seen far too many cases where staff members – especially technical people – are tasked with just doing a thing without understanding how it fits into the big picture. Each participant, from the architect to the business analyst, from project manager to QA tester, should understand the high-level goals of the project.
Using the Big Bang approach
A data warehouse is much more than a database and an ETL process feeding it. It is a complex intersection of various business units, dissimilarly shaped data coming in from numerous sources at different paces, and numerous metrics and measurements on top of each of these. In short, a data warehouse is a collection of smaller related projects which will be developed and tested at different times.
The most successful implementations I’ve seen have all involved incremental data warehouse development. Although this approach takes more careful planning and good communication, breaking the project into smaller pieces that can be developed, deployed, and tested has a higher success rate than trying to do everything at once (the “Big Bang” approach). Incremental development means your core assets are completed first, allowing any errors or omissions in the design to be corrected with minimal impact.
Although the everything-at-once Big Bang approach can work on very small data warehouse initiatives, this approach doesn’t scale well.
Jumping straight into writing code
The tool you should be using the most during the early days of a data warehouse project is a whiteboard. Any DW project that begins with, “Let’s build some tables!” or “I’ll write the ETL code!” is being driven by the wrong entity. It should be clearly understood that this is first and foremost a business project, not a technical one.
Understanding the business need (see the “Why?” bullet from earlier) is the first priority. Next is to understand the types of questions the business will be asking of the data. Coding a solution, even a prototype, is several steps away. Writing code should never be the first step in a data warehouse project.
Treating requirements and deliverables as just checkboxes
Don’t misinterpret my intent here: Understanding and honoring scope and deliverables is essential. However, data warehouse projects have suffered because requirements were treated as a punch list and delivered exactly as requested. Requirements and deliverables guide the tasks undertaken, but that doesn’t mean they can’t be questioned or clarified.
When staff members are empowered to ask fundamental questions (“Why are we doing X when Y might be better?”) rather than just instructed to build a widget, you’ll end up with a more mature and robust data warehouse.
Disconnect between technical staff and stakeholders
Very often, each group represented in a data warehouse project speaks a different language. Technical folks speak in analytical and functional terms. Business analysts speak in behaviors and workflows. Executives understand outcomes and high-level results. Getting these groups on the same page is an essential task in these projects, and is also one of the most difficult things to do.
Managing communications between groups is a constant throughout the data warehouse life cycle. Among the reasons why data warehouse projects fail, this one is a factor in almost every failed initiative. From initial requirements gathering to setting expectations, from deployment to training, those managing the DW project must constantly ensure that each of these groups understands the others. From the outcomes and deliverables to the jargon used, it is critical to ensure that each group is moving toward the same finish line.
Shortening (or even skipping entirely) testing and validation
When time runs short on a data warehouse project, testing and validation are often the victims. An inexperienced project manager or architect might be enticed by the time savings of cutting or eliminating testing and validation. Conversely, someone who has rescued a project that stalled or failed due to inadequate testing knows well that this part of the project is as critical as any other.
There are problems that can only be discovered through proper testing and validation. These take time, but are essential to the success of the data warehouse initiative. Resist the urge to ease scheduling pressure by cutting back on this valuable exercise.
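One kind of problem that only validation tends to surface is data that silently went missing between the source system and the warehouse. As a minimal sketch, assuming a hypothetical source table `src_orders` and warehouse table `dw_fact_orders` (both names, and the `reconcile` helper, are invented for illustration), a reconciliation check might compare row counts and a column total on each side:

```python
import sqlite3


def reconcile(conn, source_table, target_table, amount_column):
    """Compare row count and column total between two tables.

    Table and column names are interpolated directly into the SQL, so this
    sketch assumes they come from trusted configuration, not user input.
    """
    checks = {}
    for label, table in (("source", source_table), ("target", target_table)):
        row_count, total = conn.execute(
            f"SELECT COUNT(*), COALESCE(SUM({amount_column}), 0) FROM {table}"
        ).fetchone()
        checks[label] = (row_count, total)
    return checks["source"] == checks["target"], checks


# Hypothetical in-memory data: the warehouse load dropped one order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_orders (order_id INTEGER, amount INTEGER)")
conn.execute("CREATE TABLE dw_fact_orders (order_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO src_orders VALUES (?, ?)", [(1, 100), (2, 250), (3, 75)]
)
conn.executemany(
    "INSERT INTO dw_fact_orders VALUES (?, ?)", [(1, 100), (2, 250)]
)

matches, details = reconcile(conn, "src_orders", "dw_fact_orders", "amount")
print("tables match:", matches)
print(details)
```

A check like this is cheap to run after every load. It will not catch every defect (two offsetting errors can produce matching totals), so treat it as one layer of validation rather than a substitute for testing with business users.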
Spending too little time on ETL
Designing, building, and testing the extract-transform-load (ETL) logic is the most time-consuming part of every data warehouse project. It is also frequently underestimated during project scheduling. Often the ETL process is viewed as simply a copy operation, wherein data is read from one location and written to another. However, it’s much more complex than that – the “T” part of ETL is easily the most technically difficult and laborious part of the project.
The ETL layer is like the foundation of the house: get it wrong and the rest of the structure will be unstable. Take the time to do it right, following ETL best practices along the way.
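To illustrate why the “T” is more than a copy operation, here is a minimal sketch using made-up order rows. The field names, date formats, and cleansing rules are all assumptions chosen for the example; real transformation rules come from the business and the source systems.

```python
from datetime import datetime

# Hypothetical raw rows as they might arrive from two source systems.
RAW_ROWS = [
    {"customer": " Acme Corp ", "order_date": "2024-01-15", "amount": "1,200.50"},
    {"customer": "acme corp", "order_date": "01/15/2024", "amount": "1200.50"},
    {"customer": "Globex", "order_date": "2024-02-01", "amount": ""},
]

# Assumed set of date formats used by the sources.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Try each known source format until one parses."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def transform(rows):
    """Standardize, validate, and de-duplicate rows before loading."""
    clean, rejected, seen = [], [], set()
    for row in rows:
        amount_text = row["amount"].replace(",", "").strip()
        if not amount_text:
            # Missing amount: set aside for review instead of loading.
            rejected.append(row)
            continue
        record = (
            row["customer"].strip().upper(),
            parse_date(row["order_date"]),
            round(float(amount_text), 2),
        )
        if record in seen:
            # Same order reported by two sources in different formats.
            continue
        seen.add(record)
        clean.append(record)
    return clean, rejected


clean, rejected = transform(RAW_ROWS)
print("loaded:", clean)
print("rejected:", rejected)
```

Even in this toy case, three source rows become one loadable record and one rejected row. A straight copy would have loaded a duplicate and a row with no amount.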
Skipping the training
When deploying a data warehouse, you’re going to move a lot of cheese. You’ll be changing the way business users have interacted with data for years – possibly even decades! While building data warehouses is a lot of work for technical folks like us, learning to use the new data warehouse requires a lot of work as well. Proper training goes a long way to ease this transition.
Invest the time to train essential personnel. Don’t just deliver a truckload of documentation; work with users to make sure they can transition into the new way of accessing data. Train them in terms they understand, using whichever medium (run book, video, in-person training) works for them.
One of the worst potential outcomes of such a project is that nobody uses the new data warehouse. Without proper training, data consumers might just keep doing things the old manual way. If the data warehouse sits unused, does it matter if the project was a technical success?
Using the wrong personnel
Data warehouse projects are unlike any other type of technical project, requiring knowledge of data warehouse architecture and best practices as well as domain-specific knowledge of the data. Simply put, using the wrong team of people is one of the reasons why data warehouse projects fail.
Choose carefully the personnel who will architect, build, and test your data warehouse solution. Whether you use in-house resources or bring in a partner to assist, be sure your team has deep experience with data warehouse projects and understands your organization’s unique data challenges.
A data warehouse project has no end date. Certainly there will be a date on which the solution goes live and resources devoted to its development are scaled back significantly. However, a data warehouse is a living thing, requiring ongoing care and feeding as data and business needs change. Paying too little attention to the ongoing needs of the data warehouse can result in a short-term success but long-term failure of the project.
Although there are myriad reasons why data warehouse projects fail, there are common themes found in many such unsuccessful initiatives. Avoid these pitfalls for a better chance of success!
Good post and it should be a must-read for anyone engaging in an EDW project. In this regard we can all learn something from the Hadoop world and consider adjusting our behaviors where there is a failure history. Specifically,
–EDWs take too much time to show value. And then, the value is often nebulous, as you point out under “big question: why?”. The two biggies here are
–logical modeling: as practitioners we waste too much time doing star schema modeling. And only then can we do…
–ETL: too much time spent writing the same ETL code over-and-over, by expensive contractors who don’t know your data.
–ETL, in general, should be avoided whenever possible. Use external tables, linked servers, semantic tiers, and leverage your existing users that KNOW your data and know SQL.
Now look at how this is done in Hadoop. Very little modeling is done up front and users are given areas of the “data lake” to experiment. They do this with Hive. Now data can be “learned” and value extracted without a formal project. As a dataset becomes invaluable it becomes “certified”, the ELT process is formalized, and the data is naturally moved into the EDW.
Point is…even the Kimball folks are re-evaluating how they’ve preached doing things. We should all consider alternatives and reconsider if our “best practices” are really that great.
An excellent summary of the issues around DW projects.
Agree with all your comments, as a former person who used to design and implement Data Warehouses. I am not sure that I would lump Hadoop in the conversation. I think Hadoop has its uses as a “lake” of unstructured data for use in place or as a staging ground for data.
Having said all that, I realized that a DW is obsolete from the moment the data model is finalized. The whole premise behind relational databases and dimensional modeling is that the business rules are built into the relationships and referential integrity. Business rules change consistently in most organizations, so guess what: your model is obsolete before you start building the DW. This is called “early binding” of data. One of the premises behind Hadoop and unstructured data is that you can use “late binding” of data. This simply means that you don’t “bind” the relationships until you access the data, which gives you a more flexible data structure. I think that is what Dave is getting at. I suggest doing some reading about “Early binding vs. late binding” of data. But, IMO, this is what makes the traditional DW obsolete and is a contributor to failure along with the reasons Tim outlines above.
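To make the late-binding idea concrete, here is a minimal sketch in which raw records are stored untouched and the grouping rule is applied only when the data is read. The event fields, the `totals_by` helper, and the territory mapping are all hypothetical, and this is a simplified illustration rather than how Hadoop or Hive implement it.

```python
import json

# Raw events kept exactly as received; no relationships are bound at load time.
RAW_EVENTS = [
    '{"customer": "Acme", "region": "TX", "amount": 100}',
    '{"customer": "Globex", "region": "OK", "amount": 250}',
    '{"customer": "Initech", "region": "TX", "amount": 75}',
]


def totals_by(events, key_rule):
    """Apply a grouping rule at read time and total the amounts."""
    totals = {}
    for line in events:
        record = json.loads(line)
        key = key_rule(record)
        totals[key] = totals.get(key, 0) + record["amount"]
    return totals


# Original business rule: report by region.
by_region = totals_by(RAW_EVENTS, lambda r: r["region"])

# The rule changes: TX and OK now roll up into a single sales territory.
# Only the read-time rule changes; the stored data is not reloaded.
TERRITORY = {"TX": "South Central", "OK": "South Central"}
by_territory = totals_by(
    RAW_EVENTS, lambda r: TERRITORY.get(r["region"], "Other")
)

print(by_region)
print(by_territory)
```

The tradeoff is that every reader has to apply the rule correctly and consistently, which is part of what an early-bound model enforces for you.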
One additional contributor to failure for any IT project, and DW is no exception, is assuming that tools do what the vendor claims without appropriate testing with your data in representative volumes.