Why Data Warehouse Projects Fail

Data warehouse projects are among the most visible and expensive initiatives an organization can undertake. Sadly, they are also among the most likely to fail. At one time, Gartner reported that more than 50% of data warehouses would fail to make it to user acceptance. Because of the size of the investment (both time and money) required, the success of such a project can make or break careers. Therefore, it is important to understand why data warehouse projects fail.
In my years as a data warehouse consultant, I’ve been called in to rescue a few stalled (or even failed) data warehouse projects. Although the postmortem of any two failed DW initiatives will never be identical, I find that there are some common themes in those projects that never make it over the finish line. Understanding why data warehouse projects fail is critical so you can avoid these common mistakes.
Below, I have assembled ten of the most common attributes I have found in unsuccessful data warehouse initiatives.
Not answering the big question: Why?
A surprising number of technical projects, including data warehouse initiatives, are undertaken without a clear vision of why they are needed. Sometimes it’s because the project deliverable is the industry buzzword of the year. Other times, it is just assumed that the organization needs the thing they are building because “everyone else has one.” The answer to the “why?” question is even more important than the answer to “how?”
Data warehouse projects are time-consuming and expensive, and require a great deal of support at every level of the organization. In every such initiative, there is at least one point in the middle of the project where a C-level executive asks, “Remind me again why we’re doing this…?” It is a valid question, and one that should have a codified answer long before the project begins.
Further, the answer to the question of “why?” should be known by everyone involved, not just the CxOs and those signing the checks to pay for it. I’ve seen far too many cases where staff members – especially technical people – are tasked with just doing a thing without understanding how it fits into the big picture. Each participant, from the architect to the business analyst, from the project manager to the QA tester, should understand the high-level goals of the project.
Using the Big Bang approach
A data warehouse is much more than a database and an ETL process feeding it. It is a complex intersection of various business units, dissimilarly shaped data coming in from numerous sources at different paces, and numerous metrics and measurements on top of each of these. In short, a data warehouse is a collection of smaller related projects which will be developed and tested at different times.
The most successful implementations I’ve seen have all involved incremental data warehouse development. Although this approach takes more careful planning and good communication, breaking the project into smaller pieces that can be developed, deployed, and tested independently has a higher success rate than trying to do everything at once (the “Big Bang” approach). Incremental development means your core assets are completed first, allowing any errors or omissions in the design to be corrected with minimal impact.
Although the everything-at-once Big Bang approach can work on very small data warehouse initiatives, this approach doesn’t scale well.
Jumping straight into writing code
The tool you should be using the most during the early days of a data warehouse project is a whiteboard. Any DW project that begins with, “Let’s build some tables!” or “I’ll write the ETL code!” is being driven by the wrong entity. It should be clearly understood that this is first and foremost a business project, not a technical one.
Understanding the business need (see the “Why?” bullet from earlier) is the first priority. Next is to understand the types of questions the business will be asking of the data. Coding a solution, even a prototype, is several steps away. Writing code should never be the first step in a data warehouse project.
Treating requirements and deliverables as just checkboxes
Don’t misinterpret my intent here: understanding and honoring scope and deliverables is essential. However, data warehouse projects have suffered because requirements were treated as a punch list and delivered exactly as requested. Requirements and deliverables guide the tasks undertaken, but that doesn’t mean they can’t be questioned or clarified.
When staff members are empowered to ask fundamental questions (“Why are we doing X when Y might be better?”) rather than just instructed to build a widget, you’ll end up with a more mature and robust data warehouse.
Disconnect between technical staff and stakeholders
Each group represented in a data warehouse project often speaks a very different language. Technical folks speak in analytical and functional terms. Business analysts speak in behaviors and workflows. Executives understand outcomes and high-level results. Getting these groups on the same page is an essential task in these projects, and it is also one of the most difficult things to do.
Managing communications between groups is a constant throughout the data warehouse life cycle. Among the reasons why data warehouse projects fail, this one is a factor in almost every such failed initiative. From initial requirements gathering to setting expectations, from deployment to training, those managing the DW project must constantly ensure that each of these groups understands the others. From the outcomes and deliverables to the jargon used, it is critical to ensure that each group is moving toward the same finish line.
Shortening (or even skipping entirely) testing and validation
When time runs short on a data warehouse project, testing and validation are often the victims. An inexperienced project manager or architect might be enticed by the time savings of cutting or eliminating testing and validation. Conversely, someone who has rescued a project that stalled or failed due to inadequate testing knows well that this part of the project is as critical as any other.
There are problems that can only be discovered through proper testing and validation. These take time, but are essential to the success of the data warehouse initiative. Resist the urge to ease scheduling pressure by cutting back on this valuable exercise.
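As a sketch of what such validation can look like in practice, a post-load reconciliation step might compare row counts, key completeness, and measure totals between a source and the warehouse target. The table and column names below are hypothetical, and SQLite stands in for whatever database you actually use:

```python
import sqlite3

# Hypothetical post-ETL reconciliation checks. Table/column names are
# illustrative only; SQLite is used here so the sketch is self-contained.

def validate_load(conn, source_table, target_table, key_col, measure_col):
    """Return a list of reconciliation issues found after a load (empty = clean)."""
    cur = conn.cursor()
    issues = []

    # 1. Row counts should match (no silently dropped or duplicated rows).
    src_rows = cur.execute(f"SELECT COUNT(*) FROM {source_table}").fetchone()[0]
    tgt_rows = cur.execute(f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]
    if src_rows != tgt_rows:
        issues.append(f"row count mismatch: source={src_rows}, target={tgt_rows}")

    # 2. Business keys must never be NULL in the target.
    null_keys = cur.execute(
        f"SELECT COUNT(*) FROM {target_table} WHERE {key_col} IS NULL"
    ).fetchone()[0]
    if null_keys:
        issues.append(f"{null_keys} rows with NULL {key_col}")

    # 3. Totals of a key measure should reconcile between source and target.
    src_sum = cur.execute(
        f"SELECT COALESCE(SUM({measure_col}), 0) FROM {source_table}"
    ).fetchone()[0]
    tgt_sum = cur.execute(
        f"SELECT COALESCE(SUM({measure_col}), 0) FROM {target_table}"
    ).fetchone()[0]
    if src_sum != tgt_sum:
        issues.append(
            f"{measure_col} total mismatch: source={src_sum}, target={tgt_sum}"
        )

    return issues
```

Checks like these are cheap to automate and run after every load, and they catch exactly the class of silent data errors that manual spot-checking misses.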
Spending too little time on ETL
Designing, building, and testing the extract-transform-load (ETL) logic is the most time-consuming part of every data warehouse project. It is also frequently underestimated during project scheduling. Often the ETL process is viewed as simply a copy operation, wherein data is read from one location and written to another. However, it’s much more complex than that – the “T” part of ETL is easily the most technically difficult and laborious part of the project.
The ETL layer is like the foundation of the house: get it wrong and the rest of the structure will be unstable. Take the time to do it right, following ETL best practices along the way.
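To illustrate why the “T” is the hard part, consider a contrived sketch in which two source systems deliver the same customer data in different shapes, and the transform must conform keys, names, and date formats into a single target schema. All field names below are hypothetical, and real transforms are far messier than this:

```python
from datetime import datetime

# Hypothetical "T" step: two sources, one conformed target schema.
# Field names are illustrative only.

def transform_crm(row):
    # The CRM exports ISO dates and full names.
    return {
        "customer_key": row["id"].strip().upper(),
        "name": row["full_name"].title(),
        "signup_date": datetime.strptime(row["created"], "%Y-%m-%d").date(),
    }

def transform_billing(row):
    # The billing system exports US-style dates and split name fields.
    return {
        "customer_key": row["cust_no"].strip().upper(),
        "name": f'{row["first"]} {row["last"]}'.title(),
        "signup_date": datetime.strptime(row["start_dt"], "%m/%d/%Y").date(),
    }
```

Even this toy version needs one transform per source, and it ignores the realities that make the “T” so laborious: late-arriving rows, duplicate keys, invalid dates, and business rules that differ by source system.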
Skipping the training
When deploying a data warehouse, you’re going to move a lot of cheese. You’ll be changing the way business users have interacted with data for years – possibly even decades! While building a data warehouse is a lot of work for technical folks like us, learning to use the new data warehouse is a significant effort for business users as well. Proper training goes a long way toward easing this transition.
Invest the time to train essential personnel. Don’t just deliver a truckload of documentation; work with users to make sure they can transition into the new way of accessing data. Train them in terms they understand, using whatever medium (run book, video, in-person training) that works for them.
One of the worst potential outcomes of such a project is that nobody uses the new data warehouse. Without proper training, data consumers might just keep doing things the old manual way. If the data warehouse sits unused, does it matter if the project was a technical success?
Using the wrong personnel
Data warehouse projects are unlike any other type of technical project, requiring knowledge of data warehouse architecture and best practices as well as domain-specific knowledge of the data. Simply put, using the wrong team of people is one of the reasons why data warehouse projects fail.
Choose carefully the personnel who will architect, build, and test your data warehouse solution. Whether you use in-house resources or bring in a partner to assist, be sure your team has deep experience with data warehouse projects and understands your organization’s unique data challenges.
Ignoring life after go-live

A data warehouse project has no end date. Certainly there will be a date on which the solution goes live and the resources devoted to its development are scaled back significantly. However, a data warehouse is a living thing, requiring ongoing care and feeding as data and business needs change. Paying too little attention to the ongoing needs of the data warehouse can result in a short-term success but a long-term failure of the project.
Although there are myriad reasons why data warehouse projects fail, there are common themes found in many such unsuccessful initiatives. Avoid these pitfalls for a better chance of success!