ETL Antipattern: Skipping The Documentation

Documentation is an asset that is both loathed and loved. Creating technical and business documentation is widely seen as a tedious chore: something that ought to be done for every project, but an easy candidate to postpone (or skip entirely).

On the other hand, good documentation – particularly around data movement and ETL processes – is as valuable as the processes it describes. A clear and up-to-date document describing the what, when, where, and why of an ETL workflow adds transparency and makes the process much easier to understand for those who support it.

I’ll admit that the work of documentation is far less enjoyable than actually building the thing you’re documenting. Creating documentation is a little like working on the cleanup crew after a really fun party: nobody really wants to do it, but it needs to be done, and done properly.

But good documentation is essential for any technical project, and that’s especially true for ETL projects. The way data is handled in the extract-transform-load process has a huge impact on the information presented to the consumers of that data. An ETL process that is opaque, or that handles data incorrectly, will only devalue the resulting output.

What does good documentation look like?

In a nutshell, good documentation speaks to the audience for which it is intended. For a technical audience, include topics such as metadata specifications, security and authentication, and detailed error handling. Semi- or non-technical business users will be more interested in a summary of any calculated values as well as a brief narrative of business rules that are applied during processing. Both of these audiences will be interested in which sources of data are being used, any inclusion or exclusion rules for processing, and the cadence on which the load processes run.

It’s common to have separate documentation for technical users and business users. While this requires some extra effort, both audiences will benefit from having good documentation that will address the questions each group may have.
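One way to reduce that extra effort is to keep the facts about a process in a single structured record and generate both views from it. The sketch below is only an illustration of that idea, not a prescribed format: the `EtlProcessDoc` class, its field names, and the sample process `daily_sales_load` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EtlProcessDoc:
    """Hypothetical record describing one ETL process (all field names are illustrative)."""

    # Details both audiences care about
    name: str
    sources: List[str]
    inclusion_rules: List[str]
    schedule: str
    # Details aimed at business users
    calculated_values: Dict[str, str] = field(default_factory=dict)
    business_rules: List[str] = field(default_factory=list)
    # Details aimed at technical users
    authentication: str = ""
    error_handling: str = ""

    def _shared_lines(self) -> List[str]:
        # Sources, inclusion/exclusion rules, and cadence appear in both documents.
        return [
            f"Process: {self.name}",
            "Sources: " + ", ".join(self.sources),
            "Inclusion/exclusion rules: " + "; ".join(self.inclusion_rules),
            f"Schedule: {self.schedule}",
        ]

    def business_summary(self) -> str:
        lines = self._shared_lines()
        lines += [f"Calculated value: {k} = {v}" for k, v in self.calculated_values.items()]
        lines += [f"Business rule: {rule}" for rule in self.business_rules]
        return "\n".join(lines)

    def technical_summary(self) -> str:
        lines = self._shared_lines()
        lines.append(f"Authentication: {self.authentication}")
        lines.append(f"Error handling: {self.error_handling}")
        return "\n".join(lines)


# Sample values are made up for illustration.
doc = EtlProcessDoc(
    name="daily_sales_load",
    sources=["orders_db", "returns_feed"],
    inclusion_rules=["exclude test orders"],
    schedule="daily at 02:00",
    calculated_values={"net_sales": "gross_sales - returns"},
    business_rules=["orders are attributed to the ship date"],
    authentication="service account, read-only",
    error_handling="failed rows are written to a reject table",
)

print(doc.business_summary())
print()
print(doc.technical_summary())
```

Because both summaries draw on the same shared fields, a change to a source or to the schedule only has to be recorded once, which makes it less likely that the two documents drift apart.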

Creating good ETL process documentation takes time and effort, but its value cannot be overstated. Don’t take the shortcut of skimping on the necessary step of documenting your data movement and transformation tasks.

About the Author

Tim Mitchell
Tim Mitchell is a data architect and consultant who specializes in getting rid of data pain points. Need help with data warehousing, ETL, reporting, or training? If so, contact Tim for a no-obligation 30-minute chat.
