Tim Mitchell
Follow Tim Mitchell on Twitter  Like Tim Mitchell on Facebook  Subscribe on YouTube  Connect on LinkedIn  Subscribe to the Data Geek Newsletter

Using Temporal Tables for Slowly Changing Dimensions

Using temporal tables for slowly changing dimensionsOne of my favorite recent additions to SQL Server is the ability to use temporal tables to retain change history. As I wrote in an earlier post on this topic, temporal tables allow one to easily configure a table keep all of the changes (including updated and deleted rows) in a designated history table. In this post, I will share a perfect use case for this feature: using temporal tables for slowly changing dimensions.

Slowly changing dimensions revisited

The term slowly changing dimension (SCD) will be very familiar to those who deal in the data warehouse trade. For those who do not work with SCDs, here’s a quick summary: A slowly changing dimension is a table of attributes that will change over time through the normal course of business.

A common example of a slowly changing dimension would be a Person table. When storing information about a person, you’ll often store attributes such as name, address, marital status, date of birth, and social security number. Depending on the domain in which you’re working, you may even store extended information such as hair color, eye color, homeownership status, and income level.

Some of these attributes would certainly change over time. Since the average person moves 11.4 times in their life, you can expect that the address component of your Person table will need to be updated. The same applies to name attributes, since people often change their names after marriage or other life events. By definition, the load process for a slowly changing dimension table would update those attributes when they change.

Slowly changing dimensions and history

So what does this have to do with tracking history? A lot, actually. For some of these attributes, your business processes won’t be impacted if they change. You are not likely to have a need to track a person’s hair color history. Same thing if someone’s date of birth changes: an update to this immutable value is most likely a data entry error, or a correction to an earlier error. These values where the entire change history does not need to be kept are referred to as Type 1 SCD values.

However, some changing attributes can have a significant impact on your business. If a person moves to a new address, by default all of their transaction history (sales transactions, healthcare encounters, insurance claims, etc.) will show up under their current address, not necessarily the address where they lived at the time. Imagine that your biggest customer relocates from one sales region to another. As soon as that address is updated, you’re going to see a large chunk of sales activity move from one region to another, even though the transactions took place prior to the move. To account for this, you would build a Type 2 SCD, which retains the change history for those cornerstone attributes. Each row has a start and end date, and any change requires expiring the current row and replacing it with the newest information for that person.

Challenges with Type 2 SCDs

Using a Type 2 SCD to track changing attributes is quite versatile. However, this design also presents some challenges:

  • Every process that updates these tables must honor the Type 2 SCD pattern of expiring the old row and replacing it with a new one
  • Every query against that table must account for the historical SCD pattern by filtering only for current data or for a specific point in time
  • There is no built-in constraint to prevent overlapping start and end dates for a given business key
  • If you convert an existing table to a Type 2 SCD, you will most likely have to touch every single query that reads from or writes to that table

In short, a Type 2 SCD is not a set-it-and-forget-it mechanism, and changing an existing table to a Type 2 SCD is going to be a huge pain.

Using temporal tables for slowly changing dimensions

This is the part of the story where the hero (the temporal table) swoops in to save the day. Everything described above with respect to Type 2 SCDs can be done using a temporal table while eliminating each of the challenges I listed.

Temporal tables will automatically store the insert, update, and delete history of each row. This happens without any explicit expire-and-replace logic in the ETL process. Data manipulation (insert, update, and delete) looks just like it does in a conventional table. Processes that load or update data do not need to be modified to accommodate the temporal table; all of the history retention logic takes place in the database engine, thus simplifying the ETL processes that touch the table.

Querying the table is also greatly simplified. When selecting from a temporal table, the current state of a given row will be returned by default, even if there are multiple historical versions of the same row. Remember, with a conventional Type 2 SCD table, you have to specify the date period in the query, lest you get every single historical version of that row. If you do need something other than the current version, using the FOR SYSTEM TIME AS OF {@DATETIME2} qualifier will allow you to see the point-in-time version of that table.

Because the row effective dates are managed by the database engine, you don’t need to worry that an erroneous ETL process will introduce date overlap issues. The built-in start and end DATETIME2 values are read-only, reducing the possibility of ETL logic errors in these fields.

Even better, you can forego the “Is this a Type 1 or Type 2 SCD?” debate when using temporal tables. Technically every change in a temporal table is stored like a Type 2, but for queries on attributes where history is not important, a simple SELECT against the table (without the FOR SYSTEM TIME qualifier) works just like a Type 1 SCD and returns just the current version of that row. You get Type 1 simplicity with Type 2 flexibility.

Implementing a temporal table as a slowly changing dimension table

To use a temporal table as a slowly changing dimension table, do this: Create the temporal table. That’s it. No special coding in the ETL process for expiring and replacing prior versions. Use simple INSERT, UPDATE, and DELETE statements just like you would on any non-versioned table. Simple SELECT statements let you retrieve the data as it is today, and the optional FOR SYSTEM TIME qualifier lets you look back at a given point in time.

I’m not one for flippantly offering predictions, but here’s an easy one: Temporal tables will become the standard for slowly changing dimension tables. There are numerous upsides to using temporal tables for slowing changing dimensions, and very few if any downsides.

Conclusion

Among the many use cases for SQL Server temporal tables, slowly changing dimensions are at the top of the list. Using temporal tables for slowing changing dimensions simplifies the data load process by eliminating the explicit expiration and replacement of updated data. It also eliminates the date filtering required of every SELECT query against that table. Temporal tables also reduce the possibility of having to rework the ETL and data retrieval queries when moving from a Type 1 to a Type 2 SCD. If you are building a data warehouse using SQL Server 2016 or newer, I encourage you to consider using temporal tables for slowly changing dimension tables.

About the Author

Tim Mitchell
Tim Mitchell is a business intelligence and SSIS consultant who specializes in getting rid of data pain points. Need help with data warehousing, ETL, reporting, or SSIS training? Contact Tim here: TimMitchell.net/contact

Be the first to comment on "Using Temporal Tables for Slowly Changing Dimensions"

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: