Using Temporal Tables for Slowly Changing Dimensions

Using temporal tables for slowly changing dimensionsOne of my favorite recent additions to SQL Server is the ability to use temporal tables to retain change history. As I wrote in an earlier post on this topic, temporal tables allow one to easily configure a table keep all of the changes (including updated and deleted rows) in a designated history table. In this post, I will share a perfect use case for this feature: using temporal tables for slowly changing dimensions.

Slowly changing dimensions revisited

The term slowly changing dimension (SCD) will be very familiar to those who deal in the data warehouse trade. For those who do not work with SCDs, here’s a quick summary: A slowly changing dimension is a table of attributes that will change over time through the normal course of business.

A common example of a slowly changing dimension would be a Person table. When storing information about a person, you’ll often store attributes such as name, address, marital status, date of birth, and social security number. Depending on the domain in which you’re working, you may even store extended information such as hair color, eye color, homeownership status, and income level.

Some of these attributes would certainly change over time. Since the average person moves 11.4 times in their life, you can expect that the address component of your Person table will need to be updated. The same applies to name attributes, since people often change their names after marriage or other life events. By definition, the load process for a slowly changing dimension table would update those attributes when they change.

Slowly changing dimensions and history

So what does this have to do with tracking history? A lot, actually. For some of these attributes, your business processes won’t be impacted if they change. You are not likely to have a need to track a person’s hair color history. Same thing if someone’s date of birth changes: an update to this immutable value is most likely a data entry error, or a correction to an earlier error. These values where the entire change history does not need to be kept are referred to as Type 1 SCD values.

However, some changing attributes can have a significant impact on your business. If a person moves to a new address, by default all of their transaction history (sales transactions, healthcare encounters, insurance claims, etc.) will show up under their current address, not necessarily the address where they lived at the time. Imagine that your biggest customer relocates from one sales region to another. As soon as that address is updated, you’re going to see a large chunk of sales activity move from one region to another, even though the transactions took place prior to the move. To account for this, you would build a Type 2 SCD, which retains the change history for those cornerstone attributes. Each row has a start and end date, and any change requires expiring the current row and replacing it with the newest information for that person.

Challenges with Type 2 SCDs

Using a Type 2 SCD to track changing attributes is quite versatile. However, this design also presents some challenges:

  • Every process that updates these tables must honor the Type 2 SCD pattern of expiring the old row and replacing it with a new one
  • Every query against that table must account for the historical SCD pattern by filtering only for current data or for a specific point in time
  • There is no built-in constraint to prevent overlapping start and end dates for a given business key
  • If you convert an existing table to a Type 2 SCD, you will most likely have to touch every single query that reads from or writes to that table

In short, a Type 2 SCD is not a set-it-and-forget-it mechanism, and changing an existing table to a Type 2 SCD is going to be a huge pain.

Using temporal tables for slowly changing dimensions

This is the part of the story where the hero (the temporal table) swoops in to save the day. Everything described above with respect to Type 2 SCDs can be done using a temporal table while eliminating each of the challenges I listed.

Temporal tables will automatically store the insert, update, and delete history of each row. This happens without any explicit expire-and-replace logic in the ETL process. Data manipulation (insert, update, and delete) looks just like it does in a conventional table. Processes that load or update data do not need to be modified to accommodate the temporal table; all of the history retention logic takes place in the database engine, thus simplifying the ETL processes that touch the table.

Querying the table is also greatly simplified. When selecting from a temporal table, the current state of a given row will be returned by default, even if there are multiple historical versions of the same row. Remember, with a conventional Type 2 SCD table, you have to specify the date period in the query, lest you get every single historical version of that row. If you do need something other than the current version, using the FOR SYSTEM TIME AS OF {@DATETIME2} qualifier will allow you to see the point-in-time version of that table.

Because the row effective dates are managed by the database engine, you don’t need to worry that an erroneous ETL process will introduce date overlap issues. The built-in start and end DATETIME2 values are read-only, reducing the possibility of ETL logic errors in these fields.

Even better, you can forego the “Is this a Type 1 or Type 2 SCD?” debate when using temporal tables. Technically every change in a temporal table is stored like a Type 2, but for queries on attributes where history is not important, a simple SELECT against the table (without the FOR SYSTEM TIME qualifier) works just like a Type 1 SCD and returns just the current version of that row. You get Type 1 simplicity with Type 2 flexibility.

Implementing a temporal table as a slowly changing dimension table

To use a temporal table as a slowly changing dimension table, do this: Create the temporal table. That’s it. No special coding in the ETL process for expiring and replacing prior versions. Use simple INSERT, UPDATE, and DELETE statements just like you would on any non-versioned table. Simple SELECT statements let you retrieve the data as it is today, and the optional FOR SYSTEM TIME qualifier lets you look back at a given point in time.

I’m not one for flippantly offering predictions, but here’s an easy one: Temporal tables will become the standard for slowly changing dimension tables. There are numerous upsides to using temporal tables for slowing changing dimensions, and very few if any downsides.

Conclusion

Among the many use cases for SQL Server temporal tables, slowly changing dimensions are at the top of the list. Using temporal tables for slowing changing dimensions simplifies the data load process by eliminating the explicit expiration and replacement of updated data. It also eliminates the date filtering required of every SELECT query against that table. Temporal tables also reduce the possibility of having to rework the ETL and data retrieval queries when moving from a Type 1 to a Type 2 SCD. If you are building a data warehouse using SQL Server 2016 or newer, I encourage you to consider using temporal tables for slowly changing dimension tables.

About the Author

Tim Mitchell
Tim Mitchell is a data architect and consultant who specializes in getting rid of data pain points. Need help with data warehousing, ETL, reporting, or training? If so, contact Tim for a no-obligation 30-minute chat.

7 Comments on "Using Temporal Tables for Slowly Changing Dimensions"

  1. Humayoon Khan | April 30, 2019 at 3:16 pm | Reply

    Thanks for such a nice article about Temporal tables.
    I am currently implementing a SCD 2 scenario in a project. The problem that I am facing is that time period that system assigns to a row for which the row is valid doesn’t mean much in the business case.

    The table, for example is Health Insurance Plan. This dimension has validity period, let’s say from Jan 1 thru Dec 31. We get claims data that goes into the Fact table needs to have the key from “Health Insurance Plan” dimension based on the date of claim falling between the validity period of the specific Health Insurance Plan. Although there’re attributes that tell the validity of the plan – Eff Date & Term Date, but using Temporal table only these columns won’t be updated appropriately there’s to be a mechanism to update them.

    The System time columns are values from the time the row was inserted and updated.

    Any hint how to design these tables.

    Thanks

  2. Interesting read however your prediction about Temporal Tables replacing SCD’s confuses me. In a DW environment SCD’s exist to report on the DIM at that point in time the FACT is recorded.

    So if sale A occurs to sales person X on team Y. Person X changes to team Y and sale B occurs a few days later – how would one report on the sale A attributed to team Y?

    Would you have to join the FACT created/salesdate to the DIM SystemTime along with the DIM key?

    • Jeff, that’s a valid concern. The approach for using temporal tables as SCDs would look a bit different than a conventional SCD table, since you’d need to join the dimension table based on the date the fact record occurred rather than a dimension key. For very large dimension tables, I’d want to test performance on such a setup to make sure there are no surprises down the road.

  3. Thanks for the great information.
    Question re using Temporal Tables for Incremental Load for DW:
    As far as I can see, Temporal Tables would work for Incremental Load. Are they better option than Change Tracking? Any thoughts?

    Thanks,

    Alex B.

    • Alex, I don’t see one as being better than the other; they serve different purposes. Temporal tables would be useful in an incremental load, but if you don’t need the full change history for a table, change tracking would be a better option.

  4. Oh so where to start….I SO love temporal tables for many uses. I could provide a giant list of why they are great for things like logging and persistent staging areas. But type 2 dimensions is not one of their great uses in my opinion.

    And this is why….here are just a few significant issues with type 2 temporal….
    1) Temporal dates are UTC. Really? Then I sure hope all your dates are UTC in your EDW.
    2) Forcing date compare for every fact join? Holy cow dude….you realize that won’t scale right? And why would you burden your BI folks with that when a simple join on dimension id with traditional type 2 solves that problem? Never sacrifice your customer to save yourself some work.
    3) Then there’s the overlapping temporal times….thanks a lot Microsoft. Type 2 dates should never ever overlap for obvious reasons. The whole point is type 2 rows are mutually exclusive ranges of time but temporal overlaps them. I can deal with the overlaps with ETL reading temporal but would not want to impose that complexity on my customers.
    4) Finally….temporal creates new version for any change. You mentioned that but not the potential super big problem that is. I’ve often wanted some attributes as type 1 when business doesn’t need to see their changes such as corrections over time. Which in certain cases could cause massive volume increases. You may not want this on day one but you’re locked into versions for every attribute change with temporal. Once you go down that road there’s no going back. I’ve witnessed a datamart completely fail due to to many type 2 attributes in a dimension.

    Bottom line is…I agree slapping temporal on your dimension is super super easy. But I am going to forecast that sooner or later you will curse that decision if you’re the one who has to support it.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.