This week I’m attending the SQL PASS Summit in Seattle. I’ll be live blogging each of the keynote presentations on Wednesday and Thursday morning. This post will be updated throughout the Day 2 keynote.
It’s the second full day of the PASS Summit and we’re back for another keynote. Today we’ll hear from the always-popular David DeWitt, who announced his retirement last year but seemingly can’t stay away from our SQL Server community. We’re delighted to have him back for another year.
First up, the kilt-wearing Grant Fritchey takes the stage to give a brief financial update. He shares some information about the reach of PASS, including the fact that there were well over 100 SQL Saturdays last year, with a 500+% increase in events in the EMEA region. He briefly announces a BA day coming up in Chicago next year.
Next up, Denise McInerney provides an update on PASS branding. It looks like PASS has a new logo! Also there is a new website in development, set to be released next year. Please, please, please – move it away from DotNetNuke and onto a more modern architecture. She also announces that the PASS Summit is returning to Seattle again next year.
One serious bit of bad news – next year’s PASS Summit starts on Halloween. For some, this won’t be an issue. However, for parents with younger kids this is a dealbreaker. I know the die has already been cast and this is not changeable, but I think the scheduling of the summit to start on Halloween will be a costly mistake.
By far the most anticipated part of the conference – David DeWitt! Always a great presenter, he starts with a few self-deprecating remarks about his retirement and subsequent un-retirement from PASS keynotes.
He starts off by talking about the reasons for moving a data warehouse to the cloud. He reminds us how much this space has evolved in just a few years – not long ago there were really only two options. Now there are many to choose from.
David breaks down the architecture of scalable DW designs in the cloud, including shared-nothing and shared-storage. He reminds us that APS and Azure SQLDW are effectively the same architecture: one on prem, the other in the cloud. He also breaks down table partitioning, in which each table is sharded out to multiple storage nodes. There is more than one way to split out the data, including round-robin and hash partitioning. The former is nondeterministic, so you can’t necessarily control where each row physically resides. The latter shards data based on specified key column(s), but it risks uneven storage across the nodes if the key’s values are not evenly distributed.
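To make the trade-off concrete, here is a minimal Python sketch of the two distribution strategies. It is my own illustration, not anything shown in the keynote: the function names, the `customer_id` column, and the use of CRC32 as the hash are all assumptions, and real engines use their own hash functions and placement logic.

```python
from collections import Counter
from itertools import cycle
import zlib


def round_robin(rows, n_nodes):
    """Hand rows to nodes in turn; placement depends on arrival order, not row content."""
    counts = Counter()
    for node, _row in zip(cycle(range(n_nodes)), rows):
        counts[node] += 1
    return counts


def hash_distribute(rows, n_nodes, key):
    """Place each row on the node chosen by hashing its key column (CRC32 is an assumption)."""
    counts = Counter()
    for row in rows:
        node = zlib.crc32(str(row[key]).encode()) % n_nodes
        counts[node] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical skewed table: 700 of 1,000 rows belong to a single customer.
    rows = [{"customer_id": 1} for _ in range(700)]
    rows += [{"customer_id": 1000 + i} for i in range(300)]
    print("round-robin:", dict(round_robin(rows, 4)))
    print("hash on customer_id:", dict(hash_distribute(rows, 4, "customer_id")))
```

With a skewed key like this, round-robin keeps the nodes balanced, while hashing sends every row for the dominant customer to the same node. The upside of hashing is that rows sharing a key always land together, which is what makes it attractive for joins and aggregations on that key.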
David talks through how the failure of a node impacts availability and performance. With table partitioning in use, the loss of a single node in a shared-nothing architecture keeps the data available, but the surviving node that holds the copy of the failed node’s data takes a serious performance hit, since it now serves that data on top of its own. A better design partitions the alternative copy of the data across multiple nodes, effectively sharding the replica so that the extra load is spread out.
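A rough way to see why sharding the replica helps is to count the work each surviving node picks up after a failure. The sketch below is my own simplification, not David’s model: it assumes every node normally carries one unit of load, that a mirrored replica lives entirely on the next node in the ring, and that a sharded replica is spread evenly over all other nodes. The function name `load_after_failure` is hypothetical.

```python
def load_after_failure(n_nodes, failed, sharded_replica):
    """Return {node: load} for the surviving nodes after one node fails.

    Assumes each node normally serves 1.0 unit of load (a simplification).
    """
    load = {n: 1.0 for n in range(n_nodes) if n != failed}
    if sharded_replica:
        # The failed node's replica is split evenly across every survivor.
        extra = 1.0 / (n_nodes - 1)
        for n in load:
            load[n] += extra
    else:
        # The whole replica sits on one partner node, which absorbs all of it.
        partner = (failed + 1) % n_nodes
        load[partner] += 1.0
    return load


if __name__ == "__main__":
    print("mirrored:", load_after_failure(4, failed=0, sharded_replica=False))
    print("sharded: ", load_after_failure(4, failed=0, sharded_replica=True))
```

In the mirrored case one node ends up doing double duty, and since a parallel query is only as fast as its slowest node, the whole cluster slows down. With the replica sharded, each survivor takes on only a fraction of the failed node’s work.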
For his last segment, David compares some of the architectures of competing products. He starts with Amazon Redshift, which I’ve never used but is on my short list of things to tinker with. I doubt I could do his analysis justice in this blog, but a brief summary is that it uses a shared-nothing approach with replicated tables, with a twist: all of the data is backed up in real time in S3 storage, and the backup can be used as a source of data in the event of a node failure. He then covers Snowflake (a relatively new cloud DW provider), which uses multiple files to store data in a nondeterministic way. Data is stored in S3 and is “aggressively cached but not permanently stored” in the EC2 instances. Finally, he talks Azure SQL DW. We are reminded that storage and compute can be scaled up or down separately, allowing you to pay for what you use (which Redshift does not allow). SQLDW does not scale in terms of the number of nodes, instead using the data warehouse unit (DWU) as an abstracted way to dial performance up or down. He summarizes by opining that SQLDW is the best solution on the market for scalable cloud data warehouse architectures.