
DQS Composite Domains and Value Combinations

As I’ve been exploring Data Quality Services over the past few weeks, I’ve spent a lot of time working with data domains, composite domains, and rules.  In that exploration, I’ve found some behavior that might not be expected when performing cleansing operations against a knowledge base containing a composite domain.

In this post, I’ll outline the expected data cleansing behavior for composite domain value combinations, and will show how the actual result is not what one would expect in this case.  I’ll also briefly describe a couple of workarounds to address this issue.

Overview

Here’s the layout of the issue at hand.  Composite domains can be created in a DQS knowledge base, and encompass two or more existing organic domains within that same knowledge base.  Those composite domains can then be leveraged in a data cleansing project; if you engage all of the domains that are part of a composite, that composite domain will automatically be included as part of the cleansing operation.  Now from here a reasonable person (and by “a reasonable person,” I mean me) could assume that if the composite domain is used as part of the cleansing operation, it would perform the cleansing across the product of the composite domain rather than just the individual domains therein.  However, my experimentation has found otherwise.

Make sense? Don’t worry – if I lost you in the problem description, I think a simple example should bring it back into focus.

Example

I’ve been using automotive data for a lot of my recent DQS samples, so we’ll stick with that for now.  I’ve got a reference set of data with (mostly) valid automobile data that I’m using to build a DQS knowledge base through the knowledge discovery activity.  Included in the reference data are automobiles of various makes and models, among them the Chevrolet Camaro and several flavors of Ford automobile (we’ll get back to these specifics in a second).  When I import this information through knowledge discovery, it renders both Ford and Chevrolet as valid automobile makes, and the Camaro is present as a valid model of automobile.

[Screenshot: knowledge discovery results showing valid makes and models]

Now, I want to create an association between make and model, since model is mostly dependent on make.  I create a new composite domain in my knowledge base, and use the combination of Make and Model domains to build this new composite domain.

[Screenshot: creating the Make and Model composite domain]

With that done, I’ll republish the knowledge base, and we’re good to go.  Next, I’ll create a DQS cleansing project that will leverage the knowledge base we’ve built with this automobile data.  I’m going to use a smaller and dirtier set of data to run through the cleansing process.  This data will also serve as a counterexample to the expected behavior of the composite domain.

When I wire up the table containing the dirty data to the new cleansing project, I get the option of including the composite domain since I’m leveraging both of the elements of that composite domain against the data to be cleansed.  By clicking the View/Select Composite Domain button I can see that the Make and Model composite domain is used by default.

[Screenshot: the View/Select Composite Domain dialog]

Before I run the cleansing operation on this DQS project, let’s peek at the data we’ll be cleansing in this new project:

[Screenshot: the dirty source data to be cleansed]

You’ll see that I called out a particular entry, and it’s probably clear why I referenced the Camaro earlier.  In our dirty data we have a Ford (valid make) Camaro (valid model), but there’s no such thing as a Ford Camaro in production or in our knowledge base.  When the make and model domains are individually verified, this record would be expected to go through the cleansing process with no changes at all.  However, because we’ve got a composite domain set up to group together the make and model, I expect this to fall out as a new entry (rather than a match to something existing in the knowledge base) since our KB does not have the Make and Model combination of Ford Camaro.
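To make that expectation concrete, here’s a minimal T-SQL sketch of the combination-level check I assumed the composite domain would perform.  The table and column names (DirtyAutomobiles, ReferenceAutomobiles) are hypothetical stand-ins for the incoming data and the knowledge base values – DQS doesn’t expose its knowledge base this way, so treat this purely as an illustration of the expected logic.

    -- Hypothetical illustration: validate the Make/Model *combination*,
    -- not each column independently.
    SELECT d.Make,
           d.Model,
           CASE WHEN r.Make IS NULL
                THEN 'New'        -- combination not found in the knowledge base
                ELSE 'Correct'    -- combination exists in the knowledge base
           END AS ExpectedCompositeResult
    FROM dbo.DirtyAutomobiles AS d
    LEFT JOIN dbo.ReferenceAutomobiles AS r
           ON  r.Make  = d.Make
           AND r.Model = d.Model;

Under this logic, the Ford Camaro row falls out as New, because no Make and Model pair in the reference data matches it.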

However, when I run the cleansing operation and review the results, what I find is not what I expected:

[Screenshot: cleansing results for the Make and Model composite domain]

Under the Make and Model composite domain results (notice the individual Make and Model domains are not present, since we’ve engaged the composite domain), I find that the incorrect Ford Camaro entry is shown, but instead of appearing on the New tab, it surfaces on the Correct tab, indicating that the value is already present in the knowledge base.  Given that the displayed reason is a “Domain value” match, it appears that, despite the use of the composite domain, the individual domains are being used to align the cleansed data with the information in the knowledge base.

Workarounds?

Ideally, what we’d see is the Ford Camaro entry pushed to the New tab since there is no such combination in the KB.  However, there are a few limited options to work around this.

First, you could create a separate field containing the entire make and model combination in your source data, and perform the Make + Model validation against the single field.  This is probably the most realistic workaround as it doesn’t require a lot of static rules.  However, it still means that you will likely need to reengineer the way you stage the data.  It’s a generally accepted practice to store data elements as atomic units, and building a Make + Model field limits your options or forces you to undo that same operation later in the ETL.
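As a rough sketch of this workaround, you could expose the combination as a single derived column in your staging table and map that one column to a combined domain in the cleansing project.  The names below are illustrative only, and assume non-null nvarchar Make and Model columns:

    -- Hypothetical staging change: expose Make + Model as one field so the
    -- combination can be validated as a single domain value.
    ALTER TABLE stg.Automobiles
        ADD MakeModel AS (Make + N' ' + Model) PERSISTED;

    SELECT Make, Model, MakeModel
    FROM stg.Automobiles;

Downstream, you’d either carry the original atomic columns alongside the combined one or split the value back apart – which is exactly the extra ETL work mentioned above.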

You also have the option to create rules against your composite domains to set if/then scenarios for data validation.  For example, you could create a rule that dictates that if the car is a Camaro, the make must be Chevrolet.  However, unless the cardinality of your data is very, very low, don’t do this.  Creating static rules to deal with data semantics is like pulling at a loose thread on a sweater: you’ll never find the end of it, and it’ll just make a mess in the meantime.
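For illustration only, here’s roughly what that kind of static rule looks like when spelled out in T-SQL against a hypothetical table – picture one branch per model and the loose-thread problem becomes obvious:

    -- Hypothetical example of the static if/then approach (not recommended):
    SELECT Make, Model,
           CASE WHEN Model = N'Camaro' AND Make <> N'Chevrolet'
                THEN N'Invalid: a Camaro must be a Chevrolet'
                -- ...and another branch for every other model you care about...
                ELSE N'Valid'
           END AS RuleResult
    FROM dbo.Automobiles;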

Resolution

I’d like to see this behavior fixed, as I think it will lead to confusion and a lot of extra work on the part of data quality and ETL professionals.  I’ve created a Connect bug report to address this behavior, and I’m hopeful that we’ll see a change in this behavior in a future update or service pack.  Feel free to add your vote or comments to the Connect item if you think the change I describe would be useful.

Conclusion

In this post, I’ve highlighted the unexpected behavior of composite domains in data cleansing operations, along with a few workarounds to help you get past this issue.  As always, comments and alternative suggestions are welcome!

DQS Validation Rules on Composite Domains

In Data Quality Services, composite domains can be created to associate two or more natural domains within a knowledge base.  Like natural domains, composite domains can also contain one or more validation rules to govern which domain values are valid.  In my last post, I discussed the use of validation rules against natural domains.  In this post, I’ll continue the thread by covering the essentials of composite domain rules and demonstrating how these can be used to create relationships between data domains.

What is a composite domain?

Before we dig into the relationships between the member domains of a composite domain, let’s first touch on the essentials of composite domains themselves.

Simply put, a composite domain is a wrapper for two or more organic domains in a knowledge base.  Think of a composite domain as a virtual collection of dissimilar yet related properties.  As best I can tell, the composite domain is not materialized in the DQS knowledge base, but is simply a meta wrapper pointing to the underlying values.

To demonstrate, I’ve created a knowledge base using a list of automobile makes and models, along with a few other properties (car type and seating capacity).  I should be able to derive a loose association between automobile type and seating capacity, so I’ll create a composite domain with those two domains as shown below.

[Screenshot: creating the Type and Capacity composite domain]

As shown above, creating a composite domain requires nothing more than selecting two or more domains from an existing knowledge base.  After the composite domain has been created, your configuration options are generally limited to attaching the composite domain to a reference data provider (which I’ll cover in a future post) and adding composite domain rules.

Value association via composite domain rules

The most straightforward way to associate the values of a composite domain is to create one or more rules against it.  Rules created against a composite domain let you declare if/then scenarios that describe the allowable combinations of values therein.

Back in the day, before marriage, kids, and a mortgage, I used to drive sports cars.  Even though that was a long time ago, I do remember a few things about that type of automobile: they are fast, expensive to insure, and don’t have a lot of passenger capacity.  It’s on that last point that we’ll focus our data quality efforts for now.  I want to make sure that some sneaky manufacturer doesn’t falsely identify some big and roomy 4-door sedan as a sports car.  Therefore, I’m going to create a rule that will restrict the valid domain values for seating capacity for sports cars.

I’ll start with some business assumptions.  What’s the minimum number of seats a sports car should have?  I think it’s probably 2, but I suppose if some enterprising gearhead decided to convert an Indy Car into a street-legal machine, it would likely be classified as a sports car too.  Therefore, it would be reasonable to assume that, in an edge case, a sports car could have just a single seat, so our minimum seating capacity for a sports car would be 1.  On the high side, the design of sports cars should dictate that there aren’t many seats.  For example, the Chevrolet Camaro I had in high school could seat 4 people, assuming that 2 of the people were small children with stunted growth who had no claustrophobic tendencies.  However, we can give a little on this issue and assume that they somehow manage to squeeze a third row of seats into a Dodge Magnum, so we’ll say that a sports car can have a maximum seating capacity of 6 people.

Now, with that information in hand, I’m going to use the Domain Management component of the DQS client to set up the new rule against the “Type and Capacity” composite domain from above.  As shown below, I can set value-specific constraints on the seating capacity based on the automobile type of Sports Car.

[Screenshot: composite domain rule restricting seating capacity for the Sports Car type]

As shown, any valid record with a car type of Sports Car must have a seating capacity of between 1 and 6 persons.
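If you wanted to spot-check source data outside of DQS with the same logic, a rough T-SQL equivalent of this composite domain rule might look like the following (table and column names are hypothetical):

    -- Hypothetical restatement of the "Type and Capacity" rule:
    -- a Sports Car must seat between 1 and 6 people.
    SELECT Make, Model, CarType, SeatingCapacity,
           CASE WHEN CarType = N'Sports Car'
                 AND SeatingCapacity NOT BETWEEN 1 AND 6
                THEN N'Invalid: sports car seating capacity out of range'
                ELSE N'Valid'
           END AS TypeAndCapacityResult
    FROM dbo.Automobiles;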

Of course, sports cars aren’t the only types of automobiles (gasp!), so this approach would likely involve multiple rules.  Fortunately, composite domains allow for many such rules, which would permit the creation of additional restrictions for other automobile types.  You could also expand the Sports Car rule and add more values on the left side of the operator (the if side of the equation).  For example, you might call this instead a “Small Car rule” and include both sports cars and compact cars in this seating capacity restriction.

Other uses

Although we’ve limited our exploration to simply interrogating the value of the natural domains within a composite domain, this is by no means our only option for validation.  For example, when dealing with string data you can inspect the length of the string, search for patterns, use regular expressions, and test for an empty string in addition to checking against the actual value.  Shown below are some of the options you can use to query against a string value in a domain rule.

[Screenshot: string condition options available in domain rules]

When dealing with date or numerical data, you have the expected comparison operators including less than, greater than, less than or equal to, etc.

Conclusion

This post has briefly explored composite domains and shown how to add validation rules to a composite domain in an existing knowledge base.  In my next DQS post, I’ll continue with composite domains to illustrate a potential misunderstanding in the way composite domains treat value combinations in cleansing operations.

DQS Domain Validation Rules

A compelling feature of the new Data Quality Services in SQL Server 2012 is the ability to apply rules to fields (domains) to describe what makes up a valid value.  In this brief post, I’d like to walk through the concepts of domain validation and demonstrate how this can be implemented in DQS.

Domain validation essentials

Let’s ponder domain validation by way of a concrete example.  Consider the concept of age: it’s typically expressed in discrete, non-negative whole numbers.  However, the expected values of the ages of things will vary greatly depending on the context.  An age of 10 years seems reasonable for a building, but sounds ridiculous when describing fossilized remains.  A date of “1/1/1950” is a valid date and would be appropriate for classifying a person’s date of birth, but would be out of context if describing when a server was last restarted.  In a nutshell, the purpose of domain validation is to allow context-specific rules to provide reasonableness checks on the data.

A typical first step in data validation would involve answering the following questions:

  • Is the data of the right type?  This helps us to eliminate values such as the number “purple” and the date “3.14159”.
  • Does the data have the right precision? This is similar to the point above: If I’m expecting to store the cost of goods at a retail store, I’m probably not going to configure the downstream elements to store a value of $100 million for a single item.
  • Is the data present where required?  When expressing address data, the first line of an address might be required while a second line could be optional.

Domain validation goes one step further by answering the question, “Is a given value valid when used in this context?”  It takes otherwise valid data and validates it to be sure it fits the scenario in play.
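To put those questions in more familiar terms, here’s a small, illustration-only T-SQL query against a hypothetical staging table that touches on type, precision, presence, and context in one pass (TRY_CONVERT is available in SQL Server 2012 and later):

    -- Hypothetical staging checks: right type, right precision,
    -- present where required, and reasonable in context.
    SELECT *
    FROM stg.People
    WHERE TRY_CONVERT(date, BirthDateText) IS NULL          -- is it the right type?
       OR LEN(PostalCode) > 10                              -- does it fit the expected precision?
       OR AddressLine1 IS NULL                              -- is required data present?
       OR TRY_CONVERT(date, BirthDateText) > GETDATE();     -- is it reasonable in this context?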

Domain validation in DQS

Even if you don’t use this term to describe it, you’re probably already doing some sort of domain validation as part of your ETL or data maintenance routines.  Every well-designed ETL system has some measure of sanity checking to make sure data fits semantically as well as technically.

The downside to many of these domain validation scenarios is that they can be inconsistent and are usually decentralized.  Perhaps they are implemented at the outer layer of the ETL before data is passed downstream.  Maybe the rules are applied as stored procedures after they are loaded, or even as (yikes!) triggers on the destination tables.

Data Quality Services seeks to remedy the inconsistency and decentralization issue, as well as make the process easier, by way of domain validation rules.  When creating a domain in DQS, you are presented with the option of creating domain rules that govern what constitutes a valid value for that domain.  For the example below, I’m using data for automobile makes and models, and am implementing a domain rule to constrain the value for the number of doors for a given model.

[Screenshot: creating the domain rule for number of doors]

With the rule created, I can apply one or more conditions to each of the rules.  As shown, I am going to constrain the valid values to lie between 1 and 9 inclusive, which should account for the smallest and largest automobile types (such as limousines and buses).

[Screenshot: rule conditions constraining the number of doors]

For this rule, I’m setting the conditions that the value must be greater than zero and less than ten.  Note that there is no requirement to use this bookend qualification process; you can specify a single qualifier (for example, greater than zero) or have multiple conditions strung together in the same rule.  You can even change the AND qualifier to an OR if the rule should be met if either condition is true – though I would caution you when mixing 3 or more conditions using both AND and OR, as the behavior may not yield what you might expect.
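Expressed outside of DQS, the rule amounts to the predicate below (table name hypothetical).  Note how explicit parentheses make the intent unambiguous – which is exactly what gets murky when you start mixing AND and OR in a single DQS rule:

    -- Hypothetical restatement of the door-count rule: both conditions must hold.
    SELECT Make, Model, NumberOfDoors,
           CASE WHEN (NumberOfDoors > 0 AND NumberOfDoors < 10)
                THEN N'Valid'
                ELSE N'Invalid: Door count rule'
           END AS DoorRuleResult
    FROM dbo.Automobiles;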

That’s all there is to creating a simple domain validation rule.  Remember that for the condition qualifiers, you can set greater than, less than, greater than/equal to, etc., for the inclusion rule when dealing with numerical or date domain data types.  For string data types, the number of options is even greater, as shown below:

[Screenshot: condition options for string domain data types]

Of particular interest here is that you can leverage regular expressions and patterns to look for partial or pattern matches within the string field.  You can also check the string value to see if it can be converted to numeric or date/time.
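As a loose analog in T-SQL (SQL Server 2012 and later), you can approximate some of those string checks yourself against a hypothetical column.  Keep in mind that DQS uses .NET regular expressions, whereas LIKE below is only a rough pattern-matching stand-in:

    -- Illustration only: string checks similar in spirit to the DQS options above.
    SELECT DomainValue,
           CASE WHEN TRY_CONVERT(int, DomainValue) IS NOT NULL      THEN N'Converts to numeric'
                WHEN TRY_CONVERT(datetime, DomainValue) IS NOT NULL THEN N'Converts to date/time'
                WHEN DomainValue LIKE N'[A-Z][A-Z]-[0-9][0-9]%'     THEN N'Matches a simple pattern'
                ELSE N'No match'
           END AS StringCheck
    FROM dbo.StagingValues;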

The rule in action

With the new domain validation rule in place, let’s run some test data through it.  I’m going to create a few test records, some of which violate the rule we just created, and run them through a new DQS project using the knowledge base we modified with this rule.

I’ll start off with the dirty data as shown below.  You can probably infer that we’ve got a few rows that do not comply with the rule we created, on both ends of the value scale:

[Screenshot: test data with out-of-range door counts]

After creating a new data cleansing project, I use the data shown above to test the rule constraining the number of doors.  As shown below in the New output tab, we have several rows that comply with this new rule:

[Screenshot: the New output tab showing rows that comply with the rule]

In addition, there are two distinct values found that do not meet the criteria specified in the new rule.  Selecting the Invalid tab, I see the values 0 and 12 have failed validation, as they fall outside the range specified by the rule.  In the Reason column, you can see that we get feedback indicating that our new rule is the reason that these records are marked as Invalid:

[Screenshot: the Invalid tab showing the values 0 and 12]

So by implementing this rule against my data, I am able to validate not only that the value is present and of the correct type, but that it is reasonable for this scenario.

Conclusion

In this post we’ve reviewed the essentials of domain validation and how we can implement these checks through domain rules in SQL Server Data Quality Services.  In my next post, I’ll continue the discussion around domain rules by reviewing how these rules can be applied to composite domains in DQS.

SQL PASS 2014 Summit Diary – Day 6

Today is the last official day of the PASS Summit.  The sessions will wrap up at the end of the day, and we’ll all go our separate ways and resume our semi-normal lives.  Having delivered my presentation yesterday, my official PASS duties are over, and I’m planning to spend the day taking in a few sessions and networking.

08:15am: No keynote today, so the sessions are starting first thing in the morning.  I’m sitting in on a Power BI session delivered by my friend Adam Saxton.  He’s an excellent and knowledgeable presenter, and I always enjoy attending his presentations.  Power BI is one piece of the Microsoft BI stack that I have largely ignored, mostly because it runs exclusively in the cloud.  However, I’d like to get up to speed on the cloud BI offerings – even though the on-premises solutions will continue to represent the overwhelming majority of business intelligence initiatives (in terms of data volume as well as Microsoft revenue), I expect to be fluent in all of the Microsoft BI offerings, whether “earthed” or cloud-based.

11:00am: After stopping by the Linchpin booth again, I sit down in the PASS Community Zone.  And by sit down, I mean that I collapse, exhausted, into one of the bean bags.  I spent some time chatting with Pat Wright, Doug Purnell, and others, and met up with Julie Smith and Brian Davis to talk about a project we’re working on together (more on that later).

11:45am: Lunch.  Today is the Birds of a Feather lunch, in which each table is assigned a particular SQL Server-related topic for discussion.  I headed over with my Colorado buddies Russ Thomas and Matt Scardino to the DQS/MDS table, at which only two other folks were sitting (one of whom worked for Microsoft).  We had a nice chat about DQS and data quality in general.  I have to admit a bit of frustration with the lack of updates in DQS in the last release of SQL Server.  I still firmly believe that the core of DQS is solid and would be heavily used if only the deficiencies in the interface (or the absence of a publicly documented API) were addressed.

02:45pm: I don’t know why, but I want to take a certification exam.  The PASS Summit organizers have arranged for an onsite testing center, and they are offering half price for exams this week for attendees of the summit.  I registered for the 70-463 DW exam, and after sweating through the MDS and column store questions, I squeaked through the exam with a passing score.  I’m not a huge advocate for Microsoft certification exams – I find that many of the questions asked are not relevant in real-world scenarios, they are too easy to cheat on, and I’m still very skeptical of Microsoft’s commitment to the education track as a whole after they abruptly and mercilessly killed the MCM program (via email, under cover of darkness on a holiday weekend, no less) – so I’m likely not jumping back into a full-blown pursuit of Microsoft certification any time soon.  Still, it was somewhat satisfying to take and pass the test without prep.

04:00pm: Back in the community zone.  Lots of folks saying their good-byes, others who are staying the night are making plans for later in the evening.  For me?  I’ve been craving some seafood from the Crab Pot all week, and I find 6 willing participants to join me.  I’m also planning a return trip to the Seattle Underground Tour.  For the record, I love having this community zone, and I particularly dig it right here on the walkway – it’s a visible, high-traffic location, and it’s been full of people every time I’ve come by.

06:30pm: An all-out assault on the crab population has commenced.  And by the way, our group of 6 became 12, which became 15, which became 20-something (and still growing).  Our poor waiter is frazzled.  I told him we’ll be back next October, in case he wants to take that week off.

08:00pm: Seattle Underground tour.  I did this a couple of years ago with a smaller group, and it was a lot of fun.  This year, we’ve got 15 or so PASS Summit attendees here, and we get a really good tour guide this time.

09:45pm: My friend from down under, Rob Farley, turns 40 today, and about a hundred of us stop by his birthday party.

10:30pm: This may be the earliest I have ever retired on the last night of any summit.  I’m just exhausted.  I do some minimal packing and prep for tomorrow morning and crash for the evening.

Apart from any last-minute goodbyes at the airport tomorrow, the SQL PASS 2014 Summit is over for me.  Without a doubt, this was the best, most fulfilling, most thoroughly exhausting summit experience I’ve had in my seven years of attendance.  I’m sad to be leaving, but couldn’t feel more satisfied.

Resources

Presentation Materials

Download the slide decks and code samples from my recent presentations:
Next Level SSIS series – North Texas SQL Server User Group (June-Aug 2014)
SSIS Scripting – SQL Saturday #271 Albuquerque (01/25/2014)
Data Cleansing in SSIS – SQL PASS Summit 2013 (10/16/2013)
Maximizing SSIS Package Performance – SQL Saturday #232 Orlando (9/14/2013)
Handling Errors and Anomalies in SSIS – SQL Saturday #223 OKC (8/24/2013)
Scripting and SSIS – Linchpin People presentation w/ Andy Leonard (May 2013)
Real World SSIS – Full day precon at SQLBits 11 (May 2013)
Cleaning up Dirty Data with SSIS – SQLBits 11 (May 2013)
When ETL Goes Bad: Handling Errors and Anomalies in SSIS – Colorado user group tour (March 2013)
When ETL Goes Bad: Handling Errors and Anomalies in SSIS
Real World SSIS – SQL Saturday Dallas precon (10/12/2012)
Top 10 New Features of SSIS 2012 – SQL Saturday #125 OKC (8/26/2012)
Introduction to Data Quality Services – SQL Rally Dallas (5/10/2012)
Top 10 New Features of SSIS in SQL 2012 – SQL Saturday #107 Houston (4/21/2012)
Parent/Child Structures in SSIS – SQL Saturday #107 Houston (4/21/2012)
Defensive ETL – SQL Connections Las Vegas (11/3/2011)
Top 10 SSIS Best Practices – SQL Connections Las Vegas (11/2/2011)
ETL Head-To-Head: T-SQL vs. SSIS – SQL Saturday #90 OKC (08/27/2011)
Business Intelligence Drive-By: A Quick Tour of Microsoft BI – SQL Saturday #90 OKC (08/27/2011)
ETL Head-To-Head: T-SQL vs. SSIS – SQL Rally Orlando (5/12/2011)
Dirty Data? Clean It Up! – Colorado SQL Server User Group tour (Feb 2011)

Please note that all code is provided as-is with no expressed or implied warranty. Use at your own risk.

Stuff I read

Data Loading Performance Guide – Loading lots of data into SQL Server?  You need to read this.

Tools

Konesans.com produces a great Trash Destination and Data Generator for SSIS, among other tools
BIDS Helper is a Visual Studio add-in with some nice features for SSIS.
Notepad++ is Windows Notepad on steroids. ‘Nuff said.
Here’s a listing of a ton of free SQL Server utilities. Enjoy!
ConnectionStrings.com – The place to find any connection string you will ever need.
SSIS to SQL Server data type translation

SQL PASS Summit 2012 in Review

I’m back home after a long week attending and presenting at the SQL PASS Summit in Seattle.  This was the best event yet, in my opinion, and for me it was certainly the busiest.  For the second year in a row, our SSIS Design Patterns team was invited to deliver a full-day preconference seminar before the Summit.  Unlike last year, though, this year we have all five members of the author team!  I also delivered two regular sessions: one on using DQS in the enterprise, and the other on how to handle errors and data anomalies in SSIS.

Saturday

Saturday started off for me in Portland, Oregon, where I spoke at SQL Saturday #172.  I left that event just after the closing remarks and raffle (where I won a Kindle Fire HD – thanks Confio!) with Russ Loski and Karla Landrum to make the 3-hour drive from Portland to Seattle.  We rolled in late, maybe 10:30pm or so, and a check of Twitter didn’t turn up much going on, so I popped into Elephant and Castle for a snack while I reviewed my slide deck for Monday’s precon.

Sunday

My first #sqlfamily sighting in Seattle was none other than my good friend Andy Leonard, and shortly thereafter I met up with Michelle Ufford at breakfast.  After breakfast, Andy, Michelle, Matt Masson, Jessica Moss and I met up at the convention center to rehearse our material for the precon the following day.  Had a great day of prep work, minor revisions, and a good deal of laughter.

Later, I stopped by the registration area and officially checked in, and greeted a lot of familiar faces.  About 20 of us got together for dinner at Cheesecake Factory, followed by a brief visit to the Tap House.  Since the next day would involve an early wake-up call, I retired by 9:30 to give my material one last glance.

Monday

I met up with Andy for a quick breakfast before getting wired up for our precon presentation.  We kicked off at 0830 with a crowd of about 110 or so.  It was fun to see a couple of folks in the audience who were also at our precon last year (shout out to Bill Fellows and Aaron Lowe).  We quickly found a good cadence of trading off between the five of us presenting, each person leading the discussion for about 75 minutes.  I shared design patterns around error handling, scripting, and data warehousing.  It was fun to interact with the audience (along with others not in attendance) via Twitter throughout the entire day, through which we were able to share links to supplemental information about the current topic as the day went along.  And on the topic of Twitter, my friend Andy offered me a valuable lesson about leaving my computer unlocked (and I shall have my revenge when you least expect it, good sir).

Tuesday

On Tuesday, I got to spend the day hanging out with a bunch of my favorite folks talking about SQL Server.  What could be better?  Later, I went by the PASS volunteer appreciation event to shake a few hands, followed by an after-hours trip to that karaoke favorite, Bush Garden.

Wednesday

I stopped by the keynote for a bit to listen to Bill Graziano’s opening remarks.  I stepped out to meet my friend Steve Jones for coffee, and ended up meeting up with about a dozen other folks as we chatted.  I caught up with a couple of my Artis Consulting colleagues, and was off to take part in a book signing at the PASS bookstore.  We got to chat with a few dozen folks (and managed to sell out all of the copies of our book at the bookstore!) in the hour or so we were there.

I sat in on the session by Matthew Roche and Matt Masson on enterprise information management (EIM).  I was particularly interested in the DQS portion of the presentation, and I was glad to learn a couple of new things about the product.  Later, I stopped by the SQL Clinic to say hello to some of the folks from Microsoft.  The vendor reception followed next door, where I did a little more networking and perhaps even some recruiting.

Wednesday was also #sqlkaraoke day, with not one but two sponsored karaoke events:  First up was the Pragmatic Works/Microsoft/HP event at the Hard Rock. These sponsors rented out the Hard Rock for some live band karaoke.  Yep – a real band, not just canned tracks!  This event was a ton of fun.  I didn’t count, but there must have been at least 600 people there.  I even got to belt out some Garth Brooks.  Later I went out to the second sponsored karaoke event, at our favorite little hole in the wall, Bush Garden.  It was a much lighter crowd there, maybe 75 people or so – enough to make it fun to socialize but still giving everyone who wanted to sing the opportunity to do so.

Thursday

Since I was presenting on Thursday, I spent the morning going through my presentation materials again.  I delivered my DQS session after lunch, to a crowd of about 40-50 people.  Good discussion around the different moving parts of the product, though I was surprised that of the entire room, only one person was actually using DQS.  There were a couple of folks from the product team in my session, which I appreciated because I misspoke on the technical behavior of one of the elements of DQS, and they were able to set the record straight so I didn’t send folks off with the wrong message.

For the final session of the day, I sat in on the BI Power Hour.  This was easily the most entertaining session I attended.  Presented by a host of Microsoft rockstars, this 90 minute session offered a fun look at some uses of the Microsoft BI stack, PowerPivot, and related tools.  They had a standing-room-only crowd, and although their presentations were intended to be a little silly, I think they did a good job of showing some unconventional uses of those tools.

After the last session, I went to the Friends of Red Gate dinner at Fare Start restaurant.  This was the second year in a row that Red Gate has held the dinner at this location, which is a nonprofit training and placement organization aimed at homeless or otherwise disadvantaged individuals.  The food was outstanding and the company even better.  Later I stopped by the EMP Museum for the community appreciation event.  Since karaoke has been the theme of evening events, there was another live band karaoke for this fiesta as well.  I stayed for just a bit since I wanted to retire early and rehearse my demos again for Friday’s morning presentation.  Back in the room by 11pm or so, I rehearsed until about 1am until I completely ran out of fuel.

Friday

It’s time to talk about SSIS!  I’ve been working on my error handling presentation for quite a while, so I’ve really been looking forward to this.  I was a little concerned since it was a Friday morning after an apparently late night (at least according to the Twitter stream), and my presentation was scheduled for the same time as one given by the wildly popular Dr. David Dewitt.  However, I was pleasantly surprised by a roomful of folks ready to hear about SSIS error handling: the room monitor counted 177 people in attendance.  Lots of great questions and discussion both during and after the presentation – in fact, I was there for over 30 minutes after the presentation just chatting and answering questions.

At lunchtime, I wandered down to the market to take a few pictures and watch them throw the fish around.  In the afternoon I took in a couple of sessions, one on DQS and another on slowly changing dimensions.  I skipped the last session of the day to stop into the PASS Community Zone, a very cool concept that just wasn’t very well publicized.

I had a craving for some fresh seafood, so I invited folks down to the Crab Pot for some crustaceans.  We had maybe 20 people altogether, though we did get split up since there was an early group and a late group.  After dinner, several of us went on the Seattle underground tour, a 75-minute walking tour of some of the basements and underground walkways beneath Seattle’s city streets.  Thanks to my friend Dave Stein for hooking us up with this!  It was a lot of fun and a good way to wind down.

After the tour, a few of us headed over for one last trip to Bush Garden, where 30 or so SQL folks were already gathered.  I stayed for an hour or so, but fatigue got the best of me so I cabbed it back over (thanks to Randy Knight for taking care of our ride) and called it a night before midnight.

Saturday

Travel day for me.  I headed over to the rail station early, only to be greeted by a locked door with a message that the train didn’t start running until 0830 on Saturdays.  As it turned out, I was at the monorail station when I should have instead gone across the street to the Link light rail station.  Inattention cost me about 30 minutes of standing in the cold, but on the upside, I met up with Ryan Adams and Michael Swart and got to chat with them on the 40 minute ride back to the airport.  Ryan, Adam Saxton, and I ended up on the same flight back home, so we got to chat while we waited for our bird.

One final parting gift was that my checked bag didn’t arrive on my flight.  The man at the AA lost luggage counter was very helpful, tracking my bag to a later flight and arranging delivery to my home for later that evening (it arrived about 11pm).  Oddly enough, for as much as I’ve traveled, this was my first experience with a lost bag.  I suppose it was as good a time as any to have one, since the bag had mostly dirty clothes and I didn’t immediately need anything in there.

Summary

It was an incredible week.  As expected, I (along with many others) operated on far too little sleep and way too much coffee for the entire time.  However, it was well worth the time, travel, and exhaustion to spend time with some of the smartest people on the planet.  I look forward all year to this trip, and this year’s SQL PASS summit did not disappoint.

Update: I have published the photos I took during the week on my Flickr stream.  You can find them here.

Join me at the PASS Summit

It seems just yesterday that we were all in Seattle together, getting a crash course in what’s new and exciting in SQL Server.  It couldn’t possibly have been 8 months since we were attending after-party events together, stopping by the Tap House for a beer, or closing down Bush Garden with some spectacular SQL Karaoke.  Yet here we are, getting in the swing of conference season and preparing for this year’s SQL PASS Summit to be held again in Seattle, Washington.

I’m presenting

I’m happy to announce that I’ll be delivering three presentations at this year’s November summit: I was selected for two regular sessions and one preconference seminar.

My two regular sessions are Data Quality Services in the Enterprise and When ETL Goes Bad: Handling Errors and Data Anomalies in SSIS.  I’ve delivered some content on DQS before, and I’m looking forward to expanding this presentation for a larger crowd and longer timeframe.  I’ve been mentally putting together the error/anomaly presentation for some time, and I’m excited about sharing my experience in this area.

The precon I’m delivering is actually a platoon effort among my fellow authors on the SSIS Design Patterns book (which will be released next month).  This full day precon will be in the same vein as the book; we will be sharing from our own experiences some best practices, tips and tricks, and horror stories. 

Andy Leonard, Matt Masson, and I did a version of this precon last year.  This year’s version will not only be updated for SQL 2012, but will also feature Jessica Moss and Michelle Ufford to round out the entire author team.

“We are family….”

The annual PASS Summit is the one event I look forward to more than any other.  I look forward to the deep and broad content on SQL Server, but just as importantly, I enjoy getting the opportunity to network with the SQL Community.  If you’ve never been to a PASS event, or if you’ve attended before but didn’t get engaged in the community, I’d encourage you to come back and get to know us.

SQL Rally Slide Deck and Photos

I’m working on a more comprehensive review of last week’s SQL Rally event, but I’d like to go ahead and share my slide deck and photos from the event.

For those who attended my Data Quality Services session on Thursday, thanks so much for coming.  I had 100 or so in attendance, and a lot of good questions and discussion on this topic.  You can download the slide deck here.

If you saw me at the event, you know that I didn’t go anywhere without my camera.  I have a few hundred pictures from the event that I’ve loaded onto my Flickr site.  You can view or download those pictures here.

SQL Rally–1 week out

It’s hard to believe that after all the hard work, planning, and prep, SQL Rally Dallas is just a week away! This time next week the conference will be in full swing for Day 1 of the regular sessions. The preconference seminars actually start on Tuesday, so it’s going to be a full week of learnin’, Texas-style!

For my part, I’m going to be delivering a presentation on Thursday. I’ll be talking about SQL Server Data Quality Services, one of the new features of SQL Server 2012 that I’m really excited about. This one is designed for kids of all ages – whether you’ve never touched DQS or have been playing around with it for a bit, you’ll get something from this intro session.

“I’ll be there!”

If you’re already registered, great! Be sure to stop by my session and say hello – I’d be happy to meet both people who read my blog.

Remember that there are lots of networking opportunities to go along with the sessions and precons. If you love the night life and/or like to boogie, we’ll be having meetups at the Uptown Bar and Grill on Wednesday and Thursday nights. (Edit: The Wednesday night event is still on, but the location has been changed to the WXYZ Bar inside the Aloft hotel.) On Thursday, they’ll have karaoke for those of you who have a good voice or a high threshold for embarrassment. On Friday morning (if you don’t sleep in from the late night singing), meet up with me, Andy Warren, Sri Sridharan, and others at the convention center for coffee and chat. There’s even talk of an unofficial meetup near the convention center on Friday night to watch the broadcast of the Rangers pounding the Angels.

If you are registered for the 2-day conference but haven’t committed to a precon, I would encourage you to give them another look. Each one of the 7 preconference seminars is a full day deep dive into a single subject, delivered by presenters who are experts in their fields. Whether you want to learn about DBA topics, SSAS, or professional development, there’s a good chance you’ll find a good fit. Remember, these run on Tuesday and Wednesday, so you could even take in 2 of the precons. The full-day precons are priced at $219 each, which is an excellent bargain given the quality of the education you’ll get.

“I’m still not sure…”

I hear you. Perhaps work is busy and there’s nobody to take the slack if you are gone for two or three days. Maybe the boss says he won’t pay for it. It could be that you’re afraid that the material will be over your head, or that perhaps you won’t know anybody. Getting away from the office and out of the comfort zone is taxing, no doubt.

But I’d like to submit to you that your career is worth it. At SQL Rally, you’re going to be surrounded by 500-600 people who are a lot like you – problem solvers who want to learn. Every demographic and skill level will be represented, from those just starting out to experts with decades of experience. You’ll get the opportunity to talk shop and compare challenges with hundreds of other database professionals, which is an experience you can’t get from a book or online course. You’ll get to meet and chat with authors, MVPs, MCMs, IT business owners, and other folks who have a lot to teach (and I promise you that they’re just regular people, and most of them truly enjoy getting to know fellow professionals).

In addition to the educational benefits, the networking opportunities are probably the most significant element of Rally. If you’ve ever looked for a job, tried to hire someone, or needed a partner to help solve a problem, you quickly realize the need for networking, especially in a wired world. There’s simply no replacement for knowing people in this business. In my last Rally blog post, I shared how I came into the job I have now purely because of the networking contacts I’d made at PASS events. Build your network – one day, you’ll be glad you did.

So if you’re not already registered, I’d encourage you to do what you need to do to be a part of this event. Beg the boss (give him/her this), skip the double-foam-extra-wheat-skinny-caramel-mochas this month, pull an extra shift or two, whatever it takes – it’s an investment, but you won’t be disappointed.

Shameless Plug: Vote For My Sessions!

Have you voted yet for my sessions (er, your favorite sessions) for SQL Rally?  If not, this is a friendly reminder to hop on over and vote now.  You have to have a PASS login to vote, but don’t worry – it’s free to get one and takes about 2 minutes.

Out of the 60 total sessions to be presented at the event, 20 will be chosen by popular vote.  To avoid a potential conflict of interest, those of us who were part of “the crew” for evaluating and selecting SQL Rally sessions were not eligible to be chosen as part of the general selection process. Therefore, both of the sessions that I submitted are up for community vote.

I’ve submitted two sessions for this event:

Getting Started with the EzAPI for SSIS

The SQL Server Data Tools (formerly BIDS) environment is a very capable platform for developing SQL Server Integration Services packages.  However, there are times when ETL needs require more flexibility and dynamic behavior than what SSDT provides.  In this session, we will discuss the EzAPI for SSIS, which is a framework for dynamically altering package elements at runtime.  We’ll briefly review the capabilities exposed in the EzAPI, and will walk through a few practical examples of using this framework.

Introduction to Data Quality Services

In this session, we will take a quick tour of the new data quality tool released with SQL Server 2012. With SQL Server Data Quality Services, data professionals now have an easy-to-use framework with which they can analyze and maintain data quality. This session will serve as an introduction to this new product – we will discuss DQS concepts and architecture, review the server and client components of DQS, and will demonstrate the DQS component for SSIS.

I Want Your Vote

So if you want to see me present one or both of these sessions (and you know that you do), get out and vote!  I hope to see you there.