SQL Saturday 223 OKC

This past weekend I made the relatively short (about 3 hours) trip to the Oklahoma City area to attend and speak at the third annual SQL Saturday event there.  I’ve been invited to speak at each of their three events, and I’ve been impressed with each one of them so far.  The folks in the OKC and surrounding groups have an excellent community, and the leadership team did a great job again of putting on a well-organized event.

This weekend was a first for me in that I brought along my oldest son Ryan to the event.  At age 9, he’s already showing highly analytical tendencies, and I’m trying to expose him to more technology-related happenings.  He was very excited to attend the event – I was out of town with a client all week, and when I called him each night, the upcoming trip was always one of the first things he wanted to talk about.  During the event he did get bored frequently, but that’s to be expected when deeply technical content and a short attention span intersect.  He still enjoyed himself, though for him the fun was more about doing something different and seeing a bit of what I do for a living (he sat in the front row during my presentation).  I think it was a great experience for him, and I’m hoping to bring him along to more events in the future.

For my part in the event, I presented “Handling Errors and Data Anomalies in SSIS”.  I had a moderate-sized group, maybe 35-40 folks, with some good discussion during and after the presentation.  For those who attended and are interested in the presentation materials, you can download them here.

Thanks again to the OKC-area folks who organized this event.  I look forward to coming back again next year.

My event schedule for Fall 2013

As I was browsing my calendar of upcoming events for the remainder of the summer and the fall, I realized two things:

  1. This will likely be the busiest, most travel-heavy three months I’ve ever had, and
  2. I’m going to have to buy my wife something really nice by the time I finish my travels.

Between client travel and community events, I get to spend some quality time with my friends at American Airlines.  But even better, I get to hang out with some of you fine folks!

Among the events at which I’ll be speaking:

SQL Saturday Oklahoma City (Saturday, August 24).  I’ll be presenting “Handling Errors and Data Anomalies in SSIS” while doing my best not to make jokes about Oklahoma.

SQL Saturday Orlando (Friday, September 13 – Saturday, September 14).  On Friday, I’ll be presenting a full day preconference seminar entitled “Real World SSIS: A Survival Guide”.  Since this will be held on Friday the 13th, you can bring your black cat for free with paid admission*.  I’ll be staying for the main event the following day, and will present a brand new presentation entitled “Maximizing SSIS Package Performance”.

SQL Saturday Denver (Friday, September 27 – Saturday, September 28).  I’ll spend Friday with some of you talking again about Real World SSIS, followed by my “Maximizing SSIS Package Performance” presentation on Saturday.  Somehow this lineup seems strangely familiar.

SQL Connections (Monday, September 30 – Thursday, October 3).  I get to present two sessions at the national SQL Connections event in Las Vegas.  I’ll be again delivering “Maximizing SSIS Package Performance” (this is turning out to be a popular selection) along with “Cleaning Up Dirty Data in SSIS”.  Both of my sessions are on Tuesday, which means I’ve got three unallocated days in Las Vegas.  Is there anything fun to do in Vegas for three days?

Biml Workshop (Tuesday, October 15).  I’ve been spending a lot of time using Biml lately, and I’m finding it to be a centerpiece in my toolset these days.  In this full-day workshop, I’ll be presenting with Andy Leonard and Scott Currie to demonstrate how Biml can be used as a tool for rapid development and easy maintenance of SSIS packages.

SQL PASS Summit (Tuesday, October 15 – Friday, October 18).  The SQL PASS Summit remains my favorite SQL Server event, and I get the opportunity to present for the third year in a row.  I’ll be delivering a session entitled “Data Cleansing in SSIS”.  I’m also planning an after-hours session entitled “How to Embarrass Yourself at Karaoke”, which may or may not appear in your event guide.

SQL Saturday Tampa (Friday, November 8? – Saturday, November 9).  I’ll be delivering a full-day precon ahead of the Saturday event (presumably Friday, or perhaps Thursday) as well as one or more sessions at the main event on Saturday.

In addition to the events that have already been confirmed, I’ve also submitted for the SQL Saturday BI event in Charlotte just after the PASS Summit, and may also submit to our hometown SQL Saturday event in Arlington, TX in November.

It’s going to be a busy and fun fall travel season.  I’m looking forward to getting to catch up with many in my #SQLFamily during the next several months.

* Not really. Please leave your cats at home, or I’ll be forced to bring my vicious dog to keep them away.

Join me for the 2013 Biml Workshop

Of the many emerging technologies I’m currently using, there’s nothing that excites me more than Biml.  If you’re not familiar with it, Biml (short for Business Intelligence Markup Language) is a language that allows business intelligence developers to programmatically create and document SSIS packages.  I’ve found this to be incredibly useful in environments that have a large number of packages, multiple data staging steps, or frequent changes to the ETL pipeline.  Through automation, Biml helps to eliminate much of the mundane work of package creation and maintenance.
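
To give a sense of what the language looks like, below is a minimal Biml sketch that declares a connection and a package containing a single Execute SQL task.  The connection string, database, and object names are hypothetical placeholders, so treat this as an illustration rather than production code; a Biml compiler (such as the one in BIDS Helper or the Varigence tooling) turns markup like this into a deployable SSIS package.

    <!-- A minimal Biml sketch: one connection, one package, one task.
         All names and the connection string below are hypothetical. -->
    <Biml xmlns="http://schemas.varigence.com/biml.xsd">
      <Connections>
        <!-- OLE DB connection to a hypothetical staging database -->
        <OleDbConnection Name="Staging"
                         ConnectionString="Provider=SQLNCLI11;Data Source=localhost;Initial Catalog=StagingDB;Integrated Security=SSPI;" />
      </Connections>
      <Packages>
        <!-- ConstraintMode="Linear" wires the tasks together with success constraints -->
        <Package Name="LoadCustomerStage" ConstraintMode="Linear">
          <Tasks>
            <ExecuteSQL Name="SQL Truncate Stage Table" ConnectionName="Staging">
              <DirectInput>TRUNCATE TABLE dbo.StageCustomer;</DirectInput>
            </ExecuteSQL>
          </Tasks>
        </Package>
      </Packages>
    </Biml>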

If you are interested in learning more about Biml, I’d like to invite you to join Andy Leonard, Scott Currie, and me on Tuesday, October 15th in Charlotte, NC for a full-day workshop to discuss the awesomeness that is Biml.  Scott is the CEO at Varigence, the company that developed Biml, and Andy was an early adopter and pioneer of Biml.  In this workshop, we’ll review the Biml syntax and basic usage, as well as demonstrate some Biml design patterns we’ve picked up along the way.

The cost for the workshop is $69, which includes breakfast, lunch, and snacks.  For the next 2 days (until tomorrow at midnight EDT), you can still get early bird pricing of $49.

So if you’re in town for the SQL PASS Summit and aren’t planning on attending a Tuesday precon, I’d encourage you to consider joining us for this workshop.  We hope to see you there!

Continue Package Execution After Error in SSIS

When it comes to ETL, I’m a pessimist.  Until proven otherwise, I assume that all data is bad, all connections are volatile, and all transformation logic is suspect.  As such, I spend a lot of time addressing how to prepare for and handle errors in the ETL pipeline with SSIS.  Building packages to expect and properly handle errors is a key component of a well-performing, durable, trustworthy ETL infrastructure.

SQL Server Integration Services has a number of built-in tools for handling errors and other anomalies.  Among these:

  • Event handlers, defined at the package, container, or task level
  • Precedence constraints on the control flow to change the execution path in the event of an error (see the sketch after this list)
  • Error outputs on sources, transformations, and destinations in the data flow to redirect errors on a row-by-row basis
  • Failing the task (and its ancestors), which is the default behavior
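
As a quick illustration of the precedence constraint option, here is a hedged Biml sketch of a logging task that executes only when a data flow task named DFT Load File fails.  The task names and log table are hypothetical, and the Input/EvaluationValue markup reflects my recollection of Biml’s precedence constraint syntax, so verify it against the Biml language reference before relying on it.

    <!-- Sketch: a task that runs only on the failure path from "DFT Load File".
         Names are hypothetical; syntax per my recollection of Biml. -->
    <ExecuteSQL Name="SQL Log Failure" ConnectionName="Staging">
      <PrecedenceConstraints>
        <Inputs>
          <!-- EvaluationValue="Failure" makes this the On Failure path -->
          <Input OutputPathName="DFT Load File.Output" EvaluationValue="Failure" />
        </Inputs>
      </PrecedenceConstraints>
      <DirectInput>INSERT INTO dbo.LoadLog (Outcome) VALUES ('Failed');</DirectInput>
    </ExecuteSQL>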

All of these options provide a way to handle or otherwise report the error.  But what if your scenario requires that any error simply be ignored?  SSIS can do that, too.  In this post, I’ll show you how to use a particular system variable exposed by event handlers to prevent the propagation of errors in SSIS executions.

Scenario

Continue after error. It’s not that hard.

Let’s say that we have an SSIS package that processes a series of flat files using an instance of the ForEach Loop Container.  The expected behavior is that we specify a directory of files to be processed, and the loop container will process each file it finds there.  If one of those files fails to process – perhaps its security settings do not allow it to be read by the account executing the SSIS package, or the file is improperly formatted – the default behavior is that the loop container fails, which in turn fails the package and any parent package that executes it.  This is known as propagation (remember that word), which can be visualized as a “bubbling up” of errors.  In that way, propagation of errors is not unlike the way exceptions bubble up in C# or VB.NET programming.

Now in this same scenario, let’s assume that our business needs dictate that we don’t have to process every file for the ETL load to be considered successful, and that a partial load of data has greater business value than failing the entire load because one file failed.  In that case, we’ll want to override the default behavior of our package to allow certain elements to fail without affecting the outcome of the package.

As shown below, I’ve set up a package for our scenario in which we truncate the output staging table, and then loop through a specified directory to process each text file it contains.

[Screenshot: control flow with a truncation task followed by a ForEach Loop Container]
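
For those who prefer reading markup to screenshots, here is a rough Biml sketch of the same control flow – a truncation task followed by a ForEach file loop that runs a data flow once per file.  The folder path, variable, and task names are hypothetical, and the loop attributes reflect my recollection of the Biml syntax rather than verified markup.

    <!-- Sketch of the control flow: truncate, then loop over *.txt files,
         running a data flow for each one. Names and attributes are
         hypothetical/from memory; verify against the Biml documentation. -->
    <Package Name="LoadFlatFiles" ConstraintMode="Linear">
      <Variables>
        <!-- Holds the path of the file being processed on each iteration -->
        <Variable Name="CurrentFile" DataType="String"></Variable>
      </Variables>
      <Tasks>
        <ExecuteSQL Name="SQL Truncate Staging" ConnectionName="Staging">
          <DirectInput>TRUNCATE TABLE dbo.StageFile;</DirectInput>
        </ExecuteSQL>
        <ForEachFileLoop Name="FELC Process Files" Folder="C:\SourceFiles" FileSpecification="*.txt">
          <VariableMappings>
            <!-- Map the current file name into User::CurrentFile -->
            <VariableMapping Name="0" VariableName="User.CurrentFile" />
          </VariableMappings>
          <Tasks>
            <Dataflow Name="DFT Load File">
              <Transformations>
                <!-- Flat file source and staging table destination would go here -->
              </Transformations>
            </Dataflow>
          </Tasks>
        </ForEachFileLoop>
      </Tasks>
    </Package>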

Notice that I’ve got precedence constraints set up after the data flow task in the loop container.  This will allow me to log either a success or failure of a load operation on a file-by-file basis.  The data flow task, which will be executed once per file in the source directory, will attempt to process each text file in that directory as shown below.

[Screenshot: data flow loading each text file into the staging table]

In this updated scenario, I don’t want to allow a failed file load to interrupt the load of the remainder of the files, so I’m going to make a couple of modifications.  First of all, I’ll create an error event handler for the data flow task.

[Screenshot: the empty OnError event handler for the data flow task]

You’ll notice that there are no tasks in the error event handler for the data flow task (DFT Load File).  Although you can add some custom logic here to execute upon error, you don’t necessarily need to do so.  However, creating the error event handler will expose a particular setting we’ll need to prevent the propagation of any errors.

While the error event handler shown above is still in view, open the list of SSIS variables.  Note that you’ll also have to set the variables window to show system variables, which are hidden by default.  In the list of variables, scroll down until you find a variable named Propagate.  This Boolean value specifies whether errors within a given executable bubble up to the ancestor tasks, containers, and packages.  To prevent these errors from bubbling up, I’ll change the value of the Propagate variable from True to False.

[Screenshot: the variables window, with the System::Propagate variable set to False]

The net result is that any error in this data flow task will still be shown as an error on that task, but it will short-circuit there and will not be passed along to any executables further up the chain.
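
For those generating packages with Biml, my understanding is that the same effect can be achieved by declaring the OnError event handler in markup and overriding the System::Propagate variable there.  The sketch below is from memory and unverified – in the designer, simply flipping the Propagate variable as described above is the authoritative approach.

    <!-- Unverified sketch: an OnError event handler on the data flow task,
         with the System::Propagate variable overridden to False so errors
         do not bubble up. Verify the Event/Variable syntax before use. -->
    <Dataflow Name="DFT Load File">
      <Events>
        <Event EventType="OnError" Name="OnError DFT Load File" ConstraintMode="Linear">
          <Variables>
            <!-- Overriding the system variable stops error propagation -->
            <Variable Name="Propagate" Namespace="System" DataType="Boolean">false</Variable>
          </Variables>
        </Event>
      </Events>
    </Dataflow>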

Final Thought

You’ll notice that in the control flow of this package, I included two different paths from the data flow task – one with the On Success constraint, and the other with the On Failure constraint.  I’ve done this to allow the package to separately log the file-by-file results (success or failure, along with row count).  I’ve shown this because I want to emphasize that, in most situations, you should be logging any error such as this – even if your package is built not to fail when an error is encountered.  In my opinion, most any operation that is attempted, even if it doesn’t affect the outcome of the package, should be logged – especially in the event of failure.

Conclusion

In this post I’ve demonstrated how the built-in components in SSIS can be used to allow continued operation after the failure of one executable in a package.  By modifying the Propagate system variable in the error event handler of a volatile executable, you can prevent the automatic failure of its parent and ancestor elements, allowing continued execution after the failure of noncritical operations.

Are you really an expert?

Through the course of my career, I’ve spent time on both sides of the job interview table, which has given me empathy for job interviewees as well as their interviewers.  The former wants to put his best foot forward to demonstrate (or at least talk about) his most appealing attributes, while the latter seeks to find the best fit for the position while pitching her employer to qualified candidates.  On both sides of the table, folks do their best to paint their respective positions in the best light while (hopefully) remaining truthful.

Although the interviewing process can be stressful for interviewers, it’s particularly hard on the interviewees.  After all, they are the ones who will be most directly impacted by any hiring decision.  Have a bad interview and you’re going nowhere; nail the interview and you could reach a major career milestone.  As such, there’s a lot of pressure to make yourself appear to be the best candidate you can be.  Often, candidates will use superlative terms to describe themselves:

  • I am an expert in XYZ software.
  • I have senior level skills in widget making.
  • I have an advanced proficiency in flux capacitor maintenance.
  • I am a thought leader in the field of bacon curation.

There are a lot of very smart folks out there, a number of whom truly are experts.  But increasingly, my experience in this area has taught me that many candidates apply to themselves labels such as expert, senior, and specialist simply as a selling point, without any real basis for such an assertion.  When candidates use these terms recklessly in their résumés and in interview conversations, they are setting themselves up for a hard landing at some point in the future.

Fake it ‘til you make it?

Describing oneself using superlative language can impress an interviewer, in some cases.  Take for example the typical corporate interview scenario, in which one sits first with interviewers from Human Resources.  The interviewer may or may not have specific knowledge about the field in which the candidate specializes.  A skilled interviewee will pick up on this, and may be tempted to dazzle the interviewer with buzzwords while describing his own skillset as superior.  And if he’s very lucky, he’ll get a second interview with an interviewer who assumes that HR has done the necessary vetting, and who may not ask the in-depth questions needed to weed out an unqualified candidate.

Although unlikely, it’s possible for a candidate to bluff his way through the entire interview process, all the way to the job offer.  Even if things get that far, it’s still going to turn out badly for both sides.  When a person describes himself as an expert, such strong language sets an expectation for job performance.  Portraying oneself as an expert implies deep knowledge of the topic, good decision-making skills in the field of expertise, and a history of success.  When the expectations greatly exceed the actual results, it’s going to turn out badly for the “expert”.

The phrase “fake it ‘til you make it” often comes up when pitching oneself for work.  Although this might work at lower skill levels, it’s much harder to fake being an expert when it comes time to actually do the work.

Having conducted technical interviews with scores of candidates, I can tell you that those who described themselves as experts in a particular discipline usually got extra scrutiny in their self-described areas of specialization.  Many of them did quite well upon inquiry, but a disheartening number of folks who claimed to have superior skills in a particular area had a difficult time answering even the most basic questions.

Experts are made, not born

Becoming recognized as a thought leader in a particular area doesn’t come by applying a label to oneself.  It’s been written many times that it takes about 10,000 hours – about five working years – of doing something to truly become an expert in it.  Some things will take far less time to master (Minesweeper) while others require much more (neurosurgery).

With that in mind, don’t try to sell yourself as an expert if you aren’t.  Remember that there are other attributes on which you can rely that have a great deal of appeal to employers.  For example, if I’m looking for someone with senior level skills in a particular discipline, I might consider a candidate with midlevel skills who also demonstrates a great deal of enthusiasm and a strong desire and aptitude to learn.  And don’t forget: attitude, attitude, attitude.  Most hiring managers would choose someone with good skills and a great attitude over a jerk with tons of experience.

Sell yourself

Don’t take the advice in this post to mean that you shouldn’t talk yourself up to a potential employer.  Job interviewing is nothing more than sales: you’re trying to sell yourself to the company doing the hiring, and the employer is trying to decide if they want to “buy” what you’re selling (and hopefully, they’re trying to sell you on the company as well).  However, there is a difference between selling yourself and selling yourself as something you’re not.  If you’re going to describe yourself as an expert in your field, make certain that the label is accurate.

As an aside, I’ve found that a large number of thought leaders in my field do not even describe themselves as experts.  The smartest folks out there realize that there’s still much left to learn in every vocation, and often refrain from labeling themselves as experts for fear of implying that they know everything there is to know.

Conclusion

The job interview process relies on trust and some measure of faith.  A candidate who falsely purports to be an expert is bound to be discovered at some point, and the later the discovery, the worse the results tend to be.  Put your best foot forward during the interview process, but don’t sell yourself as an expert if you’ve not yet earned that distinction.