The Idea Book

How many times have you said to yourself, “Someone should build an application that does [x]…”, or “Wouldn’t it be easy to add automation to [y]”, or “It would be a lot of fun to work on a project to build [z]”? For me, this has happened a lot, and seems to occur more frequently the longer I’m in this business.

In the early days of my career, these ideas were all great – for someone else. Someone with more skills and experience than me. Someone with more connections. Someone more confident, good-looking, eloquent. But definitely not me. I subconsciously put onto pedestals those other someones who actually could do the things I thought up. Sure, I had the occasional wild thoughts like “What if I were to build the next eBay?”, but thoughts of realism and self-doubt always reined me in. I would move on, sooner or later forgetting that I’d even come up with the idea.

However, a few years into my career, I turned a corner. I started to mingle with those who were creating solutions, and I began to realize that they weren’t demigods. I learned that folks who were building successful widgets weren’t vastly smarter or more talented than me. They were just ordinary people who had an idea, explored its potential, and then worked like crazy to make it happen. It was this realization that led me to believe that I, too, could dream up and build successful widgets. From there, my Idea Book was born.

Yes, it really was a book

Version 1 of my idea book really was a physical book. More specifically, a spiral notebook. I carried it with me and would jot down ideas big and small. Most of the ideas I wrote down in my idea book were dumb. Many were wildly ambitious and unattainable by anyone at any skill level. Others were so narrowly focused that their implementation would benefit no one. Still others were solutions looking for problems that didn’t exist. The vast majority eventually found their way to the cutting room floor, and only a fraction of them had enough merit to really pursue. In spite of all of these deficiencies, my idea book has been a resounding success. I’ve carried several of the ideas to implementation. Some of them have made their way into the code tool bag that I use on a nearly-daily basis. A couple of the ideas from my idea book have made me a lot of money.

Having an idea book doesn’t make it a success, obviously. For the first year or two, my book was where ideas went to die (or more accurately, to fall into a coma). I was also timid at first about adding ideas I wasn’t sure would be practical or doable. My idea book didn’t bloom until I committed to two concepts: no idea is too big/small/ridiculous to be added to the idea book, and I must actively push the ideas through the funnel.

All ideas are welcome

Over time, my idea book has transformed from a physical notebook to a virtual one (I use Evernote). I’ve also worked up a semi-structured idea funnel, which I use to categorize ideas into buckets including brainstorming, researching, and in-progress. I’ve also resolved to allow any idea, however grandiose/trivial/odd, into my idea book. My idea book is mine and mine alone, and it doesn’t get shared with anyone – not even my wife or my business partners will ever see the unedited version. Any and all ideas are welcome, though not all will survive the vetting process.

My idea book is with me all the time, in one form or another. If I’m riding in an elevator and a new idea strikes me, I type out a quick note into my phone app to capture the moment. If I’m in the middle of a presentation and someone asks a question I want to pursue later, I scratch out a note on my spiral pad (yes, the old-fashioned method still works sometimes). On occasion, an idea will strike me in the middle of the night, and I’ll grab my Surface from the nightstand and jot down my idea. The medium used to log the ideas isn’t critical, so long as I’m faithful about getting them into Evernote as soon as practical.

The real value shows through when I’ve got 20 minutes to spare, and I’m looking for something to fill the time. Sitting in the lobby of the Kwik Lube, waiting for my oil change. Waiting to see the doctor. On hold with tech support. Using my idea book (the new virtual version), I can review ideas I’ve previously logged, and sketch out pros and cons, flesh out why they will or won’t work, and weigh the benefits against the investment. Having my idea book easily accessible makes it easy to do this in bite-sized chunks rather than having to schedule several hours to go through these ideas.

So where is your book?

Should everyone have an idea book? Absolutely. Whether you’re a technical professional or not, newbie or seasoned veteran, a maker or a strategist, a coder or an executive, having an idea book can stoke the creative fire in you. Thinking through new ideas, analyzing them, and implementing the good ones helps keep you thinking about alternative approaches and different ways of doing things, and can keep you out of a career rut. And who knows, your idea book might end up producing the next eBay, Facebook, or Rubik’s Cube.

Fix Inconsistent Line Terminators in SSIS

There is a flat file processing issue I’ve run into a number of times over the years, and it’s come up again several times recently. The issue relates to the line terminators used in data files. Occasionally, changes to the systems generating these data files, or perhaps even manual edits, can change the way the file marks the end of a line. These changes can cause package execution to fail, or even worse, allow the file to be loaded successfully while introducing data quality issues in the target.

In every text file, there are unprintable characters called line terminators that mark the end of each line. On most UNIX and Linux systems and some older operating systems, files are created using the line feed character (LF, or ASCII character 10), while files generated in Windows typically use the carriage return and line feed (CRLF, or ASCII characters 13 and 10, respectively) to mark line endings. The tricky part is that these characters are generally not displayed, so opening up a data file with Notepad or a similar editor will not present any visual differences between line feed-terminated files and those using carriage return plus line feed. However, these differences can have a significant impact on ETL processes, as well as any downstream systems relying on data from those files.

In this post, I’ll show some examples of both LF and CRLF terminated files, and how they behave in SSIS when the package is expecting one type but gets the other. I’ll then demonstrate two workarounds to prevent unexpected errors due to changing line endings.

The scenario

My client, Contoso, has a new supplier, and I’m setting up the ETL process to import some flat file data from this new supplier. I’m working from a file spec that includes column-level mappings and indicates that lines will be terminated using CRLF. I build the package, test it against sample data provided by Contoso’s new supplier, and send the successfully tested package to QA for final testing and deployment to production.

Weeks go by, and the package works great. However, one morning I get a call from Contoso, asking for advice in troubleshooting this new process. It appears that the package has failed without loading any data, logging a data truncation issue upon failure. Both Contoso and their new supplier have reviewed the data file causing the failure, and cannot find any reason for the error. I open the file up in Notepad++ and turn on the Show Line Endings feature, and the problem becomes readily apparent. The most recently successful file looks like this:

image

However, the failed file looks like this:

image

The difference is subtle but important: the second file uses the line feed character by itself as the terminator, while the previous file uses a carriage return followed by a line feed (CRLF). This distinction is not visible when using tools such as Notepad, and even in Notepad++ these characters aren’t shown by default. Invisible or not, the distinction is very important.

Why does it matter?

Although they are easy to forget about, incorrect line endings can wreck an ETL process. As shown in the hypothetical scenario above, in cases where the SSIS package expects to receive CRLF line endings but instead gets just an LF, most likely the package will fail due to either data truncation or data type issues. Even worse, if the package is set up to process LF line endings but receives a file with CRLF terminators, chances are good that the data will actually be loaded – with some extra baggage. In the latter case, if the last data field on the line is interpreted as a character data type (CHAR, NVARCHAR, etc.), the carriage return character would be preserved in the loaded data. In the example below, I’ll show how this can impact the quality of that data.

For this example, I’ve created an SSIS package to process a data file using LF line terminators. Then, I regenerate the same data file using CRLF line endings, and process the modified file. The package successfully loads the file, with no apparent problems. Below I can see in my target table that the data has been loaded.

image

Now, I want to find all products matching the first ItemName in the list. When I query this table using the exact ItemName value I just saw in my SSMS results window, I find that I get no results at all.

image

Even though I’m typing in the exact description I see, I get no results for that value. The reason is that I’m looking for the literal string ‘Mountain-100 Black, 42’, when in reality, the value in this field contains an unseen carriage return. Because the SSIS connection was configured to use LF as the line ending, it interprets the carriage return to be part of the data, and loads it to the output table. Copying that value from the SSMS results grid and pasting it into the query window confirms that the extra CR character is present at the end of the data value. Knowing this, I can modify the selection criteria I used, changing the query from an exact match to a LIKE with a wildcard at the end to return the values I expected to see.

image
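
For reference, here is roughly what those queries look like in T-SQL. This is just a sketch: the table name dbo.SalesData is hypothetical (the target table isn’t named in this example), but the ItemName column and value match the scenario above.

-- Exact match: returns no rows, because the stored value ends with a hidden carriage return
SELECT *
FROM dbo.SalesData
WHERE ItemName = 'Mountain-100 Black, 42';

-- Wildcard match: returns the expected rows, since the trailing % also matches the hidden CR
SELECT *
FROM dbo.SalesData
WHERE ItemName LIKE 'Mountain-100 Black, 42%';

-- Optional sanity check: confirm a carriage return (ASCII 13) really is embedded in the data
SELECT ItemName
FROM dbo.SalesData
WHERE CHARINDEX(CHAR(13), ItemName) > 0;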

This confirms that the line ending is the problem, but what can be done to avoid this in the first place?

Fixing the Line Ending Problem

When coding to avoid issues with inconsistent line endings, there are three potential scenarios to plan for:

  • Lines with LF line terminators
  • Lines with CRLF line terminators
  • Lines with CR line terminators (a far less common scenario)

Planning for the first two scenarios listed above is relatively easy; the last one takes a bit of work. I’ll demonstrate the design patterns for handling each of these.

Coding for LF and CRLF

As I mentioned earlier, files originally specified with LF line endings later being switched to CRLF (or vice versa) is more common than you might think. However, this problem is fairly easy to resolve in the SSIS data flow. First, the flat file connection manager should be configured to use only the line feed (LF) as the row delimiter, as shown below.

image

Next, on the data flow, add a derived column transformation to the data pipeline. This transformation will remove any carriage return values (indicated by “\r” in the SSIS expression) found in the last data field.

image
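
As a minimal sketch (assuming the last column in the file is named ItemName, as in the script example later in this post), the derived column expression might look something like this, configured to replace the existing ItemName column:

REPLACE(ItemName,"\r","")

If the file arrives with CRLF terminators, this strips the leftover carriage return; if it arrives with LF terminators, the expression simply finds nothing to replace.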

When using this pattern, the output will be the same regardless of whether the lines in the data file are terminated with LF or CRLF. For the latter, the package will simply remove the extra carriage return in the data flow. This is a very easy pattern to implement, and will provide protection against line endings changing from LF to CRLF, or vice versa.

Coding for CR, LF, or CRLF

Building a package to handle any type of line ending – CR, LF, or CRLF – takes a bit more work. Since the SSIS flat file connection manager must be configured for the type of line ending to expect, preparing for line endings that are not known at design time requires a more versatile source: the script component. Using the System.IO namespace in the script component, I can open the file, read through each line, and parse out the values irrespective of the line endings used.

In this example, I’ve added a new script component to the data flow, and I have configured this as a source. Next, I added output columns to the default output on the script component, which match the metadata in the table to which we will be loading the data. Finally, I wrote the code below which will read each line in the file, assigning the values to their respective columns in the output.

public override void CreateNewOutputRows()
{
    // Create a connection to the file; the using block ensures the reader is closed when we're done
    using (StreamReader reader = new StreamReader(Connections.SalesFile.ConnectionString))
    {
        // Skip header row
        reader.ReadLine();

        while (!reader.EndOfStream)
        {
            // ReadLine() treats CR, LF, or CRLF equally as the end of a line
            string line = reader.ReadLine();
            string[] columns = line.Split(Variables.vDelimiter.ToCharArray());

            // Use an increasing integer for indexing the columns below (so I can be lazy and paste below).
            int i = 0;

            // Add a new row to the output for each line in the file
            Output0Buffer.AddRow();

            // Assign the appropriate values into the output columns
            Output0Buffer.SalesOrderID = int.Parse(columns[i++]);
            Output0Buffer.SalesOrderDetailID = int.Parse(columns[i++]);
            Output0Buffer.CarrierTrackingNumber = columns[i++];
            Output0Buffer.OrderQty = sbyte.Parse(columns[i++]);
            Output0Buffer.ProductID = short.Parse(columns[i++]);
            Output0Buffer.SpecialOfferID = sbyte.Parse(columns[i++]);
            Output0Buffer.UnitPrice = float.Parse(columns[i++]);
            Output0Buffer.UnitPriceDiscount = float.Parse(columns[i++]);
            Output0Buffer.LineTotal = float.Parse(columns[i++]);
            Output0Buffer.rowguid = columns[i++];
            Output0Buffer.ModifiedDate = DateTime.Parse(columns[i++]);
            Output0Buffer.ItemName = columns[i++];
        }
    }
}

The reason this works in the C# code above and not in the flat file source is that C# treats lines within files a bit differently than the SSIS connection manager does. The flat file connection manager in SSIS has to be configured for a specific type of line ending, while the StreamReader.ReadLine() function simply reads to the end of the line irrespective of the line ending used.

Again, it’s been rare that I’ve had to code for the possibility of three different line ending types. However, I have seen this occur a few times, and for volatile data files, it’s a design pattern worth considering.

Conclusion

In the same way data professionals are usually skeptical of untested data, they must also be mindful of suspect metadata. Changes that appear to be as benign as a modification to the line terminator characters can have a serious negative impact on ETL and its target systems. Using the defensive ETL strategies I’ve shown here, you can help prevent errors or data quality issues related to changing line endings in data files.

Iterating Through Excel Worksheets with Biml

Wrangling large or complex Excel workbooks in SSIS can be a challenge. From managing data types (more about that in this post by Koen Verbeeck) to addressing multiple worksheets in a single document, configuring SSIS to properly read from or write to Excel documents is tedious at best. While there are no silver bullets to completely solve these problems, I’ve found that – like many other challenges in the SSIS world – using Biml can help eliminate some of the manual work involved.

In this post, I’ll demonstrate how you can use a few lines of BimlScript code to read through multiple worksheets in an Excel source.

The Challenge

For this scenario, I’m using a single Excel 2007 document containing numerous worksheets. All of the worksheets have a similar structure, each containing results from a different geographic area.

image

What I want to accomplish is to programmatically infer the name and structure of each worksheet from the Excel document, and create an SSIS data flow for each one. Doing so can save a great deal of effort, especially if there are many columns, many worksheets, or if the structure of each worksheet can differ from the others.

The Biml Solution

The solution, using BimlScript, is designed as follows:

  • Connect to the source Excel file and loop through the worksheets
  • Create an SSIS data flow for each worksheet
  • In each data flow, create an Excel source component to connect to the worksheet specific to that data flow
  • In each data flow, create an OleDB destination component, and automatically map source columns from the Excel worksheet to the destination table

First, in the Biml file, connect to the Excel document as shown below. Note that, if you haven’t already, you will need to install the Excel 2007 driver (the Microsoft ACE OLE DB provider referenced in the connection string) for this code to work.

<#@ import namespace="System.Data" #>
<#@ import namespace="System.Data.OleDb" #>
<#
var connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=E:\\AccidentData.xlsx;Extended Properties=\"EXCEL 12.0 XML;HDR=YES;IMEX=1\";";
var connection = new OleDbConnection(connectionString);
connection.Open();
var worksheetCollection = connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null).Rows.OfType<DataRow>().Select(i => i["TABLE_NAME"].ToString()).Where(i => i.EndsWith("$"));
#>

This will create a collection named worksheetCollection, which contains the names of the worksheets found in the specified Excel file. (Big thanks to Scott Currie of Varigence for sharing the syntax of this operation.)

Next up, I will iterate through this list of worksheets, building an SSIS data flow for each one. I’ll also point out that I could create one package per worksheet, as opposed to a single package with multiple data flows; however, for the sake of simplicity in demonstration, I’m using just one package with a separate data flow per Excel worksheet. As shown below, I’m creating a foreach loop in BimlScript to loop through each worksheet name. In each iteration of the loop, I’ve created a data flow with the same name as the worksheet.

<# foreach (var worksheet in worksheetCollection) { #>
<Dataflow Name="<#= worksheet.Replace("$", "") #>" DelayValidation="true">
</Dataflow>
<# } #>

So far this data flow task does nothing. Let’s add the Excel source connection.

<# foreach (var worksheet in worksheetCollection) { #>
<Dataflow Name="<#= worksheet.Replace("$", "") #>" DelayValidation="true">
  <Transformations>
    <ExcelSource ConnectionName="Accidents Source File" Name="XLS Accidents – <#= worksheet.Replace("$", "") #>">
      <DirectInput>SELECT * FROM `<#= worksheet #>`</DirectInput>
    </ExcelSource>
  </Transformations>
</Dataflow>
<# } #>

The code above creates a connection to the Excel document, building the SELECT statement using the name of the worksheet as the table name. A couple of things to note here: First, this snippet uses an Excel connection named Accidents Source File, which I set up in a previous step (the code for which is provided in the download). Also, the source name uses the name of the worksheet with the dollar sign ($) removed through the Replace() function in the BimlScript.

Finally, I’ll add the destination component. Since all of the worksheets in this sample spreadsheet will be loaded to the same table, there is no dynamic code in the destination component. You may notice that this snippet also does not use any column-level mappings. This is by design; since each of the worksheets may have some differences in metadata (an extra column, or missing columns), I’ve excluded any static column mappings, instead relying on automatic column name mapping in Biml. Also, like the source transformation, this destination refers to a connection that was set up previously.

<# foreach (var worksheet in worksheetCollection) { #>
<Dataflow Name="<#= worksheet.Replace("$", "") #>" DelayValidation="true">
  <Transformations>
    <ExcelSource ConnectionName="Accidents Source File" Name="XLS Accidents – <#= worksheet.Replace("$", "") #>">
      <DirectInput>SELECT * FROM `<#= worksheet #>`</DirectInput>
    </ExcelSource>
    <OleDbDestination Name="OLEDST Accidents Table" ConnectionName="AccidentData DB">
      <ExternalTableOutput Table="[dbo].[AccidentData]"></ExternalTableOutput>
    </OleDbDestination>
  </Transformations>
</Dataflow>
<# } #>

After saving the Biml file and generating packages, the resulting SSIS package appears as below.

image

Each data flow contains an Excel source and an OLE DB destination pointing to the common output table. Each of these Excel sources retrieves data from a single worksheet, specified in the FROM clause of its SELECT query.

image

The complete Biml file used for this example can be found here.

Conclusion

Working with Excel data in SSIS can be tedious and time-consuming. However, Biml can help by automating some of the manual operations involved in Excel-to-SSIS mappings.

On Failure: Getting Up

In my continuing series entitled “On Failure”, I want to talk about skiing.

I’m not a great skier. It would probably be a stretch to say that I’m a good skier. Still, I enjoy doing it, and I want to (and can) get better at it. Since I live in the Dallas area, I don’t get a lot of opportunities to ski – usually 1-2 trips per year – but I try to make the most of it when I do get to go.

When I first started skiing, I fell – a lot. The first time I went skiing, I took a half-day lesson before I got started, after which I assumed I’d be a decent skier. I was wrong. Beginner ski school is more of a lesson in logistics (how to put on and take off your skis, how to board and deboard the ski lift, etc.) than an exercise in actually learning how to ski. So my first trip down the mountain was rife with tumbles, lost skis, snow burn, and “WHY DO MY LEGS HURT SO MUCH??” On those first few runs, I spent more time in the horizontal than the vertical. Because I had to get out of the snow and put my skis back on every few hundred yards, just completing each run was a slow, painful, exhausting process.

Toward the end of that day, I recall that I had a particularly nasty fall after inadvertently crossing my skis (again). I was exhausted, embarrassed, and hurting. All I wanted to do was to lie there, gather my thoughts, and let some of the pain subside. My friend, an experienced skier, was skiing behind me and stopped to help. When I told him I just wanted to lie there for a minute, he told me, “Get up. Lying in the snow just makes it worse.” Grumbling, I got up and continued the slow trek down the mountain.

I’ve thought about my friend’s advice a lot since then. As a skier, I’ve come to find that his words were quite true. Why?

  • Simply lying in the snow after a spill makes you colder, and the cold coupled with inactivity makes it physically more difficult to get up.
  • It’s easier to talk yourself out of continuing when you’re lying there feeling sorry for yourself.
  • You’re physically in danger from other skiers crashing into you from behind.

But this statement doesn’t just apply to skiing. We can all use this advice in our careers and business relationships as well. Many of us have had some nasty spills in our careers, being on the wrong end of business failures, professional quarrels, terminations, and other career maladies. Personally, I can completely empathize with the desire to “just lie in the snow” after a career setback. I’ve been guilty of indulging the instinct to just stop moving forward, attempting to soothe those aches using self-doubt and self-pity, after a career setback. But just like the skiing analogy, such behavior only makes the situation worse. Refusing to move forward after going topsy-turvy will almost certainly impact your relationships and career prospects. Sometimes it hurts to get up and keep moving forward, but simply lying in the snow hurts even more.

Every failure, on the ski slope or in the cubicle farm, requires some small amount of time to regroup. But the key objective is to keep moving forward, even if it hurts to do so at first.

Advanced SSIS Training in Dallas

I’m very excited to offer a new course entitled “Advanced SSIS” in the Dallas area this spring. My friend and colleague Andy Leonard and I will be delivering this new 3-day course March 9-11, 2015 at the Microsoft offices in Irving, Texas. This course is intended for those who have experience using Integration Services who are looking to take their skills to the next level. In this course, we’ll each be sharing experiences and war stories from 10+ years in the data integration space.

Among the topics we’ll cover:

  • Data flow internals
  • Performance design patterns
  • Learning how to fail properly
  • Security in SSIS
  • Deployment design patterns
  • Data quality
  • Metadata management patterns
  • Biml
  • Known limitations of SSIS, and workarounds for them
  • ETL data edge cases

Also, there may or may not be stories of chicken farming.

We’ll bring the coffee and lunch for all three days. All you need to bring is your experience using SSIS and your readiness to advance your data integration skills.  Space is still available, and early bird pricing is in effect until February 6th. If you’ve got a team of folks who would benefit from this training, contact me about a group discount. If you’re coming in from out of town, there are lots of hotels close by, including the NYLO right next door and the Hampton Inn about a mile away.

Feel free to contact me with any questions. We hope to see you there!

On Failure

“Well, there’s another way not to make a lightbulb.”

The first rule of blogging is that you should write about topics you know a lot about. And I know a lot about failure. This post will be the first in a series on the topic, through which I’ll share a few of my own failures and how I’ve done my best to use them to my benefit.

In almost every context, the word fail is a negative:

  • Last night’s database backup failed.
  • Our data warehouse project was a failure.
  • We failed to close a deal with this prospect.
  • The boss failed to live up to his promise.

Failure means that something wasn’t done, or was done incorrectly. Failure is a missed deadline. It is a lack of planning, or misplaced trust. Failure is a lost parcel, a lost customer, or a lost cause. It is a business ending, a marriage dissolving, a career plan torn to shreds.  And it’s also an inevitable part of life.

I don’t consider myself an expert on failure, but I’ve experienced enough failures – both large and small – that I can speak with some measure of authority on the topic.  I’ve lived through multiple divorces of parents and grandparents. I’ve lived in poverty on the wrong side of the tracks. I nearly got fired – on multiple occasions – from my first job because of my immaturity and a bad attitude. I dropped out of college in the middle of a semester (and failed to withdraw, of course) and received grades commensurate with dropping out in the middle of a semester. I invested years in preparing for a career I’d dreamed about since junior high school only to discover that I didn’t want to do that anymore. I started a business which failed in under 2 years. I’ve missed out on dozens and dozens of business and career opportunities due to my own procrastination. And those are just the high-level failures I can think of off the top of my head that I’m willing to share – there are many more that I’ve forgotten, and some others are frankly too embarrassing to blog about.

But the beautiful thing is that I’m still here. I’m alive, I’m employed, I’m healthy, and I’m sane (stop laughing – I really am). But even more importantly, I’ve learned that failure is a part of life, and more specifically, it’s a part of my history. For every failure I experienced, for every hardship I brought on myself, I learned something. And because I still fail, I’m still learning.

I don’t know if there’s value to anyone else in my sharing this information. So in that way, this post may be a failure. Except that it won’t be.  Even if neither of the people who subscribe to my blog gets any value from this, I will have learned something from writing all this down. And at a minimum, I’ll have something that I can refer to on those days after I’ve had a particularly large failure and need a reminder that I haven’t failed in vain.

I realize that some of this may resemble bumper-sticker logic. I promise not to go all-out Tony Robbins on you, but here are a few of the points I’ll cover in this series.

  • Failure is necessary for growth. Not unlike the muscle-building process, to build we must first destroy. Failure is a little bit of destruction, but managed properly, will lead to personal and career growth.
  • Failure of integrity. This is the worst and most destructive kind of failure. How do you get past this?
  • Failure through inaction. Failing to seize an opportunity is a huge source of regret for many (this guy included).
  • Respond properly. You’ve got to know how to respond to failure (yours and that of others) to be able to properly manage it.
  • If you’ve not failed in a big way, you’re not taking enough chances. This is where I’ll tell you all about my business failure and what I learned from it.
  • Failure doesn’t have to be fatal. Failure is not the end of the line. It’s an obstacle in the road.
  • Failure demands both forgiveness and accountability. Learning to forgive failures (especially your own) is critical, but there must be accountability as well.

I’m not necessarily proud of my failures, but I try to remind myself every day to use those failures as a way to do it better next time.

Six practical tips for social media success in 2015

Social media is the new résumé.  In many ways, it’s even better than a résumé – a person’s social media stream can reveal attitudes, biases, and deficiencies that wouldn’t dare appear on a résumé.  Your online thoughts – blogs, Instagram pictures, tweets on Twitter, posts on Facebook, among others – help to make up the digital you, which friends and strangers alike will use to assess who you are and what you can contribute.  The things you share on social media become part of who you are.

Even more importantly, there’s a permanence to social media content that requires us to pay special attention to anything posted on the Internet.  There’s no Undo on the Send button; once you publish something to the Internet, it can be there forever.  Remember that potential clients and employers will most likely review your social media activities before making a hiring decision; in fact, a recent survey of human resources personnel revealed that over 90% of respondents looked to social media when checking out a candidate.  Even if you’re not looking for a job, consider that what you post today may still be around for years afterward.  Sure, you can edit or delete content or restrict its privacy settings, but have you read the terms of service for the medium on which you’re sharing that information?  In some cases, content you share online may be used in ways you don’t expect, according to the provider’s terms of service.  The bottom line is that privacy settings and deletion won’t necessarily keep your content private, so think twice before posting angry rants or NSFW after-hours photos.

With that, here are a few basic rules I try to follow when posting to social media.

Don’t write anything in the heat of the moment, especially if you’re hurt or angry.  Intense emotion often leaves logic behind, and those types of posts tend to be the ones you regret.  If you routinely find yourself posting to social media and later editing or deleting those posts, you might have a problem with this.  Things posted on social media can have a long life span, even when the original media is deleted.  The few minutes of satisfaction you get from sharing that angry tweet, Facebook post, or blog post might cost you years of embarrassment.  Take an hour and walk around the block before you post in an emotional state.

Find your pace.  Everyone has their own speed at which they share on social media.  Some will write a new blog post almost daily, while others do so just once or twice a month.  There are folks who post to Twitter a dozen times each day.  These are all acceptable, but the most important thing to remember is to be consistent.  Don’t publish a dozen blog posts in January and then stop blogging for the year.  Your audience, however large or small, will follow you in part because of your volume and velocity.  Find a pace that you’re comfortable with, and most importantly, that is sustainable for the year.  The right scheduling tool can help with this, especially when the amount of time you have to devote to social media can vary from week to week.  (As a sidebar, I use HootSuite, though it’s just one of many such tools available, many of which are free.)

Check ur grammar.  I’ll admit it – I’m dogmatic when it comes to proper grammar and spelling, and I evaluate the quality of social media entries based in part on those criteria.  If your posts are littered with misspellings and grammatical errors, you could end up being passed over for a job or a gig.  It’s a fact that some folks are simply more attentive to this than others, so if you struggle with spelling and grammar, find a trusted adviser to proofread your posts (especially longer and more permanent compositions, such as web articles and blog posts).

Rid yourself of negative influence.  The things you read will affect how you write, and negativity breeds negativity.  You know the type – the blogger who complains about everything, the person on Facebook who’s all about drama, or the Twitter follower who’s always posting in anger.  I carried out a social media purge recently, either hiding or completely removing some folks who were constantly angry and negative.  Following people who post a constant stream of bile will almost certainly affect your mood and attitude, and is an unnecessary distraction.  Don’t disengage from someone over one online rant, but if they demonstrate a pattern of this behavior, cut ‘em off.

Have conversations.  Your social media presence can be an advertisement, an online résumé, and a series of conversations.  Don’t neglect the last one!  You don’t want to be known as someone who simply broadcasts without listening.  The more you establish yourself as an expert on a topic, the more folks will want to chat with you, whether it’s to ask for advice, share an idea, or simply to get to know you.  While you don’t have to engage with everyone who reaches out to you (see the prior bullet), it’s usually best to err on the side of openness.

Last and most importantly, be you.  Don’t look to mimic someone else’s blog posts, tweets, or Facebook activity.  Your followers will read what you write because it’s yours, not because it resembles that of someone else in the community.  In fact, being different is a good way to gain even more followers; if you’re writing about things few other people are writing about, or if you’re approaching it on a level or from a perspective others aren’t, you’re likely to be different enough from the crowd that people will seek out your content.

Everyone uses social media differently, and each of us will have our own set of internal guidelines on what to post.   Just remember that your social media stream becomes an extension of, and a window into, your personality.  Take care in what you share, pace yourself, and be accessible.

[OT] Blog is in maintenance mode

I’m spending part of this holiday break repaying some technical debt on my website.  Among other things, I am importing some old content that I never brought over when I did my migration to WordPress a few years ago.  Most of the content I’m bringing over is old (most of it is at least 5 years old), and I’m adding it to this site to integrate all of my content, both recent and historical, in one place.  I don’t expect that the old posts will show up in RSS readers as new content.

In addition, I’m planning to try out a new WordPress layout.  I’ve been using the same simple (read: dull) theme for a while now, and I’d like to dress it up just a bit.

On Perspective

Perspective can make or break a career.  Maintaining a proper perspective is very often the differentiating factor between a good technologist and an incredible one.

In my 15-ish years in IT, I’ve said a lot of dumb things.  Many of them I’ve forgotten, but I can’t shake the memory of one particular phrase I uttered more than a few times back in the early days of my career.  Even today, it still embarrasses me that I ever had the mindset to say these words about other people:

“… those stupid end users …”

Yep. I said that.  Why would I say those words?  Sure, there was some emotion and frustration involved, but even more than that, my perspective was all wrong.  Being new to the IT field, my expectation was that it was our job as technical professionals to dictate standards and practices, and that the end users we supported would modify their business processes and their workflow to match those standards.  I looked at most business problems as the fault of the users for not following our standards, or not using their software tools properly.  Looking back on 15 years of experience, it seems silly that I would have ever held that position.  But in my (at the time) limited field of vision, this was my expectation.

Fast-forward a few years.  With a little experience under my belt, my perspective had changed.  Through a few hard lessons, I had evolved to the point that I fully understood that my principal function as a technical professional was to serve the business, not the other way around.  My attitude significantly improved, and I became a more proficient technical professional.  But my perspective still had one significant shortcoming: I simply solved the business problems that were brought to my attention.  Sure, I had my technical resources in order – my backups were always done and tested, my code used common best practices and was checked into source control, and I did my best to get out in front of performance issues before they ballooned into bigger problems.  But I still considered business problems to be outside my purview until my assistance was specifically requested.  My perspective was limited in that I was still trying to be a technical professional, rather than focusing on being a business professional solving technical problems.

I still remember when it finally clicked for me.  I’d been working in the industry for about four years, and after multiple rounds of meetings to solve a particular business problem, it hit me: my perspective is all wrong.  I’ve been looking at this from the perspective of “Tell me your problem and I’ll fix it,” when the dialog should have been “Let me understand what you do and what you need so we can address our problems.”  That’s right – it’s not that those end users have business problems.  It’s that we have business problems and we need to solve them.  There’s nothing more comforting for a nontechnical person to hear, and rarely a statement more empowering for a technical person to make, than a sincere expression of “I feel your pain. Let’s solve this together.”  This is true whether you’re tasked with front-line technical support, you’re working deep in a server room, or you’re a senior consultant in the field.

I believe a person can be a moderately successful technologist by focusing strictly on understanding and solving technical problems.  Where one becomes a rockstar problem solver is the point at which he or she has the experience and maturity to see things through a perspective other than his or her own, while understanding and feeling the pain points of others.

SSIS Parent-Child Architecture in Catalog Deployment Mode

This is the third in a series of posts about SSIS parent-child architecture.  You can find the index page here.

In my previous posts on SSIS parent-child package architecture, I described the benefits of the parent-child package design pattern and demonstrated the implementation of such a pattern in package deployment mode.  In this post, I will demonstrate the parent-child pattern for implementations using catalog deployment mode, which is the default design on SSIS 2012 and 2014.

Catalog deployment mode in SSIS

If you’re reading this post and find yourself asking, “What is catalog deployment mode?”, here it is in a nutshell: Starting with SQL Server 2012, there were significant changes to the architecture of SSIS, most notably the move to a deployment/storage structure called catalog deployment model (which is also frequently referred to as the project deployment model).  In this model, SSIS code is more project-centric than package-centric; packages are deployed as an entire project instead of individually (though each package can still be executed independently).  Catalog deployment mode in SSIS also brought the addition of parameters, which can be used to externally set runtime values for package executions, as well as project connections, which can be used to easily share connection settings across all packages in a project.  Many other changes were introduced, including a simpler logging model and a dedicated SSIS database.

Among the many changes brought about by the catalog deployment model, the one that had the most impact on the parent-child design pattern is the addition of parameters.  In older versions of SSIS, it was possible to pass runtime values to a package, but the process was clunky at best.  When using SSIS 2012 or 2014 in catalog deployment mode, setting runtime values for a child package (or a standalone package, for that matter) is much easier and more straightforward than performing the same task in previous versions.

It is also worth noting that you don’t have to use the catalog deployment model in recent versions of SSIS.  Although catalog deployment model is the default setting in SSIS 2012 and 2014, you can set your project to use the package deployment model.  You’ll lose many of the new features (including parameterization and simplified logging) by choosing package deployment model, but this might be practical if your organization has made a significant investment in SSIS architecture that would be broken by moving to catalog deployment model.

Parent-child package execution

At the heart of parent-child architecture is the collection of parameters.  In catalog deployment mode, we can set up parameters at the package level or at the project level.  For values that would affect just a single package, using a package parameter would make the most sense.  However, if a value might need to be shared among several (or all) packages in a particular project, a project parameter would allow you to create the parameter once for the entire project rather than one for each package.

Execute package task

When executing a child package, the simplest method is still the execute package task.  In SSIS 2012, the execute package task gained a dropdown list (shown below, on the Package tab) that allows the SSIS developer to specify the target package.

image

There are a few limitations with this approach.  Most notably, this dropdown list selection only works when calling a package that exists in the same project.  You’ll notice that the selection above the package name, entitled ReferenceType, is set to Project Reference.  Though you can change ReferenceType to reference a package located elsewhere, oddly enough you can’t use it to execute a package in a different project deployed to the SSIS catalog (you can read more about that limitation, as well as upvote the issue on Connect, here).  I’ll discuss a couple of workarounds for this momentarily.

Clicking over to the Parameter bindings tab, we can specify which values to pass into the child package.  For each child package parameter, we specify exactly one value to be supplied at runtime.  Remember, like the dropdown list for package selection, this piece only works when executing packages in the same project (using the Project Reference setting on the ReferenceType from the Package tab).

image

Keep in mind that you have to use a parameter or variable (again, choosing from the dropdown list) to map to the child parameter.  You can’t simply type in a static value in the Binding parameter or value field.  Also, remember that you will only see package parameters (not project parameters) in the list of child package parameters that may be mapped.  This is by design – it wouldn’t make sense to map a value to a project parameter when executing a package in the same project, since that child package would already implicitly have access to all of the project parameters.

Another distinct advantage of using the execute package task is the process for handling errors in the child package.  In the event that a child package fails, the execute package task will fail as well.  This is a good thing, because if the child package does fail, in almost all cases we would want the parent package to fail to prevent dependent tasks from improperly executing.  Even better, error messages from the child package would be bubbled up to the parent package, allowing you to collect error messages from all child packages within the parent package.  Consolidated error handling and logging means less development time upfront, and less maintenance effort down the road.

If you have the option of using the execute package task for starting packages stored in the SSIS catalog, I recommend sticking with this method.

Execute SQL task

Another method for executing one package from another is by using the T-SQL stored procedures in the SSIS catalog itself.  Executing a package in the SSIS catalog in T-SQL is actually a 3-step process:

  • Create the execution entry in the catalog
  • Add in any parameter values
  • Execute the package

Catalog package execution via T-SQL, another new addition in SSIS 2012, allows us to overcome the limitation in the execute package task I mentioned earlier.  Using a T-SQL command (via the execute SQL task in SSIS), we can execute a package in any project.  It’s certainly more difficult to do so, because we lose the convenience of having the list of available packages and parameters exposed in a dropdown list in the GUI.  Here there be typing.  However, being able to execute packages in other projects – and for that matter, on other SSIS catalog servers entirely – somewhat makes up for the manual nature of this method.

To execute a child package using this method, you’d create an execute SQL task and drop in the appropriate commands, which might look something like the following:

DECLARE @execution_id BIGINT

EXEC [SSISDB].[catalog].[create_execution] @package_name = N'ChildPkgRemote.dtsx'
	,@execution_id = @execution_id OUTPUT
	,@folder_name = N'SSIS Parent-Child'
	,@project_name = N'SSIS Parent-Child Catalog Deployment - Child'
	,@use32bitruntime = False
	,@reference_id = NULL

-- Set user parameter value for filename
DECLARE @filename SQL_VARIANT = N'E:\Dropbox\Presentations\_sampleData\USA_small1.txt'

EXEC [SSISDB].[catalog].[set_execution_parameter_value] @execution_id
	,@object_type = 30
	,@parameter_name = N'pSourceFileName'
	,@parameter_value = @filename

-- Set execution parameter for logging level
DECLARE @loggingLevel SMALLINT = 1

EXEC [SSISDB].[catalog].[set_execution_parameter_value] @execution_id
	,@object_type = 50
	,@parameter_name = N'LOGGING_LEVEL'
	,@parameter_value = @loggingLevel

-- Set execution parameter for synchronized
DECLARE @synchronous SMALLINT = 1

EXEC [SSISDB].[catalog].[set_execution_parameter_value] @execution_id
	,@object_type = 50
	,@parameter_name = N'SYNCHRONIZED'
	,@parameter_value = @synchronous

-- Now execute the package
EXEC [SSISDB].[catalog].[start_execution] @execution_id

-- Show status
SELECT [status] AS [execution_status]
FROM [SSISDB].[catalog].[executions]
WHERE execution_id = @execution_id

Two things in particular I want to point out here.  First of all, by default when executing a package using T-SQL, the package is started asynchronously.  This means that when you call the stored procedure [SSISDB].[catalog].[start_execution], the T-SQL command will return immediately (assuming you passed in a valid package name and parameters), giving no indication of either success or failure.  That’s why, in this example, I’m setting the execution parameter named SYNCHRONIZED to force the T-SQL command to wait until the package has completed execution before returning.  (Note: For additional information about execution parameters, check out this post by Phil Brammer).  Second, regardless of whether you set the SYNCHRONIZED parameter, the T-SQL command will not return an error even if the package fails.  That’s why I’ve added the last query in this example, which returns the execution status for the given execution ID.  I can use this to check the status of the child package before starting any subsequent dependent tasks.

image

As shown, I’ve set the SQLStatement value to the T-SQL code block I listed above, and set the ResultSet value to Single row, the latter of which will allow me to capture the output status of the executed package.  Below, I’ve set that execution status value to a new package variable.

image

To round out this design pattern, I set up my control flow as shown below.  Using precedence constraints coupled with SSIS expressions, I execute the package and then check the return value: a successful catalog execution returns a value of 7, and my parent package handles any return value other than a 7 as a failure.

image
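
For illustration, if the single-row result were captured into a package variable named vExecutionStatus (a hypothetical name – use whichever variable you mapped the result set to), the expressions on the two precedence constraints might look something like this:

Success path:  @[User::vExecutionStatus] == 7
Failure path:  @[User::vExecutionStatus] != 7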

You may also have to give special consideration to errors in child packages when using T-SQL for package execution – especially when running packages interactively in the BIDS/SSDT designer.  Since the T-SQL command does not report the failure of a package by default, it also doesn’t “bubble up” errors in the traditional SSIS manner.  Therefore, you’ll need to rely on capturing any child package error messages from the SSIS catalog logging tables, especially when developing and testing packages in Visual Studio.
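
As a starting sketch (assuming @execution_id still holds the execution ID created above, and that the package ran with a logging level that captures errors, such as the Basic level used earlier), a query along these lines will pull any error messages the child package logged to the catalog:

-- Retrieve error messages logged by the child package execution
-- (message_type 120 = errors; 110 = warnings)
SELECT em.message_time
	,em.package_name
	,em.message_source_name
	,em.[message]
FROM [SSISDB].[catalog].[event_messages] AS em
WHERE em.operation_id = @execution_id
	AND em.message_type = 120
ORDER BY em.message_time;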

Script Task

It is also possible to execute SSIS packages programmatically from the script task.  This method is significantly more complicated, but also offers a great deal of flexibility.  A fellow SQL Server MVP and acquaintance of mine, Joost van Rossum, has a detailed blog post on how to programmatically execute SSIS packages from the SSIS script task.  I won’t restate what he has already covered in his comprehensive and well-written post on the topic, but if you need to programmatically fire child packages in the SSIS catalog, check out his write-up.

Conclusion

In this post, I’ve covered the essentials of executing child packages in the SSIS catalog, including provisioning for errors in the child package.  Though it can be quirky, especially when executing child packages in a different project or on a different server, there are several different ways to address this parent-child design pattern.

In my next post, I’ll talk a little more about passing values in a parent-child package, and will illustrate how to pass values from child packages back to the parent package.