I’d like to share a networking success story.  Last year, I blogged about my experience at the PASS Summit of 2005, where I was essentially a wallflower and didn’t really do any networking.  Since then, I’ve realized its importance and have embraced professional networking as a key component in a successful career. 

Ever since then I have carried through on the lesson I learned, spending as much time as possible getting to know my colleagues, and lending them a hand whenever possible.  At the PASS Summit this past November, I got the chance to redeem myself from the lack of initiative from my trip four years earlier, and took the opportunity to get to know as many people as I could.  During lunch on the third day of the Summit, I met a fellow Dallas-area business intelligence professional who works for a small consulting firm in my area.  He mentioned that his company was looking to hire one or two more senior BI people, and I hinted that I was considering making a move.

To make a long story short, that encounter led to a few phone calls and a series of meetings with this company, and as of next week, I will be a permanent part of their team!  My new role at Artis Consulting will be as a business intelligence consultant, solving complex business data problems alongside some very sharp coworkers.  I’ve had the opportunity to spend a little time with all of the leadership and several of the staff members, and I’m very excited about this move and the new challenges that it will bring.

So back to the success story… Looking back at the events of the past couple of months, I don’t believe things would have ended up this way without the groundwork I laid through networking.  In the last few years, I’ve spent a good deal of time working with and getting to know the folks in my local SQL Server user group, which in part led to my leadership role within that group.  That leadership position helped me to meet and develop friendships with other SQL Server group leaders, and one of those relationships led directly to a friendly introduction to my initial contact at Artis, resulting in the interviews and eventually the new career with that company.  It's important to note that my new role at this company was not openly advertised as a vacant position, so I would likely not have found this opportunity through a traditional job search.  I do believe that there was a greater comfort level on both sides of the interviewing fence after we came together through a known and trusted common contact.

My recent experience is further proof that building professional relationships through networking is a great strategy for career improvement.  If you’re like I used to be – introverted, a bit shy, perhaps doubting the value of professional networking – I encourage you to take a chance and get to know some of your peers and colleagues.  Find a local user group in your area of expertise, and set a goal to meet X number of people.  Attend a local technology event such as a product launch or a SQL Saturday, and introduce yourself to others there.  Invite a colleague you don’t know to lunch or coffee.  Volunteer to be part of a team in events such as GiveCamp.  There’s nothing to lose!  The very worst thing that can happen is that you’ll meet some people you’ll never see again.  And often, things work out such that your networking contacts work together to change your career for the better.

"Fortune favors the bold.”  -- Virgil

A relatively common requirement in ETL processing is to break records into disparate outputs based on an alphabetical split on a range of letters.  A practical example of this would be a work queue for collections staff based on last name; records would be pulled from a common source and then separated into multiple outputs based on a the Customer Last Name field, with the resulting output going to the person or group responsible for working that alphabetical subset of data.

There are a couple of different ways you can do this.  First is to use separate sources for each range of characters, and specify in your SELECT statement only those values that you want.  This is an effective quick-and-dirty option, but it doesn’t scale well as it requires multiple round trips to the database.  You could also accomplish this task using a simple text comparison for each letter of the alphabet, but this method is a typing-intensive operation.  For example, let’s say you want to group together the records for customers whose last names falls in the A-F range.  Using the Conditional Split transformation, your A-F output expression would look something like the following:
 

SUBSTRING(UPPER(LastName), 1, 1) == "A"
|| SUBSTRING(UPPER(LastName), 1, 1) == "B"
|| SUBSTRING(UPPER(LastName), 1, 1) == "C"
|| SUBSTRING(UPPER(LastName), 1, 1) == "D"
|| SUBSTRING(UPPER(LastName), 1, 1) == "E"
|| SUBSTRING(UPPER(LastName), 1, 1) == "F"


Your other groups would contain a similar statement to explicitly define each letter to be included in the group.  Not a complex operation, but one that requires a lot of typing.

 

An Easier Way

An easier way to do this is to use the relatively obscure CODEPOINT() function.  This method, which is part of the SSIS expression language, returns the numerical Unicode decimal value of the leftmost character of the input string.  The above grouping would be rewritten as follows using the CODEPOINT() function:

CODEPOINT(UPPER(LastName)) >= 65
&& CODEPOINT(UPPER(LastName)) <= 70


The difference is, rather than enumerating each possible starting letter within the range, I’m now evaluating the Unicode value of the first character in my LastName text field, and only including those in the 65 to 70 range (A through F inclusive) in this output.  I’ve saved myself a little typing, and this approach is easier to maintain and troubleshoot in my opinion.  A sample conditional split with four groupings is shown below:


screen1

 

Take It Up A Notch

So you might ask, “That’s great, smart guy, but why go through this just to save myself maybe 5 minutes of typing?”.  I’m glad you asked!  Let’s take our example a little bit further and assume we’re breaking these groupings down into smaller units.  Consider the possibility that, rather than grouping last names together based on the first letter of the last name, we’ve got a sufficient number of outputs that we’re now splitting the records within that first letter; for example, if we were to split the data stream where the last name starts with an M, we might slice our outputs on those starting with MA to MI, then MJ to MR, and finally MS to MZ.  By using the direct comparison method described above, our fully configured conditional split could have up to 26^2 possible permutations, which means we’ve got to do 676 comparisons (assuming all uppercase alpha characters) within the conditional split transformation, which will likely impact your package performance, not to mention the immense amount of typing required to set this up.  Fortunately, some creative use of the CODEPOINT() function can simplify this ETL requirement.

For this example, let’s assume that we need to separate our records within the letter M into three distinct groups as mentioned earlier, since statistically there are a lot of last names beginning with M.  For each “M” output, I’m going to use an direct string comparison to verify that the first letter is an M (since we’re looking for a single match and not a range in the first character), and second, I’ll use CODEPOINT() in conjunction with the SUBSTRING() function to check that the second letter falls within the expected range for each output. 

So for our first M grouping, the MA to MI group, the following expression would be used:

SUBSTRING(UPPER(LastName), 1, 1) == "M" 
&& (
CODEPOINT(SUBSTRING(UPPER(LastName), 2, 1)) >= 65
&&
CODEPOINT(SUBSTRING(UPPER(LastName), 2, 1)) <= 73)


The code above will match records where the first letter is a literal M, and the second character is between A (Unicode 65) and I (Unicode 73) inclusive. Similarly, the MJ to MR expression reads as such:
 

SUBSTRING(UPPER(LastName), 1, 1) == "M" 
&& (
CODEPOINT(SUBSTRING(UPPER(LastName), 2, 1)) >= 74
&&
CODEPOINT(SUBSTRING(UPPER(LastName), 2, 1)) <= 82)


And finally, the MS to MZ expression:

SUBSTRING(UPPER(LastName), 1, 1) == "M" 
&& (
CODEPOINT(SUBSTRING(UPPER(LastName), 2, 1)) >= 83
&&
CODEPOINT(SUBSTRING(UPPER(LastName), 2, 1)) <= 90)

 

The partially configured conditional split transformation would look similar to the following:

screen2

So you can see that you’ve still got a small chunk of code to write (or copy/paste and modify) for each of your outputs, but it’s far less trouble – and better performing, no doubt – than enumerating all of the possible combinations of the first two letters of the LastName field.  The further you go into the string for your split (for example, breaking all the way down to split “McA” to “McF”, “McG” to “McN”, etc.), the more significant your efficiency in using this method over direct comparisons.

One caveat that bears mentioning: You’ll notice that I’ve used the UPPER() function generously in these examples.  The reason for this is twofold: First, a direct string comparison in the SSIS expression language is case sensitive; for example, “M” does not equal “m”.  Second, the same holds true for the Unicode decimal values returned by CODEPOINT().  Uppercase M, or Unicode value 77, does not equal lowercase m, or Unicode value 109.  Use of the UPPER() function helps to ensure that we’re making accurate comparisons regardless of case.


Conclusion

The CODEPOINT() is a rarely used function in the SSIS expression language, but can be an effective tool in your ETL arsenal in some cases.  For alphabetical grouping or splitting of records, it’s a very handy function that helps to reduce a lot of typing at design time.

More information about the CODEPOINT() function can be found at this page on MSDN.