SSIS: Conditional File Processing in a ForEach Loop

I’ve fielded a number of requests recently asking how to interrogate a file within SSIS and change the processing rules based on the metadata of said file.  A recent forum poster specifically asked about using the foreach loop to iterate through the files in a directory and, on a per-file basis, either process the file or skip the file if it was updated after a specific date.  I’ll use that most recent request to illustrate one method to solve this problem.

Ingredients

For this demonstration, our SSIS package will require the following:

  • A foreach loop to process each file in a given directory
  • A script task to interrogate each file and, based on the timestamp, mark it to be either processed or skipped.
  • Four SSIS variables:
    • @SourceDirectory (String) – stores the directory to loop through
    • @MinDateStamp (DateTime) – indicates the earliest date to process
    • @Filename (String) – stores the current filename for each cycle of the foreach loop
    • @ProcessFile (Boolean) – a flag to indicate whether the current file should be processed
  • A precedence constraint which will be configured to validate an expression and task outcome.
  • A data flow task to process the validated files.


Set Up the Loop

Nothing groundbreaking here: after adding the foreach loop to the control flow pane, set it to work as a Foreach File Enumerator, and use an expression to set the source directory to be derived from the value of the @SourceDirectory variable:

img2

 

Script Task

Since there is no native SSIS task designed to interrogate file metadata, we’re going to need to use a script task to do this.  After dropping a script task from the toolbox into the foreach loop container, we’ll edit the script to create a FileInfo object as a logical hook to the file.  After confirming that the file exists, we’ll compare its LastWriteTime property to the earliest acceptable cut-off date (defined by the @MinDateStamp variable value) to determine if the timestamp meets the criteria for processing.  Based on the results of that comparison, we will set the @ProcessFile value to either True or False.  You can see the resulting code logic in the snippet below:

 

public void Main()
        {
            // Create a logical file object 
            System.IO.FileInfo theFile = new System.IO.FileInfo(Dts.Variables["Filename"].Value.ToString());

            // If the update date on the file is greater than the date specified in the MinDateStamp
            //  variable, set the variable flag to process the file.    
            if (theFile.Exists 
                && theFile.LastWriteTime > DateTime.Parse(Dts.Variables["MinDateStamp"].Value.ToString()))
            {
                // MessageBox.Show("Processing file " + Dts.Variables["Filename"].Value.ToString());
                Dts.Variables["ProcessFile"].Value = true;
            }
            else
            {
                // MessageBox.Show("Skipping file " + Dts.Variables["Filename"].Value.ToString());
                Dts.Variables["ProcessFile"].Value = false;
            }

            Dts.TaskResult = (int)ScriptResults.Success;
        }

 

So with each iteration of the foreach loop, the @ProcessFile value will indicate whether the current file should be processed or skipped.  After adding a data flow task containing the necessary components to process the flat file, our next step would be to add a precedence constraint connecting the script task to the new data flow task.  This precedence constraint will be configured to use an expression and a constraint, and will confirm that the current file is to be processed by interrogating the value of the @ProcessFile variable.  If that value is true, then program flow continues to the data flow task; otherwise the loop starts again with the next file in turn.  The precedence constraint would be configured as such:

img3

 

After configuring all of the necessary tasks for this operation, the data flow pane should look similar to the following:

img1

Now, when the SSIS package is executed, the timestamp of each file in the specified directory will be checked, and only those that meet the date criteria will be processed in the data flow task.  Note that you could replace the timestamp in our example to some other file criteria; for example, you could check the file size, type, attributes, or other settings to determine if the file should be processed.

 

Conclusion

Although SSIS does not include a native component to conditionally process files, you can see from this example that a simple script can easily solve this ETL challenge.  You can download the sample package used in this example here.

SQL Saturday 28 Baton Rouge – Recap

Last weekend I went to Baton Rouge (by way of New Orleans) to attend and speak at the second annual SQL Saturday event there.  This event was my third of four scheduled SQL Saturday of the summer.

DSC06583 I flew out of Love Field in Dallas with fellow Dallas SQL Server pros Sean McCown and Tim Costello, and we met up in New Orleans with Trevor Barkhouse.  We rented a car and joined Steve Jones for lunch at Primo’s in the French Quarter of New Orleans.  After our late lunch, we made the 90 minute trek over to Baton Rouge, checked in to the hotel, and arrived fashionably late at the speaker/volunteer party.  I ate dinner with fellow Kevin Boles and a few of the sponsors, and got to spend some time with my friend Patrick LeBlanc (who organized this event) as well as Chris Bahnsen from the Pensacola SQL Server user group and Bryan Smith from Microsoft.

Patrick asked me to present one of the early bird sessions, so I volunteered to deliver a 7:30am session entitled “Building Your First SSIS Package”.  For this trip I shared a car with three other people, and unfortunately I underestimated the effort required to coordinate getting four people on the road by 7:00am.  As a result, we arrived at LSU at exactly 7:30 and my session started about 5 minutes late.  We got moving quickly and had about 25 people in attendance, most of whom got engaged in the Q&A at the end.  I was a little surprised that one of the volunteers handed me the session evaluations at the end; all of the other SQL Saturday events I’ve spoken at have collected these and sent me a summary of the feedback later. 

My second session was centered around using expressions and configurations in SSIS.  I always enjoy this presentation because I’ve encountered a lot of SSIS folks who avoid SSIS expressions, and and even more that don’t use SSIS configurations.  As a result, there are usually a few “A-ha!” moments during the demos, and this one was no exception.  Another good group, and lots of useful discussion during and after the presentation.

DSC06587 This event featured an Ask the Experts table, and I was asked to be a panelist during one of the afternoon sessions.  We didn’t have a lot of participation from attendees, largely because this was offered during the sessions when most attendees were already settled somewhere.  I think this was a good idea, but I’d like to see this offered as an all-day track and published on the event guide.  I’m considering this as an addition to our next big SQL Saturday event in the spring.

The attendee party was fun, even though there weren’t a lot of people in attendance.  Several of us broke out into a discussion around user groups, PASS, and the future of SQL Saturday.  Good times.

Patrick and crew did a great job organizing this year’s event.  I’m looking forward to coming back again next year.

It’s the Process

Unless you’ve spent the last two days hiding under a rock, you’ve heard the uproar around this week’s development in the PASS board election.  The Nominating Committee (NomCom) published the list of candidates who will be on the ballot for this year’s three open board positions.  Five candidates made the cut; two did not.  The buzz – mostly negative – wasn’t around who made the final list, but who didn’t.

Jack Corbett

Let me first give props to candidate Jack Corbett, one of the two candidates who were eliminated with yesterday’s announcement.  I’ve known Jack for a couple of years now, and consider him to be a friend so you can take my commentary with a grain of salt.  I’ve spent a good deal of time with Jack (including rooming with him during last year’s PASS Summit), and I have found him to be a wise person with strong morals who is not afraid to speak up when he sees things out of place.  Jack runs a large PASS chapter, is a frequent speaker and volunteer for PASS and non-PASS events, and has a passion for the SQL Server community.  He is highly motivated and qualified, and I believe Jack would make an excellent PASS board member.  I say that as a reminder to all that, in the chaos surrounding the other excluded candidate (more on that momentarily), let’s not forget that another good man was eliminated yesterday.  I would have liked to had the opportunity to vote for Jack; I hope that opportunity comes up again in the future.

Steve Jones

The elimination of Steve Jones sparked a firestorm around those of us who are, shall we say, vocal in the community.  Based on the rejection letter he received from the NomCom (he has shared it here), it seems that Steve was rejected due to his inexperience engaging with voluneers and PASS committees.  I can’t speak to Steve’s involvement – or lack thereof, if that’s the case – with PASS committees, so I won’t argue that point.  Perhaps he wasn’t qualified according to the rules given to the NomCom for candidate selection.  Based on comments from Stuart Ainsworth (read his blog post on the subject here) and Tom LaRock, the NomCom feels that they followed their commission to the letter.  Maybe they did – again, there’s a bit of secrecy surrounding the proceedings, so unfortunately we may never know how or why exactly this happened.  Still, it’s hard to imagine that a man of Steve’s accomplishment – cofounder of the largest SQL Server community in the world, prolific speaker, frequent volunteer, and industry visionary – wouldn’t be qualified to present to the PASS membership as a candidate for a board position.  Steve is a friend so I confess some personal bias, but for a reasonable person, an evaluation of the facts just doesn’t add up.

NomCom: “It’s The Process”

I applaud Stuart and Tom for standing up and defending their position.  As members of the NomCom, they could have ducked the questions or simply told inquisitive members of the community to take a hike.  They, and most especially Stuart, answered many questions from the community and did not try to assign blame or back away from their decision.  Still, their default position seemed to be “It’s the process”.  This is how things are, so who are they to change things, right?

So if this really is how things are done, at what point do we ask, “Does this process still work for us?”  For those who were engaged in last year’s election, you’ll remember the controversy when at least one strongly qualified candidate was rejected while another was included with little to no knowledge of the purpose or goals of PASS.  It’s hard to describe the last two PASS elections without using the word cluster.  Doesn’t anybody stop to think that maybe, just maybe, these pants don’t fit anymore?  I’d have hoped that last years debacle would have left a lasting impression, but sadly those lessons appear have been forgotten.  When common sense frequently gets overruled by The Process, it’s time for change.

Power to the People

So here’s my suggestion:  For next year, have the NomCom continue to exist, but take a far more conservative approach in elimination of candidates.  Do the candidates have a history of service and experience within PASS?  Do they have a sense of passion, and some level of leadership experience?  If so, put them on the ballot and let the community make the decision.  Remember, the NomCom doesn’t decide who will be on the board; they choose who may be elected to the board by PASS members.  This organization should be focused first on the constituency.  Let’s get back to that place.

For this year, however, we still have a problem.  This will not die; if nothing is done, the election will be the dominant hallway topic at the PASS Summit in November.  So, does PASS admit the flaw in The Process and invite all 7 (or even the original 9) to be on the ballot?  Or do they allow for a write-in vote as others suggest?  In my opinion, the former would go a long way toward smoothing things over with the community.  The latter is a half-measure and would be an insult to the excluded candidates without an acknowledgement that The Process is broken.  These aren’t the only two options, of course, but anything is better than hiding behind The Process.

Dallas-area NTSSUG Happenings

First, apologies to those of you outside the Dallas-Ft. Worth metro area… I’ve backed off from using my blog as a mouthpiece for local announcements, but we’ve got a couple of things going on that need some additional publicity.

Mailing List Changes

Those who are engaged with our e-mail distribution list know that we’ve had some issues with our listserv for several months.  Spam has gotten out of control, and messages were delayed by several hours (or simply never arrived) on more than one occasion.  Last week, it was brought to our attention that a number of people on the list had stopped receiving messages completely.  We contacted our host to report this most recent occurrence, and their staff responded with news we weren’t quite ready for: they were going out of business, and we were given 30 days notice to move to another provider.  Fortunately, my fellow board members and I had already been discussing a new group list provider anyway, and had begun a pilot program to test Google Groups as an alternative.  We were able to get the word out to group members, and sent out invites from the new list.  We also created a second group just for announcement, to separate the general group discussions from administrative announcement and group reminders.  If you’d like to sign up for one or both of the groups, visit our website for instructions.

Leadership

We have a problem – our group has grown!  A great problem to have, to be sure, but it also means that there’s more work to go around.  To accommodate our growth, the leadership has voted to add another position to the board.  We are accepting submissions for candidacy until noon on Thursday, and all interested candidates are invited to briefly address the group during the monthly meeting on Thursday.  Members are invited to vote starting Friday, and voting will continue for a week until the following Thursday at midnight.  We will announce the new board member on Friday, August 27th.  If you’re active with the NTSSUG group and are interested in being part of the leadership, consider joining the board!  You can find full details here.

… And a special guest for January

Yeah, I know it’s quite a while away, but put the January NTSSUG meeting on your calendar.  We’ve got a special guest coming in who I’m sure you’ll enjoy meeting.  Don’t miss it!

SQL Saturday 28 Baton Rouge and Ask the Experts

The second annual SQL Saturday Baton Rouge event is coming up in less than two weeks!  Have you registered?

I’m honored to be speaking at this event again this year.  I’ll be presenting an introductory SSIS class (one of the 7:30 early bird sessions).  Later in the morning I’ll be delivering a talk on expressions and configurations in SSIS, one of my favorite topics.

This event will feature a new track that has not been tried at a SQL Saturday event yet, at least as far as I know.  Patrick LeBlanc, the event organizer (and recent MVP awardee!) added a series of sessions entitled “Ask the Experts”, in which attendees can sit down with experts in a particular discipline and discuss just about anything.  Among the experts participating are Steve Jones, Denny Cherry, Kevin Boles, Bryan Smith from Microsoft, and others.  I will be cohosting the SSIS/ETL/Data Warehouse hour with Steve Simon, so please stop by if you want to talk about anything BI related, or if you just want to say hello.

What Are You Afraid to Ask?

There are some questions that can’t necessarily be answered in a book or a training video.  Those are often the questions revolving around “why?” rather than “how?”.  A prime example from my own experience is my lack of understanding of the criticality of transaction isolation levels.  As a very green SQL Server professional (and in this case, I define professional as the one who knew where to find the SQL Server 2000 disks), I had learned enough about transactions and the different isolation levels that I understood, at a high level, what these terms meant.  However, at the time I worked in a small IT shop where there was only one SQL Server installation, and the number of users on that database system was quite small.  Through the myopia of inexperience, I couldn’t understand why there was so much focus on dirty reads, phantom reads, concurrency, or even transactional rollbacks.  Envisioning only my current environment, it was difficult to conceptualize that more than one of my 20 or so users could be accessing the same record(s) at the same time.  So while I understood the basics of “how”, the “why” was elusive to me.

Talking with others who work in IT, I’m constantly reminded that there is a community around SQL Server that seems to be unmatched in any other sector.  I see everyday where SQL Server professionals go out of their way to help out others with no expectation of repayment.  Still, as altruistic as this community often is, I think there’s still a barrier for some of those who are new to a role in supporting SQL Server.  Those of us who have been around for a while tend to speak in language that is highly specific to our profession, and we sometimes assume that all of our fellow professionals possess a certain baseline of knowledge regardless of their experience (or lack thereof).  This can have the unintended effect of creating a barrier for the less experienced, who have a desire to learn but don’t want to appear completely clueless in front of more experienced colleagues, and as a result there are a lot of questions that go unasked.

Again, these are often questions of a nontechnical nature – after all, an inquisitive newbie can do a quick web search and find a blog, article, or even a video to describe how to accomplish a specific technical task.  The more difficult – and fundamental – inquiries are those that ask, “Why is this important?”  I think we need to have more open opportunities to have these types of questions asked and answered.

To spur on the conversation, I’m going to post a series of blogs that will chronicle some concepts I was afraid to admit that I didn’t quite get, along with the truths I’ve learned since.  Starting next week, you’ll see blogs tagged as “Afraid to Ask”, and I encourage those of you who are new to a role in SQL Server (or perhaps you’re new to a particular discipline within the product) to inject your own quandaries either as comments on this post or in a private message to me.  I’ll do my best to address all of the feedback in future blog posts in this series.