Wrangling large or complex Excel workbooks in SSIS can be a challenge. From managing data types (more about that in this post by Koen Verbeeck) to addressing multiple worksheets in a single document, configuring SSIS to properly read from or write to Excel documents is tedious at best. While there are no silver bullets to completely solve these problems, I’ve found that – like many other challenges in the SSIS world – using Biml can help. Iterating through Excel worksheets with Biml can help automate some of this manual work.
In this post, I’ll demonstrate how you can use a few lines of BimlScript code to read through multiple worksheets in an Excel source.
For this scenario, I’m using a single Excel 2007 document containing numerous worksheets. All of the worksheets have a similar structure, each containing results from a different geographic area.
What I want to accomplish is to programmatically infer from the Excel document the name and structure from each worksheet, and create an SSIS data flow for each one. Doing so can save a great deal of effort, especially if there are many columns, many worksheets, or if the structure of each worksheet can differ from the others.
Iterating Through Excel Worksheets with Biml
The solution, using BimlScript, is designed as follows:
- Connect to the source Excel file and loop through the worksheets
- Create an SSIS data flow for each worksheet
- In each data flow, create an Excel source component to connect to the worksheet specific to that data flow
- In each data flow, create an OleDB destination component, and automatically map source columns from the Excel worksheet to the destination table
First, connect to the Biml file as shown below. Note that, if you haven’t already, you will need to install the Excel 2007 driver for this code to work.
This will create an array named worksheetCollection, which contains the names of each worksheet found in the specified Excel file. (Big thanks to Scott Currie of Varigence for sharing the syntax of this operation.)
Next up, I will iterate through this list of worksheets, building an SSIS data flow for each one. I’ll also point out that I could also create one package per workbook, as opposed to a single package with multiple data flows; however, for the sake of simplicity in demonstration, I’m using just one package with a separate data flow per Excel worksheet. As shown below, I’m creating a foreach loop in BimlScript, to loop through each worksheet name. In each iteration of the loop, I’ve created a data flow, with the same name as the worksheet.
The code above creates a connection to the Excel document, building the SELECT statement using the name of the worksheet as the table name. A couple of things to note here: First, this snippet uses an Excel connection named Accidents Source File, which I set up in a previous step (the code for which is provided in the download). Also, the source name uses the name of the worksheet with the dollar sign ($) removed through the Replace() function in the BimlScript.
Finally, I’ll add the destination connection. Since all of the workbooks in this sample spreadsheet will be loaded to the same table, there is no dynamic code in the destination component. You may notice that this snippet also does not use any column-level mappings. This is by design; since each of the worksheets may have some differences in metadata (an extra column, or missing columns), I’ve excluded any static column mappings, instead relying on automatic column name mapping in Biml. Also, like the source transformation, this destination refers to a source name that was set up previously.
After saving the Biml file and generating packages, the resulting SSIS package appears as below. By processing the Excel worksheets with Biml, each worksheet becomes a data flow.
Each data flow contains an Excel source connected to the common output table. Each of these Excel sources retrieves data from one worksheet, filtered by the SELECT query.
The complete Biml file used for this example can be found here.
Working with Excel data in SSIS can be tedious and time-consuming. However, Biml can help by automating some of the manual operations involved in Excel-to-SSIS mappings. By processing those Excel worksheets with Biml, you can streamline data wrangling work that was previously done by hand.