Iterating Through Excel Worksheets with Biml

bimlWrangling large or complex Excel workbooks in SSIS can be a challenge. From managing data types (more about that in this post by Koen Verbeeck) to addressing multiple worksheets in a single document, configuring SSIS to properly read from or write to Excel documents is tedious at best. While there are no silver bullets to completely solve these problems, I’ve found that – like many other challenges in the SSIS world – using Biml can help eliminate some of the manual work involved.

In this post, I’ll demonstrate how you can use a few lines of BimlScript code to read through multiple worksheets in an Excel source.

The Challenge

For this scenario, I’m using a single Excel 2007 document containing numerous worksheets. All of the worksheets have a similar structure, each containing results from a different geographic area.

SNAGHTMLb27785d

What I want to accomplish is to programmatically infer from the Excel document the name and structure from each worksheet, and create an SSIS data flow for each one. Doing so can save a great deal of effort, especially if there are many columns, many worksheets, or if the structure of each worksheet can differ from the others.

The Biml Solution

The solution, using BimlScript, is designed as follows:

  • Connect to the source Excel file and loop through the worksheets
  • Create an SSIS data flow for each worksheet
  • In each data flow, create an Excel source component to connect to the worksheet specific to that data flow
  • In each data flow, create an OleDB destination component, and automatically map source columns from the Excel worksheet to the destination table

First, connect to the Biml file as shown below. Note that, if you haven’t already, you will need to install the Excel 2007 driver for this code to work.

  <#@ import namespace=”System.Data” #>
<#@ import namespace=”System.Data.OleDb” #>
<#
var connectionString = “Provider=Microsoft.ACE.OLEDB.12.0;Data Source=E:\\AccidentData.xlsx;Extended Properties=\”EXCEL 12.0 XML;HDR=YES;IMEX=1\”;”;
var connection = new OleDbConnection(connectionString);
connection.Open();
var worksheetCollection = connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null).Rows.OfType<DataRow>().Select(i => i[“TABLE_NAME”].ToString()).Where(i => i.EndsWith(“$”));
#>

This will create an array named worksheetCollection, which contains the names of each worksheet found in the specified Excel file. (Big thanks to Scott Currie of Varigence for sharing the syntax of this operation.)

Next up, I will iterate through this list of worksheets, building an SSIS data flow for each one. I’ll also point out that I could also create one package per workbook, as opposed to a single package with multiple data flows; however, for the sake of simplicity in demonstration, I’m using just one package with a separate data flow per Excel worksheet. As shown below, I’m creating a foreach loop in BimlScript, to loop through each worksheet name. In each iteration of the loop, I’ve created a data flow, with the same name as the worksheet.

<# foreach (var worksheet in worksheetCollection) {#>
<Dataflow Name=”<#= worksheet.Replace(“$”,  “”) #>” DelayValidation=”true”>
</Dataflow>
<# } #>

So far this data flow task does nothing. Let’s add the Excel source connection.

<# foreach (var worksheet in worksheetCollection) {#>
<Dataflow Name=”<#= worksheet.Replace(“$”,  “”) #>” DelayValidation=”true”>
<Transformations>
<ExcelSource ConnectionName=”Accidents Source File” Name=”XLS Accidents – <#= worksheet.Replace(“$”,  “”) #>”>
<DirectInput>SELECT * FROM `<#= worksheet #>`</DirectInput>
</ExcelSource>
</Transformations>
        </Dataflow>
<# } #>

The code above creates a connection to the Excel document, building the SELECT statement using the name of the worksheet as the table name. A couple of things to note here: First, this snippet uses an Excel connection named Accidents Source File, which I set up in a previous step (the code for which is provided in the download). Also, the source name uses the name of the worksheet with the dollar sign ($) removed through the Replace() function in the BimlScript.

Finally, I’ll add the destination connection. Since all of the workbooks in this sample spreadsheet will be loaded to the same table, there is no dynamic code in the destination component. You may notice that this snippet also does not use any column-level mappings. This is by design; since each of the worksheets may have some differences in metadata (an extra column, or missing columns), I’ve excluded any static column mappings, instead relying on automatic column name mapping in Biml. Also, like the source transformation, this destination refers to a source name that was set up previously.

<# foreach (var worksheet in worksheetCollection) {#>
<Dataflow Name=”<#= worksheet.Replace(“$”,  “”) #>” DelayValidation=”true”>
<Transformations>
<ExcelSource ConnectionName=”Accidents Source File” Name=”XLS Accidents – <#= worksheet.Replace(“$”,  “”) #>”>
<DirectInput>SELECT * FROM `<#= worksheet #>`</DirectInput>
</ExcelSource>
<OleDbDestination Name=”OLEDST Accidents Table” ConnectionName=”AccidentData DB”>
<ExternalTableOutput Table=”[dbo].[AccidentData]”></ExternalTableOutput>
</OleDbDestination>
          </Transformations>
</Dataflow>
<# } #>

After saving the Biml file and generating packages, the resulting SSIS package appears as below.

image

Each data flow contains an Excel source connected to the common output table. Each of these Excel sources retrieves data from one worksheet, filtered by the SELECT query.

SNAGHTMLbae36df

The complete Biml file used for this example can be found here.

Conclusion

Working with Excel data in SSIS can be tedious and time-consuming. However, Biml can help by automating some of the manual operations involved in Excel-to-SSIS mappings.

Join me for the 2013 Biml Workshop

Of the many emerging technologies I’m currently using, there’s nothing that excites me more than Biml.  If you’re not familiar with it, Biml (short for Business Intelligence Markup Language) is a language that allows business intelligence developers to programmatically create and document SSIS packages.  I’ve found this to be incredibly useful in environments that have a large number of packages, multiple steps of staging of data, or frequent changes to the ETL pipeline.  Biml helps to eliminate through automation much of the mundane work of package creation and maintenance.

bimlws If you are interested in learning more about Biml, I’d like to invite you to join Andy Leonard, Scott Currie, and me on Tuesday, October 15th in Charlotte, NC for a full day workshop to discuss the awesomeness that is Biml.  Scott is the CEO at Varigence, the company that developed Biml, and Andy was an early adopter and pioneer of Biml.  In this workshop, we’ll review the Biml syntax and basic usage, as well as demonstrate some Biml design patterns we’ve picked up along the way.

The cost for the workshop is $69, which includes breakfast, lunch, and snacks.  For the next 2 days (until tomorrow at midnight EDT), you can still get early bird pricing of $49.

So if you’re in town for the SQL PASS Summit and aren’t planning on attending a Tuesday precon, I’d encourage you to consider joining us for this workshop.  We hope to see you there!