Wrangling large or complex Excel workbooks in SSIS can be a challenge. From managing data types (more about that in this post by Koen Verbeeck) to addressing multiple worksheets in a single document, configuring SSIS to properly read from or write to Excel documents is tedious at best. While there are no silver bullets to completely solve these problems, I’ve found that – like many other challenges in the SSIS world – using Biml can help. Iterating through Excel worksheets with Biml can help automate some of this manual work.
In this post, I’ll demonstrate how you can use a few lines of BimlScript code to read through multiple worksheets in an Excel source.
For this scenario, I’m using a single Excel 2007 document containing numerous worksheets. All of the worksheets have a similar structure, each containing results from a different geographic area.
What I want to accomplish is to programmatically infer from the Excel document the name and structure from each worksheet, and create an SSIS data flow for each one. Doing so can save a great deal of effort, especially if there are many columns, many worksheets, or if the structure of each worksheet can differ from the others.
Iterating Through Excel Worksheets with Biml
The solution, using BimlScript, is designed as follows:
- Connect to the source Excel file and loop through the worksheets
- Create an SSIS data flow for each worksheet
- In each data flow, create an Excel source component to connect to the worksheet specific to that data flow
- In each data flow, create an OleDB destination component, and automatically map source columns from the Excel worksheet to the destination table
First, connect to the Biml file as shown below. Note that, if you haven’t already, you will need to install the Excel 2007 driver for this code to work.
This will create an array named worksheetCollection, which contains the names of each worksheet found in the specified Excel file. (Big thanks to Scott Currie of Varigence for sharing the syntax of this operation.)
Next up, I will iterate through this list of worksheets, building an SSIS data flow for each one. I’ll also point out that I could also create one package per workbook, as opposed to a single package with multiple data flows; however, for the sake of simplicity in demonstration, I’m using just one package with a separate data flow per Excel worksheet. As shown below, I’m creating a foreach loop in BimlScript, to loop through each worksheet name. In each iteration of the loop, I’ve created a data flow, with the same name as the worksheet.
The code above creates a connection to the Excel document, building the SELECT statement using the name of the worksheet as the table name. A couple of things to note here: First, this snippet uses an Excel connection named Accidents Source File, which I set up in a previous step (the code for which is provided in the download). Also, the source name uses the name of the worksheet with the dollar sign ($) removed through the Replace() function in the BimlScript.
Finally, I’ll add the destination connection. Since all of the workbooks in this sample spreadsheet will be loaded to the same table, there is no dynamic code in the destination component. You may notice that this snippet also does not use any column-level mappings. This is by design; since each of the worksheets may have some differences in metadata (an extra column, or missing columns), I’ve excluded any static column mappings, instead relying on automatic column name mapping in Biml. Also, like the source transformation, this destination refers to a source name that was set up previously.
After saving the Biml file and generating packages, the resulting SSIS package appears as below. By processing the Excel worksheets with Biml, each worksheet becomes a data flow.
Each data flow contains an Excel source connected to the common output table. Each of these Excel sources retrieves data from one worksheet, filtered by the SELECT query.
The complete Biml file used for this example can be found here.
Working with Excel data in SSIS can be tedious and time-consuming. However, Biml can help by automating some of the manual operations involved in Excel-to-SSIS mappings. By processing those Excel worksheets with Biml, you can streamline data wrangling work that was previously done by hand.
Thanks for this post, it’s a timesaver! 🙂
Thanks Martin! I’m glad you found it useful.
In your sentence, “From managing data types … to addressing multiple workbooks in a single document …” did you really mean to say “multiple workbooks in a single document” or should it be “multiple worksheets in a workbook”? Also, it would be helpful if you could include more information about Biml (or at least hyperlinks to a Biml tutorial) as there are many people who know SSIS but don’t know Biml but would like to learn more.
Brian – good catch. Yes, that was badly worded, and should have been worksheets, not workbooks. Thanks!
Related to the essentials of Biml, there are several good resources I would recommend. To get started, the Stairway to Biml on SQLServerCentral.com is a great launching pad: http://www.sqlservercentral.com/stairway/100550/ Also, there are how-to articles and videos, code samples, and other resources over at BimlScript.com.
“Wrangling large or complex Excel workbooks in SSIS can be a challenge.”
I agree. What you describe can be even more complicated. Input output from the same sheet to the database and a dynamic number of sheets. I know that this Community is about SQL Server etc but I will recommend you to look at a solution the way around. In Excel you have VBA which is very effective in complex situations like this. Knowledge in SQL and VBA is a strong combination (in a team) or in a single scull.
Gösta, thanks for the reply. I agree that there are several ways to solve this problem, but in my experience, Excel VBA is a good solution for ad-hoc data extraction but is not ideally suited for enterprise ETL which requires error handling, logging, and advanced transformation which can be very difficult to accomplish in VBA.
It depends on what you mean with enterprise ETL. I would not use VBA for advanced transformation etc. What I say is that the combination of Tools like SSIS for “Classic” ETL and VBA for Excel interaction is a powerfull combination (datawarehous) . You can do error handeling and logging in VBA.
Thank you Tim for your contribution.
My pleasure, Miguel. Thanks for the kind words.
Great post! Do you know of an easy way to extend this so it creates a new table in the database for each worksheet? I’m guessing it can be done.
Hi Tim! Yes, this could definitely be extended to include a CREATE TABLE statement, either to be executed live during the Biml execution or in a separate package created from within Biml. That sounds like a good topic to blog about 🙂
Hmm… the “EPPlus.dll” (for .xlsx files) and NPOI.dll (for .xls or .xlsx files) .Net libraries can help with this, too. It works directly with Excel files w/o Excel, which should make server admins happy (no installing Excel, Jet or ACEDB engines on servers).
I’ve got worksheet names which contain spaces. For some reason they are filtered out with the .Where(i => i.EndsWith(“$”)). Seems quite strange as EndsWith should not be sensitive to spaces. If I remove the where lambda I get all the worksheets but also other tables. As you would expect.
I’m using VS2017, BimlExpress 2019 build 5.0.61408.0 and Microsoft.ACE.OLEDB.15.0