Using Python Code in SSIS

A zoomed photo of an orchestra, with two violins playing in tandemSQL Server Integration Services (SSIS) is a mature, proven tool for ETL orchestration and data movement. In recent years, Python has exploded in popularity as a data movement and analysis tool. Surprisingly, though, there are no native hooks for Python in SSIS. In my experience using each of these tools independently, I’d love to see an extension of SSIS to naturally host Python integrations.

Fortunately, with a bit of creativity, it is possible to invoke Python logic in SSIS packages. In this post, I’ll walk you through the tasks to merge Python and SSIS together. If you want to follow along on your own, you can clone the repo I created for this project.

Python in SSIS: The Basics

To use Python in SSIS, you’ll need the following on the machine where the package(s) will run:

From within the SSIS package, you’ll only need two tasks: the execute process task (for invoking the Python executable and scripts), and optionally, the script task for customizing the logging from the Python script into the SSIS log tables.

In this example, I’ve created a couple of trivial Python scripts to demonstrate how both successful and failed scripts will behave. The first one below shows a simple script called do_stuff.py that will always run successfully.

A simple Python script that is designed to show an example of a successful execution of Python in SSIS

To show what happens when the script fails, here’s do_fail.py which has a divide-by-zero error coded in it.

A simple Python script that is designed to show an example of a failed execution of Python in SSIS

Add the Python Scripts to SSIS Package

To include either of these Python scripts in SSIS, I’ll add an execute process task to the SSIS package as shown below.

A simple SSIS package to show the Execute Process Task

There are only two settings that must be configured on the execute process task for this simple example to work:

  • The path to the Python executable
  • The path to the Python script that is to be run

The configuration page of the SSIS Execute Process Task, showing the settings to configure the Python executable and the Python script to be run

Note that, for simplicity, I have hard-coded both of these values; I’ll make those dynamic in an example shortly.

With this starter package configured, we can do a test run from within the Visual Studio environment. As expected, the run of do_stuff.py is successful.

A simple successful execution of Python in SSIS

A successful script behaves as expected, but what happens when the Python script fails? To test this, I changed the execute process task to run do_fail.py instead of do_stuff.py. As I’d hoped, a failed Python script also causes the failure of the SSIS package. However, there’s little useful troubleshooting information in the logs when a failure occurs.

A snippet of the SSIS execution log, showing very little debugging details in the event of a faillure

From these logs, we know the Python script execution failed but nothing more. This isn’t helpful for identifying the root cause, so we’d need more detail about what caused this failure. Let’s make a change to include more detail about failed and successful executions of Python code in SSIS.

Add Output and Error Messages

Most every command-line application has special variables for standard output and standard error for outputting informational and error messages during runtime. The Python language refers to these as stdout and stderr, respectively. The stdout variable includes all informational messages (including Python print() statements), and stderr contains any error messages. Adding these output messages to the SSIS execution log will make the packages easier to debug, giving us one single location to review when an error or unexpected behavior occurs.

The SSIS execute process task has handlers for both standard out and standard error. Below, I’ve created an SSIS variable for each of these, and configured the execute process task with these variable names. This will load the output and error messages into those SSIS variables at runtime.

A screenshot of the SSIS Execute Process Task with SSIS variables configured to receive informational and error messages from Python script execution

With those output and error messages stored in SSIS variables, we’ll now add a script task to load these messages to the SSIS execution log. I’ve added the script task in the example below. Note that I’m using the “On Completion” precedence constraint, since the logging script task needs to run regardless of whether the Python script succeeds or fails.

An SSIS package control flow, with a newly-added script task for logging Python output and error messages

Parse stdOut and stdErr

For this next part, we’ll need to write some code, but not in Python. To load the output and error messages to the SSIS log, we’ll use a bit of C# in the script task to parse the stdout and stderr data, and load each message to the SSIS log table. The code below shows how to process those message and write each to the SSIS log.

A block of C# code that parses the stdout and stderr variables, writing each message to the SSIS log.

I know it’s a bit clunky to write a block of C# code just to help run Python. Don’t worry – you’ll only have to do this once! Remember to add the v_StdOut and v_StdErr variables to the list of ReadOnlyVariables in the script task.

A screenshot of the SSIS script task config page after adding stdOut and stdErr variables

Now when we execute the package, we see the addition of more verbose Python output (the second highlighted section) in the logs.

Logging output window in SSIS, showing more verbose log detail

Compared to the first example, this design provides much better output logging for troubleshooting and auditing. These logging messages will appear in the Progress tab of the Visual Studio designer. If you are using the SSIS catalog for package storage and execution, they’re also written to the catalog.event_messages log table.

Reduce the Message Clutter

While this second test is better, there’s still some unnecessary output in the error logs. The more verbose logging adds some non-useful output, including several lines that are just generic separators native to the Python output. These won’t derail the process, but they’ll add to the clutter of the already-busy SSIS logs. Ideally we’d suppress those unnecessary messages.

If your environment allows for modifications to the Python scripts themselves, you can update the above to refine the output you’re getting. I’ve included a modified version of the Python failure script from above. As shown, I’m overriding the default exception behavior by assigning a function to sys.excepthook, replacing that default exception logic with my custom function named custom_excepthook. In that custom function, I’m using the exception details (class, description, etc.) to craft my own output error message to write to the log. This allows me more customization options, and helps to cut down on the default exception messages that don’t add any value.

A snippet of Python code showing the override of the default exception logic using sys.excepthook

When we run this updated Python script, we see a more concise list of error messages in the SSIS log. Note that we also see the stdout messages (tagged as “Information”) in this log as well.

A snippet of the SSIS execution log, showing more concise error output when overriding the default exception behavior in Python

The change here is subtle but immediately valuable for anyone who manages mountains of output log messages.

Make it Dynamic

The above examples show how to run static code from Python in SSIS, but ideally we wouldn’t need one package for every Python script. With this final revision, I’ll make this more enterprise-ready by converting those static values into runtime parameters.

In the SSIS project.params file (in the Solution Explorer window of Visual Studio), I’ve added two parameters. The first defines the common location of the python.exe executable. The second is the full file path to the Python script to be run.

A screenshot of the SSIS project parameters, showing new params for the Python.exe executable and for the specific Python script to be run

Next, I’ll modify the execute process task to use the parameter values instead of the static paths from before. In the Expressions tab of the Execute Process Task Editor, I’ve added expressions for the values previously hard-coded. Now when the package is invoked, it will used the runtime-specified parameter values rather than the static paths.

A screenshot of SSIS, showing the use of expressions to apply project parameters for the execute process task

You can now specify these values from T-SQL or the SSMS UI when running the package from the SSIS catalog. This makes this package a type of utility gateway between SSIS and Python. It could be invoked many times, each with a different Python script specified. In that way, SSIS could work as an orchestrator for a collection of Python scripts.

Wrapping Up

Although there is no Python task for SSIS, these examples demonstrate how to use the execute process task to invoke Python scripts in SSIS. We’ve also covered how to effectively capture concise error messages. Finally, we’ve shown how to use dynamic parameters to make the SSIS-to-Python package infintely reusable.

As a reminder, you can find the code from above at this repo: https://github.com/tdmitch/python-ssis/

About the Author

Tim Mitchell
Tim Mitchell is a data architect and consultant who specializes in getting rid of data pain points. Need help with data warehousing, ETL, reporting, or training? If so, contact Tim for a no-obligation 30-minute chat.

Be the first to comment on "Using Python Code in SSIS"

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.