If you’ve used SSIS for any significant amount of time, you’ve almost certainly run into this roadblock: You need to download a file stored on an SFTP (secure FTP) server, but connecting to SFTP with SSIS is not natively supported. There is a built-in FTP task in Integration Services, but it is very limited in its functionality and will not connect to an SFTP server. However, it is possible to build your own solution using free tools and a bit of code.
In this post, I’ll show how to access SFTP with SSIS by building a reusable package to download secure FTP files.
Before I dive into my solution, I’ll point out that there are several ways to solve the problem of downloading files stored on SFTP with SSIS:
- Acquire a third-party SSIS task suited for this purpose (of which there are several free and paid versions)
- Develop a purely code-based solution using a third-party DLL
- Develop a code-based solution that calls a command-line SFTP application
I’ve used each of these solutions in the past, and each one has its own merits and risks. For example, using a third-party SSIS task provides a more streamlined approach (often requiring no manual coding at all), but often – especially in larger organizations – installing a third-party component requires so much extra testing and administrative approval that it makes it more trouble than its worth. For coding solutions using external DLLs, often the same issues exist because the external DLLs have to be registered on each server, which is an easy enough process but surprisingly difficult to get approved in many cases. In my work as a consultant, I’ve found that the easiest solution – when considering not just the technical components but the administrative work – is to use a command-line approach leveraging a mature and well-documented SFTP client. Therefore, my go-to solution for consuming SFTP with SSIS is the last one on the list above, which is the solution I describe herein.
SFTP With SSIS: The Tools
To develop and test this solution, I use just a couple of tools:
WinSCP. This is a mature, well-documented, and free (GPL) SFTP client. It has both a UI and a command-line component, though I use the latter exclusively for this solution. The portable executables (near the top of the downloads page) are the easiest to use, as you simply drop the two executables into a folder and start using them without going through an installation process. There are, of course, other free command-line SFTP tools available, but I’ve found none better than WinSCP in terms of reliability, ease of use, and documentation. For more information about WinSCP, check out the documentation pages, which are quite comprehensive and can give you a better understanding of how the command-line WinSCP tool works and its myriad configuration options. In particular, this article was very helpful for some of the syntax questions I had when developing this solution.
Core FTP mini-SFTP server. Although this is not a part of the deployed solution itself, this provides a very simple and easy-to-configure SFTP server for testing purposes.
Also, in this example I’m using SSIS 2014 to demonstrate the solution, but this design pattern for accessing SFTP with SSIS is portable across all versions of Integration Services.
Setting Up the Environment
For the initial setup, download WinSCP and the Core FTP mini-SFTP server executables listed above. To keep things simple, I recommend creating a utility directory to store these executables. As noted, WinSCP comes with two executables – WinSCP.exe and WinSCP.com – but this solution requires only the latter of these for command-line operations.
Next, start up the mini-SFTP server (msftpsrvr.exe). The UI is quite simple and self-explanatory, only requiring a user name, password, port number (which defaults to 22), and the root directory.
For testing, add a few files to the root directory specified above. Be sure to also include a few files that will not be downloaded as part of the SFTP operation for more comprehensive testing.
Downloading From SFTP with SSIS
With the testing environment set up, it’s time to build a solution. In a new SSIS project, create a new package (I’ve named mine DownloadSFTP.dtsx). Navigate to the Parameters tab, where we’ll create a handful of runtime values that will make the DownloadSFTP package more reusable.
As shown above, all of the values that might change over time – the server name, user ID and password, download directory, etc. – are parameterized. This accomplishes two things. First, when those values do change (and they will), there is a single point of administration rather than a bunch of hard-coded values in the package. Even better, by parameterizing these values, you can build a package that is process-agnostic, and have just one package that can be used by multiple ETL processes.
For those using older versions of SSIS (prior to 2012), you won’t be able to use parameters. However, with a little extra work you can accomplish the same thing I’ve described here by using package configurations.
I’ll call out a couple of these values that may not be obvious:
- pFilename: This is the file name to download from the server. Note that we can also use wildcards (assuming they are supported by the target server) – in the example above, we’ll be downloading all files ending in “.TXT”.
- pServerHostKey: This is used to satisfy a security mechanism built into the WinSCP process. By default, WinSCP will prompt the user to verify and add to local cache the host key when connecting to an SFTP server for the first time. Because this will be done in an automated, non-interactive process, getting that prompt would cause an error in our script. To prevent this, the script is built to supply the server host key to avoid the error, and also has the added benefit of ensuring we’re actually connecting to the correct SFTP server. This brief article on the WinSCP documentation site describes how to retrieve the server host key for the target server.
- pServerUserPassword: This is marked as sensitive to mask the password. As part of the script logic, this password will be decrypted before it is sent to the server.
With the parameters established, we can jump into the code portion of the solution. Create a new script task in the control flow, and add all 7 of the parameters shown above to the list of ReadOnlyVariables.
Next, we’ll add the code to set up the Process object that will ultimately trigger the execution of WinSCP.com to perform the download. As shown below, using the Main() function (which is created automatically in a new script task), we’re creating the Process object and configuring a few of the runtime options, including the name of the executable and the download directory.
The next step is to create the input strings that will make the connection and download the file. At the bottom of this snippet, there are 3 variables that will capture output messages, error messages, and the return value, all of which will be used to log runtime information.
With all of the options configured, it’s time to invoke WinSCP.com. The try/catch block below will attempt to connect and download the specified file from the server.
Technically, that’s all we have to do. However, I always like to build in enough information for logging that I know when and why a problem occurred. The final chunk of code below checks for error conditions – either a nonzero return code or a nonblank errStr value – and will log the message and fail the script task (thus failing the package) on error. If there is no error, we’ll simply use the Dts.Events.FireInformation() method to log the output from the execution of WinSCP.com.
That’s it! The package is ready to be executed. Assuming everything is configured properly, running the package on my system should download exactly two text files (remember, we used the wildcard “*.txt” to get all text files). And upon review, I find that’s exactly what happens when the package runs.
To briefly test out the error logging facilities when using this package to connect to SFTP with SSIS, I’m going to update the package parameter for User Name to an invalid value. Changing the user name to “userx” results in a package failure, exactly as we’ve designed it. When I review the log information, I can see that the log captures the fact that WinSCP.com failed, which should help determine that it is an authentication-related issue.
Usage in the Real World
This pattern takes a little effort to implement. Even if you copy verbatim the code I’ve used in this example, you’ll probably make some modifications, add behavior, etc. That takes time. However, this pattern has a significant advantage in that you could build this one time, and use it for all of your SFTP download operations. It’s built with reusability in mind, and with minor tweaks to suit your environment, you could use this package as a solution to all of your SFTP downloads, deploying it once (or at most, once per SSIS server) and continuing to reuse it.
Although accessing files stored on SFTP with SSIS is challenging, you can roll your own solution by using free tools and a little bit of script. In a future post, I’ll extend this pattern to demonstrate how to go the other direction and write files back to an SFTP server.
You can download the code from this solution here.