SSIS in the Cloud

SSIS in the CloudIf the cloud is the next frontier in data storage and analysis, then ETL is the Chisholm Trail. Although some have opined that the cloud would bring about the end of ETL, the need for reliable data movement and transformation remains critical in a cloud-based architecture. However, for traditional on-premises ETL tools such as SQL Server Integration Services, this path can be foggy. Using SSIS in the cloud requires some planning, but it can be done.

In this series of posts, we will take a journey in which we explore the ways that using SSIS in the cloud can help get data into and out of cloud-based storage.

The Challenge

“The cloud” isn’t just one thing. Cloud architectures involve storing data and/or processing logic on off-site servers using one of several architectures: SaaS, IaaS, PaaS, etc. For example, SQL Server is available through IaaS by installing SQL Server on a virtual machine, or through PaaS by using either SQL Database or SQL Data Warehouse. Similarly, services such as SSAS can be run in a VM or through Azure Analysis Services.

The challenge with Integration Services is that it only runs in the installed version of the database engine. Currently there is no PaaS version of SSIS. To use SSIS in the cloud requires an installation of the database engine and SSIS either in a VM or an on-premises server.

SSIS in the Cloud

In this series of will review the options for using SSIS in the cloud. Among the scenarios that will be covered:

We’ll also discuss and demonstrate tools including the Azure Feature Pack for Integration Services. In addition, we will touch on some alternatives to SSIS for interaction with cloud data.

Recommended Reading

In addition to the posts in this series, I recommend reading the following publications from Microsoft.

Running SSIS on Azure VM (IaaS) – A terse but useful list of potential use cases for SSIS running in a VM

SSIS for Azure and Hybrid Data Movement – An older but still applicable paper on hybrid load scenarios, including coverage on sharding and error handling.