If the cloud is the next frontier in data storage and analysis, then ETL is the Chisholm Trail. Although some have opined that the cloud would bring about the end of ETL, the need for reliable data movement and transformation remains critical in a cloud-based architecture. However, for traditional on-premises ETL tools such as SQL Server Integration Services, this path can be foggy. Using SSIS in the cloud requires some planning, but it can be done.
SSIS in the Cloud
In this series of posts, we will take a journey in which we explore the ways that using SSIS in the cloud can help get data into and out of cloud-based storage:
- Running an on-premises SSIS server which communicates with Azure databases
- Using an Azure VM to run SSIS to communicate between cloud sources and destinations
- Using the Azure-specific tasks and components available in SSIS
We’ll also discuss and demonstrate tools including the Azure Feature Pack for Integration Services. In addition, we will touch on some alternatives to SSIS for interaction with cloud data.
“The cloud” isn’t just one thing. Cloud architectures involve storing data and/or processing logic on off-site servers using one of several architectures: SaaS, IaaS, PaaS, etc. For example, SQL Server is available through IaaS by installing SQL Server on a virtual machine, or through PaaS by using either SQL Database or SQL Data Warehouse. Similarly, services such as SSAS can be run in a VM or through Azure Analysis Services.
The challenge with Integration Services is that it only runs in the installed version of the database engine. Currently there is no PaaS version of SSIS. To use SSIS in the cloud requires an installation of the database engine and SSIS either in a VM or an on-premises server.
In addition to the posts in this series, I recommend reading the following publications from Microsoft.
Running SSIS on Azure VM (IaaS) – A useful list of potential use cases for SSIS running in a VM
SSIS for Azure and Hybrid Data Movement – An older but still applicable paper on hybrid load scenarios, including coverage on sharding and error handling.
Azure Data Factory and SSIS Compared – A brief comparison of ADF and SSIS, with a reminder that ADF isn’t just “SSIS in the cloud”.