If the cloud is the next frontier in data storage and analysis, then ETL is the rocketship that takes us there. Although some have opined that the cloud would bring about the end of ETL, the need for reliable data movement and transformation remains critical in a cloud-based architecture. However, for traditional on-premises ETL tools such as SQL Server Integration Services, this path can be foggy. Using SSIS in the cloud requires some planning, but it can be done.
SSIS in the Cloud
In this series of posts, we will take a journey in which we explore the ways that using SSIS in the cloud can help get data into and out of cloud-based storage:
- Running an on-premises SSIS server which communicates with Azure databases
- Using an Azure VM to run SSIS to communicate between cloud sources and destinations
- Using the Azure-specific tasks and components available in SSIS
We’ll also discuss and demonstrate tools including the Azure Feature Pack for Integration Services. In addition, we will touch on some alternatives to SSIS for interaction with cloud data.
“The cloud” isn’t just one thing. Cloud architectures involve storing data and/or processing logic on off-site servers using one of several architectures: SaaS, IaaS, PaaS, etc. For example, SQL Server is available through IaaS by installing SQL Server on a virtual machine, or through PaaS by using either SQL Database or SQL Data Warehouse. Similarly, services such as SSAS can be run in a VM or through Azure Analysis Services.
With the introduction of cloud ETL tools such as Azure Data Factory (ADF), there was some concern among those with significant investments in SSIS that Integration Services would be pushed aside in favor of newer, cloud-based resources. However, Microsoft has signaled that SSIS is going to be supported and relevant for years to come with the ADF runtime for SSIS and its ecosystem for custom components.
In addition to the posts in this series, I recommend reading the following publications from Microsoft.
Running SSIS on Azure VM (IaaS) – A useful list of potential use cases for SSIS running in a VM
SSIS for Azure and Hybrid Data Movement – An older but still applicable paper on hybrid load scenarios, including coverage on sharding and error handling.
Azure Data Factory and SSIS Compared – A brief comparison of ADF and SSIS, with a reminder that ADF isn’t just “SSIS in the cloud”.