In the previous post in this series, I addressed how to use an on-premises instance of SSIS to move data to and from Azure databases. If you’re running in a pure Azure environment without on-prem SQL Server, that load architecture would present some challenges. However, by running SSIS in an Azure VM, you can communicate move data into or out of Azure SQL Database, Azure SQL Data Warehouse, or a data store on another VM.
In this post in the continuing series on SSIS in the cloud, I’ll briefly cover how using SSIS in an Azure VM can be a useful tool for moving data between cloud sources and destinations.
Why a VM?
As noted in my last post in this series, there is currently no PaaS version of Integration Services. Unlike SQL Database and SQL Server Analysis Services, SSIS packages must be stored and run on an installed version of SQL Server, either in an on-premises instance or in a cloud VM. If you’re loading data strictly between on-prem data stores, running SSIS on-prem is almost always the best way to go. If you’re moving data to or from the cloud, or from one cloud data store to another, setting up an Azure virtual machine as an SSIS gateway works quite well.
Running SSIS in an Azure VM has many of the same benefits as moving other workloads to the cloud. You get the benefit of rapid provisioning, avoiding weeks or even months of waiting for a new server. You can provision for your current needs without paying for tomorrow’s data processing needs today. And when your needs do grow, you can add capacity to your ETL server(s) without buying additional hardware or unnecessary downtime.
In many environments, the need for ETL processing is not a round-the-clock operation. Especially in smaller environments, SSIS packages are executed nightly, or in some cases only weekly or monthly. Similarly, it’s common to have a lighter day-to-day ETL workload, while the end of week or month processing requires more horsepower (and thus more servers) for those short-term spikes. In those cases, Azure VM(s) running SSIS could be started only when needed, and shut down until they’re needed again. This task can even be automated via PowerShell.
Create an Azure VM
Since this post is focused on SSIS interoperability, I’ll not get too deep into the nuts-and-bolts of actually creating the virtual machine that will run SSIS. You’ve got several options here, including BYOL (bring your own license), per-minute licensing, and manual installation. This post by Jimmy Wong from the SSIS team earlier this year outlines some of the licensing options. As always, pay attention to the license details so you remain compliant with licensing and don’t get surprised by charges.
Regardless of how you license your SSIS Azure VM, there are a few things that are required on that VM to make this work properly:
An installation of SQL Server on that VM. I recommend using at least SQL Server 2012, so you can make use of the new features introduced in that version (SSIS catalog, parameterization, etc.).
An SSIS catalog on the SSIS VM. This assumes that you are deploying packages to the SSIS catalog, which is highly recommended. Remember that the SSIS catalog must be created manually on a new installation of SQL Server, so don’t forget this step.
A logical connection to the source and destination locations. In a typical on-prem ETL architecture, data sources to which SSIS connects will either reside on the local network or will be reachable via internet address (such as SFTP servers). However, when running SSIS in an Azure VM, there is no physical LAN. You’ll need to use an Azure virtual network to build your logical LAN in the cloud, or use the Azure VPN gateway to connect multiple virtual networks. You can also use the VPN gateway to connect to on-prem data sources (more on that in a future post).
A solid security model. For most on-prem installations of SSIS, using Windows Authentication is the way to go. If your Azure infrastructure includes Active Directory integration (which is preferred), you can use this same authentication model on your Azure-based SSIS VM. For smaller environments without Active Directory integration in Azure, it’ll take a bit more attention to make it work since you’ll be relying on SQL authentication.
Running SSIS in an Azure VM
Once you have installed the components necessary to run SSIS in an Azure VM and set up the logical network connections, managing your SSIS architecture and running day-to-day ETL operations looks very much like an on-premises installation. The packages are stored on and executed from the SSIS catalog in the Azure VM, and that SSIS catalog becomes the center of the universe in terms of ETL ops and monitoring. Making connections to other SQL Server VMs, PaaS SQL databases (Azure SQL Database or SQL Data Warehouse), FTP servers, and other sources and destinations is very similar to those same operations in an on-prem SSIS server.
Be aware of how your SSIS database and other server-based assets are set up – especially where it comes to back ups and patching. If you do a roll-your-own setup where you build the VM from scratch, you’ll be responsible for all backups, patching, and other maintenance just like you would with an on-prem machine. For those new to running machines in Azure, this setup might be a good segue. On the other hand, using a pre-defined Azure image adds the benefit of automated backup and patching for SQL Server. There’s nothing wrong with either one of these options, but in either case, make sure your databases are properly backed up and maintained.
For package development, the same concepts would hold true for SSIS in an Azure VM as with an on-premises installation of SSIS. While it is possible to run the development tools directly on the VM that hosts the SSIS catalog, this is not recommended. Developing on a different machine – perhaps another Azure VM, or in an on-prem workstation – will reduce the performance impact and some of the other risks of logging in locally to the server.
As noted earlier, running SSIS in an Azure VM makes a lot of sense when data is being moved to or from a cloud destination. This architecture is particularly useful when the source and the destination are both in the cloud, where an Azure-based ETL VM would eliminate the overhead of bringing the data on-prem just to move it back to the cloud.
If, however, your ETL workload is limited to on-prem sources and destinations, you are unlikely to see as much immediate value in using a cloud-based ETL solution. There are probably some edge cases where an Azure SSIS node makes sense even for on-prem data movement, but those are few and far between.
Although there is no true platform-as-a-service (PaaS) version of SQL Server Integration Services, using SSIS in an Azure VM is a robust alternative. When part or all of your data is stored in the cloud, an SSIS-equipped Azure VM can reduce the time to get an ETL server configured, reduce the hardware costs, and allow for easy scaling up as your processing needs grow.