One of the most common questions I am asked about ETL configuration is how much memory SSIS requires. When configuring a new instance of SQL Server running SSIS, or adding SSIS to an existing SQL Server installation, it is important to understand how SSIS uses memory, and what factors in your package design will impact the amount of memory used by SSIS.
How much memory does SSIS need?
It depends. (Seriously, you weren’t expecting an exact number, right?)
SSIS can consume a lot of memory, so give it as much as you can. Integration Services is an enterprise ETL tool, and works best on enterprise-level memory. With RAM, more is almost always better. However, memory isn’t free, and simply maxing out the memory on every SQL Server running SSIS isn’t palatable for most hardware budgets.
Package design, architecture (ETL versus ELT), and parallelism will all impact memory use, so keep those factors in mind when configuring memory for SSIS.
To the question “How much memory does SSIS need?”, the only definitive answer is, “As much as possible.” However, before throwing a lot of expensive RAM at the problem, make sure you know how SSIS consumes memory during package execution.
How SSIS uses memory
When planning for memory needs, it is critical to understand how SQL Server Integration Services uses memory. SSIS will allocate memory from the unallocated system memory for each package executed, and surrenders that memory shortly after the package completes its execution. The memory allocated for SSIS package executions runs in the SSIS execution runtime process (ISServerExec.exe, if you are executing the package from the SSIS catalog).
Here’s where the package design has a significant impact on memory use. If a package uses an SSIS data flow, all of the data passing through that data flow is written to memory used by SSIS. For example, consider a package that loads 10 million rows from a flat file to a table. In this case, all 10 million rows will pass through the SSIS memory space during package execution.
However, that doesn’t mean that all 10 million rows will reside in memory at the same time. Internally, the SSIS data flow uses a buffer mechanism to break apart loads into smaller chunks, typically the smaller of either 10,000 rows or 10 megabytes in size. So for this example, it won’t load up all 10 million rows into SSIS memory at once. It will load a buffer at a time, and normally has a maximum of five active buffers for each data flow execution tree at any given time.
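That buffer-sizing logic can be sketched with a few lines of Python. This is an illustration only, built on the default values mentioned above (10,000 rows or 10 megabytes, whichever limit is hit first); the real data flow engine applies additional rules and overhead not modeled here.

```python
# Rough sketch of how the SSIS data flow sizes a buffer, using the default
# property values (DefaultBufferMaxRows = 10,000; DefaultBufferSize = 10 MB).
# Illustration only -- the actual engine has extra rules and overhead.

DEFAULT_BUFFER_MAX_ROWS = 10_000
DEFAULT_BUFFER_SIZE_BYTES = 10 * 1024 * 1024  # 10 MB

def rows_per_buffer(row_width_bytes: int) -> int:
    """Whichever limit is reached first wins: the row cap or the byte cap."""
    return min(DEFAULT_BUFFER_MAX_ROWS,
               DEFAULT_BUFFER_SIZE_BYTES // row_width_bytes)

# A narrow 100-byte row hits the 10,000-row cap first:
print(rows_per_buffer(100))    # 10000
# A wide 5,000-byte row hits the 10 MB cap first:
print(rows_per_buffer(5_000))  # 2097
```

The practical takeaway: wider rows mean fewer rows per buffer, so row width (driven by your column data types) directly shapes how the data flow chunks its work.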
For data flow tasks containing asynchronous components (including the union all, sort, and some script components), there will be multiple execution trees within the data flow. Each of these execution trees will have its own set of memory buffers. Therefore, data flow tasks using asynchronous transformations can have a larger memory footprint than those with only synchronous components.
Lookup transformations and memory usage
There are some design patterns in which the entire set of data resides in SSIS memory during data flow processing. The most common of these is when using the lookup transformation in full cache mode (the default setting). In this setup, all of the data from the table or query used for the lookup will be cached in SSIS memory for the duration of the execution. This design allows the lookup transformation to make its comparison directly in memory (rather than going back to the source each time), but it also adds to the memory footprint of the package. The memory required for lookup transformations is in addition to the memory allocated for the buffers described above.
It is possible to reduce the memory footprint by setting the lookup transformation’s cache mode to either partial cache or no cache. However, be careful when doing this, because it can slow down the overall package runtime by increasing the number of queries that must be made against the lookup source.
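A quick back-of-the-envelope calculation shows why full cache mode deserves attention. Every row returned by the lookup query stays in memory for the whole execution, so the raw data size is roughly the row count times the width of the cached columns. The numbers below are hypothetical illustration values, and the estimate ignores the hash-table overhead the real cache carries.

```python
# Back-of-the-envelope estimate of the extra memory a full-cache lookup
# holds for the entire execution. Inputs are hypothetical example values;
# the real cache adds hash-table overhead on top of the raw data size.

def lookup_cache_bytes(lookup_rows: int, cached_row_width_bytes: int) -> int:
    """Raw data size of a full-cache lookup (ignores cache overhead)."""
    return lookup_rows * cached_row_width_bytes

# e.g. a 2-million-row dimension, caching a 4-byte key plus a 50-byte column:
mb = lookup_cache_bytes(2_000_000, 54) / (1024 * 1024)
print(f"~{mb:.0f} MB")  # ~103 MB of cached data, before overhead
```

Trimming the lookup query to only the columns you actually compare and retrieve is often the cheapest way to shrink this footprint before resorting to partial or no cache mode.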
Fully blocking transformations and memory usage
Fully blocking transformations are the asterisk to the buffer behavior I described above. These transformations, including the sort, aggregate, fuzzy lookup, and fuzzy grouping tools, will queue up all of the data in the data flow rather than passing rows through individual buffers as they are processed. Because these transformations perform some activity on the entire set of data rather than on individual rows, all of the data has to reside in memory at the same time for the SSIS data flow to perform these functions.
There are some legitimate uses for these fully blocking transformations. Consider the flat file example above: if there were a relatively small amount of data that needed to be sorted, using the sort transformation would be a good option. However, for larger data volumes requiring a lot of in-memory processing, consider using an ELT (rather than ETL) pattern with staging tables to let the database engine handle those operations. SQL Server (and in fact, most any relational database engine) is better than SSIS at some tasks. Sorting and aggregation are certainly included in that list.
SSIS memory use in the control flow
With very few exceptions, the data flow will be the source of any memory pressure in SSIS. The control flow tasks (apart from the data flow) typically do not process volumes of data through SSIS in the same way that the data flow does, so there is less of a risk of memory challenges arising from large amounts of data.
That’s not to say you can’t cause memory issues with just control flow objects, but it is very rare for control flow tasks other than the data flow to cause memory pressure.
SSIS memory use with the Execute SQL Task
If you use an Execute SQL Task in SSIS to run T-SQL code on a SQL Server, that SQL instance (not SSIS) will be doing most of the work. The SSIS execution engine basically hands off that SQL query to the specified SQL Server instance and waits for the code to complete.
Unless you are returning a large result set from the Execute SQL Task (which is very rare), you'll not find SSIS memory use to be an issue with this task.
SQL Server and SSIS memory
SSIS does not exist in a vacuum, and it often runs alongside SQL Server on the same instance. Planning for SSIS memory needs requires an understanding of how SQL Server and Integration Services play together in the memory buckets.
SSIS memory is completely separate from SQL Server memory
SQL Server allows you to configure the minimum and maximum memory used by the database engine, but this is not the same set of memory available to SSIS. Integration Services uses its own pool of memory retrieved from available memory at runtime. As such, be sure to configure the SQL Server on which the SSIS packages run to leave enough memory to run the SSIS packages, the operating system functions, and the numerous other apps running on that machine.
You can sometimes make SSIS perform better by reducing the maximum memory available to SQL Server. By limiting how much memory SQL Server can use, you are also increasing the pool of memory available to other processes, including SSIS.
You can’t set memory minimum or maximums for SSIS
Unlike SQL Server, there is no way to configure a floor or cap on the amount of memory the SSIS execution engine will use. I’ve seen SSIS fill up a server’s RAM on more than a few occasions. Again, when configuring RAM on a machine (whether physical or virtual), more is better.
A multi-role SSIS server requires extra attention to memory management
I have a handful of clients who run dedicated SSIS servers, but the vast majority of SSIS environments run Integration Services on an existing SQL Server that serves other roles. In fact, a lot of SSIS loads run on the same server where the data is extracted from or loaded to. In a case like this, you have to configure memory (as well as CPU and I/O throughput) to make sure that server has enough elbow room to both run the SSIS packages and perform the database engine functions for the underlying CRUD operations of the ETL.
Parallelism and SSIS memory
Apart from package design as discussed earlier, the other big factor impacting SSIS memory use is parallelism. Each running package requires a certain amount of memory, so running a large number of packages simultaneously can overwhelm the physical memory of the machine even if the individual packages are modest in their needs.
Whether you are configuring a parent-child SSIS architecture, or just trying to schedule all of your ETL jobs to run in a specific maintenance window, be aware of how many packages will be running at once to avoid memory issues. Spreading out the executions over time can reduce the memory needs by limiting the number of packages running in parallel.
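A small sketch makes the tradeoff concrete: peak memory pressure is driven by how many packages run at once, while running them in waves trades memory for wall-clock time. The per-package footprint below is a made-up illustration number, not a benchmark.

```python
# Illustration of the concurrency/memory tradeoff when scheduling packages.
# The 600 MB per-package footprint is a hypothetical example value.

per_package_mb = 600          # assumed peak footprint per package
packages = 40                 # packages to run in the maintenance window

def peak_and_waves(concurrency: int) -> tuple[int, int]:
    """Peak memory (MB) and number of sequential waves at a given concurrency."""
    waves = -(-packages // concurrency)  # ceiling division
    return concurrency * per_package_mb, waves

for concurrency in (40, 10, 4):
    peak_mb, waves = peak_and_waves(concurrency)
    print(f"{concurrency} at a time -> peak ~{peak_mb} MB over {waves} wave(s)")
```

Running all 40 at once needs roughly 24 GB at peak under these assumptions, while 4 at a time needs about 2.4 GB but takes ten waves to finish.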
Configuring the right amount of memory for SSIS
While there is no magic formula for calculating how much memory you’ll need to run SSIS, you can make some educated guesses based on data volume, data types (and their respective sizes), package design, and parallelism. Estimating the row width, the number of rows per buffer, and the maximum number of active buffers would give you at least a rough estimate of the amount of memory required for a given data flow.
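That rough estimate can be worked through in a few lines. This uses the default buffer limits and the five-active-buffers-per-execution-tree figure discussed earlier; all inputs are illustrative, and real usage will vary with package design and engine overhead.

```python
# Worked version of the rough estimate described above: rows per buffer
# x row width x maximum active buffers x execution trees. Inputs are
# illustrative; actual memory use varies with design and engine overhead.

MAX_ROWS = 10_000                   # DefaultBufferMaxRows (default)
MAX_BYTES = 10 * 1024 * 1024        # DefaultBufferSize (10 MB default)
ACTIVE_BUFFERS = 5                  # typical max active buffers per tree

def data_flow_estimate_mb(row_width_bytes: int, execution_trees: int = 1) -> float:
    rows = min(MAX_ROWS, MAX_BYTES // row_width_bytes)
    buffer_bytes = rows * row_width_bytes
    return execution_trees * ACTIVE_BUFFERS * buffer_bytes / (1024 * 1024)

# e.g. a 500-byte row in a data flow with two execution trees:
print(f"~{data_flow_estimate_mb(500, execution_trees=2):.0f} MB")  # ~48 MB
```

Note how modest the buffer memory is compared to what a full-cache lookup or a blocking transformation can consume; those patterns, not the buffers themselves, are usually what drives a data flow's footprint up.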
If using this type of rough calculation, though, keep in mind that package changes, the number of data flows running at once, data type changes, or an increase in data volume can significantly change the memory needs of the server running SSIS packages. Be as generous as you can when allocating memory to SSIS, especially on physical (rather than virtual) machines, where adding more RAM later means installing hardware. Configuring for growth will ultimately save time and money later on.
There is no magic formula to determine how much memory you’ll need to run SSIS packages. However, understanding how SSIS uses memory will help you to configure your environment, properly build your packages, and schedule strategically to make the most of your memory.