One of the most common questions I am asked about ETL configuration is how much memory SSIS requires. When configuring a new instance of SQL Server running SSIS, or adding SSIS to an existing SQL Server installation, it is important to understand how SSIS uses memory, and what factors in your package design will impact the amount of memory used by SSIS.
How much memory does SSIS need?
It depends. (Seriously, you weren’t expecting an exact number, right?)
SSIS can consume a lot of memory, so give it as much as you can. Integration Services is an enterprise ETL tool, and works best on enterprise-level memory. With RAM, more is almost always better. However, memory isn’t free, and simply maxing out the memory on every SQL Server running SSIS isn’t palatable for most hardware budgets.
Package design, architecture (ETL versus ELT), and parallelism will all impact memory use, so you have to keep those factors in mind when configuring memory for SSIS.
To the question “How much memory does SSIS need?”, the only definitive answer is, “As much as possible.” However, before throwing a lot of expensive RAM at the problem, make sure you know how SSIS consumes memory during package execution.
How SSIS uses memory
When planning for memory needs, it is critical to understand how SQL Server Integration Services uses memory. SSIS allocates memory from unallocated system memory for each package executed, and surrenders that memory shortly after the package completes. The memory allocated for SSIS package executions belongs to the SSIS execution runtime process (ISServerExec.exe, if you are executing the package from the SSIS catalog).
Here’s where the package design has a significant impact on memory use. If a package uses an SSIS data flow, all of the data passing through that data flow is written to memory used by SSIS. For example, consider a package that loads 10 million rows from a flat file to a table. In this case, all 10 million rows will pass through the SSIS memory space during package execution.
However, that doesn’t mean that all 10 million rows will reside in memory at the same time. Internally, the SSIS data flow uses a buffer mechanism to break the load into smaller chunks, each typically capped at the smaller of 10,000 rows or 10 megabytes. So for this example, SSIS won’t load all 10 million rows into memory at once. It will load a buffer at a time, and normally keeps a maximum of five active buffers for each data flow execution tree at any given time.
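As a rough illustration, here is a sketch of that buffer sizing in Python. The property names (DefaultBufferMaxRows and DefaultBufferSize) are real data flow settings with the defaults described above, but the arithmetic is a simplification; the actual engine also accounts for internal per-buffer overhead.

```python
# Rough sketch of how the SSIS data flow sizes a buffer.
# DefaultBufferMaxRows and DefaultBufferSize are real data flow
# properties; the calculation below is a simplified illustration.

DEFAULT_BUFFER_MAX_ROWS = 10_000          # default DefaultBufferMaxRows
DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024    # default DefaultBufferSize (10 MB)

def rows_per_buffer(row_width_bytes: int) -> int:
    """Rows that fit in one buffer: the smaller of the row cap
    and however many rows fit in the buffer's byte budget."""
    by_size = DEFAULT_BUFFER_SIZE // row_width_bytes
    return min(DEFAULT_BUFFER_MAX_ROWS, by_size)

# A narrow 200-byte row hits the 10,000-row cap first:
print(rows_per_buffer(200))    # 10000
# A wide 5,000-byte row is limited by the 10 MB byte budget instead:
print(rows_per_buffer(5_000))  # 2097
```

This is why wide rows (many columns, or large string columns) reduce the number of rows per buffer and can increase the number of buffers needed to move the same row count.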
For data flow tasks containing asynchronous components (including the union all, sort, and some script components), there will be multiple execution trees within the data flow. Each of these execution trees will have its own set of memory buffers. Therefore, data flow tasks using asynchronous transformations can have a larger memory footprint than those with only synchronous components.
Lookup transformations and memory usage
There are some design patterns in which the entire set of data resides in SSIS memory during data flow processing. The most common of these is when using the lookup transformation in full cache mode (the default setting). In this setup, all of the data from the table or query used for the lookup will be cached in SSIS memory for the duration of the execution. This design allows the lookup transform to make its comparison directly in memory (rather than going back to the source each time), but it also adds to the memory footprint of the package. The memory required for lookup transformations is in addition to the memory allocated for the buffers described above.
It is possible to reduce the memory footprint by setting the lookup transformation’s cache mode to either partial cache or no cache. However, be careful when doing this, because it can slow down the overall package runtime by increasing the number of queries that must be made against the lookup source.
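A quick back-of-envelope sketch can show why full cache mode matters for planning. The figures below (2 million reference rows at roughly 50 bytes each) are illustrative assumptions, and the real cache has additional per-row overhead, so treat this as a lower bound:

```python
# Lower-bound estimate of the extra memory a full-cache lookup adds:
# roughly (rows in the lookup query) x (width of the cached columns).
# The actual SSIS cache carries extra per-row overhead beyond this.

def lookup_cache_bytes(row_count: int, cached_row_width_bytes: int) -> int:
    return row_count * cached_row_width_bytes

# Caching 2 million reference rows at ~50 bytes each needs at least ~95 MB,
# on top of the data flow's own buffer memory:
mb = lookup_cache_bytes(2_000_000, 50) / (1024 * 1024)
print(f"{mb:.0f} MB")  # 95 MB
```

Trimming the lookup query to only the columns needed for the match and output shrinks this cache directly.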
Fully blocking transformations and memory usage
Fully blocking transformations are the asterisk to the buffer behavior I described above. These transformations, including the sort, aggregate, and fuzzy lookup and fuzzy grouping components, will queue up all of the data in the data flow rather than passing buffers through as they are processed. Because these transformations operate on the entire set of data rather than individual rows, all of the data has to reside in memory at the same time for the SSIS data flow to perform these functions.
There are some legitimate uses for these fully blocking transformations. The example above shows sorting data coming from flat files, and if there were a relatively small amount of data that needed to be handled, using the sort transformation would be a good option. However, for larger data volumes requiring a lot of in-memory processing, consider using an ELT (rather than ETL) pattern with staging tables to let the database engine handle those operations. SQL Server (and in fact, most any relational database engine) is better than SSIS at some tasks. Sorting and aggregation are certainly included in that list.
SSIS memory use in the control flow
With very few exceptions, the data flow will be the source of any memory pressure in SSIS. The control flow tasks (apart from the data flow) typically do not process volumes of data through SSIS in the same way that the data flow does, so there is less of a risk of memory challenges arising from large amounts of data.
That’s not to say you can’t cause memory issues with just control flow objects, but it is very rare for control flow tasks other than the data flow to cause memory pressure.
SSIS memory use with the Execute SQL Task
If you use an execute SQL task in SSIS to run T-SQL code on a SQL Server, that SQL instance (not SSIS) will be doing most of the work. The SSIS execution engine basically hands off that SQL query to the specified SQL Server instance and waits for the code to complete.
Unless you are returning a large result set from the execute SQL task (which is very rare), you’ll not find SSIS memory use to be an issue with this task.
SQL Server and SSIS memory
SSIS does not exist in a vacuum, and it often runs alongside SQL Server on the same instance. Planning for SSIS memory needs requires an understanding of how SQL Server and Integration Services play together in the memory buckets.
SSIS memory is completely separate from SQL Server memory
SQL Server allows you to configure the minimum and maximum memory used by the database engine, but this is not the same memory available to SSIS. Integration Services uses its own pool of memory, retrieved from available system memory at runtime. As such, be sure to configure the SQL Server on which the SSIS packages run to leave enough memory for the SSIS packages, operating system functions, and any other applications running on that machine.
You can sometimes make SSIS perform better by reducing the maximum memory available to SQL Server. By limiting how much memory SQL Server can use, you are also increasing the pool of memory available to other processes, including SSIS.
You can’t set a memory minimum or maximum for SSIS
Unlike SQL Server, there is no way to configure a floor or cap on the amount of memory the SSIS execution engine will use. I’ve seen SSIS fill up a server’s RAM on more than a few occasions. Again, when configuring RAM on a machine (whether physical or virtual), more is better.
A multi-role SSIS server requires extra attention to memory management
I have a handful of clients who run dedicated SSIS servers, but the vast majority of SSIS environments run Integration Services on an existing SQL Server that serves other roles. In fact, a lot of SSIS loads run on the same server where the data is extracted from or loaded to. In a case like this, you have to configure memory (as well as CPU and I/O throughput) to make sure that server has enough elbow room to both run the SSIS packages and perform the database engine functions for the underlying CRUD operations of the ETL.
Parallelism and SSIS memory
Apart from package design as discussed earlier, the other big factor impacting SSIS memory use is parallelism. Each running package requires a certain amount of memory, so running a large number of packages simultaneously can overwhelm the physical memory of the machine even if the individual packages are modest in their needs.
Whether you are configuring a parent-child SSIS architecture, or just trying to schedule all of your ETL jobs to run in a specific maintenance window, be aware of how many packages will be running at once to avoid memory issues. Spreading out the executions over time can reduce the memory needs by limiting the number of packages running in parallel.
Configuring the right amount of memory for SSIS
While there is no magic formula for calculating how much memory you’ll need to run SSIS, you can make some educated guesses based on data volume, data types (and their respective sizes), package design, and parallelism. Estimating the row width, the number of rows per buffer, and the maximum number of active buffers would give you at least a rough estimate of the amount of memory required for a given data flow.
If using this type of rough calculation, though, keep in mind that package changes, the number of data flows running at once, data type changes, or an increase in data volume can significantly change the memory needs of the server running SSIS packages. Be as generous as you can when allocating memory to SSIS, especially when configuring physical (not virtual) machines that require physical installation of additional RAM if needed. Configuring for growth will ultimately save time and money later on.
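Pulling the earlier factors together, such a rough calculation might look like the sketch below. Every input value here is an illustrative assumption, and the formula deliberately ignores engine overhead, execution trees, and blocking transformations, so treat the result as a floor rather than a budget:

```python
# Rough floor for SSIS data flow memory:
#   per flow ~= rows_per_buffer x row_width x active_buffers + lookup caches
#   server   ~= per flow x concurrently running data flows
# All inputs are illustrative assumptions; real usage will be higher.

def data_flow_memory_bytes(row_width: int,
                           rows_per_buffer: int,
                           active_buffers: int = 5,
                           lookup_cache: int = 0) -> int:
    return rows_per_buffer * row_width * active_buffers + lookup_cache

def server_estimate_bytes(per_flow: int, concurrent_flows: int) -> int:
    return per_flow * concurrent_flows

# One data flow: 500-byte rows, 10,000 rows per buffer, 5 active buffers,
# plus a 100 MB full-cache lookup -> roughly 124 MB for that flow.
flow = data_flow_memory_bytes(500, 10_000, 5, 100 * 1024 * 1024)
# Ten such data flows running concurrently -> roughly 1.2 GB.
print(server_estimate_bytes(flow, 10) // (1024 * 1024), "MB")
```

Running the same packages sequentially instead of in parallel drops the concurrent-flow multiplier, which is exactly the scheduling lever discussed above.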
There is no magic formula to determine how much memory you’ll need to run SSIS packages. However, understanding how SSIS uses memory will help you to configure your environment, properly build your packages, and schedule strategically to make the most of your memory.
Excellent article! Can you comment on SSIS and hardware NUMA?
Thanks Kevin! I haven’t done any research on SSIS memory use and hardware NUMA.
I think it’s not NUMA aware… last I looked anyway, over a year ago. It was killing us to only use 1/4 of some beefy servers dedicated to SSIS.
Hi Kevin and Tim,
First, *fantastic* post, Tim! I love the drawings. I could do that but no one would be able to read my handwriting (Sometimes I can’t!).
Back in the day I led a team of ETL Developers at Unisys. Microsoft published an achievement of loading 1TB of data in 30 minutes using SSIS. My friend, Henk Van Der Valk, worked at Unisys at the time. Henk was able to achieve a 1TB load in 10 minutes on an ES7000 with 96 cores and a bunch of RAM running on NUMA architecture. He wrote about it here (http://henkvandervalk.com/sql2008-r2-dce-on-a-96-core-unisys-es7000-server-with-dsi-solid-state-storage-bulk-inserting-1-terabyte-within-10-minutes). It was… tricksy.
Hope this helps,
What about transformation steps in SSIS, e.g. replacing ‘NULL’ or 0 with NULL values, or adding a computed column? I assume that the added column will consume RAM too, but what about replacing values in existing columns?
And does SSIS have some sort of optimization when I only need 10 columns of a 50-column flat file — does it store all 50 or only the 10 columns in RAM (after reading all 50, of course)?
Thomas, if you use a derived column to do a value substitution, it should not add significantly to the memory footprint. On your second question, there is a data flow setting called RunInOptimizedMode, which is set to True by default, which will attempt to exclude from the memory provisioning those columns not used subsequently in the data flow. Even with this setting on, though, there must be an initial provisioning of memory for those columns when they are introduced in the source, so it’s best to exclude unused columns from the source entirely.
Excellent article Tim
I have a setup where SQL Server and SSIS (in the Integration Services catalog) reside on the same server (SQL Server 2017). Now my SSIS package is slow, and we found that SSIS is not using the allocated SQL Server memory.
What is the best way to set memory?
Do we need to reduce SQL Server memory, or is it better to increase the buffer size for each data flow task inside the SSIS package?
As always, the answer to such open-ended questions is “It Depends”. Keep in mind that SSIS package execution works from a separate pool of memory than SQL Server, so in some cases reducing the amount of memory allocated to SQL Server can improve performance by freeing up more memory for SSIS to use.
Excellent post. Haven’t seen a detailed post on SSIS like this in a long time.
A really great article. Do you have any experience with RAM allocation by Visual Studio during the build or deployment of an SSIS project? I have a problem where Visual Studio allocates about 2.7 GB of RAM during the build process and throws an out-of-memory error. Do you have any idea what influences the RAM allocation? Is it the number of packages (470) or the size of the packages (the largest is 7.4 MB)?
Alexander, I’ve had some build processes that ran slowly due to the complexity or number of packages, but don’t recall that build process having failed due to memory starvation. I’d be curious about the machine on which the SSIS packages are being built… whether it’s limited on available memory, or if there are possibly other processes consuming memory during the build.
The first time the error occurred was on a new notebook with an i7, 16 GB RAM, and the latest version of Visual Studio Professional 2019. We temporarily upgraded the notebook to 32 GB RAM with no effect. We gave it another try on a desktop PC with the same result. I observed the RAM allocation of VS with Task Manager, and the mentioned 2.7 GB of RAM is shown for the VS process with plenty of RAM free. The most curious thing to me is that on my desktop PC (i7, 32 GB RAM) with Visual Studio Professional 2017 I have no problem building and deploying the project. So we uninstalled VS 2019 on the notebook and installed VS 2017, but even more strangely we got the same out-of-memory error. By now, the only solution seems to be to split the project into smaller ones.
Fantastic post! Thank you very much. Best, R.
Just an awesome article!!! Wow!!