Earlier this week, I blogged about the automatic cleanup process that that will clean up the SSIS catalog logging tables. This nightly process removes data for operations that are older than 365 days. While this is useful, many SSIS admins have complained that this process is very slow and contentious on large or busy SSISDB databases.
In this post, I’ll show to you one of the main reasons this purge process is slow, and will share a more efficient way of performing this delete operation.
Investigating the delete operation
When the catalog purge takes place, it removes the old operational and execution data from most of the log tables, including:
… and many others. Some of these tables can grow quite large, but since the process runs each night, the percent of data being deleted is relatively small. So why does it take so long to run?
To investigate more deeply, we’ll open up the stored procedure [internal].[cleanup_server_retention_window], which is invoked each night via SQL Server Agent job. When looking through this stored procedure, one would expect to see a score of score operations to touch each one of these tables. However, what you’d find is substantially simpler. Below is an example of the centerpiece of the delete operation in the purge stored procedure.
Rather than twenty or more DELETE statements, we find just one, which deletes from the [internal].[operations] table.
As it turns out, deleting from the [internal].[operations] table will handle all of the delete operations, for one reason: All of the foreign keys are built using the setting ON DELETE CASCADE. When using this setting in a foreign key, you can delete a row in a table referenced by a foreign key, and the delete operation will also remove any rows in the table that references that particular row. In theory, this is a good idea: it helps to ensure referential integrity, and reduces the amount of code you have to write to delete data. However, using the cascading deletes can lead to lots of blocking, particularly when those deletes are chained together across multiple tables.
One example of this chaining is the [internal].[execution_parameter_values] table. To delete from this table, the SSIS purge process deletes the [internal].[operations] table. This table is referenced by a foreign key on the [internal].[executions] table, so the update transaction is now deleting from two tables by way of the ON DELETE CASCADE foreign key. Finally, the [internal].[execution_parameter_values] table has a foreign key referencing [internal].[executions], which means that we now have three different tables involved in a delete of a single row on the parent table. While this may not present an issue with small and less busy SSISDB databases, it is not uncommon to encounter contention issues with the purge process for very busy SSIS catalog databases.
A Better Way to Clean Up the SSIS Catalog Database
While I am a fan of writing less code when possible, I’ve learned that relying on the ON DELETE CASCADE constraint of the foreign key is not the most efficient way to handle purge operations from the SSIS catalog. To that end, I’ve created my own deletion script that touches each table individually to purge old data. This script creates a working list of operation IDs in a temp table, which will be used as a reference to identify the rows to be deleted in each log table.
Using that list of IDs, this script will then process the essential logging tables in reverse order of granularity (smallest to largest) as defined by the foreign keys on those tables. The order of these delete operations is important to ensure an organic delete rather than relying on the ON DELETE CASCADE setting of the foreign keys.
The script below is a sample of what I use to perform the cleanup operation on the SSISDB catalog database. As you can see, it handles each table individually, using a list of execution_id values built by querying a list of old executions based on the retention settings in the SSIS catalog. As you’ll see, a few of the tables have special handling to delete data in batches of 1 million rows, due to the amount of data that can accumulate in those tables.
How to use this script
To use this script, I recommend that you create a separate user-defined stored procedure in the SSISDB database. Although you could simply replace the code in the internal.cleanup_server_retention_window stored procedure, it is a best practice to avoid modifying any of the existing DDL in the SSISDB database. It is much safer to create your own stored procedure for this, preferably in a new schema that you also create to build some logical and functional separation from the built-in SSIS catalog objects. You can then create your own SQL Server Agent job that executes this stored procedure.
While the built-in automated purge process in the SSISDB is fine to use for average workloads, one can encounter performance and contention issues on large and/or busy SSISDB databases. Using a more direct, table-by-table approach requires more code, but this custom process to clean up the SSIS catalog database will perform better and provide more flexibility.
Very cool. I have often fought with the SSISDB Catalog cleanup processes. But have never dove deep enough to work it out this way.
One thing to note … I may be mistaken, but in the sections of your script where you want to delete in batches, it appears that you left out the TOP (#####) from the DELETE statement.
Otherwise, well done.
Ah.. I see. SET ROWCOUNT 1000000. My bad. 🙂
Very elegant. 🙂
Wow, that is faster than other method. Thanks Tim!!!
Nice script, thanks
I have a system with 100 packages with execution every 5 min, my SSISDB where 300 GB after delete, I have to DROP Keys and Certificates like “cleanup_server_retention_window” do to free up space.
This action is slow, we have 33.000.000 rows in sys.symmetric_keys and sys.certificates and one drop take about 30 sec. it will take 31 years to clean it up 🙂
This will help the script to prevent this issue.
DECLARE @execution_id bigint = (select min(execution_id) from #executions)
DECLARE @execution_id_max bigint = (select max(execution_id) from #executions)
DECLARE @sqlString nvarchar(1024)
DECLARE @key_name [internal].[adt_name]
DECLARE @certificate_name [internal].[adt_name]
DECLARE @message varchar(100)
WHILE @execution_id <= @execution_id_max
SET @message = CAST(@execution_id as varchar(100))
RAISERROR( @message,10,1) with nowait
SET @key_name = 'MS_Enckey_Exec_'+CONVERT(varchar,@execution_id)
SET @certificate_name = 'MS_Cert_Exec_'+CONVERT(varchar,@execution_id)
SET @sqlString = 'IF EXISTS (SELECT name FROM sys.symmetric_keys WHERE name = ''' + @key_name +''') '
+'DROP SYMMETRIC KEY '+ @key_name
EXECUTE sp_executesql @sqlString
SET @sqlString = 'IF EXISTS (select name from sys.certificates WHERE name = ''' + @certificate_name +''') '
+'DROP CERTIFICATE '+ @certificate_name
EXECUTE sp_executesql @sqlString
SET @execution_id += 1
Thanks Tim. Your script got us out of a hole with our SSISDB that the regular Stored Procedure could not (it was generating so many log entries that it was unable to complete the procedure).
We’ll be adopting it as a normal overnight routine.
Jon, thanks for the feedback! I’m glad the script was useful to you.
Thank You, Thank You, Thank You. – I did make a little change, in that the SELECT INTO holds a schema lock, so I explicitly create the table. before inserting the catalog.executions. Which brings me to a question – how does that table get cleaned up – nowhere is there delete for catalog.executions. are we relying on the cascading deletes to remove it?
David, that’s good feedback on explicitly creating that table. Regarding catalog.executions, this is actually a view, not a table. The script deletes data from the underlying table (internal.executions) directly.