I recently had an interesting situation where an SSIS package was scheduled to run at 1 AM but failed with a deadlock message. This seems a bit strange given that it was a deadlock on the step executing the SSIS package, not an error message from within the package. There was no history of a failure in the SSIS execution logs, just the job failure. As you can see, the error only happened a couple of times even though all the 1:00 AM runs were at the same time as the other job.
The error message text read as follows:
Executed as user: <Agent Account Here>. Microsoft (R) SQL Server Execute Package Utility Version 11.0.2100.60 for 32-bit Copyright (C) Microsoft Corporation. All rights reserved. Started: 1:00:01 AM Failed to execute IS server package because of error 0x80131904. Server: <Server Name Here>, Package path: SSISDB<Folder><Project><SSIS Package Name>.dtsx, Environment reference Id: NULL. Description: Transaction (<SPID>) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. Source: .Net SqlClient Data Provider Started: 1:00:01 AM Finished: 1:00:09 AM Elapsed: 8.331 seconds. The package execution failed. The step failed.
The reasoning behind this failure is the stored procedure in the SSISDB catalog.create_execution. If multiple calls are made with that stored procedure at the same time then a deadlock could occur. Essentially that is what happens when you schedule two SSIS packages at the same exact time in SQL 2012, the procedure is run for each job that starts and causes the deadlock error seen above.
[notice]The following information is a direct copy from Microsoft Support Document 2699720[/notice]
Consider the following scenario:
- You install Microsoft SQL Server 2012.
- You schedule a Microsoft SQL Server Integration Services (SSIS) package to start by using the SSIS.catalog.create_execution stored procedure.
- You call the SSIS package again or schedule a second SSIS package to start at the same time by using the SSIS.catalog.create_execution stored procedure.
In this scenario, you may experience a deadlock condition. An error that resembles the following is logged in the SQL Server Agent Job History:
Description: Transaction (Process ID spid) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
While the article goes on to say the issue is resolved in Service Pack 1 and is even listed in the bug fixes (link here to full list of bug fixes in SP 1) the error occurred for me on a server with the latest service pack installed.
The resolution for me was to simply (as suggested in the workaround section of KB 2699720) stagger the timing. One package runs at 1:00 AM the second runs at 1:05 AM. You probably don’t need that much time between jobs, but the window I was in didn’t need to be very close, so I had some leeway to schedule other processes to start in between.
Sorry I didn’t respond to this when you posted that error via email. Now that I read your post I recognized this issue from another client and my fix was the same as yours, just make the packages not start at exactly the same second. I did some tests and I think that even if you start the 2nd execution 5 seconds or so after the first they will both run ok.
We brought this up with Microsoft last year and I guess I’m not too surprised that even SP1 didn’t fix it.
This is very similar to an issue in SSRS with the snapshots only being stored in the database down to the second, so if 2 snapshots are started for the same report at the same second one will overwrite the other.
Seems that MSFT is really not keeping track of things being run in large batches in enterprise setups.
I have exactly the same problem in SQL Server 2008. Do you know whether there is any fix for 2008 ?
Can you post the error message you are seeing? On 2008 I would think it should be different since there is no SSISDB. Are your packages stored in MSDB or on the file system? I could see a similar error coming up if you stored the packages in MSDB. Have you tried scheduling them a few seconds apart rather than the exact same time?
Thanks for your quick response. Yes, that’s exactly what we did as a workaroud, stagerring the SQL Server Agent jobs 1 minute apart and run all the packages every half hour instead of every 15 minutes
Unfortunately the only fix I know of is scheduling them at different times. I was originally doing them 5 minutes apart, but I tested running them 10 seconds apart and it worked just fine too. So if you need to get a smaller window move those schedules to a smaller interval. Another option, granted may not be a good one, is to make a master package that calls each of the packages you want to schedule at the given time. This would only work if all the packages ran on the same schedule which is partially why I don’t think it’s the best idea, but for some scenarios it may work well.
So for the time being the long and short answer is to code around it. If you find a better solution feel free to share it!
Thanks for checking out the blog too Farid.
A couple of ways around this.
1. you could edit catalog.create_execution and cleanup more Microsoft mess (they didn’t encrypt the original stored procs)
2. Just configure the job to retry 3-5 times in the advanced tab
What’s even more fun is when the internal cleanup proc runs and uses the FK on delete cascade stuff to cleanout about 10 tables…. 10 rows at a time = hundres of thousands if not millions of rows in the lower end tables. Depending on how many jobs your run per day that cleanup can take hours, even after the indexes Microsoft recommends. Better yet would be to not depend on the on delete cascade FKs and instead start the deletes at the bottom tables.
“Just configure the job to retry 3-5 times in the advanced tab”
Thank you Mark for this suggestion !!!