![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
I have a very large DB (>300GB). Each day I receive a file containing an average of 2 million individual delete statements. The delete statements all delete based on full primary key so only an @@rowcount value of 0 or 1 is possible for each statement. The deletes can be from any one of several hundred tables in the database. Right now I'm just cursoring through that collection of delete statements and executing them one at a time as dynamic sql (since I get the whole SQL statement I don't know of another way to do it). It takes 1-2 hours to complete, depending on other traffic on the host. Batching them into a single trx gives me better performance, but I lose the ability to know which statement within the transaction failed if any in the batch do, or which ones come back with a zero row count (i.e., the row wasn't there, which is information I need to capture). I keep thinking there's an easy, elegant solution, but it's eluding me so I thought I'd ask. My undying gratitude to anyone who can help. |
#3
| |||
| |||
|
|
I have a very large DB (>300GB). Each day I receive a file containing an average of 2 million individual delete statements. The delete statements all delete based on full primary key so only an @@rowcount value of 0 or 1 is possible for each statement. The deletes can be from any one of several hundred tables in the database. Right now I'm just cursoring through that collection of delete statements and executing them one at a time as dynamic sql (since I get the whole SQL statement I don't know of another way to do it). It takes 1-2 hours to complete, depending on other traffic on the host. Batching them into a single trx gives me better performance, but I lose the ability to know which statement within the transaction failed if any in the batch do, or which ones come back with a zero row count (i.e., the row wasn't there, which is information I need to capture). I keep thinking there's an easy, elegant solution, but it's eluding me so I thought I'd ask. My undying gratitude to anyone who can help. |
#4
| |||
| |||
|
|
One thing that worries me when you talk about transactions, is that some deletes may be related. |
|
I have a very large DB (>300GB). Each day I receive a file containing an average of 2 million individual delete statements. The delete statements all delete based on full primary key so only an @@rowcount value of 0 or 1 is possible for each statement. The deletes can be from any one of several hundred tables in the database. Right now I'm just cursoring through that collection of delete statements and executing them one at a time as dynamic sql (since I get the whole SQL statement I don't know of another way to do it). It takes 1-2 hours to complete, depending on other traffic on the host. Batching them into a single trx gives me better performance, but I lose the ability to know which statement within the transaction failed if any in the batch do, or which ones come back with a zero row count (i.e., the row wasn't there, which is information I need to capture). I keep thinking there's an easy, elegant solution, but it's eluding me so I thought I'd ask. My undying gratitude to anyone who can help. |
#5
| |||
| |||
|
|
My first reaction was sympathy. What a mess to have to handle! The rest is just a few thoughts off the top of my head. As everyone else has said the proper place to fix this is at the source. But I am betting that is not possible. You mention a cursor. Are you loading these DELETE commands into a table and cursoring through them in a script or stored procedure? That would certainly be better than reading the file line by line and executing them that way. I think Erland raised the key question: One thing that worries me when you talk about transactions, is that some deletes may be related. If the collection only works when executed IN ORDER that seriously limits your options. Others have mentioned batching the commands, a good idea, so I will throw out one that is less good (executing them out of order) just to cover more bases. Run two procs, each with their own cursor, one for the odd-numbered rows and the other for the even-numbered rows. Or three, or four, whatever. If you are working from a staging table of DELETE commands it is easy enough for it to have an IDENTITY column to work from. Perhaps contention among the cursors would be a problem, but it you are desperate enough it might be worth a try. Or if you could work out a good way to pick out the table name you might set up one cursor for each table. If there is a hierarchy to them you could start the process for deleting from the bottom-most table first, then start each succeeding upper table after a pause, running several at a time. All of which is neither easy nor elegant, unfortunately. Roy Harvey Beacon Falls, CT On Wed, 31 Oct 2007 15:08:35 -0700, "halftim... (AT) gmail (DOT) com" halftim... (AT) gmail (DOT) com> wrote: I have a very large DB (>300GB). Each day I receive a file containing an average of 2 million individual delete statements. The delete statements all delete based on full primary key so only an @@rowcount value of 0 or 1 is possible for each statement. The deletes can be from any one of several hundred tables in the database. Right now I'm just cursoring through that collection of delete statements and executing them one at a time as dynamic sql (since I get the whole SQL statement I don't know of another way to do it). It takes 1-2 hours to complete, depending on other traffic on the host. Batching them into a single trx gives me better performance, but I lose the ability to know which statement within the transaction failed if any in the batch do, or which ones come back with a zero row count (i.e., the row wasn't there, which is information I need to capture). I keep thinking there's an easy, elegant solution, but it's eluding me so I thought I'd ask. My undying gratitude to anyone who can help.- Hide quoted text - - Show quoted text - |


#6
| |||
| |||
|
|
My first reaction was sympathy. What a mess to have to handle! The rest is just a few thoughts off the top of my head. As everyone else has said the proper place to fix this is at the source. But I am betting that is not possible. You mention a cursor. Are you loading these DELETE commands into a table and cursoring through them in a script or stored procedure? That would certainly be better than reading the file line by line and executing them that way. I think Erland raised the key question: One thing that worries me when you talk about transactions, is that some deletes may be related. If the collection only works when executed IN ORDER that seriously limits your options. Others have mentioned batching the commands, a good idea, so I will throw out one that is less good (executing them out of order) just to cover more bases. Run two procs, each with their own cursor, one for the odd-numbered rows and the other for the even-numbered rows. Or three, or four, whatever. If you are working from a staging table of DELETE commands it is easy enough for it to have an IDENTITY column to work from. Perhaps contention among the cursors would be a problem, but it you are desperate enough it might be worth a try. Or if you could work out a good way to pick out the table name you might set up one cursor for each table. If there is a hierarchy to them you could start the process for deleting from the bottom-most table first, then start each succeeding upper table after a pause, running several at a time. All of which is neither easy nor elegant, unfortunately. Roy Harvey Beacon Falls, CT On Wed, 31 Oct 2007 15:08:35 -0700, "halftim... (AT) gmail (DOT) com" halftim... (AT) gmail (DOT) com> wrote: I have a very large DB (>300GB). Each day I receive a file containing an average of 2 million individual delete statements. The delete statements all delete based on full primary key so only an @@rowcount value of 0 or 1 is possible for each statement. The deletes can be from any one of several hundred tables in the database. Right now I'm just cursoring through that collection of delete statements and executing them one at a time as dynamic sql (since I get the whole SQL statement I don't know of another way to do it). It takes 1-2 hours to complete, depending on other traffic on the host. Batching them into a single trx gives me better performance, but I lose the ability to know which statement within the transaction failed if any in the batch do, or which ones come back with a zero row count (i.e., the row wasn't there, which is information I need to capture). I keep thinking there's an easy, elegant solution, but it's eluding me so I thought I'd ask. My undying gratitude to anyone who can help.- Hide quoted text - - Show quoted text - |
#7
| |||||
| |||||
|
|
.. the source file is immutable. I don't control it and it would take an act of God (not to mention an epoch and a king's ransom) to get it changed. |
|
I am bulk-loading the delete statements into a table and cursoring through them. |
|
These statements can be executed in any order and there are no foreign key constraints. |
|
The bulk load puts the individual delete statements in with a rowID column, so I think I might have my solution. |
|
I know, fixing it at the source would by far be the better way, but at some point you just quit beating your head against immovable brick walls. ;^ |
![]() |
| Thread Tools | |
| Display Modes | |
| |