![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
Hi We have numerous batch jobs that run each night. Some jobs might run weekly. In the course of the week there are times when the batch jobs reach 72% and that causes *another* job to fail! This is because when the transaction logs reach 72% the oldest running transaction is rolled back while the current job that is filling the tx logs continues on. It will only fail if it reaches a limit such as 90%. So we're in a situation where the logs occasionally go above 72% but do not reach 90%. (Also the job that is rolled back is only rolled back when it goes to commit which is at the very end so that "time" is not very useful either) Is there any way to identify the culprit that is causing the transaction logs to fill ? Is there some kind of monitoring that would cross-reference the tx logs to the unix process? Please note there are hundreds of jobs. |
#3
| |||
| |||
|
|
In the course of the week there are times when the batch jobs reach 72% and that causes *another* job to fail! ... (Also the job that is rolled back is only rolled back when it goes to commit which is at the very end so that "time" is not very useful either) |
|
Is there any way to identify the culprit that is causing the transaction logs to fill ? Is there some kind of monitoring that would cross-reference the tx logs to the unix process? Please note there are hundreds of jobs. |
#4
| |||
| |||
|
|
On Oct 26, 2011, at 5:59 AM, seldom isnice wrote: In the course of the week there are times when the batch jobs reach 72% and that causes *another* job to fail! ... (Also the job that is rolled back is only rolled back when it goes to commit which is at the very end so that "time" is not very useful either) That might be a clue. *Force-abort actually aborts the target transaction right away. *If the target is in a query, the query is aborted, and upon return to the sequencer, the transaction is aborted. *If the target is in between queries, the session thread is awakened to do the abort, but the force-abort can't be reported to the target until the next time it issues a query. *It would seem that in your case, that "next query" is a commit. *Perhaps the commit is being delayed for some reason? Is there any way to identify the culprit that is causing the transaction logs to fill ? Is there some kind of monitoring that would cross-reference the tx logs to the unix process? Please note there are hundreds of jobs. You might be able to use the number of log writes per transaction, although that's not a sure thing since it doesn't tell you the size of the log records. ipm can tell you this, or logstat -user_transactions. Logstat or ipm should also tell you when the force-abort actually happened, so that you can correlate the time with what's running. I imagine IMA can tell you as well, but I'm too lazy to look up the query at the moment. It would also be worth looking into the job being aborted to make sure that it's issuing its commit in a timely manner. Karl |
#5
| |||
| |||
|
|
The transaction that is being rolled back is embedded SQL/C that starts with: set autocommit on set nojournaling It is doing the following: truncate copy in indexing The copy-in statement gives the error: E_QE0024 The transaction log file is full. The transaction will be aborted. |
#6
| |||
| |||
|
|
seldom isnice wrote: The transaction that is being rolled back is embedded SQL/C that starts with: set autocommit on set nojournaling It is doing the following: * *truncate * *copy in * *indexing The copy-in statement gives the error: E_QE0024 The transaction log file is full. The transaction will be aborted. Ideally the COPY...FROM wouldn't be doing any logging at all. *It appears it may be getting tricked into doing an incremental load, which is very slow and does lots of logging. To ensure it does a bulk load the table needs to be empty (which it will be, because it's just been truncated); not journaled; no secondary index (which there won't be unless it was declared WITH PERSISTENCE, and it mustn't have any system maintained object_key or table_key columns. -- Roy UK Ingres User Association Conference 2012 will be on Tuesday June 19 2012. * The latest information is available fromwww.uk-iua.org.uk. |
#7
| |||
| |||
|
| I'm not suggesting that the copy-in sql/c job is *causing* the logging. We have been running this job for years and it hasn't been a problem. ie the table is truncated. There are no secondary indexes. autocommit is on nojournaling is on |
|
So the point I'm making is that when the copy in reaches the end of loading all the 7m records it appears to do a commit at which point the server says: tx aborted. (due to a log full that happened a while earlier) |
#8
| |||
| |||
|
|
On Oct 26, 2011, at 4:08 PM, seldom isnice wrote: I'm not suggesting that the copy-in sql/c job is *causing* the logging. We have been running this job for years and it hasn't been a problem. ie the table is truncated. There are no secondary indexes. autocommit is on nojournaling is on If the job is in fact doing "set nojournaling" that will not disable journaling for a table already being journaled. *You need "set nojournaling on tablename" for that. I don' t know that the copy-in job is causing the problem either, but if the target table is journaled than it is certainly doing its share of logging.. So the point I'm making is that when the copy in reaches the end of loading all the 7m records it appears to do a commit at which point the server says: tx aborted. (due to a log full that happened a while earlier) The way a copy-from interacts with force abort is that when the server decides that the copy-from session is the force abort target, it posts an unsolicited message to the client. *The copy-from code eventually polls for this and replies with an interrupt ack, at which point the server rolls back the transaction and ends the copy with the force-abort messages. *The poll time varies depending on Ingres version, but it's unlikely to be more than a few hundred rows worth of input, i.e. a second or three. if the copy were to complete successfully, it would indeed commit since autocommit is turned on. If the copy-from is really doing a bulk copy, it will write a handful of log records and then chug away doing its thing. *At the end, it will write a few more log records and finish up. * Abort due to force-abort would work the same way, though, bulk copy or no bulk copy. Have you mentioned which Ingres version is being used? Karl |
#9
| |||
| |||
|
|
Hi We have numerous batch jobs that run each night. Some jobs might run weekly. In the course of the week there are times when the batch jobs reach 72% and that causes *another* job to fail! |
|
Is there any way to identify the culprit that is causing the transaction logs to fill ? Is there some kind of monitoring that would cross-reference the tx logs to the unix process? Please note there are hundreds of jobs. |
#10
| |||
| |||
|
|
On 26/10/2011 10:59, seldom isnice wrote: Hi We have numerous batch jobs that run each night. Some jobs might run weekly. In the course of the week there are times when the batch jobs reach 72% and that causes *another* job to fail! snip Is there any way to identify the culprit that is causing the transaction logs to fill ? Is there some kind of monitoring that would cross-reference the tx logs to the unix process? Please note there are hundreds of jobs. As a software developer, if I look back on the development of software and the shift in emphasis from bare functionality to considering the whole user experience, it rather dismays me that a DBA is even having to ask this question. That's not a reflection on the DBA, but the fact that it's not an easy thing to do, yet the world over, for twenty years or more DBAs have probably encountered exactly the same issue and companies have had upheaval and downtime because of it. It would be good if Actian took a look at the user experience of their DBAs and moved it forward from the distant past into the modern era. DBAs shouldn't need to scrabble about on mailing lists to be able to solve this very basic stuff. Ingres (as in the software) should provide this information easily. Isn't it embarrassing? |
![]() |
| Thread Tools | |
| Display Modes | |
| |