dbTalk Databases Forums  

Advice needed: recurring DEADLOCK problem

comp.databases.sybase comp.databases.sybase


Discuss Advice needed: recurring DEADLOCK problem in the comp.databases.sybase forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Martijn Rutte
 
Posts: n/a

Default Advice needed: recurring DEADLOCK problem - 06-29-2004 , 05:26 AM






Hi there,

I would very much appreciate any advice on this deadlock question.

Running an third-party application on Sybase 12.5.03 and AIX 5.1 I
frequently encounter the following deadlock error:

00:00000:00052:2004/06/28 12:44:41.11 server Deadlock Id 10 detected
Deadlock Id 10: detected. 1 deadlock chain(s) involved.

Process (Familyid 33 33) (suid 4) was executing a SELECT command in
the procedure 'get_event'.
SQL Text: exec get_event 1088419472, '1', 6646647

Process (Familyid 52 52) (suid 4) was executing a INSERT command in
the procedure 'send_event'.
SQL Text: exec send_event
'ACTAB34rty02',3597,'md_rrwd0101_030_com','md_rrwd 0101_000_box','ACT',10,101,4,0,1088419472,0,'geit' ,0,0,338751,1,'',0

Process (Familyid 0, Spid 52) was waiting for a 'next key' lock on row
5 page 977 of the 'event' table in database 7 but process (Familyid 3

3, Spid 33) already held a 'shared row' 'range' lock on it.

Process (Familyid 0, Spid 33) was waiting for a 'shared row' lock on
row 7 page 979 of the 'event' table in database 7 but process
(Familyid

52, Spid 52) already held a 'exclusive row' lock on it.

Process (Familyid 0, 33) was chosen as the victim. End of deadlock
information.



Configuration:
1. deadlock checking period = 5000
2. Table EVENT, datarow locking, clustered index, about 20 - 40
records
3. Daemon triggering stored procedure
a. 'get_event', retrieving all records from EVENT every 3
minutes in one transaction at isolation level 3
b (for each row in this selection, a processes on some remote
Sybase Servers is triggered)
c. prompted by the information coming back from the remote
processes, 'send_event' is started, selecting and evaluating some
fields from EVENT, then inserting a

new record into it

String of events on the local Sybase server, which ends in deadlock:
1. get_event, selecting all rows from EVENT (family 33 spid 33)
2. send_event, inserting a new row on EVENT (Family 52 SPID 52)
3. again send_event, of which the SELECT on EVENT table waits for the
'next key lock'.
4. Finally, three seconds have passed, so another get_event is
triggered although the INSERT actions based on the first one haven't
finished yet.

Do I jump to conclusions when I state, that it is the relentlessly
scheduled SELECT, triggered by the daemon irrespective of whether the
preceding SELECT has finished, caused the deadlock situation?

Kind regards,

Martijn Rutte

Reply With Quote
  #2  
Old   
Mark A. Parsons
 
Posts: n/a

Default Re: Advice needed: recurring DEADLOCK problem - 06-29-2004 , 11:14 AM






Martijn Rutte wrote:
Quote:
Process (Familyid 33 33) (suid 4) was executing a SELECT command in
the procedure 'get_event'.
SQL Text: exec get_event 1088419472, '1', 6646647

Process (Familyid 52 52) (suid 4) was executing a INSERT command in
the procedure 'send_event'.
SQL Text: exec send_event
'ACTAB34rty02',3597,'md_rrwd0101_030_com','md_rrwd 0101_000_box','ACT',10,101,4,0,1088419472,0,'geit' ,0,0,338751,1,'',0

Process (Familyid 0, Spid 52) was waiting for a 'next key' lock on row
5 page 977 of the 'event' table in database 7 but process (Familyid 3

3, Spid 33) already held a 'shared row' 'range' lock on it.

Process (Familyid 0, Spid 33) was waiting for a 'shared row' lock on
row 7 page 979 of the 'event' table in database 7 but process
(Familyid

52, Spid 52) already held a 'exclusive row' lock on it.

Process (Familyid 0, 33) was chosen as the victim. End of deadlock
information.

Configuration:
1. deadlock checking period = 5000
2. Table EVENT, datarow locking, clustered index, about 20 - 40
records
3. Daemon triggering stored procedure
a. 'get_event', retrieving all records from EVENT every 3
minutes in one transaction at isolation level 3
Isolation level 3 is the most restrictive ... guaranteeing that repeated
reads generate the same exact results ... so not only does it hold locks on
all rows during the transaction, but it also makes sure no rows are
modified (updated/inserted/deleted) during the transaction which could
affect the results that a repeated read would expect (ie, putting a lock on
the range in which the current rows belong).

Quote:
b (for each row in this selection, a processes on some remote
Sybase Servers is triggered)
Holding a transaction open for an extended period of time (eg, row-by-row
processing by a client/application, or remote Sybase server) is not a very
good design. Transactions should be as short/fast as possible so as to
minimize (b)locking and, as you're seeing, the chance for deadlocks.

Quote:
c. prompted by the information coming back from the remote
processes, 'send_event' is started, selecting and evaluating some
fields from EVENT, then inserting a new record into it
When you say 'send_event' is started ... it would appear that this fires up
a separate connection (spid 52) to the local Sybase server. So now you
have the first connection (spid 33) holding very restrictive locks on the
event table while the second connection (spid 52) is trying to modify the
event table. In turn, spid 52 has obtained an exclusive lock that causes
some heart burn for spid 33. (At this point I'm a little hesitant to
comment on how/why 52 got that exclusive lock without some more details.)

NOTE: Sybase manages (dead)locks at the connection/spid level ... each
connection/spid is a stand alone process as far as Sybase is concerned ...
it has no concept of the fact (in this case) that 33 and 52 belong to the
same application.

Right off the bat this looks like another design flaw ... if a process is
going to read and update data it should probably look at doing all of these
operations over the same connection/spid; this would eliminate the
deadlocking issue.

Quote:
String of events on the local Sybase server, which ends in deadlock:
1. get_event, selecting all rows from EVENT (family 33 spid 33)
2. send_event, inserting a new row on EVENT (Family 52 SPID 52)
3. again send_event, of which the SELECT on EVENT table waits for the
'next key lock'.
4. Finally, three seconds have passed, so another get_event is
triggered although the INSERT actions based on the first one haven't
finished yet.
I assume you mean 3 minutes pass? If so ... 3 minutes is definitely
wwwwaaaaaayyyyyy tooooooo long for spid 33 to be holding that transaction
on the events table.

Quote:
Do I jump to conclusions when I state, that it is the relentlessly
scheduled SELECT, triggered by the daemon irrespective of whether the
preceding SELECT has finished, caused the deadlock situation?
Nah. The 2 select's should be able to obtain the same result set (even
under isolation level 3).

The deadlock, as I've mentioned above, is between the get_event process and
the send_event process. Now, which of the get_event processes is involved
(the first or subsequent) ... *shrug* ... can't tell from what you've
posted.

----------------

This process definitely needs to be rewritten. Some of the questions I'd
want to see answered ...

- why the need for the isolation level 3 locks? is this an attempt to keep
other processes from reading the table? if so, there are other less
restrictive means of doing this

- are other processes even suppose to access the event table? since the
get_event is fired off every 3 minutes it would appear to me, especially in
light of the fact that the get_event isn't finishing in 3 minutes, that no
one else will ever get access to this table? my guess is that the
get_event process is the only one to access the event table ... eh?

- since the send_event may initiate the need for a change to the event
table, this would appear to be contrary to what the isolation level 3 lock
is designed to do (ie, *NOT* allow any changes to the event table); which
is it ... allow changes to the event table or not?

- the send_event processing should, most likely, be conducted over the same
connection as the original get_event; this'll eliminate the deadlocking
issue in this case

---------------

Without more details this sounds like a work-flow process ... the
get_event/send_event processes constantly/repeatedly going through a list
(event table) of actions that need to be taken ... periodically adding a
new event to be processed.

Although there are a couple ways to do this ... the easiest, especially
since you appear to have just one work-flow process, would be to set the
event table up as a queue (heap table). With 'set rowcount 1' you could
pull the first row from the table ... process it ... delete it from the
event table ... and put any new event on the end of the table (simple
insert to the heap table).

Alternatively, if some events need to stay in the queue for a period of
time (ie, deleting and reinserting is overkill, not to mention produces
excess transaction logging) you could look at adding a timestamp or
datetime column to the event table ... updating said timestamp/datetime
column after processed/inserted/modified ... then pull the next event row
based on the oldest timestamp/datetime.

Couple other options come to mind ... but I digress, eh?

--
Mark A. Parsons

Iron Horse, Inc.
iron_horse (AT) NOSPAM (DOT) compuserve.com


Reply With Quote
  #3  
Old   
Martijn Rutte
 
Posts: n/a

Default Re: Advice needed: recurring DEADLOCK problem - 06-30-2004 , 06:13 AM



Dear Mark,

Thanks for your elaborate answers, which is clarifying.

Some answers to your questions:

Quote:
I assume you mean 3 minutes pass?
No, 3 seconds really!!!!

Quote:
The deadlock, as I've mentioned above, is between the get_event process and
the send_event process.
You're absolutely rigth, I misread the deadlock error message

Quote:
Now, which of the get_event processes is involved
(the first or subsequent) ... *shrug* ... can't tell from what you've
posted.
That's the frustrating part of it. And because it's an thirdparty
application, I cannot look into its source. I can try some auditing of
course...

Quote:
- why the need for the isolation level 3 locks? is this an attempt to keep
other processes from reading the table? if so, there are other less
restrictive means of doing this
Right, but they choose transaction level 3 and stick to it (already
asked them).

Quote:
get_event process is the only one to access the event table ... eh?
Apart from the send_event, yes.


Quote:
- since the send_event may initiate the need for a change to the event
table, this would appear to be contrary to what the isolation level 3 lock
is designed to do (ie, *NOT* allow any changes to the event table); which
is it ... allow changes to the event table or not?
thatīs right.


Quote:
- the send_event processing should, most likely, be conducted over the same
connection as the original get_event; this'll eliminate the deadlocking
issue in this case
Yes, that would do the trick, I suppose. Already asked the software
company if there is any possibility to do so.

Quote:
Although there are a couple ways to do this ... the easiest, especially
since you appear to have just one work-flow process, would be to set the
event table up as a queue (heap table).
So no indexing, just plain data you mean=

Quote:
Couple other options come to mind ... but I digress, eh?
No, it is very instructive, actually.

Thanks,

Martijn Rutte


Reply With Quote
  #4  
Old   
Mark A. Parsons
 
Posts: n/a

Default Re: Advice needed: recurring DEADLOCK problem - 06-30-2004 , 11:26 AM



Martijn Rutte wrote:
Quote:
I assume you mean 3 minutes pass?
No, 3 seconds really!!!!
Is it intended that this software package have multiple copies of itself
running at the same time? or was it designed to only have one copy running
at a time? If the latter ... someone goofed ... the second copy should be
exiting if it finds a previous copy still running.

Quote:
Now, which of the get_event processes is involved
(the first or subsequent) ... *shrug* ... can't tell from what you've
posted.
That's the frustrating part of it. And because it's an thirdparty
application, I cannot look into its source. I can try some auditing of
course...
You could also try "sp_configure 'deadlock checking period',<some_number>"
where <some_number> is in milliseconds (default is around 500 milliseconds,
ie, 0.5 seconds). Idea is to set this higher (15-30 secs?) so that you can
capture some info (eg, sp_who, sp_lock, etc.) while the deadlock situation
is happening (the longer delay keeps Sybase from killing one process for
<some_number> of milliseconds).

I wouldn't recommend leaving 'deadlock checking period' at this larger
number for anything other than your testing/analysis.

Quote:
- the send_event processing should, most likely, be conducted over the same
connection as the original get_event; this'll eliminate the deadlocking
issue in this case
Yes, that would do the trick, I suppose. Already asked the software
company if there is any possibility to do so.
Unfortunately, this may just push the deadlock issue to another area (eg,
second get/send_event fires off and deadlocks on the first
get/send_event??).

Without more detail on the actual processes involved it sounds like they've
designed this software for, shall we say, single-threaded operations ...
but try to run multiple threads to speed up (??) the work flow processing.
Can't have it both ways so they'll probably run into several more problems
until they sit down and re-design the entire process ... or only let one
process run at a time.

Quote:
Although there are a couple ways to do this ... the easiest, especially
since you appear to have just one work-flow process, would be to set the
event table up as a queue (heap table).

So no indexing, just plain data you mean=
Sure ... or just make sure that any index you use is non-clustered (ie,
doesn't maintain a physical ordering of the data).

Inserts to a heap (no indices) table go at the end of the table (assuming
no table partitioning, etc.).

With 'set rowcount 1' enabled a 'select' will stop after retrieving the
first row it finds which'll be the first row in the table (no index =>
table scan).

--------------

'course, this is a simplistic view. This assumes that the events can be
handled in a first-come-first-served (ie, a queue) basis. If there's a
need to prioritize which events are processed first ... this could be done
with the proper index (depends on *how* rows are marked/prioritized) or
sorting/ordering the rows before selecting that first row (need to look at
processing requirements if sorting/ordering a *LARGE* number of events
*EVERY* time you go to get a new event).

Quote:
Couple other options come to mind ... but I digress, eh?

No, it is very instructive, actually.
Well, other options start to look at how you would design a work flow
process with multiple processes hitting at the event table at the same time
.... which could be a mix of indices, locking scheme, definitely the SQL
involved and maybe even different tables. All of this would come down to
needing to know a bit more about the actual work flow process and the
operational requirements.

At this point though it sounds like your 3rd party software folks need to
a) reconsider their design or b) find out what's different in your
environment (assuming this works 'fine' at other clients).

--
Mark A. Parsons

Iron Horse, Inc.
iron_horse (AT) NOSPAM (DOT) compuserve.com


Reply With Quote
  #5  
Old   
Martijn Rutte
 
Posts: n/a

Default Re: Advice needed: recurring DEADLOCK problem - 07-01-2004 , 02:18 AM



Mark,

Quote:
Is it intended that this software package have multiple copies of itself
running at the same time? or was it designed to only have one copy running
at a time? If the latter ... someone goofed ... the second copy should be
exiting if it finds a previous copy still running.
And this is exactly what I suspect is happening: one instance of the
process is still running, when the next one (3 seconds) past is
started without checking if its predecessor still runs.

We have strong suspicion that other companies using this software,
encounter the same deadlock problem. Thing is, when the deadlock error
occurs, the scheduled process terminates and restarts again, as if
they tackled this deadlock error in their software error handling
instead of properly attacking it with the locking-instruments Sybase
offers.

Our Sybase/AIX log reader script, however, encounters deadlock
messages and we are beeped out of our bed at night when they occur
every 5 minutes. So the onlyh thing to do seems disabling the deadlock
logging, which we don't want, because it might point to serious
problems.

Well, that's the situation we're in now. I'll try one more time with
the software supplying company and talk about the heap-table solution
/ starting up an new instance of the read/insert process only after
the last one ended, a combination of which seems to offer some
reasonable possibilities of avoiding the deadlock problem.

Thanks for thinking along!

Martijn


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.