dbTalk Databases Forums  

Deadlock occurs on replication!

comp.databases.berkeley-db

#1
William
Deadlock occurs on replication! - 01-08-2006, 08:54 PM

Hi,
I have a master and a client in a replication application. On the master,
1000000 records are inserted into BDB one after another in a loop, while
on the client I print the number of BDB records every 5 seconds by
calling the function "stat_print".

After the master has inserted approximately 200000 to 300000 records (the
actual number differs every time), both the master and the client
hang. I tried many times, and the same problem always occurred
unless I removed "stat_print". So I concluded that it is caused by the
client.

After debugging with GDB, I found that both "stat_print" and
"rep_process_message" call the function "lock_get_internal", and the two
threads are stopped in this function. Though I have not explored deeply, I
think the thread calling "stat_print" and the thread calling
"rep_process_message" conflict when they request a read lock and a write
lock respectively.

So far I don't know how to solve this problem; any hints would be
greatly appreciated.

Regards,

William


#2
Ron
Re: Deadlock occurs on replication! - 01-09-2006, 07:10 PM

William,

A Support Request will be created to investigate this issue inside
Sleepycat. You will hear from an Engineer soon.

Ron


#3
sleepycat-alan
Re: Deadlock occurs on replication! - 01-12-2006, 04:09 PM

Hello William,

We have created Sleepycat Support Request #13955 to work this issue. We
tried to contact you directly, but we have not received any response.

Please send mail to support (AT) sleepycat (DOT) com if you would like to pursue
this any further with us.

Cheers,
Alan Bram
Sleepycat Software


#4
William
Re: Deadlock occurs on replication! - 01-18-2006, 07:20 PM

Hello Alan,

Sorry for my late response; I had not checked my mail until yesterday.
Below is the stack trace for the client process. I have omitted some
unimportant information.
(gdb)bt
#0 in sigsuspend() from /lib/i686/libc.so.6
#1 in _pthread_wait_for_restart_signal() from /lib/i686/libpthread.so.0
#2 in pthread_join() from /lib/i686/libpthread.so.0
....
(gdb)info thread
5 Thread in select() from /lib/i686/libc.so.6
4 Thread in select() from /lib/i686/libc.so.6
3 Thread in accept() from /lib/i686/libc.so.6
2 Thread in poll() from /lib/i686/libc.so.6
1 Thread in sigsuspend() from /lib/i686/libc.so.6
(gdb)thread 5
(gdb)bt
#0 in select() from /lib/i686/libc.so.6
#1 in __os_sleep(dbenv = 0x0, secs = 0, usecs = 0) at
.../os/os_spin.c:110
#2 in __os_yield(dbenv = 0x0, usecs = 10000) at ../os/os_spin.c:110
#3 in __db_tas_mutex_lock(dbenv = 0x804c8c8, mutexp = 0x405f59b0) at
.../mutex/mut_tas.c:180
#4 in __lock_get_internal(lt = 0x804cc00, locker = 161, flags = 0, obj
= 0x0, lock_mode = DB_LOCK_READ, timeout = 0, lock = 0x433857fc) at
.../lock/lock.c:871
#5 in __lock_get(dbenv = 0x804c8c8, locker = 161, flags = 0, obj =
0x805171c, lock_mode = DB_LOCK_READ, lock = 0x1) at ../lock/lock.c:414
#6 in __db_lget(dbc = 0x80516b0, action = 0, pgno = 4294966782, mode =
DB_LOCK_READ, lkflags = 0, lockp = 0x433857fc) at ../db/db_meta.c:470
#7 in __bam_traverse(dbc = 0x80516b0, mode = DB_LOCK_READ, root_pgno =
7246, callback = 0x4055de00 <__bam_stat_callback>, cookie= 0x8050b80)
at ../btree/bt_stat.c:566
#8 in __bam_traverse(dbc = 0x80516b0, mode = DB_LOCK_READ, root_pgno =
55829, callback = 0x4055de00 <__bam_stat_callback>, cookie= 0x8050b80)
at ../btree/bt_stat.c:582
#9 in __bam_traverse(...) at ../btree/bt_stat.c:582
#10 in __bam_stat(...) at ../btree/bt_stat.c:114
#11 in __bam_stat_print(...) at ../btree/bt_stat.c:224
#12 in __db_print_status(...) at ../db/db_stati.c:257
#13 in __db_stat_print(...) at ../db/db_stati.c:222
#14 in __db_stat_print_pp(dbp = 0x8051468, flags = 0) at
.../db/db_stati.c:199
....
(gdb)f 4
#4 in __lock_get_internal(...) at ../lock/lock.c:871
871 goto err;
Current language: auto; currently c
(gdb)thread 4
(gdb)bt
#0 in select() from /lib/i686/libc.so.6
#1 in __os_sleep(dbenv = 0x0, secs = 0, usecs = 0) at
.../os/os_sleep.c:84
#2 in __os_yield(dbenv = 0x0, usecs = 10000) at ../os/os_spin.c:110
#3 in __db_tas_mutex_lock(dbenv = 0x804c8c8, mutexp = 0x405f59b0) at
.../mutex/mut_tas.c:180
#4 in __lock_get_internal(lt = 0x804cc00, locker = 52769, flags = 0,
obj = 0x0, lock_mode = DB_LOCK_WRITE, timeout = 0, lock = 0x42b855fc)
at ../lock/lock.c:871
#5 in __lock_get_list(dbenv = 0x804c8c8, locker = 52769, flags = 0,
lock_mode = DB_LOCK_WRITE, list = 0xfffffdfe) at
.../lock/lock_list.c:263
#6 in __rep_process_txn(dbenv = 0x804c8c8, rec = 0xfffffdfe) at
.../repo/rep_record.c:1532
#7 in __rep_process_rec(dbenv = 0x804c8c8, rp = 0x8050658, rec =
0x42b85a9c, typep = 0x42b85810, ret_lsnp = 0x42b85814) at
.../rep/rep_record.c:2318
#8 in __rep_apply(...) at ../rep/rep_record.c:1264
#9 in __rep_process_message(...) at ../rep/rep_record.c:474
....

I am using Berkeley DB 4.3.21. By the way, I also tried BDB 4.4.16NC,
and the same problem still occurred.

Regards,

William


#5
sleepycat-alan
Re: Deadlock occurs on replication! - 01-30-2006, 12:07 PM

This message is for the benefit of any other readers who may have been
interested in this topic. (We tried a couple more times to e-mail
William, but have had no further response.)

The stack trace shown here in William's last message actually looks
like a fairly normal situation. I wondered whether he was running any
form of deadlock detection at the client.

By default, Berkeley DB does not run deadlock detection; you have to do
that yourself any time you have more than one thread accessing the
database, if at least one of them is modifying it.

There are a couple of ways to do this, explained in the docs:

./db-4.3.21/docs/ref/transapp/deadlock.html
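
For readers who want a feel for what a deadlock detector does, here is a
minimal conceptual sketch in plain C. It is not Berkeley DB's
implementation and does not use its API; the names (wait_graph, wg_init,
wg_add_wait, wg_has_deadlock) and the MAX_LOCKERS limit are made up for
illustration. It models lockers as small integers, records which locker is
blocked on which, and reports a deadlock when that "waits-for" graph
contains a cycle. In a real application you would rely on the library's own
mechanisms instead, such as DB_ENV->set_lk_detect(), a thread that calls
DB_ENV->lock_detect() periodically, or the db_deadlock utility.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LOCKERS 8   /* arbitrary limit for this sketch */

/* waits_for[a][b] is true when locker a is blocked on a lock held by b. */
typedef struct {
    size_t n;                                   /* lockers in use */
    bool waits_for[MAX_LOCKERS][MAX_LOCKERS];
} wait_graph;

/* Start with n lockers and no one waiting. */
static void wg_init(wait_graph *g, size_t n)
{
    assert(n <= MAX_LOCKERS);
    g->n = n;
    for (size_t i = 0; i < MAX_LOCKERS; i++)
        for (size_t j = 0; j < MAX_LOCKERS; j++)
            g->waits_for[i][j] = false;
}

/* Record that `waiter` is blocked on a lock held by `holder`. */
static void wg_add_wait(wait_graph *g, size_t waiter, size_t holder)
{
    assert(waiter < g->n && holder < g->n);
    g->waits_for[waiter][holder] = true;
}

/* Depth-first search. state: 0 = unvisited, 1 = on the current path,
 * 2 = finished. Reaching a node that is on the current path is a cycle. */
static bool wg_dfs(const wait_graph *g, size_t node, int *state)
{
    state[node] = 1;
    for (size_t next = 0; next < g->n; next++) {
        if (!g->waits_for[node][next])
            continue;
        if (state[next] == 1)
            return true;
        if (state[next] == 0 && wg_dfs(g, next, state))
            return true;
    }
    state[node] = 2;
    return false;
}

/* True if some set of lockers is waiting on each other in a cycle. */
static bool wg_has_deadlock(const wait_graph *g)
{
    int state[MAX_LOCKERS] = {0};
    for (size_t i = 0; i < g->n; i++)
        if (state[i] == 0 && wg_dfs(g, i, state))
            return true;
    return false;
}
```

With two lockers, wg_add_wait(&g, 0, 1) alone is an ordinary wait and
wg_has_deadlock() returns false; adding wg_add_wait(&g, 1, 0) closes the
cycle and it returns true. A real detector then picks one of the lockers in
the cycle and rejects its lock request so the others can proceed.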


(In William's original message, he explained that he was calling
stat_print repeatedly at a busy replication client, and the stat_print
thread was getting into a deadlock with the rep_process_message()
thread. His stack trace shows __db_stat_print_pp(), so the call is
DB->stat_print().)
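
Once a detector is running, the thread it picks as the victim sees its
call fail with DB_LOCK_DEADLOCK and is expected to back off and try again.
Below is a rough sketch of that retry pattern in plain C. It does not call
Berkeley DB; op_result, db_operation, run_with_retry and flaky_stat are
hypothetical names for illustration, and OP_DEADLOCK merely stands in for
the library's DB_LOCK_DEADLOCK return value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch only. OP_DEADLOCK plays the
 * role that DB_LOCK_DEADLOCK plays in Berkeley DB. */
typedef enum { OP_OK = 0, OP_DEADLOCK = 1, OP_ERROR = 2 } op_result;

/* Any operation that may be chosen as a deadlock victim. */
typedef op_result (*db_operation)(void *ctx);

/* Run op, retrying only when it reports a deadlock, for at most
 * max_attempts attempts. Other results are returned immediately.
 * If attempts_used is not NULL it receives the number of attempts made. */
static op_result run_with_retry(db_operation op, void *ctx,
                                int max_attempts, int *attempts_used)
{
    assert(op != NULL && max_attempts > 0);
    op_result result = OP_ERROR;
    int attempt = 0;
    while (attempt < max_attempts) {
        attempt++;
        result = op(ctx);
        if (result != OP_DEADLOCK)
            break;
        /* Real code would release what it holds here (abort the
         * transaction, close cursors) before trying again. */
    }
    if (attempts_used != NULL)
        *attempts_used = attempt;
    return result;
}

/* Demo operation: reports a deadlock *ctx times, then succeeds. */
static op_result flaky_stat(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return OP_DEADLOCK;
    }
    return OP_OK;
}
```

For example, with a counter of 2, run_with_retry(flaky_stat, &counter, 5,
&used) succeeds on the third attempt. Capping the number of attempts keeps
a statistics thread from spinning forever on a very busy client.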

- Alan Bram
Sleepycat Software

