Concurrent Data Store Locking Question (BerkeleyDB)

comp.databases.berkeley-db





#1  Nathan Hackett - 08-20-2003, 07:46 PM

From the Berkeley DB Reference Guide:

"
and by ensuring that at any given time only one thread of control
is allowed to simultaneously hold a read (shared) lock and attempt
to acquire a write (exclusive) lock.
"

From this I have concluded that only one process should be allowed to create
a cursor on a given database using DB_WRITECURSOR. So I wrote this perl
script to test this. The code is included at the end of this post. It opens
an environment, opens a db, opens a write cursor, then forks a new process
which attempts to open a write cursor on the same db.

The output that I expect from this script is:

Got env.
Got db.
Got first cursor.
Releasing first cursor.
Got second cursor.
Releasing second cursor.
Closing db and env.


Instead, what I actually get is:

Got env.
Got db.
Got first cursor.
Got second cursor.
Releasing first cursor.
Releasing second cursor.
Closing db and env.


I have tested this on my OS X laptop and my FreeBSD box; both give the
same result (as long as I use 4.1 on OS X). So the child process is
granted a write cursor even though the parent already has one? I
think that this works as expected if I make the child open its own
env and db, but opening a db has overhead that I want to avoid.

/Nathan.


Here is the code:

#!/usr/bin/perl5 -Tw
# This is a perl script to test database locking

use strict;
use BerkeleyDB;

# *****************************************************************************
# some variables

my $dbhome = ".";
my $dberr = "err";
my $dbname = "db";
my ($env, $db, $cursor, $pid);

# create a new database environment
$env = new BerkeleyDB::Env -Home => $dbhome
    , -ErrFile => $dberr, -Flags => (DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL);
if (!$env) { die "could not create env : $! '$BerkeleyDB::Error'\n"; }
print STDERR "Got env.\n";

# open the database
$db = new BerkeleyDB::Btree -Env => $env, -Filename => $dbname
    , -Subname => "data"
    , -Flags => DB_CREATE
    ;
if (!$db) { die "Cannot open database : $! '$BerkeleyDB::Error'\n"; }
print STDERR "Got db.\n";

# get a write cursor
$cursor = $db->db_cursor(DB_WRITECURSOR);
if (!$cursor) { die "Cannot get a cursor : $! '$BerkeleyDB::Error'\n"; }
print STDERR "Got first cursor.\n";

# fork a new process and get a cursor
$pid = fork();
if (!defined $pid) { die "Cannot fork : $!\n"; }
if ($pid == 0) {
    my $cursorii;
    eval {
        # turn INT/TERM into an exception so the cleanup below still runs
        local $SIG{INT} = sub { die "interrupted"; };
        local $SIG{TERM} = sub { die "interrupted"; };
        $cursorii = $db->db_cursor(DB_WRITECURSOR);
        if (!$cursorii) {
            die "Could not get second cursor.\n";
        }
        print STDERR "Got second cursor.\n";
        while (1) { sleep 10; }
    };
    print STDERR "Releasing second cursor.\n";
    # the cursor is undefined if the signal arrived before it was created
    $cursorii->c_close() if $cursorii;
    undef $cursorii;
    exit 0;
}

sleep 5;

print STDERR "Releasing first cursor.\n";
$cursor->c_close();
undef $cursor;

sleep 5;

kill 'TERM', $pid;
waitpid $pid, 0;

print STDERR "Closing db and env.\n";
$db->db_close();
undef $env;
exit 0;


#2  Philip Guenther - 08-20-2003, 08:17 PM

hackett (AT) gardi (DOT) home.rapdat.com (Nathan Hackett) writes:
....
Quote:
From this I have concluded that only one process should be allowed to create
a cursor on a given database using DB_WRITECURSOR. So I wrote this perl
script to test this. The code is included at the end of this post. It opens
an environment, opens a db, opens a write cursor, then forks a new process
which attempts to open a write cursor on the same db.

From the Berkeley DB Reference Guide (ref/build_unix/notes.html)

4. I get core dumps when running programs that fork children.

Berkeley DB handles should not be shared across process forks,
each forked child should acquire its own Berkeley DB handles.

If you do call fork() in a process that has open DB handles, then only
one of the processes may use or close any open handles. The other must
act like the handles don't exist at all. If the BerkeleyDB Perl module
reference-counts and automatically closes the BDB handles, then it may be
necessary to explicitly call exec() or POSIX::_exit() to keep the
handles from being closed in the process that is supposed to ignore them.
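
A minimal sketch of what this advice could look like for the test script in the first post, assuming the BerkeleyDB Perl module and the same file name and subname. The helper names open_handles and run_demo are hypothetical, and the sketch runs in a scratch directory rather than ".". The parent opens its handles and write cursor before forking. The child never touches the inherited handles, opens its own, and leaves through POSIX::_exit() so that Perl's destructors do not close the parent's handles a second time. If CDS locking behaves as the Reference Guide describes, the child's request for a write cursor should block until the parent releases its cursor.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use BerkeleyDB;
use File::Temp qw(tempdir);
use POSIX ();

# Hypothetical helper: open a fresh env and db handle for the calling process.
sub open_handles {
    my ($home) = @_;
    my $env = BerkeleyDB::Env->new(
        -Home  => $home,
        -Flags => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL,
    ) or die "could not create env: $! '$BerkeleyDB::Error'\n";
    my $db = BerkeleyDB::Btree->new(
        -Env      => $env,
        -Filename => "db",
        -Subname  => "data",
        -Flags    => DB_CREATE,
    ) or die "cannot open database: $! '$BerkeleyDB::Error'\n";
    return ($env, $db);
}

# Hypothetical driver: returns the child's exit status (0 on success).
sub run_demo {
    my ($home) = @_;
    my ($env, $db) = open_handles($home);
    my $cursor = $db->db_cursor(DB_WRITECURSOR)
        or die "cannot get a cursor: $! '$BerkeleyDB::Error'\n";
    print STDERR "Got first cursor.\n";

    my $pid = fork();
    die "cannot fork: $!\n" unless defined $pid;
    if ($pid == 0) {
        # Child: ignore the inherited $env, $db and $cursor completely.
        my $ok = eval {
            my ($cenv, $cdb) = open_handles($home);
            my $ccursor = $cdb->db_cursor(DB_WRITECURSOR)
                or die "could not get second cursor\n";
            print STDERR "Got second cursor.\n";
            $ccursor->c_close();
            undef $ccursor;
            $cdb->db_close();
            undef $cdb;
            undef $cenv;
            1;
        };
        warn $@ unless $ok;
        # _exit skips Perl destructors, so the inherited handles are not closed here.
        POSIX::_exit($ok ? 0 : 1);
    }

    sleep 2;    # hold the write cursor while the child asks for its own
    print STDERR "Releasing first cursor.\n";
    $cursor->c_close();
    undef $cursor;

    waitpid $pid, 0;
    my $status = $? >> 8;
    $db->db_close();
    undef $db;
    undef $env;
    return $status;
}

run_demo(tempdir(CLEANUP => 1)) unless caller;
```

Forking before the parent opens anything would avoid inherited handles altogether, but then either process might win the race for the first write cursor, which makes the ordering harder to observe.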


Philip Guenther
guenther at sendmail.com


#3  Nathan Hackett - 08-21-2003, 01:34 AM

In article <2b65krollt.fsf (AT) functor (DOT) smi.sendmail.com>,
Philip Guenther <guenther (AT) functor (DOT) smi.sendmail.com> writes:
Quote:
From the Berkeley DB Reference Guide (ref/build_unix/notes.html)

4. I get core dumps when running programs that fork children.

Berkeley DB handles should not be shared across process forks,
each forked child should acquire its own Berkeley DB handles.

If you do call fork() in a process that has open DB handles, then only
one of the processes may use or close any open handles. The other must
act like the handles don't exist at all. If the BerkeleyDB Perl module
reference-counts and automatically closes the BDB handles, then it may be
necessary to explicitly call exec() or POSIX::_exit() to keep the
handles from being closed in the process that is supposed to ignore them.

Thanks, that helps clear it up. After looking around a little it appears
that the preferred method is to use the same environment to open a new
db for each process. My concern is that I thought that I read somewhere
that a call to DB->open() has similar overhead as DB->stat() where it has
to traverse the entire tree. For a large database, that would be a
significant overhead penalty for each new process that is forked to
handle a database request.

Am I wrong about this or is there another way to get a new DB handle
that will avoid this overhead?

Thanks,

/Nathan.


#4  Philip Guenther - 08-22-2003, 12:28 AM

hackett (AT) gardi (DOT) home.rapdat.com (Nathan Hackett) writes:
Quote:
Thanks, that helps clear it up. After looking around a little it appears
that the preferred method is to use the same environment to open a new
db for each process. My concern is that I thought that I read somewhere
that a call to DB->open() has similar overhead as DB->stat() where it has
to traverse the entire tree. For a large database, that would be a
significant overhead penalty for each new process that is forked to
handle a database request.

With the exception of recno tables for which the DB_SNAPSHOT flag has
been set, I don't *think* the overhead of DB->open() is related to table
size.

On the other hand, DB->close() will call DB->sync() unless you pass it
the DB_NOSYNC flag, which can get quite expensive as it writes out *all*
dirty pages, not just the ones dirtied by the process calling
DB->sync(). So, if you're using transactions and logging or if this is
a transient table, then you should consider using that flag.
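
For illustration, a short sketch of a per-request close that skips the flush. It assumes the BerkeleyDB Perl module's db_close() passes an optional flags argument through to DB->close(), which should be checked against the installed version. close_without_sync is a hypothetical helper name. Note that the test script in this thread uses DB_INIT_CDB without logging, so with this flag something else (a db_sync() call or a normal close in another process) presumably still has to flush the cache before the data can be considered safely on disk.

```perl
use strict;
use warnings;
use BerkeleyDB;

# Hypothetical helper: close a handle without flushing the shared cache.
# Assumes db_close() accepts a flags argument; verify for your module version.
sub close_without_sync {
    my ($db) = @_;
    my $status = $db->db_close(DB_NOSYNC);
    die "close failed: $status\n" if $status != 0;
    return 1;
}

# Usage, given an open handle $db:
#   close_without_sync($db);
```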



Quote:
Am I wrong about this or is there another way to get a new DB handle
that will avoid this overhead?

Put it all in one threaded process. Using multiple processes with a
shared environment is no more robust than a single threaded process, so
this isn't a reliability hit...assuming your other libraries are
thread-safe...


I suppose a real masochist could try writing a memory allocator that
gave out chunks in a shared memory region, then tell DB to use that for
its internal allocations using db_env_set_func_{malloc,realloc,free}(),
hack DB to always use inter-process mutex locks, and see whether the
handles could then be shared between forked child processes, but that
would hardly be a supportable setup.


Philip Guenther
guenther at sendmail.com


#5  Nathan Hackett - 08-22-2003, 12:22 PM

In article <2b1xventvv.fsf (AT) functor (DOT) smi.sendmail.com>,
Philip Guenther <guenther (AT) functor (DOT) smi.sendmail.com> writes:
Quote:
With the exception of recno tables for which the DB_SNAPSHOT flag has
been set, I don't *think* the overhead of DB->open() is related to table
size.

On the other hand, DB->close() will call DB->sync() unless you pass it
the DB_NOSYNC flag, which can get quite expensive as it writes out *all*
dirty pages, not just the ones dirtied by the process calling
DB->sync(). So, if you're using transactions and logging or if this is
a transient table, then you should consider using that flag.

Thanks, I'll try that. I am surprised that this feels a little like
breaking new ground, since the standard method for implementing a server is
to fork a child process for each socket.

Quote:

Am I wrong about this or is there another way to get a new DB handle
that will avoid this overhead?

Put it all in one threaded process. Using multiple processes with a
shared environment is no more robust than a single threaded process, so
this isn't a reliability hit...assuming your other libraries are
thread-safe...

I'm not sure that the BerkeleyDB Perl module supports this.

/Nathan.


#6  Michael Ubell - 08-22-2003, 01:18 PM

Philip Guenther wrote:

Quote:
With the exception of recno tables for which the DB_SNAPSHOT flag has
been set, I don't *think* the overhead of DB->open() is related to table
size.

Am I wrong about this or is there another way to get a new DB handle
that will avoid this overhead?

Philip is correct: DB->open only reads a couple of pages of the
file (up to 5 pages if it's opening one of several databases in the file).
Only one page must be read directly from the file to verify its
true identity; the rest may be cached in the shared region and may
not actually be read.
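
One way to check this against a real database is to time the open directly. The sketch below assumes the BerkeleyDB Perl module and the core Time::HiRes module. time_open is a hypothetical helper, and the file name and subname follow the test script from the first post. It reports only wall-clock time for creating the handle, so results will depend on what is already in the shared cache.

```perl
use strict;
use warnings;
use BerkeleyDB;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: seconds taken to open an existing Btree in $env.
sub time_open {
    my ($env, $file, $subname) = @_;
    my $t0 = [gettimeofday];
    my $db = BerkeleyDB::Btree->new(
        -Env      => $env,
        -Filename => $file,
        -Subname  => $subname,
    ) or die "cannot open database: $! '$BerkeleyDB::Error'\n";
    my $elapsed = tv_interval($t0);
    $db->db_close();
    return $elapsed;
}

# Usage: perl time_open.pl /path/to/dbhome
if (defined(my $home = shift @ARGV)) {
    my $env = BerkeleyDB::Env->new(
        -Home  => $home,
        -Flags => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL,
    ) or die "could not open env: $! '$BerkeleyDB::Error'\n";
    printf "open took %.6f s\n", time_open($env, "db", "data");
}
```

Running it against databases of very different sizes would show whether the open time tracks table size on a given system.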

Quote:
Put it all in one threaded process. Using multiple processes with a
shared environment is no more robust than a single threaded process, so
this isn't a reliability hit...assuming your other libraries are
thread-safe...

Not sure I agree with this, but that's not the point here, I think.

Quote:

I suppose a real masochist could try writing a memory allocator that
gave out chunks in a shared memory region, then tell DB to use that for
its internal allocations using db_env_set_func_{malloc,realloc,free}(),
hack DB to always use inter-process mutex locks, and see whether the
handles could then be shared between forked child processes, but that
would hardly be a supportable setup.

I think it's a bit more involved than that. If you are not careful,
your shared memory region will not be at the same address in all processes,
and then all pointers off the dbp and dbp->dbenv would have to become
relative rather than absolute. Some OSes do not promise to put shared
memory at the same address even if the processes are identical. (I don't
actually know if they fail to do so in that case.)

Michael Ubell
Sleepycat Software.


