Corrupt DBs in Perl w/ BerkeleyDB & MLDBM

comp.databases.berkeley-db

  #1  
Dwight Oakey

Corrupt DBs in Perl w/ BerkeleyDB & MLDBM - 05-08-2006, 08:50 AM

Hello all,

I am new to the BerkeleyDB (BDB) environment, and I wish to seek
clarification on proper set-up and use. My environment is Windows 2003
Server running the built-in IIS. I am calling Perl CGI on every web
page, which may or may not actually use the BDB. My needs are (well,
should be) quite simple, in that I need multiple reads with only
occasional writes to the BDB. My BDB only has approximately 6000
records. The structure of the BDB is a Hash-of-Hashes -- well,
actually, it's a Hash-of-Hash-of-Hash-of-Hashes. :-)

From the CGI, I am (currently) connecting to the BDB like this, (brief
snippet of actual code):
<code>
use BerkeleyDB ;
use MLDBM qw(BerkeleyDB::Hash Storable);
my %lexicon;
tie %lexicon, "MLDBM", -Filename => $filename,
-Flags => DB_CREATE | DB_INIT_LOCK
or die "Cannot open file $filename: $! $BerkeleyDB::Error\n" ;
# ...read a value from the tied hash, and set it to a variable
# that I use for output...
# ...if need be, "write" to the tied hash a new value...
untie %lexicon;
</code>

While this works in development and testing, I encounter "corruption"
in the Production environment. First off, there are more concurrent
users in Production than we had in Testing. Secondly, (and yes, I
know, it's my fault for not Testing this...) the Production environment
uses 2, "load-balanced" web servers, each with its own BDB file; but,
the load-balancing can change a user in mid-session from one web server
to another (and therefore from one BDB to another.)

Now, with that said, I believe I should change my BDB access to this:
<code>
use BerkeleyDB ;
use MLDBM qw(BerkeleyDB::Hash Storable);
my $env = new BerkeleyDB::Env
-Flags => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL ;
my %lexicon;
tie %lexicon, "MLDBM", -Filename => $filename, -Env => $env
or die "Cannot open file $filename: $! $BerkeleyDB::Error\n" ;
# ...read a value from the tied hash, and set it to a variable
# that I use for output...
# ...if need be, "write" to the tied hash a new value...
untie %lexicon;
</code>

Could someone please confirm/verify if my second choice is good or
proper? My research seems to indicate that Concurrent Data Store (CDS)
mode provides an adequate level of locking for my needs, without
becoming overcomplicated. My first option, by contrast, lacks proper
locking (what I'll call "manual" locking) during my occasional
writes. (I know I didn't include that bit of code here. I can if
that's necessary, but I didn't want to make this already long post even
longer...) :-[

Many thanks in advance!

-Dwight



  #2  
Paul Marquess

Re: Corrupt DBs in Perl w/ BerkeleyDB & MLDBM - 05-08-2006, 09:11 AM

"Dwight Oakey" <doakey (AT) oneil (DOT) com> wrote

Quote:
Could someone please confirm/verify if my second choice is good or
proper? My research seems to indicate that Concurrent Data Store (CDS)
mode provides an adequate level of locking for my needs, without
becoming overcomplicated. My first option, by contrast, lacks proper
locking (what I'll call "manual" locking) during my occasional
writes. (I know I didn't include that bit of code here. I can if
that's necessary, but I didn't want to make this already long post even
longer...) :-[

Your second choice is the one to go with.

One thing to be careful of is carrying out updates on the database in
CDS mode. For example, consider this:

$lexicon{"alpha"} ++ ;

Although that is a single Perl statement, it will trigger two low-level
Berkeley DB calls. Here is what happens behind the scenes:

obtain a read lock
get "alpha" from the database
release read lock
increment value associated with "alpha"
obtain write lock
write to database
release write lock.

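In short, there is a race condition lurking in there: another process can
write "alpha" between your read and your write, and its change is then
overwritten. To fix this problem you need a long-lived lock that spans
both calls.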
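
To make the interleaving concrete, here is a minimal sketch that replays the steps above by hand for two processes. It uses a plain Perl hash in place of the database, so no Berkeley DB calls are involved. The names %store, read_value, write_value and lost_update_demo are made up for the illustration.

```perl
use strict;
use warnings;

# Stand-in for the database: a plain hash (hypothetical, illustration only).
my %store = ( alpha => 10 );

# Each helper models one low-level call that takes and releases its own lock.
sub read_value  { my ($key) = @_;       return $store{$key} }
sub write_value { my ($key, $val) = @_; $store{$key} = $val; return }

# Replay two processes, A and B, both running $lexicon{"alpha"}++
# with their reads and writes interleaved.
sub lost_update_demo {
    my $a_copy = read_value('alpha');     # A reads 10
    my $b_copy = read_value('alpha');     # B reads 10 before A has written
    write_value( 'alpha', $a_copy + 1 );  # A writes 11
    write_value( 'alpha', $b_copy + 1 );  # B also writes 11; A's update is lost
    return $store{alpha};
}

my $final = lost_update_demo();
print "final value: $final (two increments were attempted)\n";
```

Two increments were attempted but the stored value only goes from 10 to 11, which is the lost update that a lock held across the read and the write prevents.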

A convenience method, called cds_lock, is supplied with the BerkeleyDB
module for this purpose. Using cds_lock, the code can now be rewritten thus:



my $lk = $dbh->cds_lock() ;
$lexicon{"alpha"} ++ ;
$lk->cds_unlock;

or this, where scoping is used to limit the lifetime of the lock object

{
    my $lk = $dbh->cds_lock() ;
    $lexicon{"alpha"} ++ ;
}
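
The scoped form relies on ordinary Perl object lifetime: when the last reference to an object goes away, its destructor runs. The sketch below shows only that mechanism, with a made-up DemoLock class standing in for the real lock object. It is not part of BerkeleyDB, and it assumes, as the scoped version implies, that the real lock is released when the object is destroyed.

```perl
use strict;
use warnings;

my @events;    # records the order in which things happen

# Hypothetical stand-in for a lock object; not part of BerkeleyDB.
package DemoLock {
    sub new     { my ($class) = @_; push @events, 'lock'; return bless {}, $class }
    sub DESTROY { push @events, 'unlock' }
}

{
    my $lk = DemoLock->new;     # "lock" taken here
    push @events, 'update';     # work done while the lock is held
}                               # $lk goes out of scope, DESTROY runs

push @events, 'after block';
print join( ' -> ', @events ), "\n";
```

The "unlock" event lands before "after block", so nothing that follows the closing brace runs while the lock is still held.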




Paul




  #3  
Dwight Oakey

Re: Corrupt DBs in Perl w/ BerkeleyDB & MLDBM - 05-08-2006, 09:26 AM



Paul Marquess wrote:
Quote:
Your second choice is the one to go with.

Thank you. And, as I understand it, thank you for the great module!

<snipped for brevity>

Quote:
In short there is a race condition lurking in there. To fix this problem you
need a long-lived lock.

A convenience method, called cds_lock, is supplied with the BerkeleyDB
module for this purpose. Using cds_lock, the code can now be rewritten thus:

my $lk = $dbh->cds_lock() ;
$lexicon{"alpha"} ++ ;
$lk->cds_unlock;

or this, where scoping is used to limit the lifetime of the lock object

{
my $lk = $dbh->cds_lock() ;
$lexicon{"alpha"} ++ ;
}

Paul

Paul, this is the type of "manual" locking I mentioned in my first
post. Is this _required_? Or is it just good practice? Presently, we
are using the tied Hash approach, and therefore we are NOT implementing
any _explicit_ locks. If this is required syntax, then should we be
using this OO approach, instead of the tied hash?

Again, thanks!

-Dwight



  #4  
Paul Marquess

Re: Corrupt DBs in Perl w/ BerkeleyDB & MLDBM - 05-08-2006, 10:33 AM



"Dwight Oakey" <doakey (AT) oneil (DOT) com> wrote

Quote:
Paul Marquess wrote:
....

Paul, this is the type of "manual" locking I mentioned in my first
post. Is this _required_? Or is it just good practice? Presently, we
are using the tied Hash approach, and therefore we are NOT implementing
any _explicit_ locks. If this is required syntax, then should we be
using this OO approach, instead of the tied hash?

You can mix and match tied hashes with the OO interface by keeping a copy of
the object returned from the "tie" call, like this:

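# $env is the CDS environment from your second snippet;
# cds_lock only works when the database is opened in CDS mode
my $db = tie %lexicon, "MLDBM", -Filename => $filename, -Env => $env,
    -Flags => DB_CREATE
    or die "Cannot open file $filename: $! $BerkeleyDB::Error\n" ;

{
    my $lock = $db->cds_lock();
    ++ $lexicon{"fred"} ;
}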

Regarding whether you need to use explicit locking with CDS mode - you MUST
do a "manual" lock if you are doing an update of some kind, that is, a write
whose value depends on something you have just read. For reads from
and writes to the database that are completely independent of each other you
don't need to do any locking at all -- Berkeley DB will handle all the
locking behind the scenes for you.
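
Putting the pieces together, a sketch of the whole pattern might look like the following. It assumes the BerkeleyDB module is installed. To stay short it uses a plain BerkeleyDB::Hash object rather than an MLDBM tie. The temporary home directory, the file name lexicon.db, the key "alpha" and the helper increment_counter are all made up for the example.

```perl
use strict;
use warnings;
use BerkeleyDB;
use File::Temp qw(tempdir);

# Hypothetical home directory for the environment; every process that
# opens the database has to point at the same one.
my $home = tempdir( CLEANUP => 1 );

# Concurrent Data Store environment: many readers, one writer at a time.
my $env = BerkeleyDB::Env->new(
    -Home  => $home,
    -Flags => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL,
) or die "Cannot open environment: $BerkeleyDB::Error\n";

my $db = BerkeleyDB::Hash->new(
    -Filename => 'lexicon.db',    # made-up file name, relative to $home
    -Env      => $env,
    -Flags    => DB_CREATE,
) or die "Cannot open database: $BerkeleyDB::Error\n";

# Independent write: no explicit lock needed.
$db->db_put( 'alpha', 0 ) == 0 or die "put failed: $BerkeleyDB::Error\n";

# Update (read, then write a value derived from it): hold the CDS lock
# across both calls so no other writer can get in between.
sub increment_counter {
    my ( $dbh, $key ) = @_;
    my $lock = $dbh->cds_lock();
    my $value;
    $dbh->db_get( $key, $value ) == 0
        or die "get failed: $BerkeleyDB::Error\n";
    $dbh->db_put( $key, $value + 1 ) == 0
        or die "put failed: $BerkeleyDB::Error\n";
    $lock->cds_unlock();
    return $value + 1;
}

increment_counter( $db, 'alpha' ) for 1 .. 3;

# Independent read: again no explicit lock.
my $count;
$db->db_get( 'alpha', $count ) == 0 or die "get failed: $BerkeleyDB::Error\n";
print "alpha = $count\n";
```

The lock lives inside the helper so that every update goes through the same locked read-then-write path, while plain reads and writes elsewhere stay lock-free.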

Paul



