Multi-database and multi-process locking from Perl
01-30-2006, 10:19 AM
We're about to switch from CDB, which has some beautiful
performance characteristics but lacks acceptable support for
on-the-fly updates of large databases, to BerkeleyDB, which can handle
our larger databases (~5-10m records) and also supports on-the-fly
changes. Here's the environment:
- Access from Perl/BerkeleyDB (thanks, Paul!)
- Running on Red Hat
- Databases are accessed from command-line scripts, a multi-process
daemon, and an apache/mod_perl application
- 90% of all activity is read only
- We have 6 distinct databases
- Database updates are either a) occasional inserts of a few key/value
pairs, or b) inserts/updates of anywhere from a few thousand to ~200,000
records
My great fear in making the switch is handling concurrent access
properly, and the possibility of database corruption. Per my read of
the docs and some other posts, here's what I believe should work:
1. As long as we open the database with "DB_CREATE | DB_INIT_CDB |
DB_INIT_MPOOL", Berkeley should handle all concurrency for us, even if
access is from completely different processes, for each of the
databases we interact with (via db_get, db_put, cursors, etc.). No
explicit lock calls should be required from the Perl code. (We expect
Berkeley to lock on a per database basis, as opposed to the behavior
under DB_CDB_ALLDB.)
2. As long as we trap interrupt signals and cleanly disconnect on
DESTROY, the database should remain relatively healthy and uncorrupted.
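For concreteness, here is a minimal sketch of the setup described in points 1 and 2, not a tested production configuration. It assumes the BerkeleyDB Perl module; the temporary home directory, the file name example.db, and the sample key and value are hypothetical. In this module the DB_INIT_CDB and DB_INIT_MPOOL flags are passed when opening the shared environment (BerkeleyDB::Env), and each database is then opened inside that environment. The sketch also assumes that a cursor used for updates has to be requested with DB_WRITECURSOR under CDB.

```perl
use strict;
use warnings;
use BerkeleyDB;
use File::Temp qw(tempdir);

# Hypothetical environment directory. In real use, every process
# (scripts, daemon, mod_perl) would point at the same directory.
my $home = tempdir(CLEANUP => 1);

# Turn INT/TERM into a normal exit so that END blocks and object
# destructors (DESTROY) still run and the handles are closed.
$SIG{INT} = $SIG{TERM} = sub { exit 1 };

# Concurrent Data Store flags go on the environment.
my $env = BerkeleyDB::Env->new(
    -Home  => $home,
    -Flags => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL,
) or die "cannot open environment: $BerkeleyDB::Error\n";

# Open one database inside the shared environment.
my $db = BerkeleyDB::Hash->new(
    -Filename => 'example.db',
    -Flags    => DB_CREATE,
    -Env      => $env,
) or die "cannot open database: $BerkeleyDB::Error\n";

# Plain puts and gets need no explicit lock calls from Perl.
$db->db_put('alpha', 'one') == 0
    or die "put failed: $BerkeleyDB::Error\n";

my $value;
$db->db_get('alpha', $value) == 0
    or die "get failed: $BerkeleyDB::Error\n";

# A cursor that modifies records is opened as a write cursor.
my $cursor = $db->db_cursor(DB_WRITECURSOR)
    or die "cannot open write cursor: $BerkeleyDB::Error\n";
my ($k, $v) = ('', '');
while ($cursor->c_get($k, $v, DB_NEXT) == 0) {
    # Overwrite the current record in place with an upper-cased value.
    $cursor->c_put($k, uc $v, DB_CURRENT) == 0
        or die "cursor put failed: $BerkeleyDB::Error\n";
}
$cursor->c_close;
undef $cursor;

# Read the record back after the cursor has been closed.
my $updated;
$db->db_get('alpha', $updated) == 0
    or die "get failed: $BerkeleyDB::Error\n";

# Release the database handle before the environment handle.
undef $db;
undef $env;
```

The signal handler only calls exit, which lets Perl run its usual shutdown path. A process killed with SIGKILL, or one that crashes, skips that path, so this sketch does not address recovery in those cases.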
Are our assumptions correct? Are we missing something major?
Any insight into helping us manage our concurrency and database
integrity from Perl is greatly appreciated!
Best,
Matthew