sequences and transactions and recovery, oh my

comp.databases.berkeley-db

Ben Pfaff
sequences and transactions and recovery, oh my - 03-22-2006, 01:54 PM

I've been having some trouble with recovery in a transactional
application of mine. I managed to reduce the problem to recovery
of sequences. I've appended to this article a minimal test
program that illustrates the problem. The program creates a
transactional environment, creates a database in the environment,
creates a sequence in the database, retrieves one value from the
sequence, and exits without closing the sequence or database or
environment.

Here's how I can see the problem:

$ rm -rf db # Make sure the environment is clean.
$ mkdir db # Create directory for environment.
$ ./test # Run once to set everything up.
$ ./test # Run second time to recover.

The first run succeeds. The second run fails with:

Finding last valid log LSN: file: 1 offset 8870
Recovery starting from [1][28]
DB_ENV->log_flush: LSN of 513/512 past current end-of-log of 1/8870
Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
seq-db: unable to flush page: 2
txn_checkpoint: failed to flush the buffer cache DB_RUNRECOVERY: Fatal error, run database recovery
PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
PANIC: fatal region error detected; run recovery
unable to join the environment
test.c:42: "dbenv->open (dbenv, "db", DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL | DB_INIT_TXN | DB_RECOVER | DB_THREAD, 0)" failed (DB_RUNRECOVERY: Fatal error, run database recovery)

I saw in a FAQ that a common cause of failed log recovery is
failing to enclose an operation in a transaction. But I don't
think that is the problem here. After the first run, this is
what printlog shows:

$ db4.4_printlog -h db|grep txn
[1][28]__db_debug: rec: 47 txnid 80000001 prevlsn [0][0]
[1][76]__fop_create: rec: 143 txnid 80000002 prevlsn [0][0]
[1][126]__fop_write: rec: 145 txnid 80000002 prevlsn [1][76]
[1][4288]__fop_write: rec: 145 txnid 80000002 prevlsn [1][126]
[1][8450]__fop_rename: rec: 146 txnid 80000002 prevlsn [1][4288]
[1][8531]__txn_child: rec: 12 txnid 80000001 prevlsn [1][28]
[1][8571]__dbreg_register: rec: 2 txnid 80000001 prevlsn [1][8531]
[1][8654]__txn_regop: rec: 10 txnid 80000001 prevlsn [1][8571]
[1][8698]__ham_replace: rec: 25 txnid 80000003 prevlsn [0][0]
[1][8826]__txn_regop: rec: 10 txnid 80000003 prevlsn [1][8698]
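
To read the filtered output above mechanically, a minimal C sketch follows. The struct and function names are hypothetical, and it assumes every line has exactly the layout shown above. Treating a txnid of 0 as "logged outside any transaction" is also an assumption about printlog's output, not something this thread confirms.

```c
#include <stdio.h>

/* Hypothetical record for one line of the grep'ed printlog output. */
struct log_line {
  unsigned file, offset;            /* LSN of this record: [file][offset] */
  char type[64];                    /* record type, e.g. "__ham_replace" */
  unsigned rec;                     /* numeric record type */
  unsigned long txnid;              /* transaction ID, printed in hex */
  unsigned prev_file, prev_offset;  /* prevlsn: previous record of the txn */
};

/* Parses a line such as
   "[1][8698]__ham_replace: rec: 25 txnid 80000003 prevlsn [0][0]".
   Returns 1 if all seven fields were read, 0 otherwise. */
int
parse_log_line (const char *s, struct log_line *out)
{
  return sscanf (s, "[%u][%u]%63[^:]: rec: %u txnid %lx prevlsn [%u][%u]",
                 &out->file, &out->offset, out->type, &out->rec,
                 &out->txnid, &out->prev_file, &out->prev_offset) == 7;
}

/* Assumption: a nonzero txnid means the record was logged in a transaction. */
int
is_transactional (const struct log_line *line)
{
  return line->txnid != 0;
}
```

Run over the ten lines above, every record parses with a nonzero txnid, which matches the observation that nothing here looks like an unprotected operation.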

I don't know where the LSN 513/512 in the error message is coming
from. There's only one log file:

$ ls -l db
total 100
-rw-r----- 1 blp blp 24576 Mar 22 11:48 __db.001
-rw-r----- 1 blp blp 278528 Mar 22 11:48 __db.002
-rw-r----- 1 blp blp 270336 Mar 22 11:48 __db.003
-rw-r----- 1 blp blp 98304 Mar 22 11:48 __db.004
-rw-r----- 1 blp blp 352256 Mar 22 11:48 __db.005
-rw-r----- 1 blp blp 16384 Mar 22 11:48 __db.006
-rw-r----- 1 blp blp 10485760 Mar 22 11:48 log.0000000001
-rw-r----- 1 blp blp 12288 Mar 22 11:48 seq-db
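
The "past current end-of-log" wording in the error implies an ordering on LSNs. Assuming an LSN is a (file, offset) pair ordered by file number first and offset second, the check can be sketched in C as below. The struct and function names are hypothetical stand-ins, not Berkeley DB's own code.

```c
#include <stdint.h>

/* Hypothetical stand-in for an LSN: log file number plus byte offset. */
struct lsn {
  uint32_t file;
  uint32_t offset;
};

/* Orders two LSNs by file number first, then by offset within the file.
   Returns -1, 0, or 1. */
int
lsn_compare (struct lsn a, struct lsn b)
{
  if (a.file != b.file)
    return a.file < b.file ? -1 : 1;
  if (a.offset != b.offset)
    return a.offset < b.offset ? -1 : 1;
  return 0;
}

/* Nonzero if the given LSN lies beyond the last valid position in the log. */
int
lsn_past_end_of_log (struct lsn candidate, struct lsn end_of_log)
{
  return lsn_compare (candidate, end_of_log) > 0;
}
```

With the end of log at 1/8870, the last record in the printlog output (1/8826) is within range, while 513/512 is far past it. File 513 does not exist in the listing above, so that value cannot have come from a real log record in this environment.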

This is with Debian's packaged libdb4.4, version 4.4.20-3, on
x86. I also downloaded and compiled BDB directly from
sleepycat.com this morning and see the same behavior with that
library.

Can anyone provide me some guidance on this?

Here's the test program:

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <db.h>

#define MUST_SUCCEED(expr) must_succeed (expr, #expr, __LINE__)

static void
must_succeed (int db_errno, const char *expr, int line_number)
{
  if (db_errno != 0) {
    fprintf (stderr, "%s:%d: \"%s\" failed (%s)\n",
             __FILE__, line_number, expr, db_strerror (db_errno));
    exit (EXIT_FAILURE);
  }
}

int
main (void)
{
  const DBTYPE type = DB_HASH;
  const char *db_name = "seq-db";
  const char *sequence_name = "seq-key";

  DB_ENV *dbenv;
  DB *db;
  DBT sequence_key;
  DB_SEQUENCE *seq;
  db_seq_t value;

  /* Open environment. */
  MUST_SUCCEED (db_env_create (&dbenv, 0));
  dbenv->set_errfile (dbenv, stderr);
  MUST_SUCCEED (dbenv->set_verbose (dbenv, DB_VERB_RECOVERY, 1));
  MUST_SUCCEED (dbenv->set_flags (dbenv, DB_AUTO_COMMIT, 1));
  MUST_SUCCEED (dbenv->open (dbenv, "db",
                             DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG
                             | DB_INIT_MPOOL | DB_INIT_TXN | DB_RECOVER
                             | DB_THREAD, 0));

  /* Open database. */
  MUST_SUCCEED (db_create (&db, dbenv, 0));
  MUST_SUCCEED (db->open (db, NULL, db_name, NULL, type,
                          DB_AUTO_COMMIT | DB_CREATE | DB_THREAD, 0));

  /* Open sequence. */
  memset (&sequence_key, 0, sizeof sequence_key);
  sequence_key.data = (char *) sequence_name;
  sequence_key.size = strlen (sequence_name);
  MUST_SUCCEED (db_sequence_create (&seq, db, 0));
  MUST_SUCCEED (seq->initial_value (seq, 1));
  MUST_SUCCEED (seq->open (seq, NULL, &sequence_key,
                           DB_AUTO_COMMIT | DB_CREATE | DB_THREAD));

  /* Obtain sequence number. */
  MUST_SUCCEED (seq->get (seq, NULL, 1, &value, DB_AUTO_COMMIT));

  return 0;
}

--
Ben Pfaff
email: blp (AT) cs (DOT) stanford.edu
web: http://benpfaff.org


andrew.bell.ia@gmail.com
Re: sequences and transactions and recovery, oh my - 03-22-2006, 02:42 PM

I don't have any problem with this if the database and environment are
closed:

MUST_SUCCEED (db->close(db, 0));
MUST_SUCCEED (dbenv->close(dbenv, 0));

Cheers,

-- Andrew Bell
andrew.bell.ia (AT) gmail (DOT) com


Ben Pfaff
Re: sequences and transactions and recovery, oh my - 03-22-2006, 02:46 PM

andrew.bell.ia (AT) gmail (DOT) com writes:

Quote:
> I don't have any problem with this if the database and environment are
> closed:
>
> MUST_SUCCEED (db->close(db, 0));
> MUST_SUCCEED (dbenv->close(dbenv, 0));

I thought that the point to transactional logging was that one
got reasonable behavior even when a program or a system is
terminated unexpectedly. Do I misunderstand?
--
Ben Pfaff
email: blp (AT) cs (DOT) stanford.edu
web: http://benpfaff.org


andrew.bell.ia@gmail.com
Re: sequences and transactions and recovery, oh my - 03-22-2006, 03:28 PM

Sorry,

I missed the first sentence where you said that this was a conscious part
of your testing. I did notice that things work if you wrap things in a
transaction:

DB_TXN *tid;

MUST_SUCCEED (dbenv->txn_begin(dbenv, NULL, &tid, 0));
MUST_SUCCEED (seq->open (seq, tid, &sequence_key,
                         DB_CREATE | DB_THREAD));

/* Obtain sequence number. */
MUST_SUCCEED (seq->get (seq, tid, 1, &value, 0));
MUST_SUCCEED (tid->commit(tid, 0));

There is something in the docs about putting the sequence calls in a
transaction if the db->open is transaction protected, but I didn't
understand how that differs from the implicit transaction semantics.

Cheers,

-- Andrew Bell
andrew.bell.ia (AT) gmail (DOT) com


Florian Weimer
Re: sequences and transactions and recovery, oh my - 03-22-2006, 03:41 PM

* Ben Pfaff:

Quote:
> I've been having some trouble with recovery in a transactional
> application of mine. I managed to reduce the problem to recovery
> of sequences. I've appended to this article a minimal test
> program that illustrates the problem.

Interesting. There is no obvious usage error in your test case, so
this is probably a bug in Berkeley DB.

A work-around is to force a checkpoint just after you create the
sequence:

// ...
MUST_SUCCEED (seq->open (seq, NULL, &sequence_key,
                         DB_AUTO_COMMIT | DB_CREATE | DB_THREAD));
MUST_SUCCEED (dbenv->txn_checkpoint (dbenv, 0, 0, DB_FORCE));
// ...

A race condition probably remains, but depending on your application,
this might do the trick until a real fix is available.


Patrick Schaaf
Re: sequences and transactions and recovery, oh my - 03-23-2006, 12:36 AM

Ben Pfaff <blp (AT) cs (DOT) stanford.edu> writes:

Quote:
> I thought that the point to transactional logging was that one
> got reasonable behavior even when a program or a system is
> terminated unexpectedly. Do I misunderstand?

Transactional logging provides _consistent_ behaviour, for various
definitions of consistency (google for ACID), and of course in the
absence of plain software bugs.

"reasonable" is a personal value judgement, and unimplementable in principle.

best regards
Patrick


Patrick Schaaf
Re: sequences and transactions and recovery, oh my - 03-23-2006, 01:14 AM

Quote:
> The first run succeeds. The second run fails with:
> ....
> test.c:42: "dbenv->open (dbenv, "db", DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL | DB_INIT_TXN | DB_RECOVER | DB_THREAD, 0)" failed (DB_RUNRECOVERY: Fatal error, run database recovery)

Q: what happens if, instead of the second ./test run, you run 'db_recover'?
Same question for 'db_recover -c'?

best regards
Patrick


Ben Pfaff
Re: sequences and transactions and recovery, oh my - 03-23-2006, 10:20 AM

mailer-daemon (AT) bof (DOT) de (Patrick Schaaf) writes:

Quote:
>> The first run succeeds. The second run fails with:
>>
>> ...
>> test.c:42: "dbenv->open (dbenv, "db", DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL | DB_INIT_TXN | DB_RECOVER | DB_THREAD, 0)" failed (DB_RUNRECOVERY: Fatal error, run database recovery)
>
> Q: what happens if, instead of the second ./test run, you run 'db_recover'?

Same behavior:

$ rm -rf db
$ mkdir db
$ ./test
$ db4.4_recover -h db
db4.4_recover: DB_ENV->log_flush: LSN of 513/512 past current end-of-log of 1/8870
db4.4_recover: Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
db4.4_recover: PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
db4.4_recover: seq-db: unable to flush page: 2
db4.4_recover: txn_checkpoint: failed to flush the buffer cache DB_RUNRECOVERY: Fatal error, run database recovery
db4.4_recover: PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
db4.4_recover: DB_ENV->open: DB_RUNRECOVERY: Fatal error, run database recovery
$

Quote:
> Same question for 'db_recover -c'?

Again the behavior is identical.
--
Ben Pfaff
email: blp (AT) cs (DOT) stanford.edu
web: http://benpfaff.org

