
Years of rewriting databases. DB_RUNRECOVERY. -30987. Also, -30981 DB_PAGE_NOTFOUND

comp.databases.berkeley-db





#1
Brandon Rotated
Years of rewriting databases. DB_RUNRECOVERY. -30987. Also, -30981 DB_PAGE_NOTFOUND - 12-13-2005, 08:31 AM






Hello. For years we've been having periodic problems with our Berkeley
DBs -- some type of problem with the data file itself -- it seems to
affect databases more frequently if they are accessed at higher volume,
and it's likely, but not necessarily true, that it is related to
concurrent access to the databases. We have a locking system where
readers/writers must obtain the lock prior to the db->open() (shared
readers, exclusive writers; again, the lock is obtained before the file
is accessed in any way).
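
A minimal sketch of the locking scheme described above, for readers who want something concrete. The post does not say which locking primitive is used, so flock() (a BSD/Linux call) on a separate lock file is an assumption here, and acquire_db_lock / release_db_lock are hypothetical names.

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: take a shared (reader) or exclusive (writer) lock
 * on a separate lock file before the database file is touched at all.
 * Blocks until the lock is granted. Returns the lock fd, or -1 on error. */
int acquire_db_lock(const char *lock_path, int exclusive)
{
    int fd = open(lock_path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (flock(fd, exclusive ? LOCK_EX : LOCK_SH) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Release the lock and close the lock file descriptor. */
void release_db_lock(int fd)
{
    if (fd >= 0) {
        flock(fd, LOCK_UN);
        close(fd);
    }
}
```

With a scheme like this, a writer would normally release the lock only after DB->close() has returned, so that the next process to take the lock sees a fully flushed file.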

They are created as single files, not in environments, and there are
several thousand of them, each accessed at a frequency related to its
own area.

Periodically we'll end up with an error in a database, or in
get/putting records. Error -30987 (DB_RUNRECOVERY "Panic return"), or
-30981 (on a put(), for instance). -30981 is
DB_PAGE_NOTFOUND/"Requested page not found." (Details included out of
consideration for people searching messages).

When the problem(s) occur on any particular database (remember, there
are thousands), I use a small script to "cursor-through" copying every
key to a new database, then I move that one into place (of course, all
the while maintaining the separate lock).

Please tell me what possible reasons there could be for getting
corruption in databases? These are not multi-threaded, but
individually-hit CGIs coded in C. There's one location where
fork()'ing is used (actually a double fork() to free the web-visitor
from the process while some database handling is done), but
fflush(NULL) is called before, and each fork()'s child exits with
_exit(0).
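
A rough sketch of the double-fork pattern described above, under stated assumptions: run_detached and example_work are hypothetical names, and example_work only writes a marker file as a stand-in for the real database handling.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the real database work (hypothetical). It closes what it
 * opened before returning, because the caller exits with _exit(). */
void example_work(void *arg)
{
    FILE *f = fopen((const char *)arg, "w");
    if (f != NULL) {
        fputs("done\n", f);
        fclose(f);
    }
}

/* Run work(arg) in a detached grandchild so the CGI process can answer
 * the web visitor right away. Returns 0 in the parent on success. */
int run_detached(void (*work)(void *), void *arg)
{
    fflush(NULL);                      /* avoid duplicated stdio buffers */
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        pid_t grandchild = fork();
        if (grandchild == 0) {
            work(arg);                 /* open, use and close handles here */
            _exit(0);
        }
        _exit(grandchild < 0 ? 1 : 0); /* intermediate child exits at once */
    }
    int status;
    while (waitpid(child, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

One thing worth checking in code shaped like this: _exit() skips all cleanup, so any Berkeley DB handle the child has written through needs an explicit DB->close() (or DB->sync()) before the child exits, otherwise modified pages still in the cache never reach the file.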

If you need anything else answered in order to help, please ask. We provide free
educational and communication services, and thousands of people are
affected by this daily, as it occurs pretty often.

Thank you.

Sincerely,
Brandon


#2
Florian Weimer
Re: Years of rewriting databases. DB_RUNRECOVERY. -30987. Also, -30981 DB_PAGE_NOTFOUND - 12-13-2005, 09:54 AM






* Brandon Rotated:

Quote:
Please tell me what possible reasons there could be for getting
corruption in databases?

Faulty hardware, bugs in application code (I once omitted DB_THREAD on
an environment which was used for changing databases and checkpointing
at the same time, duh), kernel bugs, Berkeley DB bugs.

Try enabling DB_CHKSUM; in my experience, this is pretty good at
detecting hardware issues.


#3
Brandingularity@gmail.com
Re: Years of rewriting databases. DB_RUNRECOVERY. -30987. Also, -30981 DB_PAGE_NOTFOUND - 12-16-2005, 10:27 AM



It doesn't seem like a hardware problem as it happens on multiple
very-different systems, and has happened over the years with complete
changes in hardware. I'm presently reading through the source to find
when a -30981 (DB_PAGE_NOTFOUND) is returned, and perhaps I can find
something as to its originating cause. I'm not familiar with the code
to Berkeley DB at all, so I'm at a huge disadvantage here. Again, I
figure there's some possibility my own code has some stray pointer
causing it, but I *did* find what seemed to be the same problem (or
very close) on a Python mailing list, where someone was experiencing it
with high-frequency access to some databases:
https://sourceforge.net/tracker/?fun...&group_id=5470

(I responded to them, mentioning that they shouldn't so quickly give up
on the idea that it might be the method of interfacing to BDB, and not
BDB's fault itself, but for some reason I don't see my response there.)

In any case, I am not ignoring your recommendation to enable DB_CHKSUM,
it's just that there are thousands of already-existing databases at the
moment, on a live system, and from my reading it would seem that I
would need to rewrite each of them with DB_CHKSUM enabled (and I'm not
too certain about DB_CHKSUM's overhead, or whether it would point me
toward a solution and so be "worth it"). We'll see -- I'm not closed to
the idea.




Florian Weimer wrote:
Quote:
* Brandon Rotated:

Please tell me what possible reasons there could be for getting
corruption in databases?

Faulty hardware, bugs in application code (I once omitted DB_THREAD on
an environment which was used for changing databases and checkpointing
at the same time, duh), kernel bugs, Berkeley DB bugs.

Try enabling DB_CHKSUM; in my experience, this is pretty good at
detecting hardware issues.


#4
bostic@sleepycat.com
Re: Years of rewriting databases. DB_RUNRECOVERY. -30987. Also, -30981 DB_PAGE_NOTFOUND - 12-29-2005, 08:22 AM



Quote:
For years we've been having periodic problems with our Berkeley
DBs -- some type of problem with the data file itself -- it
seems to affect databases more frequently if they are accessed
at higher volume, and it's likely, but not necessarily true,
that it is related to concurrent access to the databases. We
have a locking system where readers/writers must obtain the lock
prior to the db->open() (shared readers, exclusive writers;
again, the lock is obtained before the file is accessed in any
way).

You might consider switching to the Berkeley DB Concurrent Data
Store configuration, as it provides multiple-reader, single-writer
semantics, so you wouldn't have to implement your own
locking. BDB locking may offer advantages over your locking as
well, such as starvation avoidance. For more information,
please see:

http://www.sleepycat.com/docs/ref/cam/intro.html

Quote:
Periodically we'll end up with an error in a database, or in
get/putting records. Error -30987 (DB_RUNRECOVERY "Panic
return"), or -30981 (on a put(), for instance). -30981 is
DB_PAGE_NOTFOUND/"Requested page not found." (Details
included out of consideration for people searching messages).

Generally these errors (DB_PAGE_NOTFOUND, in particular)
indicate database corruption.

The most common cause of database corruption in
non-transactional applications is application or system failure.
If a failure occurs after the application has modified a
database but before it has flushed the database to stable
storage (by calling the DB->close, DB->sync or
DB_ENV->memp_sync methods), the database may be left in a
corrupted state.

In the case of failure, before accessing the database again, the
database should either be:

+ removed and re-created,

+ removed and restored from the last known good backup, or

+ verified using the DB->verify method or db_verify
utility. If the database does not verify cleanly, the
contents may be salvaged using the -R and -r options of
the db_dump utility.
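
As an illustration of the third option above, here is a small sketch that drives the db_verify and db_dump utilities from C. It assumes both utilities are on the PATH; run_tool and verify_or_salvage are hypothetical names, and the -f option (write the dump to a file) is used in addition to the -r option mentioned above.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a command and return its exit status, or -1 if it could not be
 * forked or did not exit normally. 127 means the exec itself failed. */
int run_tool(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);
    }
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical helper. Returns 0 if db_path verifies cleanly, 1 if it
 * did not and a salvage dump into dump_path was attempted, and -1 if
 * the salvage tool could not be run at all. */
int verify_or_salvage(const char *db_path, const char *dump_path)
{
    char *verify_argv[] = { "db_verify", (char *)db_path, NULL };
    if (run_tool(verify_argv) == 0)
        return 0;

    /* -r salvages what it can; -R is the more aggressive variant. */
    char *dump_argv[] = { "db_dump", "-r", "-f", (char *)dump_path,
                          (char *)db_path, NULL };
    int st = run_tool(dump_argv);

    /* db_dump's own exit status is not interpreted further here;
     * the caller should inspect dump_path before reloading it. */
    return (st < 0 || st == 127) ? -1 : 1;
}
```

A caller holding the exclusive lock could run verify_or_salvage() on a suspect file and, if it returns 1, review the dump and rebuild the database from it with db_load before moving the new file into place.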

Applications where the potential for data loss is unacceptable
should consider the Berkeley DB Transactional Data Store
product, which offers standard transactional durability
guarantees, including recoverability after failure.

Regards,
--keith

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Keith Bostic bostic (AT) sleepycat (DOT) com
Sleepycat Software Inc. keithbosticim (Yahoo IM)
118 Tower Rd. +1-781-259-3139
Lincoln, MA 01773 http://www.sleepycat.com


