dbTalk Databases Forums  

Improving whole-database scan performance

comp.databases.berkeley-db comp.databases.berkeley-db


Discuss Improving whole-database scan performance in the comp.databases.berkeley-db forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Florian Weimer
 
Posts: n/a

Default Improving whole-database scan performance - 02-14-2007 , 03:24 AM






Is it possible to traverse a B-tree database (possibly in non-index
order, with read-comiitted isolation) in a way such that most accesses
happen in on-disk order? Due to internal fragmentation,
whole-database scans are annoyingly slow once the database size
exceeds available RAM.

Would be using DB_RECNO or a sequence-based key a suitable workaround?

Reply With Quote
  #2  
Old   
Christopher Layne
 
Posts: n/a

Default Re: Improving whole-database scan performance - 02-14-2007 , 08:04 AM






Florian Weimer wrote:

Quote:
Is it possible to traverse a B-tree database (possibly in non-index
order, with read-comiitted isolation) in a way such that most accesses
happen in on-disk order? Due to internal fragmentation,
whole-database scans are annoyingly slow once the database size
exceeds available RAM.

Would be using DB_RECNO or a sequence-based key a suitable workaround?
What cache size are you using for the database? Default?


Reply With Quote
  #3  
Old   
Florian Weimer
 
Posts: n/a

Default Re: Improving whole-database scan performance - 02-14-2007 , 12:05 PM



* Christopher Layne:

Quote:
Is it possible to traverse a B-tree database (possibly in non-index
order, with read-comiitted isolation) in a way such that most accesses
happen in on-disk order? Due to internal fragmentation,
whole-database scans are annoyingly slow once the database size
exceeds available RAM.

Would be using DB_RECNO or a sequence-based key a suitable workaround?

What cache size are you using for the database? Default?
Uh-no, no, not quite. I've tried various values between 256 MB and
2.5 GB. And "db_dump -r" is acceptably fast (but plain db_dump
isn't).


Reply With Quote
  #4  
Old   
Klaas
 
Posts: n/a

Default Re: Improving whole-database scan performance - 02-14-2007 , 07:55 PM



On Feb 14, 1:24 am, Florian Weimer <f... (AT) deneb (DOT) enyo.de> wrote:
Quote:
Is it possible to traverse a B-tree database (possibly in non-index
order, with read-comiitted isolation) in a way such that most accesses
happen in on-disk order? Due to internal fragmentation,
whole-database scans are annoyingly slow once the database size
exceeds available RAM.
How much of your database is located in overflow pages?

-Mike



Reply With Quote
  #5  
Old   
Florian Weimer
 
Posts: n/a

Default Re: Improving whole-database scan performance - 02-15-2007 , 03:42 AM



* Klaas:

Quote:
On Feb 14, 1:24 am, Florian Weimer <f... (AT) deneb (DOT) enyo.de> wrote:
Is it possible to traverse a B-tree database (possibly in non-index
order, with read-comiitted isolation) in a way such that most accesses
happen in on-disk order? Due to internal fragmentation,
whole-database scans are annoyingly slow once the database size
exceeds available RAM.

How much of your database is located in overflow pages?
Very little, the records are quite small. I just looked at one of the
smaller database files, and it hasn't got any overflow pages. (The
larger database files contain similarly structured records.)


Reply With Quote
  #6  
Old   
Christopher Layne
 
Posts: n/a

Default Re: Improving whole-database scan performance - 02-16-2007 , 08:30 AM



Florian Weimer wrote:

Quote:
* Christopher Layne:

Is it possible to traverse a B-tree database (possibly in non-index
order, with read-comiitted isolation) in a way such that most accesses
happen in on-disk order? Due to internal fragmentation,
whole-database scans are annoyingly slow once the database size
exceeds available RAM.

Would be using DB_RECNO or a sequence-based key a suitable workaround?

What cache size are you using for the database? Default?

Uh-no, no, not quite. I've tried various values between 256 MB and
2.5 GB. And "db_dump -r" is acceptably fast (but plain db_dump
isn't).
Here is what I see in the source for db_dump.c (4.4.20):
Pay specific attention to the logic of "is_salvage" inside db_init(). rflag
determines if "is_salvage" is 1 or 0. Also, the default cache size within
db_dump is 1 MB.

[these are the only places rflag is relevant:]
[...]
/* Initialize the environment. */
if (db_init(dbenv, home, rflag, cache, &private) != 0)
goto err;

/* Create the DB object and open the file. */
if ((ret = db_create(&dbp, dbenv, 0)) != 0) {
dbenv->err(dbenv, ret, "db_create");
goto err;
}

/*
* If we're salvaging, don't do an open; it might not be safe.
* Dispatch now into the salvager.
*/
if (rflag) {
/* The verify method is a destructor. */
ret = dbp->verify(dbp, argv[0], NULL, stdout,
DB_SALVAGE |
(Rflag ? DB_AGGRESSIVE : 0) |
(pflag ? DB_PRINTABLE : 0));
dbp = NULL;
if (ret != 0)
goto err;
goto done;
}
[...]

[local db_init:]

/*
* db_init --
* Initialize the environment.
*/
int
db_init(dbenv, home, is_salvage, cache, is_privatep)
DB_ENV *dbenv;
char *home;
int is_salvage;
u_int32_t cache;
int *is_privatep;
{
int ret;

/*
* Try and use the underlying environment when opening a database.
* We wish to use the buffer pool so our information is as up-to-date
* as possible, even if the mpool cache hasn't been flushed.
*
* If we are not doing a salvage, we want to join the environment;
* if a locking system is present, this will let us use it and be
* safe to run concurrently with other threads of control. (We never
* need to use transactions explicitly, as we're read-only.) Note
* that in CDB, too, this will configure our environment
* appropriately, and our cursors will (correctly) do locking as CDB
* read cursors.
*
* If we are doing a salvage, the verification code will protest
* if we initialize transactions, logging, or locking; do an
* explicit DB_INIT_MPOOL to try to join any existing environment
* before we create our own.
*/
*is_privatep = 0;
if ((ret = dbenv->open(dbenv, home,
DB_USE_ENVIRON | (is_salvage ? DB_INIT_MPOOL : 0), 0)) == 0)
return (0);
if (ret == DB_VERSION_MISMATCH)
goto err;

/*
* An environment is required because we may be trying to look at
* databases in directories other than the current one. We could
* avoid using an environment iff the -h option wasn't specified,
* but that seems like more work than it's worth.
*
* No environment exists (or, at least no environment that includes
* an mpool region exists). Create one, but make it private so that
* no files are actually created.
*/
*is_privatep = 1;
if ((ret = dbenv->set_cachesize(dbenv, 0, cache, 1)) == 0 &&
(ret = dbenv->open(dbenv, home,
DB_CREATE | DB_INIT_MPOOL | DB_PRIVATE | DB_USE_ENVIRON, 0)) == 0)
return (0);

/* An environment is required. */
err: dbenv->err(dbenv, ret, "DB_ENV->open");
return (1);
}



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.