Explicitly clearing BDB cache?

comp.databases.berkeley-db

#1 Klaas
Explicitly clearing BDB cache? - 07-27-2006, 10:28 PM

I've been running some synthetic tests to weasel out the intricacies of
BDB performance. The tests used the BTREE access method with small,
constant-size keys/values.

As expected, ordered BTREE insertion was ridiculously faster than
random insertion. However, random insertion can be improved substantially
by increasing the cache size. Typically, random insertion performed about
as well as ordered insertion until the cache was exhausted, at which point
the cached data was written out and performance plummeted.
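
A toy Python model of this effect (an illustration, not a measurement of BDB): leaf pages fill evenly, an ordered insert always touches the last leaf, a random insert touches a uniformly chosen existing leaf, and the cache is plain LRU. All names are hypothetical, and BDB's real page splitting and replacement policy are more involved.

```python
import random
from collections import OrderedDict


class LRUPageCache:
    """Toy page cache: counts misses and evicts the least-recently-used page."""

    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.resident = OrderedDict()  # page_id -> None, least recently used first
        self.misses = 0

    def touch(self, page_id):
        if page_id in self.resident:
            self.resident.move_to_end(page_id)  # hit: now most recently used
            return
        self.misses += 1  # miss: a real cache would have to go to disk
        self.resident[page_id] = None
        if len(self.resident) > self.capacity:
            self.resident.popitem(last=False)  # evict the least-recently-used page


def count_misses(n_keys, keys_per_page, cache_pages, ordered, seed=0):
    """Insert n_keys into a growing set of leaf pages and count cache misses.

    Simplifying assumptions (not how BDB actually splits pages): leaves fill
    evenly, an ordered insert always lands on the last leaf, and a random
    insert lands on a uniformly chosen existing leaf.
    """
    rng = random.Random(seed)
    cache = LRUPageCache(cache_pages)
    for i in range(n_keys):
        n_leaves = i // keys_per_page + 1
        leaf = n_leaves - 1 if ordered else rng.randrange(n_leaves)
        cache.touch(leaf)
    return cache.misses
```

In this model, ordered insertion misses exactly once per leaf however small the cache is. Random insertion causes at most one miss per leaf while every leaf fits in the cache, then climbs steeply once they no longer fit: compare count_misses(100_000, 100, 200, ordered=False) with the same call using ordered=True.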

Once the cache is exhausted and flushed, the data remains in the cache
(no doubt to speed up retrieval), so as more data is added, the cache is
no longer flushed in large batches and disk I/O is essentially random.

However, if the cache were emptied after being flushed, another large
batch could be constructed in memory and written to disk in order. To
simulate this, I wrote in large, ordered batches (the data was random,
but sorted within each batch), and performance was only slightly worse
than with completely ordered insertion.
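
The batching described above can be sketched in Python as follows. The key format (the SHA-1 digest of 20 random bytes) is taken from a later post in this thread; the function names and batch size are hypothetical, and the actual database put() call is left out.

```python
import hashlib
import os


def random_key():
    """A 20-byte key: the SHA-1 digest of 20 random bytes."""
    return hashlib.sha1(os.urandom(20)).digest()


def sorted_batches(n_keys, batch_size):
    """Yield lists of random keys; each list is sorted, the overall key set is not."""
    remaining = n_keys
    while remaining > 0:
        size = min(batch_size, remaining)
        # sorted() orders bytes objects lexicographically, byte by byte
        yield sorted(random_key() for _ in range(size))
        remaining -= size
```

Each batch would then be inserted in order, e.g. `for key in batch: db.put(key, b"0")`, where `db` stands for a hypothetical database handle. Because every key is 20 bytes long, Python's byte-wise ordering should agree with a plain lexicographic Btree comparison, though that is an assumption worth checking against the comparison function actually in use.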

Is it possible to empty the cache explicitly (or automatically) when I
know the cached data is useless?

Thanks,
-Mike


#2 Florian Weimer
Re: Explicitly clearing BDB cache? - 07-29-2006, 01:37 AM

* Klaas:

Quote:
However, if the cache was flushed, then another large batch could be
constructed in memory and written to disk in-order.

You'd still need to read the old page contents from disk, so flushing
the cache won't buy you much, I guess.


#3 elie
Re: Explicitly clearing BDB cache? - 07-31-2006, 01:08 PM

I'm having cache problems too:
http://groups.google.com/group/comp....1a274e/?hl=en#
The data on disk is updated by process 2, but process 1 has it cached
and doesn't know that it needs updating.


#4 Bogdan Coman
Re: Explicitly clearing BDB cache? - 08-01-2006, 03:05 AM

Hi Mike,

First of all, could you describe what your key/data pairs look like
and what your cache size is?

Could you also tell us which Berkeley DB product you are using, and on
what platform?
Indeed, disk I/O is very costly from a performance point of view.
Yes, it is possible to flush the cache by using the DB->sync method
(http://www.sleepycat.com/docs/api_c/db_sync.html), which flushes any
cached information to disk.

You can find more information about flushing the database cache
here: http://www.sleepycat.com/docs/ref/am/sync.html

Also, depending on your application: if you are using checkpoints,
note that checkpoints write dirty pages from the cache to the database
files, but be aware that checkpointing is also very I/O intensive:
http://www.sleepycat.com/docs/ref/tr...heckpoint.html

Is your application multithreaded, and if so, how many threads are
working with the database?
You might want to use DB_ENV->memp_trickle, which can be run
periodically to make sure that at least N% of the cache is clean:

http://www.sleepycat.com/docs/api_c/memp_trickle.html
http://www.sleepycat.com/docs/api_c/memp_list.html
Be aware that if the same set of pages is repeatedly written, trickle
may write them out over and over, wasting disk bandwidth and CPU time
without improving performance.
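
A toy Python model of what trickling means, and of the caveat above. ToyPool and its methods are hypothetical, not the BDB API; only the idea of writing dirty pages until a target percentage of the cache is clean comes from the memp_trickle description, and the oldest-first write order is an arbitrary choice.

```python
from collections import OrderedDict


class ToyPool:
    """Hypothetical cache model with per-page dirty flags (not the BDB API)."""

    def __init__(self):
        self.pages = OrderedDict()  # page_id -> dirty?, least recently modified first
        self.total_writes = 0

    def modify(self, page_id):
        """A put() dirties the page and makes it the most recently modified."""
        self.pages[page_id] = True
        self.pages.move_to_end(page_id)

    def clean_percent(self):
        if not self.pages:
            return 100.0
        clean = sum(1 for dirty in self.pages.values() if not dirty)
        return 100.0 * clean / len(self.pages)

    def trickle(self, percent):
        """Write dirty pages, oldest first, until at least `percent`% are clean.

        Returns the number of pages written (loosely like memp_trickle's nwrotep).
        """
        written = 0
        for page_id in list(self.pages):
            if self.clean_percent() >= percent:
                break
            if self.pages[page_id]:
                self.pages[page_id] = False  # "write" the page; it stays cached
                written += 1
        self.total_writes += written
        return written


def hot_page_writes(cycles, hot_pages, percent=100):
    """Re-dirty the same pages every cycle, trickle, and return total page writes."""
    pool = ToyPool()
    for _ in range(cycles):
        for page_id in range(hot_pages):
            pool.modify(page_id)
        pool.trickle(percent)
    return pool.total_writes
```

hot_page_writes(3, 4) returns 12: the same four pages are written three times, which is the wasted bandwidth described above.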

I don't know whether you are already using db_stat to monitor cache
efficiency, but you should make use of the statistics interfaces:
db_stat -m, DB_ENV->memp_stat, or DB_ENV->memp_stat_print.
http://www.sleepycat.com/docs/ref/am/stat.html

For example, you can try db_stat -m -h HOMEDIR for mpool summary
statistics and per-file statistics, or db_stat -MA -h HOMEDIR for
detailed per-file information and a summary of each page in the pool.
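
Because the cache statistics are cumulative, an overall hit ratio can hide the point at which the cache fills up. A minimal Python sketch, assuming you copy the st_cache_hit and st_cache_miss counters (from the DB_MPOOL_STAT structure filled in by DB_ENV->memp_stat) into dicts yourself at two points in time, without the DB_STAT_CLEAR flag; the helper name is hypothetical.

```python
def interval_hit_ratio(before, after):
    """Cache hit ratio between two snapshots of cumulative mpool counters.

    `before` and `after` are plain dicts holding "st_cache_hit" and
    "st_cache_miss" values copied from two memp_stat calls.
    Returns None if no pages were requested during the interval.
    """
    hits = after["st_cache_hit"] - before["st_cache_hit"]
    misses = after["st_cache_miss"] - before["st_cache_miss"]
    requests = hits + misses
    return hits / requests if requests else None
```

Taking a snapshot before and after each batch of inserts should show the interval hit ratio dropping at the point where the cache is exhausted.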

Regards,
Bogdan Coman, Oracle


#5 Klaas
Re: Explicitly clearing BDB cache? - 08-02-2006, 03:37 PM

Florian Weimer wrote:

Quote:
You'd still need to read the old page contents from disk, so flushing
the cache won't buy you much, I guess.

I think this is disproven by the fact that I can insert in large sorted
batches efficiently (each batch contains random key values: these are not
batches of a large sorted set, but sorted batches of a large random set).

-Mike



#6 Klaas
Re: Explicitly clearing BDB cache? - 08-02-2006, 03:48 PM

bogdan_dorian (AT) yahoo (DOT) com wrote:
Quote:
Hi Mike,

First of all, I want you to specify how does your data/keys pairs look
like and what's your cache size?

This is a purely synthetic test. Keys are SHA-1 hashes of random data
(20 random bytes), and each value is the single character '0'. The
access method is BTREE, only one thread is used, and the cache size
varies. With a larger cache, more data is needed before the performance
characteristics I describe appear, but the same thing eventually happens.

Quote:
Maybe you can tell us what kind of Berkeley DB product are you using
and on what platform.

BDB 4.4.20 on Fedora Core 4. The underlying storage is software RAID-5
on SATA-2 disks.

Quote:
Indeed, disk I/O is very painful from the performance point of view.
Yes, is possible to flush the cache by using DB->sync method (
http://www.sleepycat.com/docs/api_c/db_sync.html ), which flushes any
cached information to disk.

Indeed, but it does not empty the cache or clear it of pages that are
unlikely to be used again.
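
To make the distinction concrete, here is a toy Python model (a hypothetical class, not BDB code) in which sync() writes dirty pages but leaves them resident, while discard_clean() does what is being asked for by dropping clean pages. Nothing in this thread says BDB exposes a call like discard_clean().

```python
class FlushVsEvictModel:
    """Hypothetical cache model contrasting flushing with evicting (not BDB code)."""

    def __init__(self):
        self.pages = {}  # page_id -> dirty flag; every key is a resident page

    def put(self, page_id):
        self.pages[page_id] = True  # page is now resident and dirty

    def sync(self):
        """Flush: write every dirty page but keep it cached (as DB->sync is described)."""
        dirty = [pid for pid, is_dirty in self.pages.items() if is_dirty]
        for pid in dirty:
            self.pages[pid] = False
        return len(dirty)

    def discard_clean(self):
        """Hypothetical evict: drop clean pages, freeing room for a new sorted batch."""
        clean = [pid for pid, is_dirty in self.pages.items() if not is_dirty]
        for pid in clean:
            del self.pages[pid]
        return len(clean)
```

After sync() every page is still resident, so the next batch of puts competes with them for cache space; only the hypothetical discard_clean() step frees that space.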

Quote:
Also, what will depend on your application is that if you are using
checkpoints, you have to know that the checkpoints write dirty pages
from cache to files, but be aware that in the same time checkpointing
is very I/O intensive:

No worries about that.

Quote:
For example, you can try db_stat -m -h HOMEDIR for mpool summary
statistics and per-file statistics, or db_stat -MA -h HOMEDIR for
detailed info per-file and summary of each page in pool.

I'll try that. Is there any way to influence how BDB decides which
pages to evict? I think a put()-centric eviction policy might perform
better for large batches of insertions.

-Mike


