
Performance problem

comp.databases.berkeley-db


  #1  
AT
 

Performance problem - 07-27-2006, 05:54 AM






Hi all,

I've been using the BDB C++ API (BDB version 4.4.20 on a Solaris 10
platform with the g++ 3.4.3 compiler) for one month, and I'm facing a
serious performance problem.

We use BDB as a cache for a lot of information stored in another very
big DBMS, because we need to reduce the load on the latter. We need to
handle a lot of searches (and in some cases inserts/deletes/updates) in
the cache every second: at least 10K operations per second.

We chose BDB because our first tests reached a comfortable rate in our
simulations; unfortunately, it was the wrong simulation. Let me explain.

The application we wrote creates an environment shared between two
databases, one of which is a secondary database associated with the
other. Both databases are created with the BTREE access method. The
environment is created as Concurrent Data Store (CDB) because we don't
want to handle transactions and concurrency ourselves. Keys for both
databases are strings (char* at the application level) containing only
numeric characters, so in the real world they are numbers.

We ran the product evaluation tests using a batch of ordered, contiguous
pkey-skey pairs and obtained a good rate (about 25K operations per
second). Now we are testing our system with a batch of random pairs
(unordered and not contiguous), and we get only 400-500 operations per
second. We tried the HASH method, but got about 1K operations per
second, which is far from our target. We can't order the incoming data:
for now we run our tests from pre-created files, but in the production
environment we'll receive requests from the network and won't have the
chance to order them.
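
The gap between ordered and random keys can be reproduced with a toy model. The sketch below is not Berkeley DB code: it assumes a simple LRU page cache and a made-up layout in which key k lives on page k / keys_per_page, and it ignores B-tree internal pages. The names LruPageCache and hit_rate are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <random>
#include <unordered_map>

// Toy LRU page cache: it only tracks which page numbers are resident.
class LruPageCache {
public:
    explicit LruPageCache(std::size_t capacity_pages) : capacity_(capacity_pages) {
        assert(capacity_pages > 0);
    }

    // Returns true on a cache hit, false on a miss (a miss would mean disk I/O).
    bool touch(std::uint64_t page) {
        auto it = index_.find(page);
        if (it != index_.end()) {
            // Hit: move the page to the most-recently-used position.
            order_.splice(order_.begin(), order_, it->second);
            return true;
        }
        // Miss: evict the least-recently-used page if the cache is full.
        if (order_.size() == capacity_) {
            index_.erase(order_.back());
            order_.pop_back();
        }
        order_.push_front(page);
        index_[page] = order_.begin();
        return false;
    }

private:
    std::size_t capacity_;
    std::list<std::uint64_t> order_;  // front = most recently used
    std::unordered_map<std::uint64_t, std::list<std::uint64_t>::iterator> index_;
};

// Assumed layout (illustration only): key k is stored on page k / keys_per_page.
// Returns the fraction of lookups that were served from the cache.
double hit_rate(bool random_order, std::uint64_t total_keys,
                std::uint64_t keys_per_page, std::size_t cache_pages,
                std::uint64_t lookups) {
    assert(total_keys > 0 && keys_per_page > 0 && lookups > 0);
    LruPageCache cache(cache_pages);
    std::mt19937_64 rng(42);  // fixed seed so runs are repeatable
    std::uniform_int_distribution<std::uint64_t> pick(0, total_keys - 1);
    std::uint64_t hits = 0;
    for (std::uint64_t i = 0; i < lookups; ++i) {
        // Ordered run: keys 0, 1, 2, ... ; random run: uniformly chosen keys.
        std::uint64_t key = random_order ? pick(rng) : i % total_keys;
        if (cache.touch(key / keys_per_page)) ++hits;
    }
    return static_cast<double>(hits) / static_cast<double>(lookups);
}
```

Calling hit_rate(false, 1000000, 100, 1000, 200000) models the ordered test and hit_rate(true, 1000000, 100, 1000, 200000) the random one. With a cache that holds only a tenth of the pages, the ordered run misses once per page while the random run misses on most lookups, which is consistent with the I/O wait described below.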

As for the problem itself, we noticed a lot of I/O wait in our
application, so one possible action, I think, is to increase the BDB
page cache size. Any suggestions?

Thank you

Daniele


  #2  
Michael Cahill
 

Re: Performance problem - 07-27-2006, 06:48 PM






Hi Daniele,

Quote:
We ran the product evaluation tests using a batch of ordered, contiguous
pkey-skey pairs and obtained a good rate (about 25K operations per
second). Now we are testing our system with a batch of random pairs
(unordered and not contiguous), and we get only 400-500 operations per
second. We tried the HASH method, but got about 1K operations per
second, which is far from our target.

This sounds as if most requests are missing the Berkeley DB cache.
The cache is the portion of the data that Berkeley DB keeps in RAM.

How big is your Berkeley DB cache, and how does that compare with the
amount of data you are searching? In other words, what is the possible
range of values of the incoming requests? When your application runs
in production, do you expect that requests will really be random?
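
A back-of-the-envelope version of that comparison is sketched below. It is only an illustration: the per-record value size and page overhead are placeholders you would have to measure, and estimate_cache_fit is a hypothetical helper, not part of Berkeley DB. Note that pages of a secondary database opened in the same environment compete for the same cache, so they should be counted too.

```cpp
#include <cassert>
#include <cstdint>

struct CacheEstimate {
    std::uint64_t data_bytes;   // estimated size of all records
    double cacheable_fraction;  // share of the data the cache could hold, capped at 1.0
};

// All inputs are assumptions supplied by the caller; nothing is read from a database.
CacheEstimate estimate_cache_fit(std::uint64_t records,
                                 std::uint64_t key_bytes,
                                 std::uint64_t value_bytes,
                                 std::uint64_t overhead_bytes_per_record,
                                 std::uint64_t cache_bytes) {
    std::uint64_t per_record = key_bytes + value_bytes + overhead_bytes_per_record;
    assert(records > 0 && per_record > 0);
    std::uint64_t data_bytes = records * per_record;
    double fraction = static_cast<double>(cache_bytes) /
                      static_cast<double>(data_bytes);
    if (fraction > 1.0) fraction = 1.0;  // the whole data set fits
    return {data_bytes, fraction};
}
```

If requests are uniformly random, the cacheable fraction is roughly the best hit rate to expect; a value well below 1.0 means most lookups will go to disk.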

Regards,
Michael.



  #3  
AT
 

Re: Performance problem - 07-28-2006, 10:16 AM



Hi Michael,

Thank you for your prompt reply.

I partially solved the problem by increasing the cache size to 1 GB
with the DbEnv::set_cachesize method. I say 'partially' because it
seems to perform better for small batches of data (about 3-4 million),
but when you try to insert larger batches the BTREE performs badly. I
think there is a lot I still have to learn about this product, so any
suggestion from experienced users would be appreciated.
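
For readers unfamiliar with the call: in the C++ API, DbEnv::set_cachesize takes the size as a number of gigabytes plus a remainder in bytes, followed by the number of cache regions, and it has to be called before the environment is opened. The helper below is a hypothetical convenience (split_cache_size and CacheSizeArgs are not Berkeley DB names) that splits a byte count into those arguments; it assumes 1 GB means 2^30 bytes. The Berkeley DB call itself appears only in a comment, so the sketch compiles without the library.

```cpp
#include <cassert>
#include <cstdint>

// Arguments in the shape expected by DbEnv::set_cachesize(gbytes, bytes, ncache).
struct CacheSizeArgs {
    std::uint32_t gbytes;  // whole gigabytes
    std::uint32_t bytes;   // remainder below one gigabyte
    int ncache;            // number of cache regions
};

// Hypothetical helper: split a total size in bytes into (gbytes, bytes).
CacheSizeArgs split_cache_size(std::uint64_t total_bytes, int ncache = 1) {
    const std::uint64_t kGiB = 1ULL << 30;  // assumption: 1 GB = 2^30 bytes
    assert(ncache >= 1);
    assert(total_bytes / kGiB <= UINT32_MAX);
    return {static_cast<std::uint32_t>(total_bytes / kGiB),
            static_cast<std::uint32_t>(total_bytes % kGiB),
            ncache};
}

// Intended use with Berkeley DB (not compiled here):
//   CacheSizeArgs a = split_cache_size(1ULL << 30);   // 1 GB, as in this post
//   env.set_cachesize(a.gbytes, a.bytes, a.ncache);   // before env.open(...)
```

Whether a larger cache is enough can then be checked from the cache hit and miss counters that Berkeley DB reports in its memory pool statistics.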

Primary and secondary keys are identifiers composed of 16 digits
(handled as C++ char*).

If by 'random requests' you are asking whether there will be locality
between two consecutive requests, the answer is 'No'. A request is not
related at all to the previous one, and we are not able to sort the
incoming requests.

Best regards

Daniele




