Secondary database performance

comp.databases.berkeley-db


evolvah
 

Secondary database performance - 04-27-2006, 12:37 PM






Hi!

I'm currently evaluating several DBMSs for a near real-time
application. Berkeley DB is one of the contenders. Originally, I
expected it to outperform all the other competitors due to its
comparatively small overhead. But to my surprise, in one of the tests it
performed really badly. I'll describe the test I ran, and I hope you
will be able to point out the root cause of such slow execution.

I'm running Fedora Core 5 on a dual Pentium 4 machine, 2.80 GHz each. I
create the primary and all the secondary databases on disk. The system
is only lightly loaded; it is my regular development workstation.

The primary table has two "long long int" fields and four "int" fields.
The first "long long" is the key in the primary database, while the
remaining 5 fields are the keys to the secondary databases. Secondaries
have DB_DUPSORT flag set. I insert 1 million records into the primary
table assigning the value of the current record number to every field.
In this setup the whole procedure takes about 16.5 minutes! Here is a
rough estimate provided by the "time" command:

real 16m32.803s
user 0m56.488s
sys 1m10.676s

As you can see it spends over 14 minutes of it waiting on something. I
suppose it is disk I/O.
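
For reference, the "over 14 minutes" figure follows from subtracting CPU time from wall-clock time. A minimal C++ sketch of that arithmetic, using the numbers from the "time" output above (the function and constant names are made up for illustration, and it assumes a single-threaded test, so that user + sys is the total CPU time):

```cpp
#include <cstdio>

// Wall-clock time not accounted for by CPU time: real - (user + sys).
// Reading all of it as "waiting" assumes a single-threaded test program.
constexpr double wait_seconds(double real_s, double user_s, double sys_s) {
    return real_s - (user_s + sys_s);
}

// Values copied from the `time` output in the post, converted to seconds.
constexpr double kReal = 16 * 60 + 32.803;
constexpr double kUser = 56.488;
constexpr double kSys  = 1 * 60 + 10.676;

// Prints the unaccounted time in seconds and in minutes.
void print_wait() {
    const double w = wait_seconds(kReal, kUser, kSys);
    std::printf("waiting: %.3f s (%.1f min)\n", w, w / 60.0);
}
```

The result lands between 14 and 15 minutes, which matches the estimate in the post. Whether that time is really disk I/O cannot be told from "time" alone.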

If I turn on the built-in cache or change flags on the secondary
tables, it does not seem to have any dramatic effect on the numbers.

If I turn off secondary databases, I can have it finished in about 35
seconds.

So, I suspect there is some inefficiency in how secondary
databases are processed. Or maybe I'm overlooking something crucial?
Competitors beat these numbers badly: about 32 seconds vs 16.5 MINUTES
for 6 keys, and 16 seconds vs 36 seconds with the primary key only.

Any comments?

Berkeley DB version I'm using is 4.4.20. The test code snippet (C++)
can be found here:
http://files.evolver.spb.ru/src/sam-test-BDB.cpp

MyDB class definition is identical to what is used in
"examples_cxx/getting_started" of the original Berkeley DB source tree.


/Sergey


Michael Cahill
 

Re: Secondary database performance - 04-27-2006, 07:56 PM






Hi Sergey,

The problem with your test is that you're inserting keys essentially in
random order, from Berkeley DB's point of view. That's because your
keys are little-endian integers, which don't sort in numeric order
when compared as byte strings (which is what Berkeley DB does by default).
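
To make the effect concrete, here is a small C++ sketch that does not use Berkeley DB at all. It encodes integers by hand (so the result does not depend on the host CPU) and compares them byte by byte with memcmp, which is assumed here to approximate the default ordering for equal-length keys. The helper names are hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

using Bytes = std::array<unsigned char, 4>;

// Least significant byte first, as an x86 machine stores a 32-bit integer.
Bytes to_little_endian(std::uint32_t v) {
    return {{ static_cast<unsigned char>(v & 0xFF),
              static_cast<unsigned char>((v >> 8) & 0xFF),
              static_cast<unsigned char>((v >> 16) & 0xFF),
              static_cast<unsigned char>((v >> 24) & 0xFF) }};
}

// Most significant byte first, shown only for contrast.
Bytes to_big_endian(std::uint32_t v) {
    return {{ static_cast<unsigned char>((v >> 24) & 0xFF),
              static_cast<unsigned char>((v >> 16) & 0xFF),
              static_cast<unsigned char>((v >> 8) & 0xFF),
              static_cast<unsigned char>(v & 0xFF) }};
}

// Lexicographic byte comparison: negative if a sorts before b.
int bytewise_compare(const Bytes& a, const Bytes& b) {
    return std::memcmp(a.data(), b.data(), a.size());
}
```

With these helpers, little-endian 256 (bytes 00 01 00 00) sorts before little-endian 255 (bytes FF 00 00 00), so sequential record numbers do not arrive in key order. The big-endian encodings of the same unsigned values compare in numeric order.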

Please see question #5 on this page for more information:

http://www.sleepycat.com/docs/ref/am_misc/faq.html

By adding btree and duplicate comparison callbacks to your code that
compare the integers correctly (code is given on the "Btree comparison"
page linked from the above FAQ), I see about a 10x performance
improvement in your test (to over 30,000 inserts / second, once you
take into account that you are updating 6 databases for each insert).
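
A comparison routine of the kind described above might look roughly like the following C++ sketch. It is not the code from the "Btree comparison" page or from the test program. The template name is hypothetical, and the Berkeley DB wiring is left in comments because it assumes the 4.x C++ API (Db::set_bt_compare and Db::set_dup_compare taking a callback of the form int (*)(Db *, const Dbt *, const Dbt *)); later releases may use a different callback signature.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Compare two keys that each hold exactly one native integer of type T.
// memcpy is used so the buffers do not need to be aligned for T.
template <typename T>
int compare_native(const void* a, std::size_t a_len,
                   const void* b, std::size_t b_len) {
    assert(a_len == sizeof(T) && b_len == sizeof(T));
    T x, y;
    std::memcpy(&x, a, sizeof(T));
    std::memcpy(&y, b, sizeof(T));
    // Explicit comparisons avoid the overflow that `x - y` could cause.
    return (x < y) ? -1 : (x > y) ? 1 : 0;
}

// Hypothetical wiring (assumed 4.x C++ API, not compiled here):
//
//   static int compare_int(Db*, const Dbt* a, const Dbt* b) {
//       return compare_native<int>(a->get_data(), a->get_size(),
//                                  b->get_data(), b->get_size());
//   }
//   static int compare_ll(Db*, const Dbt* a, const Dbt* b) {
//       return compare_native<long long>(a->get_data(), a->get_size(),
//                                        b->get_data(), b->get_size());
//   }
//
//   // Both calls must happen before the database is opened.
//   secondary.set_bt_compare(compare_int);  // secondary key is an int field
//   // Assumption: the duplicate data items of a secondary are the primary
//   // keys, which are long long in this test.
//   secondary.set_dup_compare(compare_ll);
```

The same callback must be installed every time the database is opened, since a Btree built with one ordering cannot be read correctly with another.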

Regards,
Michael.


Karl Waclawek
 

Re: Secondary database performance - 05-06-2006, 02:06 PM



evolvah wrote:
Quote:
Hi!

I'm currently evaluating several DBMSs for a near real-time
application. Berkeley DB is one of the contenders. Originally, I
expected it to outperform all the other competitors due to its
comparatively small overhead. But to my surprise, in one of the tests it
performed really badly. I'll describe the test I ran, and I hope you
will be able to point out the root cause of such slow execution.
[snip]
Hi Sergey,

I saw your message on the Berkeley DB newsgroup.
I am curious, did the recommendations from Michael help you improve performance?
If yes, how does BDB compare now against the other DBs?

Regards,

Karl


Sergey Maslyakov
 

Re: Secondary database performance - 05-07-2006, 03:14 AM



Yes, Michael's recommendations did help. My test now finishes in
about 27 seconds, versus the roughly 1000 seconds it took before I
provided my own comparison function.

I have to admit that the potential performance problem is described in
the BDB manual. However, it is only a very short subsection at the end
of the document, and it does not back up its importance with
any numbers, not even rough, illustrative ones.


/Sergey

