dbTalk Databases Forums  

Pre-fetching to optimize secondary index range scans

comp.databases.berkeley-db comp.databases.berkeley-db


Discuss Pre-fetching to optimize secondary index range scans in the comp.databases.berkeley-db forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Ryan
 
Posts: n/a

Default Pre-fetching to optimize secondary index range scans - 07-25-2006 , 11:26 AM






Hi,

When performing a range scan on a secondary index I experience
significant slowdown compared to a normal table scan (~3x slowdown to
select 15% of a ~1GB dataset). I first retrieve the primary keys, sort
them, then probe the primary table for the data.

The index scan itself works great (< 5% of execution time is spent
retrieving the primary key set), and disk I/O isn't a problem because
the OS caches the whole DB in memory. As nearly as I can tell the
slowdown is due to bulk access (DB_MULTIPLE_KEY) being so much faster
than random access (DB_SET).

DB/2 achieves 3x speedup (vs. no index) for this query plan and
dataset, so there must be a way to make it work well...

Is there a way to issue pre-fetch commands in BDB? I have the whole
list of primary keys available before I start and it doesn't matter
what order the records arrive in.

Possibilities I've considered are to partition the set of keys and
assign several threads to fetch them in parallel, or to collect the set
of pages that the records live in and filter them manually. The first
seems like a lot of work if the buffer pool provides a better way, and
the second depends on having access to page IDs, which don't seem to be
part of the BDB API.

Has anyone had to do this before?
Ryan


Reply With Quote
  #2  
Old   
Andrei Costache
 
Posts: n/a

Default Re: Pre-fetching to optimize secondary index range scans - 08-01-2006 , 03:02 AM






Hello Ryan,

Quote:
From what you are saying it seems that the cache size is set right and
you don't pay performance penalties for I/O work, as the cache can hold
your DB in memory.
It would be interesting to know how many key/data pairs do you have in
the primary database, each key/data pair estimated size and the page
size setting.
The bulk fetch performs faster as you noted; the range scan pays the
load of evaluating the keys against the supplied key.

DB_MULTIPLE and DB_MULTIPLE_KEY flags may not be used when accessing
databases made into secondary indices using the DB->associate method,
so you only have available the DB_SET, DB_SET_RANGE, DB_GET_BOTH and
DB_GET_BOTH_RANGE flags to use according to whether you want only or
both keys and data.

Regarding the pre-fetch commands, there aren't any, but you can
simulate a pre-fetch behavior by reading part or all of the data from
the primary database in a sequential manner (using DB->get function or
a cursor and the DB_NEXT flag on DBcursor->c_get function in case of
duplicates) so as the data will be found in cache for future
operations.

As far as your suggested alternatives, only the first one is valid. It
implies a great deal of work and the result is not guaranteed to
provide faster range scan, but it is the best option to take into
consideration, as fetching the keys in parallel may help.
The second possibility is not valid since there is no way you can
access pages, no page ID is obtainable, not even through the use of
API, so you can't associate a page with the contained key/data pair.

Andrei Costache



Reply With Quote
  #3  
Old   
Ryan
 
Posts: n/a

Default Re: Pre-fetching to optimize secondary index range scans - 08-03-2006 , 08:27 AM



Andrei Costache wrote:
Quote:
Hello Ryan,

From what you are saying it seems that the cache size is set right and
you don't pay performance penalties for I/O work, as the cache can hold
your DB in memory.
After poking around some more, I realized that part of the problem was
a too-small BDB cache. Even though the file system cache eliminated
disk seeks there was still the overhead of getting the data to BDB.

However, even after fixing that the non-index version is still 1.5x
faster. I suspect the problem might be the cost of repeatedly drilling
down the primary database's btree. DB2 avoids this by indexing on RIDs
-- not primary keys -- and jumping directly to the record.

Does this make sense? If so, what access method provides the fastest
random access to in-cache data? After poking through the source it
looks like RECNO databases are just glorified btrees underneath, so
they're out. QUEUE databases do seem to jump directly to their records,
though. They can be used as normal databases (e.g. indexed) in spite of
the append/consume feature, right?

Quote:
It would be interesting to know how many key/data pairs do you have in
the primary database, each key/data pair estimated size and the page
size setting.
6M pairs, 8+140 bytes each, 8k page, 1.6GB cache (previously 600MB).

Quote:
The bulk fetch performs faster as you noted; the range scan pays the
load of evaluating the keys against the supplied key.
Actually, I bulk fetch from the index after calling DB_SET_RANGE once.
There doesn't seem to be a "next unless past the end of the range"
cursor API call, and iterating over the bulk-loaded entries is
significantly faster than calling DB_NEXT repeatedly, even if I do end
up with a few extras.

Fetching the whole list of qualifying primary keys takes < .5 seconds.
The performance problem is probing the primary database for the
corresponding data, which takes nearly 6 seconds. For comparison,
scanning the entire table and filtering manually takes only 4 seconds.

Quote:
Regarding the pre-fetch commands, there aren't any, but you can
simulate a pre-fetch behavior by reading part or all of the data from
the primary database in a sequential manner (using DB->get function or
a cursor and the DB_NEXT flag on DBcursor->c_get function in case of
duplicates) so as the data will be found in cache for future
operations.

As far as your suggested alternatives, only the first one is valid. It
implies a great deal of work and the result is not guaranteed to
provide faster range scan, but it is the best option to take into
consideration, as fetching the keys in parallel may help.
The second possibility is not valid since there is no way you can
access pages, no page ID is obtainable, not even through the use of
API, so you can't associate a page with the contained key/data pair.
The reason I asked was because when DB2 fetches records from an index
range scan it figures out what set of pages the resulting RIDs need and
prefetches them asynchronously. I don't think BDB can do this easily.
You would have to spawn N "run-ahead" threads, but it would be less
efficient because you would prefetch each page multiple times if
multiple RIDs happened to touch it.



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.