dbTalk Databases Forums  

BTree comparison function

comp.databases.berkeley-db comp.databases.berkeley-db


Discuss BTree comparison function in the comp.databases.berkeley-db forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Mark Stevens
 
Posts: n/a

Default BTree comparison function - 09-01-2003 , 06:57 AM






Hi,

I'd be really grateful for any advice that you can give me regarding
user-defined btree comparison functions.


It might help to have a concrete example of what I'm trying to do, so
for the sake of argument let's say that my database contains keys of
the form:
struct my_key {
a_type_t A;
b_type_t B;
c_type_t C;
};

Perhaps this is a bad decision, but I have set up the database so that
the associated data are 0 bytes long. I took this decision because:
1) all the information I need is contained logically in the key,
2) a structure of this form arrives all nicely prepared for me, and
3) I want to be able to search the DB based on any field in the key,
as described below.


I would like to be able to perform searches on the database of the
kind: "Find all the database entries where key field B==value_to_match
regardless of what is in the other fields". In order to achieve this
I would like to pass to my btree comparison function a structure
initialised with:

struct my_key search_key;

search_key.A = IGNORE_ME;
search_key.B = value_to_match;
search_key.C = IGNORE_ME;

The comparison function that I had imagined I'd write would skip
fields A and C which contain the special value IGNORE_ME and would
return 0 (indicating a match) for each database entry which has
B==value_to_match in the key.

The thing that I'm unsure about is the return value from the
comparison function for cases where the keys don't match and how the
return is used internally by Berkeley DB. The docs say that the
comparison function:
"must return an integer value less than, equal to, or greater than
zero if the first key argument is considered to be respectively less
than, equal to, or greater than the second key argument."
There is also the requirement that the comparison function leads to
the entries being well-ordered.

What are the implications on these requirements of returning 0,
indicating that a match has been found, when only some parts of the
key have been compared?

Is the database affected in any way when a non-zero value is returned
by the comparison function when a match is not found? How important
is the value of the non-zero return for non-matches?

Is there a better way of doing this kind of search than the one I'm
proposing?

Thanks a lot for any thoughts and suggestions.

Mark


--
Mark Stevens
Concept Systems Ltd

Reply With Quote
  #2  
Old   
Hans-Bernhard Broeker
 
Posts: n/a

Default Re: BTree comparison function - 09-01-2003 , 10:29 AM






Mark Stevens <mas78910 (AT) hotmail (DOT) com> wrote:

Quote:
I would like to be able to perform searches on the database of the
kind: "Find all the database entries where key field B==value_to_match
regardless of what is in the other fields". In order to achieve this
I would like to pass to my btree comparison function a structure
initialised with:

struct my_key search_key;

search_key.A = IGNORE_ME;
search_key.B = value_to_match;
search_key.C = IGNORE_ME;
I don't think you can do that; or if you can, it's a bad idea.

The problem is that the comparison function is used not only for
searches, but also for *constructing* the BTree --- it defines the
sort order of entries in the database. This sort order is exploited
when running searches, too. Having an essentially unsortable key
field value like your "IGNORE_ME" violates this structure.

You should rather store your data as data in a RECNO or other
database, and set up a secondary index database for every field of
struct my_key.

--
Hans-Bernhard Broeker (broeker (AT) physik (DOT) rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.


Reply With Quote
  #3  
Old   
Mark Stevens
 
Posts: n/a

Default Re: BTree comparison function - 09-02-2003 , 03:35 AM



Hans-Bernhard Broeker <broeker (AT) physik (DOT) rwth-aachen.de> wrote

Quote:
Mark Stevens <mas78910 (AT) hotmail (DOT) com> wrote:

I would like to be able to perform searches on the database of the
kind: "Find all the database entries where key field B==value_to_match
regardless of what is in the other fields". In order to achieve this
I would like to pass to my btree comparison function a structure
initialised with:

struct my_key search_key;

search_key.A = IGNORE_ME;
search_key.B = value_to_match;
search_key.C = IGNORE_ME;

I don't think you can do that; or if you can, it's a bad idea.

The problem is that the comparison function is used not only for
searches, but also for *constructing* the BTree --- it defines the
sort order of entries in the database. This sort order is exploited
when running searches, too. Having an essentially unsortable key
field value like your "IGNORE_ME" violates this structure.

I understand that the comparison function will be used for
constructing the BTree. It would certainly only ever be the case that
complete structures would be inserted into the database (i.e. with no
fields set to IGNORE_ME). In this case the comparison function would
compare every field in the structure and should work OK, so long as it
conforms to the rules for comparison functions.

What I am suggesting is to use incomplete structures (i.e. with some
fields marked as IGNORE_ME) for *searching* the database. In other
words, my database would have previously been populated with
structures where A, B and C are all real values, and at some time
later I want to find all the entries which have field B==some_value,
regardless of what is in A and C.


Quote:
You should rather store your data as data in a RECNO or other
database, and set up a secondary index database for every field of
struct my_key.
I'm a bit reluctant to do this since the structures that I'm dealing
with are going to have at least 10 fields (possibly many more). Is
there an easy way to set up large numbers of secondary indices without
doing everything by hand?


Thanks,
Mark


Reply With Quote
  #4  
Old   
Hans-Bernhard Broeker
 
Posts: n/a

Default Re: BTree comparison function - 09-02-2003 , 06:49 AM



Mark Stevens <mas78910 (AT) hotmail (DOT) com> wrote:

Quote:
What I am suggesting is to use incomplete structures (i.e. with some
fields marked as IGNORE_ME) for *searching* the database.
I did understand that the first time round. But that can't work.

You can't have the comparison function sort the actual keys like this

(A1,B1) < (A1,B2) < (A2,B1) < (A2,B2) < (A3,B3)

and at the same time have it return both

(*,B1) == (A1,B1)
and
(*,B1) == (A2,B1)

because these two results would also, by the requirement of
transitivity of the comparison function, mandate

(A1,B1) == (A2,B1)

which violates the original sorting order shown above, since that has

(A1,B1) < (A2,B1)

Your comparison function would no longer set-up a well-ordering of
your keys.

Quote:
I'm a bit reluctant to do this since the structures that I'm dealing
with are going to have at least 10 fields (possibly many more). Is
there an easy way to set up large numbers of secondary indices
without doing everything by hand?
Well, maybe BerkeleyDB isn't for you, then. Automatic handling of
such indices is one of the reasons the elders invented relational
database management systems...
--
Hans-Bernhard Broeker (broeker (AT) physik (DOT) rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.


Reply With Quote
  #5  
Old   
Keith Bostic
 
Posts: n/a

Default Re: BTree comparison function - 09-02-2003 , 11:53 AM



Hans-Bernhard Broeker <broeker (AT) physik (DOT) rwth-aachen.de> wrote

Hans-Bernhard Broeker <broeker (AT) physik (DOT) rwth-aachen.de> wrote


Quote:
Well, maybe BerkeleyDB isn't for you, then. Automatic handling of
such indices is one of the reasons the elders invented relational
database management systems...
Berkeley DB can automatically maintain large numbers of secondary
indices, it's not a problem. You have to create a secondary index
function for each one, and call the Db.associate method to associate
the two databases, but that's not a lot of work. For more information,
see the "Secondary indices" section of the Berkeley DB Reference Guide,
included in your download package and also available at:

http://www.sleepycat.com/docs/ref/am/second.html

Regards,
--keith

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Keith Bostic bostic (AT) sleepycat (DOT) com
Sleepycat Software Inc. keithbosticim (ymsgid)
118 Tower Rd. +1-781-259-3139
Lincoln, MA 01773 http://www.sleepycat.com


Reply With Quote
  #6  
Old   
Chris Newcombe
 
Posts: n/a

Default Re: BTree comparison function - 09-02-2003 , 03:27 PM



Hans-Bernhard Broeker <broeker (AT) physik (DOT) rwth-aachen.de> wrote

Quote:
Mark Stevens <mas78910 (AT) hotmail (DOT) com> wrote:
I'm a bit reluctant to do this since the structures that I'm dealing
with are going to have at least 10 fields (possibly many more). Is
there an easy way to set up large numbers of secondary indices
without doing everything by hand?

Well, maybe BerkeleyDB isn't for you, then. Automatic handling of
such indices is one of the reasons the elders invented relational
database management systems...

Or maybe you should try Berkeley DB XML, which adds automatic indexing
and a query layer to Berkeley DB. See the Sleepycat website for
details and download.

The obvious downside is that db-xml works only with XML documents, and
not everone is happy to accomodate the verbosity of XML records and
parsing overhead etc. However I recently suggested that db-xml allow
the application to install toXML and fromXML hooks when dealing with
physical records being passed to/from the database. These hooks would
map your own records to an equivalent XML document (often trivial),
and (if necessary -- I'm not sure that it is yet), convert an XML
document into one of your own records.

I think this would allow many classes of legacy application to keep
their data in the current packed-binary form, and still make use of
db-xml for indexing and queries. Obviously some parsing overhead
remains, but the application should not have to change very much,
which is the main thing.

The db-xml developers are considering this request. Apparently db-xml
2.0 architecture may cause problems with this, but it could be done
fairly easily to the existing (released) v1.0.

If you are interested then I'd recommend posting to xml (AT) sleepycat (DOT) com
and support (AT) sleepycat (DOT) com to add weight to the request.

Regards,

Chris


Reply With Quote
  #7  
Old   
Chris Newcombe
 
Posts: n/a

Default Re: BTree comparison function - 09-02-2003 , 03:27 PM



Hans-Bernhard Broeker <broeker (AT) physik (DOT) rwth-aachen.de> wrote

Quote:
Mark Stevens <mas78910 (AT) hotmail (DOT) com> wrote:
I'm a bit reluctant to do this since the structures that I'm dealing
with are going to have at least 10 fields (possibly many more). Is
there an easy way to set up large numbers of secondary indices
without doing everything by hand?

Well, maybe BerkeleyDB isn't for you, then. Automatic handling of
such indices is one of the reasons the elders invented relational
database management systems...

Or maybe you should try Berkeley DB XML, which adds automatic indexing
and a query layer to Berkeley DB. See the Sleepycat website for
details and download.

The obvious downside is that db-xml works only with XML documents, and
not everone is happy to accomodate the verbosity of XML records and
parsing overhead etc. However I recently suggested that db-xml allow
the application to install toXML and fromXML hooks when dealing with
physical records being passed to/from the database. These hooks would
map your own records to an equivalent XML document (often trivial),
and (if necessary -- I'm not sure that it is yet), convert an XML
document into one of your own records.

I think this would allow many classes of legacy application to keep
their data in the current packed-binary form, and still make use of
db-xml for indexing and queries. Obviously some parsing overhead
remains, but the application should not have to change very much,
which is the main thing.

The db-xml developers are considering this request. Apparently db-xml
2.0 architecture may cause problems with this, but it could be done
fairly easily to the existing (released) v1.0.

If you are interested then I'd recommend posting to xml (AT) sleepycat (DOT) com
and support (AT) sleepycat (DOT) com to add weight to the request.

Regards,

Chris


Reply With Quote
  #8  
Old   
Mark Stevens
 
Posts: n/a

Default Re: BTree comparison function - 09-03-2003 , 04:14 AM



bostic (AT) sleepycat (DOT) com (Keith Bostic) wrote in message news:<adecb6f.0309020853.3ed322aa (AT) posting (DOT) google.com>...
Quote:
Berkeley DB can automatically maintain large numbers of secondary
indices, it's not a problem. You have to create a secondary index
function for each one, and call the Db.associate method to associate
the two databases, but that's not a lot of work.
That looks like it might do the job, provided that it will work over
RPC.

My understanding from the RPC docs is that the primary and secondary
databases must be created locally on the server. I take this to mean
that the application which writes to the primary and secondary
databases cannot do so via RPC - it must be running on the machine
where berkeley_db_svc runs. From the docs it appears that any client
which needs to search the database using secondary indices may do so
via RPC so long as it opens the primary and secondary databases as
DB_RDONLY.

Is all this correct?

Many thanks for all the help and feedback that has been provided.

Mark


Reply With Quote
  #9  
Old   
Keith Bostic
 
Posts: n/a

Default Re: BTree comparison function - 09-03-2003 , 03:10 PM



mas78910 (AT) hotmail (DOT) com (Mark Stevens) wrote in message news:<c4a36f.0309030114.5064ceab (AT) posting (DOT) google.com>...
Quote:
bostic (AT) sleepycat (DOT) com (Keith Bostic) wrote in message news:<adecb6f.0309020853.3ed322aa (AT) posting (DOT) google.com>...

My understanding from the RPC docs is that the primary and secondary
databases must be created locally on the server. I take this to mean
that the application which writes to the primary and secondary
databases cannot do so via RPC - it must be running on the machine
where berkeley_db_svc runs. From the docs it appears that any client
which needs to search the database using secondary indices may do so
via RPC so long as it opens the primary and secondary databases as
DB_RDONLY.

Is all this correct?
Not quite. Because Berkeley DB uses function callbacks to
implement secondary indices, it's not possible to create a
secondary index from an RPC client -- there's not function
callback available.

However, if the server creates the necessary secondary index
databases and relationships, they can then be read and written
from the RPC client, as it's the server's callback function
that will be used.

Regards,
--keith

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Keith Bostic bostic (AT) sleepycat (DOT) com
Sleepycat Software Inc. keithbosticim (ymsgid)
118 Tower Rd. +1-781-259-3139
Lincoln, MA 01773 http://www.sleepycat.com


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.