multifield key

comp.databases.berkeley-db


pbour
multifield key - 10-06-2006, 04:10 AM

I want to create a primary database according to the relation R:

R(int A, int B, primary key(A,B)) where primary key is a BTREE

so the key field of the database should be a composite one, with two
integers. For storing I use marshalling. The value field is empty.

BUT the question is what compare functions, either for prefix or general,
I should define in order to be able to query the database giving a value
only for the A integer? It is analogous to writing, for the relation R, the
SQL query:

SELECT * from R WHERE A = "...";


HAS anybody actually implemented a database like that? I even tried to
figure out how mysql does it by studying the source code, but of course
it is very difficult to understand.

I've tried to use c_get with the "DB_SET_RANGE" flag but I think this is
analogous to a query with the condition A < "..." .


Philip Guenther
Re: multifield key - 10-09-2006, 11:45 PM

On Oct 6, 2:10 am, "pbour" <panayiotis.bou... (AT) gmail (DOT) com> wrote:
Quote:
I want to create a primary database according to the relation R:

R(int A, int B, primary key(A,B)) where primary key is a BTREE

so the key field of the database should be a composite one, with two
integers. For storing I use marshalling. The value field is empty.

BUT the question is what compare functions, either for prefix or general,
I should define in order to be able to query the database giving a value
only for the A integer?

If your marshalling stores the values in big-endian ("network") byte
order, then the default comparison and prefix functions will work and
no overriding is necessary. IMHO, this is by far the simplest and most
robust solution.
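
As a rough illustration of the big-endian suggestion, here is a minimal sketch that packs (A, B) into an eight-byte key and compares keys bytewise, with a shorter key sorting first on a tie. The names put_be32, get_be32, pack_key and lexical_cmp are made up for this example, and lexical_cmp is only a stand-in for the default lexical comparison, not Berkeley DB code. It assumes both fields are unsigned 32-bit values; negative signed integers would sort after positive ones under a bytewise comparison.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KEY_SIZE 8  /* two 4-byte fields: A, then B */

/* Write a 32-bit unsigned value into buf[0..3], most significant byte first. */
static void put_be32(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}

/* Read back a value written by put_be32. */
static uint32_t get_be32(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}

/* Marshal (A, B) into an 8-byte key, A first. */
static void pack_key(unsigned char out[KEY_SIZE], uint32_t a, uint32_t b)
{
    assert(out != NULL);
    put_be32(out, a);
    put_be32(out + 4, b);
}

/* Bytewise comparison; when one key is a prefix of the other,
 * the shorter key sorts first. Hypothetical stand-in for the
 * default lexical ordering. */
static int lexical_cmp(const unsigned char *k1, size_t n1,
                       const unsigned char *k2, size_t n2)
{
    size_t n = n1 < n2 ? n1 : n2;
    int c = memcmp(k1, k2, n);
    if (c != 0)
        return c;
    return (n1 > n2) - (n1 < n2);
}
```

With this layout a four-byte key holding only A compares lower than every full key with the same A, which is what a prefix lookup needs.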

If you aren't marshalling in big-endian order, you'll need to make sure
your comparison function operates sanely when given a key that isn't
long enough to include all the fields. For example, if your integers
are exactly four bytes long, giving a total key size of eight bytes,
then the comparison function should treat keys that are only four bytes
long as sorting just before all eight-byte keys with the same first
four bytes.

The ideal handling of keys that include just part of a field is not
clear. I would tend to sort them after keys that lack the field
completely and before keys that include the entire field, sorting among
such partial keys using a pure-lexical sort on the partial field bytes.
Alternatively, you could consider it a fatal data error and simply
abort() if that happens, but that may cause other problems.
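
To make the short-key rule concrete, here is a minimal sketch of such a comparison for keys holding native-endian ints. It is not wired to Berkeley DB: struct key_view is a hypothetical stand-in for the data/size pair of a DBT, and compare_ab and field_at are names invented for the example. A real callback would take the arguments that DB->set_bt_compare expects. The sketch takes the abort route for partial fields, via assert.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the data/size pair of a DBT. */
struct key_view {
    const void *data;
    size_t size;
};

/* Copy one native-endian int out of a key; memcpy avoids unaligned reads. */
static int field_at(const struct key_view *k, size_t index)
{
    int v;
    memcpy(&v, (const char *)k->data + index * sizeof(int), sizeof v);
    return v;
}

/* Order keys by A, then by B. A key holding only A sorts just
 * before every full key with the same A. */
static int compare_ab(const struct key_view *k1, const struct key_view *k2)
{
    /* Only whole fields are accepted; a partial field aborts. */
    assert(k1->size == sizeof(int) || k1->size == 2 * sizeof(int));
    assert(k2->size == sizeof(int) || k2->size == 2 * sizeof(int));

    int a1 = field_at(k1, 0);
    int a2 = field_at(k2, 0);
    if (a1 != a2)
        return a1 < a2 ? -1 : 1;

    /* Same A: the key without B comes first. */
    if (k1->size != k2->size)
        return k1->size < k2->size ? -1 : 1;
    if (k1->size == sizeof(int))
        return 0;

    int b1 = field_at(k1, 1);
    int b2 = field_at(k2, 1);
    return (b1 > b2) - (b1 < b2);
}
```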


Quote:
HAS anybody actually implemented a database like that?

Yep.


Quote:
I've tried to use c_get with the "DB_SET_RANGE" flag but I think this is
analogous to a query with the condition A < "..." .

No, it's comparable to A >= "..." ("the smallest key greater than or
equal to the specified key"). So, you do a DBC->c_get(DB_SET_RANGE).
If it returns DB_NOTFOUND, there are no entries in the entire database
that have a key that compares equal-to or greater-than the supplied
key. If it returns zero, then you unmarshal the returned key; if its A
value is larger than the one you asked for, there are no matches.
Otherwise you have your first match right there. To fetch the next
match (if any), you use DBC->c_get(DB_NEXT) with the same cursor and
the same checking of the return value and returned key to determine
whether there was another match or not.
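
The control flow of that loop can be sketched without Berkeley DB by running it over a sorted in-memory array of big-endian (A, B) keys. Everything here is an illustrative assumption: seek_range plays the role of c_get(DB_SET_RANGE), stepping the index plays the role of c_get(DB_NEXT), and select_where_a is a made-up name. Real code would make the cursor calls instead and check their return values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KEY_SIZE 8  /* big-endian A followed by big-endian B */

static void put_be32(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}

static uint32_t get_be32(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}

/* Stand-in for c_get(DB_SET_RANGE): index of the first key that is
 * >= probe in lexical order, or n if there is none (the DB_NOTFOUND
 * case). keys holds n sorted keys of KEY_SIZE bytes each. */
static size_t seek_range(const unsigned char *keys, size_t n,
                         const unsigned char *probe, size_t probe_len)
{
    assert(probe_len <= KEY_SIZE);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        /* A result of 0 means probe is a prefix of the key, so the
         * key is >= probe. */
        if (memcmp(keys + mid * KEY_SIZE, probe, probe_len) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Collect B for every key whose A equals a; returns the match count. */
static size_t select_where_a(const unsigned char *keys, size_t n, uint32_t a,
                             uint32_t *out_b, size_t out_cap)
{
    unsigned char probe[4];
    size_t count = 0;

    put_be32(probe, a);
    /* i++ plays the role of c_get(DB_NEXT). */
    for (size_t i = seek_range(keys, n, probe, sizeof probe); i < n; i++) {
        const unsigned char *key = keys + i * KEY_SIZE;
        if (get_be32(key) != a)
            break;  /* key too large: no more matches */
        if (count < out_cap)
            out_b[count] = get_be32(key + 4);
        count++;
    }
    return count;
}
```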


Philip Guenther



pbour
Re: multifield key - 10-10-2006, 12:50 PM

I've managed to query the database for a value of the A field using the
A,B btree. Then I created a secondary index (database) for the B field.

The key extractor is:

int get_node_key(DB *sdbp, const DBT *pkey, const DBT *pdata, DBT *skey)
{
    memset(skey, 0, sizeof(DBT));

    char *ptr = (char*)pkey->data;
    int n1 = *((int*)ptr);
    int n2 = *((int*)(ptr+sizeof(int)));

    skey->data = &n2;
    skey->size = sizeof(int);

    // Return 0 to indicate that the record can be created/updated.
    return (0);
}

But when I retrieve all records from the secondary database I don't get
the skey for B.

while ((ret = sdbcp->c_pget(sdbcp, &skey, &key, &tuple, DB_NEXT)) !=
       DB_NOTFOUND)
{
    printf("skey: %d\t", *((int*)skey.data));
    char *p = (char*)key.data;
    printf("key: %d, %d\n", *((int*)p), *((int*)(p+sizeof(int))));
}


More specifically I get:
skey: 134548112 key: 1, 5
skey: 134548112 key: 1, 7
skey: 134548112 key: 2, 3
skey: 134548112 key: 2, 90


ANY ideas?


pbour
Re: multifield key - 10-10-2006, 02:26 PM

I solved it with a little bit of marshalling in the key extractor function:

int get_node_key(DB *sdbp, const DBT *pkey, const DBT *pdata, DBT *skey)
{
    memset(skey, 0, sizeof(DBT));

    char *ptr = (char*)pkey->data;
    int n1 = *((int*)ptr);
    int n2 = *((int*)(ptr+sizeof(int)));

    char *bufferPtr;
    char buffer[sizeof(int)];
    int buffer_length = 0;

    bufferPtr = &buffer[0];
    memcpy(bufferPtr, &n2, sizeof(n2));
    buffer_length += sizeof(int);

    skey->data = bufferPtr;
    skey->size = buffer_length;

    // Return 0 to indicate that the record can be created/updated.
    return (0);
}


Philip Guenther
Re: multifield key - 10-10-2006, 03:43 PM

On Oct 10, 12:26 pm, "pbour" <panayiotis.bou... (AT) gmail (DOT) com> wrote:
....
Quote:
int get_node_key(DB *sdbp, const DBT *pkey, const DBT *pdata, DBT *skey)
{
    ....
    char buffer[sizeof(int)];
    ....
    bufferPtr = &buffer[0];
    ....
    skey->data = bufferPtr;
}

The above code exhibits undefined behavior: it stores a pointer to an
automatic variable in skey->data which is then accessed after the
function returns and the automatic variable's lifetime has ended. The
same was true of the other version you posted. You need to either:

a) make 'buffer' a static or file-scope variable (but then the code
won't be thread-safe),
b) allocate memory for the buffer from the heap and set the
DB_DBT_APPMALLOC flag on the dbt, or
c) just point skey->data into the pkey data:

int get_node_key(DB *sdbp, const DBT *pkey, const DBT *pdata, DBT *skey)
{
    skey->data = (char *)pkey->data + sizeof(int);
    skey->size = pkey->size - sizeof(int);
    return 0;
}
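
For comparison, option (b) might look roughly like the sketch below. So that it compiles without db.h, it uses hypothetical stand-ins: struct demo_dbt for DBT and DEMO_APPMALLOC for DB_DBT_APPMALLOC, and get_node_key_heap is a made-up name that drops the DB * and pdata arguments. In real code the callback would keep the signature shown above and set the real flag on skey.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins so the sketch builds without db.h.
 * Real code would use DBT and DB_DBT_APPMALLOC. */
struct demo_dbt {
    void *data;
    unsigned int size;
    unsigned int flags;
};
#define DEMO_APPMALLOC 0x1u

/* Key extractor in the style of option (b): copy B into heap
 * memory that outlives the call. */
static int get_node_key_heap(const struct demo_dbt *pkey,
                             struct demo_dbt *skey)
{
    assert(pkey->size == 2 * sizeof(int));

    int *b = malloc(sizeof *b);
    if (b == NULL)
        return ENOMEM;  /* nonzero reports failure to the caller */

    /* B is the second int in the primary key. */
    memcpy(b, (const char *)pkey->data + sizeof(int), sizeof *b);

    memset(skey, 0, sizeof *skey);
    skey->data = b;
    skey->size = sizeof *b;
    /* Marks the buffer as heap memory that the owner of skey must free. */
    skey->flags = DEMO_APPMALLOC;
    return 0;
}
```

Option (c) avoids the allocation and the copy entirely, which is probably why it is the one spelled out in the post.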


Philip Guenther


