dbTalk Databases Forums  

general design issue

comp.databases comp.databases


Discuss general design issue in the comp.databases forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
skink
 
Posts: n/a

Default general design issue - 04-22-2009 , 04:00 PM






hi,

i don't know much about SQL - just basics: tables, indexes, simple
selects and
i'm designing simple database storing key/value data (in fact its
read
only dictionary)

the schema is very straightforward:

CREATE TABLE words (key TEXT, entry TEXT)
CREATE INDEX wordsIdx ON words (key)

the database engine is quite slow (sqlite on embedded device) so
speed
is by far the **most**
important since i need my data in short time (40-60 ms).

the problem i have is as follows: for any key prefix (even not
existing in the database)
i'd like to know if there are keys starting with given prefix + 'a',
'b', 'c' etc

in other words: for prefix lets say 'inte' i'd like to know if there
are keys starting with
'intea', 'inteb', 'intec', 'inted' etc (just need true/false for each
letter not number of keys matching)

i cannot just run 26 queries for:

SELECT key FROM words WHERE key >= 'inteX' LIMIT 1

where X is letter [a..z] since each query takes ~15 ms

of course i don't need to iterate over all 26 letters: for example
when looking for 'intea'
my query might return 'integral' (means that there is no "intea",
"inteb", "intec", "inted", "intee"
and "intef" prefixes since the first key found has prefix "integ") so
i could start with
'inteh' in next iteratiion. but still in the worst scenario i need 26
queries...

my solution is to create second table storing prefix/32-bit-integer-
map where
each bit indicates one letter (for latin alphabet only 26 bits are
used)

CREATE TABLE prefixes (prefix TEXT, bitmap INTEGER)
CREATE INDEX prefixesIdx ON prefixes (prefix)

and use it as follows:

SELECT bitmap FROM prefixes WHERE prefix = 'inte'

this is very fast since i need only one database access to find out
the answer for my question.

however for longer prefixes it's quite expensive, e.g. for prefix
'internationalizatio' i have to store 19
prefixes just to find out there is one or two words starting with
that
prefix.

so my question is how to make it better keeping in mind that speed is
**key** factor but
database size is also important.

any ideas?

thanks
pskink

Reply With Quote
  #2  
Old   
Gene Wirchenko
 
Posts: n/a

Default Re: general design issue - 04-22-2009 , 05:10 PM






skink <pskink (AT) gmail (DOT) com> wrote:

[snip]

Quote:
the problem i have is as follows: for any key prefix (even not
existing in the database)
i'd like to know if there are keys starting with given prefix + 'a',
'b', 'c' etc
select * from words where key like prefix+"_"
The underscore means any one character. "%" is used for any number of
characters. Different dialects of SQL use diferent characters for
this, so check your documentation.

"key" is a reserved word in a number of SQL dialects. Consider
changing the name.

[snip]

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
I have preferences.
You have biases.
He/She has prejudices.


Reply With Quote
  #3  
Old   
Axel Hallez
 
Posts: n/a

Default Re: general design issue - 04-23-2009 , 03:32 AM



skink wrote:
Quote:
i cannot just run 26 queries for:

SELECT key FROM words WHERE key >= 'inteX' LIMIT 1

where X is letter [a..z] since each query takes ~15 ms

of course i don't need to iterate over all 26 letters: for example
when looking for 'intea'
my query might return 'integral' (means that there is no "intea",
"inteb", "intec", "inted", "intee"
and "intef" prefixes since the first key found has prefix "integ") so
i could start with
'inteh' in next iteratiion. but still in the worst scenario i need 26
queries...
Did you try this solution with a prepared statement, rather than
executing a fully new query for each variation?

Kind regards,
Axel Hallez


Reply With Quote
  #4  
Old   
Ciprian Dorin Craciun
 
Posts: n/a

Default Re: general design issue - 04-23-2009 , 05:21 AM



On Apr 23, 12:00*am, skink <psk... (AT) gmail (DOT) com> wrote:
Quote:
hi,

i don't know much about SQL - just basics: tables, indexes, simple
selects and
i'm designing simple database storing key/value data (in fact its
read
only dictionary)

the schema is very straightforward:

CREATE TABLE words (key TEXT, entry TEXT)
CREATE INDEX wordsIdx ON words (key)

the database engine is quite slow (sqlite on embedded device) so
speed
is by far the **most**
important since i need my data in short time (40-60 ms).

the problem i have is as follows: for any key prefix (even not
existing in the database)
i'd like to know if there are keys starting with given prefix + 'a',
'b', 'c' etc

in other words: for prefix lets say 'inte' i'd like to know if there
are keys starting with
'intea', 'inteb', 'intec', 'inted' etc (just need true/false for each
letter not number of keys matching)

i cannot just run 26 queries for:

SELECT key FROM words WHERE key >= 'inteX' LIMIT 1

where X is letter [a..z] since each query takes ~15 ms

of course i don't need to iterate over all 26 letters: for example
when looking for 'intea'
my query might return 'integral' (means that there is no "intea",
"inteb", "intec", "inted", "intee"
and "intef" prefixes since the first key found has prefix "integ") so
i could start with
'inteh' in next iteratiion. but still in the worst scenario i need 26
queries...

my solution is to create second table storing prefix/32-bit-integer-
map where
each bit indicates one letter (for latin alphabet only 26 bits are
used)

CREATE TABLE prefixes (prefix TEXT, bitmap INTEGER)
CREATE INDEX prefixesIdx ON prefixes (prefix)

and use it as follows:

SELECT bitmap FROM prefixes WHERE prefix = 'inte'

this is very fast since i need only one database access to find out
the answer for my question.

however for longer prefixes it's quite expensive, e.g. for prefix
'internationalizatio' i have to store 19
prefixes just to find out there is one or two words starting with
that
prefix.

so my question is how to make it better keeping in mind that speed is
**key** factor but
database size is also important.

any ideas?

thanks
pskink
Maybe what you need is not a relational database. For these sorts
of problems that involve key-value stores, I would use BerkeleyDB. As
it is embedded (as SQLite3), it has a small overhead, and it has
bindings (interfaces) to a lot of programming languages. (It is open
source and maintained by Oracle.)

The solution there resembles the table you have described. As a
hint I would use (in the case of BerkeleyDB) a BTree data store (if
you read through a tutorial you'll figure aut what I'm talking about),
and then when searching for a word, create a cursor (again see the
documentation), find the root word you are searching, and then just
keep going to the next word. Because of the BTree and the way the
cursor is created it assures that the next key will always be bigger
than the last one. And keep iterating until the root changes. This way
you are sure you have seen all possible words starting with the root.

As a side-note, the same solution could be obtained by using any
relational database. For example just use a query like: <<select *
from words where key > root order by key asc>> and apply the same
thechnique as above. It should work fine.

Ciprian Craciun.

P.S.: I've used this approach to both BerkeleyDB (100 million
records), SQLite3, and Postgres (both 4 million records), and worked
just fine. (It was a benchmark for sensor readings, with a schema
somehow resembling yours, and the best performance was obtained by
using BerkeleyDB, but it needs some fine tuning.)


Reply With Quote
  #5  
Old   
Jasen Betts
 
Posts: n/a

Default Re: general design issue - 04-24-2009 , 04:39 AM



On 2009-04-22, skink <pskink (AT) gmail (DOT) com> wrote:
Quote:
hi,

i don't know much about SQL - just basics: tables, indexes, simple
selects and
i'm designing simple database storing key/value data (in fact its
read
only dictionary)

the schema is very straightforward:

CREATE TABLE words (key TEXT, entry TEXT)
CREATE INDEX wordsIdx ON words (key)

the database engine is quite slow (sqlite on embedded device) so
speed
is by far the **most**
important since i need my data in short time (40-60 ms).

the problem i have is as follows: for any key prefix (even not
existing in the database)
i'd like to know if there are keys starting with given prefix + 'a',
'b', 'c' etc
gee! from that description it sounds like you may want to implement
soething like "dasher"
http://www.inference.phy.cam.ac.uk/dasher/


In any case a relational database it probably the wrong way to
implement your dictionary.



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.