dbTalk Databases Forums  

Multiple field access with minimun index storage

comp.databases comp.databases


Discuss Multiple field access with minimun index storage in the comp.databases forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Sebastian Araya
 
Posts: n/a

Default Multiple field access with minimun index storage - 10-16-2006 , 06:34 AM






Hello,


I'm facing a problem trying to access R(A1,A2,...,A7,V1,V2,V3) by any
of one or more Ai fields in R, a relational table.

Let's suppouse that R has not null Ai fields with Vi associated'
values; an user may provide A1, A3 and A5 as parameters, so in order to
fast access Vi values, I must build an index with (A1,A3,A5) as
expression.

Now, if another user input is A2 and A6, I will have to build another
index expression (A2,A6,A7) -I made add A7 which will improve (A2),
(A2,A6) and (A2,A6,A7)-.

But, in order to generalize the access problem of any Ai fields...
how many expression index I have to build? Factorial(i) ??

Thanks in advance,


Sebastián


Reply With Quote
  #2  
Old   
Thomas Kellerer
 
Posts: n/a

Default Re: Multiple field access with minimun index storage - 10-16-2006 , 06:55 AM






On 16.10.2006 13:34 Sebastian Araya wrote:
Quote:
Hello,


I'm facing a problem trying to access R(A1,A2,...,A7,V1,V2,V3) by any
of one or more Ai fields in R, a relational table.

Let's suppouse that R has not null Ai fields with Vi associated'
values; an user may provide A1, A3 and A5 as parameters, so in order to
fast access Vi values, I must build an index with (A1,A3,A5) as
expression.

Now, if another user input is A2 and A6, I will have to build another
index expression (A2,A6,A7) -I made add A7 which will improve (A2),
(A2,A6) and (A2,A6,A7)-.

But, in order to generalize the access problem of any Ai fields...
how many expression index I have to build? Factorial(i) ??

I'd say that depends on the DBMS that you use. With Oracle you would
simply build one bitmap index for each column as bitmap indexes can be
combined by Oracle. I think Postgres offers something similar, you will
need to check the manual of your DBMS.

Thomas

--
It's not a RootKit - it's a Sony


Reply With Quote
  #3  
Old   
David Cressey
 
Posts: n/a

Default Re: Multiple field access with minimun index storage - 10-16-2006 , 11:05 AM




"Sebastian Araya" <numisys (AT) gmail (DOT) com> wrote

Hello,


I'm facing a problem trying to access R(A1,A2,...,A7,V1,V2,V3) by any
of one or more Ai fields in R, a relational table.

Let's suppouse that R has not null Ai fields with Vi associated'
values; an user may provide A1, A3 and A5 as parameters, so in order to
fast access Vi values, I must build an index with (A1,A3,A5) as
expression.

Now, if another user input is A2 and A6, I will have to build another
index expression (A2,A6,A7) -I made add A7 which will improve (A2),
(A2,A6) and (A2,A6,A7)-.

But, in order to generalize the access problem of any Ai fields...
how many expression index I have to build? Factorial(i) ??

Thanks in advance,


Sebastián

The answer depends on the nature of the optimizer in the database server.

Some optimizers will be able to use three simple indexes, one on A1, one on
A3, and one on A5, in order to get (fairly) rapid access to rows that match
the queries parameters.

Other optimizers will only recognize one of the indexes, and not use thje
other two.

Almost all optimizers will do better with a compound index on (A1, A3, A5).
However, when a query specifies only A5, and not A1, the index will be
ignored.

In order to get ultimately fast response on any set of parameters, you
might need n factorial indexes. This would slow down input processing quite
a bit.

Try it for n=10, and write back to us by the time your database server has
finished inserting one row, and updating all those indexes! (just
kidding, don't try this at home).


Dave





Reply With Quote
  #4  
Old   
Ed Prochak
 
Posts: n/a

Default Re: Multiple field access with minimun index storage - 10-16-2006 , 02:53 PM




Sebastian Araya wrote:
Quote:
Hello,


I'm facing a problem trying to access R(A1,A2,...,A7,V1,V2,V3) by any
of one or more Ai fields in R, a relational table.

Let's suppouse that R has not null Ai fields with Vi associated'
values; an user may provide A1, A3 and A5 as parameters, so in order to
fast access Vi values, I must build an index with (A1,A3,A5) as
expression.

Now, if another user input is A2 and A6, I will have to build another
index expression (A2,A6,A7) -I made add A7 which will improve (A2),
(A2,A6) and (A2,A6,A7)-.

But, in order to generalize the access problem of any Ai fields...
how many expression index I have to build? Factorial(i) ??

Thanks in advance,


Sebastián
The first rule of optimization is KNOW THY DATA.

The correlary is know thy user's needs.
So you build indices for the most common queries, where the queries
prove to be too slow for the targeted response time. You might take
advantage of features of the particular DBMS and it's optimizer (is the
optimizer smart enough to combine two or more indices on individual
columns?).

Shoot I've been away from this too long. Given that incides are order
dependent, then for R(a,b,c,V1,V2,...VN). then for the attributes a,b,c
you need indices for all the permutations.
a
b
c
ab
ac
ba
ca
bc
cb
abc
acb
bac
bca
cba
cab
The number of permutations for n items taken r at a time is p(n,r) =
n!/(n-r)! and for all possible indices, you need the sum from r=1 to
r=n.
p(3,1)=3
p(3,2)=6
p(3,3)=6
and the sum is 15.
For that you should be able to work out the maximum you need. Since it
is a theoretical question, you don't care about the cost of maintaining
all those indices. In a real implementation, there would be a trade off
between having all possible indices and the maintenance costs.
Depending on the DBMS, it might even sometimes be cheaper to build the
particular index when it is needed and drop it afterward (say for end
of month reports, create the index, run the reports and drop the index
so that daily operations do not waste processing maintaing an index
that is not used every day).

HTH,
Ed



Reply With Quote
  #5  
Old   
Sebastian Araya
 
Posts: n/a

Default Re: Multiple field access with minimun index storage - 10-17-2006 , 07:22 AM



Hello again,


and thanks to everyone.

My RDMS is MySQL which only has BTree and Hash index types.

I've found this article from John Wu about FastBit:
http://sdm.lbl.gov/fastbit/. FastBit is a stand-alone indexing software
in C++. It does not work with MySQL yet, but they are preparing for a
LGPL release some time early next year.

I haven't opportunity to test it, but I'm sure a fresh new idea is
what we need in this case. Studying user' queries is the common
solution now, but for the future and playing around with datamining and
huge databases we will need more powerfull indexing structures.

Best,

Sebastián


Reply With Quote
  #6  
Old   
Ed Prochak
 
Posts: n/a

Default Re: Multiple field access with minimun index storage - 10-17-2006 , 12:04 PM




Ed Prochak wrote:
Quote:
Sebastian Araya wrote:
Hello,


I'm facing a problem trying to access R(A1,A2,...,A7,V1,V2,V3) by any
of one or more Ai fields in R, a relational table.

Let's suppouse that R has not null Ai fields with Vi associated'
values; an user may provide A1, A3 and A5 as parameters, so in order to
fast access Vi values, I must build an index with (A1,A3,A5) as
expression.

Now, if another user input is A2 and A6, I will have to build another
index expression (A2,A6,A7) -I made add A7 which will improve (A2),
(A2,A6) and (A2,A6,A7)-.

But, in order to generalize the access problem of any Ai fields...
how many expression index I have to build? Factorial(i) ??

Thanks in advance,


Sebastián

The first rule of optimization is KNOW THY DATA.

The correlary is know thy user's needs.
So you build indices for the most common queries, where the queries
prove to be too slow for the targeted response time. You might take
advantage of features of the particular DBMS and it's optimizer (is the
optimizer smart enough to combine two or more indices on individual
columns?).

Shoot I've been away from this too long. Given that incides are order
dependent, then for R(a,b,c,V1,V2,...VN). then for the attributes a,b,c
you need indices for all the permutations.
a
b
c
ab
ac
ba
ca
bc
cb
abc
acb
bac
bca
cba
cab
The number of permutations for n items taken r at a time is p(n,r) =
n!/(n-r)! and for all possible indices, you need the sum from r=1 to
r=n.
p(3,1)=3
p(3,2)=6
p(3,3)=6
and the sum is 15.
For that you should be able to work out the maximum you need. Since it
is a theoretical question, you don't care about the cost of maintaining
all those indices. In a real implementation, there would be a trade off
between having all possible indices and the maintenance costs.
Depending on the DBMS, it might even sometimes be cheaper to build the
particular index when it is needed and drop it afterward (say for end
of month reports, create the index, run the reports and drop the index
so that daily operations do not waste processing maintaing an index
that is not used every day).

HTH,
Ed
having a day to ponder this I realized I overstated the case. While the
sum I talked about will provide an index for every possible key based
query, you don't really ned that many on a modern DBMS. That's because
for example the index for a,b,c can also be used for a query with just
a,b or even just a in the condition (WHERE) clause.

Now there might be some trade off point such that if the number key
attributes is greater than some limit, then looking up a smaller number
of them might be more efficient doing a full table scan. (or creating
another index)

IOW, with a query on a,b,c the database optimizer might skip the index
on a,b,c,d,e,f,g,h,i,j,k as less efficient than scanning the full
table. Where that crossover point is likely depends on the optimizer,
the number N of attributes in the index versus the number M in the
query, and on the statistics of the table (if the a,b,c attributes are
very distinct)

So it seems an interesting theoretical as well as practical problem.

ED



Reply With Quote
  #7  
Old   
Ed Prochak
 
Posts: n/a

Default Re: Multiple field access with minimun index storage - 10-17-2006 , 12:16 PM




Sebastian Araya wrote:
Quote:
Hello again,


and thanks to everyone.

My RDMS is MySQL which only has BTree and Hash index types.

I've found this article from John Wu about FastBit:
http://sdm.lbl.gov/fastbit/. FastBit is a stand-alone indexing software
in C++. It does not work with MySQL yet, but they are preparing for a
LGPL release some time early next year.

I haven't opportunity to test it, but I'm sure a fresh new idea is
what we need in this case. Studying user' queries is the common
solution now, but for the future and playing around with datamining and
huge databases we will need more powerfull indexing structures.

Best,

Sebastián
regarding your last comments, I wonder. Is most datamining done
manually by experts forming queries to make unique correlations? Or is
it done programmatically where systematic queries are run looking for
numerous results?

Either way, isn't the goal to make connections that were overlooked by
the DB designers? As such wouldn't that involve at least one non-key
attribute in the query?

IOW, given R(A1,A2,A3,...AN,V1,V2,V3,...VM)
don't you need to optimize access to a result set like
R(Ai,...Aj,Vh,...Vk)
or maybe even R(Vd,...Vg) ??

Or am I missing the point of datamining?

Ed



Reply With Quote
  #8  
Old   
Sebastian Araya
 
Posts: n/a

Default Re: Multiple field access with minimun index storage - 10-18-2006 , 03:47 PM




Ed Prochak wrote:
Quote:
regarding your last comments, I wonder. Is most datamining done
manually by experts forming queries to make unique correlations? Or is
it done programmatically where systematic queries are run looking for
numerous results?

Either way, isn't the goal to make connections that were overlooked by
the DB designers? As such wouldn't that involve at least one non-key
attribute in the query?

IOW, given R(A1,A2,A3,...AN,V1,V2,V3,...VM)
don't you need to optimize access to a result set like
R(Ai,...Aj,Vh,...Vk)
or maybe even R(Vd,...Vg) ??

Or am I missing the point of datamining?
No. My point is that R is a datamining process' result (combined with
several satelital tables to show it nicely). In fact, I'm building a
web layer to answer at subsecond time any field query combination.

Sebastián.

Quote:
Ed


Reply With Quote
  #9  
Old   
Ed Prochak
 
Posts: n/a

Default Re: Multiple field access with minimun index storage - 10-19-2006 , 11:37 AM




Sebastian Araya wrote:
Quote:
Ed Prochak wrote:
regarding your last comments, I wonder. Is most datamining done
manually by experts forming queries to make unique correlations? Or is
it done programmatically where systematic queries are run looking for
numerous results?

Either way, isn't the goal to make connections that were overlooked by
the DB designers? As such wouldn't that involve at least one non-key
attribute in the query?

IOW, given R(A1,A2,A3,...AN,V1,V2,V3,...VM)
don't you need to optimize access to a result set like
R(Ai,...Aj,Vh,...Vk)
or maybe even R(Vd,...Vg) ??

Or am I missing the point of datamining?

No. My point is that R is a datamining process' result (combined with
several satelital tables to show it nicely). In fact, I'm building a
web layer to answer at subsecond time any field query combination.

Sebastián.

If this is only about a real production issue, then it really goes back
to current best practice:
a well tuned, efficient DBMS back end.
That's why modern DMBS have query optimizers. I know that doesn't
really answer your question (which, given this last bit of info, is
different from your original question), but AFAIK there is no
calculation to provide the optimal number of indices needed for result
set R() which is computed from joins on the relations R1(), R2(),
....Rk() for all possible joins. (but I could be wrong) How you
calculate that might be a question best asked in comp.databases.theory.

I'll watch for your posting there because I'd like to find out how
wrong I am.
Ed



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.