![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
Hello, I'm facing a problem trying to access R(A1,A2,...,A7,V1,V2,V3) by any of one or more Ai fields in R, a relational table. Let's suppouse that R has not null Ai fields with Vi associated' values; an user may provide A1, A3 and A5 as parameters, so in order to fast access Vi values, I must build an index with (A1,A3,A5) as expression. Now, if another user input is A2 and A6, I will have to build another index expression (A2,A6,A7) -I made add A7 which will improve (A2), (A2,A6) and (A2,A6,A7)-. But, in order to generalize the access problem of any Ai fields... how many expression index I have to build? Factorial(i) ?? |
#3
| |||
| |||
|
#4
| |||
| |||
|
|
Hello, I'm facing a problem trying to access R(A1,A2,...,A7,V1,V2,V3) by any of one or more Ai fields in R, a relational table. Let's suppouse that R has not null Ai fields with Vi associated' values; an user may provide A1, A3 and A5 as parameters, so in order to fast access Vi values, I must build an index with (A1,A3,A5) as expression. Now, if another user input is A2 and A6, I will have to build another index expression (A2,A6,A7) -I made add A7 which will improve (A2), (A2,A6) and (A2,A6,A7)-. But, in order to generalize the access problem of any Ai fields... how many expression index I have to build? Factorial(i) ?? Thanks in advance, Sebastián |
#5
| |||
| |||
|
#6
| |||
| |||
|
|
Sebastian Araya wrote: Hello, I'm facing a problem trying to access R(A1,A2,...,A7,V1,V2,V3) by any of one or more Ai fields in R, a relational table. Let's suppouse that R has not null Ai fields with Vi associated' values; an user may provide A1, A3 and A5 as parameters, so in order to fast access Vi values, I must build an index with (A1,A3,A5) as expression. Now, if another user input is A2 and A6, I will have to build another index expression (A2,A6,A7) -I made add A7 which will improve (A2), (A2,A6) and (A2,A6,A7)-. But, in order to generalize the access problem of any Ai fields... how many expression index I have to build? Factorial(i) ?? Thanks in advance, Sebastián The first rule of optimization is KNOW THY DATA. The correlary is know thy user's needs. So you build indices for the most common queries, where the queries prove to be too slow for the targeted response time. You might take advantage of features of the particular DBMS and it's optimizer (is the optimizer smart enough to combine two or more indices on individual columns?). Shoot I've been away from this too long. Given that incides are order dependent, then for R(a,b,c,V1,V2,...VN). then for the attributes a,b,c you need indices for all the permutations. a b c ab ac ba ca bc cb abc acb bac bca cba cab The number of permutations for n items taken r at a time is p(n,r) = n!/(n-r)! and for all possible indices, you need the sum from r=1 to r=n. p(3,1)=3 p(3,2)=6 p(3,3)=6 and the sum is 15. For that you should be able to work out the maximum you need. Since it is a theoretical question, you don't care about the cost of maintaining all those indices. In a real implementation, there would be a trade off between having all possible indices and the maintenance costs. Depending on the DBMS, it might even sometimes be cheaper to build the particular index when it is needed and drop it afterward (say for end of month reports, create the index, run the reports and drop the index so that daily operations do not waste processing maintaing an index that is not used every day). HTH, Ed |
#7
| |||
| |||
|
|
Hello again, and thanks to everyone. My RDMS is MySQL which only has BTree and Hash index types. I've found this article from John Wu about FastBit: http://sdm.lbl.gov/fastbit/. FastBit is a stand-alone indexing software in C++. It does not work with MySQL yet, but they are preparing for a LGPL release some time early next year. I haven't opportunity to test it, but I'm sure a fresh new idea is what we need in this case. Studying user' queries is the common solution now, but for the future and playing around with datamining and huge databases we will need more powerfull indexing structures. Best, Sebastián |
#8
| |||
| |||
|
|
regarding your last comments, I wonder. Is most datamining done manually by experts forming queries to make unique correlations? Or is it done programmatically where systematic queries are run looking for numerous results? Either way, isn't the goal to make connections that were overlooked by the DB designers? As such wouldn't that involve at least one non-key attribute in the query? IOW, given R(A1,A2,A3,...AN,V1,V2,V3,...VM) don't you need to optimize access to a result set like R(Ai,...Aj,Vh,...Vk) or maybe even R(Vd,...Vg) ?? Or am I missing the point of datamining? |
|
Ed |
#9
| |||
| |||
|
|
Ed Prochak wrote: regarding your last comments, I wonder. Is most datamining done manually by experts forming queries to make unique correlations? Or is it done programmatically where systematic queries are run looking for numerous results? Either way, isn't the goal to make connections that were overlooked by the DB designers? As such wouldn't that involve at least one non-key attribute in the query? IOW, given R(A1,A2,A3,...AN,V1,V2,V3,...VM) don't you need to optimize access to a result set like R(Ai,...Aj,Vh,...Vk) or maybe even R(Vd,...Vg) ?? Or am I missing the point of datamining? No. My point is that R is a datamining process' result (combined with several satelital tables to show it nicely). In fact, I'm building a web layer to answer at subsecond time any field query combination. Sebastián. |
![]() |
| Thread Tools | |
| Display Modes | |
| |