![]() | |
#1
| |||
| |||
|
#2
| ||||
| ||||
|
|
I am trying to compare and contrast the use of Hash-table and B-Trees as indexing mechanisms for record-based files. Here's what I have got so far. Kindly suggest additions/modifications. Hash-table 1. Simple indexing scheme 2. Insert and delete operations very efficient |
|
3. Retrieve operation very efficient 4. Need to provide a hashing function that hashes record keys into some number 5. Supporting GetNext and GetBulk operations quite expensive 6. Does not scale that well as the size of the data increases |
|
B-Tree 1. Complex indexing scheme 2. Insert and delete operations slightly expensive due to need to retain the B-Tree property (i.e. splitting/joining nodes) 3. Retrieve operation very efficient 4. No need to provide a hashing function 5. Supporting the GetNext and GetBulk operations very efficient 6. Scales extremely well as the size of the data increases |
|
Regards, KP Bhat [My apologies in advance if this query is OT for any of the posted groups] |
#3
| ||||||
| ||||||
|
|
KP Bhat wrote: I assume by B-Tree we also include balanced trees like AVL. Hash collision is handled thru a linked list. I am trying to compare and contrast the use of Hash-table and B-Trees as indexing mechanisms for record-based files. Here's what I have got so far. Kindly suggest additions/modifications. Hash-table 1. Simple indexing scheme 2. Insert and delete operations very efficient Delete is not efficient. A potentially long linked list need to be traversed in case of collisions. |
|
3. Retrieve operation very efficient 4. Need to provide a hashing function that hashes record keys into some number |
|
5. Supporting GetNext and GetBulk operations quite expensive |
|
6. Does not scale that well as the size of the data increases |
|
7. Collision handling can lead to poor searches ("retrieve operation"). The complexity is O(n) for search and delete. |
| B-Tree 1. Complex indexing scheme 2. Insert and delete operations slightly expensive due to need to retain the B-Tree property (i.e. splitting/joining nodes) 3. Retrieve operation very efficient 4. No need to provide a hashing function 5. Supporting the GetNext and GetBulk operations very efficient 6. Scales extremely well as the size of the data increases 7. search, insert, delete always O(log n) (thus supports #6 above) Once a solid library is implemented for the tree, its good to go with tree. You never know you may have to handle large data in future. |
#4
| |||
| |||
|
|
"kar1107 (AT) gmail (DOT) com" wrote: KP Bhat wrote: I assume by B-Tree we also include balanced trees like AVL. Hash collision is handled thru a linked list. I am trying to compare and contrast the use of Hash-table and B-Trees as indexing mechanisms for record-based files. Here's what I have got so far. Kindly suggest additions/modifications. Hash-table [snip] 6. Does not scale that well as the size of the data increases Scales just fine until the data base no longer fits in virtual memory. Then the B-tree is far superior. |
#5
| |||
| |||
|
|
"CBFalconer" <cbfalconer (AT) yahoo (DOT) com> wrote in message news:41B5A6AC.FCE24212 (AT) yahoo (DOT) com... "kar1107 (AT) gmail (DOT) com" wrote: KP Bhat wrote: I assume by B-Tree we also include balanced trees like AVL. Hash collision is handled thru a linked list. I am trying to compare and contrast the use of Hash-table and B-Trees as indexing mechanisms for record-based files. Here's what I have got so far. Kindly suggest additions/modifications. Hash-table [snip] 6. Does not scale that well as the size of the data increases Scales just fine until the data base no longer fits in virtual memory. Then the B-tree is far superior. Why does a B-tree become far superior in that case? |
|
There are only two fundamental advantages to a B-tree that I can think of: records are effectively sorted, and there's no wasted space. If neither of these are important then AFAICS a hash table will offer greater performance, greater scalability (ignoring the space issue), and probably a simpler implementation. Conversely, if the fact that records are sorted _is_ important - and I think it often is - then a hash table is simply not practical. Alex |
#6
| |||
| |||
|
|
"CBFalconer" <cbfalconer (AT) yahoo (DOT) com> wrote in message "kar1107 (AT) gmail (DOT) com" wrote: KP Bhat wrote: I assume by B-Tree we also include balanced trees like AVL. Hash collision is handled thru a linked list. I am trying to compare and contrast the use of Hash-table and B-Trees as indexing mechanisms for record-based files. Here's what I have got so far. Kindly suggest additions/modifications. Hash-table [snip] 6. Does not scale that well as the size of the data increases Scales just fine until the data base no longer fits in virtual memory. Then the B-tree is far superior. Why does a B-tree become far superior in that case? |
|
There are only two fundamental advantages to a B-tree that I can think of: records are effectively sorted, and there's no wasted space. If neither of these are important then AFAICS a hash table will offer greater performance, greater scalability (ignoring the space issue), and probably a simpler implementation. Conversely, if the fact that records are sorted _is_ important - and I think it often is - then a hash table is simply not practical. |
#7
| |||
| |||
|
#8
| |||
| |||
|
#9
| |||
| |||
|
#10
| |||
| |||
|
|
I looked in the webpage for hashlib. I am looking for description of the algorithm (I see mostly implementation/usage docs in the .zip file). Can you describe your algorithm? By O(1) you mean best/common case, right? AFAIK there isn't any datastructure that can do better than O(log n) search (ie theoretically you need atleast O(log n) comparisons to zero-in on the item being searched). |
![]() |
| Thread Tools | |
| Display Modes | |
| |