Michelle,
Hopefully, Mike has a "proper" description of a B-Tree. Meanwhile, I'll give
is a shot here.
B=Balanced, Binary - I forget!!
Primary index files contain loads of 2-field records: the fields being the
KEY of each data record, and the location within the data-file of that
record. This "Location" could be a Record-No (1, 20, 5), or could be the
physical offset (byte-position) within the data-file of the start of that
record.
Example: Customer table, 5-digit Customer-No is the primary key. Suppose 4
records have been created, in this order: Cust-No 3, 6, 2, 4. The Data file
will have 3-xxx, 6-xxx, 2-xxx, 4-xxx. If the Primary index is on Cust-No,
then, assuming the Record-No is the 2nd field in the pairing (rather than
the byte-offset), the index-file will contain these pairs of fields, and in
this order: 2/3, 3/1, 4/4.6/2. When you ask for Cust 6, Pdox will "search"
through the index quickly, find the "6", note that its data-record is at
location 2, and then passes you the data from slot-2 in the data-file.
As you add more-and-more key-pairs to the index file, searching it
sequentially from the start will get slower. To speed up this searching,
B-tree systems split the Index into "Blocks" - with a block-size chosen to
maximise performance. Block-sizes of Indices are typically 512 bytes, or
2048 bytes, or similar values. Key-pairs will be added to the first "block"
until it's full. An index-block might contain 20-30-40-50+ key-pairs.
Because the keys are always in sequence within a Block, "searching" through
a block of key-pairs might be done on a "Binary-Chop" basis - instead of
searching through all the key-pairs from the start, the search code might
locate the centre key, check if the required one is below-or-above, move to
that half of the pairs, split that half, and try from there... etc...
After the first block is full, then the addition of the next key-pair causes
that block to be "split" - this is the critical bit. Usually, it's split in
the middle:
- Suppose the First (and only) old block is Block-1
- A new block is activated, Block-2, and the first half of the old keys
from Bl-1 are inserted to Bl-2
- Another new block is activated, Block-3, and the second half of the old
keys from Bl-1 are inserted to Bl-3
- The new key-pair is inserted into either Bl-2 or Bl-3, depending on the
key-value of the new record.
- All key-pairs in Bl-2 and Bl-3 are kept in sequence. Both of these
blocks are now about half-full.
- Finally, Bl-1 is cleared, and the end key from Bl-2 and the end key from
Bl-3 are inserted into Bl-1, each with a reference to their respective
index-block (2 and 3).
- Thus, Bl-1 is now a "Parent" block (at, say, level-1), with only 2
key-pairs, and it refers only to Bl-2 and Bl-3 (at level-2).
When further key-pairs are added, these go into Bl-2 or Bl-3, until one of
them becomes full. If another key must be added to a full block, that block
is split - as above. Now, we have Bl-4 appearing al the lower level. The end
key-pair from Bl-4 will be added to Bl-1. This process continues, creating
many blocks at level-2, and, for each, adding an extra key-pair to Bl-1.
Finally, after, say, 20/30/40 blocks are created at level-1, then Bl-1 will
be full of key-pairs. When yet another block is created at level-2, and a
key-pair must be inserted to the parent, then the parent block (Bl-1) must
now be split, now creating 2 blocks at level-1, and creating a new 2-pair
parent block above it. At that point the B-tree will have 3 levels: One
parent block at the top, two child blocks at the next level, and dozens of
the "real" key-pairs at the bottom level. And so on.....
Searching: To locate a specific record: search the Top-level block to
identify the appropriate block at the next level. Use that to locate "the"
block below, and so on, until you reach the bottom block. Search that one to
get the actual location of the actual data-record. This results in very fast
searches. Typically, Indices would very rarely exceed 3 or 4 levels, so
locating a record among "millions" requires reading/searching only 2-3-4
blocks.
Deleting records is the reverse - blocks become empty, and we have "merging"
of blocks, and the collapse of Index levels.
"Balancing" is another layer of complexity - if many records are
added/deleted in one region of the index structure, we might have many
"levels" developing in that region of the index, and few levels in other
regions. When blocks are split (or merged) it's desirable to not only split
the keys of the full block, but to also check on other adjacent blocks, and
distribute the keys more evenly among many adjacent blocks. However,
maintaining "Balance" is tricky programming, and can be time consuming.
The above explanation is a "simple" one - intentionally. A common variation
is to AVOID duplication of all those key-pairs which refer to actual
data-records: when a block is split, the end key-pair of each half is kept
in the parent block ONLY, and is NOT duplicated at the ends of the 2 split
halves - but the search logic must, obviously, be aware of this approach.
I think a few simple diagrams would help a lot... Maybe Google would find
good descriptions of B-Trees. This might be useful , though I did not read
through it: http://en.wikipedia.org/wiki/B-tree
Now, you're sorry you asked!! You'll have to try to relate the above VERY
brief note to the errors you've seen. Better still would be to use some
utility software to view the contents of the PDox Index files. Maybe just
try copying them to ordinary "DB" files, and then see if Paradox will show
you the contents.
Whew!!
- Mike
"Michelle Burnore" <michelle (AT) compumail-directmail (DOT) com> wrote
Quote:
Sundial Services wrote:
+++ INDEX-FILES:
Index-files have a table-like structure. In the case of the primary
index,
the basic "record" consists of a primary-key value and a record-number;
in
a secondary-index the record consists of the secondary-key value, the
primary-key value, and a hint. But on top of that, and in the same
files,
is a "B-tree" data structure. We don't need to explain how a B-tree is
structured, but we do have to say that you must allow room for it.
Actually, I wish you would explain how a B-tree is structured. When my
tables get corrupted, it is invariably a "B-tree mismatch" error. Could
be my own fault for not packing them often enough.
Michelle |