dbTalk Databases Forums  

TechTips: The "Capacity" of a Paradox Table (and "vs. Access")

comp.databases.paradox comp.databases.paradox


Discuss TechTips: The "Capacity" of a Paradox Table (and "vs. Access") in the comp.databases.paradox forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Sundial Services
 
Posts: n/a

Default TechTips: The "Capacity" of a Paradox Table (and "vs. Access") - 01-04-2006 , 01:15 PM






So, just how much data can a Paradox table hold? Well, it depends ...

+++ BASIC STORAGE-FORMAT:
A Paradox table (.DB) file, /and/ the index-files (.PX, .Xnn, .Ynn), is
physically divided into "blocks." There can be up to 65535 of these
blocks, and the size will be some multiple of 2K as set in the BDE
Configuration utility.

Each block contains 6 bytes of overhead followed by an integral number of
records. Any excess space at the end of the block is not used.

Unlike MS-Access, each record is /fixed/ /length/ ... if a field is defined
as, say, "A255" then it will have exactly 255 bytes reserved for it, even
if the value is now one byte long. While on the one hand this is somewhat
wasteful of storage, it makes record-management always very simple even if
the lengths of string-fields change frequently.

Memo-fields are stored in a separate ".MB" file whose storage format is much
different. Various lengths of information are stored in various ways in
that file. Empty fields are not stored at all.

The maximum length of any single file is 2 gigabytes.

Space in a table is constantly recycled. As records are inserted, blocks
may be "split" to make room, and as they are removed, records may be
merged. The objective of Paradox in this regard is not economy of storage
but rather speed. When a block becomes completely empty, it is recycled.


+++ INDEX-FILES:
Index-files have a table-like structure. In the case of the primary index,
the basic "record" consists of a primary-key value and a record-number; in
a secondary-index the record consists of the secondary-key value, the
primary-key value, and a hint. But on top of that, and in the same files,
is a "B-tree" data structure. We don't need to explain how a B-tree is
structured, but we do have to say that you must allow room for it.



+++ SPEED VS. SPACE:
Generally speaking, you should prize "speed," and /consistency/ of speed,
over "space."

When a table is "packed," BDE copies the information from the old table into
a new one, then renames them. This will reduce the storage allocation
somewhat and it will rebuild the B-trees of the indexes. In production
situations, though, it is often faster to rebuild the B-trees without
packing the table itself.

Furthermore, a database structure is designed to be self-maintaining. When
you pack or reindex a table, performance will be about as best as it can
be, and over time it will degrade ... somewhat. But then it will begin to
stabilize.

We recommend that ChimneySweep(R) jobs should be run at least once a night
to back-up the databases on-disc, and that this is a good time to reindex
the tables; packing isn't needed.


+++ SINGLE-FILE VS. MULTIPLE-FILE STRUCTURE:
Whereas a Microsoft Access table .. and everything else .. is stored in a
single file and is subject to the space-limits of such a file (~2GB), the
Paradox system uses a family of files to store each table, and every other
object in the system as well. This gives the resulting system much more
redundancy and also a greater overall storage capacity in general.

While the storage format of Paradox is arguably "wasteful," it must be
frankly said that disk-space is no longer a serious consideration anyway.
Speed, and consistency in speed, is. Paradox consistently out-performs
Access under load for this reason. And it should also be noted that many
SQL databases use the multiple-files-per-table format as well.

----
ChimneySweep(R): F-A-A-ST table repair at a click of the mouse!
http://www.sundialservices.com/products/chimneysweep

Reply With Quote
  #2  
Old   
Larry DiGiovanni
 
Posts: n/a

Default Re: TechTips: The "Capacity" of a Paradox Table (and "vs. Access") - 01-04-2006 , 01:33 PM






Sundial Services wrote:

Quote:
+++ SINGLE-FILE VS. MULTIPLE-FILE STRUCTURE:
Whereas a Microsoft Access table .. and everything else .. is
stored in a single file and is subject to the space-limits of
such a file (~2GB)
An Access database can be distributed among multiple individual .MDB files,
so an application storage is not limited in the way you suggested above.

--
Larry DiGiovanni
Digico, Inc.
IT Consulting and Staffing Solutions
www.digicoinc.com
Check out www.thedbcommunity.com for Paradox resources.




Reply With Quote
  #3  
Old   
Sundial Services
 
Posts: n/a

Default Re: TechTips: The "Capacity" of a Paradox Table (and "vs. Access") - 01-04-2006 , 01:35 PM



Larry DiGiovanni wrote:
Quote:
Sundial Services wrote:

+++ SINGLE-FILE VS. MULTIPLE-FILE STRUCTURE:
Whereas a Microsoft Access table .. and everything else .. is
stored in a single file and is subject to the space-limits of
such a file (~2GB)

An Access database can be distributed among multiple individual .MDB
files, so an application storage is not limited in the way you suggested
above.
You are correct, Larry, that multiple MDB files can be "attached." When
this is done, each one of the MDB-files (which might each contain just one
table) can grow to its respective limit.

In the context of Access, this is somewhat a "workaround." In the context
of Paradox it is the normal procedure of the system.

The intention of this article was "simple comparison," not "which is better
or worse." Thank you for the important clarification.

----
ChimneySweep(R): F-A-A-ST table repair at a click of the mouse!
http://www.sundialservices.com/products/chimneysweep


Reply With Quote
  #4  
Old   
Michelle Burnore
 
Posts: n/a

Default Re: TechTips: The "Capacity" of a Paradox Table (and "vs. Access") - 01-06-2006 , 02:29 PM



Sundial Services wrote:

Quote:
+++ INDEX-FILES:
Index-files have a table-like structure. In the case of the primary index,
the basic "record" consists of a primary-key value and a record-number; in
a secondary-index the record consists of the secondary-key value, the
primary-key value, and a hint. But on top of that, and in the same files,
is a "B-tree" data structure. We don't need to explain how a B-tree is
structured, but we do have to say that you must allow room for it.
Actually, I wish you would explain how a B-tree is structured. When my
tables get corrupted, it is invariably a "B-tree mismatch" error. Could
be my own fault for not packing them often enough.

Michelle


Reply With Quote
  #5  
Old   
Michael Kennedy
 
Posts: n/a

Default Re: TechTips: The "Capacity" of a Paradox Table (and "vs. Access") - 01-07-2006 , 12:35 PM



Michelle,

Hopefully, Mike has a "proper" description of a B-Tree. Meanwhile, I'll give
is a shot here.

B=Balanced, Binary - I forget!!

Primary index files contain loads of 2-field records: the fields being the
KEY of each data record, and the location within the data-file of that
record. This "Location" could be a Record-No (1, 20, 5), or could be the
physical offset (byte-position) within the data-file of the start of that
record.

Example: Customer table, 5-digit Customer-No is the primary key. Suppose 4
records have been created, in this order: Cust-No 3, 6, 2, 4. The Data file
will have 3-xxx, 6-xxx, 2-xxx, 4-xxx. If the Primary index is on Cust-No,
then, assuming the Record-No is the 2nd field in the pairing (rather than
the byte-offset), the index-file will contain these pairs of fields, and in
this order: 2/3, 3/1, 4/4.6/2. When you ask for Cust 6, Pdox will "search"
through the index quickly, find the "6", note that its data-record is at
location 2, and then passes you the data from slot-2 in the data-file.

As you add more-and-more key-pairs to the index file, searching it
sequentially from the start will get slower. To speed up this searching,
B-tree systems split the Index into "Blocks" - with a block-size chosen to
maximise performance. Block-sizes of Indices are typically 512 bytes, or
2048 bytes, or similar values. Key-pairs will be added to the first "block"
until it's full. An index-block might contain 20-30-40-50+ key-pairs.
Because the keys are always in sequence within a Block, "searching" through
a block of key-pairs might be done on a "Binary-Chop" basis - instead of
searching through all the key-pairs from the start, the search code might
locate the centre key, check if the required one is below-or-above, move to
that half of the pairs, split that half, and try from there... etc...

After the first block is full, then the addition of the next key-pair causes
that block to be "split" - this is the critical bit. Usually, it's split in
the middle:
- Suppose the First (and only) old block is Block-1
- A new block is activated, Block-2, and the first half of the old keys
from Bl-1 are inserted to Bl-2
- Another new block is activated, Block-3, and the second half of the old
keys from Bl-1 are inserted to Bl-3
- The new key-pair is inserted into either Bl-2 or Bl-3, depending on the
key-value of the new record.
- All key-pairs in Bl-2 and Bl-3 are kept in sequence. Both of these
blocks are now about half-full.
- Finally, Bl-1 is cleared, and the end key from Bl-2 and the end key from
Bl-3 are inserted into Bl-1, each with a reference to their respective
index-block (2 and 3).
- Thus, Bl-1 is now a "Parent" block (at, say, level-1), with only 2
key-pairs, and it refers only to Bl-2 and Bl-3 (at level-2).

When further key-pairs are added, these go into Bl-2 or Bl-3, until one of
them becomes full. If another key must be added to a full block, that block
is split - as above. Now, we have Bl-4 appearing al the lower level. The end
key-pair from Bl-4 will be added to Bl-1. This process continues, creating
many blocks at level-2, and, for each, adding an extra key-pair to Bl-1.
Finally, after, say, 20/30/40 blocks are created at level-1, then Bl-1 will
be full of key-pairs. When yet another block is created at level-2, and a
key-pair must be inserted to the parent, then the parent block (Bl-1) must
now be split, now creating 2 blocks at level-1, and creating a new 2-pair
parent block above it. At that point the B-tree will have 3 levels: One
parent block at the top, two child blocks at the next level, and dozens of
the "real" key-pairs at the bottom level. And so on.....

Searching: To locate a specific record: search the Top-level block to
identify the appropriate block at the next level. Use that to locate "the"
block below, and so on, until you reach the bottom block. Search that one to
get the actual location of the actual data-record. This results in very fast
searches. Typically, Indices would very rarely exceed 3 or 4 levels, so
locating a record among "millions" requires reading/searching only 2-3-4
blocks.

Deleting records is the reverse - blocks become empty, and we have "merging"
of blocks, and the collapse of Index levels.

"Balancing" is another layer of complexity - if many records are
added/deleted in one region of the index structure, we might have many
"levels" developing in that region of the index, and few levels in other
regions. When blocks are split (or merged) it's desirable to not only split
the keys of the full block, but to also check on other adjacent blocks, and
distribute the keys more evenly among many adjacent blocks. However,
maintaining "Balance" is tricky programming, and can be time consuming.

The above explanation is a "simple" one - intentionally. A common variation
is to AVOID duplication of all those key-pairs which refer to actual
data-records: when a block is split, the end key-pair of each half is kept
in the parent block ONLY, and is NOT duplicated at the ends of the 2 split
halves - but the search logic must, obviously, be aware of this approach.

I think a few simple diagrams would help a lot... Maybe Google would find
good descriptions of B-Trees. This might be useful , though I did not read
through it: http://en.wikipedia.org/wiki/B-tree

Now, you're sorry you asked!! You'll have to try to relate the above VERY
brief note to the errors you've seen. Better still would be to use some
utility software to view the contents of the PDox Index files. Maybe just
try copying them to ordinary "DB" files, and then see if Paradox will show
you the contents.

Whew!!
- Mike

"Michelle Burnore" <michelle (AT) compumail-directmail (DOT) com> wrote

Quote:
Sundial Services wrote:


+++ INDEX-FILES:
Index-files have a table-like structure. In the case of the primary
index,
the basic "record" consists of a primary-key value and a record-number;
in
a secondary-index the record consists of the secondary-key value, the
primary-key value, and a hint. But on top of that, and in the same
files,
is a "B-tree" data structure. We don't need to explain how a B-tree is
structured, but we do have to say that you must allow room for it.

Actually, I wish you would explain how a B-tree is structured. When my
tables get corrupted, it is invariably a "B-tree mismatch" error. Could
be my own fault for not packing them often enough.

Michelle




Reply With Quote
  #6  
Old   
Sundial Services
 
Posts: n/a

Default Re: TechTips: The "Capacity" of a Paradox Table (and "vs. Access") - 01-09-2006 , 07:48 PM



Michelle Burnore wrote:
Quote:
Actually, I wish you would explain how a B-tree is structured. When my
tables get corrupted, it is invariably a "B-tree mismatch" error. Could
be my own fault for not packing them often enough.
Very well... here goes.

In a Paradox table, the ".PX" (primary index) file contains the primary-key
values and the corresponding record-numbers. The ".Xnn" and ".Ynn" file
pairs (secondary indexes) have the same basic relationship to each other as
a ".DB" file has to a ".PX," and the data-record consists of the {secondary
key, primary key}. So let's focus our attention on the ".PX" and what is
actually in it.

The purpose of a primary-index, as we well know, is to enable us to find any
primary-key value quickly, without searching the entire ".DB" file. We can
search the index-file instead. And so, the index-file (.PX) contains not
only the {primary key, record#} information that we described, but also a
"B-tree" structure.

The reason why we call it a "tree" is that, conceptually, the structure
consists of "nodes" (limbs...) and "leaves." The leaves of this tree are
the entries that actually point to records in the DB-file. The nodes are
records that point elsewhere in the index-file itself.

So let me describe a simple search, using the analogy of a set of Chinese
boxes... where one box contains other (smaller) boxes and those (smaller)
boxes may, in turn, contain more (still smaller) boxes. We're looking for
a primary-key that is a number... the number 1234.

We go to the "root node" of the index, and here we find four very large
boxes, labeled "0 and up", "2000 and up", "4000 and up", and "6000 and up."
So which box do we open? Yes indeed, the first one. [Paradox sees, in the
root-node of the index, the string of values {0, recno1, 2000, recno2,
4000, recno3, 6000, recno4} and selects the first one. The next "Chinese
box" will be in the index file at record "recno1."]

Opening the "0 and up" box, which represents the second-tier depth of nodes
in our tree structure, we find "247 and up", "563 and up", "1011 and up,"
and nothing more. Clearly we see that there are no values in this table at
all which have a primary-key less than 247. And we can once again see
which box we need to open: the third one.

Now at the third-tier depth of our tree structure, we open the "1011 and up"
box and we find just two slips of paper in it. One says "the record with
key '1011' is at record #4042 in the DB file." The second says "the record
with key '1234' is at record #4678 in the DB-file." We've found our
record! And even though there are several thousand records in the table,
we only had to examine the contents of three Chinese boxes to find it.

Ahh... but now we check just one thing more. We go to the DB-file and pull
out record #4678, AND WE CHECK IT TO MAKE SURE that it really DOES contain
the key that we expected: "1234."

If for whatever reason it did NOT, we'd have a "B-tree mismatch" error (one
kind of such error, actually) and the index would therefore be "out of
date" ... which is to say, "it's wrong!"

The PX-file therefore contains, in addition to the "leaves" (the third box
we opened was a "leaf" because the slips of paper inside "pointed to"
DB-file records), one or more tiers of "nodes" beginning at the root.

The Paradox system uses all kinds of clever algorithms to "maintain" the
B-tree structures in the various files so that they are always up-to-date
and so that they can deliver predictable performance.

----
ChimneySweep(R): F-A-A-ST table repair at a click of the mouse!
http://www.sundialservices.com/products/chimneysweep


Reply With Quote
  #7  
Old   
Jim Hargan
 
Posts: n/a

Default Re: TechTips: The "Capacity" of a Paradox Table (and "vs. Access") - 01-10-2006 , 08:59 PM



Neat! Many thanks.

Jim Hargan

Reply With Quote
  #8  
Old   
Michelle Burnore
 
Posts: n/a

Default Re: TechTips: The "Capacity" of a Paradox Table (and "vs. Access") - 01-19-2006 , 10:25 AM



Thank you! You'd make a great teacher. You explain things well.

Michelle

Reply With Quote
  #9  
Old   
Sundial Services
 
Posts: n/a

Default Re: TechTips: The "Capacity" of a Paradox Table (and "vs. Access") - 01-19-2006 , 10:30 AM



Michelle Burnore wrote:
Quote:
Thank you! You'd make a great teacher. You explain things well.
*!* :-}

----
ChimneySweep(R): F-A-A-ST table repair at a click of the mouse!
http://www.sundialservices.com/products/chimneysweep


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.