dbTalk Databases Forums  

D3/NT maximum modulo



Discuss D3/NT maximum modulo in the comp.databases.pick forum.



#1
Brian Bond
D3/NT maximum modulo - 07-07-2005, 05:49 PM






Is there a limit (within reason) to the maximum modulo of a file for a D3/NT
system? I need to resize a file and from my calculations, it should be
300,001 frames in size at 4K per frame. I seem to remember that older Pick
systems had a limit to the modulo (with a vague recollection that perhaps
300K was around that limit). No mention of this in Books Online. This file
will be in the FSI.

thanks



#2
Mark Brown
Re: D3/NT maximum modulo - 07-07-2005, 06:26 PM






The only real size limitation is the 2GB limit for any single item in
Windows, at least up to NT. Since a D3 "file" is a single "item" as far as
NT is concerned, that's the limit.

300K * 4K is only 1.2GB so you should be safe.
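
The arithmetic can be checked with a short sketch. It assumes that "4K" means 4,096 bytes and that the 2GB ceiling mentioned above is counted as 2**31 bytes; neither is stated exactly in the thread, and the function name is only illustrative.

```python
# Size of a file with the proposed modulo, compared with a 2 GB per-file ceiling.
# Assumptions: "4K" means 4096 bytes and "2GB" means 2**31 bytes.
FRAME_BYTES = 4 * 1024
LIMIT_BYTES = 2 ** 31


def file_bytes(modulo: int, frame_bytes: int = FRAME_BYTES) -> int:
    """Bytes occupied by the primary frames alone (no overflow, no header)."""
    return modulo * frame_bytes


size = file_bytes(300_001)
print(f"{size:,} bytes = {size / 1e9:.2f} GB")
print("fits under the ceiling:", size < LIMIT_BYTES)
print("largest modulo under the ceiling at 4K:", (LIMIT_BYTES - 1) // FRAME_BYTES)
```

This only counts primary frames; any overflow would add to the total.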

Mark Brown

#3
Tony Gravagno
Re: D3/NT maximum modulo - 07-08-2005, 04:57 AM



Mark, are you referring to the memory limit, or maybe FAT16 and not
NTFS? According to this page <http://tinyurl.com/3827d> NTFS has a
file size limit in the terabyte range.

Brian, disk/modulo shouldn't be an issue at all. Just make sure you
have enough memory and paging space to accommodate this file in
addition to all other system requirements.

If all you're concerned about is the modulo, just make the frame size
8k, 16k, or larger and divide the mod by 2 or 4 as appropriate.
That's a luxury of FSI not available in a VME, then again we used to
have Separation too... It would be very interesting to see some
benchmarks for different kinds of data in files that are more "deep"
like this rather than "wide".
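
As a rough sketch of the trade-off described above, the following lists modulo and frame-size pairs that cover about the same primary space. Rounding the modulo up to the next prime is an assumption here (a common Pick habit for spreading hashed keys), not something stated in the thread, and the helper names are hypothetical.

```python
# Equivalent modulo/frame-size combinations for the same primary space.
# Assumption: the modulo is rounded up to the next prime, a common Pick
# convention; the thread does not say whether D3 needs this.

def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for modulo-sized numbers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True


def next_prime(n: int) -> int:
    """Smallest prime greater than or equal to n."""
    while not is_prime(n):
        n += 1
    return n


BASE_MODULO = 300_001   # frames at 4K, from the original question
BASE_FRAME_KB = 4

for frame_kb in (4, 8, 16):
    factor = frame_kb // BASE_FRAME_KB
    raw = -(-BASE_MODULO // factor)   # ceiling division
    print(f"{frame_kb:>2}K frames: modulo {raw:,} (next prime {next_prime(raw):,})")
```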

You might have problems indexing files that are this large. Test,
Test, and Test before running files like this in a production mode.

BTW,
- which release of D3NT are you running?
- why don't you just e-mail RD Support to get the answer to this?

T

#4
Brian Bond
Re: D3/NT maximum modulo - 07-08-2005, 01:17 PM



All,

Thanks for the information.

I didn't email RD because I'd just sent them a couple of other questions and
didn't feel like being a nuisance <g>.

We are running 7.4.4.

I am really not concerned about the modulo size per se, I just needed to
ensure D3 wouldn't blow up if it was too big. Agreed that I could certainly
halve it by doubling the frame size, and while I would expect a performance
hit, I doubt it would be enough to matter. Basically, I guess I have
several modulo/frame size combinations to work with, each with its own
particular combination of speed vs. storage efficiency. And I bet all of
them are well within the performance parameters I need (defined as "it
works"). I was using the 4K frame simply because it is the default.

The file in question has no indexes, just a lot of records that we don't
want to purge. And I do not want the file in overflow, as it is now.

"Frame size" seems to have replaced "separation" as a file tuning parameter,
presumably providing a bit more efficiency due to the lack of links for the
frames that formerly comprised a primary group. I did some benchmarks
several years back on a UV system (perhaps not completely relevant here,
since the internal file structures are different, but since they do both
have primary and overflow sections I am guessing there is some pertinence),
comparing separations and performance when the file went into overflow. If
I can track them down, I'll post them, but in a nutshell, IF you end up with
overflow, larger separations yielded faster select times at the cost of more
wasted space; conversely, smaller separations yielded better space
efficiency but were significantly slower for selects when the file was in
overflow. There were some other interesting observations as well, but I
forget what they were, so perhaps they weren't THAT interesting....

- BB

#5
Mark Brown
Re: D3/NT maximum modulo - 07-08-2005, 01:26 PM



Creating a file with separation simply acknowledges that there will be
overflow, so get ready by pre-assigning the first X layers of overflow
frames. When the base frame overflows, the system won't have to go to the
pool to carve another frame.

It's great for dynamic files (end of period work files) because it saves a
lot of time fiddling with the overflow table.

Even with a separation of 101, select performance shouldn't be an issue if
the data is all in base frames because the search stops at the last segment
mark in the group.

Mark Brown

#6
Brian Bond
Re: D3/NT maximum modulo - 07-08-2005, 02:38 PM



I might not consider it to be an acknowledgement that there will be overflow.
Rather, separation (or frame size) is useful to tune file performance.

As a GENERALIZED example, if your disk is read in 4K chunks, then creating
each group to be physically 4K in size will yield better performance. But
I'd also think that if frames are too big, after the physical read is
finished, one has to account for the extra time it takes to traverse
through, say, a 16K frame vs. a 4K frame. So I do not think that there is a
"one size fits all" solution.

Performance really goes in the dumpster when you start linking on additional
frames to the group; restores excepted, those frames will probably end up
physically scattered all over the disk, making it unlikely they'll be in
cache, and thus requiring another physical read to find each frame in the
group. Thus, the smaller the frame/separation and the more frames linked
per group, the more physical reads you'll end up doing to find your record
(and the more useless data you'll read in the process from the disk as
well).

So, while overflow is not good, it is much more not good with smaller frames
and separations.
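
To put rough numbers on that, here is a minimal model of how many physical reads it takes to find one record. It assumes that linked frames are scattered, so each frame in the group costs one physical read, and that the record is equally likely to sit anywhere in the group; the 16K of data per group is a hypothetical figure.

```python
# Rough model of physical reads needed to find one record in a group.
# Assumptions: every frame in the group costs one physical read (linked
# frames are scattered), and the record is equally likely to be anywhere.
import math


def frames_in_group(group_bytes: int, frame_bytes: int) -> int:
    """Frames needed to hold the group's data (primary frame plus links)."""
    return max(1, math.ceil(group_bytes / frame_bytes))


def average_reads(group_bytes: int, frame_bytes: int) -> float:
    """Expected number of frames read before the record is found."""
    n = frames_in_group(group_bytes, frame_bytes)
    return (n + 1) / 2


GROUP_BYTES = 16 * 1024   # hypothetical amount of data hashed to one group

for frame_kb in (1, 2, 4, 8, 16):
    frame_bytes = frame_kb * 1024
    n = frames_in_group(GROUP_BYTES, frame_bytes)
    print(f"{frame_kb:>2}K frames: {n:>2} frames in group, "
          f"{average_reads(GROUP_BYTES, frame_bytes):.1f} reads on average")
```

The model ignores caching and read-ahead, so it overstates the cost when the linked frames happen to be adjacent on disk.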

I hope this makes some sense. If it doesn't, then I will take the
responsibility due to my poor explanation.

#7
Ed Sheehan
Re: D3/NT maximum modulo - 07-08-2005, 03:33 PM



Comments inline...

"Mark Brown" <mbrown (AT) drexelmgt (DOT) com> wrote

Quote:
Creating a file with separation simply acknowledges that there will be
overflow, so get ready by pre-assigning the first X layers of overflow
frames. When the base frame overflows, the system won't have to go to the
pool to carve another frame.

It's great for dynamic files (end of period work files) because it saves a
lot of time fiddling with the overflow table.

Even with a separation of 101, select performance shouldn't be an issue if
the data is all in base frames because the search stops at the last
segment mark in the group.

Don't want to launch into a diatribe, but with a separation of 101, and, say
1 frame of usage, you'll have to jump 100 frames to get to the next chunk of
data, since separation is always sequential to the next group. That's head
movement, and it can be costly. Extreme example, granted. It's also good to
keep in mind that separation-based "overflow" (2nd and greater frame in the
group) is more efficient than "on the fly" overflow, because it's physically
adjacent to "primary" (1st frame) data rather than splattered all over the
platter.
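
The layout being described can be sketched as follows. It assumes the groups are laid out contiguously, so that group g starts at base + g * separation; the base frame number and function names are hypothetical.

```python
# Where each group's first frame lands when frames are pre-assigned by separation.
# Assumption: groups are contiguous, so group g starts at base + g * separation.
# The frame numbers used here are hypothetical.

def group_start(base: int, group: int, separation: int) -> int:
    """Frame number of the first frame of a group."""
    return base + group * separation


def frames_skipped(separation: int, frames_used: int) -> int:
    """Pre-assigned frames passed over to reach the next group's data."""
    return max(0, separation - frames_used)


BASE = 1000   # hypothetical first frame of the file

for sep in (1, 4, 101):
    starts = [group_start(BASE, g, sep) for g in range(3)]
    print(f"sep {sep:>3}: groups start at {starts}, "
          f"skip {frames_skipped(sep, 1)} frames when 1 frame is used")
```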

Ed

#8
Art Martz
Re: D3/NT maximum modulo - 07-10-2005, 01:14 PM



Tony Gravagno wrote:

Quote:
That's a luxury of FSI not available in a VME, then again we used to
have Separation too... It would be very interesting to see some
benchmarks for different kinds of data in files that are more "deep"
like this rather than "wide".

It's been years, but I ran benchmarks on exactly this one time, wanting
to know myself. It might have been "R"something on an IBM Series 1. It had
the old 512-byte frames, as I recall. The bench I ran was a version of one I
posted here before, a lot of little records. A separation of 2 was faster
than 1, and sep 3 and 4 returned the best times, as I recall. After
that, times went downhill as the separation was increased. So that
pretty well matches the 1k and 2k frames common today.

Art


#9
Brian Bond
Re: D3/NT maximum modulo - 07-11-2005, 02:24 PM



I found the results of a performance test I conducted several years ago.
The tests were performed on a UV/NT system; the details of what kind of
processor, UV version, etc. are lost to time. But I believe that these
numbers are still good for showing relative performance and the adverse
effects of overflow. I would also expect that they are reasonably relevant
for non-UV systems as well.

Records used an incrementing "zero-filled" sequential record key (this
provides the most even hashing). Records were equally sized; the intent was
to squeeze as many records as possible into a group without creating
overflow (unless indicated). I do not recall if the records were loaded
sequentially (which would have created the most efficient physical layout of
overflow groups), or if I had scrambled the loading sequence (which would
have caused much less physical efficiency in the overflow). But in all
likelihood, records were created sequentially, so overflow effects on a
production system would be even worse than shown below.

"Load time" is the number of seconds it took to create all the records.

The worst aspect of this data is that I am not sure if the benchmark times
are HH:MM or MM:SS! So make of it what you will. Regardless, the numbers do
show just how bad a performance hit overflow causes, albeit less on
adding records than with selects.

I think the select and sselect tests were based on the key, not any
attributes, but I could be wrong about this. One test I didn't run and
should have, is the performance of partially primary groups. Anyway, I hope
this data is of some use.


Tests A, B, and C tested various separations of a file without going into
overflow.

Tests D and E tested overflow of one frame and two frames respectively, and
would best be compared to test A.


(note: set to fixed width font for best display)

test  modulo  sep  record   physical  data      load  records    sselect  select
                   count    size      size      time  per group  time     time

A:    20011   4    1700935  40984576  40982528  1345  85         2:35     0:24
B:    40499   2    1700935  41472000  40822440  650   41         2:37     0:23
C:    80996   1    1700935  41480704  41470408  393   21         2:30     N/A
D:    10005   4    1700935  41156608  40982520  1162  170        4:25     2:11
E:    4988    8    1700935  61296640  40822440  2320  341        5:36     2:51

N/A = not available (didn't run)

The system was rebooted for each test, after the records were loaded into
the file and before benchmarking.
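
The relative cost of overflow can be read off the table with a short sketch. Because the times are two base-60 fields whether they are HH:MM or MM:SS, the ratios against test A come out the same in either unit. The helper names are illustrative only.

```python
# Slowdown of the overflowed tests (D, E) relative to test A.
# The times are two base-60 fields either way (HH:MM or MM:SS), so the
# ratios are the same whichever unit they were recorded in.

def to_units(t: str) -> int:
    """Convert 'a:bb' to a count of the smaller unit (a * 60 + bb)."""
    major, minor = t.split(":")
    return int(major) * 60 + int(minor)


# (sselect time, select time) copied from the benchmark table
times = {
    "A": ("2:35", "0:24"),
    "D": ("4:25", "2:11"),
    "E": ("5:36", "2:51"),
}


def slowdown(test: str, baseline: str = "A") -> tuple[float, float]:
    """Ratios of sselect and select times against the baseline test."""
    ss, s = (to_units(t) for t in times[test])
    base_ss, base_s = (to_units(t) for t in times[baseline])
    return ss / base_ss, s / base_s


for test in ("D", "E"):
    ss_ratio, s_ratio = slowdown(test)
    print(f"{test} vs A: sselect {ss_ratio:.2f}x, select {s_ratio:.2f}x")
```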



#10
Scott Ballinger
Re: D3/NT maximum modulo - 07-11-2005, 03:25 PM



I'm not sure how valid these old benchmarks are. Newer systems are
likely to have much more RAM (20x would not be unusual) and clock speeds
that are also much faster (166MHz vs 3.6GHz = 20x) as well as "smarter"
CPUs, controllers, and disks in general. Disk speeds have not increased
commensurately (5400rpm vs 15000rpm = 3x, 14ms vs 5ms = 3x). Thus I think
that the time it takes to get a frame into memory has sped up a little,
but the time it takes to process (scan?) that frame by the CPU has sped
up quite a lot. So you are way better off to minimize disk reads and
maximize RAM reads (because of disk geometry, you can read 4k just about
as fast as 512 bytes, n'est-ce pas?), especially since the time it takes to
scan that entire 4k frame once it is in memory is a lot less than the
time it took to scan a 512 byte frame on an older system. This is the
simple reason why newer versions of D3 (for example) keep increasing the
frame size. I also recall that Sequoia had 4k frames back in the 90's,
probably for just the same reasons. Most OSs also read ahead
automatically (as do controllers, as do the disk drives themselves; 8MB
of memory on a drive is not uncommon these days), so whether the next
"contiguous" frame is 512 bytes away or 16K bytes away, it is also
probably already in memory.
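
A back-of-the-envelope model of that point, as a sketch only: the cost of one physical read is taken as access time plus transfer time. The 5 ms access time is the figure quoted above; the 50 MB/s sustained transfer rate is a hypothetical value chosen for illustration.

```python
# Rough cost of one physical read: access time plus transfer time.
# Assumptions: 5 ms access time (the figure quoted above) and a
# hypothetical sustained transfer rate of 50 MB/s.
ACCESS_MS = 5.0
TRANSFER_BYTES_PER_MS = 50_000_000 / 1000


def read_ms(frame_bytes: int) -> float:
    """Estimated milliseconds to fetch one frame from disk."""
    return ACCESS_MS + frame_bytes / TRANSFER_BYTES_PER_MS


for size in (512, 4096, 16384):
    print(f"{size:>6} bytes: {read_ms(size):.3f} ms")

# Fetching 4K as one frame versus eight scattered 512-byte frames.
print(f"one 4K read: {read_ms(4096):.2f} ms, "
      f"eight 512-byte reads: {8 * read_ms(512):.2f} ms")
```

Under these assumptions the access time dominates, which is why the frame size matters far less than the number of separate reads.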

As previously mentioned, overflow is bad because it is never proximate to
the current primary frame on the disk, thus nearly always necessitating
one or more additional disk reads. I would guess that the extra reads
cause the process to lose its timeslice(s), thus slowing everything
down even further.

The real question (which I think Mark alluded to) is whether or not,
with larger separation, all the frames associated with a non-overflowed
group are read in at once. If the frames are read in as-needed, then a
larger separation is really like "pre-allocated" overflow (which I think
was already mentioned) and the benefit is primarily one of less head
travel and the probability that the next needed frame is already in memory
somewhere due to the various read-ahead strategies employed by the OS,
controller, and disk. If the entire group (all frames defined by the
separation) is read in at once, then it's gravy. In any case, it seems
to me that when in doubt, a too-large separation is better than one that
is too-small.

Scott Ballinger
Pareto Corporation
Edmonds WA USA
206 713 6006
