dbTalk Databases Forums  

File Overflow Stats Report

comp.databases.pick


#1  sh  10-25-2011, 10:48 AM
File Overflow Stats Report

I am trying to check the overflow status of the files on my system. I
can do so by going to File Manager (on D3/NT), opening the domain, going to
the account, and right-clicking on the "Tables" icon. This gives me an
option, "check overflow files". It's a good process report. The only
problem is that I can't print out the report; it's a screen-only report.

Does anyone know what proc or program this option runs, so I can modify
it and run it from TCL to my heart's content?

Thanks

Sholom

#2  Scott Ballinger  10-29-2011, 05:55 PM
Re: File Overflow Stats Report

I think what you want can be got from the fof (file-of-files) in the
dm account.
Look at list-file-stats for more info.

(from ref...)
The attributes in the "file-of-files" file are:

ac  attribute-name  description
0   f-fms        Total number of frames in the primary file space as of last file save.
0   f-size       Total number of bytes in the primary file space as of last file save.
0   file#        Item-id assigned in order in which file was created.
0   t-frames     Total number of frames in the file space as of last file save.
0   stat.acc     Sum of all reads and writes.
0   stat.ovf     Sum of all overflow group accesses.
0   sug.mod      Suggested modulo for this file, based on file-of-files information.
1   md           Name of the md owning the file.
2   file-name    File name stored in the master dictionary (md) of the account.
3   data-name    One dictionary may have many subsidiary data files, each with its own unique name. The default data section name is identical to the file name as stored in the md.
4   base         Base of file.
4   mod          Modulo of file: the number of contiguous frames comprising the primary file space.
4   modulo       Modulo of file: the number of contiguous frames comprising the primary file space.
5   items        Total number of items (including pointer items) in file from last file save.
6   ptr-items    Number of pointer items saved in last file save.
7   ovf-itms     Number of items which were partially or wholly stored in secondary file space during the last file save.
8   bytes        Total number of bytes in file as of last file save.
9   ptr-bytes    Total number of bytes in all pointer items as of last file save.
10  frames       Total number of frames in file as of last file save (does not include index frames).
11  ptr-fms      Total number of frames of pointer items as of last file save.
12  svdate       Date when file was last saved.
13  reel#        Tape, diskette, etc. number in a multi-'reel' file save where this file begins.
14  seq#         Decimal sequence number indicating the order in which the file was saved on the file save media.
15  opendate     Date when file was last opened (updated at save-time).
16  stat#        Number of last file save on which this file was saved.
17  mask         Masks desired operations from being logged in attributes 18-20 (release 7.0 and above only). Currently supported masks are: "c" clear file; and "d" delete-file.
18  file-code    Valid file statuses are: "c" clear file; "d" delete file; "n" new file; "r" rename file; "t" restored from tape. This attribute controls attributes 19 (timedate) and 20 (user).
19  timedate     Time-date when file activity occurred. This attribute is dependent on attribute 18 (file-code).
20  user         User id concatenated with account id causing the associated file action. This attribute is dependent on attribute 18 (file-code).
21  dx/dy-date   Date when dx/dy file was skipped on save.
25  save-list    Contains list of specific items to be saved for this file. Used to selectively save items in a file. Each item name is a value. To save all items in a file, use an asterisk (*).
29  stat.date    Contains the date when the file access statistics were last cleared.
30  stat.rdu     Number of READU operations on the file.
31  stat.rdub    Number of blocked READU operations on the file.
32  stat.rdul    Number of READU LOCKED operations on the file.
33  stat.rdulb   Number of blocked READU LOCKED operations on the file.
34  stat.rdptr   Number of pointer items read.
35  stat.rd      Total number of reads on the file.
36  stat.wtu     Number of WRITEU operations to the file.
37  stat.wtb     Number of blocked writes to the file.
38  stat.wtptr   Number of pointer items written.
39  stat.wt      Total number of writes to the file.
40  stat.sel     Total number of selects on the file.
41  stat.dels    Total number of deletes to the file.
42  stat.clr     Total number of clear-files done on the file.
43  stat.open    Total number of opens on the file.
44  stat.ovf     Read overflow group accesses.
45  stat.wtovf   Write overflow group accesses.
46  read-date    Last date the file was read.
47  write-date   Last date the file was written.
60  Seg.Bas      Segment bases for resized files.
61  Seg.Mod      Segment modulos for resized files.
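
A minimal sketch of pulling the access counters out of one FOF item, assuming the item has been exported (for example with REFORMAT) into a text record whose attributes are separated by the Pick attribute mark, char(254), with the item-id held separately. The attribute numbers come from the list above. The helper names, and the assumption that the attribute-0 totals stat.acc and stat.ovf are simply attributes 35 + 39 and 44 + 45, are illustrative guesses rather than documented behavior.

```python
AM = "\xfe"  # Pick attribute mark, char(254)

# Attribute numbers taken from the file-of-files listing; names follow that list.
FOF_ATTRS = {
    "items": 5,
    "ovf-itms": 7,
    "bytes": 8,
    "stat.rd": 35,
    "stat.wt": 39,
    "stat.ovf": 44,
    "stat.wtovf": 45,
}


def parse_fof_item(record: str) -> dict:
    """Return selected numeric FOF attributes from an attribute-mark-delimited record.

    Assumes the first field is attribute 1 (the item-id is not included).
    Missing or non-numeric attributes are treated as 0.
    """
    attrs = record.split(AM)

    def num(n: int) -> int:
        raw = attrs[n - 1].strip() if n <= len(attrs) else ""
        return int(raw) if raw.isdigit() else 0

    stats = {name: num(n) for name, n in FOF_ATTRS.items()}
    # Assumption: the attribute-0 totals are these simple sums.
    stats["acc_total"] = stats["stat.rd"] + stats["stat.wt"]
    stats["ovf_total"] = stats["stat.ovf"] + stats["stat.wtovf"]
    return stats
```

Keeping the parsed values in a file of your own, rather than adding dictionary items to the system file, is the same trade-off Sholom describes further down.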

/Scott Ballinger
Pareto Corporation
Edmonds WA USA

#3  sh  10-31-2011, 07:20 AM
Re: File Overflow Stats Report

Thanks Scott. I knew about the fof, but I was looking for the specific
Access statement that's being executed by that procedure. I never found it.

I finally cobbled together on my own what I was looking for. As it turns
out, at least as concerns D3/NT, a better source of stats is the
fsidm,FileOfFiles, file. That file contains the most up-to-date data on
the file stats of the FSI. I found that the dm FileOfFiles file (also known as
the STAT-FILE) had less info.

I also wanted to create some of my own dict defs that would have
specific relationships that I wanted. Creating them in the above files
wouldn't be optimal, since they would get wiped out during an OS
upgrade. So I decided to create my own little file, and do a REFORMAT of
specific data from the FSI fof into my own file. Now I can do whatever I
want, to my heart's content, with the data, and my defs will never be
wiped out.

Thanks again, Scott.

#4  Tony Gravagno  10-31-2011, 12:04 PM
Re: File Overflow Stats Report

Sholom, overall my advice here is to step back and reconsider your
approach.

The FSI FOF is indeed different from the VME FOF, but then again it's
a different file system. The FSI FOF is updated in real time, unlike the
R83-era stat-file, which was only updated by the Save verb. From
Scott's note:

Quote:
39 stat.wt Total number of writes to the file.
40 stat.sel Total number of selects on the file.
41 stat.dels Total number of deletes to the file.
42 stat.clr Total number of clear-files done on the file.
43 stat.open Total number of opens on the file.

So if you copy the items out you're going to be out of sync
immediately, though for purposes described here I guess it doesn't
matter.

If you do create definitions and queries which are useful, I encourage
you to post them to the TL forum or elsewhere so that others can get
them. (I'd suggest PickWiki.com but I don't think we can upload files
to that site yet.)

D3 file resizing is different from what it used to be:
- The frame size can be changed on a per-file basis. This is almost
equivalent to using a Separation, but even with Sep there is a frame
boundary between contiguous frames. I'm not sure internally if there
are any real differences or benefits between Sep and larger frame size
but I haven't used Sep in a decade.
- Files can be resized on the fly, so you don't need to save/restore
to implement the change. Files are still physically fragmented
between the original space and the new space, only re-integrated on a
restore, but benefits are immediate. If you think about it, disk is
fragmented all the time anyway these days, as we don't have control
over the platter structure like we used to when the DBMS was an OS.

The advantage with immediate resizing is that you can resize files
through the File Manager utility, then re-run that utility to see if
it did what you want. The process is relatively fast and you get
immediate results. I'm as geeky as the next guy when it comes to
wanting to look through a stat report to calculate the best modulo,
but I have other things to do, and I don't have a problem now letting
the utility deal with such things.

I also don't like to over-automate. I don't trust spell checkers and
I generally don't agree with grammar checkers. I also don't like an
auto-resizer messing with some of my files. Some files are
intentionally large and should not be shrunk. There's a setting to
prevent this from happening automatically. Because of their key
structure, some auto-resizes don't result in a decent distribution of
items, so I prefer to manually size these with whatever modulo seems
to work.

So, as with everything, I choose the moderate approach which is to let
the system do as much as it can, but no more or less.

HTH
T

#5  sh  11-01-2011, 12:44 PM
Re: File Overflow Stats Report

Tony

Being out-of-sync is not a problem. I do this once every 6 months.

The dicts I create are not really any big deal. They are just ratios
that are not provided by the built in dicts.
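
Ratios like these are simple to compute once the counts are out of the FOF. A small sketch, assuming the per-file numbers have already been extracted into a dictionary (the key names acc_total and ovf_total are hypothetical labels for the read/write and overflow-access totals, and the sample file names and figures are invented), that ranks files by the share of accesses that hit overflow:

```python
from typing import Dict, List, Tuple


def pct(part: int, whole: int) -> float:
    """Percentage of part in whole, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def rank_by_overflow(stats: Dict[str, Dict[str, int]]) -> List[Tuple[str, float, float]]:
    """Return (file, % items in overflow, % accesses to overflow), worst first.

    Each value in stats is expected to hold the keys items, ovf-itms,
    acc_total and ovf_total (hypothetical names for the FOF figures).
    """
    rows = [
        (name, pct(s["ovf-itms"], s["items"]), pct(s["ovf_total"], s["acc_total"]))
        for name, s in stats.items()
    ]
    return sorted(rows, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    sample = {  # illustrative numbers only
        "CUSTOMERS": {"items": 1000, "ovf-itms": 37, "acc_total": 700, "ovf_total": 40},
        "ORDERS": {"items": 5000, "ovf-itms": 900, "acc_total": 2000, "ovf_total": 600},
    }
    for name, item_pct, acc_pct in rank_by_overflow(sample):
        print(f"{name:12} {item_pct:6.2f}% items  {acc_pct:6.2f}% accesses")
```

Sorting on the access ratio rather than the item ratio puts the files that actually cost reads and writes at the top, which matches the goal of only tuning the handful of files where speed matters.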

The more difficult part is the black art of deciding on a better modulo.
I know the math of calculating this (calculate the average size of an item,
figure how many will fit into some percentage of the frame size, and
divide the number of items in the file by that number). An ISTAT
suggests a fairly good number. The problem is that the actual item-ids
don't always hash as well as the number would suggest. Even HASH-TEST
can apparently give a false sense of security. (I just had a
situation where HASH-TEST corroborated that I would save many frames
with the new modulo, but when I did the RESIZE, it actually used
significantly more overflow than my original modulo. Go figure!)
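
The rule of thumb described above can be written out directly. This is a sketch of that arithmetic only: the 2000-byte frame size and 80% fill target are placeholder assumptions (D3 lets the frame size vary per file), and, as the post says, the real hash can still distribute items worse than this estimate suggests.

```python
import math


def estimate_modulo(item_count: int, total_bytes: int,
                    frame_bytes: int = 2000, fill: float = 0.8) -> int:
    """Estimate a modulo from average item size, per the rule of thumb in the post.

    frame_bytes and fill are assumptions; adjust them to the file in question.
    """
    if item_count <= 0 or total_bytes <= 0:
        return 1
    avg_item = total_bytes / item_count                              # average item size
    per_frame = max(1, math.floor(frame_bytes * fill / avg_item))   # items per frame
    return max(1, math.ceil(item_count / per_frame))                # groups needed


if __name__ == "__main__":
    # 10,000 items totalling 2.5 MB: avg 250 bytes, 6 per frame, so 1667 groups.
    print(estimate_modulo(10_000, 2_500_000))
```

Flooring the items-per-frame figure errs toward a slightly larger modulo, which leaves some headroom for uneven hashing.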

I'm also not so sure about the graph that File Manager produces on a new
hash. When I try various test modulos, my gut tells me that the graph
results don't make sense.

Anyway, I only analyze a handful of files - only the ones where speed
and efficiency are critical. The others are not worth the effort (with
the speed of computers today).

#6  Tony Gravagno  11-01-2011, 05:18 PM
Re: File Overflow Stats Report

There are some factors in file-sizing that can throw off manual
calculations. The effect is that you can have two files of the same
modulo, with the same number of items, and the same byte counts, but
they'll hash and overflow very differently.

One factor is "average" item size versus actual. Items over about
1700 bytes (or 80% of frame size) are converted to pointer items, and
saved out-of-frame. So you might plan for one item per frame and you
get 90.

Another factor relates to numeric keys. If your IDs go from 1 to 2
million, you're going to get even distribution of the first million
items because the first character changes fairly regularly. But after
1 million all the keys start with "1" and after 2 million all keys
start with "2". So the place where you can expect items in the 1M+
range is different from the first million.
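
The leading-character effect is easy to measure. This sketch only counts the first character of sequential numeric keys; it does not reproduce D3's hashing algorithm, which the thread does not describe.

```python
from collections import Counter


def leading_char_counts(start: int, stop: int) -> Counter:
    """Count the first character of str(key) for keys start..stop-1."""
    return Counter(str(k)[0] for k in range(start, stop))


if __name__ == "__main__":
    first = leading_char_counts(1, 1_000_000)           # keys 1 .. 999,999
    second = leading_char_counts(1_000_000, 2_000_000)  # keys 1,000,000 .. 1,999,999
    print(sorted(first.items()))   # each digit 1-9 leads 111,111 keys
    print(sorted(second.items()))  # every key starts with "1"
```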

Another one has to do with the key structure. Numeric keys
distribute with excellent regularity, but variable-length alphanumeric
keys can often have irregular distribution. It's again for this
reason that I have in the past suggested that we use keys that have no
meaning (not related to data like name, city, etc). If you key off of
social security number, for example, your file will hash differently
based on the demographic of your audience.
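
A HASH-TEST-style tally can be sketched in a few lines: hash each key to a group number modulo the file's modulo, count items per group, and flag groups that exceed a chosen capacity. The hash function here is a hypothetical stand-in chosen only for illustration; it is NOT D3's hashing algorithm, so the counts will not match what D3 does with the same keys. The useful part is the shape of the comparison: run your real key list through the same tally and look at the spread.

```python
from collections import Counter
from typing import Callable, Iterable, List


def stand_in_hash(key: str) -> int:
    """Hypothetical 32-bit polynomial hash; NOT the algorithm D3 uses."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h


def group_histogram(keys: Iterable[str], modulo: int,
                    hash_fn: Callable[[str], int] = stand_in_hash) -> List[int]:
    """Number of keys landing in each group 0..modulo-1."""
    counts = Counter(hash_fn(k) % modulo for k in keys)
    return [counts.get(g, 0) for g in range(modulo)]


def overflowing_groups(histogram: List[int], capacity: int) -> int:
    """How many groups hold more items than the assumed per-frame capacity."""
    return sum(1 for n in histogram if n > capacity)


if __name__ == "__main__":
    keys = [str(i) for i in range(1, 10_001)]  # replace with real item-ids
    hist = group_histogram(keys, modulo=1667)
    print("max per group:", max(hist), "groups over 6:", overflowing_groups(hist, 6))
```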

Anyway, with reference to my last post, sure it's sort of a black art
to try to get this right, but at some point it has to be "good
enough". Let the system suggest initial resizing and then make
educated guesses as to whether you need to override the suggestions.
I had a database not long ago that had hundreds of gigabytes of data,
and it was worth it to me to spend some time to get the hashing right.
Most systems I see these days have anywhere from 0.5-3GB of data in
the entire system. Mid-sized companies might go to 20GB for an
account, larger companies of course more. It's hardly worth anyone's
time to optimize the small databases, and arguably the mid-sized ones,
unless there are some really tiny files with a lot of data.

Have fun,
T
