#11
I'm not sure how valid these old benchmarks are. Newer systems are likely to have much more RAM (20x would not be unusual) and much faster clock speeds (166 MHz vs 3.6 GHz is roughly 20x), as well as "smarter" CPUs, controllers, and disks in general. Disk speeds have not increased commensurately (5400 rpm to 15000 rpm is about 3x; 14 ms vs 5 ms is about 3x). Thus I think that the time it takes to get a frame into memory has sped up a little, but the time it takes the CPU to process (scan?) that frame has sped up quite a lot.

So you are way better off minimizing disk reads and maximizing RAM reads (because of disk geometry, you can read 4K just about as fast as 512 bytes, n'est-ce pas?), especially since the time it takes to scan an entire 4K frame once it is in memory is a lot less than the time it took to scan a 512-byte frame on an older system. This is the simple reason why newer versions of D3 (for example) keep increasing the frame size. I also recall that Sequoia had 4K frames back in the 90s, probably for the same reasons.

Most OSs also read ahead automatically (as do controllers, as do the disk drives themselves; 8 MB of memory on a drive is not uncommon these days), so whether the next "contiguous" frame is 512 bytes away or 16K bytes away, it is probably already in memory.

As previously mentioned, overflow is bad because it is never proximate to the current primary frame on the disk, thus nearly always necessitating one or more additional disk reads. I would guess that the extra reads cause the process to lose its timeslice(s), slowing everything down even further.

The real question (which I think Mark alluded to) is whether, with a larger separation, all the frames associated with a non-overflowed group are read in at once.
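As a rough check of the back-of-the-envelope ratios above, here is a minimal Python sketch. The clock, rpm, and millisecond figures are the ones quoted in this post; treating the 14 ms and 5 ms figures as per-read access times, and the 50 MB/s transfer rate, are assumptions made only for illustration, and all names are hypothetical.

```python
# Figures quoted in the post; variable names are illustrative only.
old_clock_mhz, new_clock_mhz = 166, 3600
old_rpm, new_rpm = 5400, 15000
old_access_ms, new_access_ms = 14, 5  # ASSUMPTION: read as per-read access times

cpu_speedup = new_clock_mhz / old_clock_mhz       # CPU clock ratio
rpm_speedup = new_rpm / old_rpm                   # spindle speed ratio
access_speedup = old_access_ms / new_access_ms    # access time ratio

# ASSUMPTION (not from the post): sustained transfer rate of 50 MB/s.
ASSUMED_TRANSFER_MB_PER_S = 50


def read_time_ms(nbytes, access_ms=new_access_ms,
                 mb_per_s=ASSUMED_TRANSFER_MB_PER_S):
    """Access time plus transfer time for one contiguous read, in ms."""
    transfer_ms = nbytes / (mb_per_s * 1_000_000) * 1000
    return access_ms + transfer_ms


t_512 = read_time_ms(512)
t_4k = read_time_ms(4096)

print(f"CPU clock speedup:    {cpu_speedup:.1f}x")
print(f"Spindle speedup:      {rpm_speedup:.1f}x")
print(f"Access time speedup:  {access_speedup:.1f}x")
print(f"4K read vs 512B read: {t_4k / t_512:.3f}x the time")
```

Under these assumptions the access time dominates each read, so a 4K read costs only slightly more than a 512-byte read, while the CPU side has improved far more than the disk side. A different assumed transfer rate changes the exact ratio but not that general picture.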
If the frames are read in as needed, then a larger separation is really like "pre-allocated" overflow (which I think was already mentioned), and the benefit is primarily less head travel plus the probability that the next needed frame is already in memory somewhere, thanks to the various read-ahead strategies employed by the OS, controller, and disk. If the entire group (all frames defined by the separation) is read in at once, then it's gravy. In any case, it seems to me that when in doubt, a too-large separation is better than one that is too small.

Scott Ballinger
Pareto Corporation
Edmonds WA USA
206 713 6006

Brian Bond wrote:

I found the results of a performance test I conducted several years ago. The tests were performed on a UV/NT system; the details of what kind of processor, UV version, etc. are lost to time. But I believe that these numbers are still good for showing relative performance and the adverse effects of overflow. I would also expect that they are reasonably relevant for non-UV systems as well.

Records used an incrementing "zero-filled" sequential record key (this provides the most even hashing). Records were equally sized; the intent was to squeeze as many records as possible into a group without creating overflow (unless indicated). I do not recall if the records were loaded sequentially (which would have created the most efficient physical layout of overflow groups), or if I had scrambled the loading sequence (which would have caused much less physical efficiency in the overflow). But in all likelihood, records were created sequentially, so overflow effects on a production system would be even worse than shown below.

"Load time" is the number of seconds it took to create all the records. The worst aspect of this data is that I am not sure if the benchmark times are HH:MM or MM:SS! So make of it what you will. Regardless, the numbers do show just how bad a performance hit overflow causes, albeit less when adding records than with selects.
I think the select and sselect tests were based on the key, not any attributes, but I could be wrong about this. One test I didn't run, and should have, is the performance of partially primary groups. Anyway, I hope this data is of some use.

Tests A, B, and C tested various separations of a file without going into overflow. Tests D and E tested overflow of one frame and two frames respectively, and would best be compared to test A.

(note: set to fixed width font for best display)

| test | modulo | sep | record count | physical size | data size | load time | records per group | sselect time | select time |
|------|--------|-----|--------------|---------------|-----------|-----------|-------------------|--------------|-------------|
| A | 20011 | 4 | 1700935 | 40984576 | 40982528 | 1345 | 85 | 2:35 | 0:24 |
| B | 40499 | 2 | 1700935 | 41472000 | 40822440 | 650 | 41 | 2:37 | 0:23 |
| C | 80996 | 1 | 1700935 | 41480704 | 41470408 | 393 | 21 | 2:30 | N/A |
| D | 10005 | 4 | 1700935 | 41156608 | 40982520 | 1162 | 170 | 4:25 | 2:11 |
| E | 4988 | 8 | 1700935 | 61296640 | 40822440 | 2320 | 341 | 5:36 | 2:51 |

N/A = not available (didn't run)

The system was rebooted for each test, after the records were loaded into the file and before benchmarking.
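To put the overflow penalty in relative terms, the figures in the table can be reduced to ratios against test A. The following Python sketch is only an illustration: the data is copied from the table above, the names are hypothetical, and because the time unit (HH:MM or MM:SS) is uncertain, only the ratios are meaningful.

```python
# Rows copied from the table: modulo, record count, sselect time, select time.
# Test C is omitted because its select time was not run.
tests = {
    "A": (20011, 1700935, "2:35", "0:24"),
    "B": (40499, 1700935, "2:37", "0:23"),
    "D": (10005, 1700935, "4:25", "2:11"),
    "E": (4988, 1700935, "5:36", "2:51"),
}


def to_units(text):
    """Convert 'X:Y' to X*60 + Y. Whether the unit is seconds or minutes
    is unknown, but ratios between tests are the same either way."""
    major, minor = text.split(":")
    return int(major) * 60 + int(minor)


# Records per group, truncated the same way the table appears to be.
records_per_group = {
    name: count // modulo for name, (modulo, count, _, _) in tests.items()
}


def slowdown(name, kind, baseline="A"):
    """Ratio of a test's sselect or select time to the baseline test's."""
    index = 2 if kind == "sselect" else 3
    return to_units(tests[name][index]) / to_units(tests[baseline][index])


for name in ("D", "E"):
    print(
        f"{name}: {records_per_group[name]} records/group, "
        f"sselect {slowdown(name, 'sselect'):.2f}x, "
        f"select {slowdown(name, 'select'):.2f}x vs A"
    )
```

Read this way, the overflowed files (D and E) take noticeably longer than A on sselect and several times longer on select, which matches the point that selects suffer more from overflow than record creation does.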