sequential disk read speed

comp.databases.theory


David BL
 
Re: sequential disk read speed - 08-28-2008, 10:43 PM

On Aug 28, 9:34 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
Quote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:40d67c8b-d516-4721-a52d-20579c2ca9ac (AT) r35g2000prm (DOT) googlegroups.com...





On Aug 28, 10:47 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:b3a7632f-de18-46e8-8ce3-3c5aaf83d4b9 (AT) a3g2000prm (DOT) googlegroups.com....


, and since there are four disks, the average seek
time for the disk subsystem is reduced to a quarter of that or roughly
.625ms.

In order for the effective seek time to be reduced to a quarter, the
seeking must be independent. To achieve that, I think the striping
would need to be very coarse (e.g. 512 KB or 1 MB).

Drives that support disconnection or some other command queueing
mechanism are all that is needed for seeking to be independent.

If stripes are somewhat smaller than the DBMS block size, then every
drive (in the RAID 0) will be involved in the reading of each and
every DBMS block. No matter how you order those reads, each drive
needs to read a large amount of scattered data and the head will seek
around a lot. If that is the case then the only advantage arises
from your previously mentioned reduction in the overall range of
tracks over which the data resides on a given platter.

Alternatively if the stripe size is larger then each drive will read a
somewhat independent set of the DBMS blocks, and the effective seek
time can be reduced assuming the DBMS is able to issue overlapping
read requests for the DBMS blocks.

Your argument rests on the assumption that data is randomly distributed in
the stripes on the disk and doesn't take into account the fact that a
high-end caching controller eliminates latency by reading an entire track at
once. Isn't it true that there is a physical affinity between related data?
Isn't it more likely that an index will occupy contiguous stripes than some
random set--regardless of stripe size? Can you show that the number of
tracks accessed by, say, 128 coarse stripe reads is any less than the number
of tracks accessed by 1024 fine stripe reads?

Yes, sometimes the DBMS manages to cluster all the necessary data so
there is very little seeking required, and in that case it won’t
matter what stripe size is used.

However, that is not always possible. For example, consider a B+Tree
on 1 billion records, and suppose that in a short period of time the
DBMS needs to read 100 records for given index values that are
effectively random with respect to the ordering on that data type. To
keep it simple, ignore the reading of the internal nodes of the B+Tree.
Typically those 100 records will appear in roughly 100 different leaf
nodes of the B+Tree. Furthermore, due to the sheer size of the overall
data, those leaf nodes will tend to reside on different tracks. The
unfortunate reality is that it isn’t possible to read these records
without a lot of head seeking, even if the reads are ordered according
to track position (i.e. elevator seeking). Now if RAID 0 is used and
the stripes are smaller than the B+Tree leaf nodes, then every drive
will need to contribute to the reading of every leaf node. Each drive
can read the stripes in any order it likes, but it won’t avoid the fact
that each drive performs ~100 seeks. If instead each B+Tree leaf node
resides in a single stripe (and therefore on a single drive), then with
four drives in the RAID 0, each drive will only need to perform ~25
seeks.
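
The seek counts in this example can be checked with a small simulation.
This is only a sketch under stated assumptions: the 64 KB leaf size, the
16 KB and 1 MB stripe sizes, the 10 million leaves and the function name
seeks_per_drive are illustrative choices, not figures from the
discussion. It also assumes leaf nodes are laid out back to back in the
RAID 0 address space and that each leaf touched on a drive costs that
drive one seek.

```python
import random


def seeks_per_drive(num_reads, leaf_size, stripe_size, num_drives,
                    total_leaves, seed=0):
    """Count the seeks each drive performs to read random leaf nodes.

    Assumptions (illustrative, not from the thread): leaves are stored
    contiguously in the RAID 0 address space, stripes are assigned to
    drives round-robin, and every leaf that touches a drive costs that
    drive exactly one seek.
    """
    rng = random.Random(seed)
    seeks = [0] * num_drives
    # Pick distinct leaves at random, as in the 100-record example.
    for leaf in rng.sample(range(total_leaves), num_reads):
        start = leaf * leaf_size
        end = start + leaf_size  # exclusive end offset
        first_stripe = start // stripe_size
        last_stripe = (end - 1) // stripe_size
        # Round-robin striping: stripe s lives on drive s % num_drives.
        drives = {s % num_drives
                  for s in range(first_stripe, last_stripe + 1)}
        for d in drives:
            seeks[d] += 1
    return seeks


if __name__ == "__main__":
    KB = 1024
    # Stripe smaller than a leaf: every drive takes part in every read.
    print("fine  :", seeks_per_drive(100, 64 * KB, 16 * KB, 4, 10_000_000))
    # Leaf fits inside one stripe: each read lands on a single drive.
    print("coarse:", seeks_per_drive(100, 64 * KB, 1024 * KB, 4, 10_000_000))
```

With the fine stripe every drive is charged all 100 seeks. With the
coarse stripe the 100 seeks are shared among the four drives, so each
one performs roughly 25, subject to random variation.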


Quote:
I think using a coarse stripe is counterproductive. There would be a
bigger chance that a seek in the middle of the read would be required.
Consider: if 3.5 stripes fit on a track in one zone of the disk, then on
average every fourth read would require an additional seek to get the
remaining half stripe. If on the other hand, 28 stripes fit on a track,
then no additional seeks would be necessary. Even if it were 28.5
stripes instead of 28, one additional seek for every 29 reads is a whole
lot better than one for every 4.

Firstly, hard disks are quite good at stepping onto the next track in
the manner normally used for very large "contiguous" reads or writes.

The best track-to-track seek time I've seen is 0.2 ms for reads, 0.4 ms for
writes. That's phenomenal but can still add up.

It’s insignificant when reading or writing 1 MB at a time.
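
The track-boundary arithmetic in this exchange can be worked through
with a short script. It is a sketch under stated assumptions: stripes
are laid back to back starting at a track boundary, a stripe needs one
extra track switch when a boundary falls strictly inside it, and the
100 MB/s sustained transfer rate is an assumed figure, not one given in
the thread. The function names are hypothetical.

```python
from fractions import Fraction
from math import ceil, floor


def straddling_fraction(stripes_per_track, num_stripes):
    """Fraction of back-to-back stripes that cross a track boundary.

    Stripe i occupies [i, i + 1) in stripe units and track boundaries
    sit at multiples of stripes_per_track (assumed layout).
    """
    spt = Fraction(stripes_per_track)
    crossing = 0
    for i in range(num_stripes):
        # Number of boundaries k * spt with i < k * spt < i + 1.
        inside = ceil(Fraction(i + 1) / spt) - floor(Fraction(i) / spt) - 1
        if inside > 0:
            crossing += 1
    return Fraction(crossing, num_stripes)


def switch_overhead(read_bytes, bytes_per_second, switch_seconds):
    """Track-switch time as a fraction of the pure transfer time."""
    return switch_seconds / (read_bytes / bytes_per_second)


if __name__ == "__main__":
    print(straddling_fraction(Fraction(7, 2), 7000))    # 3.5 stripes/track
    print(straddling_fraction(Fraction(28), 2800))      # 28 stripes/track
    print(straddling_fraction(Fraction(57, 2), 5700))   # 28.5 stripes/track
    # One 0.2 ms track switch during a 1 MB read at an assumed 100 MB/s.
    print(switch_overhead(2**20, 100e6, 0.0002))
```

Under this layout the exact rates are one stripe in 7 for 3.5 stripes
per track, none for 28, and one in 57 for 28.5, because every second
track boundary coincides with a stripe boundary. The quoted "one in 4"
and "one in 29" are therefore rough upper estimates, and the comparison
between fine and coarse stripes keeps the same direction. At the assumed
transfer rate a single 0.2 ms track switch adds about 2% to a 1 MB read.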



Reply With Quote
  #112  
Old   
David BL
 
Posts: n/a

Default Re: sequential disk read speed - 08-28-2008 , 10:43 PM






On Aug 28, 9:34 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
Quote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:40d67c8b-d516-4721-a52d-20579c2ca9ac (AT) r35g2000prm (DOT) googlegroups.com...





On Aug 28, 10:47 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:b3a7632f-de18-46e8-8ce3-3c5aaf83d4b9 (AT) a3g2000prm (DOT) googlegroups.com....


, and since there are four disks, the average seek
time for the disk subsystem is reduced to a quarter of that or roughly
.625ms.

In order for the effective seek time to be reduced to a quarter the
seeking must be independent. To achieve that I think the striping
would need to be very coarse (eg 512kb or 1Mb).

Drives that support disconnection or some other command queueing
mechanism
are all that is needed for seeking to be independent.

If stripes are somewhat smaller than the DBMS block size, then every
drive (in the RAID 0) will be involved in the reading of each and
every DBMS block. No matter how you order those reads, each drive
needs to read a large amount of scattered data and the head will seek
around a lot. If that is the case then the only advantage arises
from your previously mentioned reduction in the overall range of
tracks over which the data resides on a given platter.

Alternatively if the stripe size is larger then each drive will read a
somewhat independent set of the DBMS blocks, and the effective seek
time can be reduced assuming the DBMS is able to issue overlapping
read requests for the DBMS blocks.

Your argument rests on the assumption that data is randomly distributed in
the stripes on the disk and doesn't take into account the fact that a
high-end caching controller eliminates latency by reading an entire trackat
once. Isn't it true that there is a physical affinity between related data?
Isn't it more likely that an index will occupy contiguous stripes than some
random set--regardless of stripe size? Can you show that the number of
tracks accessed by say, 128 coarse stripe reads is any less than the number
of tracks accessed by 1024 fine stripe reads?
Yes, sometimes the DBMS manages to cluster all the necessary data so
there is very little seeking required, and in that case it won’t
matter what stripe size is used.

However, that is not always possible. For example consider a B+Tree
on 1 billion records and in a short period of time the DBMS needs to
read 100 records for given index values that are effectively at random
with respect to the ordering on that data type. To keep it simple
ignore the reading of the internal nodes of the B+Tree. Typically
those 100 records will appear in roughly 100 different leaf nodes of
the B+Tree. Furthermore due to the sheer size of the overall data
those leaf nodes will tend to reside on different tracks. The
unfortunate reality is that it isn’t possible to read these records
without a lot of head seeking, even if the reads are ordered according
to track position (ie elevator seeking). Now if RAID0 is used and the
stripes are smaller that the B+Tree leaf nodes, then every drive will
need to contribute to the reading of every leaf node. Each drive can
read the stripes in any order it likes but it won’t avoid the fact
that each drive performs ~100 seeks. If instead, each B+Tree leaf
node resides in a single stripe (and therefore on a single drive) then
with four drives in the RAID0, each drive will only need to perform
~25 seeks.


Quote:
I think using a coarse stripe is counterproductive. There would be a
bigger
chance that a seek in the middle of the read would be required.
Consider:
if 3.5 stripes fit on a track in one zone of the disk, then on average
every
fourth read would require an additional seek to get the remaining half
stripe. If on the other hand, 28 stripes fit on a track, then no
additional
seeks would be necessary. Even if it were 28.5 stripes instead of 28,
one
additional seek for every 29 reads is a whole lot better than one for
every
4.

Firstly, hard-disks are quite good at stepping onto the next track in
the manner normally used for very large "contiguous" reads or writes.

The best track-to-track seek time I've seen is 0.2ms for reads, 0.4ms for
writes. That's phenomenal but can still add up.
It’s insignificant when reading or writing 1Mb at a time.



Reply With Quote
  #113  
Old   
David BL
 
Posts: n/a

Default Re: sequential disk read speed - 08-28-2008 , 10:43 PM



On Aug 28, 9:34 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
Quote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:40d67c8b-d516-4721-a52d-20579c2ca9ac (AT) r35g2000prm (DOT) googlegroups.com...





On Aug 28, 10:47 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:b3a7632f-de18-46e8-8ce3-3c5aaf83d4b9 (AT) a3g2000prm (DOT) googlegroups.com....


, and since there are four disks, the average seek
time for the disk subsystem is reduced to a quarter of that or roughly
.625ms.

In order for the effective seek time to be reduced to a quarter the
seeking must be independent. To achieve that I think the striping
would need to be very coarse (eg 512kb or 1Mb).

Drives that support disconnection or some other command queueing
mechanism
are all that is needed for seeking to be independent.

If stripes are somewhat smaller than the DBMS block size, then every
drive (in the RAID 0) will be involved in the reading of each and
every DBMS block. No matter how you order those reads, each drive
needs to read a large amount of scattered data and the head will seek
around a lot. If that is the case then the only advantage arises
from your previously mentioned reduction in the overall range of
tracks over which the data resides on a given platter.

Alternatively if the stripe size is larger then each drive will read a
somewhat independent set of the DBMS blocks, and the effective seek
time can be reduced assuming the DBMS is able to issue overlapping
read requests for the DBMS blocks.

Your argument rests on the assumption that data is randomly distributed in
the stripes on the disk and doesn't take into account the fact that a
high-end caching controller eliminates latency by reading an entire trackat
once. Isn't it true that there is a physical affinity between related data?
Isn't it more likely that an index will occupy contiguous stripes than some
random set--regardless of stripe size? Can you show that the number of
tracks accessed by say, 128 coarse stripe reads is any less than the number
of tracks accessed by 1024 fine stripe reads?
Yes, sometimes the DBMS manages to cluster all the necessary data so
there is very little seeking required, and in that case it won’t
matter what stripe size is used.

However, that is not always possible. For example consider a B+Tree
on 1 billion records and in a short period of time the DBMS needs to
read 100 records for given index values that are effectively at random
with respect to the ordering on that data type. To keep it simple
ignore the reading of the internal nodes of the B+Tree. Typically
those 100 records will appear in roughly 100 different leaf nodes of
the B+Tree. Furthermore due to the sheer size of the overall data
those leaf nodes will tend to reside on different tracks. The
unfortunate reality is that it isn’t possible to read these records
without a lot of head seeking, even if the reads are ordered according
to track position (ie elevator seeking). Now if RAID0 is used and the
stripes are smaller that the B+Tree leaf nodes, then every drive will
need to contribute to the reading of every leaf node. Each drive can
read the stripes in any order it likes but it won’t avoid the fact
that each drive performs ~100 seeks. If instead, each B+Tree leaf
node resides in a single stripe (and therefore on a single drive) then
with four drives in the RAID0, each drive will only need to perform
~25 seeks.


Quote:
I think using a coarse stripe is counterproductive. There would be a
bigger
chance that a seek in the middle of the read would be required.
Consider:
if 3.5 stripes fit on a track in one zone of the disk, then on average
every
fourth read would require an additional seek to get the remaining half
stripe. If on the other hand, 28 stripes fit on a track, then no
additional
seeks would be necessary. Even if it were 28.5 stripes instead of 28,
one
additional seek for every 29 reads is a whole lot better than one for
every
4.

Firstly, hard-disks are quite good at stepping onto the next track in
the manner normally used for very large "contiguous" reads or writes.

The best track-to-track seek time I've seen is 0.2ms for reads, 0.4ms for
writes. That's phenomenal but can still add up.
It’s insignificant when reading or writing 1Mb at a time.



Reply With Quote
  #114  
Old   
David BL
 
Posts: n/a

Default Re: sequential disk read speed - 08-28-2008 , 10:43 PM



On Aug 28, 9:34 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
Quote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:40d67c8b-d516-4721-a52d-20579c2ca9ac (AT) r35g2000prm (DOT) googlegroups.com...





On Aug 28, 10:47 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:b3a7632f-de18-46e8-8ce3-3c5aaf83d4b9 (AT) a3g2000prm (DOT) googlegroups.com....


, and since there are four disks, the average seek
time for the disk subsystem is reduced to a quarter of that or roughly
.625ms.

In order for the effective seek time to be reduced to a quarter the
seeking must be independent. To achieve that I think the striping
would need to be very coarse (eg 512kb or 1Mb).

Drives that support disconnection or some other command queueing
mechanism
are all that is needed for seeking to be independent.

If stripes are somewhat smaller than the DBMS block size, then every
drive (in the RAID 0) will be involved in the reading of each and
every DBMS block. No matter how you order those reads, each drive
needs to read a large amount of scattered data and the head will seek
around a lot. If that is the case then the only advantage arises
from your previously mentioned reduction in the overall range of
tracks over which the data resides on a given platter.

Alternatively if the stripe size is larger then each drive will read a
somewhat independent set of the DBMS blocks, and the effective seek
time can be reduced assuming the DBMS is able to issue overlapping
read requests for the DBMS blocks.

Your argument rests on the assumption that data is randomly distributed in
the stripes on the disk and doesn't take into account the fact that a
high-end caching controller eliminates latency by reading an entire trackat
once. Isn't it true that there is a physical affinity between related data?
Isn't it more likely that an index will occupy contiguous stripes than some
random set--regardless of stripe size? Can you show that the number of
tracks accessed by say, 128 coarse stripe reads is any less than the number
of tracks accessed by 1024 fine stripe reads?
Yes, sometimes the DBMS manages to cluster all the necessary data so
there is very little seeking required, and in that case it won’t
matter what stripe size is used.

However, that is not always possible. For example consider a B+Tree
on 1 billion records and in a short period of time the DBMS needs to
read 100 records for given index values that are effectively at random
with respect to the ordering on that data type. To keep it simple
ignore the reading of the internal nodes of the B+Tree. Typically
those 100 records will appear in roughly 100 different leaf nodes of
the B+Tree. Furthermore due to the sheer size of the overall data
those leaf nodes will tend to reside on different tracks. The
unfortunate reality is that it isn’t possible to read these records
without a lot of head seeking, even if the reads are ordered according
to track position (ie elevator seeking). Now if RAID0 is used and the
stripes are smaller that the B+Tree leaf nodes, then every drive will
need to contribute to the reading of every leaf node. Each drive can
read the stripes in any order it likes but it won’t avoid the fact
that each drive performs ~100 seeks. If instead, each B+Tree leaf
node resides in a single stripe (and therefore on a single drive) then
with four drives in the RAID0, each drive will only need to perform
~25 seeks.


Quote:
I think using a coarse stripe is counterproductive. There would be a
bigger
chance that a seek in the middle of the read would be required.
Consider:
if 3.5 stripes fit on a track in one zone of the disk, then on average
every
fourth read would require an additional seek to get the remaining half
stripe. If on the other hand, 28 stripes fit on a track, then no
additional
seeks would be necessary. Even if it were 28.5 stripes instead of 28,
one
additional seek for every 29 reads is a whole lot better than one for
every
4.

Firstly, hard-disks are quite good at stepping onto the next track in
the manner normally used for very large "contiguous" reads or writes.

The best track-to-track seek time I've seen is 0.2ms for reads, 0.4ms for
writes. That's phenomenal but can still add up.
It’s insignificant when reading or writing 1Mb at a time.



Reply With Quote
  #115  
Old   
David BL
 
Posts: n/a

Default Re: sequential disk read speed - 08-28-2008 , 10:43 PM



On Aug 28, 9:34 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
Quote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:40d67c8b-d516-4721-a52d-20579c2ca9ac (AT) r35g2000prm (DOT) googlegroups.com...





On Aug 28, 10:47 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:b3a7632f-de18-46e8-8ce3-3c5aaf83d4b9 (AT) a3g2000prm (DOT) googlegroups.com....


, and since there are four disks, the average seek
time for the disk subsystem is reduced to a quarter of that or roughly
.625ms.

In order for the effective seek time to be reduced to a quarter the
seeking must be independent. To achieve that I think the striping
would need to be very coarse (eg 512kb or 1Mb).

Drives that support disconnection or some other command queueing
mechanism
are all that is needed for seeking to be independent.

If stripes are somewhat smaller than the DBMS block size, then every
drive (in the RAID 0) will be involved in the reading of each and
every DBMS block. No matter how you order those reads, each drive
needs to read a large amount of scattered data and the head will seek
around a lot. If that is the case then the only advantage arises
from your previously mentioned reduction in the overall range of
tracks over which the data resides on a given platter.

Alternatively if the stripe size is larger then each drive will read a
somewhat independent set of the DBMS blocks, and the effective seek
time can be reduced assuming the DBMS is able to issue overlapping
read requests for the DBMS blocks.

Your argument rests on the assumption that data is randomly distributed in
the stripes on the disk and doesn't take into account the fact that a
high-end caching controller eliminates latency by reading an entire trackat
once. Isn't it true that there is a physical affinity between related data?
Isn't it more likely that an index will occupy contiguous stripes than some
random set--regardless of stripe size? Can you show that the number of
tracks accessed by say, 128 coarse stripe reads is any less than the number
of tracks accessed by 1024 fine stripe reads?
Yes, sometimes the DBMS manages to cluster all the necessary data so
there is very little seeking required, and in that case it won’t
matter what stripe size is used.

However, that is not always possible. For example consider a B+Tree
on 1 billion records and in a short period of time the DBMS needs to
read 100 records for given index values that are effectively at random
with respect to the ordering on that data type. To keep it simple
ignore the reading of the internal nodes of the B+Tree. Typically
those 100 records will appear in roughly 100 different leaf nodes of
the B+Tree. Furthermore due to the sheer size of the overall data
those leaf nodes will tend to reside on different tracks. The
unfortunate reality is that it isn’t possible to read these records
without a lot of head seeking, even if the reads are ordered according
to track position (ie elevator seeking). Now if RAID0 is used and the
stripes are smaller that the B+Tree leaf nodes, then every drive will
need to contribute to the reading of every leaf node. Each drive can
read the stripes in any order it likes but it won’t avoid the fact
that each drive performs ~100 seeks. If instead, each B+Tree leaf
node resides in a single stripe (and therefore on a single drive) then
with four drives in the RAID0, each drive will only need to perform
~25 seeks.


Quote:
I think using a coarse stripe is counterproductive. There would be a
bigger
chance that a seek in the middle of the read would be required.
Consider:
if 3.5 stripes fit on a track in one zone of the disk, then on average
every
fourth read would require an additional seek to get the remaining half
stripe. If on the other hand, 28 stripes fit on a track, then no
additional
seeks would be necessary. Even if it were 28.5 stripes instead of 28,
one
additional seek for every 29 reads is a whole lot better than one for
every
4.

Firstly, hard-disks are quite good at stepping onto the next track in
the manner normally used for very large "contiguous" reads or writes.

The best track-to-track seek time I've seen is 0.2ms for reads, 0.4ms for
writes. That's phenomenal but can still add up.
It’s insignificant when reading or writing 1Mb at a time.



Reply With Quote
  #116  
Old   
David BL
 
Posts: n/a

Default Re: sequential disk read speed - 08-28-2008 , 10:43 PM



On Aug 28, 9:34 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
Quote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:40d67c8b-d516-4721-a52d-20579c2ca9ac (AT) r35g2000prm (DOT) googlegroups.com...





On Aug 28, 10:47 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:b3a7632f-de18-46e8-8ce3-3c5aaf83d4b9 (AT) a3g2000prm (DOT) googlegroups.com....


, and since there are four disks, the average seek
time for the disk subsystem is reduced to a quarter of that or roughly
.625ms.

In order for the effective seek time to be reduced to a quarter the
seeking must be independent. To achieve that I think the striping
would need to be very coarse (eg 512kb or 1Mb).

Drives that support disconnection or some other command queueing
mechanism
are all that is needed for seeking to be independent.

If stripes are somewhat smaller than the DBMS block size, then every
drive (in the RAID 0) will be involved in the reading of each and
every DBMS block. No matter how you order those reads, each drive
needs to read a large amount of scattered data and the head will seek
around a lot. If that is the case then the only advantage arises
from your previously mentioned reduction in the overall range of
tracks over which the data resides on a given platter.

Alternatively if the stripe size is larger then each drive will read a
somewhat independent set of the DBMS blocks, and the effective seek
time can be reduced assuming the DBMS is able to issue overlapping
read requests for the DBMS blocks.

Your argument rests on the assumption that data is randomly distributed in
the stripes on the disk and doesn't take into account the fact that a
high-end caching controller eliminates latency by reading an entire trackat
once. Isn't it true that there is a physical affinity between related data?
Isn't it more likely that an index will occupy contiguous stripes than some
random set--regardless of stripe size? Can you show that the number of
tracks accessed by say, 128 coarse stripe reads is any less than the number
of tracks accessed by 1024 fine stripe reads?
Yes, sometimes the DBMS manages to cluster all the necessary data so
there is very little seeking required, and in that case it won’t
matter what stripe size is used.

However, that is not always possible. For example consider a B+Tree
on 1 billion records and in a short period of time the DBMS needs to
read 100 records for given index values that are effectively at random
with respect to the ordering on that data type. To keep it simple
ignore the reading of the internal nodes of the B+Tree. Typically
those 100 records will appear in roughly 100 different leaf nodes of
the B+Tree. Furthermore due to the sheer size of the overall data
those leaf nodes will tend to reside on different tracks. The
unfortunate reality is that it isn’t possible to read these records
without a lot of head seeking, even if the reads are ordered according
to track position (ie elevator seeking). Now if RAID0 is used and the
stripes are smaller that the B+Tree leaf nodes, then every drive will
need to contribute to the reading of every leaf node. Each drive can
read the stripes in any order it likes but it won’t avoid the fact
that each drive performs ~100 seeks. If instead, each B+Tree leaf
node resides in a single stripe (and therefore on a single drive) then
with four drives in the RAID0, each drive will only need to perform
~25 seeks.


Quote:
I think using a coarse stripe is counterproductive. There would be a
bigger
chance that a seek in the middle of the read would be required.
Consider:
if 3.5 stripes fit on a track in one zone of the disk, then on average
every
fourth read would require an additional seek to get the remaining half
stripe. If on the other hand, 28 stripes fit on a track, then no
additional
seeks would be necessary. Even if it were 28.5 stripes instead of 28,
one
additional seek for every 29 reads is a whole lot better than one for
every
4.

Firstly, hard-disks are quite good at stepping onto the next track in
the manner normally used for very large "contiguous" reads or writes.

The best track-to-track seek time I've seen is 0.2ms for reads, 0.4ms for
writes. That's phenomenal but can still add up.
It’s insignificant when reading or writing 1Mb at a time.



Reply With Quote
  #117  
Old   
David BL
 
Posts: n/a

Default Re: sequential disk read speed - 08-28-2008 , 10:43 PM



On Aug 28, 9:34 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
Quote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:40d67c8b-d516-4721-a52d-20579c2ca9ac (AT) r35g2000prm (DOT) googlegroups.com...





On Aug 28, 10:47 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:b3a7632f-de18-46e8-8ce3-3c5aaf83d4b9 (AT) a3g2000prm (DOT) googlegroups.com....


, and since there are four disks, the average seek
time for the disk subsystem is reduced to a quarter of that or roughly
.625ms.

In order for the effective seek time to be reduced to a quarter the
seeking must be independent. To achieve that I think the striping
would need to be very coarse (eg 512kb or 1Mb).

Drives that support disconnection or some other command queueing
mechanism
are all that is needed for seeking to be independent.

If stripes are somewhat smaller than the DBMS block size, then every
drive (in the RAID 0) will be involved in the reading of each and
every DBMS block. No matter how you order those reads, each drive
needs to read a large amount of scattered data and the head will seek
around a lot. If that is the case then the only advantage arises
from your previously mentioned reduction in the overall range of
tracks over which the data resides on a given platter.

Alternatively if the stripe size is larger then each drive will read a
somewhat independent set of the DBMS blocks, and the effective seek
time can be reduced assuming the DBMS is able to issue overlapping
read requests for the DBMS blocks.

Your argument rests on the assumption that data is randomly distributed in
the stripes on the disk and doesn't take into account the fact that a
high-end caching controller eliminates latency by reading an entire trackat
once. Isn't it true that there is a physical affinity between related data?
Isn't it more likely that an index will occupy contiguous stripes than some
random set--regardless of stripe size? Can you show that the number of
tracks accessed by say, 128 coarse stripe reads is any less than the number
of tracks accessed by 1024 fine stripe reads?
Yes, sometimes the DBMS manages to cluster all the necessary data so
there is very little seeking required, and in that case it won’t
matter what stripe size is used.

However, that is not always possible. For example consider a B+Tree
on 1 billion records and in a short period of time the DBMS needs to
read 100 records for given index values that are effectively at random
with respect to the ordering on that data type. To keep it simple
ignore the reading of the internal nodes of the B+Tree. Typically
those 100 records will appear in roughly 100 different leaf nodes of
the B+Tree. Furthermore due to the sheer size of the overall data
those leaf nodes will tend to reside on different tracks. The
unfortunate reality is that it isn’t possible to read these records
without a lot of head seeking, even if the reads are ordered according
to track position (ie elevator seeking). Now if RAID0 is used and the
stripes are smaller that the B+Tree leaf nodes, then every drive will
need to contribute to the reading of every leaf node. Each drive can
read the stripes in any order it likes but it won’t avoid the fact
that each drive performs ~100 seeks. If instead, each B+Tree leaf
node resides in a single stripe (and therefore on a single drive) then
with four drives in the RAID0, each drive will only need to perform
~25 seeks.


Quote:
I think using a coarse stripe is counterproductive. There would be a
bigger
chance that a seek in the middle of the read would be required.
Consider:
if 3.5 stripes fit on a track in one zone of the disk, then on average
every
fourth read would require an additional seek to get the remaining half
stripe. If on the other hand, 28 stripes fit on a track, then no
additional
seeks would be necessary. Even if it were 28.5 stripes instead of 28,
one
additional seek for every 29 reads is a whole lot better than one for
every
4.

Firstly, hard-disks are quite good at stepping onto the next track in
the manner normally used for very large "contiguous" reads or writes.

The best track-to-track seek time I've seen is 0.2ms for reads, 0.4ms for
writes. That's phenomenal but can still add up.
It’s insignificant when reading or writing 1Mb at a time.



  #119  
Old   
Brian Selzer
 
Posts: n/a

Default Re: sequential disk read speed - 08-29-2008 , 06:47 AM




"David BL" <davidbl (AT) iinet (DOT) net.au> wrote

Quote:
On Aug 28, 9:34 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:40d67c8b-d516-4721-a52d-20579c2ca9ac (AT) r35g2000prm (DOT) googlegroups.com...





On Aug 28, 10:47 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:b3a7632f-de18-46e8-8ce3-3c5aaf83d4b9 (AT) a3g2000prm (DOT) googlegroups.com...


, and since there are four disks, the average seek time for the disk
subsystem is reduced to a quarter of that or roughly .625ms.

In order for the effective seek time to be reduced to a quarter the
seeking must be independent. To achieve that I think the striping
would need to be very coarse (e.g. 512 KB or 1 MB).

Drives that support disconnection or some other command queueing
mechanism are all that is needed for seeking to be independent.

If stripes are somewhat smaller than the DBMS block size, then every
drive (in the RAID 0) will be involved in the reading of each and
every DBMS block. No matter how you order those reads, each drive
needs to read a large amount of scattered data and the head will seek
around a lot. If that is the case then the only advantage arises
from your previously mentioned reduction in the overall range of
tracks over which the data resides on a given platter.

Alternatively if the stripe size is larger then each drive will read a
somewhat independent set of the DBMS blocks, and the effective seek
time can be reduced assuming the DBMS is able to issue overlapping
read requests for the DBMS blocks.

Your argument rests on the assumption that data is randomly distributed
in the stripes on the disk and doesn't take into account the fact that a
high-end caching controller eliminates latency by reading an entire
track at once. Isn't it true that there is a physical affinity between
related data? Isn't it more likely that an index will occupy contiguous
stripes than some random set--regardless of stripe size? Can you show
that the number of tracks accessed by say, 128 coarse stripe reads is
any less than the number of tracks accessed by 1024 fine stripe reads?

Yes, sometimes the DBMS manages to cluster all the necessary data so
there is very little seeking required, and in that case it won’t
matter what stripe size is used.

However, that is not always possible. For example consider a B+Tree
on 1 billion records and in a short period of time the DBMS needs to
read 100 records for given index values that are effectively at random
with respect to the ordering on that data type. To keep it simple
ignore the reading of the internal nodes of the B+Tree. Typically
those 100 records will appear in roughly 100 different leaf nodes of
the B+Tree. Furthermore due to the sheer size of the overall data
those leaf nodes will tend to reside on different tracks. The
unfortunate reality is that it isn’t possible to read these records
without a lot of head seeking, even if the reads are ordered according
to track position (i.e. elevator seeking). Now if RAID0 is used and the
stripes are smaller than the B+Tree leaf nodes, then every drive will
need to contribute to the reading of every leaf node. Each drive can
read the stripes in any order it likes but it won’t avoid the fact
that each drive performs ~100 seeks. If instead, each B+Tree leaf
node resides in a single stripe (and therefore on a single drive) then
with four drives in the RAID0, each drive will only need to perform
~25 seeks.

You're oversimplifying. With a stripe size of 64K, it is highly unlikely
that a leaf node will span more than one stripe; therefore, it is highly
unlikely for every drive to contribute to the reading of every leaf node.
Also, you appear to be discounting concurrency. Environments where
concurrency matters, such as typical OLTP systems, are exactly where
techniques such as elevator seeking are most effective.
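
The point about a leaf node rarely spanning more than one 64K stripe can be sketched as follows. This assumes plain round-robin RAID 0 addressing (stripe n lives on drive n mod number of drives); `drives_touched` is a hypothetical helper and the offsets are made up for illustration.

```python
def drives_touched(offset, length, stripe_size, num_drives):
    """Return the drives a read of `length` bytes at `offset` involves.

    Assumes round-robin striping: stripe n is stored on drive n % num_drives.
    """
    first_stripe = offset // stripe_size
    last_stripe = (offset + length - 1) // stripe_size
    return sorted({s % num_drives for s in range(first_stripe, last_stripe + 1)})


if __name__ == "__main__":
    KB = 1024
    # An aligned 8K block inside a 64K stripe stays on one drive.
    print(drives_touched(0, 8 * KB, 64 * KB, 4))
    # The same block with 2K stripes involves every drive.
    print(drives_touched(0, 8 * KB, 2 * KB, 4))
    # A block that starts 4K before a stripe boundary spills onto a second drive.
    print(drives_touched(60 * KB, 8 * KB, 64 * KB, 4))
```

The last case shows why alignment matters: an 8K block only stays on one drive if it does not cross a 64K stripe boundary.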

By the way, Oracle documentation states that an 8K block size is optimal for
most systems and defaults DB_FILE_MULTIBLOCK_READ_COUNT to 8. 8K * 8 = 64K.
Interestingly, SQL Server uses 8K pages organized into 64K extents, and the
extent happens to be the unit of physical storage allocation. Do you know
something they don't?

Quote:
I think using a coarse stripe is counterproductive. There would be a
bigger chance that a seek in the middle of the read would be required.
Consider: if 3.5 stripes fit on a track in one zone of the disk, then
on average every fourth read would require an additional seek to get
the remaining half stripe. If on the other hand, 28 stripes fit on a
track, then no additional seeks would be necessary. Even if it were
28.5 stripes instead of 28, one additional seek for every 29 reads is
a whole lot better than one for every 4.

Firstly, hard-disks are quite good at stepping onto the next track in
the manner normally used for very large "contiguous" reads or writes.

The best track-to-track seek time I've seen is 0.2ms for reads, 0.4ms for
writes. That's phenomenal but can still add up.

It’s insignificant when reading or writing 1 MB at a time.



