dbTalk Databases Forums  

sequential disk read speed

comp.databases.theory comp.databases.theory


Discuss sequential disk read speed in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #151  
Old   
David BL
 
Posts: n/a

Default Re: sequential disk read speed - 08-30-2008 , 12:43 AM






On Aug 30, 9:14 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
Quote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:6463ace9-eddf-4889-8e65-17d070220a94 (AT) t1g2000pra (DOT) googlegroups.com...

On Aug 29, 7:47 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message
news:f6ef3678-c7e9-4cd2-acaf-13cac28819d6 (AT) a1g2000hsb (DOT) googlegroups.com...

[snip]

You're oversimplifying. With a stripe size of 64K, it is highly
unlikely
that a leaf node will span more than one stripe; therefore, it is highly
unlikely for every drive to contribute to the reading of every leaf
node.

I don't see how I'm oversimplifying.

My point is that stripes need to be at least as coarse as the DBMS
block size. Do you agree?

Yes, I think the stripe size should be a multiple of the DBMS block size.

The choice of DBMS block size is another question entirely.

Also, you appear to be discounting concurrency, and environments where
concurrency is important such as typical OLTP environments are where
technologies such as elevator seeking are most effective.

Concurrency has nothing to do with the fact that if the stripe size is
too small the seeking of the drives won't be independent.

That wasn't why I brought up concurrency. You dismissed elevator seeking as
an optimization mechanism with respect to the number of seeks required by
focusing on what appeared to be a single query. Ultimately the same number
of seeks will be required for a particular query, but when combined with
ninety-nine other queries, some of those seeks can be shared with other
queries, thus reducing the total number of seeks required to satisfy the
hundred.
I'm not sure how I implied that elevator seeking isn't worthwhile.

Quote:
By the way, Oracle documentation states that an 8K block size is optimal
for
most systems and defaults DB_FILE_MULTIBLOCK_READ_COUNT to 8. 8K * 8 =
64K.
Interestingly, Sql Server uses 8K pages organized into 64K extents,
which
happens to be the unit of physical storage allocation. Do you know
something they don't?

Sql Server 6.5 used 2k pages and this changed to 8k pages in Sql
Server 7.0 released in 1998. Do you expect that 64k extents are still
optimal a decade later given that the product of transfer rate and
seek time has been steadily increasing?

Sql Server 2008 still uses 8K pages and 64K extents. Also, the Oracle
documentation that cited an 8K block size as being optimal was for their
latest version, 11g.

I think that 64K extents are still optimal because the technology employed
for serializing updates is still locking, and in a concurrent environment
with an escalating locking heirachy, a page size or extent size that is too
large will cause transactions to block more often. Do you know of a sound
and practicable alternative to an escalating locking heirarchy for
serializing updates?
I think these systems lock pages not extents, and anyway locking
granularity can in principle be orthogonal to the unit of I/O or unit
of allocation.

Quote:
64k blocks are generally too small on modern disks. A 64k block can
be transferred in a tenth of the time it takes to seek to it.

Why is that a problem? Isn't it more efficient if I can satisfy the same
query by reading 100 64K blocks instead of 100 1M blocks?
Yes, but a DBMS will often be able to satisfy a given query by reading
fewer blocks. For example full table scans are much more efficient
with 1M blocks. Also, if you increase the block size by 10x then the
height of a B+Tree will tend to smaller. For example a tree of height
3 may be able to index a trillion rather than only a billion records.

The advantages of a larger block size are more apparent in a database
storing data where there is a greater tendency for locality based on
affinity to be useful. For example, it would be rather silly to use
64k blocks to store multi-resolution terra pixel images.



Reply With Quote
  #152  
Old   
David BL
 
Posts: n/a

Default Re: sequential disk read speed - 08-30-2008 , 12:43 AM






On Aug 30, 9:14 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
Quote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:6463ace9-eddf-4889-8e65-17d070220a94 (AT) t1g2000pra (DOT) googlegroups.com...

On Aug 29, 7:47 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message
news:f6ef3678-c7e9-4cd2-acaf-13cac28819d6 (AT) a1g2000hsb (DOT) googlegroups.com...

[snip]

You're oversimplifying. With a stripe size of 64K, it is highly
unlikely
that a leaf node will span more than one stripe; therefore, it is highly
unlikely for every drive to contribute to the reading of every leaf
node.

I don't see how I'm oversimplifying.

My point is that stripes need to be at least as coarse as the DBMS
block size. Do you agree?

Yes, I think the stripe size should be a multiple of the DBMS block size.

The choice of DBMS block size is another question entirely.

Also, you appear to be discounting concurrency, and environments where
concurrency is important such as typical OLTP environments are where
technologies such as elevator seeking are most effective.

Concurrency has nothing to do with the fact that if the stripe size is
too small the seeking of the drives won't be independent.

That wasn't why I brought up concurrency. You dismissed elevator seeking as
an optimization mechanism with respect to the number of seeks required by
focusing on what appeared to be a single query. Ultimately the same number
of seeks will be required for a particular query, but when combined with
ninety-nine other queries, some of those seeks can be shared with other
queries, thus reducing the total number of seeks required to satisfy the
hundred.
I'm not sure how I implied that elevator seeking isn't worthwhile.

Quote:
By the way, Oracle documentation states that an 8K block size is optimal
for
most systems and defaults DB_FILE_MULTIBLOCK_READ_COUNT to 8. 8K * 8 =
64K.
Interestingly, Sql Server uses 8K pages organized into 64K extents,
which
happens to be the unit of physical storage allocation. Do you know
something they don't?

Sql Server 6.5 used 2k pages and this changed to 8k pages in Sql
Server 7.0 released in 1998. Do you expect that 64k extents are still
optimal a decade later given that the product of transfer rate and
seek time has been steadily increasing?

Sql Server 2008 still uses 8K pages and 64K extents. Also, the Oracle
documentation that cited an 8K block size as being optimal was for their
latest version, 11g.

I think that 64K extents are still optimal because the technology employed
for serializing updates is still locking, and in a concurrent environment
with an escalating locking heirachy, a page size or extent size that is too
large will cause transactions to block more often. Do you know of a sound
and practicable alternative to an escalating locking heirarchy for
serializing updates?
I think these systems lock pages not extents, and anyway locking
granularity can in principle be orthogonal to the unit of I/O or unit
of allocation.

Quote:
64k blocks are generally too small on modern disks. A 64k block can
be transferred in a tenth of the time it takes to seek to it.

Why is that a problem? Isn't it more efficient if I can satisfy the same
query by reading 100 64K blocks instead of 100 1M blocks?
Yes, but a DBMS will often be able to satisfy a given query by reading
fewer blocks. For example full table scans are much more efficient
with 1M blocks. Also, if you increase the block size by 10x then the
height of a B+Tree will tend to smaller. For example a tree of height
3 may be able to index a trillion rather than only a billion records.

The advantages of a larger block size are more apparent in a database
storing data where there is a greater tendency for locality based on
affinity to be useful. For example, it would be rather silly to use
64k blocks to store multi-resolution terra pixel images.



Reply With Quote
  #153  
Old   
David BL
 
Posts: n/a

Default Re: sequential disk read speed - 08-30-2008 , 12:43 AM



On Aug 30, 9:14 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
Quote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:6463ace9-eddf-4889-8e65-17d070220a94 (AT) t1g2000pra (DOT) googlegroups.com...

On Aug 29, 7:47 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message
news:f6ef3678-c7e9-4cd2-acaf-13cac28819d6 (AT) a1g2000hsb (DOT) googlegroups.com...

[snip]

You're oversimplifying. With a stripe size of 64K, it is highly
unlikely
that a leaf node will span more than one stripe; therefore, it is highly
unlikely for every drive to contribute to the reading of every leaf
node.

I don't see how I'm oversimplifying.

My point is that stripes need to be at least as coarse as the DBMS
block size. Do you agree?

Yes, I think the stripe size should be a multiple of the DBMS block size.

The choice of DBMS block size is another question entirely.

Also, you appear to be discounting concurrency, and environments where
concurrency is important such as typical OLTP environments are where
technologies such as elevator seeking are most effective.

Concurrency has nothing to do with the fact that if the stripe size is
too small the seeking of the drives won't be independent.

That wasn't why I brought up concurrency. You dismissed elevator seeking as
an optimization mechanism with respect to the number of seeks required by
focusing on what appeared to be a single query. Ultimately the same number
of seeks will be required for a particular query, but when combined with
ninety-nine other queries, some of those seeks can be shared with other
queries, thus reducing the total number of seeks required to satisfy the
hundred.
I'm not sure how I implied that elevator seeking isn't worthwhile.

Quote:
By the way, Oracle documentation states that an 8K block size is optimal
for
most systems and defaults DB_FILE_MULTIBLOCK_READ_COUNT to 8. 8K * 8 =
64K.
Interestingly, Sql Server uses 8K pages organized into 64K extents,
which
happens to be the unit of physical storage allocation. Do you know
something they don't?

Sql Server 6.5 used 2k pages and this changed to 8k pages in Sql
Server 7.0 released in 1998. Do you expect that 64k extents are still
optimal a decade later given that the product of transfer rate and
seek time has been steadily increasing?

Sql Server 2008 still uses 8K pages and 64K extents. Also, the Oracle
documentation that cited an 8K block size as being optimal was for their
latest version, 11g.

I think that 64K extents are still optimal because the technology employed
for serializing updates is still locking, and in a concurrent environment
with an escalating locking heirachy, a page size or extent size that is too
large will cause transactions to block more often. Do you know of a sound
and practicable alternative to an escalating locking heirarchy for
serializing updates?
I think these systems lock pages not extents, and anyway locking
granularity can in principle be orthogonal to the unit of I/O or unit
of allocation.

Quote:
64k blocks are generally too small on modern disks. A 64k block can
be transferred in a tenth of the time it takes to seek to it.

Why is that a problem? Isn't it more efficient if I can satisfy the same
query by reading 100 64K blocks instead of 100 1M blocks?
Yes, but a DBMS will often be able to satisfy a given query by reading
fewer blocks. For example full table scans are much more efficient
with 1M blocks. Also, if you increase the block size by 10x then the
height of a B+Tree will tend to smaller. For example a tree of height
3 may be able to index a trillion rather than only a billion records.

The advantages of a larger block size are more apparent in a database
storing data where there is a greater tendency for locality based on
affinity to be useful. For example, it would be rather silly to use
64k blocks to store multi-resolution terra pixel images.



Reply With Quote
  #154  
Old   
David BL
 
Posts: n/a

Default Re: sequential disk read speed - 08-30-2008 , 12:43 AM



On Aug 30, 9:14 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
Quote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:6463ace9-eddf-4889-8e65-17d070220a94 (AT) t1g2000pra (DOT) googlegroups.com...

On Aug 29, 7:47 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message
news:f6ef3678-c7e9-4cd2-acaf-13cac28819d6 (AT) a1g2000hsb (DOT) googlegroups.com...

[snip]

You're oversimplifying. With a stripe size of 64K, it is highly
unlikely
that a leaf node will span more than one stripe; therefore, it is highly
unlikely for every drive to contribute to the reading of every leaf
node.

I don't see how I'm oversimplifying.

My point is that stripes need to be at least as coarse as the DBMS
block size. Do you agree?

Yes, I think the stripe size should be a multiple of the DBMS block size.

The choice of DBMS block size is another question entirely.

Also, you appear to be discounting concurrency, and environments where
concurrency is important such as typical OLTP environments are where
technologies such as elevator seeking are most effective.

Concurrency has nothing to do with the fact that if the stripe size is
too small the seeking of the drives won't be independent.

That wasn't why I brought up concurrency. You dismissed elevator seeking as
an optimization mechanism with respect to the number of seeks required by
focusing on what appeared to be a single query. Ultimately the same number
of seeks will be required for a particular query, but when combined with
ninety-nine other queries, some of those seeks can be shared with other
queries, thus reducing the total number of seeks required to satisfy the
hundred.
I'm not sure how I implied that elevator seeking isn't worthwhile.

Quote:
By the way, Oracle documentation states that an 8K block size is optimal
for
most systems and defaults DB_FILE_MULTIBLOCK_READ_COUNT to 8. 8K * 8 =
64K.
Interestingly, Sql Server uses 8K pages organized into 64K extents,
which
happens to be the unit of physical storage allocation. Do you know
something they don't?

Sql Server 6.5 used 2k pages and this changed to 8k pages in Sql
Server 7.0 released in 1998. Do you expect that 64k extents are still
optimal a decade later given that the product of transfer rate and
seek time has been steadily increasing?

Sql Server 2008 still uses 8K pages and 64K extents. Also, the Oracle
documentation that cited an 8K block size as being optimal was for their
latest version, 11g.

I think that 64K extents are still optimal because the technology employed
for serializing updates is still locking, and in a concurrent environment
with an escalating locking heirachy, a page size or extent size that is too
large will cause transactions to block more often. Do you know of a sound
and practicable alternative to an escalating locking heirarchy for
serializing updates?
I think these systems lock pages not extents, and anyway locking
granularity can in principle be orthogonal to the unit of I/O or unit
of allocation.

Quote:
64k blocks are generally too small on modern disks. A 64k block can
be transferred in a tenth of the time it takes to seek to it.

Why is that a problem? Isn't it more efficient if I can satisfy the same
query by reading 100 64K blocks instead of 100 1M blocks?
Yes, but a DBMS will often be able to satisfy a given query by reading
fewer blocks. For example full table scans are much more efficient
with 1M blocks. Also, if you increase the block size by 10x then the
height of a B+Tree will tend to smaller. For example a tree of height
3 may be able to index a trillion rather than only a billion records.

The advantages of a larger block size are more apparent in a database
storing data where there is a greater tendency for locality based on
affinity to be useful. For example, it would be rather silly to use
64k blocks to store multi-resolution terra pixel images.



Reply With Quote
  #155  
Old   
Brian Selzer
 
Posts: n/a

Default Re: sequential disk read speed - 08-30-2008 , 02:32 PM




"David BL" <davidbl (AT) iinet (DOT) net.au> wrote

Quote:
On Aug 30, 9:14 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:6463ace9-eddf-4889-8e65-17d070220a94 (AT) t1g2000pra (DOT) googlegroups.com...

On Aug 29, 7:47 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message
news:f6ef3678-c7e9-4cd2-acaf-13cac28819d6 (AT) a1g2000hsb (DOT) googlegroups.com...

[snip]

You're oversimplifying. With a stripe size of 64K, it is highly
unlikely
that a leaf node will span more than one stripe; therefore, it is
highly
unlikely for every drive to contribute to the reading of every leaf
node.

I don't see how I'm oversimplifying.

My point is that stripes need to be at least as coarse as the DBMS
block size. Do you agree?

Yes, I think the stripe size should be a multiple of the DBMS block size.

The choice of DBMS block size is another question entirely.

Also, you appear to be discounting concurrency, and environments
where
concurrency is important such as typical OLTP environments are where
technologies such as elevator seeking are most effective.

Concurrency has nothing to do with the fact that if the stripe size is
too small the seeking of the drives won't be independent.

That wasn't why I brought up concurrency. You dismissed elevator seeking
as
an optimization mechanism with respect to the number of seeks required by
focusing on what appeared to be a single query. Ultimately the same
number
of seeks will be required for a particular query, but when combined with
ninety-nine other queries, some of those seeks can be shared with other
queries, thus reducing the total number of seeks required to satisfy the
hundred.

I'm not sure how I implied that elevator seeking isn't worthwhile.

You didn't, and I didn't say that you did. But your implication that it
doesn't affect the number of seeks, while true, is an oversimplification in
that it doesn't take into account that in a concurrent environment, many of
those seeks can be shared by other queries.

Quote:
By the way, Oracle documentation states that an 8K block size is
optimal
for
most systems and defaults DB_FILE_MULTIBLOCK_READ_COUNT to 8. 8K * 8
=
64K.
Interestingly, Sql Server uses 8K pages organized into 64K extents,
which
happens to be the unit of physical storage allocation. Do you know
something they don't?

Sql Server 6.5 used 2k pages and this changed to 8k pages in Sql
Server 7.0 released in 1998. Do you expect that 64k extents are still
optimal a decade later given that the product of transfer rate and
seek time has been steadily increasing?

Sql Server 2008 still uses 8K pages and 64K extents. Also, the Oracle
documentation that cited an 8K block size as being optimal was for their
latest version, 11g.

I think that 64K extents are still optimal because the technology
employed
for serializing updates is still locking, and in a concurrent environment
with an escalating locking heirachy, a page size or extent size that is
too
large will cause transactions to block more often. Do you know of a
sound
and practicable alternative to an escalating locking heirarchy for
serializing updates?

I think these systems lock pages not extents, and anyway locking
granularity can in principle be orthogonal to the unit of I/O or unit
of allocation.

I don't see how since there is clearly a correlation between the size of
each unit of I/O and the contention for what is on each unit of I/O.

Quote:
64k blocks are generally too small on modern disks. A 64k block can
be transferred in a tenth of the time it takes to seek to it.

Why is that a problem? Isn't it more efficient if I can satisfy the same
query by reading 100 64K blocks instead of 100 1M blocks?

Yes, but a DBMS will often be able to satisfy a given query by reading
fewer blocks. For example full table scans are much more efficient
with 1M blocks. Also, if you increase the block size by 10x then the
height of a B+Tree will tend to smaller. For example a tree of height
3 may be able to index a trillion rather than only a billion records.

I think that if the disk subsystem is sophisticated enough, the performance
benefit of an increased block size is lost. For example, if the controller
caches entire tracks to eliminate latency, then a larger block size would
not improve performance one bit--in fact, it would tend to reduce it because
of the vastly increased volume of data that must be transferred from the
cache to RAM. In the example above, you would have to move more than 16
times as much data from the cache to RAM to answer the same query.

Quote:
The advantages of a larger block size are more apparent in a database
storing data where there is a greater tendency for locality based on
affinity to be useful. For example, it would be rather silly to use
64k blocks to store multi-resolution terra pixel images.

It would be equally silly to physically store multi-resolution terra pixel
images alongside scalar data that can be joined or restricted on. If
necessary, place the image heap on a separate disk subsystem with a separate
stripe size and depth, but store the scalar data on disk subsystem with a
stripe size optimal for computing joins and restrictions on it.




Reply With Quote
  #156  
Old   
Brian Selzer
 
Posts: n/a

Default Re: sequential disk read speed - 08-30-2008 , 02:32 PM




"David BL" <davidbl (AT) iinet (DOT) net.au> wrote

Quote:
On Aug 30, 9:14 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:6463ace9-eddf-4889-8e65-17d070220a94 (AT) t1g2000pra (DOT) googlegroups.com...

On Aug 29, 7:47 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message
news:f6ef3678-c7e9-4cd2-acaf-13cac28819d6 (AT) a1g2000hsb (DOT) googlegroups.com...

[snip]

You're oversimplifying. With a stripe size of 64K, it is highly
unlikely
that a leaf node will span more than one stripe; therefore, it is
highly
unlikely for every drive to contribute to the reading of every leaf
node.

I don't see how I'm oversimplifying.

My point is that stripes need to be at least as coarse as the DBMS
block size. Do you agree?

Yes, I think the stripe size should be a multiple of the DBMS block size.

The choice of DBMS block size is another question entirely.

Also, you appear to be discounting concurrency, and environments
where
concurrency is important such as typical OLTP environments are where
technologies such as elevator seeking are most effective.

Concurrency has nothing to do with the fact that if the stripe size is
too small the seeking of the drives won't be independent.

That wasn't why I brought up concurrency. You dismissed elevator seeking
as
an optimization mechanism with respect to the number of seeks required by
focusing on what appeared to be a single query. Ultimately the same
number
of seeks will be required for a particular query, but when combined with
ninety-nine other queries, some of those seeks can be shared with other
queries, thus reducing the total number of seeks required to satisfy the
hundred.

I'm not sure how I implied that elevator seeking isn't worthwhile.

You didn't, and I didn't say that you did. But your implication that it
doesn't affect the number of seeks, while true, is an oversimplification in
that it doesn't take into account that in a concurrent environment, many of
those seeks can be shared by other queries.

Quote:
By the way, Oracle documentation states that an 8K block size is
optimal
for
most systems and defaults DB_FILE_MULTIBLOCK_READ_COUNT to 8. 8K * 8
=
64K.
Interestingly, Sql Server uses 8K pages organized into 64K extents,
which
happens to be the unit of physical storage allocation. Do you know
something they don't?

Sql Server 6.5 used 2k pages and this changed to 8k pages in Sql
Server 7.0 released in 1998. Do you expect that 64k extents are still
optimal a decade later given that the product of transfer rate and
seek time has been steadily increasing?

Sql Server 2008 still uses 8K pages and 64K extents. Also, the Oracle
documentation that cited an 8K block size as being optimal was for their
latest version, 11g.

I think that 64K extents are still optimal because the technology
employed
for serializing updates is still locking, and in a concurrent environment
with an escalating locking heirachy, a page size or extent size that is
too
large will cause transactions to block more often. Do you know of a
sound
and practicable alternative to an escalating locking heirarchy for
serializing updates?

I think these systems lock pages not extents, and anyway locking
granularity can in principle be orthogonal to the unit of I/O or unit
of allocation.

I don't see how since there is clearly a correlation between the size of
each unit of I/O and the contention for what is on each unit of I/O.

Quote:
64k blocks are generally too small on modern disks. A 64k block can
be transferred in a tenth of the time it takes to seek to it.

Why is that a problem? Isn't it more efficient if I can satisfy the same
query by reading 100 64K blocks instead of 100 1M blocks?

Yes, but a DBMS will often be able to satisfy a given query by reading
fewer blocks. For example full table scans are much more efficient
with 1M blocks. Also, if you increase the block size by 10x then the
height of a B+Tree will tend to smaller. For example a tree of height
3 may be able to index a trillion rather than only a billion records.

I think that if the disk subsystem is sophisticated enough, the performance
benefit of an increased block size is lost. For example, if the controller
caches entire tracks to eliminate latency, then a larger block size would
not improve performance one bit--in fact, it would tend to reduce it because
of the vastly increased volume of data that must be transferred from the
cache to RAM. In the example above, you would have to move more than 16
times as much data from the cache to RAM to answer the same query.

Quote:
The advantages of a larger block size are more apparent in a database
storing data where there is a greater tendency for locality based on
affinity to be useful. For example, it would be rather silly to use
64k blocks to store multi-resolution terra pixel images.

It would be equally silly to physically store multi-resolution terra pixel
images alongside scalar data that can be joined or restricted on. If
necessary, place the image heap on a separate disk subsystem with a separate
stripe size and depth, but store the scalar data on disk subsystem with a
stripe size optimal for computing joins and restrictions on it.




Reply With Quote
  #157  
Old   
Brian Selzer
 
Posts: n/a

Default Re: sequential disk read speed - 08-30-2008 , 02:32 PM




"David BL" <davidbl (AT) iinet (DOT) net.au> wrote

Quote:
On Aug 30, 9:14 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:6463ace9-eddf-4889-8e65-17d070220a94 (AT) t1g2000pra (DOT) googlegroups.com...

On Aug 29, 7:47 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message
news:f6ef3678-c7e9-4cd2-acaf-13cac28819d6 (AT) a1g2000hsb (DOT) googlegroups.com...

[snip]

You're oversimplifying. With a stripe size of 64K, it is highly
unlikely
that a leaf node will span more than one stripe; therefore, it is
highly
unlikely for every drive to contribute to the reading of every leaf
node.

I don't see how I'm oversimplifying.

My point is that stripes need to be at least as coarse as the DBMS
block size. Do you agree?

Yes, I think the stripe size should be a multiple of the DBMS block size.

The choice of DBMS block size is another question entirely.

Also, you appear to be discounting concurrency, and environments
where
concurrency is important such as typical OLTP environments are where
technologies such as elevator seeking are most effective.

Concurrency has nothing to do with the fact that if the stripe size is
too small the seeking of the drives won't be independent.

That wasn't why I brought up concurrency. You dismissed elevator seeking
as
an optimization mechanism with respect to the number of seeks required by
focusing on what appeared to be a single query. Ultimately the same
number
of seeks will be required for a particular query, but when combined with
ninety-nine other queries, some of those seeks can be shared with other
queries, thus reducing the total number of seeks required to satisfy the
hundred.

I'm not sure how I implied that elevator seeking isn't worthwhile.

You didn't, and I didn't say that you did. But your implication that it
doesn't affect the number of seeks, while true, is an oversimplification in
that it doesn't take into account that in a concurrent environment, many of
those seeks can be shared by other queries.

Quote:
By the way, Oracle documentation states that an 8K block size is
optimal
for
most systems and defaults DB_FILE_MULTIBLOCK_READ_COUNT to 8. 8K * 8
=
64K.
Interestingly, Sql Server uses 8K pages organized into 64K extents,
which
happens to be the unit of physical storage allocation. Do you know
something they don't?

Sql Server 6.5 used 2k pages and this changed to 8k pages in Sql
Server 7.0 released in 1998. Do you expect that 64k extents are still
optimal a decade later given that the product of transfer rate and
seek time has been steadily increasing?

Sql Server 2008 still uses 8K pages and 64K extents. Also, the Oracle
documentation that cited an 8K block size as being optimal was for their
latest version, 11g.

I think that 64K extents are still optimal because the technology
employed
for serializing updates is still locking, and in a concurrent environment
with an escalating locking heirachy, a page size or extent size that is
too
large will cause transactions to block more often. Do you know of a
sound
and practicable alternative to an escalating locking heirarchy for
serializing updates?

I think these systems lock pages not extents, and anyway locking
granularity can in principle be orthogonal to the unit of I/O or unit
of allocation.

I don't see how since there is clearly a correlation between the size of
each unit of I/O and the contention for what is on each unit of I/O.

Quote:
64k blocks are generally too small on modern disks. A 64k block can
be transferred in a tenth of the time it takes to seek to it.

Why is that a problem? Isn't it more efficient if I can satisfy the same
query by reading 100 64K blocks instead of 100 1M blocks?

Yes, but a DBMS will often be able to satisfy a given query by reading
fewer blocks. For example full table scans are much more efficient
with 1M blocks. Also, if you increase the block size by 10x then the
height of a B+Tree will tend to smaller. For example a tree of height
3 may be able to index a trillion rather than only a billion records.

I think that if the disk subsystem is sophisticated enough, the performance
benefit of an increased block size is lost. For example, if the controller
caches entire tracks to eliminate latency, then a larger block size would
not improve performance one bit--in fact, it would tend to reduce it because
of the vastly increased volume of data that must be transferred from the
cache to RAM. In the example above, you would have to move more than 16
times as much data from the cache to RAM to answer the same query.

Quote:
The advantages of a larger block size are more apparent in a database
storing data where there is a greater tendency for locality based on
affinity to be useful. For example, it would be rather silly to use
64k blocks to store multi-resolution terra pixel images.

It would be equally silly to physically store multi-resolution terra pixel
images alongside scalar data that can be joined or restricted on. If
necessary, place the image heap on a separate disk subsystem with a separate
stripe size and depth, but store the scalar data on disk subsystem with a
stripe size optimal for computing joins and restrictions on it.




Reply With Quote
  #158  
Old   
Brian Selzer
 
Posts: n/a

Default Re: sequential disk read speed - 08-30-2008 , 02:32 PM




"David BL" <davidbl (AT) iinet (DOT) net.au> wrote

Quote:
On Aug 30, 9:14 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:6463ace9-eddf-4889-8e65-17d070220a94 (AT) t1g2000pra (DOT) googlegroups.com...

On Aug 29, 7:47 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message
news:f6ef3678-c7e9-4cd2-acaf-13cac28819d6 (AT) a1g2000hsb (DOT) googlegroups.com...

[snip]

You're oversimplifying. With a stripe size of 64K, it is highly
unlikely
that a leaf node will span more than one stripe; therefore, it is
highly
unlikely for every drive to contribute to the reading of every leaf
node.

I don't see how I'm oversimplifying.

My point is that stripes need to be at least as coarse as the DBMS
block size. Do you agree?

Yes, I think the stripe size should be a multiple of the DBMS block size.

The choice of DBMS block size is another question entirely.

Also, you appear to be discounting concurrency, and environments
where
concurrency is important such as typical OLTP environments are where
technologies such as elevator seeking are most effective.

Concurrency has nothing to do with the fact that if the stripe size is
too small the seeking of the drives won't be independent.

That wasn't why I brought up concurrency. You dismissed elevator seeking
as
an optimization mechanism with respect to the number of seeks required by
focusing on what appeared to be a single query. Ultimately the same
number
of seeks will be required for a particular query, but when combined with
ninety-nine other queries, some of those seeks can be shared with other
queries, thus reducing the total number of seeks required to satisfy the
hundred.

I'm not sure how I implied that elevator seeking isn't worthwhile.

You didn't, and I didn't say that you did. But your implication that it
doesn't affect the number of seeks, while true, is an oversimplification in
that it doesn't take into account that in a concurrent environment, many of
those seeks can be shared by other queries.

Quote:
By the way, Oracle documentation states that an 8K block size is
optimal
for
most systems and defaults DB_FILE_MULTIBLOCK_READ_COUNT to 8. 8K * 8
=
64K.
Interestingly, Sql Server uses 8K pages organized into 64K extents,
which
happens to be the unit of physical storage allocation. Do you know
something they don't?

Sql Server 6.5 used 2k pages and this changed to 8k pages in Sql
Server 7.0 released in 1998. Do you expect that 64k extents are still
optimal a decade later given that the product of transfer rate and
seek time has been steadily increasing?

Sql Server 2008 still uses 8K pages and 64K extents. Also, the Oracle
documentation that cited an 8K block size as being optimal was for their
latest version, 11g.

I think that 64K extents are still optimal because the technology
employed
for serializing updates is still locking, and in a concurrent environment
with an escalating locking heirachy, a page size or extent size that is
too
large will cause transactions to block more often. Do you know of a
sound
and practicable alternative to an escalating locking heirarchy for
serializing updates?

I think these systems lock pages not extents, and anyway locking
granularity can in principle be orthogonal to the unit of I/O or unit
of allocation.

I don't see how since there is clearly a correlation between the size of
each unit of I/O and the contention for what is on each unit of I/O.

Quote:
64k blocks are generally too small on modern disks. A 64k block can
be transferred in a tenth of the time it takes to seek to it.

Why is that a problem? Isn't it more efficient if I can satisfy the same
query by reading 100 64K blocks instead of 100 1M blocks?

Yes, but a DBMS will often be able to satisfy a given query by reading
fewer blocks. For example full table scans are much more efficient
with 1M blocks. Also, if you increase the block size by 10x then the
height of a B+Tree will tend to smaller. For example a tree of height
3 may be able to index a trillion rather than only a billion records.

I think that if the disk subsystem is sophisticated enough, the performance
benefit of an increased block size is lost. For example, if the controller
caches entire tracks to eliminate latency, then a larger block size would
not improve performance one bit--in fact, it would tend to reduce it because
of the vastly increased volume of data that must be transferred from the
cache to RAM. In the example above, you would have to move more than 16
times as much data from the cache to RAM to answer the same query.

Quote:
The advantages of a larger block size are more apparent in a database
storing data where there is a greater tendency for locality based on
affinity to be useful. For example, it would be rather silly to use
64k blocks to store multi-resolution terra pixel images.

It would be equally silly to physically store multi-resolution terra pixel
images alongside scalar data that can be joined or restricted on. If
necessary, place the image heap on a separate disk subsystem with a separate
stripe size and depth, but store the scalar data on disk subsystem with a
stripe size optimal for computing joins and restrictions on it.




Reply With Quote
  #159  
Old   
Brian Selzer
 
Posts: n/a

Default Re: sequential disk read speed - 08-30-2008 , 02:32 PM




"David BL" <davidbl (AT) iinet (DOT) net.au> wrote

Quote:
On Aug 30, 9:14 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:6463ace9-eddf-4889-8e65-17d070220a94 (AT) t1g2000pra (DOT) googlegroups.com...

On Aug 29, 7:47 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message
news:f6ef3678-c7e9-4cd2-acaf-13cac28819d6 (AT) a1g2000hsb (DOT) googlegroups.com...

[snip]

You're oversimplifying. With a stripe size of 64K, it is highly
unlikely
that a leaf node will span more than one stripe; therefore, it is
highly
unlikely for every drive to contribute to the reading of every leaf
node.

I don't see how I'm oversimplifying.

My point is that stripes need to be at least as coarse as the DBMS
block size. Do you agree?

Yes, I think the stripe size should be a multiple of the DBMS block size.

The choice of DBMS block size is another question entirely.

Also, you appear to be discounting concurrency, and environments
where
concurrency is important such as typical OLTP environments are where
technologies such as elevator seeking are most effective.

Concurrency has nothing to do with the fact that if the stripe size is
too small the seeking of the drives won't be independent.

That wasn't why I brought up concurrency. You dismissed elevator seeking
as
an optimization mechanism with respect to the number of seeks required by
focusing on what appeared to be a single query. Ultimately the same
number
of seeks will be required for a particular query, but when combined with
ninety-nine other queries, some of those seeks can be shared with other
queries, thus reducing the total number of seeks required to satisfy the
hundred.

I'm not sure how I implied that elevator seeking isn't worthwhile.

You didn't, and I didn't say that you did. But your implication that it
doesn't affect the number of seeks, while true, is an oversimplification in
that it doesn't take into account that in a concurrent environment, many of
those seeks can be shared by other queries.

Quote:
By the way, Oracle documentation states that an 8K block size is
optimal
for
most systems and defaults DB_FILE_MULTIBLOCK_READ_COUNT to 8. 8K * 8
=
64K.
Interestingly, Sql Server uses 8K pages organized into 64K extents,
which
happens to be the unit of physical storage allocation. Do you know
something they don't?

Sql Server 6.5 used 2k pages and this changed to 8k pages in Sql
Server 7.0 released in 1998. Do you expect that 64k extents are still
optimal a decade later given that the product of transfer rate and
seek time has been steadily increasing?

Sql Server 2008 still uses 8K pages and 64K extents. Also, the Oracle
documentation that cited an 8K block size as being optimal was for their
latest version, 11g.

I think that 64K extents are still optimal because the technology
employed
for serializing updates is still locking, and in a concurrent environment
with an escalating locking heirachy, a page size or extent size that is
too
large will cause transactions to block more often. Do you know of a
sound
and practicable alternative to an escalating locking heirarchy for
serializing updates?

I think these systems lock pages not extents, and anyway locking
granularity can in principle be orthogonal to the unit of I/O or unit
of allocation.

I don't see how since there is clearly a correlation between the size of
each unit of I/O and the contention for what is on each unit of I/O.

Quote:
64k blocks are generally too small on modern disks. A 64k block can
be transferred in a tenth of the time it takes to seek to it.

Why is that a problem? Isn't it more efficient if I can satisfy the same
query by reading 100 64K blocks instead of 100 1M blocks?

Yes, but a DBMS will often be able to satisfy a given query by reading
fewer blocks. For example full table scans are much more efficient
with 1M blocks. Also, if you increase the block size by 10x then the
height of a B+Tree will tend to smaller. For example a tree of height
3 may be able to index a trillion rather than only a billion records.

I think that if the disk subsystem is sophisticated enough, the performance
benefit of an increased block size is lost. For example, if the controller
caches entire tracks to eliminate latency, then a larger block size would
not improve performance one bit--in fact, it would tend to reduce it because
of the vastly increased volume of data that must be transferred from the
cache to RAM. In the example above, you would have to move more than 16
times as much data from the cache to RAM to answer the same query.

Quote:
The advantages of a larger block size are more apparent in a database
storing data where there is a greater tendency for locality based on
affinity to be useful. For example, it would be rather silly to use
64k blocks to store multi-resolution terra pixel images.

It would be equally silly to physically store multi-resolution terra pixel
images alongside scalar data that can be joined or restricted on. If
necessary, place the image heap on a separate disk subsystem with a separate
stripe size and depth, but store the scalar data on disk subsystem with a
stripe size optimal for computing joins and restrictions on it.




Reply With Quote
  #160  
Old   
Brian Selzer
 
Posts: n/a

Default Re: sequential disk read speed - 08-30-2008 , 02:32 PM




"David BL" <davidbl (AT) iinet (DOT) net.au> wrote

Quote:
On Aug 30, 9:14 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:6463ace9-eddf-4889-8e65-17d070220a94 (AT) t1g2000pra (DOT) googlegroups.com...

On Aug 29, 7:47 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message
news:f6ef3678-c7e9-4cd2-acaf-13cac28819d6 (AT) a1g2000hsb (DOT) googlegroups.com...

[snip]

You're oversimplifying. With a stripe size of 64K, it is highly
unlikely
that a leaf node will span more than one stripe; therefore, it is
highly
unlikely for every drive to contribute to the reading of every leaf
node.

I don't see how I'm oversimplifying.

My point is that stripes need to be at least as coarse as the DBMS
block size. Do you agree?

Yes, I think the stripe size should be a multiple of the DBMS block size.

The choice of DBMS block size is another question entirely.

Also, you appear to be discounting concurrency, and environments
where
concurrency is important such as typical OLTP environments are where
technologies such as elevator seeking are most effective.

Concurrency has nothing to do with the fact that if the stripe size is
too small the seeking of the drives won't be independent.

That wasn't why I brought up concurrency. You dismissed elevator seeking
as
an optimization mechanism with respect to the number of seeks required by
focusing on what appeared to be a single query. Ultimately the same
number
of seeks will be required for a particular query, but when combined with
ninety-nine other queries, some of those seeks can be shared with other
queries, thus reducing the total number of seeks required to satisfy the
hundred.

I'm not sure how I implied that elevator seeking isn't worthwhile.

You didn't, and I didn't say that you did. But your implication that it
doesn't affect the number of seeks, while true, is an oversimplification in
that it doesn't take into account that in a concurrent environment, many of
those seeks can be shared by other queries.

Quote:
By the way, Oracle documentation states that an 8K block size is
optimal
for
most systems and defaults DB_FILE_MULTIBLOCK_READ_COUNT to 8. 8K * 8
=
64K.
Interestingly, Sql Server uses 8K pages organized into 64K extents,
which
happens to be the unit of physical storage allocation. Do you know
something they don't?

Sql Server 6.5 used 2k pages and this changed to 8k pages in Sql
Server 7.0 released in 1998. Do you expect that 64k extents are still
optimal a decade later given that the product of transfer rate and
seek time has been steadily increasing?

Sql Server 2008 still uses 8K pages and 64K extents. Also, the Oracle
documentation that cited an 8K block size as being optimal was for their
latest version, 11g.

I think that 64K extents are still optimal because the technology
employed
for serializing updates is still locking, and in a concurrent environment
with an escalating locking heirachy, a page size or extent size that is
too
large will cause transactions to block more often. Do you know of a
sound
and practicable alternative to an escalating locking heirarchy for
serializing updates?

I think these systems lock pages not extents, and anyway locking
granularity can in principle be orthogonal to the unit of I/O or unit
of allocation.

I don't see how since there is clearly a correlation between the size of
each unit of I/O and the contention for what is on each unit of I/O.

Quote:
64k blocks are generally too small on modern disks. A 64k block can
be transferred in a tenth of the time it takes to seek to it.

Why is that a problem? Isn't it more efficient if I can satisfy the same
query by reading 100 64K blocks instead of 100 1M blocks?

Yes, but a DBMS will often be able to satisfy a given query by reading
fewer blocks. For example full table scans are much more efficient
with 1M blocks. Also, if you increase the block size by 10x then the
height of a B+Tree will tend to smaller. For example a tree of height
3 may be able to index a trillion rather than only a billion records.

I think that if the disk subsystem is sophisticated enough, the performance
benefit of an increased block size is lost. For example, if the controller
caches entire tracks to eliminate latency, then a larger block size would
not improve performance one bit--in fact, it would tend to reduce it because
of the vastly increased volume of data that must be transferred from the
cache to RAM. In the example above, you would have to move more than 16
times as much data from the cache to RAM to answer the same query.

Quote:
The advantages of a larger block size are more apparent in a database
storing data where there is a greater tendency for locality based on
affinity to be useful. For example, it would be rather silly to use
64k blocks to store multi-resolution terra pixel images.

It would be equally silly to physically store multi-resolution terra pixel
images alongside scalar data that can be joined or restricted on. If
necessary, place the image heap on a separate disk subsystem with a separate
stripe size and depth, but store the scalar data on disk subsystem with a
stripe size optimal for computing joins and restrictions on it.




Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.