
sequential disk read speed

comp.databases.theory


Discuss sequential disk read speed in the comp.databases.theory forum.

#141
Brian Selzer
Re: sequential disk read speed - 08-29-2008, 09:14 PM

"David BL" <davidbl (AT) iinet (DOT) net.au> wrote

Quote:
On Aug 29, 7:47 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message
news:f6ef3678-c7e9-4cd2-acaf-13cac28819d6 (AT) a1g2000hsb (DOT) googlegroups.com...
[snip]

Quote:
You're oversimplifying. With a stripe size of 64K, it is highly unlikely
that a leaf node will span more than one stripe; therefore, it is highly
unlikely for every drive to contribute to the reading of every leaf node.

I don't see how I'm oversimplifying.

My point is that stripes need to be at least as coarse as the DBMS
block size. Do you agree?

Yes, I think the stripe size should be a multiple of the DBMS block size.
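As a rough sketch of why (assuming a simple round-robin RAID-0 mapping;
the drive count, sizes, and offset below are illustrative only), one can
count how many drives a single block read touches:

Code:
# Count how many drives one DBMS block read touches under round-robin
# RAID-0. When the stripe is smaller than the DBMS block, every read
# fans out across the drives, so their head positions are no longer
# independent.

def drives_touched(block_offset, block_size, stripe_size, num_drives):
    """Drive indices hit by reading [block_offset, block_offset + block_size)."""
    first = block_offset // stripe_size
    last = (block_offset + block_size - 1) // stripe_size
    return {s % num_drives for s in range(first, last + 1)}

NUM_DRIVES = 4
BLOCK = 64 * 1024  # one 64K DBMS block

for stripe in (8 * 1024, 64 * 1024, 256 * 1024):
    hit = drives_touched(0, BLOCK, stripe, NUM_DRIVES)
    print(f"stripe {stripe // 1024:>3}K -> block read touches {len(hit)} drive(s)")

# stripe   8K -> block read touches 4 drive(s)  (all heads move together)
# stripe  64K -> block read touches 1 drive(s)  (seeks stay independent)
# stripe 256K -> block read touches 1 drive(s)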

Quote:
The choice of DBMS block size is another question entirely.

Also, you appear to be discounting concurrency, and environments where
concurrency is important, such as typical OLTP environments, are where
technologies such as elevator seeking are most effective.

Concurrency has nothing to do with the fact that if the stripe size is
too small the seeking of the drives won't be independent.

That wasn't why I brought up concurrency. You dismissed elevator seeking as
an optimization mechanism with respect to the number of seeks required by
focusing on what appeared to be a single query. A particular query
ultimately requires the same number of seeks, but when it is combined with
ninety-nine other queries, some of those seeks can be shared among them,
reducing the total number of seeks required to satisfy all hundred.
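
A toy model of that sharing (assuming requests are identified only by
cylinder number and a one-direction elevator sweep; the numbers are made
up for illustration):

Code:
# Total head travel for 100 queued block requests, serviced in arrival
# order (FIFO) versus sorted into one elevator sweep (SCAN).
import random

random.seed(42)
CYLINDERS = 10_000
requests = [random.randrange(CYLINDERS) for _ in range(100)]

def total_travel(order, start=0):
    travel, pos = 0, start
    for cyl in order:
        travel += abs(cyl - pos)
        pos = cyl
    return travel

print("FIFO :", total_travel(requests), "cylinders of head travel")
print("SCAN :", total_travel(sorted(requests)), "cylinders of head travel")
# SCAN's travel is bounded by roughly one pass over the surface, so the
# hundred queries share positioning cost that FIFO pays over and over.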

Quote:
By the way, Oracle documentation states that an 8K block size is optimal
for most systems and defaults DB_FILE_MULTIBLOCK_READ_COUNT to 8.
8K * 8 = 64K. Interestingly, SQL Server uses 8K pages organized into 64K
extents, which happens to be the unit of physical storage allocation. Do
you know something they don't?

SQL Server 6.5 used 2K pages, and this changed to 8K pages in SQL
Server 7.0, released in 1998. Do you expect that 64K extents are still
optimal a decade later, given that the product of transfer rate and
seek time has been steadily increasing?

SQL Server 2008 still uses 8K pages and 64K extents. Also, the Oracle
documentation that cited an 8K block size as optimal was for their
latest version, 11g.

I think that 64K extents are still optimal because the technology employed
for serializing updates is still locking, and in a concurrent environment
with an escalating locking hierarchy, a page size or extent size that is too
large will cause transactions to block more often. Do you know of a sound
and practicable alternative to an escalating locking hierarchy for
serializing updates?
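
To make the escalation mechanism concrete (a miniature sketch only; the
threshold and granularities are invented, and real engines such as SQL
Server escalate row or page locks to a table lock under more elaborate
policies):

Code:
# Miniature lock-escalation sketch: a transaction takes fine-grained row
# locks until a threshold, then trades them for one coarse lock. The
# trade-off: larger pages/extents mean each coarse lock covers more
# data, so unrelated transactions collide more often.

class Table:
    def __init__(self, name, escalation_threshold=5):
        self.name = name
        self.threshold = escalation_threshold
        self.row_locks = set()    # row ids locked individually
        self.table_locked = False

    def lock_row(self, row_id):
        if self.table_locked:
            return                # already covered by the coarse lock
        self.row_locks.add(row_id)
        if len(self.row_locks) >= self.threshold:
            # Escalate: one table lock replaces many row locks.
            self.row_locks.clear()
            self.table_locked = True

t = Table("orders")
for row in range(7):
    t.lock_row(row)
print(f"table locked: {t.table_locked}, row locks held: {len(t.row_locks)}")
# table locked: True, row locks held: 0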

Quote:
64K blocks are generally too small on modern disks. A 64K block can
be transferred in a tenth of the time it takes to seek to it.

Why is that a problem? Isn't it more efficient if I can satisfy the same
query by reading 100 64K blocks instead of 100 1M blocks?
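
Working that "tenth of the time" figure through with round 2008-era
numbers (an assumed 8 ms average positioning time and 80 MB/s sustained
transfer rate; both are assumptions, not measurements):

Code:
# Time to fetch one random block = positioning + transfer.
SEEK_MS = 8.0          # assumed average seek + rotational latency
TRANSFER_MB_S = 80.0   # assumed sustained transfer rate

def fetch_ms(block_bytes):
    transfer = block_bytes / (TRANSFER_MB_S * 1e6) * 1e3  # ms
    return transfer, SEEK_MS + transfer

for size, label in ((64 * 1024, "64K"), (1024 * 1024, " 1M")):
    xfer, total = fetch_ms(size)
    print(f"{label} block: transfer {xfer:5.2f} ms, total {total:5.2f} ms")

# 64K block: transfer  0.82 ms, total  8.82 ms  -> seek dominates ~10:1
#  1M block: transfer 13.11 ms, total 21.11 ms  -> transfer dominates
# 100 random 64K reads ~0.9 s; 100 random 1M reads ~2.1 s. The question
# is whether larger blocks let the DBMS issue few enough reads to win.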






#146
David BL
Re: sequential disk read speed - 08-30-2008, 01:43 AM

On Aug 30, 9:14 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
Quote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:6463ace9-eddf-4889-8e65-17d070220a94 (AT) t1g2000pra (DOT) googlegroups.com...

On Aug 29, 7:47 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message
news:f6ef3678-c7e9-4cd2-acaf-13cac28819d6 (AT) a1g2000hsb (DOT) googlegroups.com...

[snip]

You're oversimplifying. With a stripe size of 64K, it is highly unlikely
that a leaf node will span more than one stripe; therefore, it is highly
unlikely for every drive to contribute to the reading of every leaf node.

I don't see how I'm oversimplifying.

My point is that stripes need to be at least as coarse as the DBMS
block size. Do you agree?

Yes, I think the stripe size should be a multiple of the DBMS block size.

The choice of DBMS block size is another question entirely.

Also, you appear to be discounting concurrency, and environments where
concurrency is important, such as typical OLTP environments, are where
technologies such as elevator seeking are most effective.

Concurrency has nothing to do with the fact that if the stripe size is
too small the seeking of the drives won't be independent.

That wasn't why I brought up concurrency. You dismissed elevator seeking as
an optimization mechanism with respect to the number of seeks required by
focusing on what appeared to be a single query. A particular query
ultimately requires the same number of seeks, but when it is combined with
ninety-nine other queries, some of those seeks can be shared among them,
reducing the total number of seeks required to satisfy all hundred.

I'm not sure how I implied that elevator seeking isn't worthwhile.

Quote:
By the way, Oracle documentation states that an 8K block size is optimal
for most systems and defaults DB_FILE_MULTIBLOCK_READ_COUNT to 8.
8K * 8 = 64K. Interestingly, SQL Server uses 8K pages organized into 64K
extents, which happens to be the unit of physical storage allocation. Do
you know something they don't?

SQL Server 6.5 used 2K pages, and this changed to 8K pages in SQL
Server 7.0, released in 1998. Do you expect that 64K extents are still
optimal a decade later, given that the product of transfer rate and
seek time has been steadily increasing?

SQL Server 2008 still uses 8K pages and 64K extents. Also, the Oracle
documentation that cited an 8K block size as optimal was for their
latest version, 11g.

I think that 64K extents are still optimal because the technology employed
for serializing updates is still locking, and in a concurrent environment
with an escalating locking hierarchy, a page size or extent size that is too
large will cause transactions to block more often. Do you know of a sound
and practicable alternative to an escalating locking hierarchy for
serializing updates?

I think these systems lock pages, not extents, and in any case locking
granularity can in principle be orthogonal to the unit of I/O or the unit
of allocation.

Quote:
64K blocks are generally too small on modern disks. A 64K block can
be transferred in a tenth of the time it takes to seek to it.

Why is that a problem? Isn't it more efficient if I can satisfy the same
query by reading 100 64K blocks instead of 100 1M blocks?

Yes, but a DBMS will often be able to satisfy a given query by reading
fewer blocks. For example, full table scans are much more efficient
with 1M blocks. Also, if you increase the block size by 10x, the
height of a B+Tree will tend to be smaller. For example, a tree of height
3 may be able to index a trillion rather than only a billion records.
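
Checking that height-3 claim with an assumed 64-byte key-plus-pointer
entry (a round illustrative number, not a figure from any particular
product):

Code:
# A height-3 B+Tree addresses roughly fanout**3 records.
ENTRY_BYTES = 64  # assumed size of one key + child pointer

for block in (64 * 1024, 640 * 1024):  # 64K vs. 10x larger blocks
    fanout = block // ENTRY_BYTES
    print(f"{block // 1024:>4}K blocks: fanout {fanout:>5}, "
          f"height-3 capacity ~{fanout ** 3:.1e} records")

#  64K blocks: fanout  1024, height-3 capacity ~1.1e+09 (a billion)
# 640K blocks: fanout 10240, height-3 capacity ~1.1e+12 (a trillion)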

The advantages of a larger block size are more apparent in a database
storing data where locality based on affinity tends to be useful. For
example, it would be rather silly to use 64K blocks to store
multi-resolution terapixel images.


