dbTalk Databases Forums  

Partial sort?

comp.databases.mysql comp.databases.mysql


Discuss Partial sort? in the comp.databases.mysql forum.



Reply
 
Thread Tools Display Modes
  #11  
Old   
Doug Miller
 
Posts: n/a

Default Re: Partial sort? - 10-22-2010 , 09:21 PM






In article <i9so1r$vko$1 (AT) news (DOT) eternal-september.org>, Jerry Stuckle <jstucklex (AT) attglobal (DOT) net> wrote:
Quote:
On 10/22/2010 2:10 PM, Lennart Jonsson wrote:
On 2010-10-22 18:53, Erick T. Barkhuis wrote:
[...]

Suppose we have these records (never mind the symantics):

id date city
-------------------------
12 2009-08-12 London
14 2009-07-16 Athens
15 2010-01-10 Paris


Now, if I have an index on 'city', what do I actually have?
- just an array [0=>Athens, 1=>London, 2=>Paris] ?
- an array [Athens=>???, London=>???, Paris=>???] ?
- the full table ordered by 'city'?
- something else?


Think of the index as a sorted tree where the leaf nodes point to a page
in the data. You will typically find the row that match your predicate
with log(n) i/o operations, and then fetch any columns not part of the
index. For a large table this will save a lot of time, for a small table
the optimizer would probably choose to ignore the index and scan the
table instead.

/Lennart

[...]

Erick, an index isn't an array at all (an array is a programming
concept, not a file concept). If anything, an index would be closer to
rows in a database or spreadsheet. And while the exact implementation
details may vary from one database manufacturer to another, the concept
is the same.

Each index entry is directly related to a single row of data. The entry
has two parts: the data in the columns used to create the index, and a
pointer to the row's physical location in the database.
Close, but no cigar.

... the data in the columns used to create the index, and the primary key of
the row.

If the index used "a pointer to the row's physical location" that would make
the job of a DBA very difficult indeed.

Reply With Quote
  #12  
Old   
Jerry Stuckle
 
Posts: n/a

Default Re: Partial sort? - 10-22-2010 , 10:08 PM






On 10/22/2010 10:21 PM, Doug Miller wrote:
Quote:
In article<i9so1r$vko$1 (AT) news (DOT) eternal-september.org>, Jerry Stuckle<jstucklex (AT) attglobal (DOT) net> wrote:
On 10/22/2010 2:10 PM, Lennart Jonsson wrote:
On 2010-10-22 18:53, Erick T. Barkhuis wrote:
[...]

Suppose we have these records (never mind the symantics):

id date city
-------------------------
12 2009-08-12 London
14 2009-07-16 Athens
15 2010-01-10 Paris


Now, if I have an index on 'city', what do I actually have?
- just an array [0=>Athens, 1=>London, 2=>Paris] ?
- an array [Athens=>???, London=>???, Paris=>???] ?
- the full table ordered by 'city'?
- something else?


Think of the index as a sorted tree where the leaf nodes point to a page
in the data. You will typically find the row that match your predicate
with log(n) i/o operations, and then fetch any columns not part of the
index. For a large table this will save a lot of time, for a small table
the optimizer would probably choose to ignore the index and scan the
table instead.

/Lennart

[...]

Erick, an index isn't an array at all (an array is a programming
concept, not a file concept). If anything, an index would be closer to
rows in a database or spreadsheet. And while the exact implementation
details may vary from one database manufacturer to another, the concept
is the same.

Each index entry is directly related to a single row of data. The entry
has two parts: the data in the columns used to create the index, and a
pointer to the row's physical location in the database.

Close, but no cigar.

.. the data in the columns used to create the index, and the primary key of
the row.

Which is what I said. However, this may or may not be the primary key
(if a primary key even exists).

Quote:
If the index used "a pointer to the row's physical location" that would make
the job of a DBA very difficult indeed.
Not at all. The physical location is not known to the dba, and the dba
has no control over it - just like a file's physical location on the
disk is unknown to you. However, the disk directory knows where the
file is.

The whole purpose of the index is to be able to get to the data very
quickly. And the fastest way is with a pointer to the data.


--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex (AT) attglobal (DOT) net
==================

Reply With Quote
  #13  
Old   
Doug Miller
 
Posts: n/a

Default Re: Partial sort? - 10-23-2010 , 07:13 AM



In article <i9tjj1$8mn$1 (AT) news (DOT) eternal-september.org>, Jerry Stuckle <jstucklex (AT) attglobal (DOT) net> wrote:
Quote:
On 10/22/2010 10:21 PM, Doug Miller wrote:
In article<i9so1r$vko$1 (AT) news (DOT) eternal-september.org>, Jerry
Stuckle<jstucklex (AT) attglobal (DOT) net> wrote:

Each index entry is directly related to a single row of data. The entry
has two parts: the data in the columns used to create the index, and a
pointer to the row's physical location in the database.

Close, but no cigar.

.. the data in the columns used to create the index, and the primary key of
the row.


Which is what I said.
Umm... no, that's not what you said. You wrote "... the row's physical
location in the database." That's most emphatically *not* the same as the
primary key.

Quote:
However, this may or may not be the primary key
(if a primary key even exists).

If the index used "a pointer to the row's physical location" that would make
the job of a DBA very difficult indeed.

Not at all. The physical location is not known to the dba, and the dba
has no control over it - just like a file's physical location on the
disk is unknown to you.
Which is exactly why a DBA's job would be very, very difficult if indexes
indeed included the physical location of each record in a table.

Quote:
However, the disk directory knows where the
file is.
Which is why indexes point to the primary key, not the physical location.
Quote:
The whole purpose of the index is to be able to get to the data very
quickly. And the fastest way is with a pointer to the data.
Exactly so. But that's *not* a pointer to the physical location of the data.

Reply With Quote
  #14  
Old   
Robert Hairgrove
 
Posts: n/a

Default Re: Partial sort? - 10-23-2010 , 07:42 AM



Doug Miller wrote:
Quote:
In article <i9tjj1$8mn$1 (AT) news (DOT) eternal-september.org>, Jerry Stuckle <jstucklex (AT) attglobal (DOT) net> wrote:
On 10/22/2010 10:21 PM, Doug Miller wrote:
In article<i9so1r$vko$1 (AT) news (DOT) eternal-september.org>, Jerry
Stuckle<jstucklex (AT) attglobal (DOT) net> wrote:

Each index entry is directly related to a single row of data. The entry
has two parts: the data in the columns used to create the index, and a
pointer to the row's physical location in the database.
Close, but no cigar.

.. the data in the columns used to create the index, and the primary key of
the row.

Which is what I said.

Umm... no, that's not what you said. You wrote "... the row's physical
location in the database." That's most emphatically *not* the same as the
primary key.

However, this may or may not be the primary key
(if a primary key even exists).

If the index used "a pointer to the row's physical location" that would make
the job of a DBA very difficult indeed.
Not at all. The physical location is not known to the dba, and the dba
has no control over it - just like a file's physical location on the
disk is unknown to you.

Which is exactly why a DBA's job would be very, very difficult if indexes
indeed included the physical location of each record in a table.

However, the disk directory knows where the
file is.

Which is why indexes point to the primary key, not the physical location.
The whole purpose of the index is to be able to get to the data very
quickly. And the fastest way is with a pointer to the data.

Exactly so. But that's *not* a pointer to the physical location of the data.
Since it is possible to create an index for column(s) of a table which
has no primary key, then the index must point to something else.

Reply With Quote
  #15  
Old   
Jerry Stuckle
 
Posts: n/a

Default Re: Partial sort? - 10-23-2010 , 07:46 AM



On 10/23/2010 8:13 AM, Doug Miller wrote:
Quote:
In article<i9tjj1$8mn$1 (AT) news (DOT) eternal-september.org>, Jerry Stuckle<jstucklex (AT) attglobal (DOT) net> wrote:
On 10/22/2010 10:21 PM, Doug Miller wrote:
In article<i9so1r$vko$1 (AT) news (DOT) eternal-september.org>, Jerry
Stuckle<jstucklex (AT) attglobal (DOT) net> wrote:

Each index entry is directly related to a single row of data. The entry
has two parts: the data in the columns used to create the index, and a
pointer to the row's physical location in the database.

Close, but no cigar.

.. the data in the columns used to create the index, and the primary key of
the row.


Which is what I said.

Umm... no, that's not what you said. You wrote "... the row's physical
location in the database." That's most emphatically *not* the same as the
primary key.

No, I did say the data in the columns is used to create the index. But
the physical location of the data is also there.

Quote:
However, this may or may not be the primary key
(if a primary key even exists).

If the index used "a pointer to the row's physical location" that would make
the job of a DBA very difficult indeed.

Not at all. The physical location is not known to the dba, and the dba
has no control over it - just like a file's physical location on the
disk is unknown to you.

Which is exactly why a DBA's job would be very, very difficult if indexes
indeed included the physical location of each record in a table.

They do, and the actual value is immaterial to the DBA, just like a
file's physical location on the disk is immaterial (and unknown) to the
sysadmin.

Quote:
However, the disk directory knows where the
file is.

Which is why indexes point to the primary key, not the physical location.
There is no requirement that the table even have a primary key, so your
statement is provably false. And even if it did have a primary key, if
the primary key were separate from the rest of the data, how would the
system get to the rest of the data?

Quote:
The whole purpose of the index is to be able to get to the data very
quickly. And the fastest way is with a pointer to the data.

Exactly so. But that's *not* a pointer to the physical location of the data.
I can see you have never worked with the internals of a database. It is
a pointer to the physical location of the data. The example I gave
earlier is quite close to what MySQL does.

If you want to argue, I first suggest you spend a few days in the MySQL
source code.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex (AT) attglobal (DOT) net
==================

Reply With Quote
  #16  
Old   
Peter H. Coffin
 
Posts: n/a

Default Re: Partial sort? - 10-23-2010 , 09:25 AM



On Sat, 23 Oct 2010 08:46:41 -0400, Jerry Stuckle wrote:
Quote:
On 10/23/2010 8:13 AM, Doug Miller wrote:
Which is why indexes point to the primary key, not the physical location.

There is no requirement that the table even have a primary key, so your
statement is provably false. And even if it did have a primary key, if
the primary key were separate from the rest of the data, how would the
system get to the rest of the data?


The whole purpose of the index is to be able to get to the data very
quickly. And the fastest way is with a pointer to the data.

Exactly so. But that's *not* a pointer to the physical location of the data.

I can see you have never worked with the internals of a database. It is
a pointer to the physical location of the data. The example I gave
earlier is quite close to what MySQL does.
*grin* As an aside, this is one place where a little time on a
particular not-unix machine can also come in handy. Physical files on
the IBM System 38/AS-400/iSeries/i5 machines are tables, with structured
column/field in them but are programmatically accessable also by
"Relative Record Number". It's exactly what it sounds like: it's the
number of the record in the file, starting with 1 up to however many
records the file has ever contained since the last reorganization, in
the order of writing, mostly. (That is, if you tell the file it can
reuse rows, it will, so if you delete a row that happens to have been in
RRN 37, and insert a new row, it will end up in where 37 was, assigned
RRN 37, but bearing no other relation to the delete row and probably
"belong" elsewhere. But indexes take care of that part.)

This all happens in spite of primary keys and one of the routine
maintenance operations is that files that change a lot can be
"reorganized", which is an explicit rebuilding of the physical file
(obviously with exclusive access to the file during the rebuild),
purging out the delete rows and putting the rows in order by the primary
key(s).

Why I'm even talking about this is that it makes very clear (to i5
programmers, at least) what goes into the index entries along with keyed
values: the RRN. It also makes it clear why and when one shouldn't use
the RRN for anything, that the RRN is subject to change between openings
of the the file, and that smart kids let the OS-maintained indexes
keep track of such things for them. Part of the difficulty new people
have with databases and SQL in general is that SQL doesn't talk about
these concepts at all. Now, that's for good and sufficient reason:
it's keeping the n00bs from setting their hair on fire by trying to
use something like a RRN inappropriately (like trying to do manually
something that the database can handle already), and there's not been a
case to my knowlege where a well-chosen index wasn't good enough in
light of those risks.

--
14. The hero is not entitled to a last kiss, a last cigarette, or any
other form of last request.
--Peter Anspach's list of things to do as an Evil Overlord

Reply With Quote
  #17  
Old   
Jerry Stuckle
 
Posts: n/a

Default Re: Partial sort? - 10-23-2010 , 03:03 PM



On 10/23/2010 10:25 AM, Peter H. Coffin wrote:
Quote:
On Sat, 23 Oct 2010 08:46:41 -0400, Jerry Stuckle wrote:
On 10/23/2010 8:13 AM, Doug Miller wrote:
Which is why indexes point to the primary key, not the physical location.

There is no requirement that the table even have a primary key, so your
statement is provably false. And even if it did have a primary key, if
the primary key were separate from the rest of the data, how would the
system get to the rest of the data?


The whole purpose of the index is to be able to get to the data very
quickly. And the fastest way is with a pointer to the data.

Exactly so. But that's *not* a pointer to the physical location of the data.

I can see you have never worked with the internals of a database. It is
a pointer to the physical location of the data. The example I gave
earlier is quite close to what MySQL does.

*grin* As an aside, this is one place where a little time on a
particular not-unix machine can also come in handy. Physical files on
the IBM System 38/AS-400/iSeries/i5 machines are tables, with structured
column/field in them but are programmatically accessable also by
"Relative Record Number". It's exactly what it sounds like: it's the
number of the record in the file, starting with 1 up to however many
records the file has ever contained since the last reorganization, in
the order of writing, mostly. (That is, if you tell the file it can
reuse rows, it will, so if you delete a row that happens to have been in
RRN 37, and insert a new row, it will end up in where 37 was, assigned
RRN 37, but bearing no other relation to the delete row and probably
"belong" elsewhere. But indexes take care of that part.)

This all happens in spite of primary keys and one of the routine
maintenance operations is that files that change a lot can be
"reorganized", which is an explicit rebuilding of the physical file
(obviously with exclusive access to the file during the rebuild),
purging out the delete rows and putting the rows in order by the primary
key(s).

Why I'm even talking about this is that it makes very clear (to i5
programmers, at least) what goes into the index entries along with keyed
values: the RRN. It also makes it clear why and when one shouldn't use
the RRN for anything, that the RRN is subject to change between openings
of the the file, and that smart kids let the OS-maintained indexes
keep track of such things for them. Part of the difficulty new people
have with databases and SQL in general is that SQL doesn't talk about
these concepts at all. Now, that's for good and sufficient reason:
it's keeping the n00bs from setting their hair on fire by trying to
use something like a RRN inappropriately (like trying to do manually
something that the database can handle already), and there's not been a
case to my knowlege where a well-chosen index wasn't good enough in
light of those risks.

No, it's not a RRN. That only works if you have fixed length records.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex (AT) attglobal (DOT) net
==================

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.