![]() | |
#11
| |||
| |||
|
|
On 10/22/2010 2:10 PM, Lennart Jonsson wrote: On 2010-10-22 18:53, Erick T. Barkhuis wrote: [...] Suppose we have these records (never mind the symantics): id date city ------------------------- 12 2009-08-12 London 14 2009-07-16 Athens 15 2010-01-10 Paris Now, if I have an index on 'city', what do I actually have? - just an array [0=>Athens, 1=>London, 2=>Paris] ? - an array [Athens=>???, London=>???, Paris=>???] ? - the full table ordered by 'city'? - something else? Think of the index as a sorted tree where the leaf nodes point to a page in the data. You will typically find the row that match your predicate with log(n) i/o operations, and then fetch any columns not part of the index. For a large table this will save a lot of time, for a small table the optimizer would probably choose to ignore the index and scan the table instead. /Lennart [...] Erick, an index isn't an array at all (an array is a programming concept, not a file concept). If anything, an index would be closer to rows in a database or spreadsheet. And while the exact implementation details may vary from one database manufacturer to another, the concept is the same. Each index entry is directly related to a single row of data. The entry has two parts: the data in the columns used to create the index, and a pointer to the row's physical location in the database. |
#12
| |||
| |||
|
|
In article<i9so1r$vko$1 (AT) news (DOT) eternal-september.org>, Jerry Stuckle<jstucklex (AT) attglobal (DOT) net> wrote: On 10/22/2010 2:10 PM, Lennart Jonsson wrote: On 2010-10-22 18:53, Erick T. Barkhuis wrote: [...] Suppose we have these records (never mind the symantics): id date city ------------------------- 12 2009-08-12 London 14 2009-07-16 Athens 15 2010-01-10 Paris Now, if I have an index on 'city', what do I actually have? - just an array [0=>Athens, 1=>London, 2=>Paris] ? - an array [Athens=>???, London=>???, Paris=>???] ? - the full table ordered by 'city'? - something else? Think of the index as a sorted tree where the leaf nodes point to a page in the data. You will typically find the row that match your predicate with log(n) i/o operations, and then fetch any columns not part of the index. For a large table this will save a lot of time, for a small table the optimizer would probably choose to ignore the index and scan the table instead. /Lennart [...] Erick, an index isn't an array at all (an array is a programming concept, not a file concept). If anything, an index would be closer to rows in a database or spreadsheet. And while the exact implementation details may vary from one database manufacturer to another, the concept is the same. Each index entry is directly related to a single row of data. The entry has two parts: the data in the columns used to create the index, and a pointer to the row's physical location in the database. Close, but no cigar. .. the data in the columns used to create the index, and the primary key of the row. |
|
If the index used "a pointer to the row's physical location" that would make the job of a DBA very difficult indeed. |
#13
| ||||
| ||||
|
|
On 10/22/2010 10:21 PM, Doug Miller wrote: In article<i9so1r$vko$1 (AT) news (DOT) eternal-september.org>, Jerry Stuckle<jstucklex (AT) attglobal (DOT) net> wrote: Each index entry is directly related to a single row of data. The entry has two parts: the data in the columns used to create the index, and a pointer to the row's physical location in the database. Close, but no cigar. .. the data in the columns used to create the index, and the primary key of the row. Which is what I said. |
|
However, this may or may not be the primary key (if a primary key even exists). If the index used "a pointer to the row's physical location" that would make the job of a DBA very difficult indeed. Not at all. The physical location is not known to the dba, and the dba has no control over it - just like a file's physical location on the disk is unknown to you. |
|
However, the disk directory knows where the file is. |
|
The whole purpose of the index is to be able to get to the data very quickly. And the fastest way is with a pointer to the data. |
#14
| |||
| |||
|
|
In article <i9tjj1$8mn$1 (AT) news (DOT) eternal-september.org>, Jerry Stuckle <jstucklex (AT) attglobal (DOT) net> wrote: On 10/22/2010 10:21 PM, Doug Miller wrote: In article<i9so1r$vko$1 (AT) news (DOT) eternal-september.org>, Jerry Stuckle<jstucklex (AT) attglobal (DOT) net> wrote: Each index entry is directly related to a single row of data. The entry has two parts: the data in the columns used to create the index, and a pointer to the row's physical location in the database. Close, but no cigar. .. the data in the columns used to create the index, and the primary key of the row. Which is what I said. Umm... no, that's not what you said. You wrote "... the row's physical location in the database." That's most emphatically *not* the same as the primary key. However, this may or may not be the primary key (if a primary key even exists). If the index used "a pointer to the row's physical location" that would make the job of a DBA very difficult indeed. Not at all. The physical location is not known to the dba, and the dba has no control over it - just like a file's physical location on the disk is unknown to you. Which is exactly why a DBA's job would be very, very difficult if indexes indeed included the physical location of each record in a table. However, the disk directory knows where the file is. Which is why indexes point to the primary key, not the physical location. The whole purpose of the index is to be able to get to the data very quickly. And the fastest way is with a pointer to the data. Exactly so. But that's *not* a pointer to the physical location of the data. |
#15
| ||||
| ||||
|
|
In article<i9tjj1$8mn$1 (AT) news (DOT) eternal-september.org>, Jerry Stuckle<jstucklex (AT) attglobal (DOT) net> wrote: On 10/22/2010 10:21 PM, Doug Miller wrote: In article<i9so1r$vko$1 (AT) news (DOT) eternal-september.org>, Jerry Stuckle<jstucklex (AT) attglobal (DOT) net> wrote: Each index entry is directly related to a single row of data. The entry has two parts: the data in the columns used to create the index, and a pointer to the row's physical location in the database. Close, but no cigar. .. the data in the columns used to create the index, and the primary key of the row. Which is what I said. Umm... no, that's not what you said. You wrote "... the row's physical location in the database." That's most emphatically *not* the same as the primary key. |
|
However, this may or may not be the primary key (if a primary key even exists). If the index used "a pointer to the row's physical location" that would make the job of a DBA very difficult indeed. Not at all. The physical location is not known to the dba, and the dba has no control over it - just like a file's physical location on the disk is unknown to you. Which is exactly why a DBA's job would be very, very difficult if indexes indeed included the physical location of each record in a table. |
|
However, the disk directory knows where the file is. Which is why indexes point to the primary key, not the physical location. |
|
The whole purpose of the index is to be able to get to the data very quickly. And the fastest way is with a pointer to the data. Exactly so. But that's *not* a pointer to the physical location of the data. |
#16
| |||
| |||
|
|
On 10/23/2010 8:13 AM, Doug Miller wrote: Which is why indexes point to the primary key, not the physical location. There is no requirement that the table even have a primary key, so your statement is provably false. And even if it did have a primary key, if the primary key were separate from the rest of the data, how would the system get to the rest of the data? The whole purpose of the index is to be able to get to the data very quickly. And the fastest way is with a pointer to the data. Exactly so. But that's *not* a pointer to the physical location of the data. I can see you have never worked with the internals of a database. It is a pointer to the physical location of the data. The example I gave earlier is quite close to what MySQL does. |
#17
| |||
| |||
|
|
On Sat, 23 Oct 2010 08:46:41 -0400, Jerry Stuckle wrote: On 10/23/2010 8:13 AM, Doug Miller wrote: Which is why indexes point to the primary key, not the physical location. There is no requirement that the table even have a primary key, so your statement is provably false. And even if it did have a primary key, if the primary key were separate from the rest of the data, how would the system get to the rest of the data? The whole purpose of the index is to be able to get to the data very quickly. And the fastest way is with a pointer to the data. Exactly so. But that's *not* a pointer to the physical location of the data. I can see you have never worked with the internals of a database. It is a pointer to the physical location of the data. The example I gave earlier is quite close to what MySQL does. *grin* As an aside, this is one place where a little time on a particular not-unix machine can also come in handy. Physical files on the IBM System 38/AS-400/iSeries/i5 machines are tables, with structured column/field in them but are programmatically accessable also by "Relative Record Number". It's exactly what it sounds like: it's the number of the record in the file, starting with 1 up to however many records the file has ever contained since the last reorganization, in the order of writing, mostly. (That is, if you tell the file it can reuse rows, it will, so if you delete a row that happens to have been in RRN 37, and insert a new row, it will end up in where 37 was, assigned RRN 37, but bearing no other relation to the delete row and probably "belong" elsewhere. But indexes take care of that part.) This all happens in spite of primary keys and one of the routine maintenance operations is that files that change a lot can be "reorganized", which is an explicit rebuilding of the physical file (obviously with exclusive access to the file during the rebuild), purging out the delete rows and putting the rows in order by the primary key(s). Why I'm even talking about this is that it makes very clear (to i5 programmers, at least) what goes into the index entries along with keyed values: the RRN. It also makes it clear why and when one shouldn't use the RRN for anything, that the RRN is subject to change between openings of the the file, and that smart kids let the OS-maintained indexes keep track of such things for them. Part of the difficulty new people have with databases and SQL in general is that SQL doesn't talk about these concepts at all. Now, that's for good and sufficient reason: it's keeping the n00bs from setting their hair on fire by trying to use something like a RRN inappropriately (like trying to do manually something that the database can handle already), and there's not been a case to my knowlege where a well-chosen index wasn't good enough in light of those risks. |
![]() |
| Thread Tools | |
| Display Modes | |
| |