dbTalk Databases Forums  

Partial sort?

comp.databases.mysql comp.databases.mysql


Discuss Partial sort? in the comp.databases.mysql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Roy Smith
 
Posts: n/a

Default Partial sort? - 10-22-2010 , 07:31 AM






In a query such as:

SELECT id FROM foo WHERE status = 'NORMAL' ORDER BY id LIMIT k

(status is an enum, with an index)

Does MySQL select all the rows which meet the WHERE condition, sort them
all, then pick the top k? Or is it smart enough to optimize the
combined ORDER & LIMIT operations into a single partial sort which runs
in O(n log k), as opposed to O(n log n)?

Reply With Quote
  #2  
Old   
Captain Paralytic
 
Posts: n/a

Default Re: Partial sort? - 10-22-2010 , 08:03 AM






On Oct 22, 1:31*pm, Roy Smith <r... (AT) panix (DOT) com> wrote:
Quote:
In a query such as:

SELECT id FROM foo WHERE status = 'NORMAL' ORDER BY id LIMIT k

(status is an enum, with an index)

Does MySQL select all the rows which meet the WHERE condition, sort them
all, then pick the top k? *Or is it smart enough to optimize the
combined ORDER & LIMIT operations into a single partial sort which runs
in O(n log k), as opposed to O(n log n)?
If the index was status + id, then it could do it without needing to
sort at all.

Reply With Quote
  #3  
Old   
Erick T. Barkhuis
 
Posts: n/a

Default Re: Partial sort? - 10-22-2010 , 09:08 AM



Captain Paralytic:

Quote:
On Oct 22, 1:31*pm, Roy Smith <r... (AT) panix (DOT) com> wrote:
In a query such as:

SELECT id FROM foo WHERE status = 'NORMAL' ORDER BY id LIMIT k

(status is an enum, with an index)

Does MySQL select all the rows which meet the WHERE condition,
sort them all, then pick the top k?

If the index was status + id, then it could do it without needing to
sort at all.
How would MySql, using that index, know what the greatest or smallest
10 id's are without sorting them? Is an index always already sorted?

[I'm not saying it isn't...I just don't know and would like to
hear some more]


--
Erick

Reply With Quote
  #4  
Old   
Willem Bogaerts
 
Posts: n/a

Default Re: Partial sort? - 10-22-2010 , 09:27 AM



Quote:
How would MySql, using that index, know what the greatest or smallest
10 id's are without sorting them? Is an index always already sorted?
That is the whole purpose of an index. It speeds up searching
tremendously, as you can do a binary search.

(See http://www.topcoder.com/tc?module=St...2=binarySearch )

Best regards,
--
Willem Bogaerts

Application smith
Kratz B.V.
http://www.kratz.nl/

Reply With Quote
  #5  
Old   
The Natural Philosopher
 
Posts: n/a

Default Re: Partial sort? - 10-22-2010 , 10:55 AM



Erick T. Barkhuis wrote:
Quote:
Captain Paralytic:

On Oct 22, 1:31 pm, Roy Smith <r... (AT) panix (DOT) com> wrote:
In a query such as:

SELECT id FROM foo WHERE status = 'NORMAL' ORDER BY id LIMIT k

(status is an enum, with an index)

Does MySQL select all the rows which meet the WHERE condition,
sort them all, then pick the top k?

If the index was status + id, then it could do it without needing to
sort at all.

How would MySql, using that index, know what the greatest or smallest
10 id's are without sorting them? Is an index always already sorted?

[I'm not saying it isn't...I just don't know and would like to
hear some more]


its sort of sorted yes. Broadly that's what index MEANS.

Reply With Quote
  #6  
Old   
Jerry Stuckle
 
Posts: n/a

Default Re: Partial sort? - 10-22-2010 , 11:04 AM



On 10/22/2010 11:55 AM, The Natural Philosopher wrote:
Quote:
Erick T. Barkhuis wrote:
Captain Paralytic:

On Oct 22, 1:31 pm, Roy Smith <r... (AT) panix (DOT) com> wrote:
In a query such as:

SELECT id FROM foo WHERE status = 'NORMAL' ORDER BY id LIMIT k

(status is an enum, with an index)

Does MySQL select all the rows which meet the WHERE condition,
sort them all, then pick the top k?

If the index was status + id, then it could do it without needing to
sort at all.

How would MySql, using that index, know what the greatest or smallest
10 id's are without sorting them? Is an index always already sorted?

[I'm not saying it isn't...I just don't know and would like to hear
some more]


its sort of sorted yes. Broadly that's what index MEANS.
You don't have "sort of sorted" - it's either sorted or not sorted.
Indexes are ALWAYS sorted - that's their purpose.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex (AT) attglobal (DOT) net
==================

Reply With Quote
  #7  
Old   
Erick T. Barkhuis
 
Posts: n/a

Default Re: Partial sort? - 10-22-2010 , 11:53 AM



Willem Bogaerts:

Quote:
How would MySql, using that index, know what the greatest or
smallest 10 id's are without sorting them? Is an index always
already sorted?

That is the whole purpose of an index. It speeds up searching
tremendously, as you can do a binary search.

(See
http://www.topcoder.com/tc?module=St...2=binarySearch
)

Please allow me to ask a bit further. I'm still not quite sure how it
works. I thought that the purpose of indexes was to enable the database
software to directly find a record in the disk structure.

Suppose we have these records (never mind the symantics):

id date city
-------------------------
12 2009-08-12 London
14 2009-07-16 Athens
15 2010-01-10 Paris


Now, if I have an index on 'city', what do I actually have?
- just an array [0=>Athens, 1=>London, 2=>Paris] ?
- an array [Athens=>???, London=>???, Paris=>???] ?
- the full table ordered by 'city'?
- something else?

If it's the first mentioned structure, what good would that do to
direct searches as in:
SELECT date FROM mytable WHERE city="London"
?
Just the first array wouldn't speed up the search, so I suppose there's
more in the index than just the values in a proper order.



--
Erick

Reply With Quote
  #8  
Old   
Lennart Jonsson
 
Posts: n/a

Default Re: Partial sort? - 10-22-2010 , 01:10 PM



On 2010-10-22 18:53, Erick T. Barkhuis wrote:
[...]
Quote:
Suppose we have these records (never mind the symantics):

id date city
-------------------------
12 2009-08-12 London
14 2009-07-16 Athens
15 2010-01-10 Paris


Now, if I have an index on 'city', what do I actually have?
- just an array [0=>Athens, 1=>London, 2=>Paris] ?
- an array [Athens=>???, London=>???, Paris=>???] ?
- the full table ordered by 'city'?
- something else?

Think of the index as a sorted tree where the leaf nodes point to a page
in the data. You will typically find the row that match your predicate
with log(n) i/o operations, and then fetch any columns not part of the
index. For a large table this will save a lot of time, for a small table
the optimizer would probably choose to ignore the index and scan the
table instead.

/Lennart

[...]

Reply With Quote
  #9  
Old   
Jerry Stuckle
 
Posts: n/a

Default Re: Partial sort? - 10-22-2010 , 02:18 PM



On 10/22/2010 2:10 PM, Lennart Jonsson wrote:
Quote:
On 2010-10-22 18:53, Erick T. Barkhuis wrote:
[...]

Suppose we have these records (never mind the symantics):

id date city
-------------------------
12 2009-08-12 London
14 2009-07-16 Athens
15 2010-01-10 Paris


Now, if I have an index on 'city', what do I actually have?
- just an array [0=>Athens, 1=>London, 2=>Paris] ?
- an array [Athens=>???, London=>???, Paris=>???] ?
- the full table ordered by 'city'?
- something else?


Think of the index as a sorted tree where the leaf nodes point to a page
in the data. You will typically find the row that match your predicate
with log(n) i/o operations, and then fetch any columns not part of the
index. For a large table this will save a lot of time, for a small table
the optimizer would probably choose to ignore the index and scan the
table instead.

/Lennart

[...]
Erick, an index isn't an array at all (an array is a programming
concept, not a file concept). If anything, an index would be closer to
rows in a database or spreadsheet. And while the exact implementation
details may vary from one database manufacturer to another, the concept
is the same.

Each index entry is directly related to a single row of data. The entry
has two parts: the data in the columns used to create the index, and a
pointer to the row's physical location in the database.

Entries in the index are sorted by the indexed columns. This allows very
fast access to the a matching index entry via a binary sort. And once
the index entry is found, there is a direct pointer to the data.

Now, when a new row is added to the table, it can be places anywhere
there is space in the table. The database then builds an index entry
and adds it in the appropriate place. As this is almost never at the
end of the index, any entries following the location of the new index
entry may have to be read in and rewritten (it depends on the exact
implementation details - but some entries always need to be rewritten).
The same is true of an UPDATE which affects the index column(s) or a
DELETE. This is why you don't index everything - while retrieval may be
faster, other operations more time for each index you have.

So let's take an example - for simplicity, assume rows of a fixed size
of 50 bytes (variable row size makes no difference):

Offset id
0 114
50 23
100 free space
150 88
200 free space
250 42

An index on the id would have:

id offset
23 50
42 250
88 150
114 0

Now obviously there isn't a lot of advantage to an index on a table with
only 4 rows, but assume there were 4,000,000 rows.

To get to id 42 (because it happens to be the last) without an index, we
would have to read 6 rows (table scan). id 88 would require 4 rows, etc.

But doing a binary search on the index, we need a maximum of 3 reads to
get to any id, then we can directly access the data via the offset.

If we had 1M rows, we would have a maximum of 1M rows to read to get the
data we want, with an average of 500K rows. But with a binary search of
the index, we only need a maximum of 21 reads to get to the appropriate
index entry, then one read for the data itself.

Does this help?

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex (AT) attglobal (DOT) net
==================

Reply With Quote
  #10  
Old   
Doug Miller
 
Posts: n/a

Default Re: Partial sort? - 10-22-2010 , 09:16 PM



In article <8idk6iFb6pU1 (AT) mid (DOT) individual.net>, "Erick T. Barkhuis" <erick.use-net (AT) ardane (DOT) c.o.m> wrote:

Quote:
How would MySql, using that index, know what the greatest or smallest
10 id's are without sorting them? Is an index always already sorted?
An index wouldn't be much use if it weren't sorted. Same concept exactly as
the index in a book.

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.