dbTalk Databases Forums  

Why does this work?

comp.databases.mysql comp.databases.mysql


Discuss Why does this work? in the comp.databases.mysql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Captain Paralytic
 
Posts: n/a

Default Why does this work? - 02-19-2011 , 06:57 AM






I'm hoping that Axel can answer this one for me.

I have a need to add row numbers to the result set of a query, to give
ranking within a particular sort order. A sample query (in practice
there are a lot more columns extracted and calculated and which may be
used as a sort column) is:

SELECT
@downloads_per_month := total_downloads/2 downloads_per_month,
FORMAT(@downloads_per_month,GREATEST(PERIOD_DIFF(E XTRACT(YEAR_MONTH
FROM NOW()),EXTRACT(YEAR_MONTH FROM member_since)),1))
downloads_per_month_display,
@start := @start + 1 position
FROM contributors2
JOIN (SELECT @start:=0) as start
ORDER BY downloads_per_month
LIMIT 20

Now the above query correctly numbers the rows from 1 to 20.
However, if instead I sort on downloads_per_month_display, the row
numbering goes haywire. An EXPLAIN shows that the "good" query uses
filesort, whereas the "bad" one uses temporary and then filesort.

Now what I don't get is why the "good" query actually works. Since the
value of downloads_per_month isn't known until it is extracted, the
query cannot use an index to pull the rows in the correct order. I
would have expected that MySQL would pull the data for a row,
calculate all the calculated columns (including the position one) and
then once it had them all, sort them and take the first 20. This
should mean that, unless I nested the query and did the numbering in
the outer query, the row numbers would always be wrong.

So how does it get the row numbers correct in this case?

Reply With Quote
  #2  
Old   
Tony Mountifield
 
Posts: n/a

Default Re: Why does this work? - 02-19-2011 , 09:15 AM






In article <15d5097d-f220-4b59-8742-d740f32a51ae (AT) l15g2000yqe (DOT) googlegroups.com>,
Captain Paralytic <paul_lautman (AT) yahoo (DOT) com> wrote:
Quote:
I'm hoping that Axel can answer this one for me.

I have a need to add row numbers to the result set of a query, to give
ranking within a particular sort order. A sample query (in practice
there are a lot more columns extracted and calculated and which may be
used as a sort column) is:

SELECT
@downloads_per_month := total_downloads/2 downloads_per_month,
FORMAT(@downloads_per_month,GREATEST(PERIOD_DIFF(E XTRACT(YEAR_MONTH
FROM NOW()),EXTRACT(YEAR_MONTH FROM member_since)),1))
downloads_per_month_display,
@start := @start + 1 position
FROM contributors2
JOIN (SELECT @start:=0) as start
ORDER BY downloads_per_month
LIMIT 20

Now the above query correctly numbers the rows from 1 to 20.
However, if instead I sort on downloads_per_month_display, the row
numbering goes haywire. An EXPLAIN shows that the "good" query uses
filesort, whereas the "bad" one uses temporary and then filesort.

Now what I don't get is why the "good" query actually works. Since the
value of downloads_per_month isn't known until it is extracted, the
query cannot use an index to pull the rows in the correct order. I
would have expected that MySQL would pull the data for a row,
calculate all the calculated columns (including the position one) and
then once it had them all, sort them and take the first 20. This
should mean that, unless I nested the query and did the numbering in
the outer query, the row numbers would always be wrong.

So how does it get the row numbers correct in this case?
I'm not Axel, but my guess would be as follows:

1. It fetches the required columns from each row, viz. total_downloads
and member_since, and sorts the rows in order based on downloads_per_month.

2. It then delivers the rows, calculating the @variables as it goes,
including any dependent columns, such as the downloads_per_month_display
column that is dependent on the value of @downloads_per_month. Since
@start is calculated as the rows are delivered, it comes out in the
correct order.

3. If you instead sort on downloads_per_month_display, it needs to deliver
all the rows somewhere (temporary), to generate the dependent columns
before it can do the sort. So by the time it does the sort, it has already
calculated the @start value for each row.

4. You don't need to use @downloads_per_month as a temporary variable. I
think if you just repeat total_downloads/2 instead, it should work, since
it can calculate the value of downloads_per_month_display and sort on it
without having to wait for a variable that is only calculated during
initial row delivery. However, you might still get unexpected results due
to the commas that get inserted by FORMAT. In fact, I can't understand
what your FORMAT is doing - it looks like you are using the number of
months as the number of decimal places! Did you mean to put a '/' before
GREATEST rather than a ','? I would remove FORMAT and do the
formatting either in a wrapper query or else at the application layer
(e.g. with number_format() in PHP).

SELECT
total_downloads/2 downloads_per_month,
(total_downloads/2)/GREATEST(
PERIOD_DIFF(
EXTRACT(YEAR_MONTH FROM NOW()),
EXTRACT(YEAR_MONTH FROM member_since)
),
1
) downloads_per_month_display,
@start := @start + 1 position
FROM contributors2
JOIN (SELECT @start:=0) as start
ORDER BY downloads_per_month_display
LIMIT 20

I haven't tried it, since I lack a data set to try it on.

Cheers
Tony
--
Tony Mountifield
Work: tony (AT) softins (DOT) co.uk - http://www.softins.co.uk
Play: tony (AT) mountifield (DOT) org - http://tony.mountifield.org

Reply With Quote
  #3  
Old   
Axel Schwenke
 
Posts: n/a

Default Re: Why does this work? - 02-19-2011 , 01:18 PM



tony (AT) mountifield (DOT) org (Tony Mountifield) wrote:
Quote:
Captain Paralytic <paul_lautman (AT) yahoo (DOT) com> wrote:

SELECT
@downloads_per_month := total_downloads/2 downloads_per_month,
FORMAT(@downloads_per_month,GREATEST(PERIOD_DIFF(E XTRACT(YEAR_MONTH
FROM NOW()),EXTRACT(YEAR_MONTH FROM member_since)),1))
downloads_per_month_display,
@start := @start + 1 position
FROM contributors2
JOIN (SELECT @start:=0) as start
ORDER BY downloads_per_month
LIMIT 20

Now the above query correctly numbers the rows from 1 to 20.
However, if instead I sort on downloads_per_month_display, the row
numbering goes haywire.

I'm not Axel
But I am!

Quote:
but my guess would be as follows:
No need to guess. Quoting from

http://dev.mysql.com/doc/refman/5.1/...variables.html

"As a general rule, you should never assign a value to a user variable
and read the value within the same statement. You might get the results
you expect, but this is not guaranteed."

So indeed Paul relies on undefined behavior. Also

"In a SELECT statement, each select expression is evaluated only when
sent to the client. This means that in a HAVING, GROUP BY, or ORDER BY
clause, referring to a variable that is assigned a value in the select
expression list does not work as expected"

This second statement has to be taken with a pinch of salt, because
expressions are also evaluated if the result goes to a temporary table.
But we can learn that expressions mith be evaluated late in query
execution, after filtering and maybe even sorting rows.

So if a query requires a temporary table but one still wants to have
line numbers, one should use a "derived table" subquery like so:

SELECT @start:=@start+1 AS position, inner.*
FROM (/* the original query, including SORT, LIMIT */) AS inner
JOIN (SELECT @start:=0) as start

BTW, that bogus JOIN to initialize @start is a nice trick. Haven't
seen that before! Of course this also relies on undefined behavior -
the JOIN order. Better use STRAIGHT_JOIN and put the initializer
part first.


XL

Reply With Quote
  #4  
Old   
The Natural Philosopher
 
Posts: n/a

Default Re: Why does this work? - 02-19-2011 , 06:43 PM



Axel Schwenke wrote:
Quote:
tony (AT) mountifield (DOT) org (Tony Mountifield) wrote:
Captain Paralytic <paul_lautman (AT) yahoo (DOT) com> wrote:

SELECT
@downloads_per_month := total_downloads/2 downloads_per_month,
FORMAT(@downloads_per_month,GREATEST(PERIOD_DIFF(E XTRACT(YEAR_MONTH
FROM NOW()),EXTRACT(YEAR_MONTH FROM member_since)),1))
downloads_per_month_display,
@start := @start + 1 position
FROM contributors2
JOIN (SELECT @start:=0) as start
ORDER BY downloads_per_month
LIMIT 20

Now the above query correctly numbers the rows from 1 to 20.
However, if instead I sort on downloads_per_month_display, the row
numbering goes haywire.
I'm not Axel

But I am!

but my guess would be as follows:

No need to guess. Quoting from

http://dev.mysql.com/doc/refman/5.1/...variables.html

"As a general rule, you should never assign a value to a user variable
and read the value within the same statement. You might get the results
you expect, but this is not guaranteed."

So indeed Paul relies on undefined behavior. Also

"In a SELECT statement, each select expression is evaluated only when
sent to the client. This means that in a HAVING, GROUP BY, or ORDER BY
clause, referring to a variable that is assigned a value in the select
expression list does not work as expected"

This second statement has to be taken with a pinch of salt, because
expressions are also evaluated if the result goes to a temporary table.
But we can learn that expressions mith be evaluated late in query
execution, after filtering and maybe even sorting rows.

So if a query requires a temporary table but one still wants to have
line numbers, one should use a "derived table" subquery like so:

SELECT @start:=@start+1 AS position, inner.*
FROM (/* the original query, including SORT, LIMIT */) AS inner
JOIN (SELECT @start:=0) as start

BTW, that bogus JOIN to initialize @start is a nice trick. Haven't
seen that before! Of course this also relies on undefined behavior -
the JOIN order. Better use STRAIGHT_JOIN and put the initializer
part first.


XL
Something similar bit me when I wanted to sort things on a date field.
They were in very odd order till I realised I had a
select...function(date) as date, order by date...

and when I looked at it, even I couldn't work out what the correct
behaviour out to be. ;-)

That's what happens when you make changes under pressure..without
reading the code through first..

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.