dbTalk Databases Forums  

Is ROLLUP really better than doing a nested query when rank is involved?

comp.databases.ms-sqlserver comp.databases.ms-sqlserver


Discuss Is ROLLUP really better than doing a nested query when rank is involved? in the comp.databases.ms-sqlserver forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
csimam@gmail.com
 
Posts: n/a

Default Is ROLLUP really better than doing a nested query when rank is involved? - 09-03-2007 , 10:06 PM






Here is a thread I posted elsewhere that I wonder if someone here
could shed some light on...

The original question I had:

--- BEGIN ---
Which one is better?
1. To perform aggregate queries on data, and use that data to home in
and perform more detailed analysis? This gives result sets that are
potentially filled with columns that contain the aggregate data
repeated across many rows.
I meant here by homing in: within the same query - using
nested SELECTS. For example, the inner query performs the aggregate
analysis, and returns whatever data we are interested in, and then
the
outer query performs another SELECT on the same data, resulting in
the
aggregate data being distributed across 'detailed results' from the
outer query.

2. To perform the aggregate query once, and then hit the database
again for a detailed analysis, and return the detailed analysis
results in an order that is easily cross-correlated with the original
aggregate data? This means running the aggregate query essentially
twice, just the second time it is hidden as a subquery.
--- END ---

I realize the question was vague, but a super DBA understood what I
was asking and suggested I study up on ROLLUP and CUBE (I am not a DBA
by the way). So I did... and here is what I found:

--- BEGIN ---
The data I have, I want to generate totals and subtotals, say of
sales: total sales per store, and subtotal of sales per store per day
for example.

The problem is that I ultimately only want the top 25 stores in total
sales BUT with their subtotals as well... So I used ROLLUP to
generate
the totals and subtotals... and in the same query I generated a
ranking of the data... and there I found my problem: the ranking does
not distinguish between totals and subtotals. So the results look
like:

store1, realTotal, 1
store1, subTotal, 2
store2, realTotal, 3
store7, realTotal, 4
store10, realTotal, 5
store10, subTotal, 6

Clearly, the rows with subtotals should NOT be ranked... but I am
struggling with SQL to figure out how to get exactly what I want.

Well... I have a solution. But I'm not sure about it. It involves
something I learned about only recently: a user-defined aggregate
function that concatenates values from across rows. Using this
aggregate function, stragg, I can do another pass through the results
mentioned above while grouping by store, and produce:

store1, 'realTotal subTotal', 1
store2, 'realTotal subTotal', 3
store7, 'realTotal subTotal', 4
store10, 'realTotal subTotal', 5

And another trick: use min( rank ) in the same aggregation, and order
by it as well.

Together that gives one a very compact result set that has everything
ranked and grouped perfectly.

....but... a worry: is this REALLY better than doing a nested query? I
wonder if the use of the user-defined aggregate function is very
expensive...

For the benefit of anyone wondering, here's the structure that
works... in one pass in theory... but again, I am now wondering if it
really is faster than the nested query equivalent that hits the
database twice on the table 'stores' (seems to me that the middle
select below is effectively hitting the table again, just in
temporary
form):

select rownum as rank, id, days, totals
from
(
select id, stragg(day) as days, stragg(total) as totals,
min( rank ) as innerrank
from
(
select id, day, sum(sales) as total, rank() over (order by
sum(sales) desc) rank
from stores
group by id, rollup(day)
)
group by id
order by min(rank)
)
where rownum <= 25

--- END ---

Any comments? Is this query really better than the following:

-- get totals by day
select rank, id, day, total, sum(sales) as dailytotal
from
(
select * from
(
-- get grand totals regardless of day
select id as storeid, sum( sales ) as total,
rank() over (order by sum(sales) desc) rank
from stores
group by id
)
where rank <= 25
), stores
-- NOTE join with stores again
where id = storeid
-- but does the where condition above clip the query and make it
efficient?
group by rank, id, day, total
order by rank, day


Reply With Quote
  #2  
Old   
Jason Lepack
 
Posts: n/a

Default Re: Is ROLLUP really better than doing a nested query when rank is involved? - 09-04-2007 , 04:09 PM






What does the query plan show for the results of these two queries?
you keep talking about somethat should, in theory be correct, but why
not check if it ACTUALLY does this in reality... That would the
easiest way to compare the queries, because that's where it matters.

On Sep 3, 11:06 pm, csi... (AT) gmail (DOT) com wrote:
Quote:
Here is a thread I posted elsewhere that I wonder if someone here
could shed some light on...

The original question I had:

--- BEGIN ---
Which one is better?
1. To perform aggregate queries on data, and use that data to home in
and perform more detailed analysis? This gives result sets that are
potentially filled with columns that contain the aggregate data
repeated across many rows.
I meant here by homing in: within the same query - using
nested SELECTS. For example, the inner query performs the aggregate
analysis, and returns whatever data we are interested in, and then
the
outer query performs another SELECT on the same data, resulting in
the
aggregate data being distributed across 'detailed results' from the
outer query.

2. To perform the aggregate query once, and then hit the database
again for a detailed analysis, and return the detailed analysis
results in an order that is easily cross-correlated with the original
aggregate data? This means running the aggregate query essentially
twice, just the second time it is hidden as a subquery.
--- END ---

I realize the question was vague, but a super DBA understood what I
was asking and suggested I study up on ROLLUP and CUBE (I am not a DBA
by the way). So I did... and here is what I found:

--- BEGIN ---
The data I have, I want to generate totals and subtotals, say of
sales: total sales per store, and subtotal of sales per store per day
for example.

The problem is that I ultimately only want the top 25 stores in total
sales BUT with their subtotals as well... So I used ROLLUP to
generate
the totals and subtotals... and in the same query I generated a
ranking of the data... and there I found my problem: the ranking does
not distinguish between totals and subtotals. So the results look
like:

store1, realTotal, 1
store1, subTotal, 2
store2, realTotal, 3
store7, realTotal, 4
store10, realTotal, 5
store10, subTotal, 6

Clearly, the rows with subtotals should NOT be ranked... but I am
struggling with SQL to figure out how to get exactly what I want.

Well... I have a solution. But I'm not sure about it. It involves
something I learned about only recently: a user-defined aggregate
function that concatenates values from across rows. Using this
aggregate function, stragg, I can do another pass through the results
mentioned above while grouping by store, and produce:

store1, 'realTotal subTotal', 1
store2, 'realTotal subTotal', 3
store7, 'realTotal subTotal', 4
store10, 'realTotal subTotal', 5

And another trick: use min( rank ) in the same aggregation, and order
by it as well.

Together that gives one a very compact result set that has everything
ranked and grouped perfectly.

...but... a worry: is this REALLY better than doing a nested query? I
wonder if the use of the user-defined aggregate function is very
expensive...

For the benefit of anyone wondering, here's the structure that
works... in one pass in theory... but again, I am now wondering if it
really is faster than the nested query equivalent that hits the
database twice on the table 'stores' (seems to me that the middle
select below is effectively hitting the table again, just in
temporary
form):

select rownum as rank, id, days, totals
from
(
select id, stragg(day) as days, stragg(total) as totals,
min( rank ) as innerrank
from
(
select id, day, sum(sales) as total, rank() over (order by
sum(sales) desc) rank
from stores
group by id, rollup(day)
)
group by id
order by min(rank)
)
where rownum <= 25

--- END ---

Any comments? Is this query really better than the following:

-- get totals by day
select rank, id, day, total, sum(sales) as dailytotal
from
(
select * from
(
-- get grand totals regardless of day
select id as storeid, sum( sales ) as total,
rank() over (order by sum(sales) desc) rank
from stores
group by id
)
where rank <= 25
), stores
-- NOTE join with stores again
where id = storeid
-- but does the where condition above clip the query and make it
efficient?
group by rank, id, day, total
order by rank, day



Reply With Quote
  #3  
Old   
csimam@gmail.com
 
Posts: n/a

Default Re: Is ROLLUP really better than doing a nested query when rank is involved? - 09-04-2007 , 09:09 PM



On Sep 5, 12:09 am, Jason Lepack <jlep... (AT) gmail (DOT) com> wrote:
Quote:
What does the query plan show for the results of these two queries?
you keep talking about somethat should, in theory be correct, but why
not check if it ACTUALLY does this in reality... That would the
easiest way to compare the queries, because that's where it matters.

You are right, and I figured I needed to do that. Since I am not a DBA
though, I am still learning how to do that: check the query plan...



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.