![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
Hello, I'd like to start by saying I'm not a DBA, rather I am a Java developer. But I'm learning more and more about database design, and it's very interesting. Here is the general issue I'm facing: Which one is better? 1. To perform aggregate queries on data, and use that data to home in and perform more detailed analysis? This gives result sets that are potentially filled with columns that contain the aggregate data repeated across many rows. |
#3
| |||
| |||
|
|
Which one is better? 1. To perform aggregate queries on data, and use that data to home in and perform more detailed analysis? This gives result sets that are potentially filled with columns that contain the aggregate data repeated across many rows. |
|
By the way, I meant here by homing in: within the same query - using nested SELECTS. For example, the inner query performs the aggregate analysis, and returns whatever data we are interested in, and then the outer query performs another SELECT on the same data, resulting in the aggregate data being distributed across 'detailed results' from the outer query. |
#4
| |||
| |||
|
|
The best is of course to get all at once in the same result set, since then you only need to scan the data once. in SQL 2005 you can achieve this with the proprietary extensions WITH CUBE and WITH ROLLUP to the GROUP BY clause. SQL 2008, currently in beta, implements GROUPING SETS from the ANSI standard, and are a lot more powerful. (They are also a lot more confusing to use.) |
#5
| |||
| |||
|
|
The data I have, I want to generate totals and subtotals, say of sales: total sales per store, and subtotal of sales per store per day for example. The problem is that I ultimately only want the top 25 stores in total sales BUT with their subtotals as well... So I used ROLLUP to generate the totals and subtotals... and in the same query I generated a ranking of the data... and there I found my problem: the ranking does not distinguish between totals and subtotals. So the results look like: store1, realTotal, 1 store1, subTotal, 2 store2, realTotal, 3 store7, realTotal, 4 store10, subTotal, 5 Clearly, the rows with subtotals should NOT be ranked... but I am struggling with SQL to figure out how to get exactly what I want. |
#6
| |||
| |||
|
|
The problem is that I ultimately only want the top 25 stores in total sales BUT with their subtotals as well... So I used ROLLUP to generate the totals and subtotals... and in the same query I generated a ranking of the data... and there I found my problem: the ranking does not distinguish between totals and subtotals. So the results look like: store1, realTotal, 1 store1, subTotal, 2 store2, realTotal, 3 store7, realTotal, 4 store10, subTotal, 5 Clearly, the rows with subtotals should NOT be ranked... but I am struggling with SQL to figure out how to get exactly what I want. |
|
Well... I have a solution. But I'm not sure about it. It involves something I learned about only recently: a user-defined aggregate function that concatenates values from across rows. Using this aggregate function, stragg, I can do another pass through the results mentioned above while grouping by store, and produce: store1, 'realTotal subTotal', 1 store2, 'realTotal subTotal', 3 store7, 'realTotal subTotal', 4 store10, 'realTotal subTotal', 5 |
|
...but... a worry: is this REALLY better than doing a nested query? I wonder if the use of the user-defined aggregate function is very expensive... |
#7
| |||
| |||
|
|
(csimam (AT) gmail (DOT) com) writes: The problem is that I ultimately only want the top 25 stores in total sales BUT with their subtotals as well... So I used ROLLUP to generate the totals and subtotals... and in the same query I generated a ranking of the data... and there I found my problem: the ranking does not distinguish between totals and subtotals. So the results look like: store1, realTotal, 1 store1, subTotal, 2 store2, realTotal, 3 store7, realTotal, 4 store10, subTotal, 5 Clearly, the rows with subtotals should NOT be ranked... but I am struggling with SQL to figure out how to get exactly what I want. An interesting challenge to do in one query and only hitting the table once, and I was not able to think of a solution without using the imaginary Previous() function that we have discussed internally among us MVPs. You can easily do in one query, if you hit the table twice, simply one query with GROUP BY store and one with GROUP BY store, month. Rank the first and join. And it's possible to hit the table once, if you first get the monthly aggregations into a temp table. But in a single query? I don't think it's possible is you want the format: Store1, Grand Total, 1000000 Store1, Jan 2000, 12000 Store1, Feb 2000, 10000, Store2, Grand Total, .... But I will have to see if I get Itzik Ben-Gan to tackle this one. Well... I have a solution. But I'm not sure about it. It involves something I learned about only recently: a user-defined aggregate function that concatenates values from across rows. Using this aggregate function, stragg, I can do another pass through the results mentioned above while grouping by store, and produce: store1, 'realTotal subTotal', 1 store2, 'realTotal subTotal', 3 store7, 'realTotal subTotal', 4 store10, 'realTotal subTotal', 5 One problem is that a UDA is limited to 8000 characters, so if there are many subtotals, you query will bomb. ...but... a worry: is this REALLY better than doing a nested query? I wonder if the use of the user-defined aggregate function is very expensive... The way to find out is to run both queries a couple of times on an idle server. Just don't forget about the effects of caching. Either first run the queries once, so that you know that table is entirely in cache. If it's too big for that, run DBCC DROPCLEANBUFFERS between each run, so that both queries run under the same condition. |
#8
| |||
| |||
|
|
An interesting challenge to do in one query and only hitting the table once, and I was not able to think of a solution without using the imaginary Previous() function that we have discussed internally among us MVPs. You can easily do in one query, if you hit the table twice, simply one query with GROUP BY store and one with GROUP BY store, month. Rank the first and join. And it's possible to hit the table once, if you first get the monthly aggregations into a temp table. But in a single query? I don't think it's possible is you want the format: Store1, Grand Total, 1000000 Store1, Jan 2000, 12000 Store1, Feb 2000, 10000, Store2, Grand Total, .... But I will have to see if I get Itzik Ben-Gan to tackle this one. |
#9
| |||
| |||
|
|
To wit, the fastest solution was one that I wrote that uses a temp table to hold the subtotals, and then I compute the grand totals from that one. But the really big surprise was that a fairly naïve query where I scan the table twice, once for the subtotals and one for the grand totals was a lot faster than the queries that were one-pass without using a temp table. The problem they had, is that the execution plan included a spool operator, that is a worktable introduced by the optimizer. |
|
The moral[e]: you know [don't] what is best until you have benchmarked. |
![]() |
| Thread Tools | |
| Display Modes | |
| |