dbTalk Databases Forums  

Group By

microsoft.public.sqlserver.clients microsoft.public.sqlserver.clients


Discuss Group By in the microsoft.public.sqlserver.clients forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Jeremy
 
Posts: n/a

Default Group By - 04-19-2007 , 08:40 AM






I'm trying to aggregate values and grouping the results by defined periods (6
month, 12 month, 18 month, and lifetime). I'd like to do this with one pass,
but using a CASE statement did not aggregrate correctly. The results are
more like 6 months, 7-12 months, and 13-18 months, 19-lifetime. I would like
the results of the 12 month grouping to include all of the 6 month, the 18
month to include all of the 6 and 12 month grouping, etc.


/**/
CASE WHEN pd.accountingdate > 0 THEN 'LIFE'
WHEN pd.accountingdate BETWEEN '10/19/2004' AND '04/19/2007' THEN '18MONTH'
WHEN pd.accountingdate BETWEEN '04/19/2006' AND '04/19/2007' THEN '12MONTH'
WHEN pd.accountingdate BETWEEN '10/19/2006' AND '04/19/2007' THEN '06MONTH'

ELSE NULL
END
/**/

Thanks.

Reply With Quote
  #2  
Old   
Scott Morris
 
Posts: n/a

Default Re: Group By - 04-19-2007 , 09:03 AM






"Jeremy" <Jeremy (AT) discussions (DOT) microsoft.com> wrote

Quote:
I'm trying to aggregate values and grouping the results by defined periods
(6
month, 12 month, 18 month, and lifetime). I'd like to do this with one
pass,
but using a CASE statement did not aggregrate correctly. The results are
more like 6 months, 7-12 months, and 13-18 months, 19-lifetime. I would
like
the results of the 12 month grouping to include all of the 6 month, the 18
month to include all of the 6 and 12 month grouping, etc.


/**/
CASE WHEN pd.accountingdate > 0 THEN 'LIFE'
WHEN pd.accountingdate BETWEEN '10/19/2004' AND '04/19/2007' THEN
'18MONTH'
WHEN pd.accountingdate BETWEEN '04/19/2006' AND '04/19/2007' THEN
'12MONTH'
WHEN pd.accountingdate BETWEEN '10/19/2006' AND '04/19/2007' THEN
'06MONTH'

ELSE NULL
END
In situations like this, you should post DDL, sample data, and the actual
query. A non-working snippet of a query doesn't really help anyone
understand the complete situation.

Making some assumptions, I believe your problem is related to the use of a
single column to represent different periods. The above case expression
represents what? Is it the 6 month data, the 12 month data, ...? It can't
represent more than one "attribute" - in this case, period.

The solution is to generate separate period values. This can be done in one
of two ways. Either you aggregate the periods as separate columns or you
create a situation where you join the data to be aggregated to a table
containing the periods. In the first example, you get period data as
separate columns within the result set. In the second example, you get
period data as separate rows. You decide which way you want to proceed. It
will facilitate discussion to use either Pubs or Northwind for sample data
and queries since most people have those available (and thus do not require
the posting of DDL or sample data).




Reply With Quote
  #3  
Old   
Jeremy
 
Posts: n/a

Default Re: Group By - 04-19-2007 , 10:16 AM



Understood Scott, I've already considered the approach you've suggested, but
let me expand to help everyone better understand (as you recommended).

Using a case statement gives me the following from the Orders table in
Northwind:

customerid|period|orders|freight
AROUT|06MONTH|8|275.6900
AROUT|12MONTH|2|94.7100
AROUT|18MONTH|3|101.5500
BERGS|06MONTH|7|612.0700
BERGS|12MONTH|5|419.5500
BERGS|18MONTH|4|426.2300
BERGS|LIFE|2|101.6700

/* sample statement */
SELECT customerid,
CASE WHEN orderdate BETWEEN '11/06/1997' AND '05/06/1998' THEN '06MONTH'
WHEN orderdate BETWEEN '05/06/1997' AND '05/06/1998' THEN '12MONTH'
WHEN orderdate BETWEEN '11/06/1996' AND '05/06/1998' THEN '18MONTH'
WHEN orderdate > 0 THEN 'LIFE'
ELSE NULL
END period,
COUNT(orderid) orders,
SUM(freight) freight
FROM orders
WHERE customerid IN ('AROUT', 'BERGS')
GROUP BY
customerid,
CASE WHEN orderdate BETWEEN '11/06/1997' AND '05/06/1998' THEN '06MONTH'
WHEN orderdate BETWEEN '05/06/1997' AND '05/06/1998' THEN '12MONTH'
WHEN orderdate BETWEEN '11/06/1996' AND '05/06/1998' THEN '18MONTH'
WHEN orderdate > 0 THEN 'LIFE'
ELSE NULL
END
/**/

However to get the results I seek I have to make four passes:

customerid|period|orders|freight
AROUT|06MONTH|8|275.6900
AROUT|12MONTH|8|275.6900
AROUT|18MONTH|13|471.9500
AROUT|LIFE|13|471.9500
BERGS|06MONTH|7|612.0700
BERGS|12MONTH|7|612.0700
BERGS|18MONTH|16|1457.8500
BERGS|LIFE|18|1559.5200

/* sample statement */
SELECT customerid,
'06MONTH' period,
COUNT(orderid) orders,
SUM(freight) freight
FROM orders
WHERE customerid IN ('AROUT', 'BERGS')
AND orderdate BETWEEN '11/06/1997' AND '05/06/1998'
GROUP BY customerid

UNION
SELECT customerid,
'12MONTH' period,
COUNT(orderid) orders,
SUM(freight) freight
FROM orders
WHERE customerid IN ('AROUT', 'BERGS')
AND orderdate BETWEEN '11/06/1997' AND '05/06/1998'
GROUP BY customerid

UNION
SELECT customerid,
'18MONTH' period,
COUNT(orderid) orders,
SUM(freight) freight
FROM orders

WHERE customerid IN ('AROUT', 'BERGS')
AND orderdate BETWEEN '11/06/1996' AND '05/06/1998'
GROUP BY customerid
UNION
SELECT customerid,
'LIFE' period,
COUNT(orderid) orders,
SUM(freight) freight
FROM orders
WHERE customerid IN ('AROUT', 'BERGS')
AND orderdate > 0
GROUP BY customerid
ORDER BY customerid, period
/**/

The results of the first query are affected by the sequence of the
conditions in the CASE statement. And there's the rub. I'd like to have my
results in rows because I'm aggregrating at least 20 columns and 4 periods
would push it to 80 columns. Thanks for you help.

"Scott Morris" wrote:

Quote:
"Jeremy" <Jeremy (AT) discussions (DOT) microsoft.com> wrote in message
news:79FBE5E0-E112-406A-B406-A5D19626FA96 (AT) microsoft (DOT) com...
I'm trying to aggregate values and grouping the results by defined periods
(6
month, 12 month, 18 month, and lifetime). I'd like to do this with one
pass,
but using a CASE statement did not aggregrate correctly. The results are
more like 6 months, 7-12 months, and 13-18 months, 19-lifetime. I would
like
the results of the 12 month grouping to include all of the 6 month, the 18
month to include all of the 6 and 12 month grouping, etc.


/**/
CASE WHEN pd.accountingdate > 0 THEN 'LIFE'
WHEN pd.accountingdate BETWEEN '10/19/2004' AND '04/19/2007' THEN
'18MONTH'
WHEN pd.accountingdate BETWEEN '04/19/2006' AND '04/19/2007' THEN
'12MONTH'
WHEN pd.accountingdate BETWEEN '10/19/2006' AND '04/19/2007' THEN
'06MONTH'

ELSE NULL
END

In situations like this, you should post DDL, sample data, and the actual
query. A non-working snippet of a query doesn't really help anyone
understand the complete situation.

Making some assumptions, I believe your problem is related to the use of a
single column to represent different periods. The above case expression
represents what? Is it the 6 month data, the 12 month data, ...? It can't
represent more than one "attribute" - in this case, period.

The solution is to generate separate period values. This can be done in one
of two ways. Either you aggregate the periods as separate columns or you
create a situation where you join the data to be aggregated to a table
containing the periods. In the first example, you get period data as
separate columns within the result set. In the second example, you get
period data as separate rows. You decide which way you want to proceed. It
will facilitate discussion to use either Pubs or Northwind for sample data
and queries since most people have those available (and thus do not require
the posting of DDL or sample data).




Reply With Quote
  #4  
Old   
Jeremy
 
Posts: n/a

Default Re: Group By - 04-19-2007 , 10:30 AM



Understood Scott. I've already considered the approaches you've suggested,
but let me expand to help us better understand my problem as you recommended.

The CASE statement approach gives me the following results from the Orders
table in Northwind:

customerid|period|orders|freight
AROUT|06MONTH|8|275.6900
AROUT|12MONTH|2|94.7100 /* freight should = 370.40*/
AROUT|18MONTH|3|101.5500
BERGS|06MONTH|7|612.0700
BERGS|12MONTH|5|419.5500
BERGS|18MONTH|4|426.2300
BERGS|LIFE|2|101.6700

/* sample code */
SELECT customerid,
CASE
WHEN orderdate BETWEEN '11/06/1997' AND '05/06/1998' THEN '06MONTH'
WHEN orderdate BETWEEN '05/06/1997' AND '05/06/1998' THEN '12MONTH'
WHEN orderdate BETWEEN '11/06/1996' AND '05/06/1998' THEN '18MONTH'
WHEN orderdate > 0 THEN 'LIFE'
ELSE NULL
END period,
COUNT(orderid) orders,
SUM(freight) freight

FROM orders

WHERE customerid IN ('AROUT', 'BERGS')
GROUP BY
customerid,
CASE
WHEN orderdate BETWEEN '11/06/1997' AND '05/06/1998' THEN '06MONTH'
WHEN orderdate BETWEEN '05/06/1997' AND '05/06/1998' THEN '12MONTH'
WHEN orderdate BETWEEN '11/06/1996' AND '05/06/1998' THEN '18MONTH'
WHEN orderdate > 0 THEN 'LIFE'
ELSE NULL
END
/**/

However, I would like my results to read:

customerid|period|orders|freight
AROUT|06MONTH|8|275.6900
AROUT|12MONTH|10|370.4000
AROUT|18MONTH|13|471.9500
AROUT|LIFE|13|471.9500
BERGS|06MONTH|7|612.0700
BERGS|12MONTH|12|1031.6200
BERGS|18MONTH|16|1457.8500
BERGS|LIFE|18|1559.5200

/* sample code */
SELECT customerid,
'06MONTH' period,
COUNT(orderid) orders,
SUM(freight) freight
FROM orders
WHERE customerid IN ('AROUT', 'BERGS')
AND orderdate BETWEEN '11/06/1997' AND '05/06/1998'
GROUP BY customerid

UNION
SELECT customerid,
'12MONTH' period,
COUNT(orderid) orders,
SUM(freight) freight
FROM orders
WHERE customerid IN ('AROUT', 'BERGS')
AND orderdate BETWEEN '05/06/1997' AND '05/06/1998'
GROUP BY customerid

UNION
SELECT customerid,
'18MONTH' period,
COUNT(orderid) orders,
SUM(freight) freight
FROM orders

WHERE customerid IN ('AROUT', 'BERGS')
AND orderdate BETWEEN '11/06/1996' AND '05/06/1998'
GROUP BY customerid
UNION
SELECT customerid,
'LIFE' period,
COUNT(orderid) orders,
SUM(freight) freight
FROM orders
WHERE customerid IN ('AROUT', 'BERGS')
AND orderdate > 0
GROUP BY customerid
ORDER BY customerid, period
/**/

The results of the first query are affected by the sequence of the
conditions in the CASE statement. And there's the rub. I need the 18 month
period to include 6 months and 12 months. The only way I know to do that is
through multiple passes. I prefer my results to rows as I'm aggregating at
least 20 columns and if I push the results to columns with 4 different
periods, then I would expect at least 80 columns! Any help is greatly
appreciated. Thanks.


"Scott Morris" wrote:

Quote:
"Jeremy" <Jeremy (AT) discussions (DOT) microsoft.com> wrote in message
news:79FBE5E0-E112-406A-B406-A5D19626FA96 (AT) microsoft (DOT) com...
I'm trying to aggregate values and grouping the results by defined periods
(6
month, 12 month, 18 month, and lifetime). I'd like to do this with one
pass,
but using a CASE statement did not aggregrate correctly. The results are
more like 6 months, 7-12 months, and 13-18 months, 19-lifetime. I would
like
the results of the 12 month grouping to include all of the 6 month, the 18
month to include all of the 6 and 12 month grouping, etc.


/**/
CASE WHEN pd.accountingdate > 0 THEN 'LIFE'
WHEN pd.accountingdate BETWEEN '10/19/2004' AND '04/19/2007' THEN
'18MONTH'
WHEN pd.accountingdate BETWEEN '04/19/2006' AND '04/19/2007' THEN
'12MONTH'
WHEN pd.accountingdate BETWEEN '10/19/2006' AND '04/19/2007' THEN
'06MONTH'

ELSE NULL
END

In situations like this, you should post DDL, sample data, and the actual
query. A non-working snippet of a query doesn't really help anyone
understand the complete situation.

Making some assumptions, I believe your problem is related to the use of a
single column to represent different periods. The above case expression
represents what? Is it the 6 month data, the 12 month data, ...? It can't
represent more than one "attribute" - in this case, period.

The solution is to generate separate period values. This can be done in one
of two ways. Either you aggregate the periods as separate columns or you
create a situation where you join the data to be aggregated to a table
containing the periods. In the first example, you get period data as
separate columns within the result set. In the second example, you get
period data as separate rows. You decide which way you want to proceed. It
will facilitate discussion to use either Pubs or Northwind for sample data
and queries since most people have those available (and thus do not require
the posting of DDL or sample data).




Reply With Quote
  #5  
Old   
Scott Morris
 
Posts: n/a

Default Re: Group By - 04-19-2007 , 12:14 PM



Quote:
The results of the first query are affected by the sequence of the
conditions in the CASE statement. And there's the rub. I'd like to have
my
results in rows because I'm aggregrating at least 20 columns and 4 periods
would push it to 80 columns. Thanks for you help.
The secret is that you need a table of periods - which you then join to your
data while aggregating. This can be done in one pass since each "raw data
row" joins once to each appropriate period. So a row for today joins to the
6 month period while a row from 10 months ago joins to both the 6 and 12
month periods (and so forth). The trick is to generate the period table -
dynamically if needed. Below is a query (that can be used as a derived
table) to do this. Obviously, the variable isn't actually required. You
could create a table-valued function to do the same thing (especially if you
needed to reuse this logic).

declare @today datetime
set @today = DATEDIFF(DAY, 0, CURRENT_TIMESTAMP)

select @today, dateadd(month, -6, @today)
union all
select @today, dateadd(month, -12, @today)
union all
select @today, dateadd(month, -18, @today)

You may need to adjust the logic depending on your definition of periods -
the last day of the month (esp Feb in and out of a leap year) is always fun
for these types of things. With the table of periods, you simply join to
the transaction table where date of transaction is between the period start
and end dates.




Reply With Quote
  #6  
Old   
Jeremy
 
Posts: n/a

Default Re: Group By - 04-23-2007 , 07:12 PM



Your solution is perfect! This increases performance and keeps my DBA happy.
I'll add this technique to my bag of tricks. Thanks Scott!

"Scott Morris" wrote:

Quote:
The results of the first query are affected by the sequence of the
conditions in the CASE statement. And there's the rub. I'd like to have
my
results in rows because I'm aggregrating at least 20 columns and 4 periods
would push it to 80 columns. Thanks for you help.

The secret is that you need a table of periods - which you then join to your
data while aggregating. This can be done in one pass since each "raw data
row" joins once to each appropriate period. So a row for today joins to the
6 month period while a row from 10 months ago joins to both the 6 and 12
month periods (and so forth). The trick is to generate the period table -
dynamically if needed. Below is a query (that can be used as a derived
table) to do this. Obviously, the variable isn't actually required. You
could create a table-valued function to do the same thing (especially if you
needed to reuse this logic).

declare @today datetime
set @today = DATEDIFF(DAY, 0, CURRENT_TIMESTAMP)

select @today, dateadd(month, -6, @today)
union all
select @today, dateadd(month, -12, @today)
union all
select @today, dateadd(month, -18, @today)

You may need to adjust the logic depending on your definition of periods -
the last day of the month (esp Feb in and out of a leap year) is always fun
for these types of things. With the table of periods, you simply join to
the transaction table where date of transaction is between the period start
and end dates.




Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.