dbTalk Databases Forums  

How can a group by clause be modified so the end weeks of the yearwon't be split?

comp.databases.theory comp.databases.theory


Discuss How can a group by clause be modified so the end weeks of the yearwon't be split? in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Ted Byers
 
Posts: n/a

Default How can a group by clause be modified so the end weeks of the yearwon't be split? - 01-28-2009 , 02:10 PM






I have a SELECT statement similar to the following:

SELECT YEAR(transaction_date) AS y,WEEK(transaction_date) AS w, COUNT
(*) AS c FROM transaction_data_view WHERE mid = 300
GROUP BY YEAR(transaction_date),WEEK(transaction_date)
ORDER BY YEAR(transaction_date),WEEK(transaction_date);

The ONLY problem, here, is that if New Years day occurs in the middle
of the week, that week's data will be split at the first second of the
New Year. Of course, my real select statement is much more complex
than this, involving joins of both tables and views (and it is quick),
but this suffices to make the only remaining problem obvious.

If it matters, this is being done in MySQL.

So, how, then, can I fix the GROUP BY and ORDER BY clauses?

Thanks,

Ted

Reply With Quote
  #2  
Old   
Hugo Kornelis
 
Posts: n/a

Default Re: How can a group by clause be modified so the end weeks of the year won't be split? - 01-28-2009 , 03:53 PM






On Wed, 28 Jan 2009 12:10:34 -0800 (PST), Ted Byers wrote:

Quote:
I have a SELECT statement similar to the following:

SELECT YEAR(transaction_date) AS y,WEEK(transaction_date) AS w, COUNT
(*) AS c FROM transaction_data_view WHERE mid = 300
GROUP BY YEAR(transaction_date),WEEK(transaction_date)
ORDER BY YEAR(transaction_date),WEEK(transaction_date);

The ONLY problem, here, is that if New Years day occurs in the middle
of the week, that week's data will be split at the first second of the
New Year. Of course, my real select statement is much more complex
than this, involving joins of both tables and views (and it is quick),
but this suffices to make the only remaining problem obvious.

If it matters, this is being done in MySQL.

So, how, then, can I fix the GROUP BY and ORDER BY clauses?
Hi Ted,

I don't know the date calculation functions in MySQL, but a generic
approach would be to either
a) calculate number of weeks since a fixed date, or
b) calculate number of days since a fixed date and divide by 7 (using
integer division, or if MySQL doesn't have that, truncating the result
to integer).

The former is shorter and easier to understand, but depends on you and
the server agreeing on what day is considered the first day of the week.
The latter gives you ultimate control over what day of the week is
first, by changing the fixed date in the calculation.

In SQL Server, the code would be

GROUP BY DATEDIFF(week, '20000101', transaction_date)
or
GROUP BY DATEDIFF(day, '20000101', transaction_date) / 7

(I chose Jan 1st 2000 at random, I have no idea what day of the week
will be the week boundary and what date to use to get saturday, sunday,
or monday as first day of the week - some experimenting should help).

Best, Hugo


Reply With Quote
  #3  
Old   
toby
 
Posts: n/a

Default Re: How can a group by clause be modified so the end weeks of theyear won't be split? - 01-28-2009 , 03:57 PM





Ted Byers wrote:
Quote:
I have a SELECT statement similar to the following:

SELECT YEAR(transaction_date) AS y,WEEK(transaction_date) AS w, COUNT
(*) AS c FROM transaction_data_view WHERE mid = 300
GROUP BY YEAR(transaction_date),WEEK(transaction_date)
ORDER BY YEAR(transaction_date),WEEK(transaction_date);

The ONLY problem, here, is that if New Years day occurs in the middle
of the week, that week's data will be split at the first second of the
New Year. Of course, my real select statement is much more complex
than this, involving joins of both tables and views (and it is quick),
but this suffices to make the only remaining problem obvious.
Try GROUP BY TO_DAYS(transaction_date+x) DIV 7
Where x is a correction to give you the correct week starting day.

Quote:
If it matters, this is being done in MySQL.

So, how, then, can I fix the GROUP BY and ORDER BY clauses?

Thanks,

Ted

Reply With Quote
  #4  
Old   
Bob Badour
 
Posts: n/a

Default Re: How can a group by clause be modified so the end weeks of theyear won't be split? - 01-28-2009 , 04:11 PM



Ted Byers wrote:
Quote:
I have a SELECT statement similar to the following:

SELECT YEAR(transaction_date) AS y,WEEK(transaction_date) AS w, COUNT
(*) AS c FROM transaction_data_view WHERE mid = 300
GROUP BY YEAR(transaction_date),WEEK(transaction_date)
ORDER BY YEAR(transaction_date),WEEK(transaction_date);

The ONLY problem, here, is that if New Years day occurs in the middle
of the week, that week's data will be split at the first second of the
New Year. Of course, my real select statement is much more complex
than this, involving joins of both tables and views (and it is quick),
but this suffices to make the only remaining problem obvious.

If it matters, this is being done in MySQL.

So, how, then, can I fix the GROUP BY and ORDER BY clauses?

Thanks,

Ted
Normalize the date for calculating the year. Instead of grouping by the
YEAR of the current date, group by the YEAR of the first week of the
year that contains the current date. I will leave as an exercise how to
calculate the first week from the current date using the YEAR and WEEK
functions above plus whatever date arithmetic operators MySQL uses.


Reply With Quote
  #5  
Old   
Philipp Post
 
Posts: n/a

Default Re: How can a group by clause be modified so the end weeks of theyear won't be split? - 01-29-2009 , 05:10 AM



Ted,

Quote:
The ONLY problem, here, is that if New Years day occurs in the middle of the week, that week's data will be split at the first second of the New Year. *
A nobody said it yet: a calendar table often might help in such a
situation. A web search brings up some samples how to construct that.
Basically you create a table holding all date values in which your
busineess might occurr. Then you add columns for the week number and
whatever you else may need. Then a simple JOIN operation on the date
columns helps to get the desired week number out of the calendar
table.

brgds

Philipp Post


Reply With Quote
  #6  
Old   
Bob Badour
 
Posts: n/a

Default Re: How can a group by clause be modified so the end weeks of theyear won't be split? - 01-29-2009 , 08:29 AM



Philipp Post wrote:

Quote:
Ted,

The ONLY problem, here, is that if New Years day occurs in the middle of the week, that week's data will be split at the first second of the New Year.

A nobody said it yet: a calendar table often might help in such a
situation. A web search brings up some samples how to construct that.
Basically you create a table holding all date values in which your
busineess might occurr. Then you add columns for the week number and
whatever you else may need. Then a simple JOIN operation on the date
columns helps to get the desired week number out of the calendar
table.

brgds

Philipp Post
Since a function is just a relation with particular qualities, what is
the theoretical difference between joining to a table to get the WEEK
and calling a WEEK function?


Reply With Quote
  #7  
Old   
-CELKO-
 
Posts: n/a

Default Re: How can a group by clause be modified so the end weeks of theyear won't be split? - 01-29-2009 , 09:21 AM



Quote:
Since a function is just a relation with particular qualities, what is the theoretical difference between joining to a table to get the WEEK and calling a WEEK function?
The theoretical answer is that one defines the set by intention and
the other extension.

The practical answer is that a look-up table is adjustable for
holidays and irregular reporting periods. And there is a very good
chance it will run faster because of simple indexing and in parallel
because the table can be hashed over several processors.



Reply With Quote
  #8  
Old   
Ted Byers
 
Posts: n/a

Default Re: How can a group by clause be modified so the end weeks of theyear won't be split? - 01-29-2009 , 11:00 AM



Thanks one and all, for the ideas.

The following scriptlet defining a couple functions is what I came up
with as drop in alternates to WEEK() and YEAR(). Of course, the names
could be better, and the arguments could be made more rationale (i.e.
to take a DATE rather than a much too long VARCHAR - but this was a
quick and dirty first attempt to handle the data that is there using
ideas inspired by your responses, which includes fine grained
timestamps along with date).

DELIMITER //

DROP FUNCTION IF EXISTS `adjWholeWeek`//
CREATE FUNCTION `adjWholeWeek`(dt VARCHAR(256)) RETURNS INTEGER
BEGIN
DECLARE theDate DATE DEFAULT DATE(dt);
DECLARE priorSunday DATE DEFAULT DATE_ADD(theDate,INTERVAL -
(DAYOFWEEK(theDate)-1) DAY);
DECLARE rv INTEGER DEFAULT WEEK(priorSunday);
RETURN rv;
END
//

DROP FUNCTION IF EXISTS `adjWholeYEAR`//
CREATE FUNCTION `adjWholeYEAR`(dt VARCHAR(256)) RETURNS INTEGER
BEGIN
DECLARE theDate DATE DEFAULT DATE(dt);
DECLARE priorSunday DATE DEFAULT DATE_ADD(theDate,INTERVAL -
(DAYOFWEEK(theDate)-1) DAY);
DECLARE rv INTEGER DEFAULT YEAR(priorSunday);
RETURN rv;
END
//

DELIMITER ;


I figure creating functions like these, once validated and refined,
will provide a solution for all time, avoiding maintenance issues that
would be associated with calendar tables. This code is to be used in
risk reports generated automatically (in a cron job or scheduled task)
and I don't want to have to worry about that failing, or giving bad
output, because someone forgot to update a calendar table for a coming
year in time.

I did not opt for the tables in part because this situation is simpler
than the situation described in relation to tables. This business
operates 24-7-52" i.e. no holidays, ever. I think, if I had to worry
about holidays, given this is a global business, I'd do the date
manipulation in Perl (which has a number of marvelous packages for
handling them) because I'd then be facing a nightmare of time zones
interacting with a huge variety of national and religious holidays as
they'd exist across the planet. Thank God this business knows nothing
of holidays (and the data is entirely stored in UTC)!!!! Timezones
complicate things enough as it is (at present, we use them only for
displaying data in certain reports sent to clients).

Thanks again

Ted

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.