dbTalk Databases Forums  

consolidate time periods

comp.databases comp.databases


Discuss consolidate time periods in the comp.databases forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
--CELKO--
 
Posts: n/a

Default consolidate time periods - 03-24-2007 , 11:47 AM






Can I get a little help? Given a very simple table of events, with
their durations, I want to consolidate time periods for an event that
(1) touch each other or (2) overlap each other.

CREATE TABLE Events
(event_id VARCHAR(25) NOT NULL,
start_date DATE NOT NULL,
end_date DATE NOT NULL,
CHECK (start_date <= end_date),
PRIMARY KEY (event_id, start_date, end_date));

INSERT INTO Events
VALUES ('Pie Eating', '2009-01-01', '2009-01-02'),
('Pie Eating', '2009-01-03', '2009-01-05'),
('Pie Eating', '2009-01-04', '2009-01-07'),
('Pie Eating', '2009-02-01', '2009-02-07');

The goal is to reduce these three rows into fewer rows that show how
many days we were eating pies.

('Pie Eating', '2009-01-01', '2009-01-07')

The first and second rows in the INSERT INTO statement touch each
other and can be replaced with:

('Pie Eating', '2009-01-01', '2009-01-05')

The third row will overlap with this new row and can be consolidated
with it, as shown before. However, the fourth row has data that
occurs a month after the first three and it is disjoint from all the
other rows.

What I want is:

1) A portable, simple cursor solution
2) A Recursive CTE solution

I am going to use this in a book, so you will get credit.


Reply With Quote
  #2  
Old   
David Cressey
 
Posts: n/a

Default Re: consolidate time periods - 03-24-2007 , 12:24 PM







"--CELKO--" <jcelko212 (AT) earthlink (DOT) net> wrote

Quote:
Can I get a little help? Given a very simple table of events, with
their durations, I want to consolidate time periods for an event that
(1) touch each other or (2) overlap each other.

CREATE TABLE Events
(event_id VARCHAR(25) NOT NULL,
start_date DATE NOT NULL,
end_date DATE NOT NULL,
CHECK (start_date <= end_date),
PRIMARY KEY (event_id, start_date, end_date));

INSERT INTO Events
VALUES ('Pie Eating', '2009-01-01', '2009-01-02'),
('Pie Eating', '2009-01-03', '2009-01-05'),
('Pie Eating', '2009-01-04', '2009-01-07'),
('Pie Eating', '2009-02-01', '2009-02-07');

The goal is to reduce these three rows into fewer rows that show how
many days we were eating pies.

('Pie Eating', '2009-01-01', '2009-01-07')

The first and second rows in the INSERT INTO statement touch each
other and can be replaced with:

('Pie Eating', '2009-01-01', '2009-01-05')

The third row will overlap with this new row and can be consolidated
with it, as shown before. However, the fourth row has data that
occurs a month after the first three and it is disjoint from all the
other rows.

What I want is:

1) A portable, simple cursor solution
2) A Recursive CTE solution

I am going to use this in a book, so you will get credit.

Joe,

Have you considered the possiblity of storing all the intervals where 'Pie
Eating' was NOT taking place? It seems easy to me, once you have such a set,
to make a minimal set of intervals like the ones you specify.

The (minimal) number of intervals where Pie Eating was not taking place is
within one of the (minimal) number of intervals where Pie Eating was
taking place (one or more times). This remains true, even if the past is
extended indefinitely into prehistory. You of course do not know for the
future whether Pie eating will or will not take place in the future,
barring some machine like "Back to the Future".

Whenever you insert a new Pie Eating event into the database, you will
either shrink one of the non eating intervals by a computable interval, or
you will fragment a non eating interval into two disjoint intervals, or you
will make no changes at all.

I haven't worked out the details yet, but this seems to be the outline of a
workable solution.






Reply With Quote
  #3  
Old   
--CELKO--
 
Posts: n/a

Default Re: consolidate time periods - 03-24-2007 , 01:26 PM



Quote:
Have you considered the possiblity of storing all the intervals where 'Pie Eating' was NOT taking place?
Nope, I haven't! Interesting idea. Thinking out loud, I would need a
starting time for my universe and could use a NULL as a token for
Eternity.

Quote:
I haven't worked out the details yet, but this seems to be the outline of a workable solution.
I have my old solutions from SQL FOR SMARTIES, but I have the feeling
that the new OLAP stuff and CTEs have to be better. As far as finding
the overlaps, I tend to use a Calendar table instead of self-joins:

SELECT E.event_id, C.cal_date
FROM Calendar AS C, Events AS E
WHERE C.cal_date BETWEEN E.start_date AND E.end_date
AND C.cal_date <= CURRENT_TIMESTAMP
GROUP BY E.event_id, C.cal_date
HAVING COUNT(event_id) > 1;




Reply With Quote
  #4  
Old   
David Cressey
 
Posts: n/a

Default Re: consolidate time periods - 03-24-2007 , 01:26 PM




"David Cressey" <cressey73 (AT) verizon (DOT) net> wrote

Quote:
"--CELKO--" <jcelko212 (AT) earthlink (DOT) net> wrote in message
news:1174758437.830959.214400 (AT) y80g2000hsf (DOT) googlegroups.com...
Can I get a little help? Given a very simple table of events, with
their durations, I want to consolidate time periods for an event that
(1) touch each other or (2) overlap each other.

CREATE TABLE Events
(event_id VARCHAR(25) NOT NULL,
start_date DATE NOT NULL,
end_date DATE NOT NULL,
CHECK (start_date <= end_date),
PRIMARY KEY (event_id, start_date, end_date));

INSERT INTO Events
VALUES ('Pie Eating', '2009-01-01', '2009-01-02'),
('Pie Eating', '2009-01-03', '2009-01-05'),
('Pie Eating', '2009-01-04', '2009-01-07'),
('Pie Eating', '2009-02-01', '2009-02-07');

The goal is to reduce these three rows into fewer rows that show how
many days we were eating pies.

('Pie Eating', '2009-01-01', '2009-01-07')

The first and second rows in the INSERT INTO statement touch each
other and can be replaced with:

('Pie Eating', '2009-01-01', '2009-01-05')

The third row will overlap with this new row and can be consolidated
with it, as shown before. However, the fourth row has data that
occurs a month after the first three and it is disjoint from all the
other rows.

What I want is:

1) A portable, simple cursor solution
2) A Recursive CTE solution

I am going to use this in a book, so you will get credit.


Joe,

Have you considered the possiblity of storing all the intervals where 'Pie
Eating' was NOT taking place? It seems easy to me, once you have such a
set,
to make a minimal set of intervals like the ones you specify.

The (minimal) number of intervals where Pie Eating was not taking place is
within one of the (minimal) number of intervals where Pie Eating was
taking place (one or more times). This remains true, even if the past is
extended indefinitely into prehistory. You of course do not know for the
future whether Pie eating will or will not take place in the future,
barring some machine like "Back to the Future".

Whenever you insert a new Pie Eating event into the database, you will
either shrink one of the non eating intervals by a computable interval,
or
you will fragment a non eating interval into two disjoint intervals, or
you
will make no changes at all.

I haven't worked out the details yet, but this seems to be the outline of
a
workable solution.




After looking at it for a few minutes, I'm pretty sure that my above answer
was completely bunk.
Sigh!





Reply With Quote
  #5  
Old   
Lennart
 
Posts: n/a

Default Re: consolidate time periods - 03-24-2007 , 02:28 PM



--CELKO-- wrote:
Quote:
Can I get a little help? Given a very simple table of events, with
their durations, I want to consolidate time periods for an event that
(1) touch each other or (2) overlap each other.

CREATE TABLE Events
(event_id VARCHAR(25) NOT NULL,
start_date DATE NOT NULL,
end_date DATE NOT NULL,
CHECK (start_date <= end_date),
PRIMARY KEY (event_id, start_date, end_date));

INSERT INTO Events
VALUES ('Pie Eating', '2009-01-01', '2009-01-02'),
('Pie Eating', '2009-01-03', '2009-01-05'),
('Pie Eating', '2009-01-04', '2009-01-07'),
('Pie Eating', '2009-02-01', '2009-02-07');

The goal is to reduce these three rows into fewer rows that show how
many days we were eating pies.

('Pie Eating', '2009-01-01', '2009-01-07')

The first and second rows in the INSERT INTO statement touch each
other and can be replaced with:

('Pie Eating', '2009-01-01', '2009-01-05')

The third row will overlap with this new row and can be consolidated
with it, as shown before. However, the fourth row has data that
occurs a month after the first three and it is disjoint from all the
other rows.

What I want is:

1) A portable, simple cursor solution
2) A Recursive CTE solution

I am going to use this in a book, so you will get credit.

A naive approach would be to calculate a calendar and then generate the
gaps. Each inteval must exist in the minimum gap around its endpoints. So:

with cal(d) as (
select min(start_date) - 1 day from events
union all
select d + 1 day from cal
where d < (select max(end_date) + 1 day from events)
), gap(d) as (
select d from cal c
where not exists (
select 1 from events
where c.d between start_date and end_date
)
), interval(event_id,start_date,lower_bound,end_date, upper_bound) as (
select e.event_id, e.start_date, max(g1.d) + 1 day,
e.end_date, min(g2.d) - 1 day
from gap g1, gap g2, events e
where g1.d < e.start_date
and g2.d > e.end_date
group by e.event_id, e.start_date, end_date
) select distinct event_id, lower_bound, upper_bound
from interval

However, the gaps are possible to derive without the calendar since we
can enumerate the dates:

select distinct
e.event_id,
(select max(start_date)
from events e2
where e2.start_date <= e.start_date
and not exists (
select 1 from events e3
where e2.start_date - 1 day between e3.start_date
and e3.end_date)),
(select min(end_date) from events e4
where e4.end_date >= e.end_date
and not exists (
select 1 from events e5
where e4.end_date + 1 day between e5.start_date
and e5.end_date))
from events e

I havent thought about it but perhaps it is possible to do something
clever with the "window clause" and an aggregate function. If I get the
time I'll try to think of something


/Lennart


Reply With Quote
  #6  
Old   
Joachim Pense
 
Posts: n/a

Default Re: consolidate time periods - 03-25-2007 , 01:34 AM



Am 24 Mar 2007 10:47:17 -0700 schrieb --CELKO--:

Quote:
Can I get a little help? Given a very simple table of events, with
their durations, I want to consolidate time periods for an event that
(1) touch each other or (2) overlap each other.

CREATE TABLE Events
(event_id VARCHAR(25) NOT NULL,
start_date DATE NOT NULL,
end_date DATE NOT NULL,
CHECK (start_date <= end_date),
PRIMARY KEY (event_id, start_date, end_date));

INSERT INTO Events
VALUES ('Pie Eating', '2009-01-01', '2009-01-02'),
('Pie Eating', '2009-01-03', '2009-01-05'),
('Pie Eating', '2009-01-04', '2009-01-07'),
('Pie Eating', '2009-02-01', '2009-02-07');

Sorry, no contribution, only a curious question:

What is the reason for putting the end_date into the PK? Just an
efficiency consideration to enable index-only execution plans, or are
you indeed having events in mind that need to be identified by both
start and end date?

Joachim


Reply With Quote
  #7  
Old   
Sylwester Lewandowski
 
Posts: n/a

Default Re: consolidate time periods - 03-25-2007 , 10:55 AM



--CELKO-- napisaƂ(a):
Quote:
Can I get a little help? Given a very simple table of events, with
their durations, I want to consolidate time periods for an event that
(1) touch each other or (2) overlap each other.

CREATE TABLE Events
(event_id VARCHAR(25) NOT NULL,
start_date DATE NOT NULL,
end_date DATE NOT NULL,
CHECK (start_date <= end_date),
PRIMARY KEY (event_id, start_date, end_date));

INSERT INTO Events
VALUES ('Pie Eating', '2009-01-01', '2009-01-02'),
('Pie Eating', '2009-01-03', '2009-01-05'),
('Pie Eating', '2009-01-04', '2009-01-07'),
('Pie Eating', '2009-02-01', '2009-02-07');

The goal is to reduce these three rows into fewer rows that show how
many days we were eating pies.

('Pie Eating', '2009-01-01', '2009-01-07')

The first and second rows in the INSERT INTO statement touch each
other and can be replaced with:

('Pie Eating', '2009-01-01', '2009-01-05')

The third row will overlap with this new row and can be consolidated
with it, as shown before. However, the fourth row has data that
occurs a month after the first three and it is disjoint from all the
other rows.

What I want is:

1) A portable, simple cursor solution
2) A Recursive CTE solution

I am going to use this in a book, so you will get credit.

Hello,

I use SQL Server and I had to use DATEADD function.
I hope this recursive cte solution will work as well in other cases.


WITH EventsCTE AS
(
SELECT E1.event_id,E1.start_date, E1.end_date
FROM Events E1
UNION ALL
SELECT E.event_id,E.start_date, CTE.end_date
FROM EventsCTE CTE
INNER JOIN Events E
ON (
E.event_id = CTE.event_id
AND
(
(CTE.start_date = DATEADD(DAY,1,E.end_date))
OR
(E.end_date >= CTE.start_date AND E.end_date <= CTE.end_date)
AND
(E.start_date < CTE.start_date)
)
)
UNION ALL
SELECT E.event_id,CTE.start_date, E.end_date
FROM EventsCTE CTE
INNER JOIN Events E
ON (
E.event_id = CTE.event_id
AND
(
(CTE.end_date = DATEADD(DAY,-1,E.start_date) )
OR
(E.start_date >= CTE.start_date AND E.start_date <= CTE.end_date)
AND
(E.end_date > CTE.end_date)
)
)
UNION ALL
SELECT E.event_id,E.start_date, E.end_date
FROM EventsCTE CTE
INNER JOIN Events E
ON (E.event_id = CTE.event_id AND (CTE.start_date > E.start_date AND
CTE.end_date < E.end_date) )
)
SELECT event_id, start_date, end_date
FROM EventsCTE
EXCEPT
SELECT event_id, start_date, end_date
FROM EventsCTE C1
WHERE EXISTS (
SELECT *
FROM EventsCTE C2
WHERE (C1.start_date = C2.start_date AND C1.end_date < C2.end_date)
OR (C1.start_date > C2.start_date AND C1.end_date < C2.end_date)
OR (C1.start_date > C2.start_date AND C1.end_date = C2.end_date)
AND C1.event_id = C2.event_id
)


event_id start_date end_date
--------------------- ------------------------ ---------
Pie Eating 2009-01-01 00:00:00.000 2009-01-07 00:00:00.000
Pie Eating 2009-02-01 00:00:00.000 2009-02-07 00:00:00.000





Reply With Quote
  #8  
Old   
--CELKO--
 
Posts: n/a

Default Re: consolidate time periods - 03-25-2007 , 12:57 PM



It looks good; the only change I would make is to get rid of the extra
parentheses, extra spacing and use BETWEEN predicates.

WITH EventsCTE
AS
(SELECT E1.event_id, E1.start_date, E1.end_date
FROM Events AS E1

UNION ALL
SELECT E.event_id, E.start_date, CTE.end_date
FROM EventsCTE AS CTE, Events AS E
WHERE (E.event_id = CTE.event_id
AND (CTE.start_date = E.end_date + INTERVAL '1' DAY
OR (E.end_date >= CTE.start_date
AND E.end_date <= CTE.end_date)
AND (E.start_date < CTE.start_date))
)

UNION ALL
SELECT E.event_id, CTE.start_date, E.end_date
FROM EventsCTE AS CTE, Events as E
WHERE E.event_id = CTE.event_id
AND ((CTE.end_date = E.start_date - INTERVAL '1' DAY)
OR (E.start_date >= CTE.start_date
AND E.start_date <= CTE.end_date)
AND (E.end_date > CTE.end_date)))

UNION ALL
SELECT E.event_id, E.start_date, E.end_date
FROM EventsCTE AS CTE, Events AS E
WHERE E.event_id = CTE.event_id
AND CTE.start_date > E.start_date
AND CTE.end_date < E.end_date

SELECT event_id, start_date, end_date
FROM EventsCTE
EXCEPT
SELECT event_id, start_date, end_date
FROM EventsCTE AS C1
WHERE EXISTS
(SELECT *
FROM EventsCTE AS C2
WHERE C1.event_id = C2.event_id)
AND ((C1.start_date = C2.start_date
AND C1.end_date < C2.end_date)
OR (C1.start_date > C2.start_date
AND C1.end_date < C2.end_date)
OR (C1.start_date > C2.start_date
AND C1.end_date = C2.end_date));



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.