dbTalk Databases Forums  

Collapse date records into groups

comp.databases.ibm-db2 comp.databases.ibm-db2


Discuss Collapse date records into groups in the comp.databases.ibm-db2 forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Fin
 
Posts: n/a

Default Collapse date records into groups - 06-10-2011 , 06:20 PM






Given a table with the following very small data extract, I am trying
to find a way to collapse this into sets of distinct grouped dates.
Reading the dates by ID, you can see there are distinct 'groups' of
dates that are occurring together and I am trying to find a way of
giving these group_id's with the distinct dates. There are probably
about 10 groups in the below data, just at a glance. I won't need the
ID field as shown, but rather giving the group_id for repeated sets of
dates with the same count. So records 900000 - 9000019 would all
collapse to group_id 1 with the same three dates and so on.

My table has over 300k records and I need a way to collapse it and
create a date_group table with a group_id, a CNT and dates associated
to that distinct group. My table has many other CNT values ranging
from say 2 to 5 dates.

There must be a way of doing this but after struggling for hours I am
drawing a blank. Any bright ideas from those more experienced than I ?

ID CNT DATE
9000000 3 10/25/2002
9000000 3 07/22/2005
9000000 3 02/15/2008
9000015 3 10/25/2002
9000015 3 07/22/2005
9000015 3 02/15/2008
9000018 3 10/25/2002
9000018 3 07/22/2005
9000018 3 02/15/2008
9000019 3 10/25/2002
9000019 3 07/22/2005
9000019 3 02/15/2008
9000321 3 10/25/2002
9000321 3 08/04/2004
9000321 3 02/15/2008
9000322 3 10/25/2002
9000322 3 08/04/2004
9000322 3 02/15/2008
9000323 3 10/25/2002
9000323 3 08/04/2004
9000323 3 02/15/2008
9000382 3 09/07/2001
9000382 3 10/25/2002
9000382 3 02/15/2008
9000383 3 09/07/2001
9000383 3 10/25/2002
9000383 3 02/15/2008
9000384 3 09/07/2001
9000384 3 10/25/2002
9000384 3 02/15/2008
9000407 3 09/07/2001
9000407 3 02/10/2005
9000407 3 02/15/2008
9000408 3 09/07/2001
9000408 3 02/10/2005
9000408 3 02/15/2008
9000420 3 02/15/2008
9000420 3 06/08/2009
9000420 3 06/15/2010
9008929 3 09/07/2001
9008929 3 05/09/2005
9008929 3 02/15/2008
9008941 3 09/07/2001
9008941 3 05/09/2005
9008941 3 02/15/2008
9008942 3 09/07/2001
9008942 3 05/09/2005
9008942 3 02/15/2008
9008951 3 09/07/2001
9008951 3 10/25/2002
9008951 3 02/15/2008
9008952 3 09/07/2001
9008952 3 10/25/2002
9008952 3 02/15/2008
9008953 3 09/07/2001
9008953 3 10/25/2002
9008953 3 02/15/2008
9008954 3 09/07/2001
9008954 3 10/25/2002
9008954 3 02/15/2008
9009146 3 02/15/2008
9009146 3 04/08/2010
9009146 3 06/15/2010
9009147 3 02/15/2008
9009147 3 04/08/2010
9009147 3 06/15/2010
9009150 3 02/15/2008
9009150 3 04/08/2010
9009150 3 06/15/2010
9009151 3 02/15/2008
9009151 3 04/08/2010
9009151 3 06/15/2010
9009152 3 02/15/2008
9009152 3 04/08/2010
9009152 3 06/15/2010
9009153 3 02/15/2008
9009153 3 04/08/2010
9009153 3 06/15/2010
9009154 3 02/15/2008
9009154 3 04/08/2010
9009154 3 06/15/2010
9009157 3 02/15/2008
9009157 3 04/08/2010
9009157 3 06/15/2010
9009302 3 09/07/2001
9009302 3 10/25/2002
9009302 3 02/15/2008
9009303 3 09/07/2001
9009303 3 10/25/2002
9009303 3 02/15/2008
9009304 3 09/07/2001
9009304 3 10/25/2002
9009304 3 02/15/2008
9009305 3 09/07/2001
9009305 3 10/25/2002
9009305 3 02/15/2008
9009306 3 09/07/2001
9009306 3 10/25/2002
9009306 3 02/15/2008
9009307 3 09/07/2001
9009307 3 10/25/2002
9009307 3 02/15/2008

Reply With Quote
  #2  
Old   
Tonkuma
 
Posts: n/a

Default Re: Collapse date records into groups - 06-10-2011 , 09:55 PM






If you are using DB2 9.7 FP 4 for LUW,
you can use LISTAGG OLAP specification, like the following example.

Otherwise, if your using DB2 supports XMLGROUP or XMLAGG,
you can use one of them instead of LISTAGG.

Sample 1: Fing group_id and belonging dates.
------------------------------ Commands Entered
------------------------------
WITH
sample_data(id , cnt , date_col) AS (
SELECT id
, SMALLINT(cnt)
, DATE(date_col)
FROM (
VALUES
(9000000 , 3 , '10/25/2002')
, (9000000 , 3 , '07/22/2005')
, (9000000 , 3 , '02/15/2008')

, (9000015 , 3 , '10/25/2002')
, (9000015 , 3 , '07/22/2005')
, (9000015 , 3 , '02/15/2008')

, (9000018 , 3 , '10/25/2002')
, (9000018 , 3 , '07/22/2005')
, (9000018 , 3 , '02/15/2008')

, (9000019 , 3 , '10/25/2002')
, (9000019 , 3 , '07/22/2005')
, (9000019 , 3 , '02/15/2008')

, (9000321 , 3 , '10/25/2002')
, (9000321 , 3 , '08/04/2004')
, (9000321 , 3 , '02/15/2008')

, (9000322 , 3 , '10/25/2002')
, (9000322 , 3 , '08/04/2004')
, (9000322 , 3 , '02/15/2008')

, (9000323 , 3 , '10/25/2002')
, (9000323 , 3 , '08/04/2004')
, (9000323 , 3 , '02/15/2008')

, (9000382 , 3 , '09/07/2001')
, (9000382 , 3 , '10/25/2002')
, (9000382 , 3 , '02/15/2008')

, (9000383 , 3 , '09/07/2001')
, (9000383 , 3 , '10/25/2002')
, (9000383 , 3 , '02/15/2008')

, (9000384 , 3 , '09/07/2001')
, (9000384 , 3 , '10/25/2002')
, (9000384 , 3 , '02/15/2008')

, (9000407 , 3 , '09/07/2001')
, (9000407 , 3 , '02/10/2005')
, (9000407 , 3 , '02/15/2008')

, (9000408 , 3 , '09/07/2001')
, (9000408 , 3 , '02/10/2005')
, (9000408 , 3 , '02/15/2008')

, (9000420 , 3 , '02/15/2008')
, (9000420 , 3 , '06/08/2009')
, (9000420 , 3 , '06/15/2010')

, (9008929 , 3 , '09/07/2001')
, (9008929 , 3 , '05/09/2005')
, (9008929 , 3 , '02/15/2008')

, (9008941 , 3 , '09/07/2001')
, (9008941 , 3 , '05/09/2005')
, (9008941 , 3 , '02/15/2008')

, (9008942 , 3 , '09/07/2001')
, (9008942 , 3 , '05/09/2005')
, (9008942 , 3 , '02/15/2008')

, (9008951 , 3 , '09/07/2001')
, (9008951 , 3 , '10/25/2002')
, (9008951 , 3 , '02/15/2008')

, (9008952 , 3 , '09/07/2001')
, (9008952 , 3 , '10/25/2002')
, (9008952 , 3 , '02/15/2008')

, (9008953 , 3 , '09/07/2001')
, (9008953 , 3 , '10/25/2002')
, (9008953 , 3 , '02/15/2008')

, (9008954 , 3 , '09/07/2001')
, (9008954 , 3 , '10/25/2002')
, (9008954 , 3 , '02/15/2008')

, (9009146 , 3 , '02/15/2008')
, (9009146 , 3 , '04/08/2010')
, (9009146 , 3 , '06/15/2010')

, (9009147 , 3 , '02/15/2008')
, (9009147 , 3 , '04/08/2010')
, (9009147 , 3 , '06/15/2010')

, (9009150 , 3 , '02/15/2008')
, (9009150 , 3 , '04/08/2010')
, (9009150 , 3 , '06/15/2010')

, (9009151 , 3 , '02/15/2008')
, (9009151 , 3 , '04/08/2010')
, (9009151 , 3 , '06/15/2010')

, (9009152 , 3 , '02/15/2008')
, (9009152 , 3 , '04/08/2010')
, (9009152 , 3 , '06/15/2010')

, (9009153 , 3 , '02/15/2008')
, (9009153 , 3 , '04/08/2010')
, (9009153 , 3 , '06/15/2010')

, (9009154 , 3 , '02/15/2008')
, (9009154 , 3 , '04/08/2010')
, (9009154 , 3 , '06/15/2010')

, (9009157 , 3 , '02/15/2008')
, (9009157 , 3 , '04/08/2010')
, (9009157 , 3 , '06/15/2010')

, (9009302 , 3 , '09/07/2001')
, (9009302 , 3 , '10/25/2002')
, (9009302 , 3 , '02/15/2008')

, (9009303 , 3 , '09/07/2001')
, (9009303 , 3 , '10/25/2002')
, (9009303 , 3 , '02/15/2008')

, (9009304 , 3 , '09/07/2001')
, (9009304 , 3 , '10/25/2002')
, (9009304 , 3 , '02/15/2008')

, (9009305 , 3 , '09/07/2001')
, (9009305 , 3 , '10/25/2002')
, (9009305 , 3 , '02/15/2008')

, (9009306 , 3 , '09/07/2001')
, (9009306 , 3 , '10/25/2002')
, (9009306 , 3 , '02/15/2008')

, (9009307 , 3 , '09/07/2001')
, (9009307 , 3 , '10/25/2002')
, (9009307 , 3 , '02/15/2008')
) s(id , cnt , date_col)
)
SELECT DISTINCT
group_id
, cnt
, SUBSTR(date_list , s_pos ,10) AS date_col
FROM (SELECT DISTINCT
date_list , cnt
, SMALLINT(
DENSE_RANK()
OVER(ORDER BY date_list)
) AS group_id
FROM (SELECT cnt
, CAST(
LISTAGG( CHAR(date_col) )
WITHIN GROUP(ORDER BY date_col)
AS VARCHAR(60)
) AS date_list
FROM sample_data
GROUP BY
id , cnt
)
)
LEFT OUTER JOIN
(VALUES
1 , 11, 21 , 31 , 41) AS p(s_pos)
ON s_pos < LENGTH(date_list)
ORDER BY
group_id
;
------------------------------------------------------------------------------

GROUP_ID CNT DATE_COL
-------- ------ ----------
1 3 2001-09-07
1 3 2002-10-25
1 3 2008-02-15
2 3 2001-09-07
2 3 2005-02-10
2 3 2008-02-15
3 3 2001-09-07
3 3 2005-05-09
3 3 2008-02-15
4 3 2002-10-25
4 3 2004-08-04
4 3 2008-02-15
5 3 2002-10-25
5 3 2005-07-22
5 3 2008-02-15
6 3 2008-02-15
6 3 2009-06-08
6 3 2010-06-15
7 3 2008-02-15
7 3 2010-04-08
7 3 2010-06-15

21 record(s) selected.

Reply With Quote
  #3  
Old   
Tonkuma
 
Posts: n/a

Default Re: Collapse date records into groups - 06-10-2011 , 10:06 PM



Sample 2: date list, group_id and list of belonging ids.

SELECT date_list
, cnt
, SMALLINT(
DENSE_RANK()
OVER(ORDER BY date_list)
) AS group_id
, CAST(
LISTAGG( CHAR(id) , ' | ' )
WITHIN GROUP(ORDER BY id)
AS VARCHAR(200)
) AS id_list
FROM (SELECT id , cnt
, CAST(
LISTAGG( CHAR(date_col) , ' | ' )
WITHIN GROUP(ORDER BY date_col)
AS VARCHAR(40)
) AS date_list
FROM sample_data
GROUP BY
id , cnt
) g
GROUP BY
date_list , cnt
ORDER BY
group_id
;
------------------------------------------------------------------------------

DATE_LIST CNT GROUP_ID
ID_LIST
---------------------------------------- ------ --------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2001-09-07 | 2002-10-25 | 2008-02-15 3 1 9000382 |
9000383 | 9000384 | 9008951 | 9008952 | 9008953 |
9008954 | 9009302 | 9009303 | 9009304 | 9009305 |
9009306 | 9009307
2001-09-07 | 2005-02-10 | 2008-02-15 3 2 9000407 |
9000408
2001-09-07 | 2005-05-09 | 2008-02-15 3 3 9008929 |
9008941 |
9008942
2002-10-25 | 2004-08-04 | 2008-02-15 3 4 9000321 |
9000322 |
9000323
2002-10-25 | 2005-07-22 | 2008-02-15 3 5 9000000 |
9000015 | 9000018 |
9000019
2008-02-15 | 2009-06-08 | 2010-06-15 3 6
9000420
2008-02-15 | 2010-04-08 | 2010-06-15 3 7 9009146 |
9009147 | 9009150 | 9009151 | 9009152 | 9009153 |
9009154 |
9009157

7 record(s) selected.

Reply With Quote
  #4  
Old   
Fin
 
Posts: n/a

Default Re: Collapse date records into groups - 06-10-2011 , 11:28 PM



Tonkuma, that is absolutely perfect !

I replaced the values ... with select ... and presto the first query
gives me exactly what I was hoping for.

Many grateful thanks for the help.

Fin.

Reply With Quote
  #5  
Old   
Tonkuma
 
Posts: n/a

Default Re: Collapse date records into groups - 06-11-2011 , 12:07 AM



Quote:
... LISTAGG OLAP specification ...
Correction: ... LISTAGG aggregate function ...

Reply With Quote
  #6  
Old   
Tonkuma
 
Posts: n/a

Default Re: Collapse date records into groups - 06-11-2011 , 12:16 AM



Minor correction:

Quote:
SELECT DISTINCT
group_id
, cnt
, SUBSTR(date_list , s_pos ,10) AS date_col
FROM (SELECT DISTINCT
date_list , cnt
....
<<

DISTINCT in outermost SELECT list was not necessary, like...
SELECT -- DISTINCT
group_id
, cnt
, SUBSTR(date_list , s_pos , 10) AS date_col
FROM (SELECT DISTINCT
date_list , cnt
....

Reply With Quote
  #7  
Old   
Tonkuma
 
Posts: n/a

Default Re: Collapse date records into groups - 06-11-2011 , 12:21 AM



Minor correction:

Quote:
SELECT DISTINCT
group_id
, cnt
, SUBSTR(date_list , s_pos ,10) AS date_col
FROM (SELECT DISTINCT
date_list , cnt
....
<<
DISTINCT in outermost SELECT list was not necessary, like...

SELECT --DISTINCT
group_id
, cnt
, SUBSTR(date_list , s_pos , 10) AS date_col
FROM (SELECT DISTINCT
date_list , cnt
....

Reply With Quote
  #8  
Old   
Tonkuma
 
Posts: n/a

Default Re: Collapse date records into groups - 06-11-2011 , 01:13 AM



Intermediate nested-table-expression in "Sample 1:" was not neccesary,
like...

Sample 3:
SELECT SMALLINT(
DENSE_RANK()
OVER(ORDER BY date_list)
) AS group_id
, cnt
, SUBSTR(date_list , s_pos , 10) AS date_col
FROM (SELECT DISTINCT
cnt
, CAST(
LISTAGG( CHAR(date_col) )
WITHIN GROUP(ORDER BY date_col)
AS VARCHAR(60)
) AS date_list
FROM sample_data
GROUP BY
id , cnt
)
LEFT OUTER JOIN
(VALUES 1 , 11 , 21 , 31 , 41) AS p(s_pos)
ON s_pos < LENGTH(date_list)
ORDER BY
group_id
;
------------------------------------------------------------------------------

GROUP_ID CNT DATE_COL
-------- ------ ----------
1 3 2001-09-07
1 3 2002-10-25
1 3 2008-02-15
2 3 2001-09-07
2 3 2005-02-10
2 3 2008-02-15
3 3 2001-09-07
3 3 2005-05-09
3 3 2008-02-15
4 3 2002-10-25
4 3 2004-08-04
4 3 2008-02-15
5 3 2002-10-25
5 3 2005-07-22
5 3 2008-02-15
6 3 2008-02-15
6 3 2009-06-08
6 3 2010-06-15
7 3 2008-02-15
7 3 2010-04-08
7 3 2010-06-15

21 record(s) selected.

Reply With Quote
  #9  
Old   
Tonkuma
 
Posts: n/a

Default Re: Collapse date records into groups - 06-11-2011 , 12:43 PM



Intermediate nested-table-expression and CAST in nested-table-
expression in "Sample 1:" were not neccesary,
like...

Sample 3:
SELECT SMALLINT(
DENSE_RANK() OVER(ORDER BY date_list)
) AS group_id
, cnt
, SUBSTR(date_list , s_pos , 10) AS date_col
FROM (SELECT DISTINCT
cnt
, LISTAGG( CHAR(date_col) )
WITHIN GROUP(ORDER BY date_col) AS date_list
FROM sample_data
GROUP BY
id , cnt
)
LEFT OUTER JOIN
(VALUES 1 , 11 , 21 , 31 , 41) AS p(s_pos)
ON s_pos < LENGTH(date_list)
ORDER BY
group_id
;
------------------------------------------------------------------------------

GROUP_ID CNT DATE_COL
-------- ------ ----------
1 3 2001-09-07
1 3 2002-10-25
1 3 2008-02-15
2 3 2001-09-07
2 3 2005-02-10
2 3 2008-02-15
3 3 2001-09-07
3 3 2005-05-09
3 3 2008-02-15
4 3 2002-10-25
4 3 2004-08-04
4 3 2008-02-15
5 3 2002-10-25
5 3 2005-07-22
5 3 2008-02-15
6 3 2008-02-15
6 3 2009-06-08
6 3 2010-06-15
7 3 2008-02-15
7 3 2010-04-08
7 3 2010-06-15

21 record(s) selected.

Reply With Quote
  #10  
Old   
Tonkuma
 
Posts: n/a

Default Re: Collapse date records into groups - 06-12-2011 , 09:20 PM



"Sample 1" can be improved by the following considerations.
1) Those are not necessary.
1-1) DISTINCT in outer most SELECT list.
1-2) Intermediate nested-table-expression.
1-3) CAST in inner most nested-table-expression.
2) LEFT OUTER JOIN can be replaced by INNER JOIN.

Sample 3:

SELECT DENSE_RANK() OVER(ORDER BY date_list) AS group_id
, cnt
, SUBSTR(date_list , s_pos , 10) AS date_col
FROM (SELECT DISTINCT
cnt
, LISTAGG( CHAR(date_col) )
WITHIN GROUP(ORDER BY date_col) AS date_list
FROM sample_data
GROUP BY
id , cnt
)
INNER JOIN
(VALUES 1 , 11 , 21 , 31 , 41) AS p(s_pos)
ON s_pos < LENGTH(date_list)
ORDER BY
group_id
;
------------------------------------------------------------------------------

GROUP_ID CNT DATE_COL
-------------------- ------ ----------
1 3 2001-09-07
1 3 2002-10-25
1 3 2008-02-15
2 3 2001-09-07
2 3 2005-02-10
2 3 2008-02-15
3 3 2001-09-07
3 3 2005-05-09
3 3 2008-02-15
4 3 2002-10-25
4 3 2004-08-04
4 3 2008-02-15
5 3 2002-10-25
5 3 2005-07-22
5 3 2008-02-15
6 3 2008-02-15
6 3 2009-06-08
6 3 2010-06-15
7 3 2008-02-15
7 3 2010-04-08
7 3 2010-06-15

21 record(s) selected.

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.