dbTalk Databases Forums  

COUNT distinct and partitions

comp.databases.ibm-db2 comp.databases.ibm-db2


Discuss COUNT distinct and partitions in the comp.databases.ibm-db2 forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Michel Esber
 
Posts: n/a

Default COUNT distinct and partitions - 04-14-2011 , 10:12 AM






Hello,

DB2 V9.5 FP7,

db2 "select B.NAME, COUNT(distinct A.ID), COUNT(A.ID) from (values (1),
(2),(3),(4)) as A (ID), (values (1,'A'),(1,'A'),(2,'B'),(3,'A')) as B
(ID, NAME) where A.ID = B.ID group by B.NAME"

NAME 2 3
---- ----------- -----------
A 2 3
B 1 1

I am trying to achieve the same using partitions:

select B.NAME, COUNT (distinct A.ID) over (partition by B.NAME),
COUNT(A.ID) over (partition by B.NAME) from (values (1),(2),(3),(4))
as A (ID), (values (1,'A'),(1,'A'),(2,'B'),(3,'A')) as B (ID, NAME)
where A.ID = B.ID

SQL0441N Invalid use of keyword DISTINCT or ALL with function
"COUNT".
SQLSTATE=42601


I am unable to use distinct with partitions. Is this feature not
supported ? Any available workarounds?

Thanks,
-M

Reply With Quote
  #2  
Old   
Tonkuma
 
Posts: n/a

Default Re: COUNT distinct and partitions - 04-14-2011 , 11:36 AM






Here is an example of workarounds.

Note:
I like to use capital letters for keywords and small letters(or mixed)
for others.
And I think that it is usual habit in SQL.

------------------------------ Commands Entered
------------------------------
WITH
a(id) AS (
VALUES (1), (2), (3), (4)
)
, b(id, name) AS (
VALUES (1, 'A'), (1, 'A'), (2, 'B'), (3, 'A')
)
SELECT b.name
, MAX( DENSE_RANK()
OVER(PARTITION BY b.name
ORDER BY a.id)
)
OVER(PARTITION BY b.name) AS count_distinct_a_id
, COUNT(a.id)
OVER(PARTITION BY b.name) AS count_a_id
, a.id AS a_id
FROM a
, b
WHERE a.id = b.id
ORDER BY
b.name
;
------------------------------------------------------------------------------

NAME COUNT_DISTINCT_A_ID COUNT_A_ID A_ID
---- -------------------- ----------- -----------
A 2 3 1
A 2 3 1
A 2 3 3
B 1 1 2

4 record(s) selected.

Reply With Quote
  #3  
Old   
Michel Esber
 
Posts: n/a

Default Re: COUNT distinct and partitions - 04-15-2011 , 11:00 AM



Tonkuma, interesting solution.

Is there a quick way to skip null values from DENSE_RANK, without
having to do a query on top of the results ?

Thanks, -M

Reply With Quote
  #4  
Old   
Tonkuma
 
Posts: n/a

Default Re: COUNT distinct and partitions - 04-15-2011 , 01:07 PM



On Apr 16, 1:00*am, Michel Esber <smes... (AT) gmail (DOT) com> wrote:
Quote:
Tonkuma, interesting solution.

Is there a quick way to skip null values from DENSE_RANK, without
having to do a query on top of the results ?

Thanks, -M
I couldn't understand the case of need to consider null value for
DENSE_RANK in your query.
If some values of a.id or b.id were null, these rows would be excluded
from joined result.
If some values of b.name were null, b.name is not a ranking
column(it's a partitioning column).

As a general(apart from your query), if you want to use DENSE_RANK for
a workaround of COUNT(DISTINCT ...) OVER(...),
you can use
ORDER BY COALESCE( <id> , MIN(<id>) OVER(PARTITION BY name) )
instead of
ORDER BY <id>
in DENSE_RANK.

Here is an example:
------------------------------ Commands Entered
------------------------------
WITH
b(id, name) AS (
VALUES (1, 'A'), (1, 'A'), (2, 'B'), (CAST(NULL AS INT), 'A')
)
SELECT name
, MAX( DENSE_RANK()
OVER(PARTITION BY name
ORDER BY COALESCE( id , MIN(id) OVER(PARTITION
BY name) )
)
)
OVER(PARTITION BY name) AS count_distinct_id
, COUNT(id)
OVER(PARTITION BY name) AS count_id
, id
FROM b
ORDER BY
name
;
------------------------------------------------------------------------------

NAME COUNT_DISTINCT_ID COUNT_ID ID
---- -------------------- ----------- -----------
A 1 2 1
A 1 2 1
A 1 2 -
B 1 1 2

4 record(s) selected.

Reply With Quote
  #5  
Old   
Tonkuma
 
Posts: n/a

Default Re: COUNT distinct and partitions - 04-15-2011 , 01:16 PM



I want to add COUNT(*) in my sample.

------------------------------ Commands Entered
------------------------------
WITH
b(id, name) AS (
VALUES (1, 'A'), (1, 'A'), (2, 'B'), (CAST(NULL AS INT), 'A')
)
SELECT name
, MAX( DENSE_RANK()
OVER(PARTITION BY name
ORDER BY COALESCE( id , MIN(id) OVER(PARTITION BY
name) )
)
)
OVER(PARTITION BY name) AS count_distinct_id
, COUNT(id)
OVER(PARTITION BY name) AS count_id
, COUNT(*)
OVER(PARTITION BY name) AS count_rows
, id
FROM b
ORDER BY
name
;
------------------------------------------------------------------------------

NAME COUNT_DISTINCT_ID COUNT_ID COUNT_ROWS ID
---- -------------------- ----------- ----------- -----------
A 1 2 3 1
A 1 2 3 1
A 1 2 3 -
B 1 1 1 2

4 record(s) selected.

Reply With Quote
  #6  
Old   
Tonkuma
 
Posts: n/a

Default Re: COUNT distinct and partitions - 04-15-2011 , 09:05 PM



If all values of the count column in a partition were null,
previous olap expression would return 1.

I found another expression which worked even if all values of the
count column in a partition were null,

------------------------------ Commands Entered
------------------------------
WITH
b(id, name) AS (
VALUES (CAST(NULL AS INT), 'A'), (CAST(NULL AS INT), 'A')
, (2, 'B'), (CAST(NULL AS INT), 'A'), (CAST(NULL AS INT), 'B')
, (5, 'C'), (CAST(NULL AS INT), 'C')
, (5, 'C'), (CAST(NULL AS INT), 'C'), (7, 'C')
)
SELECT name
, COUNT( CASE ROW_NUMBER()
OVER(PARTITION BY name , id)
WHEN 1 THEN id
END
)
OVER(PARTITION BY name) AS count_distinct_id_2
, COUNT(id)
OVER(PARTITION BY name) AS count_id
, COUNT(*)
OVER(PARTITION BY name) AS count_rows
, id
FROM b
ORDER BY
name
;
------------------------------------------------------------------------------

NAME COUNT_DISTINCT_ID_2 COUNT_ID COUNT_ROWS ID
---- ------------------- ----------- ----------- -----------
A 0 0 3 -
A 0 0 3 -
A 0 0 3 -
B 1 1 2 2
B 1 1 2 -
C 2 3 5 5
C 2 3 5 5
C 2 3 5 7
C 2 3 5 -
C 2 3 5 -

10 record(s) selected.

Reply With Quote
  #7  
Old   
Michel Esber
 
Posts: n/a

Default Re: COUNT distinct and partitions - 04-18-2011 , 01:33 PM



On 15 abr, 23:05, Tonkuma <tonk... (AT) fiberbit (DOT) net> wrote:
Quote:
If all values of the count column in a partition were null,
previous olap expression would return 1.

I found another expression which worked even if all values of the
count column in a partition were null,

------------------------------ Commands Entered
------------------------------
WITH
*b(id, name) AS (
VALUES (CAST(NULL AS INT), 'A'), (CAST(NULL AS INT), 'A')
, (2, 'B'), (CAST(NULL AS INT), 'A'), (CAST(NULL AS INT), 'B')
, (5, 'C'), (CAST(NULL AS INT), 'C')
, (5, 'C'), (CAST(NULL AS INT), 'C'), (7, 'C')
)
SELECT name
* * *, COUNT( CASE ROW_NUMBER()
* * * * * * * * * * *OVER(PARTITION BY name , id)
* * * * * * * WHEN 1 THEN id
* * * * * * * END
* * * * * * )
* * * * *OVER(PARTITION BY name) AS count_distinct_id_2
* * *, COUNT(id)
* * * * *OVER(PARTITION BY name) AS count_id
* * *, COUNT(*)
* * * * *OVER(PARTITION BY name) AS count_rows
* * *, id
*FROM *b
*ORDER BY
* * * *name
;
------------------------------------------------------------------------------

NAME COUNT_DISTINCT_ID_2 COUNT_ID * *COUNT_ROWS *ID
---- ------------------- ----------- ----------- -----------
A * * * * * * * * * * *0 * * * * * 0 * * * * * 3 * * * * * -
A * * * * * * * * * * *0 * * * * * 0 * * * * * 3 * * * * * -
A * * * * * * * * * * *0 * * * * * 0 * * * * * 3 * * * * * -
B * * * * * * * * * * *1 * * * * * 1 * * * * * 2 * * * * * 2
B * * * * * * * * * * *1 * * * * * 1 * * * * * 2 * * * * * -
C * * * * * * * * * * *2 * * * * * 3 * * * * * 5 * * * * * 5
C * * * * * * * * * * *2 * * * * * 3 * * * * * 5 * * * * * 5
C * * * * * * * * * * *2 * * * * * 3 * * * * * 5 * * * * * 7
C * * * * * * * * * * *2 * * * * * 3 * * * * * 5 * * * * * -
C * * * * * * * * * * *2 * * * * * 3 * * * * * 5 * * * * * -

* 10 record(s) selected.
Great solutions, Tonkuma. Thanks a lot for all your help.

Reply With Quote
  #8  
Old   
Michel Esber
 
Posts: n/a

Default Re: COUNT distinct and partitions - 04-18-2011 , 02:22 PM



Tonkuma, I think I found a little problem with your solution. Look:


with T (ID, VALUE) as (values (1,'abc'),(1,'xpto'),(1,cast (null as
varchar(3))), (2,CAST (null as varchar(3))))

select ID, VALUE, MAX( DENSE_RANK() OVER( PARTITION BY ID order by
COALESCE( VALUE, MIN(VALUE) OVER(PARTITION BY ID) ) ) ) OVER
(partition By ID) from T
;

ID 1 has 2 not null values and one null.
ID 2 has only one null value.

This query returns:

ID VALUE 3
----------- ----- --------------------
1 abc 2
1 - 2
1 xpto 2
2 - 1


Which is correct for ID 1 (as it has only two not null values), but is
incorrect for ID 2 ....

What am I missing?

Thanks again.

-M

Reply With Quote
  #9  
Old   
Tonkuma
 
Posts: n/a

Default Re: COUNT distinct and partitions - 04-18-2011 , 04:50 PM



I already wrote the issue for DENSE_RANK, like...
Quote:
If all values of the count column in a partition were null,
previous olap expression would return 1.

I found another expression which worked even if all values of the
count column in a partition were null,
So, please use "COUNT( CASE ROW_NUMBER() ... END ) OVER(...)",
like ...

------------------------------ Commands Entered
------------------------------
with T (ID, VALUE) as (
values
(1,'abc'), (1,'xpto')
, (1,cast (null as varchar(3)))
, (2,CAST (null as varchar(3)))
)
SELECT id
, value
, COUNT( CASE ROW_NUMBER()
OVER(PARTITION BY id , value)
WHEN 1 THEN value
END
)
OVER(PARTITION BY id) AS count_distinct
FROM t
;
------------------------------------------------------------------------------

ID VALUE COUNT_DISTINCT
----------- ----- --------------
1 abc 2
1 xpto 2
1 - 2
2 - 0

4 record(s) selected.


Note:
You wanted to give a solution which was equivalent to
COUNT(DISTINCT id) OVER(PARTITION BY name)
in your original post.

But, your previous example is corresponds to
COUNT(DISTINCT value) OVER(PARTITION BY id)

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.