dbTalk Databases Forums  

SUM of distinct

comp.databases.ibm-db2 comp.databases.ibm-db2


Discuss SUM of distinct in the comp.databases.ibm-db2 forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Michel Esber
 
Posts: n/a

Default SUM of distinct - 11-23-2010 , 01:26 PM






Db2 LUW v9.5 fp5,

drop table DISK;

create table DISK (IS_EXTERNAL CHAR, DISK_NUMBER SMALLINT, VOLUME_ID
SMALLINT, DISK_SIZE BIGINT, VOLUME_SIZE BIGINT);

insert into DISK values
('N',0,1,73368436736, 31455539200),
('N',0,4,73368436736, 41912889344),
('Y',1,3,703300894720, 703294308352),
('Y',2,2,1073741824, 1069253632 )
;

One Disk may be internal or external. Each disk has a DISK_NUMBER and
one or more VOLUME_IDs (C:\, D:\, for example)
( ** Yes I know this is not normalized, this is part of a bigger
query ... )

I need to group by IS_EXTERNAL and achieve:

(1) Count the number of distinct Internal and External Disks.
(2) SUM the total VOLUME_SIZE of Internal and External Disks.
(3) SUM the total DISK_SIZE of *** DISTINCT *** Internal and External
Disks.

Items (1) and (2) are easy. Query:

select
IS_EXTERNAL,
COUNT(distinct CHAR(DISK_NUMBER)) as COUNT,
SUM(VOLUME_SIZE) as VOLUME_TOTAL
from A
group by IS_EXTERNAL


But item (3) is tricky. Disk 0 in the above scenario has two volumes
(and, of course, two rows).
How do I sum (DISK_SIZE) for each distinct DISK_NUMBER ?

Any ideas?

Reply With Quote
  #2  
Old   
Luuk
 
Posts: n/a

Default Re: SUM of distinct - 11-23-2010 , 01:35 PM






On 23-11-10 20:26, Michel Esber wrote:
Quote:
Db2 LUW v9.5 fp5,

drop table DISK;

create table DISK (IS_EXTERNAL CHAR, DISK_NUMBER SMALLINT, VOLUME_ID
SMALLINT, DISK_SIZE BIGINT, VOLUME_SIZE BIGINT);

insert into DISK values
('N',0,1,73368436736, 31455539200),
('N',0,4,73368436736, 41912889344),
('Y',1,3,703300894720, 703294308352),
('Y',2,2,1073741824, 1069253632 )
;

One Disk may be internal or external. Each disk has a DISK_NUMBER and
one or more VOLUME_IDs (C:\, D:\, for example)
( ** Yes I know this is not normalized, this is part of a bigger
query ... )

I need to group by IS_EXTERNAL and achieve:

(1) Count the number of distinct Internal and External Disks.
(2) SUM the total VOLUME_SIZE of Internal and External Disks.
(3) SUM the total DISK_SIZE of *** DISTINCT *** Internal and External
Disks.

Items (1) and (2) are easy. Query:

select
IS_EXTERNAL,
COUNT(distinct CHAR(DISK_NUMBER)) as COUNT,
SUM(VOLUME_SIZE) as VOLUME_TOTAL
from A
group by IS_EXTERNAL


But item (3) is tricky. Disk 0 in the above scenario has two volumes
(and, of course, two rows).
How do I sum (DISK_SIZE) for each distinct DISK_NUMBER ?

Any ideas?
also group on 'VOLUME_ID' ?

--
Luuk

Reply With Quote
  #3  
Old   
Michel Esber
 
Posts: n/a

Default Re: SUM of distinct - 11-23-2010 , 01:35 PM



Expected output is:

'N' = 73368436736 (there is only disk_number 0, and its size is
73368436736),
'Y' = 704374636544 (sum of DISK_SIZE for disks 1 and 2)

Reply With Quote
  #4  
Old   
Michel Esber
 
Posts: n/a

Default Re: SUM of distinct - 11-23-2010 , 01:55 PM



Quote:
Any ideas?

also group on 'VOLUME_ID' ?

Hi Luuk, thanks for the fast response. Grouping by VOLUME_ID will not
help me.

Here is my current query:

select IS_EXTERNAL, COUNT(distinct CHAR(DISK_NUMBER)) as COUNT,
SUM(DISK_SIZE) as DISK_TOTAL, SUM(VOLUME_SIZE) as VOLUME_TOTAL from
A group by IS_EXTERNAL

And the result is:

IS_EXTERNAL COUNT DISK_TOTAL VOLUME_TOTAL
----------- ----------- -------------------- --------------------
N 1 146736873472 73368428544
Y 2 704374636544 704363561984


This is obviously incorrect, since there is only one internal disk,
and DISK_TOTAL of 73368436736 for that disk.

The expected result is:

IS_EXTERNAL COUNT DISK_TOTAL VOLUME_TOTAL
----------- ----------- -------------------- --------------------
N 1 73368436736 73368428544
Y 2 704374636544 704363561984

Any ideas ?

Thanks

Reply With Quote
  #5  
Old   
Ian
 
Posts: n/a

Default Re: SUM of distinct - 11-23-2010 , 03:11 PM



On Nov 23, 12:26*pm, Michel Esber <smes... (AT) gmail (DOT) com> wrote:
Quote:
Db2 LUW v9.5 fp5,

drop table DISK;

create table DISK (IS_EXTERNAL CHAR, DISK_NUMBER SMALLINT, VOLUME_ID
SMALLINT, DISK_SIZE BIGINT, VOLUME_SIZE BIGINT);

insert into DISK values
('N',0,1,73368436736, 31455539200),
('N',0,4,73368436736, 41912889344),
('Y',1,3,703300894720, 703294308352),
('Y',2,2,1073741824, 1069253632 )
;

One Disk may be internal or external. Each disk has a DISK_NUMBER and
one or more VOLUME_IDs (C:\, D:\, for example)
( ** Yes I know this is not normalized, this is part of a bigger
query ... )

I need to group by IS_EXTERNAL and achieve:

(1) Count the number of distinct Internal and External Disks.
(2) SUM the total VOLUME_SIZE of Internal and External Disks.
(3) SUM the total DISK_SIZE of **** DISTINCT *** Internal and External
Disks.

Items (1) and (2) are easy. Query:

select
*IS_EXTERNAL,
*COUNT(distinct CHAR(DISK_NUMBER)) as COUNT,
*SUM(VOLUME_SIZE) as VOLUME_TOTAL
from A
group by IS_EXTERNAL

But item (3) is tricky. Disk 0 in the above scenario has two volumes
(and, of course, two rows).
How do I sum (DISK_SIZE) for each distinct DISK_NUMBER ?

Any ideas?
This is really a function of a bad design. It's possible to do it,
but not in a simple query:


Option 1:

with disks as (
select distinct is_external, disk_number, disk_size
from disk
),
volumes as (
select is_external, disk_number, sum(volume_size) as
tot_vol_size
from disk
group by is_external, disk_number
)
select
d.is_external
,count(d.disk_number) as count
,sum(d.disk_size) disk_total
,sum(v.tot_vol_size) as volume_total
from
disks d
inner join volumes v
on (d.is_external = v.is_external
and d.disk_number = v.disk_number)
group by
d.is_external;

IS_EXTERNAL COUNT DISK_TOTAL VOLUME_TOTAL
----------- ----------- -------------------- --------------------
N 1 73368436736 73368428544
Y 2 704374636544 704363561984



The common table expressions above are basically fixing the data model
by creating 2 related tables for disks and volumes and then joining
the results. For a small data set this works OK, but if you have a
huge data set performance may start to suffer.


Option 2:

with
disks as (
select is_external, disk_number,
rownumber() over (partition by disk_number) as dvnum,
disk_size, volume_size
from disk
)
select
is_external
,count(distinct disk_number) as count
,sum(case when dvnum = 1 then disk_size else 0 end) as disk_total
,sum(volume_size) as volume_total
from
disks
group by
is_external;

IS_EXTERNAL COUNT DISK_TOTAL VOLUME_TOTAL
----------- ----------- -------------------- --------------------
N 1 73368436736 73368428544
Y 2 704374636544 704363561984

2 record(s) selected.


This works by basically only summing up the disk_total for the first
row for a given disk number. This option may perform better for
larger volumes of data.



Good luck.

Reply With Quote
  #6  
Old   
Michel Esber
 
Posts: n/a

Default Re: SUM of distinct - 11-23-2010 , 07:22 PM



Hi Ian,

Both queries work. Thanks for the suggestions.

Quote:
This is really a function of a bad design. *It's possible to do it,
but not in a simple query.
Here is a normalized and simplifed design:

drop table DISK;

create table DISK (IS_EXTERNAL CHAR, DISK_NUMBER SMALLINT, DISK_SIZE
BIGINT );

insert into DISK values
('N',0,73368436736),
('Y',1,703300894720),
('Y',2,1073741824 )
;


drop table VOLUME;

create table VOLUME (VOLUME_ID SMALLINT, DISK_NUMBER SMALLINT,
VOLUME_SIZE BIGINT);

insert into VOLUME values
(1,0, 31455539200),
(4,0, 41912889344),
(3,1, 703294308352),
(2,2, 1069253632 )
;


select IS_EXTERNAL, COUNT(distinct D.DISK_NUMBER), SUM(V.VOLUME_SIZE)
as VOLUME_TOTAL
from DISK D left outer join VOLUME V on (D.DISK_NUMBER =
V.DISK_NUMBER)
group by IS_EXTERNAL
;

Even with a normalized and better design, I still see no other way to
achieve the result without subqueries similar to the one you posted.
Any light ?

Thanks again,

Reply With Quote
  #7  
Old   
Michel Esber
 
Posts: n/a

Default Re: SUM of distinct - 11-23-2010 , 07:27 PM



Quote:
Even with a normalized and better design, I still see no other way to
achieve the result without subqueries similar to the one you posted.
I know I can summarize both tables separately and then join the
results.
Is there are better approach?

Thanks.

Reply With Quote
  #8  
Old   
Ian
 
Posts: n/a

Default Re: SUM of distinct - 11-24-2010 , 08:36 AM



On Nov 23, 6:22*pm, Michel Esber <smes... (AT) gmail (DOT) com> wrote:
Quote:
Here is a normalized and simplifed design:

drop table DISK;

create table DISK (IS_EXTERNAL CHAR, DISK_NUMBER SMALLINT, DISK_SIZE
BIGINT );

insert into DISK values
('N',0,73368436736),
('Y',1,703300894720),
('Y',2,1073741824 )
;

drop table VOLUME;

create table VOLUME (VOLUME_ID SMALLINT, DISK_NUMBER SMALLINT,
VOLUME_SIZE BIGINT);

insert into VOLUME values
(1,0, 31455539200),
(4,0, 41912889344),
(3,1, 703294308352),
(2,2, 1069253632 )
;

select IS_EXTERNAL, COUNT(distinct D.DISK_NUMBER), SUM(V.VOLUME_SIZE)
as VOLUME_TOTAL
from DISK D left outer join VOLUME V on (D.DISK_NUMBER =
V.DISK_NUMBER)
group by IS_EXTERNAL
;

Even with a normalized and better design, I still see no other way to
achieve the result without subqueries similar to the one you posted.
Any light ?
You don't agree that this rewritten query is far simpler than the ones
I suggested?

The advantage with this design is that the join can *probably* be more
efficient through indexes.

I could be wrong, and again it all depends on how much data you've
actually got.



Good luck!


Ian

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.