dbTalk Databases Forums  

COUNT(...) OVER(...)

comp.databases.oracle.misc comp.databases.oracle.misc


Discuss COUNT(...) OVER(...) in the comp.databases.oracle.misc forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Álvaro G. Vicario
 
Posts: n/a

Default COUNT(...) OVER(...) - 06-25-2009 , 06:06 AM






I'm trying to learn about analytic functions, more specifically the
OVER() clause applied to COUNT() and AVG() functions. I've read the SQL
Reference and I've found several examples in Google but I still can't
grasp the concept.

What follows is a simplified example based upon a real life scenario.
I'll probably fix it by adding as many subqueries as necessary so that's
not an issue. But I'd like to know whether it can be solved with
analytic functions and eventually understand how to use them.


Data looks like this:

SELECT
ro.route_id, ro.route_date,
di.invoice, di.customer_id,
cu.customer_type
FROM route ro
INNER JOIN delivered_invoices di ON ro.route_id=di.route_id
INNER JOIN customers cu ON di.customer_id=cu.customer_id

route_id route_date invoice customer_id customer_type
======== ========== ======= =========== =============
1 2008-12-31 A10 1000 Type A
1 2008-12-31 A11 1000 Type A

1 2008-12-31 A12 2000 Type B
1 2008-12-31 A13 2000 Type B

1 2008-12-31 A14 3000 Type B

2 2008-12-31 A15 1000 Type A
2 2008-12-31 A16 1000 Type A

2 2008-12-31 A17 3000 Type B

3 2009-01-01 A18 1000 Type A
....

I need to fetch several sums and averages from this data set. Most of it
is pretty easy:

SELECT
cu.customer_type,
COUNT(DISTINCT ro.route_id) AS nr_of_routes,
COUNT(DISTINCT ro.route_date) AS nr_of_days,
COUNT(di.invoice) AS nr_of_invoices,
COUNT(DISTINCT di.customer_id) AS nr_of_customers,
FROM route ro
INNER JOIN delivered_invoices di ON ro.route_id=di.route_id
INNER JOIN customers cu ON di.customer_id=cu.customer_id
GROUP BY cu.customer_type

customer_type nr_of_routes nr_of_days nr_of_invoices nr_of_customers
============= ============ ========== ============== ===============
Type A 3 2 5 1
Type B 2 1 4 2

I also need to calculate the number of deliveries. In the first table I
separated the different deliveries with blank lines; the figures should be:

customer_type nr_of_deliveries
============= ================
Type A 3
Type B 3

The database design lacks a "deliveries" table: it links routes with
invoices and I'm not allowed to change that.

I've determined that counting the number of different customers per
route is an acceptable approximation (a driver never goes twice to the
same address in the same route) and I can calculate the number of
deliveries for any given route using a simple GROUP BY ro.route_id but I
can't calculate it by other criteria (for instance, when the main query
needs to be grouped by customer_type).

I got the impression that I could do this:


SELECT
cu.customer_type,
COUNT(DISTINCT di.customer_id) OVER(PARTITION BY ro.route_id) AS
nr_of_deliveries,
FROM route ro
INNER JOIN delivered_invoices di ON ro.route_id=di.route_id
INNER JOIN customers cu ON di.customer_id=cu.customer_id
GROUP BY cu.customer_type


But I get an error unless I add ro.route_id to the main GROUP BY clause
which, of course, beats the purpose of the query.

What's wrong in my assumption?



P.S. For what it matters, I'm using Oracle Database 10g Enterprise
Edition Release 10.1.0.2.



--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- Mi sitio sobre programación web: http://borrame.com
-- Mi web de humor satinado: http://www.demogracia.com
--

Reply With Quote
  #2  
Old   
Peter Nilsson
 
Posts: n/a

Default Re: COUNT(...) OVER(...) - 06-25-2009 , 06:01 PM






On Jun 25, 9:06 pm, "Álvaro G. Vicario"
<alvaro.NOSPAMTH... (AT) demogracia (DOT) com> wrote:
Quote:
I'm trying to learn about analytic functions, more specifically
the OVER() clause applied to COUNT() and AVG() functions. I've
read the SQL Reference and I've found several examples in Google
but I still can't grasp the concept.
Sounds like a separate issue.

Quote:
What follows is a simplified example based upon a real life
scenario. I'll probably fix it by adding as many subqueries as
necessary so that's not an issue. But I'd like to know whether
it can be solved with analytic functions and eventually
understand how to use them.

Data looks like this:

SELECT
ro.route_id, ro.route_date,
di.invoice, di.customer_id,
cu.customer_type
FROM route ro
INNER JOIN delivered_invoices di ON ro.route_id=di.route_id
INNER JOIN customers cu ON di.customer_id=cu.customer_id

route_id route_date invoice customer_id customer_type
======== ========== ======= =========== =============
1 2008-12-31 A10 1000 Type A
1 2008-12-31 A11 1000 Type A

1 2008-12-31 A12 2000 Type B
1 2008-12-31 A13 2000 Type B

1 2008-12-31 A14 3000 Type B

2 2008-12-31 A15 1000 Type A
2 2008-12-31 A16 1000 Type A

2 2008-12-31 A17 3000 Type B

3 2009-01-01 A18 1000 Type A
...

I need to fetch several sums and averages from this data set.
Most of it is pretty easy:

SELECT
cu.customer_type,
COUNT(DISTINCT ro.route_id) AS nr_of_routes,
COUNT(DISTINCT ro.route_date) AS nr_of_days,
COUNT(di.invoice) AS nr_of_invoices,
COUNT(DISTINCT di.customer_id) AS nr_of_customers,
You have a spurious comma.

Quote:
FROM route ro
INNER JOIN delivered_invoices di ON ro.route_id=di.route_id
INNER JOIN customers cu ON di.customer_id=cu.customer_id
GROUP BY cu.customer_type
snip
I also need to calculate the number of deliveries. In the
first table I separated the different deliveries with blank
lines; the figures should be:

customer_type nr_of_deliveries
============= ================
Type A 3
Type B 3

The database design lacks a "deliveries" table: it links
routes with invoices and I'm not allowed to change that.

I've determined that counting the number of different
customers per route is an acceptable approximation (a driver
never goes twice to the same address in the same route) and
I can calculate the number of deliveries for any given route
using a simple GROUP BY ro.route_id but I can't calculate it
by other criteria (for instance, when the main query needs
to be grouped by customer_type).

I got the impression that I could do this:

SELECT cu.customer_type,
COUNT(DISTINCT di.customer_id)
OVER(PARTITION BY ro.route_id)
AS nr_of_deliveries,
Again, you have a spare comma.

Quote:
FROM route ro
INNER JOIN delivered_invoices di ON ro.route_id=di.route_id
INNER JOIN customers cu ON di.customer_id=cu.customer_id
GROUP BY cu.customer_type

But I get an error unless I add ro.route_id to the main GROUP
BY clause which, of course, beats the purpose of the query.

What's wrong in my assumption?
You can't group over windowing functions any more than you
group over a group without nested queries.

You're really tring to do a count distinct count over
multiple(2) columns. The query to get your delivery results
is...

select cu.customer_type, count(*)
from (select distinct route_id, customer_id
from delivered_invoices) di
join customers cu on di.customer_id=cu.customer_id
group by cu.customer_type

You need to choose only 1 row per route_id, customer_id
pair. You can use a conditional sum to get your count...

select cu.customer_type,
count(distinct ro.route_id) as nr_of_routes,
count(distinct ro.route_date) as nr_of_days,
count(di.invoice) as nr_of_invoices,
count(distinct di.customer_id) as nr_of_customers,
sum(decode(di.rn, 1, 1, 0)) nr_of_deliveries
from route ro
join (select route_id, customer_id, invoice,
row_number() over
(partition by route_id, customer_id
order by null) rn
from delivered_invoices) di on ro.route_id=di.route_id
join customers cu on di.customer_id=cu.customer_id
group by cu.customer_type

You haven't shown any DLL. The above assumes all routes are
completed on the same day. [Which isn't true in all real life
scenarious.] If that's not the case, then you'll need to tweek
the query.

--
Peter

Reply With Quote
  #3  
Old   
Álvaro G. Vicario
 
Posts: n/a

Default Re: COUNT(...) OVER(...) - 06-26-2009 , 03:00 AM



Peter Nilsson escribió:
Quote:
On Jun 25, 9:06 pm, "Álvaro G. Vicario"
alvaro.NOSPAMTH... (AT) demogracia (DOT) com> wrote:
I'm trying to learn about analytic functions, more specifically
the OVER() clause applied to COUNT() and AVG() functions. I've
read the SQL Reference and I've found several examples in Google
but I still can't grasp the concept.

Sounds like a separate issue.

What follows is a simplified example based upon a real life
scenario. I'll probably fix it by adding as many subqueries as
necessary so that's not an issue. But I'd like to know whether
it can be solved with analytic functions and eventually
understand how to use them.

Data looks like this:

SELECT
ro.route_id, ro.route_date,
di.invoice, di.customer_id,
cu.customer_type
FROM route ro
INNER JOIN delivered_invoices di ON ro.route_id=di.route_id
INNER JOIN customers cu ON di.customer_id=cu.customer_id

route_id route_date invoice customer_id customer_type
======== ========== ======= =========== =============
1 2008-12-31 A10 1000 Type A
1 2008-12-31 A11 1000 Type A

1 2008-12-31 A12 2000 Type B
1 2008-12-31 A13 2000 Type B

1 2008-12-31 A14 3000 Type B

2 2008-12-31 A15 1000 Type A
2 2008-12-31 A16 1000 Type A

2 2008-12-31 A17 3000 Type B

3 2009-01-01 A18 1000 Type A
...

I need to fetch several sums and averages from this data set.
Most of it is pretty easy:

SELECT
cu.customer_type,
COUNT(DISTINCT ro.route_id) AS nr_of_routes,
COUNT(DISTINCT ro.route_date) AS nr_of_days,
COUNT(di.invoice) AS nr_of_invoices,
COUNT(DISTINCT di.customer_id) AS nr_of_customers,

You have a spurious comma.
It's a simplified example. It's not even actual code.



Quote:
FROM route ro
INNER JOIN delivered_invoices di ON ro.route_id=di.route_id
INNER JOIN customers cu ON di.customer_id=cu.customer_id
GROUP BY cu.customer_type
snip
I also need to calculate the number of deliveries. In the
first table I separated the different deliveries with blank
lines; the figures should be:

customer_type nr_of_deliveries
============= ================
Type A 3
Type B 3

The database design lacks a "deliveries" table: it links
routes with invoices and I'm not allowed to change that.

I've determined that counting the number of different
customers per route is an acceptable approximation (a driver
never goes twice to the same address in the same route) and
I can calculate the number of deliveries for any given route
using a simple GROUP BY ro.route_id but I can't calculate it
by other criteria (for instance, when the main query needs
to be grouped by customer_type).

I got the impression that I could do this:

SELECT cu.customer_type,
COUNT(DISTINCT di.customer_id)
OVER(PARTITION BY ro.route_id)
AS nr_of_deliveries,

Again, you have a spare comma.

FROM route ro
INNER JOIN delivered_invoices di ON ro.route_id=di.route_id
INNER JOIN customers cu ON di.customer_id=cu.customer_id
GROUP BY cu.customer_type

But I get an error unless I add ro.route_id to the main GROUP
BY clause which, of course, beats the purpose of the query.

What's wrong in my assumption?

You can't group over windowing functions any more than you
group over a group without nested queries.
Do you mean that it's basically an alternative syntax for GROUP BY?


Quote:
You're really tring to do a count distinct count over
multiple(2) columns. The query to get your delivery results
is...

select cu.customer_type, count(*)
from (select distinct route_id, customer_id
from delivered_invoices) di
join customers cu on di.customer_id=cu.customer_id
group by cu.customer_type

You need to choose only 1 row per route_id, customer_id
pair. You can use a conditional sum to get your count...

select cu.customer_type,
count(distinct ro.route_id) as nr_of_routes,
count(distinct ro.route_date) as nr_of_days,
count(di.invoice) as nr_of_invoices,
count(distinct di.customer_id) as nr_of_customers,
sum(decode(di.rn, 1, 1, 0)) nr_of_deliveries
from route ro
join (select route_id, customer_id, invoice,
row_number() over
(partition by route_id, customer_id
order by null) rn
from delivered_invoices) di on ro.route_id=di.route_id
join customers cu on di.customer_id=cu.customer_id
group by cu.customer_type
Actually, I a did a regular "SELECT COUNT() ... GROUP BY" subquery. But,
as I said, I'm not trying to fix a problem but, rather than that, trying
to understand how COUNT() and OVER() work.


Quote:
You haven't shown any DLL. The above assumes all routes are
completed on the same day. [Which isn't true in all real life
scenarious.] If that's not the case, then you'll need to tweek
the query.
It can be done by adding route_date to the SELECT and GROUP BY clauses
of the subquery and the ON condition of the join.




--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- Mi sitio sobre programación web: http://borrame.com
-- Mi web de humor satinado: http://www.demogracia.com
--

Reply With Quote
  #4  
Old   
Robert Klemme
 
Posts: n/a

Default Re: COUNT(...) OVER(...) - 06-26-2009 , 12:59 PM



On 25.06.2009 13:06, Álvaro G. Vicario wrote:
Quote:
I'm trying to learn about analytic functions, more specifically the
OVER() clause applied to COUNT() and AVG() functions. I've read the SQL
Reference and I've found several examples in Google but I still can't
grasp the concept.
Oracle has pretty good documentation online:

http://download-west.oracle.com/docs...3/analysis.htm

http://www.oracle.com/technology/pro...ticsql_twp.pdf

HTH

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.