![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| ||||
| ||||
|
|
I'm trying to learn about analytic functions, more specifically the OVER() clause applied to COUNT() and AVG() functions. I've read the SQL Reference and I've found several examples in Google but I still can't grasp the concept. |
|
What follows is a simplified example based upon a real life scenario. I'll probably fix it by adding as many subqueries as necessary so that's not an issue. But I'd like to know whether it can be solved with analytic functions and eventually understand how to use them. Data looks like this: SELECT ro.route_id, ro.route_date, di.invoice, di.customer_id, cu.customer_type FROM route ro INNER JOIN delivered_invoices di ON ro.route_id=di.route_id INNER JOIN customers cu ON di.customer_id=cu.customer_id route_id route_date invoice customer_id customer_type ======== ========== ======= =========== ============= 1 2008-12-31 A10 1000 Type A 1 2008-12-31 A11 1000 Type A 1 2008-12-31 A12 2000 Type B 1 2008-12-31 A13 2000 Type B 1 2008-12-31 A14 3000 Type B 2 2008-12-31 A15 1000 Type A 2 2008-12-31 A16 1000 Type A 2 2008-12-31 A17 3000 Type B 3 2009-01-01 A18 1000 Type A ... I need to fetch several sums and averages from this data set. Most of it is pretty easy: SELECT cu.customer_type, COUNT(DISTINCT ro.route_id) AS nr_of_routes, COUNT(DISTINCT ro.route_date) AS nr_of_days, COUNT(di.invoice) AS nr_of_invoices, COUNT(DISTINCT di.customer_id) AS nr_of_customers, |
|
FROM route ro INNER JOIN delivered_invoices di ON ro.route_id=di.route_id INNER JOIN customers cu ON di.customer_id=cu.customer_id GROUP BY cu.customer_type snip I also need to calculate the number of deliveries. In the first table I separated the different deliveries with blank lines; the figures should be: customer_type nr_of_deliveries ============= ================ Type A 3 Type B 3 The database design lacks a "deliveries" table: it links routes with invoices and I'm not allowed to change that. I've determined that counting the number of different customers per route is an acceptable approximation (a driver never goes twice to the same address in the same route) and I can calculate the number of deliveries for any given route using a simple GROUP BY ro.route_id but I can't calculate it by other criteria (for instance, when the main query needs to be grouped by customer_type). I got the impression that I could do this: SELECT cu.customer_type, COUNT(DISTINCT di.customer_id) OVER(PARTITION BY ro.route_id) AS nr_of_deliveries, |
|
FROM route ro INNER JOIN delivered_invoices di ON ro.route_id=di.route_id INNER JOIN customers cu ON di.customer_id=cu.customer_id GROUP BY cu.customer_type But I get an error unless I add ro.route_id to the main GROUP BY clause which, of course, beats the purpose of the query. What's wrong in my assumption? |
#3
| ||||
| ||||
|
|
On Jun 25, 9:06 pm, "Álvaro G. Vicario" alvaro.NOSPAMTH... (AT) demogracia (DOT) com> wrote: I'm trying to learn about analytic functions, more specifically the OVER() clause applied to COUNT() and AVG() functions. I've read the SQL Reference and I've found several examples in Google but I still can't grasp the concept. Sounds like a separate issue. What follows is a simplified example based upon a real life scenario. I'll probably fix it by adding as many subqueries as necessary so that's not an issue. But I'd like to know whether it can be solved with analytic functions and eventually understand how to use them. Data looks like this: SELECT ro.route_id, ro.route_date, di.invoice, di.customer_id, cu.customer_type FROM route ro INNER JOIN delivered_invoices di ON ro.route_id=di.route_id INNER JOIN customers cu ON di.customer_id=cu.customer_id route_id route_date invoice customer_id customer_type ======== ========== ======= =========== ============= 1 2008-12-31 A10 1000 Type A 1 2008-12-31 A11 1000 Type A 1 2008-12-31 A12 2000 Type B 1 2008-12-31 A13 2000 Type B 1 2008-12-31 A14 3000 Type B 2 2008-12-31 A15 1000 Type A 2 2008-12-31 A16 1000 Type A 2 2008-12-31 A17 3000 Type B 3 2009-01-01 A18 1000 Type A ... I need to fetch several sums and averages from this data set. Most of it is pretty easy: SELECT cu.customer_type, COUNT(DISTINCT ro.route_id) AS nr_of_routes, COUNT(DISTINCT ro.route_date) AS nr_of_days, COUNT(di.invoice) AS nr_of_invoices, COUNT(DISTINCT di.customer_id) AS nr_of_customers, You have a spurious comma. |
|
FROM route ro INNER JOIN delivered_invoices di ON ro.route_id=di.route_id INNER JOIN customers cu ON di.customer_id=cu.customer_id GROUP BY cu.customer_type snip I also need to calculate the number of deliveries. In the first table I separated the different deliveries with blank lines; the figures should be: customer_type nr_of_deliveries ============= ================ Type A 3 Type B 3 The database design lacks a "deliveries" table: it links routes with invoices and I'm not allowed to change that. I've determined that counting the number of different customers per route is an acceptable approximation (a driver never goes twice to the same address in the same route) and I can calculate the number of deliveries for any given route using a simple GROUP BY ro.route_id but I can't calculate it by other criteria (for instance, when the main query needs to be grouped by customer_type). I got the impression that I could do this: SELECT cu.customer_type, COUNT(DISTINCT di.customer_id) OVER(PARTITION BY ro.route_id) AS nr_of_deliveries, Again, you have a spare comma. FROM route ro INNER JOIN delivered_invoices di ON ro.route_id=di.route_id INNER JOIN customers cu ON di.customer_id=cu.customer_id GROUP BY cu.customer_type But I get an error unless I add ro.route_id to the main GROUP BY clause which, of course, beats the purpose of the query. What's wrong in my assumption? You can't group over windowing functions any more than you group over a group without nested queries. |
|
You're really tring to do a count distinct count over multiple(2) columns. The query to get your delivery results is... select cu.customer_type, count(*) from (select distinct route_id, customer_id from delivered_invoices) di join customers cu on di.customer_id=cu.customer_id group by cu.customer_type You need to choose only 1 row per route_id, customer_id pair. You can use a conditional sum to get your count... select cu.customer_type, count(distinct ro.route_id) as nr_of_routes, count(distinct ro.route_date) as nr_of_days, count(di.invoice) as nr_of_invoices, count(distinct di.customer_id) as nr_of_customers, sum(decode(di.rn, 1, 1, 0)) nr_of_deliveries from route ro join (select route_id, customer_id, invoice, row_number() over (partition by route_id, customer_id order by null) rn from delivered_invoices) di on ro.route_id=di.route_id join customers cu on di.customer_id=cu.customer_id group by cu.customer_type |
|
You haven't shown any DLL. The above assumes all routes are completed on the same day. [Which isn't true in all real life scenarious.] If that's not the case, then you'll need to tweek the query. |
#4
| |||
| |||
|
|
I'm trying to learn about analytic functions, more specifically the OVER() clause applied to COUNT() and AVG() functions. I've read the SQL Reference and I've found several examples in Google but I still can't grasp the concept. |
![]() |
| Thread Tools | |
| Display Modes | |
| |