dbTalk Databases Forums  

A CTE Question

comp.databases.oracle.server comp.databases.oracle.server


Discuss A CTE Question in the comp.databases.oracle.server forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Michel Esber
 
Posts: n/a

Default A CTE Question - 08-26-2010 , 11:09 AM






Hi Guys,

Can anyone shred some light here?

Sample DDL Code:

create table DEPARTMENT
(
DEP_ID integer not null primary key,
NAME VARCHAR(32) not null,
PARENT integer not null
) ;


insert into DEPARTMENT VALUES (1,'A',-2),(2,'B',1),(3,'C',2),
(4,'D',-2),(5,'E',4),(6,'F',5),(7,'G',6);


create table PERSON
(
PERSON_ID integer not null primary key,
NAME VARCHAR(128) not null,
DEP_ID integer,
constraint FK_01 foreign key (DEP_ID) references DEPARTMENT(DEP_ID)
);


insert into PERSON values (1,'John',2),(2,'Mary',2),(3,'Suzan',3),
(4,'Joe',3),(5,'Lewis',3),(6,'Elvis',4),(7,'Presle y',5),(8,'Preston',
5),(9,'Pee',7),(10,'Wee',7),(11,'Steve-o',7),(12,'Demi Moore',7);


with t_recurs (dep_id, parent, iter) as


(
select dep_id, parent, 0 from DEPARTMENT


union all


select r.dep_id, d.parent, iter + 1
from t_recurs r, DEPARTMENT d
where r.parent = d.dep_id and iter < 1000
)


select R.DEP_ID, COUNT(P.PERSON_ID)
from T_RECURS R inner join DEPARTMENT D on (R.DEP_ID, R.PARENT) =
(D.DEP_ID, D.PARENT)
left outer join PERSON P on P.DEP_ID = D.DEP_ID
group by R.DEP_ID
;


The output looks good:


DEP_ID DEP_COUNT
----------- -----------
1 0
2 2
3 3
4 1
5 2
6 0
7 4


What am I actually looking for is also the count of Persons that are
in all sub-departments in that hierarchy, including the departement a
person is currently in. Somehow I can't get this to work.


Expected output:

DEP_ID DEP_COUNT HIERARCHY_COUNT
----------- ----------- ---------------
1 0 5
2 2 5
3 3 3
4 1 7
5 2 6
6 0 6
7 4 4

Any ideas?

Thanks

Reply With Quote
  #2  
Old   
Maxim Demenko
 
Posts: n/a

Default Re: A CTE Question - 08-26-2010 , 02:09 PM






On 26.08.2010 18:09, Michel Esber wrote:
Quote:
Hi Guys,

Can anyone shred some light here?

Sample DDL Code:

create table DEPARTMENT
(
DEP_ID integer not null primary key,
NAME VARCHAR(32) not null,
PARENT integer not null
) ;


insert into DEPARTMENT VALUES (1,'A',-2),(2,'B',1),(3,'C',2),
(4,'D',-2),(5,'E',4),(6,'F',5),(7,'G',6);


create table PERSON
(
PERSON_ID integer not null primary key,
NAME VARCHAR(128) not null,
DEP_ID integer,
constraint FK_01 foreign key (DEP_ID) references DEPARTMENT(DEP_ID)
);


insert into PERSON values (1,'John',2),(2,'Mary',2),(3,'Suzan',3),
(4,'Joe',3),(5,'Lewis',3),(6,'Elvis',4),(7,'Presle y',5),(8,'Preston',
5),(9,'Pee',7),(10,'Wee',7),(11,'Steve-o',7),(12,'Demi Moore',7);


with t_recurs (dep_id, parent, iter) as


(
select dep_id, parent, 0 from DEPARTMENT


union all


select r.dep_id, d.parent, iter + 1
from t_recurs r, DEPARTMENT d
where r.parent = d.dep_id and iter< 1000
)


select R.DEP_ID, COUNT(P.PERSON_ID)
from T_RECURS R inner join DEPARTMENT D on (R.DEP_ID, R.PARENT) =
(D.DEP_ID, D.PARENT)
left outer join PERSON P on P.DEP_ID = D.DEP_ID
group by R.DEP_ID
;


The output looks good:


DEP_ID DEP_COUNT
----------- -----------
1 0
2 2
3 3
4 1
5 2
6 0
7 4


What am I actually looking for is also the count of Persons that are
in all sub-departments in that hierarchy, including the departement a
person is currently in. Somehow I can't get this to work.


Expected output:

DEP_ID DEP_COUNT HIERARCHY_COUNT
----------- ----------- ---------------
1 0 5
2 2 5
3 3 3
4 1 7
5 2 6
6 0 6
7 4 4

Any ideas?

Thanks
You ask in the oracle group, your syntax however looks like you are
working with mysql (i conclude it rather from inserts and ddl - not
sure, whether mysql supports CTE) - the problem is, the query you are
asking for will be probably differently implemented in both dialects, so
, could you maybe clarify, do you need it in oracle or in some other
dialect ?

Best regards

Maxim

Reply With Quote
  #3  
Old   
Thomas Kellerer
 
Posts: n/a

Default Re: A CTE Question - 08-26-2010 , 03:49 PM



Hi,

I also would like to know if you are really using Oracle.

As I don't have Oracle 11gR2 available (which is the first Oracle version to support recursive CTEs) I tested this using PostgreSQL.

I expect this to run on Oracle 11gR2 as well.

Quote:
with t_recurs (dep_id, parent, iter) as
(
select dep_id, parent, 0 from DEPARTMENT
union all
select r.dep_id, d.parent, iter + 1
from t_recurs r, DEPARTMENT d
where r.parent = d.dep_id and iter< 1000
)
That query is wrong, and that's why you need this "where iter < 1000" workaround.

The basic recursion should be like this:

with recursive t_recurs (dep_id, parent, iter) as
(
select dep_id, parent, 0 as iter
from department
where parent = -2

union all

select d.dep_id, d.parent, iter + 1
from department d
join t_recurs r on d.parent = r.dep_id
)
select *
from t_recurs;

I changed two things:
- the where condition for the first part of the union
- selecting the d.dep_id instead of r.dep_id in the recursion part which makes the check for an infinite loop obsolete

Now to your real question:

Quote:
What am I actually looking for is also the count of Persons that are
in all sub-departments in that hierarchy, including the departement a
person is currently in.
this should do it as far as I can tell.

with recursive t_recurs (dep_id, parent, iter) as
(
select dep_id, parent, 0 as iter
from department
where parent is null

union all

select d.dep_id, d.parent, iter + 1
from department d
join t_recurs r on d.parent = r.dep_id
)
select r.dep_id,
(select count(*) from t_recurs r2 where r2.iter > r.iter) as department_count,
(select count(*) from person where dep_id = any (select dep_id from t_recurs r3 where r3.iter >= r.iter)) as person_count
from t_recurs r
order by 1;

Now, whether that will be fast enough is a completely different topic

Thomas

Reply With Quote
  #4  
Old   
The Boss
 
Posts: n/a

Default Re: A CTE Question - 08-26-2010 , 04:10 PM



Thomas Kellerer wrote:
Quote:
Hi,

I also would like to know if you are really using Oracle.

As I don't have Oracle 11gR2 available (which is the first Oracle
version to support recursive CTEs) I tested this using PostgreSQL.
I expect this to run on Oracle 11gR2 as well.

with t_recurs (dep_id, parent, iter) as
(
select dep_id, parent, 0 from DEPARTMENT
union all
select r.dep_id, d.parent, iter + 1
from t_recurs r, DEPARTMENT d
where r.parent = d.dep_id and iter< 1000
)

That query is wrong, and that's why you need this "where iter < 1000"
workaround.
The basic recursion should be like this:

with recursive t_recurs (dep_id, parent, iter) as
(
select dep_id, parent, 0 as iter
from department
where parent = -2

union all

select d.dep_id, d.parent, iter + 1
from department d
join t_recurs r on d.parent = r.dep_id
)
select *
from t_recurs;

I changed two things:
- the where condition for the first part of the union
- selecting the d.dep_id instead of r.dep_id in the recursion part
which makes the check for an infinite loop obsolete
Now to your real question:

What am I actually looking for is also the count of Persons that are
in all sub-departments in that hierarchy, including the departement a
person is currently in.

this should do it as far as I can tell.

with recursive t_recurs (dep_id, parent, iter) as
(
select dep_id, parent, 0 as iter
from department
where parent is null

union all

select d.dep_id, d.parent, iter + 1
from department d
join t_recurs r on d.parent = r.dep_id
)
select r.dep_id,
(select count(*) from t_recurs r2 where r2.iter > r.iter) as
department_count, (select count(*) from person where dep_id =
any (select dep_id from t_recurs r3 where r3.iter >= r.iter)) as
person_count from t_recurs r
order by 1;

Now, whether that will be fast enough is a completely different topic

Thomas
I can't answer on behalf of Michel, just wanted to say he has asked the same
question in comp.databases.ibm-db2 (and did receive a couple of good
answers).

Cheers!

--
Jeroen

Reply With Quote
  #5  
Old   
Erland Sommarskog
 
Posts: n/a

Default Re: A CTE Question - 08-28-2010 , 04:36 AM



Thomas Kellerer (OTPXDAJCSJVU (AT) spammotel (DOT) com) writes:
Quote:
I also would like to know if you are really using Oracle.
Actually, my assumption was that he was on Oracle!

That is, I saw the post in comp.databases.ms-sqlserver, and all the
syntax parsed on SQL 2008 except for the row constructor in the join.
The row constructor for the INSERT are ANSI as far as I know.

(And, no, I do normally hang out here. I had a feeling that Michel may
have posted to more newsgroups, so I searched on Google groups, and
sure enough.)

Quote:
with t_recurs (dep_id, parent, iter) as
(
select dep_id, parent, 0 from DEPARTMENT
union all
select r.dep_id, d.parent, iter + 1
from t_recurs r, DEPARTMENT d
where r.parent = d.dep_id and iter< 1000
)

That query is wrong, and that's why you need this "where iter < 1000"
workaround.
The recursion as works: you start from the bottom and work your
way up. You will stop recurse when you reach the top level. However,
he would need a WHERE NOT EXISTS to only start at the leaves. Now
he starts on all nodes.

And at first glance, it seems that this is the way to go: you want
to count how many there are in each department, so why not start at
the bottom? The problem is that if a department has several sub-
departments, you will count the staff in that department everytime.
So I think that is a non-starter nevertheless. (And this test case
was missing frmo Michel's test data.)

Quote:
this should do it as far as I can tell.

with recursive t_recurs (dep_id, parent, iter) as
(
select dep_id, parent, 0 as iter
from department
where parent is null

union all

select d.dep_id, d.parent, iter + 1
from department d
join t_recurs r on d.parent = r.dep_id
)
select r.dep_id,
(select count(*) from t_recurs r2 where r2.iter > r.iter)
as department_count,
(select count(*) from person where dep_id =
any (select dep_id from t_recurs r3 where r3.iter >= r.iter))
as person_count from t_recurs r
order by 1;
I don't think this works. This assumes that there is a single line
of departments with no branches, and Michel's data had two roots. (And
uses -2 rather than NULL to signify a root.)

Here is the query that I suggested in the SQL Server newsgroup:

WITH depcnt (DEP_ID, cnt) AS (
SELECT DEP_ID, COUNT(*)
FROM PERSON
GROUP BY DEP_ID
), recurs (DEP_ID, iter, path) AS (
SELECT DEP_ID, 0, cast(str(DEP_ID, 4) as varchar(8000))
FROM DEPARTMENT
WHERE PARENT = -2
UNION ALL
SELECT D.DEP_ID, r.iter+1, r.path + str(D.DEP_ID, 4)
FROM recurs r
JOIN DEPARTMENT D ON r.DEP_ID = D.PARENT
)
SELECT R1.DEP_ID, SUM(dc.cnt)
FROM recurs R1
JOIN recurs R2 ON R2.path LIKE R1.path + '%'
LEFT JOIN depcnt dc ON R2.DEP_ID = dc.DEP_ID
GROUP BY R1.DEP_ID
ORDER BY R1.DEP_ID

To get this working on Oracle, you probably need to replace the str()
function, and change + in the LIKE expression to ||.

As for performance, on SQL Server it is certainly recommendable to
to first save the result of the CTE into a temp table and work from
there.

Here is some more sample data (still with row constructors, though):

create table DEPARTMENT
(
DEP_ID integer not null primary key,
NAME VARCHAR(32) not null,
PARENT integer not null
) ;


insert into DEPARTMENT
VALUES (1,'A',-2),
(2,'B',1),
(3,'C',2),
(4,'D',-2),
(5,'E',4),
(6,'F',5),
(7,'G',6),
(8,'D1', 4),
(9,'D2', 4),
(10,'E2', 5),
(11,'E3', 5)


create table PERSON (
PERSON_ID integer not null primary key,
NAME VARCHAR(128) not null,
DEP_ID integer,
constraint FK_01 foreign key (DEP_ID) references DEPARTMENT(DEP_ID)
);



--
Erland Sommarskog, Stockholm, esquel (AT) sommarskog (DOT) se

Reply With Quote
  #6  
Old   
Thomas Kellerer
 
Posts: n/a

Default Re: A CTE Question - 08-28-2010 , 05:56 AM



Erland Sommarskog wrote on 28.08.2010 11:36:
Quote:
with t_recurs (dep_id, parent, iter) as
(
select dep_id, parent, 0 from DEPARTMENT
union all
select r.dep_id, d.parent, iter + 1
from t_recurs r, DEPARTMENT d
where r.parent = d.dep_id and iter< 1000
)

That query is wrong, and that's why you need this "where iter< 1000"
workaround.

The recursion as works: you start from the bottom and work your
way up. You will stop recurse when you reach the top level. However,
he would need a WHERE NOT EXISTS to only start at the leaves. Now
he starts on all nodes.
Hmm, for me the above query yields an endless "loop" when you remove the " iter < 1000"


Quote:
with recursive t_recurs (dep_id, parent, iter) as
(
select dep_id, parent, 0 as iter
from department
where parent is null

union all

select d.dep_id, d.parent, iter + 1
from department d
join t_recurs r on d.parent = r.dep_id
)
select r.dep_id,
(select count(*) from t_recurs r2 where r2.iter> r.iter)
as department_count,
(select count(*) from person where dep_id =
any (select dep_id from t_recurs r3 where r3.iter>= r.iter))
as person_count from t_recurs r
order by 1;

I don't think this works.
It works on PostgreSQL with the test data supplied

Quote:
(And uses -2 rather than NULL to signify a root.)
Yes, correct. Copy & Paste error. I just don't like "magic values"


Quote:
As for performance, on SQL Server it is certainly recommendable to
to first save the result of the CTE into a temp table and work from
there.
Temp tables are (most of the time) a real performance killer in the Oracle world.


Thomas

Reply With Quote
  #7  
Old   
Erland Sommarskog
 
Posts: n/a

Default Re: A CTE Question - 08-28-2010 , 07:38 AM



Thomas Kellerer (OTPXDAJCSJVU (AT) spammotel (DOT) com) writes:
Quote:
Erland Sommarskog wrote on 28.08.2010 11:36:
with t_recurs (dep_id, parent, iter) as
(
select dep_id, parent, 0 from DEPARTMENT
union all
select r.dep_id, d.parent, iter + 1
from t_recurs r, DEPARTMENT d
where r.parent = d.dep_id and iter< 1000
)

That query is wrong, and that's why you need this "where iter< 1000"
workaround.

The recursion as works: you start from the bottom and work your
way up. You will stop recurse when you reach the top level. However,
he would need a WHERE NOT EXISTS to only start at the leaves. Now
he starts on all nodes.

Hmm, for me the above query yields an endless "loop" when you remove the "
iter < 1000"
On PostgreSQL?

It should not, as long as there are no cycles in the data. (And if there
is, you will get stuck, no matter where you start in the graph.) When
you come to the top node, the the recursive part of the CTE will
not produce any more rows, and recursion should not continue.

I noticed that in your own query you used the RECURSIVE keyword which
is unknown to SQL Server. Maybe that is needed on PostgreSQL to
prevent recursion from continuing?

Quote:
It works on PostgreSQL with the test data supplied
Interesting. I certainly don't get the expected result on SQL Server.
I don't have PostgresSQL, but it looks like a bug to me. If you try the
extra test data I composed, does it still work? (See below.)

Quote:
(And uses -2 rather than NULL to signify a root.)
Yes, correct. Copy & Paste error. I just don't like "magic values"
Nor do I!

Here is my test data again. In my previous post I had inadvertendly
left out the extra persons

create table DEPARTMENT
(
DEP_ID integer not null primary key,
NAME VARCHAR(32) not null,
PARENT integer not null
) ;


insert into DEPARTMENT
VALUES (1,'A',-2),
(2,'B',1),
(3,'C',2),
(4,'D',-2),
(5,'E',4),
(6,'F',5),
(7,'G',6),
(8,'D1', 4),
(9,'D2', 4),
(10,'E2', 5),
(11,'E3', 5)


create table PERSON (
PERSON_ID integer not null primary key,
NAME VARCHAR(128) not null,
DEP_ID integer,
constraint FK_01 foreign key (DEP_ID) references DEPARTMENT(DEP_ID)
);


insert into PERSON values
(1,'John',2),(2,'Mary',2),(3,'Suzan',3),
(4,'Joe',3), (5,'Lewis',3),(6,'Elvis',4),
(7,'Presley',5),(8,'Preston',5),(9,'Pee',7),
(10,'Wee',7),(11,'Steve-o',7),(12,'Demi Moore',7),
(13,'Pelle',8),(14,'Nisse',8),(15,'Petra',9),
(16,'Jöns',10),(17,'Pekka',11),(18,'Lalle',11),
(19,'Putte',11);
go

--
Erland Sommarskog, Stockholm, esquel (AT) sommarskog (DOT) se

Reply With Quote
  #8  
Old   
Thomas Kellerer
 
Posts: n/a

Default Re: A CTE Question - 08-28-2010 , 08:14 AM



Erland Sommarskog wrote on 28.08.2010 14:38:
Quote:
I noticed that in your own query you used the RECURSIVE keyword which
is unknown to SQL Server. Maybe that is needed on PostgreSQL to
prevent recursion from continuing?
No. It's required by the ANSI standard...


Quote:
It works on PostgreSQL with the test data supplied

Interesting. I certainly don't get the expected result on SQL Server.
I don't have PostgresSQL, but it looks like a bug to me. If you try the
extra test data I composed, does it still work? (See below.)
Note that the OP incorrectly (at least in my eyes) selects the the dep_id from the t_recurs not from the department table, and I think that is causing the problem with the endless loop:

with t_recurs (dep_id, parent, iter) as
(
select dep_id, parent, 0 from DEPARTMENT
union all
select r.dep_id, <-- !!! this should be D.dep_id !!!!
d.parent, iter + 1
from t_recurs r, DEPARTMENT d
where r.parent = d.dep_id and iter < 1000
)

It should be d.dep_id (because r.dep_id will create a cycle).
As soon as I select the dep_id from the department table in the second part of the union, the endless loop stops.


Quote:
It works on PostgreSQL with the test data supplied

Interesting. I certainly don't get the expected result on SQL Server.
I don't have PostgresSQL, but it looks like a bug to me. If you try the
extra test data I composed, does it still work? (See below.)
You are right, it's counting too many rows

Because I'm simply counting everything below the current hierarchy level, not along the current branch. Your solution with taking the full path deals correctly with that.

Regards
Thomas

Reply With Quote
  #9  
Old   
Erland Sommarskog
 
Posts: n/a

Default Re: A CTE Question - 08-28-2010 , 12:15 PM



Thomas Kellerer (OTPXDAJCSJVU (AT) spammotel (DOT) com) writes:
Quote:
Note that the OP incorrectly (at least in my eyes) selects the the
dep_id from the t_recurs not from the department table, and I think that
is causing the problem with the endless loop:

with t_recurs (dep_id, parent, iter) as
(
select dep_id, parent, 0 from DEPARTMENT
union all
select r.dep_id, <-- !!! this should be D.dep_id !!!!
d.parent, iter + 1
from t_recurs r, DEPARTMENT d
where r.parent = d.dep_id and iter < 1000
)

It should be d.dep_id (because r.dep_id will create a cycle).
As soon as I select the dep_id from the department table in the second
part of the union, the endless loop stops.
Having r.dep_id in the recursive part might be a mistake, but it should
not cause infinite recursion. The recursion is controlled by the WHERE
clause, and the WHERE does not involve r.dep_id.

When I run:

with t_recurs (DEP_ID, PARENT, iter) as (
select DEP_ID, PARENT, 0 from DEPARTMENT
union all
select r.DEP_ID, d.PARENT, iter + 1
from t_recurs r, DEPARTMENT d
where r.PARENT = d.DEP_ID
)
SELECT * FROM t_recurs ORDER BY iter, DEP_ID

I get this output on SQL Server:

DEP_ID PARENT iter
----------- ----------- -----------
1 -2 0
2 1 0
3 2 0
4 -2 0
5 4 0
6 5 0
7 6 0
8 4 0
9 4 0
10 5 0
11 5 0
2 -2 1
3 1 1
5 -2 1
6 4 1
7 5 1
8 -2 1
9 -2 1
10 4 1
11 4 1
3 -2 2
6 -2 2
7 4 2
10 -2 2
11 -2 2
7 -2 3

Not that I think the output is useful for the problem Michel presented,
but I like to point out that there are other situations where this result
is useful, at least if you remove all rows with PARENT = -2. To with, this
table gives us a quick answer to "does subdept n belong dept m directly
or indirectly".

If you need the condition "iter < 1000" on PostgreSQL I assume that this
query gives you many more rows than 26. Looking at that output might
give a clue of what is going on.

--
Erland Sommarskog, Stockholm, esquel (AT) sommarskog (DOT) se

Reply With Quote
  #10  
Old   
Erland Sommarskog
 
Posts: n/a

Default Re: A CTE Question - 08-29-2010 , 06:52 AM



Erland Sommarskog (esquel (AT) sommarskog (DOT) se) writes:
Quote:
Having r.dep_id in the recursive part might be a mistake, but it should
not cause infinite recursion. The recursion is controlled by the WHERE
clause, and the WHERE does not involve r.dep_id.
...

Not that I think the output is useful for the problem Michel presented,
but I like to point out that there are other situations where this result
is useful, at least if you remove all rows with PARENT = -2. To with, this
table gives us a quick answer to "does subdept n belong dept m directly
or indirectly".
I was wrong. I was prompted to study the solutions in the DB2 newsgroup,
and the CTE Michel had is in fact quite close to the solution. He had:

select dep_id, parent, 0 from DEPARTMENT
union all
select r.dep_id,
d.parent, iter + 1
from t_recurs r, DEPARTMENT d
where r.parent = d.dep_id and iter < 1000

The change needed is not in the recursive part, but in the root. It needs
to read:

select dep_id, dep_id, 0 from DEPARTMENT

Yes, dep_id twice! This CTE results in a table similar to the one I posted
yesterday. That is, you can see which departments a certain department is a
member, but now there is also a row where the department is a member of
itself.

And once you have this it's simple to join on department level to PERSON,
but aggregate on the enclosing aggregate:

with tc (dep_id, ancestor_id, iter) as (
select DEP_ID, DEP_ID, 0 from DEPARTMENT
union all
select tc.dep_id, d.PARENT, iter + 1
from tc, DEPARTMENT d
where tc.ancestor_id = d.DEP_ID
and d.PARENT >= 0
)
select tc.ancestor_id,
(select count(1) from PERSON p2
where p2.DEP_ID = tc.ancestor_id),
count(p.NAME)
from tc
join PERSON p
on tc.dep_id = p.DEP_ID
group by tc.ancestor_id
order by tc.ancestor_id

This query is courtsey of Lennart Jonsson who originally posted it to
comp.databases.ibm-db2.

--
Erland Sommarskog, Stockholm, esquel (AT) sommarskog (DOT) se

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.