dbTalk Databases Forums  

A CTE Question

comp.databases.ms-sqlserver comp.databases.ms-sqlserver


Discuss A CTE Question in the comp.databases.ms-sqlserver forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Michel Esber
 
Posts: n/a

Default A CTE Question - 08-26-2010 , 11:08 AM






Hi Guys,

Can anyone shred some light here?

Sample DDL Code:


create table DEPARTMENT
(
DEP_ID integer not null primary key,
NAME VARCHAR(32) not null,
PARENT integer not null
) ;


insert into DEPARTMENT VALUES (1,'A',-2),(2,'B',1),(3,'C',2),
(4,'D',-2),(5,'E',4),(6,'F',5),(7,'G',6);


create table PERSON
(
PERSON_ID integer not null primary key,
NAME VARCHAR(128) not null,
DEP_ID integer,
constraint FK_01 foreign key (DEP_ID) references DEPARTMENT(DEP_ID)
);


insert into PERSON values (1,'John',2),(2,'Mary',2),(3,'Suzan',3),
(4,'Joe',3),(5,'Lewis',3),(6,'Elvis',4),(7,'Presle y',5),(8,'Preston',
5),(9,'Pee',7),(10,'Wee',7),(11,'Steve-o',7),(12,'Demi Moore',7);


with t_recurs (dep_id, parent, iter) as


(
select dep_id, parent, 0 from DEPARTMENT


union all


select r.dep_id, d.parent, iter + 1
from t_recurs r, DEPARTMENT d
where r.parent = d.dep_id and iter < 1000
)


select R.DEP_ID, COUNT(P.PERSON_ID)
from T_RECURS R inner join DEPARTMENT D on (R.DEP_ID, R.PARENT) =
(D.DEP_ID, D.PARENT)
left outer join PERSON P on P.DEP_ID = D.DEP_ID
group by R.DEP_ID
;


The output looks good:


DEP_ID DEP_COUNT
----------- -----------
1 0
2 2
3 3
4 1
5 2
6 0
7 4


What am I actually looking for is also the count of Persons that are
in all sub-departments in that hierarchy, including the departement a
person is currently in. Somehow I can't get this to work.


Expected output:


DEP_ID DEP_COUNT HIERARCHY_COUNT
----------- ----------- ---------------
1 0 5
2 2 5
3 3 3
4 1 7
5 2 6
6 0 6
7 4 4


Any ideas?


Thanks

Reply With Quote
  #2  
Old   
Erland Sommarskog
 
Posts: n/a

Default Re: A CTE Question - 08-28-2010 , 04:03 AM






Michel Esber (michel (AT) automatos (DOT) com) writes:
Quote:
from T_RECURS R inner join DEPARTMENT D on (R.DEP_ID, R.PARENT) =
(D.DEP_ID, D.PARENT)
Wait! That is not legal syntax for SQL Server! Are you on Oracle?

Quote:
What am I actually looking for is also the count of Persons that are
in all sub-departments in that hierarchy, including the departement a
person is currently in. Somehow I can't get this to work.
Although it looks like a standard problem it is not exactly trivial, if
you have not done it before. I know. I had to give up my first attempt
the other day, and I did not come back to the problem until now.

I think the query you have is a non-starter. You start from the bottom
and work upwards. This means that you start on every node, so you get
lots of extra rows. This can easily be dealt with a NOT EXISTS in the
anchor, but as you accumulate when you traverse upwards there is another
issue which your sample data would not reveal: If department 4 has one guy,
and two subdepartments 5 and 6 with one two guys each, you will count the
guy in dept 4 twice.

Eventually I came to the conclusion that you cannot do this without a
materialised path (unless you switch to a representation with nested
sets). Traverse the tree from the top, to get the path for all nodes,
then you can join the tree with itself with a LIKE expression to get
all the subdepartments:

WITH depcnt (DEP_ID, cnt) AS (
SELECT DEP_ID, COUNT(*)
FROM PERSON
GROUP BY DEP_ID
), recurs (DEP_ID, iter, path) AS (
SELECT DEP_ID, 0, cast(str(DEP_ID, 4) as varchar(8000))
FROM DEPARTMENT
WHERE PARENT = -2
UNION ALL
SELECT D.DEP_ID, r.iter+1, r.path + str(D.DEP_ID, 4)
FROM recurs r
JOIN DEPARTMENT D ON r.DEP_ID = D.PARENT
)
SELECT R1.DEP_ID, SUM(dc.cnt)
FROM recurs R1
JOIN recurs R2 ON R2.path LIKE R1.path + '%'
LEFT JOIN depcnt dc ON R2.DEP_ID = dc.DEP_ID
GROUP BY R1.DEP_ID
ORDER BY R1.DEP_ID

Notes:

* The str() function may be SQL Server specific.

* If you are on SQL Server, performance is likely to be better if you
save the result of the CTE in a temp table, since SQL Server else
will compute the CTE better. Whether this also applies to Oracle,
I don't know.

Here is an extended version of your sample data:



create table DEPARTMENT
(
DEP_ID integer not null primary key,
NAME VARCHAR(32) not null,
PARENT integer not null
) ;


insert into DEPARTMENT
VALUES (1,'A',-2),
(2,'B',1),
(3,'C',2),
(4,'D',-2),
(5,'E',4),
(6,'F',5),
(7,'G',6),
(8,'D1', 4),
(9,'D2', 4),
(10,'E2', 5),
(11,'E3', 5)


create table PERSON (
PERSON_ID integer not null primary key,
NAME VARCHAR(128) not null,
DEP_ID integer,
constraint FK_01 foreign key (DEP_ID) references DEPARTMENT(DEP_ID)
);


insert into PERSON values
(1,'John',2),(2,'Mary',2),(3,'Suzan',3),
(4,'Joe',3), (5,'Lewis',3),(6,'Elvis',4),
(7,'Presley',5),(8,'Preston',5),(9,'Pee',7),
(10,'Wee',7),(11,'Steve-o',7),(12,'Demi Moore',7),
(13,'Pelle',8),(14,'Nisse',8),(15,'Petra',9),
(16,'Jöns',10),(17,'Pekka',11),(18,'Lalle',11),
(19,'Putte',11);



--
Erland Sommarskog, SQL Server MVP, esquel (AT) sommarskog (DOT) se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinf...ons/books.mspx

Reply With Quote
  #3  
Old   
Lennart Jonsson
 
Posts: n/a

Default Re: A CTE Question - 08-28-2010 , 03:34 PM



On 2010-08-28 11:03, Erland Sommarskog wrote:
Quote:
Michel Esber (michel (AT) automatos (DOT) com) writes:
from T_RECURS R inner join DEPARTMENT D on (R.DEP_ID, R.PARENT) =
(D.DEP_ID, D.PARENT)

Wait! That is not legal syntax for SQL Server! Are you on Oracle?

What am I actually looking for is also the count of Persons that are
in all sub-departments in that hierarchy, including the departement a
person is currently in. Somehow I can't get this to work.

Although it looks like a standard problem it is not exactly trivial, if
you have not done it before. I know. I had to give up my first attempt
the other day, and I did not come back to the problem until now.

I think the query you have is a non-starter. You start from the bottom
and work upwards. This means that you start on every node, so you get
lots of extra rows. This can easily be dealt with a NOT EXISTS in the
anchor, but as you accumulate when you traverse upwards there is another
issue which your sample data would not reveal: If department 4 has one guy,
and two subdepartments 5 and 6 with one two guys each, you will count the
guy in dept 4 twice.

Eventually I came to the conclusion that you cannot do this without a
materialised path (unless you switch to a representation with nested
sets). Traverse the tree from the top, to get the path for all nodes,
then you can join the tree with itself with a LIKE expression to get
all the subdepartments:

WITH depcnt (DEP_ID, cnt) AS (
SELECT DEP_ID, COUNT(*)
FROM PERSON
GROUP BY DEP_ID
), recurs (DEP_ID, iter, path) AS (
SELECT DEP_ID, 0, cast(str(DEP_ID, 4) as varchar(8000))
FROM DEPARTMENT
WHERE PARENT = -2
UNION ALL
SELECT D.DEP_ID, r.iter+1, r.path + str(D.DEP_ID, 4)
FROM recurs r
JOIN DEPARTMENT D ON r.DEP_ID = D.PARENT
)
SELECT R1.DEP_ID, SUM(dc.cnt)
FROM recurs R1
JOIN recurs R2 ON R2.path LIKE R1.path + '%'
LEFT JOIN depcnt dc ON R2.DEP_ID = dc.DEP_ID
GROUP BY R1.DEP_ID
ORDER BY R1.DEP_ID

Erland, can you post the output of your query? As you point out there
are some mssql specific constructs in there and I dont have access to
any mssql server, it would be interesting to compare your results with mine.

Judging from your sample data I assumed that you are Swedish, and some
googling strengthens that hypothesis. Jag bor själv i Umeå, men av någon
underlig anledning har jag tidigare antagit att du kommer från Tyskland,
det är en liten värld vi lever i ;-)

/Lennart

Reply With Quote
  #4  
Old   
Erland Sommarskog
 
Posts: n/a

Default Re: A CTE Question - 08-29-2010 , 06:44 AM



Lennart Jonsson (erik.lennart.jonsson (AT) gmail (DOT) com) writes:
Quote:
Erland, can you post the output of your query? As you point out there
are some mssql specific constructs in there and I dont have access to
any mssql server, it would be interesting to compare your results with
mine.
My query produces the same result as yours (save that I did not care to
include the number of employees in the specific department).

I will have to admit that when I came to comp.databases.ibm-db2, I was
a bit exhausted after my own exercises and also having visited
comp.databases.oraacle.server where the solutions were not very good.
So I did not look at the DB2 solutions very closely, and just dumped my
solution to newsgroup.

I have now studied your solution, and I think it is far better than
mine. It is also amazingly close to Michel's original query. The key is
that in the root you don't select ID and parent which is the common
way to start a recursion upwards, but instead you start with ID and ID.
This gives you a table which details which departments a department is
member of, including itself. Once armed with this information it's
simple to join with PERSON on the dep_id and the aggregate on the
parent (parent on any level) to get the desired number.

My solution is inferior as it requires a join of the CTE with itself,
which is not is likely to be efficient on any DBMS. The comparison with
LIKE may also be a performance killer. Another issue is that it is slightly
inconvenient to get the count on the local department level.

And a final problem is that it is difficult to write this solution fully
portable. One issue is the string concatenation which is:

R1.path + '%'

On SQL Server and Sybase, but

R1.path || '%'

On ANSI-compatible products.

Another problem is

str(DEP_ID, 4)

which makes the number into a right-justified fixed-length string: ' 4'.
This is surely possible on DB2 and Oracle as well, but the function is
likely to be another one. For this particular problem I could use

cast(D.DEP_ID as varchar(8000)) + ' '

But often when you work with a path, you need to have a fixed length of
the components to you can extract data from the path.

So let me say that I really appreciate that you jumped into this group
and gave me reason to study you query in detail! That learnt me something
new.

One final question: all you DB2 guys have this condition

where iter < 1000

in the recursive part of the CTE. Is this standard on DB2 to protect
against cycles? What happens if you don't include it, and there is a
cycle? Does the query loop forever? On SQL Server the default is that
if there is more than 100 levels of recursion, you get an error. You
can override this with a query hint.


For the reference of readers of this group, here is Lennart's query:

with tc (dep_id, ancestor_id, iter) as (
select DEP_ID, DEP_ID, 0 from DEPARTMENT
union all
select tc.dep_id, d.PARENT, iter + 1
from tc, DEPARTMENT d
where tc.ancestor_id = d.DEP_ID
and d.PARENT >= 0
)
select tc.ancestor_id,
(select count(1) from PERSON p2
where p2.DEP_ID = tc.ancestor_id),
count(p.NAME)
from tc
join PERSON p
on tc.dep_id = p.DEP_ID
group by tc.ancestor_id
order by tc.ancestor_id


Quote:
Judging from your sample data I assumed that you are Swedish, and some
googling strengthens that hypothesis. Jag bor själv i Umeå, men av någon
underlig anledning har jag tidigare antagit att du kommer från Tyskland,
det är en liten värld vi lever i ;-)
Tja, själv tycker jag som skåning att allt norr om Loshult är Lappland,
så jag kan förstå om Ume-bor tycker att allt så där långt söderut är samma
sak som Tyskland. :-)

--
Erland Sommarskog, SQL Server MVP, esquel (AT) sommarskog (DOT) se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinf...ons/books.mspx

Reply With Quote
  #5  
Old   
Lennart Jonsson
 
Posts: n/a

Default Re: A CTE Question - 08-29-2010 , 08:24 AM



On 2010-08-29 13:44, Erland Sommarskog wrote:
[...]
Quote:
One final question: all you DB2 guys have this condition

where iter < 1000

in the recursive part of the CTE. Is this standard on DB2 to protect
against cycles? What happens if you don't include it, and there is a
cycle? Does the query loop forever? On SQL Server the default is that
if there is more than 100 levels of recursion, you get an error. You
can override this with a query hint.

On db2 there is no default limit and the query will loop forever (well
until terminated, runs out of memory, etc).

Beside protecting from infinite recursion, the compiler will produce a
warning unless it can deduce that the query is guaranteed to terminate:

SQL0347W The recursive common table expression "LELLE.TC" may contain
an infinite loop. SQLSTATE=01605

A trivial termination condition like: iter < 1000

helps the compiler to realise that the recursion will terminate, and
prevents this warning.

[...]

Quote:
Tja, själv tycker jag som skåning att allt norr om Loshult är Lappland,
så jag kan förstå om Ume-bor tycker att allt så där långt söderut är samma
sak som Tyskland. :-)

;-)

/Lennart

Reply With Quote
  #6  
Old   
Erland Sommarskog
 
Posts: n/a

Default Re: A CTE Question - 08-29-2010 , 04:54 PM



Lennart Jonsson (erik.lennart.jonsson (AT) gmail (DOT) com) writes:
Quote:
On db2 there is no default limit and the query will loop forever (well
until terminated, runs out of memory, etc).

Beside protecting from infinite recursion, the compiler will produce a
warning unless it can deduce that the query is guaranteed to terminate:

SQL0347W The recursive common table expression "LELLE.TC" may contain
an infinite loop. SQLSTATE=01605

A trivial termination condition like: iter < 1000

helps the compiler to realise that the recursion will terminate, and
prevents this warning.
Thanks for the information. Next time I see "iter < 1000" I know where it's
coming from. :-)
--
Erland Sommarskog, SQL Server MVP, esquel (AT) sommarskog (DOT) se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinf...ons/books.mspx

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.