dbTalk Databases Forums  

how to group/list top 3 of each category w/o using Union?

comp.databases.ms-sqlserver comp.databases.ms-sqlserver


Discuss how to group/list top 3 of each category w/o using Union? in the comp.databases.ms-sqlserver forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Rich Protzel
 
Posts: n/a

Default how to group/list top 3 of each category w/o using Union? - 08-06-2003 , 11:43 AM






Hello,

So my table contains say 100,000 records, and I need to group the
categories in fld1 by the highest count of subcategories. Say fld1
contains categories A, B, C, D, E.

All of these categories contain subcategories AA, AB, AC, AD,...AJ, BA,
BB...BJ, CA, CB, CC...CJ, etc in fld2.

I am counting how many subcategories are listed for each category. Like
A may contain 5 of AA, 7 of AB, 3 of AC, 11 of AD...1 for the rest and
20 of AJ. B may contain 2 of BA, 11 of BB, 7 of BC, and 1 for the rest.
I want to pick up the top 3 subcategory counts for each category. Would
look like this:

Cat SubCat Count
A AJ 20
A AD 11
A AB 7
B BB 11
B BC 7
B BA 2

So event though each category contains 10 subcategories, I only want to
list the top 3 categories with the highest counts as above. If I just
do a group by and sort I can get this:

Cat SubCat Count
A ... ...
A
A
A
A
A
A
...
B ... ...
B
B
B
B
B
...

But I just want the top 3 of each category. The only way I can think of
to do this is to query each category individually and Select Top 3, and
then Union these guys into one query. The problem is that I have to
hardcode each category in the Union query. There may be new categoris
that I miss. Is there a way to achieve what I want without using Union?

Thanks,
Rich

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!

Reply With Quote
  #2  
Old   
Erland Sommarskog
 
Posts: n/a

Default Re: how to group/list top 3 of each category w/o using Union? - 08-06-2003 , 04:57 PM






[posted and mailed, please reply in news]

Rich Protzel (rpng123 (AT) aol (DOT) com) writes:
Quote:
So my table contains say 100,000 records, and I need to group the
categories in fld1 by the highest count of subcategories. Say fld1
contains categories A, B, C, D, E.

All of these categories contain subcategories AA, AB, AC, AD,...AJ, BA,
BB...BJ, CA, CB, CC...CJ, etc in fld2.

I am counting how many subcategories are listed for each category. Like
A may contain 5 of AA, 7 of AB, 3 of AC, 11 of AD...1 for the rest and
20 of AJ. B may contain 2 of BA, 11 of BB, 7 of BC, and 1 for the rest.
I want to pick up the top 3 subcategory counts for each category. Would
look like this:

Cat SubCat Count
A AJ 20
A AD 11
A AB 7
B BB 11
B BC 7
B BA 2

So event though each category contains 10 subcategories, I only want to
list the top 3 categories with the highest counts as above. If I just
do a group by and sort I can get this:
Here is a pragmatic solution:

CREATE TABLE #temp (id int IDENTITY,
fld1 char(2) NOT NULL,
fld2 int NOT NULL,
cnt int NOT NULL)

INSERT #temp (fld1, fld2, cnt)
SELECT xtype, id % 10, COUNT(*)
FROM sysobjects
GROUP BY xtype, id % 10
ORDER BY 3 DESC
OPTION (MAXDOP 1)

SELECT DISTINCT t.*
FROM #temp t
JOIN (SELECT fld1, id = MIN(id)
FROM #temp
GROUP BY fld1) AS s ON t.id BETWEEN s.id AND s.id + 2
ORDER BY t.fld1, t.cnt DESC
go
drop table #temp

Here I have replaced your table with a funky categorization of a system
tables. (Hint: post CREATE TABLE statement for your tables, INSERT
statements with sample data and the desired output and you get a
tested solution that fits your situation.)

The solution is not fool-proof. There is not really any guarantee that
the identity values are assigned in the order specified in the SELECT
statement. But it usually works, particularly with the query hint that
turns of parallelism.


--
Erland Sommarskog, SQL Server MVP, sommar (AT) algonet (DOT) se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp


Reply With Quote
  #3  
Old   
John Gilson
 
Posts: n/a

Default Re: how to group/list top 3 of each category w/o using Union? - 08-06-2003 , 06:38 PM



"Rich Protzel" <rpng123 (AT) aol (DOT) com> wrote

Quote:
Hello,

So my table contains say 100,000 records, and I need to group the
categories in fld1 by the highest count of subcategories. Say fld1
contains categories A, B, C, D, E.

All of these categories contain subcategories AA, AB, AC, AD,...AJ, BA,
BB...BJ, CA, CB, CC...CJ, etc in fld2.

I am counting how many subcategories are listed for each category. Like
A may contain 5 of AA, 7 of AB, 3 of AC, 11 of AD...1 for the rest and
20 of AJ. B may contain 2 of BA, 11 of BB, 7 of BC, and 1 for the rest.
I want to pick up the top 3 subcategory counts for each category. Would
look like this:

Cat SubCat Count
A AJ 20
A AD 11
A AB 7
B BB 11
B BC 7
B BA 2

So event though each category contains 10 subcategories, I only want to
list the top 3 categories with the highest counts as above. If I just
do a group by and sort I can get this:

Cat SubCat Count
A ... ...
A
A
A
A
A
A
..
B ... ...
B
B
B
B
B
..

But I just want the top 3 of each category. The only way I can think of
to do this is to query each category individually and Select Top 3, and
then Union these guys into one query. The problem is that I have to
hardcode each category in the Union query. There may be new categoris
that I miss. Is there a way to achieve what I want without using Union?

Thanks,
Rich

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
Here's a UDF that will return the top N for each category where
N is provided as an argument to the function.

CREATE TABLE CategoryCounts
(
category CHAR(1) NOT NULL,
subcategory CHAR(2) NOT NULL PRIMARY KEY,
cnt INT NOT NULL
)

-- Sample data
INSERT INTO CategoryCounts (category, subcategory, cnt)
VALUES ('A', 'AA', 5)
INSERT INTO CategoryCounts (category, subcategory, cnt)
VALUES ('A', 'AB', 7)
INSERT INTO CategoryCounts (category, subcategory, cnt)
VALUES ('A', 'AC', 3)
INSERT INTO CategoryCounts (category, subcategory, cnt)
VALUES ('A', 'AD', 11)
INSERT INTO CategoryCounts (category, subcategory, cnt)
VALUES ('A', 'AE', 1)

INSERT INTO CategoryCounts (category, subcategory, cnt)
VALUES ('B', 'BA', 2)
INSERT INTO CategoryCounts (category, subcategory, cnt)
VALUES ('B', 'BB', 11)
INSERT INTO CategoryCounts (category, subcategory, cnt)
VALUES ('B', 'BC', 7)
INSERT INTO CategoryCounts (category, subcategory, cnt)
VALUES ('B', 'BD', 1)
INSERT INTO CategoryCounts (category, subcategory, cnt)
VALUES ('B', 'BE', 1)

CREATE FUNCTION TopNFromEachCategory
(@n INT)
RETURNS TABLE
AS
RETURN(
SELECT C1.category, C1.subcategory, C1.cnt
FROM CategoryCounts AS C1
WHERE @n >= (SELECT COUNT(DISTINCT C2.cnt)
FROM CategoryCounts AS C2
WHERE C1.category = C2.category AND
C2.cnt >= C1.cnt)
)

SELECT *
FROM TopNFromEachCategory(3)
ORDER BY category ASC, cnt DESC, subcategory ASC

category subcategory cnt
A AD 11
A AB 7
A AA 5
B BB 11
B BC 7
B BA 2

Regards,
jag




Reply With Quote
  #4  
Old   
Rich Protzel
 
Posts: n/a

Default Re: how to group/list top 3 of each category w/o using Union? - 08-07-2003 , 01:32 PM



Just a quick update on what I tried with the given solutions:

I created a static temp table which included an ID field, a Category
field, subCategory, and cnt. The first select statement here produced
the desired results with my actual data of retrieving only the top 3
subcategories (with the highest cnt's) for each category in Query
Analyzer. Note that in the temp table I ordered the table by Category
ASC and for cnt DESC.

SELECT DISTINCT t.*
FROM Temp t
JOIN (SELECT Category, id = MIN(id)
FROM Temp
GROUP BY Category) AS s ON t.id BETWEEN s.id AND s.id + 2
ORDER BY t.Category, t.cnt DESC


I also created a temp table without the ID field but all the rest of the
fields as above. The following select statement did not limit the
subcategories to the top 3. It did remove a few of the subcategories,
but removed the subcategories with the highest count. Maybe I missed
something or maybe it needs an ID field. I would be interested to know
what it would take to make this second procedure work as desired - since
it is so similar to the first one, except doesn't use ID field, and
doesn't use Min function, but uses Count instead.

SELECT C1.category, C1.subcategory, C1.cnt
FROM Temp AS C1
WHERE 3 >= (SELECT COUNT(DISTINCT C2.cnt)
FROM Temp AS C2
WHERE C1.category = C2.category AND
C2.cnt >= C1.cnt)


Again, I thank you all for helping me out.


Rich

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.