dbTalk Databases Forums  

Optimize query that uses TopCount()

microsoft.public.sqlserver.olap microsoft.public.sqlserver.olap


Discuss Optimize query that uses TopCount() in the microsoft.public.sqlserver.olap forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
lmcphee
 
Posts: n/a

Default Optimize query that uses TopCount() - 10-11-2005 , 02:14 PM






I have two hierarchical dimensions: Time (Year/Month/Day/Hour) and Location
(L1/L2/L3/L4). My facts are "Incident" records. I have a Repeat calc member
that CrossJoins Hour and L4 to return (Count-1) for every HourxL4 where the
Count > 1. i.e. Count of repetitions within each hour for each L4 member.

I want to get:
Location's L2 members on Rows
Total Repeat Count, L4 BadActor#1 name, L4 BadActor#2 name, L4 BadActor#3
name, % Contrubtion of top 3 BadActors.

I have MDX that returns the correct names and numbers, but when I try it on
real data (~1 M Incidents) it takes much too long. I created a separate
calc-member for each of BadActor1, BadActor2, BadActor3, and Top 3 %
Contribution. I then CrossJoin these in the Select with {L2 members}

I welcome any suggestions on how to improve performance of this query. I
have read a bit about MDX CACHE - is this a case where it could be useful? Or
processing location?

Thanks,

LMcPhee

Reply With Quote
  #2  
Old   
Chris Webb
 
Posts: n/a

Default RE: Optimize query that uses TopCount() - 10-12-2005 , 07:38 AM






Can I make sure I understand you correctly? What you're doing is for each L2
member, finding the names of the top 3 descendants of that L2 member by
[Repeat] and then calculating their overall contribution? And you're using
AS2K rather than AS2005? I'll assume you're using the former in this reply.

A few things spring to mind...

* It sounds like all of your queries are taking place at the leaf levels of
Time and Location; if there are any other dimensions in the cube, it might be
worth building some aggregations manually at higher levels of these other
dimensions and L4/Hour. Take a look at the following posting on my blog for
more information on how to do this:
http://spaces.msn.com/members/cwebbbi/Blog/cns!1pi7ETChsJ1un_2s41jm9Iyg!119.entry

* Are you partitioning your cube? It sounds like partitioning by L2 might
help here.

* What does the MDX for your query look like? One thing that might speed up
your query is to use ad-hoc named sets to make sure you only do your TOPCOUNT
(which is likely to be the most expensive operation in your query once).
Here's an example using Foodmart 2000 which finds the top 3 customers for
each state, but which shows how you only need to find those top 3 once and
re-use the result across multiple calculated measures:

WITH
MEMBER MEASURES.TOPCUST AS '{
TOPCOUNT(
DESCENDANTS(
[Customers].CURRENTMEMBER
, [Customers].[Name])
, 3, MEASURES.[UNIT SALES])
AS TOP3}.ITEM(0).ITEM(0).NAME', NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]'
MEMBER MEASURES.SECONDCUST AS 'TOP3.ITEM(1).ITEM(0).NAME',
NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]'
MEMBER MEASURES.THIRDCUST AS 'TOP3.ITEM(2).ITEM(0).NAME',
NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]'
MEMBER MEASURES.CONTRIB AS 'SUM(TOP3, MEASURES.[UNIT SALES])/MEASURES.[UNIT
SALES]', FORMAT_STRING='PERCENT', NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]'
SELECT {MEASURES.TOPCUST, MEASURES.SECONDCUST, MEASURES.THIRDCUST,
MEASURES.CONTRIB} ON 0,
NON EMPTY
[Customers].[State Province].MEMBERS ON 1
FROM SALES

As you can see, the set TOP3 is created in the first calculated measure,
then referenced in the others.

* Setting the NON_EMPTY_BEHAVIOR property for your calculated measures could
also be useful if you are removing empty members at any point. See the query
above for an example; you can also do it in the Cube Editor for calculated
measures defined on the server. The following article has more info:
http://beta.blogs.msdn.com/bi_system...es/162852.aspx

* Similarly, using NonEmptyCrossjoin() rather than Crossjoin() inside
[Repeat] if you can might be a good idea.

* Setting various connection string properties, like Exection Location,
Cache Policy and Cache Ratio could help. Try the following combinations
either alone or in combination:
Execution Location=3; Default Isolation Mode=1
Cache Policy=7
Cache Ratio=0.1; Cache Ratio2=0.1

HTH,

Chris

--
Blog at:
http://spaces.msn.com/members/cwebbbi/


"lmcphee" wrote:

Quote:
I have two hierarchical dimensions: Time (Year/Month/Day/Hour) and Location
(L1/L2/L3/L4). My facts are "Incident" records. I have a Repeat calc member
that CrossJoins Hour and L4 to return (Count-1) for every HourxL4 where the
Count > 1. i.e. Count of repetitions within each hour for each L4 member.

I want to get:
Location's L2 members on Rows
Total Repeat Count, L4 BadActor#1 name, L4 BadActor#2 name, L4 BadActor#3
name, % Contrubtion of top 3 BadActors.

I have MDX that returns the correct names and numbers, but when I try it on
real data (~1 M Incidents) it takes much too long. I created a separate
calc-member for each of BadActor1, BadActor2, BadActor3, and Top 3 %
Contribution. I then CrossJoin these in the Select with {L2 members}

I welcome any suggestions on how to improve performance of this query. I
have read a bit about MDX CACHE - is this a case where it could be useful? Or
processing location?

Thanks,

LMcPhee

Reply With Quote
  #3  
Old   
lmcphee
 
Posts: n/a

Default RE: Optimize query that uses TopCount() - 10-12-2005 , 09:36 AM



1. Your understanding is correct. Using AS2K.

2. All of the other dimensions are "flat".

3. Do not have Enterprise edition, so can't use partitions I don't think.
I changed my CrossJoin to NonEmptyCrossJoin - that helped a lot, but still
takes over 10 minutes in some cases.

I did some tests - started with just none, then added one Bad Actor column
at a time. The time added when adding BadActor columns was not that
significant - so must be some caching of earlier TopCount results, being used
for later calcs. The time now appears to be mainly dependent on the number of
L2 members. and the initial Repeat calculation.

I will look over your MDX suggestion later today when I have some time.

Thanks,
L McPhee

"Chris Webb" wrote:

Quote:
Can I make sure I understand you correctly? What you're doing is for each L2
member, finding the names of the top 3 descendants of that L2 member by
[Repeat] and then calculating their overall contribution? And you're using
AS2K rather than AS2005? I'll assume you're using the former in this reply.

A few things spring to mind...

* It sounds like all of your queries are taking place at the leaf levels of
Time and Location; if there are any other dimensions in the cube, it might be
worth building some aggregations manually at higher levels of these other
dimensions and L4/Hour. Take a look at the following posting on my blog for
more information on how to do this:
http://spaces.msn.com/members/cwebbbi/Blog/cns!1pi7ETChsJ1un_2s41jm9Iyg!119.entry

* Are you partitioning your cube? It sounds like partitioning by L2 might
help here.

* What does the MDX for your query look like? One thing that might speed up
your query is to use ad-hoc named sets to make sure you only do your TOPCOUNT
(which is likely to be the most expensive operation in your query once).
Here's an example using Foodmart 2000 which finds the top 3 customers for
each state, but which shows how you only need to find those top 3 once and
re-use the result across multiple calculated measures:

WITH
MEMBER MEASURES.TOPCUST AS '{
TOPCOUNT(
DESCENDANTS(
[Customers].CURRENTMEMBER
, [Customers].[Name])
, 3, MEASURES.[UNIT SALES])
AS TOP3}.ITEM(0).ITEM(0).NAME', NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]'
MEMBER MEASURES.SECONDCUST AS 'TOP3.ITEM(1).ITEM(0).NAME',
NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]'
MEMBER MEASURES.THIRDCUST AS 'TOP3.ITEM(2).ITEM(0).NAME',
NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]'
MEMBER MEASURES.CONTRIB AS 'SUM(TOP3, MEASURES.[UNIT SALES])/MEASURES.[UNIT
SALES]', FORMAT_STRING='PERCENT', NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]'
SELECT {MEASURES.TOPCUST, MEASURES.SECONDCUST, MEASURES.THIRDCUST,
MEASURES.CONTRIB} ON 0,
NON EMPTY
[Customers].[State Province].MEMBERS ON 1
FROM SALES

As you can see, the set TOP3 is created in the first calculated measure,
then referenced in the others.

* Setting the NON_EMPTY_BEHAVIOR property for your calculated measures could
also be useful if you are removing empty members at any point. See the query
above for an example; you can also do it in the Cube Editor for calculated
measures defined on the server. The following article has more info:
http://beta.blogs.msdn.com/bi_system...es/162852.aspx

* Similarly, using NonEmptyCrossjoin() rather than Crossjoin() inside
[Repeat] if you can might be a good idea.

* Setting various connection string properties, like Exection Location,
Cache Policy and Cache Ratio could help. Try the following combinations
either alone or in combination:
Execution Location=3; Default Isolation Mode=1
Cache Policy=7
Cache Ratio=0.1; Cache Ratio2=0.1

HTH,

Chris

--
Blog at:
http://spaces.msn.com/members/cwebbbi/


"lmcphee" wrote:

I have two hierarchical dimensions: Time (Year/Month/Day/Hour) and Location
(L1/L2/L3/L4). My facts are "Incident" records. I have a Repeat calc member
that CrossJoins Hour and L4 to return (Count-1) for every HourxL4 where the
Count > 1. i.e. Count of repetitions within each hour for each L4 member.

I want to get:
Location's L2 members on Rows
Total Repeat Count, L4 BadActor#1 name, L4 BadActor#2 name, L4 BadActor#3
name, % Contrubtion of top 3 BadActors.

I have MDX that returns the correct names and numbers, but when I try it on
real data (~1 M Incidents) it takes much too long. I created a separate
calc-member for each of BadActor1, BadActor2, BadActor3, and Top 3 %
Contribution. I then CrossJoin these in the Select with {L2 members}

I welcome any suggestions on how to improve performance of this query. I
have read a bit about MDX CACHE - is this a case where it could be useful? Or
processing location?

Thanks,

LMcPhee

Reply With Quote
  #4  
Old   
Chris Webb
 
Posts: n/a

Default RE: Optimize query that uses TopCount() - 10-12-2005 , 09:52 AM



One last point though - on point (2), even if your other dimensions are flat,
do they have All Levels? If so, then an aggregation built at L4/Hour and the
All level of all the other dimensions could make a big difference, and it's
highly unlikely it would get built using the Aggregation Design Wizard or the
Usage-Based Optimisation Wizard - you'd have to build it manually.

Chris
--
Blog at:
http://spaces.msn.com/members/cwebbbi/


"lmcphee" wrote:

Quote:
1. Your understanding is correct. Using AS2K.

2. All of the other dimensions are "flat".

3. Do not have Enterprise edition, so can't use partitions I don't think.
I changed my CrossJoin to NonEmptyCrossJoin - that helped a lot, but still
takes over 10 minutes in some cases.

I did some tests - started with just none, then added one Bad Actor column
at a time. The time added when adding BadActor columns was not that
significant - so must be some caching of earlier TopCount results, being used
for later calcs. The time now appears to be mainly dependent on the number of
L2 members. and the initial Repeat calculation.

I will look over your MDX suggestion later today when I have some time.

Thanks,
L McPhee

"Chris Webb" wrote:

Can I make sure I understand you correctly? What you're doing is for each L2
member, finding the names of the top 3 descendants of that L2 member by
[Repeat] and then calculating their overall contribution? And you're using
AS2K rather than AS2005? I'll assume you're using the former in this reply.

A few things spring to mind...

* It sounds like all of your queries are taking place at the leaf levels of
Time and Location; if there are any other dimensions in the cube, it might be
worth building some aggregations manually at higher levels of these other
dimensions and L4/Hour. Take a look at the following posting on my blog for
more information on how to do this:
http://spaces.msn.com/members/cwebbbi/Blog/cns!1pi7ETChsJ1un_2s41jm9Iyg!119.entry

* Are you partitioning your cube? It sounds like partitioning by L2 might
help here.

* What does the MDX for your query look like? One thing that might speed up
your query is to use ad-hoc named sets to make sure you only do your TOPCOUNT
(which is likely to be the most expensive operation in your query once).
Here's an example using Foodmart 2000 which finds the top 3 customers for
each state, but which shows how you only need to find those top 3 once and
re-use the result across multiple calculated measures:

WITH
MEMBER MEASURES.TOPCUST AS '{
TOPCOUNT(
DESCENDANTS(
[Customers].CURRENTMEMBER
, [Customers].[Name])
, 3, MEASURES.[UNIT SALES])
AS TOP3}.ITEM(0).ITEM(0).NAME', NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]'
MEMBER MEASURES.SECONDCUST AS 'TOP3.ITEM(1).ITEM(0).NAME',
NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]'
MEMBER MEASURES.THIRDCUST AS 'TOP3.ITEM(2).ITEM(0).NAME',
NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]'
MEMBER MEASURES.CONTRIB AS 'SUM(TOP3, MEASURES.[UNIT SALES])/MEASURES.[UNIT
SALES]', FORMAT_STRING='PERCENT', NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]'
SELECT {MEASURES.TOPCUST, MEASURES.SECONDCUST, MEASURES.THIRDCUST,
MEASURES.CONTRIB} ON 0,
NON EMPTY
[Customers].[State Province].MEMBERS ON 1
FROM SALES

As you can see, the set TOP3 is created in the first calculated measure,
then referenced in the others.

* Setting the NON_EMPTY_BEHAVIOR property for your calculated measures could
also be useful if you are removing empty members at any point. See the query
above for an example; you can also do it in the Cube Editor for calculated
measures defined on the server. The following article has more info:
http://beta.blogs.msdn.com/bi_system...es/162852.aspx

* Similarly, using NonEmptyCrossjoin() rather than Crossjoin() inside
[Repeat] if you can might be a good idea.

* Setting various connection string properties, like Exection Location,
Cache Policy and Cache Ratio could help. Try the following combinations
either alone or in combination:
Execution Location=3; Default Isolation Mode=1
Cache Policy=7
Cache Ratio=0.1; Cache Ratio2=0.1

HTH,

Chris

--
Blog at:
http://spaces.msn.com/members/cwebbbi/


"lmcphee" wrote:

I have two hierarchical dimensions: Time (Year/Month/Day/Hour) and Location
(L1/L2/L3/L4). My facts are "Incident" records. I have a Repeat calc member
that CrossJoins Hour and L4 to return (Count-1) for every HourxL4 where the
Count > 1. i.e. Count of repetitions within each hour for each L4 member.

I want to get:
Location's L2 members on Rows
Total Repeat Count, L4 BadActor#1 name, L4 BadActor#2 name, L4 BadActor#3
name, % Contrubtion of top 3 BadActors.

I have MDX that returns the correct names and numbers, but when I try it on
real data (~1 M Incidents) it takes much too long. I created a separate
calc-member for each of BadActor1, BadActor2, BadActor3, and Top 3 %
Contribution. I then CrossJoin these in the Select with {L2 members}

I welcome any suggestions on how to improve performance of this query. I
have read a bit about MDX CACHE - is this a case where it could be useful? Or
processing location?

Thanks,

LMcPhee

Reply With Quote
  #5  
Old   
lmcphee
 
Posts: n/a

Default RE: Optimize query that uses TopCount() - 10-12-2005 , 11:01 PM



Thanks for the tip. I didn't know you could manually build aggregations. I
read about the SSABI pkg and installed it, but the Partition Aggregation
utility is not under the Tools folder where it should be! I believe we have
the SQL2K Resource Kit and I have read there is a similar tool on there.
Still, odd that my install of SSABI.msi would not install the utility.

L McPhee

"Chris Webb" wrote:

Quote:
One last point though - on point (2), even if your other dimensions are flat,
do they have All Levels? If so, then an aggregation built at L4/Hour and the
All level of all the other dimensions could make a big difference, and it's
highly unlikely it would get built using the Aggregation Design Wizard or the
Usage-Based Optimisation Wizard - you'd have to build it manually.

Chris
--
Blog at:
http://spaces.msn.com/members/cwebbbi/


"lmcphee" wrote:

1. Your understanding is correct. Using AS2K.

2. All of the other dimensions are "flat".

3. Do not have Enterprise edition, so can't use partitions I don't think.
I changed my CrossJoin to NonEmptyCrossJoin - that helped a lot, but still
takes over 10 minutes in some cases.

I did some tests - started with just none, then added one Bad Actor column
at a time. The time added when adding BadActor columns was not that
significant - so must be some caching of earlier TopCount results, being used
for later calcs. The time now appears to be mainly dependent on the number of
L2 members. and the initial Repeat calculation.

I will look over your MDX suggestion later today when I have some time.

Thanks,
L McPhee

"Chris Webb" wrote:

Can I make sure I understand you correctly? What you're doing is for each L2
member, finding the names of the top 3 descendants of that L2 member by
[Repeat] and then calculating their overall contribution? And you're using
AS2K rather than AS2005? I'll assume you're using the former in this reply.

A few things spring to mind...

* It sounds like all of your queries are taking place at the leaf levels of
Time and Location; if there are any other dimensions in the cube, it might be
worth building some aggregations manually at higher levels of these other
dimensions and L4/Hour. Take a look at the following posting on my blog for
more information on how to do this:
http://spaces.msn.com/members/cwebbbi/Blog/cns!1pi7ETChsJ1un_2s41jm9Iyg!119.entry

* Are you partitioning your cube? It sounds like partitioning by L2 might
help here.

* What does the MDX for your query look like? One thing that might speed up
your query is to use ad-hoc named sets to make sure you only do your TOPCOUNT
(which is likely to be the most expensive operation in your query once).
Here's an example using Foodmart 2000 which finds the top 3 customers for
each state, but which shows how you only need to find those top 3 once and
re-use the result across multiple calculated measures:

WITH
MEMBER MEASURES.TOPCUST AS '{
TOPCOUNT(
DESCENDANTS(
[Customers].CURRENTMEMBER
, [Customers].[Name])
, 3, MEASURES.[UNIT SALES])
AS TOP3}.ITEM(0).ITEM(0).NAME', NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]'
MEMBER MEASURES.SECONDCUST AS 'TOP3.ITEM(1).ITEM(0).NAME',
NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]'
MEMBER MEASURES.THIRDCUST AS 'TOP3.ITEM(2).ITEM(0).NAME',
NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]'
MEMBER MEASURES.CONTRIB AS 'SUM(TOP3, MEASURES.[UNIT SALES])/MEASURES.[UNIT
SALES]', FORMAT_STRING='PERCENT', NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]'
SELECT {MEASURES.TOPCUST, MEASURES.SECONDCUST, MEASURES.THIRDCUST,
MEASURES.CONTRIB} ON 0,
NON EMPTY
[Customers].[State Province].MEMBERS ON 1
FROM SALES

As you can see, the set TOP3 is created in the first calculated measure,
then referenced in the others.

* Setting the NON_EMPTY_BEHAVIOR property for your calculated measures could
also be useful if you are removing empty members at any point. See the query
above for an example; you can also do it in the Cube Editor for calculated
measures defined on the server. The following article has more info:
http://beta.blogs.msdn.com/bi_system...es/162852.aspx

* Similarly, using NonEmptyCrossjoin() rather than Crossjoin() inside
[Repeat] if you can might be a good idea.

* Setting various connection string properties, like Exection Location,
Cache Policy and Cache Ratio could help. Try the following combinations
either alone or in combination:
Execution Location=3; Default Isolation Mode=1
Cache Policy=7
Cache Ratio=0.1; Cache Ratio2=0.1

HTH,

Chris

--
Blog at:
http://spaces.msn.com/members/cwebbbi/


"lmcphee" wrote:

I have two hierarchical dimensions: Time (Year/Month/Day/Hour) and Location
(L1/L2/L3/L4). My facts are "Incident" records. I have a Repeat calc member
that CrossJoins Hour and L4 to return (Count-1) for every HourxL4 where the
Count > 1. i.e. Count of repetitions within each hour for each L4 member.

I want to get:
Location's L2 members on Rows
Total Repeat Count, L4 BadActor#1 name, L4 BadActor#2 name, L4 BadActor#3
name, % Contrubtion of top 3 BadActors.

I have MDX that returns the correct names and numbers, but when I try it on
real data (~1 M Incidents) it takes much too long. I created a separate
calc-member for each of BadActor1, BadActor2, BadActor3, and Top 3 %
Contribution. I then CrossJoin these in the Select with {L2 members}

I welcome any suggestions on how to improve performance of this query. I
have read a bit about MDX CACHE - is this a case where it could be useful? Or
processing location?

Thanks,

LMcPhee

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.