![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
I have two hierarchical dimensions: Time (Year/Month/Day/Hour) and Location (L1/L2/L3/L4). My facts are "Incident" records. I have a Repeat calc member that CrossJoins Hour and L4 to return (Count-1) for every HourxL4 where the Count > 1. i.e. Count of repetitions within each hour for each L4 member. I want to get: Location's L2 members on Rows Total Repeat Count, L4 BadActor#1 name, L4 BadActor#2 name, L4 BadActor#3 name, % Contrubtion of top 3 BadActors. I have MDX that returns the correct names and numbers, but when I try it on real data (~1 M Incidents) it takes much too long. I created a separate calc-member for each of BadActor1, BadActor2, BadActor3, and Top 3 % Contribution. I then CrossJoin these in the Select with {L2 members} I welcome any suggestions on how to improve performance of this query. I have read a bit about MDX CACHE - is this a case where it could be useful? Or processing location? Thanks, LMcPhee |
#3
| |||
| |||
|
|
Can I make sure I understand you correctly? What you're doing is for each L2 member, finding the names of the top 3 descendants of that L2 member by [Repeat] and then calculating their overall contribution? And you're using AS2K rather than AS2005? I'll assume you're using the former in this reply. A few things spring to mind... * It sounds like all of your queries are taking place at the leaf levels of Time and Location; if there are any other dimensions in the cube, it might be worth building some aggregations manually at higher levels of these other dimensions and L4/Hour. Take a look at the following posting on my blog for more information on how to do this: http://spaces.msn.com/members/cwebbbi/Blog/cns!1pi7ETChsJ1un_2s41jm9Iyg!119.entry * Are you partitioning your cube? It sounds like partitioning by L2 might help here. * What does the MDX for your query look like? One thing that might speed up your query is to use ad-hoc named sets to make sure you only do your TOPCOUNT (which is likely to be the most expensive operation in your query once). Here's an example using Foodmart 2000 which finds the top 3 customers for each state, but which shows how you only need to find those top 3 once and re-use the result across multiple calculated measures: WITH MEMBER MEASURES.TOPCUST AS '{ TOPCOUNT( DESCENDANTS( [Customers].CURRENTMEMBER , [Customers].[Name]) , 3, MEASURES.[UNIT SALES]) AS TOP3}.ITEM(0).ITEM(0).NAME', NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]' MEMBER MEASURES.SECONDCUST AS 'TOP3.ITEM(1).ITEM(0).NAME', NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]' MEMBER MEASURES.THIRDCUST AS 'TOP3.ITEM(2).ITEM(0).NAME', NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]' MEMBER MEASURES.CONTRIB AS 'SUM(TOP3, MEASURES.[UNIT SALES])/MEASURES.[UNIT SALES]', FORMAT_STRING='PERCENT', NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]' SELECT {MEASURES.TOPCUST, MEASURES.SECONDCUST, MEASURES.THIRDCUST, MEASURES.CONTRIB} ON 0, NON EMPTY [Customers].[State Province].MEMBERS ON 1 FROM SALES As you can see, the set TOP3 is created in the first calculated measure, then referenced in the others. * Setting the NON_EMPTY_BEHAVIOR property for your calculated measures could also be useful if you are removing empty members at any point. See the query above for an example; you can also do it in the Cube Editor for calculated measures defined on the server. The following article has more info: http://beta.blogs.msdn.com/bi_system...es/162852.aspx * Similarly, using NonEmptyCrossjoin() rather than Crossjoin() inside [Repeat] if you can might be a good idea. * Setting various connection string properties, like Exection Location, Cache Policy and Cache Ratio could help. Try the following combinations either alone or in combination: Execution Location=3; Default Isolation Mode=1 Cache Policy=7 Cache Ratio=0.1; Cache Ratio2=0.1 HTH, Chris -- Blog at: http://spaces.msn.com/members/cwebbbi/ "lmcphee" wrote: I have two hierarchical dimensions: Time (Year/Month/Day/Hour) and Location (L1/L2/L3/L4). My facts are "Incident" records. I have a Repeat calc member that CrossJoins Hour and L4 to return (Count-1) for every HourxL4 where the Count > 1. i.e. Count of repetitions within each hour for each L4 member. I want to get: Location's L2 members on Rows Total Repeat Count, L4 BadActor#1 name, L4 BadActor#2 name, L4 BadActor#3 name, % Contrubtion of top 3 BadActors. I have MDX that returns the correct names and numbers, but when I try it on real data (~1 M Incidents) it takes much too long. I created a separate calc-member for each of BadActor1, BadActor2, BadActor3, and Top 3 % Contribution. I then CrossJoin these in the Select with {L2 members} I welcome any suggestions on how to improve performance of this query. I have read a bit about MDX CACHE - is this a case where it could be useful? Or processing location? Thanks, LMcPhee |
#4
| |||
| |||
|
|
1. Your understanding is correct. Using AS2K. 2. All of the other dimensions are "flat". 3. Do not have Enterprise edition, so can't use partitions I don't think. I changed my CrossJoin to NonEmptyCrossJoin - that helped a lot, but still takes over 10 minutes in some cases. I did some tests - started with just none, then added one Bad Actor column at a time. The time added when adding BadActor columns was not that significant - so must be some caching of earlier TopCount results, being used for later calcs. The time now appears to be mainly dependent on the number of L2 members. and the initial Repeat calculation. I will look over your MDX suggestion later today when I have some time. Thanks, L McPhee "Chris Webb" wrote: Can I make sure I understand you correctly? What you're doing is for each L2 member, finding the names of the top 3 descendants of that L2 member by [Repeat] and then calculating their overall contribution? And you're using AS2K rather than AS2005? I'll assume you're using the former in this reply. A few things spring to mind... * It sounds like all of your queries are taking place at the leaf levels of Time and Location; if there are any other dimensions in the cube, it might be worth building some aggregations manually at higher levels of these other dimensions and L4/Hour. Take a look at the following posting on my blog for more information on how to do this: http://spaces.msn.com/members/cwebbbi/Blog/cns!1pi7ETChsJ1un_2s41jm9Iyg!119.entry * Are you partitioning your cube? It sounds like partitioning by L2 might help here. * What does the MDX for your query look like? One thing that might speed up your query is to use ad-hoc named sets to make sure you only do your TOPCOUNT (which is likely to be the most expensive operation in your query once). Here's an example using Foodmart 2000 which finds the top 3 customers for each state, but which shows how you only need to find those top 3 once and re-use the result across multiple calculated measures: WITH MEMBER MEASURES.TOPCUST AS '{ TOPCOUNT( DESCENDANTS( [Customers].CURRENTMEMBER , [Customers].[Name]) , 3, MEASURES.[UNIT SALES]) AS TOP3}.ITEM(0).ITEM(0).NAME', NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]' MEMBER MEASURES.SECONDCUST AS 'TOP3.ITEM(1).ITEM(0).NAME', NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]' MEMBER MEASURES.THIRDCUST AS 'TOP3.ITEM(2).ITEM(0).NAME', NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]' MEMBER MEASURES.CONTRIB AS 'SUM(TOP3, MEASURES.[UNIT SALES])/MEASURES.[UNIT SALES]', FORMAT_STRING='PERCENT', NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]' SELECT {MEASURES.TOPCUST, MEASURES.SECONDCUST, MEASURES.THIRDCUST, MEASURES.CONTRIB} ON 0, NON EMPTY [Customers].[State Province].MEMBERS ON 1 FROM SALES As you can see, the set TOP3 is created in the first calculated measure, then referenced in the others. * Setting the NON_EMPTY_BEHAVIOR property for your calculated measures could also be useful if you are removing empty members at any point. See the query above for an example; you can also do it in the Cube Editor for calculated measures defined on the server. The following article has more info: http://beta.blogs.msdn.com/bi_system...es/162852.aspx * Similarly, using NonEmptyCrossjoin() rather than Crossjoin() inside [Repeat] if you can might be a good idea. * Setting various connection string properties, like Exection Location, Cache Policy and Cache Ratio could help. Try the following combinations either alone or in combination: Execution Location=3; Default Isolation Mode=1 Cache Policy=7 Cache Ratio=0.1; Cache Ratio2=0.1 HTH, Chris -- Blog at: http://spaces.msn.com/members/cwebbbi/ "lmcphee" wrote: I have two hierarchical dimensions: Time (Year/Month/Day/Hour) and Location (L1/L2/L3/L4). My facts are "Incident" records. I have a Repeat calc member that CrossJoins Hour and L4 to return (Count-1) for every HourxL4 where the Count > 1. i.e. Count of repetitions within each hour for each L4 member. I want to get: Location's L2 members on Rows Total Repeat Count, L4 BadActor#1 name, L4 BadActor#2 name, L4 BadActor#3 name, % Contrubtion of top 3 BadActors. I have MDX that returns the correct names and numbers, but when I try it on real data (~1 M Incidents) it takes much too long. I created a separate calc-member for each of BadActor1, BadActor2, BadActor3, and Top 3 % Contribution. I then CrossJoin these in the Select with {L2 members} I welcome any suggestions on how to improve performance of this query. I have read a bit about MDX CACHE - is this a case where it could be useful? Or processing location? Thanks, LMcPhee |
#5
| |||
| |||
|
|
One last point though - on point (2), even if your other dimensions are flat, do they have All Levels? If so, then an aggregation built at L4/Hour and the All level of all the other dimensions could make a big difference, and it's highly unlikely it would get built using the Aggregation Design Wizard or the Usage-Based Optimisation Wizard - you'd have to build it manually. Chris -- Blog at: http://spaces.msn.com/members/cwebbbi/ "lmcphee" wrote: 1. Your understanding is correct. Using AS2K. 2. All of the other dimensions are "flat". 3. Do not have Enterprise edition, so can't use partitions I don't think. I changed my CrossJoin to NonEmptyCrossJoin - that helped a lot, but still takes over 10 minutes in some cases. I did some tests - started with just none, then added one Bad Actor column at a time. The time added when adding BadActor columns was not that significant - so must be some caching of earlier TopCount results, being used for later calcs. The time now appears to be mainly dependent on the number of L2 members. and the initial Repeat calculation. I will look over your MDX suggestion later today when I have some time. Thanks, L McPhee "Chris Webb" wrote: Can I make sure I understand you correctly? What you're doing is for each L2 member, finding the names of the top 3 descendants of that L2 member by [Repeat] and then calculating their overall contribution? And you're using AS2K rather than AS2005? I'll assume you're using the former in this reply. A few things spring to mind... * It sounds like all of your queries are taking place at the leaf levels of Time and Location; if there are any other dimensions in the cube, it might be worth building some aggregations manually at higher levels of these other dimensions and L4/Hour. Take a look at the following posting on my blog for more information on how to do this: http://spaces.msn.com/members/cwebbbi/Blog/cns!1pi7ETChsJ1un_2s41jm9Iyg!119.entry * Are you partitioning your cube? It sounds like partitioning by L2 might help here. * What does the MDX for your query look like? One thing that might speed up your query is to use ad-hoc named sets to make sure you only do your TOPCOUNT (which is likely to be the most expensive operation in your query once). Here's an example using Foodmart 2000 which finds the top 3 customers for each state, but which shows how you only need to find those top 3 once and re-use the result across multiple calculated measures: WITH MEMBER MEASURES.TOPCUST AS '{ TOPCOUNT( DESCENDANTS( [Customers].CURRENTMEMBER , [Customers].[Name]) , 3, MEASURES.[UNIT SALES]) AS TOP3}.ITEM(0).ITEM(0).NAME', NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]' MEMBER MEASURES.SECONDCUST AS 'TOP3.ITEM(1).ITEM(0).NAME', NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]' MEMBER MEASURES.THIRDCUST AS 'TOP3.ITEM(2).ITEM(0).NAME', NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]' MEMBER MEASURES.CONTRIB AS 'SUM(TOP3, MEASURES.[UNIT SALES])/MEASURES.[UNIT SALES]', FORMAT_STRING='PERCENT', NON_EMPTY_BEHAVIOR='MEASURES.[UNIT SALES]' SELECT {MEASURES.TOPCUST, MEASURES.SECONDCUST, MEASURES.THIRDCUST, MEASURES.CONTRIB} ON 0, NON EMPTY [Customers].[State Province].MEMBERS ON 1 FROM SALES As you can see, the set TOP3 is created in the first calculated measure, then referenced in the others. * Setting the NON_EMPTY_BEHAVIOR property for your calculated measures could also be useful if you are removing empty members at any point. See the query above for an example; you can also do it in the Cube Editor for calculated measures defined on the server. The following article has more info: http://beta.blogs.msdn.com/bi_system...es/162852.aspx * Similarly, using NonEmptyCrossjoin() rather than Crossjoin() inside [Repeat] if you can might be a good idea. * Setting various connection string properties, like Exection Location, Cache Policy and Cache Ratio could help. Try the following combinations either alone or in combination: Execution Location=3; Default Isolation Mode=1 Cache Policy=7 Cache Ratio=0.1; Cache Ratio2=0.1 HTH, Chris -- Blog at: http://spaces.msn.com/members/cwebbbi/ "lmcphee" wrote: I have two hierarchical dimensions: Time (Year/Month/Day/Hour) and Location (L1/L2/L3/L4). My facts are "Incident" records. I have a Repeat calc member that CrossJoins Hour and L4 to return (Count-1) for every HourxL4 where the Count > 1. i.e. Count of repetitions within each hour for each L4 member. I want to get: Location's L2 members on Rows Total Repeat Count, L4 BadActor#1 name, L4 BadActor#2 name, L4 BadActor#3 name, % Contrubtion of top 3 BadActors. I have MDX that returns the correct names and numbers, but when I try it on real data (~1 M Incidents) it takes much too long. I created a separate calc-member for each of BadActor1, BadActor2, BadActor3, and Top 3 % Contribution. I then CrossJoin these in the Select with {L2 members} I welcome any suggestions on how to improve performance of this query. I have read a bit about MDX CACHE - is this a case where it could be useful? Or processing location? Thanks, LMcPhee |
![]() |
| Thread Tools | |
| Display Modes | |
| |