![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
Hi, We have a critical problem with the query performance of a calculated measure that uses the Max function over a time dimension that includes members till the hour level. The current solution that we have includes: 1. A measure (i.e. UsageSum) that sums up the usage with all of the dimensions 2. A calculated member (i.e. Max Usage) that takes the hour with the max value of UsageSum – Max(Descendents([Time].[Hour].CurrentMember, [Time].[Hour].[Hour]), [Measures].[UsageSum]) The cube includes data of MAX usage of licenses. It uses 8 dimensions (User, Project, Time.Hour, License…). Its physical size is only 400MB. We have 40,000 leaf members in the Time dimension (since it is in the hour granularity). There are only 1400 leaf members in the project dimension and 4000 members in the leaf level of the License dimension. TimeHour Site Group License Project User Max Used Licenses 11/22/2004 8:00 S1 G1 L1 P1 U1 7 11/22/2004 8:00 S1 G1 L1 P1 U2 5 11/22/2004 8:00 S1 G1 L1 P1 U3 3 Max Used Licenses means – how many concurrent licenses have been used during a specific hour. This is our raw data – we don’t have the actual events (Start Time ? End Time) The problem is that the "Max Usage" member is calculated in run time and when running, for example, a query on the max usage of projects and licenses in last 10 weeks, it takes ~100 seconds to get the results. Note that in order to get the correct max usage we must perform two aggregation actions, sum over all of the dimensions and only afterwards max over the time. 1. Do you have any suggestion how to improve the performance of this calculated member? (We have followed most of the recommendations in the article http://www.microsoft.com/technet/pro...ansvcspg.mspx). 2. Do you have any idea for another method to calculate the max usage? 3. How can we control the aggregations of the cube so there will be aggregations on the leaf level of the Time, the Project and the License dimension? We were trying to affect the design storage using DSO programming with AddGoalQuery but it did not look like it improves the performance. We did not try to use the usage-based optimization. Thanks, Shirly. |
#3
| |||
| |||
|
|
Hi Shirly, Some thoughts on your problem... First of all, regarding your calculated member, using a NONEMPTYCROSSJOIN to remove any hours with no activity at all could improve performance if a) some apps are only very rarely used, and/or b) usage of your apps is restricted to only some hours of the day. So your calculated member would look something like this: Max( NONEMPTYCROSSJOIN( Descendents([Time].[Hour].CurrentMember,[Time].[Hour].[Hour]) ) , [Measures].[UsageSum]) Secondly, to answer your question about how to only build aggregations at specific levels of dimensions you need to look at the 'Aggregation Usage' property of a dimension (accessible via the cube editor by selecting the dimension and then looking at the Advanced tab in the properties pane). Setting this to 'Bottom level only' for a dimension would ensure that AS would only consider building aggregations at the bottom level for this particular dimension. That said, with 40000 members at the leaf level of your time dimension you're unlikely to be able to build any aggregations using the Storage Design Wizard or even the Usage Based Optimisation wizard - my guess is that they would fall foul of the 1/3 rule. In that case you could try building aggregations manually using the Partition Manager tool that ships (in an early version) with the SQL Server Resource kit or (in an updated version) with the BI Accelerator. You might want to read the following recent thread for some help on deciding which aggregations you build when using this tool: http://groups.google.co.uk/groups?hl...com%26rnum%3D3 Thirdly, partitioning by Time might help query performance lots here. Are you doing this? Lastly, and possibly most unhelpfully, I would suggest that the design of your time dimension could well be part of the problem here. Is there any way you could break it up into maybe two dimensions - a week/day dimension and a time-of-day dimension? The combined number of members on both these dimensions would be far less than the number of members on your existing dimension, and as a result query performance is likely to improve. The big disadvantage of this approach though is that users can no longer drill down from day to hour level - they need to do a crossjoin instead. HTH, Chris "Shirly Baruch" wrote: Hi, We have a critical problem with the query performance of a calculated measure that uses the Max function over a time dimension that includes members till the hour level. The current solution that we have includes: 1. A measure (i.e. UsageSum) that sums up the usage with all of the dimensions 2. A calculated member (i.e. Max Usage) that takes the hour with the max value of UsageSum – Max(Descendents([Time].[Hour].CurrentMember, [Time].[Hour].[Hour]), [Measures].[UsageSum]) The cube includes data of MAX usage of licenses. It uses 8 dimensions (User, Project, Time.Hour, License…). Its physical size is only 400MB. We have 40,000 leaf members in the Time dimension (since it is in the hour granularity). There are only 1400 leaf members in the project dimension and 4000 members in the leaf level of the License dimension. TimeHour Site Group License Project User Max Used Licenses 11/22/2004 8:00 S1 G1 L1 P1 U1 7 11/22/2004 8:00 S1 G1 L1 P1 U2 5 11/22/2004 8:00 S1 G1 L1 P1 U3 3 Max Used Licenses means – how many concurrent licenses have been used during a specific hour. This is our raw data – we don’t have the actual events (Start Time ? End Time) The problem is that the "Max Usage" member is calculated in run time and when running, for example, a query on the max usage of projects and licenses in last 10 weeks, it takes ~100 seconds to get the results. Note that in order to get the correct max usage we must perform two aggregation actions, sum over all of the dimensions and only afterwards max over the time. 1. Do you have any suggestion how to improve the performance of this calculated member? (We have followed most of the recommendations in the article http://www.microsoft.com/technet/pro...ansvcspg.mspx). 2. Do you have any idea for another method to calculate the max usage? 3. How can we control the aggregations of the cube so there will be aggregations on the leaf level of the Time, the Project and the License dimension? We were trying to affect the design storage using DSO programming with AddGoalQuery but it did not look like it improves the performance. We did not try to use the usage-based optimization. Thanks, Shirly. |
#4
| |||
| |||
|
|
Hi Chirs, Thanks for your help! 1. We have usage data of sites across the world so there is almost no idle hours. Therefore using NonEmptyCorssJoin within the measure definition did not improve the performance. 2. I installed the BI Accelerator. How can it help with creating aggregations? I could find only a way to define the performance gain and the paritions' count using PartAggUtil.exe. 3. I am using partitions on the Month level. The data is usually being queried on the Week level. Do you think that defining partitions on a different level can improve the performance? 4. Your last suggestion is a great idea! Drilling down to the hour level is not a string requirement for us. We can have a hidden dimension for the hour-of-day and have two nested calculated members: HourOfDayMaxUsage = Max(HourOfDay.Members, Measures.UsageSum) Max Usage = Max(Descendants(Time.Day.CurrentMember, Time.Day), Measures.HourOfDayMaxUsage). In that way we might benefit from aggregations that will be done for a dimension with two levels and 24 members. Thanks again, Shirly. "Chris Webb" <OnlyForPostingToNewsgroups (AT) hotmail (DOT) com> wrote in message news:A3782068-C677-4B4C-A426-986B9163F681 (AT) microsoft (DOT) com... Hi Shirly, Some thoughts on your problem... First of all, regarding your calculated member, using a NONEMPTYCROSSJOIN to remove any hours with no activity at all could improve performance if a) some apps are only very rarely used, and/or b) usage of your apps is restricted to only some hours of the day. So your calculated member would look something like this: Max( NONEMPTYCROSSJOIN( Descendents([Time].[Hour].CurrentMember,[Time].[Hour].[Hour]) ) , [Measures].[UsageSum]) Secondly, to answer your question about how to only build aggregations at specific levels of dimensions you need to look at the 'Aggregation Usage' property of a dimension (accessible via the cube editor by selecting the dimension and then looking at the Advanced tab in the properties pane). Setting this to 'Bottom level only' for a dimension would ensure that AS would only consider building aggregations at the bottom level for this particular dimension. That said, with 40000 members at the leaf level of your time dimension you're unlikely to be able to build any aggregations using the Storage Design Wizard or even the Usage Based Optimisation wizard - my guess is that they would fall foul of the 1/3 rule. In that case you could try building aggregations manually using the Partition Manager tool that ships (in an early version) with the SQL Server Resource kit or (in an updated version) with the BI Accelerator. You might want to read the following recent thread for some help on deciding which aggregations you build when using this tool: http://groups.google.co.uk/groups?hl...com%26rnum%3D3 Thirdly, partitioning by Time might help query performance lots here. Are you doing this? Lastly, and possibly most unhelpfully, I would suggest that the design of your time dimension could well be part of the problem here. Is there any way you could break it up into maybe two dimensions - a week/day dimension and a time-of-day dimension? The combined number of members on both these dimensions would be far less than the number of members on your existing dimension, and as a result query performance is likely to improve. The big disadvantage of this approach though is that users can no longer drill down from day to hour level - they need to do a crossjoin instead. HTH, Chris "Shirly Baruch" wrote: Hi, We have a critical problem with the query performance of a calculated measure that uses the Max function over a time dimension that includes members till the hour level. The current solution that we have includes: 1. A measure (i.e. UsageSum) that sums up the usage with all of the dimensions 2. A calculated member (i.e. Max Usage) that takes the hour with the max value of UsageSum – Max(Descendents([Time].[Hour].CurrentMember, [Time].[Hour].[Hour]), [Measures].[UsageSum]) The cube includes data of MAX usage of licenses. It uses 8 dimensions (User, Project, Time.Hour, License…). Its physical size is only 400MB. We have 40,000 leaf members in the Time dimension (since it is in the hour granularity). There are only 1400 leaf members in the project dimension and 4000 members in the leaf level of the License dimension. TimeHour Site Group License Project User Max Used Licenses 11/22/2004 8:00 S1 G1 L1 P1 U1 7 11/22/2004 8:00 S1 G1 L1 P1 U2 5 11/22/2004 8:00 S1 G1 L1 P1 U3 3 Max Used Licenses means – how many concurrent licenses have been used during a specific hour. This is our raw data – we don’t have the actual events (Start Time ? End Time) The problem is that the "Max Usage" member is calculated in run time and when running, for example, a query on the max usage of projects and licenses in last 10 weeks, it takes ~100 seconds to get the results. Note that in order to get the correct max usage we must perform two aggregation actions, sum over all of the dimensions and only afterwards max over the time. 1. Do you have any suggestion how to improve the performance of this calculated member? (We have followed most of the recommendations in the article http://www.microsoft.com/technet/pro...ansvcspg.mspx). 2. Do you have any idea for another method to calculate the max usage? 3. How can we control the aggregations of the cube so there will be aggregations on the leaf level of the Time, the Project and the License dimension? We were trying to affect the design storage using DSO programming with AddGoalQuery but it did not look like it improves the performance. We did not try to use the usage-based optimization. Thanks, Shirly. |
#5
| |||
| |||
|
|
Hi Shirly, 1) Nonetheless, it's probably a good idea to keep the NECJ statement inside your calculated measure. It won't hurt, and might make a difference in a few cases where you are filtering on members from the majority of your dimensions. 2) I'm sorry, I should have been a bit clearer here. It's not the BI Accelerator itself which will help, but a tool called 'Partition Manager' which gets installed as part of the BI Accelerator. It allows you to look at each partition in your cube and see exactly which aggregations have been built, and which levels they are at. If you read the thread I mentioned in my last post, it will give you details of how to find which aggregations you need to build to service your queries; and as I said, if your queries are always at the leaf level of many of your dimensions, it's likely that the Storage Design Wizard and the Usage-Based Optimization Wizard might not be building the aggregations you need. 3) If you're already partitioning by month, and setting the slice value of your partition properly, then I doubt that partitioning by week is going to make a big difference to your performance. 4) If you don't actually need to drill down to the hour level, then why not remove it completely? You can then calculate the Max usage for a day when you're loading the data into your fact table, and make day rather than hour the granularity of your cube - it will mean that your data load takes longer, but it's likely to make your queries much faster. Regards, Chris "Shirly Baruch" wrote: Hi Chirs, Thanks for your help! 1. We have usage data of sites across the world so there is almost no idle hours. Therefore using NonEmptyCorssJoin within the measure definition did not improve the performance. 2. I installed the BI Accelerator. How can it help with creating aggregations? I could find only a way to define the performance gain and the paritions' count using PartAggUtil.exe. 3. I am using partitions on the Month level. The data is usually being queried on the Week level. Do you think that defining partitions on a different level can improve the performance? 4. Your last suggestion is a great idea! Drilling down to the hour level is not a string requirement for us. We can have a hidden dimension for the hour-of-day and have two nested calculated members: HourOfDayMaxUsage = Max(HourOfDay.Members, Measures.UsageSum) Max Usage = Max(Descendants(Time.Day.CurrentMember, Time.Day), Measures.HourOfDayMaxUsage). In that way we might benefit from aggregations that will be done for a dimension with two levels and 24 members. Thanks again, Shirly. "Chris Webb" <OnlyForPostingToNewsgroups (AT) hotmail (DOT) com> wrote in message news:A3782068-C677-4B4C-A426-986B9163F681 (AT) microsoft (DOT) com... Hi Shirly, Some thoughts on your problem... First of all, regarding your calculated member, using a NONEMPTYCROSSJOIN to remove any hours with no activity at all could improve performance if a) some apps are only very rarely used, and/or b) usage of your apps is restricted to only some hours of the day. So your calculated member would look something like this: Max( NONEMPTYCROSSJOIN( Descendents([Time].[Hour].CurrentMember,[Time].[Hour].[Hour]) ) , [Measures].[UsageSum]) Secondly, to answer your question about how to only build aggregations at specific levels of dimensions you need to look at the 'Aggregation Usage' property of a dimension (accessible via the cube editor by selecting the dimension and then looking at the Advanced tab in the properties pane). Setting this to 'Bottom level only' for a dimension would ensure that AS would only consider building aggregations at the bottom level for this particular dimension. That said, with 40000 members at the leaf level of your time dimension you're unlikely to be able to build any aggregations using the Storage Design Wizard or even the Usage Based Optimisation wizard - my guess is that they would fall foul of the 1/3 rule. In that case you could try building aggregations manually using the Partition Manager tool that ships (in an early version) with the SQL Server Resource kit or (in an updated version) with the BI Accelerator. You might want to read the following recent thread for some help on deciding which aggregations you build when using this tool: http://groups.google.co.uk/groups?hl...com%26rnum%3D3 Thirdly, partitioning by Time might help query performance lots here. Are you doing this? Lastly, and possibly most unhelpfully, I would suggest that the design of your time dimension could well be part of the problem here. Is there any way you could break it up into maybe two dimensions - a week/day dimension and a time-of-day dimension? The combined number of members on both these dimensions would be far less than the number of members on your existing dimension, and as a result query performance is likely to improve. The big disadvantage of this approach though is that users can no longer drill down from day to hour level - they need to do a crossjoin instead. HTH, Chris "Shirly Baruch" wrote: Hi, We have a critical problem with the query performance of a calculated measure that uses the Max function over a time dimension that includes members till the hour level. The current solution that we have includes: 1. A measure (i.e. UsageSum) that sums up the usage with all of the dimensions 2. A calculated member (i.e. Max Usage) that takes the hour with the max value of UsageSum – Max(Descendents([Time].[Hour].CurrentMember, [Time].[Hour].[Hour]), [Measures].[UsageSum]) The cube includes data of MAX usage of licenses. It uses 8 dimensions (User, Project, Time.Hour, License…). Its physical size is only 400MB. We have 40,000 leaf members in the Time dimension (since it is in the hour granularity). There are only 1400 leaf members in the project dimension and 4000 members in the leaf level of the License dimension. TimeHour Site Group License Project User Max Used Licenses 11/22/2004 8:00 S1 G1 L1 P1 U1 7 11/22/2004 8:00 S1 G1 L1 P1 U2 5 11/22/2004 8:00 S1 G1 L1 P1 U3 3 Max Used Licenses means – how many concurrent licenses have been used during a specific hour. This is our raw data – we don’t have the actual events (Start Time ? End Time) The problem is that the "Max Usage" member is calculated in run time and when running, for example, a query on the max usage of projects and licenses in last 10 weeks, it takes ~100 seconds to get the results. Note that in order to get the correct max usage we must perform two aggregation actions, sum over all of the dimensions and only afterwards max over the time. 1. Do you have any suggestion how to improve the performance of this calculated member? (We have followed most of the recommendations in the article http://www.microsoft.com/technet/pro...ansvcspg.mspx). 2. Do you have any idea for another method to calculate the max usage? 3. How can we control the aggregations of the cube so there will be aggregations on the leaf level of the Time, the Project and the License dimension? We were trying to affect the design storage using DSO programming with AddGoalQuery but it did not look like it improves the performance. We did not try to use the usage-based optimization. Thanks, Shirly. |
#6
| |||
| |||
|
|
Hi again, Calculating the max value in time requires the hour level even if we do not need to show it. The order of actions is important for the data correctness. First the data should be summed up in all of the dimensions and only then we can take the max on the time dimension. In that way the max value for a day can be a different value if you select different members in the background or if we select non-leaf members. For example: P1 and P2 are two projects under the same division D1. Let 's say the raw data is: Site Project User Time Usage S1 P1 a 200410101600 5 S1 P2 b 200410101700 10 S1 P1 c 200410101600 5 The max usage for a day is: Site Project Time Usage S1 P1 20041010 10 S1 P2 20041010 5 If we query the data that is based on the max usage of a day, we'll get a wrong result for D1: Site Project Time Usage S1 D1 20041010 15 There wasn't any hour during the day with 15 licenses in use. The correct max value is still 10. Therefore we have to use the hour level data for querying in the different levels of the dimensions. Thanks, Shirly. "Chris Webb" <OnlyForPostingToNewsgroups (AT) hotmail (DOT) com> wrote in message news:9543EA7A-C779-4A46-8243-D42E3C5A0609 (AT) microsoft (DOT) com... Hi Shirly, 1) Nonetheless, it's probably a good idea to keep the NECJ statement inside your calculated measure. It won't hurt, and might make a difference in a few cases where you are filtering on members from the majority of your dimensions. 2) I'm sorry, I should have been a bit clearer here. It's not the BI Accelerator itself which will help, but a tool called 'Partition Manager' which gets installed as part of the BI Accelerator. It allows you to look at each partition in your cube and see exactly which aggregations have been built, and which levels they are at. If you read the thread I mentioned in my last post, it will give you details of how to find which aggregations you need to build to service your queries; and as I said, if your queries are always at the leaf level of many of your dimensions, it's likely that the Storage Design Wizard and the Usage-Based Optimization Wizard might not be building the aggregations you need. 3) If you're already partitioning by month, and setting the slice value of your partition properly, then I doubt that partitioning by week is going to make a big difference to your performance. 4) If you don't actually need to drill down to the hour level, then why not remove it completely? You can then calculate the Max usage for a day when you're loading the data into your fact table, and make day rather than hour the granularity of your cube - it will mean that your data load takes longer, but it's likely to make your queries much faster. Regards, Chris "Shirly Baruch" wrote: Hi Chirs, Thanks for your help! 1. We have usage data of sites across the world so there is almost no idle hours. Therefore using NonEmptyCorssJoin within the measure definition did not improve the performance. 2. I installed the BI Accelerator. How can it help with creating aggregations? I could find only a way to define the performance gain and the paritions' count using PartAggUtil.exe. 3. I am using partitions on the Month level. The data is usually being queried on the Week level. Do you think that defining partitions on a different level can improve the performance? 4. Your last suggestion is a great idea! Drilling down to the hour level is not a string requirement for us. We can have a hidden dimension for the hour-of-day and have two nested calculated members: HourOfDayMaxUsage = Max(HourOfDay.Members, Measures.UsageSum) Max Usage = Max(Descendants(Time.Day.CurrentMember, Time.Day), Measures.HourOfDayMaxUsage). In that way we might benefit from aggregations that will be done for a dimension with two levels and 24 members. Thanks again, Shirly. "Chris Webb" <OnlyForPostingToNewsgroups (AT) hotmail (DOT) com> wrote in message news:A3782068-C677-4B4C-A426-986B9163F681 (AT) microsoft (DOT) com... Hi Shirly, Some thoughts on your problem... First of all, regarding your calculated member, using a NONEMPTYCROSSJOIN to remove any hours with no activity at all could improve performance if a) some apps are only very rarely used, and/or b) usage of your apps is restricted to only some hours of the day. So your calculated member would look something like this: Max( NONEMPTYCROSSJOIN( Descendents([Time].[Hour].CurrentMember,[Time].[Hour].[Hour]) ) , [Measures].[UsageSum]) Secondly, to answer your question about how to only build aggregations at specific levels of dimensions you need to look at the 'Aggregation Usage' property of a dimension (accessible via the cube editor by selecting the dimension and then looking at the Advanced tab in the properties pane). Setting this to 'Bottom level only' for a dimension would ensure that AS would only consider building aggregations at the bottom level for this particular dimension. That said, with 40000 members at the leaf level of your time dimension you're unlikely to be able to build any aggregations using the Storage Design Wizard or even the Usage Based Optimisation wizard - my guess is that they would fall foul of the 1/3 rule. In that case you could try building aggregations manually using the Partition Manager tool that ships (in an early version) with the SQL Server Resource kit or (in an updated version) with the BI Accelerator. You might want to read the following recent thread for some help on deciding which aggregations you build when using this tool: http://groups.google.co.uk/groups?hl...com%26rnum%3D3 Thirdly, partitioning by Time might help query performance lots here. Are you doing this? Lastly, and possibly most unhelpfully, I would suggest that the design of your time dimension could well be part of the problem here. Is there any way you could break it up into maybe two dimensions - a week/day dimension and a time-of-day dimension? The combined number of members on both these dimensions would be far less than the number of members on your existing dimension, and as a result query performance is likely to improve. The big disadvantage of this approach though is that users can no longer drill down from day to hour level - they need to do a crossjoin instead. HTH, Chris "Shirly Baruch" wrote: Hi, We have a critical problem with the query performance of a calculated measure that uses the Max function over a time dimension that includes members till the hour level. The current solution that we have includes: 1. A measure (i.e. UsageSum) that sums up the usage with all of the dimensions 2. A calculated member (i.e. Max Usage) that takes the hour with the max value of UsageSum – Max(Descendents([Time].[Hour].CurrentMember, [Time].[Hour].[Hour]), [Measures].[UsageSum]) The cube includes data of MAX usage of licenses. It uses 8 dimensions (User, Project, Time.Hour, License…). Its physical size is only 400MB. We have 40,000 leaf members in the Time dimension (since it is in the hour granularity). There are only 1400 leaf members in the project dimension and 4000 members in the leaf level of the License dimension. TimeHour Site Group License Project User Max Used Licenses 11/22/2004 8:00 S1 G1 L1 P1 U1 7 11/22/2004 8:00 S1 G1 L1 P1 U2 5 11/22/2004 8:00 S1 G1 L1 P1 U3 3 Max Used Licenses means – how many concurrent licenses have been used during a specific hour. This is our raw data – we don’t have the actual events (Start Time ? End Time) The problem is that the "Max Usage" member is calculated in run time and when running, for example, a query on the max usage of projects and licenses in last 10 weeks, it takes ~100 seconds to get the results. Note that in order to get the correct max usage we must perform two aggregation actions, sum over all of the dimensions and only afterwards max over the time. 1. Do you have any suggestion how to improve the performance of this calculated member? (We have followed most of the recommendations in the article http://www.microsoft.com/technet/pro...ansvcspg.mspx). 2. Do you have any idea for another method to calculate the max usage? 3. How can we control the aggregations of the cube so there will be aggregations on the leaf level of the Time, the Project and the License dimension? We were trying to affect the design storage using DSO programming with AddGoalQuery but it did not look like it improves the performance. We did not try to use the usage-based optimization. Thanks, Shirly. |
#7
| |||
| |||
|
|
Hi Shirly, Sorry, I didn't properly understand your requirements - you're right, you do need the hour data. Did any of my other suggestions make a difference? Regards, Chris "Shirly Baruch" wrote: Hi again, Calculating the max value in time requires the hour level even if we do not need to show it. The order of actions is important for the data correctness. First the data should be summed up in all of the dimensions and only then we can take the max on the time dimension. In that way the max value for a day can be a different value if you select different members in the background or if we select non-leaf members. For example: P1 and P2 are two projects under the same division D1. Let 's say the raw data is: Site Project User Time Usage S1 P1 a 200410101600 5 S1 P2 b 200410101700 10 S1 P1 c 200410101600 5 The max usage for a day is: Site Project Time Usage S1 P1 20041010 10 S1 P2 20041010 5 If we query the data that is based on the max usage of a day, we'll get a wrong result for D1: Site Project Time Usage S1 D1 20041010 15 There wasn't any hour during the day with 15 licenses in use. The correct max value is still 10. Therefore we have to use the hour level data for querying in the different levels of the dimensions. Thanks, Shirly. "Chris Webb" <OnlyForPostingToNewsgroups (AT) hotmail (DOT) com> wrote in message news:9543EA7A-C779-4A46-8243-D42E3C5A0609 (AT) microsoft (DOT) com... Hi Shirly, 1) Nonetheless, it's probably a good idea to keep the NECJ statement inside your calculated measure. It won't hurt, and might make a difference in a few cases where you are filtering on members from the majority of your dimensions. 2) I'm sorry, I should have been a bit clearer here. It's not the BI Accelerator itself which will help, but a tool called 'Partition Manager' which gets installed as part of the BI Accelerator. It allows you to look at each partition in your cube and see exactly which aggregations have been built, and which levels they are at. If you read the thread I mentioned in my last post, it will give you details of how to find which aggregations you need to build to service your queries; and as I said, if your queries are always at the leaf level of many of your dimensions, it's likely that the Storage Design Wizard and the Usage-Based Optimization Wizard might not be building the aggregations you need. 3) If you're already partitioning by month, and setting the slice value of your partition properly, then I doubt that partitioning by week is going to make a big difference to your performance. 4) If you don't actually need to drill down to the hour level, then why not remove it completely? You can then calculate the Max usage for a day when you're loading the data into your fact table, and make day rather than hour the granularity of your cube - it will mean that your data load takes longer, but it's likely to make your queries much faster. Regards, Chris "Shirly Baruch" wrote: Hi Chirs, Thanks for your help! 1. We have usage data of sites across the world so there is almost no idle hours. Therefore using NonEmptyCorssJoin within the measure definition did not improve the performance. 2. I installed the BI Accelerator. How can it help with creating aggregations? I could find only a way to define the performance gain and the paritions' count using PartAggUtil.exe. 3. I am using partitions on the Month level. The data is usually being queried on the Week level. Do you think that defining partitions on a different level can improve the performance? 4. Your last suggestion is a great idea! Drilling down to the hour level is not a string requirement for us. We can have a hidden dimension for the hour-of-day and have two nested calculated members: HourOfDayMaxUsage = Max(HourOfDay.Members, Measures.UsageSum) Max Usage = Max(Descendants(Time.Day.CurrentMember, Time.Day), Measures.HourOfDayMaxUsage). In that way we might benefit from aggregations that will be done for a dimension with two levels and 24 members. Thanks again, Shirly. "Chris Webb" <OnlyForPostingToNewsgroups (AT) hotmail (DOT) com> wrote in message news:A3782068-C677-4B4C-A426-986B9163F681 (AT) microsoft (DOT) com... Hi Shirly, Some thoughts on your problem... First of all, regarding your calculated member, using a NONEMPTYCROSSJOIN to remove any hours with no activity at all could improve performance if a) some apps are only very rarely used, and/or b) usage of your apps is restricted to only some hours of the day. So your calculated member would look something like this: Max( NONEMPTYCROSSJOIN( Descendents([Time].[Hour].CurrentMember,[Time].[Hour].[Hour]) ) , [Measures].[UsageSum]) Secondly, to answer your question about how to only build aggregations at specific levels of dimensions you need to look at the 'Aggregation Usage' property of a dimension (accessible via the cube editor by selecting the dimension and then looking at the Advanced tab in the properties pane). Setting this to 'Bottom level only' for a dimension would ensure that AS would only consider building aggregations at the bottom level for this particular dimension. That said, with 40000 members at the leaf level of your time dimension you're unlikely to be able to build any aggregations using the Storage Design Wizard or even the Usage Based Optimisation wizard - my guess is that they would fall foul of the 1/3 rule. In that case you could try building aggregations manually using the Partition Manager tool that ships (in an early version) with the SQL Server Resource kit or (in an updated version) with the BI Accelerator. You might want to read the following recent thread for some help on deciding which aggregations you build when using this tool: http://groups.google.co.uk/groups?hl...com%26rnum%3D3 Thirdly, partitioning by Time might help query performance lots here. Are you doing this? Lastly, and possibly most unhelpfully, I would suggest that the design of your time dimension could well be part of the problem here. Is there any way you could break it up into maybe two dimensions - a week/day dimension and a time-of-day dimension? The combined number of members on both these dimensions would be far less than the number of members on your existing dimension, and as a result query performance is likely to improve. The big disadvantage of this approach though is that users can no longer drill down from day to hour level - they need to do a crossjoin instead. HTH, Chris "Shirly Baruch" wrote: Hi, We have a critical problem with the query performance of a calculated measure that uses the Max function over a time dimension that includes members till the hour level. The current solution that we have includes: 1. A measure (i.e. UsageSum) that sums up the usage with all of the dimensions 2. A calculated member (i.e. Max Usage) that takes the hour with the max value of UsageSum – Max(Descendents([Time].[Hour].CurrentMember, [Time].[Hour].[Hour]), [Measures].[UsageSum]) The cube includes data of MAX usage of licenses. It uses 8 dimensions (User, Project, Time.Hour, License…). Its physical size is only 400MB. We have 40,000 leaf members in the Time dimension (since it is in the hour granularity). There are only 1400 leaf members in the project dimension and 4000 members in the leaf level of the License dimension. TimeHour Site Group License Project User Max Used Licenses 11/22/2004 8:00 S1 G1 L1 P1 U1 7 11/22/2004 8:00 S1 G1 L1 P1 U2 5 11/22/2004 8:00 S1 G1 L1 P1 U3 3 Max Used Licenses means – how many concurrent licenses have been used during a specific hour. This is our raw data – we don’t have the actual events (Start Time ? End Time) The problem is that the "Max Usage" member is calculated in run time and when running, for example, a query on the max usage of projects and licenses in last 10 weeks, it takes ~100 seconds to get the results. Note that in order to get the correct max usage we must perform two aggregation actions, sum over all of the dimensions and only afterwards max over the time. 1. Do you have any suggestion how to improve the performance of this calculated member? (We have followed most of the recommendations in the article http://www.microsoft.com/technet/pro...ansvcspg.mspx). 2. Do you have any idea for another method to calculate the max usage? 3. How can we control the aggregations of the cube so there will be aggregations on the leaf level of the Time, the Project and the License dimension? We were trying to affect the design storage using DSO programming with AddGoalQuery but it did not look like it improves the performance. We did not try to use the usage-based optimization. Thanks, Shirly. |
![]() |
| Thread Tools | |
| Display Modes | |
| |