dbTalk Databases Forums  

Calculating Max Usage with OLAP

microsoft.public.sqlserver.olap microsoft.public.sqlserver.olap


Discuss Calculating Max Usage with OLAP in the microsoft.public.sqlserver.olap forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Shirly Baruch
 
Posts: n/a

Default Calculating Max Usage with OLAP - 11-28-2004 , 08:10 AM






Hi,

We have a critical problem with the query performance of a calculated
measure that uses the Max function over a time dimension that includes
members till the hour level.

The current solution that we have includes:

1. A measure (i.e. UsageSum) that sums up the usage with all of the
dimensions

2. A calculated member (i.e. Max Usage) that takes the hour with the
max value of UsageSum – Max(Descendents([Time].[Hour].CurrentMember,
[Time].[Hour].[Hour]), [Measures].[UsageSum])



The cube includes data of MAX usage of licenses. It uses 8 dimensions (User,
Project, Time.Hour, License…). Its physical size is only 400MB. We have
40,000 leaf members in the Time dimension (since it is in the hour
granularity). There are only 1400 leaf members in the project dimension and
4000 members in the leaf level of the License dimension.

TimeHour
Site
Group
License
Project
User
Max Used Licenses

11/22/2004 8:00
S1
G1
L1
P1
U1
7

11/22/2004 8:00
S1
G1
L1
P1
U2
5

11/22/2004 8:00
S1
G1
L1
P1
U3
3


Max Used Licenses means – how many concurrent licenses have been used during
a specific hour. This is our raw data – we don’t have the actual events
(Start Time ? End Time)



The problem is that the "Max Usage" member is calculated in run time and
when running, for example, a query on the max usage of projects and licenses
in last 10 weeks, it takes ~100 seconds to get the results.

Note that in order to get the correct max usage we must perform two
aggregation actions, sum over all of the dimensions and only afterwards max
over the time.



1. Do you have any suggestion how to improve the performance of this
calculated member? (We have followed most of the recommendations in the
article
http://www.microsoft.com/technet/pro...ansvcspg.mspx).

2. Do you have any idea for another method to calculate the max usage?

3. How can we control the aggregations of the cube so there will be
aggregations on the leaf level of the Time, the Project and the License
dimension? We were trying to affect the design storage using DSO programming
with AddGoalQuery but it did not look like it improves the performance. We
did not try to use the usage-based optimization.



Thanks,

Shirly.




Reply With Quote
  #2  
Old   
Chris Webb
 
Posts: n/a

Default RE: Calculating Max Usage with OLAP - 11-29-2004 , 07:39 AM






Hi Shirly,

Some thoughts on your problem...

First of all, regarding your calculated member, using a NONEMPTYCROSSJOIN to
remove any hours with no activity at all could improve performance if a) some
apps are only very rarely used, and/or b) usage of your apps is restricted to
only some hours of the day. So your calculated member would look something
like this:
Max(
NONEMPTYCROSSJOIN(
Descendents([Time].[Hour].CurrentMember,[Time].[Hour].[Hour])
)
, [Measures].[UsageSum])

Secondly, to answer your question about how to only build aggregations at
specific levels of dimensions you need to look at the 'Aggregation Usage'
property of a dimension (accessible via the cube editor by selecting the
dimension and then looking at the Advanced tab in the properties pane).
Setting this to 'Bottom level only' for a dimension would ensure that AS
would only consider building aggregations at the bottom level for this
particular dimension. That said, with 40000 members at the leaf level of your
time dimension you're unlikely to be able to build any aggregations using the
Storage Design Wizard or even the Usage Based Optimisation wizard - my guess
is that they would fall foul of the 1/3 rule. In that case you could try
building aggregations manually using the Partition Manager tool that ships
(in an early version) with the SQL Server Resource kit or (in an updated
version) with the BI Accelerator. You might want to read the following recent
thread for some help on deciding which aggregations you build when using this
tool
http://groups.google.co.uk/groups?hl...com%26rnum%3D3

Thirdly, partitioning by Time might help query performance lots here. Are
you doing this?

Lastly, and possibly most unhelpfully, I would suggest that the design of
your time dimension could well be part of the problem here. Is there any way
you could break it up into maybe two dimensions - a week/day dimension and a
time-of-day dimension? The combined number of members on both these
dimensions would be far less than the number of members on your existing
dimension, and as a result query performance is likely to improve. The big
disadvantage of this approach though is that users can no longer drill down
from day to hour level - they need to do a crossjoin instead.

HTH,

Chris



"Shirly Baruch" wrote:

Quote:
Hi,

We have a critical problem with the query performance of a calculated
measure that uses the Max function over a time dimension that includes
members till the hour level.

The current solution that we have includes:

1. A measure (i.e. UsageSum) that sums up the usage with all of the
dimensions

2. A calculated member (i.e. Max Usage) that takes the hour with the
max value of UsageSum – Max(Descendents([Time].[Hour].CurrentMember,
[Time].[Hour].[Hour]), [Measures].[UsageSum])



The cube includes data of MAX usage of licenses. It uses 8 dimensions (User,
Project, Time.Hour, License…). Its physical size is only 400MB. We have
40,000 leaf members in the Time dimension (since it is in the hour
granularity). There are only 1400 leaf members in the project dimension and
4000 members in the leaf level of the License dimension.

TimeHour
Site
Group
License
Project
User
Max Used Licenses

11/22/2004 8:00
S1
G1
L1
P1
U1
7

11/22/2004 8:00
S1
G1
L1
P1
U2
5

11/22/2004 8:00
S1
G1
L1
P1
U3
3


Max Used Licenses means – how many concurrent licenses have been used during
a specific hour. This is our raw data – we don’t have the actual events
(Start Time ? End Time)



The problem is that the "Max Usage" member is calculated in run time and
when running, for example, a query on the max usage of projects and licenses
in last 10 weeks, it takes ~100 seconds to get the results.

Note that in order to get the correct max usage we must perform two
aggregation actions, sum over all of the dimensions and only afterwards max
over the time.



1. Do you have any suggestion how to improve the performance of this
calculated member? (We have followed most of the recommendations in the
article
http://www.microsoft.com/technet/pro...ansvcspg.mspx).

2. Do you have any idea for another method to calculate the max usage?

3. How can we control the aggregations of the cube so there will be
aggregations on the leaf level of the Time, the Project and the License
dimension? We were trying to affect the design storage using DSO programming
with AddGoalQuery but it did not look like it improves the performance. We
did not try to use the usage-based optimization.



Thanks,

Shirly.





Reply With Quote
  #3  
Old   
Shirly Baruch
 
Posts: n/a

Default Re: Calculating Max Usage with OLAP - 11-30-2004 , 12:19 PM



Hi Chirs,



Thanks for your help!



1. We have usage data of sites across the world so there is almost no idle
hours. Therefore using NonEmptyCorssJoin within the measure definition did
not improve the performance.

2. I installed the BI Accelerator. How can it help with creating
aggregations? I could find only a way to define the performance gain and the
paritions' count using PartAggUtil.exe.

3. I am using partitions on the Month level. The data is usually being
queried on the Week level. Do you think that defining partitions on a
different level can improve the performance?

4. Your last suggestion is a great idea! Drilling down to the hour level is
not a string requirement for us. We can have a hidden dimension for the
hour-of-day and have two nested calculated members:

HourOfDayMaxUsage = Max(HourOfDay.Members, Measures.UsageSum)

Max Usage = Max(Descendants(Time.Day.CurrentMember, Time.Day),
Measures.HourOfDayMaxUsage).

In that way we might benefit from aggregations that will be done for a
dimension with two levels and 24 members.



Thanks again,

Shirly.





"Chris Webb" <OnlyForPostingToNewsgroups (AT) hotmail (DOT) com> wrote

Quote:
Hi Shirly,

Some thoughts on your problem...

First of all, regarding your calculated member, using a NONEMPTYCROSSJOIN
to
remove any hours with no activity at all could improve performance if a)
some
apps are only very rarely used, and/or b) usage of your apps is restricted
to
only some hours of the day. So your calculated member would look something
like this:
Max(
NONEMPTYCROSSJOIN(
Descendents([Time].[Hour].CurrentMember,[Time].[Hour].[Hour])
)
, [Measures].[UsageSum])

Secondly, to answer your question about how to only build aggregations at
specific levels of dimensions you need to look at the 'Aggregation Usage'
property of a dimension (accessible via the cube editor by selecting the
dimension and then looking at the Advanced tab in the properties pane).
Setting this to 'Bottom level only' for a dimension would ensure that AS
would only consider building aggregations at the bottom level for this
particular dimension. That said, with 40000 members at the leaf level of
your
time dimension you're unlikely to be able to build any aggregations using
the
Storage Design Wizard or even the Usage Based Optimisation wizard - my
guess
is that they would fall foul of the 1/3 rule. In that case you could try
building aggregations manually using the Partition Manager tool that ships
(in an early version) with the SQL Server Resource kit or (in an updated
version) with the BI Accelerator. You might want to read the following
recent
thread for some help on deciding which aggregations you build when using
this
tool:

http://groups.google.co.uk/groups?hl...com%26rnum%3D3

Thirdly, partitioning by Time might help query performance lots here. Are
you doing this?

Lastly, and possibly most unhelpfully, I would suggest that the design of
your time dimension could well be part of the problem here. Is there any
way
you could break it up into maybe two dimensions - a week/day dimension and
a
time-of-day dimension? The combined number of members on both these
dimensions would be far less than the number of members on your existing
dimension, and as a result query performance is likely to improve. The big
disadvantage of this approach though is that users can no longer drill
down
from day to hour level - they need to do a crossjoin instead.

HTH,

Chris



"Shirly Baruch" wrote:

Hi,

We have a critical problem with the query performance of a calculated
measure that uses the Max function over a time dimension that includes
members till the hour level.

The current solution that we have includes:

1. A measure (i.e. UsageSum) that sums up the usage with all of
the
dimensions

2. A calculated member (i.e. Max Usage) that takes the hour with
the
max value of UsageSum – Max(Descendents([Time].[Hour].CurrentMember,
[Time].[Hour].[Hour]), [Measures].[UsageSum])



The cube includes data of MAX usage of licenses. It uses 8 dimensions
(User,
Project, Time.Hour, License…). Its physical size is only 400MB. We have
40,000 leaf members in the Time dimension (since it is in the hour
granularity). There are only 1400 leaf members in the project dimension
and
4000 members in the leaf level of the License dimension.

TimeHour
Site
Group
License
Project
User
Max Used Licenses

11/22/2004 8:00
S1
G1
L1
P1
U1
7

11/22/2004 8:00
S1
G1
L1
P1
U2
5

11/22/2004 8:00
S1
G1
L1
P1
U3
3


Max Used Licenses means – how many concurrent licenses have been used
during
a specific hour. This is our raw data – we don’t have the actual events
(Start Time ? End Time)



The problem is that the "Max Usage" member is calculated in run time and
when running, for example, a query on the max usage of projects and
licenses
in last 10 weeks, it takes ~100 seconds to get the results.

Note that in order to get the correct max usage we must perform two
aggregation actions, sum over all of the dimensions and only afterwards
max
over the time.



1. Do you have any suggestion how to improve the performance of
this
calculated member? (We have followed most of the recommendations in the
article

http://www.microsoft.com/technet/pro...ansvcspg.mspx).

2. Do you have any idea for another method to calculate the max
usage?

3. How can we control the aggregations of the cube so there will
be
aggregations on the leaf level of the Time, the Project and the License
dimension? We were trying to affect the design storage using DSO
programming
with AddGoalQuery but it did not look like it improves the performance.
We
did not try to use the usage-based optimization.



Thanks,

Shirly.







Reply With Quote
  #4  
Old   
Chris Webb
 
Posts: n/a

Default Re: Calculating Max Usage with OLAP - 12-01-2004 , 04:05 AM



Hi Shirly,

1) Nonetheless, it's probably a good idea to keep the NECJ statement inside
your calculated measure. It won't hurt, and might make a difference in a few
cases where you are filtering on members from the majority of your dimensions.
2) I'm sorry, I should have been a bit clearer here. It's not the BI
Accelerator itself which will help, but a tool called 'Partition Manager'
which gets installed as part of the BI Accelerator. It allows you to look at
each partition in your cube and see exactly which aggregations have been
built, and which levels they are at. If you read the thread I mentioned in my
last post, it will give you details of how to find which aggregations you
need to build to service your queries; and as I said, if your queries are
always at the leaf level of many of your dimensions, it's likely that the
Storage Design Wizard and the Usage-Based Optimization Wizard might not be
building the aggregations you need.
3) If you're already partitioning by month, and setting the slice value of
your partition properly, then I doubt that partitioning by week is going to
make a big difference to your performance.
4) If you don't actually need to drill down to the hour level, then why not
remove it completely? You can then calculate the Max usage for a day when
you're loading the data into your fact table, and make day rather than hour
the granularity of your cube - it will mean that your data load takes longer,
but it's likely to make your queries much faster.

Regards,

Chris

"Shirly Baruch" wrote:

Quote:
Hi Chirs,



Thanks for your help!



1. We have usage data of sites across the world so there is almost no idle
hours. Therefore using NonEmptyCorssJoin within the measure definition did
not improve the performance.

2. I installed the BI Accelerator. How can it help with creating
aggregations? I could find only a way to define the performance gain and the
paritions' count using PartAggUtil.exe.

3. I am using partitions on the Month level. The data is usually being
queried on the Week level. Do you think that defining partitions on a
different level can improve the performance?

4. Your last suggestion is a great idea! Drilling down to the hour level is
not a string requirement for us. We can have a hidden dimension for the
hour-of-day and have two nested calculated members:

HourOfDayMaxUsage = Max(HourOfDay.Members, Measures.UsageSum)

Max Usage = Max(Descendants(Time.Day.CurrentMember, Time.Day),
Measures.HourOfDayMaxUsage).

In that way we might benefit from aggregations that will be done for a
dimension with two levels and 24 members.



Thanks again,

Shirly.





"Chris Webb" <OnlyForPostingToNewsgroups (AT) hotmail (DOT) com> wrote in message
news:A3782068-C677-4B4C-A426-986B9163F681 (AT) microsoft (DOT) com...
Hi Shirly,

Some thoughts on your problem...

First of all, regarding your calculated member, using a NONEMPTYCROSSJOIN
to
remove any hours with no activity at all could improve performance if a)
some
apps are only very rarely used, and/or b) usage of your apps is restricted
to
only some hours of the day. So your calculated member would look something
like this:
Max(
NONEMPTYCROSSJOIN(
Descendents([Time].[Hour].CurrentMember,[Time].[Hour].[Hour])
)
, [Measures].[UsageSum])

Secondly, to answer your question about how to only build aggregations at
specific levels of dimensions you need to look at the 'Aggregation Usage'
property of a dimension (accessible via the cube editor by selecting the
dimension and then looking at the Advanced tab in the properties pane).
Setting this to 'Bottom level only' for a dimension would ensure that AS
would only consider building aggregations at the bottom level for this
particular dimension. That said, with 40000 members at the leaf level of
your
time dimension you're unlikely to be able to build any aggregations using
the
Storage Design Wizard or even the Usage Based Optimisation wizard - my
guess
is that they would fall foul of the 1/3 rule. In that case you could try
building aggregations manually using the Partition Manager tool that ships
(in an early version) with the SQL Server Resource kit or (in an updated
version) with the BI Accelerator. You might want to read the following
recent
thread for some help on deciding which aggregations you build when using
this
tool:

http://groups.google.co.uk/groups?hl...com%26rnum%3D3

Thirdly, partitioning by Time might help query performance lots here. Are
you doing this?

Lastly, and possibly most unhelpfully, I would suggest that the design of
your time dimension could well be part of the problem here. Is there any
way
you could break it up into maybe two dimensions - a week/day dimension and
a
time-of-day dimension? The combined number of members on both these
dimensions would be far less than the number of members on your existing
dimension, and as a result query performance is likely to improve. The big
disadvantage of this approach though is that users can no longer drill
down
from day to hour level - they need to do a crossjoin instead.

HTH,

Chris



"Shirly Baruch" wrote:

Hi,

We have a critical problem with the query performance of a calculated
measure that uses the Max function over a time dimension that includes
members till the hour level.

The current solution that we have includes:

1. A measure (i.e. UsageSum) that sums up the usage with all of
the
dimensions

2. A calculated member (i.e. Max Usage) that takes the hour with
the
max value of UsageSum – Max(Descendents([Time].[Hour].CurrentMember,
[Time].[Hour].[Hour]), [Measures].[UsageSum])



The cube includes data of MAX usage of licenses. It uses 8 dimensions
(User,
Project, Time.Hour, License…). Its physical size is only 400MB. We have
40,000 leaf members in the Time dimension (since it is in the hour
granularity). There are only 1400 leaf members in the project dimension
and
4000 members in the leaf level of the License dimension.

TimeHour
Site
Group
License
Project
User
Max Used Licenses

11/22/2004 8:00
S1
G1
L1
P1
U1
7

11/22/2004 8:00
S1
G1
L1
P1
U2
5

11/22/2004 8:00
S1
G1
L1
P1
U3
3


Max Used Licenses means – how many concurrent licenses have been used
during
a specific hour. This is our raw data – we don’t have the actual events
(Start Time ? End Time)



The problem is that the "Max Usage" member is calculated in run time and
when running, for example, a query on the max usage of projects and
licenses
in last 10 weeks, it takes ~100 seconds to get the results.

Note that in order to get the correct max usage we must perform two
aggregation actions, sum over all of the dimensions and only afterwards
max
over the time.



1. Do you have any suggestion how to improve the performance of
this
calculated member? (We have followed most of the recommendations in the
article

http://www.microsoft.com/technet/pro...ansvcspg.mspx).

2. Do you have any idea for another method to calculate the max
usage?

3. How can we control the aggregations of the cube so there will
be
aggregations on the leaf level of the Time, the Project and the License
dimension? We were trying to affect the design storage using DSO
programming
with AddGoalQuery but it did not look like it improves the performance.
We
did not try to use the usage-based optimization.



Thanks,

Shirly.








Reply With Quote
  #5  
Old   
Shirly Baruch
 
Posts: n/a

Default Re: Calculating Max Usage with OLAP - 12-05-2004 , 02:27 AM



Hi again,

Calculating the max value in time requires the hour level even if we do not
need to show it.
The order of actions is important for the data correctness.
First the data should be summed up in all of the dimensions and only then we
can take the max on the time dimension.
In that way the max value for a day can be a different value if you select
different members in the background or if we select non-leaf members.
For example:
P1 and P2 are two projects under the same division D1.
Let 's say the raw data is:
Site Project User Time Usage
S1 P1 a 200410101600 5
S1 P2 b 200410101700 10
S1 P1 c 200410101600 5
The max usage for a day is:
Site Project Time Usage
S1 P1 20041010 10
S1 P2 20041010 5
If we query the data that is based on the max usage of a day, we'll get a
wrong result for D1:
Site Project Time Usage
S1 D1 20041010 15
There wasn't any hour during the day with 15 licenses in use. The correct
max value is still 10.
Therefore we have to use the hour level data for querying in the different
levels of the dimensions.

Thanks,
Shirly.

"Chris Webb" <OnlyForPostingToNewsgroups (AT) hotmail (DOT) com> wrote

Quote:
Hi Shirly,

1) Nonetheless, it's probably a good idea to keep the NECJ statement
inside
your calculated measure. It won't hurt, and might make a difference in a
few
cases where you are filtering on members from the majority of your
dimensions.
2) I'm sorry, I should have been a bit clearer here. It's not the BI
Accelerator itself which will help, but a tool called 'Partition Manager'
which gets installed as part of the BI Accelerator. It allows you to look
at
each partition in your cube and see exactly which aggregations have been
built, and which levels they are at. If you read the thread I mentioned in
my
last post, it will give you details of how to find which aggregations you
need to build to service your queries; and as I said, if your queries are
always at the leaf level of many of your dimensions, it's likely that the
Storage Design Wizard and the Usage-Based Optimization Wizard might not be
building the aggregations you need.
3) If you're already partitioning by month, and setting the slice value of
your partition properly, then I doubt that partitioning by week is going
to
make a big difference to your performance.
4) If you don't actually need to drill down to the hour level, then why
not
remove it completely? You can then calculate the Max usage for a day when
you're loading the data into your fact table, and make day rather than
hour
the granularity of your cube - it will mean that your data load takes
longer,
but it's likely to make your queries much faster.

Regards,

Chris

"Shirly Baruch" wrote:

Hi Chirs,



Thanks for your help!



1. We have usage data of sites across the world so there is almost no
idle
hours. Therefore using NonEmptyCorssJoin within the measure definition
did
not improve the performance.

2. I installed the BI Accelerator. How can it help with creating
aggregations? I could find only a way to define the performance gain and
the
paritions' count using PartAggUtil.exe.

3. I am using partitions on the Month level. The data is usually being
queried on the Week level. Do you think that defining partitions on a
different level can improve the performance?

4. Your last suggestion is a great idea! Drilling down to the hour level
is
not a string requirement for us. We can have a hidden dimension for the
hour-of-day and have two nested calculated members:

HourOfDayMaxUsage = Max(HourOfDay.Members, Measures.UsageSum)

Max Usage = Max(Descendants(Time.Day.CurrentMember, Time.Day),
Measures.HourOfDayMaxUsage).

In that way we might benefit from aggregations that will be done for a
dimension with two levels and 24 members.



Thanks again,

Shirly.





"Chris Webb" <OnlyForPostingToNewsgroups (AT) hotmail (DOT) com> wrote in message
news:A3782068-C677-4B4C-A426-986B9163F681 (AT) microsoft (DOT) com...
Hi Shirly,

Some thoughts on your problem...

First of all, regarding your calculated member, using a
NONEMPTYCROSSJOIN
to
remove any hours with no activity at all could improve performance if
a)
some
apps are only very rarely used, and/or b) usage of your apps is
restricted
to
only some hours of the day. So your calculated member would look
something
like this:
Max(
NONEMPTYCROSSJOIN(
Descendents([Time].[Hour].CurrentMember,[Time].[Hour].[Hour])
)
, [Measures].[UsageSum])

Secondly, to answer your question about how to only build aggregations
at
specific levels of dimensions you need to look at the 'Aggregation
Usage'
property of a dimension (accessible via the cube editor by selecting
the
dimension and then looking at the Advanced tab in the properties
pane).
Setting this to 'Bottom level only' for a dimension would ensure that
AS
would only consider building aggregations at the bottom level for this
particular dimension. That said, with 40000 members at the leaf level
of
your
time dimension you're unlikely to be able to build any aggregations
using
the
Storage Design Wizard or even the Usage Based Optimisation wizard - my
guess
is that they would fall foul of the 1/3 rule. In that case you could
try
building aggregations manually using the Partition Manager tool that
ships
(in an early version) with the SQL Server Resource kit or (in an
updated
version) with the BI Accelerator. You might want to read the following
recent
thread for some help on deciding which aggregations you build when
using
this
tool:


http://groups.google.co.uk/groups?hl...com%26rnum%3D3

Thirdly, partitioning by Time might help query performance lots here.
Are
you doing this?

Lastly, and possibly most unhelpfully, I would suggest that the design
of
your time dimension could well be part of the problem here. Is there
any
way
you could break it up into maybe two dimensions - a week/day dimension
and
a
time-of-day dimension? The combined number of members on both these
dimensions would be far less than the number of members on your
existing
dimension, and as a result query performance is likely to improve. The
big
disadvantage of this approach though is that users can no longer drill
down
from day to hour level - they need to do a crossjoin instead.

HTH,

Chris



"Shirly Baruch" wrote:

Hi,

We have a critical problem with the query performance of a
calculated
measure that uses the Max function over a time dimension that
includes
members till the hour level.

The current solution that we have includes:

1. A measure (i.e. UsageSum) that sums up the usage with all
of
the
dimensions

2. A calculated member (i.e. Max Usage) that takes the hour
with
the
max value of UsageSum – Max(Descendents([Time].[Hour].CurrentMember,
[Time].[Hour].[Hour]), [Measures].[UsageSum])



The cube includes data of MAX usage of licenses. It uses 8
dimensions
(User,
Project, Time.Hour, License…). Its physical size is only 400MB. We
have
40,000 leaf members in the Time dimension (since it is in the hour
granularity). There are only 1400 leaf members in the project
dimension
and
4000 members in the leaf level of the License dimension.

TimeHour
Site
Group
License
Project
User
Max Used Licenses

11/22/2004 8:00
S1
G1
L1
P1
U1
7

11/22/2004 8:00
S1
G1
L1
P1
U2
5

11/22/2004 8:00
S1
G1
L1
P1
U3
3


Max Used Licenses means – how many concurrent licenses have been
used
during
a specific hour. This is our raw data – we don’t have the actual
events
(Start Time ? End Time)



The problem is that the "Max Usage" member is calculated in run time
and
when running, for example, a query on the max usage of projects and
licenses
in last 10 weeks, it takes ~100 seconds to get the results.

Note that in order to get the correct max usage we must perform two
aggregation actions, sum over all of the dimensions and only
afterwards
max
over the time.



1. Do you have any suggestion how to improve the performance
of
this
calculated member? (We have followed most of the recommendations in
the
article


http://www.microsoft.com/technet/pro...ansvcspg.mspx).

2. Do you have any idea for another method to calculate the
max
usage?

3. How can we control the aggregations of the cube so there
will
be
aggregations on the leaf level of the Time, the Project and the
License
dimension? We were trying to affect the design storage using DSO
programming
with AddGoalQuery but it did not look like it improves the
performance.
We
did not try to use the usage-based optimization.



Thanks,

Shirly.










Reply With Quote
  #6  
Old   
Chris Webb
 
Posts: n/a

Default Re: Calculating Max Usage with OLAP - 12-06-2004 , 07:47 AM



Hi Shirly,

Sorry, I didn't properly understand your requirements - you're right, you do
need the hour data. Did any of my other suggestions make a difference?

Regards,

Chris

"Shirly Baruch" wrote:

Quote:
Hi again,

Calculating the max value in time requires the hour level even if we do not
need to show it.
The order of actions is important for the data correctness.
First the data should be summed up in all of the dimensions and only then we
can take the max on the time dimension.
In that way the max value for a day can be a different value if you select
different members in the background or if we select non-leaf members.
For example:
P1 and P2 are two projects under the same division D1.
Let 's say the raw data is:
Site Project User Time Usage
S1 P1 a 200410101600 5
S1 P2 b 200410101700 10
S1 P1 c 200410101600 5
The max usage for a day is:
Site Project Time Usage
S1 P1 20041010 10
S1 P2 20041010 5
If we query the data that is based on the max usage of a day, we'll get a
wrong result for D1:
Site Project Time Usage
S1 D1 20041010 15
There wasn't any hour during the day with 15 licenses in use. The correct
max value is still 10.
Therefore we have to use the hour level data for querying in the different
levels of the dimensions.

Thanks,
Shirly.

"Chris Webb" <OnlyForPostingToNewsgroups (AT) hotmail (DOT) com> wrote in message
news:9543EA7A-C779-4A46-8243-D42E3C5A0609 (AT) microsoft (DOT) com...
Hi Shirly,

1) Nonetheless, it's probably a good idea to keep the NECJ statement
inside
your calculated measure. It won't hurt, and might make a difference in a
few
cases where you are filtering on members from the majority of your
dimensions.
2) I'm sorry, I should have been a bit clearer here. It's not the BI
Accelerator itself which will help, but a tool called 'Partition Manager'
which gets installed as part of the BI Accelerator. It allows you to look
at
each partition in your cube and see exactly which aggregations have been
built, and which levels they are at. If you read the thread I mentioned in
my
last post, it will give you details of how to find which aggregations you
need to build to service your queries; and as I said, if your queries are
always at the leaf level of many of your dimensions, it's likely that the
Storage Design Wizard and the Usage-Based Optimization Wizard might not be
building the aggregations you need.
3) If you're already partitioning by month, and setting the slice value of
your partition properly, then I doubt that partitioning by week is going
to
make a big difference to your performance.
4) If you don't actually need to drill down to the hour level, then why
not
remove it completely? You can then calculate the Max usage for a day when
you're loading the data into your fact table, and make day rather than
hour
the granularity of your cube - it will mean that your data load takes
longer,
but it's likely to make your queries much faster.

Regards,

Chris

"Shirly Baruch" wrote:

Hi Chirs,



Thanks for your help!



1. We have usage data of sites across the world so there is almost no
idle
hours. Therefore using NonEmptyCorssJoin within the measure definition
did
not improve the performance.

2. I installed the BI Accelerator. How can it help with creating
aggregations? I could find only a way to define the performance gain and
the
paritions' count using PartAggUtil.exe.

3. I am using partitions on the Month level. The data is usually being
queried on the Week level. Do you think that defining partitions on a
different level can improve the performance?

4. Your last suggestion is a great idea! Drilling down to the hour level
is
not a string requirement for us. We can have a hidden dimension for the
hour-of-day and have two nested calculated members:

HourOfDayMaxUsage = Max(HourOfDay.Members, Measures.UsageSum)

Max Usage = Max(Descendants(Time.Day.CurrentMember, Time.Day),
Measures.HourOfDayMaxUsage).

In that way we might benefit from aggregations that will be done for a
dimension with two levels and 24 members.



Thanks again,

Shirly.





"Chris Webb" <OnlyForPostingToNewsgroups (AT) hotmail (DOT) com> wrote in message
news:A3782068-C677-4B4C-A426-986B9163F681 (AT) microsoft (DOT) com...
Hi Shirly,

Some thoughts on your problem...

First of all, regarding your calculated member, using a
NONEMPTYCROSSJOIN
to
remove any hours with no activity at all could improve performance if
a)
some
apps are only very rarely used, and/or b) usage of your apps is
restricted
to
only some hours of the day. So your calculated member would look
something
like this:
Max(
NONEMPTYCROSSJOIN(
Descendents([Time].[Hour].CurrentMember,[Time].[Hour].[Hour])
)
, [Measures].[UsageSum])

Secondly, to answer your question about how to only build aggregations
at
specific levels of dimensions you need to look at the 'Aggregation
Usage'
property of a dimension (accessible via the cube editor by selecting
the
dimension and then looking at the Advanced tab in the properties
pane).
Setting this to 'Bottom level only' for a dimension would ensure that
AS
would only consider building aggregations at the bottom level for this
particular dimension. That said, with 40000 members at the leaf level
of
your
time dimension you're unlikely to be able to build any aggregations
using
the
Storage Design Wizard or even the Usage Based Optimisation wizard - my
guess
is that they would fall foul of the 1/3 rule. In that case you could
try
building aggregations manually using the Partition Manager tool that
ships
(in an early version) with the SQL Server Resource kit or (in an
updated
version) with the BI Accelerator. You might want to read the following
recent
thread for some help on deciding which aggregations you build when
using
this
tool:


http://groups.google.co.uk/groups?hl...com%26rnum%3D3

Thirdly, partitioning by Time might help query performance lots here.
Are
you doing this?

Lastly, and possibly most unhelpfully, I would suggest that the design
of
your time dimension could well be part of the problem here. Is there
any
way
you could break it up into maybe two dimensions - a week/day dimension
and
a
time-of-day dimension? The combined number of members on both these
dimensions would be far less than the number of members on your
existing
dimension, and as a result query performance is likely to improve. The
big
disadvantage of this approach though is that users can no longer drill
down
from day to hour level - they need to do a crossjoin instead.

HTH,

Chris



"Shirly Baruch" wrote:

Hi,

We have a critical problem with the query performance of a
calculated
measure that uses the Max function over a time dimension that
includes
members till the hour level.

The current solution that we have includes:

1. A measure (i.e. UsageSum) that sums up the usage with all
of
the
dimensions

2. A calculated member (i.e. Max Usage) that takes the hour
with
the
max value of UsageSum – Max(Descendents([Time].[Hour].CurrentMember,
[Time].[Hour].[Hour]), [Measures].[UsageSum])



The cube includes data of MAX usage of licenses. It uses 8
dimensions
(User,
Project, Time.Hour, License…). Its physical size is only 400MB. We
have
40,000 leaf members in the Time dimension (since it is in the hour
granularity). There are only 1400 leaf members in the project
dimension
and
4000 members in the leaf level of the License dimension.

TimeHour
Site
Group
License
Project
User
Max Used Licenses

11/22/2004 8:00
S1
G1
L1
P1
U1
7

11/22/2004 8:00
S1
G1
L1
P1
U2
5

11/22/2004 8:00
S1
G1
L1
P1
U3
3


Max Used Licenses means – how many concurrent licenses have been
used
during
a specific hour. This is our raw data – we don’t have the actual
events
(Start Time ? End Time)



The problem is that the "Max Usage" member is calculated in run time
and
when running, for example, a query on the max usage of projects and
licenses
in last 10 weeks, it takes ~100 seconds to get the results.

Note that in order to get the correct max usage we must perform two
aggregation actions, sum over all of the dimensions and only
afterwards
max
over the time.



1. Do you have any suggestion how to improve the performance
of
this
calculated member? (We have followed most of the recommendations in
the
article


http://www.microsoft.com/technet/pro...ansvcspg.mspx).

2. Do you have any idea for another method to calculate the
max
usage?

3. How can we control the aggregations of the cube so there
will
be
aggregations on the leaf level of the Time, the Project and the
License
dimension? We were trying to affect the design storage using DSO
programming
with AddGoalQuery but it did not look like it improves the
performance.
We
did not try to use the usage-based optimization.



Thanks,

Shirly.











Reply With Quote
  #7  
Old   
Shirly Baruch
 
Posts: n/a

Default Re: Calculating Max Usage with OLAP - 12-07-2004 , 03:33 AM



Hi Chris,

It seems like we gained better performance by creating a Hour-Of-Day
dimension. In addtion we defined "Non Empty Behavior" to the calculated
measurs. We are testing this solution.

Regards,
Shirly.

"Chris Webb" <OnlyForPostingToNewsgroups (AT) hotmail (DOT) com> wrote

Quote:
Hi Shirly,

Sorry, I didn't properly understand your requirements - you're right, you
do
need the hour data. Did any of my other suggestions make a difference?

Regards,

Chris

"Shirly Baruch" wrote:

Hi again,

Calculating the max value in time requires the hour level even if we do
not
need to show it.
The order of actions is important for the data correctness.
First the data should be summed up in all of the dimensions and only
then we
can take the max on the time dimension.
In that way the max value for a day can be a different value if you
select
different members in the background or if we select non-leaf members.
For example:
P1 and P2 are two projects under the same division D1.
Let 's say the raw data is:
Site Project User Time Usage
S1 P1 a 200410101600 5
S1 P2 b 200410101700 10
S1 P1 c 200410101600 5
The max usage for a day is:
Site Project Time Usage
S1 P1 20041010 10
S1 P2 20041010 5
If we query the data that is based on the max usage of a day, we'll get
a
wrong result for D1:
Site Project Time Usage
S1 D1 20041010 15
There wasn't any hour during the day with 15 licenses in use. The
correct
max value is still 10.
Therefore we have to use the hour level data for querying in the
different
levels of the dimensions.

Thanks,
Shirly.

"Chris Webb" <OnlyForPostingToNewsgroups (AT) hotmail (DOT) com> wrote in message
news:9543EA7A-C779-4A46-8243-D42E3C5A0609 (AT) microsoft (DOT) com...
Hi Shirly,

1) Nonetheless, it's probably a good idea to keep the NECJ statement
inside
your calculated measure. It won't hurt, and might make a difference in
a
few
cases where you are filtering on members from the majority of your
dimensions.
2) I'm sorry, I should have been a bit clearer here. It's not the BI
Accelerator itself which will help, but a tool called 'Partition
Manager'
which gets installed as part of the BI Accelerator. It allows you to
look
at
each partition in your cube and see exactly which aggregations have
been
built, and which levels they are at. If you read the thread I
mentioned in
my
last post, it will give you details of how to find which aggregations
you
need to build to service your queries; and as I said, if your queries
are
always at the leaf level of many of your dimensions, it's likely that
the
Storage Design Wizard and the Usage-Based Optimization Wizard might
not be
building the aggregations you need.
3) If you're already partitioning by month, and setting the slice
value of
your partition properly, then I doubt that partitioning by week is
going
to
make a big difference to your performance.
4) If you don't actually need to drill down to the hour level, then
why
not
remove it completely? You can then calculate the Max usage for a day
when
you're loading the data into your fact table, and make day rather than
hour
the granularity of your cube - it will mean that your data load takes
longer,
but it's likely to make your queries much faster.

Regards,

Chris

"Shirly Baruch" wrote:

Hi Chirs,



Thanks for your help!



1. We have usage data of sites across the world so there is almost
no
idle
hours. Therefore using NonEmptyCorssJoin within the measure
definition
did
not improve the performance.

2. I installed the BI Accelerator. How can it help with creating
aggregations? I could find only a way to define the performance gain
and
the
paritions' count using PartAggUtil.exe.

3. I am using partitions on the Month level. The data is usually
being
queried on the Week level. Do you think that defining partitions on
a
different level can improve the performance?

4. Your last suggestion is a great idea! Drilling down to the hour
level
is
not a string requirement for us. We can have a hidden dimension for
the
hour-of-day and have two nested calculated members:

HourOfDayMaxUsage = Max(HourOfDay.Members,
Measures.UsageSum)

Max Usage = Max(Descendants(Time.Day.CurrentMember,
Time.Day),
Measures.HourOfDayMaxUsage).

In that way we might benefit from aggregations that will be done
for a
dimension with two levels and 24 members.



Thanks again,

Shirly.





"Chris Webb" <OnlyForPostingToNewsgroups (AT) hotmail (DOT) com> wrote in
message
news:A3782068-C677-4B4C-A426-986B9163F681 (AT) microsoft (DOT) com...
Hi Shirly,

Some thoughts on your problem...

First of all, regarding your calculated member, using a
NONEMPTYCROSSJOIN
to
remove any hours with no activity at all could improve performance
if
a)
some
apps are only very rarely used, and/or b) usage of your apps is
restricted
to
only some hours of the day. So your calculated member would look
something
like this:
Max(
NONEMPTYCROSSJOIN(
Descendents([Time].[Hour].CurrentMember,[Time].[Hour].[Hour])
)
, [Measures].[UsageSum])

Secondly, to answer your question about how to only build
aggregations
at
specific levels of dimensions you need to look at the 'Aggregation
Usage'
property of a dimension (accessible via the cube editor by
selecting
the
dimension and then looking at the Advanced tab in the properties
pane).
Setting this to 'Bottom level only' for a dimension would ensure
that
AS
would only consider building aggregations at the bottom level for
this
particular dimension. That said, with 40000 members at the leaf
level
of
your
time dimension you're unlikely to be able to build any
aggregations
using
the
Storage Design Wizard or even the Usage Based Optimisation
wizard - my
guess
is that they would fall foul of the 1/3 rule. In that case you
could
try
building aggregations manually using the Partition Manager tool
that
ships
(in an early version) with the SQL Server Resource kit or (in an
updated
version) with the BI Accelerator. You might want to read the
following
recent
thread for some help on deciding which aggregations you build when
using
this
tool:



http://groups.google.co.uk/groups?hl...com%26rnum%3D3

Thirdly, partitioning by Time might help query performance lots
here.
Are
you doing this?

Lastly, and possibly most unhelpfully, I would suggest that the
design
of
your time dimension could well be part of the problem here. Is
there
any
way
you could break it up into maybe two dimensions - a week/day
dimension
and
a
time-of-day dimension? The combined number of members on both
these
dimensions would be far less than the number of members on your
existing
dimension, and as a result query performance is likely to improve.
The
big
disadvantage of this approach though is that users can no longer
drill
down
from day to hour level - they need to do a crossjoin instead.

HTH,

Chris



"Shirly Baruch" wrote:

Hi,

We have a critical problem with the query performance of a
calculated
measure that uses the Max function over a time dimension that
includes
members till the hour level.

The current solution that we have includes:

1. A measure (i.e. UsageSum) that sums up the usage with
all
of
the
dimensions

2. A calculated member (i.e. Max Usage) that takes the
hour
with
the
max value of UsageSum –
Max(Descendents([Time].[Hour].CurrentMember,
[Time].[Hour].[Hour]), [Measures].[UsageSum])



The cube includes data of MAX usage of licenses. It uses 8
dimensions
(User,
Project, Time.Hour, License…). Its physical size is only 400MB.
We
have
40,000 leaf members in the Time dimension (since it is in the
hour
granularity). There are only 1400 leaf members in the project
dimension
and
4000 members in the leaf level of the License dimension.

TimeHour
Site
Group
License
Project
User
Max Used Licenses

11/22/2004 8:00
S1
G1
L1
P1
U1
7

11/22/2004 8:00
S1
G1
L1
P1
U2
5

11/22/2004 8:00
S1
G1
L1
P1
U3
3


Max Used Licenses means – how many concurrent licenses have been
used
during
a specific hour. This is our raw data – we don’t have the actual
events
(Start Time ? End Time)



The problem is that the "Max Usage" member is calculated in run
time
and
when running, for example, a query on the max usage of projects
and
licenses
in last 10 weeks, it takes ~100 seconds to get the results.

Note that in order to get the correct max usage we must perform
two
aggregation actions, sum over all of the dimensions and only
afterwards
max
over the time.



1. Do you have any suggestion how to improve the
performance
of
this
calculated member? (We have followed most of the recommendations
in
the
article



http://www.microsoft.com/technet/pro...ansvcspg.mspx).

2. Do you have any idea for another method to calculate
the
max
usage?

3. How can we control the aggregations of the cube so
there
will
be
aggregations on the leaf level of the Time, the Project and the
License
dimension? We were trying to affect the design storage using DSO
programming
with AddGoalQuery but it did not look like it improves the
performance.
We
did not try to use the usage-based optimization.



Thanks,

Shirly.













Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.