dbTalk Databases Forums  

AS2005 very poor performance w/ DistinctCount

microsoft.public.sqlserver.olap microsoft.public.sqlserver.olap


Discuss AS2005 very poor performance w/ DistinctCount in the microsoft.public.sqlserver.olap forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Tiago Rente
 
Posts: n/a

Default AS2005 very poor performance w/ DistinctCount - 08-09-2006 , 11:58 AM






Hi,

We are a performance issue with a cube in AS2005 with the following
characteristics:
• Measure:
o 1 DistinctCount – MeasureA.
o 1 Sum – MeasureB.
o 1 Calculated – MeasureC = MeasureB/MeasureA. Empty behavior is activated
for both base measures.
• Data can be divided in 2 clusters:
o For data type X we load ~3 million rows per month (partition) to AS2005.
o For data type Y we load ~25 million rows per month (partition) to AS2005.
• Dimensions: 11 dimensions with 22 hierarchies.
• One dimension has a default value to restrict data to type Y.
• We have 24 partitions – 12 monthly partitions for each data type. All
partitions have defined slices for both month and data type.

The performance is very poor and we are not able to reduce the data that
feeds AS2005. How can we improve the performance?
We tried creating aggregations but the partitions’ directory increase
“exponentially”.

Wasn’t supposed for AS2005 to improve the performance for DistinctCount
measures?

Any help will be appreciated.

Thanks in advance.

Reply With Quote
  #2  
Old   
Jj
 
Posts: n/a

Default Re: AS2005 very poor performance w/ DistinctCount - 08-09-2006 , 12:09 PM






can you quickly descibe your hardware and the time taken to answer your slow
queries?
what is the % of aggregations?
does the dcount and the Sum measures are into 2 separate measure group?
does the problem still here with a warm cache? (reecuting the same queries
take the same time or the performance is better?)



"Tiago Rente" <TiagoRente (AT) discussions (DOT) microsoft.com> wrote

Quote:
Hi,

We are a performance issue with a cube in AS2005 with the following
characteristics:
. Measure:
o 1 DistinctCount - MeasureA.
o 1 Sum - MeasureB.
o 1 Calculated - MeasureC = MeasureB/MeasureA. Empty behavior is
activated
for both base measures.
. Data can be divided in 2 clusters:
o For data type X we load ~3 million rows per month (partition) to
AS2005.
o For data type Y we load ~25 million rows per month (partition) to
AS2005.
. Dimensions: 11 dimensions with 22 hierarchies.
. One dimension has a default value to restrict data to type Y.
. We have 24 partitions - 12 monthly partitions for each data type. All
partitions have defined slices for both month and data type.

The performance is very poor and we are not able to reduce the data that
feeds AS2005. How can we improve the performance?
We tried creating aggregations but the partitions' directory increase
"exponentially".

Wasn't supposed for AS2005 to improve the performance for DistinctCount
measures?

Any help will be appreciated.

Thanks in advance.



Reply With Quote
  #3  
Old   
Tiago Rente
 
Posts: n/a

Default Re: AS2005 very poor performance w/ DistinctCount - 08-10-2006 , 04:10 AM



Hi,

We have two environments:
1. 32-bit HP server with 8 x 2.5 GHz Xeon - 12 GB RAM - running W2K3 SP1
w/ parameter /3Gb.
2. 64-bit HP server with 4 x 1.5 GHz Itanium 2 - 16 GB RAM - running W2K3
SP1.

Queries take normally more than 10 minutes do execute.

Currently we do not have set the aggregations since we are not able to do
the usage based optimization. I.e., Althought, we have queries in the
QueryLog table the aggreagations created in the usage based optimization are
not related to just the queries done by the users!
In a previous version of the cube creating ~30% aggreagations, each
partition had a file with almost 2 GB in size. This is not practical since we
process the cubes in a staging server and then copied to the end-users server.

The 2 measures are in the same measure group since the 3rd measure uses the
other 2.
I forgot to set the default measure as the MeasureB (sum). Would it increase
the performance?

Even if I execute the some report repeatly the second/third time takes more
or less the same time.
However, there were some reports that were execute is seconds, but this was
a never refreshed query.

Most of the tests were done in the 32-bit server.

Thanks.

"Jéjé" wrote:

Quote:
can you quickly descibe your hardware and the time taken to answer your slow
queries?
what is the % of aggregations?
does the dcount and the Sum measures are into 2 separate measure group?
does the problem still here with a warm cache? (reecuting the same queries
take the same time or the performance is better?)



"Tiago Rente" <TiagoRente (AT) discussions (DOT) microsoft.com> wrote in message
news:F2BEB525-2FCA-4F49-A135-4B17E503D97F (AT) microsoft (DOT) com...
Hi,

We are a performance issue with a cube in AS2005 with the following
characteristics:
. Measure:
o 1 DistinctCount - MeasureA.
o 1 Sum - MeasureB.
o 1 Calculated - MeasureC = MeasureB/MeasureA. Empty behavior is
activated
for both base measures.
. Data can be divided in 2 clusters:
o For data type X we load ~3 million rows per month (partition) to
AS2005.
o For data type Y we load ~25 million rows per month (partition) to
AS2005.
. Dimensions: 11 dimensions with 22 hierarchies.
. One dimension has a default value to restrict data to type Y.
. We have 24 partitions - 12 monthly partitions for each data type. All
partitions have defined slices for both month and data type.

The performance is very poor and we are not able to reduce the data that
feeds AS2005. How can we improve the performance?
We tried creating aggregations but the partitions' directory increase
"exponentially".

Wasn't supposed for AS2005 to improve the performance for DistinctCount
measures?

Any help will be appreciated.

Thanks in advance.




Reply With Quote
  #4  
Old   
Jj
 
Posts: n/a

Default Re: AS2005 very poor performance w/ DistinctCount - 08-10-2006 , 06:42 AM



the third measure?
the calculated measure is in a separate measuregroup???
its really important to create a dedicated measuregroup for your DCount
measure. (with only this measure in this group)
how the non empty behavior is setup for the calculated measure?

also, create starting aggregations. this size on the disk is required, no
aggregation, no performance.
there is no miracle.
I have a cube with 2 dcounts (1 against 80 millions of rows / partition;
second against 10 millions / partition; + standard sums and count measure)
the response time is 10 seconds on a 2 * dual core server and with a warm
cache, the same result takes only 1 or 2 seconds.
my formula divide the 2 dcount measures. (and I have other measures like sum
/ dcount)

"Tiago Rente" <TiagoRente (AT) discussions (DOT) microsoft.com> wrote

Quote:
Hi,

We have two environments:
1. 32-bit HP server with 8 x 2.5 GHz Xeon - 12 GB RAM - running W2K3 SP1
w/ parameter /3Gb.
2. 64-bit HP server with 4 x 1.5 GHz Itanium 2 - 16 GB RAM - running W2K3
SP1.

Queries take normally more than 10 minutes do execute.

Currently we do not have set the aggregations since we are not able to do
the usage based optimization. I.e., Althought, we have queries in the
QueryLog table the aggreagations created in the usage based optimization
are
not related to just the queries done by the users!
In a previous version of the cube creating ~30% aggreagations, each
partition had a file with almost 2 GB in size. This is not practical since
we
process the cubes in a staging server and then copied to the end-users
server.

The 2 measures are in the same measure group since the 3rd measure uses
the
other 2.
I forgot to set the default measure as the MeasureB (sum). Would it
increase
the performance?

Even if I execute the some report repeatly the second/third time takes
more
or less the same time.
However, there were some reports that were execute is seconds, but this
was
a never refreshed query.

Most of the tests were done in the 32-bit server.

Thanks.

"Jj" wrote:

can you quickly descibe your hardware and the time taken to answer your
slow
queries?
what is the % of aggregations?
does the dcount and the Sum measures are into 2 separate measure group?
does the problem still here with a warm cache? (reecuting the same
queries
take the same time or the performance is better?)



"Tiago Rente" <TiagoRente (AT) discussions (DOT) microsoft.com> wrote in message
news:F2BEB525-2FCA-4F49-A135-4B17E503D97F (AT) microsoft (DOT) com...
Hi,

We are a performance issue with a cube in AS2005 with the following
characteristics:
. Measure:
o 1 DistinctCount - MeasureA.
o 1 Sum - MeasureB.
o 1 Calculated - MeasureC = MeasureB/MeasureA. Empty behavior is
activated
for both base measures.
. Data can be divided in 2 clusters:
o For data type X we load ~3 million rows per month (partition) to
AS2005.
o For data type Y we load ~25 million rows per month (partition) to
AS2005.
. Dimensions: 11 dimensions with 22 hierarchies.
. One dimension has a default value to restrict data to type Y.
. We have 24 partitions - 12 monthly partitions for each data type. All
partitions have defined slices for both month and data type.

The performance is very poor and we are not able to reduce the data
that
feeds AS2005. How can we improve the performance?
We tried creating aggregations but the partitions' directory increase
"exponentially".

Wasn't supposed for AS2005 to improve the performance for DistinctCount
measures?

Any help will be appreciated.

Thanks in advance.






Reply With Quote
  #5  
Old   
Tiago Rente
 
Posts: n/a

Default Re: AS2005 very poor performance w/ DistinctCount - 08-10-2006 , 08:02 AM



All measures are in the same measure group.

If I understood correctly you suggestion is:
* Create a measure group (MeasureGroupX) just for the DistinctCount
MeasureA.
* Create a measure group (MeasureGroupY) containing the other to measures
MeasureB + MeasureC (= MeasureB/MeasureA).
* Create aggregations.

Do I need to create aggregations on the MeasureGroupX?
If so how do I ensure that only the aggregations for which I a query from
the users is created?
I tried creating aggregations using the usage-based optimization (UBO) it
created more aggregations then queries. I had 2 distinct queries and it
created more than 100 aggregations when I stopped the UBO.
In AS2000 the UBO does not create more aggregations besides the ones in the
QueryLog.

Creating the two measure groups doesn't it implies loading the million of
rows twice??

Did you change anything in the AS2005 configuration? E.g.,
OLAP\ProcessPlan\DistinctBuffer.

I will try to create the two measure groups.

Thanks.

"Jéjé" wrote:

Quote:
the third measure?
the calculated measure is in a separate measuregroup???
its really important to create a dedicated measuregroup for your DCount
measure. (with only this measure in this group)
how the non empty behavior is setup for the calculated measure?

also, create starting aggregations. this size on the disk is required, no
aggregation, no performance.
there is no miracle.
I have a cube with 2 dcounts (1 against 80 millions of rows / partition;
second against 10 millions / partition; + standard sums and count measure)
the response time is 10 seconds on a 2 * dual core server and with a warm
cache, the same result takes only 1 or 2 seconds.
my formula divide the 2 dcount measures. (and I have other measures like sum
/ dcount)

"Tiago Rente" <TiagoRente (AT) discussions (DOT) microsoft.com> wrote in message
news:260C7F37-6F7B-44C8-8B59-6A4D8DD12D2B (AT) microsoft (DOT) com...
Hi,

We have two environments:
1. 32-bit HP server with 8 x 2.5 GHz Xeon - 12 GB RAM - running W2K3 SP1
w/ parameter /3Gb.
2. 64-bit HP server with 4 x 1.5 GHz Itanium 2 - 16 GB RAM - running W2K3
SP1.

Queries take normally more than 10 minutes do execute.

Currently we do not have set the aggregations since we are not able to do
the usage based optimization. I.e., Althought, we have queries in the
QueryLog table the aggreagations created in the usage based optimization
are
not related to just the queries done by the users!
In a previous version of the cube creating ~30% aggreagations, each
partition had a file with almost 2 GB in size. This is not practical since
we
process the cubes in a staging server and then copied to the end-users
server.

The 2 measures are in the same measure group since the 3rd measure uses
the
other 2.
I forgot to set the default measure as the MeasureB (sum). Would it
increase
the performance?

Even if I execute the some report repeatly the second/third time takes
more
or less the same time.
However, there were some reports that were execute is seconds, but this
was
a never refreshed query.

Most of the tests were done in the 32-bit server.

Thanks.

"Jéjé" wrote:

can you quickly descibe your hardware and the time taken to answer your
slow
queries?
what is the % of aggregations?
does the dcount and the Sum measures are into 2 separate measure group?
does the problem still here with a warm cache? (reecuting the same
queries
take the same time or the performance is better?)



"Tiago Rente" <TiagoRente (AT) discussions (DOT) microsoft.com> wrote in message
news:F2BEB525-2FCA-4F49-A135-4B17E503D97F (AT) microsoft (DOT) com...
Hi,

We are a performance issue with a cube in AS2005 with the following
characteristics:
. Measure:
o 1 DistinctCount - MeasureA.
o 1 Sum - MeasureB.
o 1 Calculated - MeasureC = MeasureB/MeasureA. Empty behavior is
activated
for both base measures.
. Data can be divided in 2 clusters:
o For data type X we load ~3 million rows per month (partition) to
AS2005.
o For data type Y we load ~25 million rows per month (partition) to
AS2005.
. Dimensions: 11 dimensions with 22 hierarchies.
. One dimension has a default value to restrict data to type Y.
. We have 24 partitions - 12 monthly partitions for each data type. All
partitions have defined slices for both month and data type.

The performance is very poor and we are not able to reduce the data
that
feeds AS2005. How can we improve the performance?
We tried creating aggregations but the partitions' directory increase
"exponentially".

Wasn't supposed for AS2005 to improve the performance for DistinctCount
measures?

Any help will be appreciated.

Thanks in advance.







Reply With Quote
  #6  
Old   
Jj
 
Posts: n/a

Default Re: AS2005 very poor performance w/ DistinctCount - 08-10-2006 , 11:06 AM



this load 2 times the rows, yes
but the resulting aggregations are correctly calculated.
so, normally, instead of having gb of aggregated data, you'll have Mb
if you keep the 2 measures into 1 measuregroup only, this produce a huge
amount of aggregated data, its why 2 measuregroup is recommanded.

you have to aggregate the 2 measuregroups
try to aggregate first without using the UBO, then use the UBO later.

the other option if you don't want to load 2 times the rows is to create a
dynamic dcount instead of a "physical" dcount
in this case you'll count the members of your dimension, but its slower then
the physical dcount.

does your MeasureC = MeasureB/MeasureA is a calculated measure? or a
"physical" measure?
(should be calculated one)


"Tiago Rente" <TiagoRente (AT) discussions (DOT) microsoft.com> wrote

Quote:
All measures are in the same measure group.

If I understood correctly you suggestion is:
* Create a measure group (MeasureGroupX) just for the DistinctCount
MeasureA.
* Create a measure group (MeasureGroupY) containing the other to measures
MeasureB + MeasureC (= MeasureB/MeasureA).
* Create aggregations.

Do I need to create aggregations on the MeasureGroupX?
If so how do I ensure that only the aggregations for which I a query from
the users is created?
I tried creating aggregations using the usage-based optimization (UBO) it
created more aggregations then queries. I had 2 distinct queries and it
created more than 100 aggregations when I stopped the UBO.
In AS2000 the UBO does not create more aggregations besides the ones in
the
QueryLog.

Creating the two measure groups doesn't it implies loading the million of
rows twice??

Did you change anything in the AS2005 configuration? E.g.,
OLAP\ProcessPlan\DistinctBuffer.

I will try to create the two measure groups.

Thanks.

"Jj" wrote:

the third measure?
the calculated measure is in a separate measuregroup???
its really important to create a dedicated measuregroup for your DCount
measure. (with only this measure in this group)
how the non empty behavior is setup for the calculated measure?

also, create starting aggregations. this size on the disk is required, no
aggregation, no performance.
there is no miracle.
I have a cube with 2 dcounts (1 against 80 millions of rows / partition;
second against 10 millions / partition; + standard sums and count
measure)
the response time is 10 seconds on a 2 * dual core server and with a warm
cache, the same result takes only 1 or 2 seconds.
my formula divide the 2 dcount measures. (and I have other measures like
sum
/ dcount)

"Tiago Rente" <TiagoRente (AT) discussions (DOT) microsoft.com> wrote in message
news:260C7F37-6F7B-44C8-8B59-6A4D8DD12D2B (AT) microsoft (DOT) com...
Hi,

We have two environments:
1. 32-bit HP server with 8 x 2.5 GHz Xeon - 12 GB RAM - running W2K3
SP1
w/ parameter /3Gb.
2. 64-bit HP server with 4 x 1.5 GHz Itanium 2 - 16 GB RAM - running
W2K3
SP1.

Queries take normally more than 10 minutes do execute.

Currently we do not have set the aggregations since we are not able to
do
the usage based optimization. I.e., Althought, we have queries in the
QueryLog table the aggreagations created in the usage based
optimization
are
not related to just the queries done by the users!
In a previous version of the cube creating ~30% aggreagations, each
partition had a file with almost 2 GB in size. This is not practical
since
we
process the cubes in a staging server and then copied to the end-users
server.

The 2 measures are in the same measure group since the 3rd measure uses
the
other 2.
I forgot to set the default measure as the MeasureB (sum). Would it
increase
the performance?

Even if I execute the some report repeatly the second/third time takes
more
or less the same time.
However, there were some reports that were execute is seconds, but this
was
a never refreshed query.

Most of the tests were done in the 32-bit server.

Thanks.

"Jj" wrote:

can you quickly descibe your hardware and the time taken to answer
your
slow
queries?
what is the % of aggregations?
does the dcount and the Sum measures are into 2 separate measure
group?
does the problem still here with a warm cache? (reecuting the same
queries
take the same time or the performance is better?)



"Tiago Rente" <TiagoRente (AT) discussions (DOT) microsoft.com> wrote in message
news:F2BEB525-2FCA-4F49-A135-4B17E503D97F (AT) microsoft (DOT) com...
Hi,

We are a performance issue with a cube in AS2005 with the following
characteristics:
. Measure:
o 1 DistinctCount - MeasureA.
o 1 Sum - MeasureB.
o 1 Calculated - MeasureC = MeasureB/MeasureA. Empty behavior is
activated
for both base measures.
. Data can be divided in 2 clusters:
o For data type X we load ~3 million rows per month (partition) to
AS2005.
o For data type Y we load ~25 million rows per month (partition) to
AS2005.
. Dimensions: 11 dimensions with 22 hierarchies.
. One dimension has a default value to restrict data to type Y.
. We have 24 partitions - 12 monthly partitions for each data type.
All
partitions have defined slices for both month and data type.

The performance is very poor and we are not able to reduce the data
that
feeds AS2005. How can we improve the performance?
We tried creating aggregations but the partitions' directory
increase
"exponentially".

Wasn't supposed for AS2005 to improve the performance for
DistinctCount
measures?

Any help will be appreciated.

Thanks in advance.









Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.