dbTalk Databases Forums  

MDX query performance

microsoft.public.sqlserver.olap microsoft.public.sqlserver.olap


Discuss MDX query performance in the microsoft.public.sqlserver.olap forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
PReese
 
Posts: n/a

Default MDX query performance - 07-21-2006 , 03:10 PM






We have a VB.net application that cycles through a cube to do a large number
of calculations for each customer listed in the cube.

The cube has 5 measures and 10 dimensions. In AS2K, we were able to get it
to work fine, but the exact same code in AS 2005 is at least 10 times slower.

The application opens a cellset with an MDX select that references no more
than 3 dimensions and 4 measures at a time. The code loops through each
customer in the customer dimension (there are generally about 60,000 in the
dimension), does a series of multi-dimensional lookups in the cellset (i.e.,
an index is specified for each axis to get to a single value).

The application degrades miserably over time, as memory is consumed. In
2000 we reopened the cellset every n rows and that seemed to fix the problem.

Since this is not a typical OLAP application, and caching would not seem to
help much since each access is different, any advice you could give on how to
optimize this type of access would be great.

Overall the application does about 100 multi-dimensional calculations for
each of 3 million customers (we rebuild the cube with 60K customers each time
through the loop). In AS2K it runs in about 6 hours, and we can't get the
new version anywhere near that.

Reply With Quote
  #2  
Old   
Jeje
 
Posts: n/a

Default Re: MDX query performance - 07-22-2006 , 08:25 AM






can I ask what do you do with the result of this scanning?

In AS2005, the cache is managed by the server, in AS2000 the cache is in the
client application (cache of calculations, the loaded data are caced in the
server)
so everytime you close a connection, you release the cache while AS2005 keep
it in memory.
you can release the cache on the server by calling the clearcache xml/a
command.

maybe you have to redesign your queries to take advantage of the new AS2005
MDX features:
to increase the performance, try to use the subcube feature of AS2005
select .... from (Select ... from mycube)
in some cases this greatly improve the performance

use the exists / existsing / nonempty functions
etc...

also, you can take a look at the aggregations created in your partitions.
and you can manually add some of them to improve the performance (again,
take a look at the XML/A scripts)


"PReese" <PReese (AT) discussions (DOT) microsoft.com> wrote

Quote:
We have a VB.net application that cycles through a cube to do a large
number
of calculations for each customer listed in the cube.

The cube has 5 measures and 10 dimensions. In AS2K, we were able to get
it
to work fine, but the exact same code in AS 2005 is at least 10 times
slower.

The application opens a cellset with an MDX select that references no more
than 3 dimensions and 4 measures at a time. The code loops through each
customer in the customer dimension (there are generally about 60,000 in
the
dimension), does a series of multi-dimensional lookups in the cellset
(i.e.,
an index is specified for each axis to get to a single value).

The application degrades miserably over time, as memory is consumed. In
2000 we reopened the cellset every n rows and that seemed to fix the
problem.

Since this is not a typical OLAP application, and caching would not seem
to
help much since each access is different, any advice you could give on how
to
optimize this type of access would be great.

Overall the application does about 100 multi-dimensional calculations for
each of 3 million customers (we rebuild the cube with 60K customers each
time
through the loop). In AS2K it runs in about 6 hours, and we can't get the
new version anywhere near that.



Reply With Quote
  #3  
Old   
PReese
 
Posts: n/a

Default Re: MDX query performance - 07-24-2006 , 03:16 PM



The results of our calculations are stored in a dataset and later fed by
another application to a third party for crunching.



In short, the application is fairly straightforward:



First, we create a string to query the cube and return a cellset:



strMDX = "select {[CustomerID].Members} on axis (0),
{[RecencyYear].Members} on axis(1), " & _

" {[ProductCategory].Members} on axis(2), " & _

" {[Measures].[ Line Item Count], [Measures].[Extended Price]}
on axis(3) " & _

" From [Model Scoring]"



The CustomerID dimension has about 20,000 members in our test environment.

RecencyYear has 9 members indicating how recently a customer purchased
something in years.

ProductCategory has about 20 members indicating high level product
categorizations.



Then, we open the cellset and go into a loop as follows:



For (I = 0 to number of customers)



‘Product category 2, line items within 0-12 months

Dataset.Item(i).value1 = cellset.Item(i+1, 0, 2, 0).Value



‘dollars, product category 6, recency 25-36

Dataset.Item(i).value2 = cellset.Item(i+1, 2, 6, 1).Value



…and so on



End For



We just index directly into the multiple dimensions and measures in the
cellset and save the values into our VB dataset.



With AS2000, for example, one routine could process 20,000 customers in 2
minutes. The exact same code takes 20 minutes in AS2005.



We have tried subcubes, cubes with query-based optimization aggregations,
cubes with no aggregations, etc. with no performance impact.



It seems that it ought to be simple. Although the cube itself has 9
dimensions, our cellsets don’t have more than the example above.



Both the application and the msmdsrv process seem to page fault a lot. If
we close and re-open the cellset every 1000 customers, it helps, otherwise it
takes even longer than 20 minutes.






--------------------------------------------------------------------------------

From: Paul Reese
Sent: Monday, July 24, 2006 8:17 AM
To: Tom Lange
Subject: FW: Microsoft Community Notification - MDX query performance



can I ask what do you do with the result of this scanning?

In AS2005, the cache is managed by the server, in AS2000 the cache is in the
client application (cache of calculations, the loaded data are caced in the
server)
so everytime you close a connection, you release the cache while AS2005 keep
it in memory.
you can release the cache on the server by calling the clearcache xml/a
command.

maybe you have to redesign your queries to take advantage of the new AS2005
MDX features:
to increase the performance, try to use the subcube feature of AS2005
select .... from (Select ... from mycube)
in some cases this greatly improve the performance

use the exists / existsing / nonempty functions
etc...

also, you can take a look at the aggregations created in your partitions.
and you can manually add some of them to improve the performance (again,
take a look at the XML/A scripts)



"Jeje" wrote:

Quote:
can I ask what do you do with the result of this scanning?

In AS2005, the cache is managed by the server, in AS2000 the cache is in the
client application (cache of calculations, the loaded data are caced in the
server)
so everytime you close a connection, you release the cache while AS2005 keep
it in memory.
you can release the cache on the server by calling the clearcache xml/a
command.

maybe you have to redesign your queries to take advantage of the new AS2005
MDX features:
to increase the performance, try to use the subcube feature of AS2005
select .... from (Select ... from mycube)
in some cases this greatly improve the performance

use the exists / existsing / nonempty functions
etc...

also, you can take a look at the aggregations created in your partitions.
and you can manually add some of them to improve the performance (again,
take a look at the XML/A scripts)


"PReese" <PReese (AT) discussions (DOT) microsoft.com> wrote in message
news:A5365DAD-E427-415E-825C-9D981A146433 (AT) microsoft (DOT) com...
We have a VB.net application that cycles through a cube to do a large
number
of calculations for each customer listed in the cube.

The cube has 5 measures and 10 dimensions. In AS2K, we were able to get
it
to work fine, but the exact same code in AS 2005 is at least 10 times
slower.

The application opens a cellset with an MDX select that references no more
than 3 dimensions and 4 measures at a time. The code loops through each
customer in the customer dimension (there are generally about 60,000 in
the
dimension), does a series of multi-dimensional lookups in the cellset
(i.e.,
an index is specified for each axis to get to a single value).

The application degrades miserably over time, as memory is consumed. In
2000 we reopened the cellset every n rows and that seemed to fix the
problem.

Since this is not a typical OLAP application, and caching would not seem
to
help much since each access is different, any advice you could give on how
to
optimize this type of access would be great.

Overall the application does about 100 multi-dimensional calculations for
each of 3 million customers (we rebuild the cube with 60K customers each
time
through the loop). In AS2K it runs in about 6 hours, and we can't get the
new version anywhere near that.




Reply With Quote
  #4  
Old   
Jeje
 
Posts: n/a

Default Re: MDX query performance - 07-24-2006 , 06:26 PM



ok...

use a query like:

select non empty ([CustomerID].Members * [RecencyYear].Members *
[ProductCategory].Members * ....) on 0
from MyCube

this will result in a big cellset but the non empty keyword greatly improve
the performance.
AS2005 also do an "autoexists" so only existing members are returned and
again, this improve the performance. but using multiple axis, I'm not sure
how AS handle this.

your current query appear to return a lot of non used data and empty cells.
if your objective is to identify something like "dollars, product category
6, recency 25-36"
create a query to return only this instead of created a huge empty result
which return empty cells and cells out of this range of value.

try to use the ExecuteReader function to retrieve the result, this improve
the performance too.
you'll scan the result only 1 time in a fast forward read only method.
Filling a cellset is an overhead, you have to load the result in memory
before you can use it, the Read object load the row, then discard the memory
when the next row is read.

executing more small resulting queries could provide better result then
returning everything.
specially because you'll take advantage of the AS cache and you'll reduce
the traffic.

also, in your connectionstring, try to force the usage of the compression
and try without.
AS2005 compress the result, this can improve the response time other a
network, but locally this provide an overhead and consume more CPU.

and if your are confortable in multithreading, try to create a multithreaded
application to take advantage of all your CPUs.


"PReese" <PReese (AT) discussions (DOT) microsoft.com> wrote

Quote:
The results of our calculations are stored in a dataset and later fed by
another application to a third party for crunching.



In short, the application is fairly straightforward:



First, we create a string to query the cube and return a cellset:



strMDX = "select {[CustomerID].Members} on axis (0),
{[RecencyYear].Members} on axis(1), " & _

" {[ProductCategory].Members} on axis(2), " & _

" {[Measures].[ Line Item Count], [Measures].[Extended Price]}
on axis(3) " & _

" From [Model Scoring]"



The CustomerID dimension has about 20,000 members in our test environment.

RecencyYear has 9 members indicating how recently a customer purchased
something in years.

ProductCategory has about 20 members indicating high level product
categorizations.



Then, we open the cellset and go into a loop as follows:



For (I = 0 to number of customers)



'Product category 2, line items within 0-12 months

Dataset.Item(i).value1 = cellset.Item(i+1, 0, 2, 0).Value



'dollars, product category 6, recency 25-36

Dataset.Item(i).value2 = cellset.Item(i+1, 2, 6, 1).Value



.and so on



End For



We just index directly into the multiple dimensions and measures in the
cellset and save the values into our VB dataset.



With AS2000, for example, one routine could process 20,000 customers in 2
minutes. The exact same code takes 20 minutes in AS2005.



We have tried subcubes, cubes with query-based optimization aggregations,
cubes with no aggregations, etc. with no performance impact.



It seems that it ought to be simple. Although the cube itself has 9
dimensions, our cellsets don't have more than the example above.



Both the application and the msmdsrv process seem to page fault a lot. If
we close and re-open the cellset every 1000 customers, it helps, otherwise
it
takes even longer than 20 minutes.






--------------------------------------------------------------------------------

From: Paul Reese
Sent: Monday, July 24, 2006 8:17 AM
To: Tom Lange
Subject: FW: Microsoft Community Notification - MDX query performance



can I ask what do you do with the result of this scanning?

In AS2005, the cache is managed by the server, in AS2000 the cache is in
the
client application (cache of calculations, the loaded data are caced in
the
server)
so everytime you close a connection, you release the cache while AS2005
keep
it in memory.
you can release the cache on the server by calling the clearcache xml/a
command.

maybe you have to redesign your queries to take advantage of the new
AS2005
MDX features:
to increase the performance, try to use the subcube feature of AS2005
select .... from (Select ... from mycube)
in some cases this greatly improve the performance

use the exists / existsing / nonempty functions
etc...

also, you can take a look at the aggregations created in your partitions.
and you can manually add some of them to improve the performance (again,
take a look at the XML/A scripts)



"Jeje" wrote:

can I ask what do you do with the result of this scanning?

In AS2005, the cache is managed by the server, in AS2000 the cache is in
the
client application (cache of calculations, the loaded data are caced in
the
server)
so everytime you close a connection, you release the cache while AS2005
keep
it in memory.
you can release the cache on the server by calling the clearcache xml/a
command.

maybe you have to redesign your queries to take advantage of the new
AS2005
MDX features:
to increase the performance, try to use the subcube feature of AS2005
select .... from (Select ... from mycube)
in some cases this greatly improve the performance

use the exists / existsing / nonempty functions
etc...

also, you can take a look at the aggregations created in your partitions.
and you can manually add some of them to improve the performance (again,
take a look at the XML/A scripts)


"PReese" <PReese (AT) discussions (DOT) microsoft.com> wrote in message
news:A5365DAD-E427-415E-825C-9D981A146433 (AT) microsoft (DOT) com...
We have a VB.net application that cycles through a cube to do a large
number
of calculations for each customer listed in the cube.

The cube has 5 measures and 10 dimensions. In AS2K, we were able to
get
it
to work fine, but the exact same code in AS 2005 is at least 10 times
slower.

The application opens a cellset with an MDX select that references no
more
than 3 dimensions and 4 measures at a time. The code loops through
each
customer in the customer dimension (there are generally about 60,000 in
the
dimension), does a series of multi-dimensional lookups in the cellset
(i.e.,
an index is specified for each axis to get to a single value).

The application degrades miserably over time, as memory is consumed.
In
2000 we reopened the cellset every n rows and that seemed to fix the
problem.

Since this is not a typical OLAP application, and caching would not
seem
to
help much since each access is different, any advice you could give on
how
to
optimize this type of access would be great.

Overall the application does about 100 multi-dimensional calculations
for
each of 3 million customers (we rebuild the cube with 60K customers
each
time
through the loop). In AS2K it runs in about 6 hours, and we can't get
the
new version anywhere near that.






Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.