![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
We have a VB.net application that cycles through a cube to do a large number of calculations for each customer listed in the cube. The cube has 5 measures and 10 dimensions. In AS2K, we were able to get it to work fine, but the exact same code in AS 2005 is at least 10 times slower. The application opens a cellset with an MDX select that references no more than 3 dimensions and 4 measures at a time. The code loops through each customer in the customer dimension (there are generally about 60,000 in the dimension), does a series of multi-dimensional lookups in the cellset (i.e., an index is specified for each axis to get to a single value). The application degrades miserably over time, as memory is consumed. In 2000 we reopened the cellset every n rows and that seemed to fix the problem. Since this is not a typical OLAP application, and caching would not seem to help much since each access is different, any advice you could give on how to optimize this type of access would be great. Overall the application does about 100 multi-dimensional calculations for each of 3 million customers (we rebuild the cube with 60K customers each time through the loop). In AS2K it runs in about 6 hours, and we can't get the new version anywhere near that. |
#3
| |||
| |||
|
|
can I ask what do you do with the result of this scanning? In AS2005, the cache is managed by the server, in AS2000 the cache is in the client application (cache of calculations, the loaded data are caced in the server) so everytime you close a connection, you release the cache while AS2005 keep it in memory. you can release the cache on the server by calling the clearcache xml/a command. maybe you have to redesign your queries to take advantage of the new AS2005 MDX features: to increase the performance, try to use the subcube feature of AS2005 select .... from (Select ... from mycube) in some cases this greatly improve the performance use the exists / existsing / nonempty functions etc... also, you can take a look at the aggregations created in your partitions. and you can manually add some of them to improve the performance (again, take a look at the XML/A scripts) "PReese" <PReese (AT) discussions (DOT) microsoft.com> wrote in message news:A5365DAD-E427-415E-825C-9D981A146433 (AT) microsoft (DOT) com... We have a VB.net application that cycles through a cube to do a large number of calculations for each customer listed in the cube. The cube has 5 measures and 10 dimensions. In AS2K, we were able to get it to work fine, but the exact same code in AS 2005 is at least 10 times slower. The application opens a cellset with an MDX select that references no more than 3 dimensions and 4 measures at a time. The code loops through each customer in the customer dimension (there are generally about 60,000 in the dimension), does a series of multi-dimensional lookups in the cellset (i.e., an index is specified for each axis to get to a single value). The application degrades miserably over time, as memory is consumed. In 2000 we reopened the cellset every n rows and that seemed to fix the problem. Since this is not a typical OLAP application, and caching would not seem to help much since each access is different, any advice you could give on how to optimize this type of access would be great. Overall the application does about 100 multi-dimensional calculations for each of 3 million customers (we rebuild the cube with 60K customers each time through the loop). In AS2K it runs in about 6 hours, and we can't get the new version anywhere near that. |
#4
| |||
| |||
|
|
The results of our calculations are stored in a dataset and later fed by another application to a third party for crunching. In short, the application is fairly straightforward: First, we create a string to query the cube and return a cellset: strMDX = "select {[CustomerID].Members} on axis (0), {[RecencyYear].Members} on axis(1), " & _ " {[ProductCategory].Members} on axis(2), " & _ " {[Measures].[ Line Item Count], [Measures].[Extended Price]} on axis(3) " & _ " From [Model Scoring]" The CustomerID dimension has about 20,000 members in our test environment. RecencyYear has 9 members indicating how recently a customer purchased something in years. ProductCategory has about 20 members indicating high level product categorizations. Then, we open the cellset and go into a loop as follows: For (I = 0 to number of customers) 'Product category 2, line items within 0-12 months Dataset.Item(i).value1 = cellset.Item(i+1, 0, 2, 0).Value 'dollars, product category 6, recency 25-36 Dataset.Item(i).value2 = cellset.Item(i+1, 2, 6, 1).Value .and so on End For We just index directly into the multiple dimensions and measures in the cellset and save the values into our VB dataset. With AS2000, for example, one routine could process 20,000 customers in 2 minutes. The exact same code takes 20 minutes in AS2005. We have tried subcubes, cubes with query-based optimization aggregations, cubes with no aggregations, etc. with no performance impact. It seems that it ought to be simple. Although the cube itself has 9 dimensions, our cellsets don't have more than the example above. Both the application and the msmdsrv process seem to page fault a lot. If we close and re-open the cellset every 1000 customers, it helps, otherwise it takes even longer than 20 minutes. -------------------------------------------------------------------------------- From: Paul Reese Sent: Monday, July 24, 2006 8:17 AM To: Tom Lange Subject: FW: Microsoft Community Notification - MDX query performance can I ask what do you do with the result of this scanning? In AS2005, the cache is managed by the server, in AS2000 the cache is in the client application (cache of calculations, the loaded data are caced in the server) so everytime you close a connection, you release the cache while AS2005 keep it in memory. you can release the cache on the server by calling the clearcache xml/a command. maybe you have to redesign your queries to take advantage of the new AS2005 MDX features: to increase the performance, try to use the subcube feature of AS2005 select .... from (Select ... from mycube) in some cases this greatly improve the performance use the exists / existsing / nonempty functions etc... also, you can take a look at the aggregations created in your partitions. and you can manually add some of them to improve the performance (again, take a look at the XML/A scripts) "Jeje" wrote: can I ask what do you do with the result of this scanning? In AS2005, the cache is managed by the server, in AS2000 the cache is in the client application (cache of calculations, the loaded data are caced in the server) so everytime you close a connection, you release the cache while AS2005 keep it in memory. you can release the cache on the server by calling the clearcache xml/a command. maybe you have to redesign your queries to take advantage of the new AS2005 MDX features: to increase the performance, try to use the subcube feature of AS2005 select .... from (Select ... from mycube) in some cases this greatly improve the performance use the exists / existsing / nonempty functions etc... also, you can take a look at the aggregations created in your partitions. and you can manually add some of them to improve the performance (again, take a look at the XML/A scripts) "PReese" <PReese (AT) discussions (DOT) microsoft.com> wrote in message news:A5365DAD-E427-415E-825C-9D981A146433 (AT) microsoft (DOT) com... We have a VB.net application that cycles through a cube to do a large number of calculations for each customer listed in the cube. The cube has 5 measures and 10 dimensions. In AS2K, we were able to get it to work fine, but the exact same code in AS 2005 is at least 10 times slower. The application opens a cellset with an MDX select that references no more than 3 dimensions and 4 measures at a time. The code loops through each customer in the customer dimension (there are generally about 60,000 in the dimension), does a series of multi-dimensional lookups in the cellset (i.e., an index is specified for each axis to get to a single value). The application degrades miserably over time, as memory is consumed. In 2000 we reopened the cellset every n rows and that seemed to fix the problem. Since this is not a typical OLAP application, and caching would not seem to help much since each access is different, any advice you could give on how to optimize this type of access would be great. Overall the application does about 100 multi-dimensional calculations for each of 3 million customers (we rebuild the cube with 60K customers each time through the loop). In AS2K it runs in about 6 hours, and we can't get the new version anywhere near that. |
![]() |
| Thread Tools | |
| Display Modes | |
| |