![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
[ This article is also available in a better readable format here: http://users.pandora.be/see/mdx/news.htm ] Given a cube with the following properties: - a measure Amount - a time dimension DimTime with a granularity of months (only Years and Months, no lower levels) - a client dimension DimClient - a product dimension DimProduct (consisting of a level Product and a level SubProduct) - a few other dimensions Consider the following requirement: A report must be made that shows the total Amount in a given period compared to the total Amount in the same period one year ago for each client for each product. The report has the following layout: (hopefully you read this in a non-proportional font, if not consider the link mentioned above) Product A SubProduct A Product B .. SumPeriod SumPeriod Y-1 SumPeriod SumPeriod Y-1 SumPeriod SumPeriod Y-1 .. .. All DimClient x y x y x y x y Client 1 x y x y x y x y Client 2 x y x y x y x y .. x y x y x y x y In pseudo-MDX: each x above = Sum(DimTime.startmonth imTime.endmonth,Measures.[Amount]). The start and end dates of the period are parameters: they are entered by the user and injected into the MDX. This makes it impossible to aggregate the sums beforehand. The report should only include those clients that have at least one (sub)product. Also only those (sub)products that are associated with at least one client should be shown. Because of the nature of this report a lot of data will be presented to the user. The dimension DimClient contains 12000 elements, DimProducts contains 60 elements. Depending on the chosen period and possibly other slicers (among other dimensions) the report may return up to 12000 rows and 120 columns. This looks like a pretty standard report to me, yet I could not find an MDX-pattern for this type of problem on the net. Neither did I find a hint in Spofford's book. Here are the 2 approaches I devised myself. MDX 1 - Selection based on Filter() ----------------------------------- with set [SelectedMonths] as '[DimTime].&[2004_01]:[DimTime].&[2004_08]' set [PrevSelectedMonths] as 'ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_01]):ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_08])' member [Measures].[SumPeriod] as 'sum([SelectedMonths], [Measures].[Amount])' member [Measures].[SumPeriod Y-1] as 'sum([PrevSelectedMonths], [Measures].[Amount])' member [Measures].[AllClientPeriodSums] as '([DimClient].[All DimClient],[Measures].[SumPeriod])+([DimClient].[All DimClient],[Measures].[SumPeriod Y-1])' select { Crossjoin( Hierarchize( Filter( [DimProduct].[SubProduct].members + [DimProduct].[Product].members ,not(isempty([Measures].[AllClientPeriodSums])))) ,{[Measures].[SumPeriod],[Measures].[SumPeriod Y-1]}) } on columns, { Filter([DimClient].members,([Measures].[SumPeriod]<>0 or [Measures].[SumPeriod Y-1]<>0)) } on rows from TheCube where ([DimTime].[2004],[DimSomeOtherDimension].[SomeMember]) The above in words: - Filter all Products and SubProducts for which the "All" level in DimClient has a non-empty value for the sums over the sets SelectedMonths or PrevSelectedMonths. If that value is non-empty, at least 1 client has a value for that (sub)product, so that (sub)product must be retained. - Filter all Clients based on the test if the sums are <> 0. (this should be: not(isempty(SumPeriod) or isempty(SumPeriod Y-1)), but due to extra requirements of our users it is implemented like this). MDX 2 - Selection based on NonEmptyCrossjoin() ---------------------------------------------- with set [SelectedMonths] as '[DimTime].&[2004_01]:[DimTime].&[2004_08]' set [PrevSelectedMonths] as 'ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_01]):ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_08])' member [Measures].[SumPeriod] as 'sum([SelectedMonths], [Measures].[Amount])' member [Measures].[SumPeriod Y-1] as 'sum([PrevSelectedMonths], [Measures].[Amount])' set [y-axis] as 'Hierarchize( Extract( NonEmptyCrossjoin([SelectedMonths]+[PrevSelectedMonths],[DimClient].Members,{[Measures].[Amount]},2) ,[DimClient]))' set [x-axis-products] as 'Hierarchize( Generate( Extract( NonEmptyCrossjoin([SelectedMonths]+[PrevSelectedMonths],[DimProduct].[SubProduct].members,{[Measures].[Amount]},2) ,[DimProduct]) ,{[DimProduct].CurrentMember,[DimProduct].CurrentMember.Parent}))' select { Crossjoin([x-axis-products], {[Measures].[SumPeriod], [Measures].[SumPeriod Y-1]}) } on columns, { [y-axis] } on rows from TheCube where ([DimSomeOtherDimension].[SomeMember]) The above in words: - Find non-empty crossings between the period sets and clients based on the Amount measure. From this set, extract the Clients and hierarchize. These are the clients that go on the Y-axis of the report. - Find non-empty crossings between the period sets and subproducts based on the Amount measure. Extract the Product members from that set. Add the parents, i.e. the Products and hierarchize. These are the products that will go on the X-axis of the report. Questions --------- 1) Both solutions are rather slow. Is there an MDX 3 I should implement instead ? I there a better approach ? I am rather new to MDX and I find it dazzling at times... 2) For some slicer elements (the [DimSomeOtherDimension].[SomeMember] above) the report is up to 60% slower than without any slicer element at all! Without the where-clause in the MDX the report executes faster than for MDX with a where-clause that limits the data to some 70%. What is happening ?? 3) Is there a way to incorporate via cube-design a better/faster solution ? Keep in mind that both start and end months are variables. 4) [May be unrelated to this particular report] I noticed that even on a medium sized desktop pc that runs Analysis Services locally the performance is way better than in a client/server setup with a high-end desktop and a high-end server... This has me baffled. The network usage is low and never a bottleneck. The server is dedicated and there are no other users executing reports. We use Excel VBA with ADO MD to get to our data. I did follow all steps in the Microsoft whitepaper on Analysis Services performance. Many, many thanks for reading up to this point! Many more for all helpful information and remarks. Tom |
#3
| |||
| |||
|
|
[ This article is also available in a better readable format here: http://users.pandora.be/see/mdx/news.htm ] Given a cube with the following properties: - a measure Amount - a time dimension DimTime with a granularity of months (only Years and Months, no lower levels) - a client dimension DimClient - a product dimension DimProduct (consisting of a level Product and a level SubProduct) - a few other dimensions Consider the following requirement: A report must be made that shows the total Amount in a given period compared to the total Amount in the same period one year ago for each client for each product. The report has the following layout: (hopefully you read this in a non-proportional font, if not consider the link mentioned above) Product A SubProduct A Product B .. SumPeriod SumPeriod Y-1 SumPeriod SumPeriod Y-1 SumPeriod SumPeriod Y-1 .. .. All DimClient x y x y x y x y Client 1 x y x y x y x y Client 2 x y x y x y x y .. x y x y x y x y In pseudo-MDX: each x above = Sum(DimTime.startmonth imTime.endmonth,Measures.[Amount]). The start and end dates of the period are parameters: they are entered by the user and injected into the MDX. This makes it impossible to aggregate the sums beforehand. The report should only include those clients that have at least one (sub)product. Also only those (sub)products that are associated with at least one client should be shown. Because of the nature of this report a lot of data will be presented to the user. The dimension DimClient contains 12000 elements, DimProducts contains 60 elements. Depending on the chosen period and possibly other slicers (among other dimensions) the report may return up to 12000 rows and 120 columns. This looks like a pretty standard report to me, yet I could not find an MDX-pattern for this type of problem on the net. Neither did I find a hint in Spofford's book. Here are the 2 approaches I devised myself. MDX 1 - Selection based on Filter() ----------------------------------- with set [SelectedMonths] as '[DimTime].&[2004_01]:[DimTime].&[2004_08]' set [PrevSelectedMonths] as 'ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_01]):ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_08])' member [Measures].[SumPeriod] as 'sum([SelectedMonths], [Measures].[Amount])' member [Measures].[SumPeriod Y-1] as 'sum([PrevSelectedMonths], [Measures].[Amount])' member [Measures].[AllClientPeriodSums] as '([DimClient].[All DimClient],[Measures].[SumPeriod])+([DimClient].[All DimClient],[Measures].[SumPeriod Y-1])' select { Crossjoin( Hierarchize( Filter( [DimProduct].[SubProduct].members + [DimProduct].[Product].members ,not(isempty([Measures].[AllClientPeriodSums])))) ,{[Measures].[SumPeriod],[Measures].[SumPeriod Y-1]}) } on columns, { Filter([DimClient].members,([Measures].[SumPeriod]<>0 or [Measures].[SumPeriod Y-1]<>0)) } on rows from TheCube where ([DimTime].[2004],[DimSomeOtherDimension].[SomeMember]) The above in words: - Filter all Products and SubProducts for which the "All" level in DimClient has a non-empty value for the sums over the sets SelectedMonths or PrevSelectedMonths. If that value is non-empty, at least 1 client has a value for that (sub)product, so that (sub)product must be retained. - Filter all Clients based on the test if the sums are <> 0. (this should be: not(isempty(SumPeriod) or isempty(SumPeriod Y-1)), but due to extra requirements of our users it is implemented like this). MDX 2 - Selection based on NonEmptyCrossjoin() ---------------------------------------------- with set [SelectedMonths] as '[DimTime].&[2004_01]:[DimTime].&[2004_08]' set [PrevSelectedMonths] as 'ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_01]):ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_08])' member [Measures].[SumPeriod] as 'sum([SelectedMonths], [Measures].[Amount])' member [Measures].[SumPeriod Y-1] as 'sum([PrevSelectedMonths], [Measures].[Amount])' set [y-axis] as 'Hierarchize( Extract( NonEmptyCrossjoin([SelectedMonths]+[PrevSelectedMonths],[DimClient].Members,{[Measures].[Amount]},2) ,[DimClient]))' set [x-axis-products] as 'Hierarchize( Generate( Extract( NonEmptyCrossjoin([SelectedMonths]+[PrevSelectedMonths],[DimProduct].[SubProduct].members,{[Measures].[Amount]},2) ,[DimProduct]) ,{[DimProduct].CurrentMember,[DimProduct].CurrentMember.Parent}))' select { Crossjoin([x-axis-products], {[Measures].[SumPeriod], [Measures].[SumPeriod Y-1]}) } on columns, { [y-axis] } on rows from TheCube where ([DimSomeOtherDimension].[SomeMember]) The above in words: - Find non-empty crossings between the period sets and clients based on the Amount measure. From this set, extract the Clients and hierarchize. These are the clients that go on the Y-axis of the report. - Find non-empty crossings between the period sets and subproducts based on the Amount measure. Extract the Product members from that set. Add the parents, i.e. the Products and hierarchize. These are the products that will go on the X-axis of the report. Questions --------- 1) Both solutions are rather slow. Is there an MDX 3 I should implement instead ? I there a better approach ? I am rather new to MDX and I find it dazzling at times... 2) For some slicer elements (the [DimSomeOtherDimension].[SomeMember] above) the report is up to 60% slower than without any slicer element at all! Without the where-clause in the MDX the report executes faster than for MDX with a where-clause that limits the data to some 70%. What is happening ?? 3) Is there a way to incorporate via cube-design a better/faster solution ? Keep in mind that both start and end months are variables. 4) [May be unrelated to this particular report] I noticed that even on a medium sized desktop pc that runs Analysis Services locally the performance is way better than in a client/server setup with a high-end desktop and a high-end server... This has me baffled. The network usage is low and never a bottleneck. The server is dedicated and there are no other users executing reports. We use Excel VBA with ADO MD to get to our data. I did follow all steps in the Microsoft whitepaper on Analysis Services performance. Many, many thanks for reading up to this point! Many more for all helpful information and remarks. Tom |
#4
| |||
| |||
|
|
Hi Tom, First of all, thanks for the detailed explanation. It makes it much easier to suggest solutions... On the modelling side, you might be interested to read the following posting on how I generally approach this kind of problem: http://groups.google.co.uk/groups?hl...40TK2MSFTNGP12 With a little bit of work you would be able to adapt this to your own scenario, which might allow you to run these kind of queries from any front-end; was this requirement the reason why you built your own front-end? Also, does your customer dimension have any levels or is it just a flat list? If the latter, I would strongly recommend putting in two or three levels to group the customers even if you don't have a business need for them and end up making them invisible. It should also help. Regarding MDX, the most efficient queries are always going to be the ones using NONEMPTYCROSSJOIN rather than FILTER. Looking at your second query, you might find that performance improves if you get rid of the unnecessary EXTRACTs and put the contents of your WHERE clause into the NECJ function too. Here's what it might look like (nb might be syntax errors!): with set [SelectedMonths] as '[DimTime].&[2004_01]:[DimTime].&[2004_08]' set [PrevSelectedMonths] as 'ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_01]):ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_08])' member [Measures].[SumPeriod] as 'sum([SelectedMonths], [Measures].[Amount])' member [Measures].[SumPeriod Y-1] as 'sum([PrevSelectedMonths], [Measures].[Amount])' set [y-axis] as 'Hierarchize( NonEmptyCrossjoin([DimClient].Members,{[SelectedMonths]+[PrevSelectedMonths]},{[Measures].[Amount]},{[DimSomeOtherDimension].[SomeMember]},1))' set [x-axis-products] as 'Hierarchize( Generate( NonEmptyCrossjoin([DimProduct].[SubProduct].members, {[SelectedMonths]+[PrevSelectedMonths]},{[Measures].[Amount]},{[DimSomeOtherDimension].[SomeMember]},1) ,{[DimProduct].CurrentMember,[DimProduct].CurrentMember.Parent})' select { Crossjoin([x-axis-products], {[Measures].[SumPeriod], [Measures].[SumPeriod Y-1]}) } on columns, { [y-axis] } on rows from TheCube where ([DimSomeOtherDimension].[SomeMember]) However, I'm not sure that any of the above will make a real difference to performance, which is your biggest issue. If you've read the AS Performance Guide you've probably already got a good partitioning strategy (by month I assume) and run the Usage-Based Optimisation wizard. However, I think the main reason you've got poor performance is that your queries run at the Client level, which has 12000 members and is unlikely to have any aggregations associated with it because of the 1/3 rule. In this case it might be worth building your own aggregations at the Client level using the Partition Manager tool available in the SQL 2000 Resource Kit, and seeing if that improves matters (although it would increase the size of the cube and processing times). For hints on how to know which aggregations to build when you're doing it manually, have a look at this thread: http://groups.google.co.uk/groups?hl...com%26rnum%3D1 ...and the following, which contains a better explanation of how to understand the query log: http://www.mosha.com/msolap/articles/querylog.htm Because you have control over your front-end, it might also be worth playing around with some connection string properties. The infamous Execution Location/Default Isolation Mode pairing is always worth a try, as is trying different values for Large Level Threshold. Also, in your case, try values of 0 to 7 for the Cache Policy property, and try values of 0.01 for CacheRatio and CacheRatio2. Finally, my last observation is that any report where you expect to get up to 12000 rows and 120 columns is always going to perform slowly, and is more data than anyone can take in. It needs to be broken down into more digestible chunks --> smaller queries! HTH, Chris "Tom VdP" wrote: [ This article is also available in a better readable format here: http://users.pandora.be/see/mdx/news.htm ] Given a cube with the following properties: - a measure Amount - a time dimension DimTime with a granularity of months (only Years and Months, no lower levels) - a client dimension DimClient - a product dimension DimProduct (consisting of a level Product and a level SubProduct) - a few other dimensions Consider the following requirement: A report must be made that shows the total Amount in a given period compared to the total Amount in the same period one year ago for each client for each product. The report has the following layout: (hopefully you read this in a non-proportional font, if not consider the link mentioned above) Product A SubProduct A Product B .. SumPeriod SumPeriod Y-1 SumPeriod SumPeriod Y-1 SumPeriod SumPeriod Y-1 .. .. All DimClient x y x y x y x y Client 1 x y x y x y x y Client 2 x y x y x y x y .. x y x y x y x y In pseudo-MDX: each x above = Sum(DimTime.startmonth imTime.endmonth,Measures.[Amount]). The start and end dates of the period are parameters: they are entered by the user and injected into the MDX. This makes it impossible to aggregate the sums beforehand. The report should only include those clients that have at least one (sub)product. Also only those (sub)products that are associated with at least one client should be shown. Because of the nature of this report a lot of data will be presented to the user. The dimension DimClient contains 12000 elements, DimProducts contains 60 elements. Depending on the chosen period and possibly other slicers (among other dimensions) the report may return up to 12000 rows and 120 columns. This looks like a pretty standard report to me, yet I could not find an MDX-pattern for this type of problem on the net. Neither did I find a hint in Spofford's book. Here are the 2 approaches I devised myself. MDX 1 - Selection based on Filter() ----------------------------------- with set [SelectedMonths] as '[DimTime].&[2004_01]:[DimTime].&[2004_08]' set [PrevSelectedMonths] as 'ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_01]):ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_08])' member [Measures].[SumPeriod] as 'sum([SelectedMonths], [Measures].[Amount])' member [Measures].[SumPeriod Y-1] as 'sum([PrevSelectedMonths], [Measures].[Amount])' member [Measures].[AllClientPeriodSums] as '([DimClient].[All DimClient],[Measures].[SumPeriod])+([DimClient].[All DimClient],[Measures].[SumPeriod Y-1])' select { Crossjoin( Hierarchize( Filter( [DimProduct].[SubProduct].members + [DimProduct].[Product].members ,not(isempty([Measures].[AllClientPeriodSums])))) ,{[Measures].[SumPeriod],[Measures].[SumPeriod Y-1]}) } on columns, { Filter([DimClient].members,([Measures].[SumPeriod]<>0 or [Measures].[SumPeriod Y-1]<>0)) } on rows from TheCube where ([DimTime].[2004],[DimSomeOtherDimension].[SomeMember]) The above in words: - Filter all Products and SubProducts for which the "All" level in DimClient has a non-empty value for the sums over the sets SelectedMonths or PrevSelectedMonths. If that value is non-empty, at least 1 client has a value for that (sub)product, so that (sub)product must be retained. - Filter all Clients based on the test if the sums are <> 0. (this should be: not(isempty(SumPeriod) or isempty(SumPeriod Y-1)), but due to extra requirements of our users it is implemented like this). MDX 2 - Selection based on NonEmptyCrossjoin() ---------------------------------------------- with set [SelectedMonths] as '[DimTime].&[2004_01]:[DimTime].&[2004_08]' set [PrevSelectedMonths] as 'ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_01]):ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_08])' member [Measures].[SumPeriod] as 'sum([SelectedMonths], [Measures].[Amount])' member [Measures].[SumPeriod Y-1] as 'sum([PrevSelectedMonths], [Measures].[Amount])' set [y-axis] as 'Hierarchize( Extract( NonEmptyCrossjoin([SelectedMonths]+[PrevSelectedMonths],[DimClient].Members,{[Measures].[Amount]},2) ,[DimClient]))' set [x-axis-products] as 'Hierarchize( Generate( Extract( NonEmptyCrossjoin([SelectedMonths]+[PrevSelectedMonths],[DimProduct].[SubProduct].members,{[Measures].[Amount]},2) ,[DimProduct]) ,{[DimProduct].CurrentMember,[DimProduct].CurrentMember.Parent}))' select { Crossjoin([x-axis-products], {[Measures].[SumPeriod], [Measures].[SumPeriod Y-1]}) } on columns, { [y-axis] } on rows from TheCube where ([DimSomeOtherDimension].[SomeMember]) The above in words: - Find non-empty crossings between the period sets and clients based on the Amount measure. From this set, extract the Clients and hierarchize. These are the clients that go on the Y-axis of the report. - Find non-empty crossings between the period sets and subproducts based on the Amount measure. Extract the Product members from that set. Add the parents, i.e. the Products and hierarchize. These are the products that will go on the X-axis of the report. Questions --------- 1) Both solutions are rather slow. Is there an MDX 3 I should implement instead ? I there a better approach ? I am rather new to MDX and I find it dazzling at times... 2) For some slicer elements (the [DimSomeOtherDimension].[SomeMember] above) the report is up to 60% slower than without any slicer element at all! Without the where-clause in the MDX the report executes faster than for MDX with a where-clause that limits the data to some 70%. What is happening ?? 3) Is there a way to incorporate via cube-design a better/faster solution ? Keep in mind that both start and end months are variables. 4) [May be unrelated to this particular report] I noticed that even on a medium sized desktop pc that runs Analysis Services locally the performance is way better than in a client/server setup with a high-end desktop and a high-end server... This has me baffled. The network usage is low and never a bottleneck. The server is dedicated and there are no other users executing reports. We use Excel VBA with ADO MD to get to our data. I did follow all steps in the Microsoft whitepaper on Analysis Services performance. Many, many thanks for reading up to this point! Many more for all helpful information and remarks. Tom |
#5
| |||
| |||
|
|
Oh, and one other important point - make sure you have SP3 PTS installed on your client, it can sometimes make a big difference with performance. "Chris Webb" wrote: Hi Tom, First of all, thanks for the detailed explanation. It makes it much easier to suggest solutions... On the modelling side, you might be interested to read the following posting on how I generally approach this kind of problem: http://groups.google.co.uk/groups?hl...40TK2MSFTNGP12 With a little bit of work you would be able to adapt this to your own scenario, which might allow you to run these kind of queries from any front-end; was this requirement the reason why you built your own front-end? Also, does your customer dimension have any levels or is it just a flat list? If the latter, I would strongly recommend putting in two or three levels to group the customers even if you don't have a business need for them and end up making them invisible. It should also help. Regarding MDX, the most efficient queries are always going to be the ones using NONEMPTYCROSSJOIN rather than FILTER. Looking at your second query, you might find that performance improves if you get rid of the unnecessary EXTRACTs and put the contents of your WHERE clause into the NECJ function too. Here's what it might look like (nb might be syntax errors!): with set [SelectedMonths] as '[DimTime].&[2004_01]:[DimTime].&[2004_08]' set [PrevSelectedMonths] as 'ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_01]):ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_08])' member [Measures].[SumPeriod] as 'sum([SelectedMonths], [Measures].[Amount])' member [Measures].[SumPeriod Y-1] as 'sum([PrevSelectedMonths], [Measures].[Amount])' set [y-axis] as 'Hierarchize( NonEmptyCrossjoin([DimClient].Members,{[SelectedMonths]+[PrevSelectedMonths]},{[Measures].[Amount]},{[DimSomeOtherDimension].[SomeMember]},1))' set [x-axis-products] as 'Hierarchize( Generate( NonEmptyCrossjoin([DimProduct].[SubProduct].members, {[SelectedMonths]+[PrevSelectedMonths]},{[Measures].[Amount]},{[DimSomeOtherDimension].[SomeMember]},1) ,{[DimProduct].CurrentMember,[DimProduct].CurrentMember.Parent})' select { Crossjoin([x-axis-products], {[Measures].[SumPeriod], [Measures].[SumPeriod Y-1]}) } on columns, { [y-axis] } on rows from TheCube where ([DimSomeOtherDimension].[SomeMember]) However, I'm not sure that any of the above will make a real difference to performance, which is your biggest issue. If you've read the AS Performance Guide you've probably already got a good partitioning strategy (by month I assume) and run the Usage-Based Optimisation wizard. However, I think the main reason you've got poor performance is that your queries run at the Client level, which has 12000 members and is unlikely to have any aggregations associated with it because of the 1/3 rule. In this case it might be worth building your own aggregations at the Client level using the Partition Manager tool available in the SQL 2000 Resource Kit, and seeing if that improves matters (although it would increase the size of the cube and processing times). For hints on how to know which aggregations to build when you're doing it manually, have a look at this thread: http://groups.google.co.uk/groups?hl...com%26rnum%3D1 ...and the following, which contains a better explanation of how to understand the query log: http://www.mosha.com/msolap/articles/querylog.htm Because you have control over your front-end, it might also be worth playing around with some connection string properties. The infamous Execution Location/Default Isolation Mode pairing is always worth a try, as is trying different values for Large Level Threshold. Also, in your case, try values of 0 to 7 for the Cache Policy property, and try values of 0.01 for CacheRatio and CacheRatio2. Finally, my last observation is that any report where you expect to get up to 12000 rows and 120 columns is always going to perform slowly, and is more data than anyone can take in. It needs to be broken down into more digestible chunks --> smaller queries! HTH, Chris "Tom VdP" wrote: [ This article is also available in a better readable format here: http://users.pandora.be/see/mdx/news.htm ] Given a cube with the following properties: - a measure Amount - a time dimension DimTime with a granularity of months (only Years and Months, no lower levels) - a client dimension DimClient - a product dimension DimProduct (consisting of a level Product and a level SubProduct) - a few other dimensions Consider the following requirement: A report must be made that shows the total Amount in a given period compared to the total Amount in the same period one year ago for each client for each product. The report has the following layout: (hopefully you read this in a non-proportional font, if not consider the link mentioned above) Product A SubProduct A Product B .. SumPeriod SumPeriod Y-1 SumPeriod SumPeriod Y-1 SumPeriod SumPeriod Y-1 .. .. All DimClient x y x y x y x y Client 1 x y x y x y x y Client 2 x y x y x y x y .. x y x y x y x y In pseudo-MDX: each x above = Sum(DimTime.startmonth imTime.endmonth,Measures.[Amount]). The start and end dates of the period are parameters: they are entered by the user and injected into the MDX. This makes it impossible to aggregate the sums beforehand. The report should only include those clients that have at least one (sub)product. Also only those (sub)products that are associated with at least one client should be shown. Because of the nature of this report a lot of data will be presented to the user. The dimension DimClient contains 12000 elements, DimProducts contains 60 elements. Depending on the chosen period and possibly other slicers (among other dimensions) the report may return up to 12000 rows and 120 columns. This looks like a pretty standard report to me, yet I could not find an MDX-pattern for this type of problem on the net. Neither did I find a hint in Spofford's book. Here are the 2 approaches I devised myself. MDX 1 - Selection based on Filter() ----------------------------------- with set [SelectedMonths] as '[DimTime].&[2004_01]:[DimTime].&[2004_08]' set [PrevSelectedMonths] as 'ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_01]):ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_08])' member [Measures].[SumPeriod] as 'sum([SelectedMonths], [Measures].[Amount])' member [Measures].[SumPeriod Y-1] as 'sum([PrevSelectedMonths], [Measures].[Amount])' member [Measures].[AllClientPeriodSums] as '([DimClient].[All DimClient],[Measures].[SumPeriod])+([DimClient].[All DimClient],[Measures].[SumPeriod Y-1])' select { Crossjoin( Hierarchize( Filter( [DimProduct].[SubProduct].members + [DimProduct].[Product].members ,not(isempty([Measures].[AllClientPeriodSums])))) ,{[Measures].[SumPeriod],[Measures].[SumPeriod Y-1]}) } on columns, { Filter([DimClient].members,([Measures].[SumPeriod]<>0 or [Measures].[SumPeriod Y-1]<>0)) } on rows from TheCube where ([DimTime].[2004],[DimSomeOtherDimension].[SomeMember]) The above in words: - Filter all Products and SubProducts for which the "All" level in DimClient has a non-empty value for the sums over the sets SelectedMonths or PrevSelectedMonths. If that value is non-empty, at least 1 client has a value for that (sub)product, so that (sub)product must be retained. - Filter all Clients based on the test if the sums are <> 0. (this should be: not(isempty(SumPeriod) or isempty(SumPeriod Y-1)), but due to extra requirements of our users it is implemented like this). MDX 2 - Selection based on NonEmptyCrossjoin() ---------------------------------------------- with set [SelectedMonths] as '[DimTime].&[2004_01]:[DimTime].&[2004_08]' set [PrevSelectedMonths] as 'ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_01]):ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_08])' member [Measures].[SumPeriod] as 'sum([SelectedMonths], [Measures].[Amount])' member [Measures].[SumPeriod Y-1] as 'sum([PrevSelectedMonths], [Measures].[Amount])' set [y-axis] as 'Hierarchize( Extract( NonEmptyCrossjoin([SelectedMonths]+[PrevSelectedMonths],[DimClient].Members,{[Measures].[Amount]},2) ,[DimClient]))' set [x-axis-products] as 'Hierarchize( Generate( Extract( NonEmptyCrossjoin([SelectedMonths]+[PrevSelectedMonths],[DimProduct].[SubProduct].members,{[Measures].[Amount]},2) ,[DimProduct]) ,{[DimProduct].CurrentMember,[DimProduct].CurrentMember.Parent}))' select { Crossjoin([x-axis-products], {[Measures].[SumPeriod], [Measures].[SumPeriod Y-1]}) } on columns, { [y-axis] } on rows from TheCube where ([DimSomeOtherDimension].[SomeMember]) The above in words: - Find non-empty crossings between the period sets and clients based on the Amount measure. From this set, extract the Clients and hierarchize. These are the clients that go on the Y-axis of the report. - Find non-empty crossings between the period sets and subproducts based on the Amount measure. Extract the Product members from that set. Add the parents, i.e. the Products and hierarchize. These are the products that will go on the X-axis of the report. Questions --------- 1) Both solutions are rather slow. Is there an MDX 3 I should implement instead ? I there a better approach ? I am rather new to MDX and I find it dazzling at times... 2) For some slicer elements (the [DimSomeOtherDimension].[SomeMember] above) the report is up to 60% slower than without any slicer element at all! Without the where-clause in the MDX the report executes faster than for MDX with a where-clause that limits the data to some 70%. What is happening ?? 3) Is there a way to incorporate via cube-design a better/faster solution ? Keep in mind that both start and end months are variables. 4) [May be unrelated to this particular report] I noticed that even on a medium sized desktop pc that runs Analysis Services locally the performance is way better than in a client/server setup with a high-end desktop and a high-end server... This has me baffled. The network usage is low and never a bottleneck. The server is dedicated and there are no other users executing reports. We use Excel VBA with ADO MD to get to our data. I did follow all steps in the Microsoft whitepaper on Analysis Services performance. Many, many thanks for reading up to this point! Many more for all helpful information and remarks. Tom |
#6
| |||
| |||
|
|
Hi Chris, Jamie, rest of the world, Sorry for not responding earlier. Here is a roundup of the situation. First of all, a bit of info about the front-end. The front-end is written in Excel VBA because the customer wants quite a bit of specific behaviour and layout. Rows and columns can be collapsed/expanded, context menus appear on right-click that use the specific cell-location info (e.g. filtering, linking to another report, sort options), popup windows (mainly for entering report parameters), ... The front-end creates the necessary MDX dynamically. This development costed about 2 manmonths of work. Cubes: There are 5 cubes, 2 virtual cubes, largest cube has 7 dimensions, 5 measures, 2 calculated measures. Dimension security in about 30 groups. The amount of data is very little: 2 million fact rows. The cubes fit in 10Mb files. Necessity for optimisation: Some reports return a lot of data: 12.000 rows x 120 columns. Loading this in Excel is slow, but on top of that getting the data out of the cube also takes a quite some time: up to 2 minutes. The returned cells are sums over variable periods, so these must be calculated ad hoc. Loading the data and formatting takes another minute or so. About the optimisation proces: First of all, I did read and tried to apply the Performance Guide. So the tweaking with connection string options and so was done. Also the usage based optimisation and other settings server side were applied. Latest service packs applied everywhere. Extra things I did because of the hints Chris gave: 1) MDX: use the NECJ version instead of the Filter() version 2) MDX : work away the Extract() Indeed, I think I got confused by the examples in BOL and the fact that NECJ must not necessarily return a typical crossjoined set. It can be a set consisting of members of 1 dimension only. --> result : no measurable differences 3) Performance monitor: measure DSN Used & DSN Requested. VERY interesting! I noticed quite some room for uptimisation here! So I added aggregations manually via the Partition Manager tool (for those looking for it: it is available in the SQLServer2000 Resource Kit). All queries that were logged had the same level DSN Used & DSN Requested. --> result: disappointing! again no measurable difference. (Sidenote question: how do you log ALL DSN Used & DSN Requested ? The perfmon only polls once/second. And the querylog.mdb does not contain DSN Used. What a mistake!) 4) Partition the cubes Here I must admit that I still do not grasp the finesses of correct partitioning... I partioned over months (-> 284 partitions!?) --> result: much worse performance. What I did not yet investigate is Cache Policy & Cache Ratio. So all in all a disappointing result. Yet I must say that I am very glad I did all this, because it makes me feel more confident when facing the customer. Still pending: why is it so much slower in a client/server environment than in a single-pc setup ? Could it be because of the dimension security ? Many thanks (especially to Chris!) Tom "Chris Webb" wrote: Oh, and one other important point - make sure you have SP3 PTS installed on your client, it can sometimes make a big difference with performance. "Chris Webb" wrote: Hi Tom, First of all, thanks for the detailed explanation. It makes it much easier to suggest solutions... On the modelling side, you might be interested to read the following posting on how I generally approach this kind of problem: http://groups.google.co.uk/groups?hl...40TK2MSFTNGP12 With a little bit of work you would be able to adapt this to your own scenario, which might allow you to run these kind of queries from any front-end; was this requirement the reason why you built your own front-end? Also, does your customer dimension have any levels or is it just a flat list? If the latter, I would strongly recommend putting in two or three levels to group the customers even if you don't have a business need for them and end up making them invisible. It should also help. Regarding MDX, the most efficient queries are always going to be the ones using NONEMPTYCROSSJOIN rather than FILTER. Looking at your second query, you might find that performance improves if you get rid of the unnecessary EXTRACTs and put the contents of your WHERE clause into the NECJ function too. Here's what it might look like (nb might be syntax errors!): with set [SelectedMonths] as '[DimTime].&[2004_01]:[DimTime].&[2004_08]' set [PrevSelectedMonths] as 'ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_01]):ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_08])' member [Measures].[SumPeriod] as 'sum([SelectedMonths], [Measures].[Amount])' member [Measures].[SumPeriod Y-1] as 'sum([PrevSelectedMonths], [Measures].[Amount])' set [y-axis] as 'Hierarchize( NonEmptyCrossjoin([DimClient].Members,{[SelectedMonths]+[PrevSelectedMonths]},{[Measures].[Amount]},{[DimSomeOtherDimension].[SomeMember]},1))' set [x-axis-products] as 'Hierarchize( Generate( NonEmptyCrossjoin([DimProduct].[SubProduct].members, {[SelectedMonths]+[PrevSelectedMonths]},{[Measures].[Amount]},{[DimSomeOtherDimension].[SomeMember]},1) ,{[DimProduct].CurrentMember,[DimProduct].CurrentMember.Parent})' select { Crossjoin([x-axis-products], {[Measures].[SumPeriod], [Measures].[SumPeriod Y-1]}) } on columns, { [y-axis] } on rows from TheCube where ([DimSomeOtherDimension].[SomeMember]) However, I'm not sure that any of the above will make a real difference to performance, which is your biggest issue. If you've read the AS Performance Guide you've probably already got a good partitioning strategy (by month I assume) and run the Usage-Based Optimisation wizard. However, I think the main reason you've got poor performance is that your queries run at the Client level, which has 12000 members and is unlikely to have any aggregations associated with it because of the 1/3 rule. In this case it might be worth building your own aggregations at the Client level using the Partition Manager tool available in the SQL 2000 Resource Kit, and seeing if that improves matters (although it would increase the size of the cube and processing times). For hints on how to know which aggregations to build when you're doing it manually, have a look at this thread: http://groups.google.co.uk/groups?hl...com%26rnum%3D1 ...and the following, which contains a better explanation of how to understand the query log: http://www.mosha.com/msolap/articles/querylog.htm Because you have control over your front-end, it might also be worth playing around with some connection string properties. The infamous Execution Location/Default Isolation Mode pairing is always worth a try, as is trying different values for Large Level Threshold. Also, in your case, try values of 0 to 7 for the Cache Policy property, and try values of 0.01 for CacheRatio and CacheRatio2. Finally, my last observation is that any report where you expect to get up to 12000 rows and 120 columns is always going to perform slowly, and is more data than anyone can take in. It needs to be broken down into more digestible chunks --> smaller queries! HTH, Chris "Tom VdP" wrote: [ This article is also available in a better readable format here: http://users.pandora.be/see/mdx/news.htm ] Given a cube with the following properties: - a measure Amount - a time dimension DimTime with a granularity of months (only Years and Months, no lower levels) - a client dimension DimClient - a product dimension DimProduct (consisting of a level Product and a level SubProduct) - a few other dimensions Consider the following requirement: A report must be made that shows the total Amount in a given period compared to the total Amount in the same period one year ago for each client for each product. The report has the following layout: (hopefully you read this in a non-proportional font, if not consider the link mentioned above) Product A SubProduct A Product B .. SumPeriod SumPeriod Y-1 SumPeriod SumPeriod Y-1 SumPeriod SumPeriod Y-1 .. .. All DimClient x y x y x y x y Client 1 x y x y x y x y Client 2 x y x y x y x y .. x y x y x y x y In pseudo-MDX: each x above = Sum(DimTime.startmonth imTime.endmonth,Measures.[Amount]). The start and end dates of the period are parameters: they are entered by the user and injected into the MDX. This makes it impossible to aggregate the sums beforehand. The report should only include those clients that have at least one (sub)product. Also only those (sub)products that are associated with at least one client should be shown. Because of the nature of this report a lot of data will be presented to the user. The dimension DimClient contains 12000 elements, DimProducts contains 60 elements. Depending on the chosen period and possibly other slicers (among other dimensions) the report may return up to 12000 rows and 120 columns. This looks like a pretty standard report to me, yet I could not find an MDX-pattern for this type of problem on the net. Neither did I find a hint in Spofford's book. Here are the 2 approaches I devised myself. MDX 1 - Selection based on Filter() ----------------------------------- with set [SelectedMonths] as '[DimTime].&[2004_01]:[DimTime].&[2004_08]' set [PrevSelectedMonths] as 'ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_01]):ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_08])' member [Measures].[SumPeriod] as 'sum([SelectedMonths], [Measures].[Amount])' member [Measures].[SumPeriod Y-1] as 'sum([PrevSelectedMonths], [Measures].[Amount])' member [Measures].[AllClientPeriodSums] as '([DimClient].[All DimClient],[Measures].[SumPeriod])+([DimClient].[All DimClient],[Measures].[SumPeriod Y-1])' select { Crossjoin( Hierarchize( Filter( [DimProduct].[SubProduct].members + [DimProduct].[Product].members ,not(isempty([Measures].[AllClientPeriodSums])))) ,{[Measures].[SumPeriod],[Measures].[SumPeriod Y-1]}) } on columns, { Filter([DimClient].members,([Measures].[SumPeriod]<>0 or [Measures].[SumPeriod Y-1]<>0)) } on rows from TheCube where ([DimTime].[2004],[DimSomeOtherDimension].[SomeMember]) The above in words: - Filter all Products and SubProducts for which the "All" level in DimClient has a non-empty value for the sums over the sets SelectedMonths or PrevSelectedMonths. If that value is non-empty, at least 1 client has a value for that (sub)product, so that (sub)product must be retained. - Filter all Clients based on the test if the sums are <> 0. (this should be: not(isempty(SumPeriod) or isempty(SumPeriod Y-1)), but due to extra requirements of our users it is implemented like this). MDX 2 - Selection based on NonEmptyCrossjoin() ---------------------------------------------- with set [SelectedMonths] as '[DimTime].&[2004_01]:[DimTime].&[2004_08]' set [PrevSelectedMonths] as 'ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_01]):ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_08])' member [Measures].[SumPeriod] as 'sum([SelectedMonths], [Measures].[Amount])' member [Measures].[SumPeriod Y-1] as 'sum([PrevSelectedMonths], [Measures].[Amount])' set [y-axis] as 'Hierarchize( Extract( NonEmptyCrossjoin([SelectedMonths]+[PrevSelectedMonths],[DimClient].Members,{[Measures].[Amount]},2) ,[DimClient]))' set [x-axis-products] as 'Hierarchize( Generate( Extract( NonEmptyCrossjoin([SelectedMonths]+[PrevSelectedMonths],[DimProduct].[SubProduct].members,{[Measures].[Amount]},2) ,[DimProduct]) ,{[DimProduct].CurrentMember,[DimProduct].CurrentMember.Parent}))' select { Crossjoin([x-axis-products], {[Measures].[SumPeriod], [Measures].[SumPeriod Y-1]}) } on columns, { [y-axis] } on rows from TheCube where ([DimSomeOtherDimension].[SomeMember]) The above in words: - Find non-empty crossings between the period sets and clients based on the Amount measure. From this set, extract the Clients and hierarchize. These are the clients that go on the Y-axis of the report. - Find non-empty crossings between the period sets and subproducts based on the Amount measure. Extract the Product members from that set. Add the parents, i.e. the Products and hierarchize. These are the products that will go on the X-axis of the report. Questions --------- 1) Both solutions are rather slow. Is there an MDX 3 I should implement instead ? I there a better approach ? I am rather new to MDX and I find it dazzling at times... 2) For some slicer elements (the [DimSomeOtherDimension].[SomeMember] above) the report is up to 60% slower than without any slicer element at all! Without the where-clause in the MDX the report executes faster than for MDX with a where-clause that limits the data to some 70%. What is happening ?? 3) Is there a way to incorporate via cube-design a better/faster solution ? Keep in mind that both start and end months are variables. 4) [May be unrelated to this particular report] I noticed that even on a medium sized desktop pc that runs Analysis Services locally the performance is way better than in a client/server setup with a high-end desktop and a high-end server... This has me baffled. The network usage is low and never a bottleneck. The server is dedicated and there are no other users executing reports. We use Excel VBA with ADO MD to get to our data. I did follow all steps in the Microsoft whitepaper on Analysis Services performance. Many, many thanks for reading up to this point! Many more for all helpful information and remarks. Tom |
#7
| |||
| |||
|
|
Hi Tom, First, I'll respond to your numbered points: 1,2) No surprise here that it didn't make much difference, unfortunately. It was worth a try though. 3) If DSN Requested is always the same as DSN Used, then you're always hitting an aggregation and there's no point building any more. However, can you tell me how many subqueries each query results in? You can find this out by looking at the Query Number counter. Lots of subqueries can sometimes mean a certain network overhead, and the Cache Ratio and Cache Policy properties can improve this. There is, as you've noticed, no satisfactory way of looking at the DSN Used counter; I wrote a simple .NET app once which did this fairly well, but not 100% accurately. Also, if lots of subqueries are generated then you will find that turning off query logging on the server may have a noticeable effect on query performance. 4) The fact that you got worse performance after partitioning is very strange. Did you set the slice property on each partition properly? You can check this by running the Edit wizard on the partition in Analysis Manager - you set it on the second step. Partitioning for this kind of app should improve performance, and I can only think that if you did set the slice then the effect of reprocessing and therefore dropping the cache led to the worsened performance. The fact you have dimension security on your cube makes me think you might not have as much memory available on the server for caching as you could. Can you give us some more details on which dimensions you have it set up on? Is it your client dimension? Dimension security won't explain the difference between a client-server and single PC setup - the issue there will be how much work is done on the client, and how much data is moved over the network. What do you see in Task Manager and the client and the server when you run queries in the client-server setup? I suspect that the real cause of the performance problem is that you have to sum up large amounts of data across your variable time ranges, something which is always going to take a lot of time; if this is taking place on the client rather than the server, then it will take even longer. Leading on from this topic, since you have control over the MDX used you might be able to cut the amount of time needed for this summing by writing clever MDX and building aggregations. The key question is: how large are the time ranges that your users typically select? If they are larger than two or three months then you could do the following: 1) Build some extra levels into your time dimension, maybe by grouping two months together into a level of six members, and then grouping these members together into twos making a level of three members. You should hide these levels from the user. 2) In your MDX, whenever you have a time range that crosses a set of months that all belong to one common parent, then instead of summing up the individual months then include the common parent then substitute the common parent instead. So, for example, imagine you were querying a range of months 1-9; instead of summing up nine months, if you had the dimension structure proposed in (1) you would end up summing three members: month 9 and the higher level members which are the common parents of months 1-4 and months 5-8. I wouldn't recommend trying to do this substitution in MDX - it will be much easier to do in code in your front-end! 3) Build aggregations at these new higher levels (and the year level) so that when your query hits them, it doesn't just go ahead and sum the data from the month level anyway. The result of this will be that the majority of the summing that you're doing at query time will no longer be necessary, because most of it will already have been done when you're building the aggregations. Using the example above, a sum of 9 members will be turned into a sum of three members, hopefully meaning that the query only takes 1/3 of the time. Incidentally, are you connecting to the server 'natively' or using http? The latter is less efficient than the former, and you should only use it if you need to; there's also a Windows 2003 bug relating to it than can impact performance too. However, my overall feeling is that I'm not surprised your queries take 2 minutes if they return large amounts of data and need to do summing up as well. Any report that returns 120*12000 cells isn't what I would define as an OLAP-style report, and might well perform better in a relational database! Are you sure you can't redesign the app to get it to return data in smaller queries? Regards, Chris "Tom VdP" wrote: Hi Chris, Jamie, rest of the world, Sorry for not responding earlier. Here is a roundup of the situation. First of all, a bit of info about the front-end. The front-end is written in Excel VBA because the customer wants quite a bit of specific behaviour and layout. Rows and columns can be collapsed/expanded, context menus appear on right-click that use the specific cell-location info (e.g. filtering, linking to another report, sort options), popup windows (mainly for entering report parameters), ... The front-end creates the necessary MDX dynamically. This development costed about 2 manmonths of work. Cubes: There are 5 cubes, 2 virtual cubes, largest cube has 7 dimensions, 5 measures, 2 calculated measures. Dimension security in about 30 groups. The amount of data is very little: 2 million fact rows. The cubes fit in 10Mb files. Necessity for optimisation: Some reports return a lot of data: 12.000 rows x 120 columns. Loading this in Excel is slow, but on top of that getting the data out of the cube also takes a quite some time: up to 2 minutes. The returned cells are sums over variable periods, so these must be calculated ad hoc. Loading the data and formatting takes another minute or so. About the optimisation proces: First of all, I did read and tried to apply the Performance Guide. So the tweaking with connection string options and so was done. Also the usage based optimisation and other settings server side were applied. Latest service packs applied everywhere. Extra things I did because of the hints Chris gave: 1) MDX: use the NECJ version instead of the Filter() version 2) MDX : work away the Extract() Indeed, I think I got confused by the examples in BOL and the fact that NECJ must not necessarily return a typical crossjoined set. It can be a set consisting of members of 1 dimension only. --> result : no measurable differences 3) Performance monitor: measure DSN Used & DSN Requested. VERY interesting! I noticed quite some room for uptimisation here! So I added aggregations manually via the Partition Manager tool (for those looking for it: it is available in the SQLServer2000 Resource Kit). All queries that were logged had the same level DSN Used & DSN Requested. --> result: disappointing! again no measurable difference. (Sidenote question: how do you log ALL DSN Used & DSN Requested ? The perfmon only polls once/second. And the querylog.mdb does not contain DSN Used. What a mistake!) 4) Partition the cubes Here I must admit that I still do not grasp the finesses of correct partitioning... I partioned over months (-> 284 partitions!?) --> result: much worse performance. What I did not yet investigate is Cache Policy & Cache Ratio. So all in all a disappointing result. Yet I must say that I am very glad I did all this, because it makes me feel more confident when facing the customer. Still pending: why is it so much slower in a client/server environment than in a single-pc setup ? Could it be because of the dimension security ? Many thanks (especially to Chris!) Tom "Chris Webb" wrote: Oh, and one other important point - make sure you have SP3 PTS installed on your client, it can sometimes make a big difference with performance. "Chris Webb" wrote: Hi Tom, First of all, thanks for the detailed explanation. It makes it much easier to suggest solutions... On the modelling side, you might be interested to read the following posting on how I generally approach this kind of problem: http://groups.google.co.uk/groups?hl...40TK2MSFTNGP12 With a little bit of work you would be able to adapt this to your own scenario, which might allow you to run these kind of queries from any front-end; was this requirement the reason why you built your own front-end? Also, does your customer dimension have any levels or is it just a flat list? If the latter, I would strongly recommend putting in two or three levels to group the customers even if you don't have a business need for them and end up making them invisible. It should also help. Regarding MDX, the most efficient queries are always going to be the ones using NONEMPTYCROSSJOIN rather than FILTER. Looking at your second query, you might find that performance improves if you get rid of the unnecessary EXTRACTs and put the contents of your WHERE clause into the NECJ function too. Here's what it might look like (nb might be syntax errors!): with set [SelectedMonths] as '[DimTime].&[2004_01]:[DimTime].&[2004_08]' set [PrevSelectedMonths] as 'ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_01]):ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_08])' member [Measures].[SumPeriod] as 'sum([SelectedMonths], [Measures].[Amount])' member [Measures].[SumPeriod Y-1] as 'sum([PrevSelectedMonths], [Measures].[Amount])' set [y-axis] as 'Hierarchize( NonEmptyCrossjoin([DimClient].Members,{[SelectedMonths]+[PrevSelectedMonths]},{[Measures].[Amount]},{[DimSomeOtherDimension].[SomeMember]},1))' set [x-axis-products] as 'Hierarchize( Generate( NonEmptyCrossjoin([DimProduct].[SubProduct].members, {[SelectedMonths]+[PrevSelectedMonths]},{[Measures].[Amount]},{[DimSomeOtherDimension].[SomeMember]},1) ,{[DimProduct].CurrentMember,[DimProduct].CurrentMember.Parent})' select { Crossjoin([x-axis-products], {[Measures].[SumPeriod], [Measures].[SumPeriod Y-1]}) } on columns, { [y-axis] } on rows from TheCube where ([DimSomeOtherDimension].[SomeMember]) However, I'm not sure that any of the above will make a real difference to performance, which is your biggest issue. If you've read the AS Performance Guide you've probably already got a good partitioning strategy (by month I assume) and run the Usage-Based Optimisation wizard. However, I think the main reason you've got poor performance is that your queries run at the Client level, which has 12000 members and is unlikely to have any aggregations associated with it because of the 1/3 rule. In this case it might be worth building your own aggregations at the Client level using the Partition Manager tool available in the SQL 2000 Resource Kit, and seeing if that improves matters (although it would increase the size of the cube and processing times). For hints on how to know which aggregations to build when you're doing it manually, have a look at this thread: http://groups.google.co.uk/groups?hl...com%26rnum%3D1 ...and the following, which contains a better explanation of how to understand the query log: http://www.mosha.com/msolap/articles/querylog.htm Because you have control over your front-end, it might also be worth playing around with some connection string properties. The infamous Execution Location/Default Isolation Mode pairing is always worth a try, as is trying different values for Large Level Threshold. Also, in your case, try values of 0 to 7 for the Cache Policy property, and try values of 0.01 for CacheRatio and CacheRatio2. Finally, my last observation is that any report where you expect to get up to 12000 rows and 120 columns is always going to perform slowly, and is more data than anyone can take in. It needs to be broken down into more digestible chunks --> smaller queries! HTH, Chris "Tom VdP" wrote: [ This article is also available in a better readable format here: http://users.pandora.be/see/mdx/news.htm ] Given a cube with the following properties: - a measure Amount - a time dimension DimTime with a granularity of months (only Years and Months, no lower levels) - a client dimension DimClient - a product dimension DimProduct (consisting of a level Product and a level SubProduct) - a few other dimensions Consider the following requirement: A report must be made that shows the total Amount in a given period compared to the total Amount in the same period one year ago for each client for each product. The report has the following layout: (hopefully you read this in a non-proportional font, if not consider the link mentioned above) Product A SubProduct A Product B .. SumPeriod SumPeriod Y-1 SumPeriod SumPeriod Y-1 SumPeriod SumPeriod Y-1 .. .. All DimClient x y x y x y x y Client 1 x y x y x y x y Client 2 x y x y x y x y .. x y x y x y x y In pseudo-MDX: each x above = Sum(DimTime.startmonth imTime.endmonth,Measures.[Amount]). The start and end dates of the period are parameters: they are entered by the user and injected into the MDX. This makes it impossible to aggregate the sums beforehand. The report should only include those clients that have at least one (sub)product. Also only those (sub)products that are associated with at least one client should be shown. Because of the nature of this report a lot of data will be presented to the user. The dimension DimClient contains 12000 elements, DimProducts contains 60 elements. Depending on the chosen period and possibly other slicers (among other dimensions) the report may return up to 12000 rows and 120 columns. This looks like a pretty standard report to me, yet I could not find an MDX-pattern for this type of problem on the net. Neither did I find a hint in Spofford's book. Here are the 2 approaches I devised myself. MDX 1 - Selection based on Filter() ----------------------------------- with set [SelectedMonths] as '[DimTime].&[2004_01]:[DimTime].&[2004_08]' set [PrevSelectedMonths] as 'ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_01]):ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_08])' member [Measures].[SumPeriod] as 'sum([SelectedMonths], [Measures].[Amount])' member [Measures].[SumPeriod Y-1] as 'sum([PrevSelectedMonths], [Measures].[Amount])' member [Measures].[AllClientPeriodSums] as '([DimClient].[All DimClient],[Measures].[SumPeriod])+([DimClient].[All DimClient],[Measures].[SumPeriod Y-1])' select { Crossjoin( Hierarchize( Filter( [DimProduct].[SubProduct].members + [DimProduct].[Product].members ,not(isempty([Measures].[AllClientPeriodSums])))) ,{[Measures].[SumPeriod],[Measures].[SumPeriod Y-1]}) } on columns, { Filter([DimClient].members,([Measures].[SumPeriod]<>0 or [Measures].[SumPeriod Y-1]<>0)) } on rows from TheCube where ([DimTime].[2004],[DimSomeOtherDimension].[SomeMember]) The above in words: - Filter all Products and SubProducts for which the "All" level in DimClient has a non-empty value for the sums over the sets SelectedMonths or PrevSelectedMonths. If that value is non-empty, at least 1 client has a value for that (sub)product, so that (sub)product must be retained. - Filter all Clients based on the test if the sums are <> 0. (this should be: not(isempty(SumPeriod) or isempty(SumPeriod Y-1)), but due to extra requirements of our users it is implemented like this). MDX 2 - Selection based on NonEmptyCrossjoin() ---------------------------------------------- with set [SelectedMonths] as '[DimTime].&[2004_01]:[DimTime].&[2004_08]' set [PrevSelectedMonths] as 'ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_01]):ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_08])' member [Measures].[SumPeriod] as 'sum([SelectedMonths], [Measures].[Amount])' member [Measures].[SumPeriod Y-1] as 'sum([PrevSelectedMonths], [Measures].[Amount])' set [y-axis] as 'Hierarchize( Extract( NonEmptyCrossjoin([SelectedMonths]+[PrevSelectedMonths],[DimClient].Members,{[Measures].[Amount]},2) ,[DimClient]))' set [x-axis-products] as 'Hierarchize( Generate( Extract( NonEmptyCrossjoin([SelectedMonths]+[PrevSelectedMonths],[DimProduct].[SubProduct].members,{[Measures].[Amount]},2) ,[DimProduct]) ,{[DimProduct].CurrentMember,[DimProduct].CurrentMember.Parent}))' select { Crossjoin([x-axis-products], {[Measures].[SumPeriod], [Measures].[SumPeriod Y-1]}) } on columns, { [y-axis] } on rows from TheCube where ([DimSomeOtherDimension].[SomeMember]) The above in words: - Find non-empty crossings between the period sets and clients based on the Amount measure. From this set, extract the Clients and hierarchize. These are the clients that go on the Y-axis of the report. - Find non-empty crossings between the period sets and subproducts based on the Amount measure. Extract the Product members from that set. Add the parents, i.e. the Products and hierarchize. These are the products that will go on the X-axis of the report. Questions --------- 1) Both solutions are rather slow. Is there an MDX 3 I should implement instead ? I there a better approach ? I am rather new to MDX and I find it dazzling at times... 2) For some slicer elements (the [DimSomeOtherDimension].[SomeMember] above) the report is up to 60% slower than without any slicer element at all! Without the where-clause in the MDX the report executes faster than for MDX with a where-clause that limits the data to some 70%. What is happening ?? 3) Is there a way to incorporate via cube-design a better/faster solution ? Keep in mind that both start and end months are variables. 4) [May be unrelated to this particular report] I noticed that even on a medium sized desktop pc that runs Analysis Services locally the performance is way better than in a client/server setup with a high-end desktop and a high-end server... This has me baffled. The network usage is low and never a bottleneck. The server is dedicated and there are no other users executing reports. We use Excel VBA with ADO MD to get to our data. I did follow all steps in the Microsoft whitepaper on Analysis Services performance. Many, many thanks for reading up to this point! Many more for all helpful information and remarks. Tom |
#8
| |||
| |||
|
|
If you're interested, here's the MDX to do the member substitution I talked about in my last post, using the Foodmart Sales cube. It's a worked example showing how an input set of {[Time].[1997].[Q1].[1]:[Time].[1998].[Q2].[5]} can be transformed into a set containing {[Time].[1997], [Time].[1998].[Q1], [Time].[1998].[Q2].[4], [Time].[1998].[Q2].[5]} - which, as I said, would be much quicker to SUM if you had built the right aggregations on your cube. WITH /*The initial time range expressed in months*/ SET MYRANGE AS '{[Time].[1997].[Q1].[1]:[Time].[1998].[Q2].[5]}' /*Find out which whole years are present*/ SET YEARSPRESENT AS ' /*Loop over all Years from the ancestor of the first month in the range to the ancestor of the last month*/ GENERATE( {ANCESTOR(MYRANGE.ITEM(0), [Time].[Year]) : ANCESTOR(TAIL(MYRANGE, 1).ITEM(0), [Time].[Year])} /*Need to use nasty string functions to get IIF to return members or sets. Yukon does not have this restriction!*/ , STRTOSET( /*If all of the descendants at month level of the current year appear in the range, we can replace them with the year member*/ IIF(COUNT(INTERSECT(MYRANGE, DESCENDANTS(TIME.CURRENTMEMBER, [Time].[Month]))) = COUNT(DESCENDANTS(TIME.CURRENTMEMBER, [Time].[Month])) , "{" + MEMBERTOSTR(TIME.CURRENTMEMBER) + "}" , "{}") ) )' /*As an intermediate step, remove all the months from the initial range which will be replaced by whole years*/ SET MONTHSLEFT AS 'EXCEPT(MYRANGE, DESCENDANTS(YEARSPRESENT, [Time].[Month]))' /*Now go through all the remaining months and see which ones have whole quarters present, using the same logic as above*/ SET QUARTERSPRESENT AS 'GENERATE( {ANCESTOR(MONTHSLEFT.ITEM(0), [Time].[Quarter]) : ANCESTOR(TAIL(MONTHSLEFT, 1).ITEM(0), [Time].[Quarter])} , STRTOSET( IIF(COUNT(INTERSECT(MONTHSLEFT, DESCENDANTS(TIME.CURRENTMEMBER, [Time].[Month]))) = COUNT(DESCENDANTS(TIME.CURRENTMEMBER, [Time].[Month])) , "{" + MEMBERTOSTR(TIME.CURRENTMEMBER) + "}" , "{}") ) )' /*Finally return the new range, with years and quarters substituted for months where possible*/ SET MYNEWRANGE AS '{YEARSPRESENT, QUARTERSPRESENT, EXCEPT(MYRANGE, DESCENDANTS({YEARSPRESENT, QUARTERSPRESENT}, [Time].[Month]))}' /*Helper calculated members to display the steps*/ MEMBER MEASURES.STEP1 AS 'SETTOSTR(MYRANGE)' MEMBER MEASURES.STEP2 AS 'SETTOSTR(YEARSPRESENT)' MEMBER MEASURES.STEP3 AS 'SETTOSTR(MONTHSLEFT)' MEMBER MEASURES.STEP4 AS 'SETTOSTR(QUARTERSPRESENT)' MEMBER MEASURES.STEP5 AS 'SETTOSTR(MYNEWRANGE)' /*Show working*/ SELECT {MEASURES.STEP1, MEASURES.STEP2, MEASURES.STEP3, MEASURES.STEP4, MEASURES.STEP5} ON 0 FROM SALES Let me know if you have any questions... "Chris Webb" wrote: Hi Tom, First, I'll respond to your numbered points: 1,2) No surprise here that it didn't make much difference, unfortunately. It was worth a try though. 3) If DSN Requested is always the same as DSN Used, then you're always hitting an aggregation and there's no point building any more. However, can you tell me how many subqueries each query results in? You can find this out by looking at the Query Number counter. Lots of subqueries can sometimes mean a certain network overhead, and the Cache Ratio and Cache Policy properties can improve this. There is, as you've noticed, no satisfactory way of looking at the DSN Used counter; I wrote a simple .NET app once which did this fairly well, but not 100% accurately. Also, if lots of subqueries are generated then you will find that turning off query logging on the server may have a noticeable effect on query performance. 4) The fact that you got worse performance after partitioning is very strange. Did you set the slice property on each partition properly? You can check this by running the Edit wizard on the partition in Analysis Manager - you set it on the second step. Partitioning for this kind of app should improve performance, and I can only think that if you did set the slice then the effect of reprocessing and therefore dropping the cache led to the worsened performance. The fact you have dimension security on your cube makes me think you might not have as much memory available on the server for caching as you could. Can you give us some more details on which dimensions you have it set up on? Is it your client dimension? Dimension security won't explain the difference between a client-server and single PC setup - the issue there will be how much work is done on the client, and how much data is moved over the network. What do you see in Task Manager and the client and the server when you run queries in the client-server setup? I suspect that the real cause of the performance problem is that you have to sum up large amounts of data across your variable time ranges, something which is always going to take a lot of time; if this is taking place on the client rather than the server, then it will take even longer. Leading on from this topic, since you have control over the MDX used you might be able to cut the amount of time needed for this summing by writing clever MDX and building aggregations. The key question is: how large are the time ranges that your users typically select? If they are larger than two or three months then you could do the following: 1) Build some extra levels into your time dimension, maybe by grouping two months together into a level of six members, and then grouping these members together into twos making a level of three members. You should hide these levels from the user. 2) In your MDX, whenever you have a time range that crosses a set of months that all belong to one common parent, then instead of summing up the individual months then include the common parent then substitute the common parent instead. So, for example, imagine you were querying a range of months 1-9; instead of summing up nine months, if you had the dimension structure proposed in (1) you would end up summing three members: month 9 and the higher level members which are the common parents of months 1-4 and months 5-8. I wouldn't recommend trying to do this substitution in MDX - it will be much easier to do in code in your front-end! 3) Build aggregations at these new higher levels (and the year level) so that when your query hits them, it doesn't just go ahead and sum the data from the month level anyway. The result of this will be that the majority of the summing that you're doing at query time will no longer be necessary, because most of it will already have been done when you're building the aggregations. Using the example above, a sum of 9 members will be turned into a sum of three members, hopefully meaning that the query only takes 1/3 of the time. Incidentally, are you connecting to the server 'natively' or using http? The latter is less efficient than the former, and you should only use it if you need to; there's also a Windows 2003 bug relating to it than can impact performance too. However, my overall feeling is that I'm not surprised your queries take 2 minutes if they return large amounts of data and need to do summing up as well. Any report that returns 120*12000 cells isn't what I would define as an OLAP-style report, and might well perform better in a relational database! Are you sure you can't redesign the app to get it to return data in smaller queries? Regards, Chris "Tom VdP" wrote: Hi Chris, Jamie, rest of the world, Sorry for not responding earlier. Here is a roundup of the situation. First of all, a bit of info about the front-end. The front-end is written in Excel VBA because the customer wants quite a bit of specific behaviour and layout. Rows and columns can be collapsed/expanded, context menus appear on right-click that use the specific cell-location info (e.g. filtering, linking to another report, sort options), popup windows (mainly for entering report parameters), ... The front-end creates the necessary MDX dynamically. This development costed about 2 manmonths of work. Cubes: There are 5 cubes, 2 virtual cubes, largest cube has 7 dimensions, 5 measures, 2 calculated measures. Dimension security in about 30 groups. The amount of data is very little: 2 million fact rows. The cubes fit in 10Mb files. Necessity for optimisation: Some reports return a lot of data: 12.000 rows x 120 columns. Loading this in Excel is slow, but on top of that getting the data out of the cube also takes a quite some time: up to 2 minutes. The returned cells are sums over variable periods, so these must be calculated ad hoc. Loading the data and formatting takes another minute or so. About the optimisation proces: First of all, I did read and tried to apply the Performance Guide. So the tweaking with connection string options and so was done. Also the usage based optimisation and other settings server side were applied. Latest service packs applied everywhere. Extra things I did because of the hints Chris gave: 1) MDX: use the NECJ version instead of the Filter() version 2) MDX : work away the Extract() Indeed, I think I got confused by the examples in BOL and the fact that NECJ must not necessarily return a typical crossjoined set. It can be a set consisting of members of 1 dimension only. --> result : no measurable differences 3) Performance monitor: measure DSN Used & DSN Requested. VERY interesting! I noticed quite some room for uptimisation here! So I added aggregations manually via the Partition Manager tool (for those looking for it: it is available in the SQLServer2000 Resource Kit). All queries that were logged had the same level DSN Used & DSN Requested. --> result: disappointing! again no measurable difference. (Sidenote question: how do you log ALL DSN Used & DSN Requested ? The perfmon only polls once/second. And the querylog.mdb does not contain DSN Used. What a mistake!) 4) Partition the cubes Here I must admit that I still do not grasp the finesses of correct partitioning... I partioned over months (-> 284 partitions!?) --> result: much worse performance. What I did not yet investigate is Cache Policy & Cache Ratio. So all in all a disappointing result. Yet I must say that I am very glad I did all this, because it makes me feel more confident when facing the customer. Still pending: why is it so much slower in a client/server environment than in a single-pc setup ? Could it be because of the dimension security ? Many thanks (especially to Chris!) Tom "Chris Webb" wrote: Oh, and one other important point - make sure you have SP3 PTS installed on your client, it can sometimes make a big difference with performance. "Chris Webb" wrote: Hi Tom, First of all, thanks for the detailed explanation. It makes it much easier to suggest solutions... On the modelling side, you might be interested to read the following posting on how I generally approach this kind of problem: http://groups.google.co.uk/groups?hl...40TK2MSFTNGP12 With a little bit of work you would be able to adapt this to your own scenario, which might allow you to run these kind of queries from any front-end; was this requirement the reason why you built your own front-end? Also, does your customer dimension have any levels or is it just a flat list? If the latter, I would strongly recommend putting in two or three levels to group the customers even if you don't have a business need for them and end up making them invisible. It should also help. Regarding MDX, the most efficient queries are always going to be the ones using NONEMPTYCROSSJOIN rather than FILTER. Looking at your second query, you might find that performance improves if you get rid of the unnecessary EXTRACTs and put the contents of your WHERE clause into the NECJ function too. Here's what it might look like (nb might be syntax errors!): with set [SelectedMonths] as '[DimTime].&[2004_01]:[DimTime].&[2004_08]' set [PrevSelectedMonths] as 'ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_01]):ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_08])' member [Measures].[SumPeriod] as 'sum([SelectedMonths], [Measures].[Amount])' member [Measures].[SumPeriod Y-1] as 'sum([PrevSelectedMonths], [Measures].[Amount])' set [y-axis] as 'Hierarchize( NonEmptyCrossjoin([DimClient].Members,{[SelectedMonths]+[PrevSelectedMonths]},{[Measures].[Amount]},{[DimSomeOtherDimension].[SomeMember]},1))' set [x-axis-products] as 'Hierarchize( Generate( NonEmptyCrossjoin([DimProduct].[SubProduct].members, {[SelectedMonths]+[PrevSelectedMonths]},{[Measures].[Amount]},{[DimSomeOtherDimension].[SomeMember]},1) ,{[DimProduct].CurrentMember,[DimProduct].CurrentMember.Parent})' select { Crossjoin([x-axis-products], {[Measures].[SumPeriod], [Measures].[SumPeriod Y-1]}) } on columns, { [y-axis] } on rows from TheCube where ([DimSomeOtherDimension].[SomeMember]) However, I'm not sure that any of the above will make a real difference to performance, which is your biggest issue. If you've read the AS Performance Guide you've probably already got a good partitioning strategy (by month I assume) and run the Usage-Based Optimisation wizard. However, I think the main reason you've got poor performance is that your queries run at the Client level, which has 12000 members and is unlikely to have any aggregations associated with it because of the 1/3 rule. In this case it might be worth building your own aggregations at the Client level using the Partition Manager tool available in the SQL 2000 Resource Kit, and seeing if that improves matters (although it would increase the size of the cube and processing times). For hints on how to know which aggregations to build when you're doing it manually, have a look at this thread: http://groups.google.co.uk/groups?hl...com%26rnum%3D1 ...and the following, which contains a better explanation of how to understand the query log: http://www.mosha.com/msolap/articles/querylog.htm Because you have control over your front-end, it might also be worth playing around with some connection string properties. The infamous Execution Location/Default Isolation Mode pairing is always worth a try, as is trying different values for Large Level Threshold. Also, in your case, try values of 0 to 7 for the Cache Policy property, and try values of 0.01 for CacheRatio and CacheRatio2. Finally, my last observation is that any report where you expect to get up to 12000 rows and 120 columns is always going to perform slowly, and is more data than anyone can take in. It needs to be broken down into more digestible chunks --> smaller queries! HTH, Chris "Tom VdP" wrote: [ This article is also available in a better readable format here: http://users.pandora.be/see/mdx/news.htm ] Given a cube with the following properties: - a measure Amount - a time dimension DimTime with a granularity of months (only Years and Months, no lower levels) - a client dimension DimClient - a product dimension DimProduct (consisting of a level Product and a level SubProduct) - a few other dimensions Consider the following requirement: A report must be made that shows the total Amount in a given period compared to the total Amount in the same period one year ago for each client for each product. The report has the following layout: (hopefully you read this in a non-proportional font, if not consider the link mentioned above) Product A SubProduct A Product B .. SumPeriod SumPeriod Y-1 SumPeriod SumPeriod Y-1 SumPeriod SumPeriod Y-1 .. .. All DimClient x y x y x y x y Client 1 x y x y x y x y Client 2 x y x y x y x y .. x y x y x y x y In pseudo-MDX: each x above = Sum(DimTime.startmonth imTime.endmonth,Measures.[Amount]). The start and end dates of the period are parameters: they are entered by the user and injected into the MDX. This makes it impossible to aggregate the sums beforehand. The report should only include those clients that have at least one (sub)product. Also only those (sub)products that are associated with at least one client should be shown. Because of the nature of this report a lot of data will be presented to the user. The dimension DimClient contains 12000 elements, DimProducts contains 60 elements. Depending on the chosen period and possibly other slicers (among other dimensions) the report may return up to 12000 rows and 120 columns. This looks like a pretty standard report to me, yet I could not find an MDX-pattern for this type of problem on the net. Neither did I find a hint in Spofford's book. Here are the 2 approaches I devised myself. MDX 1 - Selection based on Filter() ----------------------------------- with set [SelectedMonths] as '[DimTime].&[2004_01]:[DimTime].&[2004_08]' set [PrevSelectedMonths] as 'ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_01]):ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_08])' member [Measures].[SumPeriod] as 'sum([SelectedMonths], [Measures].[Amount])' member [Measures].[SumPeriod Y-1] as 'sum([PrevSelectedMonths], [Measures].[Amount])' member [Measures].[AllClientPeriodSums] as '([DimClient].[All DimClient],[Measures].[SumPeriod])+([DimClient].[All DimClient],[Measures].[SumPeriod Y-1])' select { Crossjoin( Hierarchize( Filter( [DimProduct].[SubProduct].members + [DimProduct].[Product].members ,not(isempty([Measures].[AllClientPeriodSums])))) ,{[Measures].[SumPeriod],[Measures].[SumPeriod Y-1]}) } on columns, { Filter([DimClient].members,([Measures].[SumPeriod]<>0 or [Measures].[SumPeriod Y-1]<>0)) } on rows from TheCube where ([DimTime].[2004],[DimSomeOtherDimension].[SomeMember]) The above in words: - Filter all Products and SubProducts for which the "All" level in DimClient has a non-empty value for the sums over the sets SelectedMonths or PrevSelectedMonths. If that value is non-empty, at least 1 client has a value for that (sub)product, so that (sub)product must be retained. - Filter all Clients based on the test if the sums are <> 0. (this should be: not(isempty(SumPeriod) or isempty(SumPeriod Y-1)), but due to extra requirements of our users it is implemented like this). MDX 2 - Selection based on NonEmptyCrossjoin() ---------------------------------------------- with set [SelectedMonths] as '[DimTime].&[2004_01]:[DimTime].&[2004_08]' set [PrevSelectedMonths] as 'ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_01]):ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_08])' member [Measures].[SumPeriod] as 'sum([SelectedMonths], [Measures].[Amount])' member [Measures].[SumPeriod Y-1] as 'sum([PrevSelectedMonths], [Measures].[Amount])' set [y-axis] as 'Hierarchize( Extract( NonEmptyCrossjoin([SelectedMonths]+[PrevSelectedMonths],[DimClient].Members,{[Measures].[Amount]},2) ,[DimClient]))' set [x-axis-products] as 'Hierarchize( Generate( Extract( NonEmptyCrossjoin([SelectedMonths]+[PrevSelectedMonths],[DimProduct].[SubProduct].members,{[Measures].[Amount]},2) ,[DimProduct]) ,{[DimProduct].CurrentMember,[DimProduct].CurrentMember.Parent}))' select { Crossjoin([x-axis-products], {[Measures].[SumPeriod], [Measures].[SumPeriod Y-1]}) } on columns, { [y-axis] } on rows from TheCube where ([DimSomeOtherDimension].[SomeMember]) The above in words: - Find non-empty crossings between the period sets and clients based on the Amount measure. From this set, extract the Clients and hierarchize. These are the clients that go on the Y-axis of the report. - Find non-empty crossings between the period sets and subproducts based on the Amount measure. Extract the Product members from that set. Add the parents, i.e. the Products and hierarchize. These are the products that will go on the X-axis of the report. Questions --------- 1) Both solutions are rather slow. Is there an MDX 3 I should implement instead ? I there a better approach ? I am rather new to MDX and I find it dazzling at times... 2) For some slicer elements (the [DimSomeOtherDimension].[SomeMember] above) the report is up to 60% slower than without any slicer element at all! Without the where-clause in the MDX the report executes faster than for MDX with a where-clause that limits the data to some 70%. What is happening ?? 3) Is there a way to incorporate via cube-design a better/faster solution ? Keep in mind that both start and end months are variables. 4) [May be unrelated to this particular report] I noticed that even on a medium sized desktop pc that runs Analysis Services locally the performance is way better than in a client/server setup with a high-end desktop and a high-end server... This has me baffled. The network usage is low and never a bottleneck. The server is dedicated and there are no other users executing reports. We use Excel VBA with ADO MD to get to our data. I did follow all steps in the Microsoft whitepaper on Analysis Services performance. Many, many thanks for reading up to this point! Many more for all helpful information and remarks. Tom |
#9
| |||
| |||
|
#10
| |||
| |||
|
|
Hi Tom, I'm glad to hear we're making progress...! First of all, regarding the aggregates, the fact that performance didn't increase very much when you built the new ones only strengthens my opinion that your main problem is the amount of summing up you need to do. The fact that you are generating thousands of subqueries is also significant - if you haven't tried these properties already in your connection string, try using the following together or separately: Cache Ratio=0.01 Cache Ratio2=0.01 Cache Policy = 5 or 6 or 7 (in theory 7 should be the one that makes the difference) I've found that various combinations of these reduce the number of subqueries sent to the server, and therefore speed up overall query performance. And also, as I think I said in an earlier mail, make sure you have turned off query logging on the server - with this many subqueries, logging every single one can introduce a fairly big overhead. Regarding the extra levels, the reason I proposed two hidden levels rather than a single quarter level between month and year is that it should increase the opportunities you have for optimisation - you have two sets of aggregations you could hit instead of one. But this is something you'll need to test yourself to see if the impact is worth the extra work. Last of all, partitioning should always have positive impact on this type of query, especially if you are partitioning by month. By setting the data slice property on a partition in your scenario, you're saying to the AS engine that instead of having to search through the whole cube for data for the month January 1999 (for example), it can go directly to the partition you have specified. Therefore the amount of time actually taken to read the data off disk is greatly reduced. There are two possible reasons why you're seeing a performance decrease, in my opinion: 1) You haven't set the data slice property properly (although I'm sure you have!) 2) When you reprocessed the cube to include the new partitions, and reran your test queries, the cache on the server was empty (as a result of the reprocess) and you were comparing this performance with queries run on your old cube when the cache was pre-populated with useful data. What I generally do is stop and start the AS service and close and reopen the client connection before each query run when I'm doing performance testing to try to reduce the impact of caching as much as possible. However, as we already established when looking at your aggregations, it's the summing up of data and not the reading from disk that is your problem, so I would imagine that any benefit partitioning will bring you will be relatively minor. I would definitely keep it in your cube design though. Regards, Chris |
![]() |
| Thread Tools | |
| Display Modes | |
| |