dbTalk Databases Forums  

MDX Rank function - how to use it efficiently?

microsoft.public.sqlserver.olap microsoft.public.sqlserver.olap


Discuss MDX Rank function - how to use it efficiently? in the microsoft.public.sqlserver.olap forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Stan Kondrat
 
Posts: n/a

Default MDX Rank function - how to use it efficiently? - 12-13-2004 , 12:37 PM






Hi,

I'm running the sample query (see below) on the Foodmart 2000 Sales cube to
rank a couple of measures over a large set of customers by Gender and
Product. The query below gives me correct results but is very slow.

Is there a more efficient way to produce the rankings?
Is the Rank function not meant to be used over a large set?

If I don't use the Order function with the second argument of the Rank
function, then the results are incorrect (contrary to MDX documentation)
always showing the ordinal in the set, not dependent on calculated expression.

Here is my sample query:

WITH
SET [RowSet0] AS '{[Product].[All Products].[Drink],[Product].[All
Products].[Food]}'
SET [RowSet1] AS 'Descendants([Customers].[All Customers].[USA])'
SET [ColumnSet0] AS '{[Gender].[All Gender].[F],[Gender].[All
Gender].[M]}'
SET [ColumnSet1] AS '{[Measures].[Store Cost],[Measures].[Store Sales]}'
MEMBER [Measures].[Rank0] AS 'Rank([Customers].CurrentMember,
Order({[RowSet1]},
([Product].CurrentMember,[Gender].CurrentMember,[Measures].[Store
Cost]),BDESC),
([Product].CurrentMember,[Gender].CurrentMember,[Measures].[Store
Cost]))'
MEMBER [Measures].[Rank1] AS 'Rank([Customers].CurrentMember,
Order({[RowSet1]},
([Product].CurrentMember,[Gender].CurrentMember,[Measures].[Store
Sales]),BDESC),
([Product].CurrentMember,[Gender].CurrentMember,[Measures].[Store
Sales]))'
SET [RankSet] AS
'{Subset([ColumnSet1],0,1),[Measures].[Rank0],Subset([ColumnSet1],1,1),
[Measures].[Rank1],Subset([ColumnSet1],2)}'
SET [RowSet] AS '[RowSet0] * [RowSet1]'
SET [ColumnSet] AS '[ColumnSet0] * [RankSet]'
SELECT
Subset([RowSet], 0, 100) ON ROWS,
[ColumnSet] ON COLUMNS
FROM [Sales]

Any suggestions?

Thanks,
Stan

Reply With Quote
  #2  
Old   
Deepak Puri
 
Posts: n/a

Default Re: MDX Rank function - how to use it efficiently? - 12-13-2004 , 06:34 PM






Using NonEmptyCrossJoin() when ranking seems to reduce query time by
about 50% (still slow, though). That may be because each customer falls
into just 1 of 2 genders:

Quote:
WITH
SET [RowSet0] AS '{[Product].[All Products].[Drink],[Product].[All
Products].[Food]}'
SET [RowSet1] AS 'Descendants([Customers].[All Customers].[USA])'
SET [ColumnSet0] AS '{[Gender].[All Gender].[F],[Gender].[All
Gender].[M]}'
SET [ColumnSet1] AS '{[Measures].[Store Cost],[Measures].[Store
Sales]}'
MEMBER [Measures].[Rank0] AS 'iif(IsEmpty([Measures].[Store Cost]),
NonEmptyCrossJoin([RowSet1], {[Product].CurrentMember},
{[Gender].CurrentMember}, 1).Count + 1,
Rank([Customers].CurrentMember, Order(
NonEmptyCrossJoin([RowSet1], {[Product].CurrentMember},
{[Gender].CurrentMember}, 1),
[Measures].[Store Cost],BDESC)))'
MEMBER [Measures].[Rank1] AS 'iif(IsEmpty([Measures].[Store Sales]),
NonEmptyCrossJoin([RowSet1], {[Product].CurrentMember},
{[Gender].CurrentMember}, 1).Count + 1,
Rank([Customers].CurrentMember, Order(
NonEmptyCrossJoin([RowSet1], {[Product].CurrentMember},
{[Gender].CurrentMember}, 1),
[Measures].[Store Sales],BDESC)))'
SET [RankSet] AS
'{Subset([ColumnSet1],0,1),[Measures].[Rank0],Subset([ColumnSet1],1,1),
[Measures].[Rank1],Subset([ColumnSet1],2)}'
SET [RowSet] AS '[RowSet0] * [RowSet1]'
SET [ColumnSet] AS '[ColumnSet0] * [RankSet]'
SELECT
Subset([RowSet], 0, 100) ON ROWS,
[ColumnSet] ON COLUMNS
FROM [Sales]
Quote:

- Deepak

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!


Reply With Quote
  #3  
Old   
Stan Kondrat
 
Posts: n/a

Default Re: MDX Rank function - how to use it efficiently? - 12-14-2004 , 09:47 AM



Thanks Deepak, but this suggestion doesn't really help much. In the real
application I have to deal with various calculated members, so
NonEmptyCrossJoin is not an option.

I think the slowness is caused by re-ordering the set for each ranked cell
(because the Order function is the calculated member formula). For simpler
cases, where the sorted sets can be generated in advance before ranking, the
performance is not bad, as in the following example ("TopCount" seems to be
faster than "order"):

WITH
SET [RowSet0] AS 'Descendants([Customers].[USA])'
SET [ColumnSet0] AS '{[Measures].[Store Cost],[Measures].[Store Sales]}'
SET [RankSet0] AS 'TopCount([RowSet0],Count([RowSet0]),
([Measures].[Store Cost]))'
SET [RankSet1] AS 'TopCount([RowSet0],Count([RowSet0]),
([Measures].[Store Sales]))'
MEMBER [Measures].[Rank0] AS 'Rank(
[Customers].CurrentMember,
[RankSet0])'
MEMBER [Measures].[Rank1] AS 'Rank([Customers].CurrentMember,
[RankSet1],
([Measures].[Store Sales]))'
SET [RankSet] AS
'{Subset([ColumnSet0],0,1),[Measures].[Rank0],Subset([ColumnSet0],1,1),
[Measures].[Rank1],Subset([ColumnSet0],2)}'
SET [RowSet] AS '[RowSet0]'
SET [ColumnSet] AS '[RankSet]'
SELECT
Subset( [RowSet],0,2000) ON ROWS,
[ColumnSet] ON COLUMNS
FROM [Sales]

Is there a way to somehow cache the sorted sets (i.e. not re-sort on every
cell) in more complicated cases (where the number of unique sorted sets is
too large to pre-define)?

Thanks,
Stan
"Deepak Puri" wrote:

Quote:
Using NonEmptyCrossJoin() when ranking seems to reduce query time by
about 50% (still slow, though). That may be because each customer falls
into just 1 of 2 genders:


WITH
SET [RowSet0] AS '{[Product].[All Products].[Drink],[Product].[All
Products].[Food]}'
SET [RowSet1] AS 'Descendants([Customers].[All Customers].[USA])'
SET [ColumnSet0] AS '{[Gender].[All Gender].[F],[Gender].[All
Gender].[M]}'
SET [ColumnSet1] AS '{[Measures].[Store Cost],[Measures].[Store
Sales]}'
MEMBER [Measures].[Rank0] AS 'iif(IsEmpty([Measures].[Store Cost]),
NonEmptyCrossJoin([RowSet1], {[Product].CurrentMember},
{[Gender].CurrentMember}, 1).Count + 1,
Rank([Customers].CurrentMember, Order(
NonEmptyCrossJoin([RowSet1], {[Product].CurrentMember},
{[Gender].CurrentMember}, 1),
[Measures].[Store Cost],BDESC)))'
MEMBER [Measures].[Rank1] AS 'iif(IsEmpty([Measures].[Store Sales]),
NonEmptyCrossJoin([RowSet1], {[Product].CurrentMember},
{[Gender].CurrentMember}, 1).Count + 1,
Rank([Customers].CurrentMember, Order(
NonEmptyCrossJoin([RowSet1], {[Product].CurrentMember},
{[Gender].CurrentMember}, 1),
[Measures].[Store Sales],BDESC)))'
SET [RankSet] AS
'{Subset([ColumnSet1],0,1),[Measures].[Rank0],Subset([ColumnSet1],1,1),
[Measures].[Rank1],Subset([ColumnSet1],2)}'
SET [RowSet] AS '[RowSet0] * [RowSet1]'
SET [ColumnSet] AS '[ColumnSet0] * [RankSet]'
SELECT
Subset([RowSet], 0, 100) ON ROWS,
[ColumnSet] ON COLUMNS
FROM [Sales]



- Deepak

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!


Reply With Quote
  #4  
Old   
Deepak Puri
 
Posts: n/a

Default Re: MDX Rank function - how to use it efficiently? - 12-14-2004 , 12:59 PM



Hi Stan,

Interesting question about whether the ranking sets can be cached - I
haven't done something similar so far, but maybe others on the newsgroup
have?

I'm puzzled by why TopCount(), as you used it, would out-perform
Order(), since it's ordering the same number of members - doing the same
thing.

When dealing with calculated measures, it's sometimes still possible to
use NonEmptyCrossJoin() for filtering. For example, if there is a
calculated average that is null whenever a base sum measure is, then
NECJ() can be used to filter out tuples of a set that have a null
average.


- Deepak

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!

Reply With Quote
  #5  
Old   
Mosha Pasumansky [MS]
 
Posts: n/a

Default Re: MDX Rank function - how to use it efficiently? - 12-14-2004 , 06:43 PM



Quote:
I'm puzzled by why TopCount(), as you used it, would out-perform
Order(), since it's ordering the same number of members - doing the same
thing.
Actually no - algorithm for TopCount doesn't need to order the set, but
Order needs to. So Order is nlog(n), while TopCount is only nlog(k) which is
for small k's is same as n.

--
==================================================
Mosha Pasumansky - http://www.mosha.com/msolap
Analysis Services blog at http://www.sqljunkies.com/WebLog/mosha
Development Lead in the Analysis Server team
All you need is love (John Lennon)
Disclaimer : This posting is provided "AS IS" with no warranties, and
confers no rights.
==================================================

"Deepak Puri" <deepak_puri (AT) progressive (DOT) com> wrote

Quote:
Hi Stan,

Interesting question about whether the ranking sets can be cached - I
haven't done something similar so far, but maybe others on the newsgroup
have?

I'm puzzled by why TopCount(), as you used it, would out-perform
Order(), since it's ordering the same number of members - doing the same
thing.

When dealing with calculated measures, it's sometimes still possible to
use NonEmptyCrossJoin() for filtering. For example, if there is a
calculated average that is null whenever a base sum measure is, then
NECJ() can be used to filter out tuples of a set that have a null
average.


- Deepak

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!



Reply With Quote
  #6  
Old   
Deepak Puri
 
Posts: n/a

Default Re: MDX Rank function - how to use it efficiently? - 12-14-2004 , 09:09 PM



Hi Mosha,

Thanks for the insight into Order() vs TopCount() complexity - just
wanted to clarify whether 'k' is the number of values (Top-k) returned
by TopCount(), or some other range of key values. In the former case,
wouldn't an expression like this be O(n log n), since k = n?

'TopCount([RowSet0],Count([RowSet0]),
([Measures].[Store Cost]))'


- Deepak

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!

Reply With Quote
  #7  
Old   
Deepak Puri
 
Posts: n/a

Default Re: MDX Rank function - how to use it efficiently? - 12-15-2004 , 12:14 AM



Hi Stan,

Based on the performance improvement you observed by pre-computing
relevant ordered sets, I came up with a "cached" set that works with
your original MDX query. All the ordered sets are saved in a single set,
by tagging each member with a qualifying "tuple" (in this case, a
combination of Product/Gender/Measure). So the Customer set: [RowSet1]
is re-ordered and stored 8 times. The query performance now seems more
acceptable:

Quote:
WITH
SET [RowSet0] AS '{[Product].[All Products].[Drink],[Product].[All
Products].[Food]}'
SET [RowSet1] AS 'Descendants([Customers].[All Customers].[USA])'
SET [ColumnSet0] AS '{[Gender].[All Gender].[F],[Gender].[All
Gender].[M]}'
SET [ColumnSet1] AS '{[Measures].[Store Cost],[Measures].[Store
Sales]}'
Set [VariableSet] as '[RowSet0] * [ColumnSet0] * [ColumnSet1]'
Set [CachedSet] as 'Generate([VariableSet], Order([RowSet1]
* {[VariableSet].Current}, Measures.CurrentMember, BDESC))'
MEMBER [Measures].[RawRank0] AS 'Rank(([Customers].CurrentMember,
[Product].CurrentMember, [Gender].CurrentMember, [Measures].[Store
Cost]),
[CachedSet], [Measures].[Store Cost])'
Member [Measures].[Rank0] as '[Measures].[RawRank0] -
(Int([Measures].[RawRank0]
/[RowSet1].Count) * [RowSet1].Count)'
MEMBER [Measures].[RawRank1] AS 'Rank(([Customers].CurrentMember,
[Product].CurrentMember, [Gender].CurrentMember, [Measures].[Store
Sales]),
[CachedSet], [Measures].[Store Sales])'
Member [Measures].[Rank1] as '[Measures].[RawRank1] -
(Int([Measures].[RawRank1]
/[RowSet1].Count) * [RowSet1].Count)'
SET [RankSet] AS
'{Subset([ColumnSet1],0,1),[Measures].[Rank0],Subset([ColumnSet1],1,1),
[Measures].[Rank1],Subset([ColumnSet1],2)}'
SET [RowSet] AS '[RowSet0] * [RowSet1]'
SET [ColumnSet] AS '[ColumnSet0] * [RankSet]'
SELECT
Subset([RowSet], 0, 100) ON ROWS,
[ColumnSet] ON COLUMNS
FROM [Sales]
Quote:

When retrieving the rank of the current Customer member, it is again
"tagged" with the correct tuple. But the rank has then to be normalized
by Mod (Customer count), because of the multiple Customer copies in the
"cache".

AFAIK, there's no built-in Mod function in MDX (I hope Yukon will
address this lacuna!), so I resorted to some kludgy integer arithmetic
to achieve it.


- Deepak

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!


Reply With Quote
  #8  
Old   
Mosha Pasumansky [MS]
 
Posts: n/a

Default Re: MDX Rank function - how to use it efficiently? - 12-15-2004 , 12:14 AM



Quote:
'TopCount([RowSet0],Count([RowSet0]), ([Measures].[Store Cost]))'
Yes, in this case k=n, however it is not clear what Stan refered to when he
said that here TopCount also beats Order, because in the example using
TopCount, the RankSet is generated as static set instead of being
dynalically built inside calculated member.
Even when both Order and TopCount are used in static sets, still TopCount
may beat Order, because Order usually uses variation of QuickSort, which has
nlog(n) as an average case, but may degrade to n^2 in some cases. TopCount
uses variation of priority queue sorting, which, BTW, also can degrade to
n^2 when k=n, it just that on some given dataset itmight have behaved
better.

--
==================================================
Mosha Pasumansky - http://www.mosha.com/msolap
Analysis Services blog at http://www.sqljunkies.com/WebLog/mosha
Development Lead in the Analysis Server team
All you need is love (John Lennon)
Disclaimer : This posting is provided "AS IS" with no warranties, and
confers no rights.
==================================================




Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.