dbTalk Databases Forums  

better formula for DCount....

microsoft.public.sqlserver.olap microsoft.public.sqlserver.olap


Discuss better formula for DCount.... in the microsoft.public.sqlserver.olap forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Jéjé
 
Posts: n/a

Default better formula for DCount.... - 11-22-2004 , 09:55 PM






Hi,



what is the better formula:

1. count(nonemptycrossjoin([Customers].[CustomerKey].members,
{[Measures].[Sales]}))

2. Count(crossjoin({[Measures].[Sales]},
descendants([Customers].Currentmember, [Customers].[CustomerKey])),
excludeempty)

3. count(filter([Customers].CustomerKey.members, not
isempty([Measures].[Sales])))

4. count(filter(descendants([Customers].Currentmember,
[Customers].[CustomerKey]), [Measures].[Sales]>0))



My "customers" dimension is hidden for the end user (but I use a lot of
virtual dimensions like gender...)

For the moment I've 20 000 members in this dimension, but I plan 100 000
members.

Also, how to display a NULL value instead-of a 0 when I've no sales?

these 4 formulas always display 0 (and the "non-empty behavior" is set to
"Sales")

My customers dimension is optimized at the lowest level only

For the moment, the formula n.1 appear to have the better performance.



thanks for your feedback.



Reply With Quote
  #2  
Old   
Deepak Puri
 
Posts: n/a

Default Re: better formula for DCount.... - 11-23-2004 , 12:40 AM






To get NULL when count is 0, can't you wrap it in iif()?


NonEmptyCrossJoin() (ie. option 1) will usually be faster with sparse
data. This article by Russ Whitney explains:

http://www.winnetmag.net/Articles/Ar...00/pg/2/2.html
Quote:
...
I was recently involved in a customer deployment in which the
NONEMPTYCROSSJOIN() function changed a query's performance from 15
minutes to about 3 seconds. That's compelling!
...
But what actually happens is that Analysis Services retrieves the
non-empty portion of the cube for California, then cross-joins the
products and sales representatives that exist in that portion of the
cube. Performing the NONEMPTYCROSSJOIN() function in this order gives
the function a significant performance advantage over filtering the
items after the cross join. Because cubes are generally sparse, Analysis
Services stores only the non-null values; and when the client-side
driver, PivotTable Service, requests data from the server, it receives
only non-null blocks of cube data. NONEMPTYCROSSJOIN() takes advantage
of this method of storing and communicating data in Analysis Services.
If you think about it, creating a long list of empty entries with the
CROSSJOIN() function only to turn around and eliminate them with the
FILTER() function is a giant waste of CPU time.
...
Quote:

- Deepak

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!


Reply With Quote
  #3  
Old   
Jéjé
 
Posts: n/a

Default Re: better formula for DCount.... - 11-23-2004 , 07:47 AM



the nonempty appear to works fine...
but I have some performance problems when I've calculated measures which
used this DCount calculated measure !

I work on it...

"Deepak Puri" <deepak_puri (AT) progressive (DOT) com> wrote

Quote:
To get NULL when count is 0, can't you wrap it in iif()?


NonEmptyCrossJoin() (ie. option 1) will usually be faster with sparse
data. This article by Russ Whitney explains:

http://www.winnetmag.net/Articles/Ar...00/pg/2/2.html

..
I was recently involved in a customer deployment in which the
NONEMPTYCROSSJOIN() function changed a query's performance from 15
minutes to about 3 seconds. That's compelling!
..
But what actually happens is that Analysis Services retrieves the
non-empty portion of the cube for California, then cross-joins the
products and sales representatives that exist in that portion of the
cube. Performing the NONEMPTYCROSSJOIN() function in this order gives
the function a significant performance advantage over filtering the
items after the cross join. Because cubes are generally sparse, Analysis
Services stores only the non-null values; and when the client-side
driver, PivotTable Service, requests data from the server, it receives
only non-null blocks of cube data. NONEMPTYCROSSJOIN() takes advantage
of this method of storing and communicating data in Analysis Services.
If you think about it, creating a long list of empty entries with the
CROSSJOIN() function only to turn around and eliminate them with the
FILTER() function is a giant waste of CPU time.
..



- Deepak

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!



Reply With Quote
  #4  
Old   
Chris Webb
 
Posts: n/a

Default Re: better formula for DCount.... - 11-23-2004 , 10:07 AM



When you're doing this kind of distinct count on large dimension one of the
causes of performance problems can be the size of the dimension itself. It
might be worth checking out this thread
http://groups.google.co.uk/groups?hl...40dbforums.com

It shows how you can split a large customer dimension up into several much
smaller ones by splitting the key value - in your case you could create four
dimensions of ten members each by decomposing the key into 10s, 100s, 1000s
and 10000s. Your calculated member would therefore change from being
COUNT(NONEMPTYCROSSJOIN(CUSTOMERS.CUSTOMERKEY.MEMB ERS))
to something like
COUNT(NONEMPTYCROSSJOIN(CUSTOMERS1.CUSTOMERKEY1.ME MBERS,
CUSTOMERS2.CUSTOMERKEY2.MEMBERS, CUSTOMERS3.CUSTOMERKEY3.MEMBERS,
CUSTOMERS4.CUSTOMERKEY4.MEMBERS))
Since each of these dimensions has only a small number of members,
performance is likely to improve because less memory is needed to store the
dimension on the server, and there is no longer any risk of having to
download a large dimension to the client. You aren't showing your customer
dimension to your users so it won't matter to them that it doesn't exist any
more!

That said, your partition and aggregation strategies are also going to be
important here too...

HTH,

Chris



"Jéjé" wrote:

Quote:
the nonempty appear to works fine...
but I have some performance problems when I've calculated measures which
used this DCount calculated measure !

I work on it...

"Deepak Puri" <deepak_puri (AT) progressive (DOT) com> wrote in message
news:OSJjFdS0EHA.1568 (AT) tk2msftngp13 (DOT) phx.gbl...
To get NULL when count is 0, can't you wrap it in iif()?


NonEmptyCrossJoin() (ie. option 1) will usually be faster with sparse
data. This article by Russ Whitney explains:

http://www.winnetmag.net/Articles/Ar...00/pg/2/2.html

..
I was recently involved in a customer deployment in which the
NONEMPTYCROSSJOIN() function changed a query's performance from 15
minutes to about 3 seconds. That's compelling!
..
But what actually happens is that Analysis Services retrieves the
non-empty portion of the cube for California, then cross-joins the
products and sales representatives that exist in that portion of the
cube. Performing the NONEMPTYCROSSJOIN() function in this order gives
the function a significant performance advantage over filtering the
items after the cross join. Because cubes are generally sparse, Analysis
Services stores only the non-null values; and when the client-side
driver, PivotTable Service, requests data from the server, it receives
only non-null blocks of cube data. NONEMPTYCROSSJOIN() takes advantage
of this method of storing and communicating data in Analysis Services.
If you think about it, creating a long list of empty entries with the
CROSSJOIN() function only to turn around and eliminate them with the
FILTER() function is a giant waste of CPU time.
..



- Deepak

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!




Reply With Quote
  #5  
Old   
Jéjé
 
Posts: n/a

Default Re: better formula for DCount.... - 11-23-2004 , 12:03 PM



good idea

but, what's appends if I'm using virtual dimensions based on my customers
dimension?
For example, I've the customer gender.
Because I'm using only 1 dimension for the moment, I've no problems.
My customers dimension is hidden, and I only use virtual dimensions. (but I
count customers)

in the case of splitting into 3 columns (and 3 dimensions) I think I have to
return the Gender within the Fact view and I have to create a real dimension
not a virtual one.


"Chris Webb" <OnlyForPostingToNewsgroups (AT) hotmail (DOT) com> wrote

Quote:
When you're doing this kind of distinct count on large dimension one of
the
causes of performance problems can be the size of the dimension itself. It
might be worth checking out this thread:
http://groups.google.co.uk/groups?hl...40dbforums.com

It shows how you can split a large customer dimension up into several much
smaller ones by splitting the key value - in your case you could create
four
dimensions of ten members each by decomposing the key into 10s, 100s,
1000s
and 10000s. Your calculated member would therefore change from being
COUNT(NONEMPTYCROSSJOIN(CUSTOMERS.CUSTOMERKEY.MEMB ERS))
to something like
COUNT(NONEMPTYCROSSJOIN(CUSTOMERS1.CUSTOMERKEY1.ME MBERS,
CUSTOMERS2.CUSTOMERKEY2.MEMBERS, CUSTOMERS3.CUSTOMERKEY3.MEMBERS,
CUSTOMERS4.CUSTOMERKEY4.MEMBERS))
Since each of these dimensions has only a small number of members,
performance is likely to improve because less memory is needed to store
the
dimension on the server, and there is no longer any risk of having to
download a large dimension to the client. You aren't showing your customer
dimension to your users so it won't matter to them that it doesn't exist
any
more!

That said, your partition and aggregation strategies are also going to be
important here too...

HTH,

Chris



"Jéjé" wrote:

the nonempty appear to works fine...
but I have some performance problems when I've calculated measures which
used this DCount calculated measure !

I work on it...

"Deepak Puri" <deepak_puri (AT) progressive (DOT) com> wrote in message
news:OSJjFdS0EHA.1568 (AT) tk2msftngp13 (DOT) phx.gbl...
To get NULL when count is 0, can't you wrap it in iif()?


NonEmptyCrossJoin() (ie. option 1) will usually be faster with sparse
data. This article by Russ Whitney explains:

http://www.winnetmag.net/Articles/Ar...00/pg/2/2.html

..
I was recently involved in a customer deployment in which the
NONEMPTYCROSSJOIN() function changed a query's performance from 15
minutes to about 3 seconds. That's compelling!
..
But what actually happens is that Analysis Services retrieves the
non-empty portion of the cube for California, then cross-joins the
products and sales representatives that exist in that portion of the
cube. Performing the NONEMPTYCROSSJOIN() function in this order gives
the function a significant performance advantage over filtering the
items after the cross join. Because cubes are generally sparse,
Analysis
Services stores only the non-null values; and when the client-side
driver, PivotTable Service, requests data from the server, it receives
only non-null blocks of cube data. NONEMPTYCROSSJOIN() takes advantage
of this method of storing and communicating data in Analysis Services.
If you think about it, creating a long list of empty entries with the
CROSSJOIN() function only to turn around and eliminate them with the
FILTER() function is a giant waste of CPU time.
..



- Deepak

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!






Reply With Quote
  #6  
Old   
Deepak Puri
 
Posts: n/a

Default Re: better formula for DCount.... - 11-23-2004 , 10:30 PM



Another old SQL OLAP thread you can consider, if your customer dimension
has multiple levels, is entitled: "Working with big number of members in
one dimention". In that thread, NonEmptyCrossJoin() descends 1 level at
a time using Generate(), rather than being applied directly to the leaf
level. This is efficient when higher nodes can be entirely eliminated
without expansion:

http://groups.google.com/groups?hl=e...DHA.452%40TK2M
SFTNGP11.phx.gbl&rnum=9&prev=/groups%3Fq%3Dgroup:microsoft.public.sqlser
ver.olap%2Bauthoruri%26hl%3Den%26lr%3D%26scoring %3Dd%26as_drrb%3Db%26a
s_mind%3D1%26as_minm%3D3%26as_miny%3D2003%26as_max d%3D31%26as_maxm%3D6%2
6as_maxy%3D2003%26selm%3DuNGdGNCMDHA.452%2540TK2MS FTNGP11.phx.gbl%26rnum
%3D9


- Deepak

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!

Reply With Quote
  #7  
Old   
Chris Webb
 
Posts: n/a

Default Re: better formula for DCount.... - 11-24-2004 , 03:45 AM



Yes, as you've realised, this approach would mean quite a bit of remodelling
and your virtual dimensions would need to be converted to regular dimensions
hanging directly off the fact table.

Chris

"Jéjé" wrote:

Quote:
good idea

but, what's appends if I'm using virtual dimensions based on my customers
dimension?
For example, I've the customer gender.
Because I'm using only 1 dimension for the moment, I've no problems.
My customers dimension is hidden, and I only use virtual dimensions. (but I
count customers)

in the case of splitting into 3 columns (and 3 dimensions) I think I have to
return the Gender within the Fact view and I have to create a real dimension
not a virtual one.


"Chris Webb" <OnlyForPostingToNewsgroups (AT) hotmail (DOT) com> wrote in message
news:A4DA1C37-B0CA-48AE-96C6-B857B56D6344 (AT) microsoft (DOT) com...
When you're doing this kind of distinct count on large dimension one of
the
causes of performance problems can be the size of the dimension itself. It
might be worth checking out this thread:
http://groups.google.co.uk/groups?hl...40dbforums.com

It shows how you can split a large customer dimension up into several much
smaller ones by splitting the key value - in your case you could create
four
dimensions of ten members each by decomposing the key into 10s, 100s,
1000s
and 10000s. Your calculated member would therefore change from being
COUNT(NONEMPTYCROSSJOIN(CUSTOMERS.CUSTOMERKEY.MEMB ERS))
to something like
COUNT(NONEMPTYCROSSJOIN(CUSTOMERS1.CUSTOMERKEY1.ME MBERS,
CUSTOMERS2.CUSTOMERKEY2.MEMBERS, CUSTOMERS3.CUSTOMERKEY3.MEMBERS,
CUSTOMERS4.CUSTOMERKEY4.MEMBERS))
Since each of these dimensions has only a small number of members,
performance is likely to improve because less memory is needed to store
the
dimension on the server, and there is no longer any risk of having to
download a large dimension to the client. You aren't showing your customer
dimension to your users so it won't matter to them that it doesn't exist
any
more!

That said, your partition and aggregation strategies are also going to be
important here too...

HTH,

Chris



"Jéjé" wrote:

the nonempty appear to works fine...
but I have some performance problems when I've calculated measures which
used this DCount calculated measure !

I work on it...

"Deepak Puri" <deepak_puri (AT) progressive (DOT) com> wrote in message
news:OSJjFdS0EHA.1568 (AT) tk2msftngp13 (DOT) phx.gbl...
To get NULL when count is 0, can't you wrap it in iif()?


NonEmptyCrossJoin() (ie. option 1) will usually be faster with sparse
data. This article by Russ Whitney explains:

http://www.winnetmag.net/Articles/Ar...00/pg/2/2.html

..
I was recently involved in a customer deployment in which the
NONEMPTYCROSSJOIN() function changed a query's performance from 15
minutes to about 3 seconds. That's compelling!
..
But what actually happens is that Analysis Services retrieves the
non-empty portion of the cube for California, then cross-joins the
products and sales representatives that exist in that portion of the
cube. Performing the NONEMPTYCROSSJOIN() function in this order gives
the function a significant performance advantage over filtering the
items after the cross join. Because cubes are generally sparse,
Analysis
Services stores only the non-null values; and when the client-side
driver, PivotTable Service, requests data from the server, it receives
only non-null blocks of cube data. NONEMPTYCROSSJOIN() takes advantage
of this method of storing and communicating data in Analysis Services.
If you think about it, creating a long list of empty entries with the
CROSSJOIN() function only to turn around and eliminate them with the
FILTER() function is a giant waste of CPU time.
..



- Deepak

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!







Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.