dbTalk Databases Forums  

Performance Issue (Many boolean dimensions)

microsoft.public.sqlserver.olap microsoft.public.sqlserver.olap


Discuss Performance Issue (Many boolean dimensions) in the microsoft.public.sqlserver.olap forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Robhar
 
Posts: n/a

Default Performance Issue (Many boolean dimensions) - 03-13-2006 , 11:08 AM






Hello All

I'm having a performance issue with numerous boolean dimensions. I have a
cube that needs to have 21 true / false dimensions (The cube is to track
pesticide applicator certifications).

Number of rows is relatively small (45,000) but when it tries to calculate
the aggregates it just goes bye-bye. I've had large dimensions without
problems but booleans have allways been a problem.

Is there a way to handle large numbers of boolean dimensions? Is it any
better in 2005? Any help is appreciated.

Robhar

Reply With Quote
  #2  
Old   
T.K. Anand [MSFT]
 
Posts: n/a

Default Re: Performance Issue (Many boolean dimensions) - 03-13-2006 , 07:23 PM






What exactly do you mean by "it tries to calculate the aggregates it just
goes bye-bye"? Did you use the aggregation designer to build aggs? How many
aggs? Does the cube process fine with no aggregations?

With regard to boolean dimensions, 2005 is pretty similar to 2000.

--
This posting is provided "AS IS" with no warranties, and confers no rights
Please do not send email directly to this alias. This alias is for newsgroup
purposes only.


"Robhar" <Robhar (AT) discussions (DOT) microsoft.com> wrote

Quote:
Hello All

I'm having a performance issue with numerous boolean dimensions. I have a
cube that needs to have 21 true / false dimensions (The cube is to track
pesticide applicator certifications).

Number of rows is relatively small (45,000) but when it tries to calculate
the aggregates it just goes bye-bye. I've had large dimensions without
problems but booleans have allways been a problem.

Is there a way to handle large numbers of boolean dimensions? Is it any
better in 2005? Any help is appreciated.

Robhar



Reply With Quote
  #3  
Old   
Vladimir Chtepa
 
Posts: n/a

Default Re: Performance Issue (Many boolean dimensions) - 03-14-2006 , 09:26 AM



Hi,

Try to conbine your 21 bool dimensions in one dimenson, that key is int (32
bit suffices for 32 bit dimensions) or long (64 bit), and make your original
dimensions as attruibutes of new one,

Vladimir Chtepa

"Robhar" <Robhar (AT) discussions (DOT) microsoft.com> schrieb im Newsbeitrag
news:C76D5ADD-BA8D-4A58-9172-0990757B8B67 (AT) microsoft (DOT) com...
Quote:
Hello All

I'm having a performance issue with numerous boolean dimensions. I have a
cube that needs to have 21 true / false dimensions (The cube is to track
pesticide applicator certifications).

Number of rows is relatively small (45,000) but when it tries to calculate
the aggregates it just goes bye-bye. I've had large dimensions without
problems but booleans have allways been a problem.

Is there a way to handle large numbers of boolean dimensions? Is it any
better in 2005? Any help is appreciated.

Robhar



Reply With Quote
  #4  
Old   
Robhar
 
Posts: n/a

Default Re: Performance Issue (Many boolean dimensions) - 03-14-2006 , 10:07 AM



I thought of that but I'm unsure if it would work and still let them do the
analysis they want.

When I count the rows in the certifications table there are only 889 of
them. Of these only 118 distinct combinations.

I thought of using the certification combinations as the lookup but in the
analysis they want to just pull out record counts for one particular
certification or combination of any number of certifications. I've asked for
more info from the users but no response yet on what certification
combinations they would want. I was hoping I could drop some or come up with
logical groupings.

They goal is to generate KPI measures for inspections. They are trying to
measure results for industry sectors of pesticide applicators. If they can
show that one area is doing poorly then they can target that one area. The
certifications is how they know what that business is allowed to do and under
what conditions.

Rob

"Vladimir Chtepa" wrote:

Quote:
Hi,

Try to conbine your 21 bool dimensions in one dimenson, that key is int (32
bit suffices for 32 bit dimensions) or long (64 bit), and make your original
dimensions as attruibutes of new one,

Vladimir Chtepa

"Robhar" <Robhar (AT) discussions (DOT) microsoft.com> schrieb im Newsbeitrag
news:C76D5ADD-BA8D-4A58-9172-0990757B8B67 (AT) microsoft (DOT) com...
Hello All

I'm having a performance issue with numerous boolean dimensions. I have a
cube that needs to have 21 true / false dimensions (The cube is to track
pesticide applicator certifications).

Number of rows is relatively small (45,000) but when it tries to calculate
the aggregates it just goes bye-bye. I've had large dimensions without
problems but booleans have allways been a problem.

Is there a way to handle large numbers of boolean dimensions? Is it any
better in 2005? Any help is appreciated.

Robhar




Reply With Quote
  #5  
Old   
Robhar
 
Posts: n/a

Default Re: Performance Issue (Many boolean dimensions) - 03-14-2006 , 10:26 AM



When Analysis manager is trying to calculate the aggregates it starts off
pretty good but gradually slows down until it grinds to a complete stop. I
actually let it run over the weekend for 2.5 days and it didn't work. On the
plus side if you just let it process the cube it does. Performance suffers if
I use my test data (350,000 rows) but is tolerable with production (47,000
rows).

With all the flags in the cube (21 of them) it gets to around 1400
aggregates and about 2 % performance increase before grinding to a stop.

Because of the small number of values for the dimensions the cube is
relatively small in size (a few meg).

I'm not sure as I have only been working with Analysis server for a short
time. It seems that a great deal is done to optimize the calculations of
aggregates for large dimensions but nothing seems to be optimized for small
dimensions.

When you think about it, how many people have 21 true / false columns in a
table. That works out to 2**21 0r 2097152 possible combinations. It seems to
want to do them all.

I've tried cutting back the number of true/false columns. Seems I can only
get about 1/2 dozen in there.

I've tested other complex cubes. I can do a date dimension covering 100
years and another with about 180 values easier then doing two true/false
dimensions.

That's why I was wondering if there are any tricks to working with boolean
dimensions. It just doesn't seem to be optimized for those. I was hoping this
had changed in 2005.

Rob

"T.K. Anand [MSFT]" wrote:

Quote:
What exactly do you mean by "it tries to calculate the aggregates it just
goes bye-bye"? Did you use the aggregation designer to build aggs? How many
aggs? Does the cube process fine with no aggregations?

With regard to boolean dimensions, 2005 is pretty similar to 2000.

--
This posting is provided "AS IS" with no warranties, and confers no rights
Please do not send email directly to this alias. This alias is for newsgroup
purposes only.


"Robhar" <Robhar (AT) discussions (DOT) microsoft.com> wrote in message
news:C76D5ADD-BA8D-4A58-9172-0990757B8B67 (AT) microsoft (DOT) com...
Hello All

I'm having a performance issue with numerous boolean dimensions. I have a
cube that needs to have 21 true / false dimensions (The cube is to track
pesticide applicator certifications).

Number of rows is relatively small (45,000) but when it tries to calculate
the aggregates it just goes bye-bye. I've had large dimensions without
problems but booleans have allways been a problem.

Is there a way to handle large numbers of boolean dimensions? Is it any
better in 2005? Any help is appreciated.

Robhar




Reply With Quote
  #6  
Old   
BK
 
Posts: n/a

Default Re: Performance Issue (Many boolean dimensions) - 03-15-2006 , 08:43 AM



Hi Rob,
We've go a production cube with 77 dims (167 levels) of which 46 of
them are "True/False." It spans 3 years of social/health data equating
to approx 150,000 rows. Have many others with row counts way into the
XX millions with 15 to 20 true false. Most all of our cubes can be
processed in under half and hour. We process MOLAP with relatively few
aggregations though.

Initially we had many of then constructed as composite variables so
that one variable of say length 10 would house all possible
combinations for those 10 flags of a related condition. There were two
main issues we ran into with that. The first was a slightly longer
process time, and the second (& killer for us) was the burden on the #
of levels. MSAS00 has a 126 dim constraint and a 256 level constraint,
and we were running out of levels long before running out of
dimensions.

So we decided to split them out into individual columns on the table as
int/smallint/tinyint style variables and things are running great for
us... Because you can (simingly - havent tried) nest variables
infinitely we've never had a problem finding the right combinations.

Byron


Reply With Quote
  #7  
Old   
Robhar
 
Posts: n/a

Default Re: Performance Issue (Many boolean dimensions) - 03-15-2006 , 10:01 AM



Hi Byron

Your cube is a lot more complex then mine but you do have the same situation.

My cube does process ok but it takes longer then yours. Likely because of
hardware or the fact that I am stuck in a VM world. My biggest tables are
only around 15 million but the have very few dimensions so I never had a
problem with them.

Does the fact that you have very few aggregates have a negative impact on
performance? From what I had read I understood that the more aggregates you
let it do the better your performance. I allways let it get to 100%
performance increase but with this cube and all the true false columns I
couldn't get beyond 3%.

Robert Hart

"BK" wrote:

Quote:
Hi Rob,
We've go a production cube with 77 dims (167 levels) of which 46 of
them are "True/False." It spans 3 years of social/health data equating
to approx 150,000 rows. Have many others with row counts way into the
XX millions with 15 to 20 true false. Most all of our cubes can be
processed in under half and hour. We process MOLAP with relatively few
aggregations though.

Initially we had many of then constructed as composite variables so
that one variable of say length 10 would house all possible
combinations for those 10 flags of a related condition. There were two
main issues we ran into with that. The first was a slightly longer
process time, and the second (& killer for us) was the burden on the #
of levels. MSAS00 has a 126 dim constraint and a 256 level constraint,
and we were running out of levels long before running out of
dimensions.

So we decided to split them out into individual columns on the table as
int/smallint/tinyint style variables and things are running great for
us... Because you can (simingly - havent tried) nest variables
infinitely we've never had a problem finding the right combinations.

Byron



Reply With Quote
  #8  
Old   
BK
 
Posts: n/a

Default Re: Performance Issue (Many boolean dimensions) - 03-15-2006 , 11:13 AM



Typically we don't have data retrieval issues with the smaller cubes
(<3M rows) from a time prospective. We process everything MOLAP and if
the cube is left intact or only has data added to it (via partitions)
then your cube will systematically speed up simply due to use. I
don't think we've ever let performance go over about 20%. When I
took an MDX class last year the instructor sad that in all of his
testing that that make was about where you hit sort of a trade off
point in process time vs. performance gains. Also you can use the
usage-based optimization for your new partitions so that the aggs.
being constructed are the most appropriate.


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.