dbTalk Databases Forums  

Need help with cube design please

microsoft.public.sqlserver.olap microsoft.public.sqlserver.olap


Discuss Need help with cube design please in the microsoft.public.sqlserver.olap forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
lc
 
Posts: n/a

Default Need help with cube design please - 11-01-2003 , 08:12 PM







I give up. I tried everything I thought I knew and the damn thing is still
way too slow. I would highly appreciate if you someone could give me a few
moments of your time to go through this (lengthy) question:

Here's what I have:
- a transaction table, also used as fact table. The table has about 200
fields. Of those, I use perhaps 30 for linking in dimensions and <10 for
measures. Is this a smart thing to do? Should I create a copy of this table
with only user and key fields in it?
- each transaction record is stamped with a weekending date, sales rep code,
client code, contact code, manager code, branch code and type of business (3
elements in corresponding dimension).
- I created one dimension for sales rep, one for branch and one for business
type
- the client dimension has two levels, company name and contact name
- finally, I created a time dimension for weekending date

The cube is MOLAP and was processed with 50% of optimization. I also tried
with 100% with no change in performance. Originally, the cube was done on
SQL7 OLAP and I just finished getting disappointed with performance on 2000
OA. I'm sure this must be something I did (or didn't do).

Now, O open excel and:
- drag branch to slice page and pick NY branch only
- drag weekendings to columns and choose only Q3 and Q4 of 2003
- drag Markup (calculated measure) to data page
- drag client to rows

So far, everything's blazingly fast. However, I now drag sales rep to rows
to the right of client... and kiss it goodbye. Not even 3 hours later would
I see any results. Excel goes mad, taking 100% of CPU and using ram like
there is no tomorrow. The same query fired in MDX Sample Application, though
sometimes (much) faster, is still far from what one would expect from
"precalculated" source.
As I said earlier, I tried "everything". I even changed the connection
string to Execution Location=3 and Default Isolation Mode=1. I just can't
understand why is performance so slow. Is there something wrong in a way I
designed the cube or.... or perhaps all of this is normal, is it?

Any idea would really be highly appreciated.

TIA

lc



Reply With Quote
  #2  
Old   
Ray Higdon
 
Posts: n/a

Default Re: Need help with cube design please - 11-01-2003 , 08:33 PM






Quote:
- a transaction table, also used as fact table.
You should use DTS or use some other tool to place the table you want to use
as your fact table in another database or preferably another server

Quote:
The table has about 200 fields.
Are all of those "columns" used in your cubes? If not, follow first advice
and eliminate any columns not needed in your destination

Quote:
Should I create a copy of this table with only user and key fields in it?
A fact table should consist of measures that are commiserate with the
granularity of your fact table and surrogate key connections to your dim
tables, this is a major performance benefit

I would highly recommend you go out and buy the book by Ralph Kimball called
"the data warehouse lifecycle toolkit" Working on cube design without
knowing warehouse design can be tough, although very tempting as the cubes
can be where you get your business ROI.

Some sites that may help you are:

www.dmreview.com
www.intelligententerprise.com
and
www.databasejournal.com

If you have more questions, please ask

HTH

--
Ray Higdon MCSE, MCDBA, CCNA
---
"lc" <lcdata (AT) hotmail (DOT) com> wrote

Quote:
I give up. I tried everything I thought I knew and the damn thing is still
way too slow. I would highly appreciate if you someone could give me a few
moments of your time to go through this (lengthy) question:

Here's what I have:
- a transaction table, also used as fact table. The table has about 200
fields. Of those, I use perhaps 30 for linking in dimensions and <10 for
measures. Is this a smart thing to do? Should I create a copy of this
table
with only user and key fields in it?
- each transaction record is stamped with a weekending date, sales rep
code,
client code, contact code, manager code, branch code and type of business
(3
elements in corresponding dimension).
- I created one dimension for sales rep, one for branch and one for
business
type
- the client dimension has two levels, company name and contact name
- finally, I created a time dimension for weekending date

The cube is MOLAP and was processed with 50% of optimization. I also tried
with 100% with no change in performance. Originally, the cube was done on
SQL7 OLAP and I just finished getting disappointed with performance on
2000
OA. I'm sure this must be something I did (or didn't do).

Now, O open excel and:
- drag branch to slice page and pick NY branch only
- drag weekendings to columns and choose only Q3 and Q4 of 2003
- drag Markup (calculated measure) to data page
- drag client to rows

So far, everything's blazingly fast. However, I now drag sales rep to rows
to the right of client... and kiss it goodbye. Not even 3 hours later
would
I see any results. Excel goes mad, taking 100% of CPU and using ram like
there is no tomorrow. The same query fired in MDX Sample Application,
though
sometimes (much) faster, is still far from what one would expect from
"precalculated" source.
As I said earlier, I tried "everything". I even changed the connection
string to Execution Location=3 and Default Isolation Mode=1. I just can't
understand why is performance so slow. Is there something wrong in a way I
designed the cube or.... or perhaps all of this is normal, is it?

Any idea would really be highly appreciated.

TIA

lc





Reply With Quote
  #3  
Old   
lc
 
Posts: n/a

Default Re: Need help with cube design please - 11-01-2003 , 08:49 PM




"Ray Higdon" <rayhigdon (AT) higdonconsulting (DOT) com> wrote

Quote:
- a transaction table, also used as fact table.

You should use DTS or use some other tool to place the table you want to
use
as your fact table in another database or preferably another server
Tried that with no visible performance benefit.

Quote:
Are all of those "columns" used in your cubes? If not, follow first advice
and eliminate any columns not needed in your destination
Will try that. Thanks for the advice.

Quote:
Should I create a copy of this table with only user and key fields in
it?

A fact table should consist of measures that are commiserate with the
granularity of your fact table and surrogate key connections to your dim
tables, this is a major performance benefit
Not sure if I understand correctly. My fact table (about 2 million records),
is structured (approximately) like this:
TransactionCode
WeekendingDate
SalesRepCode
ManagerCode
ClientCode
BranchCode
Cost
Billing
Commissions

+ I devise Markup as calculated field.

Quote:
I would highly recommend you go out and buy the book by Ralph Kimball
called
"the data warehouse lifecycle toolkit" Working on cube design without
knowing warehouse design can be tough, although very tempting as the cubes
can be where you get your business ROI.
Looks very interesting. Thanks for the tip and thanks for your time.

lc




Reply With Quote
  #4  
Old   
Thomas Ivarsson
 
Posts: n/a

Default Re: Need help with cube design please - 11-02-2003 , 11:35 AM



I am not sure of all your design choices but I think I can make some general
suggestions.

When you add a lot of dimensions to a cube you will make it more sparse,
which means it will build more and more empty combinations or cells. I am
not sure if you have 30 dimensions or less in this cube from what you
describe. The solution is to find dimensions that generate values, measures
that intersect, and avoid combinations that generate to many empty cells.
You have always the option to join separate physical cubes in virtual cubes.

When you crossjoin different dimensions at the leaf/lowest level (put them
on the same axis) you create a big performance problem. If you can wait for
reporting services for SQL Server 2000 you can join leaf members in a
SQL-group by clause and make more efficient reports in this way. In
relational databases you will only get back combinations(from dimensions and
the fact table) that have values, not empty ones.

Aggregations are always done from the top(ie all member level in each
dimension) and then downwards to more detailed levels, depending on your
degree of aggregations. I recommend to start with 10-20 percent aggregation.
I do not think that aggregation is an issue here.

HTH
Thomas Ivarsson, BI consultant Malmö Sweden

"lc" <lcdata (AT) hotmail (DOT) com> wrote

Quote:
I give up. I tried everything I thought I knew and the damn thing is still
way too slow. I would highly appreciate if you someone could give me a few
moments of your time to go through this (lengthy) question:

Here's what I have:
- a transaction table, also used as fact table. The table has about 200
fields. Of those, I use perhaps 30 for linking in dimensions and <10 for
measures. Is this a smart thing to do? Should I create a copy of this
table
with only user and key fields in it?
- each transaction record is stamped with a weekending date, sales rep
code,
client code, contact code, manager code, branch code and type of business
(3
elements in corresponding dimension).
- I created one dimension for sales rep, one for branch and one for
business
type
- the client dimension has two levels, company name and contact name
- finally, I created a time dimension for weekending date

The cube is MOLAP and was processed with 50% of optimization. I also tried
with 100% with no change in performance. Originally, the cube was done on
SQL7 OLAP and I just finished getting disappointed with performance on
2000
OA. I'm sure this must be something I did (or didn't do).

Now, O open excel and:
- drag branch to slice page and pick NY branch only
- drag weekendings to columns and choose only Q3 and Q4 of 2003
- drag Markup (calculated measure) to data page
- drag client to rows

So far, everything's blazingly fast. However, I now drag sales rep to rows
to the right of client... and kiss it goodbye. Not even 3 hours later
would
I see any results. Excel goes mad, taking 100% of CPU and using ram like
there is no tomorrow. The same query fired in MDX Sample Application,
though
sometimes (much) faster, is still far from what one would expect from
"precalculated" source.
As I said earlier, I tried "everything". I even changed the connection
string to Execution Location=3 and Default Isolation Mode=1. I just can't
understand why is performance so slow. Is there something wrong in a way I
designed the cube or.... or perhaps all of this is normal, is it?

Any idea would really be highly appreciated.

TIA

lc





Reply With Quote
  #5  
Old   
lc
 
Posts: n/a

Default Re: Need help with cube design please - 11-02-2003 , 05:33 PM




"Thomas Ivarsson" <TI_Nospam (AT) hotmail (DOT) com> wrote

Quote:
I am not sure of all your design choices but I think I can make some
general
suggestions.
Me neither

Quote:
that intersect, and avoid combinations that generate to many empty cells.
You have always the option to join separate physical cubes in virtual
cubes.

I, sort of, sensed this might be an issue and I went ahead and cut them down
to a smaller number. Either way I see it, I have to come up with 12
dimensions. Among these, I can't favor any single one nor can I afford to
omit it as all of them are frequently used.

While trying to figure out what might be the problem, I removed all
dimensions but 5. One Time dimension, two standard one-level (code to code)
and two two-level standard dimensions. I still see no visible improvement.
One thing is for sure, when dimensions with small number of elements are
involved, everything is very fast. Client, however, is a dimension that
consists of 10000 elements. When used in any kind of cross-join, client
slows down to a crawl. I **am** limiting number of clients to show down to
hundreds (sliced by Company and limited to Time Q4 of 2003) but this doesn't
seem to speed things up.


Quote:
When you crossjoin different dimensions at the leaf/lowest level (put them
on the same axis) you create a big performance problem.
And here is where I stand in shock. I though that's what OLAP is all about.
I can generate same outputs from my RDB in a nick of time. Why would this be
so complicated for OLAP/AS. In concept, it's the same thing: give me this
and that where this is in (list) and that > whatever and otherthing....
Is PTS combined with AS a wrong choice? I'm confused. Sorry.

Quote:
Aggregations are always done from the top(ie all member level in each
dimension) and then downwards to more detailed levels, depending on your
degree of aggregations. I recommend to start with 10-20 percent
aggregation.
I do not think that aggregation is an issue here.
Probably not as I tried with option "when performance gain reaches 100%" and
I haven't seen any speed improvement.

Please don't anyone get me wrong, I do understand that I'm not familiar with
data warehousing theory/practise, but my 20 years with databases in general
say something's not right here. Having dimensions that join with fact table
on code-to-code - that's all that I did. What kind of theory doesn't lead to
this kind of structure. I'm not going for anything obscure - no star
schemas, no dynamic dimensions/cubes... nothing fancy; and yet...

Thanks for your help and your time. Very appreciated.

lc




Reply With Quote
  #6  
Old   
lc
 
Posts: n/a

Default Re: Need help with cube design please - 11-02-2003 , 06:05 PM




Oh, yes, wanted to tell what else I did. I tried with a custom DB query that
returns the same number of elements that would be included in the PT and
placed the results on one sheet. On another sheet, I created a PT drawing
from these results. The whole operation took under 1 minute. Isn't that
wild? Well, it is to me but, then again, I must be missing something here.

Thanks a lot.

lc



Reply With Quote
  #7  
Old   
Ray Higdon
 
Posts: n/a

Default Re: Need help with cube design please - 11-02-2003 , 06:19 PM



Can you post your DDL for the tables you are using for your cubes? In one
post you mentioned a transactional table with more than 200 columns, in
another you list a fact table with only measures and foreign key references,
in another you mention you do not use a star schema but you say your fact
table links to the dimension tables by a code, it would help to see the
table layouts for the tables you use in your cubing.

--
Ray Higdon MCSE, MCDBA, CCNA
---
"lc" <lcdata (AT) hotmail (DOT) com> wrote

Quote:
"Thomas Ivarsson" <TI_Nospam (AT) hotmail (DOT) com> wrote in message
news:O0IwQ9VoDHA.2964 (AT) tk2msftngp13 (DOT) phx.gbl...
I am not sure of all your design choices but I think I can make some
general
suggestions.

Me neither

that intersect, and avoid combinations that generate to many empty
cells.
You have always the option to join separate physical cubes in virtual
cubes.

I, sort of, sensed this might be an issue and I went ahead and cut them
down
to a smaller number. Either way I see it, I have to come up with 12
dimensions. Among these, I can't favor any single one nor can I afford to
omit it as all of them are frequently used.

While trying to figure out what might be the problem, I removed all
dimensions but 5. One Time dimension, two standard one-level (code to
code)
and two two-level standard dimensions. I still see no visible improvement.
One thing is for sure, when dimensions with small number of elements are
involved, everything is very fast. Client, however, is a dimension that
consists of 10000 elements. When used in any kind of cross-join, client
slows down to a crawl. I **am** limiting number of clients to show down to
hundreds (sliced by Company and limited to Time Q4 of 2003) but this
doesn't
seem to speed things up.


When you crossjoin different dimensions at the leaf/lowest level (put
them
on the same axis) you create a big performance problem.

And here is where I stand in shock. I though that's what OLAP is all
about.
I can generate same outputs from my RDB in a nick of time. Why would this
be
so complicated for OLAP/AS. In concept, it's the same thing: give me this
and that where this is in (list) and that > whatever and otherthing....
Is PTS combined with AS a wrong choice? I'm confused. Sorry.

Aggregations are always done from the top(ie all member level in each
dimension) and then downwards to more detailed levels, depending on your
degree of aggregations. I recommend to start with 10-20 percent
aggregation.
I do not think that aggregation is an issue here.

Probably not as I tried with option "when performance gain reaches 100%"
and
I haven't seen any speed improvement.

Please don't anyone get me wrong, I do understand that I'm not familiar
with
data warehousing theory/practise, but my 20 years with databases in
general
say something's not right here. Having dimensions that join with fact
table
on code-to-code - that's all that I did. What kind of theory doesn't lead
to
this kind of structure. I'm not going for anything obscure - no star
schemas, no dynamic dimensions/cubes... nothing fancy; and yet...

Thanks for your help and your time. Very appreciated.

lc





Reply With Quote
  #8  
Old   
lc
 
Posts: n/a

Default Re: Need help with cube design please - 11-02-2003 , 06:47 PM




"Ray Higdon" <rayhigdon (AT) higdonconsulting (DOT) com> wrote

Quote:
Can you post your DDL for the tables you are using for your cubes? In one
post you mentioned a transactional table with more than 200 columns, in
another you list a fact table with only measures and foreign key
references,
in another you mention you do not use a star schema but you say your fact
table links to the dimension tables by a code, it would help to see the
table layouts for the tables you use in your cubing.
Yes, sorry, I took your advice to eliminate from fact table everything
that's not needed. Also, I mentioned "no star schema" but I really meant
snowflake.

DDL: I'd feel much more at ease if I could email it to you instead of
posting it here. Do you mind?

Thanks a lot.

lc




Reply With Quote
  #9  
Old   
lc
 
Posts: n/a

Default Re: Need help with cube design please - 11-02-2003 , 07:18 PM




Okay, changed names a bit; now I feel comfortable to post it here (not that
it's a NASA secret or anything . Does this help? This is a scaled down
versiion of the fact table. Cross-join Client and SalesRep is still a 4 hour
job.

Thanks a lot for your time.

lc


Transactions Fact Table - around million records
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TransactionCode int PK
ClientCode char 7
CompanyCode char 3
BranchCode char 3
SalesRepCode char 5
WEDate datetime
Cost float
Billing float
Commissions float


Clients Standard Dimension - around 10000 elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ClientCode char 7 PK
ClientName char 50

Companies Standard Dimension - 5 elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
CompanyCode char 3 PK

Branches Standard Dimension - 13 elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
BranchCode char 3 PK

SalesReps Standard Dimension - around 1200 elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SalesRepCode char 5 PK

WEDates Time Dimension - Year, Quarter, Month, Day - since 1999
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^
WEDate datetime PK



Reply With Quote
  #10  
Old   
Ray Higdon
 
Posts: n/a

Default Re: Need help with cube design please - 11-02-2003 , 08:13 PM



From the DDL I assume you are using your application (Natural) keys to
link from the fact table to the dim tables. You would get a performance
boost if you took it one more step and used all surrogate (identity)
keys to link instead of char columns. This is not that bad to do, you
use what you have as a staging area and then generate your dim tables
with the same info but with an integer identity and store that linking
integer column in your fact table.

Ray Higdon MCSE, MCDBA, CCNA

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.