dbTalk Databases Forums  

Dimension table VS Dimension from fact table

microsoft.public.sqlserver.olap microsoft.public.sqlserver.olap


Discuss Dimension table VS Dimension from fact table in the microsoft.public.sqlserver.olap forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Suchi
 
Posts: n/a

Default Dimension table VS Dimension from fact table - 11-10-2004 , 10:53 AM






Hi All,

I am new to MS Analysis services. We have to develop cube for our client.

The cube is developed by someone and i have to do some modifications.
There is one dimension "xx" from fact table.The frequency of updates is high
and no of records are less(around 30,000).

Shall we create new dimension table for this "xx" or dimension from the fact
table is okay? What is the difference between the two? When do we need to go
for the one? If it is seperate dimension table, is it static or changing?
Please advice me on this.

Thanks in advance
Suchi




Reply With Quote
  #2  
Old   
Dave Wickert [MSFT]
 
Posts: n/a

Default Re: Dimension table VS Dimension from fact table - 11-10-2004 , 11:07 AM






While this can sometimes be a religious debate, I think that most folks will
say that having an independent dimension table is extremely important.
First, it allows you to capture metadata about the dimension which you are
probably not carrying in the fact table. For example, an independent
dimension table for "Time" would have additional information on weekend,
holidays, etc. -- you probably don't carry that in your fact table --
normally you would have just a timestamp. Second, it allows you to have
members without matching data. Sometimes this can be important. Again, using
Time as an example, you might not have sales over the weekend, but you may
need to include the member itself during your analysis. Lastly and very
importantly, a dimension based on the fact table does not scale. I work
with customer having billions and billions of rows in their fact tables --
having to do dimension processing based on those structures just isn't
practical. We have techniques to scale fact tables that large, e.g.
partitions -- but we don't have similar techniques for dimension tables.

Stick with independent dimension tables. In the long run, you will be much
happier.

Hope that helps.
--
Dave Wickert [MSFT]
dwickert (AT) online (DOT) microsoft.com
Program Manager
BI SystemsTeam
SQL BI Product Unit (Analysis Services)
--
This posting is provided "AS IS" with no warranties, and confers no rights.

"Suchi" <Suchi (AT) discussions (DOT) microsoft.com> wrote

Quote:
Hi All,

I am new to MS Analysis services. We have to develop cube for our client.

The cube is developed by someone and i have to do some modifications.
There is one dimension "xx" from fact table.The frequency of updates is
high
and no of records are less(around 30,000).

Shall we create new dimension table for this "xx" or dimension from the
fact
table is okay? What is the difference between the two? When do we need to
go
for the one? If it is seperate dimension table, is it static or changing?
Please advice me on this.

Thanks in advance
Suchi






Reply With Quote
  #3  
Old   
kurb
 
Posts: n/a

Default Re: Dimension table VS Dimension from fact table - 11-10-2004 , 04:16 PM



Hello
I am newbie as well, still gathering my resource material
I am unclear about the criteria for fact table

In my situation I have a set of time stamped input data for each Item,
from which I generate gobs of Stats data for many variations of (sets of
items), each for many variations of time attributes (aggregation past N
days, per day of week, per month of year etc etc). And I'll have stored
procedures for calculating many of the statistical data derived from the
raw data.

I can see having

1. One table for each Item (ItemID,DateID, data1,dat2...) for the
raw data
2. One table for each Stats(StatID,DateID,ItemCategoryID, time1,
time2, ) for calculated stat for each time sensitivity, for each
Item associated to the ItemCategryID
3. Dimension for Items with Levels for different categories of Items
4. Dimension for Stats with Levels for different categories of Stats
5. Dimension for Time with Levels for different categories of Time

Now lets say I have a 100 Items and 100 Stats this would give me 100 x
100 = 10000 Tables for Stats.
Am I losing it here? Should I be reducing the number of Tables?

Thanks for any suggestions, or recommended resource materials
Kurb

Dave Wickert [MSFT] wrote:

Quote:
While this can sometimes be a religious debate, I think that most folks will
say that having an independent dimension table is extremely important.
First, it allows you to capture metadata about the dimension which you are
probably not carrying in the fact table. For example, an independent
dimension table for "Time" would have additional information on weekend,
holidays, etc. -- you probably don't carry that in your fact table --
normally you would have just a timestamp. Second, it allows you to have
members without matching data. Sometimes this can be important. Again, using
Time as an example, you might not have sales over the weekend, but you may
need to include the member itself during your analysis. Lastly and very
importantly, a dimension based on the fact table does not scale. I work
with customer having billions and billions of rows in their fact tables --
having to do dimension processing based on those structures just isn't
practical. We have techniques to scale fact tables that large, e.g.
partitions -- but we don't have similar techniques for dimension tables.

Stick with independent dimension tables. In the long run, you will be much
happier.

Hope that helps.




Reply With Quote
  #4  
Old   
Michael Vardinghus
 
Posts: n/a

Default Re: Dimension table VS Dimension from fact table - 11-11-2004 , 03:25 AM



Sounds like way to many...

You don' need a table in 1 for each item - you already have a itemid ins
this table that you can
join to a dimension table.

What you want to do dimension-wise needs to be controlled by this dimension.

The typical case is ONE fact tabel and 10-20 dimensions...

Then of course you can merge physical cubes into one total virtual cube - if
so desired.

/Michael V.


"kurb" <mompara (AT) sympatico (DOT) ca> wrote

Quote:
Hello
I am newbie as well, still gathering my resource material
I am unclear about the criteria for fact table

In my situation I have a set of time stamped input data for each Item,
from which I generate gobs of Stats data for many variations of (sets of
items), each for many variations of time attributes (aggregation past N
days, per day of week, per month of year etc etc). And I'll have stored
procedures for calculating many of the statistical data derived from the
raw data.

I can see having

1. One table for each Item (ItemID,DateID, data1,dat2...) for the
raw data
2. One table for each Stats(StatID,DateID,ItemCategoryID, time1,
time2, ) for calculated stat for each time sensitivity, for each
Item associated to the ItemCategryID
3. Dimension for Items with Levels for different categories of Items
4. Dimension for Stats with Levels for different categories of Stats
5. Dimension for Time with Levels for different categories of Time

Now lets say I have a 100 Items and 100 Stats this would give me 100 x
100 = 10000 Tables for Stats.
Am I losing it here? Should I be reducing the number of Tables?

Thanks for any suggestions, or recommended resource materials
Kurb

Dave Wickert [MSFT] wrote:

While this can sometimes be a religious debate, I think that most folks
will
say that having an independent dimension table is extremely important.
First, it allows you to capture metadata about the dimension which you
are
probably not carrying in the fact table. For example, an independent
dimension table for "Time" would have additional information on weekend,
holidays, etc. -- you probably don't carry that in your fact table --
normally you would have just a timestamp. Second, it allows you to have
members without matching data. Sometimes this can be important. Again,
using
Time as an example, you might not have sales over the weekend, but you
may
need to include the member itself during your analysis. Lastly and very
importantly, a dimension based on the fact table does not scale. I work
with customer having billions and billions of rows in their fact
tables --
having to do dimension processing based on those structures just isn't
practical. We have techniques to scale fact tables that large, e.g.
partitions -- but we don't have similar techniques for dimension tables.

Stick with independent dimension tables. In the long run, you will be
much
happier.

Hope that helps.






Reply With Quote
  #5  
Old   
kurb
 
Posts: n/a

Default Re: Modeling Time Series Application - 11-11-2004 , 10:04 AM



Thank you, appreciate your response.

The reason I have one table for each Item is as follows:

Actually each Item is a Stock Symbol with daily/hourly price data. User
likes to drill/aggregate/calculate data within each Symbol, and across
sets of Symbols for various combinations of Time attribute and Stats
attrubutes

I want to keep all the data rows of data for a specific Symbol in strict
day/time sequence. This way I can easily calculate relative price
changes (which I have to do tons of) by simply counting relative row
numbers ieg 20 days ago is 20 rows down from the current row. If I mix
all the rows of data for all Symbols in one table it's diificult to
maintain the time series for a particular Symbol (new data for each
Symbol is coming in all the time). Alternatively I can calculate the
date of 20 business days ago each time. but I imagine this is time
consuming.

Also the daily raw data that gets plugged into the cube is available
from the source on a per Symbol/TimeSeries basis. Also the data may be
charted on a per Symbol/TimeSeries and it would be easier to haul out
the data if the Cube is structured for that.

If I have a Cube for every Symbol then I'll end up several Virtual Cubes
each having hundreds of Cubes which I imagine will not be manageable. I
could have a single Stats table for every Symbol. with
Stats(SymbolID,Date/TimeOfDayID, StatsID, time1, time2,....time100). So
if I have 100 different Stats to be applied to 10 pieces of raw data
(Date/TimeOfDayID) per day for 100 different Time attributes, then I'll
have 1000 = 10 x100 (for each Stat) rows of data X 100 columns of
data (for each Time attribute) per day. For 1 year I'll have 250 x 1000
= 250 000 rows of data. For 10 Years I'll have 2.5 million rows per Symbol.

Does this approach make more sense? Will really appreciate any
suggestions/ideas to examine.

Thanks
Kurb


Michael Vardinghus wrote:

Quote:
Sounds like way to many...

You don' need a table in 1 for each item - you already have a itemid ins
this table that you can
join to a dimension table.

What you want to do dimension-wise needs to be controlled by this dimension.

The typical case is ONE fact tabel and 10-20 dimensions...

Then of course you can merge physical cubes into one total virtual cube - if
so desired.

/Michael V.


"kurb" <mompara (AT) sympatico (DOT) ca> wrote in message
news:5rwkd.22978$Z7.772365 (AT) news20 (DOT) bellglobal.com...


Hello
I am newbie as well, still gathering my resource material
I am unclear about the criteria for fact table

In my situation I have a set of time stamped input data for each Item,
from which I generate gobs of Stats data for many variations of (sets of
items), each for many variations of time attributes (aggregation past N
days, per day of week, per month of year etc etc). And I'll have stored
procedures for calculating many of the statistical data derived from the
raw data.

I can see having

1. One table for each Item (ItemID,DateID, data1,dat2...) for the
raw data
2. One table for each Stats(StatID,DateID,ItemCategoryID, time1,
time2, ) for calculated stat for each time sensitivity, for each
Item associated to the ItemCategryID
3. Dimension for Items with Levels for different categories of Items
4. Dimension for Stats with Levels for different categories of Stats
5. Dimension for Time with Levels for different categories of Time

Now lets say I have a 100 Items and 100 Stats this would give me 100 x
100 = 10000 Tables for Stats.
Am I losing it here? Should I be reducing the number of Tables?

Thanks for any suggestions, or recommended resource materials
Kurb

Dave Wickert [MSFT] wrote:



While this can sometimes be a religious debate, I think that most folks


will


say that having an independent dimension table is extremely important.
First, it allows you to capture metadata about the dimension which you


are


probably not carrying in the fact table. For example, an independent
dimension table for "Time" would have additional information on weekend,
holidays, etc. -- you probably don't carry that in your fact table --
normally you would have just a timestamp. Second, it allows you to have
members without matching data. Sometimes this can be important. Again,


using


Time as an example, you might not have sales over the weekend, but you


may


need to include the member itself during your analysis. Lastly and very
importantly, a dimension based on the fact table does not scale. I work
with customer having billions and billions of rows in their fact


tables --


having to do dimension processing based on those structures just isn't
practical. We have techniques to scale fact tables that large, e.g.
partitions -- but we don't have similar techniques for dimension tables.

Stick with independent dimension tables. In the long run, you will be


much


happier.

Hope that helps.










Reply With Quote
  #6  
Old   
Michael Vardinghus
 
Posts: n/a

Default Re: Modeling Time Series Application - 11-11-2004 , 01:34 PM



I must admit that I'm not that clever at MDX querying yet....to me it still sounds like you need one cube and then some kind
of MDX wizard (a person not a tool...) to help you construct the calculations you need.

It actually sounds like one of those situations that MDX was meant for.

However ... if you're calculations will depend on eachother in a more complex manner it may be the wrong way to go....

But when you mention rows it sounds like you're doing a lot outside AS and perhaps something can be done using AS to get
a "cleaner" architechture - but often it is putting something into the relational environment and something into AS .. because
MDX in AS is calculated on the fly - you cannot store calculated members as aggregates.

"kurb" <mompara (AT) sympatico (DOT) ca> wrote

Thank you, appreciate your response.

The reason I have one table for each Item is as follows:

Actually each Item is a Stock Symbol with daily/hourly price data. User likes to drill/aggregate/calculate data within each Symbol, and across sets of Symbols for various combinations of Time attribute and Stats attrubutes

I want to keep all the data rows of data for a specific Symbol in strict day/time sequence. This way I can easily calculate relative price changes (which I have to do tons of) by simply counting relative row numbers ieg 20 days ago is 20 rows down from the current row. If I mix all the rows of data for all Symbols in one table it's diificult to maintain the time series for a particular Symbol (new data for each Symbol is coming in all the time). Alternatively I can calculate the date of 20 business days ago each time. but I imagine this is time consuming.

Also the daily raw data that gets plugged into the cube is available from the source on a per Symbol/TimeSeries basis. Also the data may be charted on a per Symbol/TimeSeries and it would be easier to haul out the data if the Cube is structured for that.

If I have a Cube for every Symbol then I'll end up several Virtual Cubes each having hundreds of Cubes which I imagine will not be manageable. I could have a single Stats table for every Symbol. with Stats(SymbolID,Date/TimeOfDayID, StatsID, time1, time2,....time100). So if I have 100 different Stats to be applied to 10 pieces of raw data (Date/TimeOfDayID) per day for 100 different Time attributes, then I'll have 1000 = 10 x100 (for each Stat) rows of data X 100 columns of data (for each Time attribute) per day. For 1 year I'll have 250 x 1000 = 250 000 rows of data. For 10 Years I'll have 2.5 million rows per Symbol.

Does this approach make more sense? Will really appreciate any suggestions/ideas to examine.

Thanks
Kurb


Michael Vardinghus wrote:

Sounds like way to many...

You don' need a table in 1 for each item - you already have a itemid ins
this table that you can
join to a dimension table.

What you want to do dimension-wise needs to be controlled by this dimension.

The typical case is ONE fact tabel and 10-20 dimensions...

Then of course you can merge physical cubes into one total virtual cube - if
so desired.

/Michael V.


"kurb" <mompara (AT) sympatico (DOT) ca> wrote

Hello
I am newbie as well, still gathering my resource material
I am unclear about the criteria for fact table

In my situation I have a set of time stamped input data for each Item,
from which I generate gobs of Stats data for many variations of (sets of
items), each for many variations of time attributes (aggregation past N
days, per day of week, per month of year etc etc). And I'll have stored
procedures for calculating many of the statistical data derived from the
raw data.

I can see having

1. One table for each Item (ItemID,DateID, data1,dat2...) for the
raw data
2. One table for each Stats(StatID,DateID,ItemCategoryID, time1,
time2, ) for calculated stat for each time sensitivity, for each
Item associated to the ItemCategryID
3. Dimension for Items with Levels for different categories of Items
4. Dimension for Stats with Levels for different categories of Stats
5. Dimension for Time with Levels for different categories of Time

Now lets say I have a 100 Items and 100 Stats this would give me 100 x
100 = 10000 Tables for Stats.
Am I losing it here? Should I be reducing the number of Tables?

Thanks for any suggestions, or recommended resource materials
Kurb

Dave Wickert [MSFT] wrote:

While this can sometimes be a religious debate, I think that most folks
will
say that having an independent dimension table is extremely important.
First, it allows you to capture metadata about the dimension which you
are
probably not carrying in the fact table. For example, an independent
dimension table for "Time" would have additional information on weekend,
holidays, etc. -- you probably don't carry that in your fact table --
normally you would have just a timestamp. Second, it allows you to have
members without matching data. Sometimes this can be important. Again,
using
Time as an example, you might not have sales over the weekend, but you
may
need to include the member itself during your analysis. Lastly and very
importantly, a dimension based on the fact table does not scale. I work
with customer having billions and billions of rows in their fact
tables --
having to do dimension processing based on those structures just isn't
practical. We have techniques to scale fact tables that large, e.g.
partitions -- but we don't have similar techniques for dimension tables.

Stick with independent dimension tables. In the long run, you will be
much
happier.

Hope that helps.






Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.