dbTalk Databases Forums  

Dimension design question

comp.databases comp.databases


Discuss Dimension design question in the comp.databases forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
raidken@yahoo.com
 
Posts: n/a

Default Dimension design question - 06-27-2007 , 05:34 PM






When creating dimensions how is it determined if dimension should be
broken into two or more distinct dimensions. What are the cons of
breaking up a single dimension into multiple dimensions? Let's say for
a Date Time dimension, what is the downside of breaking down the
dimensions into two separate date and time dimensions. Related to this
should any other considerations be made when the data in the data
warehouse will eventually make it's way into a olap cube.
Here's the skull of the schema-

Dim_Date
Date ID
Year
....
Day

Dim_Time
Time ID
Hours
Minutes
Seconds

Fact_
time id
Date id

Thank you in advance.

Ken


Reply With Quote
  #2  
Old   
Ed Prochak
 
Posts: n/a

Default Re: Dimension design question - 06-28-2007 , 01:59 PM






On Jun 27, 6:34 pm, raid... (AT) yahoo (DOT) com wrote:
Quote:
When creating dimensions how is it determined if dimension should be
broken into two or more distinct dimensions. What are the cons of
breaking up a single dimension into multiple dimensions? Let's say for
a Date Time dimension, what is the downside of breaking down the
dimensions into two separate date and time dimensions. Related to this
should any other considerations be made when the data in the data
warehouse will eventually make it's way into a olap cube.
Here's the skull of the schema-

Dim_Date
Date ID
Year
...
Day

Dim_Time
Time ID
Hours
Minutes
Seconds

Fact_
time id
Date id

Thank you in advance.

Ken
Why the ID's???? if attributes of a fact include date and time, then
just include date and time. you are making things more likely to break
and harder to verify data correctness with the structure you
presented. So wliminating those IDs the question is boils down to the
fundamental SL datatypes: DATE, TIME, TIMESTAMP

If I may rephrase your question:
Assuming that a Fact needs Date and time, when is it better to use
DATE and TIME versus TIMESTAMP?

I think using separate data and time reflects a closer connection
between the table and the data entry application. If users manually
enter the information, it may be easiest to keep them separate as two
attributes for the Fact entity. OTOH, if you are doing any interval
calculations, then you might be betteroff using one TIMESTAMP
attribute. That's my high in the sky level view of this issue.

The correct answer of course depends on the business needs, which you
did not mention.

HTH,
ed



Reply With Quote
  #3  
Old   
Axel Hallez
 
Posts: n/a

Default Re: Dimension design question - 07-02-2007 , 03:53 AM



Ed Prochak wrote:
Quote:
On Jun 27, 6:34 pm, raid... (AT) yahoo (DOT) com wrote:
When creating dimensions how is it determined if dimension should be
broken into two or more distinct dimensions. What are the cons of
breaking up a single dimension into multiple dimensions? Let's say for
a Date Time dimension, what is the downside of breaking down the
dimensions into two separate date and time dimensions. Related to this
should any other considerations be made when the data in the data
warehouse will eventually make it's way into a olap cube.
Here's the skull of the schema-

Dim_Date
Date ID
Year
...
Day

Dim_Time
Time ID
Hours
Minutes
Seconds

Fact_
time id
Date id

Thank you in advance.

Ken

Why the ID's???? if attributes of a fact include date and time, then
just include date and time. you are making things more likely to break
and harder to verify data correctness with the structure you
presented. So wliminating those IDs the question is boils down to the
fundamental SL datatypes: DATE, TIME, TIMESTAMP
These would be valid claims for an operational DB, but the OP's question
is about data warehouse design. In this case the validity of the data is
(should be) checked either in the source system or in the ETL process
before the data enters the data warehouse.
In a data warehouse derived attributes are typically calculated during
the ETL process and stored as dimension attributes, so that these
calculations don't have to be redone for every query.

Splitting the time dimension into a date and an hour of the day
dimension is typically done to limit the dimension size. Whether this
limits the options for analysis depends on the flexibility of the query
tools. I would think that this doesn't create problems in most
environments.


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.