dbTalk Databases Forums  

Newbie Questions About Surrogate Keys and SCD

microsoft.public.sqlserver.olap microsoft.public.sqlserver.olap


Discuss Newbie Questions About Surrogate Keys and SCD in the microsoft.public.sqlserver.olap forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Gummy
 
Posts: n/a

Default Newbie Questions About Surrogate Keys and SCD - 05-20-2006 , 10:00 PM






Hello,

I am new to data warehousing and trying to get my head around two issues:

My understanding is that everything should use a surrogate key, not the
"natural" one that comes from the source system. For example, I have a fact
table of customers and it has a Product Key on it. That Product links to the
Product Dimension table. How do I convert that Product Key to a surrogate
key in both the fact table and the dimension table so they remain linked? Do
I just have a lookup or association table that does the conversion? I assume
that means I would need an association table for every dimension I add to
the database.

Next question:

I am going to have a fact table that uses Slowly Changing Dimensions for
employees. These employees have many actions that can occur - Hired, Fired,
Transfered, Promoted, etc. I will also include, on the fact table the Start
Date, End Date of the action and a flag for the Last Action.
How exactly do I query this fact table to figure out who is currently
active? I have to assume that if your last action was Fired then you are no
longer active. However, I am concerned that in our system you could be
fired, but then another event can take place like you received Cobra money
or some sort of pension action occurred.
In general, I am not clear how you could figure out how to query the data.
An example would be, give me the active headcount at the beginning of the
month, or who terminated between October and December.


Thank you for the help. I think this stuff has some amazing potential (as
soon as I understand it better).


p.s. Sorry about the cross-post, I found this newsgroup after positing the
question and this NG seemed more appropriate.



Reply With Quote
  #2  
Old   
Tim Dot NoSpam
 
Posts: n/a

Default Re: Newbie Questions About Surrogate Keys and SCD - 05-21-2006 , 05:26 PM






For surrogate keys, you'll tyipcally use an autoincrementing int or bigint.
Your business key column is carried over from the source system as a column
in your dimension.

If for example, you married two tables from a source system, you'd most
likely have the business key from TableA and the business key from TableB as
attributes in the Dimension table. Your fact table would contain the
surrogate key that corresponds with the business key(s).

For SCDs, your fact table does not participate in the SCD except to have a
key value to point to the appropriate row in the dimension. In your example
below with employees, your employee dimension, depending on the type of
slowly changing dimension you implement, will have the employee information
in it.

Fact tables tend to have additive or semi-additive attributes in them along
with the key values from the dimension tables.

I suggest that you get a couple books before you dive headfirst into this.
I've been working with relational databases for about 12 years and I will
admit that dimensional modeling is a slightly different animal. There are a
couple of books by Ralph Kimball (and friends), one is called the "Data
Warehouse Toolkit" and there is a second I can't remember the title, but
they both cover the basics. I think, from what you're describing that you
need an employee dimension that has history preservation. That way, if you
look at the fact table across time, the join to the employee table will
produce the correct results (the employee key will be different for an
employee who was hired Q1 and fired Q2).

"Gummy" <gumbatman (AT) hotmail (DOT) com> wrote

Quote:
Hello,

I am new to data warehousing and trying to get my head around two issues:

My understanding is that everything should use a surrogate key, not the
"natural" one that comes from the source system. For example, I have a
fact
table of customers and it has a Product Key on it. That Product links to
the
Product Dimension table. How do I convert that Product Key to a surrogate
key in both the fact table and the dimension table so they remain linked?
Do
I just have a lookup or association table that does the conversion? I
assume
that means I would need an association table for every dimension I add to
the database.

Next question:

I am going to have a fact table that uses Slowly Changing Dimensions for
employees. These employees have many actions that can occur - Hired,
Fired,
Transfered, Promoted, etc. I will also include, on the fact table the
Start
Date, End Date of the action and a flag for the Last Action.
How exactly do I query this fact table to figure out who is currently
active? I have to assume that if your last action was Fired then you are
no
longer active. However, I am concerned that in our system you could be
fired, but then another event can take place like you received Cobra money
or some sort of pension action occurred.
In general, I am not clear how you could figure out how to query the data.
An example would be, give me the active headcount at the beginning of the
month, or who terminated between October and December.


Thank you for the help. I think this stuff has some amazing potential (as
soon as I understand it better).


p.s. Sorry about the cross-post, I found this newsgroup after positing the
question and this NG seemed more appropriate.




Reply With Quote
  #3  
Old   
Gummy
 
Posts: n/a

Default Re: Newbie Questions About Surrogate Keys and SCD - 05-23-2006 , 10:37 PM



Tim,

Thank you so much for the very detailed and clear answer.

I've worked with relational databases for a few years and this dimension
stuff is new to me but very interesting.

I happened to order the Kimball book right before this post. It is a great
read (if your into this stuff) and I appreciate your recommendations.

Thanks again.

-Gummy

"Tim Dot NoSpam" <Tim.NoSpam (AT) hughes (DOT) net> wrote

Quote:
For surrogate keys, you'll tyipcally use an autoincrementing int or
bigint. Your business key column is carried over from the source system as
a column in your dimension.

If for example, you married two tables from a source system, you'd most
likely have the business key from TableA and the business key from TableB
as attributes in the Dimension table. Your fact table would contain the
surrogate key that corresponds with the business key(s).

For SCDs, your fact table does not participate in the SCD except to have a
key value to point to the appropriate row in the dimension. In your
example below with employees, your employee dimension, depending on the
type of slowly changing dimension you implement, will have the employee
information in it.

Fact tables tend to have additive or semi-additive attributes in them
along with the key values from the dimension tables.

I suggest that you get a couple books before you dive headfirst into this.
I've been working with relational databases for about 12 years and I will
admit that dimensional modeling is a slightly different animal. There are
a couple of books by Ralph Kimball (and friends), one is called the "Data
Warehouse Toolkit" and there is a second I can't remember the title, but
they both cover the basics. I think, from what you're describing that you
need an employee dimension that has history preservation. That way, if
you look at the fact table across time, the join to the employee table
will produce the correct results (the employee key will be different for
an employee who was hired Q1 and fired Q2).

"Gummy" <gumbatman (AT) hotmail (DOT) com> wrote in message
news:126vlu9s16slab0 (AT) corp (DOT) supernews.com...
Hello,

I am new to data warehousing and trying to get my head around two issues:

My understanding is that everything should use a surrogate key, not the
"natural" one that comes from the source system. For example, I have a
fact
table of customers and it has a Product Key on it. That Product links to
the
Product Dimension table. How do I convert that Product Key to a surrogate
key in both the fact table and the dimension table so they remain linked?
Do
I just have a lookup or association table that does the conversion? I
assume
that means I would need an association table for every dimension I add to
the database.

Next question:

I am going to have a fact table that uses Slowly Changing Dimensions for
employees. These employees have many actions that can occur - Hired,
Fired,
Transfered, Promoted, etc. I will also include, on the fact table the
Start
Date, End Date of the action and a flag for the Last Action.
How exactly do I query this fact table to figure out who is currently
active? I have to assume that if your last action was Fired then you are
no
longer active. However, I am concerned that in our system you could be
fired, but then another event can take place like you received Cobra
money
or some sort of pension action occurred.
In general, I am not clear how you could figure out how to query the
data.
An example would be, give me the active headcount at the beginning of the
month, or who terminated between October and December.


Thank you for the help. I think this stuff has some amazing potential (as
soon as I understand it better).


p.s. Sorry about the cross-post, I found this newsgroup after positing
the question and this NG seemed more appropriate.






Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.