![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
Hello, I am new to data warehousing and trying to get my head around two issues: My understanding is that everything should use a surrogate key, not the "natural" one that comes from the source system. For example, I have a fact table of customers and it has a Product Key on it. That Product links to the Product Dimension table. How do I convert that Product Key to a surrogate key in both the fact table and the dimension table so they remain linked? Do I just have a lookup or association table that does the conversion? I assume that means I would need an association table for every dimension I add to the database. Next question: I am going to have a fact table that uses Slowly Changing Dimensions for employees. These employees have many actions that can occur - Hired, Fired, Transfered, Promoted, etc. I will also include, on the fact table the Start Date, End Date of the action and a flag for the Last Action. How exactly do I query this fact table to figure out who is currently active? I have to assume that if your last action was Fired then you are no longer active. However, I am concerned that in our system you could be fired, but then another event can take place like you received Cobra money or some sort of pension action occurred. In general, I am not clear how you could figure out how to query the data. An example would be, give me the active headcount at the beginning of the month, or who terminated between October and December. Thank you for the help. I think this stuff has some amazing potential (as soon as I understand it better). p.s. Sorry about the cross-post, I found this newsgroup after positing the question and this NG seemed more appropriate. |
#3
| |||
| |||
|
|
For surrogate keys, you'll tyipcally use an autoincrementing int or bigint. Your business key column is carried over from the source system as a column in your dimension. If for example, you married two tables from a source system, you'd most likely have the business key from TableA and the business key from TableB as attributes in the Dimension table. Your fact table would contain the surrogate key that corresponds with the business key(s). For SCDs, your fact table does not participate in the SCD except to have a key value to point to the appropriate row in the dimension. In your example below with employees, your employee dimension, depending on the type of slowly changing dimension you implement, will have the employee information in it. Fact tables tend to have additive or semi-additive attributes in them along with the key values from the dimension tables. I suggest that you get a couple books before you dive headfirst into this. I've been working with relational databases for about 12 years and I will admit that dimensional modeling is a slightly different animal. There are a couple of books by Ralph Kimball (and friends), one is called the "Data Warehouse Toolkit" and there is a second I can't remember the title, but they both cover the basics. I think, from what you're describing that you need an employee dimension that has history preservation. That way, if you look at the fact table across time, the join to the employee table will produce the correct results (the employee key will be different for an employee who was hired Q1 and fired Q2). "Gummy" <gumbatman (AT) hotmail (DOT) com> wrote in message news:126vlu9s16slab0 (AT) corp (DOT) supernews.com... Hello, I am new to data warehousing and trying to get my head around two issues: My understanding is that everything should use a surrogate key, not the "natural" one that comes from the source system. For example, I have a fact table of customers and it has a Product Key on it. That Product links to the Product Dimension table. How do I convert that Product Key to a surrogate key in both the fact table and the dimension table so they remain linked? Do I just have a lookup or association table that does the conversion? I assume that means I would need an association table for every dimension I add to the database. Next question: I am going to have a fact table that uses Slowly Changing Dimensions for employees. These employees have many actions that can occur - Hired, Fired, Transfered, Promoted, etc. I will also include, on the fact table the Start Date, End Date of the action and a flag for the Last Action. How exactly do I query this fact table to figure out who is currently active? I have to assume that if your last action was Fired then you are no longer active. However, I am concerned that in our system you could be fired, but then another event can take place like you received Cobra money or some sort of pension action occurred. In general, I am not clear how you could figure out how to query the data. An example would be, give me the active headcount at the beginning of the month, or who terminated between October and December. Thank you for the help. I think this stuff has some amazing potential (as soon as I understand it better). p.s. Sorry about the cross-post, I found this newsgroup after positing the question and this NG seemed more appropriate. |
![]() |
| Thread Tools | |
| Display Modes | |
| |