![]() | |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
I've done my homemade small data warehouse with Postgres, and a simple reporting tool. What do I gain by having a surrogate integer time_key in my table dim_time (one row per day), as opposed to a DATE column? The same question holds for other dimensions that could already have a sound natural key, usually a short alphanumeric code. Is there really a performance gain in modern DBs? Or should I do that for conformance with query tools that I could use someday? The biggest dimension has 9000 rows, and about 2M rows for facts. Thanks! |
#3
| |||
| |||
|
|
But the latest price for the product is not as relevant as the price that was true on the date the sale took place. That means that, when the price changes, I don't want to update the price in the product row. Instead, I want to insert a new row with the same product and the new price. But if I'm using ProductID as the key to the product dimension, I can't do that. If I 'm using a surrogate key, I can. Of course, ProductID will be one of the attributes in a dimension table. |
#4
| |||
| |||
|
|
What do I gain by having a surrogate integer time_key in my table dim_time (one row per day), as opposed to a DATE column? The same question holds for other dimensions that could already have a sound natural key, usually a short alphanumeric code. |
#5
| |||
| |||
|
|
What do I gain by having a surrogate integer time_key in my table dim_time (one row per day), as opposed to a DATE column? The same question holds for other dimensions that could already have a sound natural key, usually a short alphanumeric code. These fake pointers are harmful. How do you verify and validate this magical number? |

#6
| |||
| |||
|
|
I've done my homemade small data warehouse with Postgres, and a simple reporting tool. What do I gain by having a surrogate integer time_key in my table dim_time (one row per day), as opposed to a DATE column? The same question holds for other dimensions that could already have a sound natural key, usually a short alphanumeric code. Is there really a performance gain in modern DBs? Or should I do that for conformance with query tools that I could use someday? The biggest dimension has 9000 rows, and about 2M rows for facts. Thanks! |
#7
| |||
| |||
|
|
As a code programmer with some database experience, I think that surrogate keys are a natural for programmers. Surrogate keys are related to one of the central lessons of computer programming, related to the reason why we call our product 'soft' ware. |
|
As such I expect to see surrogate keys overused anywhere that code programmers are allowed anywhere near to database schemas. Since people with code programming experience are frequently involved in schema design, I think that you can always expect that there will be discussion about the use of surrogate keys, |
|
even when you are discussing data warehousing, but for data warehousing you just have to learn to ignore it. |
![]() |
| Thread Tools | |
| Display Modes | |
| |