dbTalk Databases Forums  

about star schema and surrogate keys

comp.databases comp.databases


Discuss about star schema and surrogate keys in the comp.databases forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Marco Mariani
 
Posts: n/a

Default about star schema and surrogate keys - 05-04-2009 , 04:39 AM






I've done my homemade small data warehouse with Postgres, and a simple
reporting tool.

What do I gain by having a surrogate integer time_key in my table
dim_time (one row per day), as opposed to a DATE column?

The same question holds for other dimensions that could already have a
sound natural key, usually a short alphanumeric code.

Is there really a performance gain in modern DBs?
Or should I do that for conformance with query tools that I could use
someday?
The biggest dimension has 9000 rows, and about 2M rows for facts.

Thanks!

Reply With Quote
  #2  
Old   
Walter Mitty
 
Posts: n/a

Default Re: about star schema and surrogate keys - 05-04-2009 , 05:13 AM







"Marco Mariani" <marco (AT) sferacarta (DOT) com> wrote

Quote:
I've done my homemade small data warehouse with Postgres, and a simple
reporting tool.

What do I gain by having a surrogate integer time_key in my table dim_time
(one row per day), as opposed to a DATE column?

The same question holds for other dimensions that could already have a
sound natural key, usually a short alphanumeric code.

Is there really a performance gain in modern DBs?
Or should I do that for conformance with query tools that I could use
someday?
The biggest dimension has 9000 rows, and about 2M rows for facts.

Thanks!
I can think of one advantage for some dimensions. It allows the dimension
to be time varying.

For example, think of a star where the fact is a product sale. One of the
dimensions is product, and one of the attributes of product is price.
But the latest price for the product is not as relevant as the price that
was true on the date the sale took place. That means that, when the price
changes, I don't want to update the price in the product row. Instead, I
want to insert a new row with the same product and the new price. But if
I'm using ProductID as the key to the product dimension, I can't do that.
If I 'm using a surrogate key, I can. Of course, ProductID will be one of
the attributes in a dimension table.

This advantage does not occur in a date or time dimension.





Reply With Quote
  #3  
Old   
Marco Mariani
 
Posts: n/a

Default Re: about star schema and surrogate keys - 05-04-2009 , 06:55 AM



Walter Mitty wrote:


Quote:
But the latest price for the product is not as relevant as the price that
was true on the date the sale took place. That means that, when the price
changes, I don't want to update the price in the product row. Instead, I
want to insert a new row with the same product and the new price. But if
I'm using ProductID as the key to the product dimension, I can't do that.
If I 'm using a surrogate key, I can. Of course, ProductID will be one of
the attributes in a dimension table.
I already use a surrogate in cases like that. So it seems like there is
nothing I overlooked.

thank you!


Reply With Quote
  #4  
Old   
--CELKO--
 
Posts: n/a

Default Re: about star schema and surrogate keys - 05-05-2009 , 01:27 PM



Quote:
What do I gain by having a surrogate integer time_key in my table dim_time (one row per day), as opposed to a DATE column? The same question holds for other dimensions that could already have a sound natural key, usually a short alphanumeric code.
These fake pointers are harmful. How do you verify and validate this
magical number? At best it is redundant. We live with 64 bit, multi-
core hardware, not 16 bit single processor 1970's minicomputers.


Reply With Quote
  #5  
Old   
Marco Mariani
 
Posts: n/a

Default Re: about star schema and surrogate keys - 05-06-2009 , 05:47 AM



--CELKO-- wrote:

Quote:
What do I gain by having a surrogate integer time_key in my table dim_time (one row per day), as opposed to a DATE column? The same question holds for other dimensions that could already have a sound natural key, usually a short alphanumeric code.

These fake pointers are harmful. How do you verify and validate this
magical number?
I am not using the magical number right now in the context I described.
You don't need to convince me about using the "real stuff", you and
others already did a long time ago, and I thank you for that.

But I still need to understand which of the practices I use for my
day-to-day invoice stuff and contract management.... which practices
still make sense in the denormalized, zero-gravity platypus-shaped world
of my toy datawarehouse, and which ones don't.

I don't need to worry about verifying and validating anything, beyond
the ETL procedure, because the DWH is small enough that I drop and
recreate everything every night. It's never updated by anyone.

My main concern here is to have fast queries, and to avoid clumsiness in
the building process since I am not an expert and debugging that stuff
is painful at times.
I'm sure the next one will be cleaner



Reply With Quote
  #6  
Old   
david
 
Posts: n/a

Default Re: about star schema and surrogate keys - 06-07-2009 , 05:32 AM



As a code programmer with some database experience,
I think that surrogate keys are a natural for programmers.
Surrogate keys are related to one of the central lessons
of computer programming, related to the reason why we
call our product 'soft' ware.

As such I expect to see surrogate keys overused
anywhere that code programmers are allowed anywhere
near to database schemas. Since people with code
programming experience are frequently involved in schema
design, I think that you can always expect that there
will be discussion about the use of surrogate keys, even
when you are discussing data warehousing, but for
data warehousing you just have to learn to ignore it.

(david)


"Marco Mariani" <marco (AT) sferacarta (DOT) com> wrote

Quote:
I've done my homemade small data warehouse with Postgres, and a simple
reporting tool.

What do I gain by having a surrogate integer time_key in my table dim_time
(one row per day), as opposed to a DATE column?

The same question holds for other dimensions that could already have a
sound natural key, usually a short alphanumeric code.

Is there really a performance gain in modern DBs?
Or should I do that for conformance with query tools that I could use
someday?
The biggest dimension has 9000 rows, and about 2M rows for facts.

Thanks!

Reply With Quote
  #7  
Old   
Marco Mariani
 
Posts: n/a

Default Re: about star schema and surrogate keys - 06-08-2009 , 05:08 AM



david wrote:


Quote:
As a code programmer with some database experience,
I think that surrogate keys are a natural for programmers.
Surrogate keys are related to one of the central lessons
of computer programming, related to the reason why we
call our product 'soft' ware.
And that central lesson would be... ?


Quote:
As such I expect to see surrogate keys overused
anywhere that code programmers are allowed anywhere
near to database schemas. Since people with code
programming experience are frequently involved in schema
design, I think that you can always expect that there
will be discussion about the use of surrogate keys,
I fail to see the distinction between "code programmers" and "database
programmers". But I understand it would have made sense on mainframes.


Quote:
even when you are discussing data warehousing, but for
data warehousing you just have to learn to ignore it.
My question was related to a specific set-up. I didn't mean to open that
can of dead horses again.

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.