dbTalk Databases Forums  

What are the design criteria for primary keys?

comp.databases.theory comp.databases.theory


Discuss What are the design criteria for primary keys? in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Bob Badour
 
Posts: n/a

Default What are the design criteria for primary keys? - 09-02-2010 , 11:20 PM






Choosing good *primary keys* and candidate keys is a vitally important
*database design* task--as much art as science. The design task has very
specific design criteria.

*What are the criteria?*


As an experiment, I asked the above question on StackOverflow.com having
first verified it wasn't already answered on the site. It is a very
important question that has a very simple and clear answer. I had
planned to offer a sizable bounty if nobody gave the correct answer
after the 1st day and answer it myself if nobody claimed the bounty.

http://tinyurl.com/ignorancetothefifth

Five people identified as John Saunders, David Stratton, Claudio Redi,
wallyk, and rockinthesixstring voted to close it with nothing
approaching a correct answer supposedly because "It's difficult to tell
what is being asked here. This question is ambiguous, vague, incomplete,
or rhetorical and cannot be reasonably answered in its current form."


I thought the question was clear enough. The answer, of course, is:
uniqueness, irreducibility, simplicity, stability and familiarity.

To anyone who uses that site as a resource, all I can say is: Caveat lector!

Reply With Quote
  #2  
Old   
Bob Badour
 
Posts: n/a

Default Re: What are the design criteria for primary keys? - 09-02-2010 , 11:58 PM






Quote:
Five people identified as John Saunders, David Stratton, Claudio Redi,
wallyk, and rockinthesixstring voted to close it with nothing
approaching a correct answer supposedly because "It's difficult to tell
what is being asked here. This question is ambiguous, vague, incomplete,
or rhetorical and cannot be reasonably answered in its current form."
Interesting, John Saunders is a Most Vociferous Person (MVP) and 3 of
the others are dotnet programmers. Ignorance continues to emanate from
Redmond--no surprise there, I guess.

Reply With Quote
  #3  
Old   
Brian
 
Posts: n/a

Default Re: What are the design criteria for primary keys? - 09-03-2010 , 08:52 AM



On Sep 3, 12:20*am, Bob Badour <bbad... (AT) pei (DOT) sympatico.ca> wrote:
Quote:
Choosing good *primary keys* and candidate keys is a vitally important
*database design* task--as much art as science. The design task has very
specific design criteria.

*What are the criteria?*

As an experiment, I asked the above question on StackOverflow.com having
first verified it wasn't already answered on the site. It is a very
important question that has a very simple and clear answer. I had
planned to offer a sizable bounty if nobody gave the correct answer
after the 1st day and answer it myself if nobody claimed the bounty.

http://tinyurl.com/ignorancetothefifth

Five people identified as John Saunders, David Stratton, Claudio Redi,
wallyk, and rockinthesixstring voted to close it with nothing
approaching a correct answer supposedly because "It's difficult to tell
what is being asked here. This question is ambiguous, vague, incomplete,
or rhetorical and cannot be reasonably answered in its current form."

I thought the question was clear enough. The answer, of course, is:
uniqueness, irreducibility, simplicity, stability and familiarity.

To anyone who uses that site as a resource, all I can say is: Caveat lector!
The answer, of course, is:
uniqueness, irreducibility, familiarity, and if at all possible,
simplicity and stability.

There should be no reason to use surrogates, so I agree on
familiarity, but not every candidate key is simple, and while
stability is a worthwhile goal, it is not always achievable.

Reply With Quote
  #4  
Old   
Erwin
 
Posts: n/a

Default Re: What are the design criteria for primary keys? - 09-03-2010 , 09:32 AM



On 3 sep, 15:52, Brian <br... (AT) selzer-software (DOT) com> wrote:
Quote:
On Sep 3, 12:20*am, Bob Badour <bbad... (AT) pei (DOT) sympatico.ca> wrote:





Choosing good *primary keys* and candidate keys is a vitally important
*database design* task--as much art as science. The design task has very
specific design criteria.

*What are the criteria?*

As an experiment, I asked the above question on StackOverflow.com having
first verified it wasn't already answered on the site. It is a very
important question that has a very simple and clear answer. I had
planned to offer a sizable bounty if nobody gave the correct answer
after the 1st day and answer it myself if nobody claimed the bounty.

http://tinyurl.com/ignorancetothefifth

Five people identified as John Saunders, David Stratton, Claudio Redi,
wallyk, and rockinthesixstring voted to close it with nothing
approaching a correct answer supposedly because "It's difficult to tell
what is being asked here. This question is ambiguous, vague, incomplete,
or rhetorical and cannot be reasonably answered in its current form."

I thought the question was clear enough. The answer, of course, is:
uniqueness, irreducibility, simplicity, stability and familiarity.

To anyone who uses that site as a resource, all I can say is: Caveat lector!

The answer, of course, is:
uniqueness, irreducibility, familiarity, and if at all possible,
simplicity and stability.

There should be no reason to use surrogates, so I agree on
familiarity, but not every candidate key is simple, and while
stability is a worthwhile goal, it is not always achievable.- Tekst uit oorspronkelijk bericht niet weergeven -

- Tekst uit oorspronkelijk bericht weergeven -
I'm pretty sure Bob didn't mean "100% stability" as a criterion for
either-or-not being a key.

Reply With Quote
  #5  
Old   
Bob Badour
 
Posts: n/a

Default Re: What are the design criteria for primary keys? - 09-03-2010 , 09:46 AM



Brian wrote:

Quote:
On Sep 3, 12:20 am, Bob Badour <bbad... (AT) pei (DOT) sympatico.ca> wrote:

Choosing good *primary keys* and candidate keys is a vitally important
*database design* task--as much art as science. The design task has very
specific design criteria.

*What are the criteria?*

As an experiment, I asked the above question on StackOverflow.com having
first verified it wasn't already answered on the site. It is a very
important question that has a very simple and clear answer. I had
planned to offer a sizable bounty if nobody gave the correct answer
after the 1st day and answer it myself if nobody claimed the bounty.

http://tinyurl.com/ignorancetothefifth

Five people identified as John Saunders, David Stratton, Claudio Redi,
wallyk, and rockinthesixstring voted to close it with nothing
approaching a correct answer supposedly because "It's difficult to tell
what is being asked here. This question is ambiguous, vague, incomplete,
or rhetorical and cannot be reasonably answered in its current form."

I thought the question was clear enough. The answer, of course, is:
uniqueness, irreducibility, simplicity, stability and familiarity.

To anyone who uses that site as a resource, all I can say is: Caveat lector!

The answer, of course, is:
uniqueness, irreducibility, familiarity, and if at all possible,
simplicity and stability.
The criteria are what they are. One could just as easily rewrite them as
"simplicity, simplicy, familiarity, and if at all possible, uniqueness
and irreducibility" without really changing the meaning of anything.

They are a list of criteria that sometimes conflict requiring one to
make design tradeoffs.


Quote:
There should be no reason to use surrogates, so I agree on
familiarity, but not every candidate key is simple, and while
stability is a worthwhile goal, it is not always achievable.
I try not to should on anyone or anything; although, I did make a recent
exception.

All keys are surrogates. A natural key is merely a familiar surrogate.

While not every candidate key is simple, during database design, one
still chooses whether to express references with a compound candidate or
a simpler alternative.

Reply With Quote
  #6  
Old   
paul c
 
Posts: n/a

Default Re: What are the design criteria for primary keys? - 09-03-2010 , 03:48 PM



On 02/09/2010 9:20 PM, Bob Badour wrote:
Quote:
Choosing good *primary keys* and candidate keys is a vitally important
*database design* task--as much art as science. The design task has very
specific design criteria.

*What are the criteria?*


As an experiment, I asked the above question on StackOverflow.com having
first verified it wasn't already answered on the site. It is a very
important question that has a very simple and clear answer. I had
planned to offer a sizable bounty if nobody gave the correct answer
after the 1st day and answer it myself if nobody claimed the bounty.

http://tinyurl.com/ignorancetothefifth

Five people identified as John Saunders, David Stratton, Claudio Redi,
wallyk, and rockinthesixstring voted to close it with nothing
approaching a correct answer supposedly because "It's difficult to tell
what is being asked here. This question is ambiguous, vague, incomplete,
or rhetorical and cannot be reasonably answered in its current form."


I thought the question was clear enough. The answer, of course, is:
uniqueness, irreducibility, simplicity, stability and familiarity.

To anyone who uses that site as a resource, all I can say is: Caveat
lector!
Same goes for most sites to do with db, I'd say. These five are typical
of most of those in IT, eg., any of inability to abstract, to focus,
considerable ability to produce so much verbiage that the overall
meaning becomes vacuous, in other words over-developed talent for
obfuscation combined with a protective herd mentality that shouts down
the occasional sane minority voice which threatens the courtiers by
pointing out that the Emperor has no clothes. Apparently it's not
illegal to use whatever steroids improve the obfuscation faculty.


I believe that the coherent, concise answer (which you have given before
here I think) could be assessed by any non-IT person who has had an
education that cultivated a modest level of literacy.


Ever since I first saw it mentioned I've especially liked 'familiarity'
but it really surprises me when people compare that with
'not-invented-here'. If one has never seen a particular system,
everything, with a capital E, is not familiar but soon becomes so.
Also, I don't see anything wrong with every relation having two or more
candidate keys, the ones that aren't primary being very useful
connections to useful features that are outside the RM, but innate to
most physical implementations, such as ordering. No matter what of
several definitions one uses for 'surrogate', they will soon become
familiar. Celko wants everybody to use published keys, sometimes that's
advantageous but it's not essential, after all the published keys one
has never seen before aren't familiar before one adopts them. Part of
the power of the Codd's RM (pick any version) is that it's open-ended
about such bells and whistles.

Reply With Quote
  #7  
Old   
Cimode
 
Posts: n/a

Default Re: What are the design criteria for primary keys? - 09-03-2010 , 05:21 PM



On 3 sep, 22:48, paul c <toledobythe... (AT) oohay (DOT) ac> wrote:
Quote:
On 02/09/2010 9:20 PM, Bob Badour wrote:





Choosing good *primary keys* and candidate keys is a vitally important
*database design* task--as much art as science. The design task has very
specific design criteria.

*What are the criteria?*

As an experiment, I asked the above question on StackOverflow.com having
first verified it wasn't already answered on the site. It is a very
important question that has a very simple and clear answer. I had
planned to offer a sizable bounty if nobody gave the correct answer
after the 1st day and answer it myself if nobody claimed the bounty.

http://tinyurl.com/ignorancetothefifth

Five people identified as John Saunders, David Stratton, Claudio Redi,
wallyk, and rockinthesixstring voted to close it with nothing
approaching a correct answer supposedly because "It's difficult to tell
what is being asked here. This question is ambiguous, vague, incomplete,
or rhetorical and cannot be reasonably answered in its current form."

I thought the question was clear enough. The answer, of course, is:
uniqueness, irreducibility, simplicity, stability and familiarity.

To anyone who uses that site as a resource, all I can say is: Caveat
lector!

Same goes for most sites to do with db, I'd say. *These five are typical
of most of those in IT, eg., any of inability to abstract, to focus,
considerable ability to produce so much verbiage that the overall
meaning becomes vacuous, in other words over-developed talent for
obfuscation combined with a protective herd mentality that shouts down
the occasional sane minority voice which threatens the courtiers by
pointing out that the Emperor has no clothes. *Apparently it's not
illegal to use whatever steroids improve the obfuscation faculty.

I believe that the coherent, concise answer (which you have given before
here I think) could be assessed by any non-IT person who has had an
education that cultivated a modest level of literacy.

Ever since I first saw it mentioned I've especially liked 'familiarity'
but it really surprises me when people compare that with
'not-invented-here'. *If one has never seen a particular system,
everything, with a capital E, is not familiar but soon becomes so.
Also, I don't see anything wrong with every relation having two or more
candidate keys, the ones that aren't primary being very useful
connections to useful features that are outside the RM, but innate to
most physical implementations, such as ordering. *No matter what of
several definitions one uses for 'surrogate', they will soon become
familiar. *Celko wants everybody to use published keys, sometimes that's
advantageous but it's not essential, after all the published keys one
has never seen before aren't familiar before one adopts them. *Part of
the power of the Codd's RM (pick any version) is that it's open-ended
about such bells and whistles.
For a change from the usual sterile rantings about current state of
contemporary database design normative habits, and to play somehow
devil's advocate, I'd have to say that the concept of *key* is almost
orthogonal to RM. After all, the concept of *keys* mostly embodies
the IBM context in which RM was theorized by Codd rather than the
logical concept of unique tuple identifier in an algebraical sense.

Additionally, defining quasi metaphysical subjective criteria such as
*familiarity* somehow sounds wrong as opposed to the rigor of the
fundamental mathematics that are supposed to act as a foundation for
RM structural information representations. For some reason, I noticed
such criteria is oftent an open door to diversion from the concept of
*distinguishability* (which at least can be expressed algebraically)
which seems more important than others.

Finally, I'd have to say that looking for database theory definition
into a board, we know is filled by undeducated audiences is at best
pointless, at worst, sterile.

In conclusion, I think uniqueness (I'd add distinguishability),
irreducibility, stability are more useful for definition rather than
*simplicity*, *familiarity* since the last two can hardly be
formalized.

Reply With Quote
  #8  
Old   
Bob Badour
 
Posts: n/a

Default Re: What are the design criteria for primary keys? - 09-03-2010 , 06:31 PM



Cimode wrote:

Quote:
On 3 sep, 22:48, paul c <toledobythe... (AT) oohay (DOT) ac> wrote:

On 02/09/2010 9:20 PM, Bob Badour wrote:

Choosing good *primary keys* and candidate keys is a vitally important
*database design* task--as much art as science. The design task has very
specific design criteria.

*What are the criteria?*

As an experiment, I asked the above question on StackOverflow.com having
first verified it wasn't already answered on the site. It is a very
important question that has a very simple and clear answer. I had
planned to offer a sizable bounty if nobody gave the correct answer
after the 1st day and answer it myself if nobody claimed the bounty.

http://tinyurl.com/ignorancetothefifth

Five people identified as John Saunders, David Stratton, Claudio Redi,
wallyk, and rockinthesixstring voted to close it with nothing
approaching a correct answer supposedly because "It's difficult to tell
what is being asked here. This question is ambiguous, vague, incomplete,
or rhetorical and cannot be reasonably answered in its current form."

I thought the question was clear enough. The answer, of course, is:
uniqueness, irreducibility, simplicity, stability and familiarity.

To anyone who uses that site as a resource, all I can say is: Caveat
lector!

Same goes for most sites to do with db, I'd say. These five are typical
of most of those in IT, eg., any of inability to abstract, to focus,
considerable ability to produce so much verbiage that the overall
meaning becomes vacuous, in other words over-developed talent for
obfuscation combined with a protective herd mentality that shouts down
the occasional sane minority voice which threatens the courtiers by
pointing out that the Emperor has no clothes. Apparently it's not
illegal to use whatever steroids improve the obfuscation faculty.

I believe that the coherent, concise answer (which you have given before
here I think) could be assessed by any non-IT person who has had an
education that cultivated a modest level of literacy.

Ever since I first saw it mentioned I've especially liked 'familiarity'
but it really surprises me when people compare that with
'not-invented-here'. If one has never seen a particular system,
everything, with a capital E, is not familiar but soon becomes so.
Also, I don't see anything wrong with every relation having two or more
candidate keys, the ones that aren't primary being very useful
connections to useful features that are outside the RM, but innate to
most physical implementations, such as ordering. No matter what of
several definitions one uses for 'surrogate', they will soon become
familiar. Celko wants everybody to use published keys, sometimes that's
advantageous but it's not essential, after all the published keys one
has never seen before aren't familiar before one adopts them. Part of
the power of the Codd's RM (pick any version) is that it's open-ended
about such bells and whistles.

For a change from the usual sterile rantings about current state of
contemporary database design normative habits, and to play somehow
devil's advocate, I'd have to say that the concept of *key* is almost
orthogonal to RM. After all, the concept of *keys* mostly embodies
the IBM context in which RM was theorized by Codd rather than the
logical concept of unique tuple identifier in an algebraical sense.

Additionally, defining quasi metaphysical subjective criteria such as
*familiarity* somehow sounds wrong as opposed to the rigor of the
fundamental mathematics that are supposed to act as a foundation for
RM structural information representations. For some reason, I noticed
such criteria is oftent an open door to diversion from the concept of
*distinguishability* (which at least can be expressed algebraically)
which seems more important than others.

Finally, I'd have to say that looking for database theory definition
into a board, we know is filled by undeducated audiences is at best
pointless, at worst, sterile.

In conclusion, I think uniqueness (I'd add distinguishability),
irreducibility, stability are more useful for definition rather than
*simplicity*, *familiarity* since the last two can hardly be
formalized.
One cannot formalize elegance, and yet we can recognize it:
http://userweb.cs.utexas.edu/users/E...xx/EWD709.html

If we can recognize it, we can use it for subjective comparison.

Reply With Quote
  #9  
Old   
paul c
 
Posts: n/a

Default Re: What are the design criteria for primary keys? - 09-03-2010 , 07:05 PM



On 03/09/2010 3:21 PM, Cimode wrote:
Quote:
On 3 sep, 22:48, paul c<toledobythe... (AT) oohay (DOT) ac> wrote:
On 02/09/2010 9:20 PM, Bob Badour wrote:
...
For a change from the usual sterile rantings about current state of
contemporary database design normative habits, and to play somehow
devil's advocate, I'd have to say that the concept of *key* is almost
orthogonal to RM. After all, the concept of *keys* mostly embodies
the IBM context in which RM was theorized by Codd rather than the
logical concept of unique tuple identifier in an algebraical sense.

Additionally, defining quasi metaphysical subjective criteria such as
*familiarity* somehow sounds wrong as opposed to the rigor of the
fundamental mathematics that are supposed to act as a foundation for
RM structural information representations. For some reason, I noticed
such criteria is oftent an open door to diversion from the concept of
*distinguishability* (which at least can be expressed algebraically)
which seems more important than others.

Finally, I'd have to say that looking for database theory definition
into a board, we know is filled by undeducated audiences is at best
pointless, at worst, sterile.

In conclusion, I think uniqueness (I'd add distinguishability),
irreducibility, stability are more useful for definition rather than
*simplicity*, *familiarity* since the last two can hardly be
formalized.

(I don't think it's sterile to suggest that more of today's programmers,
at least the ones who post to websites and the ones who produce the
systems I'm forced to follow as an ordinary consumer or citizen, might
profit more from a course in "reading for comprehension" than another
computer language course.)


As for keys, what comes first, keys or relations? I think some people
do when faced with conceiving a db, is to think of what they want to
identify and then decide on a possibly tentative identifier which might
or might not end up as a key. Can't fault that. Perhaps more often,
other people first think of what they want to record and then how to
distinguish the 'records' (won't quibble about semantic differences
among 'unique', 'identity', 'distinct' and 'distinguishable'). Can't
fault that either.


I think Bob B is accurate when he says the effort to do this is more art
than science because it involves a certain amount of critical thinking,
a certain amount of lateral thinking, a certain amount of
experimentation, a certain amount of iteration and a small amount of
theory.


But they can't be satisfied until they've designed most or all of the
relations/tables they think they need and take advantage of
normalization theory to end up with as few key attributes as possible.
That's not the same as saying as few keys as possible, eg., a system
without surrogates can have fewer key attributes than one that
mindlessly uses surrogates for every relation but the opposite isn't
guaranteed. So that is another qualitative dimension that starts to
show up when additional tables are involved. I don't know that it has a
conventional label, but it appeals to me.


I think many people go at normalization in a knee-jerk kind of way, eg.,
because they think they're "supposed to" or because they want to avoid
the oft-mentioned old bugaboo, "update anomalies". But the advantage of
avoiding redundancy also has a lot to do with potential system
efficiency. I say "potential" because few if any dbms' are built to
take advantage of, for example, functional dependencies.


I've seen alternative db designs that looked equally useful and
efficient to me even though one of them used keys that were very
unfamiliar to me. If I had to choose, I think I would have chosen the
one that was familiar. So I think we must always expect subjective
attitudes / personal esthetics to slow down any group effort.


I abhor the fixed dictums many authors and consultants put out, maybe
the worst of them being to apply, universally, surrogate values that are
completely artificial, or to 'always use integer keys' as one of Bob B's
cuckoo correspondents put it. On occasion I've been ordered to
participate in some exercise or other run by psychologists or social
'scientists' of one ilk or another (often called the 'soft' sciences).
Although I was usually skeptical that anything good would come of those
exercises, there were a few times when I learned techniques that seemed
to have repetitive advantage. When it comes to the 'art' Bob B
mentions, I do think there are styles and approaches evolved by the soft
sciences that might help people conceive schemas more determindedly (not
deterministically!). But all the courses I took weren't aimed at single
or group design specifically, for example one of them only dealt with
recognizing different personality types and ways to avoid wasting time
when dealing with them. Still that would be better than nothing for
most group efforts. Personally, I prefer a division of schemas such
that one individual can explain all of his piece to me off the top of
his head.


It's likely that Codd must have been very much influenced by the
'physical' keys that were lingua franca in the mainframe world of the
1950's and 60's. (I started with that stuff and remember that you
couldn't look another assembler programmer in the eye unless you could
write a channel program even though that was hardly ever necessary
compared to the packaged access methods of the time. For a time there
was even a trend to move more access method logic outboard of the cpu.)
So the term 'key' has been quite 'familiar' to me for forty years. But
I'd say my narrow view of it probably impeded my awareness of logical
database for more than twenty of those years. The dbms products, most
of them originated by people of my generation are, I think, equally
responsible for my own lack of progress. Most younger people don't have
the advantage of an historical perspective and sadly most don't want it
either. I don't like some the historical baggage Codd's use of 'key'
connotes but I also don't like the ignorance of its possibilities (like
the FD leverage I referred to above) amongst dbms developers.


I wouldn't object if you called the above 'directionless', as opposed to
'sterile', ha, ha.

Reply With Quote
  #10  
Old   
-CELKO-
 
Posts: n/a

Default Re: What are the design criteria for primary keys? - 09-03-2010 , 08:30 PM



Quote:
Celko wants everybody to use published keys, sometimes that's advantageous but it's not essential, after all the published keys one has never seen before aren't familiar before one adopts them.
I like industry standards for several reasons:
1) Validation = Can I look at it and see the form is correct?
2) Verification = Can an external source map the key to the entity?
3) Universality = Does everyone agree on the meaning? This is the
idea of a trusted source to maintain the standard for me (the
laziness principle of programming).

I think this is more important than familiarity, which is
subjective.

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.