dbTalk Databases Forums  

Natural keys vs Aritficial Keys

comp.databases.theory comp.databases.theory


Discuss Natural keys vs Aritficial Keys in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Walter Mitty
 
Posts: n/a

Default Natural keys vs Aritficial Keys - 05-15-2009 , 11:57 AM






I'm confused about some recent trends in database design.

When I learned databases, about a quarter of a century ago, the preferred
practice was to use natural keys to refer to individual table rows, unless
there's some real good reason to go with an artificial key. I've run into a
few cases where I chose to use an artificial key, but most of the time I've
used a natural key. (I should clarify: if a university's registration
office assigns each student a StudentID, before entering that student into
the database, I'm treating that key as "natural" in the context of the
database, even though one might argue that it's "artificial" in some other
context.) This generally surfaced in the choice of one or more columns to
be declared as a PRIMARY KEY.

Every time the issue of natural keys comes up in Stackoverflow.com, the
prevailing view seems to be that the best primary keys are artificial and
opaque. And responses that take this view get the votes. They emphasize
efficiency (mainly efficiency in the index that you usually get
automatically when you declare a PRIMARY KEY). But data integrity seems to
be forgotten, here. The same is not generally tru when the conversation
turns to referential integrity. Slowly but surely the programming community
seems to heve been dragged, kicking and screaming, into turning over
referential integrity enforcement to the DBMS.


What I don't get is whether I've understood something that today's
trendsetters never learned or whether they are taking into consideration
matters that I never considered important, back in the day. I've read their
arguments carefully, and it seems to me that failure to declare UNIQUE and
NOT NULL constraints on the most widely used candidate keys is simply an
invitation to degradation of data integrity. PRIMARY KEY is a convenient
way to get those two constraints with one declaration.

Is data integrity less important than it used to be? Are most people
building databases that get embedded in some appplication to be sold to the
non technical buyer? Is something else going on that I'm unaware of? Can
some body explain to me what these people are thinking?





Reply With Quote
  #2  
Old   
Tony Toews [MVP]
 
Posts: n/a

Default Re: Natural keys vs Aritficial Keys - 05-15-2009 , 08:24 PM






"Walter Mitty" <wamitty (AT) verizon (DOT) net> wrote:

Quote:
When I learned databases, about a quarter of a century ago, the preferred
practice was to use natural keys to refer to individual table rows, unless
there's some real good reason to go with an artificial key. I've run into a
few cases where I chose to use an artificial key, but most of the time I've
used a natural key. (I should clarify: if a university's registration
office assigns each student a StudentID, before entering that student into
the database, I'm treating that key as "natural" in the context of the
database, even though one might argue that it's "artificial" in some other
context.)
That's a decent natural key. But life is seldom so clear.

Consider names. My brother has the same birth date, first, middle and last name as
a career criminal in Canada. And we have a somewhat rare last name. It's a pain for
him to cross the border into the US. He allows himself an extra hour. Although
they must now have photo's in the US Customs computers as lately the delay as been
quite a bit shorter.

Consider business name. Duplicate business names can exist in adjacent provinces,
states or countries.

Now what happens when you get to the child tables? For example student/course
table. What do you use as a primary key? Student ID, Course Number? What if a
student takes the course a second time? Now the primary key needs three fields and
some logic to ensure the third field is unique. What about child tables of that
child table? Now it's a table with four fields, or more in the primary key.

Quote:
Every time the issue of natural keys comes up in Stackoverflow.com, the
prevailing view seems to be that the best primary keys are artificial and
opaque. And responses that take this view get the votes. They emphasize
efficiency (mainly efficiency in the index that you usually get
automatically when you declare a PRIMARY KEY). But data integrity seems to
be forgotten, here.
How does this affect data integrity?

When there is a possibility of duplicate data we should be presenting the users with
the current data that is a close match so the user can decide if the
person/company/whatever is already in the system or not. For example the first two
letters of the first and last name give surprisingly few matches in a database I have
of 10,000 names. Even Jo Sm for Joan Smythe.

Quote:
The same is not generally tru when the conversation
turns to referential integrity. Slowly but surely the programming community
seems to heve been dragged, kicking and screaming, into turning over
referential integrity enforcement to the DBMS.
I don't quite understand that the problem is with turning referential integrity over
to the DBMS.

I'm quite happy with an artificial primary key (autonumber in Access) and unique
indexes in data fields as appropriate.

That said Access forms and reports don't handle multifield artificial keys very well.
The wizards, for example, don't handle them at all. So you're first to labouriously
enter the field names by hande in the various properties such as a subforms Link
Child Fields and Link Master Fields properties.

I should point out that
1) I build simple and moderately complex applications in MS Access so that clearly
colours my viewpoints. A product many readers of my postings in this newsgroup hold
in derision.
2) I look at this from both the DBAs viewpoint and the developers viewpoint although
again many folks in this newsgroup would not consider me a DBA of any sort.
3) I've managed to irritate a number of the folks in this newsgroup with my postings
in the past. As a result I've been plonked by a few. Oh well.
3a) As a result if one or two of those read replies to me they will be complaining
that they were forced to read my opinion. <shrug>

Tony
--
Tony Toews, Microsoft Access MVP
Tony's Main MS Access pages - http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
Granite Fleet Manager http://www.granitefleet.com/


Reply With Quote
  #3  
Old   
paul c
 
Posts: n/a

Default Re: Natural keys vs Aritficial Keys - 05-15-2009 , 11:23 PM



Tony Toews [MVP] wrote:
....


Basically, it's just a phony issue foisted by db semi-literates, usually
product-oriented to the extreme, like the kind who get into arguments
about whether table names should have an 's' at the end or who grew up
with 1980's 4GL's and their puerile successor products that couldn't
support composites. A table/relation/relvar should have exactly the
attributes and exactly the keys that the app needs. It is just
self-flagellation to ignore a dbms feature that will generate some of
them (assuming the dbms doesn't insist on them).


The so-called 'natural' key for all the airwaybills in a ULD on a
typical airline flight segment is about seven or eight attributes. When
the cargo master, at the last minute, offloads a container or two onto
the tarmac, you don't want a ponderous logic to un-assign it and put it
on the plane at the next gate. You need a form of indirection, for that
matter the same is needed in any robotic application such as a freight
warehouse.


Somebody in this thread suggested, if I took the meaning right, that a
design could actually profit from using a single attribute to stand in
for a composite key, where both are part of the design. I've seen such
systems and considered that to be perfectly reasonable, although one of
the big names in the field told me I was confusing physical with
logical. But I say the big names aren't right all the time. I think
there is such a thing as logical efficiency, not just physical
efficiency and sometimes it is a requirement that a 'time-critical'
massive update be coded as simply as possible, leaving the slow-poke
work-a-day requirements to handle the ponderous side. I admit that
doesn't happen very often now with all the fast machinery around. The
decision for me would be based entirely on whether the application
required it, and nothing to do with any dogma.

Reply With Quote
  #4  
Old   
Walter Mitty
 
Posts: n/a

Default Re: Natural keys vs Aritficial Keys - 05-16-2009 , 12:22 PM




"paul c" <toledobythesea (AT) oohay (DOT) ac> wrote

Quote:
Tony Toews [MVP] wrote:
...


Basically, it's just a phony issue foisted by db semi-literates, usually
product-oriented to the extreme, like the kind who get into arguments
about whether table names should have an 's' at the end or who grew up
with 1980's 4GL's and their puerile successor products that couldn't
support composites. A table/relation/relvar should have exactly the
attributes and exactly the keys that the app needs. It is just
self-flagellation to ignore a dbms feature that will generate some of them
(assuming the dbms doesn't insist on them).


The so-called 'natural' key for all the airwaybills in a ULD on a typical
airline flight segment is about seven or eight attributes. When the cargo
master, at the last minute, offloads a container or two onto the tarmac,
you don't want a ponderous logic to un-assign it and put it on the plane
at the next gate. You need a form of indirection, for that matter the
same is needed in any robotic application such as a freight warehouse.


Somebody in this thread suggested, if I took the meaning right, that a
design could actually profit from using a single attribute to stand in for
a composite key, where both are part of the design. I've seen such
systems and considered that to be perfectly reasonable, although one of
the big names in the field told me I was confusing physical with logical.
But I say the big names aren't right all the time. I think there is such
a thing as logical efficiency, not just physical efficiency and sometimes
it is a requirement that a 'time-critical' massive update be coded as
simply as possible, leaving the slow-poke work-a-day requirements to
handle the ponderous side. I admit that doesn't happen very often now
with all the fast machinery around. The decision for me would be based
entirely on whether the application required it, and nothing to do with
any dogma.
Thanks for your reply. I can't tell you the number of times when I've read
one of your comments, agreed with it completely, and never bothered to write
a response.

In the case of stackoverflow, I don't feel quite comforatble dismissing
those who hold opinions that differe from mine as "semi literates". The
rest of what they write suggest that they have been building databases for
some time. At least, that's true for some of them.

I have to admit that I've got a firm opinion on the "tables should be
plural" controversy myself. Furthermore, my own practice is inconsistent
because when I use the "table_name AS table_alias" construct, the alias is
almost always singular, or an abbreviation of a singular, even though my
table names are almost always plural. I don't regard that opinion of mine
to be product driven, and I don't think it's much more than esthetic, and
ever so slightly mnemonic.

I agree that the big names aren't right all the time. The ones who really
deserve the respect they claim welcome somebody who questions the truth of
the wisdom they expound. If they are proven right, it just confirms the
value of their work. If they are proven wrong, everybody wins. The big
names who adopt the rhetoric of "because I'm an expert and you're not" have
gotten too big for their britches.

I also have seen cases where concocting a simple key to stand in for a very
complicated compound key makes sense. There's even an academic term for
that, at the conceptual level. It's called "reifying a relationship." The
confirmation number I get from Hertz rental, and that I use when I approach
the counter at my destination airport, is an example of such a key,
although it is obviously not hidden from the users. And the confirmation
now takes on the role of an entity, instead of a relationship,. But the
existence of such cases is a far cry from "always create an autosequnce
field, called ID, for every table".

I think the desire for a unique ID for every row in every table pursues the
same dream as accessing rows by address instead of by contents. And while
an ID column doesn't pin a row the way an address stored in other rows
would, it still reduces the relational model to the graph model in some way
that I can't describe.

My big bugaboo with the ID field is that if two rows have different IDs but
otherwise identical data, there tends to be a disconect between developers
and analysts as to whether this is harmful duplication or not. It gets even
worse if the two rows have different ID fields, identical key values, and
at least one non key value different. I've actually seen a case like this
in the field. In this case, the two rows are contradicting each other in
the anlayst's view, but not in the developer's view. There's something
wrong with that.

Again, thanks for a considered response.












Reply With Quote
  #5  
Old   
Walter Mitty
 
Posts: n/a

Default Re: Natural keys vs Aritficial Keys - 05-16-2009 , 12:22 PM




"Tony Toews [MVP]" <ttoews (AT) telusplanet (DOT) net> wrote

Quote:
"Walter Mitty" <wamitty (AT) verizon (DOT) net> wrote:

When I learned databases, about a quarter of a century ago, the preferred
practice was to use natural keys to refer to individual table rows, unless
there's some real good reason to go with an artificial key. I've run into
a
few cases where I chose to use an artificial key, but most of the time
I've
used a natural key. (I should clarify: if a university's registration
office assigns each student a StudentID, before entering that student
into
the database, I'm treating that key as "natural" in the context of the
database, even though one might argue that it's "artificial" in some other
context.)

That's a decent natural key. But life is seldom so clear.

Consider names. My brother has the same birth date, first, middle and
last name as
a career criminal in Canada. And we have a somewhat rare last name. It's
a pain for
him to cross the border into the US. He allows himself an extra hour.
Although
they must now have photo's in the US Customs computers as lately the delay
as been
quite a bit shorter.

Consider business name. Duplicate business names can exist in adjacent
provinces,
states or countries.

Now what happens when you get to the child tables? For example
student/course
table. What do you use as a primary key? Student ID, Course Number?
What if a
student takes the course a second time? Now the primary key needs three
fields and
some logic to ensure the third field is unique. What about child tables
of that
child table? Now it's a table with four fields, or more in the primary
key.

Every time the issue of natural keys comes up in Stackoverflow.com, the
prevailing view seems to be that the best primary keys are artificial and
opaque. And responses that take this view get the votes. They emphasize
efficiency (mainly efficiency in the index that you usually get
automatically when you declare a PRIMARY KEY). But data integrity seems
to
be forgotten, here.

How does this affect data integrity?

When there is a possibility of duplicate data we should be presenting the
users with
the current data that is a close match so the user can decide if the
person/company/whatever is already in the system or not. For example the
first two
letters of the first and last name give surprisingly few matches in a
database I have
of 10,000 names. Even Jo Sm for Joan Smythe.

The same is not generally tru when the conversation
turns to referential integrity. Slowly but surely the programming
community
seems to heve been dragged, kicking and screaming, into turning over
referential integrity enforcement to the DBMS.

I don't quite understand that the problem is with turning referential
integrity over
to the DBMS.

I'm quite happy with an artificial primary key (autonumber in Access) and
unique
indexes in data fields as appropriate.

That said Access forms and reports don't handle multifield artificial keys
very well.
The wizards, for example, don't handle them at all. So you're first to
labouriously
enter the field names by hande in the various properties such as a
subforms Link
Child Fields and Link Master Fields properties.

I should point out that
1) I build simple and moderately complex applications in MS Access so that
clearly
colours my viewpoints. A product many readers of my postings in this
newsgroup hold
in derision.
2) I look at this from both the DBAs viewpoint and the developers
viewpoint although
again many folks in this newsgroup would not consider me a DBA of any
sort.
3) I've managed to irritate a number of the folks in this newsgroup with
my postings
in the past. As a result I've been plonked by a few. Oh well.
3a) As a result if one or two of those read replies to me they will be
complaining
that they were forced to read my opinion. <shrug

Tony
--
Tony Toews, Microsoft Access MVP
Tony's Main MS Access pages - http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
Granite Fleet Manager http://www.granitefleet.com/
Thanks for your response. You've given me a lot to ponder.

On the question of names, I consider names to be so different from natural
keys that I honestly didn't even give a thought to names when I first raised
the question of natural versus artificial keys. We all know the reasons why
names fail as keys, and you've outlined some of them. About all we can say
about names is that they were used of identification and reference purposes
in the era before computerization. When I say natural key, I mean something
that's immutable, not used twice in two contradictory fashions, and not
optional regarding the thing being identified. As I said in my OP, keys
viewed as natural by the database may well be viewed as artificial in some
larger system, one that includes applications and maybe even humans.

The student/course relation is a difficult example for considering
parent/child relationships, because it's really a junction box to capture a
relationship between many students and many courses. Is it a child of the
student, or is it a child of the course? This is actually the kind of thing
that resulted in so much trouble in the hierarchical and network databases
that existed in 1970 and shortly therafter. As far as relationship tables
go, I see nothing LOGICALLY wrong with assigning them a composite primary
key. As to whether a given student and a given course can have more that
one instance of a relationship between them, that's actually a matter of
how the subject matter organizes the data, isn't it? And wouldn't the
solution, in any event, be to come up with a ternary relationship in which
the relationship among a student, a course, and some third thing has at most
one row per instance?

This can only be discerned by studying the subject matter, and analyzing the
data. Adding a ID PK field to the binary relationship just allows you
postpone analysis until after the design and implementation are done. I try
to avoid doing that, for obvious reasons.

With regard to presenting the user with the opportunity to disambiguate
duplicate data, I think that's very relevant to a certain class of
problems. But there is another class of problems where is of the essence to
prevent such ambiguity from arising in the first place. I think it's that
latter class of problems that I was mostly concerned with, back in the day.

The question of whether Access wizards deal with composite keys well,
poorly, or not at all is a question that might influence the design of a
database intended to be used with an Access application. But the fact that
this benefit of a simple key attaches to a simgnle product (or a class of
products) ought to be made explicit. Instead, today's database experts are
telling neophytes that creating an autonumbered ID field is always a good
practice, regardless of the tools one is using, whether the database is
single user or multiuser, whether its embedded in a single application or
integrates the management of data across many applications, and so on.
There are consequences to permitting the same entity to be recorde twice in
a table, in two rows that differ only in the value of the ID field. I
believe Paul C outlined those difficulties in a separte response.

I've actually gotten beyond snobby attitudes towards Access. In my old age,
I decided to return and learn the product, and there are actually some
things about it that I like quite a lot. But respect should obscure the
fact that Ed Codd's proposal for a way to organize data in order to share
the data on an enterprise wide basis is fundamentally different from the
goals of the early versions of MS Access, which is to facilitate management
of data on a single user's personal computer. The difference between those
two goals is so profound that translating what works well in one setting
into the other ssetting can produce absurdities.

I hope I've established a tone here that permits us to explore big
differences without simply engaging in insult and invective. I look forward
to your reply.










Reply With Quote
  #6  
Old   
Roy Hann
 
Posts: n/a

Default Re: Natural keys vs Aritficial Keys - 05-16-2009 , 01:45 PM



Walter Mitty wrote:


Quote:
My big bugaboo with the ID field is that if two rows have different IDs but
otherwise identical data, there tends to be a disconect between developers
and analysts as to whether this is harmful duplication or not.
There's no two ways about it. If the row would be duplicated if not for
the spurious distinction of a value that was generated only to ensure
distinction and *for no other reason*, it's logically still a duplicate
row.

By itself a duplicate is absurd but pretty harmless. The problem is
when one copy gets updated. Then you've got a contradiction and no
logical basis to prefer one over the other. That just can't ever be
right and it's not something that can ever be a matter of opinion or
taste.

--
Roy




Reply With Quote
  #7  
Old   
--CELKO--
 
Posts: n/a

Default Re: Natural keys vs Aritficial Keys - 05-16-2009 , 02:24 PM



Quote:
Consider business name. Duplicate business names can exist in adjacent provinces,
states or countries.

Ever look up the DUNS? It is great for International use. Again, do
your Google Diligence, do your Wikipedia Diligence.

Quote:
I'm quite happy with an artificial primary key (autonumber in Access) and unique
indexes in data fields [sic: columns are not fields] as appropriate.
<<

The Department of Redundancies Department, just like Monty Python
Multi-column keys are quite easy to use with a text editor. Assuming
that you don't name them all "id" in violation of basic data modeling
rules.



Reply With Quote
  #8  
Old   
Tony Toews [MVP]
 
Posts: n/a

Default Re: Natural keys vs Aritficial Keys - 05-16-2009 , 02:47 PM



--CELKO-- <jcelko212 (AT) earthlink (DOT) net> wrote:

Quote:
Consider business name. Duplicate business names can exist in adjacent provinces,
states or countries.

Ever look up the DUNS? It is great for International use. Again, do
your Google Diligence, do your Wikipedia Diligence.
And I'm sure there are lots of small businesses that don't have a DUNS.

Quote:
I'm quite happy with an artificial primary key (autonumber in Access) and unique
indexes in data fields [sic: columns are not fields] as appropriate.

The Department of Redundancies Department, just like Monty Python
Multi-column keys are quite easy to use with a text editor.
So you're going to build an app with 100 forms using a text editor? I'm not.

Quote:
Assuming
that you don't name them all "id" in violation of basic data modeling
rules.
No, I like usingTony's Table and Field Naming Conventions
http://www.granite.ab.ca/access/tablefieldnaming.htm
Which you have stated quite a while ago you don't like. Oh well.

Tony
--
Tony Toews, Microsoft Access MVP
Tony's Main MS Access pages - http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
Granite Fleet Manager http://www.granitefleet.com/


Reply With Quote
  #9  
Old   
Tony Toews [MVP]
 
Posts: n/a

Default Re: Natural keys vs Aritficial Keys - 05-16-2009 , 03:49 PM



paul c <toledobythesea (AT) oohay (DOT) ac> wrote:

Quote:
The so-called 'natural' key for all the airwaybills in a ULD on a
typical airline flight segment is about seven or eight attributes. When
the cargo master, at the last minute, offloads a container or two onto
the tarmac, you don't want a ponderous logic to un-assign it and put it
on the plane at the next gate. You need a form of indirection, for that
matter the same is needed in any robotic application such as a freight
warehouse.
Preumably though there are bar codes on the cargo containers and a simple form where
the cargo master can remove selected containers from the airplane.

A welding shop client that empoyed hundreds of welders built and assembled very
complex piping assemblies for refineries, power plants and oil sands plants. These
are designed to, usually, fit on a 50' flat bed trailer. The client number could be
in excess of 20 or 25 characters. The internal number they used went from 1 to
whatever. It in turn was prefixed with a job number which started at 1 and went to
whatever. The system printed multiple weather proof tag with the internal number as
well as the bar coded long client number.

(Occasionally they would have to rebuild a particular item. The gravel pad at one
client where these are stored is about a mile square. Well, if the plant has a
large expansion, and there's a lot of snow that winter, you can't find the
assemblies. Until the expansion is finished a year or two later and you're
looking at the excess assemblies which are laying on the gravel.. And the folks at
the plant getting paid $25 and $30 an hour love being told to go through all the
items on this gravel pad looking for particular assemblies. A great way to spend a
shift rather than hauling stuff around or whatever.)

Tony
--
Tony Toews, Microsoft Access MVP
Tony's Main MS Access pages - http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
Granite Fleet Manager http://www.granitefleet.com/


Reply With Quote
  #10  
Old   
paul c
 
Posts: n/a

Default Re: Natural keys vs Aritficial Keys - 05-16-2009 , 03:54 PM



Walter Mitty wrote:
....
Quote:
In the case of stackoverflow, I don't feel quite comforatble dismissing
those who hold opinions that differe from mine as "semi literates". The
rest of what they write suggest that they have been building databases for
some time. At least, that's true for some of them.
...
There is really so little that's important to read and yet the vast
majority of practioners have not even done that, let alone spent any
time on the implications and the holes in the original theory. Maybe
one in fifty can discuss fundamentals without lapsing into jargon or
confusing theory with implementation. Academia is the same way, perhaps
worse, preferring to write papers before they understand their subject.
The big corporations have really been of no help at all. It was a
fluke that Codd got anywhere in the first place, at the time he was
being paid to study IMS and much of his time, energy and health was
dissipated fighting the corporate establishment and various commercial
sycophants and vested interests from all corners. I'd even say Date has
wasted many words for the similar reason.


The very noisy db literacy lowest-common-denominator crowd have been
distracting progress for most of my life, at times I've even been among
them, trouble is that most of them don't know they are in that segment
and don't understand even when they are told. It is a truism that the
stupid and the arrogant usually are incapable of recognizing their own
behaviour. That is why politeness has nothing to do with general
improvement. Much of all this is true of other fields too, but this one
has the advantage of very low cost for truly radical implementations,
one or two of which might wake people up, which is the only way out I
can see even if it's a long-shot. The chief value of the semi-literates
is in their exceptional ability to expose problems , although what they
call a problem is often not a real problem, occasionally they unearth a
nugget. When they occasionally tout a solution, it is generally an
accident.


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.