dbTalk Databases Forums  

Newbie question about db normalization theory: redundant keys OK?

comp.databases.theory comp.databases.theory


Discuss Newbie question about db normalization theory: redundant keys OK? in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #41  
Old   
Gene Wirchenko
 
Posts: n/a

Default Re: Newbie question about db normalization theory: redundant keys OK? - 12-14-2007 , 03:37 PM






-CELKO- <jcelko212 (AT) earthlink (DOT) net> wrote:

Quote:
If teachers are permitted to change their names, then names are not appropriate keys for teachers.

Agreed; this was a skeleton table to demonstrate to the OP that tables
can have redundant or even overlapping keys to preserve data
integrity. Tony just gets easily side tracked and I am afraid I tend
to fall for it myself. Sorry for the thread drift ..

Instead of a teacher's name, we can tattoo a permanent bar code on the
foreheads of the faculty, using their tax identification number
That does not allow for internationali[sz]ation.

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
I have preferences.
You have biases.
He/She has prejudices.


Reply With Quote
  #42  
Old   
Tony Rogerson
 
Posts: n/a

Default Re: Newbie question about db normalization theory: redundant keys OK? - 12-14-2007 , 03:40 PM






Quote:
Agreed; this was a skeleton table to demonstrate to the OP that tables
can have redundant or even overlapping keys to preserve data
Perhaps next time you won't be so dam condecending to the newbies posting
problems when they post a 'skeleton table' to 'demonstrate' the problem.

Quote:
integrity. Tony just gets easily side tracked and I am afraid I tend
to fall for it myself. Sorry for the thread drift ..
No thread drift - and you still haven't answered the questions; yet again
instead of answering the actual question I posed you wonder off with insults
or digress into completely unrelated woffle.

--
Tony Rogerson, SQL Server MVP
http://sqlblogcasts.com/blogs/tonyrogerson
[Ramblings from the field from a SQL consultant]
http://sqlserverfaq.com
[UK SQL User Community]




Reply With Quote
  #43  
Old   
raylopez99
 
Posts: n/a

Default Re: Newbie question about db normalization theory: redundant keys OK? - 12-14-2007 , 03:43 PM



On Dec 14, 11:49 am, "David Cressey" <cresse... (AT) verizon (DOT) net> wrote:

Quote:
An example of a compound key. An enrollment system for courses, students,
and enrollments.
An enrollment might be called a student-course. In the MS design pattern,
they will create keys for each table:

StudentId for Students,
CourseID for Courses,
and StudentCourseID for StudentCourses.

Lets accept for now that StudentID and CourseID are good ways to identify
students and courses. Other participants will object, if the past is any
guide, but you can analyze their objections for yourself.

I claim, at this point, that StudentCourseID as a primary key for the third
table, the StdentCourses table, is a bad idea. A better idea is to let
StudentID and CourseID be a compound primary key for StudentCourses.

These two columns will be in the database you get from the wizard anyway.
They have to be there to serve as foreign keys, relating the row in
StudentCourses to parent rows in Students and Courses. There is little to
be lost, and much to be gained, by making these two column the compound
primary key of this table.
OK, I just did this, and I would greatly appreciate your feedback at
some point.

I have three tables, STUDENT table, with StudentID (PK), COURSE table,
with CourseID (PK), and a third table STUDENTCOURSE, which within
Design View I was actually able (without using the Wizard!) to set up
a compound key using this trick: make sure the two fields (shows an
rows but they're fields of course) you want to be the compound primary
key are next to each other (this is important it seems), then holding
down the "Control" button, select both 'rows' (I used the mouse, but
arrow keys probably also work), then right click and select "Primary
Key" (symbol), then the little "PK" key will show up next to both
fields! Pretty cool, now both fields are your primary compound key.
Then, in the relationship window, I enforced Referential Integrity
(always a good thing from what I can tell) for this compound key of
table STUDENTCOURSE with the PKs of both STUDENT and COURSE. It's a
non-exclusive, one to many relationship so I had the "1" symbol
running from both the first two tables to TWO "infinity" symbols (for
many) running into the third table. Seems to work fine.

If anything doesn't look kosher please let me know.

The first time I did this I didn't have a true compound key, since for
reasons too convoluted to mention now, but I will anyway, my second
table (COURSE) had a foreign key that was in fact the migrated primary
key of the STUDENT table, so essentially I was using as a compound
primary key in STUDENTCOURSE the primary key of STUDENT alone (why
this didn't end up as a compile error I don't know, since it's
illegal, perhaps I played with the properties checkboxes in such a way
that I allowed duplicates to be formed, but then again I didn't really
test it, and perhaps it would have failed had I run a query).

I think I'm getting the hang of this RDBMS stuff, it's pretty neat.

David Cressey, thanks!

RL

PS--at some point I'm going to have to find out why a compound key is
preferred sometimes, and I'm not talking about the semantics of what a
compound key is, but that's further down the road and this is neither
the time nor place for that probably.


Reply With Quote
  #44  
Old   
Gene Wirchenko
 
Posts: n/a

Default Re: Newbie question about db normalization theory: redundant keys OK? - 12-14-2007 , 03:44 PM



raylopez99 <raylopez99 (AT) yahoo (DOT) com> wrote:

[snip]

Quote:
THE SOFTWARE AND UNDERLYING TECHNOLOGY ARE NOT FAULT-TOLERANT AND ARE
NOT DESIGNED, MANUFACTURED OR INTENDED FOR USE OR RESALE AS ON-LINE
CONTROL EQUIPMENT IN HAZARDOUS ENVIRONMENTS REQUIRING FAIL-SAFE
PERFORMANCE, SUCH AS IN THE OPERATION OF NUCLEAR FACILITIES, AIRCRAFT
NAVIGATION OR COMMUNICATION SYSTEMS, AIR TRAFFIC CONTROL, DIRECT LIFE
SUPPORT MACHINES, OR WEAPONS SYSTEMS, IN WHICH
THE FAILURE OF THE SOFTWARE OR UNDERLYING TECHNOLOGY OR PRODUCT COULD
LEAD DIRECTLY TO DEATH, PERSONAL INJURY, OR SEVERE PHYSICAL OR
ENVIRONMENTAL DAMAGE ("HIGH RISK ACTIVITIES"). LICENSOR SPECIFICALLY
DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY OF FITNESS FOR HIGH RISK
ACTIVITIES
.. . . not that our marketing and sales departments will mention it
first.

Sincerely,

Gene Wirchenko


Computerese Irregular Verb Conjugation:
I have preferences.
You have biases.
He/She has prejudices.


Reply With Quote
  #45  
Old   
raylopez99
 
Posts: n/a

Default Re: Newbie question about db normalization theory: redundant keys OK? - 12-14-2007 , 03:51 PM



On Dec 14, 12:30 pm, -CELKO- <jcelko... (AT) earthlink (DOT) net> wrote:
Quote:
If teachers are permitted to change their names, then names are not appropriate keys for teachers.

Agreed; this was a skeleton table to demonstrate to the OP that tables
can have redundant or even overlapping keys to preserve data
integrity. Tony just gets easily side tracked and I am afraid I tend
to fall for it myself. Sorry for the thread drift ..

Instead of a teacher's name, we can tattoo a permanent bar code on the
foreheads of the faculty, using their tax identification number
Celko what do you think of 1NF, 2NF, 3NF as programming exercises?
(see here for my understanding of these concepts:
http://www.utexas.edu/its-archive/wi...ng/rm/rm7.html
) Obviously for 'cascades' of UPDATE changes and the like, 3NF is the
easiest to code, but getting a dB into 3NF form if it's in 2NF or 1NF
form is a bit of work, agreed? The U of Texas example was easy to
follow but in the real world I doubt it's so cut and dry.

Keep in mind I have a few days experience in this matter, but I think
I'm on the right track with the above statement.

RL


Reply With Quote
  #46  
Old   
-CELKO-
 
Posts: n/a

Default Re: Newbie question about db normalization theory: redundant keys OK? - 12-14-2007 , 06:06 PM



Quote:
Celko what do you think of 1NF, 2NF, 3NF as programming exercises?
A lot of it is common sense made formal. Ask yourself what are the
things in your data model -- a lot of them will be physical
(customers, goods, trains, cars, etc.) A lot of them will be
relationships among (not always between!) entities (customer buys
goods) and have a name of their own (sales). And somethings will be
fuzzy, which is where things are tricky.

Quote:
but getting a dB into 3NF form if it's in 2NF or 1NF form is a bit of work, agreed? The U of Texas example was easy to follow but in the real world I doubt it's so cut and dry.
The bad news is that it is a bitch to refactor a bad RDBMS design ..

Quote:
Keep in mind I have a few days experience in this matter, but I think I'm on the right track with the above statement.
You're very well, grasshopper


Reply With Quote
  #47  
Old   
-CELKO-
 
Posts: n/a

Default Re: Newbie question about db normalization theory: redundant keys OK? - 12-14-2007 , 06:09 PM



Quote:
That does not allow for internationali[sz]ation.

I guess we will have to use the Kabblah number the Anti_Christ will
assign us when Satan takes over the world. Oh, wait minute, that's
Bill Gates and GUIDs


Reply With Quote
  #48  
Old   
-CELKO-
 
Posts: n/a

Default Re: Newbie question about db normalization theory: redundant keys OK? - 12-14-2007 , 06:12 PM



Quote:
BTW thinking of foreign keys as links is a common hard-to-get-rid-of misconception; Unlearn it, they really are different. Links point, foreign keys constrain.
Thank you! I need to add that to my "columns are not fields, rows are
not records, files are not tables!" rant.


Reply With Quote
  #49  
Old   
Brian Selzer
 
Posts: n/a

Default Re: Newbie question about db normalization theory: redundant keys OK? - 12-14-2007 , 11:17 PM




"David Portas" <REMOVE_BEFORE_REPLYING_dportas (AT) acm (DOT) org> wrote

Quote:
On 14 Dec, 17:51, "Tony Rogerson" <tonyroger... (AT) torver (DOT) net> wrote:
First of all, let's go back to your statement about just store the
history -
can you please show how you would do that with celko schema and still be
able to reach the rows as per my examples?

precisely: "The real world gets in the way" tells us nothing about why
it would be a problem to update email address X to become email
address Y. I just don't see what you are getting at.

Here's another...

create table blah (
emiladdress varchar(?) not null unique,
who varchar(100) not null
)

In the year 2006 I have an email address called
tonyroger... (AT) sqlserver (DOT) eu.com everybody sends email to that address.

insert blah values( 'tonyroger... (AT) sqlserver (DOT) eu.com' )

On 1st Jan 2007 I change the email address to tonyroger... (AT) torver (DOT) net I
suddenly stop receiving emails because everybody is still emailing
tonyroger... (AT) sqlserver (DOT) eu.com.

update blah set emailaddress = 'tonyroger... (AT) torver (DOT) net' where
emailaddress
= 'tonyroger... (AT) sqlserver (DOT) eu.com'

How do all the applications disconnected from the database now reach the
record? They can't - the natural key has changed.


You are saying that this is a distributed database but that it lacks a
mechanism for accurately propagating changes out to all its nodes?
Well in my view such a DBMS would be broken. It surely violates Codd's
principle of "Distribution Independence". Let's follow your example to
its conclusion though. The solution is to replace whatever copy of the
Blah relation exists in the application with the new Blah relation
that superceded it. Now all emails reach the correct address and there
is no problem that requires a different key.

Forgive me for butting in, David, but where did you come up with the idea
that it is a distributed database? What have disconnected
applications--that is, applications that use something akin to disconnected
ADO recordsets or ADO.NET datasets--to do with distributed databases?

The question is: for how long is the data that was just read out of the
database considered to be valid? Until the next update? Or is it stale as
soon as its read? Does it have something to do with transaction control or
locking? If several updates occur between the reading of one piece of
information and the reading of another, how can you be sure that any answer
that involves both pieces of information is correct? How can you be sure
that you haven't read the same information twice? If you use an artificial
key and a timestamp (or rowversion), then there can be no doubt as to
whether or not the information in question changed between the first reading
and the second.

Quote:
--
David Portas






Reply With Quote
  #50  
Old   
David Portas
 
Posts: n/a

Default Re: Newbie question about db normalization theory: redundant keys OK? - 12-15-2007 , 03:08 AM



"Brian Selzer" <brian (AT) selzer-software (DOT) com> wrote

Quote:
You are saying that this is a distributed database but that it lacks a
mechanism for accurately propagating changes out to all its nodes?
Well in my view such a DBMS would be broken. It surely violates Codd's
principle of "Distribution Independence". Let's follow your example to
its conclusion though. The solution is to replace whatever copy of the
Blah relation exists in the application with the new Blah relation
that superceded it. Now all emails reach the correct address and there
is no problem that requires a different key.


Forgive me for butting in, David, but where did you come up with the idea
that it is a distributed database? What have disconnected
applications--that is, applications that use something akin to
disconnected ADO recordsets or ADO.NET datasets--to do with distributed
databases?

The question is: for how long is the data that was just read out of the
database considered to be valid? Until the next update? Or is it stale
as soon as its read? Does it have something to do with transaction
control or locking? If several updates occur between the reading of one
piece of information and the reading of another, how can you be sure that
any answer that involves both pieces of information is correct? How can
you be sure that you haven't read the same information twice?
I agree that these are important issues of application design. I don't think
they need to affect the database logical design in this case.

Quote:
If you use an artificial key and a timestamp (or rowversion), then there
can be no doubt as to whether or not the information in question changed
between the first reading and the second.

A ROWVERSION (as defined by Microsoft SQL Server) does not tell you whether
any data has changed. It tells you whether some rows were possibly affected
by update operations. In other words it can give false positives -
indicating a change where there is none.

BTW I seriously doubt whether it would be possible or desirable to implement
anything like a ROWVERSION in a true RDBMS. The consequences of SQL Server's
implementation are serious because it attempts to identify row data based on
something other than keys. I have never been a fan of the ROWVERSION
feature.

The only way to tell whether the current state of the database equals some
previous state is to query again and compare that to a result previously
retrieved. That comparison is usually based on a key value. It makes no
difference what type of value is used for the key. The comparison is exactly
the same whether it is an "artificial" key or otherwise. (I'm not too
concerned about defining what an "artificial" key is because I don't think
it matters).

Consider a set of transformations:

T1:a -> T2:b -> T3:c -> T4:a

Where "T1:a" means "The value of the database at time T1 is a". When we
requery the database at time T4 all we will ever know is that the value is
the same as at time T1. If it is a requirement to know about the previous
updates at T2 and T3 then obviously we ought to preserve that information in
the database - but in many cases it is quite reasonable not to do that.

--
David Portas




Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.