dbTalk Databases Forums  

Newbie question about db normalization theory: redundant keys OK?

comp.databases.theory comp.databases.theory


Discuss Newbie question about db normalization theory: redundant keys OK? in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
raylopez99
 
Posts: n/a

Default Newbie question about db normalization theory: redundant keys OK? - 12-12-2007 , 04:26 PM






With a few hours of theory under my belt, I'd like to ask if there's
ever a time that you don't want a completely normalized dB, that is, a
normalized database being a dB that has no redundant information (my
understanding of what a normalized database is).

Or, is there ever a time that you want redundant keys (that is, the
same keys in many different tables, that obviously are not linked (in
a relationship) between two tables?). Having redundant attributes and/
or keys seems to me a very lazy way of designing a database that
doesn't require lots of initial thought, but of course you have to pay
for it by meticulously "synching" all redundant keys to one another
everytime there is a change in one of the redundant keys, so the keys
don't drift and have different values.

But is there ever a time you want to do this?

THanks in advance

RL

Reply With Quote
  #2  
Old   
David Cressey
 
Posts: n/a

Default Re: Newbie question about db normalization theory: redundant keys OK? - 12-13-2007 , 06:14 AM







"raylopez99" <raylopez99 (AT) yahoo (DOT) com> wrote

Quote:
With a few hours of theory under my belt, I'd like to ask if there's
ever a time that you don't want a completely normalized dB, that is, a
normalized database being a dB that has no redundant information (my
understanding of what a normalized database is).

Or, is there ever a time that you want redundant keys (that is, the
same keys in many different tables, that obviously are not linked (in
a relationship) between two tables?). Having redundant attributes and/
or keys seems to me a very lazy way of designing a database that
doesn't require lots of initial thought, but of course you have to pay
for it by meticulously "synching" all redundant keys to one another
everytime there is a change in one of the redundant keys, so the keys
don't drift and have different values.

But is there ever a time you want to do this?

THanks in advance

RL
The answer is yes, there are times when a design is a good one, even if
less than fully normalized. For each normalization form, there is a known
set of anomalies that come up when you insert, update, or delete data in
that form. If you are willing and able to program around those anomalies,
and if the design yields benefits that justify that effort, it can be the
right thing to do. Learning when to normalize is more subtle than learning
how to normalize.

There is a particular form of database design, called "star schema" that
yields good results when used in a data mart or data warehouse situation. A
star schema mimics a multidimensional database in relational (or SQL) form.
A star schema follows design rules of its own, and those rules sometimes
contradict the rules of normalization. The up side of star schema is that
it's very easy to use with report generators, or with OLAP tools like
Cognos or Business Objects. The down side of star schema is that the
process of keeping the data current involves some fairly intricate
programming, and heavy use of computer resources.

Star schema, and other unnormalized or denormalized designs almost always
cost more than they are worth when used in a high transaction operational
setting, like OLTP.

Unfortunately, most deviations from normalization occur due to blunders, and
not due to well considered design decisions. Many deviations from
normalization occur because the designer is unfamiliar with some of the
normal forms. Back when I was building databases, I only really knew 1NF,
2NF, and 3NF. Update anomalies due to deviations from BCNF and beyond were
rare, but my design process would not have obviated them.

Another major cause of deviations from normalization is failure to
understand the data. In particular, the functional dependendencies inherent
in the data are not discovered during data analysis, and the design
unknowingly violates normalization rules. By the time this is discovered,
there is usually a large body of application code that is dependent of the
bad design.

Sometimes, denormalized design is the reult of sheer pigheadedness.






Reply With Quote
  #3  
Old   
-CELKO-
 
Posts: n/a

Default Re: Newbie question about db normalization theory: redundant keys OK? - 12-13-2007 , 08:12 AM




CREATE TABLE Schedule
(teacher VARCHAR(15) NOT NULL,
class CHAR(15) NOT NULL,
room INTEGER NOT NULL,
period INTEGER NOT NULL,
PRIMARY KEY (teacher, class, room, period));

That choice of a primary key is the most obvious one -- use all the
columns. Typical rows would look like this:

('Mr. Celko', 'Database 101', 222, 6)

The rules we want to enforce are:

1) A teacher is in only one room each period.
2) A teacher teaches only one class each period.
3) A room has only one class each period.
4) A room has only one teacher in it each period.

Stop reading and see what you come up with for an answer. Okay, now
consider using one constraint for each rule in the list, thus.

CREATE TABLE Schedule_1 -- version one, wrong!
(teacher VARCHAR(15) NOT NULL,
class CHAR(15) NOT NULL,
room INTEGER NOT NULL,
period INTEGER NOT NULL,
UNIQUE (teacher, room, period), -- rule #1
UNIQUE (teacher, class, period), -- rule #2
UNIQUE (class, room, period), -- rule #3
UNIQUE (teacher, room, period), -- rule #4
PRIMARY KEY (teacher, class, room, period));

We know that there are 4 ways to pick three things from a set of four
things permutation. While column order is important in creating an
index, we can ignore it for now and then worry about index tuning
later in the book.

I could drop the PRIMARY KEY as redundant if I have all four of these
constraints in place. But what happens if I drop the PRIMARY KEY and
then one of the constraints?

CREATE TABLE Schedule_2 -- still wrong
(teacher VARCHAR(15) NOT NULL,
class CHAR(15) NOT NULL,
room INTEGER NOT NULL,
period INTEGER NOT NULL,
UNIQUE (teacher, room, period), -- rule #1
UNIQUE (teacher, class, period), -- rule #2
UNIQUE (class, room, period)); -- rule #3

I can now insert these rows in the second version of the table:

('Mr. Celko', 'Database 101', 222, 6)
('Mr. Celko', 'Database 102', 223, 6)

This gives me a very tough sixth period class load since I have to be
in two different rooms at the same time. Things can get even worse
when another teacher is added to the schedule:

('Mr. Celko', 'Database 101', 222, 6)
('Mr. Celko', 'Database 102', 223, 6)
('Ms. Shields', 'Database 101', 223, 6)

Ms. Shields and I are both in room 223, trying to teach different
classes at the same time. Matthew Burr looked at the constraints and
the rules, came up with this analysis.

CREATE TABLE Schedule_3 -- corrected version
(teacher VARCHAR(15) NOT NULL,
class CHAR(15) NOT NULL,
room INTEGER NOT NULL,
period INTEGER NOT NULL,
UNIQUE (teacher, period), -- rules #1 and #2
UNIQUE (room, period)); -- rules #3 and #4

If a teacher is in only one room each period, then given a period and
a teacher I should be able to determine only one room, i.e. room is
functionally dependent upon the combination of teacher and period.
Likewise, if a teacher teaches only one class each period, then class
is functionally dependent upon the combination of teacher and period.
The same thinking holds for the last two rules: class is functionally
dependent upon the combination of room and period, and teacher is
functionally dependent upon the combination of room and period.

With the constraints that were provided in the first version, you will
find that the rules are not enforced. For example, I could enter the
following rows:

('Mr. Celko', 'Database 101', 222, 6)
('Mr. Celko', 'Database 102', 223, 6)

These rows violate two of your rules, rule #1 and rule #2. However,
the unique constraints first provided in Schedule_2 do not capture
this violation and will allow the rows to be entered.

The constraint

UNIQUE (teacher, room, period)

is checking the complete combination of teacher, room, and period, and
since ('Mr. Celko', 222, 6) is different from ('Mr. Celko', 223, 6),
the DDL does not find any problem with both rows being entered, even
though that means that Mr. Celko is in more than one room during the
same period.

UNIQUE (teacher, class, period)

doesn't catch its associated rule either since ('Mr. Celko', 'Database
101', 6) is different from ('Mr. Celko', 'Database 102', 6), and so,
Mr. Celko is able to teach more than one class during the same period,
thus violating rule two. It seems that we'd also be able to add the
following row:

('Ms. Shields', 'Database 103', 222, 6)

which violates rules #3 and #4.

Reply With Quote
  #4  
Old   
Bob Badour
 
Posts: n/a

Default Re: Newbie question about db normalization theory: redundant keysOK? - 12-13-2007 , 11:30 AM



David Cressey wrote:

Quote:
"raylopez99" <raylopez99 (AT) yahoo (DOT) com> wrote in message
news:15b312ea-1f66-4f22-abbb-63581e0eca73 (AT) x69g2000hsx (DOT) googlegroups.com...

With a few hours of theory under my belt, I'd like to ask if there's
ever a time that you don't want a completely normalized dB, that is, a
normalized database being a dB that has no redundant information (my
understanding of what a normalized database is).

Or, is there ever a time that you want redundant keys (that is, the
same keys in many different tables, that obviously are not linked (in
a relationship) between two tables?). Having redundant attributes and/
or keys seems to me a very lazy way of designing a database that
doesn't require lots of initial thought, but of course you have to pay
for it by meticulously "synching" all redundant keys to one another
everytime there is a change in one of the redundant keys, so the keys
don't drift and have different values.

But is there ever a time you want to do this?

THanks in advance

RL

The answer is yes, there are times when a design is a good one, even if
less than fully normalized. For each normalization form, there is a known
set of anomalies that come up when you insert, update, or delete data in
that form. If you are willing and able to program around those anomalies,
and if the design yields benefits that justify that effort, it can be the
right thing to do. Learning when to normalize is more subtle than learning
how to normalize.

There is a particular form of database design, called "star schema" that
yields good results when used in a data mart or data warehouse situation. A
star schema mimics a multidimensional database in relational (or SQL) form.
A star schema follows design rules of its own, and those rules sometimes
contradict the rules of normalization. The up side of star schema is that
it's very easy to use with report generators, or with OLAP tools like
Cognos or Business Objects. The down side of star schema is that the
process of keeping the data current involves some fairly intricate
programming, and heavy use of computer resources.

Star schema, and other unnormalized or denormalized designs almost always
cost more than they are worth when used in a high transaction operational
setting, like OLTP.

Unfortunately, most deviations from normalization occur due to blunders, and
not due to well considered design decisions. Many deviations from
normalization occur because the designer is unfamiliar with some of the
normal forms. Back when I was building databases, I only really knew 1NF,
2NF, and 3NF. Update anomalies due to deviations from BCNF and beyond were
rare, but my design process would not have obviated them.

Another major cause of deviations from normalization is failure to
understand the data. In particular, the functional dependendencies inherent
in the data are not discovered during data analysis, and the design
unknowingly violates normalization rules. By the time this is discovered,
there is usually a large body of application code that is dependent of the
bad design.

Sometimes, denormalized design is the reult of sheer pigheadedness.
Don't listen to a work David says. Star schema was sold by Cognos and
Business Objects so their customers would have to do the work they
should have done in the first place.

I seldom see anyone 'denormalize' who is aware of the actual costs of
doing so. On the other hand, I have seen plenty of ignoramuses
'denormalize' when physical clustering for the same performance
characteristics was an available option.


Reply With Quote
  #5  
Old   
mAsterdam
 
Posts: n/a

Default Re: Newbie question about db normalization theory: redundant keysOK? - 12-13-2007 , 01:10 PM



raylopez99 schreef:

Quote:
With a few hours of theory under my belt, I'd like to ask if there's
ever a time that you don't want a completely normalized dB, that is, a
normalized database being a dB that has no redundant information (my
understanding of what a normalized database is).
This question is very well put.

First, about your understanding.
The goal of normalizing is not to get rid of redundancies but to get
rid of update anomalies. If, when in the real world your database is
modeling one thing changes and you have to change several pieces of
data in your database to keep it up to date, this means you have an
update anomaly.

Several (mostly increasingly strict) normal forms exist to check if
your set of relations suffers from some anomaly with recepies of
changes to it to make it comply. Just google for normal form if you
didn't already.

Now I'll make a distinction between a normalized model and a
normalized database. I have seen several teams of people perfectly
happy with heavily denormalized databases, such as the (heavily biased
to some types of query) star schemas David talks about. My feeling is
that they could only succeed because of the normalized models they had
to begin with.

Quote:
Or, is there ever a time that you want redundant keys (that is, the
same keys in many different tables, that obviously are not linked (in
a relationship) between two tables?). Having redundant attributes and/
or keys seems to me a very lazy way of designing a database that
doesn't require lots of initial thought, but of course you have to pay
for it by meticulously "synching" all redundant keys to one another
everytime there is a change in one of the redundant keys, so the keys
don't drift and have different values.
I do not completely understand what you are saying, but your
understanding does not seem very off-track to me.

Quote:
But is there ever a time you want to do this?
In short: if you are relieved of the burdon to keep your database up
to date, and your main interest is in only a subset of what your
database contains, you don't worry about normalization.

--
What you see depends on where you stand.


Reply With Quote
  #6  
Old   
Tony Rogerson
 
Posts: n/a

Default Re: Newbie question about db normalization theory: redundant keys OK? - 12-14-2007 , 02:01 AM



Quote:
CREATE TABLE Schedule
(teacher VARCHAR(15) NOT NULL,
class CHAR(15) NOT NULL,
room INTEGER NOT NULL,
period INTEGER NOT NULL,
PRIMARY KEY (teacher, class, room, period));

That choice of a primary key is the most obvious one -- use all the
columns. Typical rows would look like this:

('Mr. Celko', 'Database 101', 222, 6)
And half way through term the teacher changed their name to 'Mrs Bunting'
and now everybody is confused!

Great example of why you should use an artificial key, I'll remember your
example for a blog entry I'm writing.

--
Tony Rogerson, SQL Server MVP
http://sqlblogcasts.com/blogs/tonyrogerson
[Ramblings from the field from a SQL consultant]
http://sqlserverfaq.com
[UK SQL User Community]



Reply With Quote
  #7  
Old   
raylopez99
 
Posts: n/a

Default Re: Newbie question about db normalization theory: redundant keys OK? - 12-14-2007 , 02:26 AM



On Dec 13, 2:10 pm, mAsterdam <mAster... (AT) vrijdag (DOT) org> wrote:
Quote:
raylopez99 schreef:

With a few hours of theory under my belt, I'd like to ask if there's
ever a time that you don't want a completely normalized dB, that is, a
normalized database being a dB that has no redundant information (my
understanding of what a normalized database is).

This question is very well put.

First, about your understanding.
The goal of normalizing is not to get rid of redundancies but to get
rid of update anomalies. If, when in the real world your database is
modeling one thing changes and you have to change several pieces of
data in your database to keep it up to date, this means you have an
update anomaly.

Several (mostly increasingly strict) normal forms exist to check if
your set of relations suffers from some anomaly with recepies of
changes to it to make it comply. Just google for normal form if you
didn't already.
Thanks; I did Google it,and learned a bit about 1NF, 2NF, 3NF here:
http://defiant.yk.psu.edu/~lxn/IST_2...finitions.html

While I didn't really understand the differences between the three
types of NF (for one thing, it seems that they were dealing with a
situation where the primary key was a 'compound key' that depended on
several fields, and since I like to use GUIDs as the primary key
almost exclusively, I don't really suffer from this problem*), it also
seems that even with a fully normalized database (in 1,2 or 3NF), you
will always have an UPDATE problem. The 'solution' is to minimize
your hierarchy of UPDATES by employing a "star" configuration" or
"cluster", as suggested or implied by David Cressey. This is where
there is never more than two tables linked by the same key, which
makes programming easlier (that is, you don't have to program the
UPDATE anomalies recursively, but simply program the almost trivial
parent-child UPDATE case). But, as a practical matter, sometimes it's
hard to not have the same key propigate to a third table, so even as I
type this I realize this advice is not practical.

Anyway, thanks to everybody who replied, it seems that there's a lot
to learn in dBs.

RL

* as a further practical consideration, one dB I use, personally more
than anything else, is Access, which doesn't have compound keys to
begin with



Reply With Quote
  #8  
Old   
Roy Hann
 
Posts: n/a

Default Re: Newbie question about db normalization theory: redundant keys OK? - 12-14-2007 , 02:35 AM




"Tony Rogerson" <tonyrogerson (AT) torver (DOT) net> wrote

Quote:
CREATE TABLE Schedule
(teacher VARCHAR(15) NOT NULL,
class CHAR(15) NOT NULL,
room INTEGER NOT NULL,
period INTEGER NOT NULL,
PRIMARY KEY (teacher, class, room, period));

That choice of a primary key is the most obvious one -- use all the
columns. Typical rows would look like this:

('Mr. Celko', 'Database 101', 222, 6)

And half way through term the teacher changed their name to 'Mrs Bunting'
and now everybody is confused!

Great example of why you should use an artificial key, I'll remember your
example for a blog entry I'm writing.
I'll remember this. It is a great example of why no one should bother
reading your blog.

Roy




Reply With Quote
  #9  
Old   
Tony Rogerson
 
Posts: n/a

Default Re: Newbie question about db normalization theory: redundant keys OK? - 12-14-2007 , 06:16 AM



Quote:
I'll remember this. It is a great example of why no one should bother
reading your blog.
Oh, I see Roy; you agree with Celko's design.

Interesting....

--
Tony Rogerson, SQL Server MVP
http://sqlblogcasts.com/blogs/tonyrogerson
[Ramblings from the field from a SQL consultant]
http://sqlserverfaq.com
[UK SQL User Community]


Reply With Quote
  #10  
Old   
David Cressey
 
Posts: n/a

Default Re: Newbie question about db normalization theory: redundant keys OK? - 12-14-2007 , 07:07 AM




"raylopez99" <raylopez99 (AT) yahoo (DOT) com> wrote

Quote:
On Dec 13, 2:10 pm, mAsterdam <mAster... (AT) vrijdag (DOT) org> wrote:
raylopez99 schreef:

With a few hours of theory under my belt, I'd like to ask if there's
ever a time that you don't want a completely normalized dB, that is, a
normalized database being a dB that has no redundant information (my
understanding of what a normalized database is).

This question is very well put.

First, about your understanding.
The goal of normalizing is not to get rid of redundancies but to get
rid of update anomalies. If, when in the real world your database is
modeling one thing changes and you have to change several pieces of
data in your database to keep it up to date, this means you have an
update anomaly.

Several (mostly increasingly strict) normal forms exist to check if
your set of relations suffers from some anomaly with recepies of
changes to it to make it comply. Just google for normal form if you
didn't already.

Thanks; I did Google it,and learned a bit about 1NF, 2NF, 3NF here:
http://defiant.yk.psu.edu/~lxn/IST_2...finitions.html

While I didn't really understand the differences between the three
types of NF (for one thing, it seems that they were dealing with a
situation where the primary key was a 'compound key' that depended on
several fields, and since I like to use GUIDs as the primary key
almost exclusively, I don't really suffer from this problem*), it also
seems that even with a fully normalized database (in 1,2 or 3NF), you
will always have an UPDATE problem. The 'solution' is to minimize
your hierarchy of UPDATES by employing a "star" configuration" or
"cluster", as suggested or implied by David Cressey. This is where
there is never more than two tables linked by the same key, which
makes programming easlier (that is, you don't have to program the
UPDATE anomalies recursively, but simply program the almost trivial
parent-child UPDATE case). But, as a practical matter, sometimes it's
hard to not have the same key propigate to a third table, so even as I
type this I realize this advice is not practical.
A few comments here:

2NF applies only to compound keys. A table that is in 1NF and has only a
simple key is in 2NF necessarily. 1NF and 3NF can apply in situations where
the key is a simple key.

Be aware that the definition of 1NF changed rather profoundly since the
1980s (when I learned relational databases). By the definition used by
Date & Darwen a relation is always in 1NF. The same is true of a relvar.
Hence if your model is a relational one, the issue of 1NF is moot, in the
D&D framework. The D&D framework is widely accepted.

I never claimed that star schemas would be free from UPDATE anomalies. I am
not sure of the case you outline above, but I most certainly can come up
with update problems that affect star schemas, but do not affect fully
normalized databases. My claims of star schema have to do with ease of
access, not ease of update. And the "ease" I'm talking about is largely a
matter of perceived difference rather than logical difference.


1NF, 2NF, and 3NF are not the end of the story. In addition, there are
BCNF, 4NF, 5NF, and a final normal form called domain-key normal form.
I've een references to 6NF in this newsgroup. As far as a can tell, 6NF is
the same thing as domain-key normal form.


Quote:
Anyway, thanks to everybody who replied, it seems that there's a lot
to learn in dBs.

You bet.

Quote:
RL

* as a further practical consideration, one dB I use, personally more
than anything else, is Access, which doesn't have compound keys to
begin with

I beg to differ. MS Access has supported compound keys since Access 97, and
maybe earlier. (I'm just coming up to speed on Access myself, having used
more classical SQL products.) If you buy the design of either the
"Northwind" database, or any of the databases that the wizards can create
for you, you are going to learn several unfortunate design habits.

The absence of compound keys is just one of them.





Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.