dbTalk Databases Forums  

Re: foreign key constraint versus referential integrity constraint

comp.databases.theory comp.databases.theory


Discuss Re: foreign key constraint versus referential integrity constraint in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #21  
Old   
Cimode
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-25-2009 , 02:24 AM






On 25 oct, 04:56, Bob Badour <bbad... (AT) pei (DOT) sympatico.ca> wrote:
Quote:
paul c wrote:
Bob Badour wrote:
...

You asked about semantic joinability. Semantically, the join works if
like-named attributes have an equality comparison defined and not if
they don't.
...

In RT, how could they not define equality?

What reason would one have to define an equality comparison to compare
employee ids with department ids or with names? Or charge with current
or potential? Or age with price?
Hi paul,

A valid set of questions (pun intended). My conclusion were that Codd
was focused onto defining a side of the relational model that would be
part of information system as opposed as being a part of mathematics
since relational domain analysis is more loosely coupled with set
theory than relational general model. While relational domain analysis
is not mandatory in the context of the general relational model, it is
mandatory in the context of implementation of RM. It even makes
header based definitions somehow moot...IMHO...

Regards...

Reply With Quote
  #22  
Old   
compdb@hotmail.com
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-25-2009 , 03:08 AM






On Oct 24, 8:56*pm, Bob Badour <bbad... (AT) pei (DOT) sympatico.ca> wrote:
Quote:
What reason would one have to define an equality comparison to compare
employee ids with department ids or with names? Or charge with current
or potential? Or age with price?
shoot your age and you are worth your weight in gold.

Reply With Quote
  #23  
Old   
Bob Badour
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-25-2009 , 09:40 AM



paul c wrote:

Quote:
Bob Badour wrote:

paul c wrote:

Bob Badour wrote:
...

You asked about semantic joinability. Semantically, the join works
if like-named attributes have an equality comparison defined and not
if they don't.
...

In RT, how could they not define equality?

What reason would one have to define an equality comparison to compare
employee ids with department ids or with names? Or charge with current
or potential? Or age with price?

None in theory, I thought you were talking about attributes in the same
domain.
Nope. Natural join doesn't care about attributes in the same domain per
se. It compares like named attributes regardless of domain.

TTM explains when and how the most specific types can differ and still
have an equality comparison return true: they must share some common
supertype other than the universal supertype. In this case, the MST of
the join will be the most specific common supertype. If the MSTs of the
original relations are mutually exclusive, the body will be empty. If
not, the join body will consist of values drawn from subtypes of both
MSTs and that appear in both relations.

Otherwise, the body of the join will be empty and the type of the like
named attribute will be the universal supertype. As a type with a set of
values and a set of operations, the universal supertype's set of values
is the union of all values, and its set of operations is empty.

I expect most implementations would treat an attribute with the
universal supertype as an error, or at least complain loudly with a warning.

Reply With Quote
  #24  
Old   
Bob Badour
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-25-2009 , 09:44 AM



compdb (AT) hotmail (DOT) com wrote:

Quote:
On Oct 24, 8:56 pm, Bob Badour <bbad... (AT) pei (DOT) sympatico.ca> wrote:

What reason would one have to define an equality comparison to compare
employee ids with department ids or with names? Or charge with current
or potential? Or age with price?

shoot your age and you are worth your weight in gold.
My age in what measure? Years? Integral or rational? If integral,
cardinal or ordinal? According to what calendar?

Reply With Quote
  #25  
Old   
paul c
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-25-2009 , 02:55 PM



Bob Badour wrote:
Quote:
paul c wrote:

Bob Badour wrote:

paul c wrote:

Bob Badour wrote:
...

You asked about semantic joinability. Semantically, the join works
if like-named attributes have an equality comparison defined and
not if they don't.
...

In RT, how could they not define equality?

What reason would one have to define an equality comparison to
compare employee ids with department ids or with names? Or charge
with current or potential? Or age with price?

None in theory, I thought you were talking about attributes in the
same domain.

Nope. Natural join doesn't care about attributes in the same domain per
se. It compares like named attributes regardless of domain.

TTM explains when and how the most specific types can differ and still
have an equality comparison return true: they must share some common
supertype other than the universal supertype. In this case, the MST of
the join will be the most specific common supertype. If the MSTs of the
original relations are mutually exclusive, the body will be empty. If
not, the join body will consist of values drawn from subtypes of both
MSTs and that appear in both relations.

Otherwise, the body of the join will be empty and the type of the like
named attribute will be the universal supertype. As a type with a set of
values and a set of operations, the universal supertype's set of values
is the union of all values, and its set of operations is empty.

I expect most implementations would treat an attribute with the
universal supertype as an error, or at least complain loudly with a
warning.
I can't make much comment on type theory, which I think is optional, not
essential to RT. Appendix A, the relational underpinning of TTM,
doesn't depend on MST's or inheritance, just set membership except that
set equality crops up for unions of headings, as far as I can tell. It
does say, eg., in the <AND> definition: "It is required that if <A,T1>
'is member of' Hr1 and <A,T2> 'is member of' Hr2, then T1 = T2" (edited
quote). I presume those qualified attributes are the "like named" ones
you mean. While I wouldn't argue when you say "it compares like named
attributes regardless of domain", TTM does require that the "state is
set" beforehand, Appendix A uses the qualification to ensure that the
type for both attributes is the same.

Reply With Quote
  #26  
Old   
Bob Badour
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-25-2009 , 03:07 PM



paul c wrote:

Quote:
Bob Badour wrote:

paul c wrote:

Bob Badour wrote:

paul c wrote:

Bob Badour wrote:
...

You asked about semantic joinability. Semantically, the join works
if like-named attributes have an equality comparison defined and
not if they don't.
...


In RT, how could they not define equality?


What reason would one have to define an equality comparison to
compare employee ids with department ids or with names? Or charge
with current or potential? Or age with price?


None in theory, I thought you were talking about attributes in the
same domain.


Nope. Natural join doesn't care about attributes in the same domain
per se. It compares like named attributes regardless of domain.

TTM explains when and how the most specific types can differ and still
have an equality comparison return true: they must share some common
supertype other than the universal supertype. In this case, the MST of
the join will be the most specific common supertype. If the MSTs of
the original relations are mutually exclusive, the body will be empty.
If not, the join body will consist of values drawn from subtypes of
both MSTs and that appear in both relations.

Otherwise, the body of the join will be empty and the type of the like
named attribute will be the universal supertype. As a type with a set
of values and a set of operations, the universal supertype's set of
values is the union of all values, and its set of operations is empty.

I expect most implementations would treat an attribute with the
universal supertype as an error, or at least complain loudly with a
warning.

I can't make much comment on type theory, which I think is optional, not
essential to RT. Appendix A, the relational underpinning of TTM,
doesn't depend on MST's or inheritance, just set membership except that
set equality crops up for unions of headings, as far as I can tell. It
does say, eg., in the <AND> definition: "It is required that if <A,T1
'is member of' Hr1 and <A,T2> 'is member of' Hr2, then T1 = T2" (edited
quote). I presume those qualified attributes are the "like named" ones
you mean. While I wouldn't argue when you say "it compares like named
attributes regardless of domain", TTM does require that the "state is
set" beforehand, Appendix A uses the qualification to ensure that the
type for both attributes is the same.
Which still leaves Keith with the same answer: data types or domains
determine the semantics of joinability. In a system that supports
specialization and generalization, the attributes have to share a common
supertype. In a system that does not support specialization and
generalization, the types must be the same.

Reply With Quote
  #27  
Old   
Marshall
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-25-2009 , 05:05 PM



On Oct 24, 10:53 am, Keith H Duggar <dug... (AT) alum (DOT) mit.edu> wrote:
Quote:
Anyhow, the question here is not one of our imagination but rather
simply this: if it makes sense for the RM to support constraints
on relational /values/ (taken on by variables) why does it not
make sense to support constraints on relational /expressions/?
That is a question of general principle not specific design.
This question, it seems to me, is clear and to the point.
And I would answer it by saying that we shouldn't really
even make the distinction! (At least not formally.)

There are quite a lot of constraint taxonomies out there,
and I haven't ever really been able to derive any particular
value of any of them. Most generally, ever constraint
is a constraint on the entire database; if the constraint
mentions only a single variable, then we might say
it is a "table constraint" or "relvar constraint" or some
such, but that's only a degenerate case of a database
constraint. So let us say there is only one type of
constraint, the database constraint, and be done with it.

Now that I've said that, let me take it back a bit.
Transition constraints don't seem to be quite like
anything else. Transition constraints are therefore
annoying. :-)

It's Sunday and I'm hot, so I'm posting with
reckless abandon. Read my sentences at your
own risk.


Marshall

Reply With Quote
  #28  
Old   
Marshall
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-25-2009 , 05:09 PM



On Oct 24, 12:13*pm, Bob Badour <bbad... (AT) pei (DOT) sympatico.ca> wrote:
Quote:
Domains, however, define what is or is not joinable because natural join
applies the equality comparison to attributes with like names, and
domains define equality comparisons.
That may be the most practical perspective, but from a theoretical
perspective, I find myself lately thinking that every value of every
domain is theoretically able to be compared for equality with
every value of every domain. If we are considering comparing
an integer and a string, we can say with confidence ahead of
time that the comparison will return that they are unequal.

I also am suspicious of the idea of allowing a domain designer
to define equality. From a theoretical standpoint, the system
ought to know enough to figure it out, and from a practical
standpoint, I've seen enough class designers screw it up
with subtle and annoying consequences. Damn them!


Marshall

(Still hot; still grumpy.)

Reply With Quote
  #29  
Old   
Sampo Syreeni
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-25-2009 , 05:47 PM



Quote:
Hmm. Usually we talk about constraints when they're syntactic, because
that's what makes it possible to enforce them automatically, simply by
manipulating the symbols in the database. So what you actually mean
is, what syntactic construct is appropriate for enforcing the
semantics you describe. I'd argue that that's what domains were
invented for.

I agree that it is possible to enforce such join semantics with
domains but I'm not sure it is entirely appropriate. I say that
because to the greatest extent the RM is orthogonal to domains.
To quote CJ Date "Databases In Depth" Chapter 2 Summary:

"It's a very common misconception that the relational model deals
only with rather simple types: numbers, strings, perhaps dates
and times, and not much else. In this chapter, I've tried to show
that this is indeed a misconception. Rather, relations can have
attributes of /any type whatsoever/ -- the relational model
nowhere predicates what those types must be, and in fact they
can be as complex as we like ... In other words, the question as
to what types are supported is orthogonal to the question of
support for the relational model itself. Or (less precisely but
more catchily): /types are orthogonal to tables/."

To put this another way, the RM should be just as complete and
effective for a universe with a single domain as it is for a
universe with a rich domain set (such as you propose to handle
my GenericID join constraint example). And placing constraints
on /relational expressions/ seems as fundamental to the RM as
placing constraints on /relational values/.

By pushing such expression constraints off to domains, we are
admitting that the RM itself has no support for constraining
relational expressions and I'm not prepared to admit that
limitation. Especially since it does provide support for
constraining the values of relational variables.

I will also note that both Codd and Date proposed RM mechanisms
to operate the other way around ie to allow designers to force
cross-domain operations. Codd with "Domain Check Overrides" and
date with the THE_ operators that provide type coercion.

Domains, however, define what is or is not joinable because natural join
applies the equality comparison to attributes with like names, and
domains define equality comparisons.

Domains also define a set of permissible values for attributes.
However, that doesn't stop RM from adding additional constraints
to limit those values further.

You asked about semantic joinability. Semantically, the join works if
like-named attributes have an equality comparison defined and not if
they don't.

Why should relational operations
be any different in this respect from relational variables? Ie
why shouldn't the RM allow us to further constrain what domains
permit for both values AND operations?

Because the RM doesn't have anything much to say about domains. Domains
are a separate--albeit vitally important--concern.

Reply With Quote
  #30  
Old   
Sampo Syreeni
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-25-2009 , 05:54 PM



On Oct 24, 7:53 pm, Keith H Duggar <dug... (AT) alum (DOT) mit.edu> wrote:

(Sorry for the resend, Google Groups seems to be taunting me...)

Quote:
I agree that it is possible to enforce such join semantics with
domains but I'm not sure it is entirely appropriate.
Uhm, being the practical kind of guy, I wonder where you're going with
this. I mean, you too agree that domains/types do get the job done.
Was there something else you wanted in addition to that?

Quote:
To put this another way, the RM should be just as complete and
effective for a universe with a single domain as it is for a
universe with a rich domain set (such as you propose to handle
my GenericID join constraint example).
And it is. There is no reason why the relational model wouldn't work
just fine with only a single type, such as an integer. (Encoding then,
say, textual data, would become somewhat convoluted, but it could be
done.) So the type system is indeed completely orthogonal to the
model.

However, that only means that when we model data, the relational model
by itself isn't sufficient to fully handle all of our conceptual
modelling needs. It's a very general theory, and as such it can
benefit from the addition of more specific constraints and semantics,
for example in the form of a rich type system.

From this point of view I would then say that your original question
about joinability is something that sits squarely within the type
system, and thus is itself fully orthogonal to the relational model.
Trying to somehow make the skeletal relational model responsible for
enforcing such constraints seems much akin to asking why the hammer
ain't working too well with the cream, right after casting aside the
whisk.

Quote:
By pushing such expression constraints off to domains, we are
admitting that the RM itself has no support for constraining
relational expressions and I'm not prepared to admit that
limitation. Especially since it does provide support for
constraining the values of relational variables.
To reiterate, it seems to me that you're now trying to have your cake
and eat it too. First you say that domains, a particular
implementation of a type system, are outside of the relational model.
Then you go on to claim that the relational model has a problem
because in this amputated form it cannot handle a problem that is
caused by the lack of a type system.

The way I see it, both Date and Codd advocate a view of the relational
model which already includes a rich type system, and actively utilizes
it to express e.g. join semantics. Domains, they're Codd's particular
idea of how to handle typing in a form that actually reflects much of
the relevant metadata about the types back into a shared encoding that
is accessible under relational semantics. Granted, there will be other
ways of handling types, but Codd's idea doesn't seem all that bad to
me, and is also amenable to straightforward extension should we find
it somehow lacking.

Quote:
I will also note that both Codd and Date proposed RM mechanisms
to operate the other way around ie to allow designers to force
cross-domain operations. Codd with "Domain Check Overrides" and
date with the THE_ operators that provide type coercion.
True, but I would be inclined to believe that in practice those
mechanisms would only be used to circumvent modelling errors, before
the conceptual model was changed to explicitly allow the join. Most
commonly that would probably include the addition of a common
supertype, which Bob Badour already mentioned as a necessary
prerequisite for joinability. Under the domain model, this fact would
also be explicitly recorded in the data dictionary.

Quote:
I think there are other views, see above.
As I said, I could think of "at least two" ways to view it. However, I
don't think your quotation from Date falls outside of either of them.

Quote:
In short, we should not let our lack of imagination guide our design principles.
There we agree fully. My point is just that a) yes, joinability is
something that probably should be explicitly constrained in the
database, b) domains at least get the job done, c) unlike abstract
typing, domains also reflect type metadata back into a formal, shared
and accessible form within the database which is a clear plus, so that
d) in the absence of a viable alternative, they or something closely
related to them is what we should use until we think of something even
better.

Quote:
It's not for us to wonder such whys because our imagination is
usually quite limited. For example, GenericID seems an entirely
reasonable thing to me if I labelled every item I sent out (even
the coupons) with a barcode and have other relations expressing
facts about those barcodes apart from what they label.
In that case, it's the systematic barcoding of certain real world
items that makes the union type sensible. The union type could then be
given the name of BarcodedItem, and could perhaps have its own
dependent data, like type of barcode, printing method, and so on.

My point was simply that you have to have a valid, real life/business
reason, or semantic justification for the union type/generalization.
Since you just gave one, I'm happy with that. At the same time you
should then note that under these semantics, you cannot add to the
union type willy nilly. For example including people in it is out of
the picture, unless you actually insist on branding your customers
with a barcode.

Quote:
Anyhow, the question here is not one of our imagination but rather
simply this: if it makes sense for the RM to support constraints
on relational /values/ (taken on by variables) why does it not
make sense to support constraints on relational /expressions/?
I don't really see what the connection with this and the above is, but
again we're in full agreement: indeed support for constraints on
relational expressions should be available. And it in fact would be if
RDBMS vendors just implemented generic support of SQL assertions.

Quote:
If you have the means of keeping such ID's private -- this would
necessitate making the ID's fully opaque from an outside user's point
of view, which cannot be done in any DBMS I know of -- you might just
be able to justify their existence as surrogates. But this is still
stuff that requires tremendous discipline and insight into the entity
integrity issues that need to be addressed in addition to the
referential integrity between the surrogates. I wouldn't recommend
going down this slippery slope unless you're *absolutely* sure you
know what you're doing -- I haven't seen a single wider scale
deployment which got this right and didn't suffer integrity issues in
the long run.

Frankly I don't understand the point of the above paragraph at
all.
The point is that when you start creating union types, and so the keys
used to them willy nilly, you tend to rapidly get to the point where
the resulting space of identifiers has no semantic relevance. What you
then have is a space of invented, autogenerated, and so on,
identifiers. Those cannot be correlated against the real world
entities they represent in any way except by reference to the data
dependent on them. What you then have is surrogate keys, and if you
take a look at RM/T and RM/V2, Codd makes an eloquent case against
revealing them to the users. Because making them truly opaque is then
not supported by current RDBMSes, I would hesitate to use them at all.

Quote:
Also I don't see how entity integrity enters into this at
all; I'm assuming we follow the sage advice of having no NULLs
to begin with; so entity integrity is non-issue.
Heh, I did say going that way requires tremendous discipline and
insight...

At its most general, entity integrity has nothing to do with nulls.
Rather it has to do with whether you can reliably correlate the data
in your database with real life facts. It is essentially a semantic
construct which tries to ensure that it is possible to unambiguously
update the contents of your database against the changing reality
we're trying to capture/encode.

This is not a process that can be done strictly at the logical,
relational level. All we can do there is to to put up syntactic
constraints to help in the process along. The main, commonest,
simplest, and especially easiest to implement example is the key. It
tries to capture the semantics of real life things/entities a) always
having a certain set of characteristics we can use to correlate them
with the data we have on them (i.e. the not null syntactic
constraint), and b) being separable from each other once we're privy
to that set of characteristics (i.e. the uniqueness constraint).

That means that entity integrity is only certain to be maintained when
a) the set of facts we actually have on real life objects/concepts is
sufficient to uniquely identify their representation in the database,
b) in full generality, that does *not* mean that all of the data on
the primary key is available, only that all of the data required to
identify any candidate key is present, and that in case c) does not
mean that the identification has to take a form that only involves one
table; the data might be distributed all over your database, and in
general will be if you're using something other than full, natural
keys to link your data together.

Maybe a recycled example could make this clearer... Suppose you have
persons and the cars that they own. You're keying them both using
surrogates, which obviously have no real world significance because
you cannot take a look at a person or a car and immediately read off
the corresponding surrogate. (Tattooing the number on each person or
carving it into the manufacturer's stamp in a car would make the key
natural, and not a surrogate.) You can clearly have a not null and
unique primary key for each of the entities, but that no longer
guarantees entity integrity: there might be more than one person
called John Smith, and more than two cars with the make of Audi. In
order to maintain entity integrity, the syntactic constraint that the
surrogates be not null and unique is not sufficient; when you try to
update your database to reflect reality, you don't necessarily know
which John Smith is which, and certainly you can't tell one Audi apart
from the other. In particular, you don't have any reliable means of
stopping somebody from adding yet another John Smith or Audi that are
already described in your database; suddenly duplicates and the
resulting update anomalies will abound.

So the first lesson is that surrogates don't really work towards
entity integrity. Only keys referring to real life, measurable/
observable things, that is natural keys, do that.

Then, secondly, there is the more complicated case of distributed
information over multiple tables. Suppose that mess of surrogates you
have tells you that there are two John Smith's, which cannot be easily
told apart from each other. But then you notice that doing a join
against the table of vehicles owned tells you that the two Johns own
mutually disjoint sets of vehicles, which you *can* tell apart from
each other by their licence plate numbers (i.e. a natural key).
Suddenly you can tell those people apart at some level of certainty --
and counterwise if the sets overlapped, you could be perfectly certain
that entity integrity has been violated, the two Johns are actually
the same person, their records need to be unified/deduplicated, and
the update has to be cascaded into the rest of the database to reflect
the new knowledge.

Here the lesson is that keys and the related single table reasoning
aren't all there is to entity integrity, and that surrogates usually
make the situation so complicated that establishing entity integrity
even at the syntactic level can easily force you to scan through your
entire database. Which of course nobody ever does when using
surrogates, which then almost surely leads to integrity issues in the
long run.

Formally speaking, then, entity integrity is an inseparable part of
integrity as a whole, and it involves not just syntax but semantics as
well; especially model theory. Since a database essentially contains a
logical theory, the first part is about internal consistency; the
theory has to be consistent in order to have a real life model in the
first place. The second, more difficult part is that the theory also
has to be refined enough to exclude real life models which are
inappropriate, such as the one where we have one John Smith in the
database and a million of them in the wild, with no means of telling
which one is which at the logical level. And finally, third, the
hardest part is to strike the proper engineering balance between
maintainable detail, availability of real life identifying data and
the processes of efficiently correlating reality with the symbolic
universe represented by the database.

More often than not, then, surrogates and the improper generalizations
they imply hinder this process. Most often by creating a false sense
of complacency which obscures the real, underlying, logical issues
which need to be addressed. Then the result is cumulative damage to
overall integrity via simple laziness and neglect.

Quote:
I don't think I'm talking about "physical level" types (posreps)
at all.
You seem to be, because your conflating the two types of identifiers,
simply because they're, well, "identifiers". That don't mean that they
semantically are of the same type, eventhough they might look like
that when they carry the same physical type (posrep).

Quote:
Consider Date's point that RM is orthogonal to type support.
Then what argument from principle do you have for prohibiting
the RM from defining constraints on relational expressions by
pushing that responsibility off to domain support?
None. Thus I think we're actually in what they call "vigorous
agreement".

Quote:
How would we express the constraints in a universe with one domain?
In a highly convoluted fashion. Such a convoluted fashion that I'm not
inclined to give an example. Nor would I probably be inclined to go
through such an exercise when designing a database.

That, to me, suggests that types and domains simply make life easier,
and should be actively utilized when modelling data.
--
Sampo

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.