dbTalk Databases Forums  

Re: foreign key constraint versus referential integrity constraint

comp.databases.theory comp.databases.theory


Discuss Re: foreign key constraint versus referential integrity constraint in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Sampo Syreeni
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-21-2009 , 11:05 AM






Quote:
So Coupons may or may not be redeemed so there may be CouponID
appearing in rvOrders that have no entry in rvRedemptions.

Now it seems to me that 1) conceptually to me rvOrders:CouponID
is a foreign key to rvRedemptions
You have the direction wrong. Coupons are created and go out with
orders, so the set of coupons defined in rvOrders is the total set
known of by the system and serves to define the associated domain.
rvRedemptions has the set of coupons that have been redeemed, and of
course it's not possible to redeem an inexisting coupon. So there's an
inclusion dependency from rvRedemptions to rvOrders, which is then
enforced using a foreign key, in that direction. Notice that a foreign
key can't be used to enforce this constraint unless CouponID is a
candidate key of rvRedemptions.

Quote:
Or is a foreign key constraint a synonym for a referential integrity constraint?
It is an SQL mechanism, and not particularly powerful or convenient
one at that, to implement referential integrity constraints. On the
theoretical side the slightly more general concept of inclusion
dependency is used instead of referential integrity constraint; it
additionally applies to the case where multiple such constraints can
have arbitrary overlap in their attributes, and the "target" side need
not constitute a candidate key. The definition is "one projection must
remain the subset of another".
--
Sampo

Reply With Quote
  #2  
Old   
Sampo Syreeni
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-21-2009 , 11:18 AM






Quote:
I prefer the term "inclusion dependency": projection of one relation
(that is rvRedemptions v [CouponID]) is a supepersetset of projection
of the other (i.e. rvOrders v [CouponID]). I thought that all three
terms are the same; perhaps with foreign key constraint adding some
insignificant matter, like the "smaller" set being unique.
BTW, one limitation of foreign keys which I find particularly annoying
is that they only work when we're talking about base tables whereas
I've already bumped a few times into a situation where I would have
liked to constrain (on) the contents of a view. That sometimes happens
when you have to go beyond 3NF or you're working with a conceptual
model which allows multiple inheritance and/or union types. Do you
happen to know whether this sort of thing is formally covered by the
concept of inclusion dependency?
--
Sampo

Reply With Quote
  #3  
Old   
Sampo Syreeni
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-21-2009 , 05:36 PM



Quote:
(Since I first saw Codd's 1970 paper I've never been able to understand
why on the one hand he talked about "symmetrical" access but on the
other he used the very restrictive term "foreign key".) *For me, a
"reference" is a reference even if most dbms' have similar restrictions
to Codd's term.
I think the later talk about domains and e-relations in RM/T (and to a
lesser extent in RM/V2) makes that a bit easier to understand. As I
interpret it, Codd started with a very clean and purely syntactic
model in his seminal article, which will definitely lead to pure
symmetry in access. But from the very start he also worried deeply
about the semantics and making them explicit as metadata, which leads
to the second influential discovery of his: the integral, reflective
data dictionary. Later on his development of the relational model not
only added surrounding functionality (which weighs in so heavily in RM/
V2, what with all of'em T-joins and the lot) but semantics as well.

I think what was driving Codd was the need to make his model fully
self-contained. If you really think about it, that tends to exclude
any and all free-form input by the user. I mean, let's say you let an
end user input a free form name. It's pretty much guaranteed that lots
of nonsense is going to be input then. After that the name field
doesn't really constitute a well-defined domain at all; to a
mathematician like Codd its proper type should probably be more or
less 'general-ape-shit;no-real-type;use-at-own-risk'. I.e. the field
could be straightforwardly likened to a blob of text as we call the
more sustained kind of drivel today; certainly not something one would
make into a domain, and not something which would fit Codd's neat idea
of formal, repetitive business data.

So, what he did was to go the way we'd now call Master Data
Management. He went on to separate all of the data we'd intuitively
call entities, and more formally would define as a) relatively stable
data b) centrally and well managed as a fully controlled vocabulary,
so that c) applying the closed world assumption to it is actually well
founded because there is a formal process to keep the real world and
the minimal model of the database trying to approximate it in close
correspondence, d) attaching to things that have real, societally
established permanence, uniqueness and omnipresence which can be
likened to "a candidate key on an established class of real world
entities which is at most slowly varying", and most of all e)
discrete, enumerated data that is mostly used to classify/qualify
other data, the latter usually being composed of abstract measurements
on a continuous scale (originally account balances, given that we're
after all talking about developments which happened within the
International *Business* Machines Corporation). What we'd nowadays
call "complex", "semistructured", "unstructured", "unconventional" or
even "nasty" data was explicitly excluded by Codd from the very
beginning.

That then lead to a sort of revival of the earlier intuitive idea of
entities as opposed to relationships. Only this time there was a clear
distinction in the criteria: relations were always relations, and they
represented a theory of the outlying reality that was perfectly
symmetric. Nevertheless, the connection to the real world was also
taken into account in that some relations had additional semantics
bolted on in addition to just the logical constraints. They referred
to real world entities with the kinds of properties that would make it
especially easy to keep the correspondence between the minimal model
of the DB and the real world the DB was trying to represent in synch.

The difference to today's MDM and OLAP-dimension frenzy was then that
the logical level design criteria were fully unaffected by such
semantic, real world concerns. They remained completely syntactic, as
they should have. The only thing added was the idea that some
relations which happened to have a certain commonly reoccurring
structure, and in addition were meant to semantically correspond to
real life in what was also a commonplace correspondence, could have
those facts formally notarized in the data dictionary, using a well-
known syntax.

That, and only that, was what made a relation an e-relation. You
neither could nor had to reference the e-relation, but still its
semantics necessarily made it so that a) it made no sense to refer to
any real life modelable thing that wasn't already modeled by the e-
relation, b) thus semantically speaking we had an inclusion dependency
towards the e-relation already, c) of course all such semantic
constraints should be enforced if possible, d) a foreign key
constraint is pretty much The Mechanism to do that in the databases of
the past couple of decades, and finally e) foreign keys seem an awful
lot like unidirectional links, eventhough they could and should be a
lot more besides.

So, in addition to the semantics, we also have the direct implication
that an e-relation fully defines a domain (though not all domains need
to be defined via such enumeration). Then it's quite clear that all
all relations referring in any way to the kind of real life object the
e-relation represents, should be limited to utilizing the
corresponding domain/type determined by it as the type of the
attribute. (In SQL databases that means declaring foreign keys
referring to primary keys. In my ideal database it would amount to
declaring domains, and in addition to those also full inclusion
dependencies which could span any combination of attributes on both
sides, perhaps even using views in the middle to cater for the more
complex cases; I take constraints rather seriously.)

At the logical level all that was done was to declare some
dependencies, syntactic constraints, and telling a little bit about
how to read/update the data to a live person, using a controlled
vocabulary to do so (e.g. "social security number is the primary key
and is supposed to uniquely identify persons; thus if you have a
duplicate number, it's semantically fucked, leaving you only the
option of gathering even more data, until you can again make the
minimal model of the database correspond to reality").

But of course the all-round inclusion dependency also closely mimics
earlier referential semantics in its effects, and the usual way we
tend to use data typically reinforces the notion. You rarely see the
symmetry ("there is no link, there are just equal values"). Still, I
can tell you that when you do ad hoc OLAP queries long enough, you'll
eventually find yourself starting the query "against the tide". That
is then when you finally get it -- you simply couldn't have done it in
the presence of a directional link without "going through the whole
fucking thing using a for-loop" -- and off you are, into the
relational brotherhood.

Quote:
It seems arbitrary and unnecessarily doctrinaire to
restrict references to only primary keys and unique indexes because I
think if one wants a particular constraint, one should not have specify
two constraints, eg., a 'reference' as well as a key or index. *This
doesn't prevent a language from offering shortcuts that are understood
to involve both.)
That is arbitrary indeed, and one of the few things where genuine
progress has been made in the theory of dependency and normalization
on the issues Codd originated. I also think that it is one of the very
few points where Codd as well fell victim to the earlier record
centric thought -- "referential integrity" often implies the mentality
that we're "linking" from some dependent data to stuff that "is
defined and primarily lives at another place". That is precisely the
record and index centric way of doing things, whereas the central
relational innovations were to a) place such references squarely at
the logical level, which makes them fully symmetric (i.e. in the
current case we might as well say that rvOrders refers to
rvRedemptions as we could say the opposite; it's not about directional
linking, but about referring to the same logical entity in two
different places at once, and just defining the propositional
semantics correctly), b) focus attention on the formal properties,
modelling and constraints (e.g. inclusion dependencies) within the
data instead of playing by real-world intuition alone, and c)
abolishing even the idea that base entities and relationships should
somehow be separated (i.e. what matters is only the formally
verifiable, syntactic dependencies that will necessarily result from
faithfully modeling the data, which then reveals that relationships
and base entities really behave much the same way).
--
Sampo

Reply With Quote
  #4  
Old   
paul c
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-21-2009 , 07:47 PM



Sampo Syreeni wrote:
....
Quote:
I think the later talk about domains and e-relations in RM/T (and to a
lesser extent in RM/V2) makes that a bit easier to understand. As I
interpret it, Codd started with a very clean and purely syntactic
model in his seminal article, which will definitely lead to pure
symmetry in access. But from the very start he also worried deeply
about the semantics and making them explicit as metadata, which leads
to the second influential discovery of his: the integral, reflective
data dictionary. Later on his development of the relational model not
only added surrounding functionality (which weighs in so heavily in RM/
V2, what with all of'em T-joins and the lot) but semantics as well.
...
Regarding semantics, far be it from me to try to describe Codd's
motives. While I think it's certain he later on veered from his
original principles, I think no one will ever know if there were
personal or more general reasons that drove him to the so-called
"semantic model", I suspect it was because of various forces that we are
all subject to from time to time. For all I know he might have been
bored with the lack of uptake of his ideas and thought a little
speculation might stimulate progress. Unfortunately most people, who
pretended to read his stuff, including apparently the original system R
implementors, chose to fasten on one sentence or paragraph, sometimes
taking that as the pre-eminent gospel and ignoring the rest. What he
wrote is either full of caveats between the lines or extremely
case-based from the get-go. Pretty much every quote of Codd's I see is
out-of-context.

Reply With Quote
  #5  
Old   
Keith H Duggar
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-21-2009 , 09:54 PM



On Oct 21, 12:18 pm, Sampo Syreeni <de... (AT) iki (DOT) fi> wrote:
Quote:
I prefer the term "inclusion dependency": projection of one relation
(that is rvRedemptions v [CouponID]) is a supepersetset of projection
of the other (i.e. rvOrders v [CouponID]). I thought that all three
terms are the same; perhaps with foreign key constraint adding some
insignificant matter, like the "smaller" set being unique.

BTW, one limitation of foreign keys which I find particularly annoying
is that they only work when we're talking about base tables whereas
I've already bumped a few times into a situation where I would have
liked to constrain (on) the contents of a view. That sometimes happens
when you have to go beyond 3NF or you're working with a conceptual
model which allows multiple inheritance and/or union types. Do you
happen to know whether this sort of thing is formally covered by the
concept of inclusion dependency?
I'm guessing this is just a limitation of some particular products?
Because if I'm understanding "The Principle of Interchangeability"
that for example Date's discusses in "Databases In Depth" then the
RM has nothing to say about the arbitrary distinction between base
versus virtual relvars (views). So in principle one should be able
to define constraints on any relvars base or otherwise at least in
so far as the RM is concerned. Is this correct?

KHD

Reply With Quote
  #6  
Old   
paul c
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-21-2009 , 09:57 PM



Keith H Duggar wrote:
Quote:
On Oct 21, 12:18 pm, Sampo Syreeni <de... (AT) iki (DOT) fi> wrote:
I prefer the term "inclusion dependency": projection of one relation
(that is rvRedemptions v [CouponID]) is a supepersetset of projection
of the other (i.e. rvOrders v [CouponID]). I thought that all three
terms are the same; perhaps with foreign key constraint adding some
insignificant matter, like the "smaller" set being unique.
BTW, one limitation of foreign keys which I find particularly annoying
is that they only work when we're talking about base tables whereas
I've already bumped a few times into a situation where I would have
liked to constrain (on) the contents of a view. That sometimes happens
when you have to go beyond 3NF or you're working with a conceptual
model which allows multiple inheritance and/or union types. Do you
happen to know whether this sort of thing is formally covered by the
concept of inclusion dependency?

I'm guessing this is just a limitation of some particular products?
Because if I'm understanding "The Principle of Interchangeability"
that for example Date's discusses in "Databases In Depth" then the
RM has nothing to say about the arbitrary distinction between base
versus virtual relvars (views). So in principle one should be able
to define constraints on any relvars base or otherwise at least in
so far as the RM is concerned. Is this correct?

KHD
Date's POI has this caveat about "no unnecessary" which always troubles
me. I would like to know if Codd ever said his foreign key couldn't be
defined against his views.

Reply With Quote
  #7  
Old   
Keith H Duggar
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-21-2009 , 10:22 PM



Thanks to everyone. I think I understand more clearly now the
inherent asymmetry of foreign key constraints. But now then my
question is what is the common name for the semantic constraint
that "attribute RV1:X and RV2:Y are sematincally joinable"?

Because simply having the same type is not enough for example
both OrderID and CouponID might both have type GenericID and
yet we may want to state that it doesn't make sense to join
them. Likewise attribute names for joinable attributes may be
different in different relations (ie needing rename).

In other words suppose we go with Mr. Scott's reformulation
but with some additional type information (name : type) and
attribute name changes in Redemptions for argument sake

Orders {
OrderID : GenericID }

Coupons {
OrderID : GenericID ,
CouponID : GenericID }

Redemptions {
CID : GenericID ,
OID : GenericID }

where CID is the redeemed coupon's ID and OID is the ID of the
order on which the coupon was redeemed (not the order the coupon
was shipped with).

So as above, we cannot tell it is ok to join CID with CouponID
from the attribute name, since they are different, nor can we
assume they can be joined simply because they are both type of
GenericID because for example so is OID which is not appropriate
to join. So if we wanted to express the additional constraint
that is semantically appropriate to join CouponID with CID
what kind of constraint would this be? Likewise if we wanted
to express it is semantically inappropriate to join OrderID
and CID?

By the way, Mr. Scott, I removed O# from your R because it
seemed redundant to me since we can recover the OrderID of the
order the coupon was shipped with from the Orders relation (if
we assume that CouponID is a candidate key of Coupons). Is
that not correct?

Thanks again for the help!

KHD

Reply With Quote
  #8  
Old   
Sampo Syreeni
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-22-2009 , 06:34 AM



Quote:
Thanks to everyone. I think I understand more clearly now the
inherent asymmetry of foreign key constraints. But now then my
question is what is the common name for the semantic constraint
that "attribute RV1:X and RV2:Y are sematincally joinable"?
Hmm. Usually we talk about constraints when they're syntactic, because
that's what makes it possible to enforce them automatically, simply by
manipulating the symbols in the database. So what you actually mean
is, what syntactic construct is appropriate for enforcing the
semantics you describe. I'd argue that that's what domains were
invented for.

Quote:
Because simply having the same type is not enough for example
both OrderID and CouponID might both have type GenericID and
yet we may want to state that it doesn't make sense to join
them.
I see at least two ways to view this. The first would be that you just
made a modelling error -- you assigned the same syntactic type to two
things that are actually quite different. What you should have done is
to assign them two different types/domains, which would then stop them
from being joined.

The second way would be to say that you've now implicitly created a
union type (Order|Coupon), and its key, GenericID, has to be such that
OrderID's and CouponID's never overlap. That way you can join willy
nilly, but the results will be identical to the case where you've
isolated the ID's into two separate types. Carried out in full this
approach eventually leads to a generalization hierarchy converging
towards a fully generic Object, and thus database wide object/unique
identifiers.

It is well known that this is rather a contentious and dangerous
design choice. In fact we can already see the first signs of trouble
in your minimal example: what on earth do Orders and Coupons have to
do with each other? What precisely are you trying to express by
creating the union type? In order for that generalization to make
sense, at the very minimum you should be able to point to some
hypothetical field that could be shared between the two types, i.e.
data belonging uniquely to the union type. What would that be,
precisely?

If you have the means of keeping such ID's private -- this would
necessitate making the ID's fully opaque from an outside user's point
of view, which cannot be done in any DBMS I know of -- you might just
be able to justify their existence as surrogates. But this is still
stuff that requires tremendous discipline and insight into the entity
integrity issues that need to be addressed in addition to the
referential integrity between the surrogates. I wouldn't recommend
going down this slippery slope unless you're *absolutely* sure you
know what you're doing -- I haven't seen a single wider scale
deployment which got this right and didn't suffer integrity issues in
the long run.

Quote:
Likewise attribute names for joinable attributes may be
different in different relations (ie needing rename).
Yes. Ideally the attributes wouldn't carry just a name and a physical
level type (e.g. string, integer) but also a semantic type/domain.
That way the name would indicate the semantics, or role, of the
attribute in this given relation, the domain would indicate what it
can be joined with and where to look for the set of permissible
values, and the syntacting type associated with the domain would tell
how to actually store the data.
--
Sampo

Reply With Quote
  #9  
Old   
Nilone
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-22-2009 , 07:40 AM



On Oct 22, 5:22*am, Keith H Duggar <dug... (AT) alum (DOT) mit.edu> wrote:
Quote:
Thanks to everyone. I think I understand more clearly now the
inherent asymmetry of foreign key constraints. But now then my
question is what is the common name for the semantic constraint
that "attribute RV1:X and RV2:Y are sematincally joinable"?

Because simply having the same type is not enough for example
both OrderID and CouponID might both have type GenericID and
yet we may want to state that it doesn't make sense to join
them. Likewise attribute names for joinable attributes may be
different in different relations (ie needing rename).
If two attributes have the same domain and are semantically
comparable, I would call that a type. So I would call OrderID and
CouponID types, and GenericID would be their supertype. It just so
happens that the attributes of these types in Orders and Coupons have
the same name as the name of their type.

Orders { OrderID : OrderID }
Coupons { OrderID : OrderID, CouponID : CouponID }
Redemptions { CID : CouponID, OID : OrderID }
OrderID <: GenericID
CouponID <: GenericID

Reply With Quote
  #10  
Old   
Mr. Scott
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-22-2009 , 08:40 AM



"Keith H Duggar" <duggar (AT) alum (DOT) mit.edu> wrote

Quote:
Thanks to everyone. I think I understand more clearly now the
inherent asymmetry of foreign key constraints. But now then my
question is what is the common name for the semantic constraint
that "attribute RV1:X and RV2:Y are sematincally joinable"?

Because simply having the same type is not enough for example
both OrderID and CouponID might both have type GenericID and
yet we may want to state that it doesn't make sense to join
them. Likewise attribute names for joinable attributes may be
different in different relations (ie needing rename).

In other words suppose we go with Mr. Scott's reformulation
but with some additional type information (name : type) and
attribute name changes in Redemptions for argument sake

Orders {
OrderID : GenericID }

Coupons {
OrderID : GenericID ,
CouponID : GenericID }

Redemptions {
CID : GenericID ,
OID : GenericID }

where CID is the redeemed coupon's ID and OID is the ID of the
order on which the coupon was redeemed (not the order the coupon
was shipped with).

So as above, we cannot tell it is ok to join CID with CouponID
from the attribute name, since they are different, nor can we
assume they can be joined simply because they are both type of
GenericID because for example so is OID which is not appropriate
to join. So if we wanted to express the additional constraint
that is semantically appropriate to join CouponID with CID
what kind of constraint would this be? Likewise if we wanted
to express it is semantically inappropriate to join OrderID
and CID?
If you use generic ids then I think you would have to explicitly specify the
ISA relationships between the objects represented in the database, but if
you use a separate domain or type for each kind of object, then the
relationships become implicit. I would like to explore this further, but I
don't have the time right now.

Quote:
By the way, Mr. Scott, I removed O# from your R because it
seemed redundant to me since we can recover the OrderID of the
order the coupon was shipped with from the Orders relation (if
we assume that CouponID is a candidate key of Coupons). Is
that not correct?
You can indeed recover the OrderID from the coupon in a join, but how would
you declare the constraint that coupons can't be applied to the order with
which they were shipped?

Quote:
Thanks again for the help!

KHD

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.