dbTalk Databases Forums  

Re: foreign key constraint versus referential integrity constraint

comp.databases.theory comp.databases.theory


Discuss Re: foreign key constraint versus referential integrity constraint in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #11  
Old   
paul c
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-22-2009 , 12:45 PM






Mr. Scott wrote:
....
Quote:
A foreign key constraint is an inclusion dependency and an inclusion
dependency is a referential constraint, but there are referential
constraints that are not inclusion dependencies and there are inclusion
dependencies that are not foreign key constraints. For example, a
constraint that states that if there is a row in one table there cannot be a
corresponding row in a different table is neither an inclusion dependency
nor a foreign key constraint but is still a referential constraint
nonetheless. ...
Tempts me to use a new term (at least I'm guessing it's new) - exclusion
dependency, even if there is nothing new about what it connotes. I
think it is hard to separate the influence of the typical dbms product
that doesn't allow one to express all the possibilities that an algebra
allows (maybe easier for me being quite ignorant of most products). This
makes talk of theory harder so most people end up habitually confusing
various product documentation with theory. Personally, given that any
dbms is likely to have a number of practical limitations, I don't see
why a dbms couldn't allow restricted use of negation so that a foreign
'reference'/exclusion dependency might be read as "A{attr} = A{attr} AND
(NOT B{attr})", which is basically of the same form as any inclusion
dependency or what I call a reference. (Any dbms that has a 'delete'
operator already supports similar negation implictly.) When people talk
of referential integrity I suspect that they are usually talking about
such an equation. Also suspect that we have a number of qualified names
for that basic form (of which 'primary key' was apparently the first)
simply because using the terms in a dbms' language makes it easy for
implementers to physically optimize. IMHO the implementation artifacts
don't really contribute to, nor involve, any essential theory except for
the usually ignored theory of optimization.

(I once read an interview of Codd where he more or less acknowledged
that in his first two papers he was struggling to find the best terms to
use. Eg., he said that the American 'normalization' of relations with
China inspired him to use same term. I've read where CJ Date hadn't
heard of this, but I recall the interview, probably from a magazine
called 'DBMS' around 1994.)

Reply With Quote
  #12  
Old   
paul c
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-22-2009 , 01:22 PM






Sampo Syreeni wrote:
Quote:
... What we'd nowadays
call "complex", "semistructured", "unstructured", "unconventional" or
even "nasty" data was explicitly excluded by Codd from the very
beginning.
...
He was dealing with structured data all along. What he excluded was two
things i) unnecessary structure, ii) hidden structure.

Reply With Quote
  #13  
Old   
Sampo Syreeni
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-22-2009 , 03:16 PM



Quote:
Tempts me to use a new term (at least I'm guessing it's new) - exclusion
dependency, even if there is nothing new about what it connotes.
I've seen that term being used, although far less commonly than is the
case for inclusion. There is not a whole lot of theory on that sort of
thing. (Which is why I forgot referential integrity can take that form
as well; Thanks, Mr. Scott!)

I've never seen a real life schema where this sort of constraint was
being enforced. Quite possibly because people don't come to think of
the logical presence, and thus the necessity of the constraint. The
main context in which I have in fact seen these used is a theoretical,
data modelling one. There the commonest case is to enforce the
distinction between plain and disjoint union types.

In practical, relational use that however mostly translates into a
data model which places the separate pieces of a disjoint union into
separate tables, and so makes the enforcement implicit. (Cf.
dependency preservation and all that.) The other kind of union then
tends to yield tables with lots of correlatedly null columns, or a
constellation of tables with a shared key (often a surrogate/OID/
whatever in case the participants in the union are different enough;
unsurprisingly the pattern often also leads to redundant columns as
well).

Quote:
*Personally, given that any
dbms is likely to have a number of practical limitations, I don't see
why a dbms couldn't allow restricted use of negation so that a foreign
'reference'/exclusion dependency might be read as "A{attr} = A{attr} AND
(NOT B{attr})", which is basically of the same form as any inclusion
dependency or what I call a reference.
Well, not quite the same form because it's more general. But no, there
is no reason why that sort of functionality couldn't be available. And
in fact, even SQL supports it nowadays, via assertions, eventhough
just about nobody implements that part of the standard.

That's really quite a shame. Every single form of dependency I can
think of could be asserted that way, and the wider use of assertions
instead of more specialised forms of integrity constraint would
basically for RDBMS vendors to treat constraints in a fully
declarative, generalized, high level fashion. And if you then think
about it, solving that problem would also pretty much solve the
problem of incremental maintenance of materialized views, and vice
versa.

Quote:
*Also suspect that we have a number of qualified names
for that basic form (of which 'primary key' was apparently the first)
simply because using the terms in a dbms' language makes it easy for
implementers to physically optimize.
Quite so. More specifically, if you special case for just a few forms
of constraints (or other functionality at that), you can then special
case your underlying implementation to deal particularly well with
those special cases. That can considerably simplify implementation and
lead to substantially higher performance in those cases which fit into
your defined interface. Just like in our original example here:
foreign keys as a concept have been heavily optimized by always
demanding that the target is a candidate key and unconditionally
creating a a unique index for it.

That optimization comes at the price of generality. So when we think
about the relational model in all, such solutions don't really fit in
too well. I mean, from the very start relational data management has
been about making the model and its support machinery completely
general and declarative. No difference between fields at the logical
level. No difference between entities and relationships. No difference
in the level at which data reside, unlike in hierarchical databases.
No difference between pointers and the data pointed to, because
pointers are explicitly forbidden. No difference between table and
view. And so on.

Thus what the relational model would like is purely declarative,
algebraically complete assertions as the means of guaranteeing
integrity. Not specialized mechanisms like foreign keys -- those
belong in the same era as pointers to data do.

Quote:
IMHO the implementation artifacts
don't really contribute to, nor involve, any essential theory except for
the usually ignored theory of optimization.
In this regard, there is a slight complication, though. If you think
about foreign keys, they have one extra aspect in addition to the
inclusion dependency they declare has to hold. That is the procedural
rule used to deal with impending violation.

Roughly there are two general approaches to this, and SQL allows a
limited form of both wrt foreign keys: 1) deny and roll back the
transaction if a violation is about to happen, or the hazier, far more
complicated option of 2) making it so that the update takes effect,
and then using whatever means to automatically propagate the logical
consequences of the update to once again make the integrity to fully
and globally check out.

As I said, even SQL has the latter in the limited form of "on delete
cascade". But in the fully general form, those operational semantics
would be exceedingly complex, and evenmore complex to specify exactly
if full generality was to be expected. So in that sense, we do have
some very serious theory going on in here, and the kind of theory that
hasn't been fully fleshed out yet even in the best of DB literature.
--
Sampo

Reply With Quote
  #14  
Old   
Keith H Duggar
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-24-2009 , 01:53 PM



On Oct 22, 7:34 am, Sampo Syreeni <de... (AT) iki (DOT) fi> wrote:
Quote:
Thanks to everyone. I think I understand more clearly now the
inherent asymmetry of foreign key constraints. But now then my
question is what is the common name for the semantic constraint
that "attribute RV1:X and RV2:Y are sematincally joinable"?

Hmm. Usually we talk about constraints when they're syntactic, because
that's what makes it possible to enforce them automatically, simply by
manipulating the symbols in the database. So what you actually mean
is, what syntactic construct is appropriate for enforcing the
semantics you describe. I'd argue that that's what domains were
invented for.
I agree that it is possible to enforce such join semantics with
domains but I'm not sure it is entirely appropriate. I say that
because to the greatest extent the RM is orthogonal to domains.
To quote CJ Date "Databases In Depth" Chapter 2 Summary:

"It's a very common misconception that the relational model deals
only with rather simple types: numbers, strings, perhaps dates
and times, and not much else. In this chapter, I've tried to show
that this is indeed a misconception. Rather, relations can have
attributes of /any type whatsoever/ -- the relational model
nowhere predicates what those types must be, and in fact they
can be as complex as we like ... In other words, the question as
to what types are supported is orthogonal to the question of
support for the relational model itself. Or (less precisely but
more catchily): /types are orthogonal to tables/."

To put this another way, the RM should be just as complete and
effective for a universe with a single domain as it is for a
universe with a rich domain set (such as you propose to handle
my GenericID join constraint example). And placing constraints
on /relational expressions/ seems as fundamental to the RM as
placing constraints on /relational values/.

By pushing such expression constraints off to domains, we are
admitting that the RM itself has no support for constraining
relational expressions and I'm not prepared to admit that
limitation. Especially since it does provide support for
constraining the values of relational variables.

I will also note that both Codd and Date proposed RM mechanisms
to operate the other way around ie to allow designers to force
cross-domain operations. Codd with "Domain Check Overrides" and
date with the THE_ operators that provide type coercion.

Quote:
Because simply having the same type is not enough for example
both OrderID and CouponID might both have type GenericID and
yet we may want to state that it doesn't make sense to join
them.

I see at least two ways to view this. The first would be that you just
made a modelling error -- you assigned the same syntactic type to two
things that are actually quite different. What you should have done is
to assign them two different types/domains, which would then stop them
from being joined.

The second way would be to say that you've now implicitly created a
union type (Order|Coupon), and its key, GenericID, has to be such that
OrderID's and CouponID's never overlap. That way you can join willy
nilly, but the results will be identical to the case where you've
isolated the ID's into two separate types. Carried out in full this
approach eventually leads to a generalization hierarchy converging
towards a fully generic Object, and thus database wide object/unique
identifiers.
I think there are other views, see above. And I don't think either
of the views is accurate in this case and certainly may not apply
to every case one can imagine. In short, we should not let our lack
of imagination guide our design principles.

Quote:
It is well known that this is rather a contentious and dangerous
design choice. In fact we can already see the first signs of trouble
in your minimal example: what on earth do Orders and Coupons have to
do with each other? What precisely are you trying to express by
creating the union type? In order for that generalization to make
sense, at the very minimum you should be able to point to some
hypothetical field that could be shared between the two types, i.e.
data belonging uniquely to the union type. What would that be,
precisely?
It's not for us to wonder such whys because our imagination is
usually quite limited. For example, GenericID seems an entirely
reasonable thing to me if I labelled every item I sent out (even
the coupons) with a barcode and have other relations expressing
facts about those barcodes apart from what they label. But
arguing about such specific design semantics I think is irrelevant
to basic questions of what constraint capabilities the relational
model supports.

Anyhow, the question here is not one of our imagination but rather
simply this: if it makes sense for the RM to support constraints
on relational /values/ (taken on by variables) why does it not
make sense to support constraints on relational /expressions/?
That is a question of general principle not specific design.

Quote:
If you have the means of keeping such ID's private -- this would
necessitate making the ID's fully opaque from an outside user's point
of view, which cannot be done in any DBMS I know of -- you might just
be able to justify their existence as surrogates. But this is still
stuff that requires tremendous discipline and insight into the entity
integrity issues that need to be addressed in addition to the
referential integrity between the surrogates. I wouldn't recommend
going down this slippery slope unless you're *absolutely* sure you
know what you're doing -- I haven't seen a single wider scale
deployment which got this right and didn't suffer integrity issues in
the long run.
Frankly I don't understand the point of the above paragraph at
all. Also I don't see how entity integrity enters into this at
all; I'm assuming we follow the sage advice of having no NULLs
to begin with; so entity integrity is non-issue.

Quote:
Likewise attribute names for joinable attributes may be
different in different relations (ie needing rename).

Yes. Ideally the attributes wouldn't carry just a name and a physical
level type (e.g. string, integer) but also a semantic type/domain.
That way the name would indicate the semantics, or role, of the
attribute in this given relation, the domain would indicate what it
can be joined with and where to look for the set of permissible
values, and the syntacting type associated with the domain would tell
how to actually store the data.
I don't think I'm talking about "physical level" types (posreps)
at all. And again, I'm not convinced of your conclusion that the
responsibility of constraining /relational/ expressions should be
shoved off to domains.

Consider Date's point that RM is orthogonal to type support.
Then what argument from principle do you have for prohibiting
the RM from defining constraints on relational expressions by
pushing that responsibility off to domain support? How would
we express the constraints in a universe with one domain?

Thanks!

KHD

Reply With Quote
  #15  
Old   
Bob Badour
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-24-2009 , 03:13 PM



Keith H Duggar wrote:

Quote:
On Oct 22, 7:34 am, Sampo Syreeni <de... (AT) iki (DOT) fi> wrote:

Thanks to everyone. I think I understand more clearly now the
inherent asymmetry of foreign key constraints. But now then my
question is what is the common name for the semantic constraint
that "attribute RV1:X and RV2:Y are sematincally joinable"?

Hmm. Usually we talk about constraints when they're syntactic, because
that's what makes it possible to enforce them automatically, simply by
manipulating the symbols in the database. So what you actually mean
is, what syntactic construct is appropriate for enforcing the
semantics you describe. I'd argue that that's what domains were
invented for.

I agree that it is possible to enforce such join semantics with
domains but I'm not sure it is entirely appropriate. I say that
because to the greatest extent the RM is orthogonal to domains.
To quote CJ Date "Databases In Depth" Chapter 2 Summary:

"It's a very common misconception that the relational model deals
only with rather simple types: numbers, strings, perhaps dates
and times, and not much else. In this chapter, I've tried to show
that this is indeed a misconception. Rather, relations can have
attributes of /any type whatsoever/ -- the relational model
nowhere predicates what those types must be, and in fact they
can be as complex as we like ... In other words, the question as
to what types are supported is orthogonal to the question of
support for the relational model itself. Or (less precisely but
more catchily): /types are orthogonal to tables/."

To put this another way, the RM should be just as complete and
effective for a universe with a single domain as it is for a
universe with a rich domain set (such as you propose to handle
my GenericID join constraint example). And placing constraints
on /relational expressions/ seems as fundamental to the RM as
placing constraints on /relational values/.

By pushing such expression constraints off to domains, we are
admitting that the RM itself has no support for constraining
relational expressions and I'm not prepared to admit that
limitation. Especially since it does provide support for
constraining the values of relational variables.

I will also note that both Codd and Date proposed RM mechanisms
to operate the other way around ie to allow designers to force
cross-domain operations. Codd with "Domain Check Overrides" and
date with the THE_ operators that provide type coercion.
Domains, however, define what is or is not joinable because natural join
applies the equality comparison to attributes with like names, and
domains define equality comparisons.

Reply With Quote
  #16  
Old   
Keith H Duggar
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-24-2009 , 05:03 PM



On Oct 24, 3:13 pm, Bob Badour <bbad... (AT) pei (DOT) sympatico.ca> wrote:
Quote:
Keith H Duggar wrote:
On Oct 22, 7:34 am, Sampo Syreeni <de... (AT) iki (DOT) fi> wrote:

Thanks to everyone. I think I understand more clearly now the
inherent asymmetry of foreign key constraints. But now then my
question is what is the common name for the semantic constraint
that "attribute RV1:X and RV2:Y are sematincally joinable"?

Hmm. Usually we talk about constraints when they're syntactic, because
that's what makes it possible to enforce them automatically, simply by
manipulating the symbols in the database. So what you actually mean
is, what syntactic construct is appropriate for enforcing the
semantics you describe. I'd argue that that's what domains were
invented for.

I agree that it is possible to enforce such join semantics with
domains but I'm not sure it is entirely appropriate. I say that
because to the greatest extent the RM is orthogonal to domains.
To quote CJ Date "Databases In Depth" Chapter 2 Summary:

"It's a very common misconception that the relational model deals
only with rather simple types: numbers, strings, perhaps dates
and times, and not much else. In this chapter, I've tried to show
that this is indeed a misconception. Rather, relations can have
attributes of /any type whatsoever/ -- the relational model
nowhere predicates what those types must be, and in fact they
can be as complex as we like ... In other words, the question as
to what types are supported is orthogonal to the question of
support for the relational model itself. Or (less precisely but
more catchily): /types are orthogonal to tables/."

To put this another way, the RM should be just as complete and
effective for a universe with a single domain as it is for a
universe with a rich domain set (such as you propose to handle
my GenericID join constraint example). And placing constraints
on /relational expressions/ seems as fundamental to the RM as
placing constraints on /relational values/.

By pushing such expression constraints off to domains, we are
admitting that the RM itself has no support for constraining
relational expressions and I'm not prepared to admit that
limitation. Especially since it does provide support for
constraining the values of relational variables.

I will also note that both Codd and Date proposed RM mechanisms
to operate the other way around ie to allow designers to force
cross-domain operations. Codd with "Domain Check Overrides" and
date with the THE_ operators that provide type coercion.

Domains, however, define what is or is not joinable because natural join
applies the equality comparison to attributes with like names, and
domains define equality comparisons.
Domains also define a set of permissible values for attributes.
However, that doesn't stop RM from adding additional constraints
to limit those values further. Why should relational operations
be any different in this respect from relational variables? Ie
why shouldn't the RM allow us to further constrain what domains
permit for both values AND operations?

KHD

Reply With Quote
  #17  
Old   
Bob Badour
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-24-2009 , 07:07 PM



Keith H Duggar wrote:

Quote:
On Oct 24, 3:13 pm, Bob Badour <bbad... (AT) pei (DOT) sympatico.ca> wrote:

Keith H Duggar wrote:

On Oct 22, 7:34 am, Sampo Syreeni <de... (AT) iki (DOT) fi> wrote:

Thanks to everyone. I think I understand more clearly now the
inherent asymmetry of foreign key constraints. But now then my
question is what is the common name for the semantic constraint
that "attribute RV1:X and RV2:Y are sematincally joinable"?

Hmm. Usually we talk about constraints when they're syntactic, because
that's what makes it possible to enforce them automatically, simply by
manipulating the symbols in the database. So what you actually mean
is, what syntactic construct is appropriate for enforcing the
semantics you describe. I'd argue that that's what domains were
invented for.

I agree that it is possible to enforce such join semantics with
domains but I'm not sure it is entirely appropriate. I say that
because to the greatest extent the RM is orthogonal to domains.
To quote CJ Date "Databases In Depth" Chapter 2 Summary:

"It's a very common misconception that the relational model deals
only with rather simple types: numbers, strings, perhaps dates
and times, and not much else. In this chapter, I've tried to show
that this is indeed a misconception. Rather, relations can have
attributes of /any type whatsoever/ -- the relational model
nowhere predicates what those types must be, and in fact they
can be as complex as we like ... In other words, the question as
to what types are supported is orthogonal to the question of
support for the relational model itself. Or (less precisely but
more catchily): /types are orthogonal to tables/."

To put this another way, the RM should be just as complete and
effective for a universe with a single domain as it is for a
universe with a rich domain set (such as you propose to handle
my GenericID join constraint example). And placing constraints
on /relational expressions/ seems as fundamental to the RM as
placing constraints on /relational values/.

By pushing such expression constraints off to domains, we are
admitting that the RM itself has no support for constraining
relational expressions and I'm not prepared to admit that
limitation. Especially since it does provide support for
constraining the values of relational variables.

I will also note that both Codd and Date proposed RM mechanisms
to operate the other way around ie to allow designers to force
cross-domain operations. Codd with "Domain Check Overrides" and
date with the THE_ operators that provide type coercion.

Domains, however, define what is or is not joinable because natural join
applies the equality comparison to attributes with like names, and
domains define equality comparisons.

Domains also define a set of permissible values for attributes.
However, that doesn't stop RM from adding additional constraints
to limit those values further.
You asked about semantic joinability. Semantically, the join works if
like-named attributes have an equality comparison defined and not if
they don't.


Quote:
Why should relational operations
be any different in this respect from relational variables? Ie
why shouldn't the RM allow us to further constrain what domains
permit for both values AND operations?
Because the RM doesn't have anything much to say about domains. Domains
are a separate--albeit vitally important--concern.

Reply With Quote
  #18  
Old   
paul c
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-24-2009 , 11:47 PM



Bob Badour wrote:
....
Quote:
You asked about semantic joinability. Semantically, the join works if
like-named attributes have an equality comparison defined and not if
they don't.
...
In RT, how could they not define equality?

Reply With Quote
  #19  
Old   
paul c
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-24-2009 , 11:56 PM



Bob Badour wrote:
Quote:
paul c wrote:

Bob Badour wrote:
...

You asked about semantic joinability. Semantically, the join works if
like-named attributes have an equality comparison defined and not if
they don't.
...

In RT, how could they not define equality?

What reason would one have to define an equality comparison to compare
employee ids with department ids or with names? Or charge with current
or potential? Or age with price?
None in theory, I thought you were talking about attributes in the same
domain.

Reply With Quote
  #20  
Old   
Bob Badour
 
Posts: n/a

Default Re: foreign key constraint versus referential integrity constraint - 10-24-2009 , 11:56 PM



paul c wrote:

Quote:
Bob Badour wrote:
...

You asked about semantic joinability. Semantically, the join works if
like-named attributes have an equality comparison defined and not if
they don't.
...

In RT, how could they not define equality?
What reason would one have to define an equality comparison to compare
employee ids with department ids or with names? Or charge with current
or potential? Or age with price?

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.