dbTalk Databases Forums  

A different definition of MINUS, Part 1

comp.databases.theory comp.databases.theory


Discuss A different definition of MINUS, Part 1 in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
paul c
 
Posts: n/a

Default A different definition of MINUS, Part 1 - 12-15-2008 , 06:35 PM






I'm still on the same view updating theme but I thought I'd make a new
subject so as not to sully D&D's stuff with some of my asides under the
other thread. I'm usually guilty, on purpose, of aside-overload because
some of the smarter people here often clarify my misunderstandings,
which is valuable to me. But that thread is now full of stuff that I
don't think is relevant to view updating. Secondly, when I said the
Tutorial D MINUS definition might not be "good enough", what I really
meant to say is that it doesn't go far enough for the usual dbms where a
table or relvar has a fixed header. Obviously D&D already knew this
because they mention "same heading" in their definition.

As interesting as theory is to me, I'm not much of a theorist (I want to
know just enough theory to understand a dbms's results, or maybe just
enough more to see a better way to make a dbms) and rather than me write
about this, it would be better if the big guns either solved it or
proved it can't be solved, but I guess they have other things on their
plates. But I'm impatient because I'm with McGoveran that this issue is
crucial to the RM, otherwise we're effectively saying that some results
are logical and some are arbitrary.

This is a pretty long post and I wish I were clever enough to make it
shorter. For ease of reference for anybody who cares to wade through
it, I'll divide it into parts one and two, even if that seems a little
pompous.

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
PART ONE

Here's the D&D TTM definition:

(quote - page 92, TTM 2nd edition)

Difference: The Tutorial D <minus>

R1 MINUS R2
(where R1 and R2 have the same heading) is semantically equivalent to
the A expression
R1 <AND> ( <NOT> R2)

(end quote)

In the original dbdebunk exchange, McGoveran said: "...Thus, we must
solve the general case of update semantics for which updating "base"
relations is the special case (and not the reverse) ...".

I'm not expert in formal predicates and I find seeing a way to an
initial implementation easier with an algebra to follow, so I tried to
think about McGoveran's advice in a purely algebraic way. Another thing
about the A-algebra specifically: since I first saw it, it has always
seemed remarkable to me that an implementation such as Tutorial D might
have this view updating problem at the same time as there is no such
problem within the algebra, eg., a dbms that implemented only the
algebra might mean a db that grows like topsy storing all its results
and it might not be especially convenient for users to interpret, but
still how could the implementation have the problem and the algebra not
have it?

Anyway, here's what I imagine I mean by 'purely algebraic':

1) The A-algebra says nothing about constraints other than the ones in
the formal definition that have to do with relation structure (such as
two attributes with the same name in the same relation not having the
same type), where other constrains are concerned, it is merely the
vehicle for applying the ones that users desire. So, for the purpose of
determining results before user constraints are applied (eg.,
'<AND>ed'), an algebraic approach can ignore the key and foreign
constraints mentioned in the exchange.
2) Date suggests the Assignment Principle is essential, but I would say
it is essential only if we use language that has an assignment
operator. The S relation that underlies the SSP relvar mentions
supplier tuples that either don't participate in the view or that
aren't suggested by the WHERE clause. I interpret this to mean that if
an S-tuple has triples that aren't all in an SSP tuple, that S-tuple
must remain in the result, ie., those tuples are outside of the 'scope'
that McGoveran mentions. Below, I try to find an algebraic way to
satisfy this. I think the Assignment Principle doesn't mean that an
algebra needs an assignment operator, rather it is really a definition
in terms of an algebra as to what an assignment operator, if a language
has one, means.
3) As far as the 'Principle of Interchangeability' (users without
definition privileges would not be able to tell base relations from
views) is concerned, I don't yet see how the A-algebra by itself can
enforce it, so it is up for grabs. Closure seems far more important,
just as it is in Boolean Algebra. From an algebraic point of view, this
also means that there can't be any exceptions except for ones that the
formal definition of the algebra mentions. Users might prefer to be
warned when certain kinds of results are implied by the algebra, but
that is a choice for the language or dbms designer, not something the
algebra must cater to. So, for starters, I'd remove that "same heading"
proviso in the TD MINUS definition.
4) A semantic definition seems important, akin to the one for TD MINUS,
ie., one that avoids "implementation rules" and expresses a possible
naive implementation directly in terms of the algebra.
5) The distributive and associative properties of the algebra are also
important for implementation as they allow a dbms to optimize in various
situations.
6) Projection must be involved (given the A-algebra) if the usual
short-hands for eg., WHERE clauses that people find familiar and if the
usual fixed headings of most db's are to be supported. Whereas, the
A-algebra doesn't care about shorthands nor about fixed headings.

First, let me examine a simpler version of TD's MINUS that removes the
"same heading" proviso:

R MINUS D is semantically equivalent to R <AND> <NOT> D.

Whatever the equivalent to an implemented language's WHERE clause this
definition might be, users will surely want the leverage, such as SQL's
WHERE clause has, to remove multiple tuples by specifying a proper
subset of all of a relation's attributes. This means that D needn't
specify all R attributes. In the exchange, Fabian Pascal implied that
the example WHERE clause is not "sufficiently expressive". I'm not
arguing that the clause is sufficient for all purposes, just that it is
enough to apply the A-algebra. My approach then is to take the clause
as given and decide what it can definitely mean from the information
available to an algebra. With respect to the MINUS definition, the
WHERE clause refers only to a tuple in D (and not a tuple in R, since R
may not have the same attributes as D, as is the case for the SSP
relvar). Date goes beyond SQL syntax because not only the attribute
names are specified, but also the types, as in S# = S#('S1'), so as far
as types are concerned, this example doesn't need to refer to a
'catalog' . As far as the algebraic term D is concerned, all we are
given are attribute names, types and values and the D&D 'conjoin',
'<AND>' and enough is specified to form triples, tuples and a partial
predicate. No relvar or relation name is given, but we can still
proceed because the A-algebra allows us to form relations from tuples.
Still, I think we can't deal with the definitional problem unless we
think about tuples and their triples. As far as the algebraic equation
is concerned, the significance of the D-tuple is that it is in D and not
specifically stated to be in R.

To avoid maddening levels of parentheses in what follows, let me give
highest precedence to <NOT> , next-highest to <AND> and <OR>, next
highest to '=' and also assume that associativity is left-to-right.

For my purposes, it seems to me easier and clearer to resort to an
algebraic equation. I'm trying to omit extraneous notions, perhaps this
is the same as what Codd called 'essentiality', I'm not sure if this is
what he had in mind but I am coming to suspect that the usual arguments
against such 'deletes' stem from mingling physical and logical notions,
eg., that a base must be physical. For one thing, an equation avoids
introducing any notion of assignment to the algebra and also avoids
relvars, so there's no notion of 'before' and 'after', making it easier,
at least for me, to talk about results, ie.,

R' = R MINUS D
= R{HR} <AND> <NOT> D{HD}

where R' is equivalent to the relation that the MINUS operation produces
and HR, HD are the headings of R and D. The projections R{HR} and D{HD}
are equal to R and D respectively. Though they may seem redundant they
are necessary if I am to see this algebraic equation in the context of
the typical dbms that assumes fixed relvar or relation headings.

In this post, I'm concerned only with 'delete through join', not yet
'insert through union' or 'insert through projection'.

If R{HR} is a join, say equal to 'A <AND> B', Heath's Theorem can be
used to re-write 'R{HR}', provided the re-writing involves overlapping
headers, where HSR is some subset of R's header, ie.:

R{HR} = R{HSR} <AND> R{HR}

(This is because HSR -> HSR is a trivial dependency. Maybe somebody
will point out that I don't need Heath's theorem to re-write, it is just
a way that comes to mind that I believe can be shown with only the
assumptions in the A-algebra definitions, not any notions beyond that. )

By definition of <AND>, we know that there is an HSR that equals HA,
where HA is A's heading, and there is no non-equal HSR we care about for
this example, so we can rewrite the R' equation as:

R' = R{HA} <AND> R{HR} <AND> <NOT> D

If we introduce HB as the heading of B, we can also write:

R' = R{HB} <AND> R{HR} <AND> <NOT> D

I think this points the way to a result that avoids mingling base and
virtual concepts, because the R' equation that mentions the HA heading
doesn't mention the HB heading and vice-versa. If we apply this
interpretation to the longer equation, we can ask "what value for R{HA}
always makes R' true?", similarly for R{HB}, ie., if A' and B' are
partial resulting base relations, we don't need to talk about both of A'
and B' in the same breath, we can evaluate them separately. Further, we
can ignore any question of base or virtual when evaluating them (again,
because the A-algebra doesn't admit assignment).

Let me apply Heath to R{HR} and D, such that R{HR} = R{HR} <AND> D{HD},
where HD is the heading of D, eg.,

R' = R{HA} <AND> R{HR} <AND> <NOT> D
= R{HA} <AND> R{HR} <AND> D{HD} <AND> <NOT> (D{HD}

Further, let me apply Heath to the right-most D{HD} term where HD = {S#,
P#}, eg.,

D{HD} = D{HD} & D{S#}, so

R' = R{HA} <AND> R{HR} <AND> D{HD} <AND> <NOT> (D{HD} <AND> D{S#})

Since the heading of R' is HR, if HR includes S#, we could also write

R'{S#} <AND> R'{HR} = R{HA} <AND> R{HR} <AND> D{HD} <AND> <NOT> (D{HD}
<AND> D{S#})
and removing some of the redundant projections,
R'{S#} <AND> R'{HR} = R{HA} <AND> R <AND> D <AND> <NOT> (D <AND> D{S#})

Let "value of S# = 'Sx'" stand for the value 'Sx' in some triple that
corresponds with a heading of {S# S#}. (In this case, 'S#' is not only
a heading, but an attribute name.) Let me now ask whether whenever
there is an 'Sx' in D{S#} must there not be an 'Sx' in R'{S#}? If 'Sx'
is in D{S#}, it must be in D. If it is in R, it must also be in R{HA}.
But it will not be in the right most term (the complement), so it will
not be in the result and it will not be in R'{S#}.

So substituting 'S1' for 'Sx', this definition of MINUS doesn't have the
S# value 'S1' in any of its resulting triples and if followed, the
definition would require that we always remove the S# value 'S1' from
the result. Similarly for the P# value 'P1'. So if the TD MINUS
definition is followed, we must always remove tuples with such triples
in the result. (I think we could say similar if 'S#' were not an
attribute but any subset of HR.) With a language assignment, the
analogy would be 'delete from both sides'.

With the Principle of Interchangeability, R'{S#} could be base or
virtual but in either case, the analogy would remain, so the POI is not
violated.

Now, I want to ask another question. Given the values for S and SP in
Date's example, does this equation have two solutions? (ie., two
solutions as far as the value 'S1' is concerned.) No, it doesn't, if
'S1' is in R'{S#}, it is in R'{HR} and vice versa. The analogy for a
language assignment is that there is only one solution to the 'update'
of S. From an algebraic standpoint, we don't need to consider the
removal of S-tuples as having anything to do with the removal of
SP-tuples, so why should a language that implements assignment introduce
such a consideration? The language's dbms can handle 'updates' to S in
isolation from 'updates' to SP.

As far as the value 'S1' is concerned, I think the Assignment Principle
isn't violated because if 'v' doesn't include the 'S1' value, the
resulting V doesn't include that value. But the WHERE clause makes no
mention of a tuple (<S# S# 'S2'>,<P# P# 'P1'>) and if there were such a
tuple in R{HD}, the P# value 'P1' would not be in R'{HR}, which would
violate the Assignment Principle. This is why I think the TD MINUS
definition must be inaccurate. Plus, from a practical point of view, it
results in an SSP result that seems very mangled, given the original
'delete' statement. In Part Two, I'll try to explain a different
definition.

-------------------------------------------------------------------------------------------------------------------------------------------------------



Reply With Quote
  #2  
Old   
Brian Selzer
 
Posts: n/a

Default Re: A different definition of MINUS, Part 1 - 12-17-2008 , 04:16 AM






Just some food for thought:

1. The scope that McGoveran referred to is the set of all tuples that appear
in any possible value for the relvar.

2. When dealing with an update, there is always a before and an after,
unless the update is null. If "Paul is asleep," and the update states,
"Paul is awake," then which is the case? Is he asleep or is he awake? He
can't be both. An expression of the algebra can only deal with one set of
circumstances--not both, since the algebra is for determining what is the
case rather than asserting what is the case. Each possible value for a
database corresponds to a distinct logical proposition, yet only one of
those propositions represents what /is/ the case. So whenever there is an
update (unless it is null, of course), you are always dealing with two
distinct propositions representing /what has been the case/ and /what is now
the case/. I just think that ignoring before and after invites dire
consequences.



Reply With Quote
  #3  
Old   
Brian Selzer
 
Posts: n/a

Default Re: A different definition of MINUS, Part 1 - 12-17-2008 , 04:16 AM



Just some food for thought:

1. The scope that McGoveran referred to is the set of all tuples that appear
in any possible value for the relvar.

2. When dealing with an update, there is always a before and an after,
unless the update is null. If "Paul is asleep," and the update states,
"Paul is awake," then which is the case? Is he asleep or is he awake? He
can't be both. An expression of the algebra can only deal with one set of
circumstances--not both, since the algebra is for determining what is the
case rather than asserting what is the case. Each possible value for a
database corresponds to a distinct logical proposition, yet only one of
those propositions represents what /is/ the case. So whenever there is an
update (unless it is null, of course), you are always dealing with two
distinct propositions representing /what has been the case/ and /what is now
the case/. I just think that ignoring before and after invites dire
consequences.



Reply With Quote
  #4  
Old   
Brian Selzer
 
Posts: n/a

Default Re: A different definition of MINUS, Part 1 - 12-17-2008 , 04:16 AM



Just some food for thought:

1. The scope that McGoveran referred to is the set of all tuples that appear
in any possible value for the relvar.

2. When dealing with an update, there is always a before and an after,
unless the update is null. If "Paul is asleep," and the update states,
"Paul is awake," then which is the case? Is he asleep or is he awake? He
can't be both. An expression of the algebra can only deal with one set of
circumstances--not both, since the algebra is for determining what is the
case rather than asserting what is the case. Each possible value for a
database corresponds to a distinct logical proposition, yet only one of
those propositions represents what /is/ the case. So whenever there is an
update (unless it is null, of course), you are always dealing with two
distinct propositions representing /what has been the case/ and /what is now
the case/. I just think that ignoring before and after invites dire
consequences.



Reply With Quote
  #5  
Old   
Brian Selzer
 
Posts: n/a

Default Re: A different definition of MINUS, Part 1 - 12-17-2008 , 04:16 AM



Just some food for thought:

1. The scope that McGoveran referred to is the set of all tuples that appear
in any possible value for the relvar.

2. When dealing with an update, there is always a before and an after,
unless the update is null. If "Paul is asleep," and the update states,
"Paul is awake," then which is the case? Is he asleep or is he awake? He
can't be both. An expression of the algebra can only deal with one set of
circumstances--not both, since the algebra is for determining what is the
case rather than asserting what is the case. Each possible value for a
database corresponds to a distinct logical proposition, yet only one of
those propositions represents what /is/ the case. So whenever there is an
update (unless it is null, of course), you are always dealing with two
distinct propositions representing /what has been the case/ and /what is now
the case/. I just think that ignoring before and after invites dire
consequences.



Reply With Quote
  #6  
Old   
Brian Selzer
 
Posts: n/a

Default Re: A different definition of MINUS, Part 1 - 12-17-2008 , 04:16 AM



Just some food for thought:

1. The scope that McGoveran referred to is the set of all tuples that appear
in any possible value for the relvar.

2. When dealing with an update, there is always a before and an after,
unless the update is null. If "Paul is asleep," and the update states,
"Paul is awake," then which is the case? Is he asleep or is he awake? He
can't be both. An expression of the algebra can only deal with one set of
circumstances--not both, since the algebra is for determining what is the
case rather than asserting what is the case. Each possible value for a
database corresponds to a distinct logical proposition, yet only one of
those propositions represents what /is/ the case. So whenever there is an
update (unless it is null, of course), you are always dealing with two
distinct propositions representing /what has been the case/ and /what is now
the case/. I just think that ignoring before and after invites dire
consequences.



Reply With Quote
  #7  
Old   
Brian Selzer
 
Posts: n/a

Default Re: A different definition of MINUS, Part 1 - 12-17-2008 , 04:16 AM



Just some food for thought:

1. The scope that McGoveran referred to is the set of all tuples that appear
in any possible value for the relvar.

2. When dealing with an update, there is always a before and an after,
unless the update is null. If "Paul is asleep," and the update states,
"Paul is awake," then which is the case? Is he asleep or is he awake? He
can't be both. An expression of the algebra can only deal with one set of
circumstances--not both, since the algebra is for determining what is the
case rather than asserting what is the case. Each possible value for a
database corresponds to a distinct logical proposition, yet only one of
those propositions represents what /is/ the case. So whenever there is an
update (unless it is null, of course), you are always dealing with two
distinct propositions representing /what has been the case/ and /what is now
the case/. I just think that ignoring before and after invites dire
consequences.



Reply With Quote
  #8  
Old   
Brian Selzer
 
Posts: n/a

Default Re: A different definition of MINUS, Part 1 - 12-17-2008 , 04:16 AM



Just some food for thought:

1. The scope that McGoveran referred to is the set of all tuples that appear
in any possible value for the relvar.

2. When dealing with an update, there is always a before and an after,
unless the update is null. If "Paul is asleep," and the update states,
"Paul is awake," then which is the case? Is he asleep or is he awake? He
can't be both. An expression of the algebra can only deal with one set of
circumstances--not both, since the algebra is for determining what is the
case rather than asserting what is the case. Each possible value for a
database corresponds to a distinct logical proposition, yet only one of
those propositions represents what /is/ the case. So whenever there is an
update (unless it is null, of course), you are always dealing with two
distinct propositions representing /what has been the case/ and /what is now
the case/. I just think that ignoring before and after invites dire
consequences.



Reply With Quote
  #9  
Old   
Brian Selzer
 
Posts: n/a

Default Re: A different definition of MINUS, Part 1 - 12-17-2008 , 04:16 AM



Just some food for thought:

1. The scope that McGoveran referred to is the set of all tuples that appear
in any possible value for the relvar.

2. When dealing with an update, there is always a before and an after,
unless the update is null. If "Paul is asleep," and the update states,
"Paul is awake," then which is the case? Is he asleep or is he awake? He
can't be both. An expression of the algebra can only deal with one set of
circumstances--not both, since the algebra is for determining what is the
case rather than asserting what is the case. Each possible value for a
database corresponds to a distinct logical proposition, yet only one of
those propositions represents what /is/ the case. So whenever there is an
update (unless it is null, of course), you are always dealing with two
distinct propositions representing /what has been the case/ and /what is now
the case/. I just think that ignoring before and after invites dire
consequences.



Reply With Quote
  #10  
Old   
Brian Selzer
 
Posts: n/a

Default Re: A different definition of MINUS, Part 1 - 12-17-2008 , 04:16 AM



Just some food for thought:

1. The scope that McGoveran referred to is the set of all tuples that appear
in any possible value for the relvar.

2. When dealing with an update, there is always a before and an after,
unless the update is null. If "Paul is asleep," and the update states,
"Paul is awake," then which is the case? Is he asleep or is he awake? He
can't be both. An expression of the algebra can only deal with one set of
circumstances--not both, since the algebra is for determining what is the
case rather than asserting what is the case. Each possible value for a
database corresponds to a distinct logical proposition, yet only one of
those propositions represents what /is/ the case. So whenever there is an
update (unless it is null, of course), you are always dealing with two
distinct propositions representing /what has been the case/ and /what is now
the case/. I just think that ignoring before and after invites dire
consequences.



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.