dbTalk Databases Forums  

Relational query with path expressions

comp.databases.theory comp.databases.theory


Discuss Relational query with path expressions in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #11  
Old   
David BL
 
Posts: n/a

Default Re: Relational query with path expressions - 04-05-2009 , 10:08 PM






On Apr 3, 12:46 pm, Tegiri Nenashi <TegiriNena... (AT) gmail (DOT) com> wrote:
Quote:
Granted, if you have a choice between join and generalized projection
you can infer the type of binary operator from the type of operand. In
the expression

x.y

if both are relations, you interpret it as join, and if one is a set
of attributes, than its a projection. That is some economy of
notation.
It occurred to me that the same language can be formalised more simply
and without any need to overload the dot syntax. The relevant rules
are:

1. R1.R2 always means project the join of R1 and R2 onto the
attributes of R2

2. (R1,...,Rn) always means the join of R1,...,Rn

3. x where x is an attribute name represents a relation with a single
attribute named x, and containing all the tuples associated with the
domain of attribute x. Often these relations are infinite and can't
be materialised.

This makes it clear for example that

SP.(QTY,P)

is equivalent to what Date would write as

(SP JOIN P) { QTY, P#, PNAME, COLOR, WEIGHT, CITY }

Providing access to (finite) domains as relations has other uses. E.g.

COLOR \ P.COLOR

to find the colours not used by any part.

The downside is that each attribute name must consistently use the
same domain type in all relations it appears in.



Reply With Quote
  #12  
Old   
Philipp Post
 
Posts: n/a

Default Re: Relational query with path expressions - 04-06-2009 , 04:17 AM






Quote:
just a suggestion, start a phony family tree at ancestry.com, try to put in a source for a marriage between two people and then check if ancestry's pathetic interpretation of the hierarchical gedcom so-called standard assigns the source to both spouses.
I do not know how Ancestry implemented it, but a marriage event
usually happens to a couple (in gedcom the FAM record) and as such the
source should be found at the couple if it is done right.

brgds

Philipp Post


Reply With Quote
  #13  
Old   
Brian Selzer
 
Posts: n/a

Default Re: Relational query with path expressions - 04-06-2009 , 07:31 AM




"David BL" <davidbl (AT) iinet (DOT) net.au> wrote

Quote:
I get the impression that a lot of the time, a query (or a large
portion of a query) can be thought of as a navigation from attribute
to attribute using a sequence of joins. This made me think there may
be some merit in thinking of attribute names as global in nature and
the user only has to specify the start and end attributes and the DBMS
is able to deduce the joins that are required. The potential upshot
would be that queries are simplified and the user is freed from the
burden of understanding the way the data has been decomposed into
separate relations. Just recently I read about the so called
Universal Relation which takes this approach. However there are a
number of criticisms - such as those described in a paper by Kent.

I think the biggest problem is that quite often there are multiple
useful paths.
I disagree with your line of reasoning because relation names have
significance. For example, suppose that amongst the myriad relations in the
manufacturing system for a synthetic rubber plant, two stand out: one that
specifies the recipies that /can be used/ to produce batches of rubber, and
another with exactly the same attributes that specifies the recipies that
/are being used/ to produce batches of rubber. Here it is not just the
juxtaposition of attributes that conveys information but also the names of
the relations.

Quote:
For example, consider Chris Date's supplier example
from his book Introduction to Database Systems. The relations are

S(S#, SNAME, STATUS, CITY)
Supplier number S# has name SNAME, status STATUS
and is located in CITY

P(P#, PNAME, COLOR, WEIGHT, CITY)
Part number P# has name PNAME, colour COLOR,
weight WEIGHT in pounds and is located in CITY

SP(S#, P#, QTY)
S# shipped QTY of P#

There are two obvious ways to navigate from S# to P#. One can either
go via the shipments relation SP (where S# and P# appear in the same
relation), or one can join relations S and P on S.CITY = P.CITY.

Presumably to define the Universal Relation one would be forced to
pick one of these paths as the default, and rename attributes
accordingly. E.g. we could arbitrarily assume the default
relationship between suppliers and parts is through the shipments, and
rename CITY as SCITY in S, and CITY as PCITY in P, so they have
distinct roles in the Universal Relation.

It seems that explicit joins are a better way to give the user precise
control over the access paths. However I find it interesting that a 3-
way natural join on S,P and SP imposes both of the above navigation
paths at the same time, and that's rarely what's required.

After thinking about that problem for a while, it occurred to me that
access paths should emphasise navigation from relation to relation,
rather than from attribute to attribute (as suggested by the hyper-
graphs in the literature on the Universal Relation).

The following describes some ideas for a query language on a
relational database based on path expressions that allow relation
names to be specified in the path. It is more closely related to the
relational algebra than the calculus. It's assumed there are no NULLs
and queries work on sets not bags.

Relvars
-------

The name of a base relvar evaluates to its current relation value.
For example, the path expression

P

evaluates to the current value of the relvar named P.

Restriction
-----------

A restriction can be taken using a postfix operator which is a boolean
expression enclosed in square brackets and parameterised on the
attributes of the relation. For example, the path expression

P[COLOR = 'red']

evaluates to the restriction on P where parts are red.

For convenience it's assumed it's not necessary to specify a selector
as in COLOR('red').

Projection
----------

The following path expression subsequently takes the projection onto
attributes P# and WEIGHT:

P[COLOR = 'red'].(P#,WEIGHT)

Note that since path expressions evaluate to sets, we know that
duplicates are removed from the projection.

When projecting onto a single attribute the brackets aren't required.
For example,

S.S#

is the projection of relation S onto attribute S#.

Join with projection
--------------------

Let R1 and R2 be relations, then the path expression R1.R2 denotes the
natural join of R1 and R2 followed by a projection onto the attributes
of R2. For example,

P[COLOR = 'red'].SP

retrieves the tuples in SP associated with the shipment of a red part.

The idea that R1.R2 can drop attributes from R1 may seem strange. A
benefit arises for example in path expressions like R1.R2.R3 because
it can avoid joining on attributes which happen to have the same name
in R1 and R3. This can help to control the joins along a path. For
example,

P[COLOR = 'red'].SP.S.SNAME

retrieves the names of the suppliers who ship at least one red part,
whereas

P[COLOR = 'red'].S.SNAME

returns the names of suppliers that are located in a city where there
are red parts.

This is exactly the distinction discussed in the introduction.

A brief comparison to other query languages
-------------------------------------------

From Date's book, the following SQL retrieves supplier names for
suppliers who supply at least one red part:

SELECT DISTINCT S.SNAME
FROM S
WHERE S.S# IN
( SELECT SP.S#
FROM SP
WHERE SP.P# IN
( SELECT P.P#
FROM P
WHERE P.COLOR = 'red'));

The following expression in the relational algebra is somewhat more
concise:

( ( ( P WHERE COLOR = 'red' )
JOIN SP ) { S# } JOIN S ) { SNAME }

Note that the above needs to project the result of the join of P and
SP onto { S# } before joining with S in order to avoid the join on
P.CITY = S.CITY.

For those interested in the comparison, Date also provides the same
query in the relational calculus and the domain calculus.

Evidently the path expression

P[COLOR = 'red'].SP.S.SNAME

is simpler than the above alternatives.

Join without projection
-----------------------

Given relations R1,...,Rn, (R1,...,Rn) denotes the natural join of
R1,...,Rn. For example,

(S,P)

gets all combinations of supplier and part information such that they
are located in the same city.

Aggregate operators
-------------------

Let R be a relation and every attribute of R is defined on a domain
that supports summation. Then the path expression R.@sum denotes a
relation with the same attribute names as R that contains a single
tuple which maps each attribute name to the sum of that attribute over
tuples from R. If R is empty then R.@sum contains a single tuple full
of zeros. For example,

SP[P# = 'P2'].QTY.@sum

retrieves the total quantity of shipments for part P2.

Similarly we define operators @min, @max, @avg, except there is no
tuple in the output if the input relation is empty (i.e. has no
tuples).

R.@count evaluates to a relation with a single tuple with a single
attribute which gives the number of tuples in R. For example,

S.@count

retrieves the total number of suppliers.

R.@exists evaluates to a relation with a single tuple with a single
boolean attribute that is true iff R is non-empty.

Generalisation of projection
----------------------------

It was stated earlier that R.(x1,...,xn) denotes a projection of R
onto attributes named x1,...,xn. This can be regarded as a special
case of something much more general - where we allow path expressions
instead of just attribute names for each xi.

Formally the result is defined as a union of joins where the union is
taken over the tuples t of R, and the joins are taken over the
relations {t}.xi for each i where {t} denotes the relation consisting
of the single tuple t. Typically the {t}.xi have no attributes in
common so the joins are cartesian products.

For example,

P.(COLOR, SP)

evaluates to the relation which is the natural join of P and SP,
projected onto the attributes COLOR,S#,P#,QTY.

To help understand this it may be helpful to pretend that the dot
operator distributes over the comma separated list. i.e. we think of
P.(COLOR, SP) as (P.COLOR, P.SP). However we don't simply take the
cartesian product P.COLOR x P.SP. Instead we account for the shared
prefix P in the paths which "correlate" these two path expressions.
In other words, for each tuple from P that we use to navigate to its
COLOR, we note that the same tuple is involved in the navigation from
P to SP by joining on P.P# = SP.P#.

It is convenient to support a syntax that adds extra attributes to an
existing relation. The path expression R.(*,x1,...,xn) is equivalent
to R.(y1,...,ym,x1,...,xn) where y1,...,ym are the attributes of R.

The aggregate operators can be used to aggregating over groups. For
example,

P.(P#, SP.QTY.@sum)

returns a relation where for each part, it returns the part number and
the total shipment quantity (which could be zero if there are no
shipments of that part).

Note that

P.(P#, SP.QTY.@min)

retrieves the minimum quantity of shipments for each part number, but
only for parts where there exists at least one shipment by some
supplier.

We can group over more than one attribute. For example,

P.(PNAME,CITY).(*,P.WEIGHT.@sum)

retrieves the sum of the weights, grouped over PNAME and CITY.

As a more complicated example,

P.(P#, SP.(QTY,S.STATUS).(@min,@max,@avg))

retrieves for each part, the part number and the minimum, maximum and
average shipment QTY, and the minimum, maximum and average supplier
status. However for this to make sense we would need some convention
to assign unique attribute names in the output.

Generalisation of restriction
-----------------------------

So far it has have assumed that for the restriction R[b(x1,...,xn)],
the boolean expression b can only be parameterised by attribute names
x1,...,xn from R. This is generalised to allow the xi to be path
expressions.

R[b(x1,...,xn)] is formalised as a restriction on R where for each
tuple t in R, b is evaluated as a boolean function parameterised on
the {t}.xi, which most generally are relations. If however, a {t}.xi
is a relation containing a single tuple defined on a single attribute,
then it can be treated as a scalar value within the boolean
expression.

For example,

P[SP.S.@count > 1].P#

retrieves the part numbers of the parts supplied by more than one
supplier.

P[SP.S# = {'S2'}].P#

retrieves part numbers for parts that are shipped only by supplier S2.

S[not SP.P[P#='P2'].@exists].SNAME

or equivalently,

S[not 'P2' in SP.P.P#].SNAME

retrieves names of suppliers that don't supply part P2.

Renaming
--------

Renaming of attributes is supported. For example

S.(CITY as SCITY, SP.P.CITY as PCITY)

retrieves all pairs of city names (SCITY, PCITY) such that a supplier
in city SCITY supplies a part to city PCITY

Calculated attributes
---------------------

calculated attributes are supported. For example,

P.(P#, WEIGHT*454 as W)[W > 10000]

retrieves part number P# and weight W in grams for each part with
weight > 10000 grams

Path expressions with global scope
----------------------------------

A path expression that begins with '..' is assumed to have a global
scope. For example:

S[STATUS < ..S.STATUS.@max].S#

retrieves supplier numbers for suppliers with status less than the
current maximum status.

As another example

S[SP.P# = ..P.P#].SNAME

retrieves supplier names for suppliers who ship all parts



Reply With Quote
  #14  
Old   
Tegiri Nenashi
 
Posts: n/a

Default Re: Relational query with path expressions - 04-06-2009 , 12:09 PM



On Apr 3, 4:31*am, David BL <davi... (AT) iinet (DOT) net.au> wrote:
Quote:
For example, given binary relation R(x,y), one could define a unary
function f that maps a set of x values to a set of y values as follows

* * for all X, f(X) = {y | exists x in X, R(x,y)}

It can be shown that f distributes over union and this "additive"
property on sets is preserved over function composition. *In fact it
is easy to show that composition of these unary functions is
equivalent to a join + projection on the associated binary relations.
My interpretation of this is the following. You took binary named
relation (Codd model operates named relations), and abstracted away
named perspective. Then, you suggest that the join between binary
relations is defined the same way as in algebra of binary relations.
This allows you to express graph queries.



Reply With Quote
  #15  
Old   
David BL
 
Posts: n/a

Default Re: Relational query with path expressions - 04-06-2009 , 08:37 PM



On Apr 6, 8:31 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
Quote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:edd73d6f-678e-4a1c-bd1f-ac59e06f9bb1 (AT) s12g2000prc (DOT) googlegroups.com...

I get the impression that a lot of the time, a query (or a large
portion of a query) can be thought of as a navigation from attribute
to attribute using a sequence of joins. This made me think there may
be some merit in thinking of attribute names as global in nature and
the user only has to specify the start and end attributes and the DBMS
is able to deduce the joins that are required. The potential upshot
would be that queries are simplified and the user is freed from the
burden of understanding the way the data has been decomposed into
separate relations. Just recently I read about the so called
Universal Relation which takes this approach. However there are a
number of criticisms - such as those described in a paper by Kent.

I think the biggest problem is that quite often there are multiple
useful paths.

I disagree with your line of reasoning because relation names have
significance. For example, suppose that amongst the myriad relations in the
manufacturing system for a synthetic rubber plant, two stand out: one that
specifies the recipies that /can be used/ to produce batches of rubber, and
another with exactly the same attributes that specifies the recipies that
/are being used/ to produce batches of rubber. Here it is not just the
juxtaposition of attributes that conveys information but also the names of
the relations.
I agree with you and my post took exactly that position by emphasising
relation names in the path expressions. Sorry if my post was
confusing - I was only stating the idea (of using attribute names to
specify "access paths") in order to explain why I was rejecting it!

Note in any case that your argument isn't fool proof because
proponents of the UR use very descriptive attribute names that in
effect distinguish the alternative relations. However Kent suggests
these long winded attribute names can get tiresome in practise.



Reply With Quote
  #16  
Old   
David BL
 
Posts: n/a

Default Re: Relational query with path expressions - 04-06-2009 , 11:06 PM



On Apr 7, 1:09 am, Tegiri Nenashi <TegiriNena... (AT) gmail (DOT) com> wrote:
Quote:
On Apr 3, 4:31 am, David BL <davi... (AT) iinet (DOT) net.au> wrote:

For example, given binary relation R(x,y), one could define a unary
function f that maps a set of x values to a set of y values as follows

for all X, f(X) = {y | exists x in X, R(x,y)}

It can be shown that f distributes over union and this "additive"
property on sets is preserved over function composition. In fact it
is easy to show that composition of these unary functions is
equivalent to a join + projection on the associated binary relations.

My interpretation of this is the following. You took binary named
relation (Codd model operates named relations), and abstracted away
named perspective. Then, you suggest that the join between binary
relations is defined the same way as in algebra of binary relations.
This allows you to express graph queries.
I'm afraid I didn't follow all of that. I'll explain what I was
stating in more detail...

Let R1(x,y) and R2(y,z) be two named binary relations on named
attributes.

Define unary function fxy that maps sets of x's to sets of y's using
R1 as follows:

for all X, fxy(X) = {y | exists x in X, R1(x,y)}

Similarly define fyz that maps sets of y's to sets of z's using R2

for all Y, fyz(Y) = {z | exists y in Y, R2(y,z)}


Let R3(x,z) be the join of R1,R2 on y, projected onto {x,z}. Let fxz
be derived from R3 as follows:

for all X, fxz(X) = {z | exists x in X, R3(x,z)}

Then it turns out that fxz = fyz o fxy

This justifies the use of these unary functions to express certain
queries, and a functional notation can be simple and intuitive.
However there are serious limitations with the approach - due to the
fact that

1) there is often more than one useful binary relation between a given
pair of attributes; and

2) relations aren't generally decomposable into binary relations in
the sense of a JD.


Given R(x,y), we can define 4 unary functions:

for all X, f1(X) = {y | exists x in X, R(x,y)}
for all Y, f2(Y) = {x | exists y in Y, R(x,y)}
for all X, f3(X) = {y | for all x in X, R(x,y)}
for all Y, f4(Y) = {x | for all y in Y, R(x,y)}

f1,f2 distribute over union (giving them an additive characteristic)
whereas f3,f4 distribute over intersection (giving them a subtractive
characteristic).

Each one of these preserves all the information in R:

R(x,y) = { (x,y) | y in f1({x}) }
= { (x,y) | x in f2({y}) }
= { (x,y) | y in f3({x}) }
= { (x,y) | x in f4({y}) }

It seems that most of the time, queries are concerned with the
existential form. However that is not always the case - such as Date's
example of finding a supplier that supplies /all/ the parts.

These unary functions could be used to answer queries like: Find any
cousin of every friend of every spouse of any ...

I find it curious that the simple "duality" between existential and
universally quantified forms for the unary functions don't seem to map
simply to two alternative join operators in the RA.

Let R1(x,y), R2(y,z) be two relations where y stands for all the
attributes in common. In the query language I defined,

R1.R2 = { (y,z) in R2 | exists (x,y) in R1 }

This can be seen as navigating from R1 into those tuples of R2 that
are related to /any/ of the tuples of R1 according to the shared
attributes.

Consider that we define a new operator * where R1*R2 navigates into
tuples of R2 that are related to /all/ tuples from R1 according to the
shared attributes.

R1*R2 = { (y,z) in R2 | (x,y') in R1 => (y',z) in R2 }

That would allow for the path expression

P*SP.S.S#

which finds supplier numbers of suppliers that ships /all/ the parts.

The following is interesting:

P[color='red']*SP.P#

This finds part numbers for any parts shipped by any supplier that
ships all the red parts. This may include parts that are not red.



Reply With Quote
  #17  
Old   
Tegiri Nenashi
 
Posts: n/a

Default Re: Relational query with path expressions - 04-07-2009 , 11:57 AM



On Apr 6, 9:06*pm, David BL <davi... (AT) iinet (DOT) net.au> wrote:
Quote:
I'm afraid I didn't follow all of that.
Let me translate into algebra of binary relations (aka relation
algebra) along your explanation.

Quote:
I'll explain what I was
stating in more detail...

Let R1(x,y) and R2(y,z) be two named binary relations on named
attributes.
Consider BR1 and BR2 as unnamed binary relations.

Quote:
Define unary function fxy that maps sets of x's to sets of y's using
R1 as follows:

* * for all X, fxy(X) = {y | exists x in X, R1(x,y)}

Similarly define fyz that maps sets of y's to sets of z's using R2

* * for all Y, fyz(Y) = {z | exists y in Y, R2(y,z)}

Let R3(x,z) be the join of R1,R2 on y, projected onto {x,z}. *Let fxz
be derived from R3 as follows:

* * for all X, fxz(X) = {z | exists x in X, R3(x,z)}

Then it turns out that fxz = fyz o fxy
In algebra of binary relations we get composition in one step:

BR1 ; BR2

The middle attribute (the second one in the
BR1 and the first one in BR2) is projected away automatically.

If BR1 and BR2 are function (that is a relation with functional
constraint), their composition is function as well.

Why bother with binary relation algebras? Because the assertions (like
composition distributing over union that you mentioned below) are
relation algebra laws.

Quote:
This justifies the use of these unary functions to express certain
queries, and a functional notation can be simple and intuitive.
However there are serious limitations with the approach - due to the
fact that

1) there is often more than one useful binary relation between a given
pair of attributes; and

2) relations aren't generally decomposable into binary relations in
the sense of a JD.

Given R(x,y), we can define 4 unary functions:

* * for all X, f1(X) = {y | exists x in X, R(x,y)}
* * for all Y, f2(Y) = {x | exists y in Y, R(x,y)}
* * for all X, f3(X) = {y | for all x in X, R(x,y)}
* * for all Y, f4(Y) = {x | for all y in Y, R(x,y)}

f1,f2 distribute over union (giving them an additive characteristic)
whereas f3,f4 distribute over intersection (giving them a subtractive
characteristic).
It seems that f3 and f4 correspond to residuation in relation algebra.
Formally, BR1/BR2 the left residual of R1 by R2 is the relation
consisting of all the pairs (x,y) such that for all z (y,z) in BR2
implies (x,z) in BR2.

Quote:
Each one of these preserves all the information in R:

* * R(x,y) = { (x,y) | *y in f1({x}) }
* * * * * *= { (x,y) | *x in f2({y}) }
* * * * * *= { (x,y) | *y in f3({x}) }
* * * * * *= { (x,y) | *x in f4({y}) }

It seems that most of the time, queries are concerned with the
existential form. However that is not always the case - such as Date's
example of finding a supplier that supplies /all/ the parts.

These unary functions could be used to answer queries like: *Find any
cousin of every friend of every spouse of any ...

I find it curious that the simple "duality" between existential and
universally quantified forms for the unary functions don't seem to map
simply to two alternative join operators in the RA.

Let R1(x,y), R2(y,z) be two relations where y stands for all the
attributes in common. *In the query language I defined,

* * R1.R2 = { (y,z) in R2 | exists (x,y) in R1 }

This can be seen as navigating from R1 into those tuples of R2 that
are related to /any/ of the tuples of R1 according to the shared
attributes.

Consider that we define a new operator * where R1*R2 navigates into
tuples of R2 that are related to /all/ tuples from R1 according to the
shared attributes.

* * R1*R2 = { (y,z) in R2 | (x,y') in R1 => (y',z) in R2 }
This is set containment join operator. It is also known as left
residuation operator (the definition above that I copied verbatim from
Vaughan Pratt paper "The origins of calculus of Binary relation").

Quote:
That would allow for the path expression

* * P*SP.S.S#

which finds supplier numbers of suppliers that ships /all/ the parts.

The following is interesting:

* * P[color='red']*SP.P#

This finds part numbers for any parts shipped by any supplier that
ships all the red parts. *This may include parts that are not red.


Reply With Quote
  #18  
Old   
rpost
 
Posts: n/a

Default Re: Relational query with path expressions - 04-07-2009 , 02:40 PM



David BL wrote:

Quote:
I get the impression that a lot of the time, a query (or a large
portion of a query) can be thought of as a navigation from attribute
to attribute using a sequence of joins. This made me think there may
be some merit in thinking of attribute names as global in nature and
the user only has to specify the start and end attributes and the DBMS
is able to deduce the joins that are required.
I like this idea, but it doesn't generalize to joins in general.

Quote:
I think the biggest problem is that quite often there are multiple
useful paths.
[...]

Yes, and global renaming isn't always possible:

PERSON(P#, NAME, DAYOFBIRTH, FATHER#, MOTHER#)

Here, every FATHER# and every MOTHER# is also a P#. Pretty much all
of our queries involving FATHER# and MOTHER# are going to join
them with P#. We can't rename to express those joins.

Splitting it out doesn't help:

PERSON(P#, NAME, DAYOFBIRTH, FATHER#, MOTHER#)
FATHER(FATHER#)
MOTHER(MOTHER#)

We can name FATHER's attribute FATHER# or P#, but not both.

(BTW replacing the # attributes with natural keys doesn't help either.)

Quote:
After thinking about that problem for a while, it occurred to me that
access paths should emphasise navigation from relation to relation,
rather than from attribute to attribute (as suggested by the hyper-
graphs in the literature on the Universal Relation).
Yes, but sometimes you're going to have to specify which attributes
to join on.

[...]

Quote:
The idea that R1.R2 can drop attributes from R1 may seem strange. A
benefit arises for example in path expressions like R1.R2.R3 because
it can avoid joining on attributes which happen to have the same name
in R1 and R3. This can help to control the joins along a path.
Nice idea, but not completely general:
it cannot express joins on FATHER#/MOTHER# above.

[...]

Quote:
Renaming
--------

Renaming of attributes is supported. For example

S.(CITY as SCITY, SP.P.CITY as PCITY)

retrieves all pairs of city names (SCITY, PCITY) such that a supplier
in city SCITY supplies a part to city PCITY
Now I'm happy :-)

I initially assumed that SQL worked like this and was
startled to discover that it doesn't.

--
Reinier


Reply With Quote
  #19  
Old   
David BL
 
Posts: n/a

Default Re: Relational query with path expressions - 04-07-2009 , 11:02 PM



On Apr 8, 3:40 am, rp... (AT) pcwin518 (DOT) campus.tue.nl (rpost) wrote:
Quote:
David BL wrote:
I get the impression that a lot of the time, a query (or a large
portion of a query) can be thought of as a navigation from attribute
to attribute using a sequence of joins. This made me think there may
be some merit in thinking of attribute names as global in nature and
the user only has to specify the start and end attributes and the DBMS
is able to deduce the joins that are required.

I like this idea, but it doesn't generalize to joins in general.

I think the biggest problem is that quite often there are multiple
useful paths.

[...]

Yes, and global renaming isn't always possible:

PERSON(P#, NAME, DAYOFBIRTH, FATHER#, MOTHER#)

Here, every FATHER# and every MOTHER# is also a P#. Pretty much all
of our queries involving FATHER# and MOTHER# are going to join
them with P#. We can't rename to express those joins.

Splitting it out doesn't help:

PERSON(P#, NAME, DAYOFBIRTH, FATHER#, MOTHER#)
FATHER(FATHER#)
MOTHER(MOTHER#)

We can name FATHER's attribute FATHER# or P#, but not both.

(BTW replacing the # attributes with natural keys doesn't help either.)

After thinking about that problem for a while, it occurred to me that
access paths should emphasise navigation from relation to relation,
rather than from attribute to attribute (as suggested by the hyper-
graphs in the literature on the Universal Relation).

Yes, but sometimes you're going to have to specify which attributes
to join on.

[...]

The idea that R1.R2 can drop attributes from R1 may seem strange. A
benefit arises for example in path expressions like R1.R2.R3 because
it can avoid joining on attributes which happen to have the same name
in R1 and R3. This can help to control the joins along a path.

Nice idea, but not completely general:
it cannot express joins on FATHER#/MOTHER# above.

[...]

Renaming
--------

Renaming of attributes is supported. For example

S.(CITY as SCITY, SP.P.CITY as PCITY)

retrieves all pairs of city names (SCITY, PCITY) such that a supplier
in city SCITY supplies a part to city PCITY

Now I'm happy :-)

I initially assumed that SQL worked like this and was
startled to discover that it doesn't.
Yes, I agree with all that (although I'm not sure what you meant
regarding SQL. It does have the AS keyword for renaming. What did
you mean by that comment?).

As an example, the names of fathers of people named 'Fred' could be
found using

PERSON[NAME='Fred'].(FATHER# as P#).PERSON.NAME

I think it can be useful to name reusable sub-sections of paths. For
example, let

FATHER = (FATHER# as P#).PERSON

allowing the fathers of people named Fred to be specified as

PERSON[NAME='Fred'].FATHER.NAME

FATHER can be regarded as a unary function that maps an input relation
to an output relation. R.FATHER represents the application of FATHER
on input relation R. In this case it maps a set of PERSON tuples to
another set of PERSON tuples (that are their fathers).

We could also define

MOTHER = (MOTHER# as P#).PERSON

If f,g are unary functions that map relations to relations, then
subject to typing rules we could define f union g as follows

for all R, R.(f union g) = R.f union R.g

Similarly we can define f intersect g, f minus g.

This allows us to define

PARENT = MOTHER union FATHER

CHILD = (P# as FATHER#).PERSON union (P# as MOTHER#).PERSON

SIBLING = PARENT.CHILD minus I

COUSIN = PARENT.SIBLING.CHILD

where 'I' is the identity function (that maps any relation to itself).

I think my comments elsewhere on "additive" versus "subtractive" unary
functions are applicable in this setting. I think function
composition preserves additivity and it also preserves subtractivity.
However, for example union of two functions preserves additivity but
not subtractivity whereas intersection of two functions preserves
subtractivity but not additivity. The difference operator preserves
neither. For example 'SIBLING' is not additive (even though
PARENT.CHILD and the identity are additive), meaning that it doesn't
distribute over union. That suggests it must be interpreted carefully
when used in a path. Eg if you ask for the siblings of all the
children of a given parent it will return the empty set!

Let C be a literal relation value. Then one can define f intersect C
as follows:

for all R, R.(f intersect C) = R.f intersect C

Similarly one can define

f union C
C minus f
f minus C

If f is additive then I think the following are necessarily additive

f intersect C
f minus C

but not necessarily

f union C
C minus f.

Another interesting idea is whether it's possible to define operators
that map between the forms f1,f2,f3,f4 that I described in another
post. More to the point I'm thinking there is a concept of additive
and subtractive closures as well as additive and subtractive inverses
of any given unary function defined on relations.

Let f be a given unary function on relations. Let x stand for all the
attributes in the input relations to f, and y stand for all the
attributes in the output relations from f. Consider that we ignore
the behaviour of f on input relations that don't contain a single
tuple. In fact we define a relation

G(x,y) = { (x,y) | exists y in {x}.f }

which is kind of like the graph of a function. From G, we can define
four unary functions that map relations to relations:

f1(X)={y| exists x in X, G(x,y)} (additive closure)
f2(Y)={x| exists y in Y, G(x,y)} (additive inverse)
f3(X)={y| for all x in X, G(x,y)} (subtractive closure)
f4(Y)={x| for all y in Y, G(x,y)} (subtractive inverse)

For example, the additive closure of SIBLING is the same as
PARENT.CHILD. The additive inverse of PARENT is CHILD and vice
versa.

Just in case I didn't spell it out clearly enough, please note that
additive/subtractive closures/inverses are defined on unary functions
on relations irrespective of the number of dimensions of the input and
output relations.



Reply With Quote
  #20  
Old   
David BL
 
Posts: n/a

Default Re: Relational query with path expressions - 04-08-2009 , 12:40 AM



On Apr 8, 12:02 pm, David BL <davi... (AT) iinet (DOT) net.au> wrote:
Quote:
G(x,y) = { (x,y) | exists y in {x}.f }

Oops - It doesn't make sense for y to be bound in that sub-expression,
so this should have been

G(x,y) = { (x,y) | y in {x}.f }

Note BTW that this notation is quite sloppy because within the set-
builder notation x,y are bound to tuples rather than representing
names of attributes from relations input to f or output from f
respectively.



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.