dbTalk Databases Forums  

Relational query with path expressions

comp.databases.theory comp.databases.theory


Discuss Relational query with path expressions in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
David BL
 
Posts: n/a

Default Relational query with path expressions - 04-02-2009 , 10:58 PM






I get the impression that a lot of the time, a query (or a large
portion of a query) can be thought of as a navigation from attribute
to attribute using a sequence of joins. This made me think there may
be some merit in thinking of attribute names as global in nature and
the user only has to specify the start and end attributes and the DBMS
is able to deduce the joins that are required. The potential upshot
would be that queries are simplified and the user is freed from the
burden of understanding the way the data has been decomposed into
separate relations. Just recently I read about the so called
Universal Relation which takes this approach. However there are a
number of criticisms - such as those described in a paper by Kent.

I think the biggest problem is that quite often there are multiple
useful paths. For example, consider Chris Date's supplier example
from his book Introduction to Database Systems. The relations are

S(S#, SNAME, STATUS, CITY)
Supplier number S# has name SNAME, status STATUS
and is located in CITY

P(P#, PNAME, COLOR, WEIGHT, CITY)
Part number P# has name PNAME, colour COLOR,
weight WEIGHT in pounds and is located in CITY

SP(S#, P#, QTY)
S# shipped QTY of P#

There are two obvious ways to navigate from S# to P#. One can either
go via the shipments relation SP (where S# and P# appear in the same
relation), or one can join relations S and P on S.CITY = P.CITY.

Presumably to define the Universal Relation one would be forced to
pick one of these paths as the default, and rename attributes
accordingly. E.g. we could arbitrarily assume the default
relationship between suppliers and parts is through the shipments, and
rename CITY as SCITY in S, and CITY as PCITY in P, so they have
distinct roles in the Universal Relation.

It seems that explicit joins are a better way to give the user precise
control over the access paths. However I find it interesting that a 3-
way natural join on S,P and SP imposes both of the above navigation
paths at the same time, and that's rarely what's required.

After thinking about that problem for a while, it occurred to me that
access paths should emphasise navigation from relation to relation,
rather than from attribute to attribute (as suggested by the hyper-
graphs in the literature on the Universal Relation).

The following describes some ideas for a query language on a
relational database based on path expressions that allow relation
names to be specified in the path. It is more closely related to the
relational algebra than the calculus. It's assumed there are no NULLs
and queries work on sets not bags.

Relvars
-------

The name of a base relvar evaluates to its current relation value.
For example, the path expression

P

evaluates to the current value of the relvar named P.

Restriction
-----------

A restriction can be taken using a postfix operator which is a boolean
expression enclosed in square brackets and parameterised on the
attributes of the relation. For example, the path expression

P[COLOR = 'red']

evaluates to the restriction on P where parts are red.

For convenience it's assumed it's not necessary to specify a selector
as in COLOR('red').

Projection
----------

The following path expression subsequently takes the projection onto
attributes P# and WEIGHT:

P[COLOR = 'red'].(P#,WEIGHT)

Note that since path expressions evaluate to sets, we know that
duplicates are removed from the projection.

When projecting onto a single attribute the brackets aren't required.
For example,

S.S#

is the projection of relation S onto attribute S#.

Join with projection
--------------------

Let R1 and R2 be relations, then the path expression R1.R2 denotes the
natural join of R1 and R2 followed by a projection onto the attributes
of R2. For example,

P[COLOR = 'red'].SP

retrieves the tuples in SP associated with the shipment of a red part.

The idea that R1.R2 can drop attributes from R1 may seem strange. A
benefit arises for example in path expressions like R1.R2.R3 because
it can avoid joining on attributes which happen to have the same name
in R1 and R3. This can help to control the joins along a path. For
example,

P[COLOR = 'red'].SP.S.SNAME

retrieves the names of the suppliers who ship at least one red part,
whereas

P[COLOR = 'red'].S.SNAME

returns the names of suppliers that are located in a city where there
are red parts.

This is exactly the distinction discussed in the introduction.

A brief comparison to other query languages
-------------------------------------------

From Date's book, the following SQL retrieves supplier names for
suppliers who supply at least one red part:

SELECT DISTINCT S.SNAME
FROM S
WHERE S.S# IN
( SELECT SP.S#
FROM SP
WHERE SP.P# IN
( SELECT P.P#
FROM P
WHERE P.COLOR = 'red'));

The following expression in the relational algebra is somewhat more
concise:

( ( ( P WHERE COLOR = 'red' )
JOIN SP ) { S# } JOIN S ) { SNAME }

Note that the above needs to project the result of the join of P and
SP onto { S# } before joining with S in order to avoid the join on
P.CITY = S.CITY.

For those interested in the comparison, Date also provides the same
query in the relational calculus and the domain calculus.

Evidently the path expression

P[COLOR = 'red'].SP.S.SNAME

is simpler than the above alternatives.

Join without projection
-----------------------

Given relations R1,...,Rn, (R1,...,Rn) denotes the natural join of
R1,...,Rn. For example,

(S,P)

gets all combinations of supplier and part information such that they
are located in the same city.

Aggregate operators
-------------------

Let R be a relation and every attribute of R is defined on a domain
that supports summation. Then the path expression R.@sum denotes a
relation with the same attribute names as R that contains a single
tuple which maps each attribute name to the sum of that attribute over
tuples from R. If R is empty then R.@sum contains a single tuple full
of zeros. For example,

SP[P# = 'P2'].QTY.@sum

retrieves the total quantity of shipments for part P2.

Similarly we define operators @min, @max, @avg, except there is no
tuple in the output if the input relation is empty (i.e. has no
tuples).

R.@count evaluates to a relation with a single tuple with a single
attribute which gives the number of tuples in R. For example,

S.@count

retrieves the total number of suppliers.

R.@exists evaluates to a relation with a single tuple with a single
boolean attribute that is true iff R is non-empty.

Generalisation of projection
----------------------------

It was stated earlier that R.(x1,...,xn) denotes a projection of R
onto attributes named x1,...,xn. This can be regarded as a special
case of something much more general - where we allow path expressions
instead of just attribute names for each xi.

Formally the result is defined as a union of joins where the union is
taken over the tuples t of R, and the joins are taken over the
relations {t}.xi for each i where {t} denotes the relation consisting
of the single tuple t. Typically the {t}.xi have no attributes in
common so the joins are cartesian products.

For example,

P.(COLOR, SP)

evaluates to the relation which is the natural join of P and SP,
projected onto the attributes COLOR,S#,P#,QTY.

To help understand this it may be helpful to pretend that the dot
operator distributes over the comma separated list. i.e. we think of
P.(COLOR, SP) as (P.COLOR, P.SP). However we don't simply take the
cartesian product P.COLOR x P.SP. Instead we account for the shared
prefix P in the paths which "correlate" these two path expressions.
In other words, for each tuple from P that we use to navigate to its
COLOR, we note that the same tuple is involved in the navigation from
P to SP by joining on P.P# = SP.P#.

It is convenient to support a syntax that adds extra attributes to an
existing relation. The path expression R.(*,x1,...,xn) is equivalent
to R.(y1,...,ym,x1,...,xn) where y1,...,ym are the attributes of R.

The aggregate operators can be used to aggregating over groups. For
example,

P.(P#, SP.QTY.@sum)

returns a relation where for each part, it returns the part number and
the total shipment quantity (which could be zero if there are no
shipments of that part).

Note that

P.(P#, SP.QTY.@min)

retrieves the minimum quantity of shipments for each part number, but
only for parts where there exists at least one shipment by some
supplier.

We can group over more than one attribute. For example,

P.(PNAME,CITY).(*,P.WEIGHT.@sum)

retrieves the sum of the weights, grouped over PNAME and CITY.

As a more complicated example,

P.(P#, SP.(QTY,S.STATUS).(@min,@max,@avg))

retrieves for each part, the part number and the minimum, maximum and
average shipment QTY, and the minimum, maximum and average supplier
status. However for this to make sense we would need some convention
to assign unique attribute names in the output.

Generalisation of restriction
-----------------------------

So far it has have assumed that for the restriction R[b(x1,...,xn)],
the boolean expression b can only be parameterised by attribute names
x1,...,xn from R. This is generalised to allow the xi to be path
expressions.

R[b(x1,...,xn)] is formalised as a restriction on R where for each
tuple t in R, b is evaluated as a boolean function parameterised on
the {t}.xi, which most generally are relations. If however, a {t}.xi
is a relation containing a single tuple defined on a single attribute,
then it can be treated as a scalar value within the boolean
expression.

For example,

P[SP.S.@count > 1].P#

retrieves the part numbers of the parts supplied by more than one
supplier.

P[SP.S# = {'S2'}].P#

retrieves part numbers for parts that are shipped only by supplier S2.

S[not SP.P[P#='P2'].@exists].SNAME

or equivalently,

S[not 'P2' in SP.P.P#].SNAME

retrieves names of suppliers that don't supply part P2.

Renaming
--------

Renaming of attributes is supported. For example

S.(CITY as SCITY, SP.P.CITY as PCITY)

retrieves all pairs of city names (SCITY, PCITY) such that a supplier
in city SCITY supplies a part to city PCITY

Calculated attributes
---------------------

calculated attributes are supported. For example,

P.(P#, WEIGHT*454 as W)[W > 10000]

retrieves part number P# and weight W in grams for each part with
weight > 10000 grams

Path expressions with global scope
----------------------------------

A path expression that begins with '..' is assumed to have a global
scope. For example:

S[STATUS < ..S.STATUS.@max].S#

retrieves supplier numbers for suppliers with status less than the
current maximum status.

As another example

S[SP.P# = ..P.P#].SNAME

retrieves supplier names for suppliers who ship all parts

Reply With Quote
  #2  
Old   
Tegiri Nenashi
 
Posts: n/a

Default Re: Relational query with path expressions - 04-02-2009 , 11:46 PM






Granted, if you have a choice between join and generalized projection
you can infer the type of binary operator from the type of operand. In
the expression

x.y

if both are relations, you interpret it as join, and if one is a set
of attributes, than its a projection. That is some economy of
notation, although union, difference, and renaming still have to be
explicit?


Reply With Quote
  #3  
Old   
David BL
 
Posts: n/a

Default Re: Relational query with path expressions - 04-03-2009 , 12:41 AM



On Apr 3, 12:46 pm, Tegiri Nenashi <TegiriNena... (AT) gmail (DOT) com> wrote:
Quote:
Granted, if you have a choice between join and generalized projection
you can infer the type of binary operator from the type of operand. In
the expression

x.y

if both are relations, you interpret it as join, and if one is a set
of attributes, than its a projection. That is some economy of
notation, although union, difference, and renaming still have to be
explicit?
Yes union, intersection and difference must be explicit. For example,

P[WEIGHT > 16].P# union SP[S# = 'S2'].P#

to retrieve the part numbers for parts that either weigh more than 16
pounds or are supplied by supplier S2 or both.

Of course quite often these set theoretic operators can be mapped to
boolean operators inside a restriction. E.g.,

P[WEIGHT > 16 or 'S2' in SP.S.S#].P#

I think equivalence of these depends on a FK constraint from SP.P# to
P.P#.

I was thinking rename could be made more convenient by embellishing
the syntactic sugar that the asterisk already provides. Normally the
asterisk expands into all the attribute names from the input relation.

For example,

P( *(-COLOR, WEIGHT as W), SP.S.SNAME )

is equivalent to

P( P#, PNAME, WEIGHT as W, CITY, SP.S.SNAME )




Reply With Quote
  #4  
Old   
cimode@hotmail.com
 
Posts: n/a

Default Re: Relational query with path expressions - 04-03-2009 , 04:32 AM



On 3 avr, 05:58, David BL <davi... (AT) iinet (DOT) net.au> wrote:
<<I get the impression that a lot of the time, a query (or a large
portion of a query) can be thought of as a navigation from attribute
to attribute using a sequence of joins. This made me think there may
be some merit in thinking of attribute names as global in nature and
the user only has to specify the start and end attributes and the DBMS
is able to deduce the joins that are required>>
Sorry to be so picky but the qualification of an *operation* as a
*navigation* has undertones that draw vagueness.

Your intuition however allows to rediscover the importance of
definition of specialization by constraint from domains as well
derived relations.

Regards...

Reply With Quote
  #5  
Old   
cimode@hotmail.com
 
Posts: n/a

Default Re: Relational query with path expressions - 04-03-2009 , 04:34 AM



<<the user only has to specify the start and end attributes and the
DBMS is able to deduce the joins that are required>>
There is no such thing as a *start and end* attribute in a relation.


Reply With Quote
  #6  
Old   
David BL
 
Posts: n/a

Default Re: Relational query with path expressions - 04-03-2009 , 06:31 AM



On Apr 3, 5:34 pm, cim... (AT) hotmail (DOT) com wrote:
Quote:
the user only has to specify the start and end attributes and the
DBMS is able to deduce the joins that are required

There is no such thing as a *start and end* attribute in a relation.
I meant that the user specifies a "start attribute", and an "end
attribute" and supposing there is a Universal Relation there would be
a projection on those two attributes which is a binary relation
connecting them. This means it's possible to define a unary function
that maps sets of the "start attribute" to sets of the "end
attribute".

For example, given binary relation R(x,y), one could define a unary
function f that maps a set of x values to a set of y values as follows

for all X, f(X) = {y | exists x in X, R(x,y)}

It can be shown that f distributes over union and this "additive"
property on sets is preserved over function composition. In fact it
is easy to show that composition of these unary functions is
equivalent to a join + projection on the associated binary relations.

Of course, n-ary relations aren't generally decomposable to binary
relations (in the sense of a JD) so this simplistic idea of composing
unary functions cannot express many useful queries. In other words,
path expressions that only try to navigate from attribute to attribute
would be quite limiting.

That being said there are cases where the approach could be quite
useful. For example, let x.f denote the application of unary function
f to x, and let there be a many to many binary relation between parent
and child, and we define corresponding parent and child unary
functions.

We can use these functions to construct queries. For example, for any
X we define:

X.grandparent = X.parent.parent

or just

grandparent = parent.parent

where the dot now denotes function composition rather than function
application. The definition of the grandparent relation would
conventionally require a self-join on the binary parent-child
relation.

As another example, we could define a sibling function as follows

X.sibling = X.parent.child \ X

where we have used a set difference operator to ensure a person is
never regarded as his or her own sibling.

The sibling function helps us define the cousin function:

X.cousin = X.parent.sibling.child

etc




Reply With Quote
  #7  
Old   
paul c
 
Posts: n/a

Default Re: Relational query with path expressions - 04-03-2009 , 10:05 PM



David BL wrote:
....

at least one nuance of TCLOSE here without mention of recursive
structures to ease the practical problems. more mixing up of physical
and logical, not to mention the 'choices' presented as alternatives,
when one of them Codd called the 'connection trap' about forty years
ago.


just a suggestion, start a phony family tree at ancestry.com, try to
put in a source for a marriage between two people and then check if
ancestry's pathetic interpretation of the hierarchical gedcom
so-called standard assigns the source to both spouses.


also consider the rather logical definition of cousin on wikipedia and
ask whether the offered syntax can ever possibly deal with zeroth
cousins (siblings) or second cousins, not to mention cousins several
times removes.


you have every right to express various motives here, but they look to
me, too scatterbrained for most non-mystics to comprehend.

Reply With Quote
  #8  
Old   
David BL
 
Posts: n/a

Default Re: Relational query with path expressions - 04-03-2009 , 11:32 PM



On Apr 4, 11:05 am, paul c <toledobythe... (AT) oohay (DOT) ac> wrote:
Quote:
David BL wrote:

...

at least one nuance of TCLOSE here without mention of recursive
structures to ease the practical problems. more mixing up of physical
and logical, not to mention the 'choices' presented as alternatives,
when one of them Codd called the 'connection trap' about forty years
ago.
Where was there a nuance of TCLOSE?

Mixing of physical and logical? You think the path expressions imply
some mapping to a physical implementation? Actually they are nothing
more than a convenient way to express queries at a logical level for a
given a relational schema.

The proof is in the pudding as they say. Whether you dislike the
intuition behind the path expressions or not, you have to admit they
are remarkably concise for every example from Date's Intro book. Care
to comment on that?

With regards to the 'connection trap', I already commented that n-ary
relations are not generally decomposable into binary relations in the
sense of a join dependency. You appear to have missed the point
(where I rejected the idea of "navigating" from attribute to attribute
along an "access path").

Quote:
just a suggestion, start a phony family tree at ancestry.com, try to
put in a source for a marriage between two people and then check if
ancestry's pathetic interpretation of the hierarchical gedcom
so-called standard assigns the source to both spouses.

also consider the rather logical definition of cousin on wikipedia and
ask whether the offered syntax can ever possibly deal with zeroth
cousins (siblings) or second cousins, not to mention cousins several
times removes.

you have every right to express various motives here, but they look to
me, too scatterbrained for most non-mystics to comprehend.
Evidently you don't realise I was regarding the unary functions
derived from binary relations as merely a curiosity, and not to be
taken too seriously.

For someone who thinks of himself as a non-mystic, you seem pre-
occupied with intuition rather than the mathematical content of my
original post, which defined a query language. If you think it's
problematic then please explain why without your hand waving.



Reply With Quote
  #9  
Old   
paul c
 
Posts: n/a

Default Re: Relational query with path expressions - 04-04-2009 , 04:22 PM



David BL wrote:
....
Quote:
For someone who thinks of himself as a non-mystic, you seem pre-
occupied with intuition rather than the mathematical content of my
original post, which defined a query language. If you think it's
problematic then please explain why without your hand waving.

Never said I wasn't a mystic, most people have some of that quality
and don't admit it, but when machines are involved, it must be kept
under control.


Looks like all you are talking about is syntax.


Reply With Quote
  #10  
Old   
David BL
 
Posts: n/a

Default Re: Relational query with path expressions - 04-05-2009 , 08:31 PM



On Apr 5, 5:22 am, paul c <toledobythe... (AT) oohay (DOT) ac> wrote:
Quote:
David BL wrote:

...

For someone who thinks of himself as a non-mystic, you seem pre-
occupied with intuition rather than the mathematical content of my
original post, which defined a query language. If you think it's
problematic then please explain why without your hand waving.

Never said I wasn't a mystic, most people have some of that quality
and don't admit it, but when machines are involved, it must be kept
under control.

Looks like all you are talking about is syntax.
Yes.

Sorry for being a little hostile ;-)



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.