RM formalism supporting partial information

comp.databases.theory

#81
Jan Hidders
Re: RM formalism supporting partial information - 11-28-2007, 09:23 AM

On 28 nov, 15:05, "David Cressey" <cresse... (AT) verizon (DOT) net> wrote:
Quote:
"paul c" <toledobythe... (AT) ooyah (DOT) ac> wrote in message

news:e_d3j.62082$cD.25240 (AT) pd7urf2no (DOT) ..



Jan Hidders wrote:
On 28 nov, 01:58, David BL <davi... (AT) iinet (DOT) net.au> wrote:
...
Consider a query to find all the 27 year old pilots from a census
recorded in an RDB. If the age or occupation is missing we could
think of the person as a possible answer.

I believe there is a terminology problem here concerning the terms
"possible answers" and "certain answers". In the context of research
on incomplete databases (i.e. anywhere the classical CWA does not
apply fully) that usually means the following. Given a query and the
assumptions about "closedness", the set of all tuples with the right
header can be partitioned into three groups: the certain answers
(those that are certain to be in the result of the query on the
omniscient database), the possible answers (those that might be in the
aforementioned result) and the impossible answers (those that are
certain not to be in the aforementioned result).

In that sense the tuple describing the person you mentioned above
(presuming it is projected on the non-null fields) is a certain
answer, not a possible answer.
...

When it comes to a public census I believe the possible answers or
non-answers are planned for. As Bob B pointed out a "Don't Know"
response or even a refusal is often considered a specific answer, ie.,
some number of those is expected. Interesting that even statisticians
who are more interested in probability than db theory do this. Seems
quite different from the usual null examples. (I'm not touting census
methods in general - I've seen outrageous cheating by census-takers,
making up answers or even non-existent people in order to meet quota
maximums for DK's/NA's.)

A database derived from census data might be about two different subject
matters:

The first is the responses to the census questions.
The second is the demographics the census purports to pin down.

If it's the first, a "Don't Know" is a specific answer, and should be
recorded as such.
If it's the second, a "Don't Know" probably means that the database
doesn't know.
Exactly. And which interpretation applies determines which CWA should
be assumed.

Quote:
BTW, in the case of the query about 27 year old pilots, a person with a
missing age and a person with a missing occupation are clear enough. But
what about a person who is missing from the database altogether? Is that
not a possible answer?
Yes, it can be.

-- Jan Hidders


#82
David BL
Re: RM formalism supporting partial information - 11-28-2007, 08:54 PM

On Nov 29, 12:15 am, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:
Quote:
On 28 nov, 01:58, David BL <davi... (AT) iinet (DOT) net.au> wrote:





On Nov 27, 9:43 pm, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:

On 26 nov, 15:06, David BL <davi... (AT) iinet (DOT) net.au> wrote:
On Nov 26, 7:47 pm, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:
On 26 nov, 08:52, David BL <davi... (AT) iinet (DOT) net.au> wrote:

Firstly a minor nit pick: you can't say "possible answers", because
they don't actually represent an upper bound on the result in the
omniscient database.

?? They do so by definition.

What I meant was that unless CWA is available on an appropriate
projection there may be so much missing information (eg all
information about an entity) that the query purported to return the
"possible answers" does no such thing. ie it suffers a similar
problem to negation (it returns neither the certain nor the possible
answers).

I'm not sure what you mean by "the query purported to return the
'possible answers'". If the user formulates a query then this will now
include an indication of whether he or she wants the possible/certain
answers. It is up to the DBMS to efficiently compute the answer, and
this is not necessarily done by the usual translation of calculus to
algebra or even one very similar to it.

Consider a query to find all the 27 year old pilots from a census
recorded in an RDB. If the age or occupation is missing we could
think of the person as a possible answer. However we cannot say the
query returns all possible answers unless we assume every person took
part in the census.

Ok. Forget my other reply, for some reason I had missed something very
simple. Whether the suggested computation gives you all possible
answers or not depends on the query that is being asked. If it
concerned only the persons that took part in the census and you are
assuming the CWA for the value-unknown interpretation, then it does.
If you really meant all persons, then it doesn't, and you need another
computation if you want that answer.
The concept of "possible answers" isn't universally applicable, and
therefore seems to represent quite a problem for any model of partial
information that emphasises that concept as fundamental.

Did you read my response to Brian regarding the approach to absorb the
CWA/OWA distinction into the intensional definitions?

What do you think of the suggestion that the formalism (which is
concerned with extensions rather than intensions)

1) ignores the CWA/OWA distinction;

2) assumes the CWA applies everywhere; and

3) null is *always* interpreted as non-existence w.r.t.
the (carefully worded) intensional definitions?

This approach seems simple and self consistent. In fact I think it
gives the best of both worlds - it becomes

a) Zaniolo's interpretation of no-information when the
OWA applies in the intensional definition; or else

b) (actual) non-existence when the CWA applies in the
intensional definition.

It doesn't however, attempt to model the case of "value exists but is
unknown". IMO that case should be modeled *explicitly* with a
different predicate. More specifically I would say a distinct base
relvar is required.

As an example suppose we have the predicate owns_car(Person,Car), and
we want to be able to record persons that are known to own some
unknown car. That implies that the intensional definition of
owns_car has an OWA. Therefore we introduce a distinct base relvar
owns_some_car(Person) that must satisfy the following integrity
constraint

owns_car(Person,Car) => owns_some_car(Person)

However the converse is false because of the OWA nature of
owns_car(Person,Car). As a result owns_some_car(Person) is not equal
to a projection on owns_car(Person,Car).

It is interesting to note that the intensional definition of
owns_some_car can itself involve an OWA.
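
The owns_car / owns_some_car constraint above can be sketched in a few lines of Python. This is only an illustration: the tuples, the person names and the helper functions are all made up, and relvars are modelled as plain sets of tuples.

```python
# Hypothetical extensions of the two base relvars, modelled as sets of tuples.
owns_car = {("alice", "car1"), ("bob", "car2")}
# carol is known to own some car, but we do not know which one.
owns_some_car = {("alice",), ("bob",), ("carol",)}


def satisfies_constraint(owns_car, owns_some_car):
    """Check owns_car(Person, Car) => owns_some_car(Person)."""
    return all((person,) in owns_some_car for person, _car in owns_car)


def project_person(owns_car):
    """Projection of owns_car on the Person attribute."""
    return {(person,) for person, _car in owns_car}


print(satisfies_constraint(owns_car, owns_some_car))
print(owns_some_car - project_person(owns_car))
```

With this data the constraint holds, yet owns_some_car is strictly larger than the projection of owns_car on Person, which is the point about the converse being false.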




#83
Jan Hidders
Re: RM formalism supporting partial information - 11-30-2007, 09:48 AM

On 29 nov, 03:54, David BL <davi... (AT) iinet (DOT) net.au> wrote:
Quote:
The concept of "possible answers" isn't universally applicable, and
therefore seems to represent quite a problem for any model of partial
information that emphasises that concept as fundamental.
The concept of 'possible answers' applies and is well defined for all
databases where you have precisely defined what it means if certain
data is missing, and note that this includes the definition that says
that it means nothing. So what you mean by "isn't universally
applicable" is completely beyond my comprehension.

Quote:
Did you read my response to Brian regarding the approach to absorb the
CWA/OWA distinction into the intensional definitions?
Yes, I did. Typical case of "let's redefine our terminology to make
the problem go away". :-) It won't do.

Quote:
What do you think of the suggestion that the formalism (which is
concerned with extensions rather than intensions)

1) ignores the CWA/OWA distinction;

2) assumes the CWA applies everywhere; and

3) null is *always* interpreted as non-existence w.r.t.
the (carefully worded) intensional definitions?

This approach seems simple and self consistent.
If I ignore for the moment 1) (because 1) and 2) seem contradictory
because I cannot assume there is no difference between X and Y and at
the same time assume that only Y applies everywhere) this is just the
classical value-does-not-apply interpretation.

Quote:
It doesn't however, attempt to model the case of "value exists but is
unknown". IMO that case should be modeled *explicitly* with a
different predicate.
Sure, the value-does-not-apply interpretation can always also be
represented without null values.

The thing is that you have now fully ignored the real problem of
incomplete information which is that in practice the CWA does not
always fully apply. Your main solution seems to be to redefine the
meaning of the relations such that it does, which, of course, doesn't
solve anything at all and simply puts the problem back on the plate of
the user.

-- Jan Hidders


#84
David BL
Re: RM formalism supporting partial information - 11-30-2007, 09:17 PM

On Dec 1, 12:48 am, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:
Quote:
On 29 nov, 03:54, David BL <davi... (AT) iinet (DOT) net.au> wrote:





The concept of "possible answers" isn't universally applicable, and
therefore seems to represent quite a problem for any model of partial
information that emphasises that concept as fundamental.

The concept of 'possible answers' applies and is well defined for all
databases where you have precisely defined what it means if certain
data is missing, and note that this includes the definition that says
that it means nothing. So what you mean by "isn't universally
applicable" is completely beyond my comprehension.
Consider the following predicates, all with OWA intensional
definitions

age(Person,Age)
occupation(Person, Occupation)
married(Person,Person)
died(Person,Date)

You say the concept of possible answers is well defined. How exactly
would you calculate the possible 27 year old pilots? What does it
mean precisely?

Quote:
Did you read my response to Brian regarding the approach to absorb the
CWA/OWA distinction into the intensional definitions?

Yes, I did. Typical case of "let's redefine our terminology to make
the problem go away". :-) It won't do.

What do you think of the suggestion that the formalism (which is
concerned with extensions rather than intensions)

1) ignores the CWA/OWA distinction;

2) assumes the CWA applies everywhere; and

3) null is *always* interpreted as non-existence w.r.t.
the (carefully worded) intensional definitions?

This approach seems simple and self consistent.

If I ignore for the moment 1) (because 1) and 2) seem contradictory
because I cannot assume there is no difference between X and Y and at
the same time assume that only Y applies everywhere) this is just the
classical value-does-not-apply interpretation.
I meant that the actual CWA/OWA distinction is absorbed into the
intensional definition, so that it can be assumed that with respect to
the intensional definition the formalism assumes a CWA. I thought
that was clear.

Quote:
It doesn't however, attempt to model the case of "value exists but is
unknown". IMO that case should be modeled *explicitly* with a
different predicate.

Sure, the value-does-not-apply interpretation can always also be
represented without null values.

The thing is that you have now fully ignored the real problem of
incomplete information which is that in practice the CWA does not
always fully apply. Your main solution seems to be to redefine the
meaning of the relations such that it does, which, of course, doesn't
solve anything at all and simply puts the problem back on the plate of
the user.
You say "of course doesn't solve anything at all" without giving any
hint at all why you say that. Can you elaborate?

What problem doesn't it address? Can you provide a specific example?



#85
Jan Hidders
Re: RM formalism supporting partial information - 12-02-2007, 06:22 AM

On 1 dec, 04:17, David BL <davi... (AT) iinet (DOT) net.au> wrote:
Quote:
On Dec 1, 12:48 am, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:



On 29 nov, 03:54, David BL <davi... (AT) iinet (DOT) net.au> wrote:

The concept of "possible answers" isn't universally applicable, and
therefore seems to represent quite a problem for any model of partial
information that emphasises that concept as fundamental.

The concept of 'possible answers' applies and is well defined for all
databases where you have precisely defined what it means if certain
data is missing, and note that this includes the definition that says
that it means nothing. So what you mean by "isn't universally
applicable" is completely beyond my comprehension.

Consider the following predicates, all with OWA intensional
definitions

age(Person,Age)
occupation(Person, Occupation)
married(Person,Person)
died(Person,Date)

You say the concept of possible answers is well defined. How exactly
would you calculate the possible 27 year old pilots?
You seem to assume that "well defined" and "can be computed" are the
same, which they aren't. But to answer your question, assuming that
everybody has only one occupation that would be every person p for
which there is no tuple (p, a) with a<>27 in relation age, and no
tuple (p, o) with o<>"pilot" in relation occupation. If the domain of
Person is not finite, or restricted by a relation person(Person) then
the result may be infinite.
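
A minimal Python sketch of that definition, assuming a finite Person domain and at most one age and one occupation per person. The sample data and function names are hypothetical, and the "certain" function reflects one common reading (recorded tuples are true), not something stated in the thread.

```python
# Hypothetical finite domain for Person; without one the set of possible
# answers may be infinite, as noted above.
persons = {"ann", "bob", "cy", "dee"}

# Open-world relations: a missing tuple means "no information".
age = {("ann", 27), ("bob", 31)}
occupation = {("ann", "pilot"), ("cy", "pilot"), ("dee", "nurse")}


def possible_27_year_old_pilots(persons, age, occupation):
    """Persons for whom no tuple rules out age 27 or occupation pilot."""
    ruled_out_by_age = {p for p, a in age if a != 27}
    ruled_out_by_occupation = {p for p, o in occupation if o != "pilot"}
    return persons - ruled_out_by_age - ruled_out_by_occupation


def certain_27_year_old_pilots(age, occupation):
    """Persons recorded both as 27 years old and as pilot."""
    recorded_27 = {p for p, a in age if a == 27}
    recorded_pilot = {p for p, o in occupation if o == "pilot"}
    return recorded_27 & recorded_pilot


print(possible_27_year_old_pilots(persons, age, occupation))
print(certain_27_year_old_pilots(age, occupation))
```

Here cy has no recorded age, so cy is a possible answer without being a certain one; bob and dee are ruled out by a recorded tuple.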

Quote:
What does it mean precisely?
It contains every person that might be a 27 year old pilot as far as
the given database is concerned.

Quote:
What do you think of the suggestion that the formalism (which is
concerned with extensions rather than intensions)

1) ignores the CWA/OWA distinction;

2) assumes the CWA applies everywhere; and

3) null is *always* interpreted as non-existence w.r.t.
the (carefully worded) intensional definitions?

This approach seems simple and self consistent.

If I ignore for the moment 1) (because 1) and 2) seem contradictory
because I cannot assume there is no difference between X and Y and at
the same time assume that only Y applies everywhere) this is just the
classical value-does-not-apply interpretation.

I meant that the actual CWA/OWA distinction is absorbed into the
intensional definition, so that it can be assumed that with respect to
the intensional definition the formalism assumes a CWA. I thought
that was clear.
It was.

Quote:
It doesn't however, attempt to model the case of "value exists but is
unknown". IMO that case should be modeled *explicitly* with a
different predicate.

Sure, the value-does-not-apply interpretation can always also be
represented without null values.

The thing is that you have now fully ignored the real problem of
incomplete information which is that in practice the CWA does not
always fully apply. Your main solution seems to be to redefine the
meaning of the relations such that it does, which, of course, doesn't
solve anything at all and simply puts the problem back on the plate of
the user.

You say "of course doesn't solve anything at all" without giving any
hint at all why you say that. Can you elaborate?

What problem doesn't it address? Can you provide a specific example?
Suppose you have a table R(a,b,c) with candidate key {a} where column
c may contain null values that indicate that we don't know its value.
You can now solve this by splitting this into R1(a,b) and R2(a,c) and
thus remove the null values. It could be that R was, apart from the
null values, complete so that would mean that the CWA applies to R1,
but not to R2. So it will be the case for some queries over R1 and R2
that when computed in the usual way they return the exact answer, some
will return the possible answers, and some will return neither.
Wouldn't it be nice if the DBMS could tell you which ones do what? Or
if you could tell the DBMS that it shouldn't compute the query as
given but rather such that it returns the set of possible (or certain)
answers for the given query, if it can?
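
To make the R1/R2 example concrete, here is a small Python sketch. The tuples and the three queries are invented for illustration, and None stands in for the null in column c.

```python
# Hypothetical R(a, b, c) with candidate key {a}; None marks an unknown c.
R = {(1, "x", 10), (2, "y", None), (3, "z", 30)}

# Decomposition that removes the nulls.
R1 = {(a, b) for a, b, _c in R}                    # complete, so the CWA applies
R2 = {(a, c) for a, _b, c in R if c is not None}   # incomplete, so it does not

all_keys = {a for a, _b in R1}

# Query over R1 only: the usual evaluation gives the exact answer.
keys_exact = all_keys

# "Keys whose c is 10", evaluated the usual way over R2: every returned key
# is a certain answer, but key 2 might also qualify, so this is not the set
# of possible answers.
c_is_10_usual = {a for a, c in R2 if c == 10}
c_is_10_possible = c_is_10_usual | (all_keys - {a for a, _c in R2})

# "Keys whose c is not 10", evaluated as a difference: the usual evaluation
# returns the possible answers, because key 2 is kept although its c is unknown.
c_not_10_usual = all_keys - c_is_10_usual
c_not_10_certain = {a for a, c in R2 if c != 10}

print(keys_exact, c_is_10_usual, c_is_10_possible)
print(c_not_10_usual, c_not_10_certain)
```

The same evaluation strategy thus yields an exact answer, a set of certain answers, or a set of possible answers depending on the shape of the query, which is the distinction a DBMS would need to report.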

-- Jan Hidders


#86
David BL
Re: RM formalism supporting partial information - 12-02-2007, 07:51 PM

On Dec 2, 9:22 pm, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:
Quote:
On 1 dec, 04:17, David BL <davi... (AT) iinet (DOT) net.au> wrote:





On Dec 1, 12:48 am, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:

On 29 nov, 03:54, David BL <davi... (AT) iinet (DOT) net.au> wrote:

The concept of "possible answers" isn't universally applicable, and
therefore seems to represent quite a problem for any model of partial
information that emphasises that concept as fundamental.

The concept of 'possible answers' applies and is well defined for all
databases where you have precisely defined what it means if certain
data is missing, and note that this includes the definition that says
that it means nothing. So what you mean by "isn't universally
applicable" is completely beyond my comprehension.

Consider the following predicates, all with OWA intensional
definitions

age(Person,Age)
occupation(Person, Occupation)
married(Person,Person)
died(Person,Date)

You say the concept of possible answers is well defined. How exactly
would you calculate the possible 27 year old pilots?

You seem to assume that "well defined" and "can be computed" are the
same, which they aren't. But to answer your question, assuming that
everybody has only one occupation that would be every person p for
which there is no tuple (p, a) with a<>27 in relation age, and no
tuple (p, o) with o<>"pilot" in relation occupation. If the domain of
Person is not finite, or restricted by a relation person(Person) then
the result may be infinite.
Indeed in the example there is no person(Person) and no CWA specified
to limit what's possible. This raises questions like: Should
"possible" include people in the distant future? Should it include
alternative realities (if one accepts the Many Worlds Interpretation
(MWI) of Quantum Mechanics)? What about a person described in a work
of fiction?

Do you still say it is "well defined"?

Quote:
What does it mean precisely?

It contains every person that might be a 27 year old pilot as far as
the given database is concerned.
What do you mean by "as far as the given database is concerned"?
Surely that can only be regarded as CWAs on various intensional
definitions of the predicates?

Quote:
What do you think of the suggestion that the formalism (which is
concerned with extensions rather than intensions)

1) ignores the CWA/OWA distinction;

2) assumes the CWA applies everywhere; and

3) null is *always* interpreted as non-existence w.r.t.
the (carefully worded) intensional definitions?

This approach seems simple and self consistent.

If I ignore for the moment 1) (because 1) and 2) seem contradictory
because I cannot assume there is no difference between X and Y and at
the same time assume that only Y applies everywhere) this is just the
classical value-does-not-apply interpretation.

I meant that the actual CWA/OWA distinction is absorbed into the
intensional definition, so that it can be assumed that with respect to
the intensional definition the formalism assumes a CWA. I thought
that was clear.

It was.





It doesn't however, attempt to model the case of "value exists but is
unknown". IMO that case should be modeled *explicitly* with a
different predicate.

Sure, the value-does-not-apply interpretation can always also be
represented without null values.

The thing is that you have now fully ignored the real problem of
incomplete information which is that in practice the CWA does not
always fully apply. Your main solution seems to be to redefine the
meaning of the relations such that it does, which, of course, doesn't
solve anything at all and simply puts the problem back on the plate of
the user.

You say "of course doesn't solve anything at all" without giving any
hint at all why you say that. Can you elaborate?

What problem doesn't it address? Can you provide a specific example?

Suppose you have a table R(a,b,c) with candidate key {a} where column
c may contain null values that indicate that we don't know its value.
You can now solve this by splitting this into R1(a,b) and R2(a,c) and
thus remove the null values. It could be that R was, apart from the
null values, complete so that would mean that the CWA applies to R1,
but not to R2. So it will be the case for some queries over R1 and R2
that when computed in the usual way they return the exact answer, some
will return the possible answers, and some will return neither.
Wouldn't it be nice if the DBMS could tell you which ones do what? Or
if you could tell the DBMS that it shouldn't compute the query as
given but rather such that it returns the set of possible (or certain)
answers for the given query, if it can?
I think if we have the proper intensional definitions in mind, and
assume every relation has a CWA then we will easily be able to
interpret a query correctly.

Eg R1(a,b), and R2(a,c) are

employee(Person,Date) <=>
Person is current employee of company X who commenced on Date.

address(Person,Address) <=>
It is known that Person lives at Address

then, as an example { P | employee(P,D) } \ { P | address(P,A) } can
be interpreted as (all) the persons that currently work for company X
that don't have a known address.
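
As a rough Python rendering of that query (the sample tuples and the function name are hypothetical):

```python
# employee(Person, Date): Person is a current employee of company X who
# commenced on Date. Closed world: the relation lists every such employee.
employee = {("ann", "2001-03-01"), ("bob", "2005-07-15")}

# address(Person, Address): it is known that Person lives at Address.
address = {("ann", "1 Main St")}


def employees_without_known_address(employee, address):
    """{ P | employee(P, D) } \\ { P | address(P, A) } as a set difference."""
    return {p for p, _date in employee} - {p for p, _addr in address}


print(employees_without_known_address(employee, address))
```

The result says nothing about whether bob has an address, only that none is recorded, which matches the "it is known that" wording of the intensional definition.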

It seems to me there can be lots of subtle variations in the
intensional definitions, such as the way employee(P,D) mentioned
company X whereas address(P,A) did not. This makes me skeptical
whether a DBMS will be able to formalise it, beyond simply allowing
the user to define a schema, support the RA, allow for integrity
constraints etc.




#87
Jan Hidders
Re: RM formalism supporting partial information - 12-03-2007, 11:32 AM

On 3 dec, 02:51, David BL <davi... (AT) iinet (DOT) net.au> wrote:
Quote:
On Dec 2, 9:22 pm, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:



On 1 dec, 04:17, David BL <davi... (AT) iinet (DOT) net.au> wrote:

On Dec 1, 12:48 am, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:

On 29 nov, 03:54, David BL <davi... (AT) iinet (DOT) net.au> wrote:

The concept of "possible answers" isn't universally applicable, and
therefore seems to represent quite a problem for any model of partial
information that emphasises that concept as fundamental.

The concept of 'possible answers' applies and is well defined for all
databases where you have precisely defined what it means if certain
data is missing, and note that this includes the definition that says
that it means nothing. So what you mean by "isn't universally
applicable" is completely beyond my comprehension.

Consider the following predicates, all with OWA intensional
definitions

age(Person,Age)
occupation(Person, Occupation)
married(Person,Person)
died(Person,Date)

You say the concept of possible answers is well defined. How exactly
would you calculate the possible 27 year old pilots?

You seem to assume that "well defined" and "can be computed" are the
same, which they aren't. But to answer your question, assuming that
everybody has only one occupation that would be every person p for
which there is no tuple (p, a) with a<>27 in relation age, and no
tuple (p, o) with o<>"pilot" in relation occupation. If the domain of
Person is not finite, or restricted by a relation person(Person) then
the result may be infinite.

Indeed in the example there is no person(Person) and no CWA specified
to limit what's possible.
But a type or domain will have been associated with the column. That
defines the upper bound of the possibilities.

Quote:
Do you still say it is "well defined"?
Of course. This is not some terminology I made up, it's well
established in the literature.

Quote:
What does it mean precisely?

It contains every person that might be a 27 year old pilot as far as
the given database is concerned.

What do you mean by "as far as the given database is concerned"?
That the database contains no information that logically implies the
opposite.

Quote:
Surely that can only be regarded as CWAs on various intensional
definitions of the predicates?
Not if you use those terms in their usual meaning.

Quote:
What do you think of the suggestion that the formalism (which is
concerned with extensions rather than intensions)

1) ignores the CWA/OWA distinction;

2) assumes the CWA applies everywhere; and

3) null is *always* interpreted as non-existence w.r.t.
the (carefully worded) intensional definitions?

This approach seems simple and self consistent.

If I ignore for the moment 1) (because 1) and 2) seem contradictory
because I cannot assume there is no difference between X and Y and at
the same time assume that only Y applies everywhere) this is just the
classical value-does-not-apply interpretation.

I meant that the actual CWA/OWA distinction is absorbed into the
intensional definition, so that it can be assumed that with respect to
the intensional definition the formalism assumes a CWA. I thought
that was clear.

It was.

It doesn't however, attempt to model the case of "value exists but is
unknown". IMO that case should be modeled *explicitly* with a
different predicate.

Sure, the value-does-not-apply interpretation can always also be
represented without null values.

The thing is that you have now fully ignored the real problem of
incomplete information which is that in practice the CWA does not
always fully apply. Your main solution seems to be to redefine the
meaning of the relations such that it does, which, of course, doesn't
solve anything at all and simply puts the problem back on the plate of
the user.

You say "of course doesn't solve anything at all" without giving any
hint at all why you say that. Can you elaborate?

What problem doesn't it address? Can you provide a specific example?

Suppose you have a table R(a,b,c) with candidate key {a} where column
c may contain null values that indicate that we don't know its value.
You can now solve this by splitting this into R1(a,b) and R2(a,c) and
thus remove the null values. It could be that R was, apart from the
null values, complete so that would mean that the CWA applies to R1,
but not to R2. So it will be the case for some queries over R1 and R2
that when computed in the usual way they return the exact answer, some
will return the possible answers, and some will return neither.
Wouldn't it be nice if the DBMS could tell you which ones do what? Or
if you could tell the DBMS that it shouldn't compute the query as
given but rather such that it returns the set of possible (or certain)
answers for the given query, if it can?

I think if we have the proper intensional definitions in mind, and
assume every relation has a CWA then we will easily be able to
interpret a query correctly.

Eg R1(a,b), and R2(a,c) are

employee(Person,Date) <=>
Person is current employee of company X who commenced on Date.

address(Person,Address) <=>
It is known that Person lives at Address
This smells a bit tautological. Unless you specify what "it is known
that" means more precisely it might very well be "is in the address
table" in which case this would be a meaningless statement.

Quote:
then, as an example { P | employee(P,D) } \ { P | address(P,A) } can
be interpreted as (all) the persons that currently work for company X
that don't have a known address.

It seems to me there can be lots of subtle variations in the
intensional definitions, such as the way employee(P,D) mentioned
company X whereas address(P,A) did not. This makes me skeptical
whether a DBMS will be able to formalise it, beyond simply allowing
the user to define a schema, support the RA, allow for integrity
constraints etc.

Formalizing that can in general be done by formulating a constraint
over two databases d and d' with the same schema, where d represents the
actual database and d' the ideal correct database. In general that is
too powerful, so various more restricted ways of doing that are under
research at the moment. One you have already formulated yourself,
i.e., declaring that the CWA applies to certain views, and the other
is in terms of a relation R and a query Q over the ideal database that
returns a result with the same header and the interpretation that R is
complete for the tuples in Q.
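
One way to read the two restricted forms just mentioned is sketched
below in Python. This is only an interpretation on toy data: the
dictionaries `actual` and `ideal` play the roles of d and d', and the
function names, relation contents and example queries are all
hypothetical. In practice d' is not available, so these checks only
spell out what the declarations would mean.

```python
def cwa_holds_for_view(view, actual, ideal):
    """The CWA applies to a view if it yields the same tuples on d and d'."""
    return view(actual) == view(ideal)


def complete_for(relation, query, actual, ideal):
    """Relation R in d contains every tuple that Q returns on d'."""
    return query(ideal) <= actual[relation]


# Hypothetical databases over the same schema.
actual = {
    "employee": {("ann", "2001-03-01"), ("bob", "2005-07-15")},
    "address": {("ann", "1 Main St")},
}
ideal = {
    "employee": {("ann", "2001-03-01"), ("bob", "2005-07-15")},
    "address": {("ann", "1 Main St"), ("bob", "2 Side Rd")},
}


def employees(db):
    return db["employee"]


def all_addresses(db):
    return db["address"]


def anns_addresses(db):
    return {t for t in db["address"] if t[0] == "ann"}


print(cwa_holds_for_view(employees, actual, ideal))
print(complete_for("address", anns_addresses, actual, ideal))
print(complete_for("address", all_addresses, actual, ideal))
```

Here the employee view is closed, the address relation is complete for
ann's addresses, and it is not complete for addresses in general.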

-- Jan Hidders




  #88  
Old   
David BL
 
Posts: n/a

Default Re: RM formalism supporting partial information - 12-03-2007 , 07:48 PM



On Dec 4, 2:32 am, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:
Quote:
On 3 dec, 02:51, David BL <davi... (AT) iinet (DOT) net.au> wrote:





On Dec 2, 9:22 pm, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:

On 1 dec, 04:17, David BL <davi... (AT) iinet (DOT) net.au> wrote:

On Dec 1, 12:48 am, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:

On 29 nov, 03:54, David BL <davi... (AT) iinet (DOT) net.au> wrote:

The concept of "possible answers" isn't universally applicable, and
therefore seems to represent quite a problem for any model of partial
information that emphasises that concept as fundamental.

The concept of 'possible answers' applies and is well defined for all
databases where you have precisely defined what it means if certain
data is missing, and note that this includes the definition that says
that it means nothing. So what you mean by "isn't universally
applicable" is completely beyond my comprehension.

Consider the following predicates, all with OWA intensional
definitions

age(Person,Age)
occupation(Person, Occupation)
married(Person,Person)
died(Person,Date)

You say the concept of possible answers is well defined. How exactly
would you calculate the possible 27 year old pilots?

You seem to assume that "well defined" and "can be computed" is the
same, which it isn't. But to answer your question, assuming that
everybody has only one occupation that would be every person p for
which there is no tuple (p, a) with a<>27 in relation age, and no
tuple (p, o) with o<>"pilot" in relation occupation. If the domain of
Person is neither finite nor restricted by a relation person(Person),
then the result may be infinite.
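
The definition just given can be sketched in Python for a finite
stand-in domain. The person names, the two sample domains and the
function names are invented for the illustration, and the sketch keeps
the stated assumption that everybody has one age and one occupation.

```python
def possible_27_year_old_pilots(domain, age, occupation):
    """Persons for whom no stored tuple rules out age 27 or occupation pilot."""
    wrong_age = {p for (p, a) in age if a != 27}
    wrong_job = {p for (p, o) in occupation if o != "pilot"}
    return set(domain) - wrong_age - wrong_job


def certain_27_year_old_pilots(age, occupation):
    """Persons recorded both as 27 years old and as pilot."""
    aged_27 = {p for (p, a) in age if a == 27}
    pilots = {p for (p, o) in occupation if o == "pilot"}
    return aged_27 & pilots


# Hypothetical OWA relations.
age = {("ann", 27), ("bob", 40)}
occupation = {("ann", "pilot"), ("cy", "pilot"), ("dee", "nurse")}

small_domain = {"ann", "bob", "cy", "dee"}
large_domain = small_domain | {"eve", "fay"}

print(possible_27_year_old_pilots(small_domain, age, occupation))
print(possible_27_year_old_pilots(large_domain, age, occupation))
print(certain_27_year_old_pilots(age, occupation))
```

The possible answers grow with the domain while the certain answers do
not, which is the domain dependence at issue here.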

Indeed in the example there is no person(Person) and no CWA specified
to limit what's possible.

But a type or domain will have been associated with the column. That
defines the upper bound of the possibilities.

Do you still say it is "well defined"?

Of course. This is not some terminology I made up, it's well
established in the literature.

Ok, I understand now. I agree it is well defined.

I can see it is quite useful when the domain has a small cardinality,
such as a fixed enumeration of colours. How is the definition useful
for a string or numerical domain? Clearly reporting the possible
answers is impossible where infinite, and silly where finite but
extremely large - such as the case when a string domain constrains the
maximum number of characters.

Quote:
What does it mean precisely?

It contains every person that might be a 27 year old pilot as far as
the given database is concerned.

What do you mean by "as far as the given database is concerned"?

That the database contains no information that logically implies the
opposite.

Surely that can only be regarded as CWAs on various intensional
definitions of the predicates?

Not if you use those terms in their usual meaning.





What do you think of the suggestion that the formalism (which is
concerned with extensions rather than intensions)

1) ignores the CWA/OWA distinction;

2) assumes the CWA applies everywhere; and

3) null is *always* interpreted as non-existence w.r.t.
the (carefully worded) intensional definitions?

This approach seems simple and self consistent.

If I ignore for the moment 1) (because 1) and 2) seem contradictory
because I cannot assume there is no difference between X and Y and at
the same time assume that only Y applies everywhere) this is just the
classical value-does-not-apply interpretation.

I meant that the actual CWA/OWA distinction is absorbed into the
intensional definition, so that it can be assumed that with respect to
the intensional definition the formalism assumes a CWA. I thought
that was clear.

It was.

It doesn't, however, attempt to model the case of "value exists but is
unknown". IMO that case should be modeled *explicitly* with a
different predicate.

Sure, the value-does-not-apply interpretation can always also be
represented without null values.

The thing is that you have now fully ignored the real problem of
incomplete information which is that in practice the CWA does not
always fully apply. Your main solution seems to be to redefine the
meaning of the relations such that it does, which, of course, doesn't
solve anything at all and simply puts the problem back on the plate of
the user.

You say "of course doesn't solve anything at all" without giving any
hint at all why you say that. Can you elaborate?

What problem doesn't it address? Can you provide a specific example?

Suppose you have a table R(a,b,c) with candidate key {a} where column
c may contain null values that indicate that we don't know its value.
You can now solve this by splitting this into R1(a,b) and R2(a,c) and
thus remove the null values. It could be that R was, apart from the
null values, complete so that would mean that the CWA applies to R1,
but not to R2. So it will be the case for some queries over R1 and R2
that when computed in the usual way they return the exact answer, some
will return the possible answers, and some will return neither.
Wouldn't it be nice if the DBMS could tell you which ones do what? Or
if you could tell the DBMS that it shouldn't compute the query as
given but rather such that it returns the set of possible (or certain)
answers for the given query, if it can?

I think if we have the proper intensional definitions in mind, and
assume every relation has a CWA then we will easily be able to
interpret a query correctly.

Eg R1(a,b), and R2(a,c) are

employee(Person,Date) <=
Person is current employee of company X who commenced on Date.

address(Person,Address) <=
It is known that Person lives at Address

This smells a bit tautological. Unless you specify what "it is known
that" means more precisely it might very well be "is in the address
table", in which case this would be a meaningless statement.

Agreed.

Please substitute "... it is currently known to the company X HR
department that ..."

Quote:
then, as an example { P | employee(P,D) } \ { P | address(P,A) } can
be interpreted as (all) the persons that currently work for company X
that don't have a known address.
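
With both relations read under a CWA relative to their intensional
definitions, the difference above is an ordinary set difference. A
minimal Python sketch, with invented sample tuples:

```python
# Hypothetical extensions of the two predicates.
employee = {("ann", "2001-03-01"), ("bob", "2005-07-15")}
address = {("ann", "1 Main St")}

# { P | employee(P,D) } \ { P | address(P,A) }
employees = {p for (p, d) in employee}
with_known_address = {p for (p, a) in address}
no_known_address = employees - with_known_address

print(no_known_address)
```

The result lists current employees with no address on record. Whether
that means "has no address" or "address not known to HR" depends
entirely on the wording of the intensional definition of address.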

It seems to me there can be lots of subtle variations in the
intensional definitions, such as the way employee(P,D) mentioned
company X whereas address(P,A) did not. This makes me skeptical
whether a DBMS will be able to formalise it, beyond simply allowing
the user to define a schema, support the RA, allow for integrity
constraints etc.

Formalizing that can in general be done by formulating a constraint
over two databases d and d' with the same schema, where d represents the
actual database and d' the ideal correct database. In general that is
too powerful, so various more restricted ways of doing that are under
research at the moment. One you have already formulated yourself,
i.e., declaring that the CWA applies to certain views, and the other
is in terms of a relation R and a query Q over the ideal database that
returns a result with the same header and the interpretation that R is
complete for the tuples in Q.

If I get the chance I'll read more of the literature on CWA. Sometime
I want to look at Reiter's work.



  #89  
Old   
Jan Hidders
 
Posts: n/a

Default Re: RM formalism supporting partial information - 12-04-2007 , 04:51 AM



On 4 dec, 02:48, David BL <davi... (AT) iinet (DOT) net.au> wrote:
Quote:
On Dec 4, 2:32 am, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:



On 3 dec, 02:51, David BL <davi... (AT) iinet (DOT) net.au> wrote:

On Dec 2, 9:22 pm, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:

On 1 dec, 04:17, David BL <davi... (AT) iinet (DOT) net.au> wrote:

On Dec 1, 12:48 am, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:

On 29 nov, 03:54, David BL <davi... (AT) iinet (DOT) net.au> wrote:

The concept of "possible answers" isn't universally applicable, and
therefore seems to represent quite a problem for any model of partial
information that emphasises that concept as fundamental.

The concept of 'possible answers' applies and is well defined for all
databases where you have precisely defined what it means if certain
data is missing, and note that this includes the definition that says
that it means nothing. So what you mean by "isn't universally
applicable" is completely beyond my comprehension.

Consider the following predicates, all with OWA intensional
definitions

age(Person,Age)
occupation(Person, Occupation)
married(Person,Person)
died(Person,Date)

You say the concept of possible answers is well defined. How exactly
would you calculate the possible 27 year old pilots?

You seem to assume that "well defined" and "can be computed" is the
same, which it isn't. But to answer your question, assuming that
everybody has only one occupation that would be every person p for
which there is no tuple (p, a) with a<>27 in relation age, and no
tuple (p, o) with o<>"pilot" in relation occupation. If the domain of
Person is neither finite nor restricted by a relation person(Person),
then the result may be infinite.

Indeed in the example there is no person(Person) and no CWA specified
to limit what's possible.

But a type or domain will have been associated with the column. That
defines the upper bound of the possibilities.

Do you still say it is "well defined"?

Of course. This is not some terminology I made up, it's well
established in the literature.

Ok, I understand now. I agree it is well defined.

*phew* :-)

Quote:
I can see it is quite useful when the domain has a small cardinality,
such as a fixed enumeration of colours. How is the definition useful
for a string or numerical domain? Clearly reporting the possible
answers is impossible where infinite, and silly where finite but
extremely large - such as the case when a string domain constrains the
maximum number of characters.

Its main role is just as the logical counterpart of the set of
certain answers and the set of impossible answers. In the sense that
you use "useful" here, the set of certain answers is certainly much
more directly useful. However, if the query for the possible answers
is domain dependent, then the DBMS could warn you about this, indicate
the problem, and ask you to reformulate the query slightly so that it
isn't. In your example this could be done by restricting the set of
possible persons to those in a certain relation for which the CWA
holds. I'd suggest that in practice that is often what you want
anyway.
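
The reformulation suggested here might look as follows in Python. It is
a sketch only: the relation `person` is assumed to be closed (the CWA
holds for it), and the names and sample tuples are hypothetical.

```python
def possible_pilots_among(person, age, occupation):
    """Possible 27 year old pilots, restricted to a closed person relation."""
    wrong_age = {p for (p, a) in age if a != 27}
    wrong_job = {p for (p, o) in occupation if o != "pilot"}
    return {p for p in person if p not in wrong_age and p not in wrong_job}


# Hypothetical data: person is closed, age and occupation are open.
person = {"ann", "bob", "cy"}
age = {("bob", 40)}
occupation = {("ann", "pilot")}

print(possible_pilots_among(person, age, occupation))
```

Because the candidates come from person rather than from the column's
type, the result is finite and no longer depends on the domain.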

-- Jan Hidders

