dbTalk Databases Forums  

NULLs

comp.databases.theory comp.databases.theory


Discuss NULLs in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #41  
Old   
David BL
 
Posts: n/a

Default Re: NULLs - 01-02-2008 , 06:49 PM






On Dec 30 2007, 12:00 am, JOG <j... (AT) cs (DOT) nott.ac.uk> wrote:
Quote:
On Dec 28, 10:59 pm, Hugo Kornelis

I doubt it. Given this information:

* "Employee Jack is male"
* "Employee Mary is female"
* "Employee Jack is 43 years old"
* "Employee JJ is 32 years old"

how would you answer the below questions:

* "List all employees aged 40 and above".
* "What is the average age of our employees?"
* "For each employee, how many years left until retirement (assume a
country with laws for retirement age of 65)"
* "List all employees that are female, under 35 years old, or both"
* "Is JJ older than Mary?"
* etc

Yup, I catch your drift. However I believe these questions should have
"in propositions that you know" concatenated to them, so we still have
2VL, and leaving us humans in the real world to interpret the results.
The CWA is a nonsense imo. But I do agree that with CWA, in RM, 3VL is
very difficult to avoid, and nulls are incredibly tempting to use. My
personal conclusion therefore is that perhaps we are trying to
address /symptoms/ and not the root cause of the issue, and a step
back is required.
It seems to me that it's the OWA that invites 3VL. I'm thinking that
the CWA simply means that an extension is uniquely defined given the
intension and when intensional definitions are stated properly 2VL
applies.

I think OWA means a recorded extension may only be a subset of the
actual extension corresponding to the stated intension, because OWA
asserts that a system's knowledge is incomplete. It is the OWA that
forces the logical inference system to deal with partial information.

Have I got it wrong?



Reply With Quote
  #42  
Old   
Hugo Kornelis
 
Posts: n/a

Default Re: NULLs - 01-03-2008 , 04:55 PM






On Fri, 28 Dec 2007 17:59:13 -0800 (PST), David BL wrote:

(snip)
Quote:
In the presence of missing information the query

"List all employees aged 40 and above"

cannot be answered at all. Instead the query should be

"List all employees known by the HR department to be aged 40 and
above"
(snip)

Quote:
In the case of the question

"Is JJ older than Mary?"

This should be

"Is it known to HR department that JJ is older than Mary?"

in which case the answer is no. See, no 3VL required.

We can also ask whether it is known to HR department that JJ is not
older than Mary". The answer is no as well; so we have been able to
tell (using only efficient 2VL) that the DB doesn't know who it older.
Hi David,

Happy new year to you and all else still readin gthis thread.

You make some fair points. Thanks for that.


In a purely pragmatic sense, are there any situations where this way of
looking at things produces different results than the traditional SQL
approach? It doesn't in the case of the examples above:

* "List all employees known by the HR department to be aged 40 and
above" will exclude employees with no age on file, just as "SELECT name
FREM employees WHERE age > 40" will exclude employees with age NULL.

* "Is it known to HR department that JJ is older than Mary?" will result
in "no", so that the ELSE part of a conditional statement is executed,
just as "IF (SELECT age FROM employees WHERE name = 'JJ') > (SELECT age
FROM employees WHERE name = 'Mary')" will evaluate to unknown, skip the
conditional action and execute the ELSE action.


Let's look at some examples that are often used to illustrate the
evilness of 3VL: aggregation and tautologies:

* "Return the average of all ages known to HR department" will return
the exact same value as AVG(age) currently does (i.e. ages not known to
HR department are disregarded by the former, and NULL ages are
disregarded by the latter).

* "Is it known to HR department that JJ is older than Mary or that JJ is
not older than Mary?" - neither is known, so this would evaluate to
False OR False = False. While technicallly possibly correct, I'm quite
sure that this result will enocunter the same amount of opposition as
the Unknown result that SQL will produce on (simplified) "32 > NULL OR
NOT (32 > NULL)" [which of coourse is again treated the same as False].

So, while you technically succeeded in answering the questions without
using 3VL, this didn't solve the problems usually associated with NULL
and 3VL. Instead, you found a way to introduce these problems into 2VL.

Best, Hugo


Reply With Quote
  #43  
Old   
Hugo Kornelis
 
Posts: n/a

Default Re: NULLs - 01-03-2008 , 05:12 PM



On Sat, 29 Dec 2007 07:00:47 -0800 (PST), JOG wrote:

(snip)
Quote:
That would be the case if I were intentionally storing the fact that an
age is not on file, and using (or rather: abusing) the age column for
that. But I'm not. I'm just storing ages I have on file.

Then surely you are trying to fit propositions of different
structures into a mathematical construct (i.e. a relation) that
proscribes such an undertaking. The conclusion must be that relations
will not do the job that you require of them?
Agreed (more below)

Quote:
Yes, if there is no value (as represented by the NULL marker) in this
column in a specific row, then you can infer from this that I have no
age on file for the person described by that row - but this is an
unintended though unavoidable by-effect. Just as the fact that you can
infer "Jack is an adult" from "Jack is 43 years old" - surely you
wouldn't use that to maintain that storing an age violated 1NF?

I think there is a difference Hugo. Stating the Jack is an adult is an
inference from correct mathematical application of a relation to store
values, whereas a NULL flag (whatever its meaning) is not a value. I
understand what you aim to achieve but it seems perhaps that relations
are the wrong vehicle.
Agreed.

I tried to avoid the term relation in my previous messages (if one
accidentally slipped through, my bad). I talked aboout tables.

I'm a big supporter of ORM (not object-relational mappper -ugh, I have
to wash my hands every time I type that- but Object Role Modeling) and
other similar (but less known) methods such as NIAM. ORM and NIAM base
on elementary fact types. I truly believe that all relations should be
based on elementary fact types only.

When implementing relations in what is currently usually called a
relational database, it *is* possible to create a table per elementary
fact type. In that case, no NULLs are ever needed. The facts

* "Employee Jack is male"
* "Employee Mary is female"
* "Employee Jack is 43 years old"
* "Employee JJ is 32 years old"

are represented in two tables, like this:

EmployeeAge EmployeeGender
name | age name | gender
-----+----- -----+--------
Jack | 43 Mary | female
JJ | 32 Jack | male

But for various reasons (maintainability and performance being the most
important), it is often preferable to combine these relations in a
single table, like this:

Employee
name | age | gender
-----+-----+-------
Jack | 43 | male
JJ | 32 |
Mary | | female

Some way has to be found to ensure that JJ's gender is not mistaken as
'' (empty string), and to ensure that no datatype mismatch error is
raised on the blank age of Mary (as a blank is not numeric data). The
best (or, if you prefer, least bad) answer to that problem is NULL.

Coming from this background, you'll probably understand why I have
little issue with NULL - as it is (to me) just a marker that says "hey,
if you hadn't insisted on stuffing the fact types EmployeeAge and
EmployeeGender in the same table, there simply would have been no row
for Mary in the EmployeeAge table".

Quote:
It still appears to be a hack imo, and one still ends up
with 3VL.

3VL is not a result of using NULL to represent missing information, but
a result of allowing missing information. IMO, there are only two
options: either you deal with missing information, and with the 3VL that
results from it -- or you somehow alter reality so that information is
never missing again, for any reason.

I think I disagree there. In predicate logic I can state (and
question) everything I know about the world without turning to 3VL, so
it must be possible in a data model surely? If I know an attribute
does or doesn't exist, then I can state that formally (with
existential quantifiers), and If I don't know anything about that
attribute...well I just don't say anything about it at all.
Yes, indeed: "If I know an attribute does or doesn't exist, then I can
state that formally" - turning the fact that you know an attribute does
or doesn't exist into a new attribute. Add this attribute to your
relational model, and you can use it to avoid comparisons to NULL. But I
doubt if the actual results of the queries will ever be different from
what the current NULL handling does. (See my reply to David).

(snip remainder - either already adressed in reply to David, or no need
to react)

Quote:
Happy New Year, Jim.
And the same to you!

Best, Hugo


Reply With Quote
  #44  
Old   
Hugo Kornelis
 
Posts: n/a

Default Re: NULLs - 01-03-2008 , 05:14 PM



On Sat, 29 Dec 2007 10:34:06 -0800 (PST), Marshall wrote:

Quote:
On Dec 28, 2:59 pm, Hugo Kornelis
h... (AT) perFact (DOT) REMOVETHIS.info.INVALID> wrote:

3VL is not a result of using NULL to represent missing information, but
a result of allowing missing information. IMO, there are only two
options: either you deal with missing information, and with the 3VL that
results from it -- or you somehow alter reality so that information is
never missing again, for any reason.

Full normalization is a third option.
Hi Marshall,

As in "one table per elementary fact type"? Full agreement here, see my
message to Jim.

I think I addressed the rest of your message in my reply to David. Let
me know if you think I failed to address any of your points.

Best, Hugo


Reply With Quote
  #45  
Old   
Hugo Kornelis
 
Posts: n/a

Default Re: NULLs - 01-03-2008 , 05:20 PM



On Sat, 29 Dec 2007 01:11:18 +0100, mAsterdam wrote:

Quote:
Hugo Kornelis schreef:

See my reply to JOG's message.

Hi Hugo, so glad to see you again. :-)
Hi mAmsterdam,

I've never been away - though I restrict myself to scanning headers,
occasionally (when time permits) reading a discussion, and seldomly
participating in one. As you see from the time it took me to reply, I am
often short on time :-)

Quote:
BTW, looking for a nice job ^_^ ?
I already have a nice job - but I never turn down an offer I don't know
yet, so feel free to send me the details. If you remove .REMOVETHIS and
..INVALID from the email address in the headers of this message, you get
my valid email address. Ping me here if you don't get a reply within a
few days, as my provider has occasianally decided that they know better
than me what messages I do and do not want to receive.

Best, Hugo


Reply With Quote
  #46  
Old   
TroyK
 
Posts: n/a

Default Re: NULLs - 01-03-2008 , 05:23 PM



On Jan 3, 3:55*pm, Hugo Kornelis
<h... (AT) perFact (DOT) REMOVETHIS.info.INVALID> wrote:
Quote:
On Fri, 28 Dec 2007 17:59:13 -0800 (PST), David BL wrote:

(snip)>In the presence of missing information the query

* *"List all employees aged 40 and above"

cannot be answered at all. *Instead the query should be

* *"List all employees known by the HR department to be aged 40 and
above"

(snip)

In the case of the question

* *"Is JJ older than Mary?"

This should be

* *"Is it known to HR department that JJ is older than Mary?"

in which case the answer is no. * See, no 3VL required.

We can also ask whether it is known to HR department that JJ is not
older than Mary". * The answer is no as well; so we have been able to
tell (using only efficient 2VL) that the DB doesn't know who it older.

Hi David,

Happy new year to you and all else still readin gthis thread.

You make some fair points. Thanks for that.

In a purely pragmatic sense, are there any situations where this way of
looking at things produces different results than the traditional SQL
approach? It doesn't in the case of the examples above:

* "List all employees known by the HR department to be aged 40 and
above" will exclude employees with no age on file, just as "SELECT name
FREM employees WHERE age > 40" will exclude employees with age NULL.

<snip>

Quote:
Best, Hugo
Was just discussing a similar example with colleagues today. Let's
introduce a hypothetical business rule: all employees must be over 40
years old (with the age column still nullable).

What would you expect the result of the SELECT statement to return in
that case?

TroyK


Reply With Quote
  #47  
Old   
Gene Wirchenko
 
Posts: n/a

Default Re: NULLs - 01-03-2008 , 05:34 PM



Hugo Kornelis <hugo (AT) perFact (DOT) REMOVETHIS.info.INVALID> wrote:

[snip]

Quote:
When implementing relations in what is currently usually called a
relational database, it *is* possible to create a table per elementary
fact type. In that case, no NULLs are ever needed. The facts

* "Employee Jack is male"
* "Employee Mary is female"
* "Employee Jack is 43 years old"
* "Employee JJ is 32 years old"

are represented in two tables, like this:

EmployeeAge EmployeeGender
name | age name | gender
-----+----- -----+--------
Jack | 43 Mary | female
JJ | 32 Jack | male

But for various reasons (maintainability and performance being the most
important), it is often preferable to combine these relations in a
single table, like this:

Employee
name | age | gender
-----+-----+-------
Jack | 43 | male
JJ | 32 |
Mary | | female

Some way has to be found to ensure that JJ's gender is not mistaken as
'' (empty string), and to ensure that no datatype mismatch error is
raised on the blank age of Mary (as a blank is not numeric data). The
best (or, if you prefer, least bad) answer to that problem is NULL.
Which make the one-table example less maintainable because of the
special cases. As to performance, I quite often query for very
limited numbers of columns, so the issue of columns being split up
into different tables is not the problem it might appear to be.

Quote:
Coming from this background, you'll probably understand why I have
little issue with NULL - as it is (to me) just a marker that says "hey,
if you hadn't insisted on stuffing the fact types EmployeeAge and
EmployeeGender in the same table, there simply would have been no row
for Mary in the EmployeeAge table".
Or "Hey, if you had not gotten it wrong, it would not be
necessary to kludge." Why kludge? (Measure twice, cut once, and all
that.)

[snip]

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
I have preferences.
You have biases.
He/She has prejudices.


Reply With Quote
  #48  
Old   
Bob Badour
 
Posts: n/a

Default Re: NULLs - 01-03-2008 , 05:52 PM



Gene Wirchenko wrote:

Quote:
Hugo Kornelis <hugo (AT) perFact (DOT) REMOVETHIS.info.INVALID> wrote:

[snip]


When implementing relations in what is currently usually called a
relational database, it *is* possible to create a table per elementary
fact type. In that case, no NULLs are ever needed. The facts

* "Employee Jack is male"
* "Employee Mary is female"
* "Employee Jack is 43 years old"
* "Employee JJ is 32 years old"

are represented in two tables, like this:

EmployeeAge EmployeeGender
name | age name | gender
-----+----- -----+--------
Jack | 43 Mary | female
JJ | 32 Jack | male

But for various reasons (maintainability and performance being the most
important), it is often preferable to combine these relations in a
single table, like this:

Employee
name | age | gender
-----+-----+-------
Jack | 43 | male
JJ | 32 |
Mary | | female

Some way has to be found to ensure that JJ's gender is not mistaken as
'' (empty string), and to ensure that no datatype mismatch error is
raised on the blank age of Mary (as a blank is not numeric data). The
best (or, if you prefer, least bad) answer to that problem is NULL.

Which make the one-table example less maintainable because of the
special cases. As to performance, I quite often query for very
limited numbers of columns, so the issue of columns being split up
into different tables is not the problem it might appear to be.
And given that commercial dbmses generally allow one to cluster data
from multiple tables obviates the performance argument.

[snip]


Reply With Quote
  #49  
Old   
David BL
 
Posts: n/a

Default Re: NULLs - 01-03-2008 , 07:21 PM



On Jan 4, 7:55 am, Hugo Kornelis
<h... (AT) perFact (DOT) REMOVETHIS.info.INVALID> wrote:
Quote:
On Fri, 28 Dec 2007 17:59:13 -0800 (PST), David BL wrote:

(snip)>In the presence of missing information the query

"List all employees aged 40 and above"

cannot be answered at all. Instead the query should be

"List all employees known by the HR department to be aged 40 and
above"

(snip)

In the case of the question

"Is JJ older than Mary?"

This should be

"Is it known to HR department that JJ is older than Mary?"

in which case the answer is no. See, no 3VL required.

We can also ask whether it is known to HR department that JJ is not
older than Mary". The answer is no as well; so we have been able to
tell (using only efficient 2VL) that the DB doesn't know who it older.

Hi David,

Happy new year to you and all else still readin gthis thread.

You make some fair points. Thanks for that.

In a purely pragmatic sense, are there any situations where this way of
looking at things produces different results than the traditional SQL
approach?
I don't know anywhere near enough about SQL to answer that.

Quote:
It doesn't in the case of the examples above:

* "List all employees known by the HR department to be aged 40 and
above" will exclude employees with no age on file, just as "SELECT name
FREM employees WHERE age > 40" will exclude employees with age NULL.

* "Is it known to HR department that JJ is older than Mary?" will result
in "no", so that the ELSE part of a conditional statement is executed,
just as "IF (SELECT age FROM employees WHERE name = 'JJ') > (SELECT age
FROM employees WHERE name = 'Mary')" will evaluate to unknown, skip the
conditional action and execute the ELSE action.

Let's look at some examples that are often used to illustrate the
evilness of 3VL: aggregation and tautologies:

* "Return the average of all ages known to HR department" will return
the exact same value as AVG(age) currently does (i.e. ages not known to
HR department are disregarded by the former, and NULL ages are
disregarded by the latter).
I can't see any problem there. You can take the average over any set
of numeric values as long as you are clear what the set actually is.


Quote:
* "Is it known to HR department that JJ is older than Mary or that JJ is
not older than Mary?" - neither is known, so this would evaluate to
False OR False = False. While technicallly possibly correct, I'm quite
sure that this result will enocunter the same amount of opposition as
the Unknown result that SQL will produce on (simplified) "32 > NULL OR
NOT (32 > NULL)" [which of coourse is again treated the same as False].
The following are quite different queries

A = (It is known to HR department that JJ is older than Mary)
OR
(It is known to HR department that JJ is not older than Mary)


B = It is known to HR department that
( (JJ is older than Mary) OR
(JJ is not older than Mary) )


A = false and B = true. Only B is the tautology.

In practice the RM/RA will answer A instead of B. The answer A =
false is ok because there is no tautology in A.


Quote:
So, while you technically succeeded in answering the questions without
using 3VL, this didn't solve the problems usually associated with NULL
and 3VL. Instead, you found a way to introduce these problems into 2VL.
I don't see any problems!



Reply With Quote
  #50  
Old   
JOG
 
Posts: n/a

Default Re: NULLs - 01-03-2008 , 07:25 PM



On Jan 3, 12:49 am, David BL <davi... (AT) iinet (DOT) net.au> wrote:
Quote:
On Dec 30 2007, 12:00 am, JOG <j... (AT) cs (DOT) nott.ac.uk> wrote:



On Dec 28, 10:59 pm, Hugo Kornelis
I doubt it. Given this information:

* "Employee Jack is male"
* "Employee Mary is female"
* "Employee Jack is 43 years old"
* "Employee JJ is 32 years old"

how would you answer the below questions:

* "List all employees aged 40 and above".
* "What is the average age of our employees?"
* "For each employee, how many years left until retirement (assume a
country with laws for retirement age of 65)"
* "List all employees that are female, under 35 years old, or both"
* "Is JJ older than Mary?"
* etc

Yup, I catch your drift. However I believe these questions should have
"in propositions that you know" concatenated to them, so we still have
2VL, and leaving us humans in the real world to interpret the results.
The CWA is a nonsense imo. But I do agree that with CWA, in RM, 3VL is
very difficult to avoid, and nulls are incredibly tempting to use. My
personal conclusion therefore is that perhaps we are trying to
address /symptoms/ and not the root cause of the issue, and a step
back is required.

It seems to me that it's the OWA that invites 3VL. I'm thinking that
the CWA simply means that an extension is uniquely defined given the
intension and when intensional definitions are stated properly 2VL
applies.

I think OWA means a recorded extension may only be a subset of the
actual extension corresponding to the stated intension, because OWA
asserts that a system's knowledge is incomplete. It is the OWA that
forces the logical inference system to deal with partial information.

Have I got it wrong?
Those seem pretty neat definitions as far as I can tell. However there
is then an issue as to what fully completing an extension means. Does
the database then assume:
(a) I possess communicated statements that represent the whole current
truth about the world
(b) I possess the whole current set of communicated statements about
the world.

Its a really subtle difference but its always bugged me. (b) is
interesting in that it completely precludes a user from asking the
database any questions about the world proper, restricting us instead
to asking questions about propositions themselves (I said it was
subtle...). i.e. instead of "how many employees are there", one must
ask "how many propositions are there with employees in". This seems
entirely sensible to myself and completely rules out 3VL, but leaves
the user to interpret the results.

I wonder whether (a) describes the assumptions of an inference engine
and (b) that of a database.


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.