dbTalk Databases Forums  

NULLs

comp.databases.theory comp.databases.theory


Discuss NULLs in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #21  
Old   
Hugo Kornelis
 
Posts: n/a

Default Re: NULLs - 12-28-2007 , 05:00 PM






On Thu, 27 Dec 2007 18:51:59 -0800 (PST), Marshall wrote:

Quote:
On Dec 27, 3:31 pm, Hugo Kornelis
h... (AT) perFact (DOT) REMOVETHIS.info.INVALID> wrote:
On Wed, 26 Dec 2007 23:17:52 -0400, Bob Badour wrote:

Attempting to store both the age and the reason why an age is unknown in
a single column violates first normal form.

Attempting to store both the age and whether the age is
known in a single column etc. etc.
Hi Marshall,

See my reply to JOG's message.

Best, Hugo


Reply With Quote
  #22  
Old   
mAsterdam
 
Posts: n/a

Default Re: NULLs - 12-28-2007 , 06:11 PM






Hugo Kornelis schreef:

Quote:
See my reply to JOG's message.
Hi Hugo, so glad to see you again. :-)

BTW, looking for a nice job ^_^ ?


--
What you see depends on where you stand.


Reply With Quote
  #23  
Old   
David BL
 
Posts: n/a

Default Re: NULLs - 12-28-2007 , 07:59 PM



On Dec 29, 7:59 am, Hugo Kornelis
<h... (AT) perFact (DOT) REMOVETHIS.info.INVALID> wrote:
Quote:
On Thu, 27 Dec 2007 16:41:51 -0800 (PST), JOG wrote:
On Dec 27, 11:31 pm, Hugo Kornelis
h... (AT) perFact (DOT) REMOVETHIS.info.INVALID> wrote:
On Wed, 26 Dec 2007 23:17:52 -0400, Bob Badour wrote:
Codd pointed out that a single NULL marker did not suffice and
suggested 2 markers. Date pointed out that one can apply the same
argument to 2 markers leading to an infinite progression once one heads
down that path, which suggests the path was never a productive one to
head down in the first place.

Hi Bob,

Unfortunately, both Codd and Date forgot that the NULL marker in (for
instance) the age column should represent only that the age is not on
file and not try to represent a reason for this as well.

Codd's suggestions to use two markers (for "not applicable" and
"unknown", IIRC) assumes that we want to store both the age of a person
(if on file), and the reason why an age is not on file (if it isn't).
That can of course be necessary - but in that case, we have two
attributes that should be stored in two seperate columns.

Attempting to store both the age and the reason why an age is unknown in
a single column violates first normal form.

Is the Zaniolo approach you are favouring not doing exactly the same
thing? Surely it is attempting to store both an age and the fact that
an age is not on file in a single column, which also violates first
normal form.

Hi J,

That would be the case if I were intentionally storing the fact that an
age is not on file, and using (or rather: abusing) the age column for
that. But I'm not. I'm just storing ages I have on file.

Yes, if there is no value (as represented by the NULL marker) in this
column in a specific row, then you can infer from this that I have no
age on file for the person described by that row - but this is an
unintended though unavoidable by-effect. Just as the fact that you can
infer "Jack is an adult" from "Jack is 43 years old" - surely you
wouldn't use that to maintain that storing an age violated 1NF?

It still appears to be a hack imo, and one still ends up
with 3VL.

3VL is not a result of using NULL to represent missing information, but
a result of allowing missing information. IMO, there are only two
options: either you deal with missing information, and with the 3VL that
results from it -- or you somehow alter reality so that information is
never missing again, for any reason.

There has to be a more elegant way....Regards, J.

I doubt it. Given this information:

* "Employee Jack is male"
* "Employee Mary is female"
* "Employee Jack is 43 years old"
* "Employee JJ is 32 years old"

how would you answer the below questions:

* "List all employees aged 40 and above".
* "What is the average age of our employees?"
* "For each employee, how many years left until retirement (assume a
country with laws for retirement age of 65)"
* "List all employees that are female, under 35 years old, or both"
* "Is JJ older than Mary?"
* etc

Try to describe how you would answer those questions, in an elegant way
and without using anything even remotely resembling 3VL. If you succeed
at that, a database without 3VL is just around the corner - because the
hardest part is not implementing, but finding out what to implement.

I can give answers to all questions above, but I have to use 3VL in all
cases. For me, that signifies that 3VL is part of reality and hence
can't be left out of a database that attempts to model (aspects of)
reality.
In the presence of missing information the query

"List all employees aged 40 and above"

cannot be answered at all. Instead the query should be

"List all employees known by the HR department to be aged 40 and
above"

When intensional definitions of predicates are stated correctly there
is no need for 3VL, because the extension matches the intension in the
same way as for when there is no missing information. The RM/RA can
apply in the normal way - an algebra on the extensions that makes no
attempt to formalise the intensions. A formula in the RA corresponds
to a derived predicate and it is possible to write down its
intensional definition in terms of the base relvar intensions. For
example, the above query can be obtained by selection then projection
on the set of employees with known ages. There is no 3VL required.

I think 3VL invites us to think of a recorded extension as only being
a subset of the associated intension, and that is a bad idea.
Predicate definitions should always be "if and only if". It is only
by being careful with intensional definitions that we correctly
interpret the results from a given query. Furthermore it is very
efficient compared to computation of "maybes" to yield more "maybes"
in the result set. I would rather keep the RM/RA simple and efficient
by only calculating what's known in the confines of 2VL (instead of
what's possible). Computation of maybes can be computationally
expensive and provide very little information to the user.

IMO we keep "if and only if" in the intension <--> extension
association, and find one way implications pop up elsewhere. For
example, the set of employees with a known age is a subset of the
known employees. This falls out naturally when we correctly
interpret projections.

In the case of the question

"Is JJ older than Mary?"

This should be

"Is it known to HR department that JJ is older than Mary?"

in which case the answer is no. See, no 3VL required.

We can also ask whether it is known to HR department that JJ is not
older than Mary". The answer is no as well; so we have been able to
tell (using only efficient 2VL) that the DB doesn't know who it older.



Reply With Quote
  #24  
Old   
JOG
 
Posts: n/a

Default Re: NULLs - 12-29-2007 , 09:00 AM



On Dec 28, 10:59 pm, Hugo Kornelis
<h... (AT) perFact (DOT) REMOVETHIS.info.INVALID> wrote:
Quote:
On Thu, 27 Dec 2007 16:41:51 -0800 (PST), JOG wrote:
On Dec 27, 11:31 pm, Hugo Kornelis
h... (AT) perFact (DOT) REMOVETHIS.info.INVALID> wrote:
On Wed, 26 Dec 2007 23:17:52 -0400, Bob Badour wrote:
Codd pointed out that a single NULL marker did not suffice and
suggested 2 markers. Date pointed out that one can apply the same
argument to 2 markers leading to an infinite progression once one heads
down that path, which suggests the path was never a productive one to
head down in the first place.

Hi Bob,

Unfortunately, both Codd and Date forgot that the NULL marker in (for
instance) the age column should represent only that the age is not on
file and not try to represent a reason for this as well.

Codd's suggestions to use two markers (for "not applicable" and
"unknown", IIRC) assumes that we want to store both the age of a person
(if on file), and the reason why an age is not on file (if it isn't).
That can of course be necessary - but in that case, we have two
attributes that should be stored in two seperate columns.

Attempting to store both the age and the reason why an age is unknown in
a single column violates first normal form.

Is the Zaniolo approach you are favouring not doing exactly the same
thing? Surely it is attempting to store both an age and the fact that
an age is not on file in a single column, which also violates first
normal form.

Hi J,

That would be the case if I were intentionally storing the fact that an
age is not on file, and using (or rather: abusing) the age column for
that. But I'm not. I'm just storing ages I have on file.
Then surely you are trying to fit propositions of different
structures into a mathematical construct (i.e. a relation) that
proscribes such an undertaking. The conclusion must be that relations
will not do the job that you require of them?

Quote:
Yes, if there is no value (as represented by the NULL marker) in this
column in a specific row, then you can infer from this that I have no
age on file for the person described by that row - but this is an
unintended though unavoidable by-effect. Just as the fact that you can
infer "Jack is an adult" from "Jack is 43 years old" - surely you
wouldn't use that to maintain that storing an age violated 1NF?
I think there is a difference Hugo. Stating the Jack is an adult is an
inference from correct mathematical application of a relation to store
values, whereas a NULL flag (whatever its meaning) is not a value. I
understand what you aim to achieve but it seems perhaps that relations
are the wrong vehicle.

Quote:
It still appears to be a hack imo, and one still ends up
with 3VL.

3VL is not a result of using NULL to represent missing information, but
a result of allowing missing information. IMO, there are only two
options: either you deal with missing information, and with the 3VL that
results from it -- or you somehow alter reality so that information is
never missing again, for any reason.
I think I disagree there. In predicate logic I can state (and
question) everything I know about the world without turning to 3VL, so
it must be possible in a data model surely? If I know an attribute
does or doesn't exist, then I can state that formally (with
existential quantifiers), and If I don't know anything about that
attribute...well I just don't say anything about it at all.

Quote:
There has to be a more elegant way....Regards, J.

I doubt it. Given this information:

* "Employee Jack is male"
* "Employee Mary is female"
* "Employee Jack is 43 years old"
* "Employee JJ is 32 years old"

how would you answer the below questions:

* "List all employees aged 40 and above".
* "What is the average age of our employees?"
* "For each employee, how many years left until retirement (assume a
country with laws for retirement age of 65)"
* "List all employees that are female, under 35 years old, or both"
* "Is JJ older than Mary?"
* etc
Yup, I catch your drift. However I believe these questions should have
"in propositions that you know" concatenated to them, so we still have
2VL, and leaving us humans in the real world to interpret the results.
The CWA is a nonsense imo. But I do agree that with CWA, in RM, 3VL is
very difficult to avoid, and nulls are incredibly tempting to use. My
personal conclusion therefore is that perhaps we are trying to
address /symptoms/ and not the root cause of the issue, and a step
back is required.

Quote:
Try to describe how you would answer those questions, in an elegant way
and without using anything even remotely resembling 3VL. If you succeed
at that, a database without 3VL is just around the corner - because the
hardest part is not implementing, but finding out what to implement.
And I shall call it the Propositional Model. And it shall glisten like
golden sunlight, before industry completely ignores it. And I'll
publish it in a journal wellllll in advance of letting you lot rip it
to shreds

Quote:
I can give answers to all questions above, but I have to use 3VL in all
cases. For me, that signifies that 3VL is part of reality and hence
can't be left out of a database that attempts to model (aspects of)
reality.

Best, Hugo
Happy New Year, Jim.


Reply With Quote
  #25  
Old   
Marshall
 
Posts: n/a

Default Re: NULLs - 12-29-2007 , 12:34 PM



On Dec 28, 2:59 pm, Hugo Kornelis
<h... (AT) perFact (DOT) REMOVETHIS.info.INVALID> wrote:
Quote:
3VL is not a result of using NULL to represent missing information, but
a result of allowing missing information. IMO, there are only two
options: either you deal with missing information, and with the 3VL that
results from it -- or you somehow alter reality so that information is
never missing again, for any reason.
Full normalization is a third option.


Quote:
There has to be a more elegant way....Regards, J.

I doubt it. Given this information:

* "Employee Jack is male"
* "Employee Mary is female"
* "Employee Jack is 43 years old"
* "Employee JJ is 32 years old"

how would you answer the below questions:

* "List all employees aged 40 and above".
* "What is the average age of our employees?"
* "For each employee, how many years left until retirement (assume a
country with laws for retirement age of 65)"
* "List all employees that are female, under 35 years old, or both"
* "Is JJ older than Mary?"
* etc
Given the schema of the database you have supplied above,
the questions you propose are not askable, in the same way
as if you had proposed the question "which animals are
marsupial?"

Asking which employees are 40 and over *requires* the
predicate (employee, age) be present, either directly or
as a projection or join or whatever. Since it isn't, you can't
*ask* the question. Note that I didn't say "answer" the
question--you can't even *ask* the question. It's not well
formed.

(You could of course ask the question "for those employees
for whom we know their age, which are 40 and over?")


Quote:
[...]

I can give answers to all questions above, but I have to use 3VL in all
cases.
Not exactly. You can give answers to questions that are similar
to the ones asked above, but modified in the way I mention
parenthetically.


Quote:
For me, that signifies that 3VL is part of reality and hence
can't be left out of a database that attempts to model (aspects of)
reality.
I agree that missing information is often a fact of life,
but I don't agree that NULL or 3VL necessarily follows.
There are other options, and furthermore, I think our
experience with SQL shows that the other options are
better ones.


Marshall


Reply With Quote
  #26  
Old   
JOG
 
Posts: n/a

Default Re: NULLs - 12-29-2007 , 12:40 PM



I believe myself, david and marshall have all just said the same thing
in different ways...

Reply With Quote
  #27  
Old   
Bob Badour
 
Posts: n/a

Default Re: NULLs - 12-29-2007 , 12:54 PM



JOG wrote:

Quote:
I believe myself, david and marshall have all just said the same thing
in different ways...
Do you find that all that remarkable?


Reply With Quote
  #28  
Old   
Brian Selzer
 
Posts: n/a

Default Re: NULLs - 12-30-2007 , 01:23 PM




"stevedtrm" <stevedtrm (AT) hotmail (DOT) com> wrote

Quote:
If everyone is clear NULLS shouldn't be used, why the debate as to what
to do about them ?

Because SQL allows NULL and even promotes the idea that NULL solves some
problem instead of introducing many.

So everyone is agreed that NULLs shouldn't appear anywhere, and its
just a matter of time before NULLS become a legacy problem and a
relational language supercedes SQL?

No, Steve, not everyone is agreed.

Quote:
Are the two solutions I suggested before the widely accepted as
resolutions to the two problems NULLs were introduced to eradicate?

Missing information? Then surely there should just be no tuple?

A row may contain a lot more information that would otherwise be useful.
Would you tell a potential customer that you won't serve them because you
don't know everything about them? I think not! Haven't you been asked for
your e-mail address at a retailer? What would you think if the cashier told
you to get lost for refusing to supply it?

Quote:
To indicate that there can be no value? Why not a seperate table with
a boolean value in the non-key column?
In my opinion, null should only ever be used to indicate "There should be a
value here, but it hasn't been supplied." That there should be a value or
that there must be a value are structural issues, and therefore should not
be subject to interpretation. "No value here" invites interpretation,
because it begs the question, "Should there have even been a value?"

Structural information is lost by separating a nullable column into its own
table: the fact that there should always be a value, even if it hasn't been
supplied is lost. That's not to say that it isn't possible to eliminate
nulls altoghether, it's just that you need several separate tables instead
of one: one table for the values that must always be present, one table for
each nullable column (or set of columns in those cases where whenever one
value is supplied, another value must also be supplied) that indicates
whether or not there should be a value (or set of values), and one table
with a column (or set of columns) for those values that just happen to have
been supplied.

Note also that the spectre of 3VL disappears when null indicates only "there
should be a value here, but it hasn't been supplied." Since there should be
a value, and only one value, a null can simply be replaced by an exclusive
disjunction that encompasses all of the values in the attribute's domain.
So there is no need for 3VL, 2VL suffices. (For infinite domains, a
quantifier could be used that means instead of "there is...," "there is
exactly one....")




Reply With Quote
  #29  
Old   
JOG
 
Posts: n/a

Default Re: NULLs - 12-30-2007 , 02:09 PM



On Dec 30, 7:23 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
Quote:
"stevedtrm" <steved... (AT) hotmail (DOT) com> wrote in message

news:a3716365-7bc4-4a75-ab58-ee7ac957c299 (AT) n20g2000hsh (DOT) googlegroups.com...

If everyone is clear NULLS shouldn't be used, why the debate as to what
to do about them ?

Because SQL allows NULL and even promotes the idea that NULL solves some
problem instead of introducing many.

So everyone is agreed that NULLs shouldn't appear anywhere, and its
just a matter of time before NULLS become a legacy problem and a
relational language supercedes SQL?

No, Steve, not everyone is agreed.

Are the two solutions I suggested before the widely accepted as
resolutions to the two problems NULLs were introduced to eradicate?

Missing information? Then surely there should just be no tuple?

A row may contain a lot more information that would otherwise be useful.
Would you tell a potential customer that you won't serve them because you
don't know everything about them? I think not! Haven't you been asked for
your e-mail address at a retailer? What would you think if the cashier told
you to get lost for refusing to supply it?

To indicate that there can be no value? Why not a seperate table with
a boolean value in the non-key column?

In my opinion, null should only ever be used to indicate "There should be a
value here, but it hasn't been supplied." That there should be a value or
that there must be a value are structural issues, and therefore should not
be subject to interpretation. "No value here" invites interpretation,
because it begs the question, "Should there have even been a value?"

Structural information is lost by separating a nullable column into its own
table: the fact that there should always be a value, even if it hasn't been
supplied is lost.
I think this could be put better. It is not that there should always
be a value, but rather that "a value exists in the real world".
However I don't really see what is structurally lost by complete
decomposition to eliminate nulls, that can't be remedied via
constraints.

Quote:
That's not to say that it isn't possible to eliminate
nulls altoghether, it's just that you need several separate tables instead
of one: one table for the values that must always be present, one table for
each nullable column (or set of columns in those cases where whenever one
value is supplied, another value must also be supplied) that indicates
whether or not there should be a value (or set of values), and one table
with a column (or set of columns) for those values that just happen to have
been supplied.

Note also that the spectre of 3VL disappears when null indicates only "there
should be a value here, but it hasn't been supplied." Since there should be
a value, and only one value, a null can simply be replaced by an exclusive
disjunction that encompasses all of the values in the attribute's domain.
So there is no need for 3VL, 2VL suffices. (For infinite domains, a
quantifier could be used that means instead of "there is...," "there is
exactly one....")


Reply With Quote
  #30  
Old   
David Cressey
 
Posts: n/a

Default Re: NULLs - 12-30-2007 , 04:42 PM




"Brian Selzer" <brian (AT) selzer-software (DOT) com> wrote

Quote:
"stevedtrm" <stevedtrm (AT) hotmail (DOT) com> wrote in message
news:a3716365-7bc4-4a75-ab58-ee7ac957c299 (AT) n20g2000hsh (DOT) googlegroups.com...
If everyone is clear NULLS shouldn't be used, why the debate as to
what
to do about them ?

Because SQL allows NULL and even promotes the idea that NULL solves
some
problem instead of introducing many.

So everyone is agreed that NULLs shouldn't appear anywhere, and its
just a matter of time before NULLS become a legacy problem and a
relational language supercedes SQL?


No, Steve, not everyone is agreed.

Are the two solutions I suggested before the widely accepted as
resolutions to the two problems NULLs were introduced to eradicate?

Missing information? Then surely there should just be no tuple?


A row may contain a lot more information that would otherwise be useful.
Would you tell a potential customer that you won't serve them because you
don't know everything about them? I think not! Haven't you been asked
for
your e-mail address at a retailer? What would you think if the cashier
told
you to get lost for refusing to supply it?

To indicate that there can be no value? Why not a seperate table with
a boolean value in the non-key column?

In my opinion, null should only ever be used to indicate "There should be
a
value here, but it hasn't been supplied."
A quibble.

I would say that it indicates that there is space for a value here, but no
value has been place in the space.

As for the rest of it, we've had this discussion in c.d.t. before, at
least a dozen times.





Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.