![]() | |
#41
| |||
| |||
|
|
On Dec 28, 10:59 pm, Hugo Kornelis I doubt it. Given this information: * "Employee Jack is male" * "Employee Mary is female" * "Employee Jack is 43 years old" * "Employee JJ is 32 years old" how would you answer the below questions: * "List all employees aged 40 and above". * "What is the average age of our employees?" * "For each employee, how many years left until retirement (assume a country with laws for retirement age of 65)" * "List all employees that are female, under 35 years old, or both" * "Is JJ older than Mary?" * etc Yup, I catch your drift. However I believe these questions should have "in propositions that you know" concatenated to them, so we still have 2VL, and leaving us humans in the real world to interpret the results. The CWA is a nonsense imo. But I do agree that with CWA, in RM, 3VL is very difficult to avoid, and nulls are incredibly tempting to use. My personal conclusion therefore is that perhaps we are trying to address /symptoms/ and not the root cause of the issue, and a step back is required. |
#42
| |||
| |||
|
|
In the presence of missing information the query "List all employees aged 40 and above" cannot be answered at all. Instead the query should be "List all employees known by the HR department to be aged 40 and above" (snip) |
|
In the case of the question "Is JJ older than Mary?" This should be "Is it known to HR department that JJ is older than Mary?" in which case the answer is no. See, no 3VL required. We can also ask whether it is known to HR department that JJ is not older than Mary". The answer is no as well; so we have been able to tell (using only efficient 2VL) that the DB doesn't know who it older. |
#43
| ||||
| ||||
|
|
That would be the case if I were intentionally storing the fact that an age is not on file, and using (or rather: abusing) the age column for that. But I'm not. I'm just storing ages I have on file. Then surely you are trying to fit propositions of different structures into a mathematical construct (i.e. a relation) that proscribes such an undertaking. The conclusion must be that relations will not do the job that you require of them? |
|
Yes, if there is no value (as represented by the NULL marker) in this column in a specific row, then you can infer from this that I have no age on file for the person described by that row - but this is an unintended though unavoidable by-effect. Just as the fact that you can infer "Jack is an adult" from "Jack is 43 years old" - surely you wouldn't use that to maintain that storing an age violated 1NF? I think there is a difference Hugo. Stating the Jack is an adult is an inference from correct mathematical application of a relation to store values, whereas a NULL flag (whatever its meaning) is not a value. I understand what you aim to achieve but it seems perhaps that relations are the wrong vehicle. |
|
It still appears to be a hack imo, and one still ends up with 3VL. 3VL is not a result of using NULL to represent missing information, but a result of allowing missing information. IMO, there are only two options: either you deal with missing information, and with the 3VL that results from it -- or you somehow alter reality so that information is never missing again, for any reason. I think I disagree there. In predicate logic I can state (and question) everything I know about the world without turning to 3VL, so it must be possible in a data model surely? If I know an attribute does or doesn't exist, then I can state that formally (with existential quantifiers), and If I don't know anything about that attribute...well I just don't say anything about it at all. |
|
Happy New Year, Jim. |
#44
| |||
| |||
|
|
On Dec 28, 2:59 pm, Hugo Kornelis h... (AT) perFact (DOT) REMOVETHIS.info.INVALID> wrote: 3VL is not a result of using NULL to represent missing information, but a result of allowing missing information. IMO, there are only two options: either you deal with missing information, and with the 3VL that results from it -- or you somehow alter reality so that information is never missing again, for any reason. Full normalization is a third option. |
#45
| |||
| |||
|
|
Hugo Kornelis schreef: See my reply to JOG's message. Hi Hugo, so glad to see you again. :-) |
|
BTW, looking for a nice job ^_^ ? |
#46
| |||
| |||
|
|
On Fri, 28 Dec 2007 17:59:13 -0800 (PST), David BL wrote: (snip)>In the presence of missing information the query * *"List all employees aged 40 and above" cannot be answered at all. *Instead the query should be * *"List all employees known by the HR department to be aged 40 and above" (snip) In the case of the question * *"Is JJ older than Mary?" This should be * *"Is it known to HR department that JJ is older than Mary?" in which case the answer is no. * See, no 3VL required. We can also ask whether it is known to HR department that JJ is not older than Mary". * The answer is no as well; so we have been able to tell (using only efficient 2VL) that the DB doesn't know who it older. Hi David, Happy new year to you and all else still readin gthis thread. You make some fair points. Thanks for that. In a purely pragmatic sense, are there any situations where this way of looking at things produces different results than the traditional SQL approach? It doesn't in the case of the examples above: * "List all employees known by the HR department to be aged 40 and above" will exclude employees with no age on file, just as "SELECT name FREM employees WHERE age > 40" will exclude employees with age NULL. |
|
Best, Hugo |
#47
| |||
| |||
|
|
When implementing relations in what is currently usually called a relational database, it *is* possible to create a table per elementary fact type. In that case, no NULLs are ever needed. The facts * "Employee Jack is male" * "Employee Mary is female" * "Employee Jack is 43 years old" * "Employee JJ is 32 years old" are represented in two tables, like this: EmployeeAge EmployeeGender name | age name | gender -----+----- -----+-------- Jack | 43 Mary | female JJ | 32 Jack | male But for various reasons (maintainability and performance being the most important), it is often preferable to combine these relations in a single table, like this: Employee name | age | gender -----+-----+------- Jack | 43 | male JJ | 32 | Mary | | female Some way has to be found to ensure that JJ's gender is not mistaken as '' (empty string), and to ensure that no datatype mismatch error is raised on the blank age of Mary (as a blank is not numeric data). The best (or, if you prefer, least bad) answer to that problem is NULL. |
|
Coming from this background, you'll probably understand why I have little issue with NULL - as it is (to me) just a marker that says "hey, if you hadn't insisted on stuffing the fact types EmployeeAge and EmployeeGender in the same table, there simply would have been no row for Mary in the EmployeeAge table". |
#48
| |||
| |||
|
|
Hugo Kornelis <hugo (AT) perFact (DOT) REMOVETHIS.info.INVALID> wrote: [snip] When implementing relations in what is currently usually called a relational database, it *is* possible to create a table per elementary fact type. In that case, no NULLs are ever needed. The facts * "Employee Jack is male" * "Employee Mary is female" * "Employee Jack is 43 years old" * "Employee JJ is 32 years old" are represented in two tables, like this: EmployeeAge EmployeeGender name | age name | gender -----+----- -----+-------- Jack | 43 Mary | female JJ | 32 Jack | male But for various reasons (maintainability and performance being the most important), it is often preferable to combine these relations in a single table, like this: Employee name | age | gender -----+-----+------- Jack | 43 | male JJ | 32 | Mary | | female Some way has to be found to ensure that JJ's gender is not mistaken as '' (empty string), and to ensure that no datatype mismatch error is raised on the blank age of Mary (as a blank is not numeric data). The best (or, if you prefer, least bad) answer to that problem is NULL. Which make the one-table example less maintainable because of the special cases. As to performance, I quite often query for very limited numbers of columns, so the issue of columns being split up into different tables is not the problem it might appear to be. |
#49
| ||||
| ||||
|
|
On Fri, 28 Dec 2007 17:59:13 -0800 (PST), David BL wrote: (snip)>In the presence of missing information the query "List all employees aged 40 and above" cannot be answered at all. Instead the query should be "List all employees known by the HR department to be aged 40 and above" (snip) In the case of the question "Is JJ older than Mary?" This should be "Is it known to HR department that JJ is older than Mary?" in which case the answer is no. See, no 3VL required. We can also ask whether it is known to HR department that JJ is not older than Mary". The answer is no as well; so we have been able to tell (using only efficient 2VL) that the DB doesn't know who it older. Hi David, Happy new year to you and all else still readin gthis thread. You make some fair points. Thanks for that. In a purely pragmatic sense, are there any situations where this way of looking at things produces different results than the traditional SQL approach? |
|
It doesn't in the case of the examples above: * "List all employees known by the HR department to be aged 40 and above" will exclude employees with no age on file, just as "SELECT name FREM employees WHERE age > 40" will exclude employees with age NULL. * "Is it known to HR department that JJ is older than Mary?" will result in "no", so that the ELSE part of a conditional statement is executed, just as "IF (SELECT age FROM employees WHERE name = 'JJ') > (SELECT age FROM employees WHERE name = 'Mary')" will evaluate to unknown, skip the conditional action and execute the ELSE action. Let's look at some examples that are often used to illustrate the evilness of 3VL: aggregation and tautologies: * "Return the average of all ages known to HR department" will return the exact same value as AVG(age) currently does (i.e. ages not known to HR department are disregarded by the former, and NULL ages are disregarded by the latter). |
|
* "Is it known to HR department that JJ is older than Mary or that JJ is not older than Mary?" - neither is known, so this would evaluate to False OR False = False. While technicallly possibly correct, I'm quite sure that this result will enocunter the same amount of opposition as the Unknown result that SQL will produce on (simplified) "32 > NULL OR NOT (32 > NULL)" [which of coourse is again treated the same as False]. |
|
So, while you technically succeeded in answering the questions without using 3VL, this didn't solve the problems usually associated with NULL and 3VL. Instead, you found a way to introduce these problems into 2VL. |
#50
| |||
| |||
|
|
On Dec 30 2007, 12:00 am, JOG <j... (AT) cs (DOT) nott.ac.uk> wrote: On Dec 28, 10:59 pm, Hugo Kornelis I doubt it. Given this information: * "Employee Jack is male" * "Employee Mary is female" * "Employee Jack is 43 years old" * "Employee JJ is 32 years old" how would you answer the below questions: * "List all employees aged 40 and above". * "What is the average age of our employees?" * "For each employee, how many years left until retirement (assume a country with laws for retirement age of 65)" * "List all employees that are female, under 35 years old, or both" * "Is JJ older than Mary?" * etc Yup, I catch your drift. However I believe these questions should have "in propositions that you know" concatenated to them, so we still have 2VL, and leaving us humans in the real world to interpret the results. The CWA is a nonsense imo. But I do agree that with CWA, in RM, 3VL is very difficult to avoid, and nulls are incredibly tempting to use. My personal conclusion therefore is that perhaps we are trying to address /symptoms/ and not the root cause of the issue, and a step back is required. It seems to me that it's the OWA that invites 3VL. I'm thinking that the CWA simply means that an extension is uniquely defined given the intension and when intensional definitions are stated properly 2VL applies. I think OWA means a recorded extension may only be a subset of the actual extension corresponding to the stated intension, because OWA asserts that a system's knowledge is incomplete. It is the OWA that forces the logical inference system to deal with partial information. Have I got it wrong? |
![]() |
| Thread Tools | |
| Display Modes | |
| |