dbTalk Databases Forums  

teaching relational basics to people, questions

comp.databases.theory comp.databases.theory


Discuss teaching relational basics to people, questions in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #11  
Old   
Bob Badour
 
Posts: n/a

Default Re: teaching relational basics to people, questions - 11-21-2009 , 11:56 AM






Roy Hann wrote:

Quote:
Sampo Syreeni wrote:

[snip] And since
one would have to have a bona fide range datatype, building in
handling for infinite ranges would also be easy; that'd get rid of one
of the most persistent reasons why people incorporate nulls into
designs.

I think you are being excessively optimistic. The most persistent (and
most common) reason people incorporate nullable columns into designs is
because they have a misplaced desire to minimize the number of tables in
the design, and think that conflating multiple fact types in one table
is clever, efficient, and harmless.
Very well put!

Reply With Quote
  #12  
Old   
Reinier Post
 
Posts: n/a

Default Re: teaching relational basics to people, questions - 11-21-2009 , 02:12 PM






Sampo Syreeni wrote:

[...]

Quote:
variable precision would be available at least at the conceptual
level. At the physical one, it could always be implemented so that the
complexity if bounded by the less precise of the compared values, or
in fact limited to some fixed value at physical design time.
The most common types of database values can be represented
in constant space and compared in constant time. For other
types such as stringsor blobs of arbitrary size at least comparison
is linear in the size of the values. In your proposal, the values
are *arbitrary* reals. For all we know, they may be represented
by algorithms to compute them - in which case comparing them may be
pretty expensive - or even worse, they may be uncomputable.
You'll need to constrain them at least to the extent that comparing them
becomes a decidable problem.

Quote:
And since
one would have to have a bona fide range datatype, building in
handling for infinite ranges would also be easy; that'd get rid of one
of the most persistent reasons why people incorporate nulls into
designs.
Yes, I agree with your basic idea, but I really think you want to
replace 'arbitrary real number' with 'arbitrary precision integer'
or 'arbitrary precision integer quotient' or something similar.
They are similar to the strings and blobs we already have.

Quote:
--
Sampo
--
Reinier

Reply With Quote
  #13  
Old   
Mr. Scott
 
Posts: n/a

Default Re: teaching relational basics to people, questions - 11-22-2009 , 03:26 AM



"Roy Hann" <specially (AT) processed (DOT) almost.meat> wrote

Quote:
Sampo Syreeni wrote:

[snip] And since
one would have to have a bona fide range datatype, building in
handling for infinite ranges would also be easy; that'd get rid of one
of the most persistent reasons why people incorporate nulls into
designs.

I think you are being excessively optimistic. The most persistent (and
most common) reason people incorporate nullable columns into designs is
because they have a misplaced desire to minimize the number of tables in
the design, and think that conflating multiple fact types in one table
is clever, efficient, and harmless.
I think you are oversimplifying. The most persistent (and most common)
reason people incorporate nullable columns into designs is the not so
misplaced desire to provide for information that is relevant but not
required, and that doesn't necessarily involve conflating multiple fact
types in one table. Each table design represents a family of interdependent
predicates. For example, the the simple table design,

{EMPLOYEE, DEPENDENTS} KEY(EMPLOYEE)

could represent the interdependent predicates,

'there is an employee <EMPLOYEE>' iff
'employee <EMPLOYEE> claims <DEPENDENTS> dependents.'

The predicates are interdependent. Given a row (EMP1, 2), the assertion
that 'there is an employee EMP1' implies that 'employee EMP1 claims 2
dependents' due to the functional dependency EMPLOYEE -> DEPENDENTS, while
at the same time the assertion that 'employee EMP1 claims 2 dependents'
implies that 'there is an employee EMP1.'

Permitting nulls in the DEPENDENTS column doesn't change the fact that each
employee claims some number of dependents, so long as the predicates remain
interdependent. What it does do is to permit recording the assertion that
there is an employee even when the precise number of dependents to be
claimed is as yet unknown. I should point out that splitting up the table
in order to eliminate the nulls doesn't fix anything. For example,
splitting the table into

T1 {EMPLOYEE} and T2 {EMPLOYEE, DEPENDENTS}

where T1 represents the predicate,

'there is an employee <EMPLOYEE>,'

and T2 represents the predicate,

'employee <EMPLOYEE> claims <DEPENDENT> dependents,'

requires the introduction of referential constraints from T2 to T1 and also
from T1 to T2 since the original predicates are interdependent. Dropping
the referential constraint from T1 to T2 is a problem because the fact that
each employee claims some number of dependents is no longer explicitly
specified nor is it a logical consequence of the constraints that remain.

The problem you're referring to, the one which involves conflating multiple
fact types, occurs only when the relationship between the predicates is
disjunctive. For example,

'there is an employee <EMPLOYEE>' or
('there is an employee <EMPLOYEE>' iff
'employee <EMPLOYEE> claims <DEPENDENT> dependents.')

It should be easy to see that this mirrors the split tables T1 and T2
without the referential constraint from T1 to T2. While one can't determine
whether null indicates that an employee doesn't claim dependents or that the
number of dependents claimed is not yet known, one also can't determine
whether the absence of a row in T2 indicates that an employee doesn't claim
dependents or just that the number of dependents claimed is not yet known.

What troubles me most about eliminating nulls just for the sake of
eliminating nulls is that there isn't always a precise mapping between a
database scheme that permits nulls and one that doesn't. Information is
lost, such as the fact that each employee claims some number of dependents
in the example above.

Quote:
--
Roy


Reply With Quote
  #14  
Old   
vldm10
 
Posts: n/a

Default Re: teaching relational basics to people, questions - 11-22-2009 , 06:23 AM



On Nov 16, 8:42*pm, Sampo Syreeni <de... (AT) iki (DOT) fi> wrote:
Quote:
Right now it is to be expected that I will be spreading the good
relational word among my peers, in the near future. That is an
opportunity one doesn't want to fuck up; many enough have gone down
that road already. So I've been going over, and over, and over the
basics. Don't want them to be able to catch me off guard with the
minutiae, after all...

So now I bump into my first real surprise, and the chills immediately
go down my spine. That's Date et al.'s answer regarding the
implications between 6NF and DK/NF, athttp://www.dbdebunk.com/page/page/621935.htm
. In there they flat out state that DK/NF doesn't imply 6NF.

So, my first question is, can this really be true? I mean, this seems
highly suspect to me: since 6NF is a normal form like any other and is
as such defined by the constraints it upholds by design, and on the
other hand DK/NF is by definition a normal form where any constraint
whatsoever follows from the domain and key ones, shouldn't it be self-
evident that DK/NF logically implies 6NF, and in fact any other form?
No matter the fact that there might well be databases which could be
put into 6NF which cannot attain DK/NF? I think at the very least said
implication should follow at the price of making it a vacuous truth
(i.e. all (non-trivial?) 6NF databases could be such that they cannot
be put into DK/NF)?

In particular I suspect that the seeming lack of implication follows
from not treating the time dimension(s) on an equal footing with the
rest of the attributes in a relation. That, then, would at least to me
seem like a rather grave violation of the information principle.

The second point ain't as much a rebuke as a retort: I wonder whether
Date and Darwen chose their model of time -- which 6NF is defined on
top of -- based on convenience and familiarity, instead of some deeper
theoretical reasoning. To me the idea that time in a relational
database should be treated as a discrete, countably infinite set of
disjoint moments at a preset temporal granularity seems just
unnatural, and unnecessarily limiting.

To me it would seem much more natural to model time as a full
continuum of precise moments in time, and to constrain such real life
models using a finite (but otherwise unlimited cardinality) set of
FOPL constraints, relying on the full linear order on top of the
reals, on top. I.e. to model time using CW-complexes over the real
line (i.e. finite unions of open, closed and semi-closed intervals of
reals), in a fully discrete but also fully variable precision
approximation.

Model-wise, 6NF as D&D define it immediately generalizes to this --
all that needs to be changed is to quantify every defining formula
over the corresponding nondenumerable set -- yet the possibility of
rigorously modelling the interaction between open and closed intervals
as well is a considerable plus when dealing with general intersection
queries. I also consider the the fact that imputing any kind of chosen-
ahead granularity parameter into the basic model suddenly becomes
unnecessary a huge plus. So, do you think this sort of approach is
sound?

Finally, I of course have the firm intention of covering the
essentials, including the basics of dependency and normalization
theory at least upto 6NF and DK/NF. If my audience proves to be game,
I'd also like to mention in passing some of the lesser known, more
esoteric, and less fully researched topics in dependency theory like
(E)(B)MVD's (cf. e.g.http://www2.cs.uregina.ca/~butz/publications/ipmu00.pdf
), just to make sure people don't accidentally think they've mastered
the subject after what is a mere, hurried, introduction. I'd hope to
pique some genuine interest in the relational way of thought, among
people who perhaps haven't been exposed to the mindset, eventhough
otherwise more than capable in modelling data. If you could suggest
other ways to accomplish the feat, I would greatly appreciate a hint.
--
Sampo

Regarding DK/NF and 6NF you can see my solution at www.dbdesign11.com.
There I introduce “Simple Form” an effective solution which decomposes
any relation to Binary Relations. My solution distinguishes two kinds
of DBs. DBs that don’t maintain changes (Naive DBs) and DBs which
maintain changes (General DBs). The Simple Form for Naive DBs is given
in 6.5 while Simple Form for General DBs is given in 4.2.9.

DK/NF and 6NF don’t distinguish Naive DBs from General DBs.
R. Fagin in his paper wrote: “The very important practical question of
exactly when DK/NF can be obtained (and how to obtain it) is open.”
What is a purpose of 6NF if key is compound? I mean very compound.

The Simple Form don’t need any other NF. The binary schema can be
immediately constructed. Obviously Binary Relations are superior to
NFs.

The idea of Binary Relations can be applied to Binary Files (see
example 5). However the most general case is Binary Concept. Here I
introduce Identifier of Entity which is very important for
construction of entity’s abstraction. Binary Concepts are related to
facts see (3.4 – 3.9). I introduce facts as primitives i.e. a fact is
the smallest complete unit by means of which a semantic act.

Regarding time my solution is event oriented. There are only two kinds
of events related to information (see 3.1). So - no events imply no
time. These events are in the real world. DB’s events are also in the
real world. So my DB design is very much related to the real world and
always it enables determination of the corresponding real world
events.
Hope this can help to determine answers on your questions.
Vladimir Odrljin

Reply With Quote
  #15  
Old   
toby
 
Posts: n/a

Default Re: teaching relational basics to people, questions - 11-22-2009 , 04:34 PM



On Nov 18, 2:37*am, Casey Hawthorne <caseyhHAMMER_T... (AT) istar (DOT) ca>
wrote:
Quote:
Somebody once called me pretentious.

I replied, "Moi?"
A joke used in Fawlty Towers

Reply With Quote
  #16  
Old   
vldm10
 
Posts: n/a

Default Re: teaching relational basics to people, questions - 11-23-2009 , 05:23 AM



On Nov 22, 12:23*pm, vldm10 <vld... (AT) yahoo (DOT) com> wrote:
Quote:
On Nov 16, 8:42*pm, Sampo Syreeni <de... (AT) iki (DOT) fi> wrote:

Quote:
What is a purpose of 6NF if key is compound? I mean very compound.

Vladimir Odrljin- Hide quoted text -

To illustrate above question I will give you the following example:
1.
R1{A,B,C,D} and KEY{A,B,C,D}.
My question is: What is an equivalent set of 6NF relvars for R1?

2.
Now we want to maintain changes of attributes. Then R1 become:
R1{A, StartDateA, EndDateA,…,EndDateD}
My question is: What is an equivalent set of 6NF relvars for R1?

3.Now I can add for example one more relvar:
R2{A1, StartDateA1, EndDateA1,…, EndDateA10}.

4.
Now I have m-n relationship between R1 and R2. The relationship also
changes its attributes.
What is the key for this relationship
What is an equivalent set of 6NF relvars for this m-n relationship?

This is really simple example. There are dynamic db applications with
hundreds of relvars.
I can have an additional columns per one attribute. For examle in my
solution (see www.dbdesign11.com example 9) I have 6 additional
columns per one attribute.
For example I can insist that all date be in form YY, MM, DD, HH, MIN,
SEC – but I must take date as timestamp (technical reasons - in the
theory)
What will be 6NF equivalent for last m-n relationship?

Vladimir Odrljin

Reply With Quote
  #17  
Old   
Roy Hann
 
Posts: n/a

Default Re: teaching relational basics to people, questions - 11-23-2009 , 06:05 AM



Mr. Scott wrote:

Quote:
"Roy Hann" <specially (AT) processed (DOT) almost.meat> wrote in message
news:3eudnYx_I7CpiZXWnZ2dnUVZ7qSdnZ2d (AT) pipex (DOT) net...
Sampo Syreeni wrote:

[snip] And since
one would have to have a bona fide range datatype, building in
handling for infinite ranges would also be easy; that'd get rid of one
of the most persistent reasons why people incorporate nulls into
designs.

I think you are being excessively optimistic. The most persistent (and
most common) reason people incorporate nullable columns into designs is
because they have a misplaced desire to minimize the number of tables in
the design, and think that conflating multiple fact types in one table
is clever, efficient, and harmless.

I think you are oversimplifying. The most persistent (and most common)
reason people incorporate nullable columns into designs is the not so
misplaced desire to provide for information that is relevant but not
required, and that doesn't necessarily involve conflating multiple fact
types in one table. Each table design represents a family of interdependent
predicates.
That last sentence is the crux of your objection to my comment. I guess
since you and I are implicitly talking about SQL and since SQL makes few
if any claims to fidelity with the relational model, you are entitled
to think whatever you like about tables that represent "a family of
interdependent predicates". Personally I have no idea what those are
supposed to behave like, but I am pretty sure that in practice they
must suck. Certainly every example I've ever had to deal with did. At
this point I invoke the Principle of Incoherence and drop out of the
discussion.

I will however allow that you may be right that yours is an even more
common reason people introduce nullable columns. It would be worth
investigating which really is most common, so that appropriate remedial
training can be devised.

[much erudite-looking stuff snipped]

--
Roy

Reply With Quote
  #18  
Old   
Mr. Scott
 
Posts: n/a

Default Re: teaching relational basics to people, questions - 11-26-2009 , 05:19 AM



"Roy Hann" <specially (AT) processed (DOT) almost.meat> wrote

Quote:
Mr. Scott wrote:


"Roy Hann" <specially (AT) processed (DOT) almost.meat> wrote in message
news:3eudnYx_I7CpiZXWnZ2dnUVZ7qSdnZ2d (AT) pipex (DOT) net...
Sampo Syreeni wrote:

[snip] And since
one would have to have a bona fide range datatype, building in
handling for infinite ranges would also be easy; that'd get rid of one
of the most persistent reasons why people incorporate nulls into
designs.

I think you are being excessively optimistic. The most persistent (and
most common) reason people incorporate nullable columns into designs is
because they have a misplaced desire to minimize the number of tables in
the design, and think that conflating multiple fact types in one table
is clever, efficient, and harmless.

I think you are oversimplifying. The most persistent (and most common)
reason people incorporate nullable columns into designs is the not so
misplaced desire to provide for information that is relevant but not
required, and that doesn't necessarily involve conflating multiple fact
types in one table. Each table design represents a family of
interdependent
predicates.

That last sentence is the crux of your objection to my comment. I guess
since you and I are implicitly talking about SQL and since SQL makes few
if any claims to fidelity with the relational model, you are entitled
to think whatever you like about tables that represent "a family of
interdependent predicates". Personally I have no idea what those are
supposed to behave like,
That sentence applies not only to SQL tables but also to Codd's time-varying
relations and Date and Darwen's relvars. Date defines a relvar's predicate
as the conjunction of all of the constraints that apply to it, but I think
he's wrong. The logical connective should be IFF rather than AND. The
difference is subtle, and may at first glance appear problematic since IFF
is true whenever none of its operands are true, but under the closed world
assumption, the only atomic formulas that are ever represented in the
database are those that are supposed to be true, so there is no harm in
choosing IFF over AND. The main reason I think the connective should be IFF
rather than AND involves deletes. While inserting a row effectively asserts
that all of the atomic formulas represented by the row are true, regardless
of the whether the logical connective is IFF or AND, the same can't be said
for deletes. When the logical connective is IFF, deleting a row effectively
denies that any (not all) of the atomic formulas represented by the row are
true, but when the logical connective is AND, deleting a row effectively
denies that all (not any) of the atomic formulas represented by the row are
true, which can be even if only one isn't. If some but not all of the
atomic formulas represented by a row were true, but a row can't be in the
table unless they're all true, then where is that positive information to be
stored? Without anywhere to store it, the database is inconsistent.

Quote:
but I am pretty sure that in practice they
must suck. Certainly every example I've ever had to deal with did. At
this point I invoke the Principle of Incoherence and drop out of the
discussion.

I will however allow that you may be right that yours is an even more
common reason people introduce nullable columns. It would be worth
investigating which really is most common, so that appropriate remedial
training can be devised.

[much erudite-looking stuff snipped]

--
Roy

Reply With Quote
  #19  
Old   
Clifford Heath
 
Posts: n/a

Default Re: teaching relational basics to people, questions - 11-26-2009 , 11:13 PM



vldm10 wrote:
Quote:
Regarding DK/NF and 6NF you can see my solution at www.dbdesign11.com.
There I introduce “Simple Form” an effective solution which decomposes
any relation to Binary Relations.
Your approach seems very similar in general to NIAM
and other fact-oriented modeling approaches like ORM2.
They are built on "Elementary Form", which in ORM2 can
include fact types that are ternary and higher (though
these are always trivial to binarise, the higher-order
fact types are useful in modeling because they reflect
natural verbalisations).

The main difference is that in fact orientation, the
model is expected always to be *constructed* in elementary
form. For efficiency of storage and access, uniqueness
constraints allow automatic and invisible aggregation
into non-elementary structures. This means you can model
in elementary form, and get a correct and efficient schema
which holds the same facts to use with your SQL DBMS. This
is the implementation principle behind my "Constellation
Query Language" as well.

You use the term "attribute" a lot. In fact orientation,
there are no attributes, they are just fact types that
encompass a functional dependency between objects. To drop
the notion of "attribute" is a major win for a number of
reasons, including that it isn't a clearly-defined concept
in natural usage; vis the conflict between relational and
object-oriented aggregation; and also the fact that
"attribute migration" is a major driver of schema evolution,
hence project scope inflation and failure.

In any case, before you go claiming your work as "novel",
it'd be good if you did some reading on fact orientation
and see whether others haven't been there before you, like,
for example, more than twenty years ago ;-). This isn't a
criticism actually; I haven't looked in sufficient depth
at your work to decide whether you've done something new.
But a lot of it does seem very familiar...

Quote:
Regarding time my solution is event oriented.
I like this view - it accords with my thoughts on the
quantization of time where the general principle that
"time is just G*d's way of keeping everything from happening
at once" ;-).

Terry Halpin (creator of ORM2) has a recent article series
on temporal modeling published at brcommunity.com, if
you're interested.

Clifford Heath, Data Constellation, http://dataconstellation.com
Agile Information Management and Design.

Reply With Quote
  #20  
Old   
compdb@hotmail.com
 
Posts: n/a

Default Re: teaching relational basics to people, questions - 11-27-2009 , 09:53 PM



On Nov 26, 2:19 am, "Mr. Scott" <do_not_re... (AT) noone (DOT) com> wrote:

You really don't understand the relational model.

Quote:
That sentence applies not only to SQL tables but also to Codd's time-varying
relations and Date and Darwen's relvars. Date defines a relvar's predicate
as the conjunction of all of the constraints that apply to it, but I think
he's wrong.
The "relvar predicate" that is the conjunction of all its constraints
is the characteristic predicate of the set of tuples that can ever be
in the relation.
Date now calls this the "total relvar constraint" but it used to be
called the
"internal predicate" (which term is now deprecated by Date).
But there is a different "relvar predicate", which the designer
specifies
to tell the user what the relvar is saying about the world.
Date now calls this the "relvar predicate"; it used to be called the
"external predicate" (which term is now deprecated by Date).

The relationship between the two "predicates" is as follows.
The designer specifies the relvar predicate so a user updating the
database
can observe the world and figure out whether a given tuple makes it
true
and so a user looking at the database can find out what is true of the
world.
Next the designer figures out all the possible tuples that could ever
make a
relvar predicate true of the world and then writes the total relvar
constraint
so the dbms can tell the user they fmade an error if they ever try to
put some other tuple into the relation.

Quote:
The logical connective should be IFF rather than AND. The
difference is subtle, and may at first glance appear problematic since IFF
is true whenever none of its operands are true, but under the closed world
assumption, the only atomic formulas that are ever represented in the
database are those that are supposed to be true, so there is no harm in
choosing IFF over AND. The main reason I think the connective should be IFF
rather than AND involves deletes. While inserting a row effectively asserts
that all of the atomic formulas represented by the row are true
A row in a relation asserts one proposition for each syntactically
valid row.
If a such a row is in the relation then the proposition is
its relvar predicate with its free variables replaced by attribute
values.
If such a row is not in the relation then the proposition is
its relvar predicate with its free variables replaced by attribute
values, negated.
Since these are all asserted, that's the same as asserting their
conjunction.

If one were to use the "open world assumption" then the latter
propositions
are not asserted. But then a dbms cannot calculate the answer for a
query that
can only be written using a "NOT"
(or in relational algebra, MINUS, assuming the other primitives are
JOIN, UNION, PROJECT, EQUALS-RESTRICT and RENAME).

, regardless
Quote:
of the whether the logical connective is IFF or AND,
I don't know what "atomic formulas" you think are being ANDed/IFFed.
But if the database were asserting the IFF of some propositions then
it would be saying they are either all true or all false.
So a user would know that either they are all true or all false, but
not which.
(That's a rather moot hypothetical since it *isn't* what a database
asserts.)
Whereas in the relational model the user knows they are all true.
(That is, all the propositions above, some of which are ground terms
and some of which are negated ground terms, are all true.)

Quote:
the same can't be said
for deletes. When the logical connective is IFF, deleting a row effectively
denies that any (not all) of the atomic formulas represented by the row are
true, but when the logical connective is AND, deleting a row effectively
denies that all (not any) of the atomic formulas represented by the row are
true, which can be even if only one isn't.
I don't know what you mean by "the atomic formulas represented by a
row".
If one thinks of a relation's predicate as being an arbitrary wff
(it's usually thought of asbeing in natural language,
or wffs with natural language for ground terms)
then the truth value of a particular constituent atomic formula when
the
overall predicate is true depends (like usual in predicate logic)
on the connectives/quantifiers.
A row does not have a bunch of things ANDed together;
the propositions above are ANDed together (or more simply, just
asserted).

After a delete a row it is no longer in the relation.
So the database is asserting the negation of the proposition that you
get by
substituting its attribute values for attribute names in the
proposition.

Quote:
If some but not all of the
atomic formulas represented by a row were true, but a row can't be in the
table unless they're all true, then where is that positive information to be
stored? Without anywhere to store it, the database is inconsistent.
Once again, a row does not in general assert the conjunction of wffs.

If you think otherwise please explain very clearly,
hopefully with a fully worked out simple example,
because you won't be able to assume I know what you
are talking about, because what you have written is not
how the relational model works.
From this message you should also be able to work that example
out "correctly".

philip

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.