dbTalk Databases Forums  

Another view on analysis and ER

comp.databases.theory
  #41  
JOG
Re: Another view on analysis and ER - 12-06-2007 , 02:36 PM






On Dec 6, 7:49 pm, David BL <davi... (AT) iinet (DOT) net.au> wrote:
Quote:
[snip general agreement]
Nevertheless we
believe that the identifier can be used successfully in the real
world. This could depend on the population statistics, such as the
way we identify people around us by their facial characteristics, and
it is helpful that people don't tend to have extensive plastic
surgery, exchange body parts on a regular basis or cross dress. This
allows them to be identified by names like "John".

I'm not sure I'm following this entirely David. "John" is just another
attribute, and it can be used to refer to someone, but we know it
wouldn't identify them uniquely. The reason we can get away with using
it in day to day conversation is because humans are incredibly good at
resolving context (whereas a computer is not). We'd only use someone's
first name alone when we know its use will not be ambiguous.

My only point is that in particular contexts we are comfortable with
thinking of humans as entities that we are able to identify.

Er, ok. I can't disagree with you there. (Although I think I'm missing
where you were going with that point.)

Quote:
By contrast I think a relationship is characterised as only being
identified by the entities that it relates.

Well, coupla things. Relationships can have attributes too, and they
can be part of the identifying set.

Agreed, but that can be accommodated without upsetting the distinction
I'm alluding to.

Ok. gotcha.

Quote:
[schnnippp]
Now I see the same distinction being made in a propositional
encoding.

I'm not sure why - there are no entities in a propositional encoding,
just roles and values.

The RM formalism has attributes, domains, tuples and relations. Are
roles part of the formalism or outside?

Apologies, my fault there. I use the term roles as per the ORM, but I
should have said attributes (even though they are pretty analogous).
I prefer to use the term role because it is more specific, whereas
attribute is a subtly ambiguous word - sometimes it's used to denote
attribute name, sometimes the attribute-value, and sometimes both. But
yes anyhow, I meant "there are no entities in a propositional
encoding, just attributes and values".

Quote:
I see entities come into the propositional encoding in the
instantiations of the intentional definitions of the predicates. This
is outside the mathematical formalism. However, of what practical use
is a relation without a well defined intensional definition?

Yup, I see what you're saying (if you meant intenSional definitions
that is!), but I'm not sure I agree. Intensional definitions only
refer to rules concerning valid values for predicate variables, not
valid entities. If I've missed a trick there, perhaps it's worth an
example?

Quote:
[ker-shhhnip]
Summarizing the above I gathered that you are saying that:
RELATIONSHIP: married(Husband, Wife, Location)
ENTITY: married(MarriageId, Husband, Wife, Location)
So the only difference is that the entity has a marriageID? I am not
clear why you think the addition of this surrogate would change a
relationship into an entity!

In the first case the marriage is a relationship because it is only
identified indirectly by the entities that are involved. In the
second case the marriage has been directly identified.

Ok, sorry if I'm being dense, but are you saying that the second case
is an entity because it can be identified without reference to another
entity? And would a logical consequence be therefore that no
identifying attributes of an entity may be entities themselves?
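
[Editorial sketch] The two married(...) forms under discussion can be written down as two relation schemas that differ only in their declared key. The sketch below uses Python's sqlite3 module. The table names married_rel and married_ent, the choice of {husband, wife} as the key of the first form, and all row values are assumptions made for illustration, not anything stated in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "Relationship" form: the marriage is identified only through the
    -- people it relates (assumed key: husband + wife).
    CREATE TABLE married_rel (
        husband  TEXT NOT NULL,
        wife     TEXT NOT NULL,
        location TEXT NOT NULL,
        PRIMARY KEY (husband, wife)
    );
    -- "Entity" form: the marriage carries its own identifier.
    CREATE TABLE married_ent (
        marriage_id TEXT NOT NULL PRIMARY KEY,
        husband     TEXT NOT NULL,
        wife        TEXT NOT NULL,
        location    TEXT NOT NULL
    );
""")


def try_insert(table, row):
    """Insert one row; return False if a key constraint rejects it."""
    placeholders = ", ".join("?" for _ in row)
    try:
        conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", row)
        return True
    except sqlite3.IntegrityError:
        return False


# Hypothetical data: the same couple recorded twice.
rel_first = try_insert("married_rel", ("Jim", "Ann", "Perth"))
rel_again = try_insert("married_rel", ("Jim", "Ann", "Oslo"))
ent_first = try_insert("married_ent", ("M-1", "Jim", "Ann", "Perth"))
ent_again = try_insert("married_ent", ("M-2", "Jim", "Ann", "Oslo"))
```

Under these assumed keys the only observable difference is which combinations of values the database will accept: the first form rejects a second row for the same couple, the second form accepts it. Whether that difference deserves the names "relationship" and "entity" is exactly what the posters disagree about.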

Quote:
That seems
like a significant difference in the logical layer. It shows up in
the intensional definitions where a marriage takes on a "role" as an
entity.

In the second case there must be some underlying reason to introduce a
marriage identifier. That reason points at a significant difference
in requirements. Don't assume the marriage identifier is a surrogate
id! Instead assume this is a well conceived design, and it's a
natural identifier.

Well hey, I don't think a surrogate means a poorly conceived design.
But I take your point, the Marriage ID in this case is coming from
some external source, and hasn't been instigated by the designer of
this modeller. Regards, J.



#42
Jonathan Leffler
Re: Another view on analysis and ER - 12-06-2007 , 05:32 PM






Jon Heggland wrote:
Quote:
Quoth Bob Badour:
Keys and references are logical issues and not physical issues. Physical
issues affect only performance.

Keys are, but key /primacy/ is not. If one ignores any performance
differences between a primary key and any other key (as of course one
should, although it may not be possible in current SQL systems), the
remaining difference is merely syntactical convenience---hardly a
logical issue.

Consider a database containing a relation containing information about
'the elements' - as in hydrogen, helium, etc.

The elements table has 3 keys (candidate keys):
Element name
Atomic number
Element symbol

Now, consider what else is stored in the database. For the analysis of
isotopes, the atomic number is the important key - the different
isotopes of hydrogen all share the same atomic number, but have
different names (deuterium and tritium) even though chemically they are
all hydrogen.

For the analysis of chemical compounds, it is much more familiar to use
the element symbol - more people have come across H2O and CO2 than are
familiar with 1/2, 8/1 and 6/1, 8/2 (where I'm using atomic number /
multiplicity in the second notation). I'm glossing over some notational
inconveniences (consider the relational representation of your old
friend C2H5OH, for example), but the point remains - for some purposes,
the better key to use is atomic number and for other purposes, the
better key to use is element symbol.

Which key to use is a logical issue here, isn't it?
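
[Editorial sketch] The elements example can be made concrete with three candidate keys on one table, where each referencing table picks the key that suits its purpose. This is only a sketch using Python's sqlite3 module. The table and column names (element, isotope, compound_part), the helper functions and the handful of rows are illustrative assumptions, not a schema given in the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    PRAGMA foreign_keys = ON;
    -- Three candidate keys; none is singled out as primary.
    CREATE TABLE element (
        atomic_number INTEGER NOT NULL UNIQUE,
        symbol        TEXT    NOT NULL UNIQUE,
        name          TEXT    NOT NULL UNIQUE
    );
    -- Isotope data references elements by atomic number.
    CREATE TABLE isotope (
        atomic_number INTEGER NOT NULL REFERENCES element (atomic_number),
        mass_number   INTEGER NOT NULL,
        UNIQUE (atomic_number, mass_number)
    );
    -- Compound data references elements by symbol.
    CREATE TABLE compound_part (
        compound     TEXT    NOT NULL,
        symbol       TEXT    NOT NULL REFERENCES element (symbol),
        multiplicity INTEGER NOT NULL,
        UNIQUE (compound, symbol)
    );
""")
conn.executemany("INSERT INTO element VALUES (?, ?, ?)",
                 [(1, "H", "hydrogen"), (6, "C", "carbon"), (8, "O", "oxygen")])
conn.executemany("INSERT INTO isotope VALUES (?, ?)", [(1, 1), (1, 2), (1, 3)])
conn.executemany("INSERT INTO compound_part VALUES (?, ?, ?)",
                 [("water", "H", 2), ("water", "O", 1)])


def isotopes_of(name):
    """Mass numbers of the isotopes of the element with this name."""
    rows = conn.execute(
        "SELECT i.mass_number FROM isotope i "
        "JOIN element e ON e.atomic_number = i.atomic_number "
        "WHERE e.name = ? ORDER BY i.mass_number", (name,))
    return [mass for (mass,) in rows]


def formula(compound):
    """Formula built from element symbols, in insertion order."""
    rows = conn.execute(
        "SELECT symbol, multiplicity FROM compound_part "
        "WHERE compound = ? ORDER BY rowid", (compound,))
    return "".join(f"{s}{m if m > 1 else ''}" for s, m in rows)
```

In this sketch the DBMS enforces all three keys identically. The choice of which one each foreign key references is made per referencing table, and nothing in the schema needs one of them to be designated primary.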

--
Jonathan Leffler #include <disclaimer.h>
Email: jleffler (AT) earthlink (DOT) net, jleffler (AT) us (DOT) ibm.com
Guardian of DBD::Informix v2007.0914 -- http://dbi.perl.org/

publictimestamp.org/ptb/PTB-1963 sha224 2007-12-06 21:00:03
0AC762E1452FAE2896292EA605A8D66B9FEE09F8E55C0B8707 3F31DA


#43
David BL
Re: Another view on analysis and ER - 12-06-2007 , 06:41 PM



On Dec 7, 5:36 am, JOG <j... (AT) cs (DOT) nott.ac.uk> wrote:
Quote:
On Dec 6, 7:49 pm, David BL <davi... (AT) iinet (DOT) net.au> wrote:





[snip general agreement]
Nevertheless we
believe that the identifier can be used successfully in the real
world. This could depend on the population statistics, such as the
way we identify people around us by their facial characteristics, and
it is helpful that people don't tend to have extensive plastic
surgery, exchange body parts on a regular basis or cross dress. This
allows them to be identified by names like "John".

I'm not sure I'm following this entirely David. "John" is just another
attribute, and it can be used to refer to someone, but we know it
wouldn't identify them uniquely. The reason we can get away with using
it in day to day conversation is because humans are incredibly good at
resolving context (whereas a computer is not). We'd only use someone's
first name alone when we know its use will not be ambiguous.

My only point is that in particular contexts we are comfortable with
thinking of humans as entities that we are able to identify.

Er, ok. I can't disagree with you there. (Although I think I'm missing
where you were going with that point.)

So am I

Quote:
By contrast I think a relationship is characterised as only being
identified by the entities that it relates.

Well, coupla things. Relationships can have attributes too, and they
can be part of the identifying set.

Agreed, but that can be accommodated without upsetting the distinction
I'm alluding to.

Ok. gotcha.

[schnnippp]
Now I see the same distinction being made in a propositional
encoding.

I'm not sure why - there are no entities in a propositional encoding,
just roles and values.

The RM formalism has attributes, domains, tuples and relations. Are
roles part of the formalism or outside?

Apologies, my fault there. I use the term roles as per the ORM, but I
should have said attributes (even though they are pretty analogous).
I prefer to use the term role because it is more specific, whereas
attribute is a subtly ambiguous word - sometimes it's used to denote
attribute name, sometimes the attribute-value, and sometimes both. But
yes anyhow, I meant "there are no entities in a propositional
encoding, just attributes and values".

In the formalism I think of "attribute" as a pair of name and domain,
and certainly not a value.

Quote:
I see entities come into the propositional encoding in the
instantiations of the intentional definitions of the predicates. This
is outside the mathematical formalism. However, of what practical use
is a relation without a well defined intensional definition?

Yup, I see what you're saying (if you meant intenSional definitions
that is!)

Oops.

Quote:
, but I'm not sure I agree. Intensional definitions only
refer to rules concerning valid values for predicate variables, not
valid entities. If I've missed a trick there, perhaps it's worth an
example?

An intensional definition should uniquely define a corresponding
extension. For example

predicate:
album(N)

intension:
String N is the name of a studio album
released before 2003, of the band Garbage

extension:
{
{N=Garbage},
{N=Version 2.0},
{N=beautifulgarbage}
}

An instantiation of the intensional definition is

String "Version 2.0" is the name of a studio
album released before 2003, of the band Garbage

This natural language sentence refers to a studio album entity in the
real world. The name value is distinct from the entity. The
existence of the entity is implied by the intensional definition.
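
[Editorial sketch] The predicate / intension / extension triple above can be mimicked in a few lines of Python. The intension becomes a membership test against a stand-in "world" of album entities, and the extension is whatever set of tuples satisfies it. The StudioAlbum class, the WORLD list and the fourth album (added only to show a value that fails the test) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudioAlbum:
    """Stand-in for a real-world album entity (fields are assumptions)."""
    band: str
    name: str
    year: int


# A toy "real world" that the intension talks about.
WORLD = [
    StudioAlbum("Garbage", "Garbage", 1995),
    StudioAlbum("Garbage", "Version 2.0", 1998),
    StudioAlbum("Garbage", "beautifulgarbage", 2001),
    StudioAlbum("Garbage", "Bleed Like Me", 2005),
]


def album(n: str) -> bool:
    """Intension: string n is the name of a studio album
    released before 2003, of the band Garbage."""
    return any(a.band == "Garbage" and a.name == n and a.year < 2003
               for a in WORLD)


# Extension: the set of 1-tuples (N,) for which the predicate holds.
extension = {(a.name,) for a in WORLD if album(a.name)}
```

The relation itself holds only strings. The album entities appear solely inside the body of album(), which mirrors the point that entities enter through the instantiated intension rather than through the formalism.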


Quote:
[ker-shhhnip]
Summarizing the above I gathered that you are saying that:
RELATIONSHIP: married(Husband, Wife, Location)
ENTITY: married(MarriageId, Husband, Wife, Location)
So the only difference is that the entity has a marriageID? I am not
clear why you think the addition of this surrogate would change a
relationship into an entity!

In the first case the marriage is a relationship because it is only
identified indirectly by the entities that are involved. In the
second case the marriage has been directly identified.

Ok, sorry if I'm being dense, but are you saying that the second case
is an entity because it can be identified without reference to another
entity? And would a logical consequence be therefore that no
identifying attributes of an entity may be entities themselves?

Yes.

X is an entity in the context of the model if there exists set A of
attributes + values that identify X and there doesn't exist a subset
of A that identifies a different entity Y.

Actually this attempted definition isn't quite right. For example you
could determine that a team is an entity except for the fact that you
end up identifying the team captain as well. Perhaps that problem
can be fixed by talking about maximal entity types corresponding to
cartesian products of domains and noting that team identifiers aren't
suitable for identifying players more generally. Maybe Reinier can
help.

Quote:
That seems
like a significant difference in the logical layer. It shows up in
the intensional definitions where a marriage takes on a "role" as an
entity.

In the second case there must be some underlying reason to introduce a
marriage identifier. That reason points at a significant difference
in requirements. Don't assume the marriage identifier is a surrogate
id! Instead assume this is a well conceived design, and it's a
natural identifier.

Well hey, I don't think a surrogate means a poorly conceived design.
But I take your point, the Marriage ID in this case is coming from
some external source, and hasn't been instigated by the designer of
this modeller.


#44
Jon Heggland
Re: Another view on analysis and ER - 12-07-2007 , 02:21 AM



Quoth rpost:
Quote:
Jon Heggland wrote:
Well, that depends on what analysis is. It seems this guy thinks it's
the same as data modeling, which in turn is the same as developing a
graphical representation of the client's needs and processes. Is it?

There we go again ... if a language is graphical it doesn't have to be
imprecise or informal. Textual languages tend to be more expressive
(in the sense of needing fewer square inches to express things) but
they are not automatically better in any other respect.

Straw man. I don't believe I've said anything in general about the
qualities of graphical languages versus textual. I merely observed that
this guy's assumption that analysis = data modeling = drawing something
is dubious.
--
Jon


#45
Jon Heggland
Re: Another view on analysis and ER - 12-07-2007 , 02:36 AM



Quoth Brian Selzer:
Quote:
"Jon Heggland" <jon.heggland (AT) ntnu (DOT) no> wrote in message
news:fj8fff$3ed$1 (AT) orkan (DOT) itea.ntnu.no...
Or perhaps it's simpler: Analysis is what you're doing when you're
talking with the subject matter experts; design is what you're doing
when you're not.

It's even simpler yet: if you're trying to understand a problem, then you're
doing analysis; if you're trying to solve a problem you already understand,
then you're doing design.

Too simple. It assumes that understanding is binary: either you
understand the problem, or you don't. Furthermore, that you know whether
or not you understand it.

Quote:
If the model represents a requirement, then I would consider the activity
that produced it to be analysis; if the model represents a possible
implementation, then I would consider the activity that produced it to be
design.

And if it represents both? How can you sharply delineate one from the other?

Quote:
In most E/R notations, you cannot represent the alternate
key---reservation number---if a reservation is a relationship. Vice
versa, if it is an entity, you cannot represent the { CustomerID, CarID,
Date } key. This means that you have to have an underlying model, of
which any graphical E/R diagrams are merely simplified views. I agree to
this, but it raises two points:

1. The underlying model cannot have a strict distinction between
entities and relationships, since the same concept---reservation---can
be thought of and presented as both. This relegates entity/relationship
thinking to a question of presentation, not analysis.

What does presentation have to do with the classification of collections of
individuals into entities and relationships? Discovering and understanding
the individuals that are interesting and how those individuals relate and
interact is analysis. Presentation is about communicating that information.

I can only repeat what I've said: As far as I can tell, the decision of
whether or not an "individual" is an entity or a relationship is quite
arbitrary---it may have aspects of both. To communicate all these
aspects, it may be necessary/useful to present it sometimes as an
entity, and sometimes as a relationship. If instead you insist on
classifying your individual as /either/ an entity /or/ a relationship,
you lose information.
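
[Editorial sketch] The reservation example can be illustrated with one set of facts that satisfies both candidate keys and can be presented either way. The sketch below is plain Python. The Reservation type, its field names and the sample rows are hypothetical.

```python
from typing import NamedTuple


class Reservation(NamedTuple):
    reservation_no: int
    customer_id: str
    car_id: str
    date: str


# Hypothetical facts.
ROWS = [
    Reservation(1001, "C1", "CAR7", "2007-12-10"),
    Reservation(1002, "C2", "CAR7", "2007-12-11"),
    Reservation(1003, "C1", "CAR9", "2007-12-10"),
]

# Both candidate keys of the same relation.
CANDIDATE_KEYS = (
    ("reservation_no",),
    ("customer_id", "car_id", "date"),
)


def satisfies_key(rows, key):
    """True if no two rows agree on every attribute of the key."""
    projected = [tuple(getattr(r, a) for a in key) for r in rows]
    return len(set(projected)) == len(projected)


def as_entity(rows):
    """'Entity' presentation: look a reservation up by its number."""
    return {r.reservation_no: r for r in rows}


def as_relationship(rows):
    """'Relationship' presentation: look it up by what it relates."""
    return {(r.customer_id, r.car_id, r.date): r for r in rows}
```

Both presentations are derived from the same rows and either one recovers all of them, which is one way to read the claim that the underlying model needs no strict entity/relationship split.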
--
Jon


#46
Jon Heggland
Re: Another view on analysis and ER - 12-07-2007 , 02:42 AM



Quoth rpost:
Quote:
David Cressey wrote:
Bingo! That's the big problem with the literature on ER. Many ER
proponents use ER as if it were a design artifact. I think that's a
misapplication of the artifact, and I'm pretty sure Peter Chen would agree.
If one is designing a relational system (including but not limited to a
relational database) then using the relational model to capture the design
is a much better idea.

Well, that reflects what "we" teach: make a model in ER then convert it
into a logical relational design. I thought it was how ER is *always* used.

But this conversion is fairly mechanical. Is "design" in this case just
the little bit of human input that enters this process?
--
Jon


#47
Jon Heggland
Re: Another view on analysis and ER - 12-07-2007 , 03:08 AM



Quoth Jonathan Leffler:
Quote:
Now, consider what else is stored in the database. For the analysis of
isotopes, the atomic number is the important key - the different
isotopes of hydrogen all share the same atomic number, but have
different names (deuterium and tritium) even though chemically they are
all hydrogen.

For the analysis of chemical compounds, it is much more familiar to use
the element symbol - more people have come across H2O and CO2 than are
familiar with 1/2, 8/1 and 6/1, 8/2 (where I'm using atomic number /
multiplicity in the second notation). I'm glossing over some notational
inconveniences (consider the relational representation of your old
friend C2H5OH, for example), but the point remains - for some purposes,
the better key to use is atomic number and for other purposes, the
better key to use is element symbol.

Which key to use is a logical issue here, isn't it?

Which key to use for what is definitely a logical issue, but designating
one as primary does not mandate how it is used. I'm not sure if you are
agreeing or disagreeing with me..?
--
Jon


#48
JOG
Re: Another view on analysis and ER - 12-07-2007 , 06:11 AM



On Dec 7, 12:41 am, David BL <davi... (AT) iinet (DOT) net.au> wrote:
Quote:
On Dec 7, 5:36 am, JOG <j... (AT) cs (DOT) nott.ac.uk> wrote:
On Dec 6, 7:49 pm, David BL <davi... (AT) iinet (DOT) net.au> wrote:
[snip]
, but I'm not sure I agree. Intensional definitions only
refer to rules concerning valid values for predicate variables, not
valid entities. If I've missed a trick there, perhaps it's worth an
example?

An intensional definition should uniquely define a corresponding
extension. For example

predicate:
album(N)

intension:
String N is the name of a studio album
released before 2003, of the band Garbage

extension:
{
{N=Garbage},
{N=Version 2.0},
{N=beautifulgarbage}
}

An instantiation of the intensional definition is

String "Version 2.0" is the name of a studio
album released before 2003, of the band Garbage

This natural language sentence refers to a studio album entity in the
real world. The name value is distinct from the entity. The
existence of the entity is implied by the intensional definition.

Yup, we did have crossed wires - I thought you were referring to
integrity constraints (which of course are very much part of a
relation's intension too).

Quote:


[ker-shhhnip]
Summarizing the above I gathered that you are saying that:
RELATIONSHIP: married(Husband, Wife, Location)
ENTITY: married(MarriageId, Husband, Wife, Location)
So the only difference is that the entity has a marriageID? I am not
clear why you think the addition of this surrogate would change a
relationship into an entity!

In the first case the marriage is a relationship because it is only
identified indirectly by the entities that are involved. In the
second case the marriage has been directly identified.

Ok, sorry if I'm being dense, but are you saying that the second case
is an entity because it can be identified without reference to another
entity? And would a logical consequence be therefore that no
identifying attributes of an entity may be entities themselves?

Yes.

O.k., so the next natural question is back to our team entity...

Team(goalkeeper:Jim, defender:David, Midfielder:Jon, Attacker:Bob)

But by the definition you are positing, there is no team entity here
at all right? Because the identifying attributes are entities
themselves (assuming people are entities of course)? Hmmm....

Quote:
X is an entity in the context of the model if there exists set A of
attributes + values that identify X and there doesn't exist a subset
of A that identifies a different entity Y.

Actually this attempted definition isn't quite right. For example you
could determine that a team is an entity except for the fact that you
end up identifying the team captain as well. Perhaps that problem
can be fixed by talking about maximal entity types corresponding to
cartesian products of domains and noting that team identifiers aren't
suitable for identifying players more generally.

Maximal entity types... er.... sounds like things are getting more
convoluted. Which brings me to the real question. What's the point of
all this? Why not give up on trying to make what I still feel is an
artificial split between entities and relationships? What is gained by
that split that makes it worth the effort?

As ERM has evolved, relationships are looking more and more like
entities anyhow. Now they are shapes not lines, now they can have
attributes too... One more step, make them boxes instead of diamonds,
and the job is done. What would we have lost then? Might I invoke
Occam's razor and say that a system with one type is preferable to
two, unless making a split has any real benefits?

Quote:
That seems
like a significant difference in the logical layer. It shows up in
the intensional definitions where a marriage takes on a "role" as an
entity.

In the second case there must be some underlying reason to introduce a
marriage identifier. That reason points at a significant difference
in requirements. Don't assume the marriage identifier is a surrogate
id! Instead assume this is a well conceived design, and it's a
natural identifier.

Well hey, I don't think a surrogate means a poorly conceived design.
But I take your point, the Marriage ID in this case is coming from
some external source, and hasn't been instigated by the designer of
this modeller.

Jon seems to have covered a lot of comments I might make, so I won't
replicate his post. Instead I shall drink tea.


#49
mAsterdam
Re: Another view on analysis and ER - 12-07-2007 , 06:55 AM



Sorry for butting in this late, and not even completely on topic.
'Facts' triggered my interest.

Jon Heggland schreef:

Quote:
...(The idea of viewing a database as a
collection of facts was a revelation for me in that regard.) In fact, I
have the opposite problem; I am unable to look at an E/R diagram without
thinking about relations.

Consider this proposition: "Jon was born in 1974", encoded in a relvar
of the form Born(Person, Year). I think we'll agree that represents a
fact about Jon. You would probably assume that Jon is an entity (though
I'm unsure about what you'd call the relvar/predicate in itself---is it
an entity (type)?). But I would also say that the proposition is as much
a fact about the year 1974! Is 1974 an entity? I really don't care.
Facts are all.

Consider the statement "Jon is 33 years old". It conveys the same real
world fact in a clumsier way than "Jon was born in 1974". Next year it
won't even convey the same fact anymore. "Jon was born in 1974"
catches the invariant better than "Jon is 33 years old".

Consider "John is in Canada". When? The fact isn't
complete without that piece of information.

"<Person> is at <Location>" needs a time to become a possibly
interesting facttype:
"At <Time>, <Person> is/was/will be at <Location>"

The idea of viewing a database as a collection of
possibly interesting facts was a revelation for me :-)
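
[Editorial sketch] The two observations above (store the invariant, and give a location fact its time) can be shown with plain Python sets of tuples. The relation names born and located_at, the helper functions and the specific date are assumptions for illustration. The age calculation deliberately ignores where the birthday falls within the year.

```python
from datetime import date

# Born(Person, Year): the invariant fact.
born = {("Jon", 1974)}

# "At <Time>, <Person> is/was/will be at <Location>".
# The date below is hypothetical.
located_at = {(date(2007, 12, 7), "John", "Canada")}


def age_in(person, year):
    """Derive 'is N years old' for a given year from the stored birth year
    (a simplification: ignores whether the birthday has passed)."""
    for p, birth_year in born:
        if p == person:
            return year - birth_year
    return None


def where_was(person, when):
    """Location recorded for the person at exactly that time, if any."""
    for t, p, location in located_at:
        if p == person and t == when:
            return location
    return None
```

Here "Jon is 33 years old" is never stored. It is recomputed for whichever year is asked about, so the stored fact does not go stale. The location fact, by contrast, cannot even be queried without supplying a time.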

[snip]

Quote:
A propositional encoding does specify how this marriage is identified,
yes. What I dispute is the distinction between implicit and explicit
identification; between entities and relationship. A tuple/fact is
identified by some combination of its attributes, that is all.

minus "that is all": agreed. The choice of the combination of
attributes takes care, some of which is specific to the facttype
(was_born vs. has_age), and some of which is more general (At time:
....).



#50
David Cressey
Re: Another view on analysis and ER - 12-07-2007 , 07:27 AM




"mAsterdam" <mAsterdam (AT) vrijdag (DOT) org> wrote

Quote:
Sorry for butting in this late, and not even completely on topic.
'Facts' triggered my interest.

Jon Heggland schreef:

...(The idea of viewing a database as a
collection of facts was a revelation for me in that regard.) In fact, I
have the opposite problem; I am unable to look at an E/R diagram without
thinking about relations.

Consider this proposition: "Jon was born in 1974", encoded in a relvar
of the form Born(Person, Year). I think we'll agree that represents a
fact about Jon. You would probably assume that Jon is an entity (though
I'm unsure about what you'd call the relvar/predicate in itself---is it
an entity (type)?). But I would also say that the proposition is as much
a fact about the year 1974! Is 1974 an entity? I really don't care.
Facts are all.

Consider the statement "Jon is 33 years old". It conveys the same real
world fact in a clumsier way than "Jon was born in 1974". Next year it
won't even convey the same fact anymore. "Jon was born in 1974"
catches the invariant better than "Jon is 33 years old".

Consider "John is in Canada". When? The fact isn't
complete without that piece of information.

Time for a Clinton moment. The above discussion depends on what the meaning
of the word "is" is.

In Spanish, "John is 33 years old" will be expressed roughly like this:
"John has 33 years."
The verb "to be" is not used.

"John is a man" will be expressed using one of the Spanish verbs "to be".

"John is in Canada" will be expressed using the other of the Spanish verbs
"to be".

This distinction is wasted on a person who thinks about the facts in
English. But it isn't wasted, at all, on a person who thinks about the
facts in Spanish. There are even statements in Spanish that differ only by
which verb is used.

To a Spanish speaker, the following are two different facts:
"Juan es loco."
"Juan está loco."

Does this mean that the content of the database is different, depending on
the first language of the observer?

I apologize for using Spanish rather than a more common language. Spanish
is the only language, other than English, that I know well enough to use to
illustrate the point.

I recall that Bob Badour attributed to Dijkstra the motto that one should
always do computer science in a second language.





