
Entity and Identity

comp.databases.theory


#1
Walter Mitty
Entity and Identity - 07-20-2009, 10:51 AM

I've known for quite some time that better minds than mine have gone astray
in the attempt to overcome the object relational mismatch. Just last week,
I ran across an article that outlines the O-R mapping problem better than I
ever could.

The article is called "The Vietnam of Computer Science", and here's a
pointer:

http://blogs.tedneward.com/2006/06/2...r+Science.aspx

The article devotes rather too much time to exploring the analogy between
the Vietnam war and ORM attempts. And as the article admits, all analogies
eventually fail. Leaving that aside, I think the survey of problems
encountered in crossing the divide is excellent.

I want to draw particular attention to a heading entitled "Entity Identity
Issues". Reading this section gave me a better understanding of the
disconnect between me and Brian Seltzer over matters concerning entity and
identity. My own view of identity is colored by my own experience. And
that experience includes some practical work with relational databases,
preceded by a little formal learning in that area and 20 years of work as a
programmer. Unfortunately, none of that work included object oriented
programming.

Anyway, my view of identity (or of identification, if you prefer) is that an
object's state is all we have to go on as the basis for identification. In
particular, an object's location (as specified by a pointer) or its
trajectory (a history of pointers over time) are unavailable for purposes of
identification. This view of identity fits pretty comfortably into the
relational model, but it runs afoul of object oriented thinking in at least two
important ways. First, if an object can conceal part of its state
(encapsulation), then it necessarily can conceal some of what needs to be
known to identify it. Second, if two objects are identical in state, then
they are the same object, even if they differ in location (at the same point
in time). I'll call this the "Doppelganger effect".
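
The contrast can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Part` class and its fields are invented here, `==` stands in for identification by state, and `is` stands in for the location-based identity that OO languages use.

```python
from dataclasses import dataclass


# Hypothetical entity type, invented for this illustration.
@dataclass
class Part:
    serial_no: str
    colour: str


a = Part("SN-001", "red")
b = Part("SN-001", "red")  # same state, separately allocated

same_state = (a == b)    # dataclass equality compares field by field
same_object = (a is b)   # `is` compares object identity, not contents

print(same_state, same_object)
```

Under the state-based view, `a` and `b` are one entity seen twice; under the OO view they are two objects that happen to agree at the moment.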

When I see an SQL table with two different rows in one table that cannot be
distinguished by their contents, my reaction is that the database designer
made a mistake. Failing that, the database updaters should have been
more careful. Cases where duplication is intentional and carries
significant information strike me as a misuse of SQL, and a misunderstanding
of the relational model.
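
A minimal sketch of the point, using Python's built-in `sqlite3` module. The table and column names are made up for the example; the only claim is that a declared key turns an indistinguishable second row into an error instead of silently storing it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Without a key, two rows with identical contents are both accepted.
conn.execute("CREATE TABLE sighting_nokey (species TEXT, site TEXT)")
conn.execute("INSERT INTO sighting_nokey VALUES ('heron', 'north pond')")
conn.execute("INSERT INTO sighting_nokey VALUES ('heron', 'north pond')")
dup_count = conn.execute("SELECT COUNT(*) FROM sighting_nokey").fetchone()[0]

# With a key over the contents, the second insert is rejected.
conn.execute(
    "CREATE TABLE sighting ("
    " species TEXT, site TEXT,"
    " PRIMARY KEY (species, site))"
)
conn.execute("INSERT INTO sighting VALUES ('heron', 'north pond')")
try:
    conn.execute("INSERT INTO sighting VALUES ('heron', 'north pond')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(dup_count, rejected)
```

If the number of occurrences is itself meaningful, one option is an explicit count column, so that the information lives in the row's contents.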

The above doesn't pretend to explain Brian's view. But I think it sheds a
little light on why I see things the way I do.

Again, I recommend the article cited above.

#2
Brian
Re: Entity and Identity - 07-20-2009, 12:47 PM

On Jul 20, 11:51 am, "Walter Mitty" <wami... (AT) verizon (DOT) net> wrote:
Quote:
I've known for quite some time that better minds than mine have gone astray
in the attempt to overcome the object relational mismatch. Just last week,
I ran across an article that outlines the O-R mapping problem better than I
ever could.

The article is called "The Vietnam of Computer Science", and here's a
pointer:

http://blogs.tedneward.com/2006/06/2...mputer+Science...

The article devotes rather too much time to exploring the analogy between
the Vietnam war and ORM attempts. And as the article admits, all analogies
eventually fail. Leaving that aside, I think the survey of problems
encountered in crossing the divide is excellent.

I want to draw particular attention to a heading entitled "Entity Identity
Issues". Reading this section gave me a better understanding of the
disconnect between me and Brian Seltzer over matters concerning entity and
identity. My own view of identity is colored by my own experience. And
that experience includes some practical work with relational databases,
preceded by a little formal learning in that area and 20 years of work as a
programmer. Unfortunately, none of that work included object oriented
programming.

Anyway, my view of identity (or of identification, if you prefer) is that an
object's state is all we have to go on as the basis for identification. In
particular, an object's location (as specified by a pointer) or its
trajectory (a history of pointers over time) are unavailable for purposes of
identification. This view of identity fits pretty comfortably into the
relational model, but it runs afoul of object oriented thinking in at least two
important ways. First, if an object can conceal part of its state
(encapsulation), then it necessarily can conceal some of what needs to be
known to identify it. Second, if two objects are identical in state, then
they are the same object, even if they differ in location (at the same point
in time). I'll call this the "Doppelganger effect".
First, it doesn't matter if objects can conceal part of their states
provided that the references to those objects can be used to
distinguish between them, and second, if two objects are identical in
state, then they cannot differ in location, for that would constitute
a difference in state. In the OO world, those references are OIDs,
whose lifetimes coincide with the lifetimes of their respective
objects. Outside the OO world, an identification describes an object,
but just at one time: there can be many descriptions of the same
object at different times, and what describes one object at one time
may describe another object at another time. For objects to be
identical, every description must coincide--that is, at each time in
their lifetimes, what describes one object must also describe the
other.

#3
Cimode
Re: Entity and Identity - 07-20-2009, 05:46 PM

Snipped...

Quote:
Again, I recommend the article cited above.
It's always interesting to watch people rediscover some established
truths using a different path... In this case, the nebulous path.

The article concludes...
<<
Abandonment. Developers simply give up on objects entirely, and return
to a programming model that doesn't create the object/relational
impedance mismatch. While distasteful, in certain scenarios an
object-oriented approach creates more overhead than it saves, and the ROI
simply isn't there to justify the cost of creating a rich domain
model. ([Fowler] talks about this to some depth.) This eliminates the
problem quite neatly, because if there are no objects, there is no
impedance mismatch.
>>
...which means that one does not always have to try to solve a problem
that would not exist in the first place if objects were not used... The same
reasoning applies to NULL values.

The article also lists as a potential conclusion a
*relationalization* of object orientation... (Reinventing the square wheel,
if you will.)

To his credit, the writer *did* have the intellectual honesty to
recognize the fundamental limitation of the object mindset. It is just a
shame that he is repeating in a vague way what relational theorists
have been trying to warn the community about for decades...

I am seriously beginning to believe the myth that OO truly
corrupts cognitive abilities.

#4
paul c
Re: Entity and Identity - 07-20-2009, 10:04 PM

Brian wrote:
Quote:
On Jul 20, 11:51 am, "Walter Mitty" <wami... (AT) verizon (DOT) net> wrote:
I've known for quite some time that better minds than mine have gone astray
in the attempt to overcome the object relational mismatch. ...
Just because they stray doesn't make them better. Using the word
'mismatch' to compare defined relations and undefined objects is a
misuse of language and basic nonsense.

#5
David BL
Re: Entity and Identity - 07-21-2009, 04:04 AM

On Jul 21, 1:47 am, Brian <br... (AT) selzer-software (DOT) com> wrote:

Quote:
First, it doesn't matter if objects can conceal part of their states
provided that the references to those objects can be used to
distinguish between them, and second, if two objects are identical in
state, then they cannot differ in location, for that would constitute
a difference in state.
When you say 'object' do you mean in the OO sense? Usually the OO
community use 'object' to mean an identifiable state machine located
at some address and don't regard the location to be part of its state.
Furthermore usually the identity of an object is determined *only* by
its location and has nothing at all to do with its current state.

#6
David BL
Re: Entity and Identity - 07-21-2009, 05:07 AM

On Jul 20, 11:51 pm, "Walter Mitty" <wami... (AT) verizon (DOT) net> wrote:

Quote:
Anyway, my view of identity (or of identification, if you prefer) is that an
object's state is all we have to go on as the basis for identification. In
particular, an object's location (as specified by a pointer) or its
trajectory (a history of pointers over time) are unavailable for purposes of
identification. This view of identity fits pretty comfortably into the
relational model, but it runs afoul of object oriented thinking in at least two
important ways. First, if an object can conceal part of its state
(encapsulation), then it necessarily can conceal some of what needs to be
known to identify it. Second, if two objects are identical in state, then
they are the same object, even if they differ in location (at the same point
in time). I'll call this the "Doppelganger effect".
I find that confusing because it's not clear when you're talking about
your view of identity versus the OO one.

In OO, object identity is usually regarded as determined by object
location and is independent of object state. That of course is very
different to your first sentence above where you say you prefer to use
the object's state as the basis for identification.

However in the context of composing complex state machines out of
simpler ones it is entirely appropriate to identify state machines
independently of their current state. More to the point it wouldn't
make sense to do otherwise. For example two stack objects (i.e.
simple state machines that support push and pop operations) used for
entirely different purposes within a containing state machine may
occasionally have the same state. It wouldn't make sense to say there
is only one stack object whenever that happens. In fact the
containing state machine will normally specify exactly which stack
object a given operation is to be performed on. That wouldn't be
feasible if object identity was determined by state not location.
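
The stack example can be sketched in Python, with plain lists standing in for stack objects. The names `undo_stack` and `redo_stack` are invented for the illustration; the point is only that the caller names which stack to operate on, even while both hold the same state.

```python
# Two stacks used for different purposes inside one containing component.
undo_stack = []
redo_stack = []

undo_stack.append("edit-1")
redo_stack.append("edit-1")

equal_state = (undo_stack == redo_stack)        # same contents right now
distinct_objects = (undo_stack is not redo_stack)  # still two stacks

# The operation is directed at one specific stack, chosen by reference.
undo_stack.pop()

print(equal_state, distinct_objects, undo_stack, redo_stack)
```

If identity were decided by state, the `pop` would have had no way to say which of the two momentarily equal stacks it meant.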

I find it hard to see how one could define "object" such that object
identity is determined by state not location. You appear to be
thinking about eternal, abstract mathematical values, but it doesn't
make much sense to say that values have state (because that suggests a
value can change) or location (as though a value exists in time and
space).

#7
Walter Mitty
Re: Entity and Identity - 07-21-2009, 07:09 AM

"David BL" <davidbl (AT) iinet (DOT) net.au> wrote

Quote:
On Jul 20, 11:51 pm, "Walter Mitty" <wami... (AT) verizon (DOT) net> wrote:

Anyway, my view of identity (or of identification, if you prefer) is that an
object's state is all we have to go on as the basis for identification. In
particular, an object's location (as specified by a pointer) or its
trajectory (a history of pointers over time) are unavailable for purposes of
identification. This view of identity fits pretty comfortably into the
relational model, but it runs afoul of object oriented thinking in at least two
important ways. First, if an object can conceal part of its state
(encapsulation), then it necessarily can conceal some of what needs to be
known to identify it. Second, if two objects are identical in state, then
they are the same object, even if they differ in location (at the same point
in time). I'll call this the "Doppelganger effect".

I find that confusing because it's not clear when you're talking about
your view of identity versus the OO one.

I was talking about my view of identity, and why it doesn't fit well with
the OO view of identity. My understanding of OO is a little hazy.

Quote:
In OO, object identity is usually regarded as determined by object
location and is independent of object state. That of course is very
different to your first sentence above where you say you prefer to use
the object's state as the basis for identification.
Thanks for clearing that up. I should clear up that I'm thinking about
storing information about "entities" in an SQL database, and not about
storing that information inside objects in an object world. Some database
designers attempt to copy the OO paradigm and try to assign each object
(what I refer to as an "entity") an OID. The OID is usually the first
column in its table, and is generally the primary key. Foreign key
references to an OID are generally used just as if they were pointers in a
world that's based on pointers.
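
The OID pattern described above might look like the following sketch, again using Python's `sqlite3`. The table and column names (`employee`, `badge`, `emp_oid`) are assumptions made up for the example, not taken from any real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# The OID is the first column and the primary key.
conn.execute("CREATE TABLE employee (emp_oid INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE badge ("
    " badge_no TEXT PRIMARY KEY,"
    " employee_oid INTEGER NOT NULL REFERENCES employee(emp_oid))"
)

# Two rows whose contents agree everywhere except the OID.
conn.execute("INSERT INTO employee (emp_oid, name) VALUES (1, 'J. Smith')")
conn.execute("INSERT INTO employee (emp_oid, name) VALUES (2, 'J. Smith')")
conn.execute("INSERT INTO badge VALUES ('B-17', 2)")

# Following the foreign key plays the role of dereferencing a pointer.
row = conn.execute(
    "SELECT e.emp_oid, e.name"
    " FROM badge b JOIN employee e ON e.emp_oid = b.employee_oid"
    " WHERE b.badge_no = 'B-17'"
).fetchone()

names = [r[0] for r in conn.execute("SELECT name FROM employee ORDER BY emp_oid")]
print(row, names)
```

Nothing in the stored contents says which real-world J. Smith holds badge B-17; only the OID does, which is the Doppelganger problem in relational dress.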

What I really liked about the article I cited in my OP is that the author
doesn't dismiss either OO thinking or relational thinking as nonsense
perpetrated by inferior minds. The article's understanding of how OO works
is better than mine.

I had understood that an object's identity was independent of its state, but
had failed to appreciate that an object's identity was determined by its
location and nothing else. Your response helps in that regard.

That could make defragmenting an object space into a royal pain. You have
to locate and update all the pointers, or else you invalidate the pointers
you don't update. You can build a garbage collector that doesn't
defragment, but I don't like where that road leads very much.

Quote:
However in the context of composing complex state machines out of
simpler ones it is entirely appropriate to identify state machines
independently of their current state. More to the point it wouldn't
make sense to do otherwise. For example two stack objects (i.e.
simple state machines that support push and pop operations) used for
entirely different purposes within a containing state machine may
occasionally have the same state. It wouldn't make sense to say there
is only one stack object whenever that happens. In fact the
containing state machine will normally specify exactly which stack
object a given operation is to be performed on. That wouldn't be
feasible if object identity was determined by state not location.

I find it hard to see how one could define "object" such that object
identity is determined by state not location. You appear to be
thinking about eternal, abstract mathematical values, but it doesn't
make much sense to say that values have state (because that suggests a
value can change) or location (as though a value exists in time and
space).

Other people in other discussions have elaborated at great length about the
distinction between a value and a variable.

A location, in the sense that you and I are using that word, is a location
in memory. The contents of memory are variable. At the moment of
retrieval, the contents provide a value. I don't think it matters whether
the memory space is in RAM or on disk. And the address provided by a pointer
can go through one or more mapping operations before finally resolving down
to a physical address.

#8
Brian
Re: Entity and Identity - 07-21-2009, 08:39 AM

On Jul 21, 5:04 am, David BL <davi... (AT) iinet (DOT) net.au> wrote:
Quote:
On Jul 21, 1:47 am, Brian <br... (AT) selzer-software (DOT) com> wrote:

First, it doesn't matter if objects can conceal part of their states
provided that the references to those objects can be used to
distinguish between them, and second, if two objects are identical in
state, then they cannot differ in location, for that would constitute
a difference in state.

When you say 'object' do you mean in the OO sense? Usually the OO
community use 'object' to mean an identifiable state machine located
at some address and don't regard the location to be part of its state.
Furthermore usually the identity of an object is determined *only* by
its location and has nothing at all to do with its current state.
I disagree with your use of the terms 'location' and 'identity.' In
the OO world, objects are instances of reference types. The location
of an object can change over its lifetime, but what is used to
reference each object, the object identifier, doesn't. It may be
splitting hairs, but there is a distinct difference between 'identity'
and 'the identity' in that 'identity' is a binary relation between
objects in the universe that denotes /is identical to/, but 'the
identity' of an object is that essential property (unary relation)
which distinguishes it from all other objects (its haecceity) and
which is embodied by an object identifier or by a proper name (in the
logical sense). The identity of an object is determined
(functionally) by its object identifier but can also be determined by
its current state in the same way that a relation schema can have more
than one key. An object representing a particular serialized part can
be identified by its object identifier as well as by the part's serial
number, or by its position on the assembly line relative to all other
similar parts on the line, which could change over time (for example,
the part in front of it may have been scrapped).
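
The serialized-part example can be sketched as follows. Everything here is hypothetical: the `Part` class, the serial numbers, and the helper functions are invented, and Python's `is` stands in for comparison by object identifier.

```python
from dataclasses import dataclass


# Hypothetical part record, invented for this illustration.
@dataclass
class Part:
    serial_no: str


line = [Part("SN-100"), Part("SN-101"), Part("SN-102")]
target = line[1]


def find_by_serial(parts, serial_no):
    """Identify a part by a key that is part of its state."""
    return next(p for p in parts if p.serial_no == serial_no)


def position_of(parts, part):
    """Locate a part on the line by reference, using `is`."""
    return next(i for i, p in enumerate(parts) if p is part)


before = position_of(line, target)
line.pop(0)  # the part in front is scrapped
after = position_of(line, target)

still_same = find_by_serial(line, "SN-101") is target
print(before, after, still_same)
```

The position identifies the part at any one time but changes over time, while the serial number and the reference keep picking out the same part throughout.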

#9
David BL
Re: Entity and Identity - 07-21-2009, 09:37 PM

On Jul 21, 8:09 pm, "Walter Mitty" <wami... (AT) verizon (DOT) net> wrote:
Quote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

In OO, object identity is usually regarded as determined by object
location and is independent of object state. That of course is very
different to your first sentence above where you say you prefer to use
the object's state as the basis for identification.

Thanks for clearing that up. I should clear up that I'm thinking about
storing information about "entities" in an SQL database, and not about
storing that information inside objects in an object world. Some database
designers attempt to copy the OO paradigm and try to assign each object
(what I refer to as an "entity") an OID. The OID is usually the first
column in its table, and is generally the primary key. Foreign key
references to an OID are generally used just as if they were pointers in a
world that's based on pointers.
In OO, objects are state machines, not "entities" about which we may
want to record information.

I have no idea why an OO programmer would want to pretend that
employees, companies, departments, teachers or courses are state
machines running on their computer! It's a ridiculous suggestion.

It's also silly for an OO programmer to think that an element of a set
of tuples recorded in a relvar represents a state machine.


Quote:
I had understood that an object's identity was independent of its state, but
had failed to appreciate that an object's identity was determined by its
location and nothing else. Your response helps in that regard.

That could make defragmenting an object space into a royal pain. You have
to locate and update all the pointers, or else you invalidate the pointers
you don't update. You can build a garbage collector that doesn't
defragment, but I don't like where that road leads very much.
One can, for example, distinguish between physical memory addresses and
virtual memory addresses. Also many OO systems allow for objects to
be physically moved by the garbage collector and indeed end up with
different virtual addresses. This is achieved by defining yet another
layer of "virtual" addresses for object references in order to
establish identity. A common approach is for object references to
involve an indirection which could be achieved using a pointer to a
pointer or an index in an array of pointers etc. One could regard the
object reference as a logical identifier for a physical location.
Distinguishing logical addresses from physical addresses is all
relative.

This distinction also exists for OIDs for persistent object stores.
In the literature OIDs are called "logical" if there is an indirection
to the physical location on disk.
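
A toy sketch of that indirection, in Python. The `Heap` class and its methods are invented for the illustration and do not describe any particular garbage collector or object store: handles play the role of logical identifiers, slot indices play the role of physical locations, and compaction patches only the handle table.

```python
class Heap:
    """Toy object store: handles are stable, slot indices are not."""

    def __init__(self):
        self.slots = []       # "physical" storage; indices may change
        self.handles = {}     # handle -> current slot index
        self.next_handle = 0

    def allocate(self, value):
        handle = self.next_handle
        self.next_handle += 1
        self.slots.append(value)
        self.handles[handle] = len(self.slots) - 1
        return handle

    def free(self, handle):
        # Drop the handle and leave a hole in the slot array.
        self.slots[self.handles.pop(handle)] = None

    def deref(self, handle):
        return self.slots[self.handles[handle]]

    def slot_of(self, handle):
        return self.handles[handle]

    def compact(self):
        """Close the holes, updating only the handle table."""
        new_slots = []
        live = sorted(self.handles.items(), key=lambda kv: kv[1])
        for handle, index in live:
            self.handles[handle] = len(new_slots)
            new_slots.append(self.slots[index])
        self.slots = new_slots


heap = Heap()
a = heap.allocate("first")
b = heap.allocate("second")
heap.free(a)

old_slot = heap.slot_of(b)
heap.compact()
new_slot = heap.slot_of(b)
value = heap.deref(b)
print(old_slot, new_slot, value)
```

Holders of the handle `b` never see the move, so there are no outside pointers to hunt down and update when the store is defragmented.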


Quote:
However in the context of composing complex state machines out of
simpler ones it is entirely appropriate to identify state machines
independently of their current state. More to the point it wouldn't
make sense to do otherwise. For example two stack objects (i.e.
simple state machines that support push and pop operations) used for
entirely different purposes within a containing state machine may
occasionally have the same state. It wouldn't make sense to say there
is only one stack object whenever that happens. In fact the
containing state machine will normally specify exactly which stack
object a given operation is to be performed on. That wouldn't be
feasible if object identity was determined by state not location.

I find it hard to see how one could define "object" such that object
identity is determined by state not location. You appear to be
thinking about eternal, abstract mathematical values, but it doesn't
make much sense to say that values have state (because that suggests a
value can change) or location (as though a value exists in time and
space).

Other people in other discussions have elaborated at great length about the
distinction between a value and a variable.

A location, in the sense that you and I are using that word, is a location
in memory.
Not really. I'm using "location" in a very abstract sense. I allow
any mechanism that allows one to "access" a state machine given some
"address".


Quote:
The contents of memory are variable. At the moment of
retrieval, the contents provide a value.
No! All state machines have state, but not all state machines are
variables that hold an abstract value.


Quote:
I don't think it matters whether
the memory space is in RAM or on disk. And the address provided by a pointer
can go through one or more mapping operations before finally resolving down
to a physical address.
Well said.

#10
Walter Mitty
Re: Entity and Identity - 07-21-2009, 10:15 PM

"David BL" <davidbl (AT) iinet (DOT) net.au> wrote


Quote:
I have no idea why an OO programmer would want to pretend that
employees, companies, departments, teachers or courses are state
machines running on their computer! It's a ridiculous suggestion.
And yet it happens over and over again. We've seen dozens of cases of OO
programmers in here trying to do exactly that. If you look around the web,
the cases multiply into the hundreds. And not all of those people are morons.

Again, the article on "The Vietnam of Computer Science" does a better job
than I can at motivating why an OO programmer would want to model his
subject matter in terms of objects.
