dbTalk Databases Forums  

ID field as logical address

comp.databases.theory comp.databases.theory


Discuss ID field as logical address in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Walter Mitty
 
Posts: n/a

Default ID field as logical address - 05-29-2009 , 07:54 AM






In another thread, I suggested that prefixing every tuple with an
auto-generated field (column) called ID amounted to reference by location
rather than reference by content. Some people correctly point out to me
that there is a level of indirection gained by using ID fields instead of
pointers. That means the records (rows) "pointed to" by references to the
ID are not pinned, and can still be shuffled. That's good, but it's
trivial, IMO, when compared to the advantages claimed by Codd and refined by
Date for the Relational model over alternative models. As far as I can see,
a database where every foreign key refers to an auto-generated ID column is
just a graph database disguised to look like a relational database.

In this thread, I want to further explore the use of ID fields, and approach
the question of whether using ID fields is relevant to the discussion of
whether an UPDATE is somehow different from a DELETE followed by an INSERT.
I declined to engage Brian in this question when he posted a response in the
earlier thread. I'm willing to dialogue about that here.

Let's say that the design has followed the practice of including an
auto-generated key called ID in every table, and declaring the ID as the
PRIMARY KEY. Let's further assume a practice that says that you never
update the ID field of any tuple, once it's been assigned. Now let's say
you have a table that records values that look like (ID, value). In an
actual case, value would probably ityself be a tuple, so our tagged tuples
might look like, for example, (ID, LastName, FirstName, PhoneNumber).

Now, if we update the LastName, the FirstName, and the Phone number of an
entry, it's still the same entry, because the ID field retains the same
value.

Now what happens if instead of updating the three dependent fields, we
delete the row, and insert a new one. Here, I'm making an assumption about
how auto-generate works. I'm assuming that, once used, and ID will never be
auto-generated again, even if it's no longer in use. So, when we delete a
row and insert a new row, we get a brand new ID. All of the references to
the deleted row remain as orphans, unless they got deleted by a cascaded
delete.

So, Brian's claim that deleting a value and inserting a new one is somehow
really different from updating an existing value works in this model, if
you regard ID as somehow "special", and not "part of the value recorded in
the table".

Disclaimer: I use auto-generated keys only when there are no natural keys
that I trust. And I tend to do the autogeneration in the application, and
not in the table itself. I might use a sequence generator built into the
database to obtain an unused value for a synthetic key, but I don't cause
the DBMS to autogenerate new keys in the table.








Reply With Quote
  #2  
Old   
Roy Hann
 
Posts: n/a

Default Re: ID field as logical address - 05-29-2009 , 08:26 AM






Walter Mitty wrote:


Quote:
In this thread, I want to further explore the use of ID fields, and approach
the question of whether using ID fields is relevant to the discussion of
whether an UPDATE is somehow different from a DELETE followed by an INSERT.
So we're explicitly talking about SQL here, is that right?

--
Roy




Reply With Quote
  #3  
Old   
Walter Mitty
 
Posts: n/a

Default Re: ID field as logical address - 05-29-2009 , 08:46 AM




"Roy Hann" <specially (AT) processed (DOT) almost.meat> wrote

Quote:
Walter Mitty wrote:


In this thread, I want to further explore the use of ID fields, and
approach
the question of whether using ID fields is relevant to the discussion of
whether an UPDATE is somehow different from a DELETE followed by an
INSERT.

So we're explicitly talking about SQL here, is that right?
My bad. I'm using SQL terminology because it comes naturally to me. But I
don't intend to limit the discussion to SQL.

There are interface languages other than SQL where the same distinction
might be made. I'd be interested to know whether this distinction is
meaningful at all in the D language. If so, what terminology would be used
in that context?





Reply With Quote
  #4  
Old   
Walter Mitty
 
Posts: n/a

Default Re: ID field as logical address - 05-29-2009 , 11:13 AM




"Bernard Peek" <bap (AT) shrdlu (DOT) com> wrote

Quote:
In message <xWQTl.1481$9L2.793 (AT) nwrddc02 (DOT) gnilink.net>, Walter Mitty
wamitty (AT) verizon (DOT) net> writes

Let's say that the design has followed the practice of including an
auto-generated key called ID in every table, and declaring the ID as the
PRIMARY KEY. Let's further assume a practice that says that you never
update the ID field of any tuple, once it's been assigned. Now let's say
you have a table that records values that look like (ID, value). In an
actual case, value would probably ityself be a tuple, so our tagged
tuples
might look like, for example, (ID, LastName, FirstName, PhoneNumber).

Now, if we update the LastName, the FirstName, and the Phone number of an
entry, it's still the same entry, because the ID field retains the same
value.

This appears to be an update anomaly but it's not as simple as that. If we
think of the logical data structure the ID isn't in the same entity as the
data we want to manage. It's in a separate entity with only one attribute.
There is supposed to be a 1:1 relationship between the ID entity and the
data entity, but this is not usually maintained by the database. The
cardinality of that relationship is managed either programmatically or by
external business processes.

Bongo! I think you got the nail on the head.

The problem is that certain tools that I'll call "application generators"
blur the distinction between an ID field and the entity that it serves as an
indirect identifier for.

In particular, MA Access, a fairly popular application generator, has a a
datatype called "autonumber". If you make your primary key an autonumber
field, several things happen.
first, if you let a forms wizard build your data entry form for you, it
will omit the autonumbered field. Good idea, since you can't specify in
anyway. If you get in there with SQL and try to update the autonumber
field, Access will flag you with an error. I could go on.

So, even if the relational model doesn't forbid updates on any attribute,
the neophyte learning on MS Access is apt to learn that autonumber fields
can't be updated.

I don't know what the situation is for SQL Server. My Oracle knowledge is
out of date.




Reply With Quote
  #5  
Old   
Brian Selzer
 
Posts: n/a

Default Re: ID field as logical address - 05-29-2009 , 08:42 PM




"paul c" <toledobythesea (AT) oohay (DOT) ac> wrote

Quote:
Walter Mitty wrote:
...
So, even if the relational model doesn't forbid updates on any attribute,
the neophyte learning on MS Access is apt to learn that autonumber fields
can't be updated.
...

The neophyte learning MS Access is learning MS Access, not relational
theory. Probably the same can said of whoever designed MS Access. It's
true of other fields too that the average practitioner knows very little
theory but what is strange to me about this one is that compared to some
fields, there is very little relational theory to learn! The question of
how to implement 'updates', if asked with relational theory in mind,
should be put as how to replace one relation with another. There are lots
of ways to program that, how too pick one is a different question. In
fact, such comments aren't about relational theory, they would be about
'table theory' if there were such a thing.
Not according to Codd: I suggest you read his papers and his book.




Reply With Quote
  #6  
Old   
Walter Mitty
 
Posts: n/a

Default Re: ID field as logical address - 05-30-2009 , 06:37 AM




"Bernard Peek" <bap (AT) shrdlu (DOT) com> wrote

Quote:
In message <xWQTl.1481$9L2.793 (AT) nwrddc02 (DOT) gnilink.net>, Walter Mitty
wamitty (AT) verizon (DOT) net> writes

Let's say that the design has followed the practice of including an
auto-generated key called ID in every table, and declaring the ID as the
PRIMARY KEY. Let's further assume a practice that says that you never
update the ID field of any tuple, once it's been assigned. Now let's say
you have a table that records values that look like (ID, value). In an
actual case, value would probably ityself be a tuple, so our tagged
tuples
might look like, for example, (ID, LastName, FirstName, PhoneNumber).

Now, if we update the LastName, the FirstName, and the Phone number of an
entry, it's still the same entry, because the ID field retains the same
value.

This appears to be an update anomaly but it's not as simple as that. If we
think of the logical data structure the ID isn't in the same entity as the
data we want to manage. It's in a separate entity with only one attribute.
There is supposed to be a 1:1 relationship between the ID entity and the
data entity, but this is not usually maintained by the database. The
cardinality of that relationship is managed either programmatically or by
external business processes.

Updating the data without making a simultaneous update to the ID field is
perfectly permissible within the relational model. They are after all in
separate entities with only a tenuous connection. It's an update anomaly
and it's caused by failure to fully normalise the data structure.
What would the data structure look like if it were fully normalized?





Reply With Quote
  #7  
Old   
Walter Mitty
 
Posts: n/a

Default Re: ID field as logical address - 05-30-2009 , 07:19 AM




"paul c" <toledobythesea (AT) oohay (DOT) ac> wrote

Quote:
Just a couple of comments about a post that is unusually vague and fuzzy
coming from you, it's tedious to dismantle every sentence, so I mention
only a couple:
I apologize for how vague and fuzzy this topic is. If my wording is vague
and fuzzy, my thinking is even more vague and fuzzy.

The first thing is whether or not I'm talking about a tuple within a tuple.
This was a digression. As far as I'm concerned, the difference between the
following two is not fundamental:

(714, Joe, Friday, 234-5678)
(714, (Joe, Friday,234-5678))

At least not fundamental for the purposes of figuring out how "714" comes to
be associated with Joe Friday and not with some other policeman.

Next, the question arises whether the association is managed by the
database or whether its managed programatically or by some external
function, manual or automatic.
Some responder addressed precisely that question. I think it was Bernard.
In my experience discussing this topic with fans of the ID field concept,
I've found that, contrary to Bernard's opinion, they turn over management
of the association to the database. I smell a rat at this point, because I
think they are asking a database to perform a function that is not really a
database function. I think that assigning 714 to Joe Friday is inventing
data, and it's not the database system's job to invent data. It's the
database system's job to manage the data it's given. That's not to say that
some popular tool like MS Access might not do it anyway.

And it's not clear to me whether MS Access is really a DBMS. I think it's
an application management system.

I think the process by which 714 comes to be associated with Joe Friday is
actually somewhat mysterious. Googling "Badge 714" confirmed this opinion.
BTW, I an NOT using "mysterious" as a code word for "mystical". The one
thing I can say is that it's not a consequence of the data. It's either
part of the data as given, or it's part of some fairly arbitrary process.

Next, my statement of the issue is fogged by my own ignorance of
relational theory. My introduction to relational theory was only as a
backdrop to my introduction to database design. At the time, I was learning
how to design SQL databases, which all of us called "relational databases".
I don't recall if I heard Ed Codd's name, but I know I never read anything
he wrote until much later. This informal and spotty education
notwithstanding, I continue to view my own interpretation of relational
theory to be largely orthodox, when compared to some of the crazy things
that have been accepted as "best practices" in database design.

Anyway, how would relational theory decribe the operation that I called
"UPDATE" (using the SQL keyword intentionally.)? would it be something like
the following?

relvar R gets assigned relvar R minus relation(tuple T) plus relation(tuple
U).

How does this differ from a DELETE followed by an INSERT?

I don't think this post clears up any of the fuzziness. But maybe it lays
the groundwork for clearingf it up.












Reply With Quote
  #8  
Old   
Walter Mitty
 
Posts: n/a

Default Re: ID field as logical address - 05-30-2009 , 02:12 PM




"paul c" <toledobythesea (AT) oohay (DOT) ac> wrote

Quote:
Walter Mitty wrote:
...
I think the process by which 714 comes to be associated with Joe Friday
is actually somewhat mysterious. Googling "Badge 714" confirmed this
opinion. BTW, I an NOT using "mysterious" as a code word for "mystical".
The one thing I can say is that it's not a consequence of the data. It's
either part of the data as given, or it's part of some fairly arbitrary
process.
...

There will always be 'mysteries' that machines don't deal with. I'd say
requirements should always trump any notion of some data being innately
different from other data. Requirements are arbitrary, to say they aren't
is to say any system will suit any purpose. For me if it's a requirement
that a system generate badge numbers, that's pretty much the same thing as
a requirement that it generate invoice numbers. If it's a requirement
that a clerk generate the numbers instead, so be it. Whether the numbers
are calculated or sequential or reversible might be another requirement.


I agree, in general. But I still think it might be worthwhile to
distinguish between application requirements and database requirements.
I think that associating a new instance of any entity with an abitrary
"next" element of some sequence that's suitable for identification, is an
application requirement.
There's one aspect of autogeneration that ought to be persistent.
Remembering where the sequence generator was in its sequence so that it can
resume at some later time where it left off last time, is a legitimate
database function.

All of this raises yet another question. Should application development
swallow database development? (Whee! Now we can fire all the DBAs!) Or,
alternatively, should database development swallow up applications
development? (Whee! Now we can fire all the programmers!). Or should they
remain collaborators neither of which fully controls the other.







Reply With Quote
  #9  
Old   
Brian Selzer
 
Posts: n/a

Default Re: ID field as logical address - 05-30-2009 , 09:38 PM




"Walter Mitty" <wamitty (AT) verizon (DOT) net> wrote

Quote:
"paul c" <toledobythesea (AT) oohay (DOT) ac> wrote in message
news:vdUTl.29814$PH1.1299 (AT) edtnps82 (DOT) ..
Just a couple of comments about a post that is unusually vague and fuzzy
coming from you, it's tedious to dismantle every sentence, so I mention
only a couple:

I apologize for how vague and fuzzy this topic is. If my wording is vague
and fuzzy, my thinking is even more vague and fuzzy.

The first thing is whether or not I'm talking about a tuple within a
tuple. This was a digression. As far as I'm concerned, the difference
between the following two is not fundamental:

(714, Joe, Friday, 234-5678)
(714, (Joe, Friday,234-5678))

At least not fundamental for the purposes of figuring out how "714" comes
to be associated with Joe Friday and not with some other policeman.

Next, the question arises whether the association is managed by the
database or whether its managed programatically or by some external
function, manual or automatic.
Some responder addressed precisely that question. I think it was Bernard.
In my experience discussing this topic with fans of the ID field concept,
I've found that, contrary to Bernard's opinion, they turn over management
of the association to the database. I smell a rat at this point, because
I think they are asking a database to perform a function that is not
really a database function. I think that assigning 714 to Joe Friday is
inventing data, and it's not the database system's job to invent data.
It's the database system's job to manage the data it's given. That's not
to say that some popular tool like MS Access might not do it anyway.

And it's not clear to me whether MS Access is really a DBMS. I think it's
an application management system.

I think the process by which 714 comes to be associated with Joe Friday is
actually somewhat mysterious. Googling "Badge 714" confirmed this
opinion. BTW, I an NOT using "mysterious" as a code word for "mystical".
The one thing I can say is that it's not a consequence of the data. It's
either part of the data as given, or it's part of some fairly arbitrary
process.

Next, my statement of the issue is fogged by my own ignorance of
relational theory. My introduction to relational theory was only as a
backdrop to my introduction to database design. At the time, I was
learning how to design SQL databases, which all of us called "relational
databases". I don't recall if I heard Ed Codd's name, but I know I never
read anything he wrote until much later. This informal and spotty
education notwithstanding, I continue to view my own interpretation of
relational theory to be largely orthodox, when compared to some of the
crazy things that have been accepted as "best practices" in database
design.

Anyway, how would relational theory decribe the operation that I called
"UPDATE" (using the SQL keyword intentionally.)? would it be something
like the following?

relvar R gets assigned relvar R minus relation(tuple T) plus
relation(tuple U).

How does this differ from a DELETE followed by an INSERT?

I don't think this post clears up any of the fuzziness. But maybe it lays
the groundwork for clearingf it up.

Many on this newsgroup adhere with blind zeal to Date and Darwin's
interpretation of relational theory that has at its roots the notions that a
database is a collection of relation variables (relvars) and that insert,
update and delete are somehow "shortcuts" for relational assignments. They
refuse to recognize the limitations these notions impose, such as the
requirement that every instance of every key be a permanent identifier for
something in the Universe of Discourse. If instances of a key do not
permanently identify things in the Universe, then transition constraints
that join the relation values before and after an assignment to a relvar
would not be sound. For example, suppose that you have a relation, {L, F,
Stat} Key {L, F}that starts out with the following value:

{{L:Smith, F:Mary, Stat:Single},
{L:Jones, F:Mary, Stat:Married}}

and is then assigned the value,

{{L:Smith, F:Mary, Stativorced},
{L:Jones, F:Mary, Stat:Married}}

You might be wondering how someone who is Single can become Divorced?
Should the assignment be rejected on that basis? Or is it possible that
Mary Jones' maiden name is Smith, and that she became Divorced while at the
same time the Mary Smith who was Single Married Robert Jones? It might also
be that the Mary Smith who was Single moved out of the area while a
different Mary Smith, who is Divorced, moved in.

The D&D adherents would try to claim that this is a bad design, and as such
should be dismissed, but Codd, even as early as 1970, recognized that key
components of tuples would be the target of updates, which should be obvious
given that there can be relation schemes in which the entire heading is the
key. The ambiguity inherent in relational assignment that is illustrated by
the above example is exhibited by every relation in which instances of the
key do not permanently identify things in the Universe, so their claim
becomes just noise.

Note that the introduction of an autogenerated ID eliminates all confusion:

If you start out with,

{{ID:1, L:Smith, F:Mary, Stat:Single},
{ID:2, L:Jones, F:Mary, Stat:Married}}

and then assign

{{ID:1, L:Smith, F:Mary, Stativorced},
{ID:2, L:Jones, F:Mary, Stat:Married}}

then clearly the assignment should be rejected because Single people can't
get Divorced, but if you assign

{{ID:1, L:Jones, F:Mary, Stat:Married},
{ID:2, L:Smith, F:Mary, Stativorced}}

then the assignment should be allowed, because Single people can get Married
and Married people can get Divorced. Similarly, if you assign,

{{ID:2, L:Jones, F:Mary, Stat:Married},
{ID:3, L:Smith, F:Mary, Stativorced}}

Then it is clear that there is a different Mary Smith, and the assignment
should be allowed.

It is a consequence of adopting D&D's notions that demands the introduction
of surrogates, such as autogenerated IDs. If, on the other hand, insert,
update and delete are not shortcuts, but rather primitive operations, then
the introduction of the autogenerated IDs is not necessary. All that is
needed is to correlate the tuples in a relation that are the target of an
update with those in its successor. For example, the update

{{L:Smith, F:Mary, Stat:Single, L':Smith, F':Mary, Stat'ivorced}}

should clearly be rejected, but the update,

{{L:Smith, F:Mary, Stat:Single, L':Jones, F':Mary, Stat':Married},
{L:Jones, F:Mary, Stat:Married, L':Smith, F':Mary, Stat'ivorced}}

should be allowed. Similarly, the delete and insert,

{{L:Smith, F:Mary, Stat:Single},
{L':Smith, F':Mary, Stat'ivorced}}

should be allowed.





Reply With Quote
  #10  
Old   
Brian Selzer
 
Posts: n/a

Default Re: ID field as logical address - 05-31-2009 , 01:52 AM




"Walter Mitty" <wamitty (AT) verizon (DOT) net> wrote

Quote:
In another thread, I suggested that prefixing every tuple with an
auto-generated field (column) called ID amounted to reference by location
rather than reference by content. Some people correctly point out to me
that there is a level of indirection gained by using ID fields instead of
pointers. That means the records (rows) "pointed to" by references to
the ID are not pinned, and can still be shuffled. That's good, but it's
trivial, IMO, when compared to the advantages claimed by Codd and refined
by Date for the Relational model over alternative models. As far as I can
see, a database where every foreign key refers to an auto-generated ID
column is just a graph database disguised to look like a relational
database.

In this thread, I want to further explore the use of ID fields, and
approach the question of whether using ID fields is relevant to the
discussion of whether an UPDATE is somehow different from a DELETE
followed by an INSERT. I declined to engage Brian in this question when he
posted a response in the earlier thread. I'm willing to dialogue about
that here.

Let's say that the design has followed the practice of including an
auto-generated key called ID in every table, and declaring the ID as the
PRIMARY KEY. Let's further assume a practice that says that you never
update the ID field of any tuple, once it's been assigned. Now let's say
you have a table that records values that look like (ID, value). In an
actual case, value would probably ityself be a tuple, so our tagged
tuples might look like, for example, (ID, LastName, FirstName,
PhoneNumber).

Now, if we update the LastName, the FirstName, and the Phone number of an
entry, it's still the same entry, because the ID field retains the same
value.

Now what happens if instead of updating the three dependent fields, we
delete the row, and insert a new one. Here, I'm making an assumption
about how auto-generate works. I'm assuming that, once used, and ID will
never be auto-generated again, even if it's no longer in use. So, when
we delete a row and insert a new row, we get a brand new ID. All of the
references to the deleted row remain as orphans, unless they got deleted
by a cascaded delete.

So, Brian's claim that deleting a value and inserting a new one is somehow
really different from updating an existing value works in this model, if
you regard ID as somehow "special", and not "part of the value recorded
in the table".

Just to be clear, one can assert what in the world is different and how, but
one cannot "update" a value. Delete, update and insert are primitive
assertions that state, in the context of what has up to now been the case,
what has just ceased to exist, what has just changed in appearance and what
has just come into existence, respectively. A value is the output of the
valuation function which under an interpretation maps the terms of a formal
language to things in the Universe of Discourse; therefore, each value is
fixed at the instant of interpretation and therefore not subject to
modification.

It is not that ID is somehow "special;" it is that each ID symbol
permanently identifies (rigidly designates) something in the Universe of
Discourse. Note that I am not arguing for or against the introduction of
autogenerated ID fields, but when they /are/ introduced, they are not
pointers: they are names. They are names in the same way that a sales order
number names a particular sales order; they are names in the same way that a
particular VIN names a particular vehicle. Contrast that with instances of
compound keys (and even some unary keys), which are descriptions rather than
names.




Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.