dbTalk Databases Forums  

Autoincrement

comp.databases comp.databases


Discuss Autoincrement in the comp.databases forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Richard
 
Posts: n/a

Default Autoincrement - 06-12-2004 , 04:14 AM






How can I create an Interbase table with an autoincrement field in SQL? What
should I change in this query for example:



CREATE TABLE names (

ID INT,

name CHAR(30),

PRIMARY KEY(ID)

);



Reply With Quote
  #2  
Old   
--CELKO--
 
Posts: n/a

Default Re: Autoincrement - 06-12-2004 , 05:13 PM






Quote:
How can I create an Interbase table with an autoincrement field in
SQL?

=Why do you want to be a bad programmer? Auto-increment Oh, you mean
faking a sequential file's positional record number, so I can
reference the **physical** storage location? Sure, if I want to lose
all the advantages of an abstract data model, SQL set oriented
programming, carry extra data and destroy the portability of code!

More and more programmers who have absolutely no database training are
being told to design a database. They are using GUIDs, IDENTITY,
ROWID and other proprietary auto-numbering "features" in SQL products
to imitate either a record number (sequential file system mindset) or
OID (OO mindset) since they don't know anything else.

Experienced database designers tend toward intelligent keys they find
in industry standard codes, such as UPC, VIN, GTIN, ISBN, etc. They
know that they need to verify the data against the reality they are
modeling. A trusted external source is a good thing to have.

The early SQLs were based on existing file systems. The data was kept
in physically contiguous disk pages, in physically contiguous rows,
made up of physically contiguous columns. In short, just like a deck
of punch cards or a magnetic tape. Most programmer still carry that
mental model, which is why I keep doing that rant about file vs.
table, row vs. record and column vs. field.

But physically contiguous storage is only one way of building a
relational database and it is not the best one. The basic idea of a
relational database is that user is not supposed to know *how* or
*where* things are stored at all, much less write code that depends on
the particular physical representation in a particular release of a
particular product on particular hardware at a particular time.

One of the biggest errors is such a physical locator column (actually
property, not a column at all). People actually program with this
"feature" and even use it as the primary key for the table! Now,
let's go into painful details as to why this thing is bad.

The first practical consideration is that they are proprietary and
non-portable, so you know that you will have maintenance problems when
you change releases or port your system to other products. Newbies
actually think they will never port code! Perhaps they only work for
companies that are failing and will be gone. Perhaps their code is
such crap nobody else want their application.

Create a simple table with any kind of auto-increment and a few other
columns. Use a few statements like

INSERT INTO Foobar (a, b, c) VALUES ('a1', 'b1', 'c1');
INSERT INTO Foobar (a, b, c) VALUES ('a2', 'b2', 'c2');
INSERT INTO Foobar (a, b, c) VALUES ('a3', 'b3', 'c3');

To put a few rows into the table and notice that they are sequentially
numbered in the order they were presented. This is how we did record
numbers in pre-allocated sequential files in the 1950's, by the way. A
utility program would then "pack" or "compress" the records that were
flagged as deleted or unused to move the empty space to the physical
end of the physical file.

But now use a statement with a query expression in it, like this:

INSERT INTO Foobar (a, b, c)
SELECT x, y, z
FROM Floob;

Since a query result is a table, and a table is a set which has no
ordering, what should the sequential numbers be? The entire, whole,
completed set is presented to Foobar all at once, not a row at a time.
There are (n!) ways to number (n) rows, so which one do you pick?
The answer has been to use whatever the *physical* order of the result
set happened to be. That non-relational phrase "physical order"
again!

But it is actually worse than that. If the same query is executed
again, but with new statistics or after an index has been dropped or
added, the new execution plan could bring the result set back in a
different physical order.

Can you explain from a logical model why the same rows in the second
query get different numbers? In the relational model, they should be
treated the same if all the values of all the attributes are
identical.

Using these things as a primary key is a sign that there is no data
model, only an imitation of a sequential file system. Since this
"magic, all-purpose, one-size-fits-all" pseudo-identifier exists only
as a result of the physical state of a particular piece of hardware at
a particular time as read by the current release of a particular
database product, how do you verify that an entity has such a number
in the reality you are modeling?

You will see newbies who design tables like this:

CREATE Drivers
(driver_id IDENTITY (1,1) NOT NULL PRIMARY KEY,
ssn CHAR(9) NOT NULL REFERENCES Personnel(ssn),
vin CHAR(17) NOT NULL REFERENCES Motorpool(vin));

Now input data and submit the same row a thousand times, a million
times. Your data integrity is trashed. The natural key was this:

CREATE Drivers
(ssn CHAR(9) NOT NULL REFERENCES Personnel(ssn),
vin CHAR(17) NOT NULL REFERENCES Motorpool(vin),
PRIMARY KEY (ssn, vin));

To demonstrate, here is a typical idiot newbie schema -- you will them
all over the news groups. I call them "idiots" because they always
name the IDENTITY property column "id" in EVERY table. They don't
understand basic data modeling -- one and only name for an attribute.
About half the time they don't use any DRI, but let's show it.

CREATE TABLE MotorPool
(id IDENTITY (1,1) NOT NULL PRIMARY KEY,
personnel_id INTEGER NOT NULL REFERENCES Personnel(id),
vehicle_id INTEGER NOT NULL REFERENCES Vehicle(id));

CREATE TABLE Personnel
(id IDENTITY (1,1) NOT NULL PRIMARY KEY,
ssn CHAR(9) NOT NULL UNIQUE,
..);

CREATE TABLE Vehicles
(id IDENTITY (1,1) NOT NULL PRIMARY KEY,
vin CHAR(17) NOT NULL UNIQUE,
...);

Now change the natural key in Personnel:

UPDATE Personnel
SET ssn = '666666666'
WHERE ssn = '000000000';

Nothing happened in Motorpool, did it? You can do the same thing with
a VIN.

Now you are REALLY thinking about relations and keys instead of 1950's
sequential record numbering. Adding an auto-increment column to
either of these tables as a candidate key would be dangerously
redundant; one query uses the auto-increment and another uses the real
key, and like a man with two watches, you are never sure what time it
is.

Finally, an appeal to authority, with a quote from Dr. Codd:
"..Database users may cause the system to generate or delete a
surrogate, but they have no control over its value, nor is its value
ever displayed to them ..."(Dr. Codd in ACM TODS, pp 409-410) and
Codd, E. (1979), Extending the database relational model to capture
more meaning. ACM Transactions on Database Systems, 4(4). pp.
397-434.

This means that a surrogate ought to act like an index; created by the
user, managed by the system and NEVER seen by a user. That means
never used in queries, DRI or anything else that a user does.

Codd also wrote the following:

"There are three difficulties in employing user-controlled keys as
permanent surrogates for entities.

(1) The actual values of user-controlled keys are determined by users
and must therefore be subject to change by them (e.g. if two companies
merge, the two employee databases might be combined with the result
that some or all of the serial numbers might be changed.).

(2) Two relations may have user-controlled keys defined on distinct
domains (e.g. one uses social security, while the other uses employee
serial numbers) and yet the entities denoted are the same.

(3) It may be necessary to carry information about an entity either
before it has been assigned a user-controlled key value or after it
has ceased to have one (e.g. and applicant for a job and a retiree).

These difficulties have the important consequence that an equi-join on
common key values may not yield the same result as a join on common
entities. A solution - proposed in part [4] and more fully in [14] -
is to introduce entity domains which contain system-assigned
surrogates. Database users may cause the system to generate or delete
a surrogate, but they have no control over its value, nor is its value
ever displayed to them....." (Codd in ACM TODS, pp 409-410).

References

Codd, E. (1979), Extending the database relational model to capture
more meaning. ACM Transactions on Database Systems, 4(4). pp.
397-434


Reply With Quote
  #3  
Old   
Paul
 
Posts: n/a

Default Re: Autoincrement - 06-13-2004 , 04:54 AM




nospam (AT) nospam (DOT) net says...

Quote:
How can I create an Interbase table with an autoincrement field in SQL? What
should I change in this query for example:

Quote:
CREATE TABLE names (
ID INT,
name CHAR(30),
PRIMARY KEY(ID)
);

Interbase uses a "Generator" to do this. It is of type Int.


Paul...


--
plinehan y_a_h_o_o and d_o_t com
C++ Builder 5 SP1, Interbase 6.0.1.6 IBX 5.04 W2K Pro
Please do not top-post.

"XML avoids the fundamental question of what we should do,
by focusing entirely on how we should do it."

quote from http://www.metatorial.com


Reply With Quote
  #4  
Old   
Paul
 
Posts: n/a

Default Re: Autoincrement - 06-13-2004 , 04:55 AM




nospam (AT) nospam (DOT) net says...

Quote:
How can I create an Interbase table with an autoincrement field in SQL? What
should I change in this query for example:

Check out the Borland newsgroups if you have any questions about
Interbase - the newsgroups are full of high quality responses with a
very high signal to noise ratio, and they are moderated.


Paul...


--
plinehan y_a_h_o_o and d_o_t com
C++ Builder 5 SP1, Interbase 6.0.1.6 IBX 5.04 W2K Pro
Please do not top-post.

"XML avoids the fundamental question of what we should do,
by focusing entirely on how we should do it."

quote from http://www.metatorial.com


Reply With Quote
  #5  
Old   
John
 
Posts: n/a

Default Re: Autoincrement - 06-14-2004 , 06:00 AM



--CELKO-- wrote:

Quote:
How can I create an Interbase table with an autoincrement field in

SQL?

=Why do you want to be a bad programmer? Auto-increment Oh, you mean
faking a sequential file's positional record number, so I can
reference the **physical** storage location? Sure, if I want to lose
all the advantages of an abstract data model, SQL set oriented
programming, carry extra data and destroy the portability of code!

More and more programmers who have absolutely no database training are
being told to design a database. They are using GUIDs, IDENTITY,
ROWID and other proprietary auto-numbering "features" in SQL products
to imitate either a record number (sequential file system mindset) or
OID (OO mindset) since they don't know anything else.

snip
I think that you are either generalising wildly or misunderstand the OP.
Let's think of a common problem: defining an order number for a sales
order. You might have a table such as this:

orders( order_number int,
customer_order_number varchar,
order_status int,
customer_account_number int
)

I can't think of any good reason not to use order_number as an auto
increment primary key. Nothing else can be a key, and there is no
advantage to getting the middle tier or (!) client to define it.

Despite the fact that I agree with your sentiment regarding programmers
hacking database design, I have to say that I regard auto increment as a
useful way of simplifying middle tier database calls and avoiding the
need for recalculating (e.g.) order_number before every insert.

John


Reply With Quote
  #6  
Old   
Todd B
 
Posts: n/a

Default Re: Autoincrement - 06-14-2004 , 04:26 PM



John <no@email> wrote

Quote:
--CELKO-- wrote:

How can I create an Interbase table with an autoincrement field in

SQL?

=Why do you want to be a bad programmer? Auto-increment Oh, you mean
faking a sequential file's positional record number, so I can
reference the **physical** storage location? Sure, if I want to lose
all the advantages of an abstract data model, SQL set oriented
programming, carry extra data and destroy the portability of code!

More and more programmers who have absolutely no database training are
being told to design a database. They are using GUIDs, IDENTITY,
ROWID and other proprietary auto-numbering "features" in SQL products
to imitate either a record number (sequential file system mindset) or
OID (OO mindset) since they don't know anything else.

snip

I think that you are either generalising wildly or misunderstand the OP.
Let's think of a common problem: defining an order number for a sales
order. You might have a table such as this:

orders( order_number int,
customer_order_number varchar,
order_status int,
customer_account_number int
)

I can't think of any good reason not to use order_number as an auto
increment primary key. Nothing else can be a key, and there is no
advantage to getting the middle tier or (!) client to define it.

Despite the fact that I agree with your sentiment regarding programmers
hacking database design, I have to say that I regard auto increment as a
useful way of simplifying middle tier database calls and avoiding the
need for recalculating (e.g.) order_number before every insert.

John
What about the idea of having a multiple primary key on
customer_account_number and order_date? Or, if you want to track the
status history of the order, have a primary key for
(customer_account_number,order_date,order_status)?

I personally would never use a surrogate key for the primary key
(except in rare occasions). Although, I might make it a unique column
for customer interaction purposes (i.e. "When you contact us, just
tell the support service represantative your trouble ticket number is
123456").

The problem with the scenario you present is that the same order could
be entered more than once (each with a different order_number). I've
seen this happen, and I've seen it cost a company a good percent of
their gross profit in one quarter.

Todd


Reply With Quote
  #7  
Old   
Nick Landsberg
 
Posts: n/a

Default Re: Autoincrement - 06-14-2004 , 05:37 PM



Todd B wrote:
Quote:
John <no@email> wrote


--CELKO-- wrote:


How can I create an Interbase table with an autoincrement field in

SQL?

=Why do you want to be a bad programmer? Auto-increment Oh, you mean
faking a sequential file's positional record number, so I can
reference the **physical** storage location? Sure, if I want to lose
all the advantages of an abstract data model, SQL set oriented
programming, carry extra data and destroy the portability of code!

More and more programmers who have absolutely no database training are
being told to design a database. They are using GUIDs, IDENTITY,
ROWID and other proprietary auto-numbering "features" in SQL products
to imitate either a record number (sequential file system mindset) or
OID (OO mindset) since they don't know anything else.


snip

I think that you are either generalising wildly or misunderstand the OP.
Let's think of a common problem: defining an order number for a sales
order. You might have a table such as this:

orders( order_number int,
customer_order_number varchar,
order_status int,
customer_account_number int
)

I can't think of any good reason not to use order_number as an auto
increment primary key. Nothing else can be a key, and there is no
advantage to getting the middle tier or (!) client to define it.

Despite the fact that I agree with your sentiment regarding programmers
hacking database design, I have to say that I regard auto increment as a
useful way of simplifying middle tier database calls and avoiding the
need for recalculating (e.g.) order_number before every insert.

John


What about the idea of having a multiple primary key on
customer_account_number and order_date? Or, if you want to track the
status history of the order, have a primary key for
(customer_account_number,order_date,order_status)?

I personally would never use a surrogate key for the primary key
(except in rare occasions). Although, I might make it a unique column
for customer interaction purposes (i.e. "When you contact us, just
tell the support service represantative your trouble ticket number is
123456").

The problem with the scenario you present is that the same order could
be entered more than once (each with a different order_number). I've
seen this happen, and I've seen it cost a company a good percent of
their gross profit in one quarter.

Todd
While I generally agree, I ran across a situation some years back
for which auto-increment seemed like a good solution ...

There's no replacement for knowing your application,
but not all applications are created equal.

Some apps at the extreme ends may require adding
something external to ensure uniqueness.

Consider the situation where equipment alarms are to be
stored in a database. (Not your typical database application,
I know.)

Consider also, the following sequence of messages:

*** 2004-06-14:18:16 ALARM; EQUIPID=12345; SEV=1; TYPE=somestring;
*** 2004-06-14:18:16 ALARM-CLEAR; EQUIPID=12345; SEV=1; TYPE=somestring;
*** 2004-06-14:18:16 ALARM; EQUIPID=12345; SEV=1; TYPE=somestring;
*** 2004-06-14:18:16 ALARM-CLEAR; EQUIPID=12345; SEV=1; TYPE=somestring;
*** 2004-06-14:18:16 ALARM; EQUIPID=12345; SEV=1; TYPE=somestring;
*** 2004-06-14:18:16 ALARM-CLEAR; EQUIPID=12345; SEV=1; TYPE=somestring;
....
(Note same time stamps, down to the second, and the
sequence may not stop with just 3 pairs of messages.)

For this particular situation, the customer was interested in
(among other things), the frequency of these messages
because the above behavior indicated a piece of equipment was
about to completely fail soon.
Maybe in a matter of weeks, days or hours.
If the message-pairs occurred just once, the customer wasn't
particularly concerned.

We handled the situation by providing a sequence number for
each message to assure uniqueness. We could have handled it
by adding the milliseconds to the time-stamp, but that
was benchmarked as taking more CPU cycles than the
auto-increment.

How *should* this have been handled?


NPL


Reply With Quote
  #8  
Old   
--CELKO--
 
Posts: n/a

Default Re: Autoincrement - 06-14-2004 , 08:45 PM



Quote:
Let's think of a common problem: defining an order number for a
sales
order ... I can't think of any good reason not to use order_number as
an auto
increment primary key. <<

Since customers have to use this code, where is the check digit?
Surely no responsible designer would neglect that vital part of the
design of this code.

How do you account for gaps in a sequence? Think about a checkbook;
you do not have gaps, without a note that a check numberis VOID.

How do you track orders from multiple sources? If the orders can only
be identified AFTER they arrive at a single PHYSICAL storage, it
screws up expansion of the enterprise.

People think that designing order numbers and such things are easy
because they can just auto-increment "a magic number"; sure, if you
don't care about data quality. Auto-increment is for the "code first,
think later" programmers.


Reply With Quote
  #9  
Old   
--CELKO--
 
Posts: n/a

Default Re: Autoincrement - 06-14-2004 , 08:52 PM



Quote:
We handled the situation by providing a sequence number for each
message to assure uniqueness. We could have handled it by adding the
milliseconds to the time-stamp, but that was benchmarked as taking
more CPU cycles than the
auto-increment. <<

That one is weird; I have never seen a timestamp that was slower than
an auto-increment. They are both derived from hardware and the
timeclock in in main storage, while the auto-increment counts are
often attached to schema information tables so they have to be read
into storage.

Also, doesn't your clock go finer than milliseconds? The FIPS rules
say you have to have at least five decimal places in the seconds.


Reply With Quote
  #10  
Old   
Nick Landsberg
 
Posts: n/a

Default Re: Autoincrement - 06-14-2004 , 09:35 PM



--CELKO-- wrote:

Quote:
We handled the situation by providing a sequence number for each

message to assure uniqueness. We could have handled it by adding the
milliseconds to the time-stamp, but that was benchmarked as taking
more CPU cycles than the
auto-increment.

That one is weird; I have never seen a timestamp that was slower than
an auto-increment. They are both derived from hardware and the
timeclock in in main storage, while the auto-increment counts are
often attached to schema information tables so they have to be read
into storage.

Also, doesn't your clock go finer than milliseconds? The FIPS rules
say you have to have at least five decimal places in the seconds.
This was about 16-18 years ago and the time resultion
was limited then.

I'm only reporting what we found at the time. It is
quite probable that if I were on such a project again,
the benchmarks would give different results, in which case
I would not use an autoincrement because of all the
points you made in a previous post. However, if the they
did show that auto-increment were faster, how can one
handle this problem without using auto-increment.

(It's an outlier. 99.44% of databases don't need this.
But it's useful in some cases. I agree that it can be abused by
newbies, tho.)

NPL


--
"It is impossible to make anything foolproof
because fools are so ingenious"
- A. Bloch


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.