dbTalk Databases Forums  

How should I generate a primary key?

comp.databases comp.databases


Discuss How should I generate a primary key? in the comp.databases forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
David Segall
 
Posts: n/a

Default How should I generate a primary key? - 11-01-2006 , 08:55 AM






In a separate thread I was chastised for using the database to
generate primary keys. CELKO insisted "we invent a key with check
digits, validation rules and an audit trail within our enterprise". I
might have ignored the advice from someone else but CELKO's posts have
been consistently helpful and erudite.

I remember generating dozens of pages of "computer paper" containing
customer numbers replete with the latest fashion in check digits so
that the clerical staff could assign a number to a new customer and
code the incoming orders with their assigned number. I have assumed
that, these days, primary keys are either an intrinsic part of the
data or are computer generated and that, in either case, the user is
almost never obliged to know what they are.

If a specific case makes a response easier, mine is registration on a
web site. There is a natural key because the user name for
registration must be unique. However, the user name is likely to
change fairly frequently and I really don't want to change the foreign
key on many records after every change of name.

Please don't answer on the basis that surrogate keys are "bad". I'm
sure you can think of a case when a generated primary key is required
and you have some thoughts on how that should accomplished.



Reply With Quote
  #2  
Old   
Robert Klemme
 
Posts: n/a

Default Re: How should I generate a primary key? - 11-01-2006 , 10:44 AM






David Segall wrote:
Quote:
If a specific case makes a response easier, mine is registration on a
web site. There is a natural key because the user name for
registration must be unique. However, the user name is likely to
change fairly frequently and I really don't want to change the foreign
key on many records after every change of name.

Please don't answer on the basis that surrogate keys are "bad". I'm
sure you can think of a case when a generated primary key is required
and you have some thoughts on how that should accomplished.
Maybe I am being dumb but I do not understand your question. What is
wrong with using IDENTITY or SEQUENCE? I mean, that's the most
"natural" way to do it. If you want to get fancy you might consider
using a UUID - even that can be generated by many modern RDBMS.

Kind regards

robert


Reply With Quote
  #3  
Old   
Ed Prochak
 
Posts: n/a

Default Re: How should I generate a primary key? - 11-01-2006 , 12:19 PM




David Segall wrote:
Quote:
In a separate thread I was chastised for using the database to
generate primary keys. CELKO insisted "we invent a key with check
digits, validation rules and an audit trail within our enterprise". I
might have ignored the advice from someone else but CELKO's posts have
been consistently helpful and erudite.
We have to assume new posters in this group know little about good
Relational Model dtabase design. Too many of them think MS Access is a
relational DB, and Joe C. is so strict that even ORACLE is borderline
in his book.

Quote:
I remember generating dozens of pages of "computer paper" containing
customer numbers replete with the latest fashion in check digits so
that the clerical staff could assign a number to a new customer and
code the incoming orders with their assigned number. I have assumed
that, these days, primary keys are either an intrinsic part of the
data or are computer generated and that, in either case, the user is
almost never obliged to know what they are.
That last sentence shows that you understand some of the pitfalls of
surrogate keys. Good. Your implementation will likely make sense and
actually work.
Quote:
If a specific case makes a response easier, mine is registration on a
web site. There is a natural key because the user name for
registration must be unique. However, the user name is likely to
change fairly frequently and I really don't want to change the foreign
key on many records after every change of name.
As ademant against surrogate keys as I am, there is a pragmatic use for
them.

Quote:
Please don't answer on the basis that surrogate keys are "bad". I'm
sure you can think of a case when a generated primary key is required
and you have some thoughts on how that should accomplished.
Now to get to the nitty gritty of your question. Well without knowing
the DBMS product you are using, there is no easy answer. There are
three ways, depending on the support of your DBMS.

1. A counter table
Use a one row table
Create table regkey (registrationkey number );
with basically this logic (commands and syntax vary per DBMS):
lock table regkey
select registrationkey from regkey;
update regkey set registrationkey=registrationkey+1;
unlock table regkey
commit;
Obviously this scales poorly.

2. Autogenerated
Either an AUTOINCREMENT field or a SEQUENCE
I don't really know how well the AUTOINCREMENT type database do. If
they leave no gaps, is the value not generated until the COMMIT?
Otherewise they would seem to have a scalling problem too. A SEQUENCE
has fewer scaling problems because it does not guarantee an unbroken
sequence of numbers over all sessions.

3. A ticket table
use a table like this:
Create table regkey (registrationkey number , available char(1) );
and populate it at install time with values needed and the available
flag cleared.
with basically this logic (commands and syntax vary per DBMS):
lock table regkey
select MAX(registrationkey) from regkey where available='Y';
update regkey set avail='N';
unlock table regkey
commit;
some scaling issues here too.

My prefered choice DB is Oracle, so I lean toward using a SEQUENCE. It
is the scalable choice.
Ed



Reply With Quote
  #4  
Old   
David Segall
 
Posts: n/a

Default Re: How should I generate a primary key? - 11-01-2006 , 09:09 PM



Robert Klemme <shortcutter (AT) googlemail (DOT) com> wrote:

Quote:
David Segall wrote:
If a specific case makes a response easier, mine is registration on a
web site. There is a natural key because the user name for
registration must be unique. However, the user name is likely to
change fairly frequently and I really don't want to change the foreign
key on many records after every change of name.

Please don't answer on the basis that surrogate keys are "bad". I'm
sure you can think of a case when a generated primary key is required
and you have some thoughts on how that should accomplished.

Maybe I am being dumb but I do not understand your question. What is
wrong with using IDENTITY or SEQUENCE?
That _is_ my question and I should have said so more clearly.

I asked in a separate thread "Is there a database independent way of
obtaining the generated key after the insertion?" and received the
scathing response from CELKO "... newbies prefer to do it wrong by
using auto-increments or other proprietary stuff that cannot be
verified or validated BECAUSE it is easier than real RDBMS design.
They want to have a magical, universal "one-size-fits-all" answer that
does not require you ACTUALLY understand the problem domain"

I assumed, because of the respondent, that he had a valid point and
that there was a better, generally accepted, way of generating primary
keys.


Reply With Quote
  #5  
Old   
Christopher Browne
 
Posts: n/a

Default Re: How should I generate a primary key? - 11-01-2006 , 09:09 PM



In the last exciting episode, David Segall <david (AT) address (DOT) invalid> wrote:
Quote:
Please don't answer on the basis that surrogate keys are "bad". I'm
sure you can think of a case when a generated primary key is required
and you have some thoughts on how that should accomplished.
The fact that you're thinking about this is a very good sign :-).

If there's a natural way to "number" the thing to create a permanently
good primary key, that's ideal.

If your application is the only thing that will ever see that key,
then using either a SEQUENCE or a DCE style UUID are likely to be best
fallbacks, and if you're aware that there are pitfalls, you'll
probably evade them :-).
--
(reverse (concatenate 'string "moc.liamg" "@" "enworbbc"))
http://cbbrowne.com/info/slony.html
Rules of the Evil Overlord #218. "I will not pick up a glowing ancient
artifact and shout "Its power is now mine!!!" Instead I will grab some
tongs, transfer it to a hazardous materials container, and transport
it back to my lab for study." <http://www.eviloverlord.com/>


Reply With Quote
  #6  
Old   
Robert Klemme
 
Posts: n/a

Default Re: How should I generate a primary key? - 11-02-2006 , 02:47 AM



On 02.11.2006 04:09, David Segall wrote:
Quote:
Robert Klemme <shortcutter (AT) googlemail (DOT) com> wrote:
Maybe I am being dumb but I do not understand your question. What is
wrong with using IDENTITY or SEQUENCE?
That _is_ my question and I should have said so more clearly.

I asked in a separate thread "Is there a database independent way of
obtaining the generated key after the insertion?" and received the
scathing response from CELKO "... newbies prefer to do it wrong by
using auto-increments or other proprietary stuff that cannot be
verified or validated BECAUSE it is easier than real RDBMS design.
They want to have a magical, universal "one-size-fits-all" answer that
does not require you ACTUALLY understand the problem domain"

I assumed, because of the respondent, that he had a valid point and
that there was a better, generally accepted, way of generating primary
keys.
I think his point was to *not generate* PKs. And - I am doing some
interpretation here - his intention was to warn people from jumping on
autoincrement surrogate PKs without understanding their problem domain.
I don't think he wanted to say that there is never a use for them. My
0.02EUR...

Kind regards

robert


Reply With Quote
  #7  
Old   
AMO
 
Posts: n/a

Default Re: How should I generate a primary key? - 11-02-2006 , 03:37 AM



"David Segall" <david (AT) address (DOT) invalid> wrote

Quote:
Maybe I am being dumb but I do not understand your question. What is
wrong with using IDENTITY or SEQUENCE?
That _is_ my question and I should have said so more clearly.
I asked in a separate thread "Is there a database independent way of
obtaining the generated key after the insertion?" and received the
scathing response from CELKO "... newbies prefer to do it wrong by
using auto-increments or other proprietary stuff that cannot be
verified or validated BECAUSE it is easier than real RDBMS design.
They want to have a magical, universal "one-size-fits-all" answer that
does not require you ACTUALLY understand the problem domain"
Ignore the other person. To avoid complications in any relational database,
the primary key cannot have any meaning within the system for the majority
of your data. If it does, then sooner or later people want to change the
value and primary key values should not be changeable for any reason.

The concept of sequences were invented because the overhead of getting the
next sequence from a database table resulted in needing the transaction to
be committed before the next person could get the next sequence. This
severely reduces the scalability of the database. Sequences therefore
obtain their value outside of the transaction.

In general, people either have an auto incrementing number as a primary key
if uniqueness is required within a single database, or make use of global
identifiers (GUID) to generate primary keys. These use algorithms to
guarantee that keys generated will be unique by machine by time. However,
they are significantly larger fields and beause the values generated are
algorthm-based, you cannot identify sequence as you would with an
incrementing number.

Now what I describe above should always be used for 'transactional' data
such as creating an order or adding an employee to the system.

However, for basic 'referential' data (and I mean basic as the term
referential is relative, so we're talking about code tables for the most
part here), you may wish to have this generated sequence to be an
alternative key and use the unique 'code' field as the primary key. You
have to be absolutely sure that the 'code' field will not change though, but
for the most part, many like to use the 'code' field because it assists when
doing lookups. In these cases, storing a generated number in the table
performing the lookup to the 'code' table is meaningless, and having
significance in the primary key can be advantageous.

AMO




Reply With Quote
  #8  
Old   
Ed Prochak
 
Posts: n/a

Default Re: How should I generate a primary key? - 11-02-2006 , 07:14 AM




AMO wrote:
Quote:
"David Segall" <david (AT) address (DOT) invalid> wrote in message
news:nhmik2dqruobck1dvv8ld6cjs2mmr6t7o9 (AT) 4ax (DOT) com...
Maybe I am being dumb but I do not understand your question. What is
wrong with using IDENTITY or SEQUENCE?
That _is_ my question and I should have said so more clearly.
I asked in a separate thread "Is there a database independent way of
obtaining the generated key after the insertion?" and received the
scathing response from CELKO "... newbies prefer to do it wrong by
using auto-increments or other proprietary stuff that cannot be
verified or validated BECAUSE it is easier than real RDBMS design.
They want to have a magical, universal "one-size-fits-all" answer that
does not require you ACTUALLY understand the problem domain"

Ignore the other person.
No, take Joe's advice in context.

Quote:
To avoid complications in any relational database,
the primary key cannot have any meaning within the system for the majority
of your data.
WRONG! The first assumption is that the PK indentifies the entity being
modelled. A PK that has no meaning makes no sense in a Relational
Model. You are just proposing using a relational model database as a
network model one when you suggest this. Please go back and read Codd.

Quote:
If it does, then sooner or later people want to change the
value and primary key values should not be changeable for any reason.
While the facts you say in that sentence are true, that is not a reason
to go to surrogates for a PK. In practical real world systems, you must
manage changes to the PK. Within the DB that might be as simple as
deleting the old row and inserting a new one, or as complex as
cascading copies of data from the old PK to the new PK. In the later
case, you might consider surrogates to reduce such work, but only as a
last choice, not a primary choice.

Quote:
The concept of sequences were invented because the overhead of getting the
next sequence from a database table resulted in needing the transaction to
be committed before the next person could get the next sequence. This
severely reduces the scalability of the database. Sequences therefore
obtain their value outside of the transaction.

In general, people either have an auto incrementing number as a primary key
if uniqueness is required within a single database, or make use of global
identifiers (GUID) to generate primary keys.
You mean people People that do not understand good Relational design.

Quote:
These use algorithms to
guarantee that keys generated will be unique by machine by time. However,
they are significantly larger fields and beause the values generated are
algorthm-based, you cannot identify sequence as you would with an
incrementing number.
If the goal is to creat a Unique ID, you should not care about the
order. That just shows again, AMO, that you are not thinking correctly
about this issue.

Quote:
Now what I describe above should always be used for 'transactional' data
such as creating an order or adding an employee to the system.
Okay, let's see what you consider non-'transactional' data ...

Quote:
However, for basic 'referential' data (and I mean basic as the term
referential is relative, so we're talking about code tables for the most
part here), you may wish to have this generated sequence to be an
alternative key and use the unique 'code' field as the primary key. You
have to be absolutely sure that the 'code' field will not change though, but
for the most part, many like to use the 'code' field because it assists when
doing lookups. In these cases, storing a generated number in the table
performing the lookup to the 'code' table is meaningless, and having
significance in the primary key can be advantageous.

AMO
I have normally called this simply reference data. I thought the
difference might be slight, but it just struck me, so i checked the
dictionary and your use is correct. Reference is listed as a noun and
referential is the adjective. But code tables, as Joe Celko has pointed
out, usually come from some standard, so you need to generate nothing.

And to rephrase your last remark, I would say: having significance in
the primary key IS the advantage.

and that's why pseudokeys should be used with caution.

Ed



Reply With Quote
  #9  
Old   
AMO
 
Posts: n/a

Default Re: How should I generate a primary key? - 11-02-2006 , 10:01 AM



"Ed Prochak" <edprochak (AT) gmail (DOT) com> wrote

Quote:
No, take Joe's advice in context.
I think that anyone taking Codd's advice and not deviating from that to
cater for real world requirements will find that they are going down a path
where relational thinking is fixed in the past. Codd's relational rules
have not changed over time meaning that those that follow his rules work on
the assumption that they are correct and definitive even with ever changing
database engines and their extensions over time.

Quote:
WRONG! The first assumption is that the PK indentifies the entity being
modelled. A PK that has no meaning makes no sense in a Relational
Model. You are just proposing using a relational model database as a
network model one when you suggest this. Please go back and read Codd.
The relational model that Codd created worked on the assumption that the
database acted as a whole. It did not cater for distributed databases
whether it be the old type of partitioning of an application to balance the
load via several machines as seen in large banking systems or in today's
world of offline applications on handheld devices that synchronise with the
backend databases which are commonly seen in parcel delivery companies
today.

The primary key cannot have any meaning to it unless its creation is always
at the master source database.

Please go back and read everything since Codd.

Quote:
While the facts you say in that sentence are true, that is not a reason
to go to surrogates for a PK. In practical real world systems, you must
manage changes to the PK. Within the DB that might be as simple as
deleting the old row and inserting a new one, or as complex as
cascading copies of data from the old PK to the new PK. In the later
case, you might consider surrogates to reduce such work, but only as a
last choice, not a primary choice.
Wrong. Whether you use yesterday's thinking or today's thinking, the
cardinal rule of relational databases is that you cannot change the primary
key. In distributed databases, doing this would be disasterous. We are not
in the days of Codd where one only has to consider a single database.

Quote:
The concept of sequences were invented because the overhead of getting
the
next sequence from a database table resulted in needing the transaction
to
be committed before the next person could get the next sequence. This
severely reduces the scalability of the database. Sequences therefore
obtain their value outside of the transaction.
In general, people either have an auto incrementing number as a primary
key
if uniqueness is required within a single database, or make use of global
identifiers (GUID) to generate primary keys.
You mean people People that do not understand good Relational design.
The people that dreamt up sequences are the brains behind Oracle, SQL Server
and all the major databases. This development was necessary especially in
the age of the internet where a website could be hit thousands of times and
systems had to scale up to tens of thousands of concurrent hits per minute.
GUIDs were created so that sequence ranges did not have to be farmed out to
seperate databases deployed in the field - sooner or later you would run out
of ranges if your deployed enough remote devices.

It's not a question of understanding the relational model, its striking a
balance between relational thinking 20 years ago by Codd and addressing the
todays issues of high scalability and distributed databases. There are many
areas of record creation such as audit trails where there is no significance
to the primary key value. Some people add a time component to a
concatenated key to try and convince themselves that this is more correct,
but time has no significance either and creating concatenated keys simply
make the primary key large and bulky resulting in poor application
performance.

Quote:
If the goal is to creat a Unique ID, you should not care about the
order. That just shows again, AMO, that you are not thinking correctly
about this issue.
I did not say it did matter. I was simply pointing out thar GUID algorithms
did not result in a sequential set of incrementing numbers.

Quote:
Okay, let's see what you consider non-'transactional' data ...
'Transactional' in the business term. When you buy a few things in a
supermarket, what you buy is a transaction when you exchange it with your
cash, as opposed to a database transaction which sums up the most granular
unit of work which can be rolled back as a single unit. When you go through
the tills at the supermarket, who cares about the transaction number? This
can be created by the next sequence or a GUID. It does not require the
customer to register their details with the supermarket so that a
concatenated key of CustNo, VisitDateTime can be created.

Quote:
And to rephrase your last remark, I would say: having significance in
the primary key IS the advantage.
and that's why pseudokeys should be used with caution.
Ed
I think that if you continue to rely on primary keys having significance,
you'll get into huge problems when you enter the world of distributed
databases or any system where data is not already registered with the
system. Imagine going to a website. Most websites would create some kind
of server session id to track your movements on the website without knowing
who you are. Otherwise you would either have to have the user register (but
even that is catch-22 in accessing the website until the user is registered)
or to use some kind of key based on detected I.P. address and some other
dodgy components.

Bottom line is that there is nothing wrong with a primary key with no
significance. In fact, it is the best way forward for most non-referential
data so that cascade updated are avoided. With basic referential data, as I
said before, if you can control that its source will always be the same,
then, yes, you can use values of significance. Otherwise, if they have
meaning, you'll have problems once you have to consider distributed
databases where the same code could be created on more than one database to
have different meanings.

You should really read everything with regards to databases since Codd (and
I am not trying to insult you here, but you have to work in today's world
and not the world of Codd). Codd made a good start on the relational model
but even he would make changes to encompass today's real world requirements
and problems.

AMO




Reply With Quote
  #10  
Old   
Lennart
 
Posts: n/a

Default Re: How should I generate a primary key? - 11-02-2006 , 10:26 AM




AMO wrote:
[...]
Quote:
I think that anyone taking Codd's advice and not deviating from that to
cater for real world requirements will find that they are going down a path
where relational thinking is fixed in the past. Codd's relational rules
have not changed over time meaning that those that follow his rules work on
the assumption that they are correct and definitive even with ever changing
database engines and their extensions over time.
Predicate logic hasnt changed much either.


/Lennart



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.