dbTalk Databases Forums  

Why use a composite PK ever?

comp.databases comp.databases


Discuss Why use a composite PK ever? in the comp.databases forum.



Reply
 
Thread Tools Display Modes
  #11  
Old   
--CELKO--
 
Posts: n/a

Default Re: Why use a composite PK ever? - 06-13-2006 , 08:36 AM






Quote:
So in real life it turned out to be changing in about 0.5% cases.
Not a bad error rate! Mailing lists work with 2-3% at best and in the
United States, they estimate 5% or more of the SSNs (Social Security
Number) are wrong or fake. Of course, you do not have tens of millions
of illegal Mexicans flooding the Baltics

I vaguely remember and article years ago about one country (Sweden?)
that used the birthdate-sequence number scheme getting in trouble
because it immediately marked someone as a foreigner or refugee and was
thus a source of discrimination.

I also had a friend who worked overseas with a "February-30" guest
worker id card number that was treated as a forgery when he tried to
rent a car.

I am waiting for the day that I have very-smart card with my medical
history on it that can tie itself to me via a DNA match, so i do not
have to give any further information.



Reply With Quote
  #12  
Old   
Gints Plivna
 
Posts: n/a

Default Re: Why use a composite PK ever? - 06-13-2006 , 09:27 AM






Yea we also had some discussions about it as world is becoming more
political correct each day
Birthday in personal id is considered as discrimination and abuse of
private data so in future it most probably will be just a sequence
number with control sum. Of course that will avoid also much trouble of
changing many documents and related info in case of birthday change.
Quote:
I also had a friend who worked overseas with a "February-30" guest
worker id card number that was treated as a forgery when he tried to
rent a car.
Yea and after personal id change we as IT workers will have some work
again to disable personal id checks that was implemented in every other
information system
Quote:
I am waiting for the day that I have very-smart card with my medical
history on it that can tie itself to me via a DNA match, so i do not
have to give any further information.
Yea and when you'll lose it happy finder or thief will know so much
about you

Gints



Reply With Quote
  #13  
Old   
Michael Zedeler
 
Posts: n/a

Default Re: Why use a composite PK ever? - 06-13-2006 , 10:25 AM



gints.plivna (AT) gmail (DOT) com wrote:
Quote:
Michael Zedeler wrote:

dananrg (AT) yahoo (DOT) com wrote:

[...]

There are a few exceptions, such as here in Denmark where each citizen
has a personal id number that _never_ changes and is guaranteed to be
unique. It consist of the birth date and a serial number (subject to
certain checksum constraints), which can easilly be implemented as a
two-part key. In certain applications I would use that number as primary
key.

Khe, khe are you really working with this database or just heard of it,
that personal id is never changing? Funnily we in Latvia had our db
created with help of Denmark consultants and as a result I assume
recieved more or less the same data model. So in real life it turned
out to be changing in about 0.5% cases. So I'd be interested how you in
Denmark bypass at least following problems we had:
1) person finds out that his birthday has changed or was incorrectly
written in his identification documents
That is extremely unlikely to happen, since the number is issued when
each child is born. I am unsure about the procedure regarding people
emigrating to Denmark, apart from the fact that they do get their own ssid.

Quote:
2) operator simply enters incorrect birthdate
See above.

Quote:
3) two persons get the same personal id - this probably was our problem
only because initially there wasn't all data enetered online in central
db
That is not happening, since we have another four digits to distinguish
between people with the same birth date. And there is a central,
authoratative database with everybody in it.

Quote:
4) a person get personal id two times - this is probably a bigger
problem and mostly for foreigners.
Yes, but surrogate keys doesn't solve that problem either.

Quote:
So now we have a new system and personal id isn't primary key there, it
is just the same attribute as name, sex and address for example.
So after such cases I'm very sceptical about natural keys as primary
keys and tend to use surrogate keys everywhere and use unique
constraints for natural keys.
I'm ready to accept that natural keys for the first 80 elements of
periodic table probably are immutable )
Analyzing the social security id sceme above, it is impossible for a
person to change his/her actual birthdate (read: the actual day it took
place). The only reason that you would change the birthe date record,
could be a result of errors. For this reason, using birth dates as part
of the ssid is fairly stable.

For ERP systems where you need to track your customers, I'd readily
reccomend using ssid as primary keys, since you'd otherwise have to
spend a lot of resources internally keeping the database free of
redunant records. A job that the ssid scheme will do cheaper and less
error prone than your accountants.

Please note that here in Denmark, it is possible to subscribe to a
public ssid verification service.

Regards,

Michael.
--
Which is more dangerous? TV guided missiles or TV guided families?
I am less likely to answer usenet postings by anonymous authors.
Visit my home page at http://michael.zedeler.dk/


Reply With Quote
  #14  
Old   
David Portas
 
Posts: n/a

Default Re: Why use a composite PK ever? - 06-13-2006 , 01:59 PM



--CELKO-- wrote:
Quote:
I am waiting for the day that I have very-smart card with my medical
history on it that can tie itself to me via a DNA match, so i do not
have to give any further information.
You'd still have to give enough information to ensure you couldn't be
confused with your identical twin. If you don't have an identical twin
then you'd have to supply that information too. Of course you might be
mistaken or lying so the recipient may require additional
identification to prove that you aren't impersonating your twin.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/m...S,SQL.90).aspx
--



Reply With Quote
  #15  
Old   
Michael Zedeler
 
Posts: n/a

Default OT [Re: Why use a composite PK ever?] - 06-13-2006 , 04:10 PM



David Portas wrote:
Quote:
--CELKO-- wrote:

I am waiting for the day that I have very-smart card with my medical
history on it that can tie itself to me via a DNA match, so i do not
have to give any further information.

You'd still have to give enough information to ensure you couldn't be
confused with your identical twin. If you don't have an identical twin
then you'd have to supply that information too. Of course you might be
mistaken or lying so the recipient may require additional
identification to prove that you aren't impersonating your twin.
I heard about a strange phenomenon where a person could carry two
different sets of genes as a result of a fusion between non-identical
twins in the womb, thus creating a lot of problems wrt to DNA-based
forensics.

Sorry for the off topic posting. Relevant to anyone considering DNA to
be a silver bullet solving identification problems.

Quote:
--
David Portas, SQL Server MVP
[...]
--
FYI: that wasn't a standard signature. There should be a space after the
two first dashes.

Regards,

Michael.
--
Which is more dangerous? TV guided missiles or TV guided families?
I am less likely to answer usenet postings by anonymous authors.
Visit my home page at http://michael.zedeler.dk/


Reply With Quote
  #16  
Old   
--CELKO--
 
Posts: n/a

Default Re: OT [Re: Why use a composite PK ever?] - 06-14-2006 , 12:23 PM



Quote:
You'd still have to give enough information to ensure you couldn't be confused with your identical twin. If you don't have an identical twin then you'd have to supply that information too.
That is not a problem with a full medical record on chip -- dental
work, surgeries, etc. will all be different. I just want to use the
DNA signature for the verification to get into the data.

Quote:
I heard about a strange phenomenon where a person could carry two different sets of genes as a result of a fusion between non-identical twins in the womb, thus creating a lot of problems wrt to DNA-based forensics.
Chimera. It was used in a CSI episode. The gimmick is that you will
have two disitinct DNA strands and depending where you sample, you will
get one or the other. It is supposed to make transplants very hard to
do and mess you up in other ways.



Reply With Quote
  #17  
Old   
Ed Prochak
 
Posts: n/a

Default Re: Why use a composite PK ever? - 06-19-2006 , 02:32 PM




Tony Rogerson wrote:
Quote:
This means that a surrogate ought to act like an index; created by the
user, managed by the system and NEVER seen by a user. That means never
used in queries, DRI or anything else that a user does.

That's your personal interpretation and not a lot of others and you've
actually missed out what Codd really said, I've summarised this in my blog:
http://sqlblogcasts.com/blogs/tonyro...05/11/427.aspx.

Your ideas around surrogate key usage is completely wrong.

You can still use surrogate keys in an application and they do not need to
be exposed to the application, surrogates are a good way of getting round
the problem when the natural key (primary key) changes - in that situation
Then you really have not created the table with the complete primary
key. That example is spurious. Now you might have a point if you claim
the complete PK is too complex to use. For example, to truely identify
a person, we might actually need to use their DNA sequence. That is
the reason for a surrogate key like SSN in USA.

Quote:
you need to stop all user access to your database do the primary key update
and allow users back in, but first - expiring any cache; why do this? Well,
if you have used the primary key as the access through to the data from the
app then there might be old primary key floating around that when a user
clicks update may in fact update the wrong data!

Let's look at the logical problems. First try to create a table with
two columns and try to make them both IDENTITY. If you cannot declare
more than one column to be of a certain data type, then that thing is
not a data type at all, by definition. It is a property which belongs
to the PHYSICAL table, not the LOGICAL data in the table. It also has
to be NULL-able to be a data type in SQL.

First, IDENTITY is a property of a column in the same way a PRIMARY KEY is
the property one or more columns - you can only have 1 primary key, you can
only have 1 identity property!
Says who? I don't believe Codd defined anything called IDENTITY,
especially in the logical model. You are making up rules.

Quote:
Again, IDENTITY is not a data type, its a property! Its a property I say,
sunk in yet? Its a property I say!!! and, it can exist on a nullable column.
What Relational Data model defines this IDENTITY property?

Quote:

Next, create a table with one column and make it an IDENTITY. Now try
to insert, update and delete different numbers from it. If you cannot
insert, update and delete rows from a table, then it is not a table by
definition.

You cannot update the value of IDENTITY, you can delete rows or insert rows
using SET IDENTITY_INSERT <tbl> ON.

What is the property of a surrogate key again? Not changeable!!!

So, the IDENTITY property suddenly becomes valid in both COdd and Dates
definitions of a surrogate key.
Well at least you realize you are making up your own rules.

Quote:

Finally create a simple table with one IDENTITY and a few other
columns. Use a few statements like

INSERT INTO Foobar (a, b, c) VALUES ('a1', 'b1', 'c1');
INSERT INTO Foobar (a, b, c) VALUES ('a2', 'b2', 'c2');
INSERT INTO Foobar (a, b, c) VALUES ('a3', 'b3', 'c3');

To put a few rows into the table and notice that the IDENTITY
sequentially numbered them in the order they were presented. If you
delete a row, the gap in the sequence is not filled in and the sequence
continues from the highest number that has ever been used in that
column in that particular table. This is how we did record numbers in
pre-allocated sequential files in the 1950's, by the way. A utility
program would then "pack" or "compress" the records that were flagged
as deleted or unused to move the empty space to the physical end of the
physical file. IDENTITY leaves the gaps.

But now use a statement with a query expression in it, like this:

INSERT INTO Foobar (a, b, c)
SELECT x, y, z
FROM Floob;

Since a query result is a table, and a table is a set which has no
ordering, what should the IDENTITY numbers be? The entire, whole,
completed set is presented to Foobar all at once, not a row at a time.
There are (n!) ways to number (n) rows, so which one do you pick? The
answer has been to use whatever the *physical* order of the result set
happened to be. That non-relational phrase "physical order" again!

But it is actually worse than that. If the same query is executed
again, but with new statistics or after an index has been dropped or
added, the new execution plan could bring the result set back in a
different physical order. Indexes and statistics are not part of the
logical model.

Can you explain from a logical model viewpoint why the same rows in the
second query get different IDENTITY numbers? In the relational model,
they should be treated the same if all the values of all the attributes
are identical.

Just how does that prevent usage as a surrogate key?

A real PK always has the same value. moving the dependent data from one
instance of the table to another does not change the PK, UNLESS the PK
is a "IDENTITY" column. then Joe Smith is ID=5 in one table but ID=2381
in the other. (example: the same table in a site database and the
company datawarehouse)


Quote:
Using IDENTITY as a primary key is a sign that there is no data model,
only an imitation of a sequential file system. Since this "magic,
all-purpose, one-size-fits-all" pseudo-identifier exists only as a
result of the physical state of a particular piece of hardware at a
particular time as read by the current release of a particular database
product, how do you verify that an entity has such a number in the
reality you are modeling?

What do you do when you have no usable natural key, for instance a member
table of a user group? You can't ask people for their national insurance
number - you need to create your own membership id! I'd be interested to
know what primary key you'd create when you are only capturing FullName and
Address?

There are times when you are forced to use a surrogate key. But I
honestly have seen surrogate keys used even when a simple, valid
composite candidate key could have been used instead. There is not
necessarily any justification for using an ID for club members when a
simple key like LastName, Firstname, Birthdate likely suffices.

Now if you are running a National Insurance program I can see that
there is really no relatively simple composite key (again short of a
full DNA sequence).


Quote:
Remember - there are data protection rules for capturing data that is not
necessary for the purpose you are using it, well here in the UK anyway.,
So how do you guarantee that you do not mix up data?
ID NAME other attributes
5 Joe Schmo
913 Joe Schmo

When is having both rows valid? When is it invalid? Do you see that it
is a lot easier dealing with
NAME other attributes
whenever possible?

Quote:

Another cute way to destroy data integrity:

BEGIN ATOMIC
DELETE FROM Foobar
WHERE id = <<some row>>;
INSERT INTO Foobar
VALUES ( <<recreate deleted row>>)
END;

Logically this should do nothing, but since IDENTITY has gaps, it
trashes the data.

SET IDENTITY_INSERT Foobar ON
What the heck command is this???

Quote:
INSERT INTO Foobar ( always specify your columns you are inserting into!!! )
VALUES ( <<recreate deleted row>>)

SET IDENTITY_INSERT Foobar OFF

Its REALLY BAD PRACTICE not specifying the columns you are inserting into on
the INSERT - you should know that and should set a good example.
But those folks that overuse ID columns do it this way all the time.
Joe was giving an example of BAD coding practice that happens in such
systems in production. Since the ID column is trigger assigned or
Automatic (depending on the DBMS), coders get sloppy.

Quote:

Now you are REALLY thinking about relations and keys instead of 1950's
sequential record numbering. Adding an IDENTITY column to either of
these tables as a candidate key would be dangerously redundant; one
query uses the IDENTITY and another uses the real key, and like a man
with two watches, you are never sure what time it is.

Daniel Wetzler found out the hard way that IDENTITY changed behavior in
SQL 2000 and SQL 2005. If you perform the statement below you get only
one dataset which has the described properties.

SELECT DISTINCT IDENTITY (INTEGER) AS fake_id, title1, ..
FROM Foobar
WHERE title1 IS NOT NULL
AND ..

The IDENTITY function makes each row unique so DISTINCT doesn't
eliminate the duplicates in this case. Interestingly, this behavior
seems to have changed in SQL Server 2005. If Iupoi run this as a SELECT
INTO on 2005, the execution plan computes the IDENTITY value after
DISTINCT.

For 2000 the kludge is a bit hard to see. The following should insert
just one row into the target table.

CREATE TABLE Foobar (title1 VARCHAR(10), ..);

INSERT INTO Foobar VALUES ('1', ..);
INSERT INTO Foobar VALUES ('1', ..);

SELECT IDENTITY (INTEGER) AS fake_id, title1, ..
INTO Foobar2
FROM (SELECT DISTINCT title1, ..
FROM Foobar
WHERE ..);

Since we are dealling with a proprietary feature, this is subject to
change without noti ce again.


Thats because people have relied not on documented behaviour as specified in
books online but believing the product behaves that way, which, given
another example, people have come unstuck with the ORDER BY in a view!

You really need to get a handle on your bias against IDENTITY, you still do
not understand (after 5 years now, your post is an old canned answer by the
way) the implementation and make fundemental mistakes which is why your
answer is just out and out wrong.
Please get the point straight. Surrogate keys should always be treated
as a necessary evil. There are times when they cannot be avoided, but
all too often they are being chosen as a quick fix. Better requirements
analysis and data modelling will oftern show they are not needed in
many cases where they are used.

At least in your favor, you did not argue the old "it reduces data
storage" line.
Quote:
--
Tony Rogerson
SQL Server MVP
http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a SQL
Server Consultant
http://sqlserverfaq.com - free video tutorials


"--CELKO--" <jcelko212 (AT) earthlink (DOT) net> wrote in message
news:1149954078.691147.177750 (AT) y43g2000cwc (DOT) googlegroups.com...
[simple, surrogate keys for PKs in all tables] I personally think it's
a good rule of thumb to create surrogate keys for almost all tables.

No, absolutley no. First of all, they do not create surrogate keys;
they create exposed physical locators, like IDENTITY or other
auto-numbering schemes. Let me maker an appeal to authority, with a
quote from Dr. Codd: "..Database users may cause the system to generate
or delete a surrogate, but they have no control over its value, nor is
its value ever displayed to them ..."(Dr. Codd in ACM TODS, pp 409-410)
and Codd, E. (1979), Extending the database relational model to capture
more meaning. ACM Transactions on Database Systems, 4(4). pp.
397-434.



Reply With Quote
  #18  
Old   
Abdullah Kauchali
 
Posts: n/a

Default Re: Why use a composite PK ever? - 06-19-2006 , 04:15 PM




"Michael Zedeler" wrote in message
Quote:
2) operator simply enters incorrect birthdate

See above.
If an operator captures Joe Bloggs' ID number for Jake Bloggs, how does your
earlier ("see above") comment help?




Reply With Quote
  #19  
Old   
Michael Zedeler
 
Posts: n/a

Default Re: Why use a composite PK ever? - 06-20-2006 , 02:54 AM



Abdullah Kauchali wrote:
Quote:
"Michael Zedeler" wrote in message

2) operator simply enters incorrect birthdate

See above.

If an operator captures Joe Bloggs' ID number for Jake Bloggs, how does your
earlier ("see above") comment help?
There is a built in checksum, making this less likely to happen. But of
course, this kind of error will eventually happen in a tiny fraction of
all manually entered ssids. The question is if you can provide more
reliable alternatives. In many applications, I think that it is not the
case.

Regards,

Michael.
--
Which is more dangerous? TV guided missiles or TV guided families?
I am less likely to answer usenet postings by anonymous authors.
Visit my home page at http://michael.zedeler.dk/


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.