![]() | |
#41
| |||
| |||
|
|
On Tue, 2 Dec 2003 09:04:35 -0500 in comp.databases, "Bob Badour" bbadour (AT) golden (DOT) net> wrote: "Louis Davidson" <dr_dontspamme_sql (AT) hotmail (DOT) com> wrote in message news:#rIbinJuDHA.3436 (AT) tk2msftngp13 (DOT) phx.gbl... While it is true that the chosen primary key cannot contain any optional values, it is more the praticioner (sp?) that disagrees with this stance. Compound keys are unwieldy and bad for performance, but the theorist in me says "who cares?" It is all about what is right/best, not what is fastest/easiest. Performance is determined by the physical structure and not by the logical interface. Legitimate theorists have written ad nauseum on the severe logical problems caused by using compound keys for references when data may be missing. Are you talking about data being missing from a foreign key side of a relationship? I'm pro identity column myself but I don't see how that would help in this instance. |
#42
| |||
| |||
|
|
On Tue, 2 Dec 2003 17:53:44 -0000 in comp.databases, "David Portas" REMOVE_BEFORE_REPLYING_dportas (AT) acm (DOT) org> wrote: living people than VendorID values. I don't think an identity VendorID value is any more artificial than any other unique way of identifying a vendor, so long as it is assigned the first time a vendor enters the system and is never changed. But the point is that a "natural" key is verifiable outside of the system. When I see a NG post from "skass[at]drew.edu" I don't care whether that's based on your "real" name or even whether S.Kass is the same name as on your passport or driver's licence. What's important to me is that it's determined by a consistent method outside of the system which gives me some acceptable degree of confidence that you're the same person who posted here as "skass[at]drew.edu" yesterday. Of course that validity is destroyed if you change your email address or if someone spoofs your address. But it's still intrinsically better than an arbitrary ID allocated by the server. If Steve leaves that educational facility he's at now, his email address will surely change. If he goes into a witness protection scheme his name, address and SS number will change, he or someone else could change quiet a bit about him but if he's on a database with an identity column as his PK then it's more likely that it will *never* change. |
#43
| |||
| |||
|
|
I've read through this thread but I don't understand it. I always use an int or smallint as primary key, with identity. I believe it would be a mess otherwise. |
|
Example: I have a table with people, last name, first name, address, ... So suppose you would make a natural key then you need at least the last name and the first name. I have >25 other tables that reference that table. If I get this right I will need to use the name and firstname field in all the other tables as well to reference. Isn't that just a lot of data waste? If I'm missing something, please tell me what because this seems a bit silly. |
#44
| |||
| |||
|
|
You have definitely hit a couple of the strong points in IDENTITY's favor. But there are cases where a "natural" kind of key can be small as well. Consider when eBay bought PayPal (and other than that, the rest of this paragraph is completely fictional), they likely had to merge some data... perhaps eBay used an IDENTITY to generate customer numbers, but they want to align those primary keys with the new data in the PayPal tables. So, the keys in the PayPal data become INTs, but not IDENTITY. They are kind of "natural" because they came to the PayPal from an external source, so to speak, rather than generated arbitrarily from within. Of course, completely fictional. But surely you can see that not all natural keys are going to be larger than an IDENTITY, or less efficient. There are other examples, too. In a small stats system, a SMALLDATETIME could be the primary key (perhaps several subrelated tables are organized by day). In fact, part of http://www.aspfaq.com/stats.asp (and plenty more that you can't see) is derived on a set of tables where SMALLDATETIME is the only key of relevance. Okay, so that's still 4 bytes, but you save 4 if your other alternative is to store an IDENTITY along with the SMALLDATETIME value. Consider: CREATE TABLE calendar ( dateValue SMALLDATETIME PRIMARY KEY ) vs. CREATE TABLE calendar ( dateID INT IDENTITY PRIMARY KEY, dateValue SMALLDATETIME NOT NULL ) Never mind my goofy naming scheme. :-) Now, Kass could probably show me some cool dateadd tricks that would allow me to store just an INT (or maybe even a SMALLINT, depending on the date range required), and determine what the date value is at runtime. Not that I think that's what his argument would be, but rather just to show that it is still possible to choose either route. I think the usability of the date value representing what it is, rather than having to derive its value from some formula, is a good thing. In cases like e-mail address and SSN (and in fact most cases), I still prefer your route, where there is a surrogate key (IDENTITY) that prevents me from having to cascade changes all over the place, and store larger foreign keys. Firstname + lastname is obviously a bad choice for a key of any kind, because I know more than one Aaron Bertrand. So then you bring middle name into the key, and it can still be repeated. Other things like getting adopted, re-married, legally changing their name, and other reasons why this "key" would change are minor; changes to the key can be dealt with in the database using DRI/CASCADE or, worst case scenario, through rigorous update code; it will be tougher to re-train users to look up all the tables containing "Carmen Bertrand" instead of "Carmen Electra." :-) However, I think the possibility of two people having the same key is a far more compelling argument for bypassing the natural key and placing some meaningless identifier, like IDENTITY, that the user doesn't care about and would never have to change. Now, you might think, "why not bring SSN into the FirstName + MiddleName + LastName key? That would make it unique." Yes, and hideously large. If SSN is unique, then why not just use SSN as the key? Again, it's large even on its own (CHAR(9) and surely to become CHAR(10) in our lifetimes), so I fail to see the benefit of repeating the value in every related table, DRI or not. Sorry about the earful, sometimes I get a little typographical diarrhea. Hopefully that was at least marginally intelligible. -- Aaron Bertrand SQL Server MVP http://www.aspfaq.com/ "Stijn Verrept" <sverrept (AT) nospan (DOT) vub.ac.be> wrote in message news:#G5HW$TuDHA.1512 (AT) TK2MSFTNGP10 (DOT) phx.gbl... I've read through this thread but I don't understand it. I always use an int or smallint as primary key, with identity. I believe it would be a mess otherwise. Example: I have a table with people, last name, first name, address, ... So suppose you would make a natural key then you need at least the last name and the first name. I have >25 other tables that reference that table. If I get this right I will need to use the name and firstname field in all the other tables as well to reference. Isn't that just a lot of data waste? If I'm missing something, please tell me what because this seems a bit silly. Stijn Verrept. |
#45
| |||
| |||
|
|
Your belief does not alter the correct criteria for choosing a key: simplicity, familiarity and stability. |
|
You have constructed a straw man. One anecdote does not demonstrate or justify a general principle or rule. |
#46
| |||
| |||
|
#47
| |||||||
| |||||||
|
|
I am a firm believer that natural keys should only be used to logically design/normalize the data. When it comes to the real reason for keys, data integrity, more often than not I have seen that natural keys are intrinsically not good physical primary keys. |
|
1. Natural keys are, being natural and therefore user entered [i.e., provided to the database by external means], fungible. If a user enters data, they must also be able to modify it. If data can be modified, then its value as a systemic primary key is gone. Yes, you can cascade updates to these, but why do it when it can be avoided to start with. |
|
2. Natural keys are typically a composite of atomic attributes. If using a composite, these must be propagated to referencing tables as foreign keys. Your normalization drops below par, by having these [potentially] massively duplicated columns. |
|
Attributes that are single, [supposedly] unique attributes (e.g., SSN), usually represent some official, governmentally recognized ID, and therefore have legal issues with being propagated throughout a system. |
|
Also, for amateurs and many professionals, natural keys are very often chosen incorrectly. e.g., I believe some combination of Name and other info has been used by my ISP as their primary key. My last name was entered into their system incorrectly, but they cannot fix it because their system will not allow it. Preposterous and poor design. |
|
Surrogate keys generated by using the identity property are ideal for data integrity, because 1. They are static values [i.e., once entered, it does not change] and the DBA has control over allowing values in identity columns to be modified. |
|
2. They are singleton row ids. |
#48
| |||
| |||
|
|
Do you really allow the same Doctor, Department, etc to appear twice in its table with different keys? If you don't declare unique natural keys then that's the kind of problem you have. An IDENTITY isn't a *surrogate* key at all unless the table also has a natural key - it's just a physical row identifier. |
#49
| |||
| |||
|
|
"David Portas" <REMOVE_BEFORE_REPLYING_dportas (AT) acm (DOT) org> wrote in message news:NvCdnazr-9_dZlCiRVn-vA (AT) giganews (DOT) com... Do you really allow the same Doctor, Department, etc to appear twice in its table with different keys? If you don't declare unique natural keys then that's the kind of problem you have. An IDENTITY isn't a *surrogate* key at all unless the table also has a natural key - it's just a physical row identifier. I never said I allow them to appear twice in the column, you have Unique Constraint for that. I could use that as a natural key, but I prefer using an int or smallint. I don't want to note Name, Firstname, ... in another table as foreign key! Also in the application I don't see me writing: select SN_Active from seniors where (SN_Name = :SNName) and (SN_FirstName = :FirstName) and (SN_BirthDate = :SNBirthDate). Stijn Verrept. How about putting a UNIQUE NOT NULL CLUSTERED constraint on the identity |

#50
| |||
| |||
|
|
How about putting a UNIQUE NOT NULL CLUSTERED constraint on the identity column and putting the PRIMARY KEY NONCLUSTERED constraint on the multi-column primary key? |
![]() |
| Thread Tools | |
| Display Modes | |
| |