![]() | |
#1
| |||
| |||
|
#2
| |||
| |||
|
| I'm sure the problem I'm about to describe is *very common* but I don't know the right terminology, so please bear with me as I describe it. I'm working with a huge collection of entities and a huge (and highly fluid) collection of binary "attributes" that these entities can have. Multiple entities can have the same attribute, and any one entity can have multiple attributes, but these tend to be very few relative to the huge number of possible attributes. (Both the entities and the attributes run in the hundreds of thousands; think of a *very* sparse matrix, whose rows represent entities and whose columns represent attributes, and that a cell of this matrix is 1 or 0 depending on whether the corresponding entity has the corresponding attribute or not). One way to efficiently represent this would be to have separate tables for entities and attributes, and a third "association" table, like this ENT_ID ATTRIB_ID ------- ---------- entity1 attribute1 entity1 attribute2 entity1 attribute3 entity2 attribute1 entity2 attribute4 entity3 attribute3 . . . . . . where ENT_ID and ATTRIB_ID are foreign keys referring to the PKs of entity and attribute tables. The problem is that I'm dealing with *many* different *classes* of attributes, so that no single table can efficiently accommodate all the information I need to keep track of for each class of attribute. (I.e. the info that is relevant for one attribute class is not applicable for most of the other classes). In other words, the second field above must refer not to a particular attribute table but to a collection of attribute tables. This problem sounds entirely analogous to the one often encounters when programming a type-agnostic algorithm in a strongly typed language. The idea of polymorphism from OOP immediately comes to mind. Unfortunately, I must solve this problem using a standard RDB (PostgreSQL, to be more precise). Is there any way to do this? Thanks! kj P.S. I suppose that in my "association" table I could have multiple attribute fields, one for each class of attribute, like this entity1 A_attrib_1 NULL NULL ... entity1 NULL B_attrib_1 NULL ... entity1 NULL B_attrib_2 NULL ... entity2 A_attrib_1 NULL NULL ... entity2 A_attrib_2 NULL NULL ... entity2 NULL NULL C_attrib_1 ... entity3 NULL B_attrib_3 NULL ... entity3 NULL NULL C_attrib_2 ... . . . . . . . . . . . . where the second, third, fourth, etc. fields are foreign keys referring to the tables corresponding to different attribute classes; each record in the table above has exactly two non-null fields (note that each entity can have multiple attributes of any one class, so I can't use one record per entity with multiple non-null attribute fields). But this seems to me *extremely* inefficient. -- NOTE: In my address everything before the first period is backwards; and the last period, and everything after it, should be discarded. |
#3
| |||
| |||
|
|
I think you will need to revisit the actual requirements, the actual business need. You are looking at a tangled mess, probably foisted on some "requirement" thought up by some "clever" programming. State the actual problem and business requirement you are trying to solve. |
#4
| |||
| |||
|
|
Trust me: what I wrote ***is*** the problem at had. I cannot make it any simpler. And who said "business"?? I work in a genetics research lab! |
#5
| |||
| |||
|
|
Don't use an RDBMS. This design flaw is called EAV (Entity-Atrribute-Value) in RDBMS; |
#6
| |||
| |||
|
|
In <1124630094.434037.22860 (AT) o13g2000cwo (DOT) googlegroups.com> "--CELKO--" jcelko212 (AT) earthlink (DOT) net> writes: Don't use an RDBMS. This design flaw is called EAV (Entity-Atrribute-Value) in RDBMS; Thanks, this is very helpful. I'm having a surprisingly difficult time finding a good description/analysis/critique of EAV online. I get a lot of Google hits, but none of the sites I visited seemed particularly useful. (I'm looking something not only for myself, but also to point other members of my team to. It will be a challenge for me to convince them that we should abandon the RDBMS strategy.) Also, my RDBMS textbook (Elmasri & Navathe, 3d ed) does not make mention of EAV. (I figure that if anyone is going to analyze EAV and tear it to bits if necessary, it would be EJ Data, but I'm away from my copy of his book ATM, so I have not checked it yet.) Do you by any chance know of a book or journal article that discusses EAV at length, and explains why it is a "flaw"? (Incidentally, FWIW, many of the pages on EAV that I found through Google advance it as a good thing...) kj -- NOTE: In my address everything before the first period is backwards; and the last period, and everything after it, should be discarded. |
![]() |
| Thread Tools | |
| Display Modes | |
| |