dbTalk Databases Forums  

Noob needs polymorphism

comp.databases comp.databases


Discuss Noob needs polymorphism in the comp.databases forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
kj
 
Posts: n/a

Default Noob needs polymorphism - 08-20-2005 , 04:00 PM









I'm sure the problem I'm about to describe is *very common* but I
don't know the right terminology, so please bear with me as I
describe it.

I'm working with a huge collection of entities and a huge (and
highly fluid) collection of binary "attributes" that these entities
can have. Multiple entities can have the same attribute, and any
one entity can have multiple attributes, but these tend to be very
few relative to the huge number of possible attributes. (Both the
entities and the attributes run in the hundreds of thousands; think
of a *very* sparse matrix, whose rows represent entities and whose
columns represent attributes, and that a cell of this matrix is 1
or 0 depending on whether the corresponding entity has the
corresponding attribute or not).

One way to efficiently represent this would be to have separate
tables for entities and attributes, and a third "association" table,
like this

ENT_ID ATTRIB_ID
------- ----------
entity1 attribute1
entity1 attribute2
entity1 attribute3
entity2 attribute1
entity2 attribute4
entity3 attribute3
. .
. .
. .

where ENT_ID and ATTRIB_ID are foreign keys referring to the PKs
of entity and attribute tables.

The problem is that I'm dealing with *many* different *classes* of
attributes, so that no single table can efficiently accommodate
all the information I need to keep track of for each class of
attribute. (I.e. the info that is relevant for one attribute class
is not applicable for most of the other classes).

In other words, the second field above must refer not to a particular
attribute table but to a collection of attribute tables.

This problem sounds entirely analogous to the one often encounters
when programming a type-agnostic algorithm in a strongly typed
language. The idea of polymorphism from OOP immediately comes to
mind. Unfortunately, I must solve this problem using a standard
RDB (PostgreSQL, to be more precise).

Is there any way to do this?

Thanks!

kj

P.S. I suppose that in my "association" table I could have multiple
attribute fields, one for each class of attribute, like this

entity1 A_attrib_1 NULL NULL ...
entity1 NULL B_attrib_1 NULL ...
entity1 NULL B_attrib_2 NULL ...
entity2 A_attrib_1 NULL NULL ...
entity2 A_attrib_2 NULL NULL ...
entity2 NULL NULL C_attrib_1 ...
entity3 NULL B_attrib_3 NULL ...
entity3 NULL NULL C_attrib_2 ...
. . . .
. . . .
. . . .

where the second, third, fourth, etc. fields are foreign keys
referring to the tables corresponding to different attribute classes;
each record in the table above has exactly two non-null fields
(note that each entity can have multiple attributes of any one
class, so I can't use one record per entity with multiple non-null
attribute fields).

But this seems to me *extremely* inefficient.

--
NOTE: In my address everything before the first period is backwards;
and the last period, and everything after it, should be discarded.

Reply With Quote
  #2  
Old   
Jim Kennedy
 
Posts: n/a

Default Re: Noob needs polymorphism - 08-20-2005 , 07:25 PM







"kj" <socyl (AT) 987jk (DOT) com.invalid> wrote

Quote:


I'm sure the problem I'm about to describe is *very common* but I
don't know the right terminology, so please bear with me as I
describe it.

I'm working with a huge collection of entities and a huge (and
highly fluid) collection of binary "attributes" that these entities
can have. Multiple entities can have the same attribute, and any
one entity can have multiple attributes, but these tend to be very
few relative to the huge number of possible attributes. (Both the
entities and the attributes run in the hundreds of thousands; think
of a *very* sparse matrix, whose rows represent entities and whose
columns represent attributes, and that a cell of this matrix is 1
or 0 depending on whether the corresponding entity has the
corresponding attribute or not).

One way to efficiently represent this would be to have separate
tables for entities and attributes, and a third "association" table,
like this

ENT_ID ATTRIB_ID
------- ----------
entity1 attribute1
entity1 attribute2
entity1 attribute3
entity2 attribute1
entity2 attribute4
entity3 attribute3
. .
. .
. .

where ENT_ID and ATTRIB_ID are foreign keys referring to the PKs
of entity and attribute tables.

The problem is that I'm dealing with *many* different *classes* of
attributes, so that no single table can efficiently accommodate
all the information I need to keep track of for each class of
attribute. (I.e. the info that is relevant for one attribute class
is not applicable for most of the other classes).

In other words, the second field above must refer not to a particular
attribute table but to a collection of attribute tables.

This problem sounds entirely analogous to the one often encounters
when programming a type-agnostic algorithm in a strongly typed
language. The idea of polymorphism from OOP immediately comes to
mind. Unfortunately, I must solve this problem using a standard
RDB (PostgreSQL, to be more precise).

Is there any way to do this?

Thanks!

kj

P.S. I suppose that in my "association" table I could have multiple
attribute fields, one for each class of attribute, like this

entity1 A_attrib_1 NULL NULL ...
entity1 NULL B_attrib_1 NULL ...
entity1 NULL B_attrib_2 NULL ...
entity2 A_attrib_1 NULL NULL ...
entity2 A_attrib_2 NULL NULL ...
entity2 NULL NULL C_attrib_1 ...
entity3 NULL B_attrib_3 NULL ...
entity3 NULL NULL C_attrib_2 ...
. . . .
. . . .
. . . .

where the second, third, fourth, etc. fields are foreign keys
referring to the tables corresponding to different attribute classes;
each record in the table above has exactly two non-null fields
(note that each entity can have multiple attributes of any one
class, so I can't use one record per entity with multiple non-null
attribute fields).

But this seems to me *extremely* inefficient.

--
NOTE: In my address everything before the first period is backwards;
and the last period, and everything after it, should be discarded.
I think you will need to revisit the actual requirements, the actual
business need. You are looking at a tangled mess, probably foisted on some
"requirement" thought up by some "clever" programming. State the actual
problem and business requirement you are trying to solve.
Jim




Reply With Quote
  #3  
Old   
kj
 
Posts: n/a

Default Re: Noob needs polymorphism - 08-21-2005 , 12:54 AM





Quote:
I think you will need to revisit the actual requirements, the actual
business need. You are looking at a tangled mess, probably foisted on some
"requirement" thought up by some "clever" programming. State the actual
problem and business requirement you are trying to solve.
I did: we have a huge collection of entities and a HUGE (and growing)
collection of attributes that we need to be able to query. What's
so hard to understand about that?? The typical query is: what
attributes (in attribute class X) does entity Y have.

Trust me: what I wrote ***is*** the problem at had. I cannot make
it any simpler.

And who said "business"?? I work in a genetics research
lab!

kj

--
NOTE: In my address everything before the first period is backwards;
and the last period, and everything after it, should be discarded.


Reply With Quote
  #4  
Old   
--CELKO--
 
Posts: n/a

Default Re: Noob needs polymorphism - 08-21-2005 , 08:14 AM



Quote:
Trust me: what I wrote ***is*** the problem at had. I cannot make it any simpler. And who said "business"?? I work in a genetics research lab!
Don't use an RDBMS. This design flaw is called EAV
(Entity-Atrribute-Value) in RDBMS; we assume a known reality from which
we can build a data model. We are not researching in an ever changing
world wher elephants can drop out of the sky.

You probably want to look at some kind of XML tool which defines things
on the fly. Either way, this will be a very slow system for large
amounts of data. That is just the nature of the problem.



Reply With Quote
  #5  
Old   
kj
 
Posts: n/a

Default Re: Noob needs polymorphism - 08-21-2005 , 09:19 AM



In <1124630094.434037.22860 (AT) o13g2000cwo (DOT) googlegroups.com> "--CELKO--" <jcelko212 (AT) earthlink (DOT) net> writes:

Quote:
Don't use an RDBMS. This design flaw is called EAV
(Entity-Atrribute-Value) in RDBMS;
Thanks, this is very helpful.

I'm having a surprisingly difficult time finding a good
description/analysis/critique of EAV online. I get a lot of Google
hits, but none of the sites I visited seemed particularly useful.
(I'm looking something not only for myself, but also to point other
members of my team to. It will be a challenge for me to convince
them that we should abandon the RDBMS strategy.)

Also, my RDBMS textbook (Elmasri & Navathe, 3d ed) does not make
mention of EAV. (I figure that if anyone is going to analyze EAV
and tear it to bits if necessary, it would be EJ Data, but I'm away
from my copy of his book ATM, so I have not checked it yet.)

Do you by any chance know of a book or journal article that discusses
EAV at length, and explains why it is a "flaw"? (Incidentally,
FWIW, many of the pages on EAV that I found through Google advance
it as a good thing...)

kj

--
NOTE: In my address everything before the first period is backwards;
and the last period, and everything after it, should be discarded.


Reply With Quote
  #6  
Old   
Jim Kennedy
 
Posts: n/a

Default Re: Noob needs polymorphism - 08-21-2005 , 07:41 PM




"kj" <socyl (AT) 987jk (DOT) com.invalid> wrote

Quote:
In <1124630094.434037.22860 (AT) o13g2000cwo (DOT) googlegroups.com> "--CELKO--"
jcelko212 (AT) earthlink (DOT) net> writes:

Don't use an RDBMS. This design flaw is called EAV
(Entity-Atrribute-Value) in RDBMS;

Thanks, this is very helpful.

I'm having a surprisingly difficult time finding a good
description/analysis/critique of EAV online. I get a lot of Google
hits, but none of the sites I visited seemed particularly useful.
(I'm looking something not only for myself, but also to point other
members of my team to. It will be a challenge for me to convince
them that we should abandon the RDBMS strategy.)

Also, my RDBMS textbook (Elmasri & Navathe, 3d ed) does not make
mention of EAV. (I figure that if anyone is going to analyze EAV
and tear it to bits if necessary, it would be EJ Data, but I'm away
from my copy of his book ATM, so I have not checked it yet.)

Do you by any chance know of a book or journal article that discusses
EAV at length, and explains why it is a "flaw"? (Incidentally,
FWIW, many of the pages on EAV that I found through Google advance
it as a good thing...)

kj

--
NOTE: In my address everything before the first period is backwards;
and the last period, and everything after it, should be discarded.
It is like anything else; if you have too much flexibility it becomes
unwieldy. RDBMSs are more designed for stricter entity attribute
definitions. They are real work horses for that. For what you want to do I
don't think most RDBMSs are really designed for what you want to do. You
are going to have a lot of difficulties querying data and performance
problems. Look for something better designed for your problem. Too often
in the business world we see EAV used as a means of sloppy design - ultimate
flexibility, but very difficult to query and scalability problems.
Jim




Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.