dbTalk Databases Forums  

Question on Structuring Product Attributes

comp.databases.theory comp.databases.theory


Discuss Question on Structuring Product Attributes in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Yadda
 
Posts: n/a

Default Question on Structuring Product Attributes - 10-03-2011 , 07:54 AM






Hello,

I have a question on how to efficiently implement a product database
table structure. I have products grouped into Categories which is the
highest level grouping by attribute. Products across Categories share
some attributes in some cases, e.g. Diameter and Color. I need to have
each attribute have a proper data type, e.g. Diameter is of type DECIMAL
and Color is of type CHARACTER. I will also have to have standard
stored queries to inquire on these products according to attribute(s).
Over time new product with new types of attributes may be added.

One way is to just define the attribute in the Product table and then
have a table to define what attributes are applicable to a Product
Category. How ever this table field could get quite large then.
However, I don't think breaking attributes out into separate attribute
tables is a good idea because of the stored query requirement and it
would require more structure changes over time.

I would be curious to read others' opinions on this design issue as it
must be very common.

Thank you
Alex

Reply With Quote
  #2  
Old   
-CELKO-
 
Posts: n/a

Default Re: Question on Structuring Product Attributes - 10-04-2011 , 12:24 PM






The classic scenario calls for a root class with all the common
attributes and then specialized sub-classes under it. As an example,
let's take the class of Vehicles and find an industry standard
identifier (VIN), and add two mutually exclusive sub-classes, Sport
utility vehicles and sedans ('SUV', 'SED').

CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN ('SUV', 'SED')),
UNIQUE (vin, vehicle_type),
..);

Notice the overlapping candidate keys. I then use a compound candidate
key (vin, vehicle_type) and a constraint in each sub-class table to
assure that the vehicle_type is locked and agrees with the Vehicles
table. Add some DRI actions and you are done:

CREATE TABLE SUVs
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SUV' NOT NULL
CHECK(vehicle_type = 'SUV'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
CHECK(vehicle_type = 'SED'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

I can continue to build a hierarchy like this. For example, if I had
a Sedans table that broke down into two-door and four-door sedans, I
could a schema like this:

CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
CHECK(vehicle_type IN ('2DR', '4DR', 'SED')),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE TwoDoor
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT '2DR' NOT NULL
CHECK(vehicle_type = '2DR'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Sedans(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE FourDoor
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT '4DR' NOT NULL
CHECK(vehicle_type = '4DR'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Sedans (vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

The idea is to build a chain of identifiers and types in a UNIQUE()
constraint that go up the tree when you use a REFERENCES constraint.
Obviously, you can do variants of this trick to get different class
structures.

If an entity doesn't have to be exclusively one subtype, you play with
the root of the class hierarchy:

CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL,
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN ('SUV', 'SED')),
PRIMARY KEY (vin, vehicle_type),
..);

Now start hiding all this stuff in VIEWs immediately and add an
INSTEAD OF trigger to those VIEWs.

Reply With Quote
  #3  
Old   
Derek Asirvadem
 
Posts: n/a

Default Re: Question on Structuring Product Attributes - 10-10-2011 , 07:16 PM



On Oct 5, 4:24*am, -CELKO- <jcelko... (AT) earthlink (DOT) net> wrote:
Quote:
[snip]
Joe

If I may ask, is this method of implementing RI for Exclusive Subtypes
one of your original concepts (like the Nested Sets), or is it someone
else's or general practice or whatever ?

Regards
Derek

Reply With Quote
  #4  
Old   
Roy Hann
 
Posts: n/a

Default Re: Question on Structuring Product Attributes - 10-11-2011 , 02:55 AM



Derek Asirvadem wrote:

Quote:
On Oct 5, 4:24*am, -CELKO- <jcelko... (AT) earthlink (DOT) net> wrote:
[snip]

Joe

If I may ask, is this method of implementing RI for Exclusive Subtypes
one of your original concepts (like the Nested Sets), or is it someone
else's or general practice or whatever ?
Surprisingly to me, Joe is advocating what I would have suggested. It
is entirely respectable, though it may not be all that common.

The common approach is to just cram everything into one table with lots
of nullable attributes to hide the conflation of fact types, then leave
the whole mess under the carpet so that the application code has to
figure out the real business model at run-time, every time.

Explicit support for distributed key constraints in SQL would make the
approach properly robust plus it would, in my opinion, have the
psychological benefit of pointing the way and comforting us that we're
doing it right.

--
Roy

Reply With Quote
  #5  
Old   
Yadda
 
Posts: n/a

Default Re: Question on Structuring Product Attributes - 10-11-2011 , 08:35 AM



on 10/4/11 12:24 PM -CELKO- said the following:
Quote:
The classic scenario calls for a root class with all the common
attributes and then specialized sub-classes under it. As an example,
let's take the class of Vehicles and find an industry standard
identifier (VIN), and add two mutually exclusive sub-classes, Sport
utility vehicles and sedans ('SUV', 'SED').

CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN ('SUV', 'SED')),
UNIQUE (vin, vehicle_type),
..);

Notice the overlapping candidate keys. I then use a compound candidate
key (vin, vehicle_type) and a constraint in each sub-class table to
assure that the vehicle_type is locked and agrees with the Vehicles
table. Add some DRI actions and you are done:

CREATE TABLE SUVs
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SUV' NOT NULL
CHECK(vehicle_type = 'SUV'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
CHECK(vehicle_type = 'SED'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

I can continue to build a hierarchy like this. For example, if I had
a Sedans table that broke down into two-door and four-door sedans, I
could a schema like this:

CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
CHECK(vehicle_type IN ('2DR', '4DR', 'SED')),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE TwoDoor
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT '2DR' NOT NULL
CHECK(vehicle_type = '2DR'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Sedans(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE FourDoor
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT '4DR' NOT NULL
CHECK(vehicle_type = '4DR'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Sedans (vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

The idea is to build a chain of identifiers and types in a UNIQUE()
constraint that go up the tree when you use a REFERENCES constraint.
Obviously, you can do variants of this trick to get different class
structures.

If an entity doesn't have to be exclusively one subtype, you play with
the root of the class hierarchy:

CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL,
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN ('SUV', 'SED')),
PRIMARY KEY (vin, vehicle_type),
..);

Now start hiding all this stuff in VIEWs immediately and add an
INSTEAD OF trigger to those VIEWs.


Much to contemplate here! Is there a reference on the web or book that
fleshes this concept out? Thanks much.

Reply With Quote
  #6  
Old   
Derek Asirvadem
 
Posts: n/a

Default Re: Question on Structuring Product Attributes - 10-11-2011 , 05:44 PM



Roy

To be clear, I am totally for:
- correct Normalisation, which results in Subtypes
- and certainly a hierarchy, if that is what the exercise produces in
any given situation.
- No Nullable columns, especially not FKs.
- simplicity, which is a classic result of the formal process.
- No "hiding", such that the model is plainly exposed in the catalogue

Credit for the concept or notion of Subtypes goes to Codd & Date; for
the theoretical modelling, to Codd and Chen; and for practical
modelling or implementation specifics, to R Brown.

I appreciate that SQL does not have formal support for Subtypes, as it
does for eg. Declarative Referential Integrity, Foreign Keys (and even
that was added years afterward). Which leads people to
implementations such as this.

This method does have a bit of processing, and of hiding that behind
Views, though.

My question was specifically regarding the method, of implementing RI
for Subtypes. Who deserves credit for this technique ? (I am not
suggesting plagiary here.) When I present this method to others, I
would like to give due credit the original author, as we do when we
discuss Joe's Nested Sets.


Yadda

Quote:
Is there a reference on the web or book that
fleshes this concept out? *Thanks much.
(I assume you mean the concept of Subtypes, since Joe has provided the
subtype RI method in fair detail.)

Well, no. Anyone with a keyboard can "publish" these days. The web
is full of shallow information and misinformation; one has to dig deep
through that morass to find nuggets of truth, and to do that one has
to be able to discriminate between the two, which is not possible for
the seeker. Also, there are many related issues within the one
science; one aspect cannot be excised, held up, and inspected in
isolation from the other aspects (which is the common problem
afflicting any technical discussion on the web). All you need is a
good text book and some formal study. Or work with a mentor over a
period of time.

Subtypes have been with us since the 1980's, when the RM was first
available in product form, and I and many others have used them since
then.

Regards
Derek

Reply With Quote
  #7  
Old   
-CELKO-
 
Posts: n/a

Default Re: Question on Structuring Product Attributes - 10-12-2011 , 10:20 AM



While I love the flattery, I did not invent Nesteds Sets or this class
hierarchy model. I just make tehm popular and publish teh code to use
them.

Reply With Quote
  #8  
Old   
-CELKO-
 
Posts: n/a

Default Re: Question on Structuring Product Attributes - 10-14-2011 , 08:52 PM



Quote:
The common approach is to just cram everything into one table with lots of nullable attributes to hide the conflation of fact types, then leave thewhole mess under the carpet so that the application code has to figure outthe real business model at run-time, every time
LOL! that is unfortunately true.

Reply With Quote
  #9  
Old   
-CELKO-
 
Posts: n/a

Default Re: Question on Structuring Product Attributes - 10-14-2011 , 09:29 PM



Now you are obligated to actually post better techniques and explain
why they are better..

I am not bothered by increasing the number of indexes. You have made
the assumption that references are done by more indexes. Sybase (nee
WATCOM) SQL implements the constraints like
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)

by building pointer chains under the covers, so each compound key
actually appears once in the schema. Teradata uses a hashing
algorithm. RDBMS companies have been working on PK-FK joins for a few
decades, so they are pretty good at it now.

What do you have against the Nested Sets? The only laugh I had with
them was teaching them to XML programmers. Traditional procedural
programmers still think in terms of fake pointer chains, recursive
traversals and other stiff they know from assembly language. But the
XML kids look at it and say "Oh, (lft, rgt) is the tag pairs!" then
wrote really great code by mimicking XQuery.

Reply With Quote
  #10  
Old   
Derek Asirvadem
 
Posts: n/a

Default Re: Question on Structuring Product Attributes - 10-15-2011 , 01:03 AM



On Oct 15, 1:29*pm, -CELKO- <jcelko... (AT) earthlink (DOT) net> wrote:
Quote:
Now you are obligated to actually post better techniques and explain
why they are better..
Ok. Happy to take up the challenge. I have already identified the
"feature" is already available in Relational databases; that it is a
CHECK Constraint, no need for duplicating columns and adding indices.

Do you want me to post the code ? Come on, you are the famous SQL guru
in the family, I am just a little old practitioner with a few happy
customers; your code will probably run circles around mine.

Quote:
I am not bothered by increasing the number of indexes.
Yes, that bothers me, especially adding *totally 100% redundant*
indices. That is probably partly why my customers are happy.

Quote:
Sybase (nee WATCOM)
Actually:
- Their baby DBMS is "SQL Anywhere", which is nee Watcom. It is an
excellent product for a limited market.

- Their flagship prodcut for any size box, is Adaptive Server
Enterprise. It is nee nothing, they designed and wrote it from
scratch. The *lineage* of the original (1984-1987) engineers is
easily traced back to Britton-Lee. And yes, they always had a few
Waterloo grads on staff. brings back memories.

Quote:
You have made
the assumption that references are done by more indexes. SQL implements *the constraints like
*FOREIGN KEY (vin, vehicle_type)
* REFERENCES Vehicles(vin, vehicle_type)

by building pointer chains under the covers, so each compound key
actually appears once in the schema.
Excuse me, but:

a. You posted SQL. I answered SQL. In order to maintain focus, avoid
tangents, and close this subject like gentlemen, let's talk SQL
only.

b. SQL is a standard, anything that does not comply with the Standard
cannot make the claim to be an SQL. The SQL standard requires that
all references in the database are by *KEY*, and specifically *NOT* by
pointers.

b. I made no assumptions. The SQL code that *you provided*
specifically creates two indices:
Quote:
CREATE TABLE SUVs
(vin CHAR(17) NOT NULL
PRIMARY KEY, -- Index on (vin) for SUV.PK
vehicle_type CHAR(3) DEFAULT 'SUV' NOT NULL
CHECK(vehicle_type = 'SUV'),
UNIQUE (vin, vehicle_type), -- Dupe Index [1] on (vin, vehicle_type)
FOREIGN KEY (vin, vehicle_type) -- no suggestion that this FK is anadditional index
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);
d. The SQL code that *you provided* specifically identifies data
columns, not pointers. Therefore, above the covers, there are two
indices, visible, with data columns, visible. No pointers. What a
product does under the covers is tangential.

e. The issue I have is above the covers:

e.1 Index[1] is a 110% duplicate of Index SUV.PK.

e.2 (vin) is unique, so (vin) plus anything will be unique. It has no
value (except to support this monstrous structure, which is not reqd
to ensure Exclusive Subtype CHECK).

e.3 There is also another, additional, 110% redundant index in the
parent, to provide the key for the REFERENCED key in the child/ Your
SQL code again:

Quote:
CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL
PRIMARY KEY, -- Index on (vin) for Vehicle.PK
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN ('SUV', 'SED')),
UNIQUE (vin, vehicle_type), -- Dupe Index [2] on (vin, vehicle_type)
..);
e.4

Quote:
so each compound key actually appears once in the schema.
No. It appears once for each type you declare a CONSTRAINT, not once
in the schema. SQL does not normalise the schema (wish that it
would!).

Quote:
Teradata uses a hashing algorithm.
Teradata is not SQL.

Sybase, MS, DB2, all create Standard SQL visible indices from data
columns, from the SQL command that you supplied.

Oracle is non-compliant, in that it twists the SQL definitions in
order to appear compliant. It famously uses ROWIDs, above and below
the covers, but the indices and data columns are visible, above the
covers.

Quote:
RDBMS companies have been working on PK-FK joins for a few
decades, so they are pretty good at it now.
Agreed. But the Foreign *KEYS* that you declared in your SQL code are
still ordinary visible data columns, not pointers or pointer chains.

Quote:
What do you have against the Nested Sets?
There is a recent thread on that subject. I will take it up there.

Let's close this thread here.

Ambler and Fowler are famous for their "straw man" arguments,
suggesting that the RM or RDBMS can't do X, then proposing some
monstrous method of doing X. AFAIK, you do not do that. SQL, and any
RDBMS supplying SQL, already has everything required to ensure the
Referential Integrity of an Exclusive Subtype, with zero overhead. The
problem does not exist (we can ensure RI for Exclusive Subtypes),
therefore we do not need the "solution" of the additional structure.

Although you did not present the "straw man" intro to this article,
you did imply that Exclusive Subtypes cannot be verified in SQL (which
is not correct), and then promoted this structure. I do not
understand why, when there is a perfectly simple, zero-overhead
Exclusive Subtype CHECK Constraint, you Mr Celko, the famous SQL guru,
would be promoting this non-SQL "class hierarchy" structure, which is
massively inefficient, due to the number of totally redundant columns
and redundant indices it requires.

Response please.

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.