dbTalk Databases Forums  

Suggestions for refactoring unusual tables

comp.databases comp.databases


Discuss Suggestions for refactoring unusual tables in the comp.databases forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Conrad Lender
 
Posts: n/a

Default Suggestions for refactoring unusual tables - 09-09-2008 , 10:18 PM






Hello.

I'm looking at a medium/large web application with 72 tables on a
company intranet. Two of the tables are unusual; they appear to have
been created to avoid having too many additional (very small) tables in
the database. The rough structure of these tables is like this:

Table "catalog":
----------------
id (int, primary key)
systext (string)
dshort (string)
dlong (string)

Table "catalog_entry":
----------------------
id (int, primary key)
catalog_id (int, references catalog)
systext (string)
dshort (string)
dlong (string)
rank (int)

"systext" is an alternate unique human-readable identifier for the
record, so that it can be referenced by name from SQL queries.
"dshort" and "dlong" are short and long labels, respectively.
"rank" is used to determine in which order the catalog_entry records
should be displayed.

Most of the other tables in the schema reference catalog_entry, for
various purposes (such as classification, current status, etc).
For example:

Table "contact":
----------------
id (int, primary key)
name (string)
....
type_id (int, references catalog_entry)
language_id (int, references catalog_entry)
billing_type_id (int, references catalog_entry

When the users edit a record, they usually see a dropdown box to select
one of the entries in a catalog. That's where the "dshort", "dlong", and
"rank" fields are used.

There are 63 catalogs. Only 6 of them contain 20+ entries (max. is 57),
half of them contain 5 entries or less (min. is 2). Here is some example
data from catalog_entry:

SELECT cat.systext, entry.systext
FROM catalog cat
JOIN catalog_entry entry ON entry.catalog_id = cat.id
WHERE cat.systext IN ('coverage', 'document_type');

cat.systext | entry.systext
---------------+------------------------
coverage | international
coverage | national
coverage | regional
document_type | notice_of_registration
document_type | reminder_1
document_type | reminder_2
( ... 25 more results ... )
document_type | publication_invoice
document_type | registragion_expired

My first thought was this looks unhealthy, and that every catalog should
be a separate table. That way the referencial integrity checks could be
enforced by the database - as it is, it would not be a foreign key
violation if the "language_id" value from the "contact" table pointed to
a catalog_entry that is actually part of the "document_type" catalog.

However, I hesitate to break this up, not just because that would almost
double the number of tables, or because it would mean a lot of
restructuring, but also because most of these new tables would be *very*
small (5 rows or less), and they would all have the same columns.

To make matters worse, catalogs can (optionally) be organized
hierarchically - some catalogs have one or more parent catalogs (there
is a separate table for this n:m relationship).

I'm not sure how to approach this. Is that part of the database design
"good enough" to be left alone, or should I break it up? If so, is it
really usual or desirable to have lots of mini-tables that all look
alike, and only have 2-5 rows each?
On the other hand, I suppose I could solve the current weakness
concerning referential integrity with column constraints, so that
shouldn't be a problem.

Any suggestions or hints would be very appreciated.


TIA,
stefan

Reply With Quote
  #2  
Old   
--CELKO--
 
Posts: n/a

Default Re: Suggestions for refactoring unusual tables - 09-16-2008 , 08:23 AM






Quote:
Any suggestions or hints would be very appreciated.
This nightmare is called an EAV design; you can Google for the
painful details. Youn will have to throw it all out and start over
with a relational design. But if this has been used for about a year,
you can expect to have orphans and bad data everywhere.





Reply With Quote
  #3  
Old   
--CELKO--
 
Posts: n/a

Default Re: Suggestions for refactoring unusual tables - 09-16-2008 , 08:23 AM



Quote:
Any suggestions or hints would be very appreciated.
This nightmare is called an EAV design; you can Google for the
painful details. Youn will have to throw it all out and start over
with a relational design. But if this has been used for about a year,
you can expect to have orphans and bad data everywhere.





Reply With Quote
  #4  
Old   
--CELKO--
 
Posts: n/a

Default Re: Suggestions for refactoring unusual tables - 09-16-2008 , 08:23 AM



Quote:
Any suggestions or hints would be very appreciated.
This nightmare is called an EAV design; you can Google for the
painful details. Youn will have to throw it all out and start over
with a relational design. But if this has been used for about a year,
you can expect to have orphans and bad data everywhere.





Reply With Quote
  #5  
Old   
Conrad Lender
 
Posts: n/a

Default Re: Suggestions for refactoring unusual tables - 09-19-2008 , 12:23 PM



On 2008-09-16 15:23, --CELKO-- wrote:
Quote:
Any suggestions or hints would be very appreciated.

This nightmare is called an EAV design; you can Google for the
painful details. Youn will have to throw it all out and start over
with a relational design. But if this has been used for about a year,
you can expect to have orphans and bad data everywhere.
Thanks for your reply. I have never implemented an EAV design myself,
but I'm somewhat familiar with what it means, and what the main problems
are. I wasn't aware that the design that I described would also qualify
as EAV. It looked more like a concatenation of a number of small tables
with identical layout into one larger table (like categories). Orphans
would not be possible in this case, but bad references could happen.
After 4 years in use, I've only ever seen that happen during development
- I guess we were lucky.

I agree that it doesn't look good. I did not design that database
myself, I only write the software that uses it. Would it really be
better to split the catalog_entries table into 63 very small tables, like:

print_status:
id systext dshort dlong
-----------------------------------------------------
1 "pending" "Queued" "Queued to be sent to the printer"
2 "error" "Error" "Print job had errors"
3 "done" "Printed" "Document was printed without errors"

renewal_type:
id systext dshort dlong
-----------------------------------------------------
1 "expiration" "Expiration" "Date of expiration"
2 "grant" "Issuance" "Date of issuance"

(etc.)

All of these 63 tables would look exactly alike; some of them really
only have two rows, others have 40-50 rows. Simple enumeration wouldn't
work, because the text fields dshort and dlong are necessary and can be
edited by (some) users.

What are the best practices of dealing with very small tables like
these? I assume that they must exist in some larger data models (I heard
that the Siebel CRM uses 1500+ tables).


- Conrad


Reply With Quote
  #6  
Old   
Conrad Lender
 
Posts: n/a

Default Re: Suggestions for refactoring unusual tables - 09-19-2008 , 12:23 PM



On 2008-09-16 15:23, --CELKO-- wrote:
Quote:
Any suggestions or hints would be very appreciated.

This nightmare is called an EAV design; you can Google for the
painful details. Youn will have to throw it all out and start over
with a relational design. But if this has been used for about a year,
you can expect to have orphans and bad data everywhere.
Thanks for your reply. I have never implemented an EAV design myself,
but I'm somewhat familiar with what it means, and what the main problems
are. I wasn't aware that the design that I described would also qualify
as EAV. It looked more like a concatenation of a number of small tables
with identical layout into one larger table (like categories). Orphans
would not be possible in this case, but bad references could happen.
After 4 years in use, I've only ever seen that happen during development
- I guess we were lucky.

I agree that it doesn't look good. I did not design that database
myself, I only write the software that uses it. Would it really be
better to split the catalog_entries table into 63 very small tables, like:

print_status:
id systext dshort dlong
-----------------------------------------------------
1 "pending" "Queued" "Queued to be sent to the printer"
2 "error" "Error" "Print job had errors"
3 "done" "Printed" "Document was printed without errors"

renewal_type:
id systext dshort dlong
-----------------------------------------------------
1 "expiration" "Expiration" "Date of expiration"
2 "grant" "Issuance" "Date of issuance"

(etc.)

All of these 63 tables would look exactly alike; some of them really
only have two rows, others have 40-50 rows. Simple enumeration wouldn't
work, because the text fields dshort and dlong are necessary and can be
edited by (some) users.

What are the best practices of dealing with very small tables like
these? I assume that they must exist in some larger data models (I heard
that the Siebel CRM uses 1500+ tables).


- Conrad


Reply With Quote
  #7  
Old   
Conrad Lender
 
Posts: n/a

Default Re: Suggestions for refactoring unusual tables - 09-19-2008 , 12:23 PM



On 2008-09-16 15:23, --CELKO-- wrote:
Quote:
Any suggestions or hints would be very appreciated.

This nightmare is called an EAV design; you can Google for the
painful details. Youn will have to throw it all out and start over
with a relational design. But if this has been used for about a year,
you can expect to have orphans and bad data everywhere.
Thanks for your reply. I have never implemented an EAV design myself,
but I'm somewhat familiar with what it means, and what the main problems
are. I wasn't aware that the design that I described would also qualify
as EAV. It looked more like a concatenation of a number of small tables
with identical layout into one larger table (like categories). Orphans
would not be possible in this case, but bad references could happen.
After 4 years in use, I've only ever seen that happen during development
- I guess we were lucky.

I agree that it doesn't look good. I did not design that database
myself, I only write the software that uses it. Would it really be
better to split the catalog_entries table into 63 very small tables, like:

print_status:
id systext dshort dlong
-----------------------------------------------------
1 "pending" "Queued" "Queued to be sent to the printer"
2 "error" "Error" "Print job had errors"
3 "done" "Printed" "Document was printed without errors"

renewal_type:
id systext dshort dlong
-----------------------------------------------------
1 "expiration" "Expiration" "Date of expiration"
2 "grant" "Issuance" "Date of issuance"

(etc.)

All of these 63 tables would look exactly alike; some of them really
only have two rows, others have 40-50 rows. Simple enumeration wouldn't
work, because the text fields dshort and dlong are necessary and can be
edited by (some) users.

What are the best practices of dealing with very small tables like
these? I assume that they must exist in some larger data models (I heard
that the Siebel CRM uses 1500+ tables).


- Conrad


Reply With Quote
  #8  
Old   
Leif Neland
 
Posts: n/a

Default Re: Suggestions for refactoring unusual tables - 09-23-2008 , 08:31 AM



Conrad Lender skrev:
Quote:
On 2008-09-16 15:23, --CELKO-- wrote:
Any suggestions or hints would be very appreciated.
This nightmare is called an EAV design; you can Google for the
painful details. Youn will have to throw it all out and start over
with a relational design. But if this has been used for about a year,
you can expect to have orphans and bad data everywhere.
No need to throw a perfectly good design out because of programmer
incompetence. Better throw the programmer away :-)
Quote:
Thanks for your reply. I have never implemented an EAV design myself,
but I'm somewhat familiar with what it means, and what the main problems
are. I wasn't aware that the design that I described would also qualify
as EAV. It looked more like a concatenation of a number of small tables
with identical layout into one larger table (like categories). Orphans
would not be possible in this case, but bad references could happen.
After 4 years in use, I've only ever seen that happen during development
- I guess we were lucky.
If the programming is done right, and if data-manipulation is done
through a few carefully tested proceures, like

catalog_id = catalog_add(catalog,dshort,dlong)
catalogEntry_id =catalogEntry_add(catalog_id,systext,dshort,dlong, rank)
getCatalogByName(catalogname)

etc, orphans shouldn't happen.

Also, it appears to me that catalog and catalog_entry is only populated
during th development, not later. Later only dshort and dlong is changed.
Quote:
I agree that it doesn't look good. I did not design that database
myself, I only write the software that uses it. Would it really be
better to split the catalog_entries table into 63 very small tables, like:
Why doesn't it look good? Now, you understand what's going on, can you
give a reason for not using EAV?

Quote:
All of these 63 tables would look exactly alike; some of them really
only have two rows, others have 40-50 rows. Simple enumeration wouldn't
work, because the text fields dshort and dlong are necessary and can be
edited by (some) users.
Horrible idea. Would you also create 63 routines for editing each of the
63 tables?

Leif


Reply With Quote
  #9  
Old   
Leif Neland
 
Posts: n/a

Default Re: Suggestions for refactoring unusual tables - 09-23-2008 , 08:31 AM



Conrad Lender skrev:
Quote:
On 2008-09-16 15:23, --CELKO-- wrote:
Any suggestions or hints would be very appreciated.
This nightmare is called an EAV design; you can Google for the
painful details. Youn will have to throw it all out and start over
with a relational design. But if this has been used for about a year,
you can expect to have orphans and bad data everywhere.
No need to throw a perfectly good design out because of programmer
incompetence. Better throw the programmer away :-)
Quote:
Thanks for your reply. I have never implemented an EAV design myself,
but I'm somewhat familiar with what it means, and what the main problems
are. I wasn't aware that the design that I described would also qualify
as EAV. It looked more like a concatenation of a number of small tables
with identical layout into one larger table (like categories). Orphans
would not be possible in this case, but bad references could happen.
After 4 years in use, I've only ever seen that happen during development
- I guess we were lucky.
If the programming is done right, and if data-manipulation is done
through a few carefully tested proceures, like

catalog_id = catalog_add(catalog,dshort,dlong)
catalogEntry_id =catalogEntry_add(catalog_id,systext,dshort,dlong, rank)
getCatalogByName(catalogname)

etc, orphans shouldn't happen.

Also, it appears to me that catalog and catalog_entry is only populated
during th development, not later. Later only dshort and dlong is changed.
Quote:
I agree that it doesn't look good. I did not design that database
myself, I only write the software that uses it. Would it really be
better to split the catalog_entries table into 63 very small tables, like:
Why doesn't it look good? Now, you understand what's going on, can you
give a reason for not using EAV?

Quote:
All of these 63 tables would look exactly alike; some of them really
only have two rows, others have 40-50 rows. Simple enumeration wouldn't
work, because the text fields dshort and dlong are necessary and can be
edited by (some) users.
Horrible idea. Would you also create 63 routines for editing each of the
63 tables?

Leif


Reply With Quote
  #10  
Old   
Leif Neland
 
Posts: n/a

Default Re: Suggestions for refactoring unusual tables - 09-23-2008 , 08:31 AM



Conrad Lender skrev:
Quote:
On 2008-09-16 15:23, --CELKO-- wrote:
Any suggestions or hints would be very appreciated.
This nightmare is called an EAV design; you can Google for the
painful details. Youn will have to throw it all out and start over
with a relational design. But if this has been used for about a year,
you can expect to have orphans and bad data everywhere.
No need to throw a perfectly good design out because of programmer
incompetence. Better throw the programmer away :-)
Quote:
Thanks for your reply. I have never implemented an EAV design myself,
but I'm somewhat familiar with what it means, and what the main problems
are. I wasn't aware that the design that I described would also qualify
as EAV. It looked more like a concatenation of a number of small tables
with identical layout into one larger table (like categories). Orphans
would not be possible in this case, but bad references could happen.
After 4 years in use, I've only ever seen that happen during development
- I guess we were lucky.
If the programming is done right, and if data-manipulation is done
through a few carefully tested proceures, like

catalog_id = catalog_add(catalog,dshort,dlong)
catalogEntry_id =catalogEntry_add(catalog_id,systext,dshort,dlong, rank)
getCatalogByName(catalogname)

etc, orphans shouldn't happen.

Also, it appears to me that catalog and catalog_entry is only populated
during th development, not later. Later only dshort and dlong is changed.
Quote:
I agree that it doesn't look good. I did not design that database
myself, I only write the software that uses it. Would it really be
better to split the catalog_entries table into 63 very small tables, like:
Why doesn't it look good? Now, you understand what's going on, can you
give a reason for not using EAV?

Quote:
All of these 63 tables would look exactly alike; some of them really
only have two rows, others have 40-50 rows. Simple enumeration wouldn't
work, because the text fields dshort and dlong are necessary and can be
edited by (some) users.
Horrible idea. Would you also create 63 routines for editing each of the
63 tables?

Leif


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.