dbTalk Databases Forums  

Normalizing a common ID type shared across tables

comp.databases comp.databases


Discuss Normalizing a common ID type shared across tables in the comp.databases forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Ovid
 
Posts: n/a

Default Normalizing a common ID type shared across tables - 10-21-2008 , 05:59 AM






This is a simplified version of the problem.

We are required by external customers to have several "public" ids
they can query our data by. For simplicity, we'll call them "pid" and
"crid". We have an "entity" table which stores this information.
Something like this:

entity
--
entity_id
pid
crid
musicbrainz_id

Then we have separate entities such as "episode", "series" and
"broadcast". Each of these has an entity_id pointing to the entity
table.

How can external customers search, via pid or crid and get the
appropriate episode or series, along with proper identification of
what it is? Given a pid, we can fetch the entity id, but then we need
to search the episode, series and broadcast tables for this value
(actually, we need to search a bunch more). Further, not all ids will
necessarily be related to all of the other tables.

Strategies:

1. Find the entity id for a pid and search every other table for the
pid.
2. Put an "entity_type" column on entity, but what if it's a pid in
the episode table but we accidentally set episode.type as series? We
don't want to duplicate data and I don't want to put database metadata
into column values.

Option number 1 is slow and seems wrong (further, the various tables
have different structures making problematic).

Option 2 means duplicate data and this data can get out of synch. We
can use triggers for force this, but this seems really nasty and, in
any event, bugs in the implementation of mysql triggers have hit us
several times.

What's option 3?

Cheers,
Ovid

Reply With Quote
  #2  
Old   
Ed Prochak
 
Posts: n/a

Default Re: Normalizing a common ID type shared across tables - 10-21-2008 , 12:28 PM






On Oct 21, 5:59*am, Ovid <curtis.... (AT) gmail (DOT) com> wrote:
Quote:
This is a simplified version of the problem.

We are required by external customers to have several "public" ids
they can query our data by. *For simplicity, we'll call them "pid" and
"crid". *We have an "entity" table which stores this information.
Something like this:

* entity
* --
* entity_id
* pid
* crid
* musicbrainz_id

Then we have separate entities such as "episode", "series" and
"broadcast". *Each of these has an entity_id pointing to the entity
table.

How can external customers search, via pid or crid and get the
appropriate episode or series, along with proper identification of
what it is? *Given a pid, we can fetch the entity id, but then we need
to search the episode, series and broadcast tables for this value
(actually, we need to search a bunch more). *Further, not all ids will
necessarily be related to all of the other tables.

Strategies:

1. *Find the entity id for a pid and search every other table for the
pid.
2. *Put an "entity_type" column on entity, but what if it's a pid in
the episode table but we accidentally set episode.type as series? *We
don't want to duplicate data and I don't want to put database metadata
into column values.

Option number 1 is slow and seems wrong (further, the various tables
have different structures making problematic).
why is it slow? Are you literally selecting from ALL the other tables
at once?

Quote:
Option 2 means duplicate data and this data can get out of synch. *We
can use triggers for force this, but this seems really nasty and, in
any event, bugs in the implementation of mysql triggers have hit us
several times.
Where the hell does entity type come in??? How does that help?
What are you really trying to do?

Quote:
What's option 3?
materialized views? Oh wait mysql doesn't have those does it?

Quote:
Cheers,
Ovid

If you created a view of the other tables (and this is a WAG since you
give so little info on the data model and what you really are trying
to do), for example:

create view cv_episode --- customer view of episode table
as
select entity.pid, episode.(columns other than entity_id)
from episode , entity
where episode.entity_id = entity.entity_id; --- assumes entities that
are episodes have pid values that match.
--- in fact this makes the BIG assumption that episode has a pid
column, since you did not say
--- if it had pid or crid (whatever the hell those IDs really are
other than user exposed values.)

So the customer can do this:

select * from cv_episode where pid='xyz5678';

But I will say it sounds like you really need to hire a databasebrainz
entity. 8^)
(mumbles something about Codd and not exposing surrogate keys)

Ed


Reply With Quote
  #3  
Old   
Ed Prochak
 
Posts: n/a

Default Re: Normalizing a common ID type shared across tables - 10-21-2008 , 12:28 PM



On Oct 21, 5:59*am, Ovid <curtis.... (AT) gmail (DOT) com> wrote:
Quote:
This is a simplified version of the problem.

We are required by external customers to have several "public" ids
they can query our data by. *For simplicity, we'll call them "pid" and
"crid". *We have an "entity" table which stores this information.
Something like this:

* entity
* --
* entity_id
* pid
* crid
* musicbrainz_id

Then we have separate entities such as "episode", "series" and
"broadcast". *Each of these has an entity_id pointing to the entity
table.

How can external customers search, via pid or crid and get the
appropriate episode or series, along with proper identification of
what it is? *Given a pid, we can fetch the entity id, but then we need
to search the episode, series and broadcast tables for this value
(actually, we need to search a bunch more). *Further, not all ids will
necessarily be related to all of the other tables.

Strategies:

1. *Find the entity id for a pid and search every other table for the
pid.
2. *Put an "entity_type" column on entity, but what if it's a pid in
the episode table but we accidentally set episode.type as series? *We
don't want to duplicate data and I don't want to put database metadata
into column values.

Option number 1 is slow and seems wrong (further, the various tables
have different structures making problematic).
why is it slow? Are you literally selecting from ALL the other tables
at once?

Quote:
Option 2 means duplicate data and this data can get out of synch. *We
can use triggers for force this, but this seems really nasty and, in
any event, bugs in the implementation of mysql triggers have hit us
several times.
Where the hell does entity type come in??? How does that help?
What are you really trying to do?

Quote:
What's option 3?
materialized views? Oh wait mysql doesn't have those does it?

Quote:
Cheers,
Ovid

If you created a view of the other tables (and this is a WAG since you
give so little info on the data model and what you really are trying
to do), for example:

create view cv_episode --- customer view of episode table
as
select entity.pid, episode.(columns other than entity_id)
from episode , entity
where episode.entity_id = entity.entity_id; --- assumes entities that
are episodes have pid values that match.
--- in fact this makes the BIG assumption that episode has a pid
column, since you did not say
--- if it had pid or crid (whatever the hell those IDs really are
other than user exposed values.)

So the customer can do this:

select * from cv_episode where pid='xyz5678';

But I will say it sounds like you really need to hire a databasebrainz
entity. 8^)
(mumbles something about Codd and not exposing surrogate keys)

Ed


Reply With Quote
  #4  
Old   
Ed Prochak
 
Posts: n/a

Default Re: Normalizing a common ID type shared across tables - 10-21-2008 , 12:28 PM



On Oct 21, 5:59*am, Ovid <curtis.... (AT) gmail (DOT) com> wrote:
Quote:
This is a simplified version of the problem.

We are required by external customers to have several "public" ids
they can query our data by. *For simplicity, we'll call them "pid" and
"crid". *We have an "entity" table which stores this information.
Something like this:

* entity
* --
* entity_id
* pid
* crid
* musicbrainz_id

Then we have separate entities such as "episode", "series" and
"broadcast". *Each of these has an entity_id pointing to the entity
table.

How can external customers search, via pid or crid and get the
appropriate episode or series, along with proper identification of
what it is? *Given a pid, we can fetch the entity id, but then we need
to search the episode, series and broadcast tables for this value
(actually, we need to search a bunch more). *Further, not all ids will
necessarily be related to all of the other tables.

Strategies:

1. *Find the entity id for a pid and search every other table for the
pid.
2. *Put an "entity_type" column on entity, but what if it's a pid in
the episode table but we accidentally set episode.type as series? *We
don't want to duplicate data and I don't want to put database metadata
into column values.

Option number 1 is slow and seems wrong (further, the various tables
have different structures making problematic).
why is it slow? Are you literally selecting from ALL the other tables
at once?

Quote:
Option 2 means duplicate data and this data can get out of synch. *We
can use triggers for force this, but this seems really nasty and, in
any event, bugs in the implementation of mysql triggers have hit us
several times.
Where the hell does entity type come in??? How does that help?
What are you really trying to do?

Quote:
What's option 3?
materialized views? Oh wait mysql doesn't have those does it?

Quote:
Cheers,
Ovid

If you created a view of the other tables (and this is a WAG since you
give so little info on the data model and what you really are trying
to do), for example:

create view cv_episode --- customer view of episode table
as
select entity.pid, episode.(columns other than entity_id)
from episode , entity
where episode.entity_id = entity.entity_id; --- assumes entities that
are episodes have pid values that match.
--- in fact this makes the BIG assumption that episode has a pid
column, since you did not say
--- if it had pid or crid (whatever the hell those IDs really are
other than user exposed values.)

So the customer can do this:

select * from cv_episode where pid='xyz5678';

But I will say it sounds like you really need to hire a databasebrainz
entity. 8^)
(mumbles something about Codd and not exposing surrogate keys)

Ed


Reply With Quote
  #5  
Old   
--CELKO--
 
Posts: n/a

Default Re: Normalizing a common ID type shared across tables - 10-22-2008 , 08:46 AM



This is pretty vague. Could you possibly be more generic than
"entity_id", which is impossible or absurd in RDBMS. Is this the
Khabala number that God assigns to all of his creations? The
"entity_type" is really weird, too. Is this a Dewey Decimal
Classification on steroids?

The specifics you gave --"episode", "series" and "broadcast" seem to
be in the television industry. You did not give us even that much
information. They got some industry standard identifiers for
television shows several years, but I cannot remember what they are
called.


Reply With Quote
  #6  
Old   
--CELKO--
 
Posts: n/a

Default Re: Normalizing a common ID type shared across tables - 10-22-2008 , 08:46 AM



This is pretty vague. Could you possibly be more generic than
"entity_id", which is impossible or absurd in RDBMS. Is this the
Khabala number that God assigns to all of his creations? The
"entity_type" is really weird, too. Is this a Dewey Decimal
Classification on steroids?

The specifics you gave --"episode", "series" and "broadcast" seem to
be in the television industry. You did not give us even that much
information. They got some industry standard identifiers for
television shows several years, but I cannot remember what they are
called.


Reply With Quote
  #7  
Old   
--CELKO--
 
Posts: n/a

Default Re: Normalizing a common ID type shared across tables - 10-22-2008 , 08:46 AM



This is pretty vague. Could you possibly be more generic than
"entity_id", which is impossible or absurd in RDBMS. Is this the
Khabala number that God assigns to all of his creations? The
"entity_type" is really weird, too. Is this a Dewey Decimal
Classification on steroids?

The specifics you gave --"episode", "series" and "broadcast" seem to
be in the television industry. You did not give us even that much
information. They got some industry standard identifiers for
television shows several years, but I cannot remember what they are
called.


Reply With Quote
  #8  
Old   
Todd
 
Posts: n/a

Default Re: Normalizing a common ID type shared across tables - 11-01-2008 , 12:29 PM



On Oct 21, 5:59 am, Ovid <curtis.... (AT) gmail (DOT) com> wrote:
Quote:
This is a simplified version of the problem.

We are required by external customers to have several "public" ids
they can query our data by. For simplicity, we'll call them "pid" and
"crid". We have an "entity" table which stores this information.
Something like this:

entity
--
entity_id
pid
crid
musicbrainz_id

Then we have separate entities such as "episode", "series" and
"broadcast". Each of these has an entity_id pointing to the entity
table.

How can external customers search, via pid or crid and get the
appropriate episode or series, along with proper identification of
what it is? Given a pid, we can fetch the entity id, but then we need
to search the episode, series and broadcast tables for this value
(actually, we need to search a bunch more). Further, not all ids will
necessarily be related to all of the other tables.

Strategies:

1. Find the entity id for a pid and search every other table for the
pid.
2. Put an "entity_type" column on entity, but what if it's a pid in
the episode table but we accidentally set episode.type as series? We
don't want to duplicate data and I don't want to put database metadata
into column values.

Option number 1 is slow and seems wrong (further, the various tables
have different structures making problematic).

Option 2 means duplicate data and this data can get out of synch. We
can use triggers for force this, but this seems really nasty and, in
any event, bugs in the implementation of mysql triggers have hit us
several times.

What's option 3?

Cheers,
Ovid
I'm no DB expert, per se, but I think Ed is right on this. Create a
view if you can.


Reply With Quote
  #9  
Old   
Todd
 
Posts: n/a

Default Re: Normalizing a common ID type shared across tables - 11-01-2008 , 12:29 PM



On Oct 21, 5:59 am, Ovid <curtis.... (AT) gmail (DOT) com> wrote:
Quote:
This is a simplified version of the problem.

We are required by external customers to have several "public" ids
they can query our data by. For simplicity, we'll call them "pid" and
"crid". We have an "entity" table which stores this information.
Something like this:

entity
--
entity_id
pid
crid
musicbrainz_id

Then we have separate entities such as "episode", "series" and
"broadcast". Each of these has an entity_id pointing to the entity
table.

How can external customers search, via pid or crid and get the
appropriate episode or series, along with proper identification of
what it is? Given a pid, we can fetch the entity id, but then we need
to search the episode, series and broadcast tables for this value
(actually, we need to search a bunch more). Further, not all ids will
necessarily be related to all of the other tables.

Strategies:

1. Find the entity id for a pid and search every other table for the
pid.
2. Put an "entity_type" column on entity, but what if it's a pid in
the episode table but we accidentally set episode.type as series? We
don't want to duplicate data and I don't want to put database metadata
into column values.

Option number 1 is slow and seems wrong (further, the various tables
have different structures making problematic).

Option 2 means duplicate data and this data can get out of synch. We
can use triggers for force this, but this seems really nasty and, in
any event, bugs in the implementation of mysql triggers have hit us
several times.

What's option 3?

Cheers,
Ovid
I'm no DB expert, per se, but I think Ed is right on this. Create a
view if you can.


Reply With Quote
  #10  
Old   
Todd
 
Posts: n/a

Default Re: Normalizing a common ID type shared across tables - 11-01-2008 , 12:29 PM



On Oct 21, 5:59 am, Ovid <curtis.... (AT) gmail (DOT) com> wrote:
Quote:
This is a simplified version of the problem.

We are required by external customers to have several "public" ids
they can query our data by. For simplicity, we'll call them "pid" and
"crid". We have an "entity" table which stores this information.
Something like this:

entity
--
entity_id
pid
crid
musicbrainz_id

Then we have separate entities such as "episode", "series" and
"broadcast". Each of these has an entity_id pointing to the entity
table.

How can external customers search, via pid or crid and get the
appropriate episode or series, along with proper identification of
what it is? Given a pid, we can fetch the entity id, but then we need
to search the episode, series and broadcast tables for this value
(actually, we need to search a bunch more). Further, not all ids will
necessarily be related to all of the other tables.

Strategies:

1. Find the entity id for a pid and search every other table for the
pid.
2. Put an "entity_type" column on entity, but what if it's a pid in
the episode table but we accidentally set episode.type as series? We
don't want to duplicate data and I don't want to put database metadata
into column values.

Option number 1 is slow and seems wrong (further, the various tables
have different structures making problematic).

Option 2 means duplicate data and this data can get out of synch. We
can use triggers for force this, but this seems really nasty and, in
any event, bugs in the implementation of mysql triggers have hit us
several times.

What's option 3?

Cheers,
Ovid
I'm no DB expert, per se, but I think Ed is right on this. Create a
view if you can.


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.