dbTalk Databases Forums  

Design question

comp.databases.theory comp.databases.theory


Discuss Design question in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Mark S. (UK)
 
Posts: n/a

Default Design question - 01-24-2009 , 10:23 AM






Hi,
I'm currently working on a system that consists of several sites. Each
site runs a system with its own local database, and then periodically
each site updates the "master" site with its data, allow the data to be
viewed centrally.

It must be done like this because each site may not always be on the
network, but the data must always be available from the central server.

To enable the master database to contain all the site database
information, I use a composite key, with the first part being the site
identification, and the second part being the row identification. This
allows user with row id 5 for instance to exist more than once because
paired with the site id it is still unique.

On top of this, there is another layer. Each site must be contained
within an "Enterprise", which is just another identifier. So, I have a
table called enterprise which contains the list of site ids.

So far then, I have this (obviously cut down to only show the relevant
columns) : -

table: enterprise
column: id

table: site
column: id

table: user
column: site_id
column: id
column: label
etc..

With the site_id and id making the composite key.

Now, I've just had to add the concept of "enterprise" level users.
Initially the only clean way I could think of doing this is having two
separate user tables; enterprise_user and site_user : -

table: enterprise_user
column[key]: enterprise_id
column[key]: id
column: label
etc..

table: site_user
column[key]: site_id
column[key]: id
column: label
etc..

However this does not seem very "normalized"..

Other ways I considered are : -

table: user
column[key]: id
column[key]: enterprise_id
column: site_id
column: label
etc..

But this does not make sense as an enterprise user is not associated
with any site. I could have a dummy site, but I really do not like
that. I could just set site_id to NULL, but I'm not a big fan of NULL
in database and it just seems a bit hacky.

So then I thought of adding a link table:

table: user
column[key]: id
column[key]: enterprise_id
etc..

table: user_site <== link table
column: enterprise_id
column: user_id
column: site_id
etc..

with a unique constraint on enterprise_id, user_id (the composite key
that identifies a user in an enterprise).

I think that also works.. basically with no entry in the "user_site"
table, they are enterprise users. An entry in user_site effectively
limits them to being a site user.

However.. this might create complications in the "master" update process.

Part of the batch update process involves updating user information.
Enterprise users (only editable at the master) must be propagated down
to each site, so that the site is "aware" of enterprise users.

Site users (only editable at the site) must be propagated up to the
master, purely so all the (read only) data at the master is complete.

So I'm wondering if having the two tables separate will be just much
simpler (although not being terribly efficient in terms of pretty much
repeating a table structure).

I hope all that makes sense!

Thanks,

Mark.

Reply With Quote
  #2  
Old   
Mark S. (UK)
 
Posts: n/a

Default Re: Design question - 01-24-2009 , 11:03 AM






Mark S. (UK) wrote:
Quote:
Hi,
I'm currently working on a system that consists of several sites. Each
site runs a system with its own local database, and then periodically
each site updates the "master" site with its data, allow the data to be
viewed centrally.

It must be done like this because each site may not always be on the
network, but the data must always be available from the central server.

To enable the master database to contain all the site database
information, I use a composite key, with the first part being the site
identification, and the second part being the row identification. This
allows user with row id 5 for instance to exist more than once because
paired with the site id it is still unique.

On top of this, there is another layer. Each site must be contained
within an "Enterprise", which is just another identifier. So, I have a
table called enterprise which contains the list of site ids.

So far then, I have this (obviously cut down to only show the relevant
columns) : -

table: enterprise
column: id

table: site
column: id

table: user
column: site_id
column: id
column: label
etc..

With the site_id and id making the composite key.

Now, I've just had to add the concept of "enterprise" level users.
Initially the only clean way I could think of doing this is having two
separate user tables; enterprise_user and site_user : -

table: enterprise_user
column[key]: enterprise_id
column[key]: id
column: label
etc..

table: site_user
column[key]: site_id
column[key]: id
column: label
etc..

However this does not seem very "normalized"..

Other ways I considered are : -

table: user
column[key]: id
column[key]: enterprise_id
column: site_id
column: label
etc..

But this does not make sense as an enterprise user is not associated
with any site. I could have a dummy site, but I really do not like
that. I could just set site_id to NULL, but I'm not a big fan of NULL
in database and it just seems a bit hacky.

So then I thought of adding a link table:

table: user
column[key]: id
column[key]: enterprise_id
etc..

table: user_site <== link table
column: enterprise_id
column: user_id
column: site_id
etc..

with a unique constraint on enterprise_id, user_id (the composite key
that identifies a user in an enterprise).

I think that also works.. basically with no entry in the "user_site"
table, they are enterprise users. An entry in user_site effectively
limits them to being a site user.

However.. this might create complications in the "master" update process.

Part of the batch update process involves updating user information.
Enterprise users (only editable at the master) must be propagated down
to each site, so that the site is "aware" of enterprise users.

Site users (only editable at the site) must be propagated up to the
master, purely so all the (read only) data at the master is complete.

So I'm wondering if having the two tables separate will be just much
simpler (although not being terribly efficient in terms of pretty much
repeating a table structure).

I hope all that makes sense!

Thanks,

Mark.
After some further thought, I've realised there is another problem: user
name (or label as I called it above) uniqueness.

A user name at a site may be re-used, as the site is pretty much a self
contained system. So a user "fred" can exist at site 1, and at site 2.
Enterprise users however must remain unique.

So.. site users are unique to sites, and enterprise users are unique to
the enterprise (which is the "container" for sites).

This brings up a problem with the normalized design:

table: user
column[key]: enterprise_id
column[key]: id
column: label

add constraint unique on user(enterprise_id, label)

table: user_site <== link table
column: enterprise_id
column: user_id
column: site_id

This is fine for keeping users unique at the enterprise level, but
doesn't work out for site users when they've been propagated back to the
"master" database.

So, currently I'm thinking two separate tables. I'm not repeating data,
I'm just repeating some elements of a table, so perhaps it isn't that bad?

Having read these posts back, I'm not sure I've explained the scenario
very clearly. I hope it makes some sense at least.

Mark.


Reply With Quote
  #3  
Old   
Bob Badour
 
Posts: n/a

Default Re: Design question - 01-24-2009 , 12:14 PM



Mark S. (UK) wrote:
Quote:
Mark S. (UK) wrote:

Hi,
I'm currently working on a system that consists of several sites.
Each site runs a system with its own local database, and then
periodically each site updates the "master" site with its data, allow
the data to be viewed centrally.

It must be done like this because each site may not always be on the
network, but the data must always be available from the central server.

To enable the master database to contain all the site database
information, I use a composite key, with the first part being the site
identification, and the second part being the row identification.
This allows user with row id 5 for instance to exist more than once
because paired with the site id it is still unique.

On top of this, there is another layer. Each site must be contained
within an "Enterprise", which is just another identifier. So, I have
a table called enterprise which contains the list of site ids.

So far then, I have this (obviously cut down to only show the relevant
columns) : -

table: enterprise
column: id

table: site
column: id

table: user
column: site_id
column: id
column: label
etc..

With the site_id and id making the composite key.

Now, I've just had to add the concept of "enterprise" level users.
Initially the only clean way I could think of doing this is having two
separate user tables; enterprise_user and site_user : -

table: enterprise_user
column[key]: enterprise_id
column[key]: id
column: label
etc..

table: site_user
column[key]: site_id
column[key]: id
column: label
etc..

However this does not seem very "normalized"..

Other ways I considered are : -

table: user
column[key]: id
column[key]: enterprise_id
column: site_id
column: label
etc..

But this does not make sense as an enterprise user is not associated
with any site. I could have a dummy site, but I really do not like
that. I could just set site_id to NULL, but I'm not a big fan of NULL
in database and it just seems a bit hacky.

So then I thought of adding a link table:

table: user
column[key]: id
column[key]: enterprise_id
etc..

table: user_site <== link table
column: enterprise_id
column: user_id
column: site_id
etc..

with a unique constraint on enterprise_id, user_id (the composite key
that identifies a user in an enterprise).

I think that also works.. basically with no entry in the "user_site"
table, they are enterprise users. An entry in user_site effectively
limits them to being a site user.

However.. this might create complications in the "master" update process.

Part of the batch update process involves updating user information.
Enterprise users (only editable at the master) must be propagated down
to each site, so that the site is "aware" of enterprise users.

Site users (only editable at the site) must be propagated up to the
master, purely so all the (read only) data at the master is complete.

So I'm wondering if having the two tables separate will be just much
simpler (although not being terribly efficient in terms of pretty much
repeating a table structure).

I hope all that makes sense!

Thanks,

Mark.


After some further thought, I've realised there is another problem: user
name (or label as I called it above) uniqueness.

A user name at a site may be re-used, as the site is pretty much a self
contained system. So a user "fred" can exist at site 1, and at site 2.
Enterprise users however must remain unique.

So.. site users are unique to sites, and enterprise users are unique to
the enterprise (which is the "container" for sites).

This brings up a problem with the normalized design:

table: user
column[key]: enterprise_id
column[key]: id
column: label

add constraint unique on user(enterprise_id, label)

table: user_site <== link table
column: enterprise_id
column: user_id
column: site_id

This is fine for keeping users unique at the enterprise level, but
doesn't work out for site users when they've been propagated back to the
"master" database.

So, currently I'm thinking two separate tables. I'm not repeating data,
I'm just repeating some elements of a table, so perhaps it isn't that bad?

Having read these posts back, I'm not sure I've explained the scenario
very clearly. I hope it makes some sense at least.

Mark.
Mark,

I don't think theory-based answers to your questions exist, and I also
find design via usenet counterproductive at best.

You are struggling with a number of issues including distribution and
partial fault tolerance. When needed, these are very important
considerations.

As a general principle, it is best to handle your distribution and fault
tolerance needs at the physical level of discourse without exposing any
of that in your logical data model. You are trying to handle it entirely
at the logical level, which I think is probably inappropriate.

Pragmatically, it is best to consider these issues at the earliest
stages and to use your distribution and fault tolerance requirements as
a basis for comparing dbms products and then choosing the dbms product
that best addresses your distribution needs. Once you have chosen a dbms
product, which it sounds like you have already done, then it is best to
consider the specific features that dbms product has for handling
distribution, replication, fault tolerance et cetera physically with the
least impact to your logical design.

Many products now have features for distribution or replication, for
assigning globally unique numeric identifiers, for reconciling off-line
changes and more. Perhaps even the product you use has such features.

The issue you have with user id's is a basic requirements and design
issue. Assuming you are using user id in the conventional way, the
purpose of a user id is to uniquely identify users of a system.
Pragmatically, the scope of such identity has to be an entire
independent system. Generally, that means globally throughout a system.
In your case, that may not be true. Individual sites, though, are not
independent systems because they are tied together into enterprises. It
may be your enterprises are entirely independent from one another. Or it
may be they have interdependencies too.

Sometimes, it makes sense to scope user ids to have partial
independence, which is what email addresses do. The entire globally
unique id is the full email address with a user name, the @ symbol, and
a domain, while the locally unique id is simply the user name. Doing so
simplifies some things while complicating others.

In any case, I think you need to decide the scope of your user ids based
on what users will expect and what meets the needs of the system you are
designing. You may have to trade off conflicts between them.

In the end, at this point in time, design is more art than science.


Reply With Quote
  #4  
Old   
Bob Badour
 
Posts: n/a

Default Re: Design question - 01-24-2009 , 12:17 PM



P.S. Also ask yourself whether you may have conflated your distribution
needs with your security needs, when perhaps these could remain
completely separate concerns.

Reply With Quote
  #5  
Old   
Mark S. (UK)
 
Posts: n/a

Default Re: Design question - 01-24-2009 , 01:01 PM



Bob Badour wrote:
Quote:
Mark S. (UK) wrote:
Mark S. (UK) wrote:

Hi,
I'm currently working on a system that consists of several sites.
Each site runs a system with its own local database, and then
periodically each site updates the "master" site with its data, allow
the data to be viewed centrally.

It must be done like this because each site may not always be on the
network, but the data must always be available from the central server.

To enable the master database to contain all the site database
information, I use a composite key, with the first part being the
site identification, and the second part being the row
identification. This allows user with row id 5 for instance to exist
more than once because paired with the site id it is still unique.

On top of this, there is another layer. Each site must be contained
within an "Enterprise", which is just another identifier. So, I have
a table called enterprise which contains the list of site ids.

So far then, I have this (obviously cut down to only show the
relevant columns) : -

table: enterprise
column: id

table: site
column: id

table: user
column: site_id
column: id
column: label
etc..

With the site_id and id making the composite key.

Now, I've just had to add the concept of "enterprise" level users.
Initially the only clean way I could think of doing this is having
two separate user tables; enterprise_user and site_user : -

table: enterprise_user
column[key]: enterprise_id
column[key]: id
column: label
etc..

table: site_user
column[key]: site_id
column[key]: id
column: label
etc..

However this does not seem very "normalized"..

Other ways I considered are : -

table: user
column[key]: id
column[key]: enterprise_id
column: site_id
column: label
etc..

But this does not make sense as an enterprise user is not associated
with any site. I could have a dummy site, but I really do not like
that. I could just set site_id to NULL, but I'm not a big fan of
NULL in database and it just seems a bit hacky.

So then I thought of adding a link table:

table: user
column[key]: id
column[key]: enterprise_id
etc..

table: user_site <== link table
column: enterprise_id
column: user_id
column: site_id
etc..

with a unique constraint on enterprise_id, user_id (the composite key
that identifies a user in an enterprise).

I think that also works.. basically with no entry in the "user_site"
table, they are enterprise users. An entry in user_site effectively
limits them to being a site user.

However.. this might create complications in the "master" update
process.

Part of the batch update process involves updating user information.
Enterprise users (only editable at the master) must be propagated
down to each site, so that the site is "aware" of enterprise users.

Site users (only editable at the site) must be propagated up to the
master, purely so all the (read only) data at the master is complete.

So I'm wondering if having the two tables separate will be just much
simpler (although not being terribly efficient in terms of pretty
much repeating a table structure).

I hope all that makes sense!

Thanks,

Mark.


After some further thought, I've realised there is another problem:
user name (or label as I called it above) uniqueness.

A user name at a site may be re-used, as the site is pretty much a
self contained system. So a user "fred" can exist at site 1, and at
site 2. Enterprise users however must remain unique.

So.. site users are unique to sites, and enterprise users are unique
to the enterprise (which is the "container" for sites).

This brings up a problem with the normalized design:

table: user
column[key]: enterprise_id
column[key]: id
column: label

add constraint unique on user(enterprise_id, label)

table: user_site <== link table
column: enterprise_id
column: user_id
column: site_id

This is fine for keeping users unique at the enterprise level, but
doesn't work out for site users when they've been propagated back to
the "master" database.

So, currently I'm thinking two separate tables. I'm not repeating
data, I'm just repeating some elements of a table, so perhaps it isn't
that bad?

Having read these posts back, I'm not sure I've explained the scenario
very clearly. I hope it makes some sense at least.

Mark.

Mark,

I don't think theory-based answers to your questions exist, and I also
find design via usenet counterproductive at best.

You are struggling with a number of issues including distribution and
partial fault tolerance. When needed, these are very important
considerations.

As a general principle, it is best to handle your distribution and fault
tolerance needs at the physical level of discourse without exposing any
of that in your logical data model. You are trying to handle it entirely
at the logical level, which I think is probably inappropriate.

Pragmatically, it is best to consider these issues at the earliest
stages and to use your distribution and fault tolerance requirements as
a basis for comparing dbms products and then choosing the dbms product
that best addresses your distribution needs. Once you have chosen a dbms
product, which it sounds like you have already done, then it is best to
consider the specific features that dbms product has for handling
distribution, replication, fault tolerance et cetera physically with the
least impact to your logical design.

Many products now have features for distribution or replication, for
assigning globally unique numeric identifiers, for reconciling off-line
changes and more. Perhaps even the product you use has such features.

The issue you have with user id's is a basic requirements and design
issue. Assuming you are using user id in the conventional way, the
purpose of a user id is to uniquely identify users of a system.
Pragmatically, the scope of such identity has to be an entire
independent system. Generally, that means globally throughout a system.
In your case, that may not be true. Individual sites, though, are not
independent systems because they are tied together into enterprises. It
may be your enterprises are entirely independent from one another. Or it
may be they have interdependencies too.

Sometimes, it makes sense to scope user ids to have partial
independence, which is what email addresses do. The entire globally
unique id is the full email address with a user name, the @ symbol, and
a domain, while the locally unique id is simply the user name. Doing so
simplifies some things while complicating others.

In any case, I think you need to decide the scope of your user ids based
on what users will expect and what meets the needs of the system you are
designing. You may have to trade off conflicts between them.

In the end, at this point in time, design is more art than science.
Thank you for your well thought out and detailed answer Bob; it makes a
lot of sense and you've given me a lot to think about.

Take care,

Mark.
--


Reply With Quote
  #6  
Old   
Bob Badour
 
Posts: n/a

Default Re: Design question - 01-24-2009 , 01:55 PM



Mark S. (UK) wrote:

Quote:
Bob Badour wrote:
<snip>

Quote:
Thank you for your well thought out and detailed answer Bob; it makes a
lot of sense and you've given me a lot to think about.

Take care,

Mark.
Good Luck with your project!


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.