dbTalk Databases Forums  

How to 'normalise' this scenario

comp.databases.theory comp.databases.theory


Discuss How to 'normalise' this scenario in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #11  
Old   
Fred.
 
Posts: n/a

Default Re: How to 'normalise' this scenario - 05-17-2011 , 02:50 PM






On May 17, 11:16*am, Erwin <e.sm... (AT) myonline (DOT) be> wrote:
Quote:
On 17 mei, 13:16, Frank Millman <fr... (AT) chagford (DOT) com> wrote:





On May 17, 12:08*pm, Erwin <e.sm... (AT) myonline (DOT) be> wrote:

On 16 mei, 08:50, Frank Millman <fr... (AT) chagford (DOT) com> wrote:

Hi Erwin

Thanks for your comments - you have got me thinking.

I have stripped most of the comments, because what I have left is the
crux of the matter.

You have not stated what "the requirement" is.

Let's take an example we should all be familiar with - a file system.

Assume that we have a lot of files, and we have a table where each row
represents one file, with columns such as file name, disk address,
date created, date last modified, etc.

You can sort the table by name, creation date, etc. However, it is
getting difficult to manage, so a bright spark comes up with the idea
of directories/folders.

The user can create directories on demand, place directories within
directories, assign files to directories, move files between
directories, etc.

The requirement is to create a database structure to represent the
directories.

My guess is that you could use an adjacency list or a nested set for
this purpose. However, where I am getting stuck is, how does one link
an entry in the 'files' table to an entry in the 'directories' table,
to indicate that a particular file resides in a particular directory.

I have been trying to create an entry in the 'directories' table that
represents, or points to, a 'file'. From your comments, it would
appear that this is the wrong approach.

I could carry on and speculate further, but I think this is a good
time to pause and ask if this is a good example, and ask if there is a
preferred solution.

TIA

Frank

That you're now thinking is a good thing :-)

The contents of file systems are not the best example for discussing
management of graph data, because the "identity" (the means for
identification) of a file typically is a combination (concatenation)
of the "identity" of its containing directory and the file's own
distinguishing name _within that directory_. *This is different
compared to, say, bill-of-material structures or genealogical
relationships, where the "nodes" (parts, people, ...) have a property
"of their own" that uniquely identifies them "within the entire
universe". *This is not the case for file systems. **IX systems may
have multiple directories each containing a /bin "file", and all
those /bin "files" are distinct things. *Windows systems have multiple
directories each containing an "Application data" folder, and those
are all distinct.

So if file systems are what's in your mind, then what you would
typically need (for representing that in a database) is a single table
where the _fully qualified_ filename is a key/identifier. *And as soon
as you have that, there typically is no longer a need for explicitly
recording the "identity" of the parent directory/folder, as that is
already implied by the full name of the file itself.

I don't fully see what you mean by "creating an entry in the
directories table that points to a file". *Relational database designs
do not include "pointers". *And you don't say on the directory level
which files it contains, instead you say on the file level to which
directory each file belongs.- Hide quoted text -

- Show quoted text -
I think Frank's analogy is on all fours.

The nested sets model is consrained to those graphs which represent
hierarchies. While each node will have a identity, users in Frank's
application will be working with descriptors, which is one level
removed from whatever identifier Frank sets up, and two levels removed
from the physucal structure of the database. Leaf nodes in the
structure will correspond 1-1 to entries in a separte product table,
from which users will be trying to retrieve information.

The situation with a file system is very similar. The directory
structure is a hierachical set of nodes. While each node has an
identity in the logical disk structures. user of the file system work
with the descriptive names associated with each node, which are on the
order of 2 levels removed from the physical structure of the disk.
Leaf nodes in the structure correspond 1-1 to files which have a
different structure on the disk.

While the hierarchy represents a conceptual structure, that conceptual
structure does not exist in isolation. Elements of it are already
implemetned in the product table.

I agree that there are differences as well as similarities. That's
why I used the word "similar" and Frank used the word "like".

Fred.

Reply With Quote
  #12  
Old   
Frank Millman
 
Posts: n/a

Default Re: How to 'normalise' this scenario - 05-18-2011 , 03:33 AM






On May 17, 5:16*pm, Erwin <e.sm... (AT) myonline (DOT) be> wrote:
Quote:
On 17 mei, 13:16, Frank Millman <fr... (AT) chagford (DOT) com> wrote:

The contents of file systems are not the best example for discussing
management of graph data, because the "identity" (the means for
identification) of a file typically is a combination (concatenation)
of the "identity" of its containing directory and the file's own
distinguishing name _within that directory_. *This is different
compared to, say, bill-of-material structures or genealogical
relationships, where the "nodes" (parts, people, ...) have a property
"of their own" that uniquely identifies them "within the entire
universe". *This is not the case for file systems. **IX systems may
have multiple directories each containing a /bin "file", and all
those /bin "files" are distinct things. *Windows systems have multiple
directories each containing an "Application data" folder, and those
are all distinct.

Agreed - a file system is not a perfect analogy because, as you point
out, files in different directories can have the same name, so you
need a combination of directory name and file name to uniquely
identify a file. In the scenario that I am trying to describe, that
does not apply.

Quote:
I don't fully see what you mean by "creating an entry in the
directories table that points to a file". *Relational database designs
do not include "pointers". *And you don't say on the directory level
which files it contains, instead you say on the file level to which
directory each file belongs.
I don't know the correct terminology, but take your example of a bill-
of-materials structure. Each element in the structure 'is',
'represents', 'points to', 'references', an item in a 'products'
table, which will contain all the attributes of the product in
question.

This is also not a perfect analogy for the scenario I am trying to
describe, because in my scenario only the 'leaf' nodes represent items
in a separate table.

How about a 'menu' system as an analogy? A typical cell phone has
hundreds of functions available to the user. Instead of presenting the
options in a long list, they are grouped in the form of menus, with
sub-menus, of arbitrary depth, ultimately arriving at a function.

Only the 'leaf' nodes represent something 'tangible' - an executable
function. All the other nodes are purely descriptive, and their only
purpose is to assist the user in arriving at the function they are
looking for.

Assume we have a table of 'functions', with columns such as function
name, description, and a pointer to the executable code.

If we create a separate table representing the 'menu' system, how do
we ensure that all 'leaf' nodes represent functions?

Maybe you gave a clue earlier, when you said -

Quote:
you don't say on the directory level which files it contains,
instead you say on the file level to which directory each file belongs.
Substituting, are you saying that "you don't say on the menu level
which function it contains, instead you say on the function level to
which menu each function belongs."?

Are you suggesting the following?

CREATE TABLE menus
(node_id INT PRIMARY KEY,
..., ..., ...)

CREATE TABLE functions
(function_id INT PRIMARY KEY,
...,
menu_id INT REFERENCES menus)

I can see the benefit of this approach, but I can see some issues.

1. How do you ensure that functions.menu_id only references 'leaf'
nodes?

2. How do you ensure that every 'leaf' node is referenced by a
functions.menu_id?

3. How do you prevent a user from expanding a 'leaf' node and giving
it children?

All of these issues can be handled at the application level. Are there
any SQL constraints that could be used?

All comments welcome.

Frank

Reply With Quote
  #13  
Old   
Erwin
 
Posts: n/a

Default Re: How to 'normalise' this scenario - 05-18-2011 , 06:14 AM



On 17 mei, 21:50, "Fred." <ghrno-goo... (AT) yahoo (DOT) com> wrote:
Quote:
On May 17, 11:16*am, Erwin <e.sm... (AT) myonline (DOT) be> wrote:





On 17 mei, 13:16, Frank Millman <fr... (AT) chagford (DOT) com> wrote:

On May 17, 12:08*pm, Erwin <e.sm... (AT) myonline (DOT) be> wrote:

On 16 mei, 08:50, Frank Millman <fr... (AT) chagford (DOT) com> wrote:

Hi Erwin

Thanks for your comments - you have got me thinking.

I have stripped most of the comments, because what I have left is the
crux of the matter.

You have not stated what "the requirement" is.

Let's take an example we should all be familiar with - a file system.

Assume that we have a lot of files, and we have a table where each row
represents one file, with columns such as file name, disk address,
date created, date last modified, etc.

You can sort the table by name, creation date, etc. However, it is
getting difficult to manage, so a bright spark comes up with the idea
of directories/folders.

The user can create directories on demand, place directories within
directories, assign files to directories, move files between
directories, etc.

The requirement is to create a database structure to represent the
directories.

My guess is that you could use an adjacency list or a nested set for
this purpose. However, where I am getting stuck is, how does one link
an entry in the 'files' table to an entry in the 'directories' table,
to indicate that a particular file resides in a particular directory.

I have been trying to create an entry in the 'directories' table that
represents, or points to, a 'file'. From your comments, it would
appear that this is the wrong approach.

I could carry on and speculate further, but I think this is a good
time to pause and ask if this is a good example, and ask if there is a
preferred solution.

TIA

Frank

That you're now thinking is a good thing :-)

The contents of file systems are not the best example for discussing
management of graph data, because the "identity" (the means for
identification) of a file typically is a combination (concatenation)
of the "identity" of its containing directory and the file's own
distinguishing name _within that directory_. *This is different
compared to, say, bill-of-material structures or genealogical
relationships, where the "nodes" (parts, people, ...) have a property
"of their own" that uniquely identifies them "within the entire
universe". *This is not the case for file systems. **IX systems may
have multiple directories each containing a /bin "file", and all
those /bin "files" are distinct things. *Windows systems have multiple
directories each containing an "Application data" folder, and those
are all distinct.

So if file systems are what's in your mind, then what you would
typically need (for representing that in a database) is a single table
where the _fully qualified_ filename is a key/identifier. *And as soon
as you have that, there typically is no longer a need for explicitly
recording the "identity" of the parent directory/folder, as that is
already implied by the full name of the file itself.

I don't fully see what you mean by "creating an entry in the
directories table that points to a file". *Relational database designs
do not include "pointers". *And you don't say on the directory level
which files it contains, instead you say on the file level to which
directory each file belongs.- Hide quoted text -

- Show quoted text -

I think Frank's analogy is on all fours.

The nested sets model is consrained to those graphs which represent
hierarchies. *While each node will have a identity, users in Frank's
application will be working with descriptors, which is one level
removed from whatever identifier Frank sets up, and two levels removed
from the physucal structure of the database. *Leaf nodes in the
structure will correspond 1-1 to entries in a separte product table,
from which users will be trying to retrieve information.

The situation with a file system is very similar. *The directory
structure is a hierachical set of nodes. *While each node has an
identity in the logical disk structures. user of the file system work
with the descriptive names associated with each node, which are on the
order of 2 levels removed from the physical structure of the disk.
Leaf nodes in the structure correspond 1-1 to files which have a
different structure on the disk.

While the hierarchy represents a conceptual structure, that conceptual
structure does not exist in isolation. *Elements of it are already
implemetned in the product table.

I agree that there are differences as well as similarities. *That's
why I used the word "similar" and Frank used the word "like".

Fred.- Tekst uit oorspronkelijk bericht niet weergeven -

- Tekst uit oorspronkelijk bericht weergeven -
I have no clue what point you're trying to make with your first
paragraphs, but I just want to note that those very same "differences
and similarities" are probably also the reason why I wrote "NOT THE
BEST example [for dealing with graph data]".

Reply With Quote
  #14  
Old   
Fred.
 
Posts: n/a

Default Re: How to 'normalise' this scenario - 05-18-2011 , 08:27 AM



On May 18, 7:14*am, Erwin <e.sm... (AT) myonline (DOT) be> wrote:
Quote:
On 17 mei, 21:50, "Fred." <ghrno-goo... (AT) yahoo (DOT) com> wrote:





On May 17, 11:16*am, Erwin <e.sm... (AT) myonline (DOT) be> wrote:

On 17 mei, 13:16, Frank Millman <fr... (AT) chagford (DOT) com> wrote:

On May 17, 12:08*pm, Erwin <e.sm... (AT) myonline (DOT) be> wrote:

On 16 mei, 08:50, Frank Millman <fr... (AT) chagford (DOT) com> wrote:

Hi Erwin

Thanks for your comments - you have got me thinking.

I have stripped most of the comments, because what I have left is the
crux of the matter.

You have not stated what "the requirement" is.

Let's take an example we should all be familiar with - a file system.

Assume that we have a lot of files, and we have a table where each row
represents one file, with columns such as file name, disk address,
date created, date last modified, etc.

You can sort the table by name, creation date, etc. However, it is
getting difficult to manage, so a bright spark comes up with the idea
of directories/folders.

The user can create directories on demand, place directories within
directories, assign files to directories, move files between
directories, etc.

The requirement is to create a database structure to represent the
directories.

My guess is that you could use an adjacency list or a nested set for
this purpose. However, where I am getting stuck is, how does one link
an entry in the 'files' table to an entry in the 'directories' table,
to indicate that a particular file resides in a particular directory.

I have been trying to create an entry in the 'directories' table that
represents, or points to, a 'file'. From your comments, it would
appear that this is the wrong approach.

I could carry on and speculate further, but I think this is a good
time to pause and ask if this is a good example, and ask if there is a
preferred solution.

TIA

Frank

That you're now thinking is a good thing :-)

The contents of file systems are not the best example for discussing
management of graph data, because the "identity" (the means for
identification) of a file typically is a combination (concatenation)
of the "identity" of its containing directory and the file's own
distinguishing name _within that directory_. *This is different
compared to, say, bill-of-material structures or genealogical
relationships, where the "nodes" (parts, people, ...) have a property
"of their own" that uniquely identifies them "within the entire
universe". *This is not the case for file systems. **IX systems may
have multiple directories each containing a /bin "file", and all
those /bin "files" are distinct things. *Windows systems have multiple
directories each containing an "Application data" folder, and those
are all distinct.

So if file systems are what's in your mind, then what you would
typically need (for representing that in a database) is a single table
where the _fully qualified_ filename is a key/identifier. *And as soon
as you have that, there typically is no longer a need for explicitly
recording the "identity" of the parent directory/folder, as that is
already implied by the full name of the file itself.

I don't fully see what you mean by "creating an entry in the
directories table that points to a file". *Relational database designs
do not include "pointers". *And you don't say on the directory level
which files it contains, instead you say on the file level to which
directory each file belongs.- Hide quoted text -

- Show quoted text -

I think Frank's analogy is on all fours.

The nested sets model is consrained to those graphs which represent
hierarchies. *While each node will have a identity, users in Frank's
application will be working with descriptors, which is one level
removed from whatever identifier Frank sets up, and two levels removed
from the physucal structure of the database. *Leaf nodes in the
structure will correspond 1-1 to entries in a separte product table,
from which users will be trying to retrieve information.

The situation with a file system is very similar. *The directory
structure is a hierachical set of nodes. *While each node has an
identity in the logical disk structures. user of the file system work
with the descriptive names associated with each node, which are on the
order of 2 levels removed from the physical structure of the disk.
Leaf nodes in the structure correspond 1-1 to files which have a
different structure on the disk.

While the hierarchy represents a conceptual structure, that conceptual
structure does not exist in isolation. *Elements of it are already
implemetned in the product table.

I agree that there are differences as well as similarities. *That's
why I used the word "similar" and Frank used the word "like".

Fred.- Tekst uit oorspronkelijk bericht niet weergeven -

- Tekst uit oorspronkelijk bericht weergeven -

I have no clue what point you're trying to make with your first
paragraphs, but I just want to note that those very same "differences
and similarities" are probably also the reason why I wrote "NOT THE
BEST example [for dealing with graph data]".- Hide quoted text -

- Show quoted text -
I'm starting to think some of the people on this topic can't read very
well. Frank wasn't trying to create an example for dealing with graph
data. He was trying to solve a business problem using nested sets to
support hierachical lookups in his product data and was asking a
specific question about how to organize the hierarch to do this. If
you wish to complain that he was off topic for a theory group, I
can't argue to the contrary. Still, I can't see that it hurts theory
too much to apply it once in a while.

Fred.

Reply With Quote
  #15  
Old   
Erwin
 
Posts: n/a

Default Re: How to 'normalise' this scenario - 05-18-2011 , 08:31 AM



On 18 mei, 10:33, Frank Millman <fr... (AT) chagford (DOT) com> wrote:
Quote:
On May 17, 5:16*pm, Erwin <e.sm... (AT) myonline (DOT) be> wrote:

On 17 mei, 13:16, Frank Millman <fr... (AT) chagford (DOT) com> wrote:
The contents of file systems are not the best example for discussing
management of graph data, because the "identity" (the means for
identification) of a file typically is a combination (concatenation)
of the "identity" of its containing directory and the file's own
distinguishing name _within that directory_. *This is different
compared to, say, bill-of-material structures or genealogical
relationships, where the "nodes" (parts, people, ...) have a property
"of their own" that uniquely identifies them "within the entire
universe". *This is not the case for file systems. **IX systems may
have multiple directories each containing a /bin "file", and all
those /bin "files" are distinct things. *Windows systems have multiple
directories each containing an "Application data" folder, and those
are all distinct.

Agreed - a file system is not a perfect analogy because, as you point
out, files in different directories can have the same name, so you
need a combination of directory name and file name to uniquely
identify a file. In the scenario that I am trying to describe, that
does not apply.

I don't fully see what you mean by "creating an entry in the
directories table that points to a file". *Relational database designs
do not include "pointers". *And you don't say on the directory level
which files it contains, instead you say on the file level to which
directory each file belongs.

I don't know the correct terminology, but take your example of a bill-
of-materials structure. Each element in the structure 'is',
'represents', 'points to', 'references', an item in a 'products'
table, which will contain all the attributes of the product in
question.

This is also not a perfect analogy for the scenario I am trying to
describe, because in my scenario only the 'leaf' nodes represent items
in a separate table.

How about a 'menu' system as an analogy? A typical cell phone has
hundreds of functions available to the user. Instead of presenting the
options in a long list, they are grouped in the form of menus, with
sub-menus, of arbitrary depth, ultimately arriving at a function.

Only the 'leaf' nodes represent something 'tangible' - an executable
function. All the other nodes are purely descriptive, and their only
purpose is to assist the user in arriving at the function they are
looking for.

Assume we have a table of 'functions', with columns such as function
name, description, and a pointer to the executable code.

If we create a separate table representing the 'menu' system, how do
we ensure that all 'leaf' nodes represent functions?

Maybe you gave a clue earlier, when you said -

*you don't say on the directory level which files it contains,
instead you say on the file level to which directory each file belongs.

Substituting, are you saying that "you don't say on the menu level
which function it contains, instead you say on the function level to
which menu each function belongs."?

Are you suggesting the following?

CREATE TABLE menus
* (node_id INT PRIMARY KEY,
* ..., ..., ...)

CREATE TABLE functions
* (function_id INT PRIMARY KEY,
* ...,
* menu_id INT REFERENCES menus)

I can see the benefit of this approach, but I can see some issues.

1. How do you ensure that functions.menu_id only references 'leaf'
nodes?

2. How do you ensure that every 'leaf' node is referenced by a
functions.menu_id?

3. How do you prevent a user from expanding a 'leaf' node and giving
it children?

All of these issues can be handled at the application level. Are there
any SQL constraints that could be used?

All comments welcome.

Frank
I'll leave original message intact and copy-paste the parts to which I
respond.

Quote:
This is also not a perfect analogy for the scenario I am trying to
describe, because in my scenario only the 'leaf' nodes represent items
in a separate table.

How about a 'menu' system as an analogy? A typical cell phone has
hundreds of functions available to the user. Instead of presenting the
options in a long list, they are grouped in the form of menus, with
sub-menus, of arbitrary depth, ultimately arriving at a function.
Graph theory applies only to scenarios where, and only to the extent
where, there is a conceptual 'sameness' between the nodes. That
'sameness' in a file system is the fact that a directory, just like
any other kind of file, has an identifying name within the file
system, has some kind of physical content (bits&bytes) that is
physically stored on disk, and that it can be, just like any other
file, contained within another directory. The conceptual 'difference'
is that typically, interpretation of the physical contents (the
bits&bytes) of a directory is reserved for OS- or FS- provided
services (system-provided at any rate), and not user programs.

Menus are completely analogous to file systems. Menus are like menu
entries in that they can both be contained within another menu, menus
are different from menu entries in that only 'actual' menu entries
have an "executable function" linked to them. Well, on second
thought, is that really so ? If you regard a function, perhaps system-
provided, such as "display submenu" as an "executable function", then
the conceptual difference has been abstracted away.

Quote:
Only the 'leaf' nodes represent something 'tangible' - an executable
function. All the other nodes are purely descriptive, and their only
purpose is to assist the user in arriving at the function they are
looking for.
Is that really so ? If a menu is existant, but empty, is that menu
then not a leaf node ? Can you not have empty menus ? Can you not
have empty directories in a file system ? It never ceases to amaze me
how people consistently consider the empty set as an indication of
something being wrong ...

Quote:
Are you suggesting the following?

CREATE TABLE menus
(node_id INT PRIMARY KEY,
..., ..., ...)
I assume that the ellipsis here stand for either the attributes needed
to make this a graph-represented-as-nested-sets, or else graph-
represented-as-an-adjacency-list.

As an adjacency list, I'll assume the attributes will be {nodeID,
parentNodeID}.

If there are other details to be kept about the menus (menu title and/
or description, e.g. ?), they could be in a separate table, or in the
same table, depending on circumstances. This aspect is a bit
irrelevant.

But note explicitly that _there must not be entries in this table for
the containment of a *function* in a menu_. That is, there are no
rows in this table where the nodeID is a value that is actually a
_function_ID. You can do this (and I suspect it's indeed what you
have in mind), but then you'll run into other problems. I'll discuss
those later, separately.

What are the typical constraints and design problems related to this
table ?

(1) What is the root node ?

A root node, unlike all the others, does not have a parent node. You
can address this problem either by applying relational theory very
rigorously, and provide a separate table that identifies the root node
(CREATE TABLE ROOTNODE (nodeID)), or you can address this problem with
a bit of a hack, by inserting a self-referencing row (or yet worse, a
row containing a null).

At any rate you want to constrain your root node to be "unique in the
entire universe", i.e. your "set of root nodes" must be a singleton
(unless of course your business case requires you to be managing
multiple independent trees/hierarchies simultaneously and
independently).

In relational theory, you could do this by defining a key with no
attributes on table ROOTNODE. Alas, SQL does not support this, so if
you do define table ROOTNODE in an SQL system, you are still left with
work to do to enforce this (an ASSERTION to the effect that SELECT
COUNT(*) FROM ROOTNODE <= 1). If you do not define a separate table
to identify the root node, then "identifying the root node" will look
something like SELECT nodeID FROM MENUS WHERE nodeID = parentNodeID;
In fact you could define ROOTNODE as being exactly that view of table
MENUS ... But in this case too, you are left with work to do for
enforcing the rule that there is at most one row that satisfies 'WHERE
nodeID = parentNodeID'.

(2) A statement of something being a parent node, must be a valid
reference to an existing node.

If you have the single-table design with the "self-referencing row
hack", then this is achievable by declaring an FK from parentNodeID to
nodeID.

If you have the two-table design with the "separately identified root
node", then you'd need to create an ASSERTION to the effect that all
values that appear as a parentNodeID, appear either as a nodeID in
MENUS, or else as the nodeID in ROOTNODE. That is, you have a
"foreign key" from parentNodeID to the nodeID as it appears _in a
VIEW_ that is the UNION of ROOTNODE with 'SELECT nodeID FROM MENUS'.
And while you could create such a view (CREATE VIEW ALLNODES AS SELECT
nodeID FROM ROOTNODE UNION SELECT nodeID FROM MENUS), SQL will not
allow you to create an FK reference from some table into this view.

So in the two-table design, you cannot enforce this rule
declaratively. This is SQL systems (not SQL itself, mind you, the
standard does support this) actively punishing you for choosing the
design that is, from a logical perspective, the most (and in fact the
only) correct one.

(3) Menu items are unique.

If you have the one-table design, you achieve this by the KEY
declaration that you already mentioned.

If you have the two-table design, then in addition to the KEY on
MENUS, you must enforce the rule that SELECT nodeID FROM ROOTNODE is
disjoint with SELECT nodeID FROM MENUS. This is an ASSERTION to the
effect that SELECT COUNT(*) FROM (SELECT nodeID FROM ROOTNODE NATURAL
JOIN SELECT nodeID FROM MENUS) == 0. Once again, this is SQL systems
actively punishing you for choosing the right logical design.



Quote:
CREATE TABLE functions
(function_id INT PRIMARY KEY,
...,
menu_id INT REFERENCES menus)
Yes.



Quote:
I can see the benefit of this approach, but I can see some issues.

1. How do you ensure that functions.menu_id only references 'leaf'
nodes?
First, do you _really_ want to impose that rule ? Do you _really_
want to enforce the rule that "functions" can _only_ be mentioned on
"leaf'" menus ?

Suppose you do. You obviously understand that a simple FK from
FUNCTIONS is not strong enough. Because the set of nodeIDs that are
"valid" for functions is a proper subset of the set of *all* nodeIDs.
The FK would have to be from FUNCTIONS into a VIEW that lists all, and
only, your leaf nodes : CREATE VIEW LEAFMENUS AS SELECT nodeID AS
leafNodeID FROM MENUS A WHERE NOT EXISTS (SELECT * FROM MENUS B WHERE
A.nodeID = B.parentNodeID). Or CREATE VIEW LEAFMENUS AS SELECT nodeID
AS leafNodeID FROM MENUS EXCEPT SELECT parentNodeID AS leafNodeID FROM
MENUS.

If you really want this constraint, then it is up to you to enforce it
no matter what. The problem is completely analogous as the one of
enforcing an FK into that UNION view.



Quote:
2. How do you ensure that every 'leaf' node is referenced by a
functions.menu_id?
3. How do you prevent a user from expanding a 'leaf' node and giving
it children?
I suspect that you are indeed thinking of having rows in table MENUS
where the nodeID represents a function. I think this is a mistake,
and here's why : because if you "reserve" the rows in the MENUS table
for "genuine" menus only, then I think there is no need to enforce
these constraints at all ! If you "reserve" the rows in MENUS for the
"genuine" menus, then the mention of {functionID and menuID} in table
FUNCTIONS makes that an "unexpandable" leaf node by definition.

But if you do want to enforce these, then :

As for (2), consider this again. You are saying that you want to
prohibit "orphaned" functions in your database, right ? In that case,
in addition to the FK from FUNCTIONS to MENUS (/LEAFNODES), you want a
similar thing from LEAFNODES to FUNCTIONS. For one, this creates the
kind of chicken-and-egg problem that SQL has addressed by defining
DEFERRED CONSTRAINT CHECKING. But more importantly, because LEAFNODES
is a view, SQL supports this stuff only in the form of CREATE
ASSERTION, which no currently existing SQL system allows you to use,
thus which isn't of any help to you.

As for (3), you probably can guess by now : this can be done by
defining a certain ASSERTION, to the effect that no functionID that
appears in the FUNCTIONS table, will also appear as a parentNodeID in
MENUS. Thus, this is once again a disjointness constraint : 0 =
SELECT COUNT(*) FROM (SELECT parentNodeID AS n FROM MENUS NATURAL JOIN
SELECT functionID AS n FROM FUNCTIONS). Alas, ASSERTIONS are always
for the designer to implement.

And now of course, you're going to ask, "if I can't have the full menu
in the MENU table, then how can I know what the menu must look like ?"

The answer is, obviously enough, views :

CREATE VIEW ALLFUNCTIONS AS
SELECT "F" AS itemType, functionID AS itemID, menuID FROM FUNCTIONS;
CREATE VIEW ALLSUBMENUS AS
SELECT "M" AS itemType, nodeID AS itemID, parentNodeID AS menuID FROM
MENUS;
CREATE VIEW ENTIREMENUSTRUCTURE AS
SELECT * FROM ALLFUNCTIONS UNION SELECT * FROM ALLSUBMENUS;

Adapt according to personal taste and user requirements.

(Note that you might consider it desirable to also enforce a
disjointness constraint between functionID values and nodeID values
from MENUS, but this is not strictly necessary (because you can always
inspect the ITEMTYPE and use it where necessary). If you do want
this, once again no SQL system will help you and you're simply on your
own.)



Quote:
All of these issues can be handled at the application level. Are there
any SQL constraints that could be used ?
Well, I think I've said about all I had to say.

Except, you might try and figure out where exactly the "nested sets"
approach makes addressing any of the problems mentioned here any
easier or less labour-intensive.

Nested sets are a hack to bypass (and note that I didn't say
"overcome" here) the difficulties when having to compute closures over
adjacency lists. (And thus to bypass the shortcomings of certain SQL
implementations whose support for recursive querying is anywhere from
nonexistant to unacceptably poor.)

Oh, wait, no, I'll give you two references of good books that touch on
these issues :

"Practical Issues in Database Management", by Fabian Pascal (I've
heard that it's out of print, but the dump circuit might still provide
copies). There's a dedicated chapter on "dealing with graphs and
hierarchies" in this book.

"Applied Mathematics for Database Professionals", by De Haan and
Koppelaars. It's a brilliant book on formally specifying database
designs, plus there's a very large dedicated chapter on "enforcing
database constraints in SQL systems".

Reply With Quote
  #16  
Old   
Erwin
 
Posts: n/a

Default Re: How to 'normalise' this scenario - 05-18-2011 , 08:53 AM



On 18 mei, 15:27, "Fred." <ghrno-goo... (AT) yahoo (DOT) com> wrote:
Quote:
On May 18, 7:14*am, Erwin <e.sm... (AT) myonline (DOT) be> wrote:





On 17 mei, 21:50, "Fred." <ghrno-goo... (AT) yahoo (DOT) com> wrote:

On May 17, 11:16*am, Erwin <e.sm... (AT) myonline (DOT) be> wrote:

On 17 mei, 13:16, Frank Millman <fr... (AT) chagford (DOT) com> wrote:

On May 17, 12:08*pm, Erwin <e.sm... (AT) myonline (DOT) be> wrote:

On 16 mei, 08:50, Frank Millman <fr... (AT) chagford (DOT) com> wrote:

Hi Erwin

Thanks for your comments - you have got me thinking.

I have stripped most of the comments, because what I have left isthe
crux of the matter.

You have not stated what "the requirement" is.

Let's take an example we should all be familiar with - a file system.

Assume that we have a lot of files, and we have a table where each row
represents one file, with columns such as file name, disk address,
date created, date last modified, etc.

You can sort the table by name, creation date, etc. However, it is
getting difficult to manage, so a bright spark comes up with the idea
of directories/folders.

The user can create directories on demand, place directories within
directories, assign files to directories, move files between
directories, etc.

The requirement is to create a database structure to represent the
directories.

My guess is that you could use an adjacency list or a nested set for
this purpose. However, where I am getting stuck is, how does one link
an entry in the 'files' table to an entry in the 'directories' table,
to indicate that a particular file resides in a particular directory.

I have been trying to create an entry in the 'directories' table that
represents, or points to, a 'file'. From your comments, it would
appear that this is the wrong approach.

I could carry on and speculate further, but I think this is a good
time to pause and ask if this is a good example, and ask if thereis a
preferred solution.

TIA

Frank

That you're now thinking is a good thing :-)

The contents of file systems are not the best example for discussing
management of graph data, because the "identity" (the means for
identification) of a file typically is a combination (concatenation)
of the "identity" of its containing directory and the file's own
distinguishing name _within that directory_. *This is different
compared to, say, bill-of-material structures or genealogical
relationships, where the "nodes" (parts, people, ...) have a property
"of their own" that uniquely identifies them "within the entire
universe". *This is not the case for file systems. **IX systemsmay
have multiple directories each containing a /bin "file", and all
those /bin "files" are distinct things. *Windows systems have multiple
directories each containing an "Application data" folder, and those
are all distinct.

So if file systems are what's in your mind, then what you would
typically need (for representing that in a database) is a single table
where the _fully qualified_ filename is a key/identifier. *And assoon
as you have that, there typically is no longer a need for explicitly
recording the "identity" of the parent directory/folder, as that is
already implied by the full name of the file itself.

I don't fully see what you mean by "creating an entry in the
directories table that points to a file". *Relational database designs
do not include "pointers". *And you don't say on the directory level
which files it contains, instead you say on the file level to which
directory each file belongs.- Hide quoted text -

- Show quoted text -

I think Frank's analogy is on all fours.

The nested sets model is consrained to those graphs which represent
hierarchies. *While each node will have a identity, users in Frank's
application will be working with descriptors, which is one level
removed from whatever identifier Frank sets up, and two levels removed
from the physucal structure of the database. *Leaf nodes in the
structure will correspond 1-1 to entries in a separte product table,
from which users will be trying to retrieve information.

The situation with a file system is very similar. *The directory
structure is a hierachical set of nodes. *While each node has an
identity in the logical disk structures. user of the file system work
with the descriptive names associated with each node, which are on the
order of 2 levels removed from the physical structure of the disk.
Leaf nodes in the structure correspond 1-1 to files which have a
different structure on the disk.

While the hierarchy represents a conceptual structure, that conceptual
structure does not exist in isolation. *Elements of it are already
implemetned in the product table.

I agree that there are differences as well as similarities. *That's
why I used the word "similar" and Frank used the word "like".

Fred.- Tekst uit oorspronkelijk bericht niet weergeven -

- Tekst uit oorspronkelijk bericht weergeven -

I have no clue what point you're trying to make with your first
paragraphs, but I just want to note that those very same "differences
and similarities" are probably also the reason why I wrote "NOT THE
BEST example [for dealing with graph data]".- Hide quoted text -

- Show quoted text -

I'm starting to think some of the people on this topic can't read very
well. *Frank wasn't trying to create an example for dealing with graph
data. *He was trying to solve a business problem using nested sets to
support hierachical lookups in his product data and was asking a
specific question about how to organize the hierarch to do this. *If
you wish to complain that he was off topic for a theory group, *I
can't argue to the contrary. *Still, I can't see that it hurts theory
too much to apply it once in a while.

Fred.- Tekst uit oorspronkelijk bericht niet weergeven -

- Tekst uit oorspronkelijk bericht weergeven -
Well, first, I do not "complain" that he poses a question that can
indeed legitimately be considered as "off-topic for the theory
group". Had my intent been to do that kind of complaining, I simply
would have said so (and left it at that afterwards).

Second, it does remain a fact that we are indeed in the theory group,
and what I have tried is to provide the answers that theory has to
give regarding the problem area that the OP is dealing with. Or
rather, what I think theory's answers to this problem are. Others may
still be around that can speak with more authority than I, but I know
they will be much less inclined to respond, precisely because that off-
topicness that I am not so bothered with.

Third, applying theory "once in a while" obviously requires that one
actually _knows_ that theory. I invite you to google around the whole
internet for a definition of a mathematical theory of graphs that does
_NOT_ define, somehow, a graph as being (or at least consisting of) a
"set of edges". And if relational theory has it that a relation is
(or at least consists of) a "set of tuples", I invite you to explain
how one can reasonably justify that the "most natural" way of "dealing
with graphs and hierarchies relationally" would somehow NOT be to just
simply have a "set of tuples" where each tuple represents an edge of
the graph. That is, an adjacency list.

To conclude : if the OP was indeed "asking a specific question about
how to organize the hierarchy", then isn't the information I gave him
about the adjacency list approach a most relevant and pertinent answer
to precisely the question that he asked ? The only way imo to answer
this with a "No it's not" is to observe that the OP had already chosen
the solution (nested sets), and was now looking for a problem. I for
one do not consider that a very smart way of "applying theory once in
a while".

Reply With Quote
  #17  
Old   
Bob Badour
 
Posts: n/a

Default Re: How to 'normalise' this scenario - 05-18-2011 , 09:46 AM



Fred. wrote:

Quote:
On May 18, 7:14 am, Erwin <e.sm... (AT) myonline (DOT) be> wrote:

On 17 mei, 21:50, "Fred." <ghrno-goo... (AT) yahoo (DOT) com> wrote:






On May 17, 11:16 am, Erwin <e.sm... (AT) myonline (DOT) be> wrote:

On 17 mei, 13:16, Frank Millman <fr... (AT) chagford (DOT) com> wrote:

On May 17, 12:08 pm, Erwin <e.sm... (AT) myonline (DOT) be> wrote:

On 16 mei, 08:50, Frank Millman <fr... (AT) chagford (DOT) com> wrote:

Hi Erwin

Thanks for your comments - you have got me thinking.

I have stripped most of the comments, because what I have left is the
crux of the matter.

You have not stated what "the requirement" is.

Let's take an example we should all be familiar with - a file system.

Assume that we have a lot of files, and we have a table where each row
represents one file, with columns such as file name, disk address,
date created, date last modified, etc.

You can sort the table by name, creation date, etc. However, it is
getting difficult to manage, so a bright spark comes up with the idea
of directories/folders.

The user can create directories on demand, place directories within
directories, assign files to directories, move files between
directories, etc.

The requirement is to create a database structure to represent the
directories.

My guess is that you could use an adjacency list or a nested set for
this purpose. However, where I am getting stuck is, how does one link
an entry in the 'files' table to an entry in the 'directories' table,
to indicate that a particular file resides in a particular directory.

I have been trying to create an entry in the 'directories' table that
represents, or points to, a 'file'. From your comments, it would
appear that this is the wrong approach.

I could carry on and speculate further, but I think this is a good
time to pause and ask if this is a good example, and ask if there is a
preferred solution.

TIA

Frank

That you're now thinking is a good thing :-)

The contents of file systems are not the best example for discussing
management of graph data, because the "identity" (the means for
identification) of a file typically is a combination (concatenation)
of the "identity" of its containing directory and the file's own
distinguishing name _within that directory_. This is different
compared to, say, bill-of-material structures or genealogical
relationships, where the "nodes" (parts, people, ...) have a property
"of their own" that uniquely identifies them "within the entire
universe". This is not the case for file systems. *IX systems may
have multiple directories each containing a /bin "file", and all
those /bin "files" are distinct things. Windows systems have multiple
directories each containing an "Application data" folder, and those
are all distinct.

So if file systems are what's in your mind, then what you would
typically need (for representing that in a database) is a single table
where the _fully qualified_ filename is a key/identifier. And as soon
as you have that, there typically is no longer a need for explicitly
recording the "identity" of the parent directory/folder, as that is
already implied by the full name of the file itself.

I don't fully see what you mean by "creating an entry in the
directories table that points to a file". Relational database designs
do not include "pointers". And you don't say on the directory level
which files it contains, instead you say on the file level to which
directory each file belongs.- Hide quoted text -

- Show quoted text -

I think Frank's analogy is on all fours.

The nested sets model is consrained to those graphs which represent
hierarchies. While each node will have a identity, users in Frank's
application will be working with descriptors, which is one level
removed from whatever identifier Frank sets up, and two levels removed
from the physucal structure of the database. Leaf nodes in the
structure will correspond 1-1 to entries in a separte product table,
from which users will be trying to retrieve information.

The situation with a file system is very similar. The directory
structure is a hierachical set of nodes. While each node has an
identity in the logical disk structures. user of the file system work
with the descriptive names associated with each node, which are on the
order of 2 levels removed from the physical structure of the disk.
Leaf nodes in the structure correspond 1-1 to files which have a
different structure on the disk.

While the hierarchy represents a conceptual structure, that conceptual
structure does not exist in isolation. Elements of it are already
implemetned in the product table.

I agree that there are differences as well as similarities. That's
why I used the word "similar" and Frank used the word "like".

Fred.- Tekst uit oorspronkelijk bericht niet weergeven -

- Tekst uit oorspronkelijk bericht weergeven -

I have no clue what point you're trying to make with your first
paragraphs, but I just want to note that those very same "differences
and similarities" are probably also the reason why I wrote "NOT THE
BEST example [for dealing with graph data]".- Hide quoted text -

- Show quoted text -


I'm starting to think some of the people on this topic can't read very
well. Frank wasn't trying to create an example for dealing with graph
data. He was trying to solve a business problem using nested sets to
support hierachical lookups in his product data and was asking a
specific question about how to organize the hierarch to do this. If
you wish to complain that he was off topic for a theory group, I
can't argue to the contrary. Still, I can't see that it hurts theory
too much to apply it once in a while.

Fred.
Remind me: What theory are we applying?

Reply With Quote
  #18  
Old   
Frank Millman
 
Posts: n/a

Default Re: How to 'normalise' this scenario - 05-19-2011 , 04:31 AM



On May 18, 3:31*pm, Erwin <e.sm... (AT) myonline (DOT) be> wrote:
Quote:
On 18 mei, 10:33, Frank Millman <fr... (AT) chagford (DOT) com> wrote:
Thank you very much, Erwin, for your time and patience. You have given
me enough information for me to figure out the rest for myself now.

The 'lightbulb' moment for me was to realise that -

1. I should not store 'function' nodes in the menu table at all.
2. a rule stating that functions can appear only on 'leaf' menus is
unnecessary.

With these two fallacies removed, the design becomes simpler, cleaner,
and more robust.

I really am very grateful.

Frank

P.S. Thanks also to Fred for your support - much appreciated.

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.