dbTalk Databases Forums  

How to 'normalise' this scenario

comp.databases.theory comp.databases.theory


Discuss How to 'normalise' this scenario in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Frank Millman
 
Posts: n/a

Default How to 'normalise' this scenario - 05-13-2011 , 02:22 AM






Hi all

I have read up on 'nested sets', and I like the concept.

It seems to me that there are (at least) two possible scenarios -

1. The table *is* the data - e.g. staff members in a hierarchy.

2. The table is used as navigation for a separate data table. For
example, you might have thousands of product codes, and you want to
set up a separate table to provide a hierarchical 'view' of the
product code table, allowing a user to drill down to find the code
they are looking for.

The problem with the second scenario is that 'branch' nodes and 'leaf'
nodes are conceptually different.

A 'branch' node may require columns for 'code' and 'description'.

A 'leaf' node requires a column that contains a foreign key to the
products table.

It is tricky to ensure that there is one, and only one, 'leaf' node
for each row in the product table.

It is tricky to ensure that 'branch' codes are unique, as 'leaf' nodes
will have a null 'code', and MS SQL Server does not allow more than
one null in a unique column.

Is there an approach that satisfies the requirement, but avoids these
problems?

Thanks for any insights.

Frank Millman

Reply With Quote
  #2  
Old   
Fred.
 
Posts: n/a

Default Re: How to 'normalise' this scenario - 05-13-2011 , 12:22 PM






On May 13, 3:22*am, Frank Millman <fr... (AT) chagford (DOT) com> wrote:
Quote:
Hi all

I have read up on 'nested sets', and I like the concept.

It seems to me that there are (at least) two possible scenarios -

1. The table *is* the data - e.g. staff members in a hierarchy.

2. The table is used as navigation for a separate data table. For
example, you might have thousands of product codes, and you want to
set up a separate table to provide a hierarchical 'view' of the
product code table, allowing a user to drill down to find the code
they are looking for.

The problem with the second scenario is that 'branch' nodes and 'leaf'
nodes are conceptually different.

A 'branch' node may require columns for 'code' and 'description'.

A 'leaf' node requires a column that contains a foreign key to the
products table.

It is tricky to ensure that there is one, and only one, 'leaf' node
for each row in the product table.

It is tricky to ensure that 'branch' codes are unique, as 'leaf' nodes
will have a null 'code', and MS SQL Server does not allow more than
one null in a unique column.

Is there an approach that satisfies the requirement, but avoids these
problems?

Thanks for any insights.

Frank Millman
The models I've seen give the leaf nodes in the hierarchy left and
right traversal numbers as well. Except while the hierarchy is being
modified each of the traversal numbers will be unique to the node even
using integer traversal numbers, so each of the traversal numbers
fields can have a unique constraint. If you are using rational rather
than integer traversal numbers then I'd think it would be possible to
modify the hierarchy without violating the uniqueness contraint. Of
course, the left and right tranversal number sets should be non-
intersecting, and have a bunch of other constraints, but there is a
limit to what you can check.

As far as linking the hierarchy table and the product table, it might
be a good idea to look at what your business rules are going to
become. While currently you have the product table and are looking at
the hierarchy table as an index, it might be reasonable in the future
to create the leaf node first, when you first conceive of making or
carrying a product, and later create a product entry which points to
the leaf node. This use would make a descriptor applicable to the
leaf node, and at the same time remove the pointer to the product
table. Once things were set up, the pointer in the product table to
the leaf node could be uniquely constrained.

The issue here, which you would have to deal with in any case, is that
some ways of adding to hierarchy can promote a leaf node which has an
associated product. If you want to control this, you could make an
indicater or not a node is a leaf node part of the primary key, and
constrain that portion of the foreign key in the product table to have
the leaf value only.

Fred.

Reply With Quote
  #3  
Old   
-CELKO-
 
Posts: n/a

Default Re: How to 'normalise' this scenario - 05-15-2011 , 08:02 AM



Since SQL is a set oriented language, this is a better model than the
usual adjacency list approach you see in most text books. Let us
define a simple OrgChart table like this.

CREATE TABLE OrgChart
(emp_name CHAR(10) NOT NULL PRIMARY KEY,
lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),
rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),
CONSTRAINT order_okay CHECK (lft < rgt));


OrgChart
emp_name lft rgt
======================
'Albert' 1 12
'Bert' 2 3
'Chuck' 4 11
'Donna' 5 6
'Eddie' 7 8
'Fred' 9 10

The (lft, rgt) pairs are like tags in a mark-up language, or parens
in algebra, BEGIN-END blocks in Algol-family programming languages,
etc. -- they bracket a sub-set. This is a set-oriented approach to
trees in a set-oriented language. Technically, there ought to be a
separate personnel and organization table, but bear with me,

The organizational chart would look like this as a directed graph:


Albert (1, 12)
/ \
/ \
Bert (2, 3) Chuck (4, 11)
/ \
/ | \
/ | \
/ | \

Donna (5, 6) Eddie (7, 8) Fred (9, 10)

The adjacency list table is denormalized in several ways. We are
modeling both the Personnel and the Organizational chart in one table.
But for the sake of saving space, pretend that the names are job
titles and that we have another table which describes the Personnel
that hold those positions.

Another problem with the adjacency list model is that the
boss_emp_name and employee columns are the same kind of thing (i.e.
identifiers of personnel), and therefore should be shown in only one
column in a normalized table. To prove that this is not normalized,
assume that "Chuck" changes his name to "Charles"; you have to change
his name in both columns and several places. The defining
characteristic of a normalized table is that you have one fact, one
place, one time.

The final problem is that the adjacency list model does not model
subordination. Authority flows downhill in a hierarchy, but If I fire
Chuck, I disconnect all of his subordinates from Albert. There are
situations (i.e. water pipes) where this is true, but that is not the
expected situation in this case.


To show a tree as nested sets, replace the nodes with ovals, and
then nest subordinate ovals inside each other. The root will be the
largest oval and will contain every other node. The leaf nodes will be
the innermost ovals with nothing else inside them and the nesting will
show the hierarchical relationship. The (lft, rgt) columns (I cannot
use the reserved words LEFT and RIGHT in SQL) are what show the
nesting. This is like XML, HTML or parentheses.

At this point, the boss_emp_name column is both redundant and
denormalized, so it can be dropped. Also, note that the tree structure
can be kept in one table and all the information about a node can be
put in a second table and they can be joined on employee number for
queries.

To convert the graph into a nested sets model think of a little worm
crawling along the tree. The worm starts at the top, the root, makes a
complete trip around the tree. When he comes to a node, he puts a
number in the cell on the side that he is visiting and increments his
counter. Each node will get two numbers, one of the right side and one
for the left. Computer Science majors will recognize this as a
modified preorder tree traversal algorithm. Finally, drop the unneeded
OrgChart.boss_emp_name column which used to represent the edges of a
graph.

This has some predictable results that we can use for building
queries. The root is always (left = 1, right = 2 * (SELECT COUNT(*)
FROM TreeTable)); leaf nodes always have (left + 1 = right); subtrees
are defined by the BETWEEN predicate; etc. Here are two common queries
which can be used to build others:

1. An employee and all their Supervisors, no matter how deep the
tree.

SELECT O2.*
FROM OrgChart AS O1, OrgChart AS O2
WHERE O1.lft BETWEEN O2.lft AND O2.rgt
AND O1.emp_name = :myemployee;

2. The employee and all their subordinates. There is a nice
symmetry here.

SELECT O1.*
FROM OrgChart AS O1, OrgChart AS O2
WHERE O1.lft BETWEEN O2.lft AND O2.rgt
AND O2.emp_name = :myemployee;

3. Add a GROUP BY and aggregate functions to these basic queries and
you have hierarchical reports. For example, the total salaries which
each employee controls:

SELECT O2.emp_name, SUM(S1.salary_amt)
FROM OrgChart AS O1, OrgChart AS O2,
Salaries AS S1
WHERE O1.lft BETWEEN O2.lft AND O2.rgt
AND O1.emp_name = S1.emp_name
GROUP BY O2.emp_name;

4. To find the level of each emp_name, so you can print the tree as
an indented listing.

SELECT T1.node,
SUM(CASE WHEN T2.lft <= T1.lft THEN 1 ELSE 0 END
+ CASE WHEN T2.rgt < T1.lft THEN -1 ELSE 0 END) AS lvl
FROM Tree AS T1, Tree AS T2
WHERE T2.lft <= T1.lft
GROUP BY T1.node;

An untested version of this using OLAP functions might be better
able to use the ordering.

SELECT T1.node,
SUM(CASE WHEN T2.lft <= T1.lft THEN 1 ELSE 0 END
+ CASE WHEN T2.rgt < T1.lft THEN -1 ELSE 0 END)
OVER (ORDER BY T1.lft
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS lvl
FROM Tree AS T1, Tree AS T2
WHERE T2.lft <= T1.lft;

5. The nested set model has an implied ordering of siblings which
the adjacency list model does not. To insert a new node, G1, under
part G. We can insert one node at a time like this:

BEGIN ATOMIC
DECLARE rightmost_spread INTEGER;
SET rightmost_spread
= (SELECT rgt
FROM Frammis
WHERE part = 'G');
UPDATE Frammis
SET lft = CASE WHEN lft > rightmost_spread
THEN lft + 2
ELSE lft END,
rgt = CASE WHEN rgt >= rightmost_spread
THEN rgt + 2
ELSE rgt END
WHERE rgt >= rightmost_spread;

INSERT INTO Frammis (part, lft, rgt)
VALUES ('G1', rightmost_spread, (rightmost_spread + 1));

COMMIT WORK;
END;

The idea is to spread the (lft, rgt) numbers after the youngest
child of the parent, G in this case, over by two to make room for the
new addition, G1. This procedure will add the new node to the
rightmost child position, which helps to preserve the idea of an age
order among the siblings.

6. To convert a nested sets model into an adjacency list model:

SELECT B.emp_name AS boss_emp_name, E.emp_name
FROM OrgChart AS E
LEFT OUTER JOIN
OrgChart AS B
ON B.lft
= (SELECT MAX(lft)
FROM OrgChart AS S
WHERE E.lft > S.lft
AND E.lft < S.rgt);

7. To find the immediate parent of a node:

SELECT MAX(P2.lft), MIN(P2.rgt)
FROM Personnel AS P1, Personnel AS P2
WHERE P1.lft BETWEEN P2.lft AND P2.rgt
AND P1.emp_name = :my_emp_name;

I have a book on TREES & HIERARCHIES IN SQL which you can get at
Amazon.com right now.

Reply With Quote
  #4  
Old   
Frank Millman
 
Posts: n/a

Default Re: How to 'normalise' this scenario - 05-16-2011 , 01:33 AM



On May 13, 7:22*pm, "Fred." <ghrno-goo... (AT) yahoo (DOT) com> wrote:
Quote:
On May 13, 3:22*am, Frank Millman <fr... (AT) chagford (DOT) com> wrote:


Hi all

I have read up on 'nested sets', and I like the concept.

It seems to me that there are (at least) two possible scenarios -

1. The table *is* the data - e.g. staff members in a hierarchy.

2. The table is used as navigation for a separate data table. For
example, you might have thousands of product codes, and you want to
set up a separate table to provide a hierarchical 'view' of the
product code table, allowing a user to drill down to find the code
they are looking for.

The problem with the second scenario is that 'branch' nodes and 'leaf'
nodes are conceptually different.

A 'branch' node may require columns for 'code' and 'description'.

A 'leaf' node requires a column that contains a foreign key to the
products table.

It is tricky to ensure that there is one, and only one, 'leaf' node
for each row in the product table.

It is tricky to ensure that 'branch' codes are unique, as 'leaf' nodes
will have a null 'code', and MS SQL Server does not allow more than
one null in a unique column.

Is there an approach that satisfies the requirement, but avoids these
problems?

Thanks for any insights.

Frank Millman
Thanks for the reply, Fred

Quote:
The models I've seen give the leaf nodes in the hierarchy left and
right traversal numbers as well. *Except while the hierarchy is being
modified each of the traversal numbers will be unique to the node even
using integer traversal numbers, so each of the traversal numbers
fields can have a unique constraint. *If you are using rational rather
than integer traversal numbers then I'd think it would be possible to
modify the hierarchy *without violating the uniqueness contraint. *Of
course, the left and right tranversal number sets should be non-
intersecting, and have a bunch of other constraints, but there is a
limit to what you can check.

Agreed - the 'nested set' model requires that *every* node has a left
and right traversal number. I was referring to the additional columns
that may be required - in my second scenario, they differ depending on
whether the node is a 'branch' or a 'leaf'.

Quote:
As far as linking the hierarchy table and the product table, it might
be a good idea to look at what your business rules are going to
become. *While currently you have the product table and are looking at
the hierarchy table as an index, it might be reasonable in the future
to create the leaf node first, when you first conceive of making or
carrying a product, and later create a product entry which points to
the leaf node. *This use would make a descriptor applicable to the
leaf node, and at the same time remove the pointer to the product
table. *Once things were set up, the pointer in the product table to
the leaf node could be uniquely constrained.

That is an interesting idea. I will give it more thought.

Quote:
The issue here, which you would have to deal with in any case, is that
some ways of adding to hierarchy can promote a leaf node which has an
associated product. *If you want to control this, you could make an
indicater or not a node is a leaf node part of the primary key, and
constrain that portion of the foreign key in the product table to have
the leaf value only.

I agree that this is an issue. I was planning to control this at
application level rather than through SQL constraints.

Quote:
Fred.
Thanks for the ideas - much appreciated.

Frank

Reply With Quote
  #5  
Old   
Frank Millman
 
Posts: n/a

Default Re: How to 'normalise' this scenario - 05-16-2011 , 01:50 AM



On May 15, 3:02*pm, -CELKO- <jcelko... (AT) earthlink (DOT) net> wrote:
Quote:
Since SQL is a set oriented language, this is a better model than the
usual adjacency list approach you see in most text books. Let us
define a simple OrgChart table like this.

* *CREATE TABLE OrgChart
* (emp_name CHAR(10) NOT NULL PRIMARY KEY,
* lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),
* rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),
* CONSTRAINT order_okay CHECK (lft < rgt));

* OrgChart
* emp_name lft rgt
* ======================
* 'Albert' 1 12
* 'Bert' 2 3
* 'Chuck' 4 11
* 'Donna' 5 6
* 'Eddie' 7 8
* 'Fred' 9 10

* The (lft, rgt) pairs are like tags in a mark-up language, or parens
in algebra, BEGIN-END blocks in Algol-family programming languages,
etc. -- they bracket a sub-set. This is a set-oriented approach to
trees in a set-oriented language. Technically, there ought to be a
separate personnel and organization table, but bear with me,

*The organizational chart would look like this as a directed graph:

* Albert (1, 12)
* */ * * * * * * \
* / * * * * * * * \
* Bert (2, 3) *Chuck (4, 11)
* * * * * */ * * * * * * * * \
* * * * * / * * * * * *| * * * \
* * * * */ * * * * * * | * * * *\
* * * * / * * * * * * *| * * * * *\

* Donna (5, 6) Eddie (7, 8) Fred (9, 10)

* The adjacency list table is denormalized in several ways. We are
modeling both the Personnel and the Organizational chart in one table.
But for the sake of saving space, pretend that the names are job
titles and that we have another table which describes the Personnel
that hold those positions.

* Another problem with the adjacency list model is that the
boss_emp_name and employee columns are the same kind of thing (i.e.
identifiers of personnel), and therefore should be shown in only one
column in a normalized table. To prove that this is not normalized,
assume that "Chuck" changes his name to "Charles"; you have to change
his name in both columns and several places. The defining
characteristic of a normalized table is that you have one fact, one
place, one time.

* The final problem is that the adjacency list model does not model
subordination. Authority flows downhill in a hierarchy, but If I fire
Chuck, I disconnect all of his subordinates from Albert. There are
situations (i.e. water pipes) where this is true, but that is not the
expected situation in this case.

* To show a tree as nested sets, replace the nodes with ovals, and
then nest subordinate ovals inside each other. The root will be the
largest oval and will contain every other node. The leaf nodes will be
the innermost ovals with nothing else inside them and the nesting will
show the hierarchical relationship. The (lft, rgt) columns (I cannot
use the reserved words LEFT and RIGHT in SQL) are what show the
nesting. This is like XML, HTML or parentheses.

* At this point, the boss_emp_name column is both redundant and
denormalized, so it can be dropped. Also, note that the tree structure
can be kept in one table and all the information about a node can be
put in a second table and they can be joined on employee number for
queries.

* To convert the graph into a nested sets model think of a little worm
crawling along the tree. The worm starts at the top, the root, makes a
complete trip around the tree. When he comes to a node, he puts a
number in the cell on the side that he is visiting and increments his
counter. Each node will get two numbers, one of the right side and one
for the left. Computer Science majors will recognize this as a
modified preorder tree traversal algorithm. Finally, drop the unneeded
OrgChart.boss_emp_name column which used to represent the edges of a
graph.

* This has some predictable results that we can use for building
queries. The root is always (left = 1, right = 2 * (SELECT COUNT(*)
FROM TreeTable)); leaf nodes always have (left + 1 = right); subtrees
are defined by the BETWEEN predicate; etc. Here are two common queries
which can be used to build others:

*1. An employee and all their Supervisors, no matter how deep the
tree.

* SELECT O2.*
* FROM OrgChart AS O1, OrgChart AS O2
* WHERE O1.lft BETWEEN O2.lft AND O2.rgt
* AND O1.emp_name = :myemployee;

* *2. The employee and all their subordinates. There is a nice
symmetry here.

* SELECT O1.*
* FROM OrgChart AS O1, OrgChart AS O2
* WHERE O1.lft BETWEEN O2.lft AND O2.rgt
* AND O2.emp_name = :myemployee;

* 3. Add a GROUP BY and aggregate functions to these basic queries and
you have hierarchical reports. For example, the total salaries which
each employee controls:

* *SELECT O2.emp_name, SUM(S1.salary_amt)
* FROM OrgChart AS O1, OrgChart AS O2,
* Salaries AS S1
* WHERE O1.lft BETWEEN O2.lft AND O2.rgt
* AND O1.emp_name = S1.emp_name
* GROUP BY O2.emp_name;

* *4. To find the level of each emp_name, so you can print the tree as
an indented listing.

* SELECT T1.node,
* SUM(CASE WHEN T2.lft <= T1.lft THEN 1 ELSE 0 END
* + CASE WHEN T2.rgt < T1.lft THEN -1 ELSE 0 END) AS lvl
* FROM Tree AS T1, Tree AS T2
* WHERE T2.lft <= T1.lft
* GROUP BY T1.node;

* An untested version of this using OLAP functions might be better
able to use the ordering.

* SELECT T1.node,
* SUM(CASE WHEN T2.lft <= T1.lft THEN 1 ELSE 0 END
* + CASE WHEN T2.rgt < T1.lft THEN -1 ELSE 0 END)
* OVER (ORDER BY T1.lft
* RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS lvl
* FROM Tree AS T1, Tree AS T2
* WHERE T2.lft <= T1.lft;

* 5. The nested set model has an implied ordering of siblings which
the adjacency list model does not. To insert a new node, G1, under
part G. We can insert one node at a time like this:

* *BEGIN ATOMIC
* DECLARE rightmost_spread INTEGER;
* SET rightmost_spread
* = (SELECT rgt
* FROM Frammis
* WHERE part = 'G');
* UPDATE Frammis
* SET lft = CASE WHEN lft > rightmost_spread
* THEN lft + 2
* ELSE lft END,
* rgt = CASE WHEN rgt >= rightmost_spread
* THEN rgt + 2
* ELSE rgt END
* WHERE rgt >= rightmost_spread;

* INSERT INTO Frammis (part, lft, rgt)
* VALUES ('G1', rightmost_spread, (rightmost_spread + 1));

* COMMIT WORK;
* END;

* The idea is to spread the (lft, rgt) numbers after the youngest
child of the parent, G in this case, over by two to make room for the
new addition, G1. This procedure will add the new node to the
rightmost child position, which helps to preserve the idea of an age
order among the siblings.

* 6. To convert a nested sets model into an adjacency list model:

* SELECT B.emp_name AS boss_emp_name, E.emp_name
* FROM OrgChart AS E
* LEFT OUTER JOIN
* OrgChart AS B
* ON B.lft
* = (SELECT MAX(lft)
* FROM OrgChart AS S
* WHERE E.lft > S.lft
* AND E.lft < S.rgt);

* 7. To find the immediate parent of a node:

* SELECT MAX(P2.lft), MIN(P2.rgt)
* FROM Personnel AS P1, Personnel AS P2
* WHERE P1.lft BETWEEN P2.lft AND P2.rgt
* AND P1.emp_name = :my_emp_name;

* I have a book on TREES & HIERARCHIES IN SQL which you can get at
Amazon.com right now.
Thanks for the reply, Joe.

Unless I am missing something, you have simply reproduced the 'nested
set' theory that I have read in a number of other places on the web.
You do not seem to have addressed the particular issue that I raised
in my original post.

While on the subject, here is a question that has always puzzled me.
In your example of inserting a new node, you have the following -

rgt = CASE WHEN rgt >= rightmost_spread
THEN rgt + 2
ELSE rgt END
WHERE rgt >= rightmost_spread;

As the WHERE clause restricts the selection, why do you need the CASE
statement at all. In my experiments I have changed this to

rgt = rgt + 2

and it seems to work ok. Is there a problem with this?

Thanks

Frank

Reply With Quote
  #6  
Old   
Erwin
 
Posts: n/a

Default Re: How to 'normalise' this scenario - 05-17-2011 , 04:47 AM



Quote:
Since SQL is a set oriented language, this is a better model than the
usual adjacency list approach you see in most text books.
Yolks. "If SQL is a set-oriented language, then the nested sets model
is a better one than the adjacency list model." That one should go
into RDB's hall of fame.



Quote:
The adjacency list table is denormalized in several ways.
Is that supposed to mean "denormalized" as in "not in 2NF", "not in
3NF", "not in BCNF" ? Exactly which of the normal forms is violated
by an adjacency table {BOSSID, EMPID} KEY {BOSSID, EMPID} ?



Quote:
We are
modeling both the Personnel and the Organizational chart in one table.
No we are not. An adjacency list table records/lists/documents/
models/... the organizational chart, NOT the personnel details.

If one has a need for recording personnel details, then that should go
in another table, with another external predicate. Of course there
might be constraints to enforce, but that is just a matter of defining
all the appropriate CREATE ASSERTIONs (or implementing their
equivalents using triggers).



Quote:
Another problem with the adjacency list model is that the
boss_emp_name and employee columns are the same kind of thing (i.e.
identifiers of personnel), and therefore should be shown in only one
column in a normalized table.
No database table should ever have two columns of the same type ?



Quote:
To prove that this is not normalized,
assume that "Chuck" changes his name to "Charles"; you have to change
his name in both columns and several places.
This has nothing to do with normalization.



Quote:
The defining characteristic of a normalized table is that you have one fact, one
place, one time.

The final problem is that the adjacency list model does not model
subordination.
The employee identified by employee number EMPID has the employee
identified by employee number BOSSID as his direct chef.

Please explain how this "does not model subordination".



Quote:
Authority flows downhill in a hierarchy, but If I fire
Chuck, I disconnect all of his subordinates from Albert.
If Chuck gets fired, then you remove or end-date the row for Chuck in
the Personnel table.

Depending on whether there are constraints between the OrgChart table
and the Personnel table, this might require that the organizational
chart is adapted first, reflecting the fact that all of Chuck's former
subordinates now have a new chef, prior to doing this update in the
Personnel table.



Quote:
5. The nested set model has an implied ordering of siblings which
the adjacency list model does not.
Now THAT is a problem.

With the nested sets model.

Forcing orderings upon the user where none are needed.

Reply With Quote
  #7  
Old   
Erwin
 
Posts: n/a

Default Re: How to 'normalise' this scenario - 05-17-2011 , 05:08 AM



On 16 mei, 08:50, Frank Millman <fr... (AT) chagford (DOT) com> wrote:

Quote:
Thanks for the reply, Joe.
I have documented some of the fallacies in his reply.

You might be interested.



Following are my remarks to your original post.

Quote:
1. The table *is* the data - e.g. staff members in a hierarchy.
Can you give me an example of a table that "is NOT" the data ?



Quote:
2. The table is used as navigation for a separate data table. For
example, you might have thousands of product codes, and you want to
set up a separate table to provide a hierarchical 'view' of the
product code table, allowing a user to drill down to find the code
they are looking for.
"How data is used" does not make a difference as to whether it is data
or not.



Quote:
The problem with the second scenario is that 'branch' nodes and 'leaf'
nodes are conceptually different.
Then you are outside the realm of graphs, and whatever your problem
might be, the nested sets stuff is not a solution for it.

The essence of graphs is that their nodes are NOT "conceptually
different".



Quote:
It is tricky to ensure that there is one, and only one, 'leaf' node
for each row in the product table.
It is tricky to ensure that 'branch' codes are unique, as 'leaf' nodes
will have a null 'code', and MS SQL Server does not allow more than
one null in a unique column.
Graphs are tricky no matter what.



Quote:
Is there an approach that satisfies the requirement, but avoids these
problems?
You have not stated what "the requirement" is.

Reply With Quote
  #8  
Old   
Frank Millman
 
Posts: n/a

Default Re: How to 'normalise' this scenario - 05-17-2011 , 06:16 AM



On May 17, 12:08*pm, Erwin <e.sm... (AT) myonline (DOT) be> wrote:
Quote:
On 16 mei, 08:50, Frank Millman <fr... (AT) chagford (DOT) com> wrote:
Hi Erwin

Thanks for your comments - you have got me thinking.

I have stripped most of the comments, because what I have left is the
crux of the matter.

Quote:
You have not stated what "the requirement" is.
Let's take an example we should all be familiar with - a file system.

Assume that we have a lot of files, and we have a table where each row
represents one file, with columns such as file name, disk address,
date created, date last modified, etc.

You can sort the table by name, creation date, etc. However, it is
getting difficult to manage, so a bright spark comes up with the idea
of directories/folders.

The user can create directories on demand, place directories within
directories, assign files to directories, move files between
directories, etc.

The requirement is to create a database structure to represent the
directories.

My guess is that you could use an adjacency list or a nested set for
this purpose. However, where I am getting stuck is, how does one link
an entry in the 'files' table to an entry in the 'directories' table,
to indicate that a particular file resides in a particular directory.

I have been trying to create an entry in the 'directories' table that
represents, or points to, a 'file'. From your comments, it would
appear that this is the wrong approach.

I could carry on and speculate further, but I think this is a good
time to pause and ask if this is a good example, and ask if there is a
preferred solution.

TIA

Frank

Reply With Quote
  #9  
Old   
Erwin
 
Posts: n/a

Default Re: How to 'normalise' this scenario - 05-17-2011 , 10:16 AM



On 17 mei, 13:16, Frank Millman <fr... (AT) chagford (DOT) com> wrote:
Quote:
On May 17, 12:08*pm, Erwin <e.sm... (AT) myonline (DOT) be> wrote:

On 16 mei, 08:50, Frank Millman <fr... (AT) chagford (DOT) com> wrote:

Hi Erwin

Thanks for your comments - you have got me thinking.

I have stripped most of the comments, because what I have left is the
crux of the matter.

You have not stated what "the requirement" is.

Let's take an example we should all be familiar with - a file system.

Assume that we have a lot of files, and we have a table where each row
represents one file, with columns such as file name, disk address,
date created, date last modified, etc.

You can sort the table by name, creation date, etc. However, it is
getting difficult to manage, so a bright spark comes up with the idea
of directories/folders.

The user can create directories on demand, place directories within
directories, assign files to directories, move files between
directories, etc.

The requirement is to create a database structure to represent the
directories.

My guess is that you could use an adjacency list or a nested set for
this purpose. However, where I am getting stuck is, how does one link
an entry in the 'files' table to an entry in the 'directories' table,
to indicate that a particular file resides in a particular directory.

I have been trying to create an entry in the 'directories' table that
represents, or points to, a 'file'. From your comments, it would
appear that this is the wrong approach.

I could carry on and speculate further, but I think this is a good
time to pause and ask if this is a good example, and ask if there is a
preferred solution.

TIA

Frank
That you're now thinking is a good thing :-)

The contents of file systems are not the best example for discussing
management of graph data, because the "identity" (the means for
identification) of a file typically is a combination (concatenation)
of the "identity" of its containing directory and the file's own
distinguishing name _within that directory_. This is different
compared to, say, bill-of-material structures or genealogical
relationships, where the "nodes" (parts, people, ...) have a property
"of their own" that uniquely identifies them "within the entire
universe". This is not the case for file systems. *IX systems may
have multiple directories each containing a /bin "file", and all
those /bin "files" are distinct things. Windows systems have multiple
directories each containing an "Application data" folder, and those
are all distinct.

So if file systems are what's in your mind, then what you would
typically need (for representing that in a database) is a single table
where the _fully qualified_ filename is a key/identifier. And as soon
as you have that, there typically is no longer a need for explicitly
recording the "identity" of the parent directory/folder, as that is
already implied by the full name of the file itself.

I don't fully see what you mean by "creating an entry in the
directories table that points to a file". Relational database designs
do not include "pointers". And you don't say on the directory level
which files it contains, instead you say on the file level to which
directory each file belongs.

Reply With Quote
  #10  
Old   
Erwin
 
Posts: n/a

Default Re: How to 'normalise' this scenario - 05-17-2011 , 10:40 AM



On 17 mei, 17:16, Erwin <e.sm... (AT) myonline (DOT) be> wrote:
Quote:
That you're now thinking is a good thing :-)

The contents of file systems are not the best example for discussing
management of graph data, because the "identity" (the means for
identification) of a file typically is a combination (concatenation)
of the "identity" of its containing directory and the file's own
distinguishing name _within that directory_. *This is different
compared to, say, bill-of-material structures or genealogical
relationships, where the "nodes" (parts, people, ...) have a property
"of their own" that uniquely identifies them "within the entire
universe". *This is not the case for file systems. **IX systems may
have multiple directories each containing a /bin "file", and all
those /bin "files" are distinct things. *Windows systems have multiple
directories each containing an "Application data" folder, and those
are all distinct.

So if file systems are what's in your mind, then what you would
typically need (for representing that in a database) is a single table
where the _fully qualified_ filename is a key/identifier. *And as soon
as you have that, there typically is no longer a need for explicitly
recording the "identity" of the parent directory/folder, as that is
already implied by the full name of the file itself.

I don't fully see what you mean by "creating an entry in the
directories table that points to a file". *Relational database designs
do not include "pointers". *And you don't say on the directory level
which files it contains, instead you say on the file level to which
directory each file belongs.- Tekst uit oorspronkelijk bericht niet weergeven -

- Tekst uit oorspronkelijk bericht weergeven -
A small example of a single-table database representing the contents
of a file system :

(not valid SQL, but I hope you'll see the idea)

TABLE FILES {FILENAME:VARCHAR(...) DIRNAME:VARCHAR(...) other
attributes (creationdate, ...) here} KEY {FILENAME}
FOREIGN KEY FILES{DIRNAME} REFERENCES FILES{FILENAME}
CHECK CONSTRAINT LENGTH(DIRNAME) < LENGTH(FILENAME) && SUBSTR(FILENAME,
0,LENGTH(DIRNAME) == DIRNAME) &&
SUBSTR(FILENAME,LENGTH(DIRNAME),LENGTH(DIRNAME)+1) == '/'

sample data :

"","" /*the root point */
"\PF",""
"\UD",""
"\PF\IBM","\PF"
"\PF\IBM\DB2","\PF\IBM"

Note that this way of dealing with the "root" entry is just a hack and
at any rate it doesn't satisfy the check constraint as stated. But
that's not the main point, so I gloss over it.

The thing is, while this is indeed a file containment graph
represented in the form of an adjacency list, you don't really need
the adjacency list to answer queries such as "list me all files
contained directly or indirectly in the \PF directory". Owing to the
fact that the DIRNAME value is always derivable from the FILENAME.

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.