dbTalk Databases Forums  

how to suppress carefully a recursive tree

comp.databases.theory comp.databases.theory


Discuss how to suppress carefully a recursive tree in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #41  
Old   
David Cressey
 
Posts: n/a

Default Re: how to suppress carefully a recursive tree - 01-23-2008 , 05:02 AM







"fj" <francois.jacq (AT) irsn (DOT) fr> wrote

Quote:
On 23 jan, 01:12, David BL <davi... (AT) iinet (DOT) net.au> wrote:
On Jan 23, 1:10 am, fj <francois.j... (AT) irsn (DOT) fr> wrote:

On 22 jan, 15:52, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:

On 22 jan, 12:04, fj <francois.j... (AT) irsn (DOT) fr> wrote:

I know how to suppress a normal tree but I meet the following kind
of
situation :

I'm guessing that when you say "suppress" you mean "represent in a
database". Correct?

No : I want to destroy, remove, kill ... a part of the data (a
complete tree or just a branch), but without destroying data shared by
other trees or branches.

Would the mark and sweep algorithm suit you purposes?

No

the mark and sweep algorithm needs to know all the tree roots in order
to mark all the used objects.

In my example I indicates that the node b3 belongs to the root r2
just for information to explain the storage count. But the deletion
routine I want to write just receives "r1" as argument, nothing else.
It does not know that r2 exists too ... It is just able to detect the
existence of other objects via the storage count of b3 which is a
little bit too high for being just referenced by r1 !

How is the tree represented? If it's in the usual child-parent
relationship, then you will have to traverse the sub tree in order to locate
all its nodes.

If it's in the nested set representation, then there is a simple query to
return every node in the subtree.





Reply With Quote
  #42  
Old   
Jan Hidders
 
Posts: n/a

Default Re: how to suppress carefully a recursive tree - 01-23-2008 , 06:40 AM






On 22 jan, 17:10, fj <francois.j... (AT) irsn (DOT) fr> wrote:
Quote:
On 22 jan, 15:52, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:

On 22 jan, 12:04, fj <francois.j... (AT) irsn (DOT) fr> wrote:

I know how to suppress a normal tree but I meet the following kind of
situation :

I'm guessing that when you say "suppress" you mean "represent in a
database". Correct?

No : I want to destroy, remove, kill ...
Try "delete", that's a better translation of "supprimer". To suppress
is more like "reprimer", "bannir" or "inhiber".

Quote:
a part of the data (a
complete tree or just a branch), but without destroying data shared by
other trees or branches.
Ok, understood.

Quote:
r1
* * -> b1
* * * * -> b2 -> ...
* * * * -> b3 -> ...
* * -> b2
* * * * -> b1 -> ...
* * * * -> b1 -> ...
* * -> b3
* * * * -> b4

r2
* * -> b3 -> ...

So, a directed graph with multiple roots. Correct? So basically you
want to store arbitrary directed graphs.

Yes and no. The two "roots" r1 r2 *could perhaps belong to a bigger
tree.



A same node can be referenced at several places. Each node is
associated to a storage count :

r1(0) *b1(3) *b2(2) *b3(3) *b4(1) *r2(0)

Which would correspond to the number of incoming edges, yes?

Yes





In a normal tree without recursion (in the example above, recursion
occurs because b1 contains b2 and vice versa), a node is destroyed
when its count storage is equal to zero else its count storage is
simply decremented.

What algorithm should be applied ? I want for instance to cleanup r1
but, of course, r2 must remain valid (=> b3 and b4 are not destroyed
during the process and their storage count must be b3(1) b4(1)).

Notice that the deletion of of a tree must be possible even if the
count storage of the root is not equal to zero :
* *r1 -> b1 -> b2 -> r1 -> ...

It all depends a bit on how large your typical graphs are, how long on
average the simple paths, what type of operations and queries you want
to do on it and how often. *My first guess for the representation
would a simple straightforward adjacency list representation, (a
binary relation that contains all the edges) and if it's not too big
and your paths are often long it might be interesting to maintain an
extra table with the transitive closure of the graph.

The graph may be quite large. Number of vertices (nodes) : usually
100000, sometimes much more (a very big computation may lead to about
100 millions). This corresponds to a 3D meshing, each mesh (a
particular node) containing information about chemical composition (a
sub-node), temperature, fluid characteristics (another sub-node) ...

Let us precise that simple paths are always short when one excludes
recursive points (a maximum of 10 nodes).

If you go for the adjacency list approach, make sure that you do as
much as possible in one SQL statement when you start following the
edges. So look up all nodes that are reachable in one step in one
statement, update those, and store the ones that have to be deleted in
a temporary table. Then again with one statement look up those that
can be reached from in one step from the nodes in the temporary table.
Et cetera.

I don't use SQL but it does not matter :
It might. Outside the context of an SQL database my advice might
actually be counterproductive.

Quote:
I can build up easily the
list of nodes related to a particular starting point (the node "env"
in my example). I am also able to compute, for each node, its
"external count" (0 for all the nodes except b3 which has the value
1).

After that I can start to destroy really : I only go down the nodes
which have an external count equal to zero (this will protect b4 in
the example). The problem is that the algorithm is not very efficient
when the list of nodes is high, simply because I need, for each node
having a storage count greater than 0, to look for that node in the
list of all the nodes and to check the external count (CPU cost
proportional to O(n2) if n is the number of concerned nodes).
You might consider defining an index over the set of edges that allows
a quicker look-up on the basis of the begin node. Of course there is a
price to pay for that because updates get more expensive.

-- Jan Hidders


Reply With Quote
  #43  
Old   
Jan Hidders
 
Posts: n/a

Default Re: how to suppress carefully a recursive tree - 01-23-2008 , 06:40 AM



On 22 jan, 17:10, fj <francois.j... (AT) irsn (DOT) fr> wrote:
Quote:
On 22 jan, 15:52, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:

On 22 jan, 12:04, fj <francois.j... (AT) irsn (DOT) fr> wrote:

I know how to suppress a normal tree but I meet the following kind of
situation :

I'm guessing that when you say "suppress" you mean "represent in a
database". Correct?

No : I want to destroy, remove, kill ...
Try "delete", that's a better translation of "supprimer". To suppress
is more like "reprimer", "bannir" or "inhiber".

Quote:
a part of the data (a
complete tree or just a branch), but without destroying data shared by
other trees or branches.
Ok, understood.

Quote:
r1
* * -> b1
* * * * -> b2 -> ...
* * * * -> b3 -> ...
* * -> b2
* * * * -> b1 -> ...
* * * * -> b1 -> ...
* * -> b3
* * * * -> b4

r2
* * -> b3 -> ...

So, a directed graph with multiple roots. Correct? So basically you
want to store arbitrary directed graphs.

Yes and no. The two "roots" r1 r2 *could perhaps belong to a bigger
tree.



A same node can be referenced at several places. Each node is
associated to a storage count :

r1(0) *b1(3) *b2(2) *b3(3) *b4(1) *r2(0)

Which would correspond to the number of incoming edges, yes?

Yes





In a normal tree without recursion (in the example above, recursion
occurs because b1 contains b2 and vice versa), a node is destroyed
when its count storage is equal to zero else its count storage is
simply decremented.

What algorithm should be applied ? I want for instance to cleanup r1
but, of course, r2 must remain valid (=> b3 and b4 are not destroyed
during the process and their storage count must be b3(1) b4(1)).

Notice that the deletion of of a tree must be possible even if the
count storage of the root is not equal to zero :
* *r1 -> b1 -> b2 -> r1 -> ...

It all depends a bit on how large your typical graphs are, how long on
average the simple paths, what type of operations and queries you want
to do on it and how often. *My first guess for the representation
would a simple straightforward adjacency list representation, (a
binary relation that contains all the edges) and if it's not too big
and your paths are often long it might be interesting to maintain an
extra table with the transitive closure of the graph.

The graph may be quite large. Number of vertices (nodes) : usually
100000, sometimes much more (a very big computation may lead to about
100 millions). This corresponds to a 3D meshing, each mesh (a
particular node) containing information about chemical composition (a
sub-node), temperature, fluid characteristics (another sub-node) ...

Let us precise that simple paths are always short when one excludes
recursive points (a maximum of 10 nodes).

If you go for the adjacency list approach, make sure that you do as
much as possible in one SQL statement when you start following the
edges. So look up all nodes that are reachable in one step in one
statement, update those, and store the ones that have to be deleted in
a temporary table. Then again with one statement look up those that
can be reached from in one step from the nodes in the temporary table.
Et cetera.

I don't use SQL but it does not matter :
It might. Outside the context of an SQL database my advice might
actually be counterproductive.

Quote:
I can build up easily the
list of nodes related to a particular starting point (the node "env"
in my example). I am also able to compute, for each node, its
"external count" (0 for all the nodes except b3 which has the value
1).

After that I can start to destroy really : I only go down the nodes
which have an external count equal to zero (this will protect b4 in
the example). The problem is that the algorithm is not very efficient
when the list of nodes is high, simply because I need, for each node
having a storage count greater than 0, to look for that node in the
list of all the nodes and to check the external count (CPU cost
proportional to O(n2) if n is the number of concerned nodes).
You might consider defining an index over the set of edges that allows
a quicker look-up on the basis of the begin node. Of course there is a
price to pay for that because updates get more expensive.

-- Jan Hidders


Reply With Quote
  #44  
Old   
Jan Hidders
 
Posts: n/a

Default Re: how to suppress carefully a recursive tree - 01-23-2008 , 06:40 AM



On 22 jan, 17:10, fj <francois.j... (AT) irsn (DOT) fr> wrote:
Quote:
On 22 jan, 15:52, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:

On 22 jan, 12:04, fj <francois.j... (AT) irsn (DOT) fr> wrote:

I know how to suppress a normal tree but I meet the following kind of
situation :

I'm guessing that when you say "suppress" you mean "represent in a
database". Correct?

No : I want to destroy, remove, kill ...
Try "delete", that's a better translation of "supprimer". To suppress
is more like "reprimer", "bannir" or "inhiber".

Quote:
a part of the data (a
complete tree or just a branch), but without destroying data shared by
other trees or branches.
Ok, understood.

Quote:
r1
* * -> b1
* * * * -> b2 -> ...
* * * * -> b3 -> ...
* * -> b2
* * * * -> b1 -> ...
* * * * -> b1 -> ...
* * -> b3
* * * * -> b4

r2
* * -> b3 -> ...

So, a directed graph with multiple roots. Correct? So basically you
want to store arbitrary directed graphs.

Yes and no. The two "roots" r1 r2 *could perhaps belong to a bigger
tree.



A same node can be referenced at several places. Each node is
associated to a storage count :

r1(0) *b1(3) *b2(2) *b3(3) *b4(1) *r2(0)

Which would correspond to the number of incoming edges, yes?

Yes





In a normal tree without recursion (in the example above, recursion
occurs because b1 contains b2 and vice versa), a node is destroyed
when its count storage is equal to zero else its count storage is
simply decremented.

What algorithm should be applied ? I want for instance to cleanup r1
but, of course, r2 must remain valid (=> b3 and b4 are not destroyed
during the process and their storage count must be b3(1) b4(1)).

Notice that the deletion of of a tree must be possible even if the
count storage of the root is not equal to zero :
* *r1 -> b1 -> b2 -> r1 -> ...

It all depends a bit on how large your typical graphs are, how long on
average the simple paths, what type of operations and queries you want
to do on it and how often. *My first guess for the representation
would a simple straightforward adjacency list representation, (a
binary relation that contains all the edges) and if it's not too big
and your paths are often long it might be interesting to maintain an
extra table with the transitive closure of the graph.

The graph may be quite large. Number of vertices (nodes) : usually
100000, sometimes much more (a very big computation may lead to about
100 millions). This corresponds to a 3D meshing, each mesh (a
particular node) containing information about chemical composition (a
sub-node), temperature, fluid characteristics (another sub-node) ...

Let us precise that simple paths are always short when one excludes
recursive points (a maximum of 10 nodes).

If you go for the adjacency list approach, make sure that you do as
much as possible in one SQL statement when you start following the
edges. So look up all nodes that are reachable in one step in one
statement, update those, and store the ones that have to be deleted in
a temporary table. Then again with one statement look up those that
can be reached from in one step from the nodes in the temporary table.
Et cetera.

I don't use SQL but it does not matter :
It might. Outside the context of an SQL database my advice might
actually be counterproductive.

Quote:
I can build up easily the
list of nodes related to a particular starting point (the node "env"
in my example). I am also able to compute, for each node, its
"external count" (0 for all the nodes except b3 which has the value
1).

After that I can start to destroy really : I only go down the nodes
which have an external count equal to zero (this will protect b4 in
the example). The problem is that the algorithm is not very efficient
when the list of nodes is high, simply because I need, for each node
having a storage count greater than 0, to look for that node in the
list of all the nodes and to check the external count (CPU cost
proportional to O(n2) if n is the number of concerned nodes).
You might consider defining an index over the set of edges that allows
a quicker look-up on the basis of the begin node. Of course there is a
price to pay for that because updates get more expensive.

-- Jan Hidders


Reply With Quote
  #45  
Old   
Jan Hidders
 
Posts: n/a

Default Re: how to suppress carefully a recursive tree - 01-23-2008 , 06:40 AM



On 22 jan, 17:10, fj <francois.j... (AT) irsn (DOT) fr> wrote:
Quote:
On 22 jan, 15:52, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:

On 22 jan, 12:04, fj <francois.j... (AT) irsn (DOT) fr> wrote:

I know how to suppress a normal tree but I meet the following kind of
situation :

I'm guessing that when you say "suppress" you mean "represent in a
database". Correct?

No : I want to destroy, remove, kill ...
Try "delete", that's a better translation of "supprimer". To suppress
is more like "reprimer", "bannir" or "inhiber".

Quote:
a part of the data (a
complete tree or just a branch), but without destroying data shared by
other trees or branches.
Ok, understood.

Quote:
r1
* * -> b1
* * * * -> b2 -> ...
* * * * -> b3 -> ...
* * -> b2
* * * * -> b1 -> ...
* * * * -> b1 -> ...
* * -> b3
* * * * -> b4

r2
* * -> b3 -> ...

So, a directed graph with multiple roots. Correct? So basically you
want to store arbitrary directed graphs.

Yes and no. The two "roots" r1 r2 *could perhaps belong to a bigger
tree.



A same node can be referenced at several places. Each node is
associated to a storage count :

r1(0) *b1(3) *b2(2) *b3(3) *b4(1) *r2(0)

Which would correspond to the number of incoming edges, yes?

Yes





In a normal tree without recursion (in the example above, recursion
occurs because b1 contains b2 and vice versa), a node is destroyed
when its count storage is equal to zero else its count storage is
simply decremented.

What algorithm should be applied ? I want for instance to cleanup r1
but, of course, r2 must remain valid (=> b3 and b4 are not destroyed
during the process and their storage count must be b3(1) b4(1)).

Notice that the deletion of of a tree must be possible even if the
count storage of the root is not equal to zero :
* *r1 -> b1 -> b2 -> r1 -> ...

It all depends a bit on how large your typical graphs are, how long on
average the simple paths, what type of operations and queries you want
to do on it and how often. *My first guess for the representation
would a simple straightforward adjacency list representation, (a
binary relation that contains all the edges) and if it's not too big
and your paths are often long it might be interesting to maintain an
extra table with the transitive closure of the graph.

The graph may be quite large. Number of vertices (nodes) : usually
100000, sometimes much more (a very big computation may lead to about
100 millions). This corresponds to a 3D meshing, each mesh (a
particular node) containing information about chemical composition (a
sub-node), temperature, fluid characteristics (another sub-node) ...

Let us precise that simple paths are always short when one excludes
recursive points (a maximum of 10 nodes).

If you go for the adjacency list approach, make sure that you do as
much as possible in one SQL statement when you start following the
edges. So look up all nodes that are reachable in one step in one
statement, update those, and store the ones that have to be deleted in
a temporary table. Then again with one statement look up those that
can be reached from in one step from the nodes in the temporary table.
Et cetera.

I don't use SQL but it does not matter :
It might. Outside the context of an SQL database my advice might
actually be counterproductive.

Quote:
I can build up easily the
list of nodes related to a particular starting point (the node "env"
in my example). I am also able to compute, for each node, its
"external count" (0 for all the nodes except b3 which has the value
1).

After that I can start to destroy really : I only go down the nodes
which have an external count equal to zero (this will protect b4 in
the example). The problem is that the algorithm is not very efficient
when the list of nodes is high, simply because I need, for each node
having a storage count greater than 0, to look for that node in the
list of all the nodes and to check the external count (CPU cost
proportional to O(n2) if n is the number of concerned nodes).
You might consider defining an index over the set of edges that allows
a quicker look-up on the basis of the begin node. Of course there is a
price to pay for that because updates get more expensive.

-- Jan Hidders


Reply With Quote
  #46  
Old   
Jan Hidders
 
Posts: n/a

Default Re: how to suppress carefully a recursive tree - 01-23-2008 , 06:40 AM



On 22 jan, 17:10, fj <francois.j... (AT) irsn (DOT) fr> wrote:
Quote:
On 22 jan, 15:52, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:

On 22 jan, 12:04, fj <francois.j... (AT) irsn (DOT) fr> wrote:

I know how to suppress a normal tree but I meet the following kind of
situation :

I'm guessing that when you say "suppress" you mean "represent in a
database". Correct?

No : I want to destroy, remove, kill ...
Try "delete", that's a better translation of "supprimer". To suppress
is more like "reprimer", "bannir" or "inhiber".

Quote:
a part of the data (a
complete tree or just a branch), but without destroying data shared by
other trees or branches.
Ok, understood.

Quote:
r1
* * -> b1
* * * * -> b2 -> ...
* * * * -> b3 -> ...
* * -> b2
* * * * -> b1 -> ...
* * * * -> b1 -> ...
* * -> b3
* * * * -> b4

r2
* * -> b3 -> ...

So, a directed graph with multiple roots. Correct? So basically you
want to store arbitrary directed graphs.

Yes and no. The two "roots" r1 r2 *could perhaps belong to a bigger
tree.



A same node can be referenced at several places. Each node is
associated to a storage count :

r1(0) *b1(3) *b2(2) *b3(3) *b4(1) *r2(0)

Which would correspond to the number of incoming edges, yes?

Yes





In a normal tree without recursion (in the example above, recursion
occurs because b1 contains b2 and vice versa), a node is destroyed
when its count storage is equal to zero else its count storage is
simply decremented.

What algorithm should be applied ? I want for instance to cleanup r1
but, of course, r2 must remain valid (=> b3 and b4 are not destroyed
during the process and their storage count must be b3(1) b4(1)).

Notice that the deletion of of a tree must be possible even if the
count storage of the root is not equal to zero :
* *r1 -> b1 -> b2 -> r1 -> ...

It all depends a bit on how large your typical graphs are, how long on
average the simple paths, what type of operations and queries you want
to do on it and how often. *My first guess for the representation
would a simple straightforward adjacency list representation, (a
binary relation that contains all the edges) and if it's not too big
and your paths are often long it might be interesting to maintain an
extra table with the transitive closure of the graph.

The graph may be quite large. Number of vertices (nodes) : usually
100000, sometimes much more (a very big computation may lead to about
100 millions). This corresponds to a 3D meshing, each mesh (a
particular node) containing information about chemical composition (a
sub-node), temperature, fluid characteristics (another sub-node) ...

Let us precise that simple paths are always short when one excludes
recursive points (a maximum of 10 nodes).

If you go for the adjacency list approach, make sure that you do as
much as possible in one SQL statement when you start following the
edges. So look up all nodes that are reachable in one step in one
statement, update those, and store the ones that have to be deleted in
a temporary table. Then again with one statement look up those that
can be reached from in one step from the nodes in the temporary table.
Et cetera.

I don't use SQL but it does not matter :
It might. Outside the context of an SQL database my advice might
actually be counterproductive.

Quote:
I can build up easily the
list of nodes related to a particular starting point (the node "env"
in my example). I am also able to compute, for each node, its
"external count" (0 for all the nodes except b3 which has the value
1).

After that I can start to destroy really : I only go down the nodes
which have an external count equal to zero (this will protect b4 in
the example). The problem is that the algorithm is not very efficient
when the list of nodes is high, simply because I need, for each node
having a storage count greater than 0, to look for that node in the
list of all the nodes and to check the external count (CPU cost
proportional to O(n2) if n is the number of concerned nodes).
You might consider defining an index over the set of edges that allows
a quicker look-up on the basis of the begin node. Of course there is a
price to pay for that because updates get more expensive.

-- Jan Hidders


Reply With Quote
  #47  
Old   
Jan Hidders
 
Posts: n/a

Default Re: how to suppress carefully a recursive tree - 01-23-2008 , 06:40 AM



On 22 jan, 17:10, fj <francois.j... (AT) irsn (DOT) fr> wrote:
Quote:
On 22 jan, 15:52, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:

On 22 jan, 12:04, fj <francois.j... (AT) irsn (DOT) fr> wrote:

I know how to suppress a normal tree but I meet the following kind of
situation :

I'm guessing that when you say "suppress" you mean "represent in a
database". Correct?

No : I want to destroy, remove, kill ...
Try "delete", that's a better translation of "supprimer". To suppress
is more like "reprimer", "bannir" or "inhiber".

Quote:
a part of the data (a
complete tree or just a branch), but without destroying data shared by
other trees or branches.
Ok, understood.

Quote:
r1
* * -> b1
* * * * -> b2 -> ...
* * * * -> b3 -> ...
* * -> b2
* * * * -> b1 -> ...
* * * * -> b1 -> ...
* * -> b3
* * * * -> b4

r2
* * -> b3 -> ...

So, a directed graph with multiple roots. Correct? So basically you
want to store arbitrary directed graphs.

Yes and no. The two "roots" r1 r2 *could perhaps belong to a bigger
tree.



A same node can be referenced at several places. Each node is
associated to a storage count :

r1(0) *b1(3) *b2(2) *b3(3) *b4(1) *r2(0)

Which would correspond to the number of incoming edges, yes?

Yes





In a normal tree without recursion (in the example above, recursion
occurs because b1 contains b2 and vice versa), a node is destroyed
when its count storage is equal to zero else its count storage is
simply decremented.

What algorithm should be applied ? I want for instance to cleanup r1
but, of course, r2 must remain valid (=> b3 and b4 are not destroyed
during the process and their storage count must be b3(1) b4(1)).

Notice that the deletion of of a tree must be possible even if the
count storage of the root is not equal to zero :
* *r1 -> b1 -> b2 -> r1 -> ...

It all depends a bit on how large your typical graphs are, how long on
average the simple paths, what type of operations and queries you want
to do on it and how often. *My first guess for the representation
would a simple straightforward adjacency list representation, (a
binary relation that contains all the edges) and if it's not too big
and your paths are often long it might be interesting to maintain an
extra table with the transitive closure of the graph.

The graph may be quite large. Number of vertices (nodes) : usually
100000, sometimes much more (a very big computation may lead to about
100 millions). This corresponds to a 3D meshing, each mesh (a
particular node) containing information about chemical composition (a
sub-node), temperature, fluid characteristics (another sub-node) ...

Let us precise that simple paths are always short when one excludes
recursive points (a maximum of 10 nodes).

If you go for the adjacency list approach, make sure that you do as
much as possible in one SQL statement when you start following the
edges. So look up all nodes that are reachable in one step in one
statement, update those, and store the ones that have to be deleted in
a temporary table. Then again with one statement look up those that
can be reached from in one step from the nodes in the temporary table.
Et cetera.

I don't use SQL but it does not matter :
It might. Outside the context of an SQL database my advice might
actually be counterproductive.

Quote:
I can build up easily the
list of nodes related to a particular starting point (the node "env"
in my example). I am also able to compute, for each node, its
"external count" (0 for all the nodes except b3 which has the value
1).

After that I can start to destroy really : I only go down the nodes
which have an external count equal to zero (this will protect b4 in
the example). The problem is that the algorithm is not very efficient
when the list of nodes is high, simply because I need, for each node
having a storage count greater than 0, to look for that node in the
list of all the nodes and to check the external count (CPU cost
proportional to O(n2) if n is the number of concerned nodes).
You might consider defining an index over the set of edges that allows
a quicker look-up on the basis of the begin node. Of course there is a
price to pay for that because updates get more expensive.

-- Jan Hidders


Reply With Quote
  #48  
Old   
Jan Hidders
 
Posts: n/a

Default Re: how to suppress carefully a recursive tree - 01-23-2008 , 06:40 AM



On 22 jan, 17:10, fj <francois.j... (AT) irsn (DOT) fr> wrote:
Quote:
On 22 jan, 15:52, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:

On 22 jan, 12:04, fj <francois.j... (AT) irsn (DOT) fr> wrote:

I know how to suppress a normal tree but I meet the following kind of
situation :

I'm guessing that when you say "suppress" you mean "represent in a
database". Correct?

No : I want to destroy, remove, kill ...
Try "delete", that's a better translation of "supprimer". To suppress
is more like "reprimer", "bannir" or "inhiber".

Quote:
a part of the data (a
complete tree or just a branch), but without destroying data shared by
other trees or branches.
Ok, understood.

Quote:
r1
* * -> b1
* * * * -> b2 -> ...
* * * * -> b3 -> ...
* * -> b2
* * * * -> b1 -> ...
* * * * -> b1 -> ...
* * -> b3
* * * * -> b4

r2
* * -> b3 -> ...

So, a directed graph with multiple roots. Correct? So basically you
want to store arbitrary directed graphs.

Yes and no. The two "roots" r1 r2 *could perhaps belong to a bigger
tree.



A same node can be referenced at several places. Each node is
associated to a storage count :

r1(0) *b1(3) *b2(2) *b3(3) *b4(1) *r2(0)

Which would correspond to the number of incoming edges, yes?

Yes





In a normal tree without recursion (in the example above, recursion
occurs because b1 contains b2 and vice versa), a node is destroyed
when its count storage is equal to zero else its count storage is
simply decremented.

What algorithm should be applied ? I want for instance to cleanup r1
but, of course, r2 must remain valid (=> b3 and b4 are not destroyed
during the process and their storage count must be b3(1) b4(1)).

Notice that the deletion of of a tree must be possible even if the
count storage of the root is not equal to zero :
* *r1 -> b1 -> b2 -> r1 -> ...

It all depends a bit on how large your typical graphs are, how long on
average the simple paths, what type of operations and queries you want
to do on it and how often. *My first guess for the representation
would a simple straightforward adjacency list representation, (a
binary relation that contains all the edges) and if it's not too big
and your paths are often long it might be interesting to maintain an
extra table with the transitive closure of the graph.

The graph may be quite large. Number of vertices (nodes) : usually
100000, sometimes much more (a very big computation may lead to about
100 millions). This corresponds to a 3D meshing, each mesh (a
particular node) containing information about chemical composition (a
sub-node), temperature, fluid characteristics (another sub-node) ...

Let us precise that simple paths are always short when one excludes
recursive points (a maximum of 10 nodes).

If you go for the adjacency list approach, make sure that you do as
much as possible in one SQL statement when you start following the
edges. So look up all nodes that are reachable in one step in one
statement, update those, and store the ones that have to be deleted in
a temporary table. Then again with one statement look up those that
can be reached from in one step from the nodes in the temporary table.
Et cetera.

I don't use SQL but it does not matter :
It might. Outside the context of an SQL database my advice might
actually be counterproductive.

Quote:
I can build up easily the
list of nodes related to a particular starting point (the node "env"
in my example). I am also able to compute, for each node, its
"external count" (0 for all the nodes except b3 which has the value
1).

After that I can start to destroy really : I only go down the nodes
which have an external count equal to zero (this will protect b4 in
the example). The problem is that the algorithm is not very efficient
when the list of nodes is high, simply because I need, for each node
having a storage count greater than 0, to look for that node in the
list of all the nodes and to check the external count (CPU cost
proportional to O(n2) if n is the number of concerned nodes).
You might consider defining an index over the set of edges that allows
a quicker look-up on the basis of the begin node. Of course there is a
price to pay for that because updates get more expensive.

-- Jan Hidders


Reply With Quote
  #49  
Old   
Jan Hidders
 
Posts: n/a

Default Re: how to suppress carefully a recursive tree - 01-23-2008 , 06:40 AM



On 22 jan, 17:10, fj <francois.j... (AT) irsn (DOT) fr> wrote:
Quote:
On 22 jan, 15:52, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:

On 22 jan, 12:04, fj <francois.j... (AT) irsn (DOT) fr> wrote:

I know how to suppress a normal tree but I meet the following kind of
situation :

I'm guessing that when you say "suppress" you mean "represent in a
database". Correct?

No : I want to destroy, remove, kill ...
Try "delete", that's a better translation of "supprimer". To suppress
is more like "reprimer", "bannir" or "inhiber".

Quote:
a part of the data (a
complete tree or just a branch), but without destroying data shared by
other trees or branches.
Ok, understood.

Quote:
r1
* * -> b1
* * * * -> b2 -> ...
* * * * -> b3 -> ...
* * -> b2
* * * * -> b1 -> ...
* * * * -> b1 -> ...
* * -> b3
* * * * -> b4

r2
* * -> b3 -> ...

So, a directed graph with multiple roots. Correct? So basically you
want to store arbitrary directed graphs.

Yes and no. The two "roots" r1 r2 *could perhaps belong to a bigger
tree.



A same node can be referenced at several places. Each node is
associated to a storage count :

r1(0) *b1(3) *b2(2) *b3(3) *b4(1) *r2(0)

Which would correspond to the number of incoming edges, yes?

Yes





In a normal tree without recursion (in the example above, recursion
occurs because b1 contains b2 and vice versa), a node is destroyed
when its count storage is equal to zero else its count storage is
simply decremented.

What algorithm should be applied ? I want for instance to cleanup r1
but, of course, r2 must remain valid (=> b3 and b4 are not destroyed
during the process and their storage count must be b3(1) b4(1)).

Notice that the deletion of of a tree must be possible even if the
count storage of the root is not equal to zero :
* *r1 -> b1 -> b2 -> r1 -> ...

It all depends a bit on how large your typical graphs are, how long on
average the simple paths, what type of operations and queries you want
to do on it and how often. *My first guess for the representation
would a simple straightforward adjacency list representation, (a
binary relation that contains all the edges) and if it's not too big
and your paths are often long it might be interesting to maintain an
extra table with the transitive closure of the graph.

The graph may be quite large. Number of vertices (nodes) : usually
100000, sometimes much more (a very big computation may lead to about
100 millions). This corresponds to a 3D meshing, each mesh (a
particular node) containing information about chemical composition (a
sub-node), temperature, fluid characteristics (another sub-node) ...

Let us precise that simple paths are always short when one excludes
recursive points (a maximum of 10 nodes).

If you go for the adjacency list approach, make sure that you do as
much as possible in one SQL statement when you start following the
edges. So look up all nodes that are reachable in one step in one
statement, update those, and store the ones that have to be deleted in
a temporary table. Then again with one statement look up those that
can be reached from in one step from the nodes in the temporary table.
Et cetera.

I don't use SQL but it does not matter :
It might. Outside the context of an SQL database my advice might
actually be counterproductive.

Quote:
I can build up easily the
list of nodes related to a particular starting point (the node "env"
in my example). I am also able to compute, for each node, its
"external count" (0 for all the nodes except b3 which has the value
1).

After that I can start to destroy really : I only go down the nodes
which have an external count equal to zero (this will protect b4 in
the example). The problem is that the algorithm is not very efficient
when the list of nodes is high, simply because I need, for each node
having a storage count greater than 0, to look for that node in the
list of all the nodes and to check the external count (CPU cost
proportional to O(n2) if n is the number of concerned nodes).
You might consider defining an index over the set of edges that allows
a quicker look-up on the basis of the begin node. Of course there is a
price to pay for that because updates get more expensive.

-- Jan Hidders


Reply With Quote
  #50  
Old   
David BL
 
Posts: n/a

Default Re: how to suppress carefully a recursive tree - 01-23-2008 , 07:32 AM



On Jan 23, 4:10 pm, fj <francois.j... (AT) irsn (DOT) fr> wrote:
Quote:
On 23 jan, 01:12, David BL <davi... (AT) iinet (DOT) net.au> wrote:





On Jan 23, 1:10 am, fj <francois.j... (AT) irsn (DOT) fr> wrote:

On 22 jan, 15:52, Jan Hidders <hidd... (AT) gmail (DOT) com> wrote:

On 22 jan, 12:04, fj <francois.j... (AT) irsn (DOT) fr> wrote:

I know how to suppress a normal tree but I meet the following kind of
situation :

I'm guessing that when you say "suppress" you mean "represent in a
database". Correct?

No : I want to destroy, remove, kill ... a part of the data (a
complete tree or just a branch), but without destroying data shared by
other trees or branches.

Would the mark and sweep algorithm suit you purposes?

No

the mark and sweep algorithm needs to know all the tree roots in order
to mark all the used objects.

In my example I indicates that the node b3 belongs to the root r2
just for information to explain the storage count. But the deletion
routine I want to write just receives "r1" as argument, nothing else.
It does not know that r2 exists too ... It is just able to detect the
existence of other objects via the storage count of b3 which is a
little bit too high for being just referenced by r1 !
For a directed edge from node n1 towards node n2, let n2 be referred
to as the "destination" node.

n1 ---->---- n2
source destination


Maybe the following approach would work:

It is assumed that node r1 is the root of the "tree" to be deleted.

Step 1: Perform a trace (in the manner of a mark phase) treating r1
as the one and only root and decrementing the storage counter on the
destination node for each edge that is traversed. ie find the set X
of nodes that are reachable from r1. Nodes that have already been
visited are flagged to ensure each reachable node is visited exactly
once. For each node during the recurse, retrieve the set of outgoing
edges yielding a set of destination nodes. Decrement each destination
node's storage counter (this must be done for all the destination
nodes - ie including the nodes that have already been visited). Then
recurse into the destination nodes that haven't been visited before.

Step 2: Find the subset R of X corresponding to nodes whose storage
counter hasn't fallen to zero.

Step 3: Run a mark and sweep using roots R and sweeping over X. For
each edge traversed in the mark phase, increment the storage counter
on the destination node.


In your example I think step 1 will find X = {r1,b1,b2,b3,b4} and only
b3 will end up with a nonzero storage counter, so step 2 yields R =
{b3}. Then in step 3, a trace from roots {b3} will bump the storage
counter of b4 back up to 1 and only {r1,b1,b2} will be deleted.

Am I making sense?



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.