Re: trees, trees, trees - 06-24-2012 , 11:56 AM
you've lost me. I do not know what "edges of a graph" are. What are columns p and c in your
table? What do they contain?
Re: trees, trees, trees - 06-24-2012 , 05:13 PM
Am 22.06.2012 02:03, schrieb geos:
a primary key (and it should better be one with inherent meaning, not just a
sequence). Otherwise it is not a relational table, but just a data dump
carrying no meaning whatsoever.
Please consider: if you don't have a PK, then not even the uniqueness of a
complete record is guarenteed, records can just be duplicated as often as
coincedence, bad luck or program bugs do like to make it happen. And now think
further what happens if you join such a "table" to other table in complex
queries, e.g. in aggregation queries computing sums and such. Get the point?
You should better rethink you design, or you will run into big data
consistency problems later. Trust me, I've seen such shit happen with bad design.
And if you have a tree model, you should be able to answer these question:
- can an entity instance which occurs as a child record in the tree be
uniquely identified only because of its position in the tree hierarchy or is
in some way uniquey by itself?
- is it allowed to have multiple instances of the same child element to appear
at the same level (as children of the same parent) or is this disallowed?
- is it allowed to have multiple roots in the tree, or is only one root allowed?
All these findings need to be enforced by constraints, otherwise your tree
data is going to violate the rules at some point in time or the other.
If you haven't thought yet about such kind of questions, my guess is that you
probably haven't thought deep enough about the entities in real life that your
DB is trying to model.
Climb the mountain not to plant your flag, but to embrace the challenge,
enjoy the air and behold the view. Climb it so you can see the world,
not so the world can see you. -- David McCullough Jr.
Re: trees, trees, trees - 06-24-2012 , 06:26 PM
Gerard H. Pille wrote:
write down a few dots. connect them with arrows any way you like. that's
a graph. an arrow corresponds to edge. an arrow introduces orientation
aka parent-child relation. assign a letter to every dot. now you can
describe the edge following the arrow, like FROM a TO b. the table
contain edges, ie. entries FROM a TO b, FROM 7 TO 11 etc. the task is to
identify all graphs and name them. the naming convention is that you
pick any "dot name" composing the graph and assign it to graph. you can
identify a graph only by identifying all edges belonging to it. a
grouping column was introduced for debugging purposes only. it helps you
to easily spot edges which form a graph so that you can compare the
output of your algorithm at a glance.
Re: trees, trees, trees - 06-24-2012 , 06:46 PM
Peter Schneider wrote:
solve the problem. and it's not the point of my post either.
imagine there is a table, that you're not an owner, that you're not
designer, that you don't even intend to model anything etc. you're just
a guy who likes solving puzzles. you were given this table. it has some
structure. it's a good structure, it has some business meaning for
someone (remember: you're not designer). and imagine that there is a
quite a prize for just solving the puzzle. not for asking questions and
bringing up other interesting, but unrelated subjects (they give minus
points for this and call IRS).
so, can you solve the problem with pure SQL? if there is anything that
needs to be explained about the problem please ask, I will explain in
more datails. but I think that the data set I presented along with the
expected results are self-explanatory.
Re: trees, trees, trees - 06-25-2012 , 03:54 PM
On Jun 24, 5:46*pm, geos <g... (AT) nowhere (DOT) invalid> wrote:
Re: trees, trees, trees - 06-25-2012 , 04:25 PM
client data but I'm not 100% sure it's doesn't have any weak points
though. I am still testing it. I will post the code after I clean it up
along with some request for suggestions about improving its performance
(though I'm not convinced if this makes any sense here).
for short: I used sys_connect_by_path on pre-prepared set of original
data, regexp_replace to remove duplicate nodes (it improves
performance), and xmlcast for splitting the paths (instead of level/dual
trick, it took too long).
can you solve this problem with pure sql? do you know some equivalent
non-hierarchical approach that works?
Re: trees, trees, trees - 06-25-2012 , 05:00 PM
works on the data provided here as an example, but I'm no 100% sure if
this is correct approach and doesn't have any flaws. I would appreciate
if you spot such flaws or share some data for which it doesn't work so
that I could analyze it and improve the code.
you can see some xml usage. this is because I found it has better
performance in 11g than "level dual splitting comma separated string
into rows in oracle" approach.
what I really don't like in this code is its overbloated
replace/regexp_replace section (it's ugly!). I am not good at this, and
basically I wanted to have a pattern which finds the longest concecutive
repetetive substring in a string. for example:
1,2,2,4: 2 is the longest, it repeats twice
1,2,3,2,3,2,3,4: 2,3 is the longest, it repeats three times
or -- better! -- to replace any duplicate substrings with its one instance.
please treat this code as beta-code, an entry point for any discussion
(about it ). I would appreciate any performance improvments hints,
suggestions or ideas from people who have more in depth
theoretical/practical insight in the subject.
please share some ideas concepts thoughts about it.