dbTalk Databases Forums  

trees, trees, trees

comp.databases.oracle.misc comp.databases.oracle.misc


Discuss trees, trees, trees in the comp.databases.oracle.misc forum.



Reply
 
Thread Tools Display Modes
  #11  
Old   
Gerard H. Pille
 
Posts: n/a

Default Re: trees, trees, trees - 06-24-2012 , 11:56 AM






geos wrote:
Quote:
Gerard H. Pille wrote:
I've been keeping both my eyes on performance in programming for over 35 years now. Of course,
I can only do as much as your design allows. Given the correct indexing, tree walks can be
performant enough. You wouldn't want me to write the select statement for you, would you?

sure I would. why wouln't ask for more explicit help when struggling with the problem for some
time? writing a statement that can be analyzed is great help too, especially a query from
someone with such big experience compared to mine. the information about hierarchical queries
was obvious from the beginning.

anyway, if you would like to help here is a complete script that fills table with sample data
and expected result:

http://geos2005.republika.pl/table.sql
http://geos2005.republika.pl/result.jpg

GRP column in the table is just helpful for spotting a groups of records but it can not be used
in a query. generally the entries in the table are edges of a graph, can be many nodes in a
graph, can be connected in any way between each other allowing loops.

thank you,
geos

I'm not sure I'll be able to help you. I did understand your original requirements, but now
you've lost me. I do not know what "edges of a graph" are. What are columns p and c in your
table? What do they contain?

Reply With Quote
  #12  
Old   
Peter Schneider
 
Posts: n/a

Default Re: trees, trees, trees - 06-24-2012 , 05:13 PM






Am 22.06.2012 02:03, schrieb geos:
Quote:
Peter Schneider wrote:
Are you then, in turn, going to share part of your pay check with Gerard? ;-)

do you think this is the only criteria people help other people? but yes, I
would consider that if I knew more details.

That is not a database table. It has no primary key, no constraints, no
indexes... It is just a dumped Excel table. Any query against it will always
do full table scans.

it doesn't have to have, because there is no constraints nor primary key. but
I strongly disagree. Each table in a relational database has to have at least
a primary key (and it should better be one with inherent meaning, not just a
sequence). Otherwise it is not a relational table, but just a data dump
carrying no meaning whatsoever.

Please consider: if you don't have a PK, then not even the uniqueness of a
complete record is guarenteed, records can just be duplicated as often as
coincedence, bad luck or program bugs do like to make it happen. And now think
further what happens if you join such a "table" to other table in complex
queries, e.g. in aggregation queries computing sums and such. Get the point?

You should better rethink you design, or you will run into big data
consistency problems later. Trust me, I've seen such shit happen with bad design.

And if you have a tree model, you should be able to answer these question:

- can an entity instance which occurs as a child record in the tree be
uniquely identified only because of its position in the tree hierarchy or is
in some way uniquey by itself?

- is it allowed to have multiple instances of the same child element to appear
at the same level (as children of the same parent) or is this disallowed?

- is it allowed to have multiple roots in the tree, or is only one root allowed?

All these findings need to be enforced by constraints, otherwise your tree
data is going to violate the rules at some point in time or the other.

If you haven't thought yet about such kind of questions, my guess is that you
probably haven't thought deep enough about the entities in real life that your
DB is trying to model.

Regards
Peter

--
Climb the mountain not to plant your flag, but to embrace the challenge,
enjoy the air and behold the view. Climb it so you can see the world,
not so the world can see you. -- David McCullough Jr.

Reply With Quote
  #13  
Old   
geos
 
Posts: n/a

Default Re: trees, trees, trees - 06-24-2012 , 06:26 PM



Gerard H. Pille wrote:
Quote:
I'm not sure I'll be able to help you. I did understand your original
requirements, but now you've lost me. I do not know what "edges of a
graph" are. What are columns p and c in your table? What do they contain?
what is not clear to you? I thought everything was said and shown.

write down a few dots. connect them with arrows any way you like. that's
a graph. an arrow corresponds to edge. an arrow introduces orientation
aka parent-child relation. assign a letter to every dot. now you can
describe the edge following the arrow, like FROM a TO b. the table
contain edges, ie. entries FROM a TO b, FROM 7 TO 11 etc. the task is to
identify all graphs and name them. the naming convention is that you
pick any "dot name" composing the graph and assign it to graph. you can
identify a graph only by identifying all edges belonging to it. a
grouping column was introduced for debugging purposes only. it helps you
to easily spot edges which form a graph so that you can compare the
output of your algorithm at a glance.

thank you,
geos

Reply With Quote
  #14  
Old   
geos
 
Posts: n/a

Default Re: trees, trees, trees - 06-24-2012 , 06:46 PM



Peter Schneider wrote:
Quote:
I strongly disagree. Each table in a relational database has to have at
least a primary key [...]
[...]If you haven't thought yet about such kind of questions, my guess is
that you probably haven't thought deep enough about the entities in real
life that your DB is trying to model.
you said many interesting things of which no one makes it closer to
solve the problem. and it's not the point of my post either.

imagine there is a table, that you're not an owner, that you're not
designer, that you don't even intend to model anything etc. you're just
a guy who likes solving puzzles. you were given this table. it has some
structure. it's a good structure, it has some business meaning for
someone (remember: you're not designer). and imagine that there is a
quite a prize for just solving the puzzle. not for asking questions and
bringing up other interesting, but unrelated subjects (they give minus
points for this and call IRS).

so, can you solve the problem with pure SQL? if there is anything that
needs to be explained about the problem please ask, I will explain in
more datails. but I think that the data set I presented along with the
expected results are self-explanatory.

thank you,
geos

Reply With Quote
  #15  
Old   
ddf
 
Posts: n/a

Default Re: trees, trees, trees - 06-25-2012 , 03:54 PM



On Jun 24, 5:46*pm, geos <g... (AT) nowhere (DOT) invalid> wrote:
Quote:
so, can you solve the problem with pure SQL?

thank you,
geos
Can you, and have you tried since posting this originally?


David Fitzjarrell

Reply With Quote
  #16  
Old   
geos
 
Posts: n/a

Default Re: trees, trees, trees - 06-25-2012 , 04:25 PM



ddf wrote:
Quote:
On Jun 24, 5:46 pm, geos <g... (AT) nowhere (DOT) invalid> wrote:
so, can you solve the problem with pure SQL?

thank you,
geos

Can you, and have you tried since posting this originally?

David Fitzjarrell
yes, I have the solution already. is is quite simple. it works with
client data but I'm not 100% sure it's doesn't have any weak points
though. I am still testing it. I will post the code after I clean it up
along with some request for suggestions about improving its performance
(though I'm not convinced if this makes any sense here).

for short: I used sys_connect_by_path on pre-prepared set of original
data, regexp_replace to remove duplicate nodes (it improves
performance), and xmlcast for splitting the paths (instead of level/dual
trick, it took too long).

can you solve this problem with pure sql? do you know some equivalent
non-hierarchical approach that works?

thank you,
geos

Reply With Quote
  #17  
Old   
geos
 
Posts: n/a

Default Re: trees, trees, trees - 06-25-2012 , 05:00 PM



ddf wrote:
Quote:
Can you, and have you tried since posting this originally?

David Fitzjarrell
ok, here is my solution. as I said, it works on the client data, it
works on the data provided here as an example, but I'm no 100% sure if
this is correct approach and doesn't have any flaws. I would appreciate
if you spot such flaws or share some data for which it doesn't work so
that I could analyze it and improve the code.

you can see some xml usage. this is because I found it has better
performance in 11g than "level dual splitting comma separated string
into rows in oracle" approach.

what I really don't like in this code is its overbloated
replace/regexp_replace section (it's ugly!). I am not good at this, and
basically I wanted to have a pattern which finds the longest concecutive
repetetive substring in a string. for example:

1,2,2,4: 2 is the longest, it repeats twice
1,2,3,2,3,2,3,4: 2,3 is the longest, it repeats three times

or -- better! -- to replace any duplicate substrings with its one instance.

please treat this code as beta-code, an entry point for any discussion
(about it ). I would appreciate any performance improvments hints,
suggestions or ideas from people who have more in depth
theoretical/practical insight in the subject.

http://geos2005.republika.pl/betacode.sql

please share some ideas concepts thoughts about it.

thank you,
geos

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.