
native xml processing vs what Postgres and Oracle offer

comp.databases.theory





#501
Keith H Duggar
Re: native xml processing vs what Postgres and Oracle offer - 01-05-2009, 09:54 PM

On Jan 3, 6:02 pm, Keith H Duggar <dug... (AT) alum (DOT) mit.edu> wrote:
Quote:
On Nov 10 2008, 9:17 am, salmobytes <salmoby... (AT) closenuf (DOT) org> wrote:

I'm thinking about starting a hobby project.
I wrote a files-based Bulletin Board years ago.
I'd like to convert it to a more database-like system, so
password-identified users could edit old posts.

Forums are inherently hierarchical

Discussions that evolve in forums are in fact not hierarchal.
Claims that they are arise, I believe, chiefly from a lack of
imagination and brainwashing by current interfaces.

For example, one often finds the need to respond, with one
post, to many prior posts across multiple levels in a typical
hierarchal view such as the "tree" view Google groups creates.
That is what I am doing right now. This paragraph responds to
several posts at different levels in the google tree that all
claim forums are hierarchies. However, since google provides
the capability to "reply" to but a single message I had to
choose one thus perpetuating this false structuring.

What's more, a forum post may respond to content from
other forum topics, other forums or even entirely different
sources such as articles, emails, books, television, etc.

Even more amusing is that posts can actually preemptively
respond to posts from the future! This most often happens
when ignorant or lazy or time constrained or just plain
stupid participants blurt out their two cents without having
comprehended or read or cared (respectively) about said prior
posts that already address their belched vociferous reply.

Furthermore, different parts of a single post may reply to
different subsets of prior posts, topics, forums, external,
or future sources. Likewise those parts may respond only
to parts of said sources.

Thus, often in a general and very useful sense a post does
not have a "parent" post in the narrow sense of a hierarchal
tree as some have claimed here.

To fix the design flaws of your (and most or all other)
forums I would humbly (because I am certainly not expert
enough to claim this as a very "good" set of requirements)
suggest that you aim to achieve at least the following:

Phase 1 : Basic
   For every post the ability to:
   1) refer to multiple posts (including THIS post and
      posts in other threads and forums)
   2) refer to external sources
   3) denote that a referent REPLIES to a referent

Phase 2 : Content Parts
   For arbitrary parts of posts the ability to:
   4) refer to multiple arbitrary parts of multiple posts

Phase 3 : Temporal Correction
   For arbitrary content parts the ability to:
   5) edit the content part to add or remove referents

Phase 4 : Semantic Enrichment
   6) In addition to the basic REPLIES, the ability to
      denote that a referent SUPPORTS, DISPUTES, REBUTS,
      AGREES, CLARIFIES, CALLS-UTTER-BULLSHIT, etc. a
      referent (possibly including THIS).

I think you would find that the above far more advanced forum
fits nicely into a relational model and would support more
efficient and productive discussion. For example, imagine how
much easier it would be to refute a vociferous ignoramus when
they continue to repeat the same bullshit. You can simply edit
one of your prior responses adding a CALLS-UTTER-BULLSHIT
reference to their latest post and immediately it could appear
in various forum views.
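
As a rough illustration of how Phases 1, 3 and 4 might map onto relations, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (post, post_ref, kind, target_url) and the sample rows are assumptions made up for this example; they are not part of the proposal itself.

```python
import sqlite3

# Hypothetical schema: all names are illustrative only.
SCHEMA = """
CREATE TABLE post (
    id     INTEGER PRIMARY KEY,
    author TEXT NOT NULL,
    body   TEXT NOT NULL
);
CREATE TABLE post_ref (
    source_post INTEGER NOT NULL REFERENCES post(id),
    target_post INTEGER REFERENCES post(id),  -- NULL for external sources
    target_url  TEXT,                         -- NULL for forum posts
    kind        TEXT NOT NULL DEFAULT 'REPLIES',
    -- exactly one of target_post / target_url must be set
    CHECK ((target_post IS NULL) <> (target_url IS NULL))
);
"""

def build_demo():
    """Create an in-memory database with three posts and four references."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO post VALUES (?, ?, ?)", [
        (1, "a", "Forums are hierarchical"),
        (2, "b", "I agree, forums are trees"),
        (3, "c", "One reply to both of you"),
    ])
    db.executemany("INSERT INTO post_ref VALUES (?, ?, ?, ?)", [
        (2, 1, None, "AGREES"),
        (3, 1, None, "DISPUTES"),   # post 3 refers to two posts at once
        (3, 2, None, "DISPUTES"),
        (3, None, "http://example.org/article", "REPLIES"),
    ])
    return db

def targets_of(db, post_id):
    """List (target, kind) pairs for everything a post refers to."""
    rows = db.execute(
        "SELECT COALESCE(CAST(target_post AS TEXT), target_url), kind "
        "FROM post_ref WHERE source_post = ? ORDER BY rowid", (post_id,))
    return rows.fetchall()
```

Under this sketch, adding a reference to an old post later (Phase 3) is one more INSERT into post_ref, and a post with several targets needs no special handling. References to parts of posts (Phase 2) are not modelled here.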

KHD

Since none of salmobytes, whileone, BS, etc. have any response
I can only surmise that they now realize a forum discussion is
in fact not a hierarchy. Glad I could help.

KHD


#502
rpost
Re: native xml processing vs what Postgres and Oracle offer - 01-07-2009, 11:03 AM

Keith H Duggar wrote:

Quote:
On Nov 10 2008, 9:17 am, salmobytes <salmoby... (AT) closenuf (DOT) org> wrote:
I'm thinking about starting a hobby project.
I wrote a files-based Bulletin Board years ago.
I'd like to convert it to a more database-like system, so
password-identified users could edit old posts.

Forums are inherently hierarchical

Discussions that evolve in forums are in fact not hierarchal.
Claims that they are arise, I believe, chiefly from a lack of
imagination and brainwashing by current interfaces.
I strongly doubt it.

Quote:
For example, one often finds the need to respond, with one
post, to many prior posts across multiple levels in a typical
hierarchal view such as the "tree" view Google groups creates.
Indeed, sometimes I do; but not often. Is this due to an arbitrary
restriction in the interfaces, or is it due to a more fundamental
restriction in how discussions proceed? I think the latter.
Reply to multiple postings would be more complex in character,
e.g. quoted material would now have to be marked with the originating
posting in some way and it's not clear whether they would be
sufficiently understandable to those who arrive at them having
read just one or only a few of them. Will readers be prepared to
back up all the time into threads they haven't read in order to
make sense of the exchange? Won't the result produce the 'lost
in hyperspace' problem that has caused pretty much every hypertext
and website to structure its material into a hierarchy
full of crosslinks even when there is little or no technological
support to do so? I think it will.

But you have a good point: has it even been tried?

Quote:
That is what I am doing right now. This paragraph responds to
several posts at different levels in the google tree that all
claim forums are hierarchies. However, since google provides
the capability to "reply" to but a single message I had to
choose one thus perpetuating this false structuring.
In my posting software I can arbitrarily edit the References:
header, but you're right, all the viewers I know only present
threads as trees, never as arbitrary directed acyclic graphs.
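
To make the point about the References: header concrete, here is a minimal sketch using Python's standard email parser. The article text and Message-IDs are made up for illustration. It assumes the usual convention that a tree-style viewer takes the last listed ID as the single parent, even though the header can carry several.

```python
from email import message_from_string

# Made-up article; the Message-IDs are placeholders.
RAW = (
    "Message-ID: <c@example.org>\n"
    "References: <a@example.org> <b@example.org>\n"
    "Subject: Re: forums\n"
    "\n"
    "body\n"
)

def referenced_ids(raw_article):
    """Return every Message-ID listed in the References: header, in order."""
    msg = message_from_string(raw_article)
    return msg.get("References", "").split()

def tree_parent(raw_article):
    """Return the single parent a tree-style viewer would pick (the last ID)."""
    ids = referenced_ids(raw_article)
    return ids[-1] if ids else None
```

A viewer that wanted to show a DAG could keep the whole list from referenced_ids instead of collapsing it to one parent.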

Quote:
What's more, a forum post may respond to content from
other forum topics, other forums or even entirely different
sources such as articles, emails, books, television, etc.
Cross-linking in discussion happens a lot in web-based writing
of course. E.g. blogs responding to each other, with trackback/pings
to create the forward links. This approaches what you have in mind,
I think. Yet, while blogs are full of hyperlinks, their internal
organization is nearly always linear or hierarchical. This is
not because of necessary technological limitations, but
because of limitations in their users: if blogs weren't organized this way,
postings would be much harder to find, to read and to write.
E.g. I find editing and organizing Wikis pretty difficult.

Quote:
Even more amusing is that posts can actually preemptively
respond to posts from the future! This most often happens
when ignorant or lazy or time constrained or just plain
stupid participants blurt out their two cents without having
comprehended or read or cared (respectively) about said prior
posts that already address their belched vociferous reply.
Yes, but we can't preemptively guess NNTP Message-IDs.
This is of course an implementation restriction, not a
fundamental one.

Quote:
Furthermore, different parts of a single post may reply to
different subsets of prior posts, topics, forums, external,
or future sources. Likewise those parts may respond only
to parts of said sources.
Yes, this happens all the time, and in USENET well-established
conventions exist for keeping this manageable (that I'm using here).
A strong point is that they are really simple and expressed
in plain text. Can something equally simple suffice for a
discussion environment in which multi-replying is the norm?

Quote:
Thus, often in a general and very useful sense a post does
not have a "parent" post in the narrow sense of a hierarchal
tree as some have claimed here.
No, but the question is how useful it would be for the discussion
environment to allow postings with *multiple* parents (meaning,
I suppose, that we can navigate the postings as a DAG rather than
just a tree).

Quote:
To fix the design flaws of your (and most or all other)
forums I would humbly (because I am certainly not expert
enough to claim this as a very "good" set of requirements)
suggest that you aim to achieve at least the following:

Phase 1 : Basic
For every post the ability to:
1) refer to multiple posts (including THIS post and
posts in other threads and forums)
2) refer to external sources
What does this mean, exactly? That we can follow the reference?
Just hyperlink to it, quote it or attach a copy.
That we have multiple documents open while browsing?
In my web browser I have this all the time.
That we can quickly determine a specific set of documents
that become parents when initiating a reply? This is harder.
Hypertext systems of the past supported stuff like this
but I don't know how user-friendly it is.

Quote:
3) denote that a referent REPLIES to a referent
What does this mean? That when at the referenced source we can follow
the reference backwards to arrive at the reply? This is also hard,
because the software controlling the creation of the reply doesn't
usually control how the referred sources are presented (usually
to others, and written by others). But e.g. trackbacks/pings address it.

Anything more?

Quote:
Phase 2 : Content Parts
For arbitrary parts of posts the ability to:
4) refer to multiple arbitrary parts of multiple posts
How to do this in a sufficiently useable way?

Quote:
Phase 3 : Temporal Correction
For arbitrary content parts the ability to
5) edit the content part to add or remove referents
Some forum software allows this. Replies may become invalid.
What you end up with is not a discussion forum, but a Wiki:
writing for Wikis is very different.

Quote:
Phase 4 : Semantic Enrichment
6) In addition to the basic REPLIES, the ability to
denote that a referent SUPPORTS, DISPUTES, REBUTS,
AGREES, CLARIFIES, CALLS-UTTER-BULLSHIT, etc a
referent (possibly including THIS).
The problem with this idea, as with any semantic enrichment,
is that the labels, even when users can be trained to apply them,
will rarely be accurate, unambiguous or complete.
E.g. I may agree with your premise, but disagree that it
supports your conclusion. Do I get to modify your
SUPPORTS to CALLS-INTO-QUESTION?

Quote:
I think you would find that the above far more advanced forum
fits nicely into a relational model
It doesn't make any difference.

The basic issue is the need to traverse along the discussion threads,
which relational systems aren't usually optimized for,
if they can express it at all.
Whether the relation forms trees or arbitrary DAGs
doesn't make any difference.

The resolution, I think, is to optimize this type of use,
either within the query engine or in some other way.
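
One way a query engine can express this kind of traversal is a recursive common table expression. The following is a minimal sketch with Python's built-in sqlite3; the reply(child, parent) table, the index and the sample rows are assumptions made up for this example. The same query handles trees and DAGs alike, because UNION drops a post that is reached along more than one path.

```python
import sqlite3

def demo_db():
    """Build a tiny reply relation; post 4 has two parents (2 and 3)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE reply (child INTEGER, parent INTEGER)")
    db.execute("CREATE INDEX reply_parent ON reply(parent)")
    db.executemany("INSERT INTO reply VALUES (?, ?)",
                   [(2, 1), (3, 1), (4, 2), (4, 3), (5, 4), (6, 7)])
    return db

def descendants(db, root_id):
    """Return ids of all posts replying, directly or transitively, to root_id."""
    rows = db.execute(
        """
        WITH RECURSIVE thread(id) AS (
            SELECT ?
            UNION
            SELECT r.child FROM reply r JOIN thread t ON r.parent = t.id
        )
        SELECT id FROM thread WHERE id <> ? ORDER BY id
        """, (root_id, root_id))
    return [row[0] for row in rows]
```

Whether this is fast enough for a large forum depends on the engine and its indexes; the sketch shows only that the traversal can be expressed.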

Quote:
and would support more
efficient and productive discussion. For example, imagine how
much easier it would be to refute a vociferous ignoramus when
they continue to repeat the same bullshit. You can simply edit
one of your prior responses adding a CALLS-UTTER-BULLSHIT
reference to their latest post and immediately it could appear
in various forum views.
You can; but will you? And where do you stop?
E.g. why not label with specific logical fallacies? (STRAW-MAN,
AD-NAUSEAM, BEGS-QUESTION). I'll tell you: the labelers won't agree
on when to use which labels.

Quote:
KHD
--
Reinier


#503
rpost
Re: native xml processing vs what Postgres and Oracle offer - 01-07-2009, 12:40 PM

salmobytes wrote:

[...]

Quote:
Hierarchies are part of the real world. They just don't fit well into
the relational scheme of things.
This is a broad statement. What is needed as far as I can see is
efficient traversal of relations (i.e. arbitrarily wide, very selective
joins). This can be supported, even if many existing RDBMSes don't.

Quote:
With XML querying hierarchies is a snap.
So if you have a hierarchical problem, XML is a better technology.
Not so fast.

XML itself is just a standard for serializing labelled trees. In my
experience, most of my "trees" are really arbitrary graphs (relations),
and while XML supports crosslinks as well, XML definition and manipulation
languages tend not to support their traversal well.
For a forum this need not be an issue.

But XML also assumes that all data is stored as documents and processed
by operating on documents one at a time. USENET does the same thing,
but it really isn't very practical.

Some issues from a database perspective: What if we want to query
or manipulate across the whole collection? Why do we always have
to parse documents whenever we want to use the data they contain?
Why do I, when writing queries or transformations on my data
(e.g. with XPath, XQuery or XSLT) or a schema definition (XML Schema - please),
always have to concern myself with stupid serialization and document
management issues such as the consistent use of file names and URLs,
file system limitations, character encodings, etcetera?

Not to mention that XPath, XSLT and XQuery are still pretty hideous
languages, both syntactically and semantically, although they have much
improved. Try representing an arbitrary relation (graph) in XML, then
writing, say, an XPath expression to compute its connected components.
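
For comparison, here is a minimal sketch of what that exercise tends to look like in practice. The graph is serialized as XML, but the connected components are computed in the host language (Python, with a small union-find), because XPath 1.0 has no general recursion to do it in one expression. The element and attribute names (graph, edge, node, from, to, id) are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Made-up serialization of a graph: edges plus one isolated node.
GRAPH_XML = """
<graph>
  <edge from="a" to="b"/>
  <edge from="b" to="c"/>
  <edge from="d" to="e"/>
  <node id="f"/>
</graph>
"""

def connected_components(xml_text):
    """Parse the XML graph and return its components as sorted lists."""
    root = ET.fromstring(xml_text)
    parent = {}  # union-find forest: vertex -> parent vertex

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for node in root.iter("node"):
        find(node.get("id"))               # register isolated vertices
    for edge in root.iter("edge"):
        a, b = find(edge.get("from")), find(edge.get("to"))
        if a != b:
            parent[a] = b                  # merge the two components

    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(g) for g in groups.values())
```

Here the XML layer only handles parsing; the actual graph work happens outside the XML query language.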

Not such a good match for a discussion forum, if you ask me.
XPath queries may be expressive enough, but what about speed?
Do you want to represent the whole forum contents as a single XML
document that is updated whenever some posting or edit is performed?
Or are you thinking of some solution that keeps the whole thing in memory
in parsed form? How to make it scale?

Quote:
Someone referred to XML as messy technology that couldn't be optimised.
But SleepyCat and XPath is faster than any relational system running any
one of the ugly, complex and slow-as-molasses "relational solutions" to
the hierarchical problem.
I'm not familiar with this, but does it work well for a big discussion forum?
How sophisticated is the querying you allow?

Quote:
For some problems you don't need a database at all: grep or perhaps
lucene or HyperEstaier are all that's needed.

For some problems XML is the best choice, particularly if the data
is naturally hierarchical.
... and consists of small enough bits (documents) that don't need to be
queried or manipulated collectively.

Quote:
For other problems--particularly for *large* data problems--relational
systems are the best choice....but almost never when hierarchies are
involved.
I think this is far too strong a statement.

--
Reinier

