XML storing and management

comp.databases.theory

#31
Bob Badour
Re: XML storing and management - 09-27-2007, 07:58 PM






JOG wrote:

Quote:
On Sep 28, 1:31 am, "V.J. Kumar" <vjkm... (AT) gmail (DOT) com> wrote:

Jan Hidders <hidd... (AT) gmail (DOT) com> wrote in news:1190926317.715110.61110 (AT) g4g2000hsf (DOT) googlegroups.com:

...

There is an essay written [by Codd] called "A Relational Model of
Data for Large Shared Data Banks". "Sapienti sat !" as they say in
Sanskrit.

They do? That's funny. We say that in Latin. :-)

Yeah, Latin is just a Hindi-European dialect, Sanskrit simplified for
Europeans' use, as Sir William Jones discovered to his utter amazement
more than 200 years ago.

According to wikipedia, Sir William Jones believed that Sanskrit and
Latin had a common ancestor, not that one was derived from the other.

I guess it depends on how valid you deem the source. But then I also
like to believe wikipedia was written by one jolly person, without
much of a life, but an unending supply of donuts. It's like a safety
blanket for me.

Trisyllabic laxing. It's a feature of languages from that same root
language: Latin, Greek, Sanskrit, English, German, French, etc.


#32
djmcmahon@gmail.com
Re: XML storing and management - 10-04-2007, 04:58 PM






On Sep 26, 5:16 am, karsoods53 <karsood... (AT) gmail (DOT) com> wrote:
Quote:
I get XML feeds as input and have to store this data on our server. I
have worked with databases but am new to XML. Can someone tell me how I
can store and manage this data?

Depends on what you need to do with it. If all you want to do is
store messages from the feed, you could just shove the XML into a LOB
column along with other columns for the date/time received, etc. No
different than if you were receiving, say, jpeg images.
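
A minimal sketch of the "shove it into a LOB column" approach, assuming SQLite (through Python's sqlite3 module) as a stand-in for whatever database is actually in use. The table and column names (feed_message, received_at, xml_body) are hypothetical and not from the original post.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path=":memory:"):
    """Create (if needed) a table that holds each feed message as an opaque blob."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS feed_message (
            msg_id      INTEGER PRIMARY KEY,
            received_at TEXT NOT NULL,   -- ISO-8601 UTC timestamp of receipt
            xml_body    BLOB NOT NULL    -- the message bytes, stored untouched
        )
        """
    )
    return conn


def store_message(conn, xml_bytes):
    """Insert one raw message and return its generated msg_id."""
    received_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO feed_message (received_at, xml_body) VALUES (?, ?)",
        (received_at, xml_bytes),
    )
    conn.commit()
    return cur.lastrowid


conn = open_store()
msg_id = store_message(conn, b"<item><title>Hello</title></item>")
stored = conn.execute(
    "SELECT xml_body FROM feed_message WHERE msg_id = ?", (msg_id,)
).fetchone()[0]
```

Nothing here parses the XML; the database sees only bytes plus a timestamp, exactly as it would for a jpeg.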

A step up from that would be to extract relevant single-valued scalars
as part of intake processing in your host programming language (e.g.
Java) and stick them in queryable, indexable columns. For example, if
every XML message had exactly one <title> element in it, you could
extract that as you were receiving a message from the feeds, then put
it into a TITLE column as well as putting the message into a LOB
column. Again no different than, say, extracting the horizontal and
vertical resolution from a jpeg image and storing them in columns
along with the jpeg itself.
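
The title-extraction step might look roughly like this, using Python rather than Java as the host language and SQLite as a stand-in database. The schema, the index name, and the helper names (extract_title, store_message) are assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def extract_title(xml_text):
    """Return the text of the first <title> descendant, or None if there is none."""
    root = ET.fromstring(xml_text)
    return root.findtext(".//title")


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE feed_message (
        msg_id   INTEGER PRIMARY KEY,
        title    TEXT,            -- scalar pulled out at intake time
        xml_body TEXT NOT NULL    -- the whole message, kept as-is
    )
    """
)
# An ordinary index on the extracted column makes title lookups cheap.
conn.execute("CREATE INDEX idx_feed_message_title ON feed_message (title)")


def store_message(conn, xml_text):
    """Extract the title during intake, then store it next to the full message."""
    cur = conn.execute(
        "INSERT INTO feed_message (title, xml_body) VALUES (?, ?)",
        (extract_title(xml_text), xml_text),
    )
    conn.commit()
    return cur.lastrowid


store_message(conn, "<item><title>Rates rise</title><body>...</body></item>")
store_message(conn, "<item><title>Markets fall</title><body>...</body></item>")
rows = conn.execute(
    "SELECT msg_id FROM feed_message WHERE title = ?", ("Markets fall",)
).fetchall()
```

The query at the end never touches the XML; it only uses the TITLE column populated during intake.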

If you need to be able to run queries over the contents of the XML,
and especially if you want the query result sets to yield fragments of
the XML messages instead of whole ones, you have a lot more work to
do.

Databases such as Oracle's allow you to create XML typed columns which
still store the XML in a LOB but let you dig out the content in
queries, index it, etc. If you have XML schemas for your feeds, you
could even go so far as to specify decompositions of the XML into
relational tables. These databases also let you query by XML
expressions, extract subsets of the XML, etc.

(Trouble arises when you need to do this with multi-valued elements,
or elements embedded within the hierarchy where multi-valued ancestors
might exist.)

FWIW I prefer to do as little as possible with the XML while it's in
the database; it's easier to think of it as an opaque object similar
to e.g. a jpeg image. (Even a jpeg file has an internal structure,
containing information that might be useful in queries, e.g. the
resolution, bit depth, whatever.) It's usually not too hard,
especially with XML, to extract whatever you think you'll need to
support searches and such, and put that into real columns as part of
the intake processing in a host language. You can then do any XML/
XQuery style filtering of the internal content of the XML objects as
part of host-language post-processing, after using SQL in the usual
way to find objects of possible interest. This might or might not fit
your application.
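
A sketch of that two-stage pattern, assuming Python as the host language and SQLite as the database: SQL narrows the candidates using a real column, and the XML is inspected only afterwards in host code. The message shapes (<order>, <line sku=...>) and the function name find_orders_with_sku are invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feed_message ("
    " msg_id INTEGER PRIMARY KEY,"
    " title TEXT,"
    " xml_body TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO feed_message (title, xml_body) VALUES (?, ?)",
    [
        ("Order 1", "<order><title>Order 1</title><line sku='A'/><line sku='B'/></order>"),
        ("Order 2", "<order><title>Order 2</title><line sku='C'/></order>"),
        ("Note", "<note><title>Note</title></note>"),
    ],
)


def find_orders_with_sku(conn, sku):
    """Return ids of order messages that contain a <line> with the given sku."""
    # Stage 1: plain SQL over an extracted column finds objects of possible interest.
    candidates = conn.execute(
        "SELECT msg_id, xml_body FROM feed_message "
        "WHERE title LIKE 'Order%' ORDER BY msg_id"
    )
    hits = []
    for msg_id, xml_body in candidates:
        # Stage 2: host-language filtering on the internal, multi-valued content.
        root = ET.fromstring(xml_body)
        if any(line.get("sku") == sku for line in root.iterfind("line")):
            hits.append(msg_id)
    return hits
```

Whether this fits depends on how selective the SQL stage is; if it returns most of the table, every query ends up parsing most of the stored XML.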

If you absolutely must be able to drive queries from internal multi-
valued elements, path expressions, etc., then you'll need help from
the database in the form of XML extensions such as Oracle's, or if
none are available, you'll have to do the relevant decomposition of
the XML yourself in the host language and then populate your own
"index tables". For example, if your input XML contains <address>
elements, and possibly more than one such per message, and you think
it's important to index the <zipcode> element underneath <address> to
support searches for messages based on zip code, you'll need to have
one table with {MSG_ID, MSG_XML_CONTENT} and another table with
{ADDRESS_ZIP_CODE, MSG_ID} that you will have to populate as part of
intake processing.
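
A sketch of the do-it-yourself index table for the zip code example, again assuming Python and SQLite. The table names msg and msg_zip_index are hypothetical; the columns follow the {MSG_ID, MSG_XML_CONTENT} and {ADDRESS_ZIP_CODE, MSG_ID} layout described above.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE msg (
        msg_id          INTEGER PRIMARY KEY,
        msg_xml_content TEXT NOT NULL
    );
    CREATE TABLE msg_zip_index (
        address_zip_code TEXT NOT NULL,
        msg_id           INTEGER NOT NULL REFERENCES msg (msg_id),
        PRIMARY KEY (address_zip_code, msg_id)
    );
    """
)


def store_message(conn, xml_text):
    """Store the message and one index row per distinct zip code found in it."""
    root = ET.fromstring(xml_text)
    cur = conn.execute("INSERT INTO msg (msg_xml_content) VALUES (?)", (xml_text,))
    msg_id = cur.lastrowid
    # A message may carry several <address> elements, hence a set of zip codes.
    zips = {z.text.strip() for z in root.iterfind(".//address/zipcode") if z.text}
    conn.executemany(
        "INSERT INTO msg_zip_index (address_zip_code, msg_id) VALUES (?, ?)",
        [(z, msg_id) for z in sorted(zips)],
    )
    conn.commit()
    return msg_id


def messages_for_zip(conn, zip_code):
    """Return ids of all messages that mention the given zip code."""
    rows = conn.execute(
        "SELECT m.msg_id FROM msg AS m "
        "JOIN msg_zip_index AS z ON z.msg_id = m.msg_id "
        "WHERE z.address_zip_code = ? ORDER BY m.msg_id",
        (zip_code,),
    )
    return [row[0] for row in rows]


store_message(
    conn,
    "<msg><address><zipcode>10001</zipcode></address>"
    "<address><zipcode>94105</zipcode></address></msg>",
)
store_message(conn, "<msg><address><zipcode>94105</zipcode></address></msg>")
```

The index table is the multi-valued counterpart of the single TITLE column: one row per (zip code, message) pair instead of one value per message.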



#34
Bob Badour
Re: XML storing and management - 11-11-2007, 10:44 AM



Jan Hidders wrote:
Quote:
On 27 sep, 22:52, Bob Badour <bbad... (AT) pei (DOT) sympatico.ca> wrote:

Jan Hidders wrote:

On 27 sep, 19:07, Bob Badour <bbad... (AT) pei (DOT) sympatico.ca> wrote:

Jan Hidders wrote:

On 27 sep, 16:27, Bob Badour <bbad... (AT) pei (DOT) sympatico.ca> wrote:

Jan Hidders wrote:

On 27 sep, 02:19, JOG <j... (AT) cs (DOT) nott.ac.uk> wrote:

Ok, so why is it exactly, cdt, that despite the inherent flaws of a
hierarchical model such as XML, it has seen such widespread uptake?

It's all hype, of course.

Btw., what fundamental flaws?

Well, let's see... How about we start with: "The inability to re-order
the data without changing meaning and without destroying information." ?

"Hierarchical models such as XML" are not necessarily ordered-only
data models. In fact most proposals for semistructured data models
before XML weren't.

But even in XML this is not a big problem. Whether reordering destroys
information or not depends on your interpretation of the data. If you
send me an XML document and in addition tell me that certain parts
represent sets then I can reorder them without destroying any
information. The fact that XML is an ordered data model only implies
that it *might* destroy information, not that it *must*.

That's a nit. If one cannot always safely reorder, then one cannot
safely reorder.

I thought we were having a serious discussion, not playing trivial
word games. My mistake.

I am having a serious discussion. I am not the one picking at nits.

I disagree. I think you are.

Suppose I said my pickup truck is no good for digging trenches. Your
position amounts to saying: "If you hooked up a hydraulic system and
welded a back-hoe on the back, it would dig trenches just fine."

One simply cannot re-order an arbitrary XML document without destroying
information. One can re-order any relation without ever destroying
information.
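
A small illustration of the distinction being argued here, offered as a sketch rather than as either poster's own example. It models a relation as a Python set of tuples and an XML document with ElementTree; the data (names, recipe steps) is invented. It shows only that child order is observable in XML and row order is not in a relation; it does not settle whether a given reordering matters under a particular interpretation.

```python
import random
import xml.etree.ElementTree as ET

# A relation is a set of tuples: the order in which rows arrive is not part of it.
rows = [("alice", 30), ("bob", 25), ("carol", 41)]
shuffled = rows[:]
random.Random(0).shuffle(shuffled)
relation_unchanged = set(rows) == set(shuffled)

# In an XML document, the order of child elements is part of the document itself.
doc = ET.fromstring(
    "<recipe><step>boil water</step><step>add pasta</step></recipe>"
)
before = [step.text for step in doc]
doc[:] = list(doc)[::-1]  # reverse the children in place
after = [step.text for step in doc]

# Same elements, different sequence: a reader who relies on order sees a change.
same_elements = sorted(before) == sorted(after)
order_changed = before != after
```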


Quote:
What your position boils down to is: XML is needlessly complex.

Not really. What I said is that concerning the aspect we were
discussing it is actually missing a construct. So my position is more
accurately described as that it is "too simple", not "too complex".

Actually, it is both, but I will accept the above correction.


Quote:
As a
result of the needless complexity, one cannot re-order the data without
changing meaning and without destroying information.

That is too imprecise to be correct. You can in some sense always
reorder if you want to. What I said is that whether this loses
information or not is a matter of interpretation.

So, if I reorder all of the children of several nodes immediately after
just one of those nodes, it's only a matter of interpretation whether
that changed the meaning?!? You are joking, right?


Quote:
Note by the way that this is also true for the Relational Model: you
cannot always arbitrarily permute the atomic values in a relation
without risking changing its meaning. Also there it is a matter of
interpretation whether this is actually a problem or not.

Could you provide an example?


Quote:
BUT if we add even
more complexity, we can sometimes re-order data. Sometimes.

Yes, when it is appropriate, which is not always.

So, it's too complex but adding complexity will sometimes but not always
correct the problem. Sounds wonderful.

