dbTalk Databases Forums  

Xml shredding performance

microsoft.public.sqlserver.xml


  #1  
Johnny Persson
Xml shredding performance - 03-06-2010, 09:34 AM

Hi,

We are having performance problems with XML shredding.

We currently extract data from XML files supplied by nearly 60 different
companies, which means nearly 60 different XML structures. The total
volume is about 350 MB, and we want to extract the data as fast as
possible.

Our current system extracts, transforms and loads the data in about five
minutes. We would like to bring that down to about one minute.

We use the nodes()/CROSS APPLY technique to shred the XML into our
internal format.

This is how we shred the data.
------------------------------

1) Load the XML into a temporary table (#XmlTable)
2) Create an XML index on it
3) Run a query like the one below

-- 'asasd', 'asdadd' and 'asd' are placeholders for the real XQuery
-- paths and SQL types.
INSERT INTO #TransformedData
SELECT
    T0.T.value('asasd', 'asdadd'),
    T1.T.value('asasd', 'asdadd')
FROM
    #XmlTable
CROSS APPLY
    data.nodes('asd') AS T0(T)
CROSS APPLY
    T0.T.nodes('level1') AS T1(T)

DROP TABLE #XmlTable

4) Pass the temporary table #TransformedData to the common/shared
transformation procedure

EXEC LookupData

-------------------------------

This is very I/O intensive, which makes the system slow. Are there
better ways to parse XML inside SQL Server? Or should we move the
shredding outside SQL Server, for instance into a C# method that bulk
loads the data?

Regards,
Johnny

  #2  
Bob
RE: Xml shredding performance - 03-08-2010, 04:03 AM

Have a look through this article:

Examples of Bulk Importing and Exporting XML Documents
http://msdn.microsoft.com/en-us/library/ms191184.aspx


You should probably compare OPENXML performance with your nodes() / CROSS
APPLY method. I much prefer nodes() with CROSS APPLY, but some people
report that OPENXML is faster, particularly with larger documents. Try it
with your data!

SQLXML Bulk Load is fast for loading data if you can get it working, but
not for transforming it. It can be called from C#.
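
The linked article covers loading whole XML files with OPENROWSET(BULK ...). A minimal sketch of that load step follows. The file path is hypothetical, and the staging table only mirrors the #XmlTable and data column mentioned earlier in the thread; the Id column is an assumption.

```sql
-- Hypothetical staging table; only the names #XmlTable and data come from
-- the thread, the Id column is assumed for illustration.
CREATE TABLE #XmlTable
(
    Id   int IDENTITY(1,1) PRIMARY KEY,
    data xml NOT NULL
);

-- SINGLE_BLOB reads the file as varbinary(max), so the XML parser can use
-- the encoding declared in the document itself.
-- The path below is a made-up example.
INSERT INTO #XmlTable (data)
SELECT CONVERT(xml, BulkColumn)
FROM OPENROWSET(BULK 'C:\import\company01.xml', SINGLE_BLOB) AS src;
```

This only gets the document into the table; the shredding itself still has to be done by nodes(), OPENXML or another method.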

Let us know how you get on!

  #3  
Johnny Persson
Re: Xml shredding performance - 03-08-2010, 08:30 AM

Hi Bob,

Thank you for your answer!

I am a bit doubtful about the OPENXML idea. My main issue with OPENXML
is that the XML has to be stored in a variable, which means you cannot
put an index on it. In my limited experience, OPENXML rarely beats the
nodes()/CROSS APPLY method.
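
For reference, a minimal sketch of the OPENXML pattern under discussion. The XML is held in a variable, parsed into an in-memory tree by sp_xml_preparedocument, and queried through the returned handle. The sample document, element names and column names are made up for illustration. The handle is released with sp_xml_removedocument on both the success and the error path, since a parsed document that is never removed stays in memory.

```sql
-- Sample document and all element/column names are illustrative only.
DECLARE @doc xml = N'<orders><order id="1"><amount>10.50</amount></order></orders>';
DECLARE @hdoc int;

-- Parse the document; @hdoc receives a handle to the in-memory tree.
EXEC sp_xml_preparedocument @hdoc OUTPUT, @doc;

BEGIN TRY
    SELECT OrderId, Amount
    FROM OPENXML(@hdoc, '/orders/order', 2)
    WITH (
        OrderId int            '@id',     -- attribute of <order>
        Amount  decimal(18, 2) 'amount'   -- child element of <order>
    );
END TRY
BEGIN CATCH
    -- Release the parsed document before re-raising the error.
    EXEC sp_xml_removedocument @hdoc;
    THROW;  -- SQL Server 2012 and later; use RAISERROR on older versions
END CATCH;

-- Normal path: release the parsed document.
EXEC sp_xml_removedocument @hdoc;
```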

I will try the bulk importing first, however. If I am not pleased with
the result, I will certainly try the OPENXML method.

What would you say about the I/O impact of joining the bulk-loaded
tables before the transformations/lookups?

Anyway, I will try the bulk load right away. Thanks once again!

  #4  
Johnny Persson
Re: Xml shredding performance - 03-09-2010, 03:22 AM

Hi Bob,

I ended up with a CLR C#-parser. The performance is much better now!

Thank you!
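
The thread does not show the parser itself. As a rough sketch of how a CLR shredder can be wired in on the T-SQL side, assuming it is exposed as a CLR table-valued function: every name below (assembly, file path, class, method, function, output columns) is hypothetical, and only #XmlTable, data and #TransformedData come from the original post.

```sql
-- Register the compiled .NET assembly (hypothetical name and path).
CREATE ASSEMBLY XmlShredder
FROM 'C:\assemblies\XmlShredder.dll'
WITH PERMISSION_SET = SAFE;
GO

-- Expose a static C# method as a table-valued function.
-- The class, method and output columns are assumptions.
CREATE FUNCTION dbo.ShredCompanyXml (@doc xml)
RETURNS TABLE (ItemId int, ItemName nvarchar(200))
AS EXTERNAL NAME XmlShredder.[Shredders.CompanyParser].Shred;
GO

-- Use it in place of the nodes()/CROSS APPLY query; the column list of
-- #TransformedData is assumed here.
INSERT INTO #TransformedData (ItemId, ItemName)
SELECT s.ItemId, s.ItemName
FROM #XmlTable AS x
CROSS APPLY dbo.ShredCompanyXml(x.data) AS s;
```

Whether the parser in this thread ran as a CLR function inside SQL Server or as an external C# program that bulk loads the rows is not stated, so this is only one possible arrangement.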

  #5  
Farmer
Re: Xml shredding performance - 03-22-2010, 01:40 PM

Johnny

OPENXML builds a DOM in memory, so there is no need for an index on the
variable.
In fact, I can confirm that on large documents OPENXML will leave nodes()
far behind in the dust.
On small documents, though, it is foolish to use it: it can take up to 1/8
of SQL Server's memory, and bad handling or forgetting to close documents
will leak memory.

  #6  
Johnny Persson
Re: Xml shredding performance - 04-05-2010, 08:50 AM

Sorry for the late response.

Good to know. How large are the XML documents we are talking about?

Regards,
Johnny

  #7  
scmike
Re: Xml shredding performance - 04-29-2010, 01:26 PM

I ran into a similar problem: I need to load over 5 million XML files per
week, in over 100 different file versions. I highly recommend using CLR
for shredding the XML. Here is my article on my design.

http://blog.scmike.com/2010/tsql/fas...ml-sql-server/