#2
Hi,

We are having some performance issues with XML shredding. At this point we are extracting data from XML files from nearly 60 different companies, and therefore 60 different XML structures. The total amount of XML is about 350 MB and we are trying to extract the data as fast as possible.

Our current system extracts, transforms and loads the data in about five minutes. We would, however, like to get this down to about one minute.

We use the nodes/CROSS APPLY technique to shred the XML into our internal format. This is how we shred the data:

------------------------------
1) Load the XML into a temporary table (#XmlTable)
2) Create an XML index
3) Query (like below)

    INSERT INTO #TransformedData
    SELECT
        T0.T.value('asasd', 'asdadd'),
        T1.T.value('asasd', 'asdadd')
    FROM #XmlTable
    CROSS APPLY data.nodes('asd') AS T0(T)
    CROSS APPLY T0.T.nodes('level1') AS T1(T)

    DROP TABLE #XmlTable

4) Pass the temporary table #TransformedData into the common/shared transformation procedure

    EXEC LookupData
-------------------------------

This is very I/O intensive and it makes the system slow. Are there any other good ways to parse the XML in SQL Server? Should we perhaps move the shredding outside the SQL environment into, for instance, a C# method which bulk loads the data?

Regards,
Johnny
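For readers who want to try the nodes/CROSS APPLY pattern described in the post above, here is a minimal self-contained sketch. The XML shape, the table variable and the column aliases are made up for illustration; they are not the poster's real schema.

```sql
-- Hypothetical sample data: one XML document holding two orders.
DECLARE @XmlTable TABLE (data XML NOT NULL);

INSERT INTO @XmlTable (data)
VALUES (N'<orders>
  <order id="1"><item sku="A1" qty="2"/><item sku="B7" qty="1"/></order>
  <order id="2"><item sku="C3" qty="5"/></order>
</orders>');

-- The first nodes() call yields one row per <order>; the second is applied
-- to each of those rows and yields one row per child <item>.
SELECT
    o.n.value('@id',  'INT')         AS order_id,
    i.n.value('@sku', 'VARCHAR(20)') AS sku,
    i.n.value('@qty', 'INT')         AS qty
FROM @XmlTable AS x
CROSS APPLY x.data.nodes('/orders/order') AS o(n)
CROSS APPLY o.n.nodes('item')             AS i(n);
```

Each level of nesting in the document gets its own CROSS APPLY, which is why the second nodes() call in the original query needs one as well.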
#3
Have a look through this article:

Examples of Bulk Importing and Exporting XML Documents
http://msdn.microsoft.com/en-us/library/ms191184.aspx

You should probably compare OPENXML performance with your nodes/CROSS APPLY method. I much prefer nodes with CROSS APPLY; however, some people report OPENXML as faster, particularly with larger documents. Try it with your data!

SQLXML Bulk Load is fast for loading data, if you can get it working, but it does not do any transforming. It could be called from C#.

Let us know how you get on!
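As a rough illustration of the loading step, an XML file can be read into an xml column with OPENROWSET(BULK ...). This is only a sketch: the file path is hypothetical, the table layout is assumed from the original post, and the SQL Server service account needs read access to the file.

```sql
-- Hypothetical staging table; the column name "data" follows the original post.
CREATE TABLE #XmlTable (
    id   INT IDENTITY(1,1) PRIMARY KEY,
    data XML NOT NULL
);

-- SINGLE_BLOB returns the file as varbinary(max) in a column named BulkColumn,
-- so the XML parser can honour the encoding declared in the document itself.
-- The path below is a placeholder, not a real location.
INSERT INTO #XmlTable (data)
SELECT CONVERT(XML, BulkColumn)
FROM OPENROWSET(BULK 'C:\import\company01.xml', SINGLE_BLOB) AS src;
```

This only gets the document into the server; the shredding itself still has to be done with nodes() or OPENXML afterwards.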
#5
Hi Bob,

Thank you for your answer! I am a bit doubtful about the OPENXML idea. My main issue with OPENXML is that you store your XML in a variable, which means that you are unable to put an index on it. In my limited experience, OPENXML rarely beats the nodes/CROSS APPLY method.

I will, however, try the bulk importing first; if I am not pleased with the result, I will certainly try the OPENXML method.

What would you say about the I/O impact of joining the bulk-loaded tables before the transformations/lookups?

Anyway, I will try the bulk load right away. Thanks once again!

On 2010-03-08 11:03, Bob wrote:

Have a look through this article: Examples of Bulk Importing and Exporting XML Documents http://msdn.microsoft.com/en-us/library/ms191184.aspx You should probably contrast OPENXML performance with your nodes / CROSS APPLY method. I much prefer nodes with CROSS APPLY however some people report OPENXML as faster particularly with larger documents. Try it with your data! SQLXML Bulkload is fast for loading data if you can get it working, but not transforming. This could be called from C#. Let us know how you get on!
#6
Johnny,

OPENXML builds a DOM in memory, so there is no need for any index on the variable. In fact, I can confirm that on large documents OPENXML will leave nodes() far behind in the dust. On small documents, however, it is foolish to use it, as it grabs 1/8 of SQL Server's memory, and mishandling documents or forgetting to close them will leak memory.
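To make the "forgetting to close documents" point concrete, here is a minimal OPENXML sketch that releases the document handle on both the success and the error path. The sample XML and the column names are invented for illustration.

```sql
DECLARE @xml  XML;
DECLARE @hdoc INT;
DECLARE @msg  NVARCHAR(2048);

-- Hypothetical sample document.
SET @xml = N'<orders>
  <order id="1"><item sku="A1" qty="2"/><item sku="B7" qty="1"/></order>
  <order id="2"><item sku="C3" qty="5"/></order>
</orders>';

-- Parse once; @hdoc is a handle to the in-memory DOM.
EXEC sp_xml_preparedocument @hdoc OUTPUT, @xml;

BEGIN TRY
    -- One row per <item>; '../@id' reaches up to the parent <order>.
    SELECT order_id, sku, qty
    FROM OPENXML(@hdoc, '/orders/order/item', 1)
    WITH (
        order_id INT         '../@id',
        sku      VARCHAR(20) '@sku',
        qty      INT         '@qty'
    );
END TRY
BEGIN CATCH
    -- Release the DOM before re-raising, otherwise the memory stays allocated.
    SET @msg = ERROR_MESSAGE();
    EXEC sp_xml_removedocument @hdoc;
    RAISERROR('%s', 16, 1, @msg);
    RETURN;
END CATCH;

-- Normal path: always pair preparedocument with removedocument.
EXEC sp_xml_removedocument @hdoc;
```

Whether this beats nodes() depends on document size, as discussed above, so it is worth timing both against the real files.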
#7
Hi Bob,

I ended up with a CLR C# parser. The performance is much better now! Thank you!