dbTalk Databases Forums  

Double-quotes, escape characters and DTS

microsoft.public.sqlserver.dts microsoft.public.sqlserver.dts


Discuss Double-quotes, escape characters and DTS in the microsoft.public.sqlserver.dts forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Dave
 
Posts: n/a

Default Double-quotes, escape characters and DTS - 08-05-2003 , 11:54 PM






Hi all,

I've recently been landed with the task of importing a csv file from a
third party company into a database table. This csv file is
auto-generated by the third party, and the format etc can't be
changed. Text/string data within each 'row' is enclosed within double
quote marks (eg. "string data") with fields separated by commas.

Using DTS, I am attempting to import this data at my end. The first
~7000 rows import fine, but the import fails when it encounters an
escaped quotation mark (sample: "123456","ABC3\"","XYZ" ).

My question is: is there a way to get DTS to recognize that the \"
denotes an escapted quotation mark, rather than being a field
separator?

If possible, I'd prefer to avoid a solution that requires an ActiveX
transformation script on each field (simply because there are around
100 columns and up to 50,000 rows in each csv file), but if there's no
other option, we'll just have to wear the performance hit.

TIA,
Dave

Reply With Quote
  #2  
Old   
Darren Green
 
Posts: n/a

Default Re: Double-quotes, escape characters and DTS - 08-06-2003 , 03:26 PM






In article <8a4e95fe.0308052054.4ffe203d (AT) posting (DOT) google.com>, Dave
<david_m_dean (AT) yahoo (DOT) com> writes
Quote:
Hi all,

I've recently been landed with the task of importing a csv file from a
third party company into a database table. This csv file is
auto-generated by the third party, and the format etc can't be
changed. Text/string data within each 'row' is enclosed within double
quote marks (eg. "string data") with fields separated by commas.

Using DTS, I am attempting to import this data at my end. The first
~7000 rows import fine, but the import fails when it encounters an
escaped quotation mark (sample: "123456","ABC3\"","XYZ" ).

My question is: is there a way to get DTS to recognize that the \"
denotes an escapted quotation mark, rather than being a field
separator?

If possible, I'd prefer to avoid a solution that requires an ActiveX
transformation script on each field (simply because there are around
100 columns and up to 50,000 rows in each csv file), but if there's no
other option, we'll just have to wear the performance hit.

TIA,
Dave
I'm afraid there are three options I can think of-

1 Use an ActiveX Script Transform to Replace out the escape character as
you suggest. To try and mitigate the performance hit make sure you use a
many to many transform. Delete the auto-generated ones and click Select
All before create a new single transform. Edit the code to use column
ordinals instead of column names. Generate the 100 lines of code in
something like Excel for simplicity then paste it in. A many to many
transform and column ordinals will improve performance over the designer
generated defaults, but it still won't be as anywhere near as fast as a
copy column.

2 If you know yourself or have a developer available that knows C++
write your own "Remove" transform. This must be done in C++, but of
course this does give much better performance.

3 Write a pre-processor to parse the file and clean it first. Again C++
would be good for performance, but VB would probably do. GAWK is also
good at this type of operation and is fast.

--
Darren Green (SQL Server MVP)
DTS - http://www.sqldts.com




Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.