dbTalk Databases Forums  

LOAD TABLE (or ISQL INPUT) handling of quotes

sybase.public.sqlanywhere.general sybase.public.sqlanywhere.general


Discuss LOAD TABLE (or ISQL INPUT) handling of quotes in the sybase.public.sqlanywhere.general forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
David Winkler
 
Posts: n/a

Default LOAD TABLE (or ISQL INPUT) handling of quotes - 11-17-2009 , 03:06 PM






Running SQL Anywhere 11.0.1.2308, I am unable to
successfully load pipe-delimited, unquoted data via the LOAD
TABLE statement or INPUT ISQL command when the data value
contains an initial apostrophe immediately following a
delimiter. Here's the statement:

load table foo using client file 'C:\\bar.txt' delimited by
'|' quotes off;

If the apostrophe or apostrophes are in any other position
than the first character in a data element they all load
fine; when it's the first position, I'll typically get a
right-truncation error as the remainder of the line of text
is apparently trying to get stuffed into a single field.
Curiously, when an initial apostrophe is used, a second
apostrophe is treated as the closing quote (even with the
QUOTES OFF option) and remaining characters after the
closing quote but before the next delimiter are "dropped on
the floor" by LOAD TABLE. To try the same test (iAnywhere
folks), the table data I'm trying to load is here:

http://wireless.fcc.gov/uls/data/complete/l_ship.zip
(problem data lines are in the larger files)

as defined in DDL here:

http://wireless.fcc.gov/uls/data/doc.../pa_ddef44.txt

Maybe a bug, maybe I'm nuts. Confirmation either way is
appreciated!

Reply With Quote
  #2  
Old   
John Smirnios [Sybase]
 
Posts: n/a

Default Re: LOAD TABLE (or ISQL INPUT) handling of quotes - 11-17-2009 , 04:04 PM






The problem is not the quotes. Lines 275680 & 275681 of en.dat (for
example) are actually a single record:

EN|2373743|||WDA6635|L|L00115054|Academy of Natural Sciences
Estuarine Research Center|||||| (...)

LOAD TABLE and INPUT will interpret them as separate records so it is
trying to cram 'Estuarine Research Center' into a "record_type" column
which is char(2).

You need to do one of the following:
1. Verify whether any of these fields actually should/can contain
carriage returns. Maybe there shouldn't be a newline character after
"Academy of Natural Sciences". If carriage returns are not allowed,
cleanse your original data.
2. Use escapes in your original data file so that carriage returns don't
occur inside column data (eg. Academy of Natural Sciences \x0aEstuarine
Research Center). Remember to double-up backslashes in the original data
too.
3. Use a different end-of-record marker via "ROW DELIMITED BY" and make
sure your source files are corrected to use that row delimiter too.

-john.
--
John Smirnios
Senior Software Developer
iAnywhere Solutions Engineering

Whitepapers, TechDocs, bug fixes are all available through the iAnywhere
Developer Community at http://www.ianywhere.com/developer

David Winkler wrote:
Quote:
Running SQL Anywhere 11.0.1.2308, I am unable to
successfully load pipe-delimited, unquoted data via the LOAD
TABLE statement or INPUT ISQL command when the data value
contains an initial apostrophe immediately following a
delimiter. Here's the statement:

load table foo using client file 'C:\\bar.txt' delimited by
'|' quotes off;

If the apostrophe or apostrophes are in any other position
than the first character in a data element they all load
fine; when it's the first position, I'll typically get a
right-truncation error as the remainder of the line of text
is apparently trying to get stuffed into a single field.
Curiously, when an initial apostrophe is used, a second
apostrophe is treated as the closing quote (even with the
QUOTES OFF option) and remaining characters after the
closing quote but before the next delimiter are "dropped on
the floor" by LOAD TABLE. To try the same test (iAnywhere
folks), the table data I'm trying to load is here:

http://wireless.fcc.gov/uls/data/complete/l_ship.zip
(problem data lines are in the larger files)

as defined in DDL here:

http://wireless.fcc.gov/uls/data/doc.../pa_ddef44.txt

Maybe a bug, maybe I'm nuts. Confirmation either way is
appreciated!

Reply With Quote
  #3  
Old   
David Winkler
 
Posts: n/a

Default Re: LOAD TABLE (or ISQL INPUT) handling of quotes - 11-18-2009 , 09:03 AM



Thanks, John for the quick response. I apologize that I
didn't point out a specific file & line number in my earlier
post, but try importing the file PUBACC_SH, which contains
the following on line # 24173:

SH|1330925|||WAL9515|R||PL|PH|'LECTRIC
LADY|WN2160U|Y|N||||8||| ||366051610|||||

This line exhibits the behavior I described in my original
post which I believe could be a bug.

The line you had identified below is certainly bad data
formatting on the FCC's part by not escaping CRLF's within
varchar fields for an ASCII export, but any problems with
trying to import it would be expected as normal behavior of
the DBMS.

Thanks again for looking into this!

Dave Winkler
Data Architect, Stanley Associates

Quote:
The problem is not the quotes. Lines 275680 & 275681 of
en.dat (for example) are actually a single record:

EN|2373743|||WDA6635|L|L00115054|Academy of Natural
Sciences Estuarine Research Center|||||| (...)

LOAD TABLE and INPUT will interpret them as separate
records so it is trying to cram 'Estuarine Research
Center' into a "record_type" column which is char(2).

You need to do one of the following:
1. Verify whether any of these fields actually should/can
contain carriage returns. Maybe there shouldn't be a
newline character after "Academy of Natural Sciences". If
carriage returns are not allowed, cleanse your original
data. 2. Use escapes in your original data file so that
carriage returns don't occur inside column data (eg.
Academy of Natural Sciences \x0aEstuarine Research
Center). Remember to double-up backslashes in the original
data too.
3. Use a different end-of-record marker via "ROW DELIMITED
BY" and make sure your source files are corrected to use
that row delimiter too.

-john.
--
John Smirnios
Senior Software Developer
iAnywhere Solutions Engineering

Whitepapers, TechDocs, bug fixes are all available through
the iAnywhere Developer Community at
http://www.ianywhere.com/developer

David Winkler wrote:
Running SQL Anywhere 11.0.1.2308, I am unable to
successfully load pipe-delimited, unquoted data via the
LOAD TABLE statement or INPUT ISQL command when the data
value contains an initial apostrophe immediately
following a delimiter. Here's the statement:

load table foo using client file 'C:\\bar.txt' delimited
by '|' quotes off;

If the apostrophe or apostrophes are in any other
position than the first character in a data element they
all load fine; when it's the first position, I'll
typically get a right-truncation error as the remainder
of the line of text is apparently trying to get stuffed
into a single field. Curiously, when an initial
apostrophe is used, a second apostrophe is treated as
the closing quote (even with the QUOTES OFF option) and
remaining characters after the closing quote but before
the next delimiter are "dropped on the floor" by LOAD
TABLE. To try the same test (iAnywhere folks), the
table data I'm trying to load is here:
http://wireless.fcc.gov/uls/data/complete/l_ship.zip
(problem data lines are in the larger files)

as defined in DDL here:



http://wireless.fcc.gov/uls/data/doc.../pa_ddef44.txt

Maybe a bug, maybe I'm nuts. Confirmation either way is
appreciated!

Reply With Quote
  #4  
Old   
John Smirnios [Sybase]
 
Posts: n/a

Default Re: LOAD TABLE (or ISQL INPUT) handling of quotes - 11-18-2009 , 01:29 PM



The problem with that file isn't the quotes either. Line 303464 contains
the following (all on one line -- the newsgroup editor is word-wrapping):

SH|2956069|||WDE2090|R||FV|PH|Bay Fisher ||
(2)|1196757|N|N|||32|15|||||367320280|N|N|Y|Y|N

So that's
'Bay Fisher ' as ship_name varchar(35)
'' as ship_number char(12)
' (2)' as international_voyages char(2)
which, of course, doesn't fit.

Looking at the rest of the lines, I'd guess that the ship's name is
supposed to be 'Bay Fisher || (2)' which is a bit strange. So the source
data is either invalid or, at the very least, contains strings with
unescaped column delimiters.

I will take from this an implicit enhancement request to report the line
number on which a LOAD TABLE fails as well as the value & column that
caused the problem.

-john.

--
John Smirnios
Senior Software Developer
iAnywhere Solutions Engineering

Whitepapers, TechDocs, bug fixes are all available through the iAnywhere
Developer Community at http://www.ianywhere.com/developer

David Winkler wrote:
Quote:
Thanks, John for the quick response. I apologize that I
didn't point out a specific file & line number in my earlier
post, but try importing the file PUBACC_SH, which contains
the following on line # 24173:

SH|1330925|||WAL9515|R||PL|PH|'LECTRIC
LADY|WN2160U|Y|N||||8||| ||366051610|||||

This line exhibits the behavior I described in my original
post which I believe could be a bug.

The line you had identified below is certainly bad data
formatting on the FCC's part by not escaping CRLF's within
varchar fields for an ASCII export, but any problems with
trying to import it would be expected as normal behavior of
the DBMS.

Thanks again for looking into this!

Dave Winkler
Data Architect, Stanley Associates

The problem is not the quotes. Lines 275680 & 275681 of
en.dat (for example) are actually a single record:

EN|2373743|||WDA6635|L|L00115054|Academy of Natural
Sciences Estuarine Research Center|||||| (...)

LOAD TABLE and INPUT will interpret them as separate
records so it is trying to cram 'Estuarine Research
Center' into a "record_type" column which is char(2).

You need to do one of the following:
1. Verify whether any of these fields actually should/can
contain carriage returns. Maybe there shouldn't be a
newline character after "Academy of Natural Sciences". If
carriage returns are not allowed, cleanse your original
data. 2. Use escapes in your original data file so that
carriage returns don't occur inside column data (eg.
Academy of Natural Sciences \x0aEstuarine Research
Center). Remember to double-up backslashes in the original
data too.
3. Use a different end-of-record marker via "ROW DELIMITED
BY" and make sure your source files are corrected to use
that row delimiter too.

-john.
--
John Smirnios
Senior Software Developer
iAnywhere Solutions Engineering

Whitepapers, TechDocs, bug fixes are all available through
the iAnywhere Developer Community at
http://www.ianywhere.com/developer

David Winkler wrote:
Running SQL Anywhere 11.0.1.2308, I am unable to
successfully load pipe-delimited, unquoted data via the
LOAD TABLE statement or INPUT ISQL command when the data
value contains an initial apostrophe immediately
following a delimiter. Here's the statement:

load table foo using client file 'C:\\bar.txt' delimited
by '|' quotes off;

If the apostrophe or apostrophes are in any other
position than the first character in a data element they
all load fine; when it's the first position, I'll
typically get a right-truncation error as the remainder
of the line of text is apparently trying to get stuffed
into a single field. Curiously, when an initial
apostrophe is used, a second apostrophe is treated as
the closing quote (even with the QUOTES OFF option) and
remaining characters after the closing quote but before
the next delimiter are "dropped on the floor" by LOAD
TABLE. To try the same test (iAnywhere folks), the
table data I'm trying to load is here:
http://wireless.fcc.gov/uls/data/complete/l_ship.zip
(problem data lines are in the larger files)

as defined in DDL here:


http://wireless.fcc.gov/uls/data/doc.../pa_ddef44.txt
Maybe a bug, maybe I'm nuts. Confirmation either way is
appreciated!

Reply With Quote
  #5  
Old   
Breck Carter [TeamSybase]
 
Posts: n/a

Default Re: LOAD TABLE (or ISQL INPUT) handling of quotes - 11-18-2009 , 02:21 PM



On 18 Nov 2009 11:29:42 -0800, "John Smirnios [Sybase]"
<smirnios_at_sybase.com> wrote:

Quote:
I will take from this an implicit enhancement request to report the line
number on which a LOAD TABLE fails as well as the value & column that
caused the problem.
Ohhhhh yeah, me too! I want to vote yes!

Breck

--
Breck Carter - Blog: http://sqlanywhere.blogspot.com/

SQLA questions and answers: http://sqla.stackexchange.com

RisingRoad helps SQL Anywhere developers make better databases
http://www.risingroad.com/
Breck.Carter at gmail

Reply With Quote
  #6  
Old   
David Winkler
 
Posts: n/a

Default Re: LOAD TABLE (or ISQL INPUT) handling of quotes - 11-18-2009 , 05:27 PM



I wonder if it's possible that the behavior which I've very
specifically described regarding the apostrophe immediately
following the delimiter is actually only occurring when
using the INPUT command in ISQL:

input into FCC.PUBACC_SH from C:\SH.dat delimited by '|';

since it reports the line which it fails on, which is
whatever the first line with a delimiter then an apostrophe
(in this case the 'LECTRIC LADY). Because INPUT INTO will
not roll back the preceding lines it has imported, I can
observe the aberrant behavior I described, and maybe LOAD
TABLE is successfully processing that line?

OK, I just tried another test with the line in question &
the lines just before and after it (so my source file to
test is just 3 lines). You're correct; LOAD TABLE is
failing for other reasons, it is only INPUT INTO which
behaves badly, just as I'd described. Excellent suggestion
though about the LOAD TABLE enhancement, because if it
reported which line it was failing on I wouldn't have leaped
to the conclusion that LOAD and INPUT were failing on the
same line. Since you've spent this much time with this file
(sorry!) could you test out the INPUT command? Throw an old
programmer a bone and tell me I'm right about this? Thanks,
John!


Quote:
The problem with that file isn't the quotes either. Line
303464 contains the following (all on one line -- the
newsgroup editor is word-wrapping):

SH|2956069|||WDE2090|R||FV|PH|Bay Fisher ||
(2)|1196757|N|N|||32|15|||||367320280|N|N|Y|Y|N

So that's
'Bay Fisher ' as ship_name varchar(35)
'' as ship_number char(12)
' (2)' as international_voyages char(2)
which, of course, doesn't fit.

Looking at the rest of the lines, I'd guess that the
ship's name is supposed to be 'Bay Fisher || (2)' which
is a bit strange. So the source data is either invalid or
, at the very least, contains strings with unescaped
column delimiters.

I will take from this an implicit enhancement request to
report the line number on which a LOAD TABLE fails as
well as the value & column that caused the problem.

-john.

--
John Smirnios
Senior Software Developer
iAnywhere Solutions Engineering

Whitepapers, TechDocs, bug fixes are all available through
the iAnywhere Developer Community at
http://www.ianywhere.com/developer

David Winkler wrote:
Thanks, John for the quick response. I apologize that I
didn't point out a specific file & line number in my
earlier post, but try importing the file PUBACC_SH,
which contains the following on line # 24173:

SH|1330925|||WAL9515|R||PL|PH|'LECTRIC
LADY|WN2160U|Y|N||||8||| ||366051610|||||

This line exhibits the behavior I described in my
original post which I believe could be a bug.

The line you had identified below is certainly bad data
formatting on the FCC's part by not escaping CRLF's
within varchar fields for an ASCII export, but any
problems with trying to import it would be expected as
normal behavior of the DBMS.

Thanks again for looking into this!

Dave Winkler
Data Architect, Stanley Associates

The problem is not the quotes. Lines 275680 & 275681 of
en.dat (for example) are actually a single record:

EN|2373743|||WDA6635|L|L00115054|Academy of Natural
Sciences Estuarine Research Center|||||| (...)

LOAD TABLE and INPUT will interpret them as separate
records so it is trying to cram 'Estuarine Research
Center' into a "record_type" column which is char(2).

You need to do one of the following:
1. Verify whether any of these fields actually
should/can >> contain carriage returns. Maybe there
shouldn't be a >> newline character after "Academy of
Natural Sciences". If >> carriage returns are not allowed,
cleanse your original >> data. 2. Use escapes in your
original data file so that >> carriage returns don't
occur inside column data (eg. >> Academy of Natural
Sciences \x0aEstuarine Research >> Center). Remember to
double-up backslashes in the original >> data too.
3. Use a different end-of-record marker via "ROW
DELIMITED >> BY" and make sure your source files are
corrected to use >> that row delimiter too.

-john.
--
John Smirnios
Senior Software Developer
iAnywhere Solutions Engineering

Whitepapers, TechDocs, bug fixes are all available
through >> the iAnywhere Developer Community at
http://www.ianywhere.com/developer

David Winkler wrote:
Running SQL Anywhere 11.0.1.2308, I am unable to
successfully load pipe-delimited, unquoted data via
the >>> LOAD TABLE statement or INPUT ISQL command when
the data >>> value contains an initial apostrophe
immediately >>> following a delimiter. Here's the
statement:
load table foo using client file 'C:\\bar.txt'
delimited >>> by '|' quotes off;

If the apostrophe or apostrophes are in any other
position than the first character in a data element
they >>> all load fine; when it's the first position, I'll
typically get a right-truncation error as the
remainder >>> of the line of text is apparently trying to
get stuffed >>> into a single field. Curiously, when an
initial >>> apostrophe is used, a second apostrophe is
treated as >>> the closing quote (even with the QUOTES OFF
option) and >>> remaining characters after the closing
quote but before >>> the next delimiter are "dropped on
the floor" by LOAD >>> TABLE. To try the same test
(iAnywhere folks), the >>> table data I'm trying to load
is here:
http://wireless.fcc.gov/uls/data/complete/l_ship.zip
(problem data lines are in the larger files)
as defined in DDL here:




http://wireless.fcc.gov/uls/data/doc.../pa_ddef44.txt
Maybe a bug, maybe I'm nuts. Confirmation either way
is >>> appreciated!

Reply With Quote
  #7  
Old   
anil k goel [Sybase]
 
Posts: n/a

Default Re: LOAD TABLE (or ISQL INPUT) handling of quotes - 11-19-2009 , 11:58 AM



Also, if it is a problem getting the load file to the server machine, take a
look at a new feature in SA11 called client file loading.

LOAD TABLE USING CLIENT FILE 'abc' ...

makes the server transparently fetch the contents of file 'abc' from the
client machine instead of looking for the file on the server machine.

In that sense, LOAD TABLE USING CLIENT FILE can be a server side alternative
to DBISQL's INPUT statement.

-anil

"John Smirnios [Sybase]" <smirnios_at_sybase.com> wrote

Quote:
Apparently, dbisql's INPUT statement doesn't have a QUOTES OFF option so
that would explain the behaviour. It is always behaving the same as LOAD
TABLE with QUOTES ON and so the error you are seeing is expected.

I have no idea why such an important option would have been left off of
the INPUT statement implementation. You should make an enhancement request
for dbisql to support the QUOTES option. Dbisql is handled by another team
so it's better if you make the request.

In the mean time, you can use LOAD TABLE or get the data properly
sanitized (escapes where necessary, use quotes, etc) so that it can be
used by LOAD or INPUT. You should be able to throw together a quick Python
or Perl script to sanitize the data into a format appropriate for either
input method.

If the FCC provides the raw data in any other format, that format may be
more convenient. For example if the data is in a format that can be
accessed via an ODBC driver, you could also use remote tables to import
the data.

-john.
--
John Smirnios
Senior Software Developer
iAnywhere Solutions Engineering

Whitepapers, TechDocs, bug fixes are all available through the iAnywhere
Developer Community at http://www.ianywhere.com/developer

David Winkler wrote:
I wonder if it's possible that the behavior which I've very
specifically described regarding the apostrophe immediately
following the delimiter is actually only occurring when
using the INPUT command in ISQL:

input into FCC.PUBACC_SH from C:\SH.dat delimited by '|';

since it reports the line which it fails on, which is
whatever the first line with a delimiter then an apostrophe
(in this case the 'LECTRIC LADY). Because INPUT INTO will
not roll back the preceding lines it has imported, I can
observe the aberrant behavior I described, and maybe LOAD
TABLE is successfully processing that line?

OK, I just tried another test with the line in question &
the lines just before and after it (so my source file to
test is just 3 lines). You're correct; LOAD TABLE is
failing for other reasons, it is only INPUT INTO which
behaves badly, just as I'd described. Excellent suggestion
though about the LOAD TABLE enhancement, because if it
reported which line it was failing on I wouldn't have leaped
to the conclusion that LOAD and INPUT were failing on the
same line. Since you've spent this much time with this file
(sorry!) could you test out the INPUT command? Throw an old
programmer a bone and tell me I'm right about this? Thanks,
John!


The problem with that file isn't the quotes either. Line
303464 contains the following (all on one line -- the
newsgroup editor is word-wrapping):

SH|2956069|||WDE2090|R||FV|PH|Bay Fisher ||
(2)|1196757|N|N|||32|15|||||367320280|N|N|Y|Y|N

So that's
'Bay Fisher ' as ship_name varchar(35)
'' as ship_number char(12)
' (2)' as international_voyages char(2)
which, of course, doesn't fit.

Looking at the rest of the lines, I'd guess that the
ship's name is supposed to be 'Bay Fisher || (2)' which
is a bit strange. So the source data is either invalid or
, at the very least, contains strings with unescaped
column delimiters.

I will take from this an implicit enhancement request to
report the line number on which a LOAD TABLE fails as
well as the value & column that caused the problem.

-john.

--
John Smirnios
Senior Software Developer
iAnywhere Solutions Engineering

Whitepapers, TechDocs, bug fixes are all available through
the iAnywhere Developer Community at
http://www.ianywhere.com/developer

David Winkler wrote:
Thanks, John for the quick response. I apologize that I
didn't point out a specific file & line number in my
earlier post, but try importing the file PUBACC_SH,
which contains the following on line # 24173:

SH|1330925|||WAL9515|R||PL|PH|'LECTRIC
LADY|WN2160U|Y|N||||8||| ||366051610|||||

This line exhibits the behavior I described in my
original post which I believe could be a bug.

The line you had identified below is certainly bad data
formatting on the FCC's part by not escaping CRLF's
within varchar fields for an ASCII export, but any
problems with trying to import it would be expected as
normal behavior of the DBMS.

Thanks again for looking into this!

Dave Winkler
Data Architect, Stanley Associates

The problem is not the quotes. Lines 275680 & 275681 of
en.dat (for example) are actually a single record:

EN|2373743|||WDA6635|L|L00115054|Academy of Natural
Sciences Estuarine Research Center|||||| (...)

LOAD TABLE and INPUT will interpret them as separate
records so it is trying to cram 'Estuarine Research
Center' into a "record_type" column which is char(2).

You need to do one of the following:
1. Verify whether any of these fields actually
should/can >> contain carriage returns. Maybe there
shouldn't be a >> newline character after "Academy of
Natural Sciences". If >> carriage returns are not allowed,
cleanse your original >> data. 2. Use escapes in your
original data file so that >> carriage returns don't
occur inside column data (eg. >> Academy of Natural
Sciences \x0aEstuarine Research >> Center). Remember to
double-up backslashes in the original >> data too.
3. Use a different end-of-record marker via "ROW
DELIMITED >> BY" and make sure your source files are
corrected to use >> that row delimiter too.
-john.
--
John Smirnios
Senior Software Developer
iAnywhere Solutions Engineering

Whitepapers, TechDocs, bug fixes are all available
through >> the iAnywhere Developer Community at
http://www.ianywhere.com/developer

David Winkler wrote:
Running SQL Anywhere 11.0.1.2308, I am unable to
successfully load pipe-delimited, unquoted data via
the >>> LOAD TABLE statement or INPUT ISQL command when
the data >>> value contains an initial apostrophe
immediately >>> following a delimiter. Here's the
statement:
load table foo using client file 'C:\\bar.txt'
delimited >>> by '|' quotes off;
If the apostrophe or apostrophes are in any other
position than the first character in a data element
they >>> all load fine; when it's the first position, I'll
typically get a right-truncation error as the
remainder >>> of the line of text is apparently trying to
get stuffed >>> into a single field. Curiously, when an
initial >>> apostrophe is used, a second apostrophe is
treated as >>> the closing quote (even with the QUOTES OFF
option) and >>> remaining characters after the closing
quote but before >>> the next delimiter are "dropped on
the floor" by LOAD >>> TABLE. To try the same test
(iAnywhere folks), the >>> table data I'm trying to load
is here:
http://wireless.fcc.gov/uls/data/complete/l_ship.zip
(problem data lines are in the larger files)
as defined in DDL here:


http://wireless.fcc.gov/uls/data/doc.../pa_ddef44.txt
Maybe a bug, maybe I'm nuts. Confirmation either way
is >>> appreciated!

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.