![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
Running SQL Anywhere 11.0.1.2308, I am unable to successfully load pipe-delimited, unquoted data via the LOAD TABLE statement or INPUT ISQL command when the data value contains an initial apostrophe immediately following a delimiter. Here's the statement: load table foo using client file 'C:\\bar.txt' delimited by '|' quotes off; If the apostrophe or apostrophes are in any other position than the first character in a data element they all load fine; when it's the first position, I'll typically get a right-truncation error as the remainder of the line of text is apparently trying to get stuffed into a single field. Curiously, when an initial apostrophe is used, a second apostrophe is treated as the closing quote (even with the QUOTES OFF option) and remaining characters after the closing quote but before the next delimiter are "dropped on the floor" by LOAD TABLE. To try the same test (iAnywhere folks), the table data I'm trying to load is here: http://wireless.fcc.gov/uls/data/complete/l_ship.zip (problem data lines are in the larger files) as defined in DDL here: http://wireless.fcc.gov/uls/data/doc.../pa_ddef44.txt Maybe a bug, maybe I'm nuts. Confirmation either way is appreciated! |
#3
| |||
| |||
|
|
The problem is not the quotes. Lines 275680 & 275681 of en.dat (for example) are actually a single record: EN|2373743|||WDA6635|L|L00115054|Academy of Natural Sciences Estuarine Research Center|||||| (...) LOAD TABLE and INPUT will interpret them as separate records so it is trying to cram 'Estuarine Research Center' into a "record_type" column which is char(2). You need to do one of the following: 1. Verify whether any of these fields actually should/can contain carriage returns. Maybe there shouldn't be a newline character after "Academy of Natural Sciences". If carriage returns are not allowed, cleanse your original data. 2. Use escapes in your original data file so that carriage returns don't occur inside column data (eg. Academy of Natural Sciences \x0aEstuarine Research Center). Remember to double-up backslashes in the original data too. 3. Use a different end-of-record marker via "ROW DELIMITED BY" and make sure your source files are corrected to use that row delimiter too. -john. -- John Smirnios Senior Software Developer iAnywhere Solutions Engineering Whitepapers, TechDocs, bug fixes are all available through the iAnywhere Developer Community at http://www.ianywhere.com/developer David Winkler wrote: Running SQL Anywhere 11.0.1.2308, I am unable to successfully load pipe-delimited, unquoted data via the LOAD TABLE statement or INPUT ISQL command when the data value contains an initial apostrophe immediately following a delimiter. Here's the statement: load table foo using client file 'C:\\bar.txt' delimited by '|' quotes off; If the apostrophe or apostrophes are in any other position than the first character in a data element they all load fine; when it's the first position, I'll typically get a right-truncation error as the remainder of the line of text is apparently trying to get stuffed into a single field. Curiously, when an initial apostrophe is used, a second apostrophe is treated as the closing quote (even with the QUOTES OFF option) and remaining characters after the closing quote but before the next delimiter are "dropped on the floor" by LOAD TABLE. To try the same test (iAnywhere folks), the table data I'm trying to load is here: http://wireless.fcc.gov/uls/data/complete/l_ship.zip (problem data lines are in the larger files) as defined in DDL here: http://wireless.fcc.gov/uls/data/doc.../pa_ddef44.txt Maybe a bug, maybe I'm nuts. Confirmation either way is appreciated! |
#4
| |||
| |||
|
|
Thanks, John for the quick response. I apologize that I didn't point out a specific file & line number in my earlier post, but try importing the file PUBACC_SH, which contains the following on line # 24173: SH|1330925|||WAL9515|R||PL|PH|'LECTRIC LADY|WN2160U|Y|N||||8||| ||366051610||||| This line exhibits the behavior I described in my original post which I believe could be a bug. The line you had identified below is certainly bad data formatting on the FCC's part by not escaping CRLF's within varchar fields for an ASCII export, but any problems with trying to import it would be expected as normal behavior of the DBMS. Thanks again for looking into this! Dave Winkler Data Architect, Stanley Associates The problem is not the quotes. Lines 275680 & 275681 of en.dat (for example) are actually a single record: EN|2373743|||WDA6635|L|L00115054|Academy of Natural Sciences Estuarine Research Center|||||| (...) LOAD TABLE and INPUT will interpret them as separate records so it is trying to cram 'Estuarine Research Center' into a "record_type" column which is char(2). You need to do one of the following: 1. Verify whether any of these fields actually should/can contain carriage returns. Maybe there shouldn't be a newline character after "Academy of Natural Sciences". If carriage returns are not allowed, cleanse your original data. 2. Use escapes in your original data file so that carriage returns don't occur inside column data (eg. Academy of Natural Sciences \x0aEstuarine Research Center). Remember to double-up backslashes in the original data too. 3. Use a different end-of-record marker via "ROW DELIMITED BY" and make sure your source files are corrected to use that row delimiter too. -john. -- John Smirnios Senior Software Developer iAnywhere Solutions Engineering Whitepapers, TechDocs, bug fixes are all available through the iAnywhere Developer Community at http://www.ianywhere.com/developer David Winkler wrote: Running SQL Anywhere 11.0.1.2308, I am unable to successfully load pipe-delimited, unquoted data via the LOAD TABLE statement or INPUT ISQL command when the data value contains an initial apostrophe immediately following a delimiter. Here's the statement: load table foo using client file 'C:\\bar.txt' delimited by '|' quotes off; If the apostrophe or apostrophes are in any other position than the first character in a data element they all load fine; when it's the first position, I'll typically get a right-truncation error as the remainder of the line of text is apparently trying to get stuffed into a single field. Curiously, when an initial apostrophe is used, a second apostrophe is treated as the closing quote (even with the QUOTES OFF option) and remaining characters after the closing quote but before the next delimiter are "dropped on the floor" by LOAD TABLE. To try the same test (iAnywhere folks), the table data I'm trying to load is here: http://wireless.fcc.gov/uls/data/complete/l_ship.zip (problem data lines are in the larger files) as defined in DDL here: http://wireless.fcc.gov/uls/data/doc.../pa_ddef44.txt Maybe a bug, maybe I'm nuts. Confirmation either way is appreciated! |
#5
| |||
| |||
|
|
I will take from this an implicit enhancement request to report the line number on which a LOAD TABLE fails as well as the value & column that caused the problem. ![]() |
#6
| |||
| |||
|
|
The problem with that file isn't the quotes either. Line 303464 contains the following (all on one line -- the newsgroup editor is word-wrapping): SH|2956069|||WDE2090|R||FV|PH|Bay Fisher || (2)|1196757|N|N|||32|15|||||367320280|N|N|Y|Y|N So that's 'Bay Fisher ' as ship_name varchar(35) '' as ship_number char(12) ' (2)' as international_voyages char(2) which, of course, doesn't fit. Looking at the rest of the lines, I'd guess that the ship's name is supposed to be 'Bay Fisher || (2)' which is a bit strange. So the source data is either invalid or , at the very least, contains strings with unescaped column delimiters. I will take from this an implicit enhancement request to report the line number on which a LOAD TABLE fails as well as the value & column that caused the problem. ![]() -john. -- John Smirnios Senior Software Developer iAnywhere Solutions Engineering Whitepapers, TechDocs, bug fixes are all available through the iAnywhere Developer Community at http://www.ianywhere.com/developer David Winkler wrote: Thanks, John for the quick response. I apologize that I didn't point out a specific file & line number in my earlier post, but try importing the file PUBACC_SH, which contains the following on line # 24173: SH|1330925|||WAL9515|R||PL|PH|'LECTRIC LADY|WN2160U|Y|N||||8||| ||366051610||||| This line exhibits the behavior I described in my original post which I believe could be a bug. The line you had identified below is certainly bad data formatting on the FCC's part by not escaping CRLF's within varchar fields for an ASCII export, but any problems with trying to import it would be expected as normal behavior of the DBMS. Thanks again for looking into this! Dave Winkler Data Architect, Stanley Associates The problem is not the quotes. Lines 275680 & 275681 of en.dat (for example) are actually a single record: EN|2373743|||WDA6635|L|L00115054|Academy of Natural Sciences Estuarine Research Center|||||| (...) LOAD TABLE and INPUT will interpret them as separate records so it is trying to cram 'Estuarine Research Center' into a "record_type" column which is char(2). You need to do one of the following: 1. Verify whether any of these fields actually should/can >> contain carriage returns. Maybe there shouldn't be a >> newline character after "Academy of Natural Sciences". If >> carriage returns are not allowed, cleanse your original >> data. 2. Use escapes in your original data file so that >> carriage returns don't occur inside column data (eg. >> Academy of Natural Sciences \x0aEstuarine Research >> Center). Remember to double-up backslashes in the original >> data too. 3. Use a different end-of-record marker via "ROW DELIMITED >> BY" and make sure your source files are corrected to use >> that row delimiter too. -john. -- John Smirnios Senior Software Developer iAnywhere Solutions Engineering Whitepapers, TechDocs, bug fixes are all available through >> the iAnywhere Developer Community at http://www.ianywhere.com/developer David Winkler wrote: Running SQL Anywhere 11.0.1.2308, I am unable to successfully load pipe-delimited, unquoted data via the >>> LOAD TABLE statement or INPUT ISQL command when the data >>> value contains an initial apostrophe immediately >>> following a delimiter. Here's the statement: load table foo using client file 'C:\\bar.txt' delimited >>> by '|' quotes off; If the apostrophe or apostrophes are in any other position than the first character in a data element they >>> all load fine; when it's the first position, I'll typically get a right-truncation error as the remainder >>> of the line of text is apparently trying to get stuffed >>> into a single field. Curiously, when an initial >>> apostrophe is used, a second apostrophe is treated as >>> the closing quote (even with the QUOTES OFF option) and >>> remaining characters after the closing quote but before >>> the next delimiter are "dropped on the floor" by LOAD >>> TABLE. To try the same test (iAnywhere folks), the >>> table data I'm trying to load is here: http://wireless.fcc.gov/uls/data/complete/l_ship.zip (problem data lines are in the larger files) as defined in DDL here: http://wireless.fcc.gov/uls/data/doc.../pa_ddef44.txt Maybe a bug, maybe I'm nuts. Confirmation either way is >>> appreciated! |
#7
| |||
| |||
|
|
Apparently, dbisql's INPUT statement doesn't have a QUOTES OFF option so that would explain the behaviour. It is always behaving the same as LOAD TABLE with QUOTES ON and so the error you are seeing is expected. I have no idea why such an important option would have been left off of the INPUT statement implementation. You should make an enhancement request for dbisql to support the QUOTES option. Dbisql is handled by another team so it's better if you make the request. In the mean time, you can use LOAD TABLE or get the data properly sanitized (escapes where necessary, use quotes, etc) so that it can be used by LOAD or INPUT. You should be able to throw together a quick Python or Perl script to sanitize the data into a format appropriate for either input method. If the FCC provides the raw data in any other format, that format may be more convenient. For example if the data is in a format that can be accessed via an ODBC driver, you could also use remote tables to import the data. -john. -- John Smirnios Senior Software Developer iAnywhere Solutions Engineering Whitepapers, TechDocs, bug fixes are all available through the iAnywhere Developer Community at http://www.ianywhere.com/developer David Winkler wrote: I wonder if it's possible that the behavior which I've very specifically described regarding the apostrophe immediately following the delimiter is actually only occurring when using the INPUT command in ISQL: input into FCC.PUBACC_SH from C:\SH.dat delimited by '|'; since it reports the line which it fails on, which is whatever the first line with a delimiter then an apostrophe (in this case the 'LECTRIC LADY). Because INPUT INTO will not roll back the preceding lines it has imported, I can observe the aberrant behavior I described, and maybe LOAD TABLE is successfully processing that line? OK, I just tried another test with the line in question & the lines just before and after it (so my source file to test is just 3 lines). You're correct; LOAD TABLE is failing for other reasons, it is only INPUT INTO which behaves badly, just as I'd described. Excellent suggestion though about the LOAD TABLE enhancement, because if it reported which line it was failing on I wouldn't have leaped to the conclusion that LOAD and INPUT were failing on the same line. Since you've spent this much time with this file (sorry!) could you test out the INPUT command? Throw an old programmer a bone and tell me I'm right about this? Thanks, John! The problem with that file isn't the quotes either. Line 303464 contains the following (all on one line -- the newsgroup editor is word-wrapping): SH|2956069|||WDE2090|R||FV|PH|Bay Fisher || (2)|1196757|N|N|||32|15|||||367320280|N|N|Y|Y|N So that's 'Bay Fisher ' as ship_name varchar(35) '' as ship_number char(12) ' (2)' as international_voyages char(2) which, of course, doesn't fit. Looking at the rest of the lines, I'd guess that the ship's name is supposed to be 'Bay Fisher || (2)' which is a bit strange. So the source data is either invalid or , at the very least, contains strings with unescaped column delimiters. I will take from this an implicit enhancement request to report the line number on which a LOAD TABLE fails as well as the value & column that caused the problem. ![]() -john. -- John Smirnios Senior Software Developer iAnywhere Solutions Engineering Whitepapers, TechDocs, bug fixes are all available through the iAnywhere Developer Community at http://www.ianywhere.com/developer David Winkler wrote: Thanks, John for the quick response. I apologize that I didn't point out a specific file & line number in my earlier post, but try importing the file PUBACC_SH, which contains the following on line # 24173: SH|1330925|||WAL9515|R||PL|PH|'LECTRIC LADY|WN2160U|Y|N||||8||| ||366051610||||| This line exhibits the behavior I described in my original post which I believe could be a bug. The line you had identified below is certainly bad data formatting on the FCC's part by not escaping CRLF's within varchar fields for an ASCII export, but any problems with trying to import it would be expected as normal behavior of the DBMS. Thanks again for looking into this! Dave Winkler Data Architect, Stanley Associates The problem is not the quotes. Lines 275680 & 275681 of en.dat (for example) are actually a single record: EN|2373743|||WDA6635|L|L00115054|Academy of Natural Sciences Estuarine Research Center|||||| (...) LOAD TABLE and INPUT will interpret them as separate records so it is trying to cram 'Estuarine Research Center' into a "record_type" column which is char(2). You need to do one of the following: 1. Verify whether any of these fields actually should/can >> contain carriage returns. Maybe there shouldn't be a >> newline character after "Academy of Natural Sciences". If >> carriage returns are not allowed, cleanse your original >> data. 2. Use escapes in your original data file so that >> carriage returns don't occur inside column data (eg. >> Academy of Natural Sciences \x0aEstuarine Research >> Center). Remember to double-up backslashes in the original >> data too. 3. Use a different end-of-record marker via "ROW DELIMITED >> BY" and make sure your source files are corrected to use >> that row delimiter too. -john. -- John Smirnios Senior Software Developer iAnywhere Solutions Engineering Whitepapers, TechDocs, bug fixes are all available through >> the iAnywhere Developer Community at http://www.ianywhere.com/developer David Winkler wrote: Running SQL Anywhere 11.0.1.2308, I am unable to successfully load pipe-delimited, unquoted data via the >>> LOAD TABLE statement or INPUT ISQL command when the data >>> value contains an initial apostrophe immediately >>> following a delimiter. Here's the statement: load table foo using client file 'C:\\bar.txt' delimited >>> by '|' quotes off; If the apostrophe or apostrophes are in any other position than the first character in a data element they >>> all load fine; when it's the first position, I'll typically get a right-truncation error as the remainder >>> of the line of text is apparently trying to get stuffed >>> into a single field. Curiously, when an initial >>> apostrophe is used, a second apostrophe is treated as >>> the closing quote (even with the QUOTES OFF option) and >>> remaining characters after the closing quote but before >>> the next delimiter are "dropped on the floor" by LOAD >>> TABLE. To try the same test (iAnywhere folks), the >>> table data I'm trying to load is here: http://wireless.fcc.gov/uls/data/complete/l_ship.zip (problem data lines are in the larger files) as defined in DDL here: http://wireless.fcc.gov/uls/data/doc.../pa_ddef44.txt Maybe a bug, maybe I'm nuts. Confirmation either way is >>> appreciated! |
![]() |
| Thread Tools | |
| Display Modes | |
| |