![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
|
Variable_name | Value | +-------------------------+---------------------+ protocol_version | 10 | version | 5.1.52 | version_comment | Source distribution | version_compile_machine | x86_64 | version_compile_os | unknown-linux-gnu | +-------------------------+---------------------+ |
|
Variable_name | Value | +-------------------------+------------------------------+ protocol_version | 10 | version | 5.1.47-community | version_comment | MySQL Community Server (GPL) | version_compile_machine | ia32 | version_compile_os | Win32 | +-------------------------+------------------------------+ |
#2
| |||
| |||
|
|
Something seems to have changed between that version and "5.1.47community" related to data format for LOAD DATA LOCAL. I have something that's working on "5.1.47community" and fails on "5.1.52." |
|
On CentOS, where this fails: ~~~~~~ | version | 5.1.52 | On Windows 7, where it works: ~~~~~~~~~ | version | 5.1.47-community | |
#3
| |||
| |||
|
|
Here's the LOAD DATA LOCAL statement: LOAD DATA LOCAL INFILE 'tmpYpaMdx.txt' REPLACE INTO TABLE companyindex FIELDS ENCLOSED BY '"' ESCAPED BY '\\' TERMINATED BY ',' (conformed_company_name, domain, street_number, street_name, parser_name, location, state, postal_code, country_code, database_name, record_id) |
|
Here's the beginning of the data: "ABT UTILITY INCOME FUND INC",NULL,"340","ROYAL PALM","internal_english","PALM BEACH","FL","33480","US","edgaraddr","430966" "ABT GROWTH & INCOME TRUST",NULL,"340","ROYAL PALM","internal_english","PALM BEACH","FL","33480","US","edgaraddr","430968" "KEYSTONE AMERICA TAX FREE MONEY MARKET FUND",NULL,"200","BERKELEY","internal_english","BO STON","MA","02116","US","edgaraddr","430970" |
#4
| |||
| |||
|
|
John Nagle<nagle (AT) animats (DOT) com> wrote: Something seems to have changed between that version and "5.1.47community" related to data format for LOAD DATA LOCAL. I have something that's working on "5.1.47community" and fails on "5.1.52." [snip] On CentOS, where this fails: ~~~~~~ | version | 5.1.52 | On Windows 7, where it works: ~~~~~~~~~ | version | 5.1.47-community | This is very likely not MySQL versions, but end-of-line encoding. XL |
|
Variable_name | Value protocol_version | 10 version | 5.0.27-log version_bdb | Sleepycat Software: Berkeley DB 4.1.24 (October 21, 2006) version_comment | Source distribution version_compile_machine | i686 version_compile_os | redhat-linux-gnu |
#5
| ||||
| ||||
|
|
Here's the LOAD DATA LOCAL statement: LOAD DATA LOCAL INFILE 'tmpYpaMdx.txt' REPLACE INTO TABLE companyindex FIELDS ENCLOSED BY '"' ESCAPED BY '\\' TERMINATED BY ',' (conformed_company_name, domain, street_number, street_name, parser_name, location, state, postal_code, country_code, database_name, record_id) In your data, you don't seem to have all the fields terminated by ','. (in particular, what seems to be the last field of a record). |
|
You also didn't specify what terminates a line, so exactly where does the field containing 430966 end, and why? |
|
This might also involve UNIX vs. Windows line endings. |
|
Here's the beginning of the data: "ABT UTILITY INCOME FUND INC",NULL,"340","ROYAL PALM","internal_english","PALM BEACH","FL","33480","US","edgaraddr","430966" |
#6
| |||
| |||
|
|
I'm bringing up a new server which runs CentOS 6 with MySQL 5.1.52. Something seems to have changed between that version and "5.1.47community" related to data format for LOAD DATA LOCAL. I have something that's working on "5.1.47community" and fails on "5.1.52." Here's the LOAD DATA LOCAL statement: LOAD DATA LOCAL INFILE 'tmpYpaMdx.txt' REPLACE INTO TABLE companyindex FIELDS ENCLOSED BY '"' ESCAPED BY '\\' TERMINATED BY ',' (conformed_company_name, domain, street_number, street_name, parser_name, location, state, postal_code, country_code, database_name, record_id) This is executed by a Python program {Python 2.7 in all cases), and the following warnings appear: InfoCompanyDb.py:295: Warning: Data truncated for column 'parser_name' at row 1 outcursor.execute(s) InfoCompanyDb.py:295: Warning: Data truncated for column 'state' at row 1 outcursor.execute(s) InfoCompanyDb.py:295: Warning: Row 1 doesn't contain data for all columns outcursor.execute(s) ... same for more rows This can be reproduced with the "mysql" command line program, so it's not a Python issue. Here's the beginning of the data: "ABT UTILITY INCOME FUND INC",NULL,"340","ROYAL PALM","internal_english","PALM BEACH","FL","33480","US","edgaraddr","430966" "ABT GROWTH & INCOME TRUST",NULL,"340","ROYAL PALM","internal_english","PALM BEACH","FL","33480","US","edgaraddr","430968" "KEYSTONE AMERICA TAX FREE MONEY MARKET FUND",NULL,"200","BERKELEY","internal_english","BO STON","MA","02116","US","edgaraddr","430970" "KEYSTONE AMERICA FUND OF GROWTH STOCKS",NULL,"200","BERKLEY","internal_english","B OSTON","MA","02116","US","edgaraddr","430972" "STROME INVESTMENT MANAGEMENT LP",NULL,"100","WILSHIRE","internal_english","SANT A MONICA","CA","90401","US","edgaraddr","430974" (Those are actually on one line each, with no blank lines.) Clearly the fields got out of sync somehow. Note the use of NULL. That's correct, according to the MySQL documentation. "If FIELDS ENCLOSED BY is not empty, a field containing the literal word NULL as its value is read as a NULL value. This differs from the word NULL enclosed within FIELDS ENCLOSED BY characters, which is read as the string 'NULL'." Ref: http://dev.mysql.com/doc/refman/5.0/en/load-data.html Did handling of NULL chnage at some point? Here's the table definition: SHOW CREATE TABLE companyindex; CREATE TABLE `companyindex` ( `domain` varchar(255) DEFAULT NULL, `conformed_company_name` varchar(120) NOT NULL, `location` varchar(50) DEFAULT NULL, `state` varchar(2) DEFAULT NULL, `postal_code` varchar(10) DEFAULT NULL, `country_code` varchar(2) DEFAULT NULL, `database_name` varchar(10) NOT NULL, `record_id` bigint(20) NOT NULL, `street_number` varchar(10) DEFAULT NULL, `street_name` varchar(50) DEFAULT NULL, `parser_name` enum('no_address','no_parser','internal_english',' usc_deterministic','google_geocoder') DEFAULT NULL, UNIQUE KEY `record_id` (`record_id`,`database_name`), KEY `domain` (`domain`), KEY `conformed_company_name` (`conformed_company_name`), KEY `location` (`location`), KEY `state` (`state`), KEY `postal_code` (`postal_code`), KEY `country_code` (`country_code`), KEY `parser_name` (`parser_name`), KEY `street_number` (`street_number`,`street_name`,`postal_code`,`coun try_code`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8; |
#7
| |||
| |||
|
|
On Thu, 19 Jan 2012 23:12:31 -0800, John Nagle wrote: I'm bringing up a new server which runs CentOS 6 with MySQL 5.1.52. Something seems to have changed between that version and "5.1.47community" related to data format for LOAD DATA LOCAL. I have something that's working on "5.1.47community" and fails on "5.1.52." Here's the LOAD DATA LOCAL statement: LOAD DATA LOCAL INFILE 'tmpYpaMdx.txt' REPLACE INTO TABLE companyindex FIELDS ENCLOSED BY '"' ESCAPED BY '\\' TERMINATED BY ',' (conformed_company_name, domain, street_number, street_name, parser_name, location, state, postal_code, country_code, database_name, record_id) This is executed by a Python program {Python 2.7 in all cases), and the following warnings appear: InfoCompanyDb.py:295: Warning: Data truncated for column 'parser_name' at row 1 outcursor.execute(s) InfoCompanyDb.py:295: Warning: Data truncated for column 'state' at row 1 outcursor.execute(s) InfoCompanyDb.py:295: Warning: Row 1 doesn't contain data for all columns outcursor.execute(s) ... same for more rows This can be reproduced with the "mysql" command line program, so it's not a Python issue. Here's the beginning of the data: "ABT UTILITY INCOME FUND INC",NULL,"340","ROYAL PALM","internal_english","PALM BEACH","FL","33480","US","edgaraddr","430966" "ABT GROWTH& INCOME TRUST",NULL,"340","ROYAL PALM","internal_english","PALM BEACH","FL","33480","US","edgaraddr","430968" "KEYSTONE AMERICA TAX FREE MONEY MARKET FUND",NULL,"200","BERKELEY","internal_english","BO STON","MA","02116","US","edgaraddr","430970" "KEYSTONE AMERICA FUND OF GROWTH STOCKS",NULL,"200","BERKLEY","internal_english","B OSTON","MA","02116","US","edgaraddr","430972" "STROME INVESTMENT MANAGEMENT LP",NULL,"100","WILSHIRE","internal_english","SANT A MONICA","CA","90401","US","edgaraddr","430974" (Those are actually on one line each, with no blank lines.) Clearly the fields got out of sync somehow. Note the use of NULL. That's correct, according to the MySQL documentation. "If FIELDS ENCLOSED BY is not empty, a field containing the literal word NULL as its value is read as a NULL value. This differs from the word NULL enclosed within FIELDS ENCLOSED BY characters, which is read as the string 'NULL'." Ref: http://dev.mysql.com/doc/refman/5.0/en/load-data.html Did handling of NULL chnage at some point? Here's the table definition: SHOW CREATE TABLE companyindex; CREATE TABLE `companyindex` ( `domain` varchar(255) DEFAULT NULL, `conformed_company_name` varchar(120) NOT NULL, `location` varchar(50) DEFAULT NULL, `state` varchar(2) DEFAULT NULL, `postal_code` varchar(10) DEFAULT NULL, `country_code` varchar(2) DEFAULT NULL, `database_name` varchar(10) NOT NULL, `record_id` bigint(20) NOT NULL, `street_number` varchar(10) DEFAULT NULL, `street_name` varchar(50) DEFAULT NULL, `parser_name` enum('no_address','no_parser','internal_english',' usc_deterministic','google_geocoder') DEFAULT NULL, UNIQUE KEY `record_id` (`record_id`,`database_name`), KEY `domain` (`domain`), KEY `conformed_company_name` (`conformed_company_name`), KEY `location` (`location`), KEY `state` (`state`), KEY `postal_code` (`postal_code`), KEY `country_code` (`country_code`), KEY `parser_name` (`parser_name`), KEY `street_number` (`street_number`,`street_name`,`postal_code`,`coun try_code`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8; You're showing one table create and it has a first column `domain`. The first column in the data looks like a company name, and the second column in the data is NULL, while the second column in the CREATE TABLE is listed as NOT NULL. Let's take this from the top again... |
|
LOAD DATA LOCAL INFILE 'tmpYpaMdx.txt' REPLACE INTO TABLE companyindex FIELDS ENCLOSED BY '"' ESCAPED BY '\\' TERMINATED BY ',' (conformed_company_name, domain, street_number, street_name, parser_name, location, state, postal_code, country_code, database_name, record_id) |
#8
| |||
| |||
|
|
On 1/21/2012 10:06 AM, Peter H. Coffin wrote: On Thu, 19 Jan 2012 23:12:31 -0800, John Nagle wrote: I'm bringing up a new server which runs CentOS 6 with MySQL 5.1.52. Something seems to have changed between that version and "5.1.47community" related to data format for LOAD DATA LOCAL. I have something that's working on "5.1.47community" and fails on "5.1.52." Here's the LOAD DATA LOCAL statement: LOAD DATA LOCAL INFILE 'tmpYpaMdx.txt' REPLACE INTO TABLE companyindex FIELDS ENCLOSED BY '"' ESCAPED BY '\\' TERMINATED BY ',' (conformed_company_name, domain, street_number, street_name, parser_name, location, state, postal_code, country_code, database_name, record_id) This is executed by a Python program {Python 2.7 in all cases), and the following warnings appear: InfoCompanyDb.py:295: Warning: Data truncated for column 'parser_name' at row 1 outcursor.execute(s) InfoCompanyDb.py:295: Warning: Data truncated for column 'state' at row 1 outcursor.execute(s) InfoCompanyDb.py:295: Warning: Row 1 doesn't contain data for all columns outcursor.execute(s) ... same for more rows This can be reproduced with the "mysql" command line program, so it's not a Python issue. Here's the beginning of the data: "ABT UTILITY INCOME FUND INC",NULL,"340","ROYAL PALM","internal_english","PALM BEACH","FL","33480","US","edgaraddr","430966" "ABT GROWTH& INCOME TRUST",NULL,"340","ROYAL PALM","internal_english","PALM BEACH","FL","33480","US","edgaraddr","430968" "KEYSTONE AMERICA TAX FREE MONEY MARKET FUND",NULL,"200","BERKELEY","internal_english","BO STON","MA","02116","US","edgaraddr","430970" "KEYSTONE AMERICA FUND OF GROWTH STOCKS",NULL,"200","BERKLEY","internal_english","B OSTON","MA","02116","US","edgaraddr","430972" "STROME INVESTMENT MANAGEMENT LP",NULL,"100","WILSHIRE","internal_english","SANT A MONICA","CA","90401","US","edgaraddr","430974" (Those are actually on one line each, with no blank lines.) Clearly the fields got out of sync somehow. Note the use of NULL. That's correct, according to the MySQL documentation. "If FIELDS ENCLOSED BY is not empty, a field containing the literal word NULL as its value is read as a NULL value. This differs from the word NULL enclosed within FIELDS ENCLOSED BY characters, which is read as the string 'NULL'." Ref: http://dev.mysql.com/doc/refman/5.0/en/load-data.html Did handling of NULL chnage at some point? Here's the table definition: SHOW CREATE TABLE companyindex; CREATE TABLE `companyindex` ( `domain` varchar(255) DEFAULT NULL, `conformed_company_name` varchar(120) NOT NULL, `location` varchar(50) DEFAULT NULL, `state` varchar(2) DEFAULT NULL, `postal_code` varchar(10) DEFAULT NULL, `country_code` varchar(2) DEFAULT NULL, `database_name` varchar(10) NOT NULL, `record_id` bigint(20) NOT NULL, `street_number` varchar(10) DEFAULT NULL, `street_name` varchar(50) DEFAULT NULL, `parser_name` enum('no_address','no_parser','internal_english',' usc_deterministic','google_geocoder') DEFAULT NULL, UNIQUE KEY `record_id` (`record_id`,`database_name`), KEY `domain` (`domain`), KEY `conformed_company_name` (`conformed_company_name`), KEY `location` (`location`), KEY `state` (`state`), KEY `postal_code` (`postal_code`), KEY `country_code` (`country_code`), KEY `parser_name` (`parser_name`), KEY `street_number` (`street_number`,`street_name`,`postal_code`,`coun try_code`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8; You're showing one table create and it has a first column `domain`. The first column in the data looks like a company name, and the second column in the data is NULL, while the second column in the CREATE TABLE is listed as NOT NULL. Let's take this from the top again... The field order in the data file is determined by the field list in the LOAD DATA INFILE, which is LOAD DATA LOCAL INFILE 'tmpYpaMdx.txt' REPLACE INTO TABLE companyindex FIELDS ENCLOSED BY '"' ESCAPED BY '\\' TERMINATED BY ',' (conformed_company_name, domain, street_number, street_name, parser_name, location, state, postal_code, country_code, database_name, record_id) That NULL in "domain" seems to be causing trouble. But that's what the manual says to use when FIELDS ENCLOSED BY is specified. John Nagle Found the problem. I'm uploading a file that's UTF-8, and the |
![]() |
| Thread Tools | |
| Display Modes | |
| |