Re: utf-8 - different scripts

comp.databases.mysql

#11
Iain
Re: utf-8 - different scripts - SOLVED! - 08-03-2012, 10:04 PM

Gordon Burditt wrote:
Quote:
I checked these earlier and changed them. Viewing the database
properties through Navicat, they are:
Character set: utf8 -- UTF-8 Unicode
Collation: utf8_general_ci


That is what the original db is - getting that info from querying
the 'INFORMATION_SCHEMA.COLUMNS' table ...
CHARACTER_SET_NAME: utf8 - utf8_general_ci
DEFAULT_COLLATE_NAME: utf8 - utf8_general_ci

You need to be concerned with:

- The character set of the connection. ("SET NAMES utf8")

This was the solution!!!

I put the line
mysql_query("SET NAMES 'utf8'")
after
mysql_select_db(...

Why this should be the difference - needed for it to work on another site
from the same hosting company (with identical phpinfo), I do not know. But
this has now got all the characters displaying properly. That is the Greek,
Hebrew, and Coptic. Whether this is the only, or the absolutely correct
solution, I'm not sure. But it works!

It still doesn't get the Coptic to appear correctly in MS IE, so I'll have
to play around with that. The Greek and Hebrew are still OK in MS IE though.

Could it be something to do with the data coming from a Windows source,
rather than from a Linux based server, I wonder? Or from a remote source
over the internet, rather than from the same hosting company?

And I had gone through creating a new table, populating it, and including
your "André". As it happened, the correct encoding appeared.

Very many thanks to all who contributed suggestions. I have learnt quite a
bit from all of this testing (including spending lots of time messing around
with different variations on the sql dumps, and importing, etc.)
All responses very much appreciated.

--
Iain

#12
Gordon Burditt
Re: utf-8 - different scripts - SOLVED! - 08-04-2012, 02:10 AM

Quote:
You need to be concerned with:

- The character set of the connection. ("SET NAMES utf8")

This was the solution!!!

I put the line
mysql_query("SET NAMES 'utf8'")
after
mysql_select_db(...

Why this should be the difference -

If you do not put this in, MySQL is supposed to take the utf8 data
in the table and *TRANSLATE IT INTO latin1* (usually the default
connection character set, unless configured otherwise). latin1 has no
translation for many characters, especially Greek, Hebrew, and Coptic,
so expect some problems. This is where the question marks come from.
In the command-line client, try "set names ascii" and SELECTing
something with accented letters. You get question marks.
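
For anyone who wants to see the effect without a MySQL server, here is a rough Python sketch. It is only an analogy: Python's "latin-1" and "ascii" codecs stand in for the connection character set (MySQL's latin1 is actually closer to Windows-1252), and the `to_charset` helper and the sample strings are made up for illustration.

```python
# Sketch only: Python's "latin-1" and "ascii" codecs stand in for the MySQL
# connection character set; MySQL's own conversion code is not used here.

def to_charset(text: str, charset: str) -> bytes:
    """Encode text, putting '?' wherever the target charset has no such character."""
    return text.encode(charset, errors="replace")

# Hypothetical sample strings, one per script mentioned in the thread.
samples = {
    "accented Latin": "André",
    "Greek": "αβγ",
    "Hebrew": "אבג",
    "Coptic": "ⲁⲃⲅ",
}

for label, text in samples.items():
    print(label, to_charset(text, "latin-1"), to_charset(text, "ascii"))
```

The accented Latin sample survives the trip to latin-1 but not to ascii, while the Greek, Hebrew, and Coptic samples turn entirely into question marks in both cases.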

Quote:
needed for it to work on another site
from the same hosting company (with identical phpinfo), I do not know.

The character set of the connection is a characteristic of MySQL, not PHP.
It seems to me to vary with things like the $LANG environment variable
when you start up the MySQL client. I get different results if I start
the MySQL client in a text console vs. an xterm because of this.

Quote:
But
this has now got all the characters displaying properly. That is the Greek,
Hebrew, and Coptic. Whether this is the only, or the absolutely correct
solution, I'm not sure. But it works!

If your phpinfo() shows the default charset as iso-8859-1, and you're outputting
UTF-8, you need
header("Content-type: text/html; charset=utf-8");
near the front of your web page (before any text output, even blank lines).
It's probably also a good idea to put, in the <head> section:
<meta content="text/html; charset=utf-8" http-equiv="Content-type">
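
The same two declarations can be sketched outside PHP. This is a minimal Python WSGI app, offered as an analogy rather than as what the site in question runs: the `PAGE` text and the `app` name are made up, and the only point is that the body bytes, the HTTP header, and the meta tag all agree on UTF-8.

```python
# Hypothetical page content; the meta tag repeats the charset given in the header.
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8">
<title>charset test</title>
</head>
<body>André αβγ</body>
</html>
"""

def app(environ, start_response):
    """Minimal WSGI app: UTF-8 bytes in the body, UTF-8 declared in the header."""
    body = PAGE.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    # The header goes out before any body bytes, like header() in PHP.
    start_response("200 OK", headers)
    return [body]
```

To try it in a browser, it can be served with `wsgiref.simple_server.make_server` from the standard library.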

I don't know why it needs to be declared TWICE. I think it has to do with
quirky browsers. View the page in Firefox and type ctrl-I. You should
see Encoding: UTF-8, not Encoding: Windows-1252 or Encoding: ISO-8859-1.

Otherwise the browser will probably render utf-8 as Windows-1252
and you get wrong characters (funny accented letters where they are not
expected).
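
A small Python sketch of that misreading, in case it helps to recognise the symptom. The `misread_as_cp1252` helper is made up for illustration; it encodes text as UTF-8 and then decodes the bytes the way a browser would if it assumed Windows-1252.

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as Windows-1252 (the wrong guess)."""
    # errors="replace" covers the few byte values Windows-1252 leaves undefined.
    return text.encode("utf-8").decode("cp1252", errors="replace")

print(misread_as_cp1252("André"))  # prints AndrÃ©
print(misread_as_cp1252("α"))      # two characters where one was expected
```

Each non-ASCII character comes out as two or more unrelated characters, which is different from the question marks produced by a lossy connection character set.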



Quote:
It still doesn't get the Coptic to appear correctly in MS IE, so I'll have
to play around with that. The Greek and Hebrew are still OK in MS IE though.

Could it be something to do with the data coming from a Windows source,
rather than from a Linux based server, I wonder? Or from a remote source
over the internet, rather than from the same hosting company?

I really doubt this. The networking connection shouldn't affect character
set translations.

Quote:
And I had gone through creating a new table, populating it, and including
your "André". As it happened, the correct encoding appeared.

That says it went *INTO* the table correctly. Your problem is getting
it out.

Quote:
Very many thanks to all who contributed suggestions. I have learnt quite a
bit from all of this testing (including spending lots of time messing around
with different variations on the sql dumps, and importing, etc.)
All responses very much appreciated.

#13
Luuk
Re: utf-8 - different scripts - 08-04-2012, 04:52 AM

On 03-08-2012 19:46, J.O. Aho wrote:
Quote:
On 03/08/12 18:14, Iain wrote:

I have tried several new dumps, with importing them and maybe with one
or two tweaks, but still no success.

Make a dump using --skip-set-charset and see if it works better. There
are always issues with charsets on export/import; I hope this will
improve in MySQL.

Another option is to use sed to replace the charset in the dump you
made.

sed 's/Latin1/utf8/g' -i yourdatabasedump.sql

and then import it.

Don't do that....

That changes the charset used for creating the table, but not the
character set that was used for exporting the data in that table. On
my Linux system mysqldump ALWAYS generates a UTF-8 file (unless the
settings are changed with some parameters); see below, and note the
'c3 a9', which represents the e with an accent (what is that thing
called again?).


mysql> show create table charsetlatin\G
*************************** 1. row ***************************
Table: charsetlatin
Create Table: CREATE TABLE `charsetlatin` (
`i` int(11) DEFAULT NULL,
`t` varchar(20) CHARACTER SET utf8 DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci
1 row in set (0.00 sec)

mysql> show create table charsetutf8\G
*************************** 1. row ***************************
Table: charsetutf8
Create Table: CREATE TABLE `charsetutf8` (
`i` int(11) DEFAULT NULL,
`t` varchar(20) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

mysql> select * from charsetlatin;
+------+------+
| i    | t    |
+------+------+
|    1 | é    |
+------+------+
1 row in set (0.00 sec)

mysql> select * from charsetutf8;
+------+------+
| i    | t    |
+------+------+
|    1 | é    |
+------+------+
1 row in set (0.00 sec)

mysql> quit
Bye
~> mysqldump test charsetlatin >charsetlatin.sql
~> mysqldump test charsetutf8 >charsetutf8.sql
~> file charsetlatin.sql
charsetlatin.sql: UTF-8 Unicode text
~> file charsetutf8.sql
charsetutf8.sql: UTF-8 Unicode text
~>
~> hexdump charsetlatin.sql
.....
00000530 41 42 4c 45 20 4b 45 59 53 20 2a 2f 3b 0a 49 4e |ABLE KEYS */;.IN|
00000540 53 45 52 54 20 49 4e 54 4f 20 60 63 68 61 72 73 |SERT INTO `chars|
00000550 65 74 6c 61 74 69 6e 60 20 56 41 4c 55 45 53 20 |etlatin` VALUES |
00000560 28 31 2c 27 c3 a9 27 29 3b 0a 2f 2a 21 34 30 30 |(1,'..');./*!400|
00000570 30 30 20 41 4c 54 45 52 20 54 41 42 4c 45 20 60 |00 ALTER TABLE `|
.....

~> hexdump charsetutf8.sql
.....
00000500 4b 45 59 53 20 2a 2f 3b 0a 49 4e 53 45 52 54 20 |KEYS */;.INSERT |
00000510 49 4e 54 4f 20 60 63 68 61 72 73 65 74 75 74 66 |INTO `charsetutf|
00000520 38 60 20 56 41 4c 55 45 53 20 28 31 2c 27 c3 a9 |8` VALUES (1,'..|
00000530 27 29 3b 0a 2f 2a 21 34 30 30 30 30 20 41 4c 54 |');./*!40000 ALT|
.....
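
The 'c3 a9' pair in both hexdumps can be checked with a few lines of Python. This is a sketch, not part of the session above: `relabel` is a made-up stand-in for the sed one-liner (using the lowercase `latin1` spelling that appears in the CREATE TABLE output), and the two sample lines are copied from the dump output shown earlier.

```python
# The two bytes seen in both hexdumps, between the single quotes.
dumped = bytes.fromhex("c3 a9")

as_utf8 = dumped.decode("utf-8")      # one character: é
as_latin1 = dumped.decode("latin-1")  # two characters: Ã©

def relabel(dump: bytes) -> bytes:
    """Rough stand-in for the sed one-liner: swap the charset name, nothing else."""
    return dump.replace(b"latin1", b"utf8")

table_line = b") ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci"
insert_line = b"INSERT INTO `charsetlatin` VALUES (1,'\xc3\xa9');"

print(as_utf8, as_latin1)
print(relabel(table_line))                   # the declaration changes
print(relabel(insert_line) == insert_line)   # the data bytes do not
```

So the data in both dump files is already UTF-8, whatever the table says, and editing the charset name only changes how the table is declared on import.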
