
When we're supposed to set default-character-set for the client

comp.databases.mysql comp.databases.mysql


Radoulov, Dimitre
11-12-2011, 03:54 AM

Hi all,
I'm trying to understand when I'm supposed to set/force
default-character-set for the client.

The documentation states (short url: bit.ly/vT0r6o):


You can force client programs to use specific character set as follows:

[client]
default-character-set=charset_name

This is normally unnecessary. However, when character_set_system differs
from character_set_server or character_set_client, and you input
characters manually (as database object identifiers, column values, or
both), these may be displayed incorrectly in output from the client or
the output itself may be formatted incorrectly. In such cases, starting
the mysql client with --default-character-set=system_character_set—that
is, setting the client character set to match the system character
set—should fix the problem.


Let's assume character_set_system and character_set_server are different.


My first question:

I suppose that when I'm importing an export file generated by mysqldump
the variable default-character-set has no effect, as _set names_ is
always issued at the very beginning of the export file.
Am I right, or could there be corner cases?

The second question:

What about manually running SQL scripts? The documentation states
that some characters *may be displayed incorrectly*. How can I be sure
that the data is imported correctly (apart from running an application
test, i.e. by trial and error)?


Best regards
Dimitre

Axel Schwenke
11-16-2011, 02:08 AM

Hi Dimitre,

"Radoulov, Dimitre" <cichomitiko (AT) gmail (DOT) com> wrote:
Quote:
I'm trying to understand when I'm supposed to set/force
default-character-set for the client.

The documentation states (short url: bit.ly/vT0r6o):

This is normally unnecessary. However, when character_set_system differs
from character_set_server or character_set_client, and you input
characters manually (as database object identifiers, column values, or
both), these may be displayed incorrectly in output from the client or
the output itself may be formatted incorrectly.

I never thought I had to say this, but this is nonsense. The
--default-character-set option is used by the client to issue
the corresponding SET NAMES statement after connecting. This is
always necessary, with only two exceptions:

1. your system uses the default client charset of latin1, or
2. the client executes SET NAMES or an equivalent on its own


Quote:
My first question:

I suppose that when I'm importing an export file generated by mysqldump
the variable default-character-set has no effect as _set names_ is
always issued at the very beginning of the export file.

Correct. This falls into category 2 above.
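
To check a given dump for the corner case (a dump produced without the SET NAMES header, for instance with mysqldump's --skip-set-charset option), something like the following could be used. This is only a sketch: the function name dump_charset, the 50-line limit and the sample header are made up for illustration, and the regular expression assumes the usual versioned-comment form that mysqldump writes.

```python
import re

# mysqldump normally writes the statement inside a versioned comment,
# e.g.  /*!40101 SET NAMES utf8 */;
SET_NAMES_RE = re.compile(r"SET\s+NAMES\s+'?(\w+)'?", re.IGNORECASE)


def dump_charset(dump_text, max_lines=50):
    """Return the charset of the first SET NAMES found in the dump header, or None."""
    for line in dump_text.splitlines()[:max_lines]:
        match = SET_NAMES_RE.search(line)
        if match:
            return match.group(1)
    return None


# Hypothetical dump header used only for this example.
header = """-- MySQL dump 10.13
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET NAMES utf8 */;
"""

print(dump_charset(header))
```

If the function returns None for a dump, the import would depend on the client's own character set setting after all.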


Quote:
The second question:

What about manually running SQL scripts? The documentation states
that some characters *may be displayed incorrectly*. How can I be sure
that the data is imported correctly (apart from running an application
test, i.e. by trial and error)?

If you manually execute SQL statements, then you should run
SET NAMES xxx where xxx denotes the character encoding of your
terminal. On most systems this is utf8 nowadays. If you use a
consistent work environment, then you can put

default-character-set=xxx

in the [client] section of ~/.my.cnf and forget the whole thing.
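
To see which value a given option file would actually hand to the client, a rough check could look like this. It is a sketch only: client_charset is a made-up helper, Python's configparser is used as a stand-in for MySQL's own option file parser, and it will not understand directives such as !include.

```python
import configparser


def client_charset(cnf_text):
    """Return default-character-set from the [client] section, or None if absent."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # option files may contain bare flags without a value
        interpolation=None,   # treat '%' literally
        strict=False,         # tolerate repeated sections or options
    )
    parser.read_string(cnf_text)
    return parser.get("client", "default-character-set", fallback=None)


# Hypothetical file content used only for this example.
sample = "[client]\ndefault-character-set=utf8\n"
print(client_charset(sample))
```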

If you want to run noninteractive SQL scripts, then the client
character set must match the encoding used in the script. Unfortunately
there is no mandatory encoding tag in text files (like XML has), so
there is no reliable way to "guess" the encoding. Hence I suggest
starting all your SQL scripts with SET NAMES xxx.
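
To illustrate why guessing is unreliable, here is a small sketch of the best a script can do with the raw bytes. The function name classify_script_bytes and its three labels are invented for this example. The point is that valid UTF-8 can be recognised with some confidence, but anything else could be any single-byte encoding.

```python
def classify_script_bytes(data: bytes) -> str:
    """Roughly classify the encoding of an SQL script's raw bytes."""
    if all(byte < 0x80 for byte in data):
        # Pure ASCII is byte-identical in latin1 and utf8, so the choice does not matter.
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Some single-byte encoding (latin1, cp1251, ...); the bytes cannot tell us which.
        return "not-utf8"
    # Valid UTF-8 with non-ASCII bytes; very likely, but not provably, utf8.
    return "probably-utf8"


statement = "INSERT INTO t VALUES ('\u00e9');"
print(classify_script_bytes(statement.encode("utf-8")))
print(classify_script_bytes(statement.encode("latin-1")))
```

Since the "not-utf8" case cannot be narrowed down any further, an explicit SET NAMES at the top of the script remains the safer route.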

If you run SQL statements with the wrong client encoding settings, then
two things will happen:

1. data already in the database will be displayed incorrectly when SELECTed.

2. data inserted by you will potentially be stored as garbage.

For example, if your client character set is at the default of latin1 but
your terminal is using utf8, then each multibyte character will be
stored as multiple latin1 characters.
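
The effect can be reproduced outside of MySQL with a few lines of Python. This is only an approximation: both function names are made up, and Python's latin-1 codec stands in for MySQL's latin1, which is assumed here to behave the same for the bytes involved.

```python
def store_with_wrong_client_charset(text: str) -> str:
    """Simulate a utf8 terminal talking to a server that assumes latin1."""
    # The terminal sends UTF-8 bytes; the server reads each byte as one latin1 character.
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo the misinterpretation, provided nothing else has altered the data."""
    return garbled.encode("latin-1").decode("utf-8")


original = "\u00e9"  # e with acute accent: one character, two bytes in UTF-8
garbled = store_with_wrong_client_charset(original)

print(len(original), len(garbled))
print(repair(garbled) == original)
```

The single character turns into two, which is the "garbage" that shows up later when the data is read with a correctly configured client.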


HTH, XL

Radoulov, Dimitre
11-16-2011, 05:38 AM

On 16/11/2011 09:08, Axel Schwenke wrote:
[...]

Hi Axel,
thank you very much for your valuable input!



Best regards
Dimitre
