dbTalk Databases Forums  

database engine creates problem with characters?

comp.databases.mysql comp.databases.mysql


Discuss database engine creates problem with characters? in the comp.databases.mysql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
pac Man
 
Posts: n/a

Default database engine creates problem with characters? - 12-15-2010 , 09:30 AM






Hi

i developed a quite complex script for a migration job. It worked
well.
The database was a MyISAM.
Lately the database has been changed to InnoDB.
And then i saw that the old migration script wasnt working anymore.
i guessed the problem cause of the change of the storage.
it took a while to understand where the problema was.

In short, in a complex query where 3 tables from 2 different database
are called there is a part where it is stated:

D.short_desc = B.name

The query puts the rows in a temp table. The table has an ID, and a
CATEGORY_ID. And the ID is promary key.

The problem which stopped the execution was the duplication of ID.

Somewhere D.short_desc = B.name finds:

Klader

and somewhere else it finds:
Kläder

The same word, but the one spelled with the swedish character ä.

These two are considered the same word. The charset is DEFAULT
CHARSET=latin1
And even before changing storage I had the same charset.

The only change has been InnoDB. Why does innoDB consider these two
different entries as the same? If I write:

Select id from category where name = klader

then it retrieves two rows. Insted of one, which is the correct result.

Reply With Quote
  #2  
Old   
pac Man
 
Posts: n/a

Default Re: database engine creates problem with characters? - 12-15-2010 , 09:42 AM






On 15 Dec, 16:30, pac Man <querystra... (AT) gmail (DOT) com> wrote:

More details. i was lying. I never tested to run
Select id from category where name = klader

When I did it, both in PhpMyADMIN and through ssh, the query only
returns one row.

Anyway, when running the complex query with several tables involved,
then while joining the tables and the rows

D.short_desc = B.name still returns two rows. Which could be
considered (maybe?) as if
Select id from category where name = klader returned two rows.

Reply With Quote
  #3  
Old   
Álvaro G. Vicario
 
Posts: n/a

Default Re: database engine creates problem with characters? - 12-15-2010 , 10:28 AM



El 15/12/2010 16:30, pac Man escribió/wrote:
Quote:
Somewhere D.short_desc = B.name finds:

Klader

and somewhere else it finds:
Kläder

The same word, but the one spelled with the swedish character ä.

These two are considered the same word. The charset is DEFAULT
CHARSET=latin1
And even before changing storage I had the same charset.

The only change has been InnoDB. Why does innoDB consider these two
different entries as the same?
The rules to sort or compare characters do not depend on the character
set, they depend on the collation. If you haven't set any, it's probably
using the default collation for the latin1 charset, which is normally
(but, I guess, not necessarily) latin1_swedish_ci:

mysql> SHOW COLLATION LIKE 'latin1%';
+-------------------+---------+----+---------+----------+---------+
Quote:
Collation | Charset | Id | Default | Compiled | Sortlen |
+-------------------+---------+----+---------+----------+---------+
latin1_german1_ci | latin1 | 5 | | | 0 |
latin1_swedish_ci | latin1 | 8 | Yes | Yes | 0 |
latin1_danish_ci | latin1 | 15 | | | 0 |
latin1_german2_ci | latin1 | 31 | | Yes | 2 |
latin1_bin | latin1 | 47 | | Yes | 0 |
latin1_general_ci | latin1 | 48 | | | 0 |
latin1_general_cs | latin1 | 49 | | | 0 |
latin1_spanish_ci | latin1 | 94 | | | 0 |
+-------------------+---------+----+---------+----------+---------+

In this collation, 'a' and 'ä' are considered different:

http://www.collation-charts.org/mysq...wedish_ci.html

However, if you change the collation to e.g. latin1_spanish_ci, both
characters are the same:

http://www.collation-charts.org/mysq...panish_ci.html


My advise is that you always set an explicit collation in every table.
Than way your app does not depend on a server setting that may change
inadvertently. Plain SQL is:

CREATE TABLE foo (
one_column CHAR(3) COLLATE 'utf8_spanish_ci',
another_column VARCHAR(30) COLLATE 'latin1_general_cs',
[...]
)
COLLATE='latin1_bin'

If you omit column collations it'll use the table one.


--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- Mi sitio sobre programación web: http://borrame.com
-- Mi web de humor satinado: http://www.demogracia.com
--

Reply With Quote
  #4  
Old   
Álvaro G. Vicario
 
Posts: n/a

Default Re: database engine creates problem with characters? - 12-15-2010 , 10:29 AM



El 15/12/2010 17:28, "Álvaro G. Vicario" escribió/wrote:
Quote:
set, they depend on the collation. If you haven't set any, it's probably
using the default collation for the latin1 charset,
I forgot to mention that this query:

SELECT COLLATION(_latin1'a');

.... also tells you the default collation for latin1.


--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- Mi sitio sobre programación web: http://borrame.com
-- Mi web de humor satinado: http://www.demogracia.com
--

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.