Re: [BUGS] BUG #1721: mutiple bytes character string comaprison

  #1  
Kris Jurka, 06-19-2005, 11:32 PM

On Sun, 19 Jun 2005, Tom Lane wrote:

> "Chii-Tung Liu" <cdliou (AT) mail (DOT) cyut.edu.tw> writes:
> > PostgreSQL version: 8.0.3
> > Operating system: Windows XP SP2
> >
> > When comparing two UTF-8 encoded strings that contain Chinese words, the
> > result is always TRUE
>
> Sorry, but UTF-8 encoding doesn't work properly on Windows (yet).
> Use some other database encoding.

Shouldn't we forbid its creation then? Or at least issue a strongly worded
warning? We see these complaints too often.

Kris Jurka

  #2  
Tatsuo Ishii, 06-20-2005, 12:40 AM

> The following bug has been logged online:
>
> Bug reference: 1721
> Logged by: Chii-Tung Liu
> Email address: cdliou (AT) mail (DOT) cyut.edu.tw
> PostgreSQL version: 8.0.3
> Operating system: Windows XP SP2
> Description: mutiple bytes character string comaprison error
> Details:
>
> When comparing two UTF-8 encoded strings that contain Chinese words, the
> result is always TRUE
> 1. Create a database test with encoding set to unicode:
> CREATE DATABASE test
> WITH OWNER = postgres
> ENCODING = 'UNICODE'
> TABLESPACE = pg_default;
> 2. Insert data with Chinese words:
> INSERT INTO node (title) VALUES ('1 中文');
>
> 3. SELECT title FROM node WHERE title > '1.1 '
> would return '1 中文'
>
> 4. Both SELECT '1 中文' > '1.1' and SELECT '1.1' > '1 中文' return
> FALSE

I think you need to use the C locale.
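
To illustrate what the C locale suggestion amounts to: under the C locale, strings are compared byte by byte, so the result depends only on the encoded bytes and not on the platform's locale data. The Python sketch below is only an analogy run outside PostgreSQL, and the helper name `c_locale_greater` is made up for illustration. It compares the UTF-8 bytes of the two strings from the report.

```python
def c_locale_greater(a: str, b: str, encoding: str = "utf-8") -> bool:
    """Return True if a sorts after b under plain byte-wise comparison."""
    # bytes objects compare lexicographically by unsigned byte value,
    # which is the same ordering a memcmp-style comparison produces.
    return a.encode(encoding) > b.encode(encoding)


left, right = "1 中文", "1.1"

# The first byte ('1') is equal; the second is a space (0x20) versus '.' (0x2E).
print(c_locale_greater(left, right))   # False
print(c_locale_greater(right, left))   # True
```

With a byte-wise comparison exactly one of the two directions is true, unlike the behaviour described in step 4 of the report.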
--
Tatsuo Ishii

  #3  
Tatsuo Ishii, 06-20-2005, 05:39 PM

> Tom Lane wrote:
> > Kris Jurka <books (AT) ejurka (DOT) com> writes:
> > > On Sun, 19 Jun 2005, Tom Lane wrote:
> > > > Sorry, but UTF-8 encoding doesn't work properly on Windows (yet).
> > > > Use some other database encoding.
> > >
> > > Shouldn't we forbid its creation then?
> >
> > There was serious discussion of that before the 8.0 release, but
> > we decided not to forbid it. Check the archives; I don't recall
> > the reasoning at the moment.
>
> UTF8 encoding works with the C locale assuming you don't care about
> ordering of the character set, e.g. Japanese.

No, sometimes Japanese needs character ordering too, and I think this is
not a Windows-only problem. The real problem is that Unicode defines the
character order in a totally random manner, because Chinese/Japanese/Korean
Kanji characters are "unified" in Unicode. To solve the problem, we can
convert UTF8 to EUC_JP using CONVERT. See the archives for more details.

Or you can use a Unicode locale, but only if your platform's locale
database is not broken and you only use a single locale.
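
On the CONVERT suggestion above, here is a small sketch of why converting to EUC_JP changes the order. UTF-8 byte order follows Unicode code points, while EUC_JP byte order follows the JIS X 0208 layout. The Python snippet below is an illustration outside the database: it uses Python's `euc_jp` codec rather than PostgreSQL's CONVERT, and `sort_by_encoding` is a hypothetical helper. It sorts the same two kanji by their encoded bytes in each encoding.

```python
def sort_by_encoding(words, encoding):
    """Sort strings by their encoded bytes, as a byte-wise comparison would."""
    return sorted(words, key=lambda w: w.encode(encoding))


kanji = ["亜", "一"]

by_utf8 = sort_by_encoding(kanji, "utf-8")
by_eucjp = sort_by_encoding(kanji, "euc_jp")

print(by_utf8)    # ['一', '亜']  (U+4E00 sorts before U+4E9C)
print(by_eucjp)   # ['亜', '一']  (0xB0A1 sorts before 0xB0EC)
```

The same pair comes out in opposite orders, so an ORDER BY on the converted value can differ from one on the raw UTF8 value.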
--
Tatsuo Ishii
