Re: [BUGS] BUG #1268: Two different Unicode chars are treated as

mailing.database.pgsql-bugs

#1  Kent Tong - 09-23-2004, 10:54 PM

Tom Lane wrote:

Quote:
"PostgreSQL Bugs List" <pgsql-bugs (AT) postgresql (DOT) org> writes:

Description: Two different Unicode chars are treated as equal in a
query

This would be a matter to take up with the maintainer of your locale
(which you didn't mention, but in any case it's a locale bug). We
just do what strcoll() tells us.

Thanks for the quick reply. The system locale is zh_TW.Big5. However,
I've tried setting it to "C" but the test case still fails.


In order to check if it's a locale bug, I've written a C program:

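#include <locale.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    const char *s1 = "\xe4\xba\x8c";  /* U+4E8C encoded as UTF-8 */
    const char *s2 = "\xe4\xba\x94";  /* U+4E94 encoded as UTF-8 */
    /* The return value is not checked: if this locale name is not
       installed, setlocale() returns NULL and the "C" locale stays in effect. */
    setlocale(LC_ALL, "en.UTF-8");
    //setlocale(LC_ALL, "zh.Big5"); //doesn't make any difference
    printf("%d\n", strcoll(s1, s2));
    return 0;
}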

I compiled it and ran it on that computer. It prints -1,
which means that strcoll is working.

Quote:
Note that it's possible this is a configuration error and not an
outright bug. Check to make sure that the locale you initdb'd
under is actually designed to work with UTF-8 data.
Does it matter? The encoding provided to initdb is just
a default for the databases to be created in the future.
When I used createdb, I did specify "-E unicode".

--
Kent Tong, Msc, MCSE, SCJP, CCSA, Delphi Certified
Manager of IT Dept, CPTTM
Authorized training for Borland, Cisco, Microsoft, Oracle, RedFlag & RedHat



#2  Tom Lane - 09-23-2004, 11:35 PM

Kent Tong <kent (AT) cpttm (DOT) org.mo> writes:
Quote:
Does it matter? The encoding provided to initdb is just
a default for the databases to be created in the future.

Yes it does, and you missed the point. I said *locale*, not *encoding*.
The LC_COLLATE and LC_CTYPE settings that prevail during initdb are
fixed and not alterable without re-initdb. (I agree that this sucks,
but that's how it is for now...)

Your test program doesn't prove a lot unless you are sure it's executing
under the same locale settings as the postmaster is running in.

regards, tom lane



#3  Kent Tong - 09-24-2004, 04:33 AM

Tom Lane wrote:
Quote:
Yes it does, and you missed the point. I said *locale*, not *encoding*.
The LC_COLLATE and LC_CTYPE settings that prevail during initdb are
fixed and not alterable without re-initdb. (I agree that this sucks,
but that's how it is for now...)

You're right. After running:

initdb --locale zh_TW.utf8 /var/lib/pgsql/data

it works fine!

Thanks again and sorry about any inconvenience.

