Re: [BUGS] equal operator fails on two identical strings if initdb

mailing.database.pgsql-bugs


Kent Tong
 
11-24-2004, 04:36 AM

Peter Eisentraut wrote:
>> Here is a test (run in pgadmin III):
>> 1. createdb db1 -E Unicode
>
> Probably your locale does not support Unicode. You need to pick an
> encoding that matches your locale or vice versa.

Is there any way to check?
I have other programs reading and writing Unicode on this
computer without problems.

>> BTW, the locale for traditional chinese in postgresql.conf is
>> set to "traditional-chinese" literally. Shouldn't it be
>> zh_TW?
>
> That depends on what locale names the Windows operating system
> understands.

Are you using the locale routines in mingw?

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo (AT) postgresql (DOT) org)


Peter Eisentraut
 
11-24-2004, 09:50 AM

Kent Tong wrote:
> Is there any way to check?

On a POSIX system, you can do

$ LC_ALL=<some_locale> locale charmap

and verify manually that the printed charmap (= character set encoding)
matches what you use in PostgreSQL. I don't know whether an equivalent
interface exists on Windows.
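
As a rough sketch of the same check done from a program rather than the shell, the following Python asks the C library which character set a locale implies. The function name `charmap_for` and the sample locale names are made up for illustration, and `nl_langinfo` is only available on POSIX systems, so this does not answer the Windows question.

```python
import locale


def charmap_for(locale_name):
    """Return the codeset implied by locale_name, or None if unavailable.

    Roughly what `LC_ALL=<name> locale charmap` prints on a POSIX system.
    """
    # nl_langinfo is POSIX-only; Python does not provide it on Windows.
    if not hasattr(locale, "nl_langinfo"):
        return None
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting only
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
        return locale.nl_langinfo(locale.CODESET)
    except locale.Error:
        # The operating system does not know this locale name.
        return None
    finally:
        # Put the process back the way we found it.
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    # Example names only; which locales exist depends on the system.
    for name in ("C", "zh_TW.UTF-8", "zh_TW.Big5"):
        print(name, "->", charmap_for(name))
```

A result of None means either that the locale is not installed or that the platform has no such interface.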

> I have other programs reading and writing Unicode on this
> computer without problems.

Reading and writing Unicode is not a problem. But if you run the string
comparison operators, PostgreSQL passes the Unicode strings from your
database to the operating system's collation routines, which will
compare them thinking they are Big5 (or whatever) strings, which will
result in the random behavior you observed. You need to set an
appropriate locale so that the operating system also thinks they are in
Unicode.
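
To make the mismatch concrete, here is a small Python sketch. It does not call the operating system's collation routines; it only shows what it means for the same bytes to be read under two different encodings. The sample text is an arbitrary choice.

```python
text = "中文"  # arbitrary sample text that exists in both encodings

utf8_bytes = text.encode("utf-8")  # what a Unicode database would store
big5_bytes = text.encode("big5")   # what a Big5 locale expects to be handed

# The same characters are different byte sequences in the two encodings.
print(utf8_bytes.hex())
print(big5_bytes.hex())

# Reading the UTF-8 bytes under the Big5 assumption does not give the
# original text back; undecodable sequences become U+FFFD here.
misread = utf8_bytes.decode("big5", errors="replace")
print(misread == text)
```

A routine that is handed the first byte sequence while assuming the second encoding is therefore comparing something other than the text that was stored.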

> Are you using the locale routines in mingw?

I believe we do.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Kent Tong
 
11-24-2004, 09:42 PM

Peter Eisentraut wrote:
> On a POSIX system, you can do
>
> $ LC_ALL=<some_locale> locale charmap
>
> and verify manually that the printed charmap (= character set encoding)
> matches what you use in PostgreSQL. I don't know whether an equivalent
> interface exists on Windows.

Right, there is no such command.

> Reading and writing Unicode is not a problem. But if you run the string
> comparison operators, PostgreSQL passes the Unicode strings from your
> database to the operating system's collation routines, which will
> compare them thinking they are Big5 (or whatever) strings, which will
> result in the random behavior you observed. You need to set an
> appropriate locale so that the operating system also thinks they are in
> Unicode.

Do you mean the OS fails to convert Unicode strings to Big5, or that the
OS assumes the bytes are already in Big5?

Is it the locale used for initdb or the default system locale
set in Windows that is used by the collation routines that you
mentioned above?

I just double-checked my config and found that the default locale
is US English. The "supported languages" are:
* Traditional Chinese (default)
* Simplified Chinese
* Western Europe and United States.



Tom Lane
 
11-24-2004, 09:55 PM

Kent Tong <kent (AT) cpttm (DOT) org.mo> writes:
> Do you mean the OS fails to convert Unicode strings to Big5, or that the
> OS assumes the bytes are already in Big5?

The latter.

> Is it the locale used for initdb or the default system locale
> set in Windows that is used by the collation routines that you
> mentioned above?

The former.

The real problem here, IMHO, is that Postgres allows you to select a
"database encoding" setting that is different from the encoding implied
by the initdb locale (i.e., the LC_CTYPE setting). If you make this
mistake, PG will carefully store data byte sequences in the specified
"database encoding" ... and then pass them to strcoll() for comparison
... and strcoll() will assume that the data is in the encoding
associated with LC_CTYPE.

This is partially bad design on our part (we should really not have
invented a per-database encoding selection when the locale setting is
not per-database) and partially bad design on the part of the C standard
(which doesn't provide any very sane way to find out what encoding is
implied by an LC_CTYPE setting).

I think the only real fix is to abandon the C library's locale routines
and find or write our own library with a better API. This has been on
the TODO list for a long time but no one's quite wished to face up to
doing it ...

In the meantime, make sure your encoding setting agrees with the
LC_CTYPE value that initdb used.
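
A minimal sketch of that sanity check in Python, assuming you already know both names: the database encoding (for example the `-E Unicode` from the original report) and the codeset implied by the initdb locale (for example the output of `locale charmap`). The helper names and the alias table are hypothetical; the table only covers the spelling used in this thread.

```python
import codecs

# Assumption: PostgreSQL-style names that Python's codec registry does not
# know are mapped by hand. Only the name from this thread is listed.
PG_ALIASES = {"unicode": "utf-8"}


def canonical(name):
    """Normalize an encoding name so spellings like UTF8 and utf-8 compare equal."""
    key = name.strip().lower()
    key = PG_ALIASES.get(key, key)
    return codecs.lookup(key).name  # raises LookupError for unknown names


def encodings_agree(db_encoding, locale_codeset):
    """True if the database encoding and the LC_CTYPE codeset are the same."""
    return canonical(db_encoding) == canonical(locale_codeset)


print(encodings_agree("UNICODE", "UTF-8"))
print(encodings_agree("UNICODE", "BIG5"))
```

Comparing normalized names rather than raw strings avoids false alarms from spelling differences such as UTF8 versus UTF-8. An unknown name raises LookupError instead of silently passing.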

regards, tom lane
