About upper() and lower to handle multibyte char

comp.databases.postgresql.general


#1  Weiping, 10-19-2004, 08:52 AM

Hi,

While upgrading to 8.0 (beta3) we ran into a problem.

We have a database whose encoding is UNICODE.
When we run a query like:
select upper('中文'); -- some multibyte characters
PostgreSQL responds:

ERROR: invalid multibyte character for locale

But when we run it in a database with SQL_ASCII encoding,
it works and returns the string unchanged, which is what we think is the correct result.

I've searched the archives and found that in 8.0 the upper()/lower()
functions were changed so that they can handle multibyte characters.
But what is the expected behavior of these two functions when they are
given multibyte characters?

Another question: from the archives, I understand that on systems whose
<wctype.h> provides the towupper/towlower functions, PostgreSQL supports
multibyte upper/lower. My system (Slackware 10) has <wctype.h>,
so why do I still get the ERROR? How can I check whether my PostgreSQL
installation was built with multibyte upper/lower support?

This problem makes it very difficult for us to use upper/lower on
columns that mix scripts, such as Chinese and English characters in a
Unicode database, because the transaction aborts with the error above,
which breaks our application badly.

Thanks and any help would be appreciated

Laser


---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html


#2  Tom Lane, 10-19-2004, 10:05 AM

Weiping <laser (AT) qmail (DOT) zhengmai.net.cn> writes:
Quote:
we have a database which encoding is UNICODE,
when we do queries like:
select upper('中文'); --select some multibyte character,
then postgresql response:

ERROR: invalid multibyte character for locale

What locale did you initdb in? The most likely explanation for this
is that the LC_CTYPE setting is not unicode-compatible.

regards, tom lane

#3  Weiping, 10-19-2004, 07:49 PM

Tom Lane wrote:

Quote:
What locale did you initdb in? The most likely explanation for this
is that the LC_CTYPE setting is not unicode-compatible.



Hmm, I ran initdb with --no-locale, which means LC_CTYPE=C. But if I
don't use that option, there are other problems with multibyte
comparison (the = operator). I will try again.

Thanks!

Laser

#4  Weiping, 10-19-2004, 10:37 PM

Weiping wrote:

Quote:
Tom Lane wrote:

What locale did you initdb in? The most likely explanation for this
is that the LC_CTYPE setting is not unicode-compatible.



I finally got it to work. When running initdb, the locale setting has to
match the database encoding, for example:

initdb --locale=zh_CN.utf8 -E UNICODE ...

Then everything works (on my platforms: Slackware 10 and RH9).
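
The rule above (the initdb locale has to carry the same codeset as the database encoding) can be checked before running initdb. What follows is only a minimal sketch in POSIX shell, assuming that a UTF-8 locale can be recognized by a ".utf8" or ".UTF-8" suffix in its name. locale_is_utf8 is a hypothetical helper written for illustration, not something PostgreSQL ships.

```shell
#!/bin/sh
# Hypothetical helper (not part of PostgreSQL): succeed when a locale name
# carries a UTF-8 codeset, e.g. zh_CN.utf8 or en_US.UTF-8.
# Assumption: the codeset is visible as a suffix of the locale name.
locale_is_utf8() {
    # ${1%%@*} drops an optional @modifier such as "@latin".
    case "${1%%@*}" in
        *.[Uu][Tt][Ff]8|*.[Uu][Tt][Ff]-8) return 0 ;;
        *) return 1 ;;   # "C", "POSIX", or a name without a UTF-8 codeset
    esac
}

# Try a few candidate values for initdb --locale against -E UNICODE.
for name in zh_CN.utf8 en_US.UTF-8 C en_US; do
    if locale_is_utf8 "$name"; then
        echo "$name: UTF-8 locale, consistent with -E UNICODE"
    else
        echo "$name: not a UTF-8 locale, upper()/lower() may reject multibyte input"
    fi
done
```

A name that passes this check may still not be installed on the machine, so comparing it against the output of `locale -a` is a sensible second step.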

Hmm, I think it would be better to add a few words to our docs telling
users to do this. We have always used --no-locale with initdb, because
the default locale setting of many Linux distros (normally en_US) causes
multibyte character comparisons to fail (for example, "select '一' = '二'",
which is "select 'one' = 'two'" in Chinese, returns true). We also use
UNICODE as the database encoding to store multi-language text (such as
Japanese and Korean), and I don't know whether the locale setting
(zh_CN.utf8) would conflict with that.

Any better suggestion?

Thanks

Laser



