dbTalk Databases Forums  

[BUGS] 8.0b3: problems with ILIKE and ~* on multibyte-chars

mailing.database.pgsql-bugs mailing.database.pgsql-bugs


Discuss [BUGS] 8.0b3: problems with ILIKE and ~* on multibyte-chars in the mailing.database.pgsql-bugs forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
peter pilsl
 
Posts: n/a

Default [BUGS] 8.0b3: problems with ILIKE and ~* on multibyte-chars - 10-03-2004 , 09:11 AM







8.0 beta3


the caseinsensitive patternmatch-operators seems not to work with
multibyte, while lower() and upper()-functions finally works perfect.

provided the correct locale-setting on initdb (LC_CTYPE=de_AT.UTF-8 in
the example) the following happens:


example :

test=# \l
List of databases
Name | Owner | Encoding
-----------+----------+-----------
test | postgres | UNICODE


test=# \d test
Table "public.test"
Column | Type | Modifiers
--------+------+-----------
t | text |
Indexes:
"t_idx" btree (t)


test=# select t from test;
t
---
ä
Ä
o
O
(4 rows)


test=# select t from test where t~*'ä';
t
---
ä
(1 row)

====> expected is to give 'ä' and 'Ä' as result, cause search is
caseinsesitive and 'Ä' is uppercase of 'ä'


test=# select t from test where t~*'Ä';
t
---
Ä
(1 row)

======> same as above


test=# select t from test where lower(t)~'ä';
t
---
ä
Ä
(2 rows)


=====> the correct result using a workaround



test=# select t from test where t~*'o';
t
---
o
O
(2 rows)

======> it works with "normal" chars



# /usr/local/pgsql8/bin/pg_controldata /data/postgres/postgres8/ | grep LC
LC_COLLATE: de_AT.UTF-8
LC_CTYPE: de_AT.UTF-8



general feedback on multibytes:

things work just great. I put 8.0b3 in productional for my
unicode-databases two days ago, cause lower() finally works. No problems
so far. Things work just great. THNX A LOT !!!

a big featurerequest is different locales for different databases. A
locale LC_COLLATE "de_AT.UTF-8" simply makes no sense for
non-unicode-databases and gives very odd results !!

If one wants to use locales and unicode, he needs to run two instances
of postgres. One for his unicodedatabases on one for its standard
databases to get correct sorting in his locale.


GREAT JOB !! THNX A LOT !!

peter



--
mag. peter pilsl
goldfisch.at
IT-management
tel +43 699 1 3574035
fax +43 699 4 3574035
pilsl (AT) goldfisch (DOT) at

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Reply With Quote
  #2  
Old   
Tom Lane
 
Posts: n/a

Default Re: [BUGS] 8.0b3: problems with ILIKE and ~* on multibyte-chars - 10-03-2004 , 12:11 PM






peter pilsl <pilsl (AT) goldfisch (DOT) at> writes:
Quote:
the caseinsensitive patternmatch-operators seems not to work with
multibyte, while lower() and upper()-functions finally works perfect.
It looks to me like iwchareq() in src/backend/utils/adt/like.c still needs
work to handle multibyte characters in a sane fashion. Feel free to
submit a patch ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo (AT) postgresql (DOT) org)


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.