dbTalk Databases Forums  

[BUGS] BUG #2592: ILIKE does not care about locales

mailing.database.pgsql-bugs mailing.database.pgsql-bugs


Discuss [BUGS] BUG #2592: ILIKE does not care about locales in the mailing.database.pgsql-bugs forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Robert Siemer
 
Posts: n/a

Default [BUGS] BUG #2592: ILIKE does not care about locales - 08-29-2006 , 05:48 AM







The following bug has been logged online:

Bug reference: 2592
Logged by: Robert Siemer
Email address: Robert.Siemer-postgresql.org (AT) backsla (DOT) sh
PostgreSQL version: 8.1.4
Operating system: Linux
Description: ILIKE does not care about locales
Details:

Hi!

As I don't want to risk getting things mixed up here in this very report,
lets assume:
s and t are strings with one 'international' character, one having the lower
case the other upper

lower(s) LIKE lower(t) yields True, as it should

s ILIKE t yields False --> I expect True


I tried this with LC_COLLATE=C and the rest LC_...=es_ES.utf8

dennisb from irc reported LC_everything=sv_SE.UTF-8 with version 8.1.0
having the same problems.

Some "non-normative" examples for s and t:
http://rafb.net/paste/results/bMRfez77.html
and ä Ä, ñ Ñ, ö Ö

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Reply With Quote
  #2  
Old   
Tino Schwarze
 
Posts: n/a

Default Re: [BUGS] BUG #2592: ILIKE does not care about locales - 08-29-2006 , 08:06 AM






On Sun, Aug 27, 2006 at 12:58:00PM +0000, Robert Siemer wrote:

Quote:
Bug reference: 2592
Logged by: Robert Siemer
Email address: Robert.Siemer-postgresql.org (AT) backsla (DOT) sh
PostgreSQL version: 8.1.4
Operating system: Linux
Description: ILIKE does not care about locales
Details:

Hi!

As I don't want to risk getting things mixed up here in this very report,
lets assume:
s and t are strings with one 'international' character, one having the lower
case the other upper

lower(s) LIKE lower(t) yields True, as it should

s ILIKE t yields False --> I expect True


I tried this with LC_COLLATE=C and the rest LC_...=es_ES.utf8

dennisb from irc reported LC_everything=sv_SE.UTF-8 with version 8.1.0
having the same problems.

Some "non-normative" examples for s and t:
http://rafb.net/paste/results/bMRfez77.html
and ä Ä, ñ Ñ, ö Ö
I can confirm this with de_DE.utf8. We currently initialize all our
PostgreSQL database clusters like this:
unset LANG
export LC_ALL=POSIX
initdb --encoding="UNICODE" --lc-collate="de_DE.utf8" --lc-ctype="de_DE@euro"

Only this way, ILIKE and "ORDER BY" work as expected. I don't know about
upper() and lower() though; I only tested ILIKE and ORDER BY.

Tino.

PS: Is there a place to search bugs? I couldn't find one (apart from the
mailing list) last time I was troubleshooting some problem.


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo (AT) postgresql (DOT) org so that your
message can get through to the mailing list cleanly


Reply With Quote
  #3  
Old   
Tom Lane
 
Posts: n/a

Default Re: [BUGS] BUG #2592: ILIKE does not care about locales - 08-29-2006 , 08:31 AM



Tino Schwarze <tino.schwarze (AT) tisc (DOT) de> writes:
Quote:
On Sun, Aug 27, 2006 at 12:58:00PM +0000, Robert Siemer wrote:
I tried this with LC_COLLATE=C and the rest LC_...=es_ES.utf8

dennisb from irc reported LC_everything=sv_SE.UTF-8 with version 8.1.0
having the same problems.

I can confirm this with de_DE.utf8.
ilike currently doesn't work for multibyte encodings (eg utf8). This
bug has been known for a long while, eg,

http://archives.postgresql.org/pgsql...0/msg00002.php

but no one's stepped up to fix it.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match


Reply With Quote
  #4  
Old   
Tino Schwarze
 
Posts: n/a

Default Re: [BUGS] BUG #2592: ILIKE does not care about locales - 08-29-2006 , 08:43 AM



On Tue, Aug 29, 2006 at 09:27:59AM -0400, Tom Lane wrote:
Quote:
Tino Schwarze <tino.schwarze (AT) tisc (DOT) de> writes:
On Sun, Aug 27, 2006 at 12:58:00PM +0000, Robert Siemer wrote:
I tried this with LC_COLLATE=C and the rest LC_...=es_ES.utf8

dennisb from irc reported LC_everything=sv_SE.UTF-8 with version 8.1.0
having the same problems.

I can confirm this with de_DE.utf8.

ilike currently doesn't work for multibyte encodings (eg utf8). This
bug has been known for a long while, eg,

http://archives.postgresql.org/pgsql...0/msg00002.php

but no one's stepped up to fix it.
The "use lower() for both strings" solution sounds reasonably simple to
me, but I'm not familiar with PgSQLs sources...

The funny thing is: It currently works, if you set the locale to de_DE
(non-UTF8)!

pg_controldata output:
LC_COLLATE: de_DE.utf8
LC_CTYPE: de_DE@euro

Output from test database:
test=# select * from test where test ilike '%ä%';
test
---------
äbcd
Äbbcd
bläbbcd
BLÄbbcd
(4 rows)

test=# select * from test where test ilike '%Ä%';
test
---------
äbcd
Äbbcd
bläbbcd
BLÄbbcd
(4 rows)

-> same result, both upper and lower ä match.

test=# select * from test where test ilike '%ö%';
test
------
ö
Ö
(2 rows)

test=# select * from test where test ilike '%Ö%';
test
------
ö
Ö
(2 rows)

test=# select * from test order by test desc;
test
---------
Ü
ü
Ö
ö
MÜLLER
müller
BLÄbbcd
bläbbcd
afbcd
aebcd
äbcd
abcd
Äbbcd
aabcd
(14 rows)


Everything is fine with this weird locale setting. (All other
locales are set to C)...

Bye,

Tino.


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.