[BUGS] \w doesn't match non-ASCII letters

mailing.database.pgsql-bugs


#1
Markus Bertheau
[BUGS] \w doesn't match non-ASCII letters - 06-14-2004, 10:01 AM

oocms=# select 'ф' ~ '^\\w$';
?column?
----------
f
(1 запись)

or

oocms=# select 'ä' ~ '^\\w$';
?column?
----------
f
(1 запись)

Both should return true, as does:

oocms=# select 'n' ~ '^\\w$';
?column?
----------
t
(1 запись)

Thanks.

--
Markus Bertheau <twanger (AT) bluetwanger (DOT) de>


---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo (AT) postgresql (DOT) org

#2
Peter Eisentraut
Re: [BUGS] \w doesn't match non-ASCII letters - 06-14-2004, 10:20 AM

Markus Bertheau wrote:
> oocms=# select 'ф' ~ '^\\w$';
> ?column?
> ----------
> f
> (1 запись)

What locale are you using for LC_COLLATE? If it's C or POSIX, you need
to change it and re-initdb.


#3
Tom Lane
Re: [BUGS] \w doesn't match non-ASCII letters - 06-14-2004, 10:31 AM

Peter Eisentraut <peter_e (AT) gmx (DOT) net> writes:
> Markus Bertheau wrote:
>> oocms=# select 'ф' ~ '^\\w$';
>> ?column?
>> ----------
>> f
>> (1 запись)
>
> What locale are you using for LC_COLLATE? If it's C or POSIX, you need
> to change it and re-initdb.

Another likely cause of trouble is that the regexp character
classification stuff is presently based on <ctype.h> functions and thus
cannot work in multibyte encodings.

regards, tom lane
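
To illustrate the point about byte-oriented classification, here is a minimal Python sketch. It is an analogy only, not PostgreSQL's regexp code: `classify_bytewise` and `classify_charwise` are hypothetical helper names, and Python's ASCII-only `bytes.isalpha()` stands in for calling a `<ctype.h>` function on each byte in the C locale. It shows why a UTF-8 encoded 'ф' (two bytes, 0xD1 0x84) is never seen as a letter when each byte is classified on its own.

```python
import re


def classify_bytewise(text, encoding="utf-8"):
    """Classify every encoded byte on its own (stand-in for per-byte isalpha())."""
    data = text.encode(encoding)
    # bytes.isalpha() recognises ASCII letters only, so bytes >= 0x80 are never letters.
    return [data[i:i + 1].isalpha() for i in range(len(data))]


def classify_charwise(text):
    """Classify whole characters (stand-in for wide-character classification)."""
    return [ch.isalpha() for ch in text]


if __name__ == "__main__":
    for sample in ("n", "\u00e4", "\u0444"):  # 'n', 'ä', 'ф'
        print(repr(sample), classify_bytewise(sample), classify_charwise(sample))

    # The same effect with regular expressions: a bytes pattern treats \w as
    # ASCII-only and sees two bytes, while a str pattern sees one letter.
    print(re.fullmatch(rb"\w", "\u0444".encode("utf-8")))   # no match
    print(re.fullmatch(r"\w", "\u0444") is not None)        # match
```

The byte-wise view reports no letters for 'ä' and 'ф' because neither of their UTF-8 bytes is an ASCII letter, which mirrors the `f` results in the original report.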

#4
Markus Bertheau
Re: [BUGS] \w doesn't match non-ASCII letters - 06-14-2004, 11:42 AM

On Mon, 14.06.2004, at 17:25, Tom Lane writes:
> Peter Eisentraut <peter_e (AT) gmx (DOT) net> writes:
>> Markus Bertheau wrote:
>>> oocms=# select 'ф' ~ '^\\w$';
>>> ?column?
>>> ----------
>>> f
>>> (1 запись)
>>
>> What locale are you using for LC_COLLATE? If it's C or POSIX, you need
>> to change it and re-initdb.
>
> Another likely cause of trouble is that the regexp character
> classification stuff is presently based on <ctype.h> functions and thus
> cannot work in multibyte encodings.

This is in a UTF-8 database, so yes, these are multibyte characters. Is
there something planned to support UTF-8 in regexps?

--
Markus Bertheau <twanger (AT) bluetwanger (DOT) de>


#5
Tom Lane
Re: [BUGS] \w doesn't match non-ASCII letters - 06-14-2004, 11:55 AM

Markus Bertheau <twanger (AT) bluetwanger (DOT) de> writes:
> Is there something planned to support UTF-8 in regexps?

It'd be relatively easy to use the <wctype.h> functions here if we
were convinced that pg_mb2wchar() generated exactly the same
wide-character encoding as the C library is expecting for the current
LC_CTYPE setting. In the absence of such a guarantee I think we'd
have to convert the pg_wchar back to multibyte form and then apply
mbstowcs(), which is rather painful, not least because our wide
character support doesn't seem to have any function for converting
back to multibyte form ...

Tatsuo, any thoughts here?

regards, tom lane
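
The fallback route described above (wide character back to multibyte form, then through the library's own multibyte-to-wide conversion, then classification) can be sketched in Python as a rough analogy. This is not PostgreSQL code: `is_alpha_via_multibyte` is a hypothetical name, the input is assumed to be a Unicode code point (which is exactly the assumption the message says cannot be taken for granted for `pg_wchar`), and `bytes.decode()` plus `str.isalpha()` merely play the roles of `mbstowcs()` and a `<wctype.h>` test.

```python
def is_alpha_via_multibyte(code_point, encoding="utf-8"):
    """Classify a character by round-tripping it through its multibyte form.

    Assumption: code_point is a Unicode code point. The thread's concern is
    that an internal wide-character value may not match what the C library
    expects, which is why the detour through multibyte form is needed at all.
    """
    # Step 1: wide character -> multibyte form (the conversion the message
    # notes was missing from the wide character support at the time).
    multibyte = chr(code_point).encode(encoding)

    # Step 2: multibyte -> the library's own character representation
    # (the role mbstowcs() would play in C).
    decoded = multibyte.decode(encoding)

    # Step 3: classify the single resulting character
    # (the role of a <wctype.h> function such as iswalpha()).
    return len(decoded) == 1 and decoded.isalpha()


if __name__ == "__main__":
    for cp in (ord("n"), 0x00E4, 0x0444, ord("1")):
        print(hex(cp), is_alpha_via_multibyte(cp))
```

The extra encode and decode on every classification call is the cost the message calls "rather painful"; trusting the wide-character value directly would skip steps 1 and 2.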
