#1
I am also wondering, now that it's been raised, if we need to issue a "use utf8;" in the startup code, so that literals in the code get the right encoding.
#2
Tom Lane wrote:
> I wrote:
>> It looks to me like basically everywhere in plperl.c that does newSVpv()
>> should follow it with
>>
>>     #if PERL_BCDVERSION >= 0x5006000L
>>         if (GetDatabaseEncoding() == PG_UTF8)
>>             SvUTF8_on(sv);
>>     #endif
>
> Experimentation proved that this was insufficient to fix Vitali's problem --- the string he's unhappy about is actually a hash key entry, and there's no documented way to mark the second argument of hv_store() as being a UTF-8 string. Some digging in the Perl source code found that since at least Perl 5.8.0, hv_fetch and hv_store recognize a negative key length as meaning a UTF-8 key (ick!!), so I used that hack. I am not sure there is any reasonable fix available in Perl 5.6.x.
>
> Attached patch applied to HEAD, but I'm not going to risk back-patching it without some field testing.

Hmm. That negative pointer hack is mighty ugly.

I am also wondering, now that it's been raised, if we need to issue a "use utf8;" in the startup code, so that literals in the code get the right encoding.
#3
I wrote:
> It looks to me like basically everywhere in plperl.c that does newSVpv()
> should follow it with
>
>     #if PERL_BCDVERSION >= 0x5006000L
>         if (GetDatabaseEncoding() == PG_UTF8)
>             SvUTF8_on(sv);
>     #endif

Experimentation proved that this was insufficient to fix Vitali's problem --- the string he's unhappy about is actually a hash key entry, and there's no documented way to mark the second argument of hv_store() as being a UTF-8 string. Some digging in the Perl source code found that since at least Perl 5.8.0, hv_fetch and hv_store recognize a negative key length as meaning a UTF-8 key (ick!!), so I used that hack. I am not sure there is any reasonable fix available in Perl 5.6.x.

Attached patch applied to HEAD, but I'm not going to risk back-patching it without some field testing.
#4
It's not clear whether you actually want to allow people to put UTF-8 characters directly into their source (especially if the database is not in UTF-8 encoding anyway). There is always the \x{xxxx} escape.
#5
"Andrew Dunstan" <andrew (AT) dunslane (DOT) net> writes:
> I am also wondering, now that it's been raised, if we need to issue a "use utf8;" in the startup code, so that literals in the code get the right encoding.

Good question. I took care to ensure that the code strings passed to Perl are marked as UTF8; perhaps that makes it happen implicitly? If not, are there any downsides to issuing "use utf8"?

From each according to his ability. To each according to his ability to litigate.