#2
My aim is to display the ‘special’ (non-ASCII) German umlaut (diaeresis) characters correctly in a browser. The browser calls a Perl CGI script which resides on a Linux server. Without any special setting, the browser displays Vietnamese characters correctly, but not the umlaut. The script sets the NLS_LANG variable to AMERICAN_AMERICA.UTF8 and uses the utf8 module, but that’s about it.

$ENV{'NLS_LANG'}='AMERICAN_AMERICA.UTF8';

This works for Vietnamese characters, but not for the umlaut (ö).

But even before we get to a Perl script, perhaps the LC_CTYPE environment variable needs to be set correctly. From my Windows laptop, if I access Oracle through Oracle Query Server, I can see the umlaut. But if I open a Linux terminal window, start an sqlplus session, and run the same SQL, I do not see the umlaut correctly. I have tried a few values for LC_CTYPE (like iso_8859_1, en_US, en_US.iso88591), but with no luck. The surprising thing is that the umlaut is well known and the Vietnamese alphabet is less well known, yet the Vietnamese characters are displayed correctly. What settings should I use in a Perl script or in a Linux terminal to see the umlaut correctly? Please advise.
#3
Maybe this helps: (shameless self promotion)
#4
Quoth "dn.perl (AT) gmail (DOT) com" <dn.perl (AT) gmail (DOT) com>:
> My aim is to display the ‘special’ (non-ASCII) German umlaut (diaeresis) characters correctly in a browser. The browser calls a Perl CGI script which resides on a Linux server. Without any special setting, the browser displays Vietnamese characters correctly, but not the umlaut. The script sets the NLS_LANG variable to AMERICAN_AMERICA.UTF8 and uses the utf8 module, but that’s about it.

You almost certainly don't want to do either of those. 'use utf8' does exactly one thing: it tells Perl your script itself is written in UTF-8. If that isn't the case you don't want to use it. Perl also doesn't take any notice of NLS_LANG or any of the other locale envvars unless you ask it to (and, normally, that's a bad idea). However, it's possible that whatever database interface you're using does.

> $ENV{'NLS_LANG'}='AMERICAN_AMERICA.UTF8';
> This works for Vietnamese characters, but not for the umlaut (ö).

I don't think that's usually a valid locale on a Linux system. Usually they are of the form 'en_US.UTF-8', but in any case if you need locales at all you will want to check which locales are available on your system.
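The point about 'use utf8' can be seen in a minimal sketch. It assumes the script file itself is saved as UTF-8 (so the literal ö is two bytes on disk); the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Assumption: this file is saved as UTF-8, so "ö" is the bytes C3 B6 on disk.
my ($len_without, $len_with);

{
    # No 'use utf8' in this scope: the literal is read as two raw bytes.
    $len_without = length("ö");
}
{
    # 'use utf8' is lexically scoped: the literal is decoded to one character.
    use utf8;
    $len_with = length("ö");
}

print "without 'use utf8': length = $len_without\n";   # length = 2
print "with 'use utf8':    length = $len_with\n";      # length = 1
```

Note that the pragma changes only how literals in the source are read. It has no effect on data read from a database or written to a filehandle.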
> But even before we get to a Perl script, perhaps the LC_CTYPE environment variable needs to be set correctly. From my Windows laptop, if I access Oracle through Oracle Query Server, I can see the umlaut. But if I open a Linux terminal window, start an sqlplus session, and run the same SQL, I do not see the umlaut correctly. I have tried a few values for LC_CTYPE (like iso_8859_1, en_US, en_US.iso88591), but with no luck. The surprising thing is that the umlaut is well known and the Vietnamese alphabet is less well known, yet the Vietnamese characters are displayed correctly. What settings should I use in a Perl script or in a Linux terminal to see the umlaut correctly? Please advise.

OK. What is actually stored in the database (what data types are you using, and how is the data encoded before being stored)? How are you getting the data out of the database (the only correct answer here is 'DBI', or possibly a wrapper around that)? Have you read the DBI and DBD::Oracle docs for anything concerning character encodings? Have you read perlunitut and the other docs that refers you to?

FWIW when I do this sort of thing I use Postgres with DBD::Pg, I set the database encoding to UTF-8 (this is a Pg-specific feature, but I wouldn't be surprised if Ora has got something similar), I push an :encoding(utf8) layer onto any filehandles, I make sure to send a 'Content-type: text/html; charset=utf-8' header, and everything Just Works. There are variations on that which work just as well, but that's by far the simplest approach.
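The output side of that recipe might look roughly like the sketch below. The helper name send_page is hypothetical, the database part is left out, and the layer is spelled ':encoding(UTF-8)' (the strict variant) rather than ':encoding(utf8)'. The sample text stands in for a decoded string coming back from the database.

```perl
use strict;
use warnings;

# Hypothetical helper: write a minimal HTML response to $fh,
# encoding everything that is printed as UTF-8.
sub send_page {
    my ($fh, $text) = @_;
    binmode($fh, ':encoding(UTF-8)') or die "cannot set encoding layer: $!";
    print {$fh} "Content-Type: text/html; charset=utf-8\r\n\r\n";
    print {$fh} "<p>$text</p>\n";
}

# In a CGI script this would be: send_page(\*STDOUT, $string_from_database);
# Here an in-memory handle is used so the resulting bytes can be inspected.
open(my $fh, '>', \my $buf) or die "cannot open in-memory handle: $!";
send_page($fh, "\x{F6} \x{1EC7}");    # U+00F6 (o with diaeresis), U+1EC7 (Vietnamese e)
close($fh);

print index($buf, "\xC3\xB6 \xE1\xBB\x87") >= 0
    ? "both characters were written as UTF-8\n"
    : "unexpected bytes\n";
```

With the layer in place, characters below and above 0xFF are treated the same way, and the header tells the browser to decode the bytes as UTF-8.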
#6
On 2010-09-22 06:01, Ben Morrow <ben (AT) morrow (DOT) me.uk> wrote:
> You almost certainly don't want to do either of those. 'use utf8' does exactly one thing: it tells Perl your script itself is written in UTF-8. If that isn't the case you don't want to use it. Perl also doesn't take any notice of NLS_LANG or any of the other locale envvars unless you ask it to (and, normally, that's a bad idea). However, it's possible that whatever database interface you're using does.
>
>> $ENV{'NLS_LANG'}='AMERICAN_AMERICA.UTF8';
>> This works for Vietnamese characters, but not for the umlaut (ö).
>
> I don't think that's usually a valid locale on a Linux system. Usually they are of the form 'en_US.UTF-8', but in any case if you need locales at all you will want to check which locales are available on your system.

The NLS_LANG environment variable is for Oracle. He does need that if he wants to get anything but US-ASCII out of (or into) an Oracle database. AMERICAN_AMERICA.UTF8 is a valid locale for Oracle, but for Oracle 9 or later you should use .AL32UTF8 instead of .UTF8 (.AL32UTF8 is real UTF-8, .UTF8 is a weird mixture of UTF-8 and UTF-16).
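In the script, that change could be as small as the sketch below. Putting the assignment in a BEGIN block is an assumption on my part, on the grounds that the value has to be in the environment before the Oracle client makes its connection. The commented-out connect line uses placeholder connection details.

```perl
use strict;
use warnings;

# Set the Oracle client character set before any database module is loaded.
BEGIN { $ENV{NLS_LANG} = 'AMERICAN_AMERICA.AL32UTF8'; }

# Placeholder connection details, shown for orientation only:
# use DBI;
# my $dbh = DBI->connect('dbi:Oracle:MYDB', 'user', 'password', { RaiseError => 1 });

print "NLS_LANG is $ENV{NLS_LANG}\n";
```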

> FWIW when I do this sort of thing I use Postgres with DBD::Pg, I set the database encoding to UTF-8 (this is a Pg-specific feature, but I wouldn't be surprised if Ora has got something similar),

DBD::Oracle does this if NLS_LANG includes a UTF-8-like character set.

Since he has set that correctly he gets wide characters back from the database. The umlauts all have character codes <= 0xFF, so they can be printed as a single byte and Perl does that. The Vietnamese characters have codes >= 0x0100, so Perl converts them to UTF-8 (I bet he has a lot of "Wide character in print" warnings in his log file).
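That behaviour can be reproduced without a database. The sketch below prints a string to a handle that has no encoding layer and reports the raw bytes that come out. The helper names are hypothetical, and an in-memory handle stands in for STDOUT.

```perl
use strict;
use warnings;

# Print $text to a handle with no encoding layer.
# Returns the raw bytes written and the number of warnings raised.
sub bytes_printed_raw {
    my ($text) = @_;
    my $buf = '';
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    open(my $fh, '>', \$buf) or die "cannot open in-memory handle: $!";
    print {$fh} $text;
    close($fh);
    return ($buf, scalar @warnings);
}

# Format a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

for my $char ("\x{F6}", "\x{1EC7}") {    # o with diaeresis, Vietnamese e
    my ($bytes, $warned) = bytes_printed_raw($char);
    printf "U+%04X -> %s (warnings: %d)\n", ord($char), hex_bytes($bytes), $warned;
}
# U+00F6 -> F6 (warnings: 0)
# U+1EC7 -> E1 BB 87 (warnings: 1)
```

A browser told to expect UTF-8 cannot decode the lone F6 byte, while the three-byte sequence decodes fine. That matches the symptom in the original question.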
#7
On 09/22/2010 06:50 AM, dn.perl (AT) gmail (DOT) com wrote:
> My aim is to display the ‘special’ (non-ASCII) German umlaut (diaeresis) characters correctly in a browser.
> [...]

Apart from what I replied earlier, the correct way to encode is of course "&ouml;" (without the quotes...)
As this is all ASCII, no problems should arise.
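An ASCII-only encoding of this kind can be produced mechanically. The sketch below is one possible way, not something taken from the post: the hypothetical helper to_ascii_html replaces every non-ASCII character with a numeric character reference. It assumes the input is already a decoded character string, and it does not escape &, < or >.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each non-ASCII character with a numeric
# HTML character reference, so the output contains only ASCII.
sub to_ascii_html {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord($1))/ge;
    return $text;
}

print to_ascii_html("sch\x{F6}n"), "\n";    # prints sch&#246;n
```

Because the result is pure ASCII, it displays correctly whatever charset the page declares, at the cost of bulkier markup.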
#9
On 09/22/2010 06:50 AM, dn.p... (AT) gmail (DOT) com wrote:
> My aim is to display the ‘special’ (non-ASCII) German umlaut (diaeresis) characters correctly in a browser.
> [...]

Maybe this helps: (shameless self promotion)
http://vanbortel.blogspot.com/2009/0...rs-part-i.html

Last part is here:
http://vanbortel.blogspot.com/2010/0...s-part-iv.html