dbTalk Databases Forums  

Displaying 'umlaut' character

comp.databases.oracle.misc

#1
dn.perl@gmail.com
Displaying 'umlaut' character - 09-21-2010, 11:50 PM

My aim is to display the ‘special’ (non-ASCII) German
character/diacritic umlaut or diaeresis correctly in a browser. The
browser calls a CGI Perl script which resides on a Linux server. The
browser that calls the Perl script displays Vietnamese characters
correctly (but not the umlaut) without any special setting. The script
sets the NLS_LANG variable to AMERICAN_AMERICA.UTF8 and uses the utf8
module, but that’s about it.

$ENV{'NLS_LANG'}='AMERICAN_AMERICA.UTF8';
This works for Vietnamese characters, but not for the umlaut (ö).

But even before we get to a Perl script, perhaps the LC_CTYPE
environment variable needs to be set correctly. From my Windows laptop,
if I access Oracle through Oracle Query Server, I can see the umlaut.
But if I open a Linux window, start an sqlplus session, and run the
same SQL, I do not see the umlaut correctly. I have tried a few values
for the environment variable LC_CTYPE (like iso_8859_1, en_US,
en_US.iso88591), but with no luck. The surprising thing is that the
umlaut is a much better-known character, while Vietnamese characters
are less well known. Yet the Vietnamese characters are displayed
correctly.

What settings should I use in a Perl script or for a Linux window to
see the umlaut correctly? Please advise.

#2
Ben Morrow
Re: Displaying 'umlaut' character - 09-22-2010, 01:01 AM

Quoth "dn.perl (AT) gmail (DOT) com" <dn.perl (AT) gmail (DOT) com>:
Quote:
My aim is to display the ‘special’ (non-ASCII) German
character/diacritic umlaut or diaeresis correctly in a browser. The
browser calls a CGI Perl script which resides on a Linux server. The
browser that calls the Perl script displays Vietnamese characters
correctly (but not the umlaut) without any special setting. The script
sets the NLS_LANG variable to AMERICAN_AMERICA.UTF8 and uses the utf8
module, but that’s about it.

You almost certainly don't want to do either of those. 'use utf8' does
exactly one thing: it tells Perl your script itself is written in UTF-8.
If that isn't the case you don't want to use it. Perl also doesn't take
any notice of NLS_LANG or any of the other locale envvars unless you ask
it to (and, normally, that's a bad idea). However, it's possible that
whatever database interface you're using does.
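A small sketch to illustrate the 'use utf8' point (the variable names are illustrative). The string below spells out the two bytes a UTF-8 editor saves for "ö"; without 'use utf8', Perl treats a literal "ö" in the script as exactly those two bytes. The core function utf8::decode applies at run time the same decoding that 'use utf8' applies to literals at compile time.

```perl
use strict;
use warnings;

# The two bytes a UTF-8 editor writes for "ö". Without 'use utf8',
# a literal "ö" in the script is seen by Perl as exactly these bytes.
my $literal = "\xC3\xB6";
print length($literal), "\n";      # 2: two separate byte-characters

# 'use utf8' makes Perl decode source literals as UTF-8; utf8::decode
# applies the same decoding to a copy of the string at run time.
my $decoded = $literal;
utf8::decode($decoded);
print length($decoded), "\n";      # 1: a single character
printf "U+%04X\n", ord $decoded;   # U+00F6
```

Neither form decides how the string is later written out; that is up to the output filehandle.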

Quote:
$ENV{'NLS_LANG'}='AMERICAN_AMERICA.UTF8';
This works for Vietnamese characters, but not for the umlaut (ö).

I don't think that's usually a valid locale on a Linux system. Usually
they are of the form 'en_US.UTF-8', but in any case if you need locales
at all you will want to check which locales are available on your
system.
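One way to run that check from Perl, as a minimal sketch: setlocale() from the core POSIX module returns undef when the C library does not have the requested locale. locale_available is a hypothetical helper name; on the shell, 'locale -a' lists the installed locales.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Returns true if the C library accepts $name as an LC_CTYPE locale.
# The previous LC_CTYPE setting is restored before returning.
sub locale_available {
    my ($name) = @_;
    my $previous = setlocale(LC_CTYPE);            # query only
    my $ok = defined setlocale(LC_CTYPE, $name);   # undef if not installed
    setlocale(LC_CTYPE, $previous) if defined $previous;
    return $ok;
}

# The values tried in the original post, plus a typical UTF-8 locale.
for my $name (qw(iso_8859_1 en_US en_US.iso88591 en_US.UTF-8)) {
    printf "%-16s %s\n", $name,
        locale_available($name) ? 'available' : 'not available';
}
```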

Quote:
But even before we get to a Perl script, perhaps the LC_CTYPE
environment variable needs to be set correctly. From my Windows laptop,
if I access Oracle through Oracle Query Server, I can see the umlaut.
But if I open a Linux window, start an sqlplus session, and run the
same SQL, I do not see the umlaut correctly. I have tried a few values
for the environment variable LC_CTYPE (like iso_8859_1, en_US,
en_US.iso88591), but with no luck. The surprising thing is that the
umlaut is a much better-known character, while Vietnamese characters
are less well known. Yet the Vietnamese characters are displayed
correctly.

What settings should I use in a Perl script or for a Linux window to
see the umlaut correctly? Please advise.

OK. What is actually stored in the database (what data types are you
using, and how is the data encoded before being stored)? How are you
getting the data out of the database (the only correct answer here is
'DBI', or possibly a wrapper around that)? Have you read the DBI and
DBD::Oracle docs for anything concerning character encodings? Have you
read perlunitut and the other docs that it refers you to?

FWIW when I do this sort of thing I use Postgres with DBD::Pg, I set the
database encoding to UTF-8 (this is a Pg-specific feature, but I
wouldn't be surprised if Ora has got something similar), I push an
:encoding(utf8) layer onto any filehandles, I make sure to send a
'Content-type: text/html; charset=utf-8' header, and everything Just
Works. There are variations on that which work just as well, but that's
by far the simplest approach.
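The output side of that recipe, as a minimal sketch rather than Ben's actual code. It assumes the data are already Perl character strings (as DBD::Oracle returns them with a UTF-8 NLS_LANG); print_page and the sample strings are hypothetical. It uses :encoding(UTF-8), Encode's strict spelling of the :encoding(utf8) layer mentioned above.

```perl
use strict;
use warnings;

# Writes a small HTML page to $fh: declares UTF-8 in the Content-Type
# header and pushes an :encoding(UTF-8) layer so every character string
# printed afterwards is encoded as UTF-8.
sub print_page {
    my ($fh, @items) = @_;
    binmode $fh, ':encoding(UTF-8)';
    print {$fh} "Content-Type: text/html; charset=utf-8\r\n\r\n";
    print {$fh} "<p>$_</p>\n" for @items;
    return;
}

# Hypothetical sample data, standing in for strings fetched via DBI.
print_page(\*STDOUT, "K\x{F6}ln", "Vi\x{1EC7}t Nam");
```

Because the header and the layer both say UTF-8, "ö" and "ệ" reach the browser in the same encoding it was told to expect.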

Ben

#3
Frank van Bortel
Re: Displaying 'umlaut' character - 09-22-2010, 02:20 AM

On 09/22/2010 06:50 AM, dn.perl (AT) gmail (DOT) com wrote:

Maybe this helps: (shameless self promotion)
http://vanbortel.blogspot.com/2009/0...rs-part-i.html
Last part is here:
http://vanbortel.blogspot.com/2010/0...s-part-iv.html
--

Regards,

Frank van Bortel

#4
Peter J. Holzer
Re: Displaying 'umlaut' character - 09-22-2010, 02:36 AM

On 2010-09-22 06:01, Ben Morrow <ben (AT) morrow (DOT) me.uk> wrote:
Quote:
Quoth "dn.perl (AT) gmail (DOT) com" <dn.perl (AT) gmail (DOT) com>:
My aim is to display the ‘special’ (non-ASCII) German
character/diacritic umlaut or diaeresis correctly in a browser. The
browser calls a CGI Perl script which resides on a Linux server. The
browser that calls the Perl script displays Vietnamese characters
correctly (but not the umlaut) without any special setting. The script
sets the NLS_LANG variable to AMERICAN_AMERICA.UTF8 and uses the utf8
module, but that’s about it.

You almost certainly don't want to do either of those. 'use utf8' does
exactly one thing: it tells Perl your script itself is written in UTF-8.
If that isn't the case you don't want to use it. Perl also doesn't take
any notice of NLS_LANG or any of the other locale envvars unless you ask
it to (and, normally, that's a bad idea). However, it's possible that
whatever database interface you're using does.

$ENV{'NLS_LANG'}='AMERICAN_AMERICA.UTF8';
This works for Vietnamese characters, but not for the umlaut (ö).

I don't think that's usually a valid locale on a Linux system. Usually
they are of the form 'en_US.UTF-8', but in any case if you need locales
at all you will want to check which locales are available on your
system.

The NLS_LANG environment variable is for Oracle. He does need that if he
wants to get anything but US-ASCII out of (or into) an Oracle database.
AMERICAN_AMERICA.UTF8 is a valid locale for Oracle, but for Oracle 9 or
later you should use .AL32UTF8 instead of .UTF8 (.AL32UTF8 is real
UTF-8, .UTF8 is a weird mixture of UTF-8 and UTF-16).
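Applied to the original script, that advice changes one value. A minimal sketch; the connect call is left as a comment because the data source 'MYDB', $user and $pass are placeholders, not values from the thread. Setting the variable in a BEGIN block, ahead of any DBI or DBD::Oracle code, is a conservative way to make sure the Oracle client sees it when it initialises.

```perl
use strict;
use warnings;

# Set NLS_LANG before any Oracle client code is loaded or initialised.
# AL32UTF8 is Oracle's name for standard UTF-8 (Oracle 9 and later).
BEGIN {
    $ENV{NLS_LANG} = 'AMERICAN_AMERICA.AL32UTF8';
}

# Placeholder connection (requires DBI and DBD::Oracle):
#   use DBI;
#   my $dbh = DBI->connect('dbi:Oracle:MYDB', $user, $pass,
#                          { RaiseError => 1 });

print "NLS_LANG=$ENV{NLS_LANG}\n";
```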



Quote:
But even before we get to a Perl script, perhaps the LC_CTYPE
environment variable needs to be set correctly. From my Windows laptop,
if I access Oracle through Oracle Query Server, I can see the umlaut.
But if I open a Linux window,

Whatever "a Linux window" may be. PuTTY? An X server? A VM running on
the Windows host? Whatever it is, NLS_LANG must match the character set
used by the terminal emulator.
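A rough way to see which character set the current Linux session expects, as a sketch using the core POSIX and I18N::Langinfo modules. The locale's codeset is only a proxy: the terminal emulator's own character-set setting must agree with it as well. The %oracle_charset_for table and suggested_nls_lang are hypothetical and cover only a few common codesets.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);

# Hypothetical lookup table: a few locale codesets and the Oracle
# character-set names with the same repertoire. Extend as needed.
my %oracle_charset_for = (
    'UTF-8'       => 'AL32UTF8',
    'ISO-8859-1'  => 'WE8ISO8859P1',
    'ISO-8859-15' => 'WE8ISO8859P15',
);

# Suggest an NLS_LANG value for a codeset, or undef if it is unknown.
sub suggested_nls_lang {
    my ($codeset) = @_;
    my $charset = $oracle_charset_for{ uc $codeset } or return;
    return "AMERICAN_AMERICA.$charset";
}

# Adopt the locale from the environment (LC_ALL, LC_CTYPE, LANG) and ask
# the C library which codeset it implies.
setlocale(LC_CTYPE, '');
my $codeset = langinfo(CODESET);
my $nls     = suggested_nls_lang($codeset);
print defined $nls
    ? "codeset $codeset: try NLS_LANG=$nls\n"
    : "codeset $codeset: no mapping in this sketch\n";
```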

Quote:
start an sqlplus session, and run the same SQL, I do not see the
umlaut correctly. I have tried a few values for the environment variable
LC_CTYPE (like iso_8859_1, en_US, en_US.iso88591), but with no luck.
The surprising thing is that the umlaut is a much better-known
character, while Vietnamese characters are less well known. Yet the
Vietnamese characters are displayed correctly.

What settings should I use in a Perl script or for a Linux window to
see the umlaut correctly? Please advise.

OK. What is actually stored in the database (what data types are you
using, and how is the data encoded before being stored)? How are you
getting the data out of the database (the only correct answer here is
'DBI', or possibly a wrapper around that)? Have you read the DBI and
DBD::Oracle docs for anything concerning character encodings? Have you
read perlunitut and the other docs that it refers you to?

FWIW when I do this sort of thing I use Postgres with DBD::Pg, I set the
database encoding to UTF-8 (this is a Pg-specific feature, but I
wouldn't be surprised if Ora has got something similar),

DBD::Oracle does this if NLS_LANG includes a UTF-8-like character set.
Since he has set that correctly, he gets wide characters back from the
database. The umlauts all have character codes <= 0xFF, so they can be
printed as a single byte, and Perl does that. The Vietnamese characters
have codes >= 0x0100, so Perl converts them to UTF-8 (I bet he has a lot
of "Wide character in print" warnings in log file).
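The behaviour described here can be reproduced without a database. In this sketch the two strings stand in for values fetched through DBD::Oracle, and an in-memory filehandle stands in for the CGI script's STDOUT, which has no :encoding layer. print decides per string: one whose characters all fit in a byte is written as Latin-1 bytes, which a browser expecting UTF-8 cannot display.

```perl
use strict;
use warnings;

# Character strings like those DBD::Oracle returns with a UTF-8 NLS_LANG.
my $umlaut = "\x{F6}";         # ö, U+00F6: fits in one byte
utf8::upgrade($umlaut);        # stored internally as UTF-8, like DB data
my $vietnamese = "\x{1EC7}";   # ệ, U+1EC7: a "wide" character

my $out = '';
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    open my $fh, '>', \$out or die "cannot open in-memory handle: $!";
    print {$fh} $umlaut;       # no layer: downgraded to the Latin-1 byte F6
    print {$fh} $vietnamese;   # no layer: raw UTF-8 bytes E1 BB 87 plus a warning
    close $fh;
}
printf "%vX\n", $out;          # F6.E1.BB.87
print "warning: $_" for @warnings;
```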

Quote:
I push an :encoding(utf8) layer onto any filehandles, I make sure to
send a 'Content-type: text/html; charset=utf-8' header, and everything
Just Works. There are variations on that which work just as well, but
that's by far the simplest approach.

ACK. The OP is probably missing the :encoding(utf8) layer.

hp

#5
Frank van Bortel
Re: Displaying 'umlaut' character - 09-22-2010, 03:13 AM

On 09/22/2010 06:50 AM, dn.perl (AT) gmail (DOT) com wrote:

Apart from what I replied earlier, the correct way to encode
is of course "&ouml;" (without the quotes...)
As this is all ASCII, no problems should arise.
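That conversion can be done with the core Encode module. A minimal sketch; to_ascii_html is a hypothetical helper. It produces numeric references such as &#246; rather than the named entity &ouml;, which browsers treat the same way. It does not escape &, < or >, which real HTML output would also need.

```perl
use strict;
use warnings;
use Encode ();

# Encode a character string as pure ASCII, turning every non-ASCII
# character into a decimal numeric character reference (&#NNN;).
sub to_ascii_html {
    my ($text) = @_;
    return Encode::encode('US-ASCII', $text, Encode::FB_HTMLCREF());
}

print to_ascii_html("K\x{F6}ln, Vi\x{1EC7}t Nam"), "\n";
# K&#246;ln, Vi&#7879;t Nam
```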
--

Regards,

Frank van Bortel

#6
Ben Morrow
Re: Displaying 'umlaut' character - 09-22-2010, 03:19 AM

Quoth "Peter J. Holzer" <hjp-usenet2 (AT) hjp (DOT) at>:
Quote:
On 2010-09-22 06:01, Ben Morrow <ben (AT) morrow (DOT) me.uk> wrote:

You almost certainly don't want to do either of those. 'use utf8' does
exactly one thing: it tells Perl your script itself is written in UTF-8.
If that isn't the case you don't want to use it. Perl also doesn't take
any notice of NLS_LANG or any of the other locale envvars unless you ask
it to (and, normally, that's a bad idea). However, it's possible that
whatever database interface you're using does.

$ENV{'NLS_LANG'}='AMERICAN_AMERICA.UTF8';
This works for Vietnamese characters, but not for the umlaut (ö).

I don't think that's usually a valid locale on a Linux system. Usually
they are of the form 'en_US.UTF-8', but in any case if you need locales
at all you will want to check which locales are available on your
system.

The NLS_LANG environment variable is for Oracle. He does need that if he
wants to get anything but US-ASCII out of (or into) an Oracle database.
AMERICAN_AMERICA.UTF8 is a valid locale for Oracle, but for Oracle 9 or
later you should use .AL32UTF8 instead of .UTF8 (.AL32UTF8 is real
UTF-8, .UTF8 is a weird mixture of UTF-8 and UTF-16).

Ah, I see. (I don't use Oracle.) I was getting confused with NLSPATH
used by catgets(3), I think.

Weird choice of environment variable: I would expect something prefixed
with OC8 or some such. <shrug> I guess it's just part of the 'we own the
whole world' Oracle mentality...

Quote:
FWIW when I do this sort of thing I use Postgres with DBD::Pg, I set the
database encoding to UTF-8 (this is a Pg-specific feature, but I
wouldn't be surprised if Ora has got something similar),

DBD::Oracle does this if NLS_LANG includes a UTF-8-like character set.

In Pg this is a per-database setting indicating how the strings are
stored as well as how they are returned by default; asking for
per-connection on-the-fly reencoding is different. (Not really important
here, I know.)

Quote:
Since he has set that correctly, he gets wide characters back from the
database. The umlauts all have character codes <= 0xFF, so they can be
printed as a single byte, and Perl does that. The Vietnamese characters
have codes >= 0x0100, so Perl converts them to UTF-8 (I bet he has a lot
of "Wide character in print" warnings in log file).

Yup. This presumably means he *is* correctly sending the charset
parameter in the Content-Type header; otherwise the situation would be
exactly reversed.

Ben

#7
Ben Morrow
Re: Displaying 'umlaut' character - 09-22-2010, 03:22 AM

Quoth frank.van.bortel (AT) gmail (DOT) com:
Quote:
Apart from what I replied earlier, the correct way to encode
is of course "&ouml;" (without the quotes...)
As this is all ASCII, no problems should arise.

Also note that if you push :encoding(US-ASCII) with
$PerlIO::encoding::fallback set to Encode::FB_XMLCREF, Perl will do the
conversion for you (well, it'll give you &#xHHHH; entities, but that's
equivalent). (Yes, this is a really nasty interface.)
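A sketch of that layer-based variant, writing to an in-memory filehandle instead of STDOUT. It sets the fallback before opening the handle, on the understanding that PerlIO::encoding reads the value when the layer is pushed, and it loads PerlIO::encoding explicitly first so that the module's own default does not overwrite the setting. It uses the plain Encode::XMLCREF flag; FB_XMLCREF, named above, is the same flag combined with LEAVE_SRC.

```perl
use strict;
use warnings;
use Encode ();
use PerlIO::encoding;   # load it now so its default cannot overwrite ours

my $out = '';
{
    # Replace unencodable characters with &#x...; references instead of
    # the default \x{...} escapes and warnings.
    local $PerlIO::encoding::fallback = Encode::XMLCREF();
    open my $fh, '>:encoding(US-ASCII)', \$out
        or die "cannot open in-memory handle: $!";
    print {$fh} "K\x{F6}ln, Vi\x{1EC7}t Nam\n";
    close $fh;
}
print $out;   # pure ASCII; ö and ệ appear as hexadecimal &#x...; references
```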

Ben

#8
Peter J. Holzer
Re: Displaying 'umlaut' character - 09-22-2010, 08:29 AM

On 2010-09-22 08:13, Frank van Bortel <fbortel (AT) home (DOT) nl> wrote:
Quote:
Apart from what I replied earlier, the correct way to encode
is of course "&ouml;" (without the quotes...)

That's not *the* correct way, just *a* correct way. Encoding it in the
charset indicated in the Content-Type header or a meta tag is equally
correct (and preferable in most circumstances, IMHO).

hp

#9
joel garry
Re: Displaying 'umlaut' character - 09-22-2010, 11:35 AM

On Sep 22, 12:20 am, Frank van Bortel <fbor... (AT) home (DOT) nl> wrote:
Quote:
Maybe this helps: (shameless self promotion)
http://vanbortel.blogspot.com/2009/0...rs-part-i.html
Last part is here:
http://vanbortel.blogspot.com/2010/0...s-part-iv.html

Thanks for that Frank, I'm always forgetting where I've seen the
excellent write-up.

It always needs to be emphasized that using the wrong database
character set creates a ticking time bomb, as Oracle is so
sophisticated about automatic conversions in various circumstances.

jg
--
@home.com is bogus.
http://www.fastcompany.com/1690122/b...ckberry-google

#10
Jürgen Exner
Re: Displaying 'umlaut' character - 09-22-2010, 08:13 PM

Frank van Bortel <fbortel (AT) home (DOT) nl> wrote:
Quote:
Apart from what I replied earlier, the correct way to encode
is of course "&ouml;" (without the quotes...)

If that were true, then I guess we wouldn't need Unicode and all the
gazillion other attempts to represent non-English letters.

jue
