#2
hjp | Sysadmin WSR | hjp (AT) hjp (DOT) at | http://www.hjp.at/
"Ignorance has no successful track record." -- Bill Code on asrg (AT) irtf (DOT) org
#3
Dear Oracle experts,

Maybe one of you can explain a phenomenon we observe: We are using an Oracle 9i database on a Solaris server as part of a web-based system with both Perl and Java programs accessing the database. The database uses the WE8ISO8859P1 (Latin-1) character set, in which the Euro sign does not exist. Nevertheless, the system does handle the Euro sign. We can insert Euro signs through web forms and get them properly displayed in web pages generated from database queries, as long as this is done through Perl programs.

When doing queries from Java, the situation is different:
- For VARCHAR fields, the ResultSet method getString() yields 0x0080 for a Euro sign (0x80 is the Windows-1252 code for the Euro sign).
- For CLOBs (CLOB object obtained by the OracleResultSet method getCLOB()), the result depends on the kind of access:
  - When using getAsciiStream() and creating an InputStreamReader with encoding ISO-8859-1 from it, we get 0x00FD for a Euro sign.
  - When using getCharacterStream(), we get 0xFFFD (the Unicode replacement character, which stands for an invalid or undecodable character).

What does the Oracle database actually do with the Euro signs? How can the observed effects be explained?

Franz
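The three access paths compared in the question can be reproduced with a small diagnostic program that prints every character as its UTF-16 code unit. This is a sketch only: it uses the standard java.sql API (ResultSet.getClob() instead of the Oracle-specific OracleResultSet.getCLOB()), the table t with columns v (VARCHAR2) and c (CLOB) is a hypothetical placeholder, and the JDBC URL, user and password are assumed to be passed on the command line.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.sql.Clob;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class EuroAccessPaths {
    /** Formats each UTF-16 code unit as 0xNNNN, e.g. "0x0080". */
    static String hexUnits(String s) {
        if (s == null) return "NULL";
        StringBuilder sb = new StringBuilder();
        for (char ch : s.toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("0x%04X", (int) ch));
        }
        return sb.toString();
    }

    /** Reads a Reader to the end. */
    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws SQLException, IOException {
        // args[0..2]: JDBC URL, user, password. Table t (v VARCHAR2, c CLOB) is hypothetical.
        try (Connection con = DriverManager.getConnection(args[0], args[1], args[2]);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT v, c FROM t")) {
            while (rs.next()) {
                // 1) VARCHAR column read with getString()
                System.out.println("getString():          " + hexUnits(rs.getString(1)));
                Clob clob = rs.getClob(2);
                if (clob == null) continue;
                // 2) CLOB bytes from getAsciiStream(), decoded as ISO-8859-1
                try (Reader r = new InputStreamReader(clob.getAsciiStream(), StandardCharsets.ISO_8859_1)) {
                    System.out.println("getAsciiStream():     " + hexUnits(readAll(r)));
                }
                // 3) CLOB characters from getCharacterStream()
                try (Reader r = clob.getCharacterStream()) {
                    System.out.println("getCharacterStream(): " + hexUnits(readAll(r)));
                }
            }
        }
    }
}
```

Printing code units rather than the characters themselves avoids a second, unrelated conversion by the console encoding, which could otherwise hide or fake a Euro sign.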
#4
> We are using an Oracle 9i database on a Solaris server as part of a web-based system with both Perl and Java programs accessing the database. The database uses the WE8ISO8859P1 (Latin-1) character set, in which the Euro sign does not exist. Nevertheless, the system does handle the Euro sign. We can insert Euro signs through web forms and get them properly displayed in web pages generated from database queries, as long as this is done through Perl programs.

Let me guess:

1) The web forms have the content type "text/html; charset=iso-8859-1".
2) The Perl scripts have NLS_LANG set to use the WE8ISO8859P1 character set and just shuffle byte strings between the database and the browser.

> When doing queries from Java, the situation is different:
> - For VARCHAR fields, the ResultSet method getString() yields 0x0080 for a Euro sign (0x80 is the Windows-1252 code for the Euro sign).

For compatibility with Internet Explorer, most browsers assume charset=windows-1252 when they encounter charset=iso-8859-1. Therefore, when a user enters a Euro sign in a form, the browser will transmit it as "%80" (instead of rejecting it). Similarly, when the browser reads the byte 0x80 in a page, it will display a Euro sign (instead of an "unknown character" symbol). So the Perl code will receive a character 0x80 from the form, store it in the database, retrieve it again and send it to the browser, and the browser will display a Euro sign. As long as the Perl code doesn't try to do anything with that character, it won't notice that it isn't really a Euro sign.
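The byte-level round trip described above can be checked in plain Java without a database. A minimal sketch, assuming (as guessed above) that the page is labelled ISO-8859-1 but the browser actually encodes and decodes with windows-1252. It also shows that decoding the byte 0x80 as ISO-8859-1 gives the C1 control character U+0080, not the real Euro sign U+20AC, which is consistent with the 0x0080 that getString() returns. The class and method names are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EuroRoundTrip {
    // The encoding browsers really use for pages labelled "iso-8859-1".
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Byte(s) the browser submits for the given text (a Euro sign is sent as %80). */
    static byte[] browserEncode(String typed) {
        return typed.getBytes(CP1252);
    }

    /** How a Latin-1 client interprets those bytes: byte 0x80 becomes U+0080. */
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    /** What the browser shows when the same character is sent back as ISO-8859-1. */
    static String browserDisplay(String fromServer) {
        byte[] onTheWire = fromServer.getBytes(StandardCharsets.ISO_8859_1);
        return new String(onTheWire, CP1252);
    }

    public static void main(String[] args) {
        byte[] fromForm = browserEncode("\u20AC");   // Euro sign typed by the user
        String stored = decodeAsLatin1(fromForm);     // what Latin-1 code sees
        String shown = browserDisplay(stored);        // what the user sees again
        System.out.printf("form byte:     0x%02X%n", fromForm[0] & 0xFF);      // 0x80
        System.out.printf("Latin-1 char:  U+%04X%n", (int) stored.charAt(0)); // U+0080
        System.out.printf("browser shows: U+%04X%n", (int) shown.charAt(0));  // U+20AC
    }
}
```

The point of the sketch is that nothing along the way ever holds U+20AC except the browser: the server side only ever sees the byte 0x80 or the character U+0080.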
|
That would probably work in Java, too, if you just passed on that 0x0080 character to the browser and claimed that the encoding is ISO-8859-1. |
|
Since you noticed that it isn't working, you are probably doing something different: maybe you use UTF-8 as the output encoding, maybe just some more complex processing.
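The difference between passing the character through as ISO-8859-1 and switching to UTF-8 output shows up directly in the bytes. A sketch, assuming the Java code writes the string with one of these two encodings: as ISO-8859-1, U+0080 becomes the single byte 0x80, which a browser treating the page as windows-1252 renders as a Euro sign; as UTF-8 it becomes the two bytes C2 80, which a browser decoding UTF-8 renders as the invisible control character U+0080. The helper toRealEuro is a hypothetical workaround, not something suggested in the thread: it reinterprets Latin-1 code units as windows-1252 so that a real U+20AC (E2 82 AC in UTF-8) is emitted.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class OutputEncoding {
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Formats bytes as space-separated hex, e.g. "C2 80". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /**
     * Hypothetical workaround: treat each Latin-1 code unit as a windows-1252 byte,
     * so U+0080 becomes the real Euro sign U+20AC. Only meaningful for strings whose
     * characters are all at most U+00FF (anything else would turn into '?').
     */
    static String toRealEuro(String latin1Text) {
        return new String(latin1Text.getBytes(StandardCharsets.ISO_8859_1), CP1252);
    }

    public static void main(String[] args) {
        String fromDb = "\u0080";  // what getString() returns for the stored "Euro sign"
        System.out.println("as ISO-8859-1: " + hex(fromDb.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println("as UTF-8:      " + hex(fromDb.getBytes(StandardCharsets.UTF_8)));
        System.out.println("fixed, UTF-8:  " + hex(toRealEuro(fromDb).getBytes(StandardCharsets.UTF_8)));
    }
}
```

Run as written, this prints 80, C2 80 and E2 82 AC respectively. Note that any such reinterpretation has to be applied consistently on input as well, or data written by Java and by Perl will diverge.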
|
> - For CLOBs (CLOB object obtained by the OracleResultSet method getCLOB()), the result depends on the kind of access:
>   - When using getAsciiStream() and creating an InputStreamReader with encoding ISO-8859-1 from it, we get 0x00FD for a Euro sign.

I assume that 0x00FD is just 0xFFFD truncated to 8 bits.
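The truncation guess is simple arithmetic on the low byte. A sketch that checks it, assuming (and this is only the guess above, not verified driver behaviour) that the ASCII stream emits the low 8 bits of each UTF-16 code unit: 0xFFFD becomes the byte 0xFD, and an ISO-8859-1 InputStreamReader turns that byte into U+00FD (ý). The method lowByte is a hypothetical model, not an Oracle API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class TruncationCheck {
    /** Hypothetical model of the ASCII stream: keep only the low 8 bits of a char. */
    static byte lowByte(char c) {
        return (byte) (c & 0xFF);
    }

    /** Decodes a single byte the way the question does: InputStreamReader with ISO-8859-1. */
    static char readAsLatin1(byte b) throws IOException {
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(new byte[] {b}),
                                              StandardCharsets.ISO_8859_1)) {
            return (char) r.read();
        }
    }

    public static void main(String[] args) throws IOException {
        char replacement = '\uFFFD';              // Unicode replacement character
        byte truncated = lowByte(replacement);    // 0xFD
        char seen = readAsLatin1(truncated);      // U+00FD, LATIN SMALL LETTER Y WITH ACUTE
        System.out.printf("0x%04X -> byte 0x%02X -> 0x%04X%n",
                (int) replacement, truncated & 0xFF, (int) seen);
    }
}
```

This prints "0xFFFD -> byte 0xFD -> 0x00FD", matching the observed value, which makes the guess plausible but does not prove it.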
|
>   - When using getCharacterStream(), we get 0xFFFD (the Unicode replacement character, which stands for an invalid or undecodable character).

This is strange. It should also be 0x0080, if that is what is stored in the database (did you check that? Use Perl or the SQL dump() function).
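Checking what is actually stored can also be done from Java with Oracle's DUMP function: DUMP(expr, 16) returns the internal bytes in hexadecimal, e.g. "Typ=1 Len=1: 80" for a single byte 0x80. A sketch, assuming a hypothetical table products with a VARCHAR2 column price_text and a CLOB column notes; since DUMP is not meant for LOB columns, the sketch applies it to DBMS_LOB.SUBSTR of the CLOB instead. Connection details and the row id come from the command line.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class DumpCheck {
    // Hypothetical table and columns: products(id, price_text VARCHAR2, notes CLOB).
    // DUMP(x, 16) shows the stored bytes in hex; the CLOB is cut down with DBMS_LOB.SUBSTR first.
    static final String SQL =
        "SELECT DUMP(price_text, 16), DUMP(DBMS_LOB.SUBSTR(notes, 100, 1), 16) "
        + "FROM products WHERE id = ?";

    /** Parses DUMP(..., 16) output such as "Typ=1 Len=3: 31,80,32" into byte values. */
    static int[] parseDumpHex(String dump) {
        int colon = (dump == null) ? -1 : dump.indexOf(':');
        if (colon < 0) return new int[0];   // a NULL value yields no byte list
        String hex = dump.substring(colon + 1).trim();
        if (hex.isEmpty()) return new int[0];
        String[] parts = hex.split(",");
        int[] bytes = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            bytes[i] = Integer.parseInt(parts[i].trim(), 16);
        }
        return bytes;
    }

    static boolean containsByte(int[] bytes, int value) {
        for (int b : bytes) {
            if (b == value) return true;
        }
        return false;
    }

    public static void main(String[] args) throws SQLException {
        // args: JDBC URL, user, password, row id
        try (Connection con = DriverManager.getConnection(args[0], args[1], args[2]);
             PreparedStatement ps = con.prepareStatement(SQL)) {
            ps.setLong(1, Long.parseLong(args[3]));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    for (int col = 1; col <= 2; col++) {
                        String dump = rs.getString(col);
                        boolean euro = containsByte(parseDumpHex(dump), 0x80);
                        System.out.println(dump + (euro ? "   <- contains byte 0x80 (windows-1252 Euro)" : ""));
                    }
                }
            }
        }
    }
}
```

If the VARCHAR2 dump shows 80 but the CLOB dump shows something else, the CLOB path stored the value differently and the 0xFFFD is not a read-side artefact.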