dbTalk Databases Forums  

Re: How to Output Query Result as UTF-16 XML

comp.databases.oracle.tools comp.databases.oracle.tools


Discuss Re: How to Output Query Result as UTF-16 XML in the comp.databases.oracle.tools forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Malcolm Dew-Jones
 
Posts: n/a

Default Re: How to Output Query Result as UTF-16 XML - 06-10-2008 , 08:31 PM






Bill Wordsworth (bill.wordsworth (AT) gmail (DOT) com) wrote:
: This is what I have done so far:

: In Stored Procedure, I set-
: * v_xml OUT NVARCHAR2
: * v_xml := '<?xml version="1.0" encoding="UTF-16" ?>...'; -- without
: any chr(10) newline

: In Application, I set-
: * header('Content-Type: application/xml; charset=UTF-16');
: * mb_convert_encoding($v_xml, "UTF-16");

You can convert the data all you like while it is in the database
environment, but as soon the data is being extracted from the database
environment into the layer that is retrieving the data from the database,
the character data will be converted from what ever it was in the database
into the charset of the calling environment.

If you are using the application server then the NLS_LANG set at the
operating system level before apache is started will control that final
implicit conversion. Other tools likewise.

Presumably if you store the XML as a BLOB then the bytes in the blob will
be extracted without that implicit conversion. In that case you could
send it to the client as-is and the client would see exactly the byte
sequence you prepared for them in the database (you would of course ensure
the bytes were valid utf-16).

In my experience, when the data is extracted than Oracle insists on doing
this implicit conversion, but if you have control over the environment
then you maybe able to set the settings you need.

Perhaps using JAVA in the database to extract the data, for example into a
file, writing the bytes yourself, thereby bypassing what oracle does,
would get around this problem. Perhaps using the C interface and
extracting the data using a C program (perhaps running on the application
server as a CGI program for example), would provide more flexibility.
Perhaps there is some setting I do not know about in the normal oracle
tools that move data from the database into the "outside world".

Otherwise, you can do something like what I do with utf-8 xml. You
prepare your data (outside the database) using the correct byte sequences
to be utf-8. Then you load it into the database using the same NLS_LANG
setting as the applicaiton server will use to extract it later. Then you
manipulate the data in the data base, _almost_ as normal. The _BIG_
caveat is that your manipulations must not rearrange the bytes in a way
that messes up the byte sequences. In utf-8 using xml that is not as hard
as it sounds. As one of many examples, attribute names can be manipulated
as long as the name is always treated as a single unit, because the
original byte sequence within the name will be unaltered. Then you
extract the data. The database "converts" the data using the reverse of
the original tranformation, but the result is simply to restore all the
original byte sequences. The tool that receives the data doesn't care
what charset the database thought it was sending - as long as the bytes
are valid for what it thinks it is receiving.

utf-16 would be harder but the underlying principle would be the same - as
long as your manipulations keep the required bytes together then the final
extraction will convert the data back to the original (valid) utf-16.

$0.25


Reply With Quote
  #2  
Old   
Malcolm Dew-Jones
 
Posts: n/a

Default Re: How to Output Query Result as UTF-16 XML - 06-10-2008 , 08:31 PM






Bill Wordsworth (bill.wordsworth (AT) gmail (DOT) com) wrote:
: This is what I have done so far:

: In Stored Procedure, I set-
: * v_xml OUT NVARCHAR2
: * v_xml := '<?xml version="1.0" encoding="UTF-16" ?>...'; -- without
: any chr(10) newline

: In Application, I set-
: * header('Content-Type: application/xml; charset=UTF-16');
: * mb_convert_encoding($v_xml, "UTF-16");

You can convert the data all you like while it is in the database
environment, but as soon the data is being extracted from the database
environment into the layer that is retrieving the data from the database,
the character data will be converted from what ever it was in the database
into the charset of the calling environment.

If you are using the application server then the NLS_LANG set at the
operating system level before apache is started will control that final
implicit conversion. Other tools likewise.

Presumably if you store the XML as a BLOB then the bytes in the blob will
be extracted without that implicit conversion. In that case you could
send it to the client as-is and the client would see exactly the byte
sequence you prepared for them in the database (you would of course ensure
the bytes were valid utf-16).

In my experience, when the data is extracted than Oracle insists on doing
this implicit conversion, but if you have control over the environment
then you maybe able to set the settings you need.

Perhaps using JAVA in the database to extract the data, for example into a
file, writing the bytes yourself, thereby bypassing what oracle does,
would get around this problem. Perhaps using the C interface and
extracting the data using a C program (perhaps running on the application
server as a CGI program for example), would provide more flexibility.
Perhaps there is some setting I do not know about in the normal oracle
tools that move data from the database into the "outside world".

Otherwise, you can do something like what I do with utf-8 xml. You
prepare your data (outside the database) using the correct byte sequences
to be utf-8. Then you load it into the database using the same NLS_LANG
setting as the applicaiton server will use to extract it later. Then you
manipulate the data in the data base, _almost_ as normal. The _BIG_
caveat is that your manipulations must not rearrange the bytes in a way
that messes up the byte sequences. In utf-8 using xml that is not as hard
as it sounds. As one of many examples, attribute names can be manipulated
as long as the name is always treated as a single unit, because the
original byte sequence within the name will be unaltered. Then you
extract the data. The database "converts" the data using the reverse of
the original tranformation, but the result is simply to restore all the
original byte sequences. The tool that receives the data doesn't care
what charset the database thought it was sending - as long as the bytes
are valid for what it thinks it is receiving.

utf-16 would be harder but the underlying principle would be the same - as
long as your manipulations keep the required bytes together then the final
extraction will convert the data back to the original (valid) utf-16.

$0.25


Reply With Quote
  #3  
Old   
Malcolm Dew-Jones
 
Posts: n/a

Default Re: How to Output Query Result as UTF-16 XML - 06-10-2008 , 08:31 PM



Bill Wordsworth (bill.wordsworth (AT) gmail (DOT) com) wrote:
: This is what I have done so far:

: In Stored Procedure, I set-
: * v_xml OUT NVARCHAR2
: * v_xml := '<?xml version="1.0" encoding="UTF-16" ?>...'; -- without
: any chr(10) newline

: In Application, I set-
: * header('Content-Type: application/xml; charset=UTF-16');
: * mb_convert_encoding($v_xml, "UTF-16");

You can convert the data all you like while it is in the database
environment, but as soon the data is being extracted from the database
environment into the layer that is retrieving the data from the database,
the character data will be converted from what ever it was in the database
into the charset of the calling environment.

If you are using the application server then the NLS_LANG set at the
operating system level before apache is started will control that final
implicit conversion. Other tools likewise.

Presumably if you store the XML as a BLOB then the bytes in the blob will
be extracted without that implicit conversion. In that case you could
send it to the client as-is and the client would see exactly the byte
sequence you prepared for them in the database (you would of course ensure
the bytes were valid utf-16).

In my experience, when the data is extracted than Oracle insists on doing
this implicit conversion, but if you have control over the environment
then you maybe able to set the settings you need.

Perhaps using JAVA in the database to extract the data, for example into a
file, writing the bytes yourself, thereby bypassing what oracle does,
would get around this problem. Perhaps using the C interface and
extracting the data using a C program (perhaps running on the application
server as a CGI program for example), would provide more flexibility.
Perhaps there is some setting I do not know about in the normal oracle
tools that move data from the database into the "outside world".

Otherwise, you can do something like what I do with utf-8 xml. You
prepare your data (outside the database) using the correct byte sequences
to be utf-8. Then you load it into the database using the same NLS_LANG
setting as the applicaiton server will use to extract it later. Then you
manipulate the data in the data base, _almost_ as normal. The _BIG_
caveat is that your manipulations must not rearrange the bytes in a way
that messes up the byte sequences. In utf-8 using xml that is not as hard
as it sounds. As one of many examples, attribute names can be manipulated
as long as the name is always treated as a single unit, because the
original byte sequence within the name will be unaltered. Then you
extract the data. The database "converts" the data using the reverse of
the original tranformation, but the result is simply to restore all the
original byte sequences. The tool that receives the data doesn't care
what charset the database thought it was sending - as long as the bytes
are valid for what it thinks it is receiving.

utf-16 would be harder but the underlying principle would be the same - as
long as your manipulations keep the required bytes together then the final
extraction will convert the data back to the original (valid) utf-16.

$0.25


Reply With Quote
  #4  
Old   
Malcolm Dew-Jones
 
Posts: n/a

Default Re: How to Output Query Result as UTF-16 XML - 06-10-2008 , 08:31 PM



Bill Wordsworth (bill.wordsworth (AT) gmail (DOT) com) wrote:
: This is what I have done so far:

: In Stored Procedure, I set-
: * v_xml OUT NVARCHAR2
: * v_xml := '<?xml version="1.0" encoding="UTF-16" ?>...'; -- without
: any chr(10) newline

: In Application, I set-
: * header('Content-Type: application/xml; charset=UTF-16');
: * mb_convert_encoding($v_xml, "UTF-16");

You can convert the data all you like while it is in the database
environment, but as soon the data is being extracted from the database
environment into the layer that is retrieving the data from the database,
the character data will be converted from what ever it was in the database
into the charset of the calling environment.

If you are using the application server then the NLS_LANG set at the
operating system level before apache is started will control that final
implicit conversion. Other tools likewise.

Presumably if you store the XML as a BLOB then the bytes in the blob will
be extracted without that implicit conversion. In that case you could
send it to the client as-is and the client would see exactly the byte
sequence you prepared for them in the database (you would of course ensure
the bytes were valid utf-16).

In my experience, when the data is extracted than Oracle insists on doing
this implicit conversion, but if you have control over the environment
then you maybe able to set the settings you need.

Perhaps using JAVA in the database to extract the data, for example into a
file, writing the bytes yourself, thereby bypassing what oracle does,
would get around this problem. Perhaps using the C interface and
extracting the data using a C program (perhaps running on the application
server as a CGI program for example), would provide more flexibility.
Perhaps there is some setting I do not know about in the normal oracle
tools that move data from the database into the "outside world".

Otherwise, you can do something like what I do with utf-8 xml. You
prepare your data (outside the database) using the correct byte sequences
to be utf-8. Then you load it into the database using the same NLS_LANG
setting as the applicaiton server will use to extract it later. Then you
manipulate the data in the data base, _almost_ as normal. The _BIG_
caveat is that your manipulations must not rearrange the bytes in a way
that messes up the byte sequences. In utf-8 using xml that is not as hard
as it sounds. As one of many examples, attribute names can be manipulated
as long as the name is always treated as a single unit, because the
original byte sequence within the name will be unaltered. Then you
extract the data. The database "converts" the data using the reverse of
the original tranformation, but the result is simply to restore all the
original byte sequences. The tool that receives the data doesn't care
what charset the database thought it was sending - as long as the bytes
are valid for what it thinks it is receiving.

utf-16 would be harder but the underlying principle would be the same - as
long as your manipulations keep the required bytes together then the final
extraction will convert the data back to the original (valid) utf-16.

$0.25


Reply With Quote
  #5  
Old   
Malcolm Dew-Jones
 
Posts: n/a

Default Re: How to Output Query Result as UTF-16 XML - 06-10-2008 , 08:31 PM



Bill Wordsworth (bill.wordsworth (AT) gmail (DOT) com) wrote:
: This is what I have done so far:

: In Stored Procedure, I set-
: * v_xml OUT NVARCHAR2
: * v_xml := '<?xml version="1.0" encoding="UTF-16" ?>...'; -- without
: any chr(10) newline

: In Application, I set-
: * header('Content-Type: application/xml; charset=UTF-16');
: * mb_convert_encoding($v_xml, "UTF-16");

You can convert the data all you like while it is in the database
environment, but as soon the data is being extracted from the database
environment into the layer that is retrieving the data from the database,
the character data will be converted from what ever it was in the database
into the charset of the calling environment.

If you are using the application server then the NLS_LANG set at the
operating system level before apache is started will control that final
implicit conversion. Other tools likewise.

Presumably if you store the XML as a BLOB then the bytes in the blob will
be extracted without that implicit conversion. In that case you could
send it to the client as-is and the client would see exactly the byte
sequence you prepared for them in the database (you would of course ensure
the bytes were valid utf-16).

In my experience, when the data is extracted than Oracle insists on doing
this implicit conversion, but if you have control over the environment
then you maybe able to set the settings you need.

Perhaps using JAVA in the database to extract the data, for example into a
file, writing the bytes yourself, thereby bypassing what oracle does,
would get around this problem. Perhaps using the C interface and
extracting the data using a C program (perhaps running on the application
server as a CGI program for example), would provide more flexibility.
Perhaps there is some setting I do not know about in the normal oracle
tools that move data from the database into the "outside world".

Otherwise, you can do something like what I do with utf-8 xml. You
prepare your data (outside the database) using the correct byte sequences
to be utf-8. Then you load it into the database using the same NLS_LANG
setting as the applicaiton server will use to extract it later. Then you
manipulate the data in the data base, _almost_ as normal. The _BIG_
caveat is that your manipulations must not rearrange the bytes in a way
that messes up the byte sequences. In utf-8 using xml that is not as hard
as it sounds. As one of many examples, attribute names can be manipulated
as long as the name is always treated as a single unit, because the
original byte sequence within the name will be unaltered. Then you
extract the data. The database "converts" the data using the reverse of
the original tranformation, but the result is simply to restore all the
original byte sequences. The tool that receives the data doesn't care
what charset the database thought it was sending - as long as the bytes
are valid for what it thinks it is receiving.

utf-16 would be harder but the underlying principle would be the same - as
long as your manipulations keep the required bytes together then the final
extraction will convert the data back to the original (valid) utf-16.

$0.25


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.