dbTalk Databases Forums  

Particular file (PDF) cannot be loaded into Oracle BLOB. Makes sense ?

comp.databases.oracle.server comp.databases.oracle.server


Discuss Particular file (PDF) cannot be loaded into Oracle BLOB. Makes sense ? in the comp.databases.oracle.server forum.



Reply
 
Thread Tools Display Modes
  #11  
Old   
BicycleRepairman
 
Posts: n/a

Default Re: Particular file (PDF) cannot be loaded into Oracle BLOB. Makessense ? - 03-19-2011 , 09:21 AM






On Mar 17, 8:34*am, "Syltrem" <syltremz... (AT) videotron (DOT) ca> wrote:
Quote:
"Steve Howard" <stevedhow... (AT) gmail (DOT) com> wrote in message

news:8c80cc88-e48f-4c84-8023-856244e660b6 (AT) r19g2000prm (DOT) googlegroups.com...
On Mar 15, 1:36 pm, "Syltrem" <syltremz... (AT) videotron (DOT) ca> wrote:

Hello

I have a particular set of PDF files that cannot be loaded successfully
into
an Oracle BLOB column.

Do you have an actual error message?

No error. It's just that some extra characters are added into the BLOB.
For instance, I see that a 0A is added following each 0C (LineFeed added
following each FormFeed).
So if the PDF contains 50 FormFeeds, then 50 LineFeeds are added, making 50
characters to be truncated from the end of the file (the BLOB length always
matches the file's length).

And the problem is with INSERTING the file into the BLOB. The extraction
from Oracle is not the problem, which I can prove by loading the same PDF
with a dot net program, and extracting it successfully with the same
extraction procedure. dbms_lob.loadBLOBfrofile does the damage.

This is with 10.2.0.4

Yesterday, I downloaded Acrobat Pro trial version. I created a new PDF file
with it (from a MS-Word document), loaded it into Oracle, and could not
retrieve it in good condition. I have not gone into the details yet but I
suspect the same problem.
Strangely, I have other PDF coming from different sources, that work without
problem. Actually, I was surprised that I could not use a PLSQL program that
I used for a couple of years to load other PDFs, to load these new ones.
Can't tell what Oracle does not like about these new ones.

And yes I see there are many example out there showing how to load and
extract PDF with Oracle, but non mentions that some PDF (nor any other file
types) are not "supported".

Thanks
Syltrem
I had this happen a couple of years ago on one system... couldn't be
reproduced in our dev and test environments; the final result was a
character set conversion issue, where the (thick client) software we
were using instructed OCI to set the characterset to UTF-8 (AL32UTF8)
but sql*net was not making the character set conversion properly and
was therefore storing the blob in a different character set than what
we all "knew" was the character set we were using. It took *months* to
trace it down (because no one believed that the thick client
executable (commercial software, in use for a decade) could possibly
be at fault.
So -- as a test -- use something like sql*developer to load the PDF
and see in what environments the resulting file can be viewed. In our
case, a .Net application could see the sql*developer inserted file,
but our thick client app could not.

Reply With Quote
  #12  
Old   
Syltrem
 
Posts: n/a

Default Re: Particular file (PDF) cannot be loaded into Oracle BLOB. Makes sense ? - 03-21-2011 , 08:06 AM






"Jonathan Lewis" <jonathan (AT) jlcomp (DOT) demon.co.uk> wrote

Quote:
"Syltrem" <syltremzulu (AT) videotron (DOT) ca> wrote in message
news:qsNfp.19889$Lj7.60 (AT) en-nntp-03 (DOT) dc1.easynews.com...
Hello

I have a particular set of PDF files that cannot be loaded successfully
into an Oracle BLOB column.


declare
l_blob BLOB;
l_bfile BFILE:=BFILENAME('COPY_DB','ORIGINAL_FILE.PDF');
v_dest_offset integer:=1;
v_src_offset integer:=1;
begin
insert into test
values(1,EMPTY_BLOB())
returning pdf into l_Blob;
dbms_lob.fileopen(l_bfile,dbms_lob.FILE_READONLY);
dbms_lob.loadblobfromfile(dest_lob=>l_Blob, src_bfile=>l_bfile,
amount=>dbms_lob.getlength(l_bfile),
dest_offset=>v_dest_offset, src_offset=>v_src_offset);
dbms_lob.fileclose(l_bfile);
commit;
end;
/


select UTL_RAW.CAST_TO_VARCHAR2(dbms_lob.substr(PDF,2000) ) from test;

A hexadecimal dump of the original imported file, compared to that of the
exported file, also shows the difference.




Your interpretation seems very unlikely - unless you are using an early
version of Oracle that has some (hypothetical) silly bugs that have been
eliminated in more recent patches releases.

It's not my interpretation, it's that of the Oracle support person handling
my SR.
He said the files loaded by loadblobfromfile must be "supported".
Was news to me but he said this is so.


Quote:
If this happens to pdf files it would also happen arbitrarily to wav, doc,
avi, mov, xls, etc ... and there would probably be lots of people
complaining about LOBs in the database not working.

I cannot agree more, and tried to expain this to Oracle but failed
somehow...


Quote:
Checks:

Is the directory you read from NFS mounted or Samba mounted across
platforms ? (Yes, your comment about BLOB/CLOB should make this
irrelevant, probably)

No, the file is local to the server. I did this also with Oracle XE (Win XP)
with the PDF on my C drive, with the same results.
I'll try that again just to be 200% sure as it makes less and less sense.

Quote:
Is the character set of the server the same as the characterset of the
session that runs your SQL*Plus extract ? (And your comments about
BLOB/CLOB aren't relevant here because you're using raw_to_varchar2() to
convert a stream of raw bytes into characters.

Finally: when you spool the raw_to_varchar2() output, it may be
SQL*Plus that is taking every LF and turning it into CRLF. If you want to
compare input and output you'll probably have to do it the hard way, use
rawtohex() on the substr() result, and compare with an od -x (or similar)
of the input file and compare by eye.

Agreed, but it shows the same extra characters in the same spot as I see
them in the file I create when the BLOB is extracted with UTL_FILE.
But it's not CRLF, it's FFLF or LFFF (can't remember if LF is added in front
of FF or after it)

And I know it's not my extraction procedure that's wrong as it can
successfully extract the PDF it it was originally imported to the BLOB with
a dot net program.
Unless of course there really is something weird about character set
conversion (which should not happend with BLOB). I run the load and extract
PLSQL from the same session... so NLS settings can't change.

Quote:
Solving the problem - you have a reproducible test case, have you sent it
to support ask them what they see ?
They say they can reproduce, and the problem is my PDF which is probably
"wrong".

???

Quote:
Find the smallest file that reproduces the problem - preferably something
that fits inside your 2000 byte limit, and sent it to Oracle with your
input script, output script, and output file.


--
Regards

Jonathan Lewis
http://jonathanlewis.wordpress.com


Reply With Quote
  #13  
Old   
prunoki
 
Posts: n/a

Default Re: Particular file (PDF) cannot be loaded into Oracle BLOB. Makessense ? - 03-22-2011 , 02:09 AM



On Mar 21, 3:06*pm, "Syltrem" <syltremz... (AT) videotron (DOT) ca> wrote:
Quote:
"Jonathan Lewis" <jonat... (AT) jlcomp (DOT) demon.co.uk> wrote in message

news:su-dnfjX9eYJ5BnQnZ2dnUVZ8l2dnZ2d (AT) bt (DOT) com...











"Syltrem" <syltremz... (AT) videotron (DOT) ca> wrote in message
news:qsNfp.19889$Lj7.60 (AT) en-nntp-03 (DOT) dc1.easynews.com...
Hello

I have a particular set of PDF files that cannot be loaded successfully
into an Oracle BLOB column.

declare
l_blob * *BLOB;
l_bfile * BFILE:=BFILENAME('COPY_DB','ORIGINAL_FILE.PDF');
v_dest_offset integer:=1;
v_src_offset integer:=1;
begin
insert into test
values(1,EMPTY_BLOB())
returning pdf into l_Blob;
dbms_lob.fileopen(l_bfile,dbms_lob.FILE_READONLY);
dbms_lob.loadblobfromfile(dest_lob=>l_Blob, src_bfile=>l_bfile,
amount=>dbms_lob.getlength(l_bfile),
* * * *dest_offset=>v_dest_offset, src_offset=>v_src_offset);
dbms_lob.fileclose(l_bfile);
commit;
end;
/

select UTL_RAW.CAST_TO_VARCHAR2(dbms_lob.substr(PDF,2000) ) from test;

A hexadecimal dump of the original imported file, compared to that of the
exported file, also shows the difference.

Your interpretation seems very unlikely - unless you are using an early
version of Oracle that has some (hypothetical) silly bugs that have been
eliminated in more recent patches releases.

It's not my interpretation, it's that of the Oracle support person handling
my SR.
He said the files loaded by loadblobfromfile must be "supported".
Was news to me but he said this is so.

If this happens to pdf files it would also happen arbitrarily to wav, doc,
avi, mov, xls, etc ... and there would probably be lots of people
complaining about LOBs in the database not working.

I cannot agree more, and tried to expain this to Oracle but failed
somehow...

Checks:

* *Is the directory you read from NFS mounted or Samba mounted across
platforms ? *(Yes, your comment about BLOB/CLOB should make this
irrelevant, probably)

No, the file is local to the server. I did this also with Oracle XE (Win XP)
with the PDF on my C drive, with the same results.
I'll try that again just to be 200% sure as it makes less and less sense.

* *Is the character set of the server the same as the characterset of the
session that runs your SQL*Plus extract ? (And your comments about
BLOB/CLOB aren't relevant here because you're using raw_to_varchar2() to
convert a stream of raw bytes into characters.

* *Finally: *when you spool the raw_to_varchar2() output, it may be
SQL*Plus that is taking every LF and turning it into CRLF. If you want to
compare input and output you'll probably have to do it the hard way, use
rawtohex() on the substr() result, and compare with an od -x (or similar)
of the input file and compare by eye.

Agreed, but it shows the same extra characters in the same spot as I see
them in the file I create when the BLOB is extracted with UTL_FILE.
But it's not CRLF, it's FFLF or LFFF (can't remember if LF is added in front
of FF or after it)

And I know it's not my extraction procedure that's wrong as it can
successfully extract the PDF it it was originally imported to the BLOB with
a dot net program.
Unless of course there really is something weird about character set
conversion (which should not happend with BLOB). I run the load and extract
PLSQL from the same session... so NLS settings can't change.



Solving the problem - you have a reproducible test case, have you sent it
to support ask them what they see ?

They say they can reproduce, and the problem is my PDF which is probably
"wrong".

???







Find the smallest file that reproduces the problem - preferably something
that fits inside your 2000 byte limit, and sent it to Oracle with your
input script, output script, and output file.

--
Regards

Jonathan Lewis
http://jonathanlewis.wordpress.com
Do you use utl_file to extract the blob? Maybe I am wrong, but I
remember that back in the days I had to switch over to using Java
because utl_file inserted extra piece of lf-s into my file. If you
have a Toad or some other utility, any other means to extract the
blob, you should try it to verify.

Krisztian

Reply With Quote
  #14  
Old   
John Hurley
 
Posts: n/a

Default Re: Particular file (PDF) cannot be loaded into Oracle BLOB. Makessense ? - 03-22-2011 , 06:58 AM



syltrem:

# They say they can reproduce, and the problem is my PDF which is
probably "wrong".

Why don't you give us a link to somewhere where you put up what you
think is a "good version" of your PDF and some other people could
check it out.

Include in also please a copy of the code you are using to get it in
and out ... that way some other people could check it out for you.

Reply With Quote
  #15  
Old   
onedbguru
 
Posts: n/a

Default Re: Particular file (PDF) cannot be loaded into Oracle BLOB. Makessense ? - 03-22-2011 , 05:12 PM



On Mar 22, 8:58*am, John Hurley <hurleyjo... (AT) yahoo (DOT) com> wrote:
Quote:
syltrem:

# They say they can reproduce, and the problem is my PDF which is
probably "wrong".

Why don't you give us a link to somewhere where you put up what you
think is a "good version" of your PDF and some other people could
check it out.

Include in also please a copy of the code you are using to get it in
and out ... that way some other people could check it out for you.
So the question is can you successfully load/unload *any* pdf or just
this particular one using dbms_lob.loadblobfromfile?

Reply With Quote
  #16  
Old   
Syltrem
 
Posts: n/a

Default Re: Particular file (PDF) cannot be loaded into Oracle BLOB. Makes sense ? - 03-24-2011 , 11:47 AM



I found what is wrong with the file.
Someone asked if I was using Samba and yes, I do (in fact it is
Pathworks/Advanced Server which is based on Samba).

When the file is moved from Windows to VMS using Samba, then back to
Windows. No problem.
But the same file (sitting on VMS) is not loaded properly in Oracle for some
reason.

If I copy the file from Windows to VMS with FTP, I can successfully load it
into Oracle and retrieve it intact.

File attributes are different (on VMS) when the file gets there via FTP vs
Samba. Somehow it does not matter to Samba and Acrobat, but Oracle does not
like it.

Problem solved.
I'm still in doubt with all Oracle support told me about the files having to
be "supported" (or be of an Oracle "known" type).
Anyways.

Thanks to all
Syltrem

Reply With Quote
  #17  
Old   
Jeremy
 
Posts: n/a

Default Re: Particular file (PDF) cannot be loaded into Oracle BLOB. Makes sense ? - 03-24-2011 , 12:06 PM



In article <XsLip.726032$Bh.128724 (AT) en-nntp-12 (DOT) dc1.easynews.com>,
syltremzulu (AT) videotron (DOT) ca says...

Quote:
I'm still in doubt with all Oracle support told me about the files
having to
be "supported" (or be of an Oracle "known" type).

I think that is utter rubbish. It's just bits.

--
jeremy

Reply With Quote
  #18  
Old   
onedbguru
 
Posts: n/a

Default Re: Particular file (PDF) cannot be loaded into Oracle BLOB. Makessense ? - 03-24-2011 , 04:58 PM



On Mar 24, 2:06*pm, Jeremy <jeremy0... (AT) gmail (DOT) com> wrote:
Quote:
In article <XsLip.726032$Bh.128... (AT) en-nntp-12 (DOT) dc1.easynews.com>,
syltremz... (AT) videotron (DOT) ca says...



I'm still in doubt with all Oracle support told me about the files
having to
be "supported" (or be of an Oracle "known" type).

I think that is utter rubbish. It's just bits.

--
jeremy
Yes, bits is bits except when you take a binary file and add CR/LF
pairs when it shouldn't. IIRC, there is/was a switch somewhere that
caused SAMBA to leave the file alone an not try to "translate" it - it
behaves as if the file is transferred to the SAMBA share on VMS in
ASCII mode which adds the CR/LF because it is not smart enough to know
that it is a binary vs a text file.

Reply With Quote
  #19  
Old   
Jeremy
 
Posts: n/a

Default Re: Particular file (PDF) cannot be loaded into Oracle BLOB. Makes sense ? - 03-25-2011 , 03:54 AM



In article <20930423-a6d5-4ac2-ab2f-94d0b3c804e4
@x8g2000prh.googlegroups.com>, onedbguru (AT) yahoo (DOT) com says...
Quote:
On Mar 24, 2:06*pm, Jeremy <jeremy0... (AT) gmail (DOT) com> wrote:
In article <XsLip.726032$Bh.128... (AT) en-nntp-12 (DOT) dc1.easynews.com>,
syltremz... (AT) videotron (DOT) ca says...



I'm still in doubt with all Oracle support told me about the files
having to
be "supported" (or be of an Oracle "known" type).

I think that is utter rubbish. It's just bits.


Yes, bits is bits except when you take a binary file and add CR/LF
pairs when it shouldn't. IIRC, there is/was a switch somewhere that
caused SAMBA to leave the file alone an not try to "translate" it - it
behaves as if the file is transferred to the SAMBA share on VMS in
ASCII mode which adds the CR/LF because it is not smart enough to know
that it is a binary vs a text file.
Quite - however the assertion that the content of a BLOB has to be of
some defined and recognized structure is plainly ludicrous.

--
jeremy

Reply With Quote
  #20  
Old   
Syltrem
 
Posts: n/a

Default Re: Particular file (PDF) cannot be loaded into Oracle BLOB. Makes sense ? - 03-25-2011 , 07:37 AM



"onedbguru" <onedbguru (AT) yahoo (DOT) com> wrote

On Mar 24, 2:06 pm, Jeremy <jeremy0... (AT) gmail (DOT) com> wrote:
Quote:
In article <XsLip.726032$Bh.128... (AT) en-nntp-12 (DOT) dc1.easynews.com>,
syltremz... (AT) videotron (DOT) ca says...



I'm still in doubt with all Oracle support told me about the files
having to
be "supported" (or be of an Oracle "known" type).

I think that is utter rubbish. It's just bits.

--
jeremy

Yes, bits is bits except when you take a binary file and add CR/LF
pairs when it shouldn't. IIRC, there is/was a switch somewhere that
caused SAMBA to leave the file alone an not try to "translate" it - it
behaves as if the file is transferred to the SAMBA share on VMS in
ASCII mode which adds the CR/LF because it is not smart enough to know
that it is a binary vs a text file.

Possible, but strangely Samba can retrieve the file from VMS and present it
to Acrobat running on Windows, and Acrobat is happy with it. So it's
unlikely that going to HP to get support on Pathworks (Samba) will go
anywhere as it's fine from their point of view.

But obviously when you *don't* use Samba to read the file, it seems to be
misinterpreted.
When I FTP the file to VMS it gets there as a fixed length record size, 512
bytes.
When using Samba, it's variable size records. So possibly Oracle thinks it's
some ascii file ending with CRLF and does not load it properly. I could
probably play with the file/record attributes and get it to work but I
already worked around the problem so for this time, I'm finished with that.
At least I know what's wrong, as it could happen again with other files and
I'll know what to do.

Syltrem

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.