dbTalk Databases Forums  

Particular file (PDF) cannot be loaded into Oracle BLOB. Makes sense ?

comp.databases.oracle.server comp.databases.oracle.server


Discuss Particular file (PDF) cannot be loaded into Oracle BLOB. Makes sense ? in the comp.databases.oracle.server forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Syltrem
 
Posts: n/a

Default Particular file (PDF) cannot be loaded into Oracle BLOB. Makes sense ? - 03-15-2011 , 11:36 AM






Hello

I have a particular set of PDF files that cannot be loaded successfully into
an Oracle BLOB column.

I don't know why that is, and Oracle support says that my file is corrupt
???!!!

The file can be opened no problem with Adobe's Acrobat reader.
I can load other PDF files into a BLOB, extract them, and they are still
good to Acrobat.

But these files, no.

What I can see if when they are loaded into the BLOB, some extra characters
are added, and while the BLOB contains the same number of bytes as the
original file, it is in fact truncated due to the extra characters added
(i.e. 200 extras added = 200 truncated at the end of the file).

I loaded the file with DBMS_LOB:LoadCLOBFromFile like this :

declare
l_blob BLOB;
l_bfile BFILE:=BFILENAME('COPY_DB','ORIGINAL_FILE.PDF');
v_dest_offset integer:=1;
v_src_offset integer:=1;
begin
insert into test
values(1,EMPTY_BLOB())
returning pdf into l_Blob;
dbms_lob.fileopen(l_bfile,dbms_lob.FILE_READONLY);
dbms_lob.loadblobfromfile(dest_lob=>l_Blob, src_bfile=>l_bfile,
amount=>dbms_lob.getlength(l_bfile),
dest_offset=>v_dest_offset, src_offset=>v_src_offset);
dbms_lob.fileclose(l_bfile);
commit;
end;
/

If I do this SQL below into a SPOOLed file (because on the screen it is
reall gibberish) from SQLplus, I do see the extra characters when I load the
spooled file into a text editor, and compare that with the PDF file also
loaded into the text editor.
select UTL_RAW.CAST_TO_VARCHAR2(dbms_lob.substr(PDF,2000) ) from test;

A hexadecimal dump of the original imported file, compared to that of the
exported file, also shows the difference.

Oracle support is trying to convince me that this is normal, certain files
cannot be loaded into a BLOB. I don't see that makes any sense. They also
blabber about character set conversions, which should not interfere as this
is not text (CLOB) but binary (BLOB).

Lastly, if I read the file into a dot net program variable, then update the
BLOB column with that variable's value, the file is loaded correctly. And
can be extracted correctly too.



So obviously something is wrong with dbms_lob.loadblobfromfile unless I miss
some imprtant point.

To me, a byte is a byte and as long as the file is loaded as binaty data,
every single byte should be read and moved into the BLOB column without any
change whatsoever.

Thanks for your comments. I could not see anything in Google or Metalink
expaining that some file could not be loaded into a BLOB.



Thanks !

Syltrem



P.S. I also tried loading the file with SQL*Loader and it too adds extra
characters, but does not truncate it (the BLOB colum is larger than the size
of the original file). Upon extraction, the file is seen as corrupt by
Acrobat (of course).

Reply With Quote
  #2  
Old   
Steve Howard
 
Posts: n/a

Default Re: Particular file (PDF) cannot be loaded into Oracle BLOB. Makessense ? - 03-16-2011 , 03:17 PM






On Mar 15, 1:36*pm, "Syltrem" <syltremz... (AT) videotron (DOT) ca> wrote:
Quote:
Hello

I have a particular set of PDF files that cannot be loaded successfully into
an Oracle BLOB column.


Do you have an actual error message?

Reply With Quote
  #3  
Old   
onedbguru
 
Posts: n/a

Default Re: Particular file (PDF) cannot be loaded into Oracle BLOB. Makessense ? - 03-16-2011 , 06:50 PM



On Mar 16, 5:17*pm, Steve Howard <stevedhow... (AT) gmail (DOT) com> wrote:
Quote:
On Mar 15, 1:36*pm, "Syltrem" <syltremz... (AT) videotron (DOT) ca> wrote:

Hello

I have a particular set of PDF files that cannot be loaded successfullyinto
an Oracle BLOB column.

Do you have an actual error message?
Also, what version? it DOES make a difference... a quick search for
pdf blob oracle finds many working examples of storing/retrieving
*LOB data.

Reply With Quote
  #4  
Old   
Syltrem
 
Posts: n/a

Default Re: Particular file (PDF) cannot be loaded into Oracle BLOB. Makes sense ? - 03-17-2011 , 07:34 AM



"Steve Howard" <stevedhoward (AT) gmail (DOT) com> wrote

On Mar 15, 1:36 pm, "Syltrem" <syltremz... (AT) videotron (DOT) ca> wrote:
Quote:
Hello

I have a particular set of PDF files that cannot be loaded successfully
into
an Oracle BLOB column.



Do you have an actual error message?
No error. It's just that some extra characters are added into the BLOB.
For instance, I see that a 0A is added following each 0C (LineFeed added
following each FormFeed).
So if the PDF contains 50 FormFeeds, then 50 LineFeeds are added, making 50
characters to be truncated from the end of the file (the BLOB length always
matches the file's length).

And the problem is with INSERTING the file into the BLOB. The extraction
from Oracle is not the problem, which I can prove by loading the same PDF
with a dot net program, and extracting it successfully with the same
extraction procedure. dbms_lob.loadBLOBfrofile does the damage.

This is with 10.2.0.4

Yesterday, I downloaded Acrobat Pro trial version. I created a new PDF file
with it (from a MS-Word document), loaded it into Oracle, and could not
retrieve it in good condition. I have not gone into the details yet but I
suspect the same problem.
Strangely, I have other PDF coming from different sources, that work without
problem. Actually, I was surprised that I could not use a PLSQL program that
I used for a couple of years to load other PDFs, to load these new ones.
Can't tell what Oracle does not like about these new ones.

And yes I see there are many example out there showing how to load and
extract PDF with Oracle, but non mentions that some PDF (nor any other file
types) are not "supported".

Thanks
Syltrem

Reply With Quote
  #5  
Old   
gazzag
 
Posts: n/a

Default Re: Particular file (PDF) cannot be loaded into Oracle BLOB. Makessense ? - 03-17-2011 , 08:55 AM



On Mar 17, 1:34*pm, "Syltrem" <syltremz... (AT) videotron (DOT) ca> wrote:
Quote:
No error. It's just that some extra characters are added into the BLOB.
For instance, I see that a 0A is added following each 0C (LineFeed added
following each FormFeed).
So if the PDF contains 50 FormFeeds, then 50 LineFeeds are added, making 50
characters to be truncated from the end of the file (the BLOB length always
matches the file's length).

And the problem is with INSERTING the file into the BLOB. The extraction
from Oracle is not the problem, which I can prove by loading the same PDF
with a dot net program, and extracting it successfully with the same
extraction procedure. dbms_lob.loadBLOBfrofile does the damage.

This is with 10.2.0.4

Yesterday, I downloaded Acrobat Pro trial version. I created a new PDF file
with it (from a MS-Word document), loaded it into Oracle, and could not
retrieve it in good condition. I have not gone into the details yet but I
suspect the same problem.
Strangely, I have other PDF coming from different sources, that work without
problem. Actually, I was surprised that I could not use a PLSQL program that
I used for a couple of years to load other PDFs, to load these new ones.
Can't tell what Oracle does not like about these new ones.

And yes I see there are many example out there showing how to load and
extract PDF with Oracle, but non mentions that some PDF (nor any other file
types) are not "supported".

Thanks
Syltrem
Not sure if I posted this properly, so will try again:

Do you have any O/S details? Could it be a MS Windows / Unix CR/LF
conversion issue?

HTH

-g

Reply With Quote
  #6  
Old   
Syltrem
 
Posts: n/a

Default Re: Particular file (PDF) cannot be loaded into Oracle BLOB. Makes sense ? - 03-17-2011 , 09:41 AM



"gazzag" <gareth (AT) jamms (DOT) org> wrote

On Mar 17, 1:34 pm, "Syltrem" <syltremz... (AT) videotron (DOT) ca> wrote:
Quote:
No error. It's just that some extra characters are added into the BLOB.
For instance, I see that a 0A is added following each 0C (LineFeed added
following each FormFeed).
So if the PDF contains 50 FormFeeds, then 50 LineFeeds are added, making
50
characters to be truncated from the end of the file (the BLOB length
always
matches the file's length).

And the problem is with INSERTING the file into the BLOB. The extraction
from Oracle is not the problem, which I can prove by loading the same PDF
with a dot net program, and extracting it successfully with the same
extraction procedure. dbms_lob.loadBLOBfrofile does the damage.

This is with 10.2.0.4

Yesterday, I downloaded Acrobat Pro trial version. I created a new PDF
file
with it (from a MS-Word document), loaded it into Oracle, and could not
retrieve it in good condition. I have not gone into the details yet but I
suspect the same problem.
Strangely, I have other PDF coming from different sources, that work
without
problem. Actually, I was surprised that I could not use a PLSQL program
that
I used for a couple of years to load other PDFs, to load these new ones.
Can't tell what Oracle does not like about these new ones.

And yes I see there are many example out there showing how to load and
extract PDF with Oracle, but non mentions that some PDF (nor any other
file
types) are not "supported".

Thanks
Syltrem

Not sure if I posted this properly, so will try again:

Do you have any O/S details? Could it be a MS Windows / Unix CR/LF
conversion issue?

HTH

-g
Production is on OpenVMS 8.4 / Itanium but I also tried with Windows XP,
Oracle XE and got the same problem.

Syltrem

Reply With Quote
  #7  
Old   
onedbguru
 
Posts: n/a

Default Re: Particular file (PDF) cannot be loaded into Oracle BLOB. Makessense ? - 03-17-2011 , 06:09 PM



On Mar 17, 11:41*am, "Syltrem" <syltremz... (AT) videotron (DOT) ca> wrote:
Quote:
"gazzag" <gar... (AT) jamms (DOT) org> wrote in message

news:828ea98b-bb94-4ee2-88f3-43b037836f54 (AT) k9g2000yqi (DOT) googlegroups.com...
On Mar 17, 1:34 pm, "Syltrem" <syltremz... (AT) videotron (DOT) ca> wrote:











No error. It's just that some extra characters are added into the BLOB.
For instance, I see that a 0A is added following each 0C (LineFeed added
following each FormFeed).
So if the PDF contains 50 FormFeeds, then 50 LineFeeds are added, making
50
characters to be truncated from the end of the file (the BLOB length
always
matches the file's length).

And the problem is with INSERTING the file into the BLOB. The extraction
from Oracle is not the problem, which I can prove by loading the same PDF
with a dot net program, and extracting it successfully with the same
extraction procedure. dbms_lob.loadBLOBfrofile does the damage.

This is with 10.2.0.4

Yesterday, I downloaded Acrobat Pro trial version. I created a new PDF
file
with it (from a MS-Word document), loaded it into Oracle, and could not
retrieve it in good condition. I have not gone into the details yet butI
suspect the same problem.
Strangely, I have other PDF coming from different sources, that work
without
problem. Actually, I was surprised that I could not use a PLSQL program
that
I used for a couple of years to load other PDFs, to load these new ones..
Can't tell what Oracle does not like about these new ones.

And yes I see there are many example out there showing how to load and
extract PDF with Oracle, but non mentions that some PDF (nor any other
file
types) are not "supported".

Thanks
Syltrem

Not sure if I posted this properly, so will try again:

Do you have any O/S details? *Could it be a MS Windows / Unix CR/LF
conversion issue?

HTH

-g

Production is on OpenVMS 8.4 / Itanium but I also tried with Windows XP,
Oracle XE and got the same problem.

Syltrem
How did you get the file onto the OpenVMS server? ftp? Did you use
binary mode? - I think that you probably already know that, but just
checking.

M.
ex-DECcie - RDB Rules.

Reply With Quote
  #8  
Old   
Syltrem
 
Posts: n/a

Default Re: Particular file (PDF) cannot be loaded into Oracle BLOB. Makes sense ? - 03-17-2011 , 07:29 PM



"onedbguru" <onedbguru (AT) yahoo (DOT) com> a écrit dans le message de news:
1ce00a7c-f5d5-4c17-8230-49b41479453e...oglegroups.com...
On Mar 17, 11:41 am, "Syltrem" <syltremz... (AT) videotron (DOT) ca> wrote:
Quote:
"gazzag" <gar... (AT) jamms (DOT) org> wrote in message

news:828ea98b-bb94-4ee2-88f3-43b037836f54 (AT) k9g2000yqi (DOT) googlegroups.com...
On Mar 17, 1:34 pm, "Syltrem" <syltremz... (AT) videotron (DOT) ca> wrote:











No error. It's just that some extra characters are added into the BLOB.
For instance, I see that a 0A is added following each 0C (LineFeed added
following each FormFeed).
So if the PDF contains 50 FormFeeds, then 50 LineFeeds are added, making
50
characters to be truncated from the end of the file (the BLOB length
always
matches the file's length).

And the problem is with INSERTING the file into the BLOB. The extraction
from Oracle is not the problem, which I can prove by loading the same
PDF
with a dot net program, and extracting it successfully with the same
extraction procedure. dbms_lob.loadBLOBfrofile does the damage.

This is with 10.2.0.4

Yesterday, I downloaded Acrobat Pro trial version. I created a new PDF
file
with it (from a MS-Word document), loaded it into Oracle, and could not
retrieve it in good condition. I have not gone into the details yet but
I
suspect the same problem.
Strangely, I have other PDF coming from different sources, that work
without
problem. Actually, I was surprised that I could not use a PLSQL program
that
I used for a couple of years to load other PDFs, to load these new ones.
Can't tell what Oracle does not like about these new ones.

And yes I see there are many example out there showing how to load and
extract PDF with Oracle, but non mentions that some PDF (nor any other
file
types) are not "supported".

Thanks
Syltrem

Not sure if I posted this properly, so will try again:

Do you have any O/S details? Could it be a MS Windows / Unix CR/LF
conversion issue?

HTH

-g

Production is on OpenVMS 8.4 / Itanium but I also tried with Windows XP,
Oracle XE and got the same problem.

Syltrem

How did you get the file onto the OpenVMS server? ftp? Did you use
binary mode? - I think that you probably already know that, but just
checking.

M.
ex-DECcie - RDB Rules.
I know that and it's not the problem.

Conversely, If a windows dot net program loads the file into the BLOB, then
PLSQL extracts it, it's good for Acrobat. Actually, this will create an
extracted file that's an exact copy of the original file. But
dbms_lob.loadblobfromfile breaks the file when loading it into the blob.

It does so on Windows too (Oracle XE) when OpenVMS is nowhere involved (when
the PDF file is local to the Windows system)

Syltrem

Reply With Quote
  #9  
Old   
Sybrand Bakker
 
Posts: n/a

Default Re: Particular file (PDF) cannot be loaded into Oracle BLOB. Makes sense ? - 03-18-2011 , 05:08 PM



On Thu, 17 Mar 2011 09:34:49 -0400, "Syltrem"
<syltremzulu (AT) videotron (DOT) ca> wrote:

Quote:
And yes I see there are many example out there showing how to load and
extract PDF with Oracle, but non mentions that some PDF (nor any other file
types) are not "supported".
There is no such statement as PDF works without issue, at least in
11gR2 and no doubt also in earlier releases.

------------
Sybrand Bakker
Senior Oracle DBA

Reply With Quote
  #10  
Old   
Jonathan Lewis
 
Posts: n/a

Default Re: Particular file (PDF) cannot be loaded into Oracle BLOB. Makes sense ? - 03-19-2011 , 03:39 AM



"Syltrem" <syltremzulu (AT) videotron (DOT) ca> wrote

Quote:
Hello

I have a particular set of PDF files that cannot be loaded successfully
into an Oracle BLOB column.


declare
l_blob BLOB;
l_bfile BFILE:=BFILENAME('COPY_DB','ORIGINAL_FILE.PDF');
v_dest_offset integer:=1;
v_src_offset integer:=1;
begin
insert into test
values(1,EMPTY_BLOB())
returning pdf into l_Blob;
dbms_lob.fileopen(l_bfile,dbms_lob.FILE_READONLY);
dbms_lob.loadblobfromfile(dest_lob=>l_Blob, src_bfile=>l_bfile,
amount=>dbms_lob.getlength(l_bfile),
dest_offset=>v_dest_offset, src_offset=>v_src_offset);
dbms_lob.fileclose(l_bfile);
commit;
end;
/


select UTL_RAW.CAST_TO_VARCHAR2(dbms_lob.substr(PDF,2000) ) from test;

A hexadecimal dump of the original imported file, compared to that of the
exported file, also shows the difference.



Your interpretation seems very unlikely - unless you are using an early
version of Oracle that has some (hypothetical) silly bugs that have been
eliminated in more recent patches releases.

If this happens to pdf files it would also happen arbitrarily to wav, doc,
avi, mov, xls, etc ... and there would probably be lots of people
complaining about LOBs in the database not working.

Checks:

Is the directory you read from NFS mounted or Samba mounted across
platforms ? (Yes, your comment about BLOB/CLOB should make this
irrelevant, probably)

Is the character set of the server the same as the characterset of the
session that runs your SQL*Plus extract ? (And your comments about
BLOB/CLOB aren't relevant here because you're using raw_to_varchar2() to
convert a stream of raw bytes into characters.

Finally: when you spool the raw_to_varchar2() output, it may be
SQL*Plus that is taking every LF and turning it into CRLF. If you want to
compare input and output you'll probably have to do it the hard way, use
rawtohex() on the substr() result, and compare with an od -x (or similar)
of the input file and compare by eye.


Solving the problem - you have a reproducible test case, have you sent it
to support ask them what they see ?
Find the smallest file that reproduces the problem - preferably something
that fits inside your 2000 byte limit, and sent it to Oracle with your
input script, output script, and output file.


--
Regards

Jonathan Lewis
http://jonathanlewis.wordpress.com

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.