#2
John, thank you for your reply. I have tried textcopy.exe, and it doesn't seem to matter.

What I am curious about is the type column. The definition from MSDN doesn't say whether it should be defined as char(4) holding '.pdf' or char(3) holding 'pdf':

[@type_colname =] 'type_column_name'
Is the name of a column in qualified_table_name that holds the document type of column_name. This column must be char, nchar, varchar, or nvarchar. It is only used when the data type of column_name is an image. type_column_name is sysname, with no default.

I can confirm that the DocType column is char(4) with '.pdf' in it. Any ideas?

John, are you able to index and search the PDF file I've attached? How long should it take to index one PDF file?

*** Sent via Devdex http://www.devdex.com ***
Don't just participate in USENET...get rewarded for it!
#4
I have found this necessary for full-text indexing of PDFs: modify the registry to set full-text indexing to single threading, because the PDF filter does not support multi-threading.

The key is:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Search\1.0\Gathering Manager\
Change the value of RobotThreadsNumber to 1.

There is a KB article somewhere on it, but I've forgotten the number...

Bob Horkay
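
For anyone who would rather import the registry change than edit it by hand, here is a minimal sketch in Python that only builds the text of a .reg file for the key and value named above. The function name build_reg_file is made up for illustration, and the sketch assumes RobotThreadsNumber is stored as a REG_DWORD, which the post does not confirm; check the type of the existing value in regedit and back up the key before importing anything.

```python
# Sketch only: builds .reg file text for the change described in the post.
# Key path and value name come from the post; everything else is assumed.
KEY_PATH = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Search\1.0\Gathering Manager"
VALUE_NAME = "RobotThreadsNumber"


def build_reg_file(threads: int = 1) -> str:
    """Return .reg file text that sets RobotThreadsNumber to `threads`.

    Assumption: the value is a REG_DWORD. Verify this in regedit first.
    """
    if not 0 < threads <= 0xFFFFFFFF:
        raise ValueError("threads must be a positive 32-bit value")
    lines = [
        "Windows Registry Editor Version 5.00",  # standard .reg header
        "",
        f"[{KEY_PATH}]",
        f'"{VALUE_NAME}"=dword:{threads:08x}',  # DWORD data is 8 hex digits
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the text for review; save it to a .reg file only after checking it.
    print(build_reg_file(1))
```

The script does not touch the registry itself. It prints the text so it can be reviewed, saved to a file, and imported on the server running the search service.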
#6
John, I have tried another PDF (from Acrobat itself) and it worked. I think we have finally located the source of the problem.

According to your explanation: "... was converted improperly from a MS Word doc file." So, is it true that if an MS Word file (or any other file) is properly converted to PDF using third-party software, it will work?

The reason I am asking is that in my development environment, all PDFs are created and provided by various sources; we don't generate the PDFs ourselves. That means we need to handle PDFs that are created by software other than Acrobat.

I am going to run some tests on other PDFs as well, and I will let you know the outcome.

Thanks again,
Louie
#8
Eventually, we decided to extract the text from the PDF files and store the text instead. Since we need to retrieve the PDF files after the search is done anyway, there is no point in storing the actual files twice.