Indexing PDF files using Solr and Tika

- February 15, 2010

I'm trying to index PDF files using Solr. I included a Tika config file to force it to use the PDF parser, but it keeps using EmptyParser. As a result, the metadata is returned correctly, but the content is empty. It's important to mention that I'm using a sole-word.pdf file, not a scanned PDF.

What should I do to get the PDF content in this case, please?

This is the ExtractingRequestHandler configuration I used:

<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">attr_</str>
    <str name="captureAttr">true</str>
    <str name="fmap.content">attr_content</str>
    <str name="literalsOverride">true</str>
    <str name="tika.config">./tika/tika.config</str>
    <str name="parseContext.config">
      <entries>
        <entry class="org.apache.tika.parser.pdf.PDFParserConfig" impl="org.apache.tika.parser.pdf.PDFParserConfig">
          <property name="extractInlineImages" value="true"/>
          <property name="sortByPosition" value="true"/>
        </entry>
      </entries>
    </str>
  </lst>
</requestHandler>
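
For comparison, the Solr Cell documentation describes parseContext.config as a path to a separate parser configuration file rather than inline XML. A minimal sketch of that layout, assuming a hypothetical file named parseContext.xml placed in the core's conf directory (the file name and location are assumptions, and this alone may not explain why EmptyParser is selected):

```xml
<!-- solrconfig.xml: point the handler at an external parse-context file.
     "parseContext.xml" is a hypothetical name chosen for this sketch. -->
<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.content">attr_content</str>
  </lst>
  <str name="parseContext.config">parseContext.xml</str>
</requestHandler>
```

```xml
<!-- parseContext.xml: the PDFParserConfig settings from the question,
     moved into their own file. -->
<entries>
  <entry class="org.apache.tika.parser.pdf.PDFParserConfig" impl="org.apache.tika.parser.pdf.PDFParserConfig">
    <property name="extractInlineImages" value="true"/>
    <property name="sortByPosition" value="true"/>
  </entry>
</entries>
```

Keeping the parser settings in their own file makes it easier to check whether Solr is actually reading them, separately from whatever tika.config is doing.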
