Indexing PDF files using Solr and Tika


I'm trying to index PDF files using Solr. I included a Tika config file to force the use of the PDF parser, but Solr keeps using the EmptyParser. As a result, the metadata is returned correctly, but the content is empty. It's important to mention that I'm using a sole-word.pdf file, not a scanned PDF.

What should I do to get the PDF content in this case?

This is the ExtractingRequestHandler I used:

<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">attr_</str>
    <str name="captureAttr">true</str>
    <str name="fmap.content">attr_content</str>
    <str name="literalsOverride">true</str>
    <str name="tika.config">./tika/tika.config</str>
    <str name="parseContext.config">
      <entries>
        <entry class="org.apache.tika.parser.pdf.PDFParserConfig" impl="org.apache.tika.parser.pdf.PDFParserConfig">
          <property name="extractInlineImages" value="true"/>
          <property name="sortByPosition" value="true"/>
        </entry>
      </entries>
    </str>
  </lst>
</requestHandler>
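The contents of ./tika/tika.config were not included in the question. As an assumption only, a Tika config that maps PDFs to PDFParser would look roughly like the sketch below, written in the Tika 1.x XML config format. It is illustrative, not the asker's actual file, and limiting the config to a single parser means other file types would not be parsed through it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed contents of ./tika/tika.config; not taken from the original post -->
<properties>
  <parsers>
    <!-- Route application/pdf to PDFParser instead of letting detection fall back to EmptyParser -->
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <mime>application/pdf</mime>
    </parser>
  </parsers>
</properties>
```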
