Indexing PDF files using Solr and Tika
I'm trying to index PDF files using Solr. I included a Tika config file to force it to use the PDF parser, but it keeps using EmptyParser. As a result, the metadata is returned correctly, but the content is empty. It's important to mention that I'm using a sole-word.pdf file, not a scanned PDF.
What should I do to get the PDF content in this case, please?
This is the ExtractingRequestHandler I used:
<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">attr_</str>
    <str name="captureAttr">true</str>
    <str name="fmap.content">attr_content</str>
    <str name="literalsOverride">true</str>
    <str name="tika.config">./tika/tika.config</str>
    <str name="parseContext.config">
      <entries>
        <entry class="org.apache.tika.parser.pdf.PDFParserConfig" impl="org.apache.tika.parser.pdf.PDFParserConfig">
          <property name="extractInlineImages" value="true"/>
          <property name="sortByPosition" value="true"/>
        </entry>
      </entries>
    </str>
  </lst>
</requestHandler>
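For context, the contents of ./tika/tika.config are not shown in the post. A Tika config file that maps PDFs to the PDF parser would typically look something like the minimal sketch below. This is an assumption about the general shape of such a file, not the actual file in use here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch of a Tika config; the real tika.config is not shown in the post. -->
<properties>
  <parsers>
    <!-- Route application/pdf documents to Tika's PDF parser. -->
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <mime>application/pdf</mime>
    </parser>
  </parsers>
</properties>
```

A file like this only takes effect if the class it names can be loaded by Solr, so the Tika parser jars need to be on the classpath for the extraction handler.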