ft:search() returns all index fields

ron.vandenbranden
Hi,

I'm trying to see if I can use the content extraction module for
developing a search interface for a bunch of PDF files. I've tried the
demo app at http://localhost:8080/exist/apps/demo/cex-demo.html, but
this seems to produce inaccurate results. Basically, if a result is
found in the 'page' field of an index on a PDF file, it seems that all
pages of that PDF file are returned. I'm testing with eXist-develop,
revision d9ecd33 on Windows, with Oracle JDK 1.8.0_73.

I've tried to isolate the problem by following the steps outlined in the
eXist blog post at
http://exist-db.org/exist/apps/wiki/blogs/eXist/ContentExtraction.
Attached is a test script that stores a test document, creates a Lucene
index, queries that index, and deletes the test document again.

With the following index in place:

   ft:index('/db/apps/test.txt',
    <doc>
      <field name="title" store="yes">Indexing</field>
      <field name="para" store="yes">This is the first paragraph.</field>
      <field name="para" store="yes">And a second paragraph.</field>
    </doc>)

I would expect the query ft:search('/db/apps/test.txt', 'para:second')
to return only the second <field>. Yet, it appears that whenever a match
is found in a field, *all* fields with the same name are returned for
that document:

   <results>
     <search score="4.7551346" uri="/db/apps/test.xml">
       <field name="para">This is the first paragraph.</field>
       <field name="para">And a <exist:match xmlns:exist="http://exist.sourceforge.net/NS/exist">second</exist:match> paragraph.</field>
     </search>
   </results>

Of course, the matching <field>s can be identified by their embedded
<exist:match> element, but I would expect only the matching <field>s to
be returned in the first place.
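
As a stopgap, the non-matching fields can be filtered out after the fact. The following is only a minimal sketch of that workaround, assuming the result shape shown above. The result fragment is inlined in a variable (named $results here purely for illustration) so that the snippet runs without an index; in practice it would come from the ft:search() call.

```xquery
xquery version "3.0";

declare namespace exist = "http://exist.sourceforge.net/NS/exist";

(: Result fragment copied from the output above; in real use this
   would be the return value of ft:search(). :)
let $results :=
    <results>
      <search score="4.7551346" uri="/db/apps/test.xml">
        <field name="para">This is the first paragraph.</field>
        <field name="para">And a <exist:match>second</exist:match> paragraph.</field>
      </search>
    </results>
return
    <results>{
        for $search in $results/search
        return
            <search>{
                (: keep the score and uri attributes :)
                $search/@*,
                (: keep only the fields that contain a highlighted match :)
                $search/field[.//exist:match]
            }</search>
    }</results>
```

This returns a single <search> element holding only the second "para" field, but it works around the behaviour rather than explaining it.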

Is this a bug, or am I misunderstanding how ft:search() works?

Best,

Ron

Attachment: ft-search-test.xq (1021 bytes)