ft:search() returns all index fields



I'm trying to see if I can use the content extraction module for
developing a search interface for a bunch of PDF files. I've tried the
demo app at http://localhost:8080/exist/apps/demo/cex-demo.html, but
this seems to produce inaccurate results. Basically, if a result is
found in the 'page' field of an index on a PDF file, it seems that all
pages of that PDF file are returned. I'm testing with eXist-develop,
revision d9ecd33 on Windows, with Oracle JDK 1.8.0_73.

I've tried to isolate the problem by following the steps outlined in the
eXist blog post at
Attached is a test script that stores a test document, creates a Lucene
index, queries that index, and deletes the test document again.

Basically, with the following index in place:

      <field name="title" store="yes">Indexing</field>
      <field name="para" store="yes">This is the first paragraph.</field>
      <field name="para" store="yes">And a second paragraph.</field>

I would expect the query ft:search('/db/apps/test.xml', 'para:second')
to return only the second <field>. Yet, it appears that whenever a match
is found in a field, *all* fields with the same name are returned for
that document:

     <search score="4.7551346" uri="/db/apps/test.xml">
       <field name="para">This is the first paragraph.</field>
       <field name="para">And a <exist:match xmlns:exist="http://exist.sourceforge.net/NS/exist">second</exist:match> paragraph.</field>
     </search>

Of course, the matching <field>s can be identified by their embedded
<exist:match> element, but I would rather expect that only matching
<field>s are returned in the first place.
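
As a minimal sketch of that workaround, the result can be post-filtered with a plain XPath predicate on the embedded <exist:match> element. The snippet below uses a literal copy of the result shown above in place of a live ft:search() call, so it does not depend on the index being present; the $result variable and the shape of the wrapper element it builds are illustrative assumptions, not part of the Lucene module.

```xquery
xquery version "3.0";

declare namespace exist = "http://exist.sourceforge.net/NS/exist";

(: Literal copy of the result shown above, standing in for the
   ft:search() call so the sketch runs without an index in place. :)
let $result :=
    <search score="4.7551346" uri="/db/apps/test.xml">
        <field name="para">This is the first paragraph.</field>
        <field name="para">And a <exist:match>second</exist:match> paragraph.</field>
    </search>

(: Keep the document URI, but only those fields that actually
   contain a highlighted match. :)
return
    <search>{
        $result/@uri,
        $result/field[exist:match]
    }</search>
```

This should return a <search> element holding only the second <field>. Against a live index, and assuming ft:search() returns the <search> element shown above, the same predicate could presumably be applied directly, as in ft:search('/db/apps/test.xml', 'para:second')//field[exist:match].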

Is this a bug, or am I misunderstanding how ft:search() works?



Exist-open mailing list

Attachment: ft-search-test.xq (1021 bytes)