dramatical performance drop for ft:query() in eXist-2.2

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

dramatical performance drop for ft:query() in eXist-2.2

ron.vandenbranden
Hi,

I'm porting an existing app from eXist-2.1 to eXist-2.2, and have
noticed a spectacular performance drop for the ft:query() function.

I'm running queries like:

   collection('/db/apps/test')//p[ft:query(., 'string')]

This successfully returns the nodes I expect, so the index is working
properly. Yet, while this query ran in 0.069 sec. in eXist-2.1, the same
query on the same data takes 23 sec. with eXist-2.2!

When checking Monex in eXist-2.2, I've noticed this in the functions tab:

   ft:query     org.exist.xquery.modules.lucene.Query [7:85]     14152     21.137

I've checked, and the huge number of function calls equals the number of
<p> nodes in my data, thus suggesting that ft:query() is called for each
each <p> element. I don't know if this differs from eXist-2.1 (running
inside Tomcat-7, where Monex doesn't work -- see
https://github.com/wolfgangmm/monex/issues/9).

Yet, I've tried rephrasing the query as:

   ft:query(collection('/db/apps/test')//p, 'string')

...which returns the same correct results, but now in 0.059 sec., the
same lightning speed as both variants of this query in eXist-2.1.
Indeed, with this variant, eXist-2.2's Monex shows there's only 1
ft:query() function call.

I'd very much prefer to keep my queries untouched (they're constructed
dynamically, and it would be quite inconvenient to lose the possibility
to express the ft:query() within a predicate as in the first variant).

Has anything changed w.r.t. the evaluation of ft:query() in eXist-2.2,
or is this unintended behaviour? Could it be an XQuery optimizer issue?

Best,

Ron

--
Ron Van den Branden
CTB   - Centrum voor Teksteditie en Bronnenstudie /
         Centre for Scholarly Editing and Document Studies
KANTL - Koninklijke Academie voor Nederlandse Taal- en Letterkunde /
         Royal Academy of Dutch Language and Literature
Koningstraat 18 b-9000 Gent Belgium
E-mail : [hidden email]


------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Exist-open mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/exist-open
Reply | Threaded
Open this post in threaded view
|

Re: dramatical performance drop for ft:query() in eXist-2.2

wolfgangmm
Hi Ron,

> When checking Monex in eXist-2.2, I've noticed this in the functions tab:
>
>   ft:query     org.exist.xquery.modules.lucene.Query [7:85]     14152     21.137
>
> I've checked, and the huge number of function calls equals the number of
> <p> nodes in my data, thus suggesting that ft:query() is called for each
> each <p> element.

It definitely sounds like an optimiser issue. However, if I run collection(…)//tei:p[ft:query(., "string")] as a standalone expression on a huge collection, it behaves as expected. I also wrapped it into util:eval, but ft:query is still only called once, at least on develop.

It is certainly possible that something in your concrete setup triggers the optimiser to switch into single-node execution mode. I would need to see how the actual expression is being called?

Wolfgang
------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Exist-open mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/exist-open
Reply | Threaded
Open this post in threaded view
|

Re: dramatical performance drop for ft:query() in eXist-2.2

Chris Kemp
Ron and Wolfgang,

Since I experienced this issue a couple of weeks ago while migrating several collections from 1.4 to 2.2 stable on my production server (RHEL 6.6, Java 1.8.0_31) I wanted to toss my information into the mix. I saw a substantial runtime increase in a couple of queries: they ran in roughly half a second on 1.4, and they increased to 96 seconds in 2.2. Using the query profiler I tracked down that ft:query was the offending function in 2.2.

One query in question: collection('/db/apps/app/data')//tei:sp[ft:query(., 'query')

This ran perfectly well in 2.2 on my local machine, with speed comparable to my production server’s 1.4 instance. The queries also worked correctly on an AWS installation of eXist 2.2. Like Ron,

The index configuration files were the same from my production server to my local machine, and all of the data was the same. Monitoring and Profiling on 2.2 indicated that the correct indexes were present. For whatever reason, the queries on 2.2 were running over each node (tei:sp in this case) individually. There are 5853 sp nodes in this dataset, and I noted that ft:query was called 5853 times.

After consulting with several folks (Claudius Teodorescu and Dan McCreary, primarily) and re-reading eXist’s Lucene documentation, I decided to try changing the ft:query call to ft:query-field. Claudius also suggested that more concretely setting the context for ft:query would work in this case. So two variations worked in terms of a "fix":

collection('/db/apps/app/data')//tei:sp[ft:query-field(., 'query')
and
collection('/db/apps/app/data')//tei:sp[ft:query(tei:sp, 'query')

Both return the expected results, and in this situation I'm ok with specifying tei:sp as the context. For other collections I'm in a similar situation as Ron - queries are constructed dynamically. However, I must say that I'm not experiencing these kinds of slow-downs with other collections with dynamically constructed queries.

Again - I only experienced this on my production server, not on my local machine or AWS.

Chris
Reply | Threaded
Open this post in threaded view
|

Re: dramatical performance drop for ft:query() in eXist-2.2

ron.vandenbranden
In reply to this post by wolfgangmm
Hi Wolfgang,

On 11/03/2015 12:19, [hidden email] wrote:
> It definitely sounds like an optimiser issue. However, if I run collection(…)//tei:p[ft:query(., "string")] as a standalone expression on a huge collection, it behaves as expected. I also wrapped it into util:eval, but ft:query is still only called once, at least on develop.
>
> It is certainly possible that something in your concrete setup triggers the optimiser to switch into single-node execution mode. I would need to see how the actual expression is being called?
>

Indeed, I seem to have over-simplified my query a bit. Apparently,
parentheses seem to play a role. In my original query, I was selecting
//(back//note|text) as context nodes for the predicate with the
ft:query() function. This can be easily reproduced in a standalone
expression by just wrapping the context node in parentheses. When I run:

   collection(…)//tei:p[ft:query(., "string")]

on my test corpus, this is as fast as ft:query(collection(...)//tei:p,
"string"), and I can't find any trace of ft:query() in the Functions tab
in Monex.

Yet, when the context node is wrapped in parentheses as in:

   collection(…)//(tei:p)[ft:query(., "string")]

...performance is much lower, and I see the ft:query() function listed
in the Functions tab in Monex, with as many calls as there are <p>
elements in my collection.

This seems to suggest that parentheses are triggering single-node
execution mode, IMO.

Best,

Ron

------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Exist-open mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/exist-open
Reply | Threaded
Open this post in threaded view
|

Re: dramatical performance drop for ft:query() in eXist-2.2

ron.vandenbranden
Should I file this as a bug?

Best,

Ron

On 11/03/2015 14:40, [hidden email] wrote:

> Hi Wolfgang,
>
> On 11/03/2015 12:19, [hidden email] wrote:
>> It definitely sounds like an optimiser issue. However, if I run
>> collection(…)//tei:p[ft:query(., "string")] as a standalone
>> expression on a huge collection, it behaves as expected. I also
>> wrapped it into util:eval, but ft:query is still only called once, at
>> least on develop.
>>
>> It is certainly possible that something in your concrete setup
>> triggers the optimiser to switch into single-node execution mode. I
>> would need to see how the actual expression is being called?
>>
>
> Indeed, I seem to have over-simplified my query a bit. Apparently,
> parentheses seem to play a role. In my original query, I was selecting
> //(back//note|text) as context nodes for the predicate with the
> ft:query() function. This can be easily reproduced in a standalone
> expression by just wrapping the context node in parentheses. When I run:
>
>   collection(…)//tei:p[ft:query(., "string")]
>
> on my test corpus, this is as fast as ft:query(collection(...)//tei:p,
> "string"), and I can't find any trace of ft:query() in the Functions
> tab in Monex.
>
> Yet, when the context node is wrapped in parentheses as in:
>
>   collection(…)//(tei:p)[ft:query(., "string")]
>
> ...performance is much lower, and I see the ft:query() function listed
> in the Functions tab in Monex, with as many calls as there are <p>
> elements in my collection.
>
> This seems to suggest that parentheses are triggering single-node
> execution mode, IMO.
>
> Best,
>
> Ron
>


------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Exist-open mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/exist-open
Reply | Threaded
Open this post in threaded view
|

Re: dramatical performance drop for ft:query() in eXist-2.2

wolfgangmm
> Should I file this as a bug?

I’m looking into it. You may file a bug, but I have it on my radar anyway. It is clearly a regression in 2.2.

Wolfgang
------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Exist-open mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/exist-open
Reply | Threaded
Open this post in threaded view
|

Re: dramatical performance drop for ft:query() in eXist-2.2

ron.vandenbranden
Ok, thanks. While investigating whether other functions were affected as
well, I found unexpected results for range:contains(), too:

     (: 236 results :)
     collection('/db/apps/demo/data')//SPEAKER[range:contains(., 't')]
     (: 0 results :)
     collection('/db/apps/demo/data')//(SPEAKER)[range:contains(., 't')]

Since both might be related, I've included this in the bug report at
https://github.com/eXist-db/exist/issues/522.

Best,

Ron


On 16/03/2015 16:00, Wolfgang Meier wrote:
>> Should I file this as a bug?
> I’m looking into it. You may file a bug, but I have it on my radar anyway. It is clearly a regression in 2.2.
>
> Wolfgang


------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Exist-open mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/exist-open