optimizer regression in eXist-2.2 and eXist-3.0.RC1

ron.vandenbranden
Hi,

While I was trying to move an app from eXist-2.1 to a newer version, I've noticed a significant performance drop in eXist-2.2 and eXist-3.0.RC1. I've been able to isolate the issue: there seems to be a problem when nodes are first stored in variables, and those variables are queried later on. I'll illustrate with simplified sample queries below. The results are consistent in eXist-2.2 and eXist-3.0.RC1.

This query (which can be run directly on the 'Demo Apps' collection) returns 419 results in under 0.020 sec. (according to eXide):
  let $a := //SPEECH[ft:query(., 'hamlet')]
  return $a
However, if the <SPEECH> nodes are first stored in a variable, which is subsequently queried, the same results are returned, but the query takes more than 2 sec.:
  let $a := //SPEECH
  return $a[ft:query(., 'hamlet')]
This suggests that the optimizer does not optimize the latter query as well as the former. These are small queries on small data sets, but real-life queries can slow down to a couple of minutes. There is a workaround (express every query as a one-liner), but my app requires this level of indirection in order to compose complex queries.
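A minimal sketch of how the two variants could be timed side by side in a single run, rather than relying on the timings eXide reports. It assumes the Shakespeare demo data is installed, that eXist evaluates each let clause eagerly and in order, and that util:system-dateTime() returns the actual clock time at each call; local:elapsed is a hypothetical helper name. Caching may favour whichever variant runs second, so the numbers are only indicative.

```xquery
xquery version "3.0";

(: Sketch only: times the 'direct' and 'indirect' variants in one query.
   Assumption: let clauses are evaluated eagerly, in document order.
   local:elapsed is a hypothetical helper, not part of eXist. :)
declare function local:elapsed($start as xs:dateTime) as xs:dayTimeDuration {
    util:system-dateTime() - $start
};

(: 'direct' variant: the predicate is part of the path expression :)
let $t0 := util:system-dateTime()
let $direct := //SPEECH[ft:query(., 'hamlet')]
let $direct-time := local:elapsed($t0)

(: 'indirect' variant: the nodes are bound to a variable first :)
let $t1 := util:system-dateTime()
let $a := //SPEECH
let $indirect := $a[ft:query(., 'hamlet')]
let $indirect-time := local:elapsed($t1)

return
    <timing>
        <direct hits="{count($direct)}" elapsed="{$direct-time}"/>
        <indirect hits="{count($indirect)}" elapsed="{$indirect-time}"/>
    </timing>
```

Swapping the order of the two variants and running the query again would help separate the effect of caching from the effect of the indirection itself.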

Other possibly relevant observations:

[1] Only the Lucene full-text index and the new range index seem to be affected. With the standard matches() function, these two queries:
  let $a := //SPEAKER[matches(., 'hamlet', 'i')]
  return $a
and
  let $a := //SPEAKER
  return $a[matches(., 'hamlet', 'i')]
both perform fast (359 results in 0.029 sec.)

[2] Additionally, the following results seem to suggest that the optimizer does not use the index for the 'indirect' versions, or at least not in the same way as for the 'direct' versions:

* Range index: the 'direct' version of the following query returns 359 results in 0.018 sec.:
  let $a := //SPEAKER[range:matches(., 'HAMLET')]
  return $a
The 'indirect' version doesn't return anything at all:
  let $a := //SPEAKER
  return $a[range:matches(., 'HAMLET')]
This seems to suggest that the 'indirect' version does not query the actual index.

* Lucene full-text index: the 'direct' version of the following query returns 1008 results in 0.037 sec.:
  let $a := //SPEECH[ft:query(., 'the')]
  return $a
The 'indirect' version returns the same results in 1.731 sec.:
  let $a := //SPEECH
  return $a[ft:query(., 'the')]
Yet the eXist-2.2 instance at http://demo.exist-db.org/exist/apps/eXide does not return any results for the 'direct' version. I don't know whether this is due to the specific revision of that eXist build (system:get-revision() doesn't return anything), but it is not consistent with the index definition in /db/system/config/db/apps/demo/data/collection.xconf: there, the stopword list is empty, and the word 'the' is indeed retrieved when the index is queried with util:index-keys(). This may be a quirk of the eXist-2.2 build and/or configuration running at http://demo.exist-db.org, but the incongruity there between the results of the 'direct' and 'indirect' queries on the Lucene full-text index might suggest that the optimizer uses the index differently in the two versions.
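The util:index-keys() check mentioned above could look roughly like the sketch below. It assumes the five-argument form of util:index-keys() with the index name 'lucene-index', and it infers the data collection path /db/apps/demo/data from the location of the collection.xconf; local:term is a hypothetical callback name.

```xquery
xquery version "3.0";

(: Sketch only: lists Lucene index terms starting at 'the' for SPEECH nodes.
   The collection path is inferred from the collection.xconf location;
   local:term is a hypothetical callback, not part of eXist. :)
declare function local:term($key as xs:string, $data as xs:int+) as element(term) {
    (: $data[1] is the occurrence count, $data[2] the number of documents :)
    <term key="{$key}" occurrences="{$data[1]}" documents="{$data[2]}"/>
};

util:index-keys(
    collection('/db/apps/demo/data')//SPEECH,  (: nodes whose index is inspected :)
    'the',                                     (: start value for the term scan :)
    local:term#2,                              (: callback invoked once per term :)
    10,                                        (: maximum number of terms returned :)
    'lucene-index'                             (: which index to read :)
)
```

If 'the' shows up among the returned terms, the word is indexed and not filtered out as a stopword, so an empty result from the 'direct' ft:query would have to come from somewhere else.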

Best (wishes!),

Ron

------------------------------------------------------------------------------

_______________________________________________
Exist-open mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: optimizer regression in eXist-2.2 and eXist-3.0.RC1

Jens Østergaard Petersen-2
Hi Ron,

I can corroborate what you report. Indirection is surely the way to go, so this is a considerable problem. Isn’t the case clear enough for raising a couple of issues?

Best,

Jens

Re: optimizer regression in eXist-2.2 and eXist-3.0.RC1

ron.vandenbranden
Hi Jens,

Thanks for your thoughts. I've tried to summarize the problem in https://github.com/eXist-db/exist/issues/873.

Best,

Ron

Re: optimizer regression in eXist-2.2 and eXist-3.0.RC1

ron.vandenbranden
I've just checked with the latest commit in the develop branch (d9ecd338b18eb1e6298881b5c2768c06b690a1ac), and the problem is still there.

Apart from my personal interest in this issue (I hope I'm not being too impatient), I wonder whether this isn't an important issue in light of a new release, since indirection seems quite a common coding pattern. For me, at least, it would be a major show-stopper for upgrading to eXist-3.0. Again, if there is anything more I can do besides reporting and testing (fixing the issue myself is, unfortunately, beyond me), I'd gladly do so. I hope the issue I reported at https://github.com/eXist-db/exist/issues/873 is clear and unambiguous; if not, I would be happy to look into how it can be improved.

Best,

Ron
