Strange util:eval performance (nested predicate vs. FLWOR)


Andreas Wagner
Dear all,

I have run into a strange performance problem: there is a stark discrepancy
between util:eval($string) called "directly" and called "several levels
deep" in a chain of functions, and, additionally, a stark discrepancy
between a $string that uses a nested predicate and one that uses a
FLWOR expression...

I have a 'W0013_nodeIndex.xml' document with meta information about
nodes in a TEI/XML file. Oxygen's file properties dialogue, connected
over XML-RPC, says this meta/index file is 23,248,896 bytes, and it
contains roughly 32,000 child nodes below a sal:index root element:

<sal:index xmlns:sal="http://salamanca.adwmainz.de" work="W0013">
    <sal:node type="text" subtype="work_multivolume" n="completeWork" xinc="W0013_Vol01.xml W0013_Vol02.xml">
        <sal:title>Non-titleable node (text completeWork)</sal:title>
        <sal:fragment>0001_titleVol01</sal:fragment>
        <sal:citableParent/>
        <sal:crumbtrail>
            <a href="work.html?wid=W0013&amp;frag=0001_titleVol01#completeWork">Non-titleable node (text completeWork)</a>
        </sal:crumbtrail>
        <sal:citetrail/>
    </sal:node>
    <sal:node type="text" subtype="work_volume" n="Vol01">
        <sal:title>Vol. 1</sal:title>
        <sal:fragment>0001_titleVol01</sal:fragment>
        <sal:citableParent>completeWork</sal:citableParent>
        <sal:crumbtrail>
            <a href="work.html?wid=W0013&amp;frag=0001_titleVol01#Vol01">Vol. 1</a>
        </sal:crumbtrail>
        <sal:citetrail>vol1</sal:citetrail>
    </sal:node>
    <sal:node type="div" subtype="contents" n="Vol01Contents">
        <sal:title>Contents</sal:title>
        <sal:fragment>0002_Vol01Contents</sal:fragment>
        <sal:citableParent>Vol01</sal:citableParent>
        <sal:crumbtrail>
            <a href="work.html?wid=W0013&amp;frag=0001_titleVol01#Vol01">Vol. 1</a> » <a href="work.html?wid=W0013&amp;frag=0001_titleVol01#frontmatter_V1">Non-titleable node (front frontmatter_V1)</a> » <a href="work.html?wid=W0013&amp;frag=0002_Vol01Contents#Vol01Contents">Contents</a>
        </sal:crumbtrail>
        <sal:citetrail>vol1.frontmatter.1</sal:citetrail>
    </sal:node>
...
</sal:index>

I also have a range index defined this way:
<range>
    <create qname="sal:node">
       <field name="sal.node.n" match="@n" type="xs:string"/>
       <field name="sal.node.type" match="@type" type="xs:string"/>
       <field name="sal.node.subtype" match="@subtype" type="xs:string"/>
    </create>
    ...
</range>


For a given node, I want to find the "citetrail" of the node that is the
"citableParent" of the current node:
  //sal:node[@n eq $currentNode/sal:citableParent/text()]/sal:citetrail/text()
or something like that, I think.
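As an illustration only, the same lookup can be written as an ordinary XQuery function that reads the parent's @n value once and then filters on it, without util:eval. This is a minimal sketch: the function name local:parent-citetrail is hypothetical, picking the first node with @type eq 'front' as the "current node" only roughly mirrors the test query, and whether eXist actually uses the sal.node.n range field for this comparison is not confirmed here.

```xquery
xquery version "3.0";
declare namespace sal = "http://salamanca.adwmainz.de";

(: Hypothetical helper (not from the original application): returns the
   citetrail of the sal:node whose @n equals $currentNode/sal:citableParent. :)
declare function local:parent-citetrail($index as element(sal:index),
                                        $currentNode as element(sal:node))
        as xs:string* {
    (: Evaluate the parent's identifier once, as a plain string;
       an empty or missing sal:citableParent yields "" :)
    let $parentId := string($currentNode/sal:citableParent)
    return $index//sal:node[@n eq $parentId]/sal:citetrail/string()
};

let $index := doc('/db/apps/salamanca/data/W0013_nodeIndex.xml')//sal:index
(: First node with type="front" in document order; assumes exactly one
   sal:index element and at least one such node exist :)
let $current := ($index//sal:node[@type eq 'front'])[1]
return local:parent-citetrail($index, $current)
```

Returning string() rather than text() means that a matching node with an empty sal:citetrail yields "" instead of an empty sequence.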


I have the following xql in eXide:

  xquery version "3.0";
  declare namespace util = "http://exist-db.org/xquery/util";
  declare namespace sal  = "http://salamanca.adwmainz.de";

  let $currentResource :=  doc('/db/apps/salamanca/data/W0013_nodeIndex.xml')//sal:index

  let $objectNodes :=
    util:eval("$currentResource//sal:node[@n eq $currentResource//sal:node[@type='front'][1]/sal:citableParent/text()]/sal:citetrail/text()")

  let $objectNodes2 :=
    util:eval("$currentResource/(for $t in $currentResource//sal:node[@type eq 'front'][1]/sal:citableParent/text() return $currentResource//sal:node[@n eq $t]/sal:citetrail/text())")

  return $objectNodes


When evaluated (switching between return $objectNodes and return
$objectNodes2), both expressions yield the same, correct result, the
single string 'vol1', and both take roughly 0.07 seconds.



Now comes the problem: in the actual application, util:eval is called
in a function called by a function called by a function ..., yet it
evaluates exactly the same strings, with

  $currentResource := doc('/db/apps/salamanca/data/W0013_nodeIndex.xml')//sal:index

(defined in several steps somewhere in "upstream" functions).

And here the second form takes 11 seconds, whereas the first form, with
nested predicates, takes literally *hours*. I am aware that in both
forms the actual filtering is done by predicates and not by a where
clause, but I would not have expected the FLWOR expression to be (so
much) faster, especially since both perform about the same in the
"direct" eXide query.


The behaviour is roughly the same in the eXist 2.2 release and in
3.0.RC2 (build 20160520), one of the nightlies provided by Adam
Retter.


Does anybody have an idea what could be degrading the performance of the
nested predicate (but much less that of the FLWOR expression)?

Thank you so much for any insights,

Andreas



--
Dr. Andreas Wagner                          twitter: @anwagnerdreas
Project "The School of Salamanca"           web: http://salamanca.adwmainz.de
Academy of Sciences and Literature, Mainz   fon: +49 (0)69/798-32774
and Institute of Philosophy                 fax: +49 (0)69/798-32794
Goethe University Frankfurt

IGF HP 25 / R 2.455
Norbert-Wollheim-Platz 1
60629 Frankfurt am Main

------------------------------------------------------------------------------
Attend Shape: An AT&T Tech Expo July 15-16. Meet us at AT&T Park in San
Francisco, CA to explore cutting-edge tech and listen to tech luminaries
present their vision of the future. This family event has something for
everyone, including kids. Get more information and register today.
http://sdm.link/attshape
_______________________________________________
Exist-open mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/exist-open

xquery function execute way?

lin_xd
Hi all,
  I execute this XQuery in eXide:

let $d:=current-dateTime()
let $n:=count(collection('/db/ehr/v2')//HDSN00.01.001[@value le "20101230"])
 return  <x>{concat($d,'==',$n,'==',current-time())}</x>

but the time before and after executing the long-running query is the same. Why? I don't understand how XQuery evaluates this query.




--
Regards,

   林晓东

Do not worry that the road ahead holds no friends; who under heaven does not know you?


At 2016-07-04 19:02:12, "Andreas Wagner" <[hidden email]> wrote:


 



Re: xquery function execute way?

Peter Stadler
Hi,

try util:system-time():
"Returns the current xs:time (with timezone) as reported by the Java method System.currentTimeMillis(). Contrary to fn:current-time, this function is not stable, i.e. the returned xs:time will change during the evaluation time of a query and can be used to measure time differences." (from the function documentation)
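Applied to the count query from the question, that could look like the following minimal sketch. It assumes that eXist evaluates the let clauses in order, so that the second util:system-time() call happens after the count has been computed; this is the usual behaviour in eXist but is not something the XQuery specification guarantees.

```xquery
xquery version "3.0";
declare namespace util = "http://exist-db.org/xquery/util";

(: fn:current-dateTime() and fn:current-time() are stable: they return the
   same value for the whole query. util:system-time() reads the system clock
   on every call, so two calls can bracket the expensive expression. :)
let $start := util:system-time()
let $n := count(collection('/db/ehr/v2')//HDSN00.01.001[@value le "20101230"])
let $end := util:system-time()
return
    (: $end - $start is an xs:dayTimeDuration; it comes out negative
       if the run crosses midnight :)
    <x>{concat($start, '==', $n, '==', $end, '==', $end - $start)}</x>
```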

Best
Peter

> On 05.07.2016 at 12:39, 林晓东 <[hidden email]> wrote:
>
> Hi all,
>   I execute this XQuery in eXide:
>
> let $d:=current-dateTime()
> let $n:=count(collection('/db/ehr/v2')//HDSN00.01.001[@value le "20101230"])
>  return  <x>{concat($d,'==',$n,'==',current-time())}</x>
>
> but the time before and after executing the long-running query is the same. Why? I don't understand how XQuery evaluates this query.
--
Peter Stadler
Carl-Maria-von-Weber-Gesamtausgabe
Arbeitsstelle Detmold
Hornsche Str. 39
D-32756 Detmold
Tel. +49 5231 975-676
Fax: +49 5231 975-668
stadler at weber-gesamtausgabe.de
www.weber-gesamtausgabe.de


