Improve Performance for XQuery

Improve Performance for XQuery

Eichstädt, Konrad

Hi All,

 

I'm facing a problem executing an XQuery which needs too much time. The query is below, and based on monitoring I know that it makes full use of the index.

 

declare variable $allColl := (collection('/db/kpe/ead'), collection('/db/kpe/ead_virt'));

declare option output:method "json";
declare option output:media-type "application/json";

<kalliope>
{
    for $element in $allColl//ead:*[@source='GND']
    group by $elementgrp := $element/@authfilenumber
    return <gndids>{data($elementgrp)}</gndids>
}
</kalliope>

 

We are using a range index which looks like this:

<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:ead="urn:isbn:1-931666-22-9">
        <fulltext default="none" attributes="false"/>
        <create qname="ead:unitid" type="xs:string"/>
        <range>
            <create qname="ead:date" type="xs:string"/>
            <create qname="@id" type="xs:string"/>
            <create qname="@authfilenumber" type="xs:string"/>
            <create qname="@role" type="xs:string"/>
            <create qname="@audience" type="xs:string"/>
            <create qname="@source" type="xs:string"/>
            <create qname="@encodinganalog" type="xs:string"/>
            <create qname="persname">
                <field name="persname-encodinganalog" match="@encodinganalog" type="xs:string"/>
                <field name="persname-role" match="@role" type="xs:string"/>
            </create>
            <create qname="@source:@authfilenumber">
                <field name="name-source" match="@source" type="xs:string"/>
                <field name="name-authfilenumber" match="@authfilenumber" type="xs:string"/>
            </create>
        </range>
    </index>
</collection>

 

Could someone please help check whether the XQuery can be optimized? The query currently takes around 20 minutes for 40,000 XML documents.

 

Many Thanks …

 

Best regards

 

Konrad

 

Konrad Eichstädt

Staatsbibliothek zu Berlin

Preußischer Kulturbesitz

Potsdamer Str. 33
10785 Berlin

Informations- und Datenmanagement

IDM II elektronische Ressourcen

Tel: +49 030 266 432222

 


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Exist-open mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: Improve Performance for XQuery

Michael Westbay-2
Hi Eichstädt,

One thing I had noticed in the past is that just saving a collection root to a variable was costly. I don't know if it still is, but I would try to do away with the $allColl declared variable and do it this way:

<kalliope>
{
let $elements := (collection('/db/kpe/ead')//ead:*[@source eq 'GND'], collection('/db/kpe/ead_virt')//ead:*[@source eq 'GND'])
for $element in $elements
group by $elementgrp := $element/@authfilenumber
return <gndids>
    {data($elementgrp)}
    </gndids>
}
</kalliope>

Next, I would declare the @source attribute as a qname index under /db/system/config/db/kpe/collection.xconf. I don't believe that attributes get indexed when using the default fulltext index only.

Finally, I would also make the @authfilenumber a qname index as well, just to be sure. Time it once without it, then again with it indexed.
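For those two attributes, a minimal collection.xconf sketch along these lines might look as follows — this assumes eXist's new range index and the EAD namespace from the original post; it is only an illustration, not the poster's actual configuration:

```xml
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ead="urn:isbn:1-931666-22-9">
        <range>
            <!-- attributes used in the predicate and the group-by clause -->
            <create qname="@source" type="xs:string"/>
            <create qname="@authfilenumber" type="xs:string"/>
        </range>
    </index>
</collection>
```

After storing it under /db/system/config/db/kpe/collection.xconf, the collection has to be reindexed before the new entries take effect.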

Hope this helps.


2016-09-22 18:08 GMT+09:00 Eichstädt, Konrad <[hidden email]>:





--
Michael Westbay
Writer/System Administrator
http://www.japanesebaseball.com/


Re: Improve Performance for XQuery

Peter Stadler
Another thing to debug could be your use of the wildcard in $allColl//ead:*[@source='GND'], which caused trouble in the past (I don't know if that's still the case, though).
You could try to replace it with $allColl//@source[. = 'GND']/parent::ead:*.
If it's a known set of elements, I usually use the union of those elements, e.g.
$allColl//ead:A[@source='GND'] | $allColl//ead:B[@source='GND'] | $allColl//ead:C[@source='GND']

I'd also create the initial $allColl as a union of nodes (rather than a sequence) – that might be cosmetics, but you could give it a try …
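Putting both ideas together, a sketch could look like this — ead:persname and ead:corpname here are only placeholder element names, not taken from the thread:

```xquery
declare namespace ead = "urn:isbn:1-931666-22-9";

(: union of the two collections rather than a sequence :)
declare variable $allColl := collection('/db/kpe/ead') | collection('/db/kpe/ead_virt');

(: attribute-first variant of the predicate :)
$allColl//@source[. = 'GND']/parent::ead:*

(: or, with a known set of elements (placeholders): :)
(: $allColl//ead:persname[@source = 'GND'] | $allColl//ead:corpname[@source = 'GND'] :)
```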

Best
Peter


> On 02.10.2016 at 17:10, Michael Westbay <[hidden email]> wrote:
--
Peter Stadler
Carl-Maria-von-Weber-Gesamtausgabe
Arbeitsstelle Detmold
Hornsche Str. 39
D-32756 Detmold
Tel. +49 5231 975-676
Fax: +49 5231 975-668
stadler at weber-gesamtausgabe.de
www.weber-gesamtausgabe.de



Memory Leak?

aik01
We have been using eXist-db/betterFORM for 3 years.
Overall we have been happy, but data volume and app complexity got bigger over time.
Now, after a restart and cache warm-up, the response times are OK (~1s)
and cache hit ratios are OK (< 1:10).
Used memory goes rapidly up to 3.5-3.8 GB (4.5 GB committed),
and then it increases slowly despite garbage collection.
After 24-48h of uptime the system gets slower and finally deadlocks,
so every day or two I have to restart it.
The critical limit for available memory seems to be 600-800 MB before
GC errors appear in exist.log.

Our situation:
- 3.0RC2 develop branch on Ubuntu 15.04, Oracle Java 8
- apache2 proxy
- server with 8 GB RAM, 4 cores
- eXist configured with 5 GB memory, 2 GB cache, 128 MB collection cache, 10 brokers max
- 60 users, up to 10 concurrent
- 10 collections; several small, but two with >100,000 objects each (1-5 kB in size)
- range index over an attribute which occurs multiple times in every object
- db accessed mostly via RestXQ from betterFORM or AJAX

Why does used memory increase over time?
- Failed XQueries under peak load?
- A memory leak in the RestXQ API / betterFORM?
Currently I'm trying to define better indices (separate per
collection, only the XPaths needed) and to decrease the cache.
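A per-collection xconf in that spirit could be as small as this sketch — @myattr is a placeholder for the attribute actually queried, not a name from this thread:

```xml
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <range>
            <!-- @myattr stands in for the multiply-occurring attribute -->
            <create qname="@myattr" type="xs:string"/>
        </range>
    </index>
</collection>
```

stored once per collection under /db/system/config/db/<collection>/collection.xconf, followed by a reindex of that collection.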

Any further remedies?

regards
Peter



Re: Memory Leak?

Adam Retter
When you believe you have a Java process that is holding a lot of
memory unnecessarily (i.e. there are one or more memory leaks), you
can take a dump of the heap, preferably when it is at its worst.

To create a dump, run `jps` first to find out what the pid of your
eXist-db process is (unless you already know).

Then create the actual dump with jmap, replacing <pid> with the
process id of the eXist-db process:

jmap -dump:format=b,file=my-heap.bin <pid>

You can then compress that file and make it available to us
(privately), and we can take a look and see if we can spot any leaks.

Cheers Adam.



On 26 October 2016 at 07:32,  <[hidden email]> wrote:




--
Adam Retter

eXist Developer
{ United Kingdom }
[hidden email]
irc://irc.freenode.net/existdb
