XQuery performances + tests

Cédric Meier
Hi all,

We are using eXist in a PHP web site project for two main purposes (for the moment): a project management module and a document management module. Each of these modules has to provide some lists for the interface (e.g. a list of the most recently added documents or a list of open projects). Testing our application with more than 200 documents per collection was quite slow, so I started running some tests.

The eXist database works fine for us (no stability or corruption problems for months), but performance decreases with a "large" collection. To verify this I ran some tests with the Java client provided with eXist (to exclude the PHP part of the project) and plotted the times on a small graph (see the attached PDF file for details of the tests).

The result of these tests is that eXist is quite slow when querying collections. For example, a simple XQuery (without "order by" or access to other collections) needs more than 10 seconds to process a collection of 1,000 documents!

I have not seen such tests on the mailing list so far, and I wanted to give you my feedback. I would really appreciate reading your reactions.

Here are some questions:
What do you think about using eXist in such a context (lists of collections, etc.)? Is it really suitable for this (performance, capacity)?
What do you think about my tests? Are they realistic? Do they represent what the eXist database should be able to do?
Are my XQueries badly written? (see attached PDF)
What are your reactions to this problem? Is it a problem for you too?

Thanks in advance for your answers!

Cédric Meier

Attachment: eXist_xquery_tests.pdf (51K)

Re: XQuery performances + tests

José María Fernández González
I haven't looked at the attached PDF in depth, but I suspect your execution
times will improve if you explicitly define indexes over the expressions
used most in your queries (for instance, a numeric index over
version/@number).

        Best regards,
                José María

--
José María Fernández González e-mail: [hidden email]
Tlfn: (+34) 91 585 54 50 Fax: (+34) 91 585 45 06
Grupo de Diseño de Proteinas Protein Design Group
Centro Nacional de Biotecnología National Center of Biotechnology
C.P.: 28049 Zip Code: 28049
C/. Darwin nº 3 (Campus Cantoblanco, U. Autónoma), Madrid (Spain)


-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development challenges with Apache's Geronimo App Server. Download
it for free - -and be entered to win a 42" plasma tv or your very own
Sony(tm)PSP.  Click here to play: http://sourceforge.net/geronimo.php
_______________________________________________
Exist-open mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: XQuery performances + tests

Adam Retter-7
In reply to this post by Cédric Meier
I am very interested in eXist performance, as I will soon have to purchase
a server to run eXist on and deliver a web application that will initially
have ~4000 documents in a collection but may grow to 30,000 documents.

I also have a Pentium M 1.60GHz laptop with Sun JVM 1.40.

It has 2GB of RAM, although I could remove 1GB for the test.

The main difference is that it is running SuSE Linux 9.3 Professional
(Kernel 2.6.11.4-21.9-default) instead of Windows XP.

I would like to repeat your experiment and compare the results. If you are
interested in this too, could you send me the documents you stored in the
collection? Could you also send me a copy of the original document you used
to produce the PDF, so that I can update it with my results for comparison?
Also, did you take the first execution time for the query from the admin
client, or the second (I believe the queries can be cached)? I have
certainly noticed that running a query several times gives different
response times; I usually take the mean time of 10 executions.

I also have access to other hardware that may be interesting to test:
Solaris 10 on a Sparc Ultra 10, and a dual SMP Intel PIII @ 450MHz with any
x86 OS I fancy.


Thanks Adam Retter

Re: XQuery performances + tests

Michael Beddow-2
In reply to this post by Cédric Meier
I'm afraid I have rather too many things needing urgent attention to give
your reports the time they deserve right now, but I get the impression that
quite a few people scan this list for references to "performance" and go
away again when they see unchallenged claims that eXist isn't suitable for
"large" collections, so just a few initial thoughts while I take a coffee
break....

My current collection-size of around 50,000 documents averaging roughly 2K
each would be classed as "Mickey Mouse" by one recent poster here, but they
are large in comparison to your test collection. However, if I was routinely
seeing anything like 10 second response times on simple queries I would have
given up on eXist long ago.

José María is quite right. You can't expect reasonable performance from
queries that rely as heavily on attribute value comparisons as yours do
unless you either define a range index over the attributes you are
targeting and keep your code as before, or (very much a second-best solution
nowadays, though at one time it was the only way) configure alphanumeric
attribute fulltext indexing and use the non-standard comparison operator &=.

Unless you have defined such range indexes (or are using the fulltext index
on your attribute values) what you are in fact testing in your examples is
largely the latency of your file system, because eXist is having to use
brute force methods to find matches in the persistent DOM.

Michael Beddow

Re: XQuery performances + tests

wolfgangmm
In reply to this post by Cédric Meier
Hi,

thanks for sharing your results with us. The test results are quite
interesting, because they cover an area to which I did not pay much
attention so far. As you said, the queries do mostly data
transformation, creating a list of the collection contents with more
or less details. Consequently, all your queries start with something
like "for $document in collection() ...", thus forcing eXist to
process one document after the other. In other words, you are here
using eXist more like a transformation engine.

Without being able to play with your queries (which could reveal some
hidden hotspots), I would nevertheless expect the queries to scale
more or less linearly with the number of documents in the collection. But
they clearly do not! I can also guess in part why (this is only a *guess*
which needs to be confirmed): the problem is that, by design, eXist
organizes indexes by collection and its path joins are mainly
optimized to process larger node sets at once. For example, to
evaluate the simple expression $document/general, eXist will look up
the QName "general" in the structural index and scan through all
occurrences of "general" in the collection, testing for each
occurrence if it belongs to the correct document and has a parent node
in $document.

Now, if you process the collection document-by-document, the
complexity of this path-join operation increases with every additional
document you add to the collection. The structural index becomes more
and more useless and a simple tree-traversal without index might have
yielded better results!
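
The two evaluation styles described above can be sketched in XQuery. This
is only an illustration, not one of the test queries: the collection path
/db/documents and the root element name "document" are assumptions, and
only $document and general appear in the discussion itself.

```xquery
xquery version "1.0";

(: Hypothetical collection path and root element name. :)
let $docs := collection("/db/documents")/document

(: Document by document: the step "general" is evaluated once per
   iteration, i.e. once for every document in the collection. :)
let $perDocument :=
    (for $document in $docs
     return $document/general)

(: Whole node set at once: a single path expression over the collection. :)
let $setAtOnce := $docs/general

return
    <comparison>
        <per-document count="{count($perDocument)}"/>
        <set-at-once count="{count($setAtOnce)}"/>
    </comparison>
```

Both forms select the same general elements; the difference is only where
the iteration happens. Whether the second form is actually faster depends
on the eXist version and on what else the query has to build per document.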

That said, I see at least two areas where we could work to improve
this: 1) scanning the structural index needs to become faster, 2) the
structural index should not be used in cases where a brute-force
traversal is sufficient.

I already did some work on item 1) recently (in cvs) and would be
interested to check if it makes a difference. I also see that we
should have more data-transformation tests similar to yours. I guess
watching the queries with a profiler might already reveal some obvious
hotspots, which might not even be related to anything I said above and
can be fixed quite easily. If you want to help us to sort this out,
you could provide us some (auto-generated) data and the queries. I'm
sure we can do something to improve this.

As always, it is of major importance to have the right data and the
right tests to locate and fix performance issues. Without detailed
feedback, I'm limited to the problems I find in my own projects. I
thus welcome concrete feedback like yours.

Wolfgang

Re: XQuery performances + tests

wolfgangmm
In reply to this post by Michael Beddow-2
> Unless you have defined such range indexes (or are using the fulltext index
> on your attribute values) what you are in fact testing in your examples is
> largely the latency of your  file system, because eXist is having to use
> brute force methods to find matches in the persistent DOM.

That's quite correct and another major point I did not mention in my
response. However, I would only expect a partial improvement from
using range indexes for the type of data-transformation queries in
question. But you never can be sure, so it would indeed be interesting
to see the difference.

Looking at the queries some more, I also think you can remove the
final text() step in many cases. It is unnecessary and costs
performance.

Wolfgang

Re: XQuery performances + tests

wolfgangmm
> Looking at the queries some more, I also think you can remove the
> final text() step in many cases. It is unnecessary and costs
> performance.

Sorry, my mistake. I thought you were often using the same element
names for the output as in the input. Instead of creating a new
element with e.g. <id>{$document/idDoc/text()}</id>, you could then
simply output {$document/idDoc}, which would be faster.
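
As a small sketch of that point, the two ways of emitting the element look
like this side by side. The collection path /db/documents, the root element
name "document" and the wrapper element "entry" are assumptions made for
the example; only idDoc and $document come from the text above.

```xquery
xquery version "1.0";

(: Hypothetical collection path and root element name. :)
for $document in collection("/db/documents")/document
return
    <entry>
        {
            (: Output name differs from the input name: a new element has
               to be constructed around the text content. :)
            <id>{$document/idDoc/text()}</id>
        }
        {
            (: Output name equals the input name: the stored element can be
               returned as it is, without text() and without a constructor. :)
            $document/idDoc
        }
    </entry>
```

The two are only interchangeable when the output element is meant to keep
the name idDoc and the stored element holds nothing (attributes, child
elements) that should be left out of the result.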

Wolfgang

Query expression not supported

John C. Landers
In reply to this post by wolfgangmm
I can run the following queries just fine:
for $x in /* return $x --means no namespace, any local name.
and
for $x in /tes:* return $x --means namespace tes, any local name

but this doesn't work:
for $x in /*:* return $x -- any namespace, any local name
(meaning only those with a namespace)
It throws:
line 1:13: unexpected token: :
        at org.exist.xquery.parser.XQueryParser.unaryExpr(XQueryParser.java:4195

Maybe I am missing something.

I am using eXist-snapshot-20050620.jar, I believe.


jcl.

Re: Query expression not supported

Michael Beddow-2
Yes, looks like something wrong here.

For clarity, though not relevant to the actual bug:

>>for $x in /* return $x --means no namespace, any local name.

actually it means "in the default namespace" which is a significantly
different thing in some cases.

And for completeness, the namespace wildcard in itself (nasty beast IMHO,
but it got into XPath 2 after quite a battle) does work, so

for $x in /*:blort return $x

will correctly return elements with local name blort in any but the default
namespace (though I feel sorry for anyone whose data makes that a useful
expression to employ...)

But eXist's XPath parser incorrectly throws the reported exception
immediately it sees *:* (or more precisely, when it finds the second * token
in that string), so yes, it looks like a bug.
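
For reference, the XPath 2.0 grammar defines three wildcard forms for name
tests: *, prefix:* and *:local. The sketch below exercises all three on
invented in-memory data; the namespace URI and the element names root and
other are assumptions for the example. By the W3C definition, * matches
elements whatever their namespace, and *:blort matches a blort in any
namespace or in none, so the counts given in the comments follow the
specification rather than the behaviour of any particular eXist snapshot.

```xquery
xquery version "1.0";

(: Hypothetical namespace URI, bound to the prefix used in the question. :)
declare namespace tes = "http://example.org/tes";

(: Invented sample data: one element in no namespace, two in the tes namespace. :)
let $data :=
    <root>
        <blort/>
        <tes:blort/>
        <tes:other/>
    </root>

let $anyName     := $data/*          (: any child element: 3 per the spec :)
let $tesAnyLocal := $data/tes:*      (: any local name in the tes namespace: 2 :)
let $anyNsBlort  := $data/*:blort    (: local name blort, any or no namespace: 2 :)

return
    <matches any-name="{count($anyName)}"
             tes-any-local="{count($tesAnyLocal)}"
             any-ns-blort="{count($anyNsBlort)}"/>
```

Running the same three path expressions against stored documents is a quick
way to check how a given eXist build treats each form.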

Michael Beddow

Re: XQuery performances + tests

Cédric Meier
In reply to this post by wolfgangmm
Hi,
Thank you for all your answers and your tips!
Now I have to run further tests with range indexes, the non-standard operator &= and simplified XQueries.
I'll post my results on this list as soon as possible.

Thanks for your help!

Cédric

RE: XQuery performances + tests

Adam Retter-7
In reply to this post by Cédric Meier
I have done the testing on Linux now. Initial indications are that it's a
little slower...

I will try to post the results tomorrow.

Cheers, Adam


Fix: XQuery performances + tests

Wolfgang Meier-2
Hi Cédric,

I ran some tests with your data and quickly had success!!! Running the queries
with a profiler showed that an existing optimization to speed up path
expressions nested within for loops was not applied in the correct way. I
just had to change a few flags internally...

Compared with the 050802 snapshot, Q1 is down to roughly 1/4, Q2 to Q4 take
only 5-8% of the previous time. See the attached PDF for results.

The changes are in CVS. I could probably find more things to improve, but
I'm happy with this for today. Using a range index on version/@number did not
show a further improvement. I still have to check why.

Wolfgang

Attachment: perf050917.pdf (36K)

Re: Fix: XQuery performances + tests

Cédric Meier
Hi Wolfgang,

Great!!! I just ran some tests with the CVS version: the improvement ranges from 4.4 times (first query) to 16 times faster (third query)!!!
That really speeds up our XQueries and our application!
Thank you for your work and for the changes made over the weekend! It's really, really useful!!!

Anyway, eXist seems to be a lot faster when I use it only for querying data and not for transforming too much data (the way an XSLT would). I'll look into this more over the next few weeks.

Thanks for everything!!
Cédric

Re: Fix: XQuery performances + tests

Michael Beddow-2
Cédric,

I think all eXist users will be grateful to you for posting your performance
measurements, along with data and code that allowed the source of the delays
to be tracked down, the second component being the vital bit.

>Anyway, eXist seems to be a lot faster when I use it only for querying
>data and not for transforming too much data (the way an XSLT would).

I suspect there is always likely to be this differential, though no doubt
eXist's efficiency in construction and transformation can and will be
improved.

My personal view is that speed (and correctness!) of data location and
retrieval should be the prime focus of optimisation, because if that is
achieved, those of us who need to do clever things with the retrieved data
can do it in a middle tier of our own.

Maybe I'm unduly influenced by the fact that nearly all my uses of eXist
date from a time when there was no XQuery support, so I had no choice but to
add a lot of local logic in my wrappers. I have gradually shifted some of it
over into XQuery functions (I find the ability to cache results in a session
on the back end especially useful), but where performance is critical, I
expect that I will still be better off doing things in my middle tier that
eXist could now theoretically do for me.

Michael Beddow

Re: Fix: XQuery performances + tests

Jean-Marc Vanel-3
In reply to this post by Cédric Meier
Cédric Meier wrote:

> Anyway, eXist seems to be a lot faster when I use it only for querying
> data and not for transforming too much data (the way an XSLT would).

Bonjour Cédric

Like Michael B., I think that the most important thing is performance in
retrieving and querying.
However, I'm sure that XQuery in eXist has the potential to replace XSLT
(and JSP, PHP, XSP, ... by the way) in most of its uses.

But currently it requires some know-how to write efficient XQueries. I
wrote something about that here:
http://jmvanel.free.fr/exist/optimisation-xquery-fr.html

Alas, in French for now :-( .

--
Jean-Marc Vanel 01 39 43 31 46
Consulting and services / software development & integration
Free software, Web, Java, XML ...
At the cutting edge of technology, in the service of projects
http://jmvanel.free.fr/ ===) CV, software resources

My journals:
- general topics, in French: http://jmvanel.free.fr/Block-note.html
- computing topics, in French: http://jmvanel.free.fr/notes-informatiques.html
- computer science diary : http://jmvanel.free.fr/computer-notes.html

Worldwide Botanical Knowledge Base
http://wwbota.free.fr/ 
test XML query engine: http://jmvanel.free.fr/protea.html

Re: Fix: XQuery performances + tests

Michael Beddow-2
Jean-Marc

> But currently it requires some know-how to write efficient XQueries. I
> wrote something about that here:
> http://jmvanel.free.fr/exist/optimisation-xquery-fr.html

> Alas, in French for now :-( .

Good stuff! Have you got someone lined up to translate this into English? If
not I'll gladly do it (but I thought I'd check first in case it was already
under way)

Michael Beddow

Re: Fix: XQuery performances + tests

Michael Beddow-2
Dannes Wessels wrote:


> let's try :-)

http://translate.google.com/translate?u=http%3A%2F%2Fjmvanel.free.fr%2Fexist%2Foptimisation-xquery-fr.html&langpair=fr%7Cen&hl=en&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools

Though some people have found my posts occasionally amusing, I don't think I
could promise to match Mr Google's laughter score on that particular
exercise.

Michael Beddow




Re: Fix: XQuery performances + tests

Michael Beddow-2
In reply to this post by Michael Beddow-2
OK, I've put an English translation at

http://www.anglo-norman.net/sitedocs/optimisation-xquery-en.html

Anyone is welcome to grab, amend, put on Wiki etc etc. I'm afraid I'm too
old (and too busy) to learn how to Wiki myself...

Michael





Re: Fix: XQuery performances + tests

Adam Retter-7
In reply to this post by Wolfgang Meier-2
Just to show the flipside of the coin...

I understand that for you guys performance is a top issue, however for
me XQuery is a very important issue and I see it as one of the greatest
strengths of eXist.

I have built a very comprehensive web application using just eXist and
XQuery, and I think it's a very beautiful architecture: everything is
stored in the eXist db and it's all nice and neat :-).
I developed this application in a very short time, far faster than I
think I could have built it in something like PHP+MySQL or ASP+Oracle.
I think many developers could find eXist a good platform for web
applications, although I fear many of them don't know about it.

When the application goes live, hopefully at the end of October, I will post
the URL, and I could maybe write an article on web application development
using eXist for an e-zine or something. Any recommendations?

Obviously performance is important to me, but I must say I have had no
complaints about the current performance, although I have yet to load the
full dataset for the application into eXist. I have found that
restructuring the XML data stored in eXist can lead to
significant performance improvements. Restructuring the IPSV taxonomy
from a flat structure to a nested, tree-like structure gave me a
performance gain of more than 10x.
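
As a rough illustration of why the flat-to-nested restructuring can pay off, here is a minimal Python sketch. The element names, the `id`/`parent` attributes and the sample terms are all hypothetical (the real IPSV layout is not shown in this thread), and Python's ElementTree only stands in for the database. In the flat layout every term has to be tested against the parent id, while in the nested layout the children are simply the direct child elements of the term.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat layout: every term is a sibling and names its parent by id.
FLAT = """
<terms>
  <term id="1" name="Business"/>
  <term id="2" parent="1" name="Licences"/>
  <term id="3" parent="1" name="Trade"/>
  <term id="4" parent="2" name="Alcohol licences"/>
</terms>
"""

# The same hypothetical data, nested so that children sit inside their parent.
NESTED = """
<terms>
  <term id="1" name="Business">
    <term id="2" name="Licences">
      <term id="4" name="Alcohol licences"/>
    </term>
    <term id="3" name="Trade"/>
  </term>
</terms>
"""


def children_flat(root, term_id):
    # Tests every term in the document against the requested parent id.
    return [t.get("name") for t in root.iter("term") if t.get("parent") == term_id]


def children_nested(root, term_id):
    # Finds the term once, then reads only its direct child elements.
    parent = root.find(f".//term[@id='{term_id}']")
    return [t.get("name") for t in parent.findall("term")]


flat_root = ET.fromstring(FLAT)
nested_root = ET.fromstring(NESTED)
print(children_flat(flat_root, "1"))
print(children_nested(nested_root, "1"))
```

Both calls return the same children; the difference is how much of the document has to be visited, which is presumably where a gain like the one described above comes from.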




On Mon, 2005-09-19 at 10:58 +0100, Jean-Marc Vanel wrote:

> Cédric Meier wrote:
>
> > Hi Wolfgang,
> >
> > Great!!! I just ran some tests with the CVS version: the improvement
> > ranges from 4.4 times (first query) to 16 times faster (third query)!!!
> > That really speeds up our XQueries and our application!
> > Thank you for your work and the changes over the weekend! It's
> > really, really useful!!!
> >
> > Anyway, eXist seems to be a lot faster when I use it only for querying
> > data rather than for heavy data transformation (XSLT-style).
>
> Hello Cédric
>
> I think, like Michael B., that the most important thing is performance in
> retrieving and querying.
> However, I'm sure that XQuery in eXist has the potential to replace XSLT
> (and JSP, PHP, XSP, ... by the way) in most of its uses.
>
> But currently it requires some know-how to write efficient XQueries. I
> wrote something about that here:
> http://jmvanel.free.fr/exist/optimisation-xquery-fr.html
>
> Alas, in French for now :-(  .
>
> > I'll look into this further over the next few weeks.
> >
> > Thanks for all!!
> > Cédric
> >
> >
> > Wolfgang Meier wrote:
> >
> >>Hi Cédric,
> >>
> >>I made some tests with your data and had a fast success!!! Running the queries
> >>with a profiler showed that an existing optimization to speed up path
> >>expressions nested within for loops was not applied in the correct way. I
> >>just had to change a few flags internally...
> >>
> >>Compared with the 050802 snapshot, Q1 is down to roughly 1/4, Q2 to Q4 take
> >>only 5-8% of the previous time. See the attached PDF for results.
> >>
> >>The changes are in the CVS. I could probably find more things to improve, but
> >>I'm happy with this for today. Using a range index on version/@number did not
> >>show a further improvement. I still have to check why.
> >>
> >>Wolfgang
>
>



RE: XQuery performances + tests

Adam Retter-7
In reply to this post by Cédric Meier
Here are the results of repeating Cédric's test on similar hardware but
with SuSE Linux 9.3 Professional as opposed to Windows XP.

Interestingly, it seems to be a little bit slower, which surprised me...

I have seen the recent posts about Wolfgang's improvements
to XQuery performance. I will try to repeat the testing with a
CVS-updated copy of eXist soon.

Thanks, Adam


On Fri, 2005-09-16 at 10:09 +0100, Adam Retter wrote:

> Have done the testing on Linux now. Initial indications are that it's a
> little slower...
>
> Will try to post the results tomorrow.
>
> cheers adam
>
>   _____  
>
> From: Cédric Meier [mailto:[hidden email]]
> Sent: Thu 15/09/2005 18:12
> To: [hidden email]
> Subject: Re: [Exist-open] XQuery performances + tests
>
>
> Hi,
> Thank you for all your answers and your tips!
> Now I have to make further tests with range indexes, the non-standard
> operator &= and simplified xqueries.
> I'll post my results on this list as soon as possible.
>
> Thanks for your help!
>
> Cédric
>
>
> Wolfgang Meier wrote:
>
> Unless you have defined such range indexes (or are using the fulltext index
> on your attribute values) what you are in fact testing in your examples is
> largely the latency of your file system, because eXist is having to use
> brute force methods to find matches in the persistent DOM.
>
> That's quite correct and another major point I did not mention in my
> response. However, I would only expect a partial improvement from
> using range indexes for the type of data-transformation queries in
> question. But you never can be sure, so it would indeed be interesting
> to see the difference.
>
> Looking at the queries some more, I also think you can remove the
> final text() step in many cases. It is unnecessary and costs
> performance.
>
> Wolfgang
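
To make the quoted point about brute-force matching versus range indexes concrete, here is a minimal Python sketch. It is not eXist code and does not reflect eXist's internals: the document names and version numbers are invented, and a sorted list with binary search merely stands in for a range index. Without the index every document has to be inspected; with it, only the two ends of the requested range are searched for.

```python
from bisect import bisect_left

# Hypothetical collection: (document name, integer version number) pairs.
docs = [("doc%d.xml" % i, i % 50) for i in range(1000)]


def scan(docs, low, high):
    # Brute force: inspect every document in the collection.
    return sorted(name for name, number in docs if low <= number <= high)


def build_index(docs):
    # Sort (value, name) pairs once so range lookups can use binary search.
    return sorted((number, name) for name, number in docs)


def lookup(index, low, high):
    # Binary-search both ends of the integer range, then take the slice.
    start = bisect_left(index, (low,))
    end = bisect_left(index, (high + 1,))
    return sorted(name for _, name in index[start:end])


index = build_index(docs)
print(lookup(index, 3, 4) == scan(docs, 3, 4))
```

Building the index costs one sort up front, so it only pays off when the same values are queried repeatedly, which is consistent with the caution above that the benefit depends on the type of query.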


eXist_xquery_tests.pdf (57K) Download Attachment