tweaking Exist for multi user environments

tweaking Exist for multi user environments

Jakob Fix-2
Hello,

I recently put up a beta version of a web application I am developing
on my company's intranet, and asked people to access it.  There were
about 20 people, and I guess there were about 5 to 10 simultaneous
queries at most. The response times were disastrous -- some
people waited 5 minutes or longer (yes, they were very good-natured
:-).  This happened on a quad processor with 8GB of RAM.

In contrast, when only one person searches, response times are less
than 30 seconds (obviously, this is still too much, we need to get to
under 5 secs).

While I am aware that the response time is in part due to the rather
complex queries, I was wondering if there are any parameters I could
tweak in order to improve the database's response time?  Currently,
we're using the stock config file as supplied with the standard
distribution (attached).  Thanks in advance.

--
cheers,
Jakob.


PS: BTW, it appears we're using a snapshot dated 08 September 2005. I
am going by the file dates; is there a more fool-proof way to find
out the exact version?  We were switching frequently, so I can't
remember which one it is, and bin/client.bat -v only says this:

E:\dy\lib\exist>bin\client -v
eXist version 1.0, Copyright (C) 2004 Wolfgang Meier
eXist comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions; for details read the license file.

Attachment: conf.xml (11K)

Re: tweaking Exist for multi user environments

Jakob Fix-2
Giulio,

On 29/09/05, Giulio Valentino <[hidden email]> wrote:
> For the eXist version, go to
> http://localhost:8080/exist/xquery/functions.xq and check the top of the
> page for the eXist version...
> giulio

Unfortunately, I removed all the unneeded stuff and kept just the bare
minimum, so this won't work, I guess, unless this is an essential file
without which Exist won't work.  However, without being able to look
at the code right now: where does the Exist version at the top of the
page come from?  Is there anywhere I can go and look?  Is there a VERSION or
BUILD string or file somewhere?

--
cheers,
Jakob.


-------------------------------------------------------
This SF.Net email is sponsored by:
Power Architecture Resource Center: Free content, downloads, discussions,
and more. http://solutions.newsforge.com/ibmarch.tmpl
_______________________________________________
Exist-open mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/exist-open

Fwd: tweaking Exist for multi user environments

Jakob Fix-2
In reply to this post by Jakob Fix-2
[oops, message went to Dannes only]

---------- Forwarded message ----------
From: Jakob Fix <[hidden email]>
Date: 29-Sep-2005 21:29
Subject: Re: [Exist-open] tweaking Exist for multi user environments
To: Dannes Wessels <[hidden email]>


Dannes,

On 29/09/05, Dannes Wessels <[hidden email]> wrote:
> based on the first few lines on the config file, I guess you are
> running an older version of eXist. I am not fully sure, because I run
> code from CVS.......

it depends what you mean by "older".  I know for sure that it's
positively post-beta, and I will have a look whether I've the
downloaded file somewhere.  I understand it's important for you to
know the version I'm running in order to be able to advise, but in
general, are there ways to optimize Exist for a multi-user
environment?


> If you are using an older version (beta stuff, no snapshot), please
> seriously consider upgrading. I expect a new snapshot to become
> available this weekend, with exciting improvements and additions (see
> the change log).

Ok, I will have a look. Thanks.

> Dannes
>

--
cheers,
Jakob.


--
cheers,
Jakob.



Re: tweaking Exist for multi user environments

Jakob Fix-2
In reply to this post by Jakob Fix-2
Dannes,

> Best way to check.. what is the name of the file you have downloaded?
>
> option 1) bin\start.sh  : open a web browser at http://whateveryourname:8080
> ; at the top right you'll see a version id
>
> option 2) find the file header.xml and look into it

Ok, I will be looking for it.

> If you did compile the code by yourself you are out of luck, at every
> build this file is updated.

Ah, that could be a problem, indeed.

Wouldn't it be useful to have the immutable build number somewhere
safe?  I don't usually keep the downloaded installer jars or tar.gz,
and I did download quite a number of snapshots.  Also, when working
collaboratively, it would be nice to always have a way to find out
what the used build is (in case it doesn't work the same for the other
person).  Another thing, we had to stop at some point to always update
to the latest snapshot because we weren't sure it wouldn't break
something that worked before ...

--
cheers,
Jakob.



Re: tweaking Exist for multi user environments

Dannes Wessels
Hi Jakob,

> Wouldn't it be useful to have the immutable build number somewhere safe?

Hmmm, yes, but typically people forget to update this kind of data
when releasing a new snapshot. When compiling from CVS, this helps us
to remember the CVS date. I like it to be updated automatically :-)

Normally, an 'end user' will not need to compile the code him/herself, so
the datestamp will never be touched.

> Another thing, we had to stop at some point to always update
> to the latest snapshot because we weren't sure it wouldn't break
> something that worked before ...

I can understand that. The snapshots themselves are stable
(enough) to use, though. However, I would not like to run CVS code in a
production environment :-)

For convenience, a change log is available for the CVS code. Based on
this information you could decide whether a new snapshot is suited for
you.....

regards

Dannes

--
# Dannes Wessels # The Netherlands #
# Jabber / ICQ / MSN / AIM / Yahoo / google.com/talk #



Re: tweaking Exist for multi user environments

wolfgangmm
In reply to this post by Jakob Fix-2
Jakob,

If a single query already takes 30 seconds, it is likely that some
long-running part of the query is blocking the entire database, and
concurrency will get worse with every additional user. So in
order to improve concurrency, you will have to find out what exactly
is slowing the query down.

It is hard to give any advice without seeing the query and the data.
You could try to increase the cacheSize setting in conf.xml,
especially if you see a lot of disk activity. I'm always willing to do
a bit of profiling if I have time, some real data and a few queries to
test with. So if you can't find a way to speed it up, you may consider
sending me a test package.

> PS: BTW, it appears we're using a snapshot dated 08 September 2005 (I
> am looking at the file dates, is there a more fool-proof way to find
> out the exact version?  we were switching frequently so I can't
> remember which one ... and bin/client.bat -v only says this:

The last snapshot was released on August 5, so maybe you're using a cvs build?

Wolfgang



Re: tweaking Exist for multi user environments

Bruno Chatel
Hi,

>if a single query does already take 30 seconds, it is likely that some
>long-running part of the query is blocking the entire database and
>concurrency will be getting worse with every additional user. So in
>order to improve concurrency, you will have to find out what exactly
>is slowing the query down.

Jakob's application is a consultation tool (based on eXist) that never
updates the resources in the database. The data can really be considered
static (read-only) during the whole process.
In this context, I do not really understand what you mean by
"blocking the entire database".  In the application context, we expect
to be able to ignore any concurrent access problems (locks, ...). So is
there a way to specify this (in the eXist configuration, or as XQuery
engine parameters)?


>It is hard to give any advice without seeing the query and the data.
>You could try to increase the cacheSize setting in conf.xml,
>especially if you see a lot of disk activity.

Do you have any idea (or tool) for measuring the disk activity or
finding out where the time goes in a query?


Regards

-- bruno --



Re: tweaking Exist for multi user environments

wolfgangmm
Hi Bruno,

> Jakob's application is a consultation tool (based on eXist) without
> any update of the resources in the database. Data can really be considered
> as static (read only) during the whole process.
> In this context, I do not really understand what you mean by
> "blocking the entire database".

Though queries are running concurrently, access to some of the core
database resources (the raw .dbx files, indexes, page caches) needs to
be synchronized since these resources are shared between all threads.
Ideally, exclusive access to resources should be limited to a small
fraction of the query time. However, concurrency will suffer if the
query needs to scan - for example - large parts of dom.dbx. In this
case, threads will keep blocking each other most of the time.
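
To make the effect above concrete, here is a minimal Java sketch. It is not eXist code: the class name, the single lock and the sleep-based "work" are all assumptions made for illustration. Each simulated read-only query spends part of its time holding one shared lock (standing in for a synchronized resource such as a page cache) and the rest outside it. When most of the time is spent under the lock, the queries effectively run one after another, even though nothing is being updated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: a toy model of read-only queries sharing one locked resource.
final class LockContentionSketch {

    /**
     * Runs the given number of simulated queries in parallel and returns the
     * wall-clock time in milliseconds until all of them have finished.
     */
    static long runQueries(int threads, long lockedMillis, long unlockedMillis) {
        // Hypothetical stand-in for a shared resource (e.g. a page cache).
        ReentrantLock sharedResource = new ReentrantLock();
        List<Thread> workers = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            Thread worker = new Thread(() -> {
                try {
                    sharedResource.lock();
                    try {
                        Thread.sleep(lockedMillis);   // work needing exclusive access
                    } finally {
                        sharedResource.unlock();
                    }
                    Thread.sleep(unlockedMillis);     // work that can overlap with others
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Same total work per query; only the share spent under the lock differs.
        System.out.println("mostly locked:   " + runQueries(4, 100, 0) + " ms");
        System.out.println("mostly unlocked: " + runQueries(4, 0, 100) + " ms");
    }
}
```

With four threads, the "mostly locked" run should take roughly four times as long as the "mostly unlocked" one, because the lock serializes the work. This is the pattern to look for when a query scans large parts of a shared file.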

So the best you can do is to analyze the query and identify the
bottleneck. Recent postings by Cedric and others show: chances are
good that performance problems can be fixed if we are provided with
enough information to reproduce the concrete issue!

> In the application context, we expect
> to ignore any concurrency access problems (locks, ...). so is there a way
> to specify this (in eXist configuration, or XqueryEngine parameters) ?

No. The dependencies between page IO, caching and transaction
management are quite complicated. Some things could certainly be
improved internally, but you can't simply turn off the locking protocol
or eXist will stop working.

> >It is hard to give any advice without seeing the query and the data.
> >You could try to increase the cacheSize setting in conf.xml,
> >especially if you see a lot of disk activity.
>
> Do you have any idea (or tool) allowing to measure the disk activity or
> the origin of time consumption for a query ?

Unix tools like iostat or sar show you the page-ins/outs reported by
the OS. Continuously high values indicate that disk IO is responsible
for the bad performance.

The CVS version also offers a limited profiling facility. Add the
following to the start of your query:

declare option exist:profiling "enabled=yes verbosity=10 logger=xquery.profiling";

This will output a lot of timing info to the specified log4j logger.
Currently, the output is limited to function calls and path
expressions.

Wolfgang



Re: tweaking Exist for multi user environments

Jean-Marc Vanel-3
In reply to this post by Bruno Chatel
Bruno Chatel wrote:

> Do you have any idea (or tool) allowing to measure the disk activity or
> the origin of time consumption for a query ?

Good evening, Bruno!

Put this at the beginning of your XQuery :
declare option exist:profiling "enabled=yes verbosity=5";
It is quite new and I haven't tried it yet.

--
Jean-Marc Vanel
Consulting and services / software development & integration
Free software, Web, Java, XML ...
At the cutting edge of technology, at the service of projects
http://jmvanel.free.fr/ ===) CV, software resources

My journals:
- general topics, in French: http://jmvanel.free.fr/Block-note.html
- computing topics, in French: http://jmvanel.free.fr/notes-informatiques.html
- computer science diary : http://jmvanel.free.fr/computer-notes.html

Worldwide Botanical Knowledge Base
http://wwbota.free.fr/ 
test XML query engine: http://jmvanel.free.fr/protea.html





Re: tweaking Exist for multi user environments

Jakob Fix-2
In reply to this post by wolfgangmm
Hi Wolfgang,

thanks for having another look.

On 17/10/05, Wolfgang Meier <[hidden email]> wrote:
> Hi Bruno, hi Jakob,
>
> I had another look at the queries. For the QMETAIU (and similar)
> queries I found one thing more to improve: attribute lookups use an
> optimization I introduced recently, but in your case (the context
> results from a //* selection) this optimization is actually bad and
> leads to a performance loss. I fixed this in CVS. Query times went
> down from 6000 to 1000 ms for my test on a slower machine.

This sounds promising.  My only problem is this: I wonder whether
upgrading Exist at this stage won't break anything.  Many things have
changed since the version we've been using.  I think we will try to
migrate to the CVS version nevertheless.  What would you say about the
stability of the CVS version?

> Concerning the other queries: most of the time now seems to be spent
> retrieving node values in the phrase() function. This could be
> improved: contrary to earlier versions, the current CVS code stores
> the actual character offset of the match text in the fulltext index,
> which is available in Match.getOffset(). phrase() requires the matches
> to be in the correct order and it can not be applied to mixed content
> nodes. It would thus be possible to check the match offsets before
> scanning the actual node content and filter out wrong matches (i.e.
> matches where the second term occurs before the first). This should
> result in a considerable speedup.

uh, I hope Bruno understands what you're saying here :-). In practice,
will these improvements mean we have to rewrite portions of our code,
or are you explaining what's happening behind the scenes?

Again, thanks a lot for looking into this problem.

--
cheers,
Jakob.



Re: tweaking Exist for multi user environments

Wolfgang Meier-2
Hi Jakob,

> This sounds promising.  My only problem is this: I wonder whether
> upgrading Exist at this stage won't break anything.  Many things have
> changed since the version we've been using.  I think we will try to
> migrate to the CVS version nevertheless.  What would you say about the
> stability of the CVS version?

The CVS version should be stable. Compared to the last snapshot there were
many, many bug fixes and performance improvements. The only reason why I
hesitate to release a new snapshot is that there are two known problems in
the recovery code. But those are in the last snapshot as well.

> > Concerning the other queries: most of the time now seems to be spent
> > retrieving node values in the phrase() function. This could be
> > improved: contrary to earlier versions, the current CVS code stores
> > the actual character offset of the match text in the fulltext index,
> > which is available in Match.getOffset(). phrase() requires the matches
> > to be in the correct order and it can not be applied to mixed content
> > nodes. It would thus be possible to check the match offsets before
> > scanning the actual node content and filter out wrong matches (i.e.
> > matches where the second term occurs before the first). This should
> > result in a considerable speedup.
>
> uh, I hope Bruno understands what you're saying here :-). In practice,
> will these improvements mean we have to rewrite portions of our code,
> or are you explaining what's happening behind the scenes?

Bruno wrote the phrase function, so I assume he will know what I'm talking
about ;-) Basically, the implementation of phrase could be improved now that
eXist provides more information about the location of text matches. You don't
need to change your code. I would like to make the necessary modifications,
but I will be busy over the next days.
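
As a rough illustration of the idea (not the actual eXist implementation), the Java sketch below shows how indexed match offsets could be used as a cheap pre-filter. The class name, the method name and the map of offsets per term are hypothetical; only the existence of Match.getOffset() comes from this thread. The check asks whether the phrase terms can occur in the required order at all. A node that fails it cannot contain the phrase, so its content never has to be read. A node that passes would still need the full content scan, since the sketch checks ordering only, not adjacency.

```java
import java.util.List;
import java.util.Map;

// Illustrative only: pre-filtering phrase candidates by indexed character offsets.
final class PhraseOffsetPrefilter {

    /**
     * Returns true if the phrase terms can occur in the given order within one
     * node, judging only by the character offsets recorded for each term.
     */
    static boolean mayContainPhrase(List<String> phraseTerms,
                                    Map<String, List<Integer>> offsetsByTerm) {
        int previous = -1; // offset chosen for the preceding term
        for (String term : phraseTerms) {
            List<Integer> offsets = offsetsByTerm.get(term);
            if (offsets == null) {
                return false; // term does not occur in this node at all
            }
            // Greedily pick the smallest offset that lies after the previous term.
            int next = -1;
            for (int offset : offsets) {
                if (offset > previous && (next == -1 || offset < next)) {
                    next = offset;
                }
            }
            if (next == -1) {
                return false; // every occurrence lies before the previous term
            }
            previous = next;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> phrase = List.of("open", "source");
        // "open" at offset 10, "source" at 15: right order, worth a full scan.
        System.out.println(mayContainPhrase(phrase,
                Map.of("open", List.of(10), "source", List.of(15))));
        // "source" occurs only before "open": can be discarded without a scan.
        System.out.println(mayContainPhrase(phrase,
                Map.of("open", List.of(20), "source", List.of(15))));
    }
}
```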

Wolfgang



Re: tweaking Exist for multi user environments

Michael Beddow-2
Jakob Fix wrote:

> > This sounds promising.  My only problem is this: I wonder whether
> > upgrading Exist at this stage won't break anything.  Many things have
> > changed since the version we've been using.   I think we will try to
> > migrate to the CVS version nevertheless.

Unless your installation is very unusual either in configuration or disk
space demands, installing a current CVS in parallel with your current
version should be pretty straightforward, so if it turns out that anything
breaks you have a quick way back. For anyone who can't do that, I recommend
considering using a separate box for the parallel installation. That costs a
project very little compared with the massive savings that the ability to
use eXist brings with it...

The only causes for (very mild) anxiety I can think of are some improvements
in standards conformance of the XQuery implementation (including at least
one which incorporates a change introduced in the Sept 2005 version) but in
the unlikely event of those requiring you to update your code that would be
a Good Thing anyway.

> What would you say about the stability of the CVS version?

I can confirm what Wolfgang says in response to this enquiry (even though I
am one of the  sufferers from that recovery code problem which he mentions)
but I think he is on the track of the bug(s?), and if they prove elusive,
eXist can always be run with recovery turned off (as we all had to do in the
days when it couldn't be turned on in the first place.)

On my collections at least, the performance increases, especially in
indexing/re-indexing and restore-from-backup times, are very large indeed.
A while back I complained here that my data was taking 8 hours to restore
shortly after recovery was first implemented. After Wolfgang investigated
the problem, I was able to report that it was going in again in a "normal"
time of just under two hours. It now goes in in slightly more than 10
minutes. I have been thrashing the CVS from last Sunday (with a couple of
minor patches supplied by Wolfgang in the meantime, now in CVS) pretty hard
all week. Aside from the occasional triggering of complete restore cycles
when the database is actually still intact (which I, at any rate, have so
far been unable to reproduce), there has been no price to pay for the big
performance gains. This morning I put the CVS build on to the "second
rank" machine that's used by more technically savvy project members for
their routine work, and nothing untoward has so far emerged from their day's
activities.


> > > Concerning the other queries: most of the time now seems to be spent
> > > retrieving node values in the phrase() function. This could be
> > > improved: contrary to earlier versions, the current CVS code stores
> > > the actual character offset of the match text in the fulltext index,
> > > which is available in Match.getOffset(). phrase() requires the matches
> > > to be in the correct order and it can not be applied to mixed content
> > > nodes. It would thus be possible to check the match offsets before
> > > scanning the actual node content and filter out wrong matches (i.e.
> > > matches where the second term occurs before the first). This should
> > > result in a considerable speedup.

> > uh, I hope Bruno understands what you're saying here :-). In practice,
> > will these improvements mean we have to rewrite portions of our code,
> > or are you explaining what's happening behind the scenes?

phrase() as currently implemented was useful to some people, but it was
something of an afterthought in eXist's function repertoire, and pretty
inefficient, due to limitations in earlier versions of the fulltext code on
which it was dependent. It hasn't received a makeover yet, but the changes
made to the fulltext index to allow eXist to support KWIC displays mean that
the performance hit caused by the way phrase() currently works (and indeed
*had* to work on all previous versions of the fulltext indexer) should now
be removable. So right now phrase() isn't in any way worse than it was. But
no doubt it will soon get a whole lot better.

Michael Beddow




Re: tweaking Exist for multi user environments

Jakob Fix-2
Hello,

I have done some preliminary tests which suggest that the CVS version
seriously improves performance for our type of queries.  It also seems
to tell us that using XPath expressions is preferable to custom
XQuery functions (or at least that our implementation of these
functions needs improvement).

Response times improved by a factor of between two and more than 20
(half a second instead of 10 seconds)!  On average, queries are now
9 times faster.

As I'm not very good with diagrams, I am just attaching a text file with
more information.  Next on the list are tests with multiple
simultaneous clients.

Thanks to everybody who contributed to this thread, in particular of
course Wolfgang.

--
cheers,
Jakob.

Attachment: test.results.exist.txt (12K)