Deadlock-prevention patch

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Deadlock-prevention patch

Peter Gardfjäll

Hello,

I would like to submit a patch to resolve the deadlock problem I've been
having with all post-1.1b eXist-versions.

The code modifies NativeBroker.openCollection() to prevent calling threads
from waiting for a collection (read/write)lock while holding the
CollectionCache monitor.
(That's what caused my deadlock. One thread waiting for a collection
r/w-lock while holding the CollectionCache monitor, and another thread
holding the collection r/w-lock and waiting for monitor entrance.)

Although this modification seems to have done the trick for my
application, there may be other potential "deadlock-spots" lurking around
in the code.

kind regards
/Peter

deadlock-patch.txt (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Deadlock-prevention patch

Pierrick Brihaye
Hi,

Peter Gardfjäll a écrit :

> The code modifies NativeBroker.openCollection() to prevent calling threads
> from waiting for a collection (read/write)lock while holding the
> CollectionCache monitor.
> (That's what caused my deadlock. One thread waiting for a collection
> r/w-lock while holding the CollectionCache monitor, and another thread
> holding the collection r/w-lock and waiting for monitor entrance.)

Indeed : org.exist.xmldb.test.concurrent.DeadlockTest now passes. Other
tests aren't apparently broken...

Was it as simple as locking the collection after it is actually cached ?

Symetrically, we should however have the same design when a collection
is (re)moved.

> Although this modification seems to have done the trick for my
> application, there may be other potential "deadlock-spots" lurking around
> in the code.

If this one is solved, it's a nice step forward :-)

Cheers,

p.b.





-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development challenges with Apache's Geronimo App Server. Download
it for free - -and be entered to win a 42" plasma tv or your very own
Sony(tm)PSP.  Click here to play: http://sourceforge.net/geronimo.php
_______________________________________________
Exist-open mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/exist-open
Reply | Threaded
Open this post in threaded view
|

Re: Deadlock-prevention patch

Peter Gardfjäll
On Fri, 23 Sep 2005, Pierrick Brihaye wrote:

> Indeed : org.exist.xmldb.test.concurrent.DeadlockTest now passes. Other
> tests aren't apparently broken...
>
> Was it as simple as locking the collection after it is actually cached ?

Yes, by moving the actual collection locking out of the
(CollectionCache) synchronized block the calling thread is guaranteed to
only hold a single lock at any time, thereby eliminating the risk of a
circular wait.

The thread that opens (and caches) the collection is not guaranteed to
acquire the collection lock first (another thread may get the lock before
the "caching thread" reaches Lock.acquire) but I can't see why that should
be a problem. Please correct me if I'm wrong.

Note that the potential for deadlocks caused by a thread holding several
collection locks still remains. Such problems are likely to occur in
transactions that span several collections. However, that type of problem
needs to be handled through some sort of locking protocol.

> Symetrically, we should however have the same design when a collection
> is (re)moved.

The same pattern is generally applicable, however, I haven't studied the
other code sections (since they have never caused me any problems) so I'm
not sure if the fix can be applied there without breaking something.

> If this one is solved, it's a nice step forward :-)

Yes, deadlocks are a real pain in the back.

Cheers, Peter

> Cheers,
>
> p.b.
>
>
>

-------------------------------------------------
 Peter Gardfjäll
 Department of Computing Science, Umeå University
 SE-901 87 Umeå, Sweden


-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development challenges with Apache's Geronimo App Server. Download
it for free - -and be entered to win a 42" plasma tv or your very own
Sony(tm)PSP.  Click here to play: http://sourceforge.net/geronimo.php
_______________________________________________
Exist-open mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/exist-open
Reply | Threaded
Open this post in threaded view
|

Re: Deadlock-prevention patch

Pierrick Brihaye
Hi,

Peter Gardfjäll a écrit :

> The thread that opens (and caches) the collection is not guaranteed to
> acquire the collection lock first (another thread may get the lock before
> the "caching thread" reaches Lock.acquire) but I can't see why that should
> be a problem. Please correct me if I'm wrong.

Any objection anyone ?

> Note that the potential for deadlocks caused by a thread holding several
> collection locks still remains. Such problems are likely to occur in
> transactions that span several collections.

The problem is that opening, moving, and removing of collections *also*
locks the subcollections. It may be nice to improve the test case to
detect such side-effects.

 > However, that type of problem
> needs to be handled through some sort of locking protocol.

Yes. Wolfgang has some ideas about this ;-)

Cheers,

p.b.



-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development challenges with Apache's Geronimo App Server. Download
it for free - -and be entered to win a 42" plasma tv or your very own
Sony(tm)PSP.  Click here to play: http://sourceforge.net/geronimo.php
_______________________________________________
Exist-open mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/exist-open
Reply | Threaded
Open this post in threaded view
|

Re: Deadlock-prevention patch

wolfgangmm
In reply to this post by Peter Gardfjäll
Ok, patch applied. Thanks.

> Note that the potential for deadlocks caused by a thread holding several
> collection locks still remains. Such problems are likely to occur in
> transactions that span several collections. However, that type of problem
> needs to be handled through some sort of locking protocol.

Yes, that's perfectly true. I tried to address the deadlock issue at a
more fundamental level by avoiding synchronization on the collection
cache, but it showed to be quite complicated. The basic problem of
transactions working across multiple collections remains. Maybe we
should first replace the current lock classes by a cleaner
implementation that also offers deadlock detection. Apache's
commons-transactions package might be a starting point.

Wolfgang


-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development challenges with Apache's Geronimo App Server. Download
it for free - -and be entered to win a 42" plasma tv or your very own
Sony(tm)PSP.  Click here to play: http://sourceforge.net/geronimo.php
_______________________________________________
Exist-open mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/exist-open