
Workaround for KWIC slowness

Workaround for KWIC slowness

Joe Wicentowski
Hi Winona,

I saw your tweet about KWIC being a source of slowness, rather than
the indexes as you first suspected
(https://twitter.com/wsalesky/status/473991703020314624).  If I'm
guessing correctly, you're seeing the same phenomenon I did.  It
becomes particularly pronounced when there are tons of hits in a
single result.  I've posted my workaround below, which had some
pretty dramatic results (see the commit message).  I trim the number
of exist:match elements before displaying the results, using the
search:trim-matches() function below.  (Note that I use David Sewell's
milestone-chunk() function rather than util:get-matches-between(),
because the $node containing the exist:match elements is in memory,
and the util function operates only on data stored in the database.)

Joe


---------- Forwarded message ----------
From:  <[hidden email]>
Date: Wed, Feb 19, 2014 at 11:10 AM
Subject: Changeset 2705 from joewiz


= Subversion url
https://subversion.assembla.com/svn/paho

= Commit message
After adding the logging last Friday and analyzing the logs to isolate
the periods when the server's CPU spiked, I noticed a pattern: many of
the spikes happened during certain searches.  But not all searches.  I
discovered that the searches that took the longest to complete
included results from our back-of-book indexes, which typically have
tons of results each.  The slowdown was caused by our "keyword in
context" (KWIC) search result highlighting routine applying highlights
to all of these search results (hundreds, perhaps thousands),
constructing in-memory node trees that sucked up all available memory
and impacted the performance of other queries running at the same
time.  To reduce our exposure to this phenomenon, I added some
functions to our search module, which limit the number of highlights
to a hard-coded number -- currently 10, but this can be altered.  The
results are pretty incredible: the performance of the hardest-hit
queries improved between 100 and 768 times!

= Affected files
M   trunk/db/history/modules/search.xqm


= Diff
--- /trunk/db/history/modules/search.xqm
+++ /trunk/db/history/modules/search.xqm
@@ -325,9 +325,46 @@
         $query
 };

+declare function search:milestone-chunk(
+  $ms1 as element(),
+  $ms2 as element(),
+  $node as node()*
+) as node()*
+{
+    typeswitch ($node)
+        case element() return
+            if ($node is $ms1) then
+                $node
+            else if ( some $n in $node/descendant::* satisfies ($n is $ms1 or $n is $ms2) ) then
+                element { name($node) }
+                    {
+                    for $i in ( $node/node() | $node/@* )
+                    return
+                        search:milestone-chunk($ms1, $ms2, $i)
+                    }
+            else if ( $node >> $ms1 and $node << $ms2 ) then
+                $node
+            else ()
+        default return
+            if ( $node >> $ms1 and $node << $ms2 ) then
+                $node
+            else ()
+};
+
+declare function search:trim-matches($node, $keep) {
+    let $matches := $node//exist:match
+    return
+        if (count($matches) le $keep) then
+            $node
+        else
+            search:milestone-chunk(subsequence($matches, 1, 1), subsequence($matches, $keep, 1), $node)
+};
+
 (: Formats results for display as HTML :)
 declare function search:display-results($hit as element()) as element()* {
-    let $summary := kwic:summarize($hit, <config xmlns="" width="60"/>)/*
+    let $matches-to-highlight := 10
+    let $trimmed-hit := search:trim-matches(util:expand($hit), $matches-to-highlight)
+    let $summary := kwic:summarize($trimmed-hit, <config xmlns="" width="60"/>)/*
     let $info := search:prepare-results($hit)
     let $link := $info/link/text()
     let $title := $info/title/node()
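
For context, a minimal sketch of the workaround in isolation, as if
evaluated inside search.xqm.  The collection path, element name, and
query string below are placeholders, not our actual setup:

(: Pick one hit from a placeholder full-text query.  util:expand
   materializes the hit in memory with exist:match markers, and
   search:trim-matches then keeps only the first 10 of them before
   the KWIC summarization runs. :)
let $hit := (collection('/db/history')//p[ft:query(., 'treaty')])[1]
let $trimmed := search:trim-matches(util:expand($hit), 10)
return
    kwic:summarize($trimmed, <config xmlns="" width="60"/>)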




Re: Workaround for KWIC slowness

Chris Tomlinson-2-2
I was looking at this sort of problem last week, and it seems to me that what would really help would be a form of util:expand that takes a limit argument: otherwise util:expand will go ahead and produce a potentially large MemTree with unneeded exist:match nodes, creating churn.

Alas, I didn't immediately see how to communicate the limit argument through to the code that actually generates the exist:match nodes.
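
For the sake of discussion, the shape I have in mind is something like the call below.  To be clear, the third $limit parameter is imagined; today util:expand takes only the node and an optional serialization options string:

(: Hypothetical: stop generating exist:match markers after $limit
   matches, so the expanded MemTree stays small.  Not part of eXist's
   util module as it stands. :)
util:expand($hit, 'expand-xincludes=no', $limit)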

Chris



Re: Workaround for KWIC slowness

wsalesky
In reply to this post by Joe Wicentowski
Thanks Joe, 
This was helpful, but it remains prohibitively slow. The slowdown seems to be in the KWIC module: when I eliminate the KWIC module and use only util:expand, the query time is halved. However, on the full-text searches I am still looking at queries of 16.9 sec; without any version of keyword-in-context I'm returning results in under 50 ms.
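
For reference, the comparison I'm timing looks roughly like this; the collection path, namespace, and query string are placeholders for my actual data set:

import module namespace kwic="http://exist-db.org/xquery/kwic";
declare namespace tei="http://www.tei-c.org/ns/1.0";
declare namespace ft="http://exist-db.org/xquery/lucene";

let $hits := collection('/db/data')//tei:p[ft:query(., 'keyword')]
(: fast path: materialize the matches only -- roughly halves the query time :)
let $expanded := for $hit in $hits return util:expand($hit)
(: slow path: full keyword-in-context summarization :)
let $summaries := for $hit in $hits return kwic:summarize($hit, <config xmlns="" width="60"/>)
return (count($expanded), count($summaries))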

This is not really an urgent issue for me at the moment. I was hoping to demo KWIC for the class, but I will likely just demo it using a different data set.

Thanks again for the help, I look forward to future discussions on this topic. 
-Winona




Re: Workaround for KWIC slowness

Leif-Jöran Olsson-3
Dear all, I have already started to rework the KWIC module into Java
code, and it was good to get these use cases and your feedback that
limiting the number of match markings did not help them as much as it
did in Joe's case.

Cheers,
Leif-Jöran



Re: Workaround for KWIC slowness

Chris Tomlinson-2-2
Hello Leif-Jöran,

Will your rework of the KWIC module into Java include handling util:expand with a limit parameter, or something similar?

Thanks,
Chris



Re: Workaround for KWIC slowness

Leif-Jöran Olsson-3
Chris, yes, that equivalent functionality will be one of the features.

Cheers,
Leif-Jöran


Re: Workaround for KWIC slowness

Chris Tomlinson-2-2
Leif-Jöran, great. Then I don't need to spend time on extending util:expand.

Thanks,
Chris




Re: Workaround for KWIC slowness

Dan McCreary
In reply to this post by Leif-Jöran Olsson-3
Hello Leif,

Thank you for working on the KWIC module.  One suggestion I had from a customer was to make the text surrounding a match break on word boundaries, not just at a fixed number of characters before and after the match.
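
By way of illustration, here is a minimal XQuery sketch of what I mean for the leading context; the function name and $width parameter are made up for the example, not part of the kwic module:

(: Keep at most $width characters of context before a match, then drop
   the first, possibly partial, word so the text starts on a word
   boundary.  Names here are illustrative only. :)
declare function local:trim-to-word-boundary(
  $text as xs:string,
  $width as xs:integer
) as xs:string {
    if (string-length($text) le $width) then
        $text
    else
        replace(
            substring($text, string-length($text) - $width + 1),
            '^\S*\s*', '')
};

So local:trim-to-word-boundary('keyword in context searching', 12) yields 'searching' rather than the mid-word 'xt searching'.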

Thanks! - Dan





--
Dan McCreary
http://danmccreary.com
Co-author: Making Sense of NoSQL
office: (952) 931-9198
cell: (612) 986-1552
skype: dmccreary47


Re: Workaround for KWIC slowness

Dmitriy Shabanov
Hello,

Check
https://github.com/shabanovd/exist/commit/dea860196bdebbc66274b029a12e0c09606e04f1

The result of the chunk highlighter looks like:

<para><exist:cutoff/>with <hi><exist:match>mixed</exist:match></hi> cont<exist:cutoff/></para>

(taken from LuceneMatchListenerTest.chunkTests)

Is this what you were asking for?
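
To render that chunked form as HTML, a minimal sketch (assuming the listener emits exist:cutoff and exist:match exactly as above; local:render is an illustrative name, not part of any module):

xquery version "1.0";

declare namespace exist = "http://exist.sourceforge.net/NS/exist";

(: Recursively copy a chunked hit, turning exist:cutoff into an
   ellipsis and exist:match into <strong>; all other markup is
   copied through unchanged. :)
declare function local:render($node as node()) as node()* {
    typeswitch ($node)
        case element(exist:cutoff) return text { "..." }
        case element(exist:match) return
            <strong>{ for $n in $node/node() return local:render($n) }</strong>
        case element() return
            element { node-name($node) } {
                $node/@*,
                for $n in $node/node() return local:render($n)
            }
        default return $node
};

let $chunk :=
    <para><exist:cutoff/>with <hi><exist:match>mixed</exist:match></hi> cont<exist:cutoff/></para>
return
    <span class="kwic">{ for $n in $chunk/node() return local:render($n) }</span>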

On Tue, Jun 10, 2014 at 4:29 PM, Dan McCreary <[hidden email]> wrote:
Thank you for working on the KWIC module.  One suggestion I had from a customer was to make the text surrounding a match break on word boundaries, not just at a fixed number of characters before and after a match.




--
Dmitriy Shabanov


Re: Workaround for KWIC slowness

Leif-Jöran Olsson-3

Dan, Dmitriy (et al),
yes, that will be a configuration option.
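
In the meantime, a minimal sketch of what breaking on word boundaries could look like as a post-processing step on the context strings (local:trim-to-words is a hypothetical helper, not part of the KWIC module):

(: Trim a context string to roughly $width characters without
   cutting mid-word. Naive: returns "" when the first word alone
   is longer than $width. :)
declare function local:trim-to-words($text as xs:string, $width as xs:integer) as xs:string {
    if (string-length($text) le $width) then
        $text
    else
        let $words := tokenize(substring($text, 1, $width + 1), '\s+')
        (: the last token may have been cut mid-word, so drop it :)
        return string-join(subsequence($words, 1, count($words) - 1), ' ')
};

local:trim-to-words("Whether 'tis nobler in the mind to suffer", 20)
(: returns "Whether 'tis nobler" :)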

Leif-Jöran




Re: Workaround for KWIC slowness

Dan McCreary
Thanks!  Sign me up for the testing team!

- Dan







Re: Workaround for KWIC slowness

Jens Østergaard Petersen-2
In reply to this post by Leif-Jöran Olsson-3
Hi Leif-Jöran,

Now that you are recoding, would it be an idea to add an optional parameter, $location as element()?, to kwic:get-summary()?

Instead of outputting 


<tr class="reference" xmlns="">
    <td colspan="3"><span class="number">4</span><a href="sha-ham.html">Hamlet, Prince of Denmark</a>, <a href="sha-ham301.html">Act 3, Scene 1</a></td>
</tr>
<tr xmlns="">
    <td class="previous">... hear him coming: withdraw, my lord. To be, or not to be: that is </td>
    <td class="hi"><a href="works/sha-ham301.html">the question</a></td>
    <td class="following"> : Whether 'tis nobler in the mind to suffer The slings and arrows of outrageous fortun ...</td>
</tr>


as one has to now, one could then output the more compact


<tr xmlns="">
    <td colspan="3"><span class="number">4</span><a href="sha-ham.html">Hamlet, Prince of Denmark</a>, <a href="sha-ham301.html">Act 3, Scene 1</a></td>
    <td class="previous">... hear him coming: withdraw, my lord. To be, or not to be: that is </td>
    <td class="hi"><a href="works/sha-ham301.html">the question</a></td>
    <td class="following"> : Whether 'tis nobler in the mind to suffer The slings and arrows of outrageous fortun ...</td>
</tr>


(perhaps abbreviating the rather long location …),


supplying <td colspan="3"><span class="number">4</span><a href="sha-ham.html">Hamlet, Prince of Denmark</a>, <a href="sha-ham301.html">Act 3, Scene 1</a></td> as $location?


Similarly for <p> and <span> ….


One could also pass this parameter as a child of $conf - this is perhaps the best overall solution, since @link is already there.
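
Until such a parameter exists, the compact form can also be produced by post-processing the two-row output. A minimal sketch, assuming exactly the markup shown above, with each reference row immediately preceding its hit row (local:compact-rows is just an illustrative name):

(: Fold each tr[@class = 'reference'] location row into the hit row
   that follows it, yielding the one-row form shown above. :)
declare function local:compact-rows($rows as element(tr)*) as element(tr)* {
    for $i in 1 to count($rows)
    where $rows[$i]/@class = 'reference'
    return
        <tr xmlns="">{ $rows[$i]/td, $rows[$i + 1]/td }</tr>
};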


Jens

On 9 Jun 2014 at 13:41:52, Leif-Jöran Olsson ([hidden email]) wrote:


Dear all, I have already started to rework the KWIC module into Java
code, and it was good to get these use cases and your feedback on limiting
the number of match markings not affecting them as much as in Joe's case.

Cheers,
Leif-Jöran


Den 2014-06-05 05:39, Winona Salesky skrev:

> Thanks Joe,
> This was helpful, but it remains prohibitively slow. The slowdown seems
> to be in the KWIC module: when I eliminate the KWIC module and use only
> util:expand, the query time is halved. However, on the full-text
> searches I am still looking at queries of 16.9 sec; without any version
> of keyword in context I'm returning results in under 50 ms.
>
> This is not really an urgent issue for me at the moment. I was hoping
> to demo KWIC for the class, but I may just demo it using a
> different data set.
>
> Thanks again for the help; I look forward to future discussions on this
> topic.
> -Winona

Re: Workaround for KWIC slowness

Leif-Jöran Olsson-3

Jens, it sounds completely in line with the semantics. I will do that.

Leif-Jöran


Re: Workaround for KWIC slowness

Chris Tomlinson-2-2
Great! I'm looking forward to it. We're working on some changes in the Lucene search and this will be a nice addition.

Chris



Re: Workaround for KWIC slowness

Martin Holmes
Did this work ever get done? I'm hitting the issue of painfully slow
behaviour when long hit documents contain lots of matches, and before I
try to write a workaround I'd like to check whether this proposal for a
parameter that would return only a specified number of matches was ever
implemented.
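
If it wasn't, the fallback would presumably be the interim approach from Joe's changeset earlier in the thread: expand the hit, keep only the first few exist:match elements, and summarize the trimmed copy. A minimal sketch of that, assuming the search:trim-matches() function from the changeset is in scope:

import module namespace kwic = "http://exist-db.org/xquery/kwic";
(: import of the search module omitted - its namespace URI is project-specific :)

declare variable $max-matches := 10;

(: Cap the number of highlighted matches per hit before KWIC
   summarization; search:trim-matches() is the function from the
   changeset quoted earlier in this thread. :)
declare function local:summarize-capped($hit as element()) as element()* {
    let $trimmed := search:trim-matches(util:expand($hit), $max-matches)
    return
        kwic:summarize($trimmed, <config xmlns="" width="60"/>)/*
};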

Cheers,
Martin
