Allowing of Lucene Stopwords is not working for me


Allowing of Lucene Stopwords is not working for me

mountainbiker
The author has deleted this message.

Re: Allowing of Lucene Stopwords is not working for me

Joe Wicentowski
Hey Mountainbiker,

Have you enabled the "leading-wildcard" option for your query? This is discussed in http://exist-db.org/exist/apps/doc/lucene.xml.
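For reference, here is a sketch of the options element that eXist's ft:query accepts as an optional third argument. As far as I know, leading wildcards are off by default. The doc:key path and the "*are" query in the comment are illustrative only.

```xml
<!-- Hypothetical usage: ft:query(.//doc:key, "*are", $options),
     with $options bound to the element below -->
<options>
    <!-- Allow query terms that begin with * or ? -->
    <leading-wildcard>yes</leading-wildcard>
</options>
```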

Joe

Sent from my iPhone

_____________________________
From: mountainbiker <[hidden email]>
Sent: Tuesday, October 11, 2016 6:15 AM
Subject: [Exist-open] Allowing of Lucene Stopwords is not working for me
To: <[hidden email]>


*Running eXist 2.2*

Trying to utilize "*are*" with ft:query.

I have configured the collection.xconf in a number of different ways to
permit this stopword. Each time I changed the xconf, restarted, and
re-indexed (a subset of the collection that is known to have matches), but
nothing is working for me. I have tried:

<analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer">

</analyzer>

<analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer">

</analyzer>

<analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer">

</analyzer>

<analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer">

</analyzer>

<analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer">

<value>the</value>
<value>this</value>
<value>and</value>
<value>that</value>

</analyzer>

I then tried all of the above with an id="nosw" attribute on the analyzer
and an analyzer="nosw" attribute on the text element. That still did not
work for me, and I am not seeing any errors in the logs.

/*What the heck am I missing?*/



--
View this message in context: http://exist.2174344.n4.nabble.com/Allowing-of-Lucene-Stopwords-is-not-working-for-me-tp4670892.html
Sent from the exist-open mailing list archive at Nabble.com.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Exist-open mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/exist-open




Re: Allowing of Lucene Stopwords is not working for me

mountainbiker
[ft:query(.//doc:key, "ARX")]   <-- works without the need of wildcards
[ft:query(.//doc:key, "ARE")]   <-- doesn't work as it is a Lucene stopword






Re: Allowing of Lucene Stopwords is not working for me

mountainbiker
It seemed like there might be a bug with eXist 2.2.

The only way I can get it to work is by bringing all the stopwords back into play, i.e. passing an empty stopword set (even though I only needed "are"):
        <lucene>
            <analyzer id="nosw" class="org.apache.lucene.analysis.standard.StandardAnalyzer">
                <param name="stopwords" type="org.apache.lucene.analysis.util.CharArraySet"/>
            </analyzer>
            <text qname="doc:key" analyzer="nosw" />
            <text qname="doc:cost" />
            <text qname="doc:loc" />
        </lucene>

By contrast, this did not work for me:
        <lucene>
            <analyzer id="nosw" class="org.apache.lucene.analysis.standard.StandardAnalyzer">
                <param name="stopwords" type="org.apache.lucene.analysis.util.CharArraySet">
                    <value>are</value>
                </param>
            </analyzer>
            <text qname="doc:key" analyzer="nosw" />
            <text qname="doc:cost" />
            <text qname="doc:loc" />
        </lucene>
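A possible explanation for the second result, offered as an assumption rather than anything confirmed in this thread: the stopwords param appears to replace StandardAnalyzer's built-in stop set instead of subtracting from it. If so, listing only <value>are</value> makes "are" the one word that gets dropped, which is the opposite of the intent, while the empty set in the first config drops nothing. Below is a sketch of a collection.xconf that keeps the other default English stopwords and omits only "are". The word list is Lucene's default English stop set as I recall it for the Lucene 4.x line (worth checking against the version bundled with your eXist), and the doc namespace URI is a placeholder.

```xml
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <!-- Placeholder URI: bind doc: to the namespace your documents actually use -->
    <index xmlns:doc="http://example.com/ns/doc">
        <lucene>
            <!-- Assumption: "stopwords" replaces the whole stop set, so every word
                 that should still be ignored must be listed. This is Lucene's
                 default English set with "are" left out. -->
            <analyzer id="nosw" class="org.apache.lucene.analysis.standard.StandardAnalyzer">
                <param name="stopwords" type="org.apache.lucene.analysis.util.CharArraySet">
                    <value>a</value> <value>an</value> <value>and</value> <value>as</value>
                    <value>at</value> <value>be</value> <value>but</value> <value>by</value>
                    <value>for</value> <value>if</value> <value>in</value> <value>into</value>
                    <value>is</value> <value>it</value> <value>no</value> <value>not</value>
                    <value>of</value> <value>on</value> <value>or</value> <value>such</value>
                    <value>that</value> <value>the</value> <value>their</value> <value>then</value>
                    <value>there</value> <value>these</value> <value>they</value> <value>this</value>
                    <value>to</value> <value>was</value> <value>will</value> <value>with</value>
                </param>
            </analyzer>
            <!-- Only doc:key uses the custom analyzer; the other fields keep the default -->
            <text qname="doc:key" analyzer="nosw"/>
            <text qname="doc:cost"/>
            <text qname="doc:loc"/>
        </lucene>
    </index>
</collection>
```

After storing this and re-indexing, ft:query(.//doc:key, "ARE") should match if the assumption above holds, while words such as "the" would still be skipped.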

Re: Allowing of Lucene Stopwords is not working for me

Joe Wicentowski
Hi Mountainbiker,

Any code you post in Nabble does not show up in the emails sent out to
people who subscribe to the mailing list; this is a known issue with Nabble.
I would recommend that you subscribe and post via normal email to ensure
your code reaches everyone.

I see, so your issue isn't with wildcards per se, but with stopwords.
I recall discussions of this on the list in the past. See, for example,
this message from the list archive:

  http://markmail.org/message/7kcdjigjfjigtlyj

Joe

On Wed, Oct 12, 2016 at 9:32 AM, mountainbiker
<[hidden email]> wrote:

> It seemed like there might be a bug with eXist 2.2.
>
> The only way I get it to work is by including all the stopwords (when I only
> needed "are")
>
>
> That is, this did not work for me
>
>
>
>
>
> --
> View this message in context: http://exist.2174344.n4.nabble.com/Allowing-of-Lucene-Stopwords-is-not-working-for-me-tp4670892p4670898.html
> Sent from the exist-open mailing list archive at Nabble.com.
>


Re: Allowing of Lucene Stopwords is not working for me

mountainbiker
I was wondering why you were talking about wildcards, but I see that somewhere between Nabble and the mailing list the bold markup was converted to "*".

I had tried every example in the various documentation sets and in the forum. The only one that worked for me was Jens' response, in which all the stopwords were brought back into play. With eXist 2.2 I could not find any way to selectively reintroduce just the stopwords I needed.

Hopefully eXist 3 will come out of the "release candidate" state and we'll get approval to upgrade, since this stopword bug has most likely been addressed there.