configurable analyzer (like in Solr)

Immanuel Normann
Hi,

I have recently started learning Solr, the Lucene-based web app. My actual intention was to learn about Lucene features without touching Java. In this sense, I see Solr as a kind of REST interface to Lucene that allows index configuration through XML files.

Now I am coming back to eXist, hoping to apply my fresh layman's Solr knowledge there wherever possible. In particular, I am interested in customizing analyzers. From the online documentation
http://exist-db.org/exist/apps/doc/lucene.xml#D2.2.4.27
I understand that one can assign a specific analyzer to each indexed element, e.g.:
<analyzer id="ws" class="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
<text qname="TITLE" analyzer="ws"/>
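
For context, here is a minimal sketch of where those two lines would sit in a collection configuration (collection.xconf). The surrounding elements and the default StandardAnalyzer line are assumptions based on the layout shown in the eXist Lucene documentation, not something taken from a real setup. The package of WhitespaceAnalyzer may also differ depending on the Lucene version bundled with eXist.

```xml
<!-- Sketch only: wrapper elements and the default analyzer are assumed -->
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index>
        <lucene>
            <!-- Assumed default analyzer, used where no analyzer attribute is given -->
            <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
            <!-- Named analyzer, referenced below by its id;
                 the class package may vary with the bundled Lucene version -->
            <analyzer id="ws" class="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
            <!-- TITLE elements are indexed with the whitespace analyzer -->
            <text qname="TITLE" analyzer="ws"/>
        </lucene>
    </index>
</collection>
```

Each analyzer here is a single Java class named as a whole. The configuration only selects the class and does not assemble it from separate parts.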

Solr seems more flexible, as it allows combining a tokenizer with various filters inside a single analyzer. This is easy to configure (in Solr's schema.xml), as the following example illustrates:

      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>

Wouldn't it be great to have something similar in eXist?

How would you otherwise build such an analyzer with eXist's current means? I assume you would have to implement the analyzer yourself on top of the Lucene library.


------------------------------------------------------------------------------
Don't Limit Your Business. Reach for the Cloud.
GigeNET's Cloud Solutions provide you with the tools and support that
you need to offload your IT needs and focus on growing your business.
Configured For All Businesses. Start Your Cloud Today.
https://www.gigenetcloud.com/
_______________________________________________
Exist-open mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: configurable analyzer (like in Solr)

wshager

> Wouldn't it be great to have something similar in eXist?

+1
 


--

W.S. Hager
Lagua Web Solutions
http://lagua.nl


