PHP, Python and Java developer located at Hvaler, Norway. Main interests include digital mapping, search and scalability. Mats is a DZone MVB and is not an employee of DZone and has posted 28 posts at DZone.

Updating a Solr Analysis Plugin from 1.4.1 (Lucene 2.9) to Solr / Lucene 4.0 (current trunk)

Three years and a couple of weeks ago I wrote a post about how to get started writing a simple Solr Analysis Plugin to handle incoming tokens and modifying them in place when an update is requested.

Since then the whole version number structure of Solr has changed (and is now in sync with the underlying Lucene version), and not surprisingly, the current API has also been updated. This means that a few small changes are required to get your analysis plugins running on the current trunk of Lucene and Solr.

The main change is that the previously named TermAttribute is now named CharTermAttribute, which means that any imports will have to change:

    - import org.apache.lucene.analysis.tokenattributes.TermAttribute;
    + import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

Any declarations of TermAttributes will need to be CharTermAttributes instead:

    - private TermAttribute termAtt;
    + private CharTermAttribute termAtt;

      public NorwegianNameFilter(TokenStream input)
    -     termAtt = (TermAttribute) addAttribute(TermAttribute.class);
    +     termAtt = input.getAttribute(CharTermAttribute.class);

We now fetch the attribute from the current TokenStream (I’m not sure whether the old way of doing it has been deprecated, but this seems to be the suggested way now). We also change any references to TermAttribute.class to CharTermAttribute.class.

The actual TermAttribute interface has also changed, meaning we’ll have to change a few of the old method calls:

    - termAtt.setTermLength(this.parseBuffer(termAtt.termBuffer(), termAtt.termLength()));
    + termAtt.setLength(this.parseBuffer(termAtt.buffer(), termAtt.length()));

.setTermLength() => .setLength()
.termBuffer() => .buffer()
.termLength() => .length()

The methods behave in the same manner as in the previous API: .buffer() returns a char array (char[]) holding the current term, which you can modify in place; .length() returns the length of the term currently in the buffer (the buffer itself can be larger than the part in use); and .setLength() sets the new length of the term (if you’re collapsing characters).
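To make the buffer and length semantics a bit more concrete, here is a minimal sketch of a parseBuffer-style method in plain Java, with no Lucene dependency. The class name ParseBufferSketch and the rule it applies (collapsing runs of identical adjacent characters) are assumptions made up for illustration; this is not the actual logic of NorwegianNameFilter. It only shows the pattern of editing the char[] in place and returning the new length.

```java
public class ParseBufferSketch {
    /**
     * Hypothetical rule, for illustration only: collapse runs of identical
     * adjacent characters ("aa" becomes "a"). Only the first bufferLength
     * chars are treated as the term; the array itself may be larger.
     * Returns the new length of the term after editing in place.
     */
    public static int parseBuffer(char[] buffer, int bufferLength) {
        if (bufferLength == 0) {
            return 0;
        }

        int write = 1; // the first character is always kept

        for (int read = 1; read < bufferLength; read++) {
            // Keep a character only if it differs from the last one kept.
            if (buffer[read] != buffer[write - 1]) {
                buffer[write++] = buffer[read];
            }
        }

        return write;
    }

    public static void main(String[] args) {
        // The buffer is deliberately larger than the term, as it can be in Lucene.
        char[] buffer = new char[16];
        String term = "haakon";
        term.getChars(0, term.length(), buffer, 0);

        int newLength = parseBuffer(buffer, term.length());
        System.out.println(new String(buffer, 0, newLength)); // prints "hakon"
    }
}
```

In a filter, the returned value is what you would hand to setLength(), so that anything left over past the new length in the buffer is ignored.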

The new implementation of our analysis filter skeleton:

    package no.derdubor.solr.analysis;
    import java.io.IOException;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class NorwegianNameFilter extends TokenFilter {
        private CharTermAttribute termAtt;

        public NorwegianNameFilter(TokenStream input) {
            super(input);
            termAtt = input.getAttribute(CharTermAttribute.class);
        }

        // Lucene asserts that incrementToken() (or the whole class) is final.
        public final boolean incrementToken() throws IOException {
            if (this.input.incrementToken()) {
                termAtt.setLength(this.parseBuffer(termAtt.buffer(), termAtt.length()));
                return true;
            }
            return false;
        }

        protected int parseBuffer(char[] buffer, int bufferLength) {
            // Body not shown in this post: modify the buffer in place and return the new length.
            return bufferLength;
        }
    }


Published at DZone with permission of Mats Lindh, author and DZone MVB.



Saša Mutić replied on Tue, 2011/09/06 - 6:45am

Thank you, very useful! In order to compile against Solr 4.0 trunk, you also need to change .setTermBuffer() => .copyBuffer()
