Rafał Kuć is a team leader and software developer, currently working as a software architect and Solr and Lucene specialist. He focuses mainly on Java but is open to any tool or programming language that helps him reach his goal more easily and quickly. Rafał is also one of the founders of the solr.pl site, where he shares his knowledge and helps people with their problems. Rafał is a DZone MVB and is not an employee of DZone; he has posted 75 posts at DZone.

Solr 4.0: DocTransformers first look

12.13.2011

In today’s entry we will look at another feature coming in version 4.0 of Apache Solr: the ability to modify the fields of the documents in Solr result lists.

Do I need it?

Until now, we didn’t have much choice when it came to the results returned by Solr. When Solr 4.0 is released we will get a new tool called DocTransformers. This feature lets us modify the fields of the documents returned in the search results. With what is available now we can, for example, rename the returned fields or mark the documents that were added by the QueryElevationComponent. Right now there are only a few implementations, but implementing your own DocTransformer is not hard.

What is already available?

At the time of writing, the following transformers are available:

  • One that enables you to mark the documents that were added by the QueryElevationComponent.
  • One that enables you to add the explain information to the document.
  • One that enables you to add a static value as a field of the document.
  • One that enables you to add the id of the shard from which the document was fetched.
  • One that enables you to add the docid as the document field (identifier used by Lucene).

How to use DocTransformers?

Let’s look at how to use DocTransformers. To do that, I downloaded the trunk version of Apache Solr (4.0) from the svn repository and ran the example deployment. Next, I indexed the example data and ran the following query:

http://localhost:8983/solr/select?q=encoded&fl=name,score,[docid],[explain]

If you look at the fl parameter, you will notice that we asked Solr for the name field, the score of the document, and two DocTransformers: [docid] and [explain]. As a result I got the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">2</int>
  <lst name="params">
    <str name="q">encoded</str>
    <str name="fl">name,score,[docid],[explain]</str>
  </lst>
 </lst>
 <result name="response" numFound="2" start="0" maxScore="0.50524884">
 <doc>
  <str name="name">Test with some GB18030 encoded characters</str>
  <float name="score">0.50524884</float>
  <int name="[docid]">0</int>
  <str name="[explain]">
  0.50524884 = (MATCH) weight(text:encoded in 0) [DefaultSimilarity], result of:
    0.50524884 = score(doc=0,freq=1.0 = termFreq=1), product of:
      1.0000001 = queryWeight, product of:
        3.2335923 = idf(docFreq=2, maxDocs=28)
        0.3092536 = queryNorm
      0.5052488 = fieldWeight in 0, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1
        3.2335923 = idf(docFreq=2, maxDocs=28)
        0.15625 = fieldNorm(doc=0)
  </str>
 </doc>
 <doc>
  <str name="name">Test with some UTF-8 encoded characters</str>
  <float name="score">0.4041991</float>
  <int name="[docid]">25</int>
  <str name="[explain]">
  0.4041991 = (MATCH) weight(text:encoded in 25) [DefaultSimilarity], result of:
    0.4041991 = score(doc=25,freq=1.0 = termFreq=1), product of:
      1.0000001 = queryWeight, product of:
        3.2335923 = idf(docFreq=2, maxDocs=28)
        0.3092536 = queryNorm
      0.40419903 = fieldWeight in 25, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1
        3.2335923 = idf(docFreq=2, maxDocs=28)
        0.125 = fieldNorm(doc=25)
  </str>
 </doc>
</result>
</response>

As you can see, Solr did what we asked for.
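
If you want to issue the same request from code, the sketch below may help. It is not part of the original example: the class and method names (DocTransformerQuerySketch, buildUrl, parseDocs) are made up for illustration, and it uses only the JDK rather than SolrJ. It assembles the fl parameter, percent-encoding the commas and square brackets, and reads the fields of each doc element from a shortened copy of the response above (the [explain] field is left out for brevity).

```java
import java.io.ByteArrayInputStream;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Solr: builds the query URL and reads the result docs.
class DocTransformerQuerySketch {

    // Shortened copy of the response shown above (assumption: [explain] omitted).
    static final String SAMPLE_RESPONSE =
        "<response><result name=\"response\" numFound=\"2\" start=\"0\">"
        + "<doc><str name=\"name\">Test with some GB18030 encoded characters</str>"
        + "<float name=\"score\">0.50524884</float><int name=\"[docid]\">0</int></doc>"
        + "<doc><str name=\"name\">Test with some UTF-8 encoded characters</str>"
        + "<float name=\"score\">0.4041991</float><int name=\"[docid]\">25</int></doc>"
        + "</result></response>";

    // Percent-encodes a single query parameter value.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Joins plain fields and transformers such as [docid] into one fl parameter.
    static String buildUrl(String baseUrl, String q, String... fields) {
        return baseUrl + "?q=" + enc(q) + "&fl=" + enc(String.join(",", fields));
    }

    // Returns one map per <doc>, keyed by the "name" attribute of each child element.
    static List<Map<String, String>> parseDocs(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<Map<String, String>> docs = new ArrayList<>();
            NodeList docNodes = dom.getElementsByTagName("doc");
            for (int i = 0; i < docNodes.getLength(); i++) {
                Map<String, String> fields = new LinkedHashMap<>();
                NodeList children = docNodes.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        String name = ((Element) child).getAttribute("name");
                        fields.put(name, child.getTextContent().trim());
                    }
                }
                docs.add(fields);
            }
            return docs;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse response", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("http://localhost:8983/solr/select",
            "encoded", "name", "score", "[docid]", "[explain]"));
        for (Map<String, String> doc : parseDocs(SAMPLE_RESPONSE)) {
            System.out.println(doc.get("[docid]") + " -> " + doc.get("name"));
        }
    }
}
```

Note that the transformer output arrives under the bracketed key, so the Lucene identifier is looked up as "[docid]", not "docid".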

Your own implementation

Let’s discuss how to implement your own DocTransformer. Below is an example class named RenameFieldsTransformer from the org.apache.solr.response.transform package in the Apache Solr source code. In general, all you have to do is override the following two methods of the DocTransformer class from the same package:

  • String getName() – method returning the transformer’s name,
  • void transform(SolrDocument doc, int docid) – method which performs the actual transformation.

The implementation looks like this:

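package org.apache.solr.response.transform;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.util.NamedList;

// Renames fields in each returned document according to the (old name, new name) pairs in rename.
public class RenameFieldsTransformer extends DocTransformer {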
 final NamedList<String> rename;

 public RenameFieldsTransformer( NamedList<String> rename ) {
  this.rename = rename;
 }

 @Override
 public String getName() {
  StringBuilder str = new StringBuilder();
  str.append( "Rename[" );
  for( int i=0; i< rename.size(); i++ ) {
   if( i > 0 ) {
    str.append( "," );
   }
   str.append( rename.getName(i) ).append( ">>" ).append( rename.getVal( i ) );
  }
  str.append( "]" );
  return str.toString();
 }

 @Override
 public void transform(SolrDocument doc, int docid) {
  for( int i=0; i<rename.size(); i++ ) {
   Object v = doc.remove( rename.getName(i) );
   if( v != null ) {
    doc.setField(rename.getVal(i), v);
   }
  }
 }
}

The code shown above lets us rename the fields returned in the results. As you can see, the transform method iterates through all the entries in the rename instance variable. The rename variable consists of name-value pairs: the current field name and the name the field should have after the transformation. Remember that in order to use your own transformer you need to add its configuration to the solrconfig.xml file. Note that the class registered there is a factory (EditorialMarkerFactory in this case), not the transformer class itself. Here is the example that can be found on the Solr wiki page:

<transformer name="elevated" class="org.apache.solr.response.transform.EditorialMarkerFactory" />
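
To try the renaming logic without a running Solr instance, here is a minimal sketch under a few assumptions: a plain Map stands in for SolrDocument, a LinkedHashMap stands in for NamedList (which, unlike a map, allows repeated names), and the class name RenameSketch and the field names title and missing are invented for illustration. It follows the same steps as transform() and getName() above, but it is not the Solr class itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for RenameFieldsTransformer that needs no Solr classes.
class RenameSketch {

    // Moves each value from its old field name to the new one. Fields that are
    // absent from the document are skipped, mirroring the null check in transform().
    static void rename(Map<String, Object> doc, Map<String, String> renames) {
        for (Map.Entry<String, String> entry : renames.entrySet()) {
            Object value = doc.remove(entry.getKey());
            if (value != null) {
                doc.put(entry.getValue(), value);
            }
        }
    }

    // Builds the same kind of label as getName(): Rename[old>>new,old2>>new2]
    static String label(Map<String, String> renames) {
        StringBuilder str = new StringBuilder("Rename[");
        boolean first = true;
        for (Map.Entry<String, String> entry : renames.entrySet()) {
            if (!first) {
                str.append(",");
            }
            str.append(entry.getKey()).append(">>").append(entry.getValue());
            first = false;
        }
        return str.append("]").toString();
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "Test with some UTF-8 encoded characters");
        doc.put("score", 0.4041991f);

        Map<String, String> renames = new LinkedHashMap<>();
        renames.put("name", "title");      // present in the document, so it is renamed
        renames.put("missing", "ignored"); // absent from the document, so nothing happens

        rename(doc, renames);
        System.out.println(label(renames));
        System.out.println(doc);
    }
}
```

The absent-field case is worth noticing: because the value is removed first and only re-added when it is not null, a rename rule for a field the document does not have leaves the document untouched.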

To sum up

You should remember that the DocTransformer functionality is marked as experimental, so its behavior could change by the time Lucene and Solr 4.0 are released. We will get back to this topic as soon as Solr 4.0 is released.

Published at DZone with permission of Rafał Kuć, author and DZone MVB. (source)