My passion is building crawlers and search engines. In particular, I specialize in building vertical search engines like Indeed.com, Homethinking.com, Bright.com and Enormo.com (all companies I've worked with). I've also worked on products such as Atlassian Jira and Confluence to improve their search capabilities.

Connecting Redis to Solr For Boosting Documents

06.23.2012

There are a number of situations in Solr where it's preferable to retrieve data from an external datastore for boosting purposes, rather than contorting Solr with multiple queries, joins, etc.

Here's a trivial example:

Jobs are stored as documents in Solr. Users of the application can rank a job from 1 to 10. We need to boost each job by the user's rank, if one exists.

Modeling this entirely in Solr would be fairly inefficient, especially with a large number of jobs and/or users: each time a user ranks a job, the searcher has to be reloaded before that data becomes available for searching.

A much more efficient approach is to store the rank data in a NoSQL store like Redis, retrieve the rank at query time, and use it to boost the documents accordingly.
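To make the idea concrete, here is a minimal sketch in plain Java. It is not the Solr integration itself: an in-memory map stands in for Redis so the example has no dependencies, and the names RankBoostSketch, rank and boostedScore are hypothetical. It only shows the shape of the approach, assuming the ranks of a single user: a rank is written to the side store without touching the index, and read back at query time as an additive boost.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an in-memory map stands in for a Redis hash of jobId -> rank
// holding the ranks given by one user.
final class RankBoostSketch {
    private final Map<String, Integer> ranks = new HashMap<>();

    // Recording a rank touches only the side store; the search index is not modified.
    void rank(String jobId, int rank) {
        if (rank < 1 || rank > 10) {
            throw new IllegalArgumentException("rank must be between 1 and 10");
        }
        ranks.put(jobId, rank);
    }

    // At query time, add the user's rank to the relevance score.
    // Jobs the user has not ranked get no boost.
    double boostedScore(String jobId, double relevance) {
        Integer rank = ranks.get(jobId);
        return relevance + (rank == null ? 0.0 : rank);
    }
}
```

Because the rank lives outside the index, a new rank is visible to the very next query without reloading the searcher.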

This can be accomplished using a custom FunctionQuery. I've blogged about how to create custom function queries in Solr before, so this is simply an application of the subject.

Here's the code:

public class RedisValueSourceParser extends ValueSourceParser {
  @Override public ValueSource parse(FunctionQParser fp) throws ParseException {
    String dataType = fp.parseArg(); // either z (sortedset) or h (hash)
    if (!dataType.equalsIgnoreCase("z") && !dataType.equalsIgnoreCase("h")) {
      throw new ParseException("Expecting first arg to be either z (sortedset) or h (hash)");
    }
    String redisKey = fp.parseArg();
    String field = fp.parseArg();
    return new RedisValueSource(dataType, redisKey, field);
  }
}

This FunctionQuery accepts 3 arguments:
1. the data type of the Redis collection, either z (sorted set) or h (hash)
2. the key of the Redis collection
3. the Solr field whose value is used as the id to look up in Redis

Here's what the salient part of RedisValueSource looks like:

  @Override public DocValues getValues(Map context, IndexReader reader) throws IOException {
    // Values of the id field for every document, indexed by Lucene doc number
    final String[] lookup = FieldCache.DEFAULT.getStrings(reader, field);
    final Jedis jedis = new Jedis("localhost");
    return new DocValues() {

      @Override public String strVal(int doc) {
        final String id = lookup[doc];
        if (redisDataType.equalsIgnoreCase("h")) {
          return jedis.hget(redisKey, id); // null if the hash has no such field
        }
        // zscore returns null for a missing member, so check before converting
        final Double score = jedis.zscore(redisKey, id);
        return score == null ? null : Double.toString(score);
      }

      @Override public String toString(int doc) {
        return strVal(doc);
      }
    };
  }
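The excerpt shows only strVal, but a function query ultimately needs a number for each document (the explain output further down reports a numeric value for the function). Here is a minimal sketch of that conversion in plain Java, independent of Solr and Jedis. The class and method names are hypothetical, and treating a missing or non-numeric value as 0 is an assumption rather than the author's stated behaviour.

```java
// Hypothetical helper: turn the raw string fetched from Redis into a boost value.
final class RedisBoostValues {
    private RedisBoostValues() {}

    // Returns 0 when the id has no entry in Redis (null) or the value cannot be parsed.
    static float toBoost(String raw) {
        if (raw == null) {
            return 0f;
        }
        try {
            return Float.parseFloat(raw.trim());
        } catch (NumberFormatException e) {
            return 0f;
        }
    }
}
```

A floatVal(int doc) override in the anonymous DocValues class could then return something like toBoost(strVal(doc)).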

 

From here, you can use the following Solr query to perform boosting based on the Redis value:
http://localhost:8983/solr/select?defType=edismax&q=cat:electronics&bf=redis(h,bar,id)&debugQuery=on

The explain output looks like this:

3.4664698 = (MATCH) sum of:
  1.070082 = (MATCH) weight(cat:electronics in 2), product of:
    0.80067647 = queryWeight(cat:electronics), product of:
      1.3364723 = idf(docFreq=14, maxDocs=21)
      0.59909695 = queryNorm
    1.3364723 = (MATCH) fieldWeight(cat:electronics in 2), product of:
      1.0 = tf(termFreq(cat:electronics)=1)
      1.3364723 = idf(docFreq=14, maxDocs=21)
      1.0 = fieldNorm(field=cat, doc=2)
  2.3963878 = (MATCH) FunctionQuery(redis(h,bar,id)), product of:
    4.0 = 4.0
    1.0 = boost
    0.59909695 = queryNorm
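
The numbers in the explain output can be checked by hand: the function query contributes value × boost × queryNorm, and the final score is the sum of the two clauses. Here is a small sketch that redoes the arithmetic with the figures copied from the output above (the class and method names are hypothetical):

```java
// Re-derives the scores in the explain output from its leaf values.
final class ExplainArithmetic {
    private ExplainArithmetic() {}

    // weight(cat:electronics) = queryWeight * fieldWeight
    static double termScore() {
        double idf = 1.3364723;
        double queryNorm = 0.59909695;
        double queryWeight = idf * queryNorm;   // idf * queryNorm
        double fieldWeight = 1.0 * idf * 1.0;   // tf * idf * fieldNorm
        return queryWeight * fieldWeight;
    }

    // FunctionQuery(redis(h,bar,id)) = value from Redis * boost * queryNorm
    static double functionScore() {
        return 4.0 * 1.0 * 0.59909695;
    }

    // The document score is the sum of the term clause and the function clause.
    static double total() {
        return termScore() + functionScore();
    }
}
```

The value 4.0 is what was stored in the Redis hash for this document. The bf parameter adds the function's contribution to the score rather than multiplying by it.
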
Published at DZone with permission of its author, Kelvin Tan. (source)


Comments

David Smiley replied on Fri, 2012/07/06 - 12:26pm

How does this perform?

The standard Solr solution involves ExternalFileField, but that is only half of a full implementation, since you have to generate the file and trigger a commit. Commits needn't happen with every change, but they should be spaced closely enough to meet requirements.
