Yonik Seeley is the creator of Apache Solr and the Chief Open Source Architect and Co-Founder at Lucid Imagination, a company dedicated to development and support of Lucene/Solr. He's also an Apache Lucene/Solr PMC member and committer.

Solr relevancy function queries

04.21.2011

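Lucene’s default ranking function uses factors such as tf (term frequency), idf (inverse document frequency), and norm (field-length normalization) to calculate relevancy scores.
Solr now exposes these factors, along with a few related index statistics, as function queries.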

  • docfreq(field,term) returns the number of documents that contain the term in the field.
  • termfreq(field,term) returns the number of times the term appears in the field for that document.
  • idf(field,term) returns the inverse document frequency for the given term, using the Similarity for the field.
  • tf(field,term) returns the term frequency factor for the given term, using the Similarity for the field.
  • norm(field) returns the “norm” stored in the index: the product of the index-time boost and the length normalization factor.
  • maxdoc() returns the number of documents in the index, including those that are marked as deleted but have not yet been purged.
  • numdocs() returns the number of documents in the index, not including those that are marked as deleted but have not yet been purged.
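To get a feel for what tf(), idf() and norm() return, here is a minimal Python sketch of the formulas used by Lucene's classic DefaultSimilarity: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), and norm = boost / sqrt(number of terms in the field). It assumes the field uses that default Similarity; a field configured with a different Similarity will return different values. Lucene also stores the norm in a single lossy byte, so the value norm(field) reports only approximates the one computed here. The function names are illustrative, not Solr APIs.

```python
import math

# Sketch of Lucene's classic DefaultSimilarity factors (assumed Similarity;
# other Similarity implementations compute these differently).

def tf(freq: int) -> float:
    """Term frequency factor: square root of the raw count in the document."""
    return math.sqrt(freq)

def idf(doc_freq: int, num_docs: int) -> float:
    """Inverse document frequency: terms found in fewer documents weigh more.
    num_docs is the document count handed to the Similarity (assumption)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def length_norm(num_terms: int, boost: float = 1.0) -> float:
    """Index-time boost times 1/sqrt(field length). Lucene stores this in one
    lossy byte, so the real norm() value is only an approximation of this."""
    return boost * (1.0 / math.sqrt(num_terms))

# Hypothetical statistics: the term occurs 4 times in this document,
# appears in 3 of 31 documents, and the field holds 16 terms.
print(tf(4))           # 2.0
print(idf(3, 31))
print(length_norm(16)) # 0.25
print(tf(4) * idf(3, 31))  # a plain tf*idf score for this one term
```

Note how idf falls as doc_freq grows, while tf grows only with the square root of the count, so repeating a term many times in one document has diminishing returns.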


We can use these new functions to develop and test custom ranking functions! For example, if we wanted simple tf*idf for a given term, we could issue the following function query (if you have Solr’s example server running with the exampledocs indexed, just click on the following link):

http://localhost:8983/solr/select/?fl=score,id&defType=func&q=mul(tf(text,memory),idf(text,memory))

To avoid repeating the field and term we are using (text,memory), we can pull them out into separate request parameters and refer to them as $f and $t:

http://localhost:8983/solr/select/?fl=score,id&defType=func&q=mul(tf($f,$t),idf($f,$t))&f=text&t=memory
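If you script these requests instead of clicking the links, the commas, parentheses and `$` in the function must be URL-encoded. Below is a minimal Python sketch using only the standard library's `urllib.parse.urlencode`, assuming Solr's example server at localhost:8983 as in the links above; the helper name `select_url` and the second term are hypothetical. Because $f and $t are dereferenced by Solr, the function text stays fixed and only the `t` parameter changes from request to request.

```python
from urllib.parse import urlencode

# Base URL of Solr's example server, as used in the links in this article.
SOLR_SELECT = "http://localhost:8983/solr/select/"

def select_url(params: dict) -> str:
    """Build a /select URL, percent-encoding commas, parentheses and '$'."""
    return SOLR_SELECT + "?" + urlencode(params)

# The function expression never changes; Solr substitutes $f and $t
# from the separate f and t request parameters.
TEMPLATE = "mul(tf($f,$t),idf($f,$t))"

for term in ["memory", "drive"]:  # "drive" is just an illustrative term
    print(select_url({
        "fl": "score,id",
        "defType": "func",
        "q": TEMPLATE,
        "f": "text",
        "t": term,
    }))
```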

Utilizing Solr’s new ability to sort by arbitrary function queries, we could now sort a query by the number of times a specific term appears in each document. The following query searches for documents matching “DDR”, but then sorts by the number of times “memory” appears in the text field.

http://localhost:8983/solr/select/?fl=score,id&q=DDR&sort=termfreq(text,memory) desc
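Conceptually, sort=termfreq(text,memory) desc orders the documents that matched DDR by the raw count of the indexed term "memory". The Python sketch below imitates that ordering on a few made-up documents. Its tokenizer (lowercase, split on non-alphanumerics) is a crude stand-in for whatever analyzer the text field really uses, so real counts can differ, and ties keep their input order here, which is a property of Python's stable sort rather than a statement about Solr's tie-breaking.

```python
import re

def term_freq(text: str, term: str) -> int:
    """Count occurrences of term after a crude lowercase/word-split 'analysis'."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return tokens.count(term)

# Hypothetical documents that all matched q=DDR.
docs = {
    "doc1": "DDR memory module",
    "doc2": "DDR memory, memory upgrade kit, low-latency memory",
    "doc3": "DDR controller",
}

# Similar in spirit to sort=termfreq(text,memory) desc.
ranked = sorted(docs, key=lambda d: term_freq(docs[d], "memory"), reverse=True)
print(ranked)  # ['doc2', 'doc1', 'doc3']
```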

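We could also utilize the “norm” function to sort documents with the longest text field first. The length normalization factor shrinks as a field gets longer, so sorting by norm in ascending order puts the longest fields first. This assumes there were no index-time boosts, so the norm is just the standard length normalization factor.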

http://localhost:8983/solr/select/?fl=score,id&q=DDR&sort=norm(text) asc
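Why ascending? Under the classic length normalization of 1/sqrt(number of terms), assumed here along with no index-time boost, a longer field gets a smaller norm. This Python sketch uses hypothetical field lengths to show that sorting by that value in ascending order lists the longest field first. In a real index the norm is squeezed into one lossy byte, so fields of similar length may share the same stored norm and tie.

```python
import math

def length_norm(num_terms: int) -> float:
    # Classic Lucene length normalization with no index-time boost (assumed).
    return 1.0 / math.sqrt(num_terms)

# Hypothetical field lengths, in terms, for three documents.
field_lengths = {"short": 4, "medium": 25, "long": 100}

# Ascending norm puts the longest field first, like sort=norm(text) asc.
by_norm = sorted(field_lengths, key=lambda d: length_norm(field_lengths[d]))
print(by_norm)  # ['long', 'medium', 'short']
```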

Given Solr’s plethora of function queries (including the new spatial queries that return the distance between points), the possibilities are almost endless. To try this out, you’ll need a recent nightly build of Solr 4.0-dev, or LucidWorks Enterprise, our commercial version of Solr.

Published at DZone with permission of its author, Yonik Seeley.

Comments

Shoaib Almas replied on Sat, 2012/08/25 - 5:49am

Wow! That’s exactly what I was looking for! Really nice new feature!
Unfortunately, it seems that it is not yet available in the latest stable version (currently 3.4). Any rough idea of when Solr 4 will be released?
Is there a way I could plug the relevancy function queries feature into Solr 3.4?
