Yonik Seeley is the creator of Apache Solr and the Chief Open Source Architect and Co-Founder at Lucid Imagination, a company dedicated to development and support of Lucene/Solr. He's also an Apache Lucene/Solr PMC member and committer.

Solr Result Grouping / Field Collapsing Improvements

02.28.2011

I previously introduced Solr’s Result Grouping, also called Field Collapsing, which limits the number of documents shown for each “group”, normally defined as the unique values in a field or function query.

Since then, there have been a number of bug fixes, performance improvements, and feature enhancements. You’ll need a recent nightly build of Solr 4.0-dev, or the newly released LucidWorks Enterprise v1.6, our commercial version of Solr.

Feature Enhancements

One improvement is the ability to group by query via the group.query parameter. This functionality is very similar to facet.query, except that it retrieves the top documents that match the query, not just the count. This has many potential uses, including always getting the top documents for specific groups, or defining custom groups such as price ranges.
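
As a rough illustration of defining custom groups with group.query, the sketch below builds a request URL with one group.query parameter per price range. The host and /select path, the numeric price field, and the range boundaries are all assumptions made for the example; they do not come from this article.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port, and path for your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_group_query_url(q, group_queries, base_url=SOLR_SELECT_URL):
    """Build a select URL with one group.query parameter per custom group."""
    params = [("q", q), ("group", "true")]
    # Each group.query defines its own group of top documents,
    # much as each facet.query defines its own count.
    params += [("group.query", gq) for gq in group_queries]
    params.append(("wt", "json"))
    return base_url + "?" + urlencode(params)


# Assumed numeric "price" field, split into two illustrative ranges.
price_ranges = ["price:[0 TO 99.99]", "price:[100 TO *]"]
url = build_group_query_url("memory", price_ranges)
print(url)
```

Passing the parameters as a list of pairs rather than a dict lets the same parameter name repeat, which is what multiple group.query clauses need.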

Another useful capability is the addition of the group.main parameter. Setting this to true causes the results of the first grouping command to be used as the main result list in a flattened response format that legacy clients will be able to handle.

For example, the grouped response format normally returns highly structured results under “grouped”.
…&q=solr+memory&group=true&group.field=manu_exact

"grouped":{
  "manu_exact":{
    "matches":6,
    "groups":[{
        "groupValue":"Apache Software Foundation",
        "doclist":{"numFound":1,"start":0,"docs":[
            {
              "id":"SOLR1000",
              "name":"Solr, the Enterprise Search Server",
              "manu":"Apache Software Foundation"}]
        }},
      {
        "groupValue":"Corsair Microsystems Inc.",
        "doclist":{"numFound":2,"start":0,"docs":[
            {
              "id":"VS1GB400C3",
              "name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail",
              "manu":"Corsair Microsystems Inc."}]
        }},
[...]

If we add group.main=true to the request, then we get back a much more familiar looking response (i.e. it looks like a normal non-grouped response):
…&q=solr+memory&group=true&group.field=manu_exact&group.main=true

"response":{"numFound":6,"start":0,"docs":[
    {
      "id":"SOLR1000",
      "name":"Solr, the Enterprise Search Server",
      "manu":"Apache Software Foundation"},
    {
      "id":"VS1GB400C3",
      "name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail",
      "manu":"Corsair Microsystems Inc."},

One can also use the group.format=simple parameter to select this simplified flattened response within the normal “grouped” section of the response.
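
To make the relationship between the two response shapes concrete, here is a minimal client-side sketch that flattens a parsed “grouped” section into a plain document list. It only mirrors the two example responses shown above; it is not how Solr builds the flattened response, and the function name flatten_grouped is hypothetical.

```python
def flatten_grouped(grouped, field):
    """Flatten the structured "grouped" section for one field into a doc list."""
    command = grouped[field]
    docs = []
    for group in command["groups"]:
        # Each group carries its own doclist; concatenate them in group order.
        docs.extend(group["doclist"]["docs"])
    # Assumption: report "matches" as numFound and a start of 0,
    # as in the example responses in this article.
    return {"numFound": command["matches"], "start": 0, "docs": docs}


# Trimmed-down version of the grouped example from this article.
grouped = {
    "manu_exact": {
        "matches": 6,
        "groups": [
            {"groupValue": "Apache Software Foundation",
             "doclist": {"numFound": 1, "start": 0,
                         "docs": [{"id": "SOLR1000"}]}},
            {"groupValue": "Corsair Microsystems Inc.",
             "doclist": {"numFound": 2, "start": 0,
                         "docs": [{"id": "VS1GB400C3"}]}},
        ],
    }
}

response = flatten_grouped(grouped, "manu_exact")
print([doc["id"] for doc in response["docs"]])
```

A legacy client that only understands the plain “response” shape could then consume the returned dict directly.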

Other recent enhancements include support for debugging explain, highlighting, faceting, and the ability to handle missing values in the grouping field by treating all documents without a value as being in the “null” group.
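
The idea of a “null” group can be pictured with a small sketch that groups plain dictionaries by a field and uses None as the key for documents that lack a value. This is only an analogy in Python, not Solr code; the documents DOC-A and DOC-B are made up for the example.

```python
from collections import defaultdict


def group_docs(docs, field):
    """Group document ids by a field, collecting valueless docs under None."""
    groups = defaultdict(list)
    for doc in docs:
        # dict.get returns None when the field is absent,
        # so None acts as the key of the "null" group.
        groups[doc.get(field)].append(doc["id"])
    return dict(groups)


docs = [
    {"id": "SOLR1000", "manu_exact": "Apache Software Foundation"},
    {"id": "DOC-A"},  # hypothetical document with no manu_exact value
    {"id": "DOC-B"},  # hypothetical document with no manu_exact value
]

groups = group_docs(docs, "manu_exact")
print(groups[None])
```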

Performance Enhancements

There have been a number of performance enhancements, including an improvement to the short-circuiting logic that cuts off low-ranking documents earlier in the process. This important optimization resulted in a speedup of about 9x for collapsing on certain fields!
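
The general idea behind this kind of short circuit can be sketched as follows. This is an illustrative toy, not Solr's collector code: it keeps the n best groups ranked by their best document score, and skips any document that scores no higher than the weakest kept group before doing any per-group work. The names top_groups and group_of are hypothetical.

```python
def top_groups(docs, group_of, n):
    """Find the n best groups, ranked by each group's best document score.

    docs is an iterable of (doc_id, score); group_of maps doc_id to a group.
    Returns (ranked, skipped): ranked is a best-first list of
    (group, best_score) pairs, skipped counts short-circuited documents.
    """
    best = {}         # group value -> best score so far, at most n entries
    threshold = None  # weakest kept score, set once n groups are held
    skipped = 0

    for doc_id, score in docs:
        # Short circuit: a doc scoring no higher than the weakest kept group
        # cannot change which groups make the top n, so skip the group lookup.
        if threshold is not None and score <= threshold:
            skipped += 1
            continue

        group = group_of[doc_id]
        if group in best:
            best[group] = max(best[group], score)
        else:
            if len(best) == n:
                # Evict the weakest group to make room for the stronger one.
                del best[min(best, key=best.get)]
            best[group] = score

        if len(best) == n:
            threshold = min(best.values())

    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked, skipped


group_of = {1: "A", 2: "B", 3: "C", 4: "A", 5: "D", 6: "B"}
docs = [(1, 0.9), (2, 0.5), (3, 0.7), (4, 0.3), (5, 0.2), (6, 0.8)]
ranked, skipped = top_groups(docs, group_of, n=2)
print(ranked, skipped)
```

The linear min() scans keep the sketch short; a real implementation would more likely use a priority queue or another ordered structure to track the weakest group.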

Collapsing on string fields was further optimized with specialized code that works on ord (ordinal) values instead of the string values. This doubled the performance yet again!
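
Why ords help can be seen in a small sketch: once each document's string value is replaced by a small integer, grouping becomes an array lookup instead of string hashing and comparison, and strings are only resolved at the very end. Here the ords are computed by hand from a list of values; how Solr obtains them is not covered in this article, and all function names are hypothetical.

```python
def build_ords(values):
    """Map each document's string value to an integer ord in sorted term order."""
    terms = sorted(set(values))                    # unique terms, sorted once
    ord_of = {term: i for i, term in enumerate(terms)}
    doc_ords = [ord_of[value] for value in values]  # one small int per document
    return terms, doc_ords


def best_doc_per_ord(doc_ords, scores, num_terms):
    """Track the best-scoring doc per group using a list indexed by ord."""
    best = [None] * num_terms  # one slot per ord, holding (score, doc_id)
    for doc_id, ord_ in enumerate(doc_ords):
        score = scores[doc_id]
        if best[ord_] is None or score > best[ord_][0]:
            best[ord_] = (score, doc_id)
    return best


# Document ids are list positions 0..3; values and scores are made up.
manu = ["Corsair", "Apache", "Corsair", "Belkin"]
scores = [0.4, 0.9, 0.6, 0.1]

terms, doc_ords = build_ords(manu)
best = best_doc_per_ord(doc_ords, scores, len(terms))

# Resolve ords back to strings only once, when building the final result.
result = {terms[o]: entry[1] for o, entry in enumerate(best) if entry is not None}
print(result)
```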

Please see the Solr Wiki for further documentation on all of result grouping’s capabilities and parameters.

Published at DZone with permission of its author, Yonik Seeley.


Comments

Fast Zhong replied on Tue, 2011/05/24 - 1:18am

Hi, this is very useful feature, but from my test, two things are still missing: 1. solrj does not support this feature yet; 2. "matches" returns the no. of doc, not the no. of groups, this makes front end pagination impossible (if showing result by group). Any news or update on these? Thanks.
