
Use Cases of Faceted Search for Apache Solr

12.09.2010

In this post I write about some use cases of facets for Apache Solr. Please submit your own ideas in the comments.
This post is split into the following parts:

  • What are facets?
  • How do you enable and use simple facets?
  • What are other use cases?
     - Category navigation
     - Autocompletion
     - Trending keywords or links
     - RSS feeds
  • Conclusion


What are facets?

In Apache Solr, elements for navigational purposes are called facets. Keep in mind that Solr provides filter queries (specified via the HTTP parameter fq), which filter documents out of the search result. In contrast, facet queries only provide information (a count of documents) and do not change the result documents, i.e. they provide ‘filter queries for future queries’. So you define a facet query and then see how many documents you can expect if you apply the related filter query.

But a picture, from this great facet introduction, is worth a thousand words:

What do you see?

  • You see different facets like Manufacturer, Resolution, …
  • Every facet has some constraints, with which the user can easily filter the search results
  • The breadcrumb shows all selected constraints and allows removing them

All these values can be extracted from Solr’s search results and can be defined at query time, which may look surprising if you come from FAST ESP. Nevertheless, the fields on which you do faceting need to be indexed and untokenized, e.g. string or integer. The type of a field you want to facet on must not be the default ‘text’ type, which is tokenized.

In Solr you have normal (field) facets, facet queries, range queries and date facets.

Normal facets are useful if your documents have, for example, a manufacturer string field: a document then falls into the ‘Sony’ or the ‘Nikon’ bucket. In contrast, you will need facet queries for numeric values like prices. For example, if you specify a facet query from 0 to 10 EUR, Solr will calculate on the fly all documents which fall into that bucket. But facet queries become rather unwieldy if you have several identical ranges like 0-10, 10-20, 20-30, … EUR. Then you can use range queries.
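To see why many identical ranges get unwieldy with plain facet queries, here is a minimal sketch that generates one facet query string per bucket. The class, the helper and the field name `price` are hypothetical, and the sketch assumes an integer field, so each bucket ends one below the next bucket's start (Solr's `[a TO b]` syntax includes both ends).

```java
import java.util.ArrayList;
import java.util.List;

public class PriceBuckets {

    // Builds one facet query per bucket for an integer field.
    // [a TO b] is inclusive on both ends, so each bucket ends at
    // (next start - 1) to avoid counting a boundary value twice.
    static List<String> bucketQueries(String field, int start, int end, int gap) {
        if (gap <= 0) {
            throw new IllegalArgumentException("gap must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (int lower = start; lower < end; lower += gap) {
            int upper = Math.min(lower + gap, end) - 1;
            queries.add(field + ":[" + lower + " TO " + upper + "]");
        }
        return queries;
    }

    public static void main(String[] args) {
        // Prints price:[0 TO 9], price:[10 TO 19], price:[20 TO 29], one per line.
        for (String facetQuery : bucketQueries("price", 0, 30, 10)) {
            System.out.println(facetQuery);
        }
    }
}
```

Each generated string could be handed to addFacetQuery; with more than a handful of buckets a range query is the less noisy choice.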

Date facets are special range queries. As an example, look at this screenshot from jetwick:

Here the interval (which is called the gap) for every bucket is one day.

For a nice introduction to facets have a look at this publication or use the Solr wiki here.

How do you enable and use simple facets?

As stated before, they can be enabled at query time. For the HTTP API you add “&facet=true&facet.field=manu” to your normal query “http://localhost:8983/solr/select?q=*:*”. For SolrJ you do:

new SolrQuery("*:*").setFacet(true).addFacetField("manu");
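For the plain HTTP variant, the request URL could be assembled as in the following dependency-free sketch. The class and method names are hypothetical; only the base URL, the query and the two facet parameters come from the text. Parameter values are URL-encoded, so the colon in `*:*` becomes `%3A`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FacetUrl {

    // Appends q, facet=true and facet.field to a Solr select URL.
    static String facetUrl(String baseUrl, String query, String facetField) {
        return baseUrl
                + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&facet=true"
                + "&facet.field=" + URLEncoder.encode(facetField, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints http://localhost:8983/solr/select?q=*%3A*&facet=true&facet.field=manu
        System.out.println(facetUrl("http://localhost:8983/solr/select", "*:*", "manu"));
    }
}
```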

In the XML returned from the Solr server you will get something like this (again from this post):

<lst name="facet_fields">
<lst name="manu">
<int name="Canon USA">17</int>
<int name="Olympus">12</int>
<int name="Sony">12</int>
<int name="Panasonic">9</int>
<int name="Nikon">4</int>
</lst>
</lst>
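If you talk to Solr without a client library, a response fragment like the one above can be read with the JDK's own XML parser. This is only a sketch: the class and method names are hypothetical, and it assumes the fragment holds the counts of a single facet field, because it simply collects every `int` element it finds.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FacetXml {

    // Reads every <int name="...">count</int> element into an ordered map.
    static Map<String, Integer> parseCounts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList ints = doc.getElementsByTagName("int");
            Map<String, Integer> counts = new LinkedHashMap<>();
            for (int i = 0; i < ints.getLength(); i++) {
                Element element = (Element) ints.item(i);
                counts.put(element.getAttribute("name"),
                        Integer.parseInt(element.getTextContent().trim()));
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse facet XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<lst name=\"facet_fields\"><lst name=\"manu\">"
                + "<int name=\"Canon USA\">17</int><int name=\"Nikon\">4</int>"
                + "</lst></lst>";
        // Prints {Canon USA=17, Nikon=4}
        System.out.println(parseCounts(xml));
    }
}
```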

To retrieve this with SolrJ you don’t need to touch any XML, of course. Just get the facet objects:

List<FacetField> facetFields = queryResponse.getFacetFields();

To append facet queries specify them with addFacetQuery:

solrQuery.addFacetQuery("quality:[* TO 10]").addFacetQuery("quality:[11 TO 100]");

And how would you query for documents which do not have a value for that field? This is easy: q=-field_name:[* TO *]

Now I’ll show you how I implemented date facets in jetwick:

q.setFacet(true).set("facet.date", "{!ex=dt}dt").
set("facet.date.start", "NOW/DAY-6DAYS").
set("facet.date.end", "NOW/DAY+1DAY").
set("facet.date.gap", "+1DAY");

 

With that query you get 7 day buckets, which are visualized like this:

It is important to note that you will have to use local parameters like {!ex=dt} to make sure that if a user applies a facet (uses the facet query as a filter query), the other facet queries won’t get a count of 0. In the picture the filter query was fq={!tag=dt}dt:[2010-12-04T00:00:00.000Z+TO+2010-12-05T00:00:00.000Z]. Again: the filter query needs to start with {!tag=dt} to make this work. Take a look at the DateFilter source code or this for more information.
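As an illustration, a tagged filter query for one day bucket could be built like this. It is a sketch, not the DateFilter code from jetwick: the class and method names are hypothetical, and it assumes the day boundaries are meant as UTC midnight. The string is not URL-encoded yet, which is why it contains spaces instead of the `+` signs shown above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DayFilter {

    // Timestamp layout taken from the example filter query in the text.
    private static final DateTimeFormatter SOLR_DATE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");

    // Builds a filter query for one day of the 'dt' field.
    // The {!tag=dt} prefix has to match the {!ex=dt} used in facet.date.
    static String dayFilter(LocalDate day) {
        String from = day.atStartOfDay().format(SOLR_DATE);
        String to = day.plusDays(1).atStartOfDay().format(SOLR_DATE);
        return "{!tag=dt}dt:[" + from + " TO " + to + "]";
    }

    public static void main(String[] args) {
        // Prints {!tag=dt}dt:[2010-12-04T00:00:00.000Z TO 2010-12-05T00:00:00.000Z]
        System.out.println(dayFilter(LocalDate.of(2010, 12, 4)));
    }
}
```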

Be aware that you will have to tune the filterCache in order to keep performance in the green. It is also important to use warming queries to avoid timeouts and to pre-fill the caches with old, heavily used data.

What are other use cases?

1. Category navigation

The problem: you have a tree of categories and your products are assigned to several of those categories.

There are two relatively similar solutions for this problem. I will describe one of them:

  • Create a multivalued string field called ‘category’. Use the category id (or the name if you want to avoid DB queries).
  • You have a category tree. Make sure a document gets not only the leaf category, but all categories up to the root node.
  • Now facet over the category field with ‘-1’ as the limit
  • But what if you want to display only the categories of one level? E.g. if you don’t want to show other levels at the same time or if there are too many of them.
    Then index the category field as <level>_category. For that you will need the complete category tree in RAM while indexing. Then use facet.prefix=<level>_ to filter the category list for that level
  • Clicking on a category entry should result in a filter query like fq=category:"<level>_categoryId"
  • The slightly tricky part is that your UI or middle tier has to parse the level, e.g. 2, and then append 2+1=3 to the query: facet.prefix=3_
  • If you filter by level then one question remains:
    Q: how can you display the path from the selected category up to the root category?
    A: Either get the category parents via the DB, which is easy if you store the category ids in Solr and not the category names.
    Or get the parents from the parameter list, which is a bit more complicated but doable. In this case you’ll need to store the category names in Solr.
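The level-prefix idea from the list above can be sketched as follows. All names are hypothetical, and the sketch assumes levels are counted from 1 at the root; the first helper produces the values to index into the multivalued category field, the second derives the facet.prefix for the next level from the clicked entry.

```java
import java.util.ArrayList;
import java.util.List;

public class CategoryLevels {

    // Turns a root-to-leaf path of category ids into <level>_<id> tokens,
    // one per ancestor, so the document is counted at every level.
    static List<String> levelTokens(List<String> pathFromRoot) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < pathFromRoot.size(); i++) {
            tokens.add((i + 1) + "_" + pathFromRoot.get(i));
        }
        return tokens;
    }

    // Parses the level out of the clicked token and returns the
    // facet.prefix that lists the categories one level deeper.
    static String nextPrefix(String selectedToken) {
        int level = Integer.parseInt(
                selectedToken.substring(0, selectedToken.indexOf('_')));
        return (level + 1) + "_";
    }

    public static void main(String[] args) {
        // Prints [1_electronics, 2_cameras, 3_dslr]
        System.out.println(levelTokens(List.of("electronics", "cameras", "dslr")));
        // Prints 3_
        System.out.println(nextPrefix("2_cameras"));
    }
}
```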

Please let me know if this explanation makes sense to you or if you want to see that in action. I don’t want to advertise for our customers here :-)

BTW: The second approach I have in mind is: instead of using facet.prefix you can use dynamic fields like category_<level>_s

2. Autocompletion

The problem: you want to show suggestions as the user types.

You’ll need a multivalued ‘tag’ field. For jetwick I’m using a heavy noise word filter to get only terms ‘with information’ into the tag field, from the very noisy tweet text. If you are using a shingle filter you can even create phrase suggestions. But I will describe the “one more word” suggestion here, which will only suggest the next word (not a completely different phrase).

To do this, create the following query when the user types in some characters (see the getQueryChoices method of SolrTweetSearch):

  • Use the old query with all filter queries etc. to provide a context-dependent autocomplete (i.e. only give suggestions which will lead to results)
  • Split the query into “completed” terms and one “to do” term. E.g. if you enter “michael jack”,
    then michael is complete (ends with a space) and jack should be completed
  • Set the query term of the old query to michael and add facet.prefix=jack
  • Set the facet limit to 10
  • Read the 10 suggestions from the facet field, but exclude already completed terms.
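The splitting step from the list above could look like the following sketch. It is not the getQueryChoices implementation from jetwick; the class and method names are hypothetical. The first array element would become the query term, the second the value for facet.prefix.

```java
public class QuerySplit {

    // Returns {completedTerms, prefixToComplete} for the typed text.
    // Everything before the last space counts as completed.
    static String[] split(String typed) {
        int lastSpace = typed.lastIndexOf(' ');
        if (lastSpace < 0) {
            // Only one word so far: nothing is completed yet.
            return new String[] {"", typed};
        }
        String completed = typed.substring(0, lastSpace).trim();
        String prefix = typed.substring(lastSpace + 1);
        return new String[] {completed, prefix};
    }

    public static void main(String[] args) {
        String[] parts = split("michael jack");
        // Prints q=michael facet.prefix=jack
        System.out.println("q=" + parts[0] + " facet.prefix=" + parts[1]);
    }
}
```

A trailing space yields an empty prefix, in which case the facet simply returns the most frequent tags for the completed terms.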

The implementation for jetwick, which uses Apache Wicket, is available in the SearchBox source file, which uses MyAutoCompleteTextField and the getQueryChoices method of SolrTweetSearch. But before you implement autocomplete with facets, take a look at this documentation. And if you don’t want to use Wicket, there is a jQuery autocomplete library especially for Solr (no UI layer required).

3. Trending keywords or links

Similar to autocomplete, you will need a tag or link field in your index. Then use the facet counts as an indicator of how important a term is. If you now run a query, e.g. solr, you will get the trending keywords and links depending on the filters. E.g. you can select different days to see the changes:

The keyword panel is implemented in the TagCloudPanel and the link list is available as UrlTrendPanel.

Of course it would be nice if we got the accumulated score of every link instead of a simple ‘count’, to prevent spammers from reaching this list. For that, look into this JIRA issue and into the StatsComponent. As I explained in the JIRA issue, this nice feature could be simulated with the result grouping feature.

4. RSS feeds

If you log in at jetwick.com you’ll see this idea implemented. Every user can have different saved searches. For example, I have one search for ‘apache solr’ and one for ‘wikileaks’. Every search can contain additional filters, like only German language, or sorting by retweets. Now the task is to transform that query into a facet query:

  • insert ANDs between the query and all the filter queries
  • remove all date filters
  • add one date filter with the date of the last processed search (‘last date’)
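The three steps above can be sketched like this. The names are hypothetical, and the sketch assumes every filter is a plain `field:value` string without local parameters, so a date filter can be recognized by its field name alone.

```java
import java.util.ArrayList;
import java.util.List;

public class SavedSearchQuery {

    // Joins the query and all non-date filters with AND and appends one
    // open-ended date filter starting at the last processed date.
    static String toFacetQuery(String query, List<String> filters,
                               String dateField, String lastDate) {
        List<String> parts = new ArrayList<>();
        parts.add("(" + query + ")");
        for (String filter : filters) {
            if (!filter.startsWith(dateField + ":")) { // drop old date filters
                parts.add(filter);
            }
        }
        parts.add(dateField + ":[" + lastDate + " TO *]");
        return String.join(" AND ", parts);
    }

    public static void main(String[] args) {
        // Prints (apache solr) AND lang:de AND dt:[2010-12-08T00:00:00Z TO *]
        System.out.println(toFacetQuery("apache solr",
                List.of("lang:de", "dt:[NOW/DAY-1DAY TO *]"),
                "dt", "2010-12-08T00:00:00Z"));
    }
}
```

One such string per saved search could then be added as a facet query, so a single request returns the counts for all saved searches.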

Then you will see how many new tweets are available for every saved search:

Update: no need to click refresh to see the counts. The count-update is done in background via JavaScript.

Conclusion

There are a lot of applications for faceted search, and facets are very convenient to use. Okay, the ‘local parameter hack’ is a bit daunting, but hey: it works :-)

It is nice that I can specify different facets for every query in Solr. With that feature you can generate personalized facets, as explained under “RSS feeds”.

One improvement for the facets implemented in Solr could be a feature which does not calculate the count but instead sums up a fieldA for documents with the same value in fieldB, or even returns the score for a facet or a facet query. This would improve the use case “Trending keywords or links”.

 

From http://karussell.wordpress.com/2010/12/08/use-cases-of-faceted-search-for-apache-solr/

Published at DZone with permission of its author, Peter Karussell.


Comments

Frank Hardisty replied on Thu, 2010/12/09 - 4:23am

Thanks a lot, I've been curious about faceted search in Solr.

Peter Karussell replied on Thu, 2010/12/09 - 10:28am

Glad it sheds a bit of light on this topic. Check out both articles I mentioned to get a better starting point:

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr

http://www.packtpub.com/article/faceting-in-solr-1.4-enterprise-search-server

John Ericksen replied on Thu, 2010/12/09 - 8:31pm in response to: Peter Karussell

I have a pretty neat use case for faceted searching you didn't mention.  I dont know if it is a common approach...

At my work we developed a role based searching technique using facets, basically showing and hiding results based upon what roles you have/lack.

This was a very interesting exercise since we accounted for role combinations (ands / ors / nots, etc) and ended up minimizing / simplifying each binary clause into Disjunctive Normal Form during indexing.  This form allows us to leverage the multiple value 'or' based search through solr faceted searches to show only the results applicable.

Peter Karussell replied on Fri, 2010/12/10 - 3:21am in response to: John Ericksen

John,

you enhanced all the facets with the role restrictions to show only applicable counts (?)

Why didn't you use a filter query for a multivalued field 'role'? I mean, for a user which has the role anonymous you already have a parameter ala fq=role:anonymous in the query (to avoid that he'll see admin docs)

so facets will be calculated only with this restriction. Or did I misunderstand what you were achieving?

John Ericksen replied on Fri, 2010/12/10 - 11:19am

Peter,

Yes, you have the basic idea correct.... The multivalued role field query is what we used in conjunction with faceting on the roles field.

We did not use the facet counting feature, the only thing facets do for us in this case is  allow us to perform the filter query on the given roles.

So, for an example, if you index a document with the following roles: Admin or Edit

Then the following searches would find the document:

  • fq=role:Admin,role:Global
  • fq=role:Edit,role:Global

And the following searches would not find the document:

  • fq=role:Global
  • fq=role:RoleN,role:Global

 This is a distilled example, as I mentioned in my previous post, we took this to the next level to allow for more complex boolean statements than just disjunctions. 

Peter Karussell replied on Fri, 2010/12/10 - 4:05pm

The "more complicated" DNF stuff is clear to me :-)

Are you sure you mean facets, not filter queries? (hope I'm not asking entirely stupid things now ... :-))

fq stands for filter query, not for facet query. facets are only to show how many documents would be if I *would* apply the facet as filter query. Or are you stating exactly this?

But why a person with roleA would have to know how many docs there are for a different roleB?

John Ericksen replied on Fri, 2010/12/10 - 7:12pm in response to: Peter Karussell

This conversation has jogged my memory a bit on why we chose to use the faceted approach to implement this feature.

First of all, we are not using the sum() feature inherent in the faceted browsing approach.  As you point out, it would not make sense to give the user the count of documents in roles they have and don't have.  So if we are not using this piece of the facet feature, would a regular filter work just as well?

 We chose this approach on the advice of a small section of the "Solr 1.4 Enterprise Search Server" book by David Smiley & Eric Pugh through Packt Publishing specifically at the end of Chapter 7.  It specifically points to facets "if you need to control access to document within your index and must control it based on the users accessing the content, then one approach is to leverage the faceted search capabilities of Solr."  So, we took that advice and ran with it.

The solution as it stands today works just fine and we have been very happy with the results.  Although we are using filters here (not facet queries) to restrict the results.

 What do you think Peter?  Can / should I use filters here instead of facets?

Nevertheless, I am going to try to play around with replacing this with plain filters.

Peter Karussell replied on Sat, 2010/12/11 - 9:28pm

> So, we took that advice and ran with it.

ah, ok. I think I got it: so you constructed the facets (via DNF) to see how a "limited view" will look like for an e.g. anonymous role. Yes, you'll need to combine the query terms, the role restriction and additional filters to create the facets in order to get the correct count for the "limited view". Did I correctly understand this?

Then I think you could use filter queries with simple facets (without the DNF). The facet calculation is made from the *filtered* document set. E.g. if you already have a filter ala fq=role:editor then you won't need this filter in the facets

> The solution as it stands today works just fine and we have been very happy with the results.  

don't touch a running system ;-) I just wanted to understand...

John Ericksen replied on Sun, 2010/12/12 - 3:11pm

> Did I correctly understand this?

Yes, I think you got it.

From my investigations and what you said it does look like I can just use filters (dump the facets) for this case.

Great sanity check.. thanks for your input.

Peter Karussell replied on Sun, 2010/12/12 - 5:53pm in response to: John Ericksen

Glad to hear that and thank you too for your input!

Anurag Verma replied on Wed, 2011/08/31 - 1:34am

Hi,

   I am implementing Solr in my Java code. I have successfully implemented facet fields, facet queries and date facets in the way given below.

QueryRequest queryRequest = new QueryRequest(new AppendedSolrParams(solrParams, SolrParams
                                .toSolrParams(namedList)),SolrRequest.METHOD.valueOf("POST"));

 

     List      facetValueList = responseSolr.getFacetFields();
     List      facetDateList = responseSolr.getFacetDates();
     Map     facetQueryMap = responseSolr.getFacetQuery();

Now I want to implement "Facet Range". I am unable to do it for facet.range. Please help me with this case.

Thanks in advance.

Anurag

Michael Eric replied on Wed, 2012/09/26 - 3:38pm

Hi! Great article, thanks. But I’ve got a question: why have you moved jetwick from Solr to Elasticsearch? And did you run into any problems? I would like to use Elasticsearch with Riak and I’m not very comfortable with all these projects. Your experience would be very useful.

