Rafał Kuć is a team leader and software developer, currently working as a software architect and Solr and Lucene specialist. He focuses mainly on Java, but is open to any tool or programming language that makes reaching his goal easier and faster. Rafał is also one of the founders of the solr.pl site, where he shares his knowledge and helps people with their problems. Rafał is a DZone MVB (not a DZone employee) and has posted 75 posts at DZone.

Solr Optimization – filter cache

06.11.2011
Today’s entry is dedicated to one of the cache types in Solr – the filter cache. I will try to explain what it does, how to configure it, and how to use it efficiently.

What is it?

Let’s start from the inside. The filterCache stores unordered sets of document identifiers. Of course, these are not the IDs defined in the schema.xml file as the unique key – Solr stores the internal document IDs used by Lucene and Solr, which is worth remembering.
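Conceptually, the filterCache can be thought of as a bounded map from a filter query to the set of internal document IDs that match it. The following is only an illustrative sketch (the class and method names are mine, not Solr’s actual implementation), showing the idea of an LRU-evicting filter cache:

```python
from collections import OrderedDict

class FilterCache:
    """Toy model of a filter cache: maps a filter query string to the
    set of internal (Lucene) document IDs matching it, evicting the
    least recently used entry when the cache grows past its size."""

    def __init__(self, size):
        self.size = size
        self.entries = OrderedDict()

    def get(self, fq):
        if fq in self.entries:
            self.entries.move_to_end(fq)  # mark as most recently used
            return self.entries[fq]
        return None  # cache miss

    def put(self, fq, doc_ids):
        self.entries[fq] = frozenset(doc_ids)
        self.entries.move_to_end(fq)
        if len(self.entries) > self.size:
            self.entries.popitem(last=False)  # evict the LRU entry
```

When a query with fq parameters arrives, Solr can intersect the cached document-ID set for each filter with the results of the main query instead of re-evaluating the filter against the index.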

What is it used for?

The main task of the filterCache is to store results related to the use of filters. However, that is not its only use: the cache can also support the faceting mechanism (when the term-enumeration method is used), and sorting, when the <useFilterForSortedQuery/> option is set to true in the solrconfig.xml file.
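For example, a faceting request that uses the term-enumeration method (and can therefore fall back on the filterCache) might look like this – the field name is only illustrative:

q=*:*&facet=true&facet.field=category&facet.method=enum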

Definition

The standard filterCache definition looks as follows:

<filterCache
      class="solr.FastLRUCache"
      size="16384"
      initialSize="4096"
      autowarmCount="4096" />

You have the following configuration options:

  • class - the class responsible for the implementation. For filterCache it is recommended to use solr.FastLRUCache, which is more efficient than solr.LRUCache when get operations outnumber put operations.
  • size - the maximum number of entries that can be stored in the cache.
  • initialSize - the initial size of the cache.
  • autowarmCount - the number of entries that will be copied from the old cache to the new one during warm-up.
  • minSize - the number of entries Solr will try to reduce the cache to when it becomes full.
  • acceptableSize - if Solr is unable to reduce the number of entries to minSize, it will try to reduce it to acceptableSize instead.
  • cleanupThread - the default value is false. If set to true, a separate thread will be used to clean the cache.


In most cases, using only the size, initialSize, and autowarmCount parameters is quite sufficient.
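For completeness, a definition that also sets the optional parameters could look like the following (the values here are only illustrative):

<filterCache
      class="solr.FastLRUCache"
      size="16384"
      initialSize="4096"
      autowarmCount="4096"
      minSize="13312"
      acceptableSize="14336"
      cleanupThread="true" />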

How to configure it?

The size of the cache should be determined on the basis of the queries that are sent to Solr. The maximum filterCache size should be at least as large as the number of distinct filters (with their values) that are used. This means that if your application uses, for example, 2000 distinct filters (fq parameters with values) in a given period of time, the size parameter should be set to at least 2000.
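One way to estimate that number is to count the distinct fq parameters (with their values) appearing in your query logs over the period of interest. A rough sketch, assuming you can extract the raw query string from each log line (the log format and function name are assumptions, not part of Solr):

```python
from urllib.parse import parse_qs

def count_distinct_filters(query_strings):
    """Count distinct fq parameter values across logged Solr query strings."""
    filters = set()
    for qs in query_strings:
        params = parse_qs(qs)
        # parse_qs returns a list of values for repeated parameters
        filters.update(params.get("fq", []))
    return len(filters)
```

If this reports around 2000 distinct filters, the size parameter should be set to at least 2000.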

Efficient use

However, configuring the cache is not sufficient – we need to construct our queries so that they can actually use it. Take the following query as an example:

q=name:solr+AND+category:ksiazka+AND+section:ksiazki


At first glance, the query looks correct. However, there is a problem – it does not use the filterCache. The entire request will be handled by the queryResultCache and will create a single entry in it. Let’s modify it a bit and send the following query:

q=name:solr&fq=category:ksiazka&fq=section:ksiazki


What happens now? As in the previous case, an entry will be created in the queryResultCache. Additionally, two entries will be created in the filterCache. Now let’s look at the next query:

q=name:lucene&fq=category:ksiazka&fq=section:ksiazki


This query would create another entry in the queryResultCache and would use the two entries that already exist in the filterCache. Thus the execution time of the query would be reduced, and the query would put less demand on I/O.

However, let’s look at the query in the following form:

q=name:lucene+AND+category:ksiazka+AND+section:ksiazki

Solr would not be able to use any information from the cache and would have to collect all the information for the results from the Lucene index.

Last few words

As you can see, configuring the cache correctly does not by itself guarantee that Solr will be able to use it. The efficiency of the final deployment depends on how the queries are sent to Solr. This is worth remembering when planning an implementation.

Published at DZone with permission of Rafał Kuć, author and DZone MVB. (source)

