Rafał Kuć is a team leader and software developer. Right now he is a software architect and a Solr and Lucene specialist. He is mainly focused on Java, but open to any tool and programming language that makes achieving his goals easier and faster. Rafał is also one of the founders of the solr.pl site, where he tries to share his knowledge and help people with their problems. Rafał is a DZone MVB, is not an employee of DZone, and has posted 75 posts at DZone. You can read more from him at his website.

Solr Optimization – document cache


A few months ago we looked at filterCache. I’ve decided to continue the optimization topic and take a look at documentCache.

What does it contain?

Let’s start with the data that documentCache holds: Lucene documents that were fetched from the index. So little and yet so much.

What is it used for?

Every object (Lucene document) stored in documentCache holds the fields that are stored with that document. Thanks to this, once a document has been fetched and put into the cache, it doesn’t have to be fetched from the index again while processing another query. This is why documentCache reduces the number of I/O operations needed to render the query results list.
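To make the I/O saving concrete, here is a minimal Java sketch, not Solr’s actual code: a map keyed by internal document ID, where a hypothetical `loader` function stands in for reading stored fields from the index. Only a cache miss goes to the index, and the `indexReads` counter shows how often that happened.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical sketch of a document cache keyed by internal document ID.
// The loader stands in for reading stored fields from the index (an I/O operation).
class StoredFieldsCache {
    private final Map<Integer, Map<String, String>> cache = new HashMap<>();
    private final IntFunction<Map<String, String>> loader;
    private int indexReads = 0; // how many times we had to go to the index

    StoredFieldsCache(IntFunction<Map<String, String>> loader) {
        this.loader = loader;
    }

    Map<String, String> get(int docId) {
        // Only a cache miss triggers a read from the index; hits are served from memory.
        return cache.computeIfAbsent(docId, id -> {
            indexReads++;
            return loader.apply(id);
        });
    }

    int indexReads() {
        return indexReads;
    }
}
```

Looking up documents 1, 2 and then 1 again triggers only two index reads; the second lookup of document 1 is served from the cache.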

What to remember when using documentCache?

When using documentCache you have to remember two important things:

  1. documentCache can’t be autowarmed, because it operates on internal document identifiers, which can change after every commit operation.
  2. If you use lazy field loading (enableLazyFieldLoading=true), documentCache functionality is somewhat limited: a document stored in documentCache will contain only those fields that were passed in the fl parameter. If a later query asks for additional fields of a cached document, those additional fields will be fetched from the index.
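The effect of lazy field loading described in point 2 can be modelled with a small Java class. `LazyDocumentCacheSketch` is hypothetical and illustrative only, and a plain map stands in for the index. A cached entry holds only the fields requested through `fl`; any requested field not yet in the entry is read from the index. Keeping those newly read fields in the cached entry afterwards is an assumption of this sketch, not something stated above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of documentCache with enableLazyFieldLoading=true:
// a cached entry only holds the fields that earlier requests asked for via fl.
class LazyDocumentCacheSketch {
    private final Map<Integer, Map<String, String>> index; // stand-in for stored fields on disk
    private final Map<Integer, Map<String, String>> cache = new HashMap<>();
    private int fieldReadsFromIndex = 0;

    LazyDocumentCacheSketch(Map<Integer, Map<String, String>> index) {
        this.index = index;
    }

    Map<String, String> fetch(int docId, Set<String> fl) {
        Map<String, String> cached = cache.computeIfAbsent(docId, id -> new HashMap<>());
        Map<String, String> result = new HashMap<>();
        for (String field : fl) {
            if (!cached.containsKey(field)) {
                // Field not in the cached entry yet: go back to the index for it.
                fieldReadsFromIndex++;
                String loaded = index.getOrDefault(docId, Map.of()).get(field);
                if (loaded != null) {
                    cached.put(field, loaded); // assumption: keep it for later requests
                }
            }
            String value = cached.get(field);
            if (value != null) {
                result.put(field, value);
            }
        }
        return result;
    }

    int fieldReadsFromIndex() {
        return fieldReadsFromIndex;
    }
}
```

Requesting `fl=id,title` twice reads two fields from the index only once; a later request for `fl=id,body` reads just the missing `body` field.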


The standard documentCache definition looks like this:


Let’s recall those parameters:

  • class – class implementing the cache,
  • size – the maximum cache size,
  • initialSize – initial size of the cache.
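An LRU (least recently used) cache such as `solr.LRUCache` evicts the entries that were accessed longest ago once the limit is reached. The Java sketch below is not Solr’s implementation; it only illustrates how `initialSize` and `size` relate, using a `LinkedHashMap` in access order with `removeEldestEntry`. The class name `LruCacheSketch` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch loosely mirroring the initialSize/size parameters:
// initialSize pre-sizes the underlying map, maxSize caps the number of entries.
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    LruCacheSketch(int initialSize, int maxSize) {
        // accessOrder=true: iteration order goes from least to most recently accessed
        super(initialSize, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the limit is exceeded.
        return size() > maxSize;
    }
}
```

With a limit of 2, putting entries 1 and 2, reading entry 1 and then putting entry 3 evicts entry 2, because entry 1 was used more recently.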

How to configure

The usual question about a cache is: what size should I set? According to the Solr wiki (http://wiki.apache.org/solr/SolrCaching#documentCache), the maximum size shouldn’t be less than the product of the maximum number of concurrent queries and the maximum number of documents fetched by a single query. This simple relation should ensure that Solr won’t have to fetch documents from the index again during query processing.
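The wiki’s rule of thumb is a single multiplication. The helper below is a hypothetical sketch (`DocumentCacheSizing.minimumSize` is not a Solr API) that treats the number of documents fetched per query as the `rows` value of the request.

```java
// Hypothetical helper applying the rule of thumb:
// documentCache size >= max concurrent queries * max documents fetched per query (rows).
final class DocumentCacheSizing {
    private DocumentCacheSizing() {}

    static int minimumSize(int maxConcurrentQueries, int maxRowsPerQuery) {
        if (maxConcurrentQueries < 0 || maxRowsPerQuery < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        // Math.multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(maxConcurrentQueries, maxRowsPerQuery);
    }
}
```

For example, assuming at most 20 concurrent queries that each fetch `rows=50` documents, the documentCache `size` should be at least 20 × 50 = 1000 entries.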

Last few words

In the case of documentCache we don’t have to worry about how we construct our queries to make proper use of this cache. But please remember that documentCache requires memory: the more fields you store in the index, the more memory it needs.
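To get a feel for the memory cost, a rough lower bound is the number of cached documents multiplied by the average size of their stored fields. The Java helper below is a hypothetical back-of-the-envelope sketch, not a Solr API; it ignores JVM object and reference overhead, so real heap usage will be higher.

```java
// Rough, hypothetical lower-bound estimate of documentCache memory:
// cached documents times the average size of their stored fields in bytes.
// JVM object headers, String overhead and references are ignored, so real usage is higher.
final class DocumentCacheMemoryEstimate {
    private DocumentCacheMemoryEstimate() {}

    static long lowerBoundBytes(int cacheSize, long avgStoredBytesPerDoc) {
        if (cacheSize < 0 || avgStoredBytesPerDoc < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        // Widen to long before multiplying and fail loudly on overflow.
        return Math.multiplyExact((long) cacheSize, avgStoredBytesPerDoc);
    }
}
```

With `size=1000` and stored fields averaging 2048 bytes per document, the cached data alone is at least 2,048,000 bytes (about 2 MB).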

Published at DZone with permission of Rafał Kuć, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)