
Full-time developer on a small team. Writing great specs and code, delivered on time. Chase is a DZone MVB and is not an employee of DZone and has posted 53 posts at DZone.

Enabling SOLR Autocommit with a Custom Haystack Backend

07.03.2014
By default, Django Haystack makes updates to your Solr index searchable immediately. It does this in the simplest way possible: it commits every single update individually. That can be quite slow. I have an index with 35 million records, and under heavy write load, each commit of a 1,000-record chunk can take up to 5 seconds. In extreme cases, Solr can refuse to accept that much write load at once and fail with an error like the following:
<?xml version="1.0" encoding="UTF-8"?>
<response>
    <lst name="responseHeader">
        <int name="status">503</int>
        <int name="QTime">1492</int>
    </lst>
    <lst name="error">
        <str name="msg">Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.</str>
        <int name="code">503</int>
    </lst>
</response>
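Why does committing every chunk trigger this? Each commit opens a new searcher that has to warm up before it serves queries, and Solr refuses to start another one once maxWarmingSearchers are already warming. The toy model below counts how many searchers would be warming at the same time for a given commit schedule. It assumes a fixed warm-up time per searcher, which is an illustrative simplification rather than how Solr actually schedules warming; the function name and the timings are hypothetical.

```python
MAX_WARMING_SEARCHERS = 2  # the limit reported in the error above


def peak_warming_searchers(commit_times, warm_seconds):
    """Return the peak number of searchers warming at the same time.

    Toy model (an assumption): every commit opens one new searcher, and
    every searcher warms for exactly ``warm_seconds`` before serving queries.
    """
    events = []
    for t in commit_times:
        events.append((t, +1))                 # a searcher starts warming
        events.append((t + warm_seconds, -1))  # and finishes warm_seconds later
    # Tuples sort by time, then by delta, so at equal timestamps the -1 comes
    # first: a searcher finishing exactly when the next starts doesn't overlap.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Hypothetical schedules: a commit every 0.5 s vs. one every 15 s,
# with an assumed 2 s warm-up per searcher.
chatty = peak_warming_searchers([0.0, 0.5, 1.0, 1.5, 2.0], warm_seconds=2.0)
batched = peak_warming_searchers([0.0, 15.0, 30.0], warm_seconds=2.0)
print(chatty, chatty > MAX_WARMING_SEARCHERS)    # 4 True
print(batched, batched > MAX_WARMING_SEARCHERS)  # 1 False
```

In this model the problem is purely the commit rate: the same writes spread over fewer commits never come close to the limit.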

Investigating this error, I found a Stack Overflow post that basically says not to make so many commits. That led me to a Haystack pull request to make manual commits optional.

You can see the basic issue by looking at the logs that Haystack creates each time it issues a write request to the Solr REST API:

Finished 'http://localhost:8080/solr/my_index/update/?commit=true' (post) with body 'u'<add>...' in 0.010 seconds.
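Because each of these lines records the request URL, you can tell whether a write committed by looking for the commit parameter. The sketch below extracts the commit flag and the elapsed time from a line in this format. The regular expression is an assumption based on the single example line above, not on a documented log format, and the helper name is hypothetical.

```python
import re
from urllib.parse import parse_qs, urlparse

# Pattern assumed from the example log line above; adjust it if your
# versions of Haystack and its Solr client format the message differently.
LOG_LINE = re.compile(
    r"Finished '(?P<url>[^']+)' \((?P<method>\w+)\).* in (?P<secs>[0-9.]+) seconds\."
)


def parse_update_log(line):
    """Return (commit, seconds) for a Solr request log line, or None if it doesn't match.

    ``commit`` is True/False when the URL carries a commit parameter, else None.
    """
    match = LOG_LINE.search(line)
    if match is None:
        return None
    query = parse_qs(urlparse(match.group("url")).query)
    commit_values = query.get("commit")
    commit = commit_values[0].lower() == "true" if commit_values else None
    return commit, float(match.group("secs"))


line = ("Finished 'http://localhost:8080/solr/my_index/update/?commit=true' "
        "(post) with body 'u'<add>...' in 0.010 seconds.")
print(parse_update_log(line))  # (True, 0.01)
```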

As of Solr 4.0, there are much more performant options for bulk indexing. A common setup is to rely on autocommit (set by default to 15 seconds) and avoid manual commits by passing commit=false in the REST API URL. Although Haystack's back-end implementations of update, remove and clear accept a commit boolean, Haystack never sets this parameter explicitly, so every write falls back to the default and commits, as the commit=true in the log above shows. Instead, you can implement your own search back-end subclass that passes commit=False.

from haystack.backends.solr_backend import SolrEngine, SolrSearchBackend


class AutoCommitSolrSearchBackend(SolrSearchBackend):
    '''Solr backend that leaves committing to Solr's autoCommit settings.

    Each method only changes the default of `commit` to False; callers can
    still pass commit=True to force an immediate commit.
    '''

    def update(self, index, iterable, commit=False):
        return super(AutoCommitSolrSearchBackend, self).update(index, iterable, commit=commit)

    def remove(self, obj_or_string, commit=False):
        return super(AutoCommitSolrSearchBackend, self).remove(obj_or_string, commit=commit)

    def clear(self, models=None, commit=False):
        # Use None rather than a shared mutable [] default; like an empty
        # list, it means "all models".
        return super(AutoCommitSolrSearchBackend, self).clear(models, commit=commit)


class AutoCommitSolrEngine(SolrEngine):
    '''The built-in Solr engine in Haystack performs a manual commit after each
    update/add/remove/clear, which is really slow. Solr is configured by default
    to auto-commit changes every 15 seconds, so there is no need to commit
    manually on every update.
    '''
    backend = AutoCommitSolrSearchBackend
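Because the subclass only changes the default value of commit, a caller that really needs an immediate commit can still pass commit=True. The snippet below demonstrates that override-the-default pattern with a stand-in base class (hypothetical, so it runs without Haystack or Solr installed) that simply records the flag it receives.

```python
class RecordingBackend(object):
    """Hypothetical stand-in for SolrSearchBackend: records the commit flag it receives."""

    def __init__(self):
        self.calls = []

    def update(self, index, iterable, commit=True):
        # Commits by default, mirroring the behavior described above.
        self.calls.append(("update", commit))


class AutoCommitBackend(RecordingBackend):
    def update(self, index, iterable, commit=False):
        # Same shape as AutoCommitSolrSearchBackend.update: only the default changes.
        return super(AutoCommitBackend, self).update(index, iterable, commit=commit)


backend = AutoCommitBackend()
backend.update(index=None, iterable=[])               # relies on the new default
backend.update(index=None, iterable=[], commit=True)  # caller forces a commit
print(backend.calls)  # [('update', False), ('update', True)]
```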

Then you can use this new AutoCommitSolrEngine in your HAYSTACK_CONNECTIONS setting.

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'myapp.search.AutoCommitSolrEngine',
        'URL': 'http://localhost:8080/solr/my_index',
    }
}
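The ENGINE value is a dotted Python path that Haystack imports, so a misspelled module or class name typically only surfaces as an error once the connection is loaded. One way to check the path up front is to import it yourself, as in this sketch. The helper name is hypothetical; Django ships a similar import_string utility in django.utils.module_loading.

```python
from importlib import import_module


def load_dotted_path(dotted_path):
    """Import and return the object named by a dotted path such as 'pkg.module.ClassName'.

    Raises ImportError if the module can't be imported and AttributeError
    if the module has no attribute with that name.
    """
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ImportError("%r is not a dotted path" % dotted_path)
    module = import_module(module_path)
    return getattr(module, attr_name)


# In a real project you would pass HAYSTACK_CONNECTIONS['default']['ENGINE'];
# a standard-library path is used here so the example runs anywhere.
print(load_dotted_path("collections.OrderedDict"))  # <class 'collections.OrderedDict'>
```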

Note: With this change and Solr's default configuration, newly indexed items will not show up in searches right away. That’s what soft commits are for.

"Hard commits are about durability, soft commits are about visibility." (Erick Erickson, "Understanding Transaction Logs, Soft Commit and Commit in SolrCloud")

To make your auto-committed items searchable in a timely fashion, you must set autoSoftCommit.maxTime in your Solr config (solrconfig.xml). This is NOT set by default.

    <!-- softAutoCommit is like autoCommit except it causes a
         'soft' commit which only ensures that changes are visible
         but does not ensure that data is synced to disk.  This is
         faster and more near-realtime friendly than a hard commit.
      -->
    <autoSoftCommit>
      <maxTime>1000</maxTime>
    </autoSoftCommit>
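To confirm what a core will actually do, you can read these settings back out of solrconfig.xml. This sketch assumes the standard layout, with autoCommit and autoSoftCommit nested inside updateHandler under the root config element. Values that use property substitution (for example ${solr.autoCommit.maxTime:15000}) are returned as raw strings rather than resolved. The function name and the sample config are illustrative.

```python
import xml.etree.ElementTree as ET


def read_commit_settings(solrconfig_xml):
    """Return the autoCommit/autoSoftCommit settings found in a solrconfig.xml string.

    Missing settings come back as None; values are returned as raw text.
    """
    root = ET.fromstring(solrconfig_xml)
    handler = root.find("updateHandler")

    def text(path):
        # findtext returns None when the element is absent.
        return handler.findtext(path) if handler is not None else None

    return {
        "autoCommit.maxTime": text("autoCommit/maxTime"),
        "autoCommit.openSearcher": text("autoCommit/openSearcher"),
        "autoSoftCommit.maxTime": text("autoSoftCommit/maxTime"),
    }


# Illustrative config: 15 s hard autocommit without a new searcher, 1 s soft commit.
SAMPLE = """<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>15000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>1000</maxTime>
    </autoSoftCommit>
  </updateHandler>
</config>"""

print(read_commit_settings(SAMPLE))
```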

Alternatively, you can set autoCommit.openSearcher to true, which causes Solr to open a new searcher every time it auto-commits. However, this could slow down the first searches that come in after each auto-commit.
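Putting the two options together: with commit=false, new documents become searchable at the next soft commit, or at the next hard autocommit if that commit opens a new searcher, whichever comes first. The function below is a simplified model of that rule. It ignores searcher warm-up and assumes nothing else (such as an explicit commit) opens a new searcher; its name and parameters are illustrative.

```python
def max_visibility_delay_ms(auto_commit_ms, open_searcher, soft_commit_ms):
    """Upper bound (ms) on how long an uncommitted update stays invisible to searches.

    Simplified model: ignores searcher warm-up and any explicit commits.
    Returns None when neither autocommit setting will make the update visible.
    """
    candidates = []
    if soft_commit_ms is not None:
        candidates.append(soft_commit_ms)   # a soft commit makes changes visible
    if open_searcher and auto_commit_ms is not None:
        candidates.append(auto_commit_ms)   # hard autocommit with openSearcher=true
    return min(candidates) if candidates else None


print(max_visibility_delay_ms(15000, open_searcher=False, soft_commit_ms=1000))  # 1000
print(max_visibility_delay_ms(15000, open_searcher=True, soft_commit_ms=None))   # 15000
print(max_visibility_delay_ms(15000, open_searcher=False, soft_commit_ms=None))  # None
```

The last case is the trap described in the note above: hard autocommits still make the data durable, but nothing makes it visible.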

Published at DZone with permission of Chase Seibert, author and DZone MVB. (source)
