Michael loves building software; he's been building search engines for
more than a decade, and has been working on Lucene as a committer, PMC
member and Apache member, for the past few years. He's co-author of
the recently published Lucene in Action, 2nd edition. In his spare
time Michael enjoys building his own computers, writing software to
control his house (mostly in Python), encoding videos and tinkering
with all sorts of other things. Michael is a DZone MVB and is not an employee of DZone.
I looked into the curious issue I described in my last post, where the NRT reopen delays can become "spikey" (take longer) during a large merge.
To show the issue, I modified the NRT test to kick off a background
optimize on startup. This runs a single large merge, creating a 13 GB
segment, and indeed produces spikey reopen delays (purple):
The large merge finishes shortly after 7 minutes, after which the reopen
delays become healthy again. Search performance (green) is unaffected.
I also added Linux's dirty bytes to the graph, as reported by /proc/meminfo;
it's the saw-tooth blue/green series on the bottom. Note that it's
divided by 10, to better fit the Y axis; the peaks are around 800-900 MB.
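For readers who want to sample the same counter, here is a minimal sketch of pulling the "Dirty" value out of /proc/meminfo. The function names dirty_kb and read_dirty_mb are hypothetical, chosen for this example; this is not the script used to produce the graph.

```python
def dirty_kb(meminfo_text):
    """Return the 'Dirty' value (in kB) from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        # The line looks like: "Dirty:            921600 kB"
        if line.startswith("Dirty:"):
            return int(line.split()[1])
    return None


def read_dirty_mb(path="/proc/meminfo"):
    """Read the current dirty page total, in MB, on a Linux machine."""
    with open(path) as f:
        kb = dirty_kb(f.read())
    return None if kb is None else kb / 1024.0
```

Calling read_dirty_mb() once a second during the merge and plotting the result should give a series comparable to the saw-tooth described here.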
The large merge writes bytes at a fairly high rate (around 30
MB/sec), but Linux buffers those writes in RAM, only actually flushing
them to disk every 30 seconds; this is what produces the saw-tooth pattern.
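As a rough sanity check on those numbers, the peak of each saw-tooth should be about the write rate times the flush interval. The sketch below assumes the write rate is constant and that nothing is written back between the periodic flushes; peak_dirty_mb is a hypothetical helper name.

```python
def peak_dirty_mb(write_mb_per_sec, flush_interval_sec):
    """Estimate the dirty bytes (in MB) buffered just before a periodic flush."""
    return write_mb_per_sec * flush_interval_sec


# About 30 MB/sec buffered for 30 seconds:
print(peak_dirty_mb(30, 30))  # prints 900
```

That estimate of roughly 900 MB lines up with the observed peaks.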
From the graph you can see that the spikey reopen delays
generally correlate to when Linux is flushing the dirty pages to disk.
Apparently, this heavy write IO interferes with the read IO required
when resolving deleted terms to document IDs. To confirm this, I ran
the same stress test, but with only adds (no deletions); the reopen
delays were then unaffected by the ongoing large merge.
So the mystery is finally explained, but how to fix it?
I know I could tune Linux's IO,
for example to write more frequently, but I'd rather find a Lucene-only
solution since we can't expect most users to tune the OS.
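For anyone who does want to see how their OS is currently configured, the kernel exposes its writeback settings under /proc/sys/vm. The following read-only sketch just reports a few of them; the helper name read_vm_dirty_settings and the choice of which knobs to list are assumptions made for this example, and files that are missing on a given kernel are skipped.

```python
import os

# Writeback-related settings exposed by Linux under /proc/sys/vm.
KNOBS = (
    "dirty_writeback_centisecs",  # how often the flusher threads wake up
    "dirty_expire_centisecs",     # how old dirty data must be before writeout
    "dirty_background_ratio",     # % of memory dirty before background flushing
    "dirty_ratio",                # % of memory dirty before writers are throttled
)


def read_vm_dirty_settings(base="/proc/sys/vm"):
    """Return {knob: int value} for each knob that exists and parses cleanly."""
    settings = {}
    for name in KNOBS:
        try:
            with open(os.path.join(base, name)) as f:
                settings[name] = int(f.read().strip())
        except (OSError, ValueError):
            pass  # not present on this system, or not a plain integer
    return settings
```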
One possibility is to make a RAM-resident terms dictionary, just for
primary-key fields. This could be very compact, for example by using an
FST, and should give lookups that never hit disk unless the OS has frustratingly swapped out your RAM data structures.
This can also be separately useful for applications that need fast
document lookup by primary key, so someone should at some point build it.
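To make the idea concrete, here is a minimal sketch of a RAM-resident primary-key lookup. It is not an FST and not Lucene code: two parallel sorted lists with binary search stand in for the compact structure, and the names PrimaryKeyIndex, lookup and resolve_deletes are hypothetical. An FST would serve the same lookups while sharing common prefixes and suffixes to save memory.

```python
import bisect


class PrimaryKeyIndex:
    """In-memory map from primary-key term to document ID (illustrative only)."""

    def __init__(self, key_to_doc_id):
        pairs = sorted(key_to_doc_id.items())
        self._keys = [k for k, _ in pairs]       # sorted primary keys
        self._doc_ids = [d for _, d in pairs]    # doc ID for each key, same order

    def lookup(self, key):
        """Return the doc ID for key, or None if the key is unknown."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._doc_ids[i]
        return None

    def resolve_deletes(self, keys):
        """Map deleted primary keys to doc IDs without touching disk."""
        return [d for d in (self.lookup(k) for k in keys) if d is not None]


index = PrimaryKeyIndex({"doc-7": 7, "doc-2": 2, "doc-9": 9})
print(index.resolve_deletes(["doc-9", "nope", "doc-7"]))  # prints [9, 7]
```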
Another, lower-level idea is to simply rate limit the bytes/sec
written by merges. Since big merges also impact ongoing searches,
likely we could help that case as well. To try this out, I made a
simple prototype (see LUCENE-3202), and then re-ran the same stress test, limiting all merging to 10 MB/sec:
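The general rate-limiting technique can be sketched as follows. This is not the LUCENE-3202 patch, only a minimal Python illustration of pacing writes to a byte budget; the class name SimpleRateLimiter, its pause method, and the injectable clock and sleep parameters are all hypothetical names chosen for this example.

```python
import time


class SimpleRateLimiter:
    """Paces a writer to at most mb_per_sec by sleeping before each write."""

    def __init__(self, mb_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.bytes_per_sec = mb_per_sec * 1024 * 1024
        self.clock = clock
        self.sleep = sleep
        # Earliest time at which the next write is allowed to start.
        self.next_allowed = clock()

    def pause(self, num_bytes):
        """Call before writing num_bytes; sleeps if we are ahead of budget.

        Returns the number of seconds slept.
        """
        now = self.clock()
        start = max(now, self.next_allowed)
        # Reserve the time slot these bytes "cost" at the target rate.
        self.next_allowed = start + num_bytes / self.bytes_per_sec
        wait = start - now
        if wait > 0:
            self.sleep(wait)
        return wait
```

A merge thread would call pause(len(chunk)) before writing each chunk. With a 10 MB/sec limit, a writer that could otherwise sustain 30 MB/sec spends roughly two thirds of its time sleeping.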
The optimize now took 3 times longer, and the peak dirty bytes (around 300
MB) is 1/3rd as large, as expected since the IO write rate is limited to
10 MB/sec. But look at the reopen delays: they are now much better
contained, averaging around 70 milliseconds while the optimize is
running, and dropping to 60 milliseconds once the optimize finishes. I
think the ability to limit merging IO is an important feature for