Mitch Pronschinske is the Lead Research Analyst at DZone. Researching and compiling content for DZone's research guides is his primary job. He likes to make his own ringtones, watches cartoons/anime, enjoys card and board games, and plays the accordion. Mitch is a DZone Zone Leader and has posted 2576 posts at DZone. You can read more from them at their website. View Full User Profile

Heavy Committing: Flexible Indexing in Lucene 4

  • submit to reddit

Heavy Committing: Flexible Indexing in Lucene 4, Uwe Schindler, SD DataSolutions GmbH from Lucene Revolution on Vimeo.

Apache Lucene's next major release, 4, will introduce lots of flexibility into indexing, but also fundamental changes to the well-known APIs: It features a new and consistent, 4-dimensional iteration API on top of a low-level, pluggable codec API giving applications full control over the postings data. Terms are now arbitrary opaque bytes enabling users to store terms in any encoding, not necessarily UTF-8, natively in the index (e.g. numeric fields). Currently under development is a higher performance postings iteration API, enabling interesting codecs based on recent encoding algorithms to work effectively. Several codecs have already been created, including the default "standard" codec, which enables sizable RAM reduction for searchers, and a "pulsing" codec that inlines postings data directly into the terms dictionary, which provides a solid performance boost for primary key fields. A lot of new codecs are under development. In this talk, Uwe presents an overview of all of these exciting changes, as well as several concrete, real-world examples of how applications can tap into these new features.

Download session slides.