Mitch Pronschinske is a Senior Content Analyst at DZone. That means he writes and searches for the finest developer content in the land so that you don't have to. He often eats peanut butter and bananas, likes to make his own ringtones, enjoys card and board games, and is married to an underwear model. Mitch is a DZone Zone Leader and has posted 2574 posts at DZone.

Automata Invasion: Finite-State Technology in Lucene

05.21.2012

Here's another great presentation from the just-finished Lucene Revolution 2012, given by Robert Muir of Lucid Imagination and Michael McCandless (a DZone MVB) of IBM.

Finite-state technology, including automata and weighted finite-state transducers (wFSTs), offers compact data structures well suited to text processing and search applications. Low-level support for both automata and wFSTs is now available in Lucene and has recently enabled a number of surprisingly powerful improvements. In this joint talk, Robert Muir and Michael McCandless provide an overview of finite-state technology and then describe how it is used in Lucene today: synonym filtering, fuzzy queries, respelling/suggesting, the terms dictionary, the in-memory postings format (MemoryPostingsFormat), and Japanese analysis (the Kuromoji analyzer).
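To make the idea of an automaton over a term set a little more concrete, here is a minimal Python sketch. It is not Lucene's API or implementation: it assumes a simple trie-shaped deterministic automaton, and the names `State`, `build_automaton`, `accepts`, and `fuzzy_matches` are hypothetical, chosen only for illustration. The fuzzy lookup walks the automaton while carrying one row of the Levenshtein distance table, and abandons a branch once no entry in the row is within the edit budget.

```python
from typing import Dict, List


class State:
    """One automaton state: outgoing transitions plus an accept flag."""

    def __init__(self) -> None:
        self.edges: Dict[str, "State"] = {}
        self.accept = False


def build_automaton(terms: List[str]) -> State:
    """Build a trie-shaped deterministic automaton accepting exactly `terms`."""
    start = State()
    for term in terms:
        state = start
        for ch in term:
            state = state.edges.setdefault(ch, State())
        state.accept = True
    return start


def accepts(start: State, word: str) -> bool:
    """Follow one transition per character; accept only in an accept state."""
    state = start
    for ch in word:
        state = state.edges.get(ch)
        if state is None:
            return False
    return state.accept


def fuzzy_matches(start: State, query: str, max_edits: int) -> List[str]:
    """Return accepted terms within `max_edits` Levenshtein edits of `query`."""
    results: List[str] = []

    def walk(state: State, prefix: str, prev_row: List[int]) -> None:
        # prev_row[i] is the edit distance between `prefix` and query[:i].
        if state.accept and prev_row[-1] <= max_edits:
            results.append(prefix)
        for ch, nxt in sorted(state.edges.items()):
            row = [prev_row[0] + 1]
            for i in range(1, len(query) + 1):
                cost = 0 if query[i - 1] == ch else 1
                row.append(min(row[i - 1] + 1,          # insertion
                               prev_row[i] + 1,         # deletion
                               prev_row[i - 1] + cost)) # match or substitution
            # Prune: if every entry exceeds the budget, no extension can match.
            if min(row) <= max_edits:
                walk(nxt, prefix + ch, row)

    walk(start, "", list(range(len(query) + 1)))
    return results


terms = ["lucene", "lucent", "lucid", "search"]
automaton = build_automaton(terms)
print(accepts(automaton, "lucid"))
print(fuzzy_matches(automaton, "serch", 1))
```

Because terms that share a prefix share states, the distance rows for that prefix are computed once for all of them. This is only a sketch of the general technique; the talk itself covers how Lucene's automata and FSTs are actually built and used.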
