Mitch Pronschinske is the Lead Research Analyst at DZone. Researching and compiling content for DZone's research guides is his primary job. He likes to make his own ringtones, watches cartoons/anime, enjoys card and board games, and plays the accordion. Mitch is a DZone Zone Leader and has posted 2576 posts at DZone.

Automata Invasion: Finite-State Technology in Lucene

05.21.2012

Here's another great presentation from the just-finished Lucene Revolution 2012, given by Robert Muir of Lucid Imagination and Michael McCandless (a DZone MVB) of IBM.

Finite-state technology, including automata and weighted finite-state transducers (wFSTs), provides compact data structures well suited to text processing and search applications. Low-level support for both automata and wFSTs is now available in Lucene and has recently enabled a number of surprisingly powerful improvements. In this joint talk, Robert Muir and Michael McCandless will provide an overview of finite-state technology and then describe how it's used today in Lucene: synonym filtering, fuzzy queries, respelling/suggesting, the terms dictionary, the in-memory postings format (MemoryPostingsFormat), and Japanese analysis (Kuromoji analyzer).
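
To give a feel for how an automaton can drive a fuzzy query, here is a minimal Python sketch, not Lucene's actual code. It assumes a toy dictionary-based trie standing in for the terms dictionary, and it simulates a Levenshtein automaton lazily by using one row of the edit-distance table as the automaton state. All names (`LevenshteinAutomaton`, `build_trie`, `fuzzy_terms`) are hypothetical and chosen only for illustration.

```python
class LevenshteinAutomaton:
    """Accepts every string within max_edits edits of `word` (illustrative only)."""

    def __init__(self, word, max_edits):
        self.word = word
        self.max_edits = max_edits

    def start(self):
        # A state is one row of the edit-distance table, capped at max_edits + 1
        # so that the number of distinct states stays finite.
        cap = self.max_edits + 1
        return tuple(min(i, cap) for i in range(len(self.word) + 1))

    def step(self, state, ch):
        # Consume one input character and compute the next table row.
        cap = self.max_edits + 1
        row = [min(state[0] + 1, cap)]
        for i, w in enumerate(self.word):
            cost = 0 if w == ch else 1
            row.append(min(row[i] + 1, state[i] + cost, state[i + 1] + 1, cap))
        return tuple(row)

    def is_match(self, state):
        # Accepting state: the whole input is within max_edits of the word.
        return state[-1] <= self.max_edits

    def can_match(self, state):
        # False means no extension of the input can ever be accepted.
        return min(state) <= self.max_edits


def build_trie(terms):
    """Toy stand-in for a terms dictionary: nested dicts, "" marks end of term."""
    root = {}
    for term in terms:
        node = root
        for ch in term:
            node = node.setdefault(ch, {})
        node[""] = True
    return root


def fuzzy_terms(trie, automaton):
    """Walk the trie and the automaton together, pruning dead branches."""
    matches = []
    stack = [(trie, automaton.start(), "")]
    while stack:
        node, state, prefix = stack.pop()
        if "" in node and automaton.is_match(state):
            matches.append(prefix)
        for ch, child in node.items():
            if ch == "":
                continue
            nxt = automaton.step(state, ch)
            if automaton.can_match(nxt):
                stack.append((child, nxt, prefix + ch))
    return sorted(matches)


terms = ["lucene", "lucenes", "lucent", "lucid", "luck", "solr"]
result = fuzzy_terms(build_trie(terms), LevenshteinAutomaton("lucene", 1))
print(result)  # ['lucene', 'lucenes', 'lucent']
```

The point of the design is the pruning in `fuzzy_terms`: whole subtrees of the dictionary (here, everything under "s") are skipped as soon as the automaton reports that no accepted string can start with that prefix, so the query never has to compare against every term.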

Download session slides