Mitch Pronschinske is the Lead Research Analyst at DZone. Researching and compiling content for DZone's research guides is his primary job. He likes to make his own ringtones, watches cartoons/anime, enjoys card and board games, and plays the accordion. Mitch is a DZone Zone Leader and has posted 2576 posts at DZone.

The Art of Searching

04.25.2012
Blur, which is based on Apache Lucene, is a search engine capable of searching billions of records quickly. The data structures and algorithms that make Lucene work are built from simple parts, assembled piece-by-piece to enable more sophisticated functionality such as comparing documents through cosine similarity. In this talk, we'll start with a simple search example, build up to a vector space model, and explain some of the underlying math needed to normalize the weights so that comparisons among documents are fair.
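To make the vector space idea concrete, here is a minimal sketch in Python. It is not Lucene's or Blur's actual scoring code: the function names `tf_idf_vectors` and `cosine_similarity`, the whitespace tokenizer, and the particular TF-IDF weighting are all assumptions chosen for illustration. Each document becomes a sparse vector of term weights, and dividing the dot product by both vector lengths is the normalization step that keeps a long document from winning simply because it contains more words.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Turn each document into a sparse {term: weight} vector.

    Illustrative weighting only: tf * (1 + ln(N / df)).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: tf * (1.0 + math.log(n_docs / doc_freq[term]))
            for term, tf in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    docs = [
        "apache lucene search",
        "lucene search engine",
        "board games",
    ]
    vecs = tf_idf_vectors(docs)
    print(cosine_similarity(vecs[0], vecs[1]))  # shares two terms
    print(cosine_similarity(vecs[0], vecs[2]))  # shares no terms
```

Because the score depends only on the angle between the vectors, multiplying every weight in one document by a constant leaves the similarity unchanged, which is the sense in which normalization makes the comparison fair.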