Mitch Pronschinske is a Senior Content Analyst at DZone. That means he writes and searches for the finest developer content in the land so that you don't have to. He often eats peanut butter and bananas, likes to make his own ringtones, enjoys card and board games, and is married to an underwear model. Mitch is a DZone Zone Leader and has posted 2573 posts at DZone.

The Art of Searching

Blur, which is based on Apache Lucene, is a search engine capable of searching billions of records quickly. The data structures and algorithms that make Lucene work are built up piece by piece from simple structures to enable more sophisticated functionality, such as comparing documents through cosine similarity. In this talk, we'll start with a simple search example, build up to a vector space model, and explain some of the underlying math needed to normalize the weights used to make fair comparisons among documents.
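
To make the vector space idea concrete, here is a minimal sketch in plain Python. It is not Lucene's or Blur's actual scoring code. The function names (`tf_idf_vectors`, `cosine_similarity`), the toy documents, and the particular weighting `tf * log(1 + N/df)` are all assumptions chosen for illustration. Each document becomes a sparse vector of term weights, and dividing the dot product by both vector lengths is the normalization step that keeps long documents from winning simply because they contain more words.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Turn tokenized documents into sparse {term: weight} vectors.

    The weight is tf * log(1 + N / df), an illustrative choice and
    not the exact formula used by Lucene.
    """
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        vectors.append({t: tf[t] * math.log(1 + n / df[t]) for t in tf})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    # Dividing by both lengths removes the effect of document size.
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus, tokenized by whitespace.
docs = [
    "lucene is a search library".split(),
    "blur is a search engine built on lucene".split(),
    "bananas and peanut butter".split(),
]
vecs = tf_idf_vectors(docs)
```

Under these assumptions, `cosine_similarity(vecs[0], vecs[1])` is greater than zero because the two documents share terms, while `cosine_similarity(vecs[0], vecs[2])` is exactly zero because they share none. Scaling every weight in a vector by the same factor leaves its similarity to any other vector unchanged, which is the sense in which the comparison is fair across documents of different lengths.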