The Art of Searching

Blur, which is based on Apache Lucene, is a search engine capable of searching billions of records quickly. The data structures and algorithms that make Lucene work are built from simple pieces that combine step by step to enable more sophisticated functionality, such as comparing documents through cosine similarity. In this talk, we'll start with a simple search example, build up to a vector space model, and explain some of the underlying math needed to normalize the weights so that comparisons among documents are fair.
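
As a rough illustration of the idea, the sketch below compares documents by cosine similarity in a toy vector space model. It is not Blur's or Lucene's implementation: the helper names (term_vector, cosine_similarity) are hypothetical, the weights are assumed to be raw term counts, and tokenization is a plain whitespace split. Dividing by the vector lengths is the normalization step that keeps a long document from outscoring a short one simply because it repeats terms more often.

```python
import math
from collections import Counter


def term_vector(text):
    """Map each lowercase, whitespace-separated token to its raw count."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    # Euclidean length of each vector; dividing by these normalizes the score.
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty document is treated as similar to nothing.
    return dot / (norm_a * norm_b)


# Hypothetical documents: the second repeats the first three times.
short_doc = term_vector("lucene search")
long_doc = term_vector("lucene search lucene search lucene search")
other_doc = term_vector("vector space model")

print(cosine_similarity(short_doc, long_doc))   # same direction, different length
print(cosine_similarity(short_doc, other_doc))  # no shared terms
```

Because the score depends only on the direction of each vector, the short and long documents come out as essentially identical, while documents with no terms in common score zero.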