Mitch Pronschinske is the Lead Research Analyst at DZone. Researching and compiling content for DZone's research guides is his primary job. He likes to make his own ringtones, watches cartoons/anime, enjoys card and board games, and plays the accordion. Mitch is a DZone Zone Leader and has posted 2576 posts at DZone.

Natural Language Search in Solr

This presentation shows how to use Apache Lucene/Solr and Apache UIMA to build a search engine that understands queries phrased much as people speak, rather than as lists of keywords. A system that recognizes the semantics of natural language is especially useful for non-expert users, e-learning platforms, customer-care systems, and similar applications. With such a system, users can submit queries such as "hotels near Rome" or "people working at Google" without anyone having to translate them by hand into Lucene/Solr query syntax.
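To make the goal concrete, here is a minimal, rule-based sketch in Java of the translation such a system automates. It is not the approach from the talk, which relies on UIMA annotators rather than hand-written patterns; the two regular expressions and the schema fields `type`, `location`, and `organization` are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: rewrite two natural-language query shapes into Solr
 * query syntax. The fields "type", "location" and "organization" are assumed
 * schema fields, not anything defined by Solr or UIMA.
 */
class NaturalLanguageQueryTranslator {

    // "<things> near <Place>", e.g. "hotels near Rome"
    private static final Pattern NEAR =
            Pattern.compile("^(\\w+?)s? near (.+)$", Pattern.CASE_INSENSITIVE);

    // "people working at|for <Organization>", e.g. "people working at Google"
    private static final Pattern WORKING_AT =
            Pattern.compile("^people working (?:at|for) (.+)$", Pattern.CASE_INSENSITIVE);

    static String translate(String query) {
        String q = query.trim();

        Matcher m = WORKING_AT.matcher(q);
        if (m.matches()) {
            return "type:person AND organization:" + quote(m.group(1));
        }

        m = NEAR.matcher(q);
        if (m.matches()) {
            // Naive singularization: the optional trailing "s" is dropped by the pattern.
            String type = m.group(1).toLowerCase(Locale.ROOT);
            return "type:" + type + " AND location:" + quote(m.group(2));
        }

        // No pattern recognized: fall back to a plain keyword query.
        return q;
    }

    // Wrap a value in double quotes so multi-word names become phrase queries.
    private static String quote(String value) {
        return "\"" + value.replace("\"", "\\\"") + "\"";
    }
}
```

Under these assumptions, `translate("hotels near Rome")` yields `type:hotel AND location:"Rome"`, while unrecognized input passes through unchanged as a keyword query. Hand-written patterns like these break as soon as the phrasing varies, which is the motivation for using NLP annotators instead.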

The Solr-UIMA integration (available since Solr 3.1.0) helps build such systems by running NLP and text-mining algorithms both on documents as they are indexed and on the queries users write.
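The index-time half can be illustrated with a deliberately simple stand-in for a UIMA analysis engine: a dictionary (gazetteer) lookup that finds known names in a document's text and groups them by metadata field. The class name, the `addEntity`/`extract` methods, and the field names are hypothetical, and this is plain Java, not the UIMA or Solr API; a production pipeline would typically plug in proper NLP components rather than a fixed list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical stand-in for an analysis engine: finds known entity names in
 * text and maps each match to a metadata field. Not the UIMA API.
 */
class GazetteerEntityExtractor {

    // Entity name -> metadata field, kept in insertion order.
    private final Map<String, String> entityToField = new LinkedHashMap<>();

    void addEntity(String name, String field) {
        entityToField.put(name, field);
    }

    /** Returns field name -> entity names found in the text, in dictionary order. */
    Map<String, List<String>> extract(String text) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : entityToField.entrySet()) {
            // Whole-word match, so "Rome" does not fire inside "Romeo".
            Pattern p = Pattern.compile("\\b" + Pattern.quote(e.getKey()) + "\\b");
            Matcher m = p.matcher(text);
            if (m.find()) {
                fields.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
            }
        }
        return fields;
    }
}
```

Fields produced this way (for example `location: [Rome]`) are what make a query like `location:"Rome"` answerable at search time.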

The Solr-UIMA module lets Solr call UIMA pipelines at indexing time to extract metadata automatically (e.g., named entities such as people, places, and organizations), using existing or custom algorithms packaged as UIMA analysis engines. The talk will cover:

  • The Solr-UIMA integration
  • Introducing UIMA into Lucene's analysis phase
  • Running existing open-source NLP algorithms in Lucene/Solr
  • Orchestrating these building blocks into a sample system that understands natural language queries

We'll illustrate each point with examples (architectures and code) and a sample demo system.
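Bringing UIMA into Lucene's analysis phase means annotations can shape the token stream itself, not only populate extra fields. The sketch below, again hypothetical plain Java rather than Lucene or UIMA code, keeps known multi-word entities together as single typed tokens using greedy longest-match lookup; the `Token` class, the `entities` map, and the type labels are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of entity-aware tokenization: known entity phrases
 * become one token tagged with an entity type; other words get type "word".
 */
class TypedTokenizer {

    static final class Token {
        final String term;
        final String type;

        Token(String term, String type) {
            this.term = term;
            this.type = type;
        }

        @Override
        public String toString() {
            return term + "/" + type;
        }
    }

    /** entities: lower-cased phrase -> type, e.g. "new york" -> "LOCATION". */
    static List<Token> tokenize(String text, Map<String, String> entities) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!w.isEmpty()) words.add(w); // split can yield a leading ""
        }

        // Longest entity phrase, in words, bounds the lookahead window.
        int maxLen = 1;
        for (String phrase : entities.keySet()) {
            maxLen = Math.max(maxLen, phrase.split(" ").length);
        }

        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < words.size()) {
            int matched = 0;
            // Greedy longest match: try the longest candidate phrase first.
            for (int len = Math.min(maxLen, words.size() - i); len >= 1; len--) {
                String phrase = String.join(" ", words.subList(i, i + len));
                String type = entities.get(phrase);
                if (type != null) {
                    tokens.add(new Token(phrase, type));
                    matched = len;
                    break;
                }
            }
            if (matched == 0) {
                tokens.add(new Token(words.get(i), "word"));
                matched = 1;
            }
            i += matched;
        }
        return tokens;
    }
}
```

For example, with `new york` mapped to `LOCATION`, the text `People working at Google in New York` produces `new york/LOCATION` as one token instead of the separate terms `new` and `york`.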