This presentation shows how to build a search engine that understands queries written much closer to spoken language than to keyword-based search, using Apache Lucene/Solr and Apache UIMA. A system that recognizes semantics in natural language can be very handy for non-expert users, e-learning systems, customer care systems, etc. With such a system, users can submit queries such as "hotels near Rome" or "people working at Google" without having to manually transform the natural language query into a Lucene/Solr query.
The Solr - UIMA integration (available since Solr 3.1.0) can help build such intelligent systems by applying NLP / text mining algorithms to documents being indexed and to queries written by the user.
This module lets Solr call UIMA pipelines when documents are indexed, triggering automatic extraction of metadata (e.g. named entities like people, places and organizations) using existing and custom algorithms packaged as UIMA analysis engines.
The talk will cover:
- The Solr - UIMA integration
- Introducing UIMA to Lucene's analysis phase
- Running existing open source NLP algorithms in Lucene/Solr
- Orchestrating these building blocks into a sample system able to understand natural language queries
We'll introduce these points using examples (architectures & code) and a sample demo system.
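To make the query examples above concrete, here is a minimal sketch of the rewriting step, assuming a purely rule-based approach. It is not the system presented in the talk: the class name `NaturalQueryRewriter`, the two regular-expression rules and the index fields `text`, `place` and `organization` are all hypothetical, and in a real setup the entities would come from UIMA analysis engines rather than from regular expressions. It uses only the JDK and produces strings in Lucene query syntax.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaturalQueryRewriter {

    // Hypothetical rules: each phrase pattern maps to the (assumed) index field
    // that would hold the matching entity extracted at indexing time.
    private static final Map<Pattern, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put(Pattern.compile("^(.+?)\\s+near\\s+(.+)$", Pattern.CASE_INSENSITIVE), "place");
        RULES.put(Pattern.compile("^(.+?)\\s+working at\\s+(.+)$", Pattern.CASE_INSENSITIVE), "organization");
    }

    // Rewrites a natural language query into a fielded query string.
    // Falls back to a plain phrase query on the assumed "text" field.
    public static String rewrite(String userQuery) {
        String q = userQuery.trim();
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            Matcher m = rule.getKey().matcher(q);
            if (m.matches()) {
                return "text:" + quote(m.group(1))
                        + " AND " + rule.getValue() + ":" + quote(m.group(2));
            }
        }
        return "text:" + quote(q);
    }

    // Wraps a term in double quotes, escaping backslashes and quotes inside it.
    private static String quote(String s) {
        return "\"" + s.trim().replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(rewrite("hotels near Rome"));
        // prints: text:"hotels" AND place:"Rome"
        System.out.println(rewrite("people working at Google"));
        // prints: text:"people" AND organization:"Google"
    }
}
```

The resulting strings could then be handed to a Lucene/Solr query parser. The fielded part of each query only matches if the corresponding field was populated at indexing time, which is the role the UIMA-based metadata extraction plays.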