Mitch Pronschinske is a Senior Content Analyst at DZone. That means he writes and searches for the finest developer content in the land so that you don't have to. He often eats peanut butter and bananas, likes to make his own ringtones, enjoys card and board games, and is married to an underwear model. Mitch is a DZone Zone Leader and has posted 2569 posts at DZone.

Inside the New Apache Solr

11.16.2009
In 2006, Solr was donated to the Apache Software Foundation and integrated into the Lucene project.  Apache Solr is an enterprise search platform that powers the search and navigation features of many of the world's largest internet sites.  It builds on the popular Apache Lucene Java search library, which has over 3,000 installations.  Following the recent release of Solr 1.4, DZone conducted an exclusive interview with Grant Ingersoll, a committer on the Apache Lucene and Apache Solr projects and the current Lucene PMC chair.

"Solr is Lucene best practices plus a whole bunch of production-ready capabilities," said Ingersoll.  "Solr takes Lucene and packages it up as an HTTP server."  Ingersoll says that Solr runs as a web application inside a servlet container such as Tomcat or Jetty, providing the functionality of Lucene as well as other search capabilities such as faceting.  On top of Lucene, Solr adds distributed search along with replication for failover and load balancing.  Solr is also easy to configure through XML files.

DZone asked Ingersoll where Solr is being used.  He listed some major websites such as AP interactive, Netflix.com, Comcast.com, and Zappos.com, but admitted that there were too many to name.  CNET, which originally donated the software to Apache, also uses Solr.  Webshots, product reviews, and other database documents on CNET are indexed in Solr and scored based on importance parameters.  Ingersoll says the great thing about Solr is that it works almost anywhere.  "You can talk to Solr via any client that supports HTTP," he said.
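Because Solr is reached over plain HTTP, a search request is just a URL with query parameters. The following is a minimal sketch in Python of how such a URL might be built. It assumes a Solr instance at the illustrative address `http://localhost:8983/solr/select` and a hypothetical field named `category`; neither comes from the interview. The `q`, `rows`, `wt`, `facet`, and `facet.field` parameters are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance; host, port, and path are illustrative.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(query, facet_field=None, rows=10, base_url=SOLR_SELECT_URL):
    """Build the URL for a Solr search request, optionally faceted on one field."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_field is not None:
        # Ask Solr to return counts per distinct value of the given field.
        params += [("facet", "true"), ("facet.field", facet_field)]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # "category" is a hypothetical field name, not one from the article.
    print(build_select_url("running shoes", facet_field="category"))
```

Any HTTP client (a browser, `curl`, or a library in another language) could then fetch the resulting URL; the sketch only constructs the request and does not contact a server.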

Version 1.4 is the project's latest release, Ingersoll says.  It includes many bug fixes and significant performance improvements for indexing, searching, and faceting.  New rich document processing via Apache Tika can extract content from Word, PDF, and HTML files, index it, and make it searchable.  Support for numeric range queries has improved, which also makes date range searches faster.  Ingersoll says Solr 1.4 also features new dynamic faceting capabilities that cluster search results on cluster points.
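To make the range queries mentioned above concrete: Solr's query syntax writes an inclusive range as `field:[lower TO upper]`, and date values are given in UTC ISO 8601 form. The sketch below builds such a query string in Python. The field name `published` is hypothetical and the helper names are illustrative; this shows only the query syntax, not how Solr 1.4 speeds the query up internally.

```python
from datetime import datetime, timezone


def solr_date(dt):
    """Format a timezone-aware datetime in the UTC form Solr date fields expect."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def date_range_query(field, start, end):
    """Return an inclusive Solr range query: field:[start TO end]."""
    return "{}:[{} TO {}]".format(field, solr_date(start), solr_date(end))


if __name__ == "__main__":
    # "published" is a hypothetical date field used only for illustration.
    start = datetime(2009, 1, 1, tzinfo=timezone.utc)
    end = datetime(2009, 11, 16, tzinfo=timezone.utc)
    print(date_range_query("published", start, end))
```

The resulting string can be passed as the `q` parameter (or as a filter) in an ordinary HTTP request to Solr.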

What makes Solr unique, Ingersoll says, is its open source flexibility.  With a powerful external configuration, Solr can be tailored to almost any type of application without Java coding, and it has a flexible plugin architecture for even more customization.  In contrast to Solr's open API, Ingersoll said commercial APIs "are usually black boxes and you don't have access to the lower level details."  With commercial products, you have to pay for a certain number of documents to be indexed or queries per second.  Solr, like all Apache projects, is free.  "You can also scale Solr to be as big as you want," adds Ingersoll.  

In the next version, Solr 1.5, Ingersoll says we can expect more distributed capabilities and more support for geographic searches.  The beauty of open source, Ingersoll adds, is that "good ideas come out of the blue all the time."

Comments

Dizzle Sizzle replied on Tue, 2009/11/17 - 9:33am

Actually, it would seem Solr was donated in 2006 (by CNet), 5 years is a long time in Softwareland! http://en.wikipedia.org/wiki/Solr

Mitch Pronschinske replied on Thu, 2009/11/19 - 1:19am in response to: Dizzle Sizzle

My mistake.  It was actually Lucene that was donated in 2001.  The post has been updated.
