Grant Ingersoll is a committer on the Apache Lucene and Apache Solr projects, as well as the current Lucene PMC chair. He is also a founding team member of Lucid Imagination.

NoSQL, Lucene and Solr

03.31.2011

The other day, Michael Coté asked me where Apache Lucene and Solr fit in with the NoSQL movement (having heard about the Guardian’s use of Solr), to which I replied: I haven’t used SQL in any significant way since I started using Lucene in 2004 (and I started my career doing Oracle DBA work way back when). We just didn’t have a fun name for it “back in the day”.

All kidding aside and at the risk of jumping on the buzzword bandwagon, let’s take a look at Wikipedia’s definition of NoSQL:

NoSQL (Not only SQL) is a movement promoting a loosely defined class of non-relational data stores that break with a long history of relational databases. These data stores may not require fixed table schemas, usually avoid join operations and typically scale horizontally.


Now let’s apply that definition to Lucene (we’ll get to Solr in a moment):

  1. NoSQL: Check. Ironically, many people have also layered SQL on top of Lucene. Guess we should argue for inclusion in the No-NoSQL movement (as Coté suggests) too!
  2. Loosely defined class of non-relational data stores that break with a long history of relational databases: Check. Once again, ironically, Lucene covers both sides of the aisle here.
  3. No fixed schemas: Been there, done that, bought the t-shirt. That said, Lucene supports fixed schemas as well.
  4. Avoid joins: Check. Denormalization frees your mind (at least in many cases). You can still do joins in Lucene with some work.
  5. Scales horizontally:  Yes and no.  Let’s be honest, Lucene scales quite well, but you’re going to have to do some work to make it so.  Enter Solr.
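To make items 3 and 4 a little more concrete, here is a minimal sketch in Python of a toy inverted index. It is not Lucene's API; the `TinyIndex` class, its methods, and the sample documents are all made up for illustration. The point is only that documents can carry whatever fields they like, and that copying the author's name into each document sidesteps a join at query time.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index (hypothetical, not Lucene): (field, term) -> doc ids."""

    def __init__(self):
        self.docs = []                    # stored documents; list position is the doc id
        self.postings = defaultdict(set)  # (field, term) -> set of doc ids

    def add(self, doc):
        """Index a document. Any set of fields is accepted; nothing is declared up front."""
        doc_id = len(self.docs)
        self.docs.append(doc)
        for field, value in doc.items():
            # Naive analysis: lowercase and split on whitespace.
            for term in str(value).lower().split():
                self.postings[(field, term)].add(doc_id)
        return doc_id

    def search(self, field, term):
        """Return the stored documents whose field contains the term."""
        ids = sorted(self.postings.get((field, term.lower()), set()))
        return [self.docs[i] for i in ids]


index = TinyIndex()
# Denormalized: the author's name lives in every article document,
# so there is no separate "authors" table to join against.
index.add({"title": "NoSQL and Lucene", "author": "Grant Ingersoll"})
# A second document with an extra field the first one never had.
index.add({"title": "Scaling Solr", "author": "Grant Ingersoll", "tags": "solr scaling"})

by_author = index.search("author", "ingersoll")
by_tag = index.search("tags", "solr")
```

Here `by_author` holds both documents and `by_tag` holds only the second one. A real Lucene index adds analysis, scoring and on-disk segments on top of this basic idea.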


Since Solr is “Lucene Best Practices” all wrapped up in an easy-to-use server, it covers items 1-4 no problem. And, get this, it also scales horizontally in terms of both data size and query volume. Plus, with the recent addition of Apache ZooKeeper to Solr (aka Solr Cloud), scaling has never been easier. At the end of the day, you get all the benefits of NoSQL (under an eventually consistent model), plus you get built-in things like free text search, faceting, spell checking, similar item search, hit highlighting and a whole host of other things that have been proven out in thousands of installations around the world.
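For a rough feel of what scaling horizontally by data size means for search, the sketch below splits documents across two in-memory "shards", queries each one, and merges the partial results into a global top-k plus facet counts. It is an assumption-laden toy in Python, not how Solr is implemented: the function names, the term-count scoring and the sample data are invented here, and a real deployment would query shards in parallel over the network.

```python
import heapq
from collections import Counter


def search_shard(shard, term):
    """Score each document in one shard by how often the term occurs in its text."""
    hits = []
    for doc in shard:
        score = doc["text"].lower().split().count(term.lower())
        if score > 0:
            hits.append((score, doc))
    return hits


def distributed_search(shards, term, k=2):
    """Query every shard, then merge the partial results (scatter-gather)."""
    all_hits = []
    for shard in shards:  # sequential here; real systems fan out in parallel
        all_hits.extend(search_shard(shard, term))
    # Global top-k by score across all shards.
    top = heapq.nlargest(k, all_hits, key=lambda hit: hit[0])
    # Facet counts: how many matching documents fall in each category.
    facets = Counter(doc["category"] for _, doc in all_hits)
    return [doc["id"] for _, doc in top], dict(facets)


# Hypothetical sample data spread over two shards.
shards = [
    [
        {"id": 1, "category": "search", "text": "solr solr lucene"},
        {"id": 2, "category": "db", "text": "oracle sql"},
    ],
    [
        {"id": 3, "category": "search", "text": "solr cloud"},
        {"id": 4, "category": "nosql", "text": "solr solr solr zookeeper"},
    ],
]

ids, facets = distributed_search(shards, "solr")
```

With this data, `ids` is `[4, 1]` and `facets` is `{"search": 2, "nosql": 1}`. Adding a shard means adding another list; the merge step stays the same, which is the property that makes this style of scaling attractive.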

NoSQL never looked so good.


Published at DZone with permission of its author, Grant Ingersoll.
