Yonik Seeley is the creator of Apache Solr and the Chief Open Source Architect and Co-Founder at Lucid Imagination, a company dedicated to development and support of Lucene/Solr. He's also an Apache Lucene/Solr PMC member and committer.

Yonik Seeley's Solr 4 Preview: SolrCloud, NoSQL, and More

05.23.2012

The first alpha release of Solr 4 is quickly approaching, bringing powerful new features to enhance existing Solr powered applications, as well as enabling new applications by further blurring the lines between full-text search and NoSQL.

The largest set of features goes by the development code name “Solr Cloud” and brings easy scalability to Solr. Distributed indexing with no single point of failure has been designed from the ground up to support near real-time (NRT) search and NoSQL features such as real-time get, optimistic locking, and durable updates.

We’ve incorporated Apache ZooKeeper, the rock-solid distributed coordination project that is immune to issues like split-brain syndrome that tend to plague hand-rolled solutions. ZooKeeper holds the Solr configuration and the cluster metadata, such as hosts, collections, shards, and replicas, which are core to providing an elastic search capability.

When a new node is brought up, it will automatically be assigned a role such as becoming an additional replica for a shard. A bounced node can do a quick “peer sync” by exchanging updates with its peers in order to bring itself back up to date. New nodes, or those that have been down too long, recover by replicating the whole index of a peer while concurrently buffering any new updates.
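The two recovery paths described above can be illustrated with a small in-memory sketch. This is not Solr's implementation: `ToyPeer`, `plan_recovery`, and the idea that a peer keeps a fixed-size window of recent updates are all assumptions made for illustration, and the buffering of updates that arrive during a full copy is left out.

```python
from collections import deque


class ToyPeer:
    """A healthy replica that remembers a bounded window of recent updates (hypothetical)."""

    def __init__(self, window=3):
        self.index = {}                      # doc id -> fields
        self.recent = deque(maxlen=window)   # (version, doc id, fields), oldest first
        self.version = 0

    def apply(self, doc_id, fields):
        """Accept an update, bump the version, and remember it in the recent window."""
        self.version += 1
        self.index[doc_id] = fields
        self.recent.append((self.version, doc_id, fields))


def plan_recovery(last_seen_version, peer):
    """Choose how a returning node catches up with `peer`.

    A node that was only briefly away can replay the updates it missed;
    a new node (last_seen_version is None) or one that fell behind the
    peer's window has to copy the whole index.
    """
    if last_seen_version is not None and peer.recent:
        oldest_kept = peer.recent[0][0]
        if last_seen_version >= oldest_kept - 1:
            missed = [u for u in peer.recent if u[0] > last_seen_version]
            return "peer_sync", missed
    return "full_replication", dict(peer.index)
```

In this toy model, the size of the window is what decides how long a node can be down before a quick sync stops being possible.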

An update can be sent to any node in the cluster, and it’s automatically forwarded to the correct node and immediately replicated to a number of other nodes to enable fault tolerance, high availability, and query scalability. Likewise, queries may be sent to any node in a cluster and they will automatically be routed to the correct nodes and load balanced across replicas.  This single-document push model of replication fits in well with the near real-time support that is exposed via Solr’s softCommit to quickly make updates visible to searches.
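To make the routing idea concrete, here is a minimal sketch of a cluster that maps each document to a shard and spreads queries across replicas. The article does not say how Solr picks the target shard, so hashing the document ID and taking it modulo the shard count is an assumption of this sketch, as are the names `ToyCluster`, `shard_for`, `route_update`, and `route_query`.

```python
import hashlib
from itertools import count


class ToyCluster:
    """Hypothetical cluster layout: a list of shards, each a list of node names."""

    def __init__(self, shards):
        self.shards = shards
        self._turn = count()  # drives round-robin replica selection

    def shard_for(self, doc_id):
        """Map a document ID to a shard index (assumed scheme: stable hash mod shard count)."""
        digest = hashlib.md5(doc_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.shards)

    def route_update(self, doc_id):
        """Return every node that should hold a copy of this document."""
        return self.shards[self.shard_for(doc_id)]

    def route_query(self):
        """Pick one node per shard, rotating through replicas to spread load."""
        turn = next(self._turn)
        return [nodes[turn % len(nodes)] for nodes in self.shards]
```

Because the mapping depends only on the document ID, any node that receives an update can compute where it belongs and forward it, which is what lets a client talk to whichever node it likes.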

The SolrCloud wiki page is a good place to start learning more about Solr’s new distributed capabilities.

Solr 4 has more NoSQL features for applications wishing to use it as a primary data store, including

  • Update durability – A transaction log ensures that even uncommitted documents are never lost
  • Real-time Get – The ability to retrieve the latest version of a document, without the need to commit or open a new searcher
  • Versioning and Optimistic Locking – combined with real-time get, this allows read-update-write functionality that ensures no conflicting changes were made concurrently by other clients.
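The way these three features fit together can be sketched with a toy in-memory store. It does not use Solr's API: `ToyVersionedStore`, `VersionConflict`, and the `expected_version` parameter are hypothetical names, and the "transaction log" here is just a Python list rather than a durable file.

```python
class VersionConflict(Exception):
    """Raised when the caller's expected version does not match the stored one."""


class ToyVersionedStore:
    """In-memory stand-in for a document store with per-document versions."""

    def __init__(self):
        self._docs = {}   # doc id -> (version, fields)
        self._log = []    # append-only record of accepted updates
        self._clock = 0   # source of increasing version numbers

    def get(self, doc_id):
        """Return (version, fields) for the latest accepted update, or None."""
        entry = self._docs.get(doc_id)
        if entry is None:
            return None
        version, fields = entry
        return version, dict(fields)

    def update(self, doc_id, fields, expected_version=None):
        """Store fields; if expected_version is given, it must match the current version."""
        current = self._docs.get(doc_id)
        current_version = current[0] if current else 0
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"{doc_id}: expected {expected_version}, found {current_version}"
            )
        self._clock += 1
        self._log.append((self._clock, doc_id, dict(fields)))  # log the update first
        self._docs[doc_id] = (self._clock, dict(fields))       # visible to get() at once
        return self._clock
```

A read-update-write cycle then reads the document with `get`, changes it, and passes the version it read as `expected_version`. If another client wrote in between, the update is rejected and the caller can re-read and retry.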


There are many other features coming in Solr 4, such as

  • Pivot Faceting – Multi-level or hierarchical faceting where the top constraints for one field are found for each top constraint of a different field.
  • Pseudo-fields – The ability to alias fields, or to add metadata along with returned documents, such as function query values and results of spatial distance calculations.
  • A spell checker implementation that can work directly from the main index instead of creating a sidecar index.
  • Pseudo-Join functionality – The ability to select a set of documents based on their relationship to a second set of documents.
  • Function query enhancements including conditional function queries and relevancy functions.
  • New update processors to facilitate modifying documents prior to indexing
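Of the features above, pivot faceting is perhaps the easiest to picture with a few lines of code. The sketch below computes it over a plain list of dictionaries rather than a Solr index; the function name `pivot_facet`, the `limit` parameter, and the shape of the result are assumptions of this example, not Solr's response format.

```python
from collections import Counter


def pivot_facet(docs, outer_field, inner_field, limit=2):
    """For each top value of outer_field, find the top values of inner_field."""
    outer_counts = Counter(d[outer_field] for d in docs if outer_field in d)
    result = []
    for outer_value, outer_count in outer_counts.most_common(limit):
        # Count inner_field values only among docs matching this outer value.
        inner_counts = Counter(
            d[inner_field]
            for d in docs
            if d.get(outer_field) == outer_value and inner_field in d
        )
        result.append({
            "value": outer_value,
            "count": outer_count,
            "pivot": inner_counts.most_common(limit),
        })
    return result
```

Calling `pivot_facet(docs, "category", "author")` on a product catalog, for example, would list the most common categories and, under each one, its most common authors.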


We’re not done yet! There are other features already on the drawing board for Solr 4.x, including

  • Update-able documents – the ability to add fields to an existing document without having to send in the complete document again.
  • More dynamic schema, including the ability to dynamically add new fields on the fly
  • Enhanced elasticity – the ability to split existing shards in a cluster
  • Rack awareness
  • Index and shard aliases

The list of improvements in Solr 4 is too long to describe in full here, so I’ll leave you with some parting screenshots of the new admin pages.

 

Published at DZone with permission of its author, Yonik Seeley. (source)


Comments

Herry Johnson replied on Tue, 2012/06/12 - 1:05pm

So when I said classpath I actually meant LD_LIBRARY_PATH. Is this the same as the java.library.path you mentioned? My Eclipse is on a Red Hat Linux environment. I'm getting the error when I'm running a JUnit test. I've also tried creating an LD_LIB... variable on my run configuration as well as adding it to the run configuration arguments with -Djava.library.path. None of these options worked. Any other ideas on what the issue might be? I also tried a System.load with the .so file, but that gave a different error message: java.lang.nosuchmethod found.
