Fascinated by the "craft" of software development, Eric Pugh has been heavily involved in the open source world as a developer, committer, and user for the past five years. He is a member of the Apache Software Foundation, and lately has been mulling over how we move from the read/write web to the read/write/share web. In biotech, financial services and defence IT, he has helped European and American companies develop coherent strategies for embracing open source software. As a speaker he has advocated the advantages of Agile practices in software development. Eric became involved in Solr when he submitted the patch SOLR-284 for parsing rich document types such as PDF and MS Office formats, which became the single most popular patch as measured by votes! The patch was subsequently cleaned up and enhanced by three other individuals, demonstrating the power of the Free/Open Source model to build great code collaboratively. SOLR-284 was eventually refactored into Solr Cell as part of Solr version 1.4. Eric co-authored "Solr 1.4 Enterprise Search Server", the first book on Solr. He blogs at http://www.opensourceconnections.com/blog/. Eric is a DZone MVB and is not an employee of DZone and has posted 8 posts at DZone. You can read more from them at their website.

Should I Deploy Solr 4.0?

04.28.2011
I recently had an IRC conversation about Solr 4.0. The main thing the person I was chatting with wanted to know was “How far out is the 4.0 release?” The answer, as with almost any open source project, is “when it’s released.”

Naturally, that answer doesn’t really help get to the crux of what most IT teams who either use or are considering Solr need to figure out, which is whether 4.0 is stable enough to deploy in a live environment.

Solr, even in unreleased versions, has historically been pretty stable. So if a new version, in this case 4.0, has the functions that you’re looking for (in this conversation, it was function queries like idf() or termfreq()), then you’re probably going to want to go with the latest version, unless you’re comfortable compiling a previous version of Solr and writing your own code on top of it.
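To make the function query case concrete, here is a minimal sketch of the kind of request that conversation was about. It only builds a query URL; it does not contact a server. The host, the /solr/select path, the field name "text" and the helper name build_function_query_url are all assumptions for illustration, and the fl pseudo-field syntax is as I understand it from the 4.0 development branch, so check it against the build you actually download.

```python
from urllib.parse import urlencode


def build_function_query_url(base_url, field, term, rows=10):
    """Build a select URL that asks for termfreq() and idf() as pseudo-fields.

    Hypothetical helper: base_url, field and term are placeholders, not
    values taken from any real deployment.
    """
    params = {
        "q": f"{field}:{term}",
        # Alias each function query so it comes back under a readable name.
        "fl": f"id,tf:termfreq({field},'{term}'),idf:idf({field},'{term}')",
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"


# Assumed local test instance; adjust the URL for your own setup.
url = build_function_query_url("http://localhost:8983/solr/select", "text", "solr")
print(url)
```

If a request like this works against a 4.0 build and fails against your current release, that is a quick way to confirm the feature you need really does require the newer version.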

Of course, this approach does come with risk. I have only heard of one actual “bug” that led to incorrect results sneaking into the Solr code base in an unreleased version, and it was quickly found and fixed. But you’re working on a code base that may still change somewhat. If you are building indexes that you cannot easily rebuild (for example, you are indexing the Internet and can’t recrawl to regenerate the data, meaning Solr is your “system of record”), then be aware that over time the index file format may change because Lucene is changing under the covers, and periodically there is an email telling you that you need to rebuild your indexes. But if you are basically taking a download of Solr 4.0 as it is today, and only going to update a) when a new killer awesome feature is added or b) when 4.0 comes out, then reindexing shouldn’t be a problem.

The other aspect of deploying Solr 4.0 is your testing environment. If you have strong system and functional testing, then you can be fairly sure that things are working appropriately. If you’re not certain about testing, check out my presentation on Better Search Engine Testing from this year’s Software Test and Performance Conference.
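As a rough illustration of what a functional search test can look like, here is a minimal sketch that checks whether documents you expect for a query still show up near the top after an upgrade. Everything in it is hypothetical: find_regressions, fake_search, the queries and the document ids are made up, and fake_search stands in for whatever call your application makes to Solr.

```python
def find_regressions(search, expectations, top_n=10):
    """Return (query, missing_ids) pairs for queries whose expected
    documents are not all present in the first top_n results."""
    failures = []
    for query, expected_ids in expectations.items():
        returned = search(query)[:top_n]
        missing = [doc_id for doc_id in expected_ids if doc_id not in returned]
        if missing:
            failures.append((query, missing))
    return failures


def fake_search(query):
    """Stand-in for a real search call; returns canned, ordered doc ids."""
    canned = {
        "solr book": ["doc7", "doc2", "doc9"],
        "field collapsing": ["doc4"],
    }
    return canned.get(query, [])


# Hypothetical expectations: ids that should appear for each query.
expectations = {
    "solr book": ["doc2", "doc7"],
    "field collapsing": ["doc4", "doc5"],
}

failures = find_regressions(fake_search, expectations)
print(failures)  # [('field collapsing', ['doc5'])]
```

Running the same expectations against your current version and against a 4.0 build gives you a cheap before-and-after comparison.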
References
Published at DZone with permission of Eric Pugh, author and DZone MVB. (source)


Comments

Rogério Araújo replied on Thu, 2011/04/28 - 10:36am

Will our long-discussed field collapsing feature be in Solr 4.0?

Eric Pugh replied on Thu, 2011/04/28 - 12:45pm in response to: Rogério Araújo

Yup! And it's one of the cool features that has a lot of people using 4.0 today!
