Fascinated by the "craft" of software development, Eric Pugh has been heavily involved in the open source world as a developer, committer, and user for the past five years. He is a member of the Apache Software Foundation, and lately has been mulling over how we move from the read/write web to the read/write/share web. In biotech, financial services and defence IT, he has helped European and American companies develop coherent strategies for embracing open source software. As a speaker he has advocated the advantages of Agile practices in software development. Eric became involved in Solr when he submitted the patch SOLR-284 for parsing rich document types such as PDF and MS Office formats, which became the single most popular patch as measured by votes! The patch was subsequently cleaned up and enhanced by three other individuals, demonstrating the power of the Free/Open Source model to build great code collaboratively. SOLR-284 was eventually refactored into Solr Cell as part of Solr version 1.4. Eric co-authored "Solr 1.4 Enterprise Search Server", the first book on Solr. He blogs at http://www.opensourceconnections.com/blog/. Eric is a DZone MVB and is not an employee of DZone.
I recently had an IRC conversation about Solr
4.0. The main question from the person I was chatting with was
"How far out is the 4.0 release?" The answer, as with almost any open
source project, is "when it's released."
Naturally, that answer doesn’t really help get to the crux of what most IT teams who either use or are considering Solr need to figure out, which is whether 4.0 is stable enough to deploy in a live environment.
Solr, even in unreleased versions, has historically been pretty stable. So if
a new version, in this case 4.0, has the functions that you're looking
for (in this conversation, function queries like idf() and
termfreq()), then unless you're comfortable compiling a previous
version of Solr and writing your own code on top of it, you're probably going to want to go with the latest version.
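For readers who haven't used the function queries mentioned above, here is a minimal Python sketch that only builds a request URL; it never contacts a server. The host, the `collection1` core name, the `text` field, and the `term_tf`/`term_idf` aliases are hypothetical placeholders, and the sketch assumes a Solr 4.0 build that accepts `termfreq(field,'term')` and `idf(field,'term')` as aliased pseudo-fields in the `fl` parameter.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port, and core name for your deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def function_query_url(field: str, term: str, query: str = "*:*") -> str:
    """Build a /select URL asking for termfreq() and idf() on each matching doc.

    Assumes `field` is an indexed field in your schema and that this Solr
    build supports function pseudo-fields (with aliases) in `fl`.
    """
    params = {
        "q": query,
        # Return the stored id plus two computed values, aliased as
        # term_tf (term frequency in the doc) and term_idf (inverse doc frequency).
        "fl": f"id,term_tf:termfreq({field},'{term}'),term_idf:idf({field},'{term}')",
        "wt": "json",
    }
    # urlencode percent-escapes the commas, colons, quotes and parentheses.
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(function_query_url("text", "solr"))
```

Against a running instance, each returned document should carry its id alongside the two computed values; if the build rejects the `fl` syntax, that is a quick sign the feature you need isn't in the snapshot you downloaded.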
Of course, this approach does come with risk. I have
only heard of one actual bug that let incorrect results
sneak into the unreleased Solr
code base, and it was quickly found and fixed.
But since you're working on a code base that may change somewhat, be
careful if you are building indexes that you cannot easily rebuild: for
example, if you are indexing the Internet and can't recrawl to regenerate
the data, meaning the index is your "system of record". Over time the index file
format may change, because Lucene is changing under the covers, and
periodically there is an email telling you that you need to rebuild
your indexes. But if you are basically taking a download of Solr
4.0 as it is today, and only plan to update a) when a killer new
feature is added or b) when 4.0 comes out, then reindexing
shouldn't be a problem.
The other aspect to consider when deploying Solr
4.0 is your testing environment. If you have strong system and
functional testing, then you can be fairly sure that things are working
appropriately. If you're unsure how to approach testing, check out my
presentation on Better Search Engine Testing from this year's Software Test and Performance Conference.
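As one illustration of what a functional search test can look like, the sketch below checks that known documents appear near the top of a result list. It is a hypothetical helper, not the approach from the presentation; it assumes the reply was requested with `wt=json`, so matching documents sit under `response.docs`, each with an `id` field, and it runs against a canned reply so no server is needed.

```python
from typing import Any


def assert_top_results(response: dict, expected_ids: list, top_n: int = 10) -> None:
    """Raise AssertionError if any expected id is missing from the first top_n docs.

    `response` is assumed to be a parsed Solr JSON reply whose matching
    documents live under response["response"]["docs"].
    """
    docs: list[dict[str, Any]] = response["response"]["docs"][:top_n]
    returned = [doc["id"] for doc in docs]
    missing = [doc_id for doc_id in expected_ids if doc_id not in returned]
    if missing:
        raise AssertionError(f"expected ids {missing} not in top {top_n}: {returned}")


# Hypothetical canned reply standing in for a live query against Solr 4.0.
sample = {
    "responseHeader": {"status": 0},
    "response": {
        "numFound": 3,
        "start": 0,
        "docs": [{"id": "doc-7"}, {"id": "doc-2"}, {"id": "doc-9"}],
    },
}

assert_top_results(sample, ["doc-7", "doc-2"], top_n=2)  # passes silently
```

In a real suite you would fetch the reply from the Solr instance under test and keep a list of query/expected-id pairs, so that rerunning the suite after moving to a newer 4.0 snapshot shows whether results changed.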
Published at DZone with permission of Eric Pugh, author and DZone MVB.