Lynda Moulton is a Senior Analyst and Consultant in enterprise search at Outsell's Gilbane Group. She has over 30 years as a professional search architect. She also consults on metadata management and taxonomies for content behind the firewall. Lynda is a DZone MVB, is not an employee of DZone, and has posted 15 posts at DZone.

Lucene Open Source Community Commits to a Future in Search

03.21.2011

It has been two years since I commented on an article in Information Week, Open Source, Its Time has Come, Nov. 2008. My main point was the need for deep expertise to execute enterprise search really well. I predicted the growth of service companies with that expertise, particularly for open source search. Not long after I announced that, Lucid Imagination was launched, with its focus on building and supporting solutions based on Lucene and its more turnkey version, Solr.

It has not taken long for Lucid Imagination (LI) to take charge of the Lucene/Solr community of practice (CoP), and to launch its own platform built on Solr, Lucidworks Enterprise. Open source depends on deep and sustained collaboration; LI stepped into the breach to ensure that the hundreds of contributors, users and committers have a forum. I am pretty committed to CoPs myself and know that nurturing a community for the long haul takes dedicated leadership. In this case it is undoubtedly enlightened self-interest that is driving LI. They are poised to become the strongest presence for driving continuous improvements to open source search, with Apache Lucene as the foundation.

Two weeks ago LI hosted Lucene Revolution, the first such conference in the US. It was attended by over 300 people in Boston, October 7-8, and I can report that this CoP is vibrant and enthusiastic. Moderated by Steve Arnold, the program ran smoothly and featured excellent sessions. Those I attended reflected a respectful exchange of opinions and ideas about tools, methods, practices and priorities. While there were allusions to vigorous debate among committers about priorities for code changes and upgrades, the mood was collaborative in spirit and tinged with humor, always a good way to operate when emotions and convictions are on stage.

From my 12 pages of notes come observations about the three principal categories of sessions:

1. Discussions, debates and show-cases for significant changes or calls for changes to the code
2. Case studies based on enterprise search applications and experiences
3. Case studies based on the use of Lucene and Solr embedded in commercial applications

Since the first category was more technical in nature, I leave the reader with my simplistic conclusions: core Apache Lucene and Solr will continue to evolve in a robust and aggressive progression. There are sufficient committers to make a serious contribution. Many who have decades of search experience are driving the charge and they have cut their teeth on the more difficult problems of implementing enterprise solutions. In announcing Lucidworks Enterprise, LI is clearly bidding to become a new force in the enterprise search market.

New and sustained build-outs of Lucene/Solr will be challenged by developers with ideas for diverging architectures, or "forking" code, on which Eric Gries, LI CEO, commented in the final panel. He predicted that forking will probably be driven by the need to solve specific search problems that current code does not accommodate. This will probably be more of a challenge for the spinoffs than for the core Lucene developers, and, given the difficulty of sustaining separate versions, the forks will ultimately fail.

The enterprise search cases came from organizations that will not or cannot easily select commercial turnkey applications; for them open source will make sense. From LI's counterpart in the Linux world, Red Hat, come these earlier observations about why enterprises should embrace open source solutions: in short, the sorry state of quality assurance and code control in commercial products. Add to that the cost of services to install, implement and customize commercial search products. For many institutions, the argument is to go with open source when there is an imperative or call for major customization.

This appears to be the case for two types of enterprises that were featured on the program: educational institutions and government agencies. Both have procurement issues when it comes to making large capital expenditures. For them it is easier to begin with something free, like open source software, then make incremental improvements and customize over time. Labor and services are cost variables that can be distributed more creatively using multiple funding options. Featured on the program were the Smithsonian, Adhere Solutions (doing systems integration work for a number of government agencies), MITRE (a federally funded research laboratory), U. of Michigan, and Yale. Cisco, a noteworthy commercial enterprise putting Lucene/Solr to work, also presented.

The third category of presenters was, by far, the largest contingent of open source search adopters: producers of applications that build Lucene and Solr (and other open source software) into their offerings. They are solidly entrenched because they are diligent committers, and they share in this community of like-minded practitioners, who serve as an extended enterprise of technical resources that keeps their overhead low. I can imagine the attractiveness of a lean business that runs on an open source foundation and operates in a highly agile mode. This must be enticing and exciting for developers who wilt at the idea of working in a constrained environment with layers of management and political maneuvering.

Among the companies building applications on Lucene that presented were: Access Innovations, Twitter, LinkedIn, Acquia, RivetLogic and Salesforce.com. These stand out as relatively mature adopters with traction in the marketplace. There were also companies present that contribute their value through Lucene/Solr partnerships in which their products or tools are complementary including: Basis Technology, Documill, and Loggly.

Links to presentations by organizations mentioned above will take you to conference highlights. Some will appeal to the technical reader, for there was a lot of code sharing and many technical tips in the slides. The diversity and scale of applications that are being supported by Lucene and Solr were impressive. Lucid Imagination and the speakers did a great job of illustrating why and how open source has a serious future in enterprise search. This was a confidence-building exercise for the community.

Two sentiments at the end summed it up for me. On the technical front, Eric Gries observed that it is usually clear what needs to be core (to the code) and what does not belong. Then there is a lot of gray area, and that will contribute to constant debate in the community. For the user community, Charlie Hull of flax opined that customers don't care whether (the code) is in the open source core or in the special "secret sauce" application, as long as the product does what they want.

References
Published at DZone with permission of Lynda Moulton, author and DZone MVB. (source)
