A Peruvian analyst, designer and Java programmer who enjoys working with frameworks like Spring and Hibernate, playing with Java and composing music. Other interests include playing bass and guitar, listening to music (Pink Floyd, Rick Wakeman), reading the Bible and buying and reading Java books. His blog: http://manueljordan.wordpress.com/ Manuel has posted 31 posts at DZone.

Lucene in Action, Second Edition: Covers Apache Lucene 3.0

08.21.2011
Published by: Manning Publications Co.
ISBN: 1933988177

Reviewer Ratings

Relevance:
5

Readability:
4

Overall:
5

One Minute Bottom Line

These days, search performance is crucial when we work with the huge amounts of data held by many enterprises. It's hard work, but Lucene is here to help us. This is an interesting book, although sometimes you need to read a section several times to understand a complex topic.

Review

You have more than 450 pages available to learn Lucene in this book.

Each chapter includes many, many sections! This has two side effects, which I expand on at the bottom of this review. Many of these sections are built around images, source code and explanation. Some are long and others are short and to the point, so you'll see this pattern mentioned many times below.

Part I: Core Lucene

Chapter 01: Meet Lucene

A solid chapter. It starts with today's information explosion and then introduces Lucene, explaining what it is and what it can do, and even includes the history of its creation. A valuable image shows the many components involved in a search application, followed by a long and important explanation of these components.

A sample application is shown too, with its explanation, instructions and output. The core indexing classes and core searching classes also receive a thorough and excellent explanation.

Chapter 02: Building a search index

The chapter opens the indexing process with important material. Sample source code is included with its explanation, and the API methods for deleting and updating documents are introduced and explained.

Field options are well covered. The rest of the chapter is long and focused on theory, covering boosting documents and fields; indexing numbers, dates and times; and concurrency, thread safety and locking issues.

Chapter 03: Adding search to your application

After a concise introduction to the searching API, short sample source code for a TermQuery is introduced and explained. The same goes for QueryParser, which even includes an image showing how it works.

Coverage of IndexSearcher is available too, including almost a page of source code about Near-real-time search with its respective explanation; very interesting. The same goes for Lucene scoring.

Another section is Lucene's diverse queries, where TermQuery, TermRangeQuery, NumericRangeQuery, PrefixQuery, BooleanQuery, PhraseQuery, WildcardQuery and FuzzyQuery are covered over a good number of pages, including important source code with its explanation. You will see for yourself that Lucene offers good support for queries.

Chapter 04: Lucene's analysis process

The chapter starts with an image explaining the analysis process during indexing, and continues with what is inside an analyzer, where important terms like token and token stream are explained with valuable theory and helpful images.

Among other sections, synonyms and aliases are covered, with important and valuable source code and its explanation.

A crucial section, Language analysis issues, is well covered.

Chapter 05: Advanced search techniques

A long chapter. After a concise introduction to Lucene's field cache, there is an important section on sorting search results. Many sorting options are shown through source code with its explanation and output.

The same attention is given to span queries, covered at length with their variations, each of which includes an image for better understanding. The same approach is used for filtering a search, which is well covered.

Chapter 06: Extending search

The chapter starts quickly with a geographic sorting scenario, covered in about three pages of source code and explanation. The same attention is given to a custom Collector, for which two approaches are used.

A deeper and longer section on QueryParser is available, with many samples and source code with explanation over five pages, including a table of its extensibility points. The same approach is used for filters and payloads.

A really interesting chapter with a lot of source code.

Part II: Applied Lucene

Chapter 07: Extracting text with Tika

The chapter starts quickly with an introduction to Tika, including a two-page table of the document formats it can parse, an explanation of its API and how to install it. Extracting text programmatically is covered with two pages of source code and explanation. Not everything is perfect: Tika's limitations are covered too.

To complete the chapter, material on indexing custom XML is available, working with SAX and Apache Commons Digester; each includes its own sample source code with explanation.

Chapter 08: Essential Lucene extensions

This chapter leans heavily on Luke, with many images of its interface and explanations of its features: the Overview tab, the Documents tab, searching with QueryParser, Files support, etc.

Something important and valuable is a two-page table of APIs for Analyzers, Tokenizers and TokenFilters. A very interesting table.

An important section is Highlighting query terms. An image shows the process flow and the classes and interfaces involved, and each component is explained. Sample source code for applying highlighting is available, even working with CSS.

Spell checking is covered through source code with its explanation. Something valuable is nearly a full page of ideas for improving spell checking.

To complete the chapter, many Query extensions are introduced like MoreLikeThis, RegexQuery and more.

Chapter 09: Further Lucene extensions

The chapter starts quickly with chaining filters, covered in about three pages of source code and explanation. An interesting section using the same approach is Storing an index in Berkeley DB.

An interesting section covers synonyms with WordNet. How to build an index and how to work with an analyzer are well covered through images, source code and explanation; the images are a good complement.

A valuable section covers the XML QueryParser, with an interesting image of the three common options for building a Lucene app from a search UI. Valuable source code and a detailed explanation of the .xsd code are included too.

Spatial Lucene is included too, with important images of the globe, tiers and grid boxes. Of course, source code with its explanation is included, covering important topics such as searching and performance.

To complete the chapter, Searching multiple indexes remotely is well covered, explained with an image and source code with a concise explanation.

Chapter 10: Using Lucene from other programming languages

An interesting chapter, based on many source code samples showing how you can work with Lucene from other programming languages, through ports including:

  • CLucene (C++)
  • Lucene.Net (C#)
  • KinoSearch and Lucy (Perl)
  • Ferret (Ruby)
  • PyLucene (Python)

Chapter 11: Lucene administration and performance tuning

This chapter is neither very long nor very short. It includes concise, valuable theory and explanation of source code, covering topics such as tuning, threads, managing disk and memory usage, and the index itself.

Some images of process flows and performance output complement the long theory offered in many of the sections.

Part III: Case studies

We have three very interesting chapters. They share common features: considerable theory and explanation of each case, some code snippets to complement the ideas, valuable images of simple and complex processes, some views or output results and, finally, some JMX configurations.

These three final chapters are:

  • Chapter 12: Case study 1: Krugle
  • Chapter 13: Case study 2: SIREn
  • Chapter 14: Case study 3: LinkedIn

What I liked:

  • A lot of practical sections in each chapter.
  • A lot of theory available.
  • Many complementary tables.
  • Valuable images that help explain complex processes and functions.

What I disliked:

  • Some sections are based only on theory, so you get the idea but not the action.
  • You need to read some sections many times because of the theoretical parts; there are a lot of topics to learn.
  • Many times I wanted a deeper explanation of the source code.

I hope we don't see major API changes in Apache Lucene 3.3.0 compared with the version covered in this book, which is 3.0.

This and other reviews can be read on my blog.

Published at DZone with permission of its author, Manuel Jordan.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)