Chris Hostetter is Senior Staff Engineer at Lucid Imagination, a member of the Apache Software Foundation, and serves as a committer on the Apache Lucene/Solr Projects. Prior to joining Lucid Imagination in 2010 to work full time on Solr development, he spent 11 years as a Principal Software Engineer for CNET Networks thinking about searching “structured data” that was never as structured as it should have been.

Solr Powered ISFDB – Part #9: Autocomplete

11.26.2011
This is Part 9 in a series of 11 (so far) articles by Chris Hostetter in 2011 on Indexing and Searching the ISFDB.org data using Solr.

When we left off last time, I had upgraded the version of Solr I was using from 1.4.1 to the newly released 3.1. Today I want to improve the functionality of my Velocity UI by adding autocomplete support for what the user types into the search box.

(If you are interested in following along at home, you can check out the code from github. I’m starting at the blog_8 tag, and as the article progresses I’ll link to specific commits where I changed things, leading up to the blog_9 tag containing the end result of this article.)

Getting Started: Borrowing Code

One of the nice additions to the example Velocity templates in Solr 3.1 is the use of the jQuery Autocomplete Plugin. So the first step I’m going to take in adding this functionality to my own templates (which, as you may recall, were copied from 3.1 in the first place) is to look at how the functionality is hooked in there, and reuse the same ideas.

As little as I understand about Velocity templates or javascript, I do know how to use “grep”, and it looks like the crux of the functionality comes from two main pieces…

  • head.vm includes the jQuery autocomplete files, and then registers an “autocomplete” callback function with jQuery that seems to be hitting the “/terms” URL using the “suggest” template
  • suggest.vm is a simple template that looks like it just outputs a plain text list of the terms

Since my Velocity is rustier than my javascript, that last item is the most confusing to me, but skimming the jQuery autocomplete() docs, that does in fact seem to be the format expected, so I’ll roll with it. All in all this seems like it will be fairly straightforward.

In fact, I apparently never removed the autocomplete hooks in head.vm and suggest.vm back when I first “borrowed” the 3.1 templates, so really the question isn’t how to make it work, but why isn’t it already working? The answer seems to be the “/terms” path. Even though I reused most of the Velocity templates, I created a much simpler solrconfig.xml file for myself, so I need to add that request handler using the example configs as my template, and tweak the terms.fl in my head.vm to better match my schema.

So Why Isn’t It Working?

After making these changes, I can now see “successful” requests being made to the “/terms” component in my Solr logs when I start typing in my search box…

[java] INFO: [] webapp=/solr path=/terms params={limit=10&timestamp=1302294601828&terms.fl=catchall&q=a&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=A} status=0 QTime=0
[java] Apr 8, 2011 1:30:02 PM org.apache.solr.core.SolrCore execute
[java] INFO: [] webapp=/solr path=/terms params={limit=10&timestamp=1302294602831&terms.fl=catchall&q=as&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=As} status=0 QTime=1
[java] Apr 8, 2011 1:30:03 PM org.apache.solr.core.SolrCore execute
[java] INFO: [] webapp=/solr path=/terms params={limit=10&timestamp=1302294603672&terms.fl=catchall&q=asi&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=Asi} status=0 QTime=1
[java] Apr 8, 2011 1:30:04 PM org.apache.solr.core.SolrCore execute
[java] INFO: [] webapp=/solr path=/terms params={limit=10&timestamp=1302294604604&terms.fl=catchall&q=asim&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=Asim} status=0 QTime=1
[java] Apr 8, 2011 1:30:05 PM org.apache.solr.core.SolrCore execute
[java] INFO: [] webapp=/solr path=/terms params={limit=10&timestamp=1302294605490&terms.fl=catchall&q=asimo&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=Asimo} status=0 QTime=0
[java] Apr 8, 2011 1:30:06 PM org.apache.solr.core.SolrCore execute
[java] INFO: [] webapp=/solr path=/terms params={limit=10&timestamp=1302294606678&terms.fl=catchall&q=asimov&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=Asimov} status=0 QTime=0

…and yet in spite of this, I’m not getting any autocomplete suggestions. So what’s going wrong? More importantly, how can I tell what’s going wrong?

I’m going to start with the assumption that every piece of the system is doing its job properly according to how it is configured, and that I screwed something up in the setup/configuration. (I find that in life in general, when something goes wrong, it’s a good idea to assume it’s my fault until I can prove otherwise.) So to start with, let’s see what some of these “/terms” requests are producing. When I load http://localhost:8983/solr/terms?limit=10&timestamp=1302294432177&terms.fl=catchall&q=asi&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=Asi in my browser, the first and most obvious thing that jumps out at me is that it’s totally blank. This reaffirms the belief that jQuery isn’t broken (how can it give suggestions if there’s no data?), which means we’ve already narrowed the problem space down considerably.

The next steps are to eliminate some more pieces of the puzzle and/or gather more data. By eliminating “suggest.vm” from the equation, I should be able to do both: http://localhost:8983/solr/terms?limit=10&timestamp=1302294432177&terms.fl=catchall&q=asi&terms.sort=count&terms.prefix=Asi gives me back the following response…

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
</lst>
<lst name="terms">
  <lst name="catchall"/>
</lst>
</response>
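The debugging step above amounts to sending the same request twice, once with the Velocity parameters and once without them, so that Solr's raw XML output can be inspected. A minimal Python sketch of that idea follows; the helper name `terms_url` is made up for illustration, and the base URL simply assumes the local example server used in this article.

```python
from urllib.parse import urlencode

# Assumption: Solr is running locally on the example port, as in the article.
SOLR_BASE = "http://localhost:8983/solr"


def terms_url(params, raw=False):
    """Build a /terms URL (hypothetical helper).

    With raw=True the Velocity params are dropped, so the response comes
    back in Solr's default XML format instead of going through suggest.vm.
    """
    query = dict(params)
    if raw:
        query.pop("wt", None)
        query.pop("v.template", None)
    return SOLR_BASE + "/terms?" + urlencode(query)


# The same parameters jQuery sent in the logs above (timestamp omitted).
params = {
    "limit": 10,
    "terms.fl": "catchall",
    "q": "asi",
    "wt": "velocity",
    "terms.sort": "count",
    "v.template": "suggest",
    "terms.prefix": "Asi",
}

templated = terms_url(params)       # what the autocomplete call requests
raw = terms_url(params, raw=True)   # same request, minus the template
print(templated)
print(raw)
```

Comparing the two responses side by side is what separates "the template is wrong" from "Solr has no data for this request".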

So far so good. I’ve now (mostly) ruled out Velocity (and my suggest.vm template) as the cause of the problem, but I’ve also noticed something about my request that I didn’t notice before: ...&q=asi&...&terms.prefix=Asi. jQuery is sending a lowercase version of my input in the “q” param (that appears to be its default behavior), but it’s sending the original case as the “terms.prefix” param (thinking back to my head.vm changes, that’s something explicitly being requested as part of the “extraParams”). In my schema.xml, “catchall” uses the LowerCaseFilterFactory, which means there are no indexed terms in that field that contain uppercase characters, so a prefix of “Asi” can never match.

There may be a way to ask jQuery to pass the same lowercase value it uses in the “q” param by default to the “terms.prefix” param, but since it wasn’t immediately obvious to me, I went with something I was a little more confident of and just did it myself using javascript.
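To see why the case mismatch produces an empty result, here is a small Python sketch of case-sensitive prefix matching against lowercased terms. It only illustrates the idea; the actual fix lives in the javascript in head.vm, and the function name `prefix_matches` is invented for this example. The terms are a handful taken from the “/terms” response shown later in this article.

```python
# A few indexed terms from the "catchall" field; all lowercase because the
# field's analyzer includes LowerCaseFilterFactory.
indexed_terms = ["asimov", "asimov's", "asire", "aside", "asia"]


def prefix_matches(prefix, terms):
    """Case-sensitive prefix match (hypothetical helper): the prefix is
    compared against the indexed terms exactly as given, with no analysis."""
    return [term for term in terms if term.startswith(prefix)]


as_typed = prefix_matches("Asi", indexed_terms)
lowercased = prefix_matches("Asi".lower(), indexed_terms)

print(as_typed)    # nothing starts with an uppercase "A"
print(lowercased)  # every sample term starts with "asi"
```

Lowercasing on the client works here only because the field's analysis is a simple lowercase; a field with heavier analysis (stemming, for example) would need more care.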

So Why Is It Still Not Working?

Now when I enter “Asi” in the search box, I see the lowercase values showing up in my logs…

[java] INFO: [] webapp=/solr path=/terms params={limit=10&timestamp=1302296185396&terms.fl=catchall&q=a&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=a} status=0 QTime=115
[java] Apr 8, 2011 1:56:27 PM org.apache.solr.core.SolrCore execute
[java] INFO: [] webapp=/solr path=/terms params={limit=10&timestamp=1302296187320&terms.fl=catchall&q=as&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=as} status=0 QTime=3
[java] Apr 8, 2011 1:56:27 PM org.apache.solr.core.SolrCore execute
[java] INFO: [] webapp=/solr path=/terms params={limit=10&timestamp=1302296187960&terms.fl=catchall&q=asi&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=asi} status=0 QTime=2

…but there are still no suggestions. Retracing the same steps I used before, I see that with the “suggest.vm” template I’m still getting a blank response, but when I just look at the raw XML output I see…

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
</lst>
<lst name="terms">
  <lst name="catchall">
    <int name="asimov">2820</int>
    <int name="asimov's">2256</int>
    <int name="asire">32</int>
    <int name="aside">16</int>
    <int name="asia">15</int>
    <int name="asimovs">9</int>
    <int name="asian">6</int>
    <int name="asiatic">5</int>
    <int name="asim">3</int>
    <int name="asis">2</int>
  </lst>
</lst>
</response>

So let’s take another look at that suggest.vm velocity template. It didn’t really occur to me before, but it’s referring to $response.response.terms.name — I thought that “name” was generic, but now I’m guessing it was actually in reference to the fact that the example templates were using the “name” field for autocomplete. So I need to change that to “catchall” and now my jQuery autocomplete is working as expected.
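The job of suggest.vm can be mimicked outside of Velocity to make the bug easier to see. The Python sketch below is not the template itself: it parses a trimmed copy of the XML response above and prints the terms for one field as plain text, assuming one term per line is what the autocomplete plugin wants. The function name `terms_as_plain_text` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the raw /terms response shown above.
RAW_RESPONSE = """
<response>
<lst name="terms">
  <lst name="catchall">
    <int name="asimov">2820</int>
    <int name="asimov's">2256</int>
    <int name="asire">32</int>
  </lst>
</lst>
</response>
"""


def terms_as_plain_text(xml_text, field):
    """Return the terms listed for `field`, one per line (hypothetical helper).

    An empty string is returned when the response has no entry for that
    field, which mirrors the blank page the template produced.
    """
    root = ET.fromstring(xml_text)
    field_node = root.find("./lst[@name='terms']/lst[@name='%s']" % field)
    if field_node is None:
        return ""
    return "\n".join(node.get("name") for node in field_node.findall("int"))


wrong_field = terms_as_plain_text(RAW_RESPONSE, "name")      # the old template
right_field = terms_as_plain_text(RAW_RESPONSE, "catchall")  # after the fix
print(repr(wrong_field))
print(right_field)
```

Asking for a field that is not in the response fails silently rather than loudly, which is why the blank output was so hard to interpret.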

Great! Now What?

I’ve now got functional autocomplete working in my index, just like the example Solr velocity templates, but there are some things I don’t like about this setup that I want to fix…

  • I need to use a better field for picking suggestions. “catchall” was a choice I made on a whim because I knew it would contain both words from titles and words from author names, and I wanted both to work in autocomplete, but the “catchall” field also contains a lot of other crap that probably won’t be useful (e.g., right now if you type “h” it suggests “http” because URLs are copied into “catchall”).
  • The configs for autocomplete are spread around in too many files: head.vm, suggest.vm, and solrconfig.xml all needed to be changed to make this work, and will likely all need to be changed to switch the field as well. It would be nice to consolidate and simplify this.
  • TermsComponent is a nice simple way to get autocomplete suggestions based on prefix matching — but new in Solr 3.1 is the Suggester plugin which is the New Hotness for how to do autocomplete (and spelling suggestions)

So I’m going to set out to make some improvements to what I’ve got, starting with the switch to using “Suggester”, but with an eye to the other two problems as I go along.

Suggester

The Suggester class is really just a new type of “dictionary” implementation for the SpellCheckComponent that has some nice properties (so I’m told) for generating autocomplete suggestions. Based on the wiki, I added a new “/suggest” request handler to my configs that uses the SpellCheckComponent with a Suggester based on my catchall field. The one change I made to the example was to specify a threshold of “0.0”, meaning that (for now) I want all terms in my field to be used in the dictionary.
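As I understand the wiki's description, the threshold is the minimum fraction of documents a term must appear in before it is added to the dictionary. The Python sketch below illustrates that reading only; it is not Solr's implementation, the function name `dictionary_terms` is invented, and the total document count is a made-up number. The per-term counts are borrowed from the “/terms” response earlier in this article.

```python
def dictionary_terms(doc_freqs, total_docs, threshold):
    """Keep the terms that appear in at least `threshold` (a fraction between
    0 and 1) of all documents. Hypothetical helper, not Solr code."""
    return [term for term, doc_freq in doc_freqs.items()
            if doc_freq / total_docs >= threshold]


# Counts taken from the /terms output above.
doc_freqs = {"asimov": 2820, "asimov's": 2256, "asire": 32, "asis": 2}

# Assumption: the real index size is not stated in the article.
TOTAL_DOCS = 100_000

everything = dictionary_terms(doc_freqs, TOTAL_DOCS, 0.0)
frequent_only = dictionary_terms(doc_freqs, TOTAL_DOCS, 0.005)
print(everything)     # a threshold of 0.0 keeps every term
print(frequent_only)  # a higher threshold drops the rare terms
```

Starting at 0.0 keeps rare but legitimate terms available as suggestions; raising it later is a cheap way to filter out noise.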

Unlike the TermsComponent, which scans the main index for terms, the SpellChecker uses its own data structures that must be explicitly built up from the source data in the index. The configuration I used includes an instruction to “buildOnCommit”, so anytime I update the index the dictionary will also be updated. However, according to the docs, the “Lookup” implementations used by Suggester don’t persist any data to disk, so when you first start up the server there won’t be any suggestions by default. So I also added a “firstSearcher” listener to ensure the Suggester dictionary would be built in this case.

So with that, my “/suggest” handler is always up and running and ready to go, but it’s not currently giving back very useful results — most likely because of my field choice.

Better Input, Better Output

The main things I want to autocomplete on are author names and titles, so the first step is to create a new field for that purpose using copyField. Doing that gets me some nice looking results for input like “isaac asim”…

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="isaac">
      <int name="numFound">5</int>
      <int name="startOffset">0</int>
      <int name="endOffset">5</int>
      <arr name="suggestion">
        <str>isaac</str>
        <str>isaacs</str>
        <str>isaac's</str>
        <str>isaacson</str>
        <str>isaacman</str>
      </arr>
    </lst>
    <lst name="asim">
      <int name="numFound">4</int>
      <int name="startOffset">6</int>
      <int name="endOffset">10</int>
      <arr name="suggestion">
        <str>asimov's</str>
        <str>asimov</str>
        <str>asimovs</str>
        <str>asim</str>
      </arr>
    </lst>
    <str name="collation">isaac asimov's</str>
  </lst>
</lst>

This isn’t in the same format as the TermsComponent, but since we’re going to use a velocity template to reformat it, that won’t hurt anything. The real juicy looking bit is the “collation” value, where the SpellChecker suggests combinations of individual suggestions. By default it only gives you one, but we can increase that to get a nice list of multi-word suggestions in the “collation” section.
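To make the shape of that response concrete, here is a Python sketch that pulls every “collation” value out of a spellcheck response. It is an illustration only, not the Velocity template; the function name `collations` is hypothetical. The first collation comes from the output above, and the second is an invented example of what a request for more than one collation might add.

```python
import xml.etree.ElementTree as ET

SPELLCHECK_RESPONSE = """
<response>
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="asim">
      <int name="numFound">4</int>
      <arr name="suggestion">
        <str>asimov's</str>
        <str>asimov</str>
      </arr>
    </lst>
    <str name="collation">isaac asimov's</str>
    <!-- Hypothetical second collation, added only for illustration -->
    <str name="collation">isaac asimov</str>
  </lst>
</lst>
</response>
"""


def collations(xml_text):
    """Return every "collation" value in document order (hypothetical helper).

    The key can repeat inside "suggestions", so all matches are collected
    rather than just the first one.
    """
    root = ET.fromstring(xml_text)
    path = ("./lst[@name='spellcheck']/lst[@name='suggestions']"
            "/str[@name='collation']")
    return [node.text for node in root.findall(path)]


multi_word = collations(SPELLCHECK_RESPONSE)
print("\n".join(multi_word))
```

The detail that matters is the repeated key: code that treats “suggestions” as a plain map would only ever see one collation.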

Use Our New Suggestions And Clean Up The Configs

To use our new suggestions, we need to get them in the format jQuery expects. At first I wasn’t sure how to make the suggest.vm Velocity template return all the values for the “collation” key in the “suggestions” NamedList, but it only took a little experimentation (and knowledge of the NamedList API) to get it working. A nice side effect of the change to the Suggester based approach is that it’s no longer necessary to know the field being used for suggestions in the Velocity template, so my goal of cleaner configs is already making progress.

Another nice perk of switching to the Suggester is that it uses the “q” param for its input, so the “terms.*” params can be removed from our jQuery autocomplete call. We can also move the “wt” and “v.template” params out of our jQuery call and into our “/suggest” defaults, giving us a nice clean separation between how we configure our suggestions and where we use them.

Last, but not least: the default behavior of the jQuery autocomplete is to submit the first suggestion from the list if the user hits “return”, even if the user didn’t select it. I consider that asinine; if the user wants to search for a word that isn’t the first one in the suggestion list, they should be allowed to. Fortunately, I noticed in the docs an easy way to change that.

Conclusion (For Now)

And that wraps up this latest installment with the blog_9 tag. The Search UI for our ISFDB Solr Index now has a nicely functioning javascript autocomplete feature that didn’t really require learning anything about javascript. In my next post, I plan to continue talking about how to improve the user experience, but I’ll switch tacks a bit to talk more about tuning the ranking of results than about the UI itself.


Source: http://www.lucidimagination.com/blog/2011/04/08/solr-powered-isfdb-part-9/
Published at DZone with permission of its author, Chris Hostetter.

Comments

Amara Amjad replied on Sun, 2012/03/25 - 1:02am

This article is very helpful. But one thing I cannot achieve is multiple collations.

“By default it only gives you one, but we can increase that to get a nice list of multi-word suggestions in the “collation” section.”

No matter what number I pick for “spellcheck.maxCollations”, I get 1 collation.

Did anybody experience the same problem?
