Tony Russell-Rose is director of UXLabs, a UX research and design consultancy specializing in complex search and information access applications. Previously, Tony led R&D teams at Canon, Reuters, HP Labs and BT Labs, and seems happy to work for pretty much any organization that has 'Labs' in the title. He has a PhD in Artificial Intelligence and is the author of Designing the Search Experience (Morgan Kaufmann, 2012). Tony is a DZone MVB, is not an employee of DZone, and has posted 26 posts at DZone.

How Do You Measure the Impact of Tagging on Search Retrieval?

05.30.2012

A client of mine wants to measure the difference between manual tagging and auto-classification of unstructured documents, focusing in particular on their impact on retrieval (i.e. relevance ranking). At the moment they are considering two contrasting approaches:

  1. Create a list of all the insertions and deletions (i.e. instances where the auto and manual tags differ for a given document), and sort it by frequency. Take those that appear more than a given number of times (say, 20), and count how often they appear as search terms in the top 1000 queries for the past 6 months. Include exact matches (where a tag and a query term are identical) and partial matches (where a tag is wholly included in a query), but exclude everything else. For tags that don't appear in the top 1000, assume a notional frequency of, say, 70. Then divide the resulting total by the total number of queries over the past 6 months. This gives a measure of how important those insertions and deletions are, and thus of the impact of manual tagging on retrieval.

  2. Run a controlled experiment in which the tagging condition (manual vs. automatic) is the independent variable and relevance ranking is the dependent variable. Use a benchmark set of queries and relevance judgements, and calculate precision and recall for each condition.
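Both approaches can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the client's actual implementation: the input shapes (a dictionary of tag sets per document for each tagging method, a query log mapping each top query to its frequency, and ranked result lists per benchmark query), all function names, the token-based reading of a "partial match", and the use of precision and recall at a cutoff `k` are hypothetical choices made for illustration.

```python
from collections import Counter


# Approach 1: weight the manual/auto tag differences by query-log frequency.

def differing_tags(manual, auto):
    """Count, across all documents, tags on which manual and auto tagging disagree.

    manual, auto: dict mapping a document id to its set of tags (assumed shape).
    A tag assigned by only one of the two methods (an insertion or deletion)
    counts once per document.
    """
    counts = Counter()
    for doc_id in manual.keys() | auto.keys():
        counts.update(manual.get(doc_id, set()) ^ auto.get(doc_id, set()))
    return counts


def matches(tag, query):
    """True for an exact match, or if the tag's words occur contiguously in the query.

    Token-based containment is an assumed reading of "wholly included", chosen so
    that the tag "tax" does not match the query "taxation law".
    """
    t, q = tag.lower().split(), query.lower().split()
    return any(q[i:i + len(t)] == t for i in range(len(q) - len(t) + 1))


def tag_difference_impact(manual, auto, top_queries, total_queries,
                          min_count=20, notional=70):
    """top_queries: dict mapping each top query (e.g. top 1000) to its frequency."""
    frequent = [tag for tag, n in differing_tags(manual, auto).items()
                if n > min_count]
    weighted = 0
    for tag in frequent:
        hits = sum(f for query, f in top_queries.items() if matches(tag, query))
        # Tags absent from the top queries get the notional frequency instead.
        weighted += hits if hits else notional
    return weighted / total_queries


# Approach 2: compare precision and recall between the two tagging conditions.

def precision_recall(retrieved, relevant, k=10):
    """Precision and recall at cutoff k for a single query."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    recall = hits / len(relevant) if relevant else 0.0
    return hits / k, recall


def evaluate(run, judgements, k=10):
    """Mean precision and recall at k over a benchmark for one tagging condition.

    run: query -> ranked list of doc ids returned under this condition.
    judgements: query -> set of doc ids judged relevant.
    """
    scores = [precision_recall(run.get(query, []), relevant, k)
              for query, relevant in judgements.items()]
    if not scores:
        return 0.0, 0.0
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n
```

Approach 1 yields a single weighted ratio for the whole collection. For approach 2, one would call `evaluate` once with the ranked results from the manually tagged index and once with those from the auto-classified index, using the same `judgements` for both, and compare the two pairs of scores.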

Surprisingly (to me, at least), there seems to be some debate as to which is the better approach.

Which one would you choose, and why?

Published at DZone with permission of Tony Russell-Rose, author and DZone MVB.
