Max De Marzi is a seasoned web developer. He started building websites in 1996 and has worked with Ruby on Rails since 2006. The web forced Max to wear many hats and master a wide range of technologies. He can be a system admin, database developer, graphic designer, back-end engineer and data scientist in the course of one afternoon. Max is a graph database enthusiast. He built the Neography Ruby gem, a REST API wrapper for the Neo4j graph database. He is addicted to learning new things, loves a challenge and enjoys finding pragmatic solutions. Max is very easy to work with, focuses under pressure and has the patience of a rock. Max is a DZone MVB, is not an employee of DZone, and has posted 60 posts at DZone. You can read more from him at his website.

Summarize Opinions With a Graph – Part 1

08.13.2012

How does the saying go? Opinions are like bellybuttons, everybody’s got one? So let’s say you have an opinion that NoSQL is not for you. Maybe you read my blog and think this graph database stuff is great for recommendation engines and path finding and maybe some other stuff, but you’ve got really hard problems and it can’t help you.

I am going to try to show you that a graph database can help you solve your really hard problems if you can frame your problem in terms of a graph. Did I say “you”? I meant anybody, especially Ph.D. students. One trick is to search for “graph-based approach to” and your problem.

I’ll give you an example. The other day I ran into “Opinosis: A Graph-Based Approach to Abstractive Summarization of Highly Redundant Opinions,” by Kavita Ganesan, ChengXiang Zhai and Jiawei Han at the University of Illinois at Urbana-Champaign. Here is the abstract:

We present a novel graph-based summarization framework (Opinosis) that generates concise abstractive summaries of highly redundant opinions. Evaluation results on summarizing user reviews show that Opinosis summaries have better agreement with human summaries compared to the baseline extractive method. The summaries are readable, reasonably well-formed and are informative enough to convey the major opinions.

What does that mean? It means Opinosis takes the free-form text people write in reviews, aggregates it, and makes something useful out of it, so I can read one sentence instead of 1,000 when I want to know what the reviews say.

How is this useful? Most companies want to know what their customers are saying about them, but nobody has time to read 1,000 responses to that customer survey. So generate a summary instead. eBay feedback? Twitter posts about a specific hashtag? Text of support e-mails? You get the picture.

Let’s dive into what this means with an example that everyone is familiar with: e-commerce.

You can see the 1 to 5 star ratings and you already know how to build a recommendation algorithm out of this. We also know how to predict what the star rating of the user will be using personalization, but we want to ask a different question. Can we summarize what people are saying about this product? We want to do this because all our competitors are also giving items 1-5 star ratings, and they are also telling you what rating they think you’ll give this item. But it’s not enough. We turned to graph databases to get that little bit extra. That feature none of our competitors are offering, that secret sauce, that edge.

We are going to take the things people are saying about the products we sell and generate a graph out of them, find the paths most traveled, and combine them to build our summary. An illustration might help:

Today we are just going to look at Step 1. Our input is going to be these two sentences:

My phone calls drop frequently with the iPhone.

Great device, but the calls drop too frequently.

With these, we can generate the following graph:

One interesting property of this graph is that it naturally captures redundancies. The paths shared by the two sentences are captured by the nodes, and this sharing is what allows us to have high confidence in the summaries we build.
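To make the redundancy idea concrete, here is a minimal sketch of the word graph for the two sentences above. It is a simplification under my own assumptions, not the Opinosis implementation: words are lowercased, punctuation marks are treated as tokens, part-of-speech tags are ignored, and the names `tokenize` and `build_word_graph` are hypothetical. Each unique word becomes one node that remembers every (sentence, position) where it appeared, and a directed edge links each word to the word that follows it.

```python
import re
from collections import defaultdict


def tokenize(sentence):
    # Assumption: lowercase everything and keep punctuation as separate tokens.
    return re.findall(r"[a-z0-9']+|[.,!?]", sentence.lower())


def build_word_graph(sentences):
    positions = defaultdict(list)  # word -> [(sentence_id, position), ...]
    edges = defaultdict(set)       # word -> set of words that directly follow it
    for sid, sentence in enumerate(sentences, start=1):
        tokens = tokenize(sentence)
        for pid, word in enumerate(tokens, start=1):
            positions[word].append((sid, pid))
            if pid < len(tokens):
                # pid is 1-based, so tokens[pid] is the next token.
                edges[word].add(tokens[pid])
    return positions, edges


sentences = [
    "My phone calls drop frequently with the iPhone.",
    "Great device, but the calls drop too frequently.",
]
positions, edges = build_word_graph(sentences)

# A node is shared when it was seen in more than one sentence.
shared = sorted(w for w, refs in positions.items() if len({s for s, _ in refs}) > 1)

print(shared)                 # ['.', 'calls', 'drop', 'frequently', 'the']
print(positions["calls"])     # [(1, 3), (2, 6)]
print(sorted(edges["drop"]))  # ['frequently', 'too']
```

The shared nodes "calls", "drop" and "frequently" are exactly where the two sentences overlap, which is the redundancy the graph captures for free.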

Another property the graph has is that it can handle gaps between words, which helps us see the redundancy and allows us to discover new sentences.
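Here is a small sketch of what tolerating gaps can look like, again under my own assumptions rather than as the paper's exact method. The helper `supporting_sentences` and its `max_gap` parameter are hypothetical names, and the sentences are pre-tokenized by hand. It reports which sentences contain the second word within `max_gap` positions after the first.

```python
def supporting_sentences(sentences, first, second, max_gap):
    """Return ids of sentences where `second` follows `first` within max_gap positions."""
    supporters = []
    for sid, tokens in enumerate(sentences, start=1):
        firsts = [i for i, t in enumerate(tokens) if t == first]
        seconds = [i for i, t in enumerate(tokens) if t == second]
        if any(0 < j - i <= max_gap for i in firsts for j in seconds):
            supporters.append(sid)
    return supporters


# Hand-tokenized versions of the two example sentences (lowercased).
sentences = [
    "my phone calls drop frequently with the iphone .".split(),
    "great device , but the calls drop too frequently .".split(),
]

strict = supporting_sentences(sentences, "drop", "frequently", max_gap=1)
relaxed = supporting_sentences(sentences, "drop", "frequently", max_gap=2)

print(strict)   # [1]
print(relaxed)  # [1, 2]
```

With no gap allowed, only the first sentence backs "drop frequently". Allowing a gap of one extra word lets "drop too frequently" count as well, so both sentences now support the same phrase.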

A third interesting property about this graph is that it allows us to join similar sentences together:

Think about how these properties are going to help us build a summary that represents what our users are saying, and we’ll tackle building the graph in part 2.

If you want to take a sneak peek, take a look at the Opinosis presentation. It goes over each step in depth. You can find out more on Kavita’s website.

Published at DZone with permission of Max De Marzi, author and DZone MVB. (source)
