
Max De Marzi is a seasoned web developer. He started building websites in 1996 and has worked with Ruby on Rails since 2006. The web forced Max to wear many hats and master a wide range of technologies. He can be a system admin, database developer, graphic designer, back-end engineer and data scientist in the course of one afternoon. Max is a graph database enthusiast. He built the Neography Ruby Gem, a REST API wrapper for the Neo4j Graph Database. He is addicted to learning new things, loves a challenge and enjoys finding pragmatic solutions. Max is very easy to work with, focuses under pressure and has the patience of a rock. Max is a DZone MVB, is not an employee of DZone, and has published 57 posts at DZone.

The Last Mile: VisualSearch and Neo4j

07.04.2013

The “last mile” is a term used in the telecommunications industry that refers to delivering connectivity to the customers that will actually be using the system. For graph databases, it refers to how well the end user can extract value and insight from the graph. We’ve already seen an example of this concept with Graph Search, which lets a user express their requests in natural language. Today we’ll see another example. We’ll be taking advantage of the features of Neo4j 2.0 to make this work, so be sure to have read the previous post on the matter.

We’re going to be using VisualSearch.js made by Samuel Clay of NewsBlur. VisualSearch.js enhances ordinary search boxes with the ability to autocomplete faceted search queries. It is quite easy to customize and there is an annotated walkthrough of the options available. You can see what it does in the image below, or click it to try their demo.


VisualSearch

We’ve previously prepared a Neo4j 2.0 graph with Actors, Directors, Producers, Writers, and Users all connected to Movies.

graph2

The first thing we need to do is find the Facets for visualsearch.js. We don’t want to configure this manually, because that would be painful and our graph may change over time. So instead we’ll use the “list_labels” method to get the Labels of our Graph:

  get '/facets' do
    content_type :json
    cache_control :public, :max_age => 600
    facets = []
    categories = $neo.list_labels    
    categories.each do |cat| 
      get_properties(cat).each do |property|
        facets << {:category => cat, :label => cat + "." + property} 
      end
    end
    facets.to_json
  end

One of the nice things we can do is group the properties of a label together. We don’t have a hard schema for what properties are in each Label, but we can query the graph, grab one node and see the properties it has.

    def get_properties(category)
      cypher = "MATCH n:#{category} RETURN n LIMIT 1"
      $neo.execute_query(cypher)["data"].first.first["data"].keys
    end

This will return a JSON array that looks like:

[{"category":"Writer","label":"Writer.born"},
{"category":"Writer","label":"Writer.name"},
{"category":"Actor","label":"Actor.born"},
{"category":"Actor","label":"Actor.name"}
...

We will pass this on to visualsearch.js and have our first drop down working with these grouped label properties.
visual_search_part1
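To see the shape of the facet list without a running Neo4j server, here is a minimal sketch of the same grouping step. The `SAMPLE_PROPERTIES` hash and the `build_facets` helper are hypothetical stand-ins for the `list_labels` and `get_properties` calls above, not part of the application.

```ruby
require "json"

# Hypothetical stand-in for the graph: each label maps to the
# property names found on one sample node of that label.
SAMPLE_PROPERTIES = {
  "Writer" => ["born", "name"],
  "Actor"  => ["born", "name"]
}

# Build one facet per label/property pair, grouped by label.
def build_facets(properties_by_label)
  properties_by_label.flat_map do |label, properties|
    properties.map do |property|
      { :category => label, :label => "#{label}.#{property}" }
    end
  end
end

puts build_facets(SAMPLE_PROPERTIES).to_json
```

Because the real `get_properties` inspects a single node, a property that exists only on other nodes of the same label would not show up as a facet.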

Once a user clicks on one of the properties, we will fill in some of the available options for that property. We can do this with Cypher by MATCHing the nodes of the specified Label that have the property we care about, grouping them so each value appears once, and returning the first 25 unique values in sorted order.

  get '/values/:facet/' do
    content_type :json

    label, key = get_label_and_key(params)
    
    cypher = "MATCH node:#{label} 
              WHERE HAS(node.#{key})
              RETURN node.#{key} AS label, COUNT(*)
              ORDER BY label
              LIMIT 25"
    
    $neo.execute_query(cypher)["data"].collect{|x| x.first.to_s}.compact.flatten.to_json
  end
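The `get_label_and_key` helper is not shown in this post. A minimal sketch of what it might look like follows, assuming the facet arrives as a string such as "Actor.name". The `IDENTIFIER` check is an added assumption: since the label and key are interpolated directly into the Cypher string, it seems prudent to reject anything that is not a plain identifier.

```ruby
# Assumed rule for what counts as a safe label or property name.
IDENTIFIER = /\A[A-Za-z_][A-Za-z0-9_]*\z/

# Hypothetical version of the helper: split "Label.key" into its two parts
# and refuse values that could alter the interpolated Cypher.
def get_label_and_key(params)
  label, key = params[:facet].to_s.split(".", 2)
  unless [label, key].all? { |part| IDENTIFIER.match?(part.to_s) }
    raise ArgumentError, "unexpected facet: #{params[:facet].inspect}"
  end
  [label, key]
end

label, key = get_label_and_key(:facet => "Actor.name")
puts "#{label} / #{key}"
```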

Now we can see some of the values in our search box. In this example, we are grabbing names of Actors in our graph.

visual_search2

The first 25 items are nice, but what if we’re looking for an Actor whose name begins with the letter Z, like “Zach Grenier”? Visualsearch.js gives us the ability to start typing the value and it will reset our options to match.

visual_search 3

We will enhance our previous query by adding a case insensitive regular expression with the term or part of the term we are looking for.

  get '/values/:facet/:term' do
    content_type :json
    
    label, key = get_label_and_key(params)
    
    cypher = "MATCH node:#{label} 
              WHERE HAS(node.#{key}) AND node.#{key} =~ {term}
              RETURN node.#{key} AS label, COUNT(*)
              ORDER BY label
              LIMIT 25"
    
    $neo.execute_query(cypher, {:term => "(?i).*" + params[:term] + ".*"})["data"].collect{|x| x.first.to_s}.compact.flatten.to_json
  end
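The route above places the typed term straight into the regular expression, so a term containing characters such as "." or "(" is interpreted as regex syntax. Here is a small sketch of one way to guard against that. The `contains_pattern` helper is hypothetical, and it assumes that the escaping produced by Ruby's `Regexp.escape` is also understood by the Java regex engine that Cypher uses.

```ruby
# Hypothetical helper: build a case insensitive "contains" pattern
# while treating the user's term as literal text.
def contains_pattern(term)
  "(?i).*" + Regexp.escape(term.to_s) + ".*"
end

puts contains_pattern("Z")
puts contains_pattern("a.b")
```

The result would be passed as the `term` parameter in place of the hand-built string.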

Once we click on Zach Grenier, a few things happen. We get a little message telling us that:

You searched for: Actor.name: “Zach Grenier”. (1 node)

Our search bar comes alive again with the next set of Labels to query on…

visual search 4

… and our graph (currently consisting of just one node) is populated via vivagraph.js. See this previous vivagraph.js post for more information on how this great graph visualization library works.


Now… I know you may be thinking… we populated an Actor node, and now only Movie is available in our drop down. How did that happen? That’s the magic of this application. Instead of just grabbing any next node at random, we are taking the context of our first node and building a path of available connections from there. If we click on “Movie.title”, we call the following method under the covers to get our possibilities:

  post '/connected_values/:facet/' do
    content_type :json
    related_label, related_key = get_label_and_key(params)
    
    match, where, values = prepare_query(params)
    last_node = get_last_node_id(params)
    
    where.pop
    where << "HAS(node#{last_node}.#{related_key})"
    
    cypher  = prepare_cypher(match,where)
    cypher << "WITH LAST(EXTRACT(n in NODES(p) : n.#{related_key}?)) AS label, COUNT(*) AS cnt "
    cypher << "RETURN label ORDER BY label LIMIT 25"    
    
    parameters = prepare_parameters(values)
        
    $neo.execute_query(cypher, parameters)["data"].flatten.collect{|d| d.to_s}.to_json
  end

It looks a little complicated, but all we are doing is just building a cypher query dynamically that will end up looking like this:

MATCH p = node0:Actor -- node1:Movie 
WHERE node0.name? = {value0} AND HAS(node1.title) 
WITH LAST(EXTRACT(n in NODES(p) : n.title?)) AS label, COUNT(*) AS cnt 
RETURN label 
ORDER BY label 
LIMIT 25

This Cypher query will be executed with the parameters {"value0" => "Zach Grenier"}. It will find the Actor node for Zach Grenier in the graph, then find the nodes that are labeled “Movie” and are related to Zach Grenier, and then extract the property “title” from the last node in each path (which happens to be a movie Zach Grenier is in) to give us our answer.
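The `prepare_query` and `prepare_cypher` helpers are not shown in this post. As a rough sketch of the idea, the single hypothetical function below takes the facets chosen so far plus the facet being explored and assembles the same query text and parameters. It uses the Neo4j 2.0 milestone syntax from this post, and it assumes the facet names have already been validated.

```ruby
# Hypothetical query builder, not the application's actual helpers.
# chosen:        ordered [facet, value] pairs, e.g. [["Actor.name", "Zach Grenier"]]
# related_facet: the facet whose values we want next, e.g. "Movie.title"
def build_path_query(chosen, related_facet)
  steps  = chosen + [[related_facet, nil]]
  match  = []
  where  = []
  params = {}

  steps.each_with_index do |(facet, value), i|
    label, key = facet.split(".", 2)
    match << "node#{i}:#{label}"
    if value.nil?
      # Last node in the path: only require that the property exists.
      where << "HAS(node#{i}.#{key})"
    else
      where << "node#{i}.#{key}? = {value#{i}}"
      params["value#{i}"] = value
    end
  end

  related_key = related_facet.split(".", 2).last
  cypher = "MATCH p = #{match.join(' -- ')} " \
           "WHERE #{where.join(' AND ')} " \
           "WITH LAST(EXTRACT(n in NODES(p) : n.#{related_key}?)) AS label, COUNT(*) AS cnt " \
           "RETURN label ORDER BY label LIMIT 25"
  [cypher, params]
end

cypher, params = build_path_query([["Actor.name", "Zach Grenier"]], "Movie.title")
puts cypher
puts params.inspect
```

Each additional facet the user picks simply adds one more node to the MATCH pattern and one more condition to the WHERE clause.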

In our graph, we only have two things connected to Zach Grenier… the Movies “RescueDawn” and “Twister”. Let’s go ahead and click on Twister:

visual search 5

We query the graph for the pattern Actor named “Zach Grenier” that is connected to the movie titled “Twister”. The graph finds this pattern, returns the nodes and relationships within this pattern, and Twister gets added to our graph, connected to Zach Grenier.

The patterns we can create can go beyond just a single hop. For example: an Actor born in 1929 who acted in “Snow Falling on Cedars” alongside Rick Yune, who was also in Ninja Assassin, alongside other actors…

MATCH p = node0:Actor -- node1:Movie -- node2:Actor -- node3:Movie -- node4:Actor 
WHERE node0.born? = {value0} AND node1.title? = {value1} AND node2.name? = {value2} AND node3.title? = {value3} AND HAS(node4.name) 
WITH LAST(EXTRACT(n in NODES(p) : n.name?)) AS label, COUNT(*) AS cnt 
RETURN label 
ORDER BY label 
LIMIT 25

This query will be executed with the parameters: {"value0" => 1929, "value1" => "Snow Falling on Cedars", "value2" => "Rick Yune", "value3" => "Ninja Assassin"}. One of the Actors at the end of the pattern is “Naomie Harris” and once we click on her we get this graph:
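Note that the first parameter is the number 1929 rather than the string "1929", so an equality test against a numeric `born` property can match. The `prepare_parameters` helper is not shown in this post. One hypothetical way to get that result is to convert values that look like whole numbers, as sketched below. This is an assumption about the approach, and it would misfire on a movie whose title is itself a number.

```ruby
# Hypothetical sketch: name the values value0, value1, ... and turn
# strings made only of digits into integers.
def prepare_parameters(values)
  params = {}
  values.each_with_index do |value, i|
    numeric = value.to_s.match?(/\A-?\d+\z/)
    params["value#{i}"] = numeric ? value.to_i : value
  end
  params
end

puts prepare_parameters(["1929", "Snow Falling on Cedars", "Rick Yune", "Ninja Assassin"]).inspect
```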

visual search 6

Don’t just take my word for it, though. Try the live demo, take a look at the source code, and try pointing it at your own Neo4j 2.0 labeled graph.

What’s missing?

This is a dynamic UI that gives an end user quick access to the graph. However, the astute observer will notice something is missing: the relationship types. The patterns we are creating and matching against the graph only care that nodes are connected, not how they are connected, and that might be a very important feature of our graph that we are omitting. Alas, this little project is not the last mile. It is but one step further, and eventually we’ll reach it.
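As a rough illustration of what carrying relationship types might involve, the sketch below builds a MATCH pattern in which each hop can optionally name a type. The `typed_pattern` helper and the relationship type names ACTED_IN and DIRECTED are assumptions made for illustration. They are not taken from the application or confirmed against the example graph.

```ruby
# Hypothetical sketch: each step is [label, rel_type], where rel_type is the
# type of the relationship leading into that node (nil means "any").
def typed_pattern(steps)
  parts = steps.each_with_index.map do |(label, rel_type), i|
    connector =
      if i.zero?
        ""
      elsif rel_type
        "-[:#{rel_type}]- "
      else
        "-- "
      end
    "#{connector}node#{i}:#{label}"
  end
  parts.join(" ")
end

puts typed_pattern([["Actor", nil], ["Movie", "ACTED_IN"], ["Director", "DIRECTED"]])
```

The UI would also need a way for the user to pick a relationship type at each hop, which is the part this post leaves open.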

Help me work on these kinds of problems.

Understanding the power of graphs will give your data architect skills a boost. Don’t let this blog post be the last time you think in graphs. Learn about graphs at one of the dozens of events already on the Calendar and keep an eye out as more get added every week. Take some time to watch these great graph videos from the events you might have missed. Read the Graph Databases book, and of course… subscribe to my blog and follow me on Twitter.

Published at DZone with permission of Max De Marzi, author and DZone MVB. (source)