NoSQL Zone is brought to you in partnership with:

Max De Marzi, is a seasoned web developer. He started building websites in 1996 and has worked with Ruby on Rails since 2006. The web forced Max to wear many hats and master a wide range of technologies. He can be a system admin, database developer, graphic designer, back-end engineer and data scientist in the course of one afternoon. Max is a graph database enthusiast. He built the Neography Ruby Gem, a rest api wrapper to the Neo4j Graph Database. He is addicted to learning new things, loves a challenge and finding pragmatic solutions. Max is very easy to work with, focuses under pressure and has the patience of a rock. Max is a DZone MVB and is not an employee of DZone and has posted 59 posts at DZone. You can read more from them at their website. View Full User Profile

Java Network/Graph/Data Mining Algorithm 'Arsenal' on Neo4j: Part 2

03.17.2012
| 6290 views |
  • submit to reddit

A few weeks ago I showed you how to visualize a graph using the chord flare visualization and how to visualize a network using a force directed graph visualization from D3.js.

On Twitter Claire Willett from Riparian Data asked:


This post on Graphs Beyond the Hairball by Robert Kosara explains why some non-traditional graph visualizations may work better and links us to an article explaining what a Node Quilt is and how it’s useful. We’re going to just take the first step and build a Matrix representation of a graph. We will use one of the JUNG clustering algorithms to help us understand it.

The study of social networks goes back to at least the ancient Greeks, but we won’t go back that far in time today… just to 1977. A man named Wayne Zachary recorded the interactions of a Karate Club at a University for 2 years. During this time a conflict developed between the administrator and the instructor and the club broke up into two. Turns out you could predict which club each member would belong to by building a graph of their weighted relationships and spliting it along the minimum-cut.

We’re working with Neo4j, and you all know Neo knows Kung Fu, so we’ll do something a little different. Plus looking at a two node cluster is kinda lame… let’s go bigger. We’re going to be using Neo4j with the JUNG jars already in the lib directory. Refer to JUNG in Neo4j – Part 1 if you need help with that. Let’s create our graph:

def create_graph
  neo = Neography::Rest.new
  graph_exists = neo.get_node_properties(1)
  return if graph_exists && graph_exists['name']

  names = %w[Aaron Achyuta Adam Adel Agam Alex Allison Amit Andreas Andrey 
             Andy Anne Barry Ben Bill Bob Brian Bruce Chris Corey 
             Dan Dave Dean Denis Eli Eric Esteban Ezl Fawad Gabriel 
             James Jason Jeff Jennifer Jim Jon Joe John Jonathan Justin 
             Kim Kiril LeRoy Lester Mark Max Maykel Michael Musannif Neil]

  commands = names.map{ |n| [:create_node, {"name" => n}]}

  names.each_index do |from| 
    commands << [:add_node_to_index, "nodes_index", "type", "User", "{#{from}}"]
    connected = []

    # create clustered relationships
    members = 5.times.collect{|x| x * 10 + (from % 10)}
    members.delete(from)
    rels = 1 + rand(4)
    rels.times do |x|
      to = members[x]
      connected << to
      commands << create_rel(from, to) unless to == from
    end    

    # create random relationships
    rels = 1 + rand(4)
    rels.times do |x|
      to = rand(names.size)
      commands << create_rel(from, to) unless (to == from) || connected.include?(to)
    end    
   end
   batch_result = neo.batch *commands
end

I am once again using the first 50 names from the Graph DataBase- Chicago Meet-Up group. You’ve seen me do this once or twice already, so I won’t go over it in detail, but as you can see above, I am forcing small clusters of relationships to exist along with random relationships. Our relationships have a weight property, so the create_rel method is just this:

def create_rel(x,y,z= 1 + rand(10))
  [:create_relationship, "knows", "{#{x}}", "{#{y}}", {:weight => z}]
end

Our visualization is expecting a list of nodes already in groups, and a list of relationships that includes it’s strength. The JSON object looks like this:

{"nodes":[{"name":"Myriel","group":1},
          {"name":"Napoleon","group":1},
          {"name":"Mlle.Baptistine","group":2}],
 "links":[{"source":1,"target":0,"value":1},
          {"source":2,"target":0,"value":8},
          {"source":3,"target":0,"value":10}]
}

Getting our nodes is pretty easy, we’ll use Gremlin to retrieve all but the root node.

def get_nodes
  neo = Neography::Rest.new
  neo.execute_script("g.V.filter{it.id != 0}.transform{[it.id,it.name]}")
end 

Getting the relationships is also pretty ease, to switch it up, we’ll use Cypher.

def get_relationships
  neo = Neography::Rest.new
  cypher_query =  " START a = node:nodes_index(type='User')"
  cypher_query << " MATCH a-[r:knows]-b"
  cypher_query << " RETURN ID(a), ID(b), r.weight"
  neo.execute_query(cypher_query)["data"]
end 

Now comes the fun part. To get our clusters, we’ll be using the Voltage Clusterer Class. We don’t want the root node getting in the way, so we’ll create a sub-graph using TinkerGraph that excludes it (you could do something similar to cluster just a small part of a larger graph). We then pass this graph on to GraphJung and set our VoltageClusterer to try to get 10 clusters for us.

def get_voltage_clusters
  neo = Neography::Rest.new
  lg = neo.execute_script("import edu.uci.ics.jung.algorithms.cluster.VoltageClusterer;
                            to = new TinkerGraph();
                            g.V.filter{it.id != 0}.sideEffect{toVertex = to.addVertex(it.getId()); 
                                           ElementHelper.copyProperties(it, toVertex);
                            }.iterate();   
                            g.E.sideEffect{toEdge = to.addEdge(it.getId(), 
                                                    to.v(it.getOutVertex().getId()), 
                                                    to.v(it.getInVertex().getId()),
                                                    it.getLabel());
                                            ElementHelper.copyProperties(it, toEdge);
                            }.iterate();
                            vc = new VoltageClusterer(new GraphJung(to), 10);
                            vc.cluster(10).id;
                            ")
end

We can then put it all together to build that JSON object.

get '/cluster' do
  clusters = Hash.new
  get_voltage_clusters.each_with_index {|item, index| item.each{|i| clusters[i.to_i] = index + 1} }
  nodes = get_nodes.map{|n| {"name" => n[1], "group" => clusters[n[0]]}}
  relationships = get_relationships.map{|r| {"source" => r[0] - 1, "target" => r[1] - 1, "value" => r[2]} }
  {:nodes => nodes, :links => relationships}.to_json
end	

Our visualization was done by Mike Bostock with D3.js. As always, the code is on Github. Click on the image below to see the live example on Heroku.

 

Note: The version on Heroku is using a pre-generated JSON file.

 

 

Published at DZone with permission of Max De Marzi, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)