Mark is a graph advocate and field engineer for Neo Technology, the company behind the Neo4j graph database. As a field engineer, Mark helps customers embrace graph data and Neo4j building sophisticated solutions to challenging data problems. When he's not with customers Mark is a developer on Neo4j and writes his experiences of being a graphista on a popular blog at http://markhneedham.com/blog. He tweets at @markhneedham.

Neo4j + Lucene: Querying a Graph Ontology

04.20.2012

I’ve been playing around with neo4j using the neography gem to create a graph of all the people in ThoughtWorks and the connections between them based on working with each other.

I created a UI where you can type in the names of two people and see when they’ve worked together, or the shortest path between them if they haven’t.

I thought it would be cool to have auto complete functionality when typing in a name but I couldn’t figure out how to partially query the index of people’s names that I’d created.

I have this Lucene index:

@neo = Neography::Rest.new
@neo.create_node_index("people", "fulltext", "lucene")

Which I add to like this:

node = @neo.create_node("name" => "Mark Needham")
@neo.add_node_to_index("people", "name", "Mark Needham", node)

Looking up the exact name then returns the node:

> @neo.get_index("people", "name", "Mark Needham")
=> [{"indexed"=>"http://localhost:7474/db/data/index/node/people/name/Mark%20Needham/979", "outgoing_relationships"=>"http://localhost:7474/db/data/node/979/relationships/out", "data"=>{"name"=>"Mark Needham"}, "traverse"=>"http://localhost:7474/db/data/node/979/traverse/{returnType}", "all_typed_relationships"=>"http://localhost:7474/db/data/node/979/relationships/all/{-list|&|types}", "property"=>"http://localhost:7474/db/data/node/979/properties/{key}", "self"=>"http://localhost:7474/db/data/node/979", "properties"=>"http://localhost:7474/db/data/node/979/properties", "outgoing_typed_relationships"=>"http://localhost:7474/db/data/node/979/relationships/out/{-list|&|types}", "incoming_relationships"=>"http://localhost:7474/db/data/node/979/relationships/in", "extensions"=>{}, "create_relationship"=>"http://localhost:7474/db/data/node/979/relationships", "paged_traverse"=>"http://localhost:7474/db/data/node/979/paged/traverse/{returnType}{?pageSize,leaseTime}", "all_relationships"=>"http://localhost:7474/db/data/node/979/relationships/all", "incoming_typed_relationships"=>"http://localhost:7474/db/data/node/979/relationships/in/{-list|&|types}"}]
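
The lookup result above is an array of hashes, one per matching node, so the interesting parts can be pulled out with plain Ruby. This is a minimal sketch, assuming a result shaped like the one shown (trimmed here to the two keys it uses); `node_summary` is a hypothetical helper name and not part of neography.

```ruby
# A trimmed copy of the lookup result shown above; only the keys used here are kept.
results = [
  {
    "self" => "http://localhost:7474/db/data/node/979",
    "data" => { "name" => "Mark Needham" }
  }
]

# Hypothetical helper: reduce one node hash to its numeric id and its name.
# The id is taken from the last path segment of the node's "self" URL.
def node_summary(node)
  { id: node["self"].split("/").last.to_i, name: node["data"]["name"] }
end

summaries = results.map { |node| node_summary(node) }
p summaries
```

The same `n["data"]["name"]` access is what the Sinatra configuration further down relies on to build the list of names.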

I came across an old mailing list thread which suggested the following solution:

One solution is to add a field with a known and constant value to each document in the index. Then searching for that field and value will give you all documents in the index.


I changed my code to do that:

node = @neo.create_node("name" => "Mark Needham")
@neo.add_node_to_index("people", "name", "Mark Needham", node)
@neo.add_node_to_index("people", "type", "person", node)
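
To see why the constant field helps, here is a toy in-memory stand-in for a key/value index. It is only an illustration of the idea from the mailing list thread, not how Lucene or neography store anything; `ToyIndex` and the second name are made up for the example.

```ruby
# Toy exact-match index: maps [key, value] pairs to the entries stored under them.
class ToyIndex
  def initialize
    @entries = Hash.new { |hash, key| hash[key] = [] }
  end

  # Store an entry under a key/value pair.
  def add(key, value, entry)
    @entries[[key, value]] << entry
  end

  # Return every entry stored under exactly this key/value pair.
  def get(key, value)
    @entries.fetch([key, value], [])
  end
end

index = ToyIndex.new
["Mark Needham", "Jane Doe"].each do |name|
  index.add("name", name, name)      # per-person value, found only by the full name
  index.add("type", "person", name)  # constant value shared by every entry
end

everyone  = index.get("type", "person")
only_mark = index.get("name", "Mark Needham")
partial   = index.get("name", "Mark")  # an exact lookup cannot match a partial name

p everyone, only_mark, partial
```

Because every entry shares the `type`/`person` pair, one lookup on that pair returns the whole set, which can then be filtered in application code.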

From my sinatra web app I then put the names of all the people in an application level variable like so:

configure do
  set :all_people, Neography::Rest.new.get_index("people", "type", "person").map { |n| n["data"]["name"] }
end

And then search through that like so:

get '/people' do
  # Fall back to an empty string when no "term" parameter is supplied
  search_term = (params["term"] || "").downcase
  settings.all_people.select { |p| p.downcase.start_with?(search_term) }.to_json
end
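
The filtering in that route can be tried on its own, without Sinatra or a running Neo4j server. This is a small sketch of the same case-insensitive prefix match; `names_matching` is a hypothetical helper name and the list of people is placeholder data.

```ruby
# Hypothetical helper mirroring the route's filter: case-insensitive prefix match.
def names_matching(names, term)
  # nil (no "term" parameter) becomes "", and every string starts with ""
  prefix = term.to_s.downcase
  names.select { |name| name.downcase.start_with?(prefix) }
end

people = ["Mark Needham", "Mary Example", "Jane Doe"]  # placeholder data

matches   = names_matching(people, "mar")
no_term   = names_matching(people, nil)
no_result = names_matching(people, "x")

p matches, no_term, no_result
```

An empty or missing term returns the full list, so the autocomplete endpoint behaves sensibly before the user has typed anything.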

It works, and since there’s only one query against the Lucene index, made when the web server first starts, it’s pretty quick. But surely there’s a less hacky, more proper way?

Published at DZone with permission of Mark Needham, author and DZone MVB. (source)
