NoSQL Zone is brought to you in partnership with:

Mark is a graph advocate and field engineer for Neo Technology, the company behind the Neo4j graph database. As a field engineer, Mark helps customers embrace graph data and Neo4j building sophisticated solutions to challenging data problems. When he's not with customers Mark is a developer on Neo4j and writes his experiences of being a graphista on a popular blog at http://markhneedham.com/blog. He tweets at @markhneedham. Mark is a DZone MVB and is not an employee of DZone and has posted 534 posts at DZone. You can read more from them at their website. View Full User Profile

neo4j: Embracing the Sub Graph

07.31.2012
| 3255 views |
  • submit to reddit

In May I wrote a blog post explaining how I’d been designing a neo4j graph by thinking about what questions I wanted to answer about the data.

In the comments Josh Adell encouraged gave me the following advice:

The neat things about graphs is that multiple subgraphs can live in the same data-space.

Keep your data model rich! Don’t be afraid to have as many relationships as you need. The power of graph databases comes from finding surprising results when you have strongly interconnected data.

At the time I didn’t really understand the advice but I’ve since updated my graph so that it includes ‘colleagues’ relationships which can be derived by looking at the projects that people had worked together on.

V2 a

When I was showing the graph to Marc he thought it would be quite interesting to see all the people that he hadn’t worked with before, a query that’s very easy to write when we have these two sub graphs.

We could write the following cypher query to find who Marc hadn’t worked with on a specific project:

START project=node:projects(name="Project X"), me=node:people(name="Marc Hofer")
MATCH project<-[:worked_on]-neverColleague-[c?:colleagues]->me
WHERE c is NULL
RETURN neverColleague

We do an index lookup to find the appropriate project node and to find the node that represent’s Marc.

THe MATCH clause starts from the project and then work backwards to the people who have worked on it. We then follow an optional colleagues relationship back to Marc.

The WHERE clause makes sure that we only return people that Marc doesn’t have a ‘colleagues’ relationship with i.e. people Marc hasn’t worked with.

Along the same lines we could also find out all the people that Marc has worked with from a specific office:

START office=node:offices(name="London - UK South"), me=node:people(name="Marc Hofer")
MATCH office<-[r:member_of]-colleague-[c:colleagues]->me
WHERE (NOT(HAS(r.end_date)))
RETURN colleague, c

This is reasonably similar to the previous query except the colleagues relationship is no longer optional since we only want to find people that Marc has worked with.

I’m sure there are other queries we could run but these were a couple that I hadn’t thought of before I had multiple sub graphs together on the same overall graph.

Published at DZone with permission of Mark Needham, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)