
Michal Bachman is a Principal Consultant at GraphAware, where he helps companies of all sizes succeed with Neo4j, a popular graph database. He also works on open-source extensions to Neo4j, focusing on large-scale graph analytics and domain-specific add-ons. Specializing in Java and related technologies, he is also a certified Spring Framework trainer. He writes clean, tested, and documented code that creates value. Occasionally, he blogs and speaks at conferences.

Modeling Data in Neo4j: Bidirectional Relationships

11.06.2013

Transitioning from the relational world to the beautiful world of graphs requires a shift in thinking about data. Although graphs are often much more intuitive than tables, there are certain mistakes people tend to make when modelling their data as a graph for the first time. In this article, we look at one common source of confusion: bidirectional relationships.

Directed Relationships

Relationships in Neo4j must have a type, giving the relationship a semantic meaning, and a direction. Frequently, the direction becomes part of the relationship's meaning. In other words, the relationship would be ambiguous without it. For example, the following graph shows that the Czech Republic DEFEATED Sweden in ice hockey. Had the direction of the relationship been reversed, the Swedes would be much happier. With no direction at all, the relationship would be ambiguous, since it would not be clear who the winner was.

Directed Relationship
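As a minimal sketch, the graph above could be created and queried in Cypher roughly as follows. The Team label and the name property are assumptions made for illustration; the article itself only names the DEFEATED relationship type.

```cypher
// Assumed label (Team) and property (name); only DEFEATED comes from the article.
CREATE (cz:Team {name: 'Czech Republic'}),
       (se:Team {name: 'Sweden'}),
       (cz)-[:DEFEATED]->(se);

// Whom did the Czech Republic defeat? The arrow follows the stored direction.
MATCH (:Team {name: 'Czech Republic'})-[:DEFEATED]->(loser)
RETURN loser.name;
```

Reversing the arrow in the CREATE statement would record the opposite result, which is why the direction is part of this relationship's meaning.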

Note that the existence of this relationship implies a relationship of a different type going in the opposite direction, as the next graph illustrates. This is often the case. To give another example, the fact that Pulp Fiction was DIRECTED_BY Quentin Tarantino implies that Quentin Tarantino IS_DIRECTOR_OF Pulp Fiction. You could come up with a huge number of such relationship pairs.

Implied Relationship

A common mistake when modelling a domain in Neo4j is to create both types of relationships. Since one relationship implies the other, this is wasteful in terms of both space and traversal time. Neo4j can traverse relationships in both directions. More importantly, thanks to the way Neo4j organizes its data, the speed of traversal does not depend on the direction of the relationships being traversed.
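To illustrate, here is a sketch that stores only the DIRECTED_BY relationship and still answers both questions. The Movie and Person labels and the title and name properties are assumed for the example.

```cypher
// Store a single relationship; labels and property names are illustrative.
CREATE (:Movie {title: 'Pulp Fiction'})-[:DIRECTED_BY]->(:Person {name: 'Quentin Tarantino'});

// Follow the relationship in its stored direction: who directed the film?
MATCH (:Movie {title: 'Pulp Fiction'})-[:DIRECTED_BY]->(director)
RETURN director.name;

// Follow the same relationship against its direction: which films did he direct?
MATCH (:Person {name: 'Quentin Tarantino'})<-[:DIRECTED_BY]-(movie)
RETURN movie.title;
```

The second query plays the role of the implied IS_DIRECTOR_OF relationship without that relationship ever being stored.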

Bidirectional Relationships

Some relationships, on the other hand, are naturally bidirectional. A classic example is Facebook or real-life friendship. This relationship is mutual - when someone is your friend, you are (hopefully) their friend, too. Depending on how we look at the model, we could also say such a relationship is undirected.

GraphAware and Neo Technology are partner companies. Since this is a mutual relationship, we could model it as either a bidirectional or an undirected relationship.

Bidirectional Relationship

But since neither of these is directly possible in Neo4j, where every relationship has exactly one direction, beginners often resort to the following model, which suffers from the exact same problem as the incorrect ice hockey model: an extra, unnecessary relationship.

Duplicated Relationship

Neo4j APIs allow developers to completely ignore relationship direction when querying the graph, if they so desire. For example, in Neo4j's own query language, Cypher, the key part of a query finding all partner companies of Neo Technology would look something like this:

MATCH (neo)-[:PARTNER]-(partner)

The result would be the same as executing and merging the results of the following two different queries:

MATCH (neo)-[:PARTNER]->(partner)
and
MATCH (neo)<-[:PARTNER]-(partner)
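Written out as complete queries, the fragments above might look like the following sketch. The Company label and the name property are assumptions; the article shows only the MATCH patterns.

```cypher
// Direction ignored: no arrowhead on the relationship pattern.
MATCH (neo:Company {name: 'Neo Technology'})-[:PARTNER]-(partner)
RETURN partner.name;

// The two directed queries, with their results merged by UNION ALL.
MATCH (neo:Company {name: 'Neo Technology'})-[:PARTNER]->(partner)
RETURN partner.name
UNION ALL
MATCH (neo:Company {name: 'Neo Technology'})<-[:PARTNER]-(partner)
RETURN partner.name;
```

Note that if both an outgoing and an incoming PARTNER relationship were stored between the same two companies, either form would list that partner twice, which is one more reason to avoid the duplicate.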

Therefore, the correct (or at least most efficient) way of modelling the partner relationships is using a single PARTNER relationship with an arbitrary direction.

Bidirectional Relationship with Arbitrary Direction
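A sketch of this model in Cypher, again assuming a Company label and a name property, could be:

```cypher
// One PARTNER relationship; the direction chosen here is arbitrary.
CREATE (:Company {name: 'GraphAware'})-[:PARTNER]->(:Company {name: 'Neo Technology'});

// Partners of GraphAware, ignoring direction.
MATCH (:Company {name: 'GraphAware'})-[:PARTNER]-(partner)
RETURN partner.name;

// Partners of Neo Technology, ignoring direction.
MATCH (:Company {name: 'Neo Technology'})-[:PARTNER]-(partner)
RETURN partner.name;
```

On an otherwise empty graph, each query should return the other company, even though only one relationship is stored.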

Conclusion

Relationships in Neo4j can be traversed in both directions with the same speed. Moreover, direction can be completely ignored when querying. Therefore, there is no need to create two different relationships between nodes if one implies the other.

Published at DZone with permission of its author, Michal Bachman.
