
I'm currently working as a Bioinformatics consultant/developer/researcher at [...]. Amateur musician, traveller, painter, sporadic writer, and always eager to learn more about languages, plants... Too many things to do and too little time for it! Pablo is a DZone MVB, is not an employee of DZone, and has posted 8 posts at DZone. You can read more from them at their website.

Using Bio4j + Neo4j Graph-algo component for finding protein-protein interaction paths


Hi all!

Today I managed to find some time to check out the Graph-algo component from Neo4j, and after playing with it and Bio4j for a bit, I have to say it seems pretty cool.
For those who don’t know what I’m talking about, here is the description from the Neo4j wiki:

This is a component that offers implementations of common graph algorithms on top of Neo4j. It is mostly focused around finding paths, like finding the shortest path between two nodes, but it also contains a few different centrality measures, like betweenness centrality for nodes.
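To get a feel for what betweenness centrality measures on an interaction network, here is a minimal sketch of Brandes' algorithm in plain Java. It is not the Graph-algo component's code or API: the class name BetweennessSketch, the in-memory Map<String, Set<String>> adjacency representation (unweighted, undirected) and the protein IDs are all hypothetical, chosen only for illustration.

```java
import java.util.*;

/**
 * Minimal sketch of Brandes' betweenness centrality for an unweighted,
 * undirected graph (hypothetical; not the Neo4j Graph-algo implementation).
 * Every neighbour listed in the adjacency map must also appear as a key.
 */
public class BetweennessSketch {

    static Map<String, Double> betweenness(Map<String, Set<String>> graph) {
        Map<String, Double> centrality = new HashMap<>();
        for (String v : graph.keySet()) centrality.put(v, 0.0);

        for (String s : graph.keySet()) {
            // Single-source BFS: count shortest paths (sigma) and record predecessors
            Deque<String> order = new ArrayDeque<>(); // stack: popping yields farthest nodes first
            Map<String, List<String>> preds = new HashMap<>();
            Map<String, Long> sigma = new HashMap<>();
            Map<String, Integer> dist = new HashMap<>();
            for (String v : graph.keySet()) {
                preds.put(v, new ArrayList<>());
                sigma.put(v, 0L);
                dist.put(v, -1);
            }
            sigma.put(s, 1L);
            dist.put(s, 0);
            Deque<String> queue = new ArrayDeque<>(List.of(s));
            while (!queue.isEmpty()) {
                String v = queue.poll();
                order.push(v);
                for (String w : graph.get(v)) {
                    if (dist.get(w) < 0) { // first time w is reached
                        dist.put(w, dist.get(v) + 1);
                        queue.add(w);
                    }
                    if (dist.get(w) == dist.get(v) + 1) { // v lies on a shortest path to w
                        sigma.put(w, sigma.get(w) + sigma.get(v));
                        preds.get(w).add(v);
                    }
                }
            }
            // Accumulate dependencies in order of non-increasing distance from s
            Map<String, Double> delta = new HashMap<>();
            for (String v : graph.keySet()) delta.put(v, 0.0);
            while (!order.isEmpty()) {
                String w = order.pop();
                for (String v : preds.get(w)) {
                    double share = (double) sigma.get(v) / sigma.get(w) * (1.0 + delta.get(w));
                    delta.put(v, delta.get(v) + share);
                }
                if (!w.equals(s)) centrality.put(w, centrality.get(w) + delta.get(w));
            }
        }
        // Undirected graph: each pair (s, t) was counted once from each end
        centrality.replaceAll((v, c) -> c / 2.0);
        return centrality;
    }

    public static void main(String[] args) {
        // Toy "hub" network: P0 interacts with P1, P2 and P3 (made-up IDs)
        Map<String, Set<String>> g = new TreeMap<>();
        for (String leaf : List.of("P1", "P2", "P3")) {
            g.computeIfAbsent("P0", k -> new TreeSet<>()).add(leaf);
            g.computeIfAbsent(leaf, k -> new TreeSet<>()).add("P0");
        }
        System.out.println(betweenness(g).get("P0")); // prints 3.0
    }
}
```

Running main prints 3.0: the hub P0 lies on the only shortest path between each of the three pairs of leaf proteins, while each leaf scores 0.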

The algorithm for finding the shortest path between two nodes caught my attention, and I started to wonder how I could try it out on the data included in Bio4j. I then realized that protein-protein interactions could be a good candidate, so I got down to work and created this utility method:

findShortestInteractionPath(ProteinNode proteinSource, ProteinNode proteinTarget, int maxDepth, int maxResultsNumber)

for getting at most ‘maxResultsNumber’ paths between ‘proteinSource’ and ‘proteinTarget’, with a maximum path depth of ‘maxDepth’.
You can check the source code here.

I also wrote a small test program that prints out the paths found between two proteins.
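For readers who want to try the idea without a Bio4j database, here is a minimal, self-contained sketch in plain Java of what a method like findShortestInteractionPath can do, plus a tiny driver that prints the paths found. It is not the Bio4j source or the Neo4j Graph-algo API. It assumes interactions live in a hypothetical in-memory Map<String, Set<String>> keyed by made-up protein IDs, treats interactions as undirected, and returns only shortest paths (all of the same minimal length) that fit within maxDepth, capped at maxResults.

```java
import java.util.*;

/**
 * Minimal sketch (hypothetical; not Bio4j/Neo4j code): finds up to maxResults
 * shortest interaction paths between two proteins in an in-memory,
 * undirected protein-protein interaction graph.
 */
public class InteractionPathSketch {

    static List<List<String>> findShortestInteractionPath(Map<String, Set<String>> interactions,
            String source, String target, int maxDepth, int maxResults) {
        List<List<String>> results = new ArrayList<>();
        if (maxResults <= 0) return results;
        // Breadth-first search recording each node's distance and ALL shortest-path predecessors
        Map<String, Integer> dist = new HashMap<>();
        Map<String, List<String>> preds = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        dist.put(source, 0);
        queue.add(source);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            int d = dist.get(current);
            // Do not expand beyond the target or past the depth limit
            if (current.equals(target) || d >= maxDepth) continue;
            for (String next : interactions.getOrDefault(current, Set.of())) {
                Integer dn = dist.get(next);
                if (dn == null) { // first discovery of next
                    dist.put(next, d + 1);
                    preds.computeIfAbsent(next, k -> new ArrayList<>()).add(current);
                    queue.add(next);
                } else if (dn == d + 1) { // another equally short way to reach next
                    preds.get(next).add(current);
                }
            }
        }
        if (!dist.containsKey(target)) return results; // no path within maxDepth
        // Walk predecessor lists back from the target to enumerate the shortest paths
        collect(target, source, preds, new ArrayDeque<>(), results, maxResults);
        return results;
    }

    private static void collect(String node, String source, Map<String, List<String>> preds,
            Deque<String> suffix, List<List<String>> results, int maxResults) {
        if (results.size() >= maxResults) return;
        suffix.addFirst(node);
        if (node.equals(source)) {
            results.add(new ArrayList<>(suffix)); // suffix now runs source -> target
        } else {
            for (String p : preds.get(node)) {
                collect(p, source, preds, suffix, results, maxResults);
                if (results.size() >= maxResults) break;
            }
        }
        suffix.removeFirst();
    }

    public static void main(String[] args) {
        // Toy undirected interaction network with made-up protein IDs
        Map<String, Set<String>> g = new TreeMap<>();
        String[][] edges = {{"P1", "P2"}, {"P1", "P3"}, {"P2", "P4"}, {"P3", "P4"}, {"P4", "P5"}};
        for (String[] e : edges) {
            g.computeIfAbsent(e[0], k -> new TreeSet<>()).add(e[1]);
            g.computeIfAbsent(e[1], k -> new TreeSet<>()).add(e[0]);
        }
        for (List<String> path : findShortestInteractionPath(g, "P1", "P5", 4, 10)) {
            System.out.println(String.join(" -> ", path));
        }
    }
}
```

With the toy network in main, it prints P1 -> P2 -> P4 -> P5 and then P1 -> P3 -> P4 -> P5. Lowering maxDepth to 2 yields no paths, since P5 is three interactions away from P1. Recording every shortest-path predecessor during the search is what makes it cheap to enumerate all equally short paths afterwards.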

Even though I missed having a wider choice of algorithms, it’s really nice to have at least this small set already implemented, sparing you the low-level coding.
Apart from that, I’ve been thinking about how Bio4j could open a lot of doors for topology/network analysis across all the data it includes. Such analysis could otherwise be quite hard to perform for several reasons, such as the lack of integration between different data sources and an underlying storage paradigm that limits topology/network analysis, among others…

With Bio4j however, you just have to move around the nodes and get the information you’re looking for! ;)


Published at DZone with permission of Pablo Pareja Tobes, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)


Afandi Merathi replied on Sun, 2012/03/18 - 7:49am

I follow Neo4j with much interest. It is a novel approach; however, I think property searches are very important and Neo4j is not very good at them. So, for example, implementing a complete social website with millions of users would not be feasible with Neo4j, I think. I am not sure, of course. What is also interesting is the emergence of native XML databases. They also solve the impedance mismatch to a certain extent. However, their models are trees, not graphs; graphs are more general in this sense, but I think more optimizations are possible if you choose trees.

Pablo Pareja Tobes replied on Fri, 2012/03/30 - 5:14am in response to: Afandi Merathi

Hi Afandi,

You can find my reply in the original blog source:


