NoSQL Zone is brought to you in partnership with:

I am a Webscience PhD student at the university of Koblenz and the Founder of http://www.metalcon.de Social news streams are my research interest. René is a DZone MVB and is not an employee of DZone and has posted 36 posts at DZone. You can read more from them at their website. View Full User Profile

Data Structure for Social News Streams on Graph Databases

01.03.2012
| 6267 views |
  • submit to reddit

Ok you guys did not hear much from me most recently. I was on vaccation and then on summer school and I worked on my first scientific poster and on a talk which will hopefully ontribute to my PhD thesis. Well at least I can now share some ressources which include my poster and the slides from my talk. But let me first show you two pictures.

The standard graph of a social network. You see several people and attached to them content items identified by numbers which are supposed to be time stamps

My model of a social network graph. The ego network changed from a star topology to a list topology and each ego network has a certain edge type which is modeled by edge color here. This graph stores exactly the same information as the standard model but makes retrieval of news streams much faster

Poster

Feel  free to download and look at my first poster with the Title:  a model for social news streams and time indices on graph data bases

You will probably not see so many things in it without the slides from my talk. So let me explain some things. I was looking into the data structures to model social news streams.  Basically there is the approach of normalized or denormalized relational data bases which I call the twitter approach for the reason that Twitter is doing something similar with FlockDB

I also looked into the case of saving the news stream as a flat file for every user in which the events from his friends are saved for every user. For some reason I thought I had picked up somewhere that facebook is running on such a system. But right now I can’t find the resource anymore. If you can, please tell me! Anyway while studying these different approaches I realized that the flat file approach even though it seems to be primitive makes perfect sense. It scales to infinity and is very fast for reading! Even though I can’t find the resource anymore I will still call this approach the Facebook approach.

I was now wondering how you would store a social news stream in a graph data base like neo4j in a way that you get some nice properties. More specifically I wanted to combine the advantages of both the facebook and the twitter approach and try to get rid of the downfalls. And guess what! To me this seems actually possible on graph data bases. The key Idea is to store the social network and content items created by the users not only in a star topology but also in a list topology ordered by time of occuring events. The crucial part is to maintain this topology which is actually possible in O(1) while Updates occure to the graph.


Talk

As mentioned together with this poster I was giving  a talk social news streams and time indices on social network graphs. Please feel free to download the slides. Unfortunally I improved the example while making the poster so that some pictures are not consistant with those from the poster! If I find the time I will not only update the slides but also give the talk as a video lecture on youtube! I think this will be helpful to spread the idea!

Future Work

  1. I need to publish all these results in a good coference or journal
  2. relevance filtering and recommendations which is the problem I am really interested in.
  3. Implementing this stuff for my social network (see blog post)

Open Questions

  1. Is it possible in neo4j to specify edgetypes (Relationship types) on runtime rather than compiletime.
  2. If so: Is the time of accessing them O(1) with respect to the node degree?
  3. If not: is there a graph data base that is capable of doing this?

Discussion

Anyway it is great to see how much more insight you get when thinking of a problem in a scientific way and not only implement it right away!


Source:  http://www.rene-pickhardt.de/data-structure-for-social-news-streams-on-graph-data-bases/
Published at DZone with permission of René Pickhardt, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Patrick Durusau replied on Wed, 2012/01/04 - 8:58am

René,

Liked the paper and the poster!

On your question about runtime vs. compiletime I assume you mean that you can limit graph exploration to particular edges in a query, as opposed to starting from a node and exploring all its edges? 

See the use of relationship types under 9.2 Traversal Framework in the Neo4j reference manual. I think that has what you are looking for. 

It may be that the question rarely comes up in traditional graphs (non-multigraphs) because there is only one edge between any two nodes. With multigraph support, like Neo4j, the question of which relationship to follow is a viable question. 

 Patrick

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.