Big Data/Analytics Zone is brought to you in partnership with:

Disruptive Innovator and Senior Executive with a passion transforming industries by applying cutting edge technologies (Cloud, Big Data, M2M, Distributed Machine Learning, Open Source Hardware, etc.) and business innovations (Freemium, Gamification, Social, Long Tail, Pay per Use, SaaS Subscriptions, Design Thinking, Blue Ocean Strategy, Lean Startup, etc.) to generate new revenues Maarten is a DZone MVB and is not an employee of DZone and has posted 35 posts at DZone. You can read more from them at their website. View Full User Profile

Trident Storm, Real-Time Analytics for Big Data

08.13.2012
| 15855 views |
  • submit to reddit

In a previous post I mentioned Storm already. Trident is an extension of Storm that makes it an easy-to-use distributed real-time analytics framework for Big Data. Both Trident and Storm were developed by Twitter.

One of Twitter’s major problems is keeping statistics of Tweets and Tweeted URLs that get retweeted by millions of followers. Imagine a famous person who tweets a URL to millions of followers. Lots of followers will retweet the URL. So how do you calculate how many Tweeters have seen the URL? This is important for features like “Top retweeted URLs”.

The answer was Storm but with the addition of Trident, it has become a lot easier to manage. Trident is doing to Storm what Pig and Cascading are doing to Hadoop: simplification. Instead of having to create a lot of Spouts and Bolts and take care of how messages are distributed, Trident comes with a lot of the work already done.

In a few lines of code, you set-up a Distributed RPC server, send it URLs, have it collect the tweeters and followers, and count them. Fail-over and resiliance as well as massive distribution throughput are build into the platform. You can see it in this example code:

TridentState urlToTweeters =
topology.newStaticState(getUrlToTweetersState());
TridentState tweetersToFollowers =
topology.newStaticState(getTweeterToFollowersState());

topology.newDRPCStream("reach")
.stateQuery(urlToTweeters, new Fields("args"), new MapGet(), new Fields("tweeters"))
.each(new Fields("tweeters"), new ExpandList(), new Fields("tweeter"))
.shuffle()
.stateQuery(tweetersToFollowers, new Fields("tweeter"), new MapGet(), new Fields("followers"))
.parallelismHint(200)
.each(new Fields("followers"), new ExpandList(), new Fields("follower"))
.groupBy(new Fields("follower"))
.aggregate(new One(), new Fields("one"))
.parallelismHint(20)
.aggregate(new Count(), new Fields("reach"));

The possibilities of Trident + Storm, combined with fast scalable datastores, like for instance Cassandra, are enormous. Everything from real-time counters, filtering, complex event processing, machine learning, etc.
The Storm concept of Spout [data generation] and Bolt [data processing] can be easily understood by most programmers. Storm is an asynchronous highly distributed framework but with a simple distributed RPC server it can easily be used in synchronous code.

The only drawback I have seen is that DRPC is focused only on Strings (and other primitive types that can be contained in a String). Adding more complex objects (via Kryo, Avro, Protocol Buffers, etc.), or at least bytes, would be useful for companies that do not only focus on Tweets.

 

Published at DZone with permission of Maarten Ectors, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Nikita Ivanov replied on Wed, 2012/08/15 - 3:39pm

For enterprise level Streaming MapReduce products I suggest to look at GridGain (www.gridgain.com). No string-only nonesense, and ton of other features for real high-performance distributed programming. 

Bertrand Dechoux replied on Sat, 2012/08/25 - 9:06am

"Both Trident and Storm were developed by Twitter."

That would be shortcut. One of the main developper on Storm is Nathan Marz.

http://nathanmarz.com/blog/suffering-oriented-programming.html

At that time, he was working for a startup Backtype, which has been acquired by Twitter one year ago.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.