
Dmitriy Setrakyan manages daily operations of GridGain Systems and brings over 12 years of experience spanning all areas of application software development, from design and architecture to team management and quality assurance. His experience includes architecture and leadership in the development of distributed middleware platforms, financial trading systems, CRM applications, and more. Dmitriy is a DZone MVB, is not an employee of DZone, and has posted 54 posts at DZone.

The World's Shortest MapReduce App

10.13.2010
We often say that GridGain is very simple to use, but we tend to forget that many of our users don't actually exploit the full power of GridGain's functional APIs. In this post I want to demonstrate several GridGain features that are at your fingertips once you download GridGain.

1. Broadcasting
Here is a simple app that broadcasts execution of a closure to all participating nodes (in our simple scenario the closure simply prints out a string): 
G.grid().run(
    GridClosureCallMode.BROADCAST,
    F.println("Broadcasting This Message To All Nodes")
);

The "F.println()" method returns a simple closure that prints the argument passed to it. GridGain then takes this closure and executes it on all available nodes.

2. Splitting Execution
Here is a simple app that splits a given phrase into words, creates closures that print individual words, and executes them on different nodes:

G.grid().run(
    GridClosureCallMode.SPREAD,
    F.yield("Splitting This Message Into Words".split(" "), F.println())
);

Here the initial String is split into words using the standard JDK split method. The "F.yield()" method then takes each word, passes it to the "F.println()" closure, and returns a collection of "F.println()" closures, each curried with a predefined argument (one per word). GridGain then takes this collection of closures and executes them on the grid, sending each closure to a different node in round-robin fashion.
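
The currying and round-robin steps described above can be sketched without GridGain at all. The following is a minimal, JDK-only illustration, not GridGain's implementation: the names YieldSketch, yieldAll and roundRobin are made up for this sketch, and the "nodes" are just lists in one JVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Local, JDK-only illustration; none of these names are GridGain APIs.
final class YieldSketch {

    // Binds each argument to the one-argument closure, producing
    // zero-argument closures (one per argument).
    static <T> List<Runnable> yieldAll(T[] args, Consumer<T> closure) {
        List<Runnable> bound = new ArrayList<>();
        for (T arg : args) {
            bound.add(() -> closure.accept(arg));
        }
        return bound;
    }

    // Assigns job i to node (i mod nodeCount), i.e. round-robin.
    static <J> List<List<J>> roundRobin(List<J> jobs, int nodeCount) {
        List<List<J>> perNode = new ArrayList<>();
        for (int n = 0; n < nodeCount; n++) {
            perNode.add(new ArrayList<>());
        }
        for (int i = 0; i < jobs.size(); i++) {
            perNode.get(i % nodeCount).add(jobs.get(i));
        }
        return perNode;
    }

    public static void main(String[] args) {
        String[] words = "Splitting This Message Into Words".split(" ");
        List<Runnable> jobs = yieldAll(words, w -> System.out.println(w));
        List<List<Runnable>> plan = roundRobin(jobs, 2); // two pretend nodes
        for (int n = 0; n < plan.size(); n++) {
            System.out.println("node " + n + " runs " + plan.get(n).size() + " closure(s)");
            plan.get(n).forEach(Runnable::run);
        }
    }
}
```

Nothing is printed until a bound closure is actually run, which is the point of currying here: the word travels with the closure, so whoever executes it needs no further input.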

3. The World's Shortest MapReduce App
Now let's get a little more sophisticated and execute an app that splits a phrase into words, has individual grid nodes count the letters in individual words and then, in the reduction step, adds up all the counts. Here is what this app looks like:

int letterCnt = G.grid().forkjoin(
    GridClosureCallMode.SPREAD,
    F.yield("Counting Letters In This Phrase".split(" "),
        new C1<String, Integer>() {
            @Override public Integer apply(String word) {
                return word.length();
            }
        }
    ),
    F.sumIntReducer()
);

A few things to note. C1 is a convenience alias for the GridClosure class, which in our case takes a string and returns the number of characters in that string. A collection of closures, each with a different word as its argument, is distributed to individual grid nodes, and each closure returns the number of characters in the word it received. The local node then uses "F.sumIntReducer()", which simply adds up all the character counts given to it.
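
For readers who want to see the same map and reduce shape without a grid, here is a rough single-JVM analogue using only the JDK. It is a sketch under stated assumptions, not GridGain code: worker threads stand in for grid nodes, and the Reducer interface and SumReducer class are hypothetical stand-ins for whatever GridGain's reducer types look like.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Single-JVM analogue of the call above. Threads stand in for grid nodes;
// Reducer and SumReducer are hypothetical, not GridGain types.
final class LocalLetterCount {

    interface Reducer<T, R> {
        void collect(T value); // called once per job result
        R result();            // called after all results have been collected
    }

    static final class SumReducer implements Reducer<Integer, Integer> {
        private int sum;
        @Override public void collect(Integer value) { sum += value; }
        @Override public Integer result() { return sum; }
    }

    static int countLetters(String phrase, int workers) {
        // "Map" side: one job per word, each returning that word's length.
        List<Callable<Integer>> jobs = new ArrayList<>();
        for (String word : phrase.split(" ")) {
            jobs.add(word::length);
        }
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // "Reduce" side: results are collected on the calling thread.
            Reducer<Integer, Integer> reducer = new SumReducer();
            for (Future<Integer> f : pool.invokeAll(jobs)) {
                reducer.collect(f.get());
            }
            return reducer.result();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 8 + 7 + 2 + 4 + 6 = 27
        System.out.println(countLetters("Counting Letters In This Phrase", 3));
    }
}
```

The sketch deliberately leaves out everything that makes the grid version interesting (discovery, class loading, serialization, failover); it only mirrors the split, per-word job, and sum structure.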

And to top it all off, none of the code above requires any deployment. You simply start up a few bare-bones GridGain nodes, write your code, hit the Run button, and your code just executes on the grid (well, it's using GridGain's peer-class-loading mechanism underneath). There are no Ant or Maven scripts to execute; just write your code and run it. If you need to change your code, just change it and run it again.

I doubt there is another product out there that will let you execute fairly complex MapReduce applications in such a concise and elegant manner ;-)

From http://gridgain.blogspot.com/2010/10/worlds-shortest-mapreduce-app.html

Published at DZone with permission of Dmitriy Setrakyan, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)


Comments

Sebastian Mueller replied on Wed, 2010/10/13 - 2:12am

Hey, mine is even shorter:

 

D.o();

 

Seriously, instead of this marketing stuff you should post an example that doesn't count characters. The split method takes way longer than a trivial implementation of a method that counts the non-whitespace characters in a string. These examples are ridiculous. In the myriads of blog posts you post about GridGain, why is there never an example that uses objects that are more complex than strings and integers? Probably because as soon as real objects come into play the world's shortest example would become fairly complex, I guess. Please, if you post to the front page of Javalobby - add some substance to it and leave out that marketing fluff.

Nikita Ivanov replied on Wed, 2010/10/13 - 1:37pm

Sebastian,
I hope you understand that we don't propose to count chars on the grid :) This is an example. Do you run "HelloWorld" in production?

Now that this is behind us - you can use any objects you like (download GridGain and look at our examples). Moreover, all of them will be auto-deployed on the grid/cloud with zero effort from your side.

In our examples we try to show what GridGain brings to the table - not the complexity of your own data model. And I would again challenge anyone to produce a simpler, more readable and more performant version of this very simple application - using any other grid or cloud computing framework.

Best,
Nikita Ivanov.
GridGain Systems

Dmitriy Setrakyan replied on Wed, 2010/10/13 - 1:52pm in response to: Sebastian Mueller

Sebastian,

All I did was write a blog post about some cool features of GridGain :)

The reason I purposely picked a simple example for counting letters is that I tried to put an emphasis on how easily GridGain can distribute a problem; the problem itself is less important. You can pick any problem you like and distribute it as easily with GridGain as the letter-counting problem. As a matter of fact, I suggest you do that next time before producing these empty and meaningless rants.

If you are looking for some more complex examples, I invite you to visit my blog. You will find plenty there.

Also, I suggest you take a look at some testimonials from real users, who actually *did* try GridGain.

--Best,
Dmitriy Setrakyan

Dmitriy Setrakyan replied on Wed, 2010/10/13 - 3:20pm

By the way, all the code examples are posted twice for some reason. Can someone fix this?
--Dmitriy
(already fixed, thanks)

Sebastian Mueller replied on Thu, 2010/10/14 - 2:16am in response to: Dmitriy Setrakyan

Dmitriy,

actually I took the bait and took a look at your blog - as a frequent DZone and Javalobby reader I already feel like a GridGain customer because of the high number of GridGain articles (that funnily are always promoted by the same set of people), so I have the feeling that I know the GridGain API without having used it. I looked at all of the articles back to 2008 and did not find a single example that was not about strings or ints - please do us the favor and post an example where there is a POJO that has a method that modifies the state of the object and that object is being distributed across the cloud into multiple different JVMs. I really would be interested in how @Gridify solves this. If you post an example like this I might actually become interested in trying out GridGain (actually at the moment I have absolutely no need for that and my guess is that way more than 90% of the Javalobby readers don't have that either, but I could be wrong), however if sample code shows how to distribute System.out.println calls using a prebuilt function, I feel rather annoyed.

Actually my complaint was mainly about the title of your post: If it had been "Using MapReduce with GridGain", it would have been perfectly alright for me, then I would have discarded it immediately. However marketing fluff like "world's greatest whatever" is just that - marketing fluff and I don't like it that way - I am sure you can accept that.

Dmitriy Setrakyan replied on Thu, 2010/10/14 - 4:14am in response to: Sebastian Mueller

Dear Sebastian,

1. I didn't title this post on Dzone, nor have I posted it here. I am thankful to whoever did post it as they found it worthwhile for DZone readers. I still think the post title is correct however, as I doubt anyone can produce the functionality above in a simpler or more elegant way.

2. All examples above already send POJOs around... It may seem to you that a String is sent, but String is only an argument to an instance (POJO) created by user (e.g., note the anonymous class) which gets passed for execution. GridGain never distributes Strings around - it always distributes user-created objects. Moreover, all deployment happens on the fly, so the code you see above is pretty much all you need to run it on GridGain.

3. If, by your estimation, 10% of DZone readers find my blog interesting, I am already extremely grateful :-) GridGain becomes very useful once you start having more than 1 box in your deployment. If nothing else, you can use it just to auto-discover a cluster of all boxes in your deployment for a basic message exchange.

Dmitriy Setrakyan
GridGain = Compute + Data + Cloud

Alessandro Santini replied on Thu, 2010/10/14 - 4:54am

> As a matter of fact, I suggest you do that next time before producing these empty and meaningless rants.

I think you should promptly re-consider such sentences when you are speaking on behalf of the company you are working at.

Having said that, I believe your comment is more pointless than Sebastian's. People reading DZone often want to get meaningful information without having to spend hours (that they don't have) playing with a technology.

Sometimes rushing to get something published on this website produces effects like this. 

Dmitriy Setrakyan replied on Thu, 2010/10/14 - 5:19am in response to: Alessandro Santini

The rant was meaningless because it made statements that were entirely untrue. If you read the blog and some of my replies here you would clearly see that.

I think I already pointed out in my replies why these examples are the way they are, and how they do a lot more than seems on the surface. There is no point to over-complicate them with artificial data models because that would add no value. Is there anything in particular you would like to see?

Dmitriy Setrakyan
GridGain = Compute + Data + Cloud

Sebastian Mueller replied on Thu, 2010/10/14 - 6:33am in response to: Dmitriy Setrakyan

> I didn't title this post on Dzone, nor have I posted it here

That's interesting because the line below the title says: Submitted by Dmitriy Setrakyan on Wed, 2010/10/13 - 2:53am - I didn't know that there are others involved at DZone who moderate headlines. If so, then I think this should be stated more prominently. Then of course my complaint is directed at the guy who blindly took the title of your blog entry (where of course you may write whatever you want and I will never even dare to complain) and pasted it directly into DZone (as far as I know this is what the submitter does) - if you chose a different title and the title was changed to the current one, then that is ok for me.

>  2. All examples above already send POJOs around... It may seem to you that a String is sent, but String is only an argument to an instance (POJO) created by user (e.g., note the anonymous class) which gets passed for execution

 Now that *is* interesting - if actually the instances (and not just the classes) are sent over the network then how do you deal with serialization (in the case the objects are not trivially serializable) and data synchronization (if instances "live" in more than one JVM)? what if I add a static and a normal int field (for me POJOs are more than just pure functions with immutable arguments and return types - they tend to have mutable state, among other things) to the anonymous inner class that are incremented each time the method is called? What will be the value of that these fields after the execution?

>If, by your estimation, 10% of DZone readers find my blog interesting, I am already extremely grateful :-)

No - I said way less than 10% - and I was trying to be really polite... Please don't create false testimonials from my comments. If you want a quote from me then that would be: GridGain may be a good product, but the samples they use to attract people are trivial and hide away the fact that distributed computing is not something you get by simply extracting a method - there is *a lot* more that needs to be done if you want to do non-trivial stuff.

Dmitriy Setrakyan replied on Thu, 2010/10/14 - 10:27am in response to: Sebastian Mueller

I will answer question #2. To send objects across, we use Serialization mixed with GridGain automatic peer deployment, so all class and resource definitions will be automatically loaded on remote nodes (marshalling is pluggable, and you can use any protocol you like, including XML).

All class static variables will be initialized again on remote nodes, and all the non-static state will be passed forward. However, for performance reasons GridGain does not blindly mirror local and remote objects. Instead, GridGain allows users to return only the needed state from remote jobs, and those returned values will be fed to the local reducer. In my example that reducer is F.sumIntReducer(), which automatically adds all the integers, but a reducer is just a simple interface and users can provide their own.
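
The static versus non-static point can be seen with plain JDK serialization, independent of GridGain. The following is only a sketch: it uses java.io serialization in a single JVM, the class names are made up for illustration, and resetting the static field by hand merely imitates a remote JVM that has freshly loaded the class. GridGain's own marshalling is pluggable and may behave differently in detail.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// JDK-only illustration; class and method names here are hypothetical.
final class SerializationStateSketch {

    static final class CountingJob implements Serializable {
        private static final long serialVersionUID = 1L;

        static int staticCalls; // class-level: not written to the stream
        int instanceCalls;      // instance-level: written to the stream

        int apply(String word) {
            staticCalls++;
            instanceCalls++;
            return word.length();
        }
    }

    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return buf.toByteArray();
    }

    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CountingJob job = new CountingJob();
        job.apply("grid");
        job.apply("gain");

        byte[] wire = toBytes(job);
        CountingJob.staticCalls = 0; // imitate a remote JVM with a freshly loaded class

        CountingJob copy = (CountingJob) fromBytes(wire);
        System.out.println(copy.instanceCalls);      // 2: instance state travelled
        System.out.println(CountingJob.staticCalls); // 0: static state did not

        copy.apply("remote");
        System.out.println(job.instanceCalls);       // 2: the original is not mirrored
    }
}
```

The last line shows why results have to be returned explicitly: changes made to the deserialized copy never flow back to the original object on their own.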

If you need mirrored objects across all grid nodes, then you can use GridGain data grid (caching) component, and store your objects in distributed cache - this was not part of the example above.

Hope this clarifies a few things.

Dmitriy Setrakyan
GridGain = Compute + Data + Cloud

Nikita Ivanov replied on Thu, 2010/10/14 - 2:50pm in response to: Sebastian Mueller

> Hey, mine is even shorter...
Hey Sebastian, nothing to brag about here :)

But I don't want to get into this piss contest - I'll let users speak for themselves. GridGain is started today every 20 seconds around the globe and 1000s of developers find this software simple and productive to use in their daily work.

The GridGain core team comes from a Globus background and I know firsthand what "complex" and "non-trivial" grid computing is. We started GridGain 5 years ago exactly because of that experience, to push grid computing into the 21st century.

And I still haven't seen you come up with a simpler/shorter/faster (pick any) example even for the simplistic problem that Dmitriy posted in his blog. And don't forget distributed continuations, functional MapReduce, distributed closure executions, late and early load balancing, zero deployment with cloud-enabled class loading, data grid, and many, many other "trivial" features while you're at it...

Btw, we use your software in our Javadoc - and like it.

Keep up the good work.

Best,
Nikita Ivanov.

Nikita Ivanov replied on Thu, 2010/10/14 - 3:47pm in response to: Sebastian Mueller

Sebastian,
Just in case it wasn't obvious, here's an example with some domain model... (in Scala - APIs are the same as in Java):

// Domain model...
case class Payload(data: String)

// Calculate on the cloud/grid.
def count(p: Payload) =
    grid !*~ 
        (for (x <- (for (w <- p.data.split(" ")) yield Payload(w))) 
        yield (() => x.data.length), (s: Seq[Int]) => s.sum)

Payload gets distributed (artificially - just for this example). Its data field is used for calculation. You can, of course, put anything you like into Payload.

Enjoy!
Nikita Ivanov.

Sebastian Mueller replied on Fri, 2010/10/15 - 2:47am in response to: Nikita Ivanov

Hey Nikita,

 no need to get personal here... I was not implying anything - you said this is a "piss contest".

I won't comment on the marketing paragraphs - they don't add anything to the discussion.

However because you wanted it: here is a shorter, simpler and faster approach that uses Map Reduce. It doesn't use Java, it doesn't use GridGain and it doesn't use the cloud - but I was complaining about "The world's shortest MapReduce example". If the title had been "The world's shortest MapReduce example in the cloud using GridGain" my example of course would be invalid, however it's not:

 In .net 4.0 using C# you can do the following without any additional library:

      Console.WriteLine("The worlds shortest map reduce program".Split(' ').AsParallel().Sum(s => s.Length));
 

Put that above method in class D static method o() and you have my initial example ;-)
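
For Java readers, a rough JDK-only counterpart of that one-liner might look like the sketch below. It assumes Java 8 or later streams, which postdate this thread, and like the C# version it runs on local threads only, not on a grid.

```java
import java.util.Arrays;

// Local, multi-threaded only; nothing here is distributed across machines.
final class D {
    static int o() {
        return Arrays.stream("The worlds shortest map reduce program".split(" "))
                .parallel()                // map step may run on several threads
                .mapToInt(String::length)  // map: word -> length
                .sum();                    // reduce: add the lengths
    }

    public static void main(String[] args) {
        System.out.println(o()); // 3 + 6 + 8 + 3 + 6 + 7 = 33
    }
}
```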

Just to clarify again: I was never putting your work with GridGain in question - remember, I was just complaining about the title, nor was I saying or implying anything negative about GridGain - I just don't like to be bombarded with marketing fluff on the Javalobby front page. I know - you didn't create that title here (it's the title of your original blog post) - as it seems nobody did that, so, sadly, there is no one left to blame.

Happy coding - Sebastian

Sebastian Mueller replied on Fri, 2010/10/15 - 2:54am in response to: Dmitriy Setrakyan

Hi Dmitriy,

thank you for the additional information about the inner workings - this sure does clarify a few things. It shows that in order to switch paradigms (local computing -> cloud computing) there is a lot to consider - not only how to parallelize an algorithm but also serialization and synchronization. It is these tiny little details that I was missing in all your examples - however it is these details that make real world problems really complicated. GridGain does a fairly good job in making the trivial things easy - and as you noted it probably makes the non-trivial stuff possible - but certainly not easy. This is not GridGain's fault, so adding a notice to these posts wouldn't even hurt.

Happy posting and marketing - best regards - Sebastian

Nikita Ivanov replied on Fri, 2010/10/15 - 11:52am in response to: Sebastian Mueller

Sebastian,

Your example is not distributed - but simply utilizes thread pool. I'm not sure what you are demonstrating here...

Best,
Nikita Ivanov.

Sebastian Mueller replied on Mon, 2010/10/18 - 4:12am in response to: Nikita Ivanov

>Your example is not distributed - but simply utilizes thread pool.

I know that - and I wrote that and gave reasons for that. It does show how to apply the Map Reduce approach (the one from functional programming, not Google's patented implementation - which your example does not use either AFAIK) to a problem by distributing the computation to multiple threads - thus it runs in parallel and uses the Map Reduce approach. Note that from the API point of view it could very well perform the same operation on a grid in the cloud or elsewhere. Just get a different implementation of the extension method (maybe GridGain for .net !). Your sample doesn't make it clear either how GridGain is implemented internally. You cannot tell from the source code whether it is distributed either. In fact that was my point - the false simplicity of the API example and that catchy marketing phrase... but I am beginning to repeat myself here...

>I'm not sure what you are demonstrating here...

  If you don't want to understand, then don't do it. If you cannot understand then I am sorry for you. But simply bashing a few of my statements and not responding to others doesn't get us anywhere. I'm signing off...

/Sebastian

Thomas Kern replied on Thu, 2012/09/06 - 10:58am

Sorry but I'm afraid I am new to GridGain. Where does this "forkjoin" method come from? I have tried to make this example work but I failed time after time.

The compiler keeps complaining "The method forkreduce(GridClosureCallMode, Collection>, GridReducer) is undefined for the type Grid"

Thank you!

http://www.java-tips.org 
