
Devoxx 2012: Java 8 Lambda and Parallelism, Part 2

12.24.2012

Written by Stoyan Rachev

Overview

In Devoxx 2012: Java 8 Lambda and Parallelism, Part 1, I covered the basics of lambda and parallelism in Java 8 as presented at Devoxx in the session On the road to JDK 8: Lambda, parallel libraries, and more by Joe Darcy. In this post, I would like to cover two more Devoxx sessions on this topic:

  • Closures and Collections - the World After Eight by Maurice Naftalin
  • Fork / Join, lambda & parallel() : parallel computing made (too ?) easy by José Paumard

These two sessions, as well as the first one, took place on the same day, in the same room, one after the other, and together provided three different perspectives on lambdas, parallel collections, and parallelism in general in Java 8.

Closures and Collections - the World After Eight

In this session, Maurice Naftalin, a software architect with Incept5 and co-author of Java Generics and Collections, talked about the different world after Java 8, focusing in particular on the impact of lambdas on the way collections are handled with the introduction of new features such as stream-oriented processing, parallel views, and more.

Introduction

Maurice started by talking a bit about his background, his book, and his current project, Lambda FAQ, which is intended to become a major online learning resource for Project Lambda, and which is also a preparation for a second edition of his book, in which the collections part would need to be thoroughly revised.

Why is everyone so rude about us?

You wouldn't have to look very far on the Web to find resources that treat Java with irony and sarcasm, and the notion that "Java is the new Cobol" has become widespread. According to the speaker, it could be because we write code like this:

// check each deadlined Plan. If it can't be done in time for its deadline, return the Plan
// that caused the problem; otherwise, return null.
private Plan checkPlans(List<Plan> deadlinedPlans) {
    Plan problemPlan = null;
    Iterator<Plan> planItr = deadlinedPlans.iterator();
    while (planItr.hasNext() && problemPlan == null) {
        Plan plan = planItr.next();
        problemPlan = checkPlan(plan);
    }
    return problemPlan;
}

The above code iterates over a list of plans, calling the checkPlan method on each of them; if checkPlan returns a non-null result, it stops the iteration and returns the problem plan.

Wouldn't it be cool if we could instead write this:

private Optional<Plan> checkPlans(List<Plan> deadlinedPlans) {
    return deadlinedPlans.stream().
        map(p -> checkPlan(p)).
        filter(p -> p != null).
        findFirst();
}

The above code is written in a functional style. The list of plans is turned into a stream, each plan is mapped to the result of calling checkPlan on it, the resulting stream is filtered to keep only the non-null results, and finally the first one is returned. Once you learn to read it, the second version is much nicer than the first one, and it could be executed in parallel with only a little added effort.

The Magic Ingredient

The magic ingredient is something that allows us to take values to a higher order. Instead of supplying values to specific library methods, we are going to supply behavior to general library methods. Here, as is usual in functional programming, behavior is regarded as a higher-order kind of value. For example, let's compare the following Collection<E> methods:

boolean remove(Object o);
...
boolean removeIf(Predicate<? super E> p);

The second version is clearly more general, because we supply behavior instead of a value. Predicate is an interface with a single abstract boolean-valued method test, which is executed by removeIf on each element. Here, Predicate is a functional interface, and adding the new method removeIf to Collection is possible due to default methods, concepts that were already covered in my previous post.

In the past, to create an instance of such an interface to use with the above method, we would use an anonymous inner class. However, this is not really nice. If we strip out the boilerplate, we are left with only the name of the argument and the body of the method, which we can supply via a lambda expression:

plans.removeIf(p -> p.equals(problemPlan));
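
For contrast, here is a rough sketch of what the same call looks like with an anonymous inner class (assuming the Predicate interface with a single test method, as described above):

plans.removeIf(new Predicate<Plan>() {
    @Override
    public boolean test(Plan p) {
        return p.equals(problemPlan);
    }
});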

If our magic ingredient is passing behavior instead of values, we would want to do it a lot, and there is a big difference between creating lots of anonymous inner classes, which are a rather ugly and unwieldy construct, and doing it in this neat and tidy way, which is much more encouraging for a new style of programming.

Better APIs - Pipes and Filters

Lambdas, an essentially simple thing, can make a really big difference. With collections, we are going to see a pattern known as Pipes and Filters used a lot more. This is a venerable Unix tool-building pattern which has also seen significant adoption in other areas:

ps -ef | grep login | cut -c 50- | head

This pattern has a lot of advantages, which carry over to Java collections as well:

  • no intermediate variables
  • less (or no) intermediate storage
  • lazy evaluation, which provides big efficiency gains
  • flexible tool-building, following the Unix philosophy of "Write programs that do one thing well. Write programs to work together."

In Java, a pipeline such as this:

plans.stream().map(p -> checkPlan(p)).filter(p -> p != null).findFirst();

has a source, a number of intermediate operations (map, filter), and a terminal operation (findFirst), and the facility that carries the values from the source down the pipeline is called a stream (Stream). The map operation takes an instance of the new functional interface Mapper, the filter operation uses the already mentioned Predicate, and the terminal operations can also have lambdas embedded. Here, the idea of "writing programs that do one thing well" is well represented by both Mapper and Predicate, which are functional interfaces with a single method.

What exactly is a Stream? It is a sequence of values, which may be partially evaluated. Unlike collections, you don't actually know what it contains because it hasn't been generated yet. The execution of lazy operations such as map and filter sets the pipeline up:

Stream<Plan> planStream = plans.stream().map(p -> checkPlan(p)).filter(p -> p != null);

Streams are similar to iterators, which are another good example of lazy evaluation. When you create an iterator, no processing happens until the next method is called. Likewise, when the above pipeline is set up, all those streams are waiting to be kicked off, but no processing happens yet. What makes the processing happen is the execution of an eager operation such as findFirst, which pulls some or all of the data down the pipeline:

planStream.findFirst(); // stop after the first element

The advantage of lazy evaluation here is that we haven't actually done all the processing that would have produced all the other intermediate values, just exactly as much as needed for findFirst to return a result. Therefore, the above functional code is exactly as efficient as the original iteration-based code we started with.
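
A minimal self-contained sketch of this behavior, written against the stream API as it eventually shipped in Java 8 (the class name and sample data are made up for illustration):

import java.util.Arrays;
import java.util.Optional;
import java.util.stream.Stream;

public class LazyStreamDemo {
    public static void main(String[] args) {
        // Setting up the pipeline prints nothing: map and filter are lazy.
        Stream<Integer> pipeline = Arrays.asList(1, 2, 3, 4, 5).stream()
                .map(n -> { System.out.println("mapping " + n); return n * n; })
                .filter(n -> n > 1);

        // The eager terminal operation pulls only as much data as it needs:
        Optional<Integer> first = pipeline.findFirst();
        // prints "mapping 1", "mapping 2", and then "result: 4"
        System.out.println("result: " + first.get());
    }
}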

Here are some more examples of operations that can be done on streams:

// get the plans longer than 15 minutes
planStream.filter(p -> p.getDuration().isLongerThan(minutes(15)));
// get the total duration of all the plans in a list
planStream.map(p -> p.getDuration()).reduce(Duration.ZERO, (d1, d2) -> d1.plus(d2));
// each plan belongs to a task; map each task to a collection of its plans
planStream.groupBy(p -> p.getTask());
// get the first five plans
planStream.limit(5);
// get all but the first five plans
planStream.skip(5);

These and other collection methods are summarized in the table below:

name      | returns               | lazy/eager | interface         | lambda signature
filter    | Stream<T>             | lazy       | Predicate<T>      | T -> boolean
map       | Stream<U>             | lazy       | Mapper<T, U>      | T -> U
sorted    | Stream<T>             | lazy       | Comparator<T>     | (T, T) -> int
limit     | Stream<T>             | lazy       | n/a               | n/a
skip      | Stream<T>             | lazy       | n/a               | n/a
reduce    | T                     | eager      | BinaryOperator<T> | (T, T) -> T
findFirst | T                     | eager      | n/a               | n/a
groupBy   | Map<U, Collection<T>> | eager      | Mapper<T, U>      | T -> U
forEach   | void                  | eager      | Block<T>          | T -> void

Here, we are tapping into a big body of knowledge, mainly from functional programming. It gives us a different style of programming which is very widely applicable, so collections code is going to look really different. The value proposition is that we are going to get much better APIs, and our code will be easier to read and write, and more maintainable, once we are used to it. It will also be more parallel-friendly, because we will be relying less on mutability.

Better APIs - More Finely-Grained APIs

Lambdas also offer another way of making APIs better. With methods that accept and return behavior, it is, for example, much easier to create custom comparators:

// method in class Comparators
Comparator<T> comparing(Mapper<T, U> m);
// usage
Comparator<Plan> byTask = Comparators.comparing(p -> p.getTask());
Comparator<Plan> byDuration = Comparators.comparing(p -> p.getDuration());

The above code isolates the sort key in the call to comparing, rather than extracting it in the comparison code itself, which is much nicer, and also makes it much easier to compose fine-grained methods together:

plans.sort(byTask.compose(byDuration));

Composing comparators allows using the first sort key as primary and the second as secondary, in a very neat way. We could make it even neater by dispensing with the custom comparators in favor of method references, Plan::getTask and Plan::getDuration in this case.
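
A minimal sketch of that variant, using the Comparator.comparing and thenComparing methods that eventually shipped in Java 8 (and assuming the hypothetical Task and Duration types from the talk are Comparable):

// sort by task first, then by duration, supplying the sort keys as method references
plans.sort(Comparator.comparing(Plan::getTask)
                     .thenComparing(Plan::getDuration));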

More Parallelism

There are two kinds of iteration, external and internal. Both, as well as their advantages and disadvantages, have already been explained in my previous post. In summary, what we really need is internal iteration, in which we again supply behavior to general library methods:

plans.forEach(p -> p.setSection(null));

What is important here is that the library, being in control of the iteration, can do really complicated things, such as parallelism, out-of-order execution, laziness, etc.

With streams, you can get a parallel stream by calling parallel:

plans.parallel().filter(...).map(...).reduce(...);

What happens here is that the execution breaks the stream up into chunks, submits them to a fork/join pool, and sets up one of the pipelines mentioned earlier for each of them. The parallel processing can be further optimized at the VM level by loop fusion, which essentially boils down to executing all the stages of the pipeline within one big loop.

The mantra is that parallelism should be "explicit, but non-obtrusive". You have to ask for it, but then it will work without you having to do anything more.
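
As a point of reference, the same idea expressed with the method names that eventually shipped in Java 8 (Collection.parallelStream(); the Plan domain types are the hypothetical ones from the earlier examples) might look like this:

// parallelism is requested explicitly, but the pipeline itself stays unchanged
Optional<Plan> problem = plans.parallelStream()
        .map(p -> checkPlan(p))
        .filter(p -> p != null)
        .findFirst();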

Conclusion

All these new concepts fit well together. According to Maurice, the Oracle language team has done a really good job in introducing a pretty good version of closures into a language that really wasn't adapted for them, without breaking backward compatibility.

In conclusion, although lambdas may seem like a small syntactic change, they are going to make a big difference in the style of our code, and they set library writers free to innovate. They are also going to encourage a functional coding style, which in turn is going to encourage less mutability, resulting in more parallel-friendly code. And as the speaker pointed out, after Java 8 our Java code will become more "stylish and sophisticated", like the popular After Eight brand.

Fork / Join, lambda & parallel() : parallel computing made (too ?) easy

In the final session on this topic, José Paumard, an assistant professor at the Institut Galilée (Université Paris 13) and a long-time blogger at Java le soir, talked about the caveats of lambdas and parallel computing. The slides from this session are available here.

Introduction

With Java 8, we are going to have amazing tools to leverage the full power of parallel programming. However, there are also certain caveats and pitfalls we need to be aware of.

Java Toolbox for Parallelism

The Java toolbox for parallelism has evolved from nothing in JDK 5, through parallel arrays in JDK 6, and finally to the Fork / Join framework in JDK 7.

The Fork / Join framework handles tasks. A task carries a chunk of data and is able to spawn two or more subtasks, with the division strategy coded in the task itself. The main task can later join its subtasks. There are two main approaches:

  • With dyadic recursive division, the task divides itself in two if it's too big, and each of the subtasks can in turn divide itself in two, thus spawning a number of tasks of a suitable size, which are then also recursively joined.
  • With for-loop slicing, the task divides itself directly into a number of subtasks of the right size, which are then joined all at once.

According to the speaker, the second strategy has proven to be much more efficient than the first one for the set of problems he has worked on, even though the Javadoc and many of the available resources describe only the first one.
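
As an illustration of the first strategy, here is a minimal RecursiveTask sketch that sums a long array (the class name and threshold are made up for the example); note that it passes index bounds to its subtasks rather than sharing mutable state, in line with the caveat below:

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Dyadic recursive division: a task splits itself in two while it is "too big".
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // made-up packet size
    private final long[] data;
    private final int from, to;

    SumTask(long[] data, int from, int to) {
        this.data = data; this.from = from; this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {              // small enough: compute directly
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) / 2;                 // otherwise split in two,
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                               // run one half asynchronously,
        return right.compute() + left.join();      // and join the results
    }
}

// usage: long total = new ForkJoinPool().invoke(new SumTask(array, 0, array.length));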

A very important caveat of Fork / Join is the Javadoc sentence "Computations should avoid synchronized methods or blocks". The consequence is that a task can't share its state; instead, it must transmit part of that state to each subtask it spawns. When working on arrays, the transmitted portion of the array often has to be copied, which means additional memory consumption and degraded performance.

The Parallel Arrays API didn't make it into the JDK, but it is available as a separate package that is Java 6 compatible. Essentially, it brings the power of Fork / Join to Java 6.

Finally, Java 8 introduces the powerful new family of tools around lambdas already described in the previous two sessions. The parallel() method mentioned earlier essentially implements, in a nicer way, the familiar concept of a task that can split itself.

Parallel Computing Caveats

Unfortunately, going parallel is not just a matter of calling parallel(). There are at least the following three problems which remain to be solved:

  • Ensuring that the computation is correct
  • Ensuring that parallelism actually brings performance gains
  • Using algorithms that are parallelizable

An example of the first caveat, mentioned in a presentation by Guy Steele, is that the operation supplied to a reduce has to be associative, or the result of the computation will not be correct:

reduce(a, reduce(b, c)) = reduce(reduce(a, b), c)

For example, the operation a + b is associative, but (a + b)² is not. If the latter is supplied as a lambda expression to a reduce operation, it will produce wrong results, and neither the compiler nor the JVM can help us detect that.
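
A minimal sketch of this pitfall, written against the stream API as it eventually shipped in Java 8 (the class name and sample range are made up for illustration):

import java.util.stream.IntStream;

public class NonAssociativeReduce {
    public static void main(String[] args) {
        // Associative operation (a + b): sequential and parallel reductions agree.
        int seqSum = IntStream.rangeClosed(1, 5).reduce(0, (a, b) -> a + b);
        int parSum = IntStream.rangeClosed(1, 5).parallel().reduce(0, (a, b) -> a + b);
        System.out.println(seqSum + " vs " + parSum); // 15 vs 15

        // Non-associative operation (a + b)^2: the parallel result will typically
        // differ, and neither the compiler nor the JVM warns us about it.
        int seq = IntStream.rangeClosed(1, 5).reduce(0, (a, b) -> (a + b) * (a + b));
        int par = IntStream.rangeClosed(1, 5).parallel().reduce(0, (a, b) -> (a + b) * (a + b));
        System.out.println(seq + " vs " + par);
    }
}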

The second caveat is about performance, and one example that illustrates it is sorting. Sorting in parallel is tricky: one first divides the array into two or more sublists, then sorts each sublist with quicksort, and finally merges the sorted sublists as in mergesort. With this approach, the comparisons themselves are not computationally intensive, but there is a lot of data moving between the cores.

The following two tables show the times for sorting arrays of 16M and 32M elements with different packet sizes, using the two division strategies mentioned above, on an 8-core CPU.

16M elements
packet size | For loop | Divide by 2
4M          | 13       | 18
2M          | 12       | 20
1M          | 20       | 35

32M elements
packet size | For loop | Divide by 2
4M          | 13       | 47
2M          | 12       | 125
1M          | 20       | 211

Based on the above results, we could make the following conclusions:

  • The size of the packet should be "right", which depends on the total size and the number of processors.
  • The division strategy matters.

Interestingly, if we just do quicksort on the original array with no parallelism at all, the results are:

array size | Not parallel
16M        | 18
32M        | 36

The above results prove the second caveat: writing parallel code can be tricky, and does not always lead to performance improvements.

The third caveat is an even worse case. To illustrate it, José used the travelling salesman problem. This problem is known to be NP-complete, so instead of using an exact algorithm, we would use a heuristic approach such as simulated annealing to solve it. Since it's quite computationally intensive, one might think that going parallel would increase the performance.

However, as the speaker demoed and explained, the simulated annealing algorithm, which is guaranteed to converge to the right solution when executed sequentially, either finds a wrong solution or doesn't finish at all when executed in parallel. The reason is very simple: the algorithm is not parallelizable at all, as the theorems on which it is based don't hold in this case.

This brings us to the third caveat: some algorithms don't work anymore if they are parallelized. In the case of simulated annealing, it loses its convergence properties if computed in parallel.

Conclusion

In Java 8, parallelization is going to be easier than ever, and this is really great. But we still need to ask ourselves the following three questions, and we are on our own in answering them:

  • Will the result be correct?
  • Will it make my computations faster?
  • Will it even work?

To be able to answer these questions, the speaker advised spending more time learning how CPUs work, how to program complex algorithms on them, and finally the parallel versions of existing algorithms.

Finale

As I already mentioned in my previous post, from what I have learned so far I find lambdas, the functional style, and the easier parallelism in Java 8 to be really wonderful. However, the performance tests in Wordcounter also confirmed that we should not take all the benefits for granted. Instead, as developers, we should learn how to properly take advantage of these new opportunities. As usual in our craft, this simply means that more interesting challenges are yet to come.


Published at DZone with permission of its author, Stoyan Rachev.
