Performance Zone is brought to you in partnership with:

Pierre-Yves Saumont has managed his own company for 15 years, specializing in natural language processing. At the same time, he has been the publishing manager for the French subsidiary of a leading American computer books publisher. He wrote about 30 books about computers and software development in Java. Since 2008, he works for Alcatel-Lucent Submarine Networks as an R&D Software engineer, and he is responsible for the architecture of several projects such as distributed application framework, functional framework and functional DSML (Domain Specific Modeling Language). He is also a double bass Jazz player. Pierre-yves has posted 8 posts at DZone. You can read more from them at their website. View Full User Profile

What's wrong in Java 8, part VII: Streams again

06.30.2014
| 5176 views |
  • submit to reddit

In part III of this series, I wrote about many issues with parallel streams. The main problem was that the most important feature advertised about Streams was automatic parallelization. And the fact is that, as spectacular as this feature may be, there are very little use cases for it in real business development. The counterpart is that much effort has been put in parallel streams, and not enough in streams, which are missing many important features for functional programming.

Streams use cases in functional programming

There are many important use cases for streams, and one is a replacement for lazy constructs such as for and while loops. But why should we need a replacement for loops?

One important principle in Java programming is that things defined in a scope are accessible in enclosed scopes (unless they are masked)

For example, in a method, all other methods and all fields declared in the enclosing class are accessible. In the same way, in a for loop, it is possible to access all class members, and all public static members of other classes.

In the following example:

for(int i = 0; i < 10; i++) {
  System.out.println(i);
}

we are accessing the println method, which is defined in another class, from inside the loop scope. In the following example:

//
//
List<Integer> list = new ArrayList<>();
for(int i = 0; i < 10; i++) {
  list.add(i);
}
list.forEach(System.out::println);

the list field is declared outside the loop (the enclosing scope) and is accessed from inside.

In this case, it is less of a problem because it only accesses the immediate enclosing scope. However, to a certain extend, it is worst since it mutates it.

The problem with accessing the enclosing scope is that the loop may not be reused anywhere. Functional programming brings a solution:

The functional equivalent for this code is:

IntStream.range(0, 10).forEach(System.out::println);

Here, the resource from the enclosing scope is passed as a parameter. This is much cleaner, and reduces the cyclomatic complexity of the code, making it much less error prone and much more easy to maintain. The “loop” equivalent has been abstracted.

The for each Java constructs has also a very simple functional equivalent. This iterative example:

//
//
List<Integer> integerList = Arrays.asList(1, 2, 3, 4, 5);
List<String> stringList = new ArrayList<>();
for (Integer value : integerList) {
  stringList.add(“Value = “ + value);
}

may be replaced with:

//
//
List<Integer> integerList = Arrays.asList(1, 2, 3, 4, 5);
integerList.map(x -> "Value = " + x);

Although these examples are stupid and useless, they show how functional programming with streams might remove many control structures from the code.

There are of course many other form of loops, for example:

//
//
for (int i = 1; i < limit; i += 2) {
  ...
}

This loop will iterate over all odd number under limit. And the following is a potentially infinite loop over even numbers:

for (int i = 0;; i += 2) {
  ...
}

Streams may be used to replace some of these constructs, making the code much cleaner. Unfortunately, two very important features are missing for this kind of use in Java 8: specifying the step, and limiting a stream with a predicate.

Limiting a stream with a predicate

Let's look at the following iterative example and see how we can rewrite it using the functional paradigm:

//
//
public static boolean isPrime(final int n) {
  if (n < 2) return true;
  for(int i = 2; i * i <= n; i++) {
    if(n % i == 0) {
      return false;
    }
  }
  return true;
}

This (very inefficient) code determines if an integer is a prime number.

This code may be rewritten using a functional approach:

//
//
public static IntPredicate isPrime = n -> n < 4 || !(n % 2 == 0 ||
  IntStream.range(2, (int) Math.sqrt(n) + 1)
      .filter(x -> x % 2 != 0 && n % x == 0)
      .count() != 0);

This implementation gives the same result. It may however be optimized:

//
//
public static IntPredicate isPrime = n -> n < 4 || !(n % 2 == 0 ||
  IntStream.range(2, (int) Math.sqrt(n) + 1)
      .filter(x -> x % 2 != 0 && n % x == 0)
      .findAny()
      .isPresent());

Replacing .count() with .findAny().isPresent() allows the same code to run in 663 ms instead of 2 760ms for finding all primes between 1 and 1 000 000. This is much better, due to the fact that findAny() will stop the stream evaluation as soon as a value is found, as opposed to count() which needs to evaluate the total stream before returning a value. An even faster solution is to use anyMatch():

//
//
public static IntPredicate isPrime = n -> n < 4 || !(n % 2 == 0 ||
  IntStream.range(2, (int) Math.sqrt(n) + 1)
      .filter(x -> x % 2 != 0 && n % x == 0)
      .anyMatch(x -> true);

This is about 10% faster, but it is less elegant since .findAny().isPresent() better express that the stream contains at least one element.

Note that we are having a problem with the way we limit the stream. In the iterative solution, the limit is the predicate:

i * i <= n

In many functional languages, we would limit the stream with something like:

//
//
public static IntPredicate isPrime4 = n -> n < 4 || !(n % 2 == 0 ||
  IntStream.range(2, n)
      .takeWhile(x -> x * x <= n)
      .filter(x -> x % 2 != 0 && n % x == 0)
      .findAny()
      .isPresent());

But this is not possible in Java 8, because IntStream has no takeWhile nor equivalent method. We must then use an ugly trick using the square root of n.

Another solution would be to use the iterate method to create the stream:

//
IntStream.iterate(2, x -> x + 1)

but this does not work, since we may only limit the stream by specifying the length (i.e. the number of elements to take before stopping) and not a predicate. In particular, filter() will not work because although it will discard elements, it will not stop evaluation. To see the result, just try:

//
//
IntStream.iterate(2, x -> x + 1).filter(x -> x < 10).forEach(System.out::println);

This will print the numbers from 1 to 9, then stop for a while, then print:

-2147483648
-2147483647
-2147483646
-2147483645
-2147483644
-2147483643
-2147483642
-2147483641
    ...

This is because the stream is infinite and the value produced by the function will eventually overflow and give a negative result which will satisfy the predicate. Using the following condition:

//
//
IntStream.iterate(2, x -> x + 1).filter(x -> x >= 0 && x < 10).forEach(System.out::println);

will not solve the problem.

(We might used the Math.addExact method, but this would throw an exception instead of overflowing, which is better, but not what we need.)

We really miss a takeWhile method here. We can simulate it with the following code:

// This code is extracted from the
// IntStream class and slighty modified
public static IntStream iterate(final int seed, final IntUnaryOperator f, IntPredicate p) { // change here
  Objects.requireNonNull(f);
  final PrimitiveIterator.OfInt iterator = new PrimitiveIterator.OfInt() {
    int t = seed;

    @Override
    public boolean hasNext() {
      return p.test(t); // change here
    }

    @Override
    public int nextInt() {
      int v = t;
      t = f.applyAsInt(t);
      return v;
    }
  };
  return StreamSupport.intStream(Spliterators.spliteratorUnknownSize(
      iterator,
      Spliterator.ORDERED | Spliterator.IMMUTABLE | Spliterator.NONNULL), false);
}

This code may be used as follows:

//
//
public static IntPredicate isPrime = n -> n < 4 || !(n % 2 == 0 ||
  iterate(2, x -> x + 1, x-> x * x <= n)
      .filter(x -> x % 2 != 0 && n % x == 0)
      .findAny()
      .isPresent());

Obviously, this possibility should have been included either in the construction of the stream, or as a takeWhile method. However, this might have caused problems with parallel streams, and this may be why it has not been done. If this is the reason, it is really a bad choice.

Specifying a step

The iterative version of the prime finder may be optimize as follows:

//
//
public static boolean isPrime(final int n) {
  if (n < 4) return true;
  if (n % 2 == 0) return false;
  for(int i = 3; i * i <= n; i += 2) {
    if(n % i==0) {
      return false;
    }
  }
  return true;
}

The optimization consist in using a step of 2, since we only want to test odd numbers. Is this possible with the functional idiom? We can't use:

//
IntStream.iterate(2, x -> x + 2)

Since we would not be able to limit the stream with a predicate, and we do not know the length of the stream, so we cannot use limit().

We could however use the iterate method we have created:

//
//
public static IntPredicate isPrime = n -> n < 4 || !(n % 2 == 0 ||
  iterate(3, x -> x + 2, x-> x * x <= n)
       .filter(x -> n % x == 0)
       .findAny()
       .isPresent());

Another solution solution would be the following:

//
//
public static IntPredicate isPrime = n -> n < 4 || !(n % 2 == 0 ||
  IntStream.range(1, (int) Math.sqrt(n) / 2 + 1)
      .map(x -> x * 2 + 1)
      .filter(x -> n % x == 0)
      .findAny()
      .isPresent());

This version creates a stream with a step of one and then maps it with the function x -> x * 2 + 1.

We could also create a stream with a step of 1 and then filter out even values:

//
//
public static IntPredicate isPrime = n -> n < 4 || !(n % 2 == 0 ||
  IntStream.range(2, (int) Math.sqrt(n) + 1)
      .filter(x -> x % 2 != 0 && n % x == 0)
      .findAny()
      .isPresent());

Or we could use a modified (again) version of the RangeIntSpliterator class:

//
//
static final class RangeIntSpliterator implements Spliterator.OfInt {

  private int from;
  private final int upTo;
  private final int step;
  private int last;

  RangeIntSpliterator(int from, int upTo, int step, boolean closed) { // change
    this(from, upTo, step, closed ? 1 : 0); // change
  }

  private RangeIntSpliterator(int from, int upTo, int step, int last) { // change
    this.from = from;
    this.upTo = upTo;
    this.step = Math.max(1, step); // added
    this.last = last;
  }

  @Override
  public boolean tryAdvance(IntConsumer consumer) {
    Objects.requireNonNull(consumer);

    final int i = from;
    if (upTo - i >= step) {
      from += step; // change
      consumer.accept(i);
      return true;
    }
    else if (last > 0) {
      last = 0;
      consumer.accept(i);
      return true;
    }
    return false;
  }

  @Override
  public void forEachRemaining(IntConsumer consumer) {
    Objects.requireNonNull(consumer);

    int i = from;
    final int hUpTo = upTo;
    int hLast = last;
    from = upTo;
    last = 0;
    while (i < hUpTo) {
      consumer.accept(i);
      i += step; // change
    }
    if (hLast > 0) {
      // Last element of closed range
      consumer.accept(i);
    }
  }

  @Override
  public long estimateSize() {
    // Ensure ranges of size > Integer.MAX_VALUE report the correct size
    return (((long) upTo) - from + last) / step;
  }

  @Override
  public int characteristics() {
    return Spliterator.ORDERED | Spliterator.SIZED | Spliterator.SUBSIZED |
        Spliterator.IMMUTABLE | Spliterator.NONNULL |
        Spliterator.DISTINCT | Spliterator.SORTED;
  }

  @Override
  public Comparator<? super Integer> getComparator() {
    return null;
  }

  @Override
  public Spliterator.OfInt trySplit() {
    long size = estimateSize();
    return size <= 1
        ? null
        // Left split always has a half-open range
        : new RangeIntSpliterator(from, from = from + splitPoint(size), step, 0);
  }

  private static final int BALANCED_SPLIT_THRESHOLD = 1 << 24;

  private static final int RIGHT_BALANCED_SPLIT_RATIO = 1 << 3;

  private int splitPoint(long size) {
    int d = (size < BALANCED_SPLIT_THRESHOLD) ? 2 : RIGHT_BALANCED_SPLIT_RATIO;
    return (int) (size / d);
  }
}

This class may be used with the following method (extracted from the IntStream class and slightly modified:

//
//
public static IntStream rangeStep(int startInclusive, int endExclusive, int step) { // change
  if (startInclusive >= endExclusive) {
    return IntStream.empty();
  } else {
  return StreamSupport.intStream(
    new RangeIntSpliterator(startInclusive, endExclusive, step, false), false); // change
  }
}

It may then be used as:

//
//
public static IntPredicate isPrime = n -> n < 4 || !(n % 2 == 0 ||
    rangeStep(3, (int) Math.sqrt(n) + 2, 2)
         .filter(x -> n % x == 0)
         .findAny()
         .isPresent());

Benchmark

Functional versions are in any case slower than imperative ones. Here are the performance of each solution for finding the 78 499 prime numbers in the range 1 to 1 000 000. All tests have been run several times before measuring time, in order to warm the compiler. (This is very important. Not doing so may give funny results.)

//
//
Non optimized iterative: 78499 primes in 307 ms.
Optimized iterative: 78499 primes in 160 ms.
Functional with step 1 and mapping with x -> x * 2 + 1: 78499 primes in 763 ms.
Functional with step 2 limited by a predicate: 78499 primes in 746 ms.
Functional idem, tested with anyMatch: 78499 primes in 653 ms.
Functional with step 1 and filtering even values: 78499 primes in 1186 ms.
Functional with iterate, step 2 and limit with a predicate: 78499 primes in 760 ms.

Conclusion

What this shows is that iterative versions always out perform functional ones, and all functional versions are pretty equivalent.

The advantage of functional versions are a very low cyclomatic complexity and the fact that the code shows what is intended rather than how it is implemented (since the real implementation is hidden in the libraries used).

What is missing (beside better performance) is a way to even better express intentions. In other words, for the prime example, we should be able to write:

//
//
public static IntPredicate isPrime = n -> n < 4 || (n % 2 != 0 &&
  IntStream.iterate(3, x -> x + 2, x-> x * x <= n)
      .filter(x -> n % x == 0)
      .isEmpty());

or:

//
//
public static IntPredicate isPrime2 = n -> n < 4 || (n % 2 != 0 &&
  IntStream.range(3, (int) Math.sqrt(n) + 2, 2)
      .filter(x -> n % x == 0)
      .isEmpty());

There are also other very useful methods that are missing in the Stream class. One is zip (and the opposite unzip) which takes two streams and return a stream of tuples (but tuples are missing too in Java 8!).

Previous articles

What's Wrong in Java 8, Part I: Currying vs Closures

What's Wrong in Java 8, Part II: Functions & Primitives

What's Wrong in Java 8, Part III: Streams & Parallel Streams

What's Wrong in Java 8, Part IV: Monads

What's Wrong in Java 8, Part V: Tuples

What's wrong in Java 8, part VI: Strictness

Published at DZone with permission of its author, Pierre-yves Saumont.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Honey Monster replied on Mon, 2014/06/30 - 10:33am

I am really enjoying this series. Please keep it coming!

I am curious: You mention that the lack of "takeWhile" may be down to performance considerations for parallel streams. As you are probably aware, C# LINQ includes a "TakeWhile" method. Are you aware of any negative impact performance of parallel sequences in the .NET implementation?

Pierre-yves Saumont replied on Mon, 2014/06/30 - 3:43pm in response to: Honey Monster

Thank you for your kind comment!

I don't know about any specific negative impact performance in .Net implementation. What I wanted to say is that functions operating on ordered streams are more difficult to parallelize. Or at least, they do not benefit from the same techniques as functions that do not depend on order.

For example, findFirst does not benefit from automatic parallelization as it is implemented in Java 8. That is why there is also a findAny method. (Of course, findAny may not always be used as a replacement for findFirst, but when it may, it gives much better performance.)

takeWhile is somewhat like findFirst, except that it keeps all preceding values instead of returning only one value, so the findAny approach will not work. It is however possible to benefit from parallelization, but it is slightly more complex to collate the result, and it does not work well with the Java 8 approach.

As I show in the article, it is very easy to implement takeWhile on a sequential stream. It is much more difficult on a parallel stream. My guess is that it was not implemented in Java 8 because it would have stress some weakness of the parallel stream implementation.

Valery Silaev replied on Sun, 2014/07/06 - 5:22pm

Really interesting and educational artice! And the complete column!

Do you plan to share your opinion about Java version of Promise monad - CompletableFuture? From what I've tested so far it's another good target for your critique:

1. No way to wrap Future into CompletableFuture

2. Async NIO is not updated for CompletableFuture, combined with the first limitation this totally left async IO apart from new async programming enhancements.

3. It's hard to use CompletableFuture with frameworks that store security/transactional/etc context in ThreadLocals  - the thread that starts async operation and the thread that handles completion are rarely the same, and I see no way how to propagate this context from initiating thread to completion handler thread in a general way...

It would be really interesting to hear what you think about these shortcomings and your own findings about the subject.

Thank you again for your valuable posts.

Pierre-yves Saumont replied on Tue, 2014/07/08 - 7:44am in response to: Valery Silaev

 Hi Valery,

Thank you for your kind comment. I am currently rewriting in Java a Scala prototype application based upon asynchronous streams. (Java is a requirement, not a personal choice.) Using CompletableFuture may be an option. Depending upon the results of this experience, I will perhaps write something about this subject. There are however lot of other subjects to write about, such as the lack of value types, the lack of immutable collections, or the lack of recursion!

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.