Do it in Java 8: Automatic memoization

Sep. 30, 14 · Tutorial

Likes (19)

Comment

Save

68.2K Views

Memoization is a technique used to speed up functions. Memoization may be done manually. It may also be done automatically. We can find many examples of automatic memoization on Internet. In this article, I will show how Java 8 makes it very easy to memoize functions.

What is memoization

Memoization consist in caching the results of functions in order to speed them up when they are called several times with the same argument. The first call implies computing and storing the result in memory before returning it. Subsequent calls with the same parameter imply only fetching the previously stored value and returning it.

How to apply memoization

Memoization may be applied manually by hard coding it in every function that may benefit from it. If it takes a long time to compute the return value, memoization will speed up the program. For functions that take less time to evaluate than fetching the previously stored value from memory, memoization is clearly not a good option.

Hard coding memoization by hand in each function is not a good option neither because it is repeating the same principle again and again. That is why automatic memoization is desirable.

What to memoize

Memoization applies to functions. Prior to Java 8, Java had no functions. However, it was perfectly possible to define some. Furthermore, we used to create “functional” methods, that is methods taking an argument and returning a value based only upon this argument. These kind of method may benefit from memoization. By the way, there is a match between functional methods and functions. For example, the following method:

Integer doubleValue(Integer x) {
  return x * 2;
}

corresponds to:

Integer doubleValue(Integer x) {
  if (cache.containsKey(x)) {
    return cache.get(x);
  } else {
    Integer result = x * 2;
    cache.put(x, result) ;
    return result;
  }
}

In Java 8, we can make this much cleaner:

Map<Integer, Integer> cache = new ConcurrentHashMap<>();

Integer doubleValue(Integer x) {
  return cache.computeIfAbsent(x, y -> y * 2);
}

Our function may be modified to use the same technique:

Function<Integer, Integer> doubleValue = x -> cache.computeIfAbsent(x, y -> y * 2);

This is pretty simple, but it has two main drawbacks:

We have to repeat this modification for all functions.
The map we use is exposed and could potentially be modified by another thread having nothing to do with the function.

The second problem is quite easy to address. We may put the method or the function in a separate class, including the map, with private access. For example, for the method case:

public class Doubler {
  private static Map<Integer, Integer> cache = new ConcurrentHashMap<>();

  public static Integer doubleValue(Integer x) {
    return cache.computeIfAbsent(x, y -> y * 2);
  }
}

We may then instantiate that class and use it each time we want to compute a value:

Integer y = Doubler.doubleValue(x);

With this solution, the map is no longer accessible from outside.

We can't do the same for functions because functions are anonymous classes and such classes may not have static members. One possibility would be to pass the map to the function as an additional argument. This may be done through a closure:

class Doubler {
  private static Map<Integer, Integer> cache = new ConcurrentHashMap<>();

  public static Function<Integer, Integer> doubleValue = x->cache.computeIfAbsent(x, y -> y * 2);
}

We can use this function as follows:

Integer y = Doubler.doubleValue.apply(x);

This gives no advantage compared to the “method” solution. However, we may also use this function in more idiomatic examples, such as:

IntStream.range(1, 10).boxed().map(Doubler.doubleValue);

This is equivalent to using the method version with the following syntax:

IntStream.range(1, 10).boxed().map(ThisClass::doubleValue);

The main problem is that while solving the second issue, we have made the first one more acute, which make automatic memoization more desirable.

Automatic memoization: requirements

What we need is a way to do the following:

Function<Integer, Integer> f = x -> x * 2;
Function<Integer, Integer> g = Memoizer.memoize(f);

so that we may use the memoized function as a drop in replacement for the original one. All values returned by function g will be calculated through the original function f the first time, and returned from the cache for all subsequent accesses. By contrast, if we create a third function:

Function<Integer, Integer> f = x -> x * 2;
Function<Integer, Integer> g = Memoizer.memoize(f);
Function<Integer, Integer> h = Memoizer.memoize(f);

the values cached by g will not be returned by h. In other words, g and h will use separate caches.

Implementation

The Memoizer class is quite simple:

public class Memoizer<T, U> {

  private final Map<T, U> cache = new ConcurrentHashMap<>();

  private Memoizer() {}

  private Function<T, U> doMemoize(final Function<T, U> function) {
    return input -> cache.computeIfAbsent(input, function::apply);
  }

  public static <T, U> Function<T, U> memoize(final Function<T, U> function) {
    return new Memoizer<T, U>().doMemoize(function);
  }
}

Using this class is also extremely simple:

Integer longCalculation(Integer x) {
  try {
    Thread.sleep(1_000);
  } catch (InterruptedException ignored) {
  }
  return x * 2;
}
Function<Integer, Integer> f = this::longCalculation;
Function<Integer, Integer> g = Memoizer.memoize(f);

public void automaticMemoizationExample() {
  long startTime = System.currentTimeMillis();
  Integer result1 = g.apply(1);
  long time1 = System.currentTimeMillis() - startTime;
  startTime = System.currentTimeMillis();
  Integer result2 = g.apply(1);
  long time2 = System.currentTimeMillis() - startTime;
  System.out.println(result1);
  System.out.println(result2);
  System.out.println(time1);
  System.out.println(time2);
}

Running the automaticMemoizationExample method will produce the following result:

We can now make memoized function out of ordinary ones by just calling a single method!

What about functions with several arguments?

Short answer: nothing. There are no such things in this world as functions with several arguments. Functions are applications of one set (the source set) to another set (the target set). So, they simply can't have several arguments.

But this does not solve our problem. What is the functional equivalent to a method with several arguments?

Long answer: what people generally consider as functions with several arguments are in fact either:

Functions of tuples
Function returning functions returning functions … returning a result

In either cases, we are only concerned with functions of one argument, so we can easily use our Memoizer class.

Using functions of tuples would probably be the simplest choice... if Java had tuples! We could of course write tuples. But to store tuples in maps, we would have to implement equals and hashcode for them, plus we would have to define tuples for two elements (pairs), tuple for three elements, and so on. Who knows where to stop?

The second option is much easier. It is based upon currying, which means applying each argument one after the other instead of applying them as a whole (the tuple).

Currying a function is very easy. The only problem, in Java 8, is that writing the types is really cumbersome.

Currying a “function of two arguments” (in fact a function of a pair) is easy once you master the type. Java has in fact a shortcut for functions of tuple2 which is called BiFunction. We will take this as an example. The two following functions are equivalent (from the result point of view):

BiFunction<Integer, Integer, Integer> h = (x, y) -> x + y;
Function<Integer, Function<Integer, Integer>> hc = x -> y -> x + y;

Not considering the types, there are very little differences. In the first case, the two arguments are put between parentheses, separated by a comma, which is, by the way, how tuples are written in most languages which have them!

Remove the parentheses and separate the arguments with an arrow and you get the curried version.

We can only regret that we have to write the type as:

Function<Integer, Function<Integer, Integer>>

when other languages use a simplified syntax such as:

Integer -> Integer -> Integer

From this, it is easy to memoized this curried version, although we cant use the same simple form as previously. We have to memoize each function:

Function<Integer, Function<Integer, Integer>> mhc = 
        Memoizer.memoize(x -> Memoizer.memoize(y -> x + y));

Same thing for a function of (a tuple of) 3 arguments (which by the way has no equivalent in Java):

Function<Integer, Function<Integer, Function<Integer, Integer>>> f3 =
         x -> y -> z -> x + y - z;
Function<Integer, Function<Integer, Function<Integer, Integer>>> f3m = 
        Memoizer.memoize(x -> Memoizer.memoize(y -> Memoizer.memoize(z -> x + y – z));

Here is an example of using this memoized function “of three arguments”:

Function<Integer, Function<Integer, Function<Integer, Integer>>> f3 =
      x -> y -> z -> longCalculation(x) + longCalculation(y) - longCalculation(z);
  Function<Integer, Function<Integer, Function<Integer, Integer>>> f3m =
      Memoizer.memoize(x -> Memoizer.memoize(y -> Memoizer.memoize(z ->
          longCalculation(x) + longCalculation(y) - longCalculation(z))));

  public void automaticMemoizationExample2() {
    long startTime = System.currentTimeMillis();
    Integer result1 = f3m.apply(2).apply(3).apply(4);
    long time1 = System.currentTimeMillis() - startTime;
    startTime = System.currentTimeMillis();
    Integer result2 = f3m.apply(2).apply(3).apply(4);
    long time2 = System.currentTimeMillis() - startTime;
    System.out.println(result1);
    System.out.println(result2);
    System.out.println(time1);
    System.out.println(time2);
  }

This example produces the following output:

showing that the first access to method longCalculation has taken 3000 milliseconds and the second has return immediately.

On the other hand, using a function of tuple may seem easier once you have the Tuple class defined. Here is an example of Tuple3:

public class Tuple3<T, U, V> {

  public final T _1;
  public final U _2;
  public final V _3;

  public Tuple3(T t, U u, V v) {
    _1 = Objects.requireNonNull(t);
    _2 = Objects.requireNonNull(u);
    _3 = Objects.requireNonNull(v);
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof Tuple3)) return false;
    else {
      Tuple3 that = (Tuple3) o;
      return _1.equals(that._1) && _2.equals(that._2) && _3.equals(that._3);
    }
  }

  @Override
  public int hashCode() {
    return _1.hashCode() + _2.hashCode() + _3.hashCode();
  }
}

Using this class, we may rewrite the previous example as:

Function<Tuple3<Integer, Integer, Integer>, Integer> ft = x -> longCalculation(x._1) + longCalculation(x._2) - longCalculation(x._3);
Function<Tuple3<Integer, Integer, Integer>, Integer> ftm = Memoizer.memoize(ft);

public void automaticMemoizationExample3() {
  long startTime = System.currentTimeMillis();
  Integer result1 = ftm.apply(new Tuple3<>(2, 3, 4));
  long time1 = System.currentTimeMillis() - startTime;
  startTime = System.currentTimeMillis();
  Integer result2 = ftm.apply(new Tuple3<>(2, 3, 4));
  long time2 = System.currentTimeMillis() - startTime;
  System.out.println(result1);
  System.out.println(result2);
  System.out.println(time1);
  System.out.println(time2);
}

Conclusion

Memoizing is about maintaining state between function calls. A memoized function is a function which behavior is dependent upon the current state. However, it will always return the same value for the same argument. Only the time needed to return the value will be different. So the memoized function is still a pure function if the original function is pure.

However, there is a kind of function that may pose a problem: recursive functions that call themselves several times with the same argument may not be memoized this way. This will be addressed in a next article.

Memoization Tuple

Opinions expressed by DZone contributors are their own.

Related

Trending