Václav is a programming enthusiast who's constantly seeking ways to make development more effective and enjoyable. He's particularly interested in server-side Java technologies, distributed systems, concurrency, agile methodologies, modern programming languages and DSLs.
He works for JetBrains as a senior software developer and a technology evangelist. He is also a board member of the JetBrains Academy. On the side, he's leading the GPars project, an opensource concurrency library, and investigates the domains of neural networks, evolutionary programming and data mining. You can check out his blog or follow him on twitter.
[dzone] Václav is a DZone Zone Leader and has posted 45 posts at DZone. View Full User Profile

Memoize Groovy functions with GPars

06.25.2010
| 8729 views |
  • submit to reddit

Once again I looked over the fence into the Clojure garden for inspiration and came back attracted by the memoize function, which, simply put, enables caching of function's return values. You call a function once - it will do the usual calculation. You call the function a second time - if the same argument values are used, the return value is retrieved from the internal transparent cache without starting the actual calculation. And so you get your result faster. You're trading-off memory for performance.

To see a concrete use-case, checkout out this brief example in Groovy using the experimental GPars memoize implementation:

GParsPool.withPool {
def urls = ['http://www.dzone.com', 'http://www.theserverside.com', 'http://www.infoq.com']
Closure download = {url ->
println "Downloading $url"
url.toURL().text.toUpperCase()
}
Closure cachingDownload = download.memoize()

println 'Groovy sites today: ' + urls.findAllParallel {url -> cachingDownload(url).contains('GROOVY')}
println 'Grails sites today: ' + urls.findAllParallel {url -> cachingDownload(url).contains('GRAILS')}
println 'Griffon sites today: ' + urls.findAllParallel {url -> cachingDownload(url).contains('GRIFFON')}
println 'Gradle sites today: ' + urls.findAllParallel {url -> cachingDownload(url).contains('GRADLE')}
println 'Concurrency sites today: ' + urls.findAllParallel {url -> cachingDownload(url).contains('CONCURRENCY')}
println 'GPars sites today: ' + urls.findAllParallel {url -> cachingDownload(url).contains('GPARS')}
}

Notice closures are enhanced inside the GParsPool.withPool() blocks with a memoize() function, which returns a new closure wrapping the original closure with a cache.
In the example we're calling the cachingDownload function in several places in the code, however, each unique url gets downloaded only once - the first time it is needed. The values are then cached and available for subsequent calls. And also to all threads, no matter which thread originally came first with a download request for the particular url and had to handle the actual calculation/download.
So, to wrap up, memoize shields a function by a cache of past return values.

No matter how simple the principle looks at first glance, read on to see pretty exciting consequences.

Taking off

Chances are pretty high you, being developers, have seen any of the many variants of Fibonacci number generators before. Here I show the least verbose and least performant, purely recursive, implementation:

Closure fib = {n -> n > 1 ? call(n - 1) + call(n - 2) : n}

It's not only inefficient, it's incredibly inefficient, exponentially, I would say. Try running the function to give you the 40th or so Fibonacci number and you're done for the day.

Now is the time, when your brain is about to start thinking about ways to optimize the algorithm to turn it from its unacceptable exponential complexity into a linear one. You may come up with solutions revolving around turning recursion into a cycle, bubling the n-1 and n-2 values up or keeping track of smaller so far calculated Fibonacci numbers somewhere for reuse. These are all good approaches, in which, in a sence, you're going to trade-off memory for peformance. But wait a moment, haven't I say earlier that trading-off memory for performance is something that memoize can do?

Check out the modified version of my Fibonacci generator, this time with linear complexity:

Closure fib
fib = {n -> n > 1 ? fib(n - 1) + fib(n - 2) : n}.memoize()

Pick any number and you get the result almost instantly. And subsequent calls to the fib function will leverage the values calculated by the earlier calls. The only change? The fib function is now memoized, meaning that it keeps a transparent cache of so far calculated values. The memoize concept allowed me to optimize my code without having to give up the beauty of the original recursive function.
Now, I'm sure the functional guys out there may laugh out loud at me, since this is something they've been doing for ages, but my functional-flavoured-imperative-mind got all excited seeing the elegance of the concept.

Cut down on fat

Since some may argue that we're usurping a bit too much memory in our calculation remembering all Fibonacci numbers up to the argument value, in the last optimization we'll make an attempt to be nice and only grab as much memory as absolutely necessary - enough to remember two Fibonacci numbers, which is the minimum required to avoid repetitive calculations:

Closure fib
fib = {n -> n > 1 ? fib(n - 1) + fib(n - 2) : n}.memoizeAtMost(2)

Well, that's it - linear time complexity, using only two extra memory positions to store intermediate values. I would have hard time trying to write a considerably more CPU and memory efficient imperative algorithm to compete with this clean recursive functional code.

Conclusion

Although I might be a bit of an exception, the functional aspect of Groovy appeals to me quite a lot. Closures, the ability to define partially defined functions, immutable data structures or memoized functions all belong to the family of exciting concepts in the language.

The good practical news - If you want to experiment yourself with memoize in Groovy, check out the GPars memoize samples, grab the recent GPars 0.11 snapshot and don't forget to send us your feedback. I'm especially curious to hear ideas on domains or algorithms where memoize could be helpful.

 

From http://www.jroller.com/vaclav/entry/memoize_groovy_functions_with_gpars

Published at DZone with permission of its author, Václav Pech.