How come that Groovy++ overperform Java?
This article continues my writings on statically typed groovy. More information on motivations and goals of the project can be found in my previous articles published on DZone.
To make a long story short "statically typed Groovy" (aka Static Groovy, aka Groovy Booster, aka Groovy++) is an atempt to combine performace and compile-time checks with the expressiveness of the Groovy programming language and to develop high-performant runtime libraries utilizing this combination.
I started the project as a research/hobby, but due to huge interest in the community my company MB Technologies decided to open source it and sponsor development.
Let me start with a short update of what happened since the last post:
- Public preview of statically typed Groovy is available now at Google Code
- There is discussion group at Google Groups
- The project is not open-sourced yet but announced to be open-sourced soon
Now let's go to our main subject.
We will deal with some non-trivial benchmarks and provide several different implementions. The purpose of our exercise will be to see how the power of expressive langauge and statically typed compilations play together.
The benchmark under investigation is counting word frequency of over 20000 text files of 4-20K each. The benchmark was initiated in blog posts of Zach Cox and continued by Lau B. Jensen.
All code below is available now at Google Code.
Let us start with a pure Groovy solution. I want to use the opportunity and express my gratitude to Paul King, who brought this story to Groovy++ discussion group, provided pure Groovy implementation below and helped a lot with benchmarking.
package org.mbte.groovypp.examples.wordcount
for (i in 0..<10) {
t1 = System.currentTimeMillis()
counts = [:]
new File("./20_newsgroups").eachFileRecurse{ f ->
if (!f.isDirectory() && !f.path.contains(".svn")) {
f.text.toLowerCase().eachMatch(/\w+/) { w ->
def c = counts[w] ?: 0
counts[w] = c + 1 } } }
new File("counts-descreasing-groovy").withWriter { out ->
counts.sort { a, b -> b.value <=> a.value }.each { k, v -> out << "$k\t$v\n" } }
new File("counts-alphabetical-groovy").withWriter { out ->
counts.sort { it.key }.each { k, v -> out << "$k\t$v\n" } }
println "Finished in ${System.currentTimeMillis() - t1} millis"
}
First of all, I think it's nice. Short and clear. Secondly, it does the job, which is always important.
The only problem is it is a little bit slow. On average it takes 51.1sec to process files. Here I should probably say that all benchmarks run on 2 core Mac Book Pro with "-Xmx512m -server". To calculate the average, I take 10 measurements done by the script, drop off highest and lowest number and average the rest.
Let us now see statically typed solution:
@Typed package org.mbte.groovypp.examples.wordcount
for (i in 0..<10) {
def t1 = System.currentTimeMillis()
def counts = [:]
new File("./20_newsgroups").eachFileRecurse{ f ->
if (!f.isDirectory() && !f.path.contains(".svn")) {
f.text.toLowerCase().eachMatch(/\w+/) { w ->
def c = counts[w] ?: 0
counts[w] = c + 1 } } }
new File("counts-descreasing-groovy").withWriter { out ->
counts.sort { a, b -> b.value <=> a.value }.each { k, v -> out << "$k\t$v\n" } }
new File("counts-alphabetical-groovy").withWriter { out ->
counts.sort { it.key }.each { k, v -> out << "$k\t$v\n" } }
println "Finished in ${System.currentTimeMillis() - t1} millis"
}
Truly speaking, the difference with previous script is minor. We just added @Typed annotation in the line 1 and def-keyword in lines 4 and 5.
Amazingly, it was enough to achieve 6 times performance gain. Yes, the statically types script averagely process all our files in 8.6 seconds
Now is a good time to try to utilize multiple cores we have. By the nature of the problem there are two groups of activities and each of them can be run in parallel - accumulation of statistic and generating reports. Let us see what can we do with Groovy++.
Here is the code.
@Typed package org.mbte.groovypp.examples.wordcount
import java.util.concurrent.*
import groovy.util.concurrent.*
for (i in 0..<10) {
def t1 = System.currentTimeMillis()
def pool = Executors.newFixedThreadPool(30)
def counts = new AtomicIntegerMap ()
new File("./20_newsgroups").recurseFileIterator().filter{ file ->
!file.directory && !file.path.contains(".svn")
}.each(pool,4) { file ->
file.charSequence.eachMatch(/\w+/) { String w -> w = w.toLowerCase()
counts[w].incrementAndGet ()
}
}.whenBound {
pool.execute {
new File("counts-descreasing-groovy").withWriter { Writer out ->
counts.asList().sort { a, b -> b.get() <=> a.get() }.each { e -> out << "$e.key\t${e.get()}\n" }
}
} {
new File("counts-alphabetical-groovy").withWriter { Writer out ->
counts.asList().sort { a, b -> b.key <=> a.key }.each { e -> out << "$e.key\t${e.get()}\n" }
}
}
pool.shutdown()
}
pool.awaitTermination(30,TimeUnit.SECONDS)
println "Finished in ${System.currentTimeMillis() - t1} millis"
}
Obviously, it is more lengthy. Fortunately, it is much faster - averagely it processes files in 5.5 seconds.
Here is brief explaination of tools used. We create a thread pool, start with recursive iterator over our files, apply filter to skip unneeded directories and files and then start concurrent iteration of filtered iterator using our thread pool. It is done with each(pool, 4){...} call, which returns immiditely something called BindLater.
BindLater is specialized and high performance version of java.util.concurrent.Future, which also supports adding listeners of calculation completion. We add such listener with whenBound{...} call. When calculation complete the listener starts concurrent execution of report generation. Voila.
Last important thing to notice is AtomicIntegerMap used for accumulation results. AtomicIntegerMap is specialized and high performant concurrent map of AtomicIntegers. It based on the same implementation, which in the past allowed us to noticably increase performance of normal Groovy runtime. Standard library of static Groovy also provide Atomic(Boolean|Long|Reference)Map, which are handy in many situations.
Obviously, there is more place for improvement. Two things, which come to mind immidiately is that we can probably precompile regular expression pattern instead of doing it for each file and we can separate reading of files from accumulating statistics by doing it also in parallel.
Now, we come to the original question of the article - why does Groovy++ overperform Java?
Truly speaking, I don't know. The best average I was able to achive (only once) with non-concurrent Java code was 10.5sec to process all files. I am sure that it has nothing to do with bytecode generation by javac but with a little bit different algorithm used by Java code. As this code wasn't written neither by me nor by Paul and none of us did any performance tuning of it the comparision itself is probably not very correct.
It will be very interesting if someone will be curios enough to do really fastest possible Java implementation.
But does it really matter? Even if statically typed groovy code performs exactly as slow as java one we have 16LOC instead of 65 and this 16LOC is concentrated minds, no boilerplate code.
Thank you for reading and give your own try to Groovy++!





Comments
Artur Biesiadowski replied on Tue, 2010/01/12 - 10:03am
I must say that opening/reading/closing 2500 files per second is already very impressive, without any word counting. With 10KB average per file, we are in 25000KB, which means 25MB per second of sustained input over possibly very fragmented (on disk) files, with 2500 open and close operations on top of that. Word counting is only topping on the cake.
I assume that everything is already in OS cache - I hardly believe possibility of doing 2500 accesses to disk per second. Can you redo the test WITHOUT word counting (just sum f.text.length() or something similar to force reading) and see what is the difference? There is a big chance that what you are measuring is just I/O performance, not word counting.
If this is the case, then static groovy would be a real benefit. To achieve 6 times speedup on the test which is mostly I/O bound, it would mean that actual work (word counting) has to be 100 or so times faster.
I'll try to play with static groovy at home and do the pure math benchmarks for comparison - especially my favorite sorting one, which turns out to be 300 times slower in groovy than in java (and calling Arrays.sort from groovy doesn't count as valid solution for benchmark of language...). If static groovy can get within only 2-3 times slower than java (which would mean 100+ times improvement from dynamic groovy), it would be a real killer.
Alex Tkachman replied on Tue, 2010/01/12 - 11:18am
in response to:
Artur Biesiadowski
Charles Oliver ... replied on Tue, 2010/01/12 - 11:27am
Alex Tkachman replied on Tue, 2010/01/12 - 11:46am
in response to:
Charles Oliver Nutter
Paul King replied on Tue, 2010/01/12 - 8:25pm
In Groovy trunk there is also a new
traverse()method which means if you like dense code the normal Groovy solution can be pruned to 7 lines:Paul King replied on Tue, 2010/01/12 - 8:09pm
Dapeng Liu replied on Tue, 2010/01/12 - 9:12pm
so confused ...
how is that possible Groovy runs faster than native java code, isn't Groovy built on top of JVM? in this case, both are calling the same API
21 sec vs 8 sec is huge ...
Paul King replied on Tue, 2010/01/12 - 11:32pm
in response to:
Dapeng Liu
Adam Rabung replied on Wed, 2010/01/13 - 3:26pm
I modified your Java version and went from 12.6 seconds down 7.7. With that ratio, it would have taken about 6.4 seconds on your machine.
The most significant change I made was using a HashMap rather than a TreeMap to count the occurrences of the words, and adding an additional sort when displaying by word count. I was a bit surprised that this change would dramatically help performance, but it did. In retrospect, that's the exact algorithm you chose in Groovy - [:] instantiates a LinkedHashMap - you could probably use just Hashmap instead and shave off some more time.
I wonder how much a true IntMap would reduce this further, as it would get rid of a lot of unboxing. A higher level language like static Groovy could potentially sneak one of these in behind the scenes at compile time. Scala added a @specialized annotation in 2.8 to reduce boxing overhead.
One other note: your Java version is actually cheating a little bit - it is case-sensitive, so it's skipping a lot of .toLowerCase calls.
Final note: I first tried this with 1.5, and the numbers were more like (12.2/19.8). Java 6 is an amazing piece of software.
Paul King replied on Wed, 2010/01/13 - 9:59pm
in response to:
Adam Rabung
Alex Tkachman replied on Thu, 2010/01/14 - 1:47pm
in response to:
Adam Rabung
Adam Rabung replied on Thu, 2010/01/14 - 3:30pm
http://gist.github.com/277457
Evgeny Goldin replied on Thu, 2010/01/14 - 6:56pm
Carla Brian replied on Sat, 2012/05/19 - 9:37am