
Comparing JVMs on ARM/Linux

02.15.2012

For quite some time, Java Standard Edition releases have included two just-in-time (JIT) bytecode compilers, client and server (referred to as c1 and c2, respectively), whereas Java SE-Embedded binaries contained only the client c1 compiler. There were two reasons for excluding c2: (1) eliminating optional components saves space, which is at a premium in the embedded world, and (2) embedded platforms were not seriously considered for server-like workloads. But all that is about to change. In anticipation of the ARM processor's legitimate entrance into the server market (see Calxeda), Oracle has, with the latest update of Java SE-Embedded (7u2), made the c2 compiler available for ARMv7/Linux platforms, with the potential to further improve performance for a large class of traditional server applications.

These two compilers go about their business in different ways. Of the two, c1 is the lighter optimizer, but it starts up faster. It delivers excellent performance and, as the default bytecode compiler, works extremely well in almost all situations. Compared to c1, c2 is the more aggressive optimizer and is suited to long-lived Java processes. Although it is slower at startup, it can achieve better performance over time. As a case in point, take a look at the graph that follows.

 

One of the most popular Java-based applications, Apache Tomcat, was installed on an ARMv7/Linux device. The chart shows the relative performance, measured as mean HTTP request time, of the Tomcat server run with the c1 client compiler (red line) and with the c2 server compiler (blue line). The HTTP request load was generated by an external system on a dedicated network using the ab (Apache Bench) program. Lower response times are better, and you can see that for the initial run of 25,000 HTTP requests, the c1 compiler produces faster average response times than c2. It takes time for the c2 compiler to "warm up", but once the threshold of roughly 50,000 requests is reached, c2 outperforms c1. At 250,000 HTTP requests, the mean response time of the c2-based Tomcat server instance is 14% lower than that of its c1 counterpart.
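If you want to reproduce this kind of comparison, the mean request time can be pulled out of each ab run and the two compilers compared run by run. Here is a minimal Python sketch of that bookkeeping. It assumes ab's usual summary line of the form "Time per request: 12.345 [ms] (mean)"; the helper names (mean_request_ms, percent_faster, first_crossover) are hypothetical, and the numbers in the __main__ block are made up purely to exercise the code, not the measurements behind the chart.

```python
import re
from typing import List, Optional

# Matches ab's per-request mean, e.g. "Time per request:       12.345 [ms] (mean)".
# The "(mean, across all concurrent requests)" line does not match, because
# "(mean" is followed by a comma there rather than a closing parenthesis.
_MEAN_RE = re.compile(r"^Time per request:\s+([\d.]+)\s+\[ms\]\s+\(mean\)", re.MULTILINE)


def mean_request_ms(ab_output: str) -> float:
    """Return the mean time per request, in milliseconds, from one ab run's output."""
    match = _MEAN_RE.search(ab_output)
    if match is None:
        raise ValueError("no 'Time per request ... (mean)' line in ab output")
    return float(match.group(1))


def percent_faster(baseline_ms: float, candidate_ms: float) -> float:
    """Percentage reduction in mean time of candidate vs. baseline (positive = faster)."""
    return (baseline_ms - candidate_ms) / baseline_ms * 100.0


def first_crossover(requests: List[int], c1_ms: List[float], c2_ms: List[float]) -> Optional[int]:
    """Return the first request count at which c2's mean time is lower than c1's, if any."""
    for n, t1, t2 in zip(requests, c1_ms, c2_ms):
        if t2 < t1:
            return n
    return None


if __name__ == "__main__":
    # Made-up numbers, used only to exercise the functions above.
    requests = [25_000, 50_000, 75_000, 100_000]
    c1 = [4.0, 4.0, 4.0, 4.0]
    c2 = [5.0, 4.2, 3.8, 3.5]
    print("c2 first beats c1 at", first_crossover(requests, c1, c2), "requests")
    print(f"c2 vs c1 at {requests[-1]} requests: {percent_faster(c1[-1], c2[-1]):.1f}% faster")
```

percent_faster expresses the gap as a reduction in mean request time relative to the c1 baseline, which is the natural reading of a figure like the 14% quoted above.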

It is important to realize that c2 assumes, and indeed requires, more resources (in particular, memory). Our sample device, with 1 GB of RAM, was more than adequate for these rounds of tests. Of course your mileage may vary, but if you have the right hardware and the right workload, c2 deserves a closer look.
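A quick way to see how much memory a target board actually reports before trying c2 is to read MemTotal from /proc/meminfo. The Python sketch below assumes a Linux /proc filesystem; the function names are hypothetical, and the 512 MiB threshold is only a placeholder, since these tests establish that 1 GB was more than enough, not what the minimum for c2 is.

```python
import re


def mem_total_kib(meminfo_text: str) -> int:
    """Return MemTotal (in KiB) from the text of /proc/meminfo."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    return int(match.group(1))


def has_at_least_mib(meminfo_text: str, min_mib: int) -> bool:
    """True if the machine reports at least min_mib MiB of total RAM."""
    return mem_total_kib(meminfo_text) >= min_mib * 1024


if __name__ == "__main__":
    # Hypothetical threshold: the article does not state a minimum for c2.
    MIN_MIB = 512
    try:
        with open("/proc/meminfo") as f:
            print("enough RAM for a c2 trial:", has_at_least_mib(f.read(), MIN_MIB))
    except OSError as err:
        print("could not read /proc/meminfo:", err)
```

Note that a board sold as "1 GB" usually reports a MemTotal somewhat below 1024 MiB, because the kernel reserves part of physical memory.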

While I was discussing these results with a few of my compadres, it was suggested that OpenJDK and some of its variants be included in this comparison. The following chart shows mean HTTP request times for six different configurations:

  1. Java SE Embedded 7u2 c1 Client Compiler
  2. Java SE Embedded 7u2 c2 Server Compiler
  3. OpenJDK Zero VM (build 20.0-b12, mixed mode) OpenJDK 1.6.0_24-b24 (IcedTea6 1.12pre)
  4. JamVM (build 1.6.0-devel, inline-threaded interpreter with stack-caching) OpenJDK 1.6.0_24-b24 (IcedTea6 1.12pre)
  5. CACAO (build 1.1.0pre2, compiled mode) OpenJDK 1.6.0_24-b24 (IcedTea6 1.12pre)
  6. Interpreter only: OpenJDK Zero VM (build 20.0-b12, interpreted mode) OpenJDK 1.6.0_24-b24 (IcedTea6 1.12pre)

 

Results remain pretty much unchanged in later runs, so only the first 4 runs (25K-100K requests) are shown. As can be seen, the Java SE-E VMs are on the order of 3-5x faster than their OpenJDK counterparts, irrespective of the bytecode compiler chosen. One additional promising VM, called Shark, was not included in these tests because, although it built from source successfully, it failed to run Apache Tomcat. In Shark's defense, the ARM version may still be in development (i.e. not yet stable).
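A speedup such as "3-5x" is simply the ratio of mean request times within the same run. The minimal sketch below divides each configuration's mean time by the fastest configuration's mean time; the function name and the numbers are hypothetical and made up for illustration, not the charted data.

```python
from typing import Dict


def relative_slowdown(mean_ms: Dict[str, float]) -> Dict[str, float]:
    """Map each configuration to its mean request time divided by the fastest one's."""
    fastest = min(mean_ms.values())
    return {name: t / fastest for name, t in mean_ms.items()}


if __name__ == "__main__":
    # Hypothetical mean times (ms) for a single run; not the article's measurements.
    run = {"SE-E c1": 6.0, "SE-E c2": 5.0, "Zero": 20.0, "JamVM": 15.0}
    # Print configurations from fastest to slowest with their slowdown factor.
    for name, factor in sorted(relative_slowdown(run).items(), key=lambda kv: kv[1]):
        print(f"{name:8s} {factor:.1f}x")
```

Computing the ratio per run, rather than across runs, keeps warm-up effects from one run out of another run's comparison.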

Creating a really fast virtual machine is hard work and takes a lot of time to perfect. Considering the resources expended by Oracle (and formerly Sun), it is no surprise that the commercial Java SE VMs are excellent performers. But the extent to which they outperform their OpenJDK counterparts is surprising. It would be no shock if someone in the know could demonstrate better OpenJDK results. But herein lies one considerable problem: it is an exercise in patience and perseverance just to locate and build a proper OpenJDK platform suitable for a particular CPU/Linux configuration. No offense would be taken if corrections were presented, especially if they came with a straightforward mechanism for supporting these OpenJDK builds.

 

From https://blogs.oracle.com/jtc/entry/comparing_jvms_on_arm_linux

Published at DZone with permission of its author, Jim Connors.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Goel Yatendra replied on Thu, 2012/03/15 - 4:02pm

When testing JVM performance on ARM, it's important to remember that the default optimization settings of the compiler used to build the JVM itself do matter.

The Debian 6.0.4 "squeeze" armel distribution uses ARMv4T optimization by default. This conservative baseline lets Debian-built packages run on as many different ARM boards and CPUs as possible. The trade-off is that you basically disable all VFP (floating-point) optimizations and make synchronization code slower, by forcing the JVM to call the Linux kernel helper instead of using the faster ARMv7 atomic instructions directly.
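To find out whether a particular board could benefit from a JVM built for ARMv7 with VFP rather than for the ARMv4T baseline, you can check what the CPU itself reports in /proc/cpuinfo. The Python sketch below is an illustration under assumptions: it relies on the "CPU architecture" and "Features" fields that ARM Linux kernels normally expose, the function names are hypothetical, and it inspects only the hardware, not how the installed JVM package was compiled.

```python
import re
from typing import Optional, Set, Tuple


def parse_cpuinfo(text: str) -> Tuple[Optional[int], Set[str]]:
    """Return (major ARM architecture version, feature flags) from /proc/cpuinfo text.

    Multi-core systems repeat these fields per processor; the first occurrence is used.
    """
    arch: Optional[int] = None
    features: Set[str] = set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "CPU architecture" and arch is None:
            digits = re.match(r"\d+", value)  # "7" -> 7, "5TEJ" -> 5, "4T" -> 4
            if digits:
                arch = int(digits.group())
        elif key == "Features" and not features:
            features = set(value.split())
    return arch, features


def supports_armv7_vfp(text: str) -> bool:
    """True if the CPU reports ARMv7 or later and some VFP flag (vfp, vfpv3, ...)."""
    arch, features = parse_cpuinfo(text)
    return arch is not None and arch >= 7 and any(f.startswith("vfp") for f in features)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("ARMv7 + VFP hardware:", supports_armv7_vfp(f.read()))
    except OSError as err:
        print("could not read /proc/cpuinfo:", err)
```

If this reports True while the installed JVM comes from an ARMv4T-targeted armel package, the hardware features described above are going unused.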
