A passionate professional in the areas of Java performance, distributed systems and in-memory data grids. For the last 5 years I have been working on various performance-critical Java systems (usually involving data grids) in finance, telecom and e-commerce. See http://blog.ragozin.info for a list of my articles. Alexey is a DZone MVB (not an employee of DZone) and has posted 4 posts at DZone.

JRockit GC in Action

07.12.2011

In this article I would like to elaborate on the garbage collection specifics of Oracle's JRockit JVM. JRockit has recently been made free to use, and many people may consider using it instead of the other widely popular Oracle JVM, HotSpot (formerly Sun's JVM). I have more experience with the HotSpot JVM, so my opinion may be a little biased, but I will try to stick to the facts as much as I can.

Disclaimer: This article expresses my personal opinion based on my practical experience with the JRockit and HotSpot JVMs. My experience is limited to a few use cases, so conclusions from this article may not be valid for other use cases. I do not claim to have completed comprehensive research of JRockit's GC behavior.

JRockit garbage collection algorithms

JRockit uses mark-sweep-compact (MSC) as its base garbage collection algorithm, though it allows a lot of tweaking. The JVM command line option -Xgc: lets you choose a variation of the MSC algorithm. The following variations are available in JRockit R28:

-Xgc: option              | Generational | Mark       | Sweep
--------------------------+--------------+------------+-----------
genconcon or gencon       | Yes          | concurrent | concurrent
singleconcon or singlecon | No           | concurrent | concurrent
genconpar                 | Yes          | concurrent | parallel
singleconpar              | No           | concurrent | parallel
genparpar or genpar       | Yes          | parallel   | parallel
singleparpar or singlepar | No           | parallel   | parallel
genparcon                 | Yes          | parallel   | concurrent
singleparcon              | No           | parallel   | concurrent
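The option names in the table follow a regular pattern: a "gen" or "single" prefix, then a three-letter code for the mark phase, then one for the sweep phase, with the short forms (gencon, singlepar, ...) using the same code for both. The hypothetical helper below only encodes that naming pattern as read from the table; it is not a JRockit API and does not cover the heuristic values such as throughput.

```java
/** Decodes a static -Xgc: strategy name using the naming pattern of the table (hypothetical helper). */
public class XgcOption {
    final boolean generational;
    final String mark;
    final String sweep;

    XgcOption(boolean generational, String mark, String sweep) {
        this.generational = generational;
        this.mark = mark;
        this.sweep = sweep;
    }

    static XgcOption parse(String name) {
        boolean generational;
        String rest;
        if (name.startsWith("gen")) {
            generational = true;
            rest = name.substring(3);
        } else if (name.startsWith("single")) {
            generational = false;
            rest = name.substring(6);
        } else {
            throw new IllegalArgumentException("not a static strategy: " + name);
        }
        if (rest.length() != 3 && rest.length() != 6) {
            throw new IllegalArgumentException("unexpected strategy name: " + name);
        }
        String mark = phase(rest.substring(0, 3));
        // Short forms (gencon, singlepar, ...) use the same algorithm for mark and sweep.
        String sweep = rest.length() == 3 ? mark : phase(rest.substring(3));
        return new XgcOption(generational, mark, sweep);
    }

    private static String phase(String code) {
        switch (code) {
            case "con": return "concurrent";
            case "par": return "parallel";
            default: throw new IllegalArgumentException("unknown phase code: " + code);
        }
    }

    @Override
    public String toString() {
        return (generational ? "generational" : "single space") + ", mark=" + mark + ", sweep=" + sweep;
    }

    public static void main(String[] args) {
        System.out.println(parse("gencon"));       // generational, mark=concurrent, sweep=concurrent
        System.out.println(parse("singleparcon")); // single space, mark=parallel, sweep=concurrent
    }
}
```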

There are also special values for the -Xgc: option (prior to R28, -XgcPrio: was used for this) that instruct the JVM to use heuristics to choose the best algorithm at run time. Unlike HotSpot, JRockit can switch algorithms while the JVM is running, though the documentation says that R28 is likely to stick with one algorithm:

  • -Xgc:throughput - best throughput,
  • -Xgc:pausetime - minimal pauses,
  • -Xgc:deterministic - minimal pauses, stable pause time.
I personally found these heuristics quite useless, though. In practice, the JVM tends to choose the singlecon strategy for a low pause target, which IMHO critically lacks throughput for server-type applications.

Generational vs single space

In my previous article, I explained the idea behind generational garbage collection. The generational approach assumes that the heap is divided into young and old space, each collected by a different algorithm (young space uses a copying collector, while old space uses the more sophisticated mark-sweep-compact). Keeping young and old space separate requires the JVM to implement some kind of barrier to track old-to-young references. In generational mode, JRockit uses a card marking barrier similar to the one in HotSpot's CMS and throughput collectors (HotSpot's G1 is the only mainstream collector using a different type of barrier). Unlike HotSpot, which always uses a generational approach, JRockit can operate in single space mode. Single space mode means:

  • no young collection pauses,
  • no write barrier unless it is needed for old space collector,
  • more frequent old collection pauses,
  • orders of magnitude worse throughput compared to generational collector.
To be honest, I have never worked with an application that could benefit from a single space collector, though I cannot deny that such applications may exist.
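The card marking barrier mentioned above can be illustrated with a toy model. This is a minimal sketch, not JRockit's code: the heap is just a range of long addresses, and the 512-byte card size is an assumption borrowed from HotSpot (JRockit's card size may differ). Every reference store dirties the card covering the written field, so a young collection only needs to scan the dirty cards of old space for old-to-young references instead of the whole old space.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy card-marking write barrier over a simulated heap (hypothetical model). */
public class CardMarkingDemo {
    // Assumption: 2^9 = 512-byte cards, as in HotSpot; JRockit's card size may differ.
    static final int CARD_SHIFT = 9;
    static final byte CLEAN = 0, DIRTY = 1;

    final byte[] cards;

    CardMarkingDemo(long heapBytes) {
        cards = new byte[(int) (heapBytes >>> CARD_SHIFT) + 1];
    }

    /** Barrier executed on every reference store into the field at fieldAddress. */
    void writeBarrier(long fieldAddress) {
        cards[(int) (fieldAddress >>> CARD_SHIFT)] = DIRTY;
    }

    /** Cards a young GC must scan for old-to-young references; they are cleaned afterwards. */
    List<Integer> takeDirtyCards() {
        List<Integer> dirty = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] == DIRTY) {
                dirty.add(i);
                cards[i] = CLEAN;
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        CardMarkingDemo heap = new CardMarkingDemo(1 << 20); // 1 MiB toy heap
        heap.writeBarrier(1000);   // card 1
        heap.writeBarrier(1020);   // same card, no extra work at scan time
        heap.writeBarrier(70_000); // card 136
        System.out.println(heap.takeDirtyCards()); // [1, 136]
        System.out.println(heap.takeDirtyCards()); // []
    }
}
```

In single space mode there are no old-to-young references to track, which is why such a barrier can be dropped unless the old space collector itself needs it.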

Parallel vs concurrent

Parallel collectors require a stop-the-world pause for the whole duration of the major collection phases (mark or sweep), but employ all available cores to shorten the pause. Parallel collectors usually have better throughput, but they are not a good fit for pause-critical applications. A concurrent collector tries to do most of its work concurrently with the application (though it also works in parallel on multi-core systems), stopping the application only for short periods. The concurrent collection algorithm in JRockit is quite different from both of HotSpot's concurrent collectors (CMS and G1). I will explain how it works in detail later in this article.

A few differences between JRockit and HotSpot

Heap geometry

The HotSpot JVM has a fixed heap geometry: young, tenured and permanent spaces have fixed address ranges during the JVM's lifetime (though physical memory may be only partially committed). In contrast, JRockit has a single heap space. If a generational collection algorithm is used, part of this space is used as the nursery (young space), though the nursery in JRockit is not necessarily contiguous. The same is true for the keep area (the equivalent of survivor space in HotSpot). Both the nursery and the keep area may drift around the heap during the JVM's lifetime. JRockit has no analog of HotSpot's permanent space.

Aging of objects in young space

HotSpot keeps the exact age (in terms of survived collections) of each object in young space. Using this knowledge, HotSpot can keep an object in young space through several collections, which is an effective way to fight medium-lived garbage without increasing young space (though increasing young space is usually better in terms of performance). JRockit does not track object age, so objects are always promoted on the second collection (the first collection moves an object to the keep area, the second to old space).
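The effect on medium-lived garbage can be shown with a simplified model. In this hypothetical sketch an object's lifetime is counted in young collections it survives before dying; JRockit-style promotion happens after two survived collections, while HotSpot-style promotion happens once the age reaches a tenuring threshold (HotSpot's knob for this is -XX:MaxTenuringThreshold; the threshold of 4 and the sample lifetimes are made up for illustration, and real HotSpot tenuring is adaptive).

```java
/** Simplified comparison of promotion policies (hypothetical model, not JVM code). */
public class PromotionPolicies {
    // survivedCollections = number of young GCs an object survives before it dies.

    /** JRockit-style: the 1st survived GC moves the object to the keep area, the 2nd to old space. */
    static boolean promotedJRockit(int survivedCollections) {
        return survivedCollections >= 2;
    }

    /** HotSpot-style (simplified): promoted once its age reaches the tenuring threshold. */
    static boolean promotedHotSpot(int survivedCollections, int tenuringThreshold) {
        return survivedCollections >= tenuringThreshold;
    }

    public static void main(String[] args) {
        // Hypothetical medium-lived objects: each dies after surviving 0..5 young GCs.
        int[] survived = {0, 1, 2, 3, 3, 4, 5};
        int jrockit = 0;
        int hotspot = 0;
        for (int s : survived) {
            if (promotedJRockit(s)) jrockit++;
            if (promotedHotSpot(s, 4)) hotspot++;
        }
        // Every promoted object here later dies in old space, adding work for the old collector.
        System.out.println("JRockit promoted: " + jrockit);               // 5
        System.out.println("HotSpot (threshold 4) promoted: " + hotspot); // 2
    }
}
```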

Thread local allocation blocks and pauses

Both HotSpot and JRockit use TLABs (thread local allocation blocks) for fast object allocation. TLABs are allocated in young space/nursery (in single space mode, JRockit allocates TLABs in old space). Threads allocate new objects in their TLABs, and when a TLAB gets full, the thread requests a new one from the memory manager.

In HotSpot, all TLABs are recycled during young space collection (which is usually triggered by a particular thread requesting a TLAB). JRockit is different: failure to allocate a new TLAB will trigger a young GC, but it is not guaranteed that the GC will start immediately or that it will free enough contiguous memory for a TLAB. In the latter case, the thread will be blocked waiting for a TLAB while the JVM is technically not in a stop-the-world pause. In other words, JRockit has two types of application pauses: stop-the-world pauses and TLAB wait pauses (affecting individual threads). From the application's point of view, thread pauses are as bad as stop-the-world ones: it is impossible to guarantee service if random threads are blocked. The JVM may not fairly report TLAB wait pauses, so the application may experience pauses that do not appear in GC logs.

Concerning TLAB sizes, HotSpot grows TLABs more aggressively than JRockit. JRockit is more conservative because its TLABs may survive several young collections. HotSpot recycles all TLABs on each young collection, so a large TLAB is not wasted for long if its thread stops actively allocating objects.
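The fast path that makes TLABs attractive is bump-pointer allocation: the owning thread advances a pointer without any locking and only goes back to the memory manager when the buffer is exhausted. The sketch below is a hypothetical toy model with plain long addresses, not JVM code; the point where allocate returns -1 is where a JRockit thread requests a new TLAB and may end up waiting, as described above.

```java
/** Toy thread-local allocation buffer with bump-pointer allocation (hypothetical model). */
public class TlabDemo {
    static final class Tlab {
        private long top;       // next free address
        private final long end; // first address past the buffer

        Tlab(long start, long size) {
            this.top = start;
            this.end = start + size;
        }

        /** Returns the new object's address, or -1 if the TLAB is exhausted and must be replaced. */
        long allocate(long bytes) {
            if (end - top < bytes) return -1;
            long address = top;
            top += bytes; // no locking: only the owning thread touches its TLAB
            return address;
        }
    }

    public static void main(String[] args) {
        Tlab tlab = new Tlab(0, 64);
        System.out.println(tlab.allocate(24)); // 0
        System.out.println(tlab.allocate(24)); // 24
        System.out.println(tlab.allocate(24)); // -1: the thread must request a new TLAB
    }
}
```

The 16 bytes left at the end of the buffer are the kind of waste that makes TLAB sizing a trade-off: larger TLABs mean fewer refill requests but more memory tied up per thread.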

JRockit's concurrent collector

JRockit has a very sophisticated concurrent collector. It is a variation of the mark-sweep-compact algorithm. During the mark phase, the collector traverses the object graph, marking all reachable objects. During the sweep phase, the whole heap is scanned and space occupied by unmarked objects is reclaimed. The compact phase relocates objects in the heap to fight fragmentation. JRockit can execute the mark and sweep phases in concurrent mode. A concurrent implementation of the mark phase requires breaking it into 3 sub-phases:

  • initial mark - stop-the-world pause to collect root references,
  • concurrent marking - traversing the graph without blocking the application,
  • remark - stop-the-world pause needed to account for changes made by the application during the concurrent phase. During remark, the collector only has to revisit references changed since initial mark (the card marking write barrier makes this efficient).
In practice, both JRockit and HotSpot use an additional phase, concurrent preclean, which is executed before remark. Concurrent preclean is essentially a remark without a stop-the-world pause. The preclean phase makes the subsequent remark shorter by reducing the number of cards that have to be rescanned.
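Why remark is needed can be shown with a single-threaded toy model. In this hypothetical sketch the mutator stores a reference into an object that concurrent marking has already visited; without the barrier-driven rescan, the newly linked object would stay unmarked and be swept while still reachable. For simplicity it tracks dirty objects rather than dirty cards and does not rescan roots, both of which a real collector does.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of initial mark, concurrent mark and remark (hypothetical, single-threaded). */
public class ConcurrentMarkDemo {
    final List<List<Integer>> refs = new ArrayList<>(); // refs.get(i): objects that i points to
    final Set<Integer> dirty = new HashSet<>();          // objects written since initial mark
    final BitSet marked = new BitSet();

    int newObject() {
        refs.add(new ArrayList<>());
        return refs.size() - 1;
    }

    /** Mutator reference store plus write barrier: remember the modified object. */
    void store(int from, int to) {
        refs.get(from).add(to);
        dirty.add(from);
    }

    /** Marks everything reachable from the start objects. */
    void trace(Collection<Integer> start) {
        Deque<Integer> work = new ArrayDeque<>(start);
        while (!work.isEmpty()) {
            int obj = work.pop();
            if (marked.get(obj)) continue;
            marked.set(obj);
            work.addAll(refs.get(obj));
        }
    }

    /** Remark (or preclean, when run without a pause): retrace from objects modified meanwhile. */
    void remark() {
        for (int obj : dirty) {
            if (marked.get(obj)) trace(refs.get(obj));
        }
        dirty.clear();
    }

    public static void main(String[] args) {
        ConcurrentMarkDemo heap = new ConcurrentMarkDemo();
        int root = heap.newObject(), a = heap.newObject(), b = heap.newObject();
        heap.store(root, a);
        heap.dirty.clear();                     // initial mark: start tracking writes now

        heap.trace(List.of(root));              // concurrent mark: marks root and a
        heap.store(a, b);                       // mutator links b into already-marked a
        System.out.println(heap.marked.get(b)); // false: b would be lost without remark

        heap.remark();
        System.out.println(heap.marked.get(b)); // true
    }
}
```

Running the same rescan once without a pause (preclean) and again under the pause (remark) means the final pause only deals with writes made after preclean, which is exactly why preclean shortens remark.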

Concurrent marking is fairly straightforward. Sweeping can also be done concurrently with the application (JRockit uses two short stop-the-world pauses for sweeping, while HotSpot's sweep phase is fully concurrent). But if we just mark dead objects as free space, the heap will eventually become fragmented, making it impossible to allocate objects of moderate size even though enough free space is available in total (death by fragmentation). JRockit uses compaction to protect itself against fragmentation.
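"Death by fragmentation" is easy to see with a toy heap layout. In the hypothetical sketch below the heap is a sequence of chunks (positive values are live objects, negative values are free gaps left by sweeping); the heap has 128 bytes free in total, yet a 48-byte allocation cannot be satisfied because no single gap is large enough. The layout is made up for illustration.

```java
/** Toy heap as a sequence of chunks: free space that exists but cannot be used (hypothetical model). */
public class FragmentationDemo {
    // chunks[i] > 0: live object of that size; chunks[i] < 0: free gap of -chunks[i] bytes.

    static int totalFree(int[] chunks) {
        int free = 0;
        for (int c : chunks) {
            if (c < 0) free += -c;
        }
        return free;
    }

    /** Largest contiguous free run (adjacent free gaps coalesce). */
    static int largestFree(int[] chunks) {
        int best = 0;
        int run = 0;
        for (int c : chunks) {
            run = (c < 0) ? run + -c : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        // After sweeping: dead objects became 32-byte holes between 64-byte live objects.
        int[] heap = {64, -32, 64, -32, 64, -32, 64, -32};
        int request = 48;
        System.out.println(totalFree(heap));              // 128
        System.out.println(largestFree(heap) >= request); // false: the allocation fails anyway
    }
}
```

Sliding the four live objects together would turn the same 128 free bytes into one contiguous block, which is what compaction buys at the price of moving objects.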

Compaction is a very expensive operation. The JVM has to move not only the object itself, it must also update all references to every relocated object. Compaction also requires a stop-the-world pause and is single-threaded in the JRockit JVM. To avoid long compaction pauses, the garbage collector can compact incrementally. Each time a concurrent collection is started, the JVM selects a range of heap space to be compacted. During the mark phase, all references to objects in the compaction area are collected. During the sweep phase, unreachable objects are marked as free space. Finally, during the compact phase, objects in the compaction area are relocated. Compaction can be either internal (objects are relocated inside the compaction region) or external (objects are copied out to another region and the whole old region becomes free space). The compaction phase is abortable: the JVM may choose to abort compaction halfway if it is taking too much time. The JVM may also decide not to move some objects if they have too many external references (or if they are pinned).
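The external variant described above can be sketched as follows. This is a hypothetical toy model, not JRockit's implementation: live objects of the compaction area are assigned new addresses in a destination region, and every reference slot recorded during the mark phase is patched through the resulting forwarding table. Copying the object bytes is omitted; the reference fixup loop is the part whose cost grows with the number of references.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy external compaction with reference fixup (hypothetical model). */
public class CompactionDemo {
    /**
     * @param liveObjects start address -> size of each live object in the compaction area
     * @param slots       reference slots collected during mark; each holds an object address
     * @param destination first free address of the target region
     * @return forwarding table, old address -> new address
     */
    static Map<Long, Long> compact(SortedMap<Long, Long> liveObjects, long[] slots, long destination) {
        Map<Long, Long> forwarding = new HashMap<>();
        long next = destination;
        for (Map.Entry<Long, Long> obj : liveObjects.entrySet()) {
            forwarding.put(obj.getKey(), next); // copying the object's bytes is omitted
            next += obj.getValue();
        }
        // Reference fixup: cost is proportional to the number of recorded references.
        for (int i = 0; i < slots.length; i++) {
            Long moved = forwarding.get(slots[i]);
            if (moved != null) slots[i] = moved;
        }
        return forwarding;
    }

    public static void main(String[] args) {
        SortedMap<Long, Long> live = new TreeMap<>(Map.of(1000L, 16L, 1100L, 32L));
        long[] slots = {1100L, 1000L, 1100L, 5000L}; // 5000 lies outside the area: untouched
        compact(live, slots, 9000L);
        System.out.println(Arrays.toString(slots)); // [9016, 9000, 9016, 5000]
    }
}
```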

Even when done incrementally, compaction significantly increases pause duration. It is possible to turn off compaction altogether, but then fragmentation becomes a serious threat (unlike HotSpot's CMS, JRockit does not use free lists and statistical analysis to control heap fragmentation).

JRockit's gencon vs HotSpot's CMS quick summary

Both use 4-phase concurrent marking (initial mark, concurrent mark, concurrent preclean, remark). HotSpot's CMS uses a fully concurrent sweep (without compaction).

JRockit may use compaction, which requires an additional pause.

In JRockit, initial mark and remark force a young collection. HotSpot is more flexible: initial mark may wait for the next young GC, while remark either forces one or scans objects in Eden without a young GC.

HotSpot's CMS uses free lists and statistical analysis to avoid fatal heap fragmentation. JRockit can compact, but it is very prone to fragmentation if compaction is not frequent enough.

Configuring JRockit for low pause on large heap

Garbage collection tuning is very application-specific, so everything below was written with a certain type of application in mind. The application class I'm interested in is the same as in my previous article. Its key characteristics are:

  • Heap is used to store data structures in memory.
  • Heap size is 10GiB or more.
  • Request execution time is small (up to dozens of milliseconds).
  • Transactions are short (up to hundreds of milliseconds). A transaction may include several requests.
  • Data in memory is modified slowly (i.e. we do not modify the whole 10GiB heap within one second, though updating 10MiB of data in the heap per second is OK).
  • The amount of short-lived garbage is fairly large, ~100-200MiB/s (garbage produced by parsing and encoding network protocols, etc.).
The only viable algorithm for this kind of application is generational concurrent mark-sweep. Unfortunately, the heuristics are not smart enough and will force the single space concurrent algorithm for a low pause target (by their metric, they want to avoid young GC pauses). The Achilles' heel of the single space algorithm is throughput, which is too low for this class of applications.

We have to opt for the gencon algorithm and tune it by hand.

Sizing young space

The default size of young space in JRockit is 10MiB multiplied by the number of young collection threads (young collection is done in parallel). Usually this default is too small, and you will want to increase it to reduce young GC frequency (the -Xns<size> JVM option sets it). Compared to HotSpot, JRockit young collection pauses are considerably shorter.
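Why the default is too small becomes obvious with back-of-the-envelope arithmetic: the interval between young collections is roughly the nursery size divided by the allocation rate. The sketch assumes 8 young collection threads and the 200MiB/s upper bound from the application profile above; the 2GiB nursery is only an illustrative value for -Xns<size>, not a recommendation.

```java
/** Back-of-the-envelope young GC interval estimate (hypothetical helper). */
public class NurserySizing {
    /** Seconds between young GCs ~ nursery size / allocation rate. */
    static double secondsBetweenYoungGcs(long nurseryMiB, long allocationMiBPerSec) {
        return (double) nurseryMiB / allocationMiBPerSec;
    }

    public static void main(String[] args) {
        int youngGcThreads = 8;                         // assumption: 8 young collection threads
        long defaultNurseryMiB = 10L * youngGcThreads;  // default: 10MiB per young collection thread
        long allocRate = 200;                           // MiB/s, upper end of the profile above
        System.out.println(secondsBetweenYoungGcs(defaultNurseryMiB, allocRate)); // 0.4
        // An illustrative 2GiB nursery (set via -Xns<size>):
        System.out.println(secondsBetweenYoungGcs(2048, allocRate));              // 10.24
    }
}
```

Since a copying young collection mostly pays for surviving objects rather than for dead ones, a larger nursery mainly reduces how often the young pause happens.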

Keeping compaction pauses under control

The JVM can abort compaction if it is taking too long. This is an effective way to enforce a maximum pause. Unfortunately, you cannot just say -XpauseTarget=50 and relax. JRockit forbids a pause target below 200ms unless the GC type is set to deterministic, but if you use -Xgc:deterministic, the JVM will choose singlecon mode and you will enjoy 5-30 second pauses (depending on heap size) due to lack of throughput. This is really sad.

Since the pause target is off-limits to us, we have to use other options. There are two ways to prevent long compactions:

  • limiting the size of the compaction area (using the -XXcompaction:percentage=n option),
  • limiting the number of references to be updated during compaction (using -XXcompaction:maxReferences=n).
Both ways are bad. Reducing the size of the compaction area limits compaction pause time but reduces throughput. Using maxReferences aborts compaction if the area contains too many live objects, avoiding long pauses but reducing throughput even more. Let's hope the JRockit team will recognize the demand from applications with large heaps and unlock the pause target.
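The throughput cost of both limits can be put into numbers with a rough model. This hypothetical sketch is not JRockit's logic: it assumes that percentage is the share of the heap compacted per old collection, and it models maxReferences as a simple guard that skips compaction when too many references would need updating. The 2% and 1,000,000 values are made up for illustration.

```java
/** Rough model of the two compaction limits (hypothetical logic, not JRockit's code). */
public class CompactionLimits {
    /** Size of the area selected for compaction in one old collection (assumed meaning of percentage). */
    static long compactionAreaBytes(long heapBytes, int percentage) {
        return heapBytes * percentage / 100;
    }

    /** Old collections needed to compact every part of the heap once: ceiling of 100 / percentage. */
    static long cyclesToCoverHeap(int percentage) {
        return (100 + percentage - 1) / percentage;
    }

    /** maxReferences-style guard: skip compaction if too many references need updating. */
    static boolean shouldCompact(long referencesToUpdate, long maxReferences) {
        return referencesToUpdate <= maxReferences;
    }

    public static void main(String[] args) {
        long heap = 32L << 30; // 32GiB heap
        System.out.println(compactionAreaBytes(heap, 2) >> 20);  // 655 (MiB per old collection)
        System.out.println(cyclesToCoverHeap(2));                // 50 old collections
        System.out.println(shouldCompact(5_000_000, 1_000_000)); // false: this cycle does not compact
    }
}
```

Under these assumptions, every skipped or shrunken compaction pushes the point at which the whole heap has been defragmented further out, which is the throughput loss described above.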

Running on a 32GiB heap: the good, the bad and the ugly

Now I would like to share my experience of running a 32GiB Oracle Coherence node on JRockit. Though I have spent a lot of time tuning GC options, there is still a fair chance that I have missed something, so please take my opinion with a grain of salt.

Good, young GC pause times

Young GC pause times are much better than with HotSpot's CMS. They are roughly on par with a patched version of JDK7 (even slightly better). Young collections are the most frequent ones, so it is really good that JRockit handles them so well.

Bad, throughput

It is just not enough. I believe this is the curse of any compacting collector (HotSpot's G1 included): modern hardware simply cannot do all the work associated with object relocation fast enough. But lack of throughput is not necessarily a show stopper. My tests are fairly write-intensive; for many applications, the throughput of JRockit's generational collector may be enough.

In terms of throughput, HotSpot's CMS beats all competitors (probably except Azul Zing, which uses intimate access to hardware that is not possible for common JVMs like HotSpot or JRockit).

Bad, fragmentation

Surprise! A compacting collector is prone to fragmentation. The combination of low throughput and incremental compaction leads to fragmentation. Increasing throughput would probably remedy this problem, but that is impossible without a significant increase in pause duration. Another way to counter fragmentation is to increase heap size, but this approach also has obvious practical limitations.

Ugly, long pauses

If you look only at JRockit's GC logs, you may be left with the impression that pauses are short and low throughput is the only issue. That is not true. Your application may experience pauses not reported by the JVM. You can easily measure them in your application code, though. After spending some time investigating this problem, I came to the conclusion that the concurrent preclean phase is hindering young collection.

Normally, young collection starts immediately if a TLAB cannot be allocated. But if concurrent preclean is active at the same time, the young collection seems to be delayed (and the delay can be significant, 0.5-2 seconds depending on preclean phase duration). During that time, threads are blocked waiting for a TLAB. TLABs are usually small, so there is a good chance that most worker threads of your application will be blocked waiting for TLAB allocation. This is as severe as a normal stop-the-world pause, except that the JVM does not report anything.
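Measuring such pauses from inside the application can be as simple as looking for gaps between timestamps that should be evenly spaced. The hypothetical probe below, in the spirit of tools like jHiccup, sleeps for 1ms in a loop and reports the worst overshoot. A sleeping probe allocates almost nothing, so it mainly catches stop-the-world pauses; TLAB waits hit threads that need a new TLAB, so the same maxStallNanos calculation is best applied to timestamps taken in real worker threads (for example, per request). OS scheduling jitter also shows up as small overshoots.

```java
import java.util.concurrent.TimeUnit;

/** Minimal pause probe: worst delay beyond an expected wake-up interval (hypothetical helper). */
public class PauseProbe {
    /** Worst delay beyond the expected interval, given consecutive timestamps in nanoseconds. */
    static long maxStallNanos(long[] timestamps, long intervalNanos) {
        long worst = 0;
        for (int i = 1; i < timestamps.length; i++) {
            worst = Math.max(worst, timestamps[i] - timestamps[i - 1] - intervalNanos);
        }
        return worst;
    }

    public static void main(String[] args) throws InterruptedException {
        long interval = TimeUnit.MILLISECONDS.toNanos(1);
        long[] wakeups = new long[5_000]; // roughly 5 seconds of samples
        for (int i = 0; i < wakeups.length; i++) {
            Thread.sleep(1);
            wakeups[i] = System.nanoTime();
        }
        System.out.printf("worst stall: %.1f ms%n", maxStallNanos(wakeups, interval) / 1e6);
    }
}
```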

[Figure: Young GC delayed by preclean]

Why does preclean affect young collection? It is a good question. One possible reason is that the remark scheduled after preclean requires a young collection anyway, and the JVM thinks it can avoid two pauses this way. Or it may be that young collection interferes with concurrent preclean through shared data structures, so the JVM decides to delay it. The reason is not clear to me, but the consequence is unpredictable, long application pauses that cannot be controlled.

This behavior is a serious show stopper for using JRockit in pause sensitive applications.

Deterministic pauses myth

JRockit claims that it can guarantee deterministic short pauses (below 50ms). This claim is absolutely valid. The single space concurrent collector fully controls pause duration, so it can provide this guarantee. The problem, though, is its extremely low throughput. Throughput can be increased by giving the application more memory, but it would probably take tens or even hundreds of times more memory to provide throughput comparable to the generational collector.

Conclusion

JRockit is a nice product; it has a lot of advanced features and is a very mature JVM. But so far I'm not going to use it for response-time-sensitive applications. Still, I believe JRockit has good potential. There may also be kinds of applications that benefit from JRockit's garbage collection algorithms more than a typical data grid does.

Anyway it is good to have fair competition in the JVM area. Good luck to both JRockit and HotSpot products!

See also

Some of my other articles about garbage collection.

Published at DZone with permission of Alexey Ragozin, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Keith Barret replied on Wed, 2011/07/13 - 5:23am

Thanks again for another excellent article giving an in-depth overview of, and insight into, the internals of the different Java memory management implementations. Your (potentially biased) opinion especially helped me sort out some marketing claims. Hopefully Oracle doesn't neglect HotSpot in favour of their JRockit implementation.

J Virumbi replied on Fri, 2011/12/02 - 7:04am

Hi Alexey, All your JVM articles are very informative and thoughtful. Thanks. We are upgrading a greenfield developed mid-size J2EE web app (plus weblogic integration) from single instance weblogic 8.1(2 *single core processor/ 32 bit/ 1gig heap) to weblogic 10.3 (2*Quad core/ 64 bit machine housing two VMs each hosting 2 weblogic instances one for SOA server and another for application; clustered). Each VM has 20Gig of RAM.

JRockit is the JVM of choice; THANKS to Oracle marketing ;)

It's a below-average application in terms of response time; many screens take 5-15 seconds. The customer has accepted all of it. (So long pause times cannot be an issue!!!!)

To start with, I have proposed a 4Gig heap. (By the way, I noticed that java.exe in fact reserves 6Gig right at startup. So I guess a new gen space ratio of 2 is applied here and reserved beyond the 4Gig of heap, unlike HotSpot. Am I right? Should I go for -Xns=2Gig here to avoid shrinking of the young gen space after GC?)

I have not started profiling GC logs yet, but I feel throughput is the basic problem for this app. So I am planning to propose -Xgc:genparpar. Is that fine, or do you have any other suggestion?

Thanks & Rdgs, J
