I am the founder and Master Developer of Plumbr, a memory leak detection tool. I enjoy solving problems and raising the self-awareness and professional pride of the developers around me. In my out-of-office time I am a bookworm and a computer games addict. Nikita is a DZone MVB and is not an employee of DZone.

Should I use a 32- or a 64-bit JVM?

11.28.2012

This is a question I have faced several times during my career in enterprise software development. Every once in a while I’ve had to hand out recommendations for configuring a specific new environment. And more often than not, part of the question at hand was “Should I use a 32- or a 64-bit JVM?” To be honest, in the beginning I just flipped a coin instead of giving a reasoned answer. (Sorry, bros!) But by now I have gathered some more insight on this and thought I would share it with you.

First stop: the more, the merrier. Right? So, as 64 > 32, this would be an easy answer: if possible, always choose 64-bit? Well, hold your horses. The downside of the 64-bit architecture is that the same data structures consume more memory. A lot more. Our measurements show that, depending on the JVM version, the operating system version and the hardware architecture, you end up using 30-50% more heap than on 32-bit. A larger heap can also introduce longer GC pauses affecting application latency: running a full GC on a 4.5GB heap is definitely going to take longer than on a 3GB one. So it would not be correct to jump on the 64-bit bandwagon just because 64 is bigger than 32.

But when would you ever want a 64-bit JVM at all, then? In most cases the reason is large heap sizes. On a 32-bit architecture you quickly run into limits on the maximum heap size, and those limits differ by platform. The following illustrates these limitations on different platforms:

OS         Max heap   Notes
Linux      2GB        3GB on specific kernels, such as hugemem
Windows    1.5GB      Up to 3GB with the “/3GB” boot flag and a JRE compiled with the /LARGEADDRESSAWARE switch
Mac OS X   3.8GB      Alert: could not find an ancient Mac, so this is untested by me
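
If you are not sure which kind of JVM a given environment is actually running, a small check along these lines can help. This is only a sketch: the class name JvmBitness is made up for illustration, "sun.arch.data.model" is a HotSpot-specific system property that other JVMs may not set, and the fallback on "os.arch" is a rough heuristic.

```java
final class JvmBitness {

    /** Best-effort guess at the JVM data model: 32, 64, or -1 if it cannot be determined. */
    static int dataModelBits() {
        // HotSpot-specific property; assumed here, it may be missing or "unknown" on other JVMs.
        String model = System.getProperty("sun.arch.data.model");
        if (model != null) {
            try {
                return Integer.parseInt(model.trim());
            } catch (NumberFormatException ignored) {
                // fall through to the heuristic below
            }
        }
        // Rough heuristic: architecture names such as amd64, x86_64 or aarch64 contain "64".
        String arch = System.getProperty("os.arch", "");
        return arch.contains("64") ? 64 : -1;
    }

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Data model (bits): " + dataModelBits());
        System.out.println("Max heap (MB):     " + maxHeapMegabytes());
    }
}
```

Starting this with different -Xmx values on a 32-bit JVM is a quick way to find where the limit sits on your own platform.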

Now how come it is that bad? After all, I bet you have seen 32-bit machines running with 16G+ of RAM and doing just fine. What’s wrong with the JVM that it can allocate less than 10% of this 16G on Windows?

Main cause: address space. In a 32-bit system you can theoretically address up to 4GB of memory per process. What breaks this on Windows is how process address space is handled. Windows cuts the process address space in half. One half of it is reserved for the kernel (which a user process cannot use) and the other half for the user. It doesn’t matter how much RAM is in the box, a 32-bit process can only address 2GB of it. What’s even worse, the Java heap needs a contiguous chunk of this address space, so in practice you are most often left with just 1.5-1.8GB of heap on Windows boxes.
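
The arithmetic behind these numbers is simple enough to write down. The sketch below is purely illustrative (the class and method names are invented for this example) and only models the split between kernel and user address space, not fragmentation.

```java
final class AddressSpaceSketch {

    static final long GB = 1024L * 1024L * 1024L;

    /** Total bytes addressable with pointers of the given width (valid for widths below 63). */
    static long addressableBytes(int pointerBits) {
        return 1L << pointerBits;
    }

    /** Address space left to a user process once the kernel's share is reserved. */
    static long userSpaceBytes(int pointerBits, long kernelReservedBytes) {
        return addressableBytes(pointerBits) - kernelReservedBytes;
    }

    public static void main(String[] args) {
        System.out.println("32-bit total:       " + addressableBytes(32) / GB + " GB");
        // Default 32-bit Windows split: half of the address space goes to the kernel.
        System.out.println("User space (2GB/2GB): " + userSpaceBytes(32, 2 * GB) / GB + " GB");
        // With a 1GB kernel share, the user process gets the remaining 3GB.
        System.out.println("User space (1GB/3GB): " + userSpaceBytes(32, 1 * GB) / GB + " GB");
    }
}
```

The heap you can really get is smaller than the user space figure, because the heap must fit into one contiguous region that is not already occupied by DLLs, thread stacks and the JVM itself.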

There is a trick you can pull on 32-bit Windows to reduce the kernel space and grow the user space. You can use the /3GB parameter in your boot.ini. However, to actually use this opportunity, the JVM must be compiled/linked using the /LARGEADDRESSAWARE switch.

This unfortunately is not the case, at least with the HotSpot JVM. Until the latest JDK 1.7 releases, the JVM was not compiled with this option. You are luckier if you are running on JRockit in a post-2006 version. In this case you can enjoy up to 2.8-2.9GB of heap size.

So, can we conclude that if your application requires more than ~2-3GB of memory you should always run on 64-bit? Maybe. But you have to be aware of the threats as well. We have already introduced the culprits: increased heap consumption and longer GC pauses. Let’s analyze the causes here.

Problem 1: 30-50% more heap is required on 64-bit. Why so? Mainly because of the memory layout in 64-bit architecture. First of all, object headers are 12 bytes on a 64-bit JVM with compressed references (16 bytes without them). Secondly, object references can be either 4 bytes or 8 bytes, depending on JVM flags and the size of the heap. This definitely adds some overhead compared to the 8-byte headers and 4-byte references on 32-bit. You can also dig into one of our earlier posts for more information about calculating the memory consumption of an object.
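
To get a feeling for where the extra bytes come from, here is a rough back-of-the-envelope estimator. It is a sketch under stated assumptions, not a measurement tool: the header and reference sizes are typical HotSpot values, objects are assumed to be padded to 8 bytes, field ordering and gaps are ignored, and the names ObjectSizeSketch and Layout are invented for this example.

```java
final class ObjectSizeSketch {

    /** Assumed HotSpot-style layouts; real values depend on JVM version and flags. */
    enum Layout {
        JVM_32(8, 4),                // 8-byte header, 4-byte references
        JVM_64_COMPRESSED(12, 4),    // 12-byte header, 4-byte compressed references
        JVM_64_UNCOMPRESSED(16, 8);  // 16-byte header, 8-byte references

        final int headerBytes;
        final int referenceBytes;

        Layout(int headerBytes, int referenceBytes) {
            this.headerBytes = headerBytes;
            this.referenceBytes = referenceBytes;
        }
    }

    /** Assumed object alignment in bytes. */
    static final int ALIGNMENT = 8;

    /** Estimated shallow size of an object holding only reference and int fields. */
    static long shallowSize(Layout layout, int referenceFields, int intFields) {
        long raw = layout.headerBytes
                + (long) referenceFields * layout.referenceBytes
                + 4L * intFields;
        // Round up to the next multiple of the alignment.
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    public static void main(String[] args) {
        for (Layout layout : Layout.values()) {
            System.out.println(layout + ": " + shallowSize(layout, 3, 1) + " bytes");
        }
    }
}
```

For an object with three references and one int, the estimate comes out at 24 bytes on 32-bit, 32 bytes on 64-bit with compressed references and 48 bytes without them. How much of this shows up in your total heap depends on how reference-heavy your data structures are.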

Problem 2: Longer garbage collection pauses. Building up more heap means there is more work to be done by GC while cleaning out unused objects. What it means in real life is that you have to be extra cautious when building heaps larger than 12-16GB. Without fine tuning and measuring you can easily introduce full GC pauses spanning several minutes. In applications where latency is not crucial and you can optimize for throughput only this might be OK, but in most cases this might become a showstopper.
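
Since measuring is the key word here, one low-effort way to start is to read the collector statistics the JVM already exposes through the standard java.lang.management API. The sketch below (the class name GcPauseReport is made up) prints the accumulated collection count and time per collector. Keep in mind that the reported time is approximate accumulated elapsed time, which for concurrent collectors is not the same thing as pause time, so GC logs remain the better source for individual pauses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcPauseReport {

    /** Sum of the accumulated collection time over all collectors, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("Total GC time: " + totalGcMillis() + " ms");
    }
}
```

Sampling these numbers under the same load on a 32-bit and a 64-bit setup gives you a first comparison before you dive into GC logs.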

So what are my alternatives when I need larger heaps and do not wish to introduce the overhead caused by 64-bit architecture? There are several tricks we have covered in one of our earlier blog posts: you can get away with heap partitioning, GC tuning, building on different JVMs or allocating memory off the heap.
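
As a small illustration of the last option, allocating memory off the heap, the sketch below keeps an array of longs in a direct ByteBuffer instead of a long[] on the Java heap. The class OffHeapLongStore is hypothetical and deliberately minimal; real off-heap stores also have to deal with serialization and with releasing memory.

```java
import java.nio.ByteBuffer;

final class OffHeapLongStore {

    private final ByteBuffer buffer;
    private final int capacity;

    /** Reserves room for the given number of longs outside the Java heap. */
    OffHeapLongStore(int capacity) {
        this.capacity = capacity;
        // allocateDirect returns a zero-filled buffer backed by native memory.
        this.buffer = ByteBuffer.allocateDirect(Math.multiplyExact(capacity, Long.BYTES));
    }

    void set(int index, long value) {
        buffer.putLong(checkIndex(index) * Long.BYTES, value);
    }

    long get(int index) {
        return buffer.getLong(checkIndex(index) * Long.BYTES);
    }

    boolean isOffHeap() {
        return buffer.isDirect();
    }

    private int checkIndex(int index) {
        if (index < 0 || index >= capacity) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return index;
    }

    public static void main(String[] args) {
        OffHeapLongStore store = new OffHeapLongStore(1_000_000);
        store.set(123, 42L);
        System.out.println("value=" + store.get(123) + ", offHeap=" + store.isOffHeap());
    }
}
```

The data stored this way is not scanned by the garbage collector, which is the whole point, but it still counts against the address space of the process, so on a 32-bit JVM it competes with the heap for the same few gigabytes.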

To conclude, let’s re-state that you should always be aware of the consequences of choosing a 64-bit JVM. But do not be afraid of this option.


Published at DZone with permission of Nikita Salnikov-tarnovski, author and DZone MVB. (source)


Comments

Dapeng Liu replied on Wed, 2012/11/28 - 2:53am

Problem 2 is not quite true. Full GC duration ONLY depends on the size of LIVE objects; garbage objects don't count toward the GC duration.

Suppose that your live size is about 100 MB: a heap of 2G and a heap of 20G will have about the same full GC duration.


André Pankraz replied on Wed, 2012/11/28 - 7:15am in response to: Dapeng Liu

 And Problem 1...I already answered in original post:

30% overhead might not be true anymore with Compressed Oops feature.

https://wikis.oracle.com/display/HotSpotInternals/CompressedOops

http://blog.leneghan.com/2012/03/reducing-java-memory-usage-and-garbage.html


Jose María Zaragoza replied on Wed, 2012/11/28 - 9:23am

A stupid question

Can I natively run a 64-bit JVM on a 32-bit OS?


Thanks


Pierre - Hugues... replied on Wed, 2012/11/28 - 10:26am

Great post, Nikita. You listed the facts well here, along with the 2 problems to keep an eye on when upgrading to or planning to use a 64-bit JVM. These are exactly the main problems I observed while performing multiple JVM upgrades from 32-bit to 64-bit for my clients.

Regarding problem #1, it is true that the footprint increase can be mitigated to some degree by enabling compression for the HotSpot JVM, with minimal performance impact. Proper due diligence and load testing is still my primary recommendation so you can truly assess any delta increase and/or negative impact for your environment. This is especially important if you are planning to re-use existing hardware, since you will need to perform extra capacity planning analysis as well to ensure you have enough RAM & CPU to handle the upgrade.

Regarding problem #2, my understanding of Nikita’s point is that a JVM that big typically means we are dealing with a large OldGen space. The cost of the short-lived objects may not be excessive here, but the true impact can be observed when the Full GC has to clear the OldGen space (long-lived objects). Given the number of objects accumulated, this can lead to high GC time and JVM hangs if the GC policy is not tuned properly.

If your physical/virtual server has enough CPU cores, typically I have observed better throughput & capacity by splitting that 20-30 GB JVM into sub JVM processes up to 10-15 GB in order to reduce inner contention.

My final point about problem #2 is thread concurrency. Running a good portion of your traffic using a single JVM process of 30 GB may be appealing, but this can also lead to a significant increase of thread concurrency within the JVM/middleware/application. Again, depending on your application behavior, you may notice a throughput capacity increase by splitting (partitioning) such a big JVM process.


Regards,

P-H 

Jose María Zaragoza replied on Wed, 2012/11/28 - 11:47am in response to: Pierre - Hugues Charbonneau


If your physical/virtual server has enough CPU cores, typically I have observed better throughput & capacity by splitting that 20-30 GB JVM into sub JVM processes up to 10-15 GB in order to reduce inner contention.

How would this be implemented?


Thanks

Pierre - Hugues... replied on Wed, 2012/11/28 - 12:05pm in response to: Jose María Zaragoza

Hi Jose,

Let’s take a simple example of a Java Web application running on Tomcat or JBoss. Let’s assume you need, for some reasons, a 30 GB Java heap size to handle your client traffic (many sessions with high memory footprint etc.).

Partitioning your JVM process simply means that you would create, let’s say, 3 instances of Tomcat on the same host vs. only one. Each instance of Tomcat has its own JVM process (and thread pools). In this scenario, you would configure the Java heap size of each instance at 10 GB instead of 30 GB. This is assuming your memory footprint is dynamic and depends on incoming client requests vs. a static footprint. Your application will obviously be more fault tolerant, with reduced impact if you need to take restart actions or deal with JVM crashes and/or stuck threads etc.

Vertical scaling all depends on your application behavior, static vs. dynamic footprint and also the availability of CPU cores on your physical/virtual host.


Regards,

P-H

Dapeng Liu replied on Wed, 2012/11/28 - 8:57pm in response to: Pierre - Hugues Charbonneau

with G1 collector, I don't think there is still a need to partition your memory. 

Muhammed Shakir... replied on Sun, 2014/05/04 - 12:33am in response to: Dapeng Liu

Hi Dapeng,

Full GC does not use a copying collector; it uses MarkSweep instead. MS will run through the dead matter as well in order to return the memory addresses of unreferenced objects to the free list. So the time taken on 20G will be more than it would be on 2G. Full GC will kick in only when an object allocation is about to fail with no addresses available in the free list, which means the 20G (assuming the heap is 20G) is almost fully allocated. So even if live objects take 1G and dead ones 19G, GC will have to run through the 19G as well in order to reclaim the dead matter.

I request you to please spend time to reply back in case you think I need to be corrected.

Best Regards

Dapeng Liu replied on Thu, 2014/05/08 - 10:14pm in response to: Muhammed Shakir Misarwala

well i think you are talking about a very primitive GC implementation. http://www.oracle.com/technetwork/articles/java/g1gc-1984535.html

marking doesn't have to wait until the "last moment" ... it is very possible that once memory usage crosses some threshold, marking/copying can kick in at a very convenient time when the server isn't super busy.

by setting your GC pause goal, you can tell the VM not to go all-in with GC; as long as enough space is freed, GC can literally stop at any time ...

last point to address if your LIVE objects are using more than 90% of your heap, something needs to be done ... 
