

Java Performance Tuning, Profiling, and Memory Management

09.01.2009

Permanent Generation

Class information is stored in the perm generation, as are constant strings. Strings that your application interns at runtime with String.intern() also end up there. In short, the perm generation holds the JVM's reflective data: classes, methods, and so on.
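As a quick illustration (a minimal sketch with made-up class and variable names, valid for the pre-Java 7 JVMs this article covers): the literal below sits in the perm-gen string pool, while the string built at runtime lives on the ordinary Java heap until intern() is called.

public class InternDemo {
    public static void main(String[] args) {
        // The literal is interned automatically into the perm-gen string pool.
        String literal = "hello";

        // A string built at runtime is an ordinary Java-heap object...
        String dynamic = new StringBuilder("hel").append("lo").toString();

        // ...until intern() is called, which returns the pooled copy.
        String interned = dynamic.intern();

        System.out.println(literal == dynamic);  // false - different objects
        System.out.println(literal == interned); // true  - same pooled instance
    }
}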

JVM process memory

The Windows Task Manager shows the memory usage of the whole java.exe process, so it is not unusual for the total memory consumption of the VM to exceed the value of -Xmx. Roughly:

    Managed heap (Java heap + Perm + code cache) + native heap + thread memory <= 2 GB (process address space, PAS, on 32-bit Windows)

       Code cache: holds JIT-compiled and HotSpot code.
       Thread memory: thread_stack_size * number_of_threads.
       Java heap: used when you create new Java objects; this is the part the developer manages.
       Perm: reflective data, interned strings, etc.
       Native heap: used for native allocations.

What you see in the Task Manager is the total PAS, while what a profiler shows is the Java heap and, optionally, the Perm generation.
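To see the managed-heap side of that equation from inside the JVM, a small sketch using the standard java.lang.management API (the class name is made up) prints the heap and non-heap (Perm + code cache) figures, which you can compare with what the Task Manager reports for the whole process:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class MemorySnapshot {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();

        // Java heap (young + tenured) - the part limited by -Xmx.
        MemoryUsage heap = mem.getHeapMemoryUsage();
        // Non-heap managed memory (perm gen, code cache).
        MemoryUsage nonHeap = mem.getNonHeapMemoryUsage();

        System.out.printf("Heap     used=%dK committed=%dK max=%dK%n",
                heap.getUsed() / 1024, heap.getCommitted() / 1024, heap.getMax() / 1024);
        System.out.printf("Non-heap used=%dK committed=%dK max=%dK%n",
                nonHeap.getUsed() / 1024, nonHeap.getCommitted() / 1024, nonHeap.getMax() / 1024);
        // Native heap and thread stacks are NOT included here; they show up only
        // in the OS-level process size (Task Manager).
    }
}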

   Platform                          Maximum PAS
   x86 / Red Hat Linux 32-bit        2 GB
   x86 / Red Hat Linux 64-bit        3 GB
   x86 / Windows 98/2000/NT/Me/XP    2 GB
   x86 / Solaris x86 (32-bit)        4 GB
   SPARC / Solaris 32-bit            4 GB

Why GC needs tuning

  Amdahl's law

 

Limits of Vertical scaling

If F is the fraction of a calculation that is sequential (i.e. cannot benefit from parallelization), and (1 − F) is the fraction that can be parallelized, then the maximum speedup that can be achieved by using N processors is:

                    1
    Speedup = -------------        (Amdahl's law)
              F + (1-F)/N

In the limit, as N -> infinity, the maximum speedup tends to 1/F. If F is only 10%, the problem can be sped up by only a maximum of a factor of 10, no matter how large the value of N used.
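A tiny sketch of the formula in code (illustrative only) makes the ceiling obvious: with F = 10%, even 1024 processors cannot push the speedup past 10x.

public class Amdahl {
    /** Maximum speedup for serial fraction f on n processors: 1 / (f + (1 - f) / n). */
    static double speedup(double f, int n) {
        return 1.0 / (f + (1.0 - f) / n);
    }

    public static void main(String[] args) {
        // With 10% serial work, adding CPUs quickly stops helping: the limit is 1/F = 10.
        for (int n : new int[] {2, 4, 8, 16, 64, 1024}) {
            System.out.printf("N=%-5d speedup=%.2f%n", n, speedup(0.10, n));
        }
    }
}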

So we assume there is scope for leveraging multiple CPUs or multithreading. All right, enough theory... can it solve my problem?

 

Problem Statements

  1. Application slow

    Your application may be crawling because it is spending too much time cleaning up garbage rather than running the application itself.

    Solution: tune the JVM parameters and strike a balance between pause times and GC frequency.
  2. Consumes too much memory
    The memory footprint of the application is related to the number and size of the live objects in the JVM at any given point in time. This can be due either to valid objects that are required to stay in memory, or to the programmer forgetting to remove references to unwanted objects (typically known as 'memory leaks' in Java parlance). As the memory footprint hits the threshold, the JVM throws java.lang.OutOfMemoryError.


java.lang.OutOfMemoryError can occur for three possible reasons:

1. The Java heap is too low to create new objects. Increase it with -Xmx (java.lang.OutOfMemoryError: Java heap space).
java.lang.OutOfMemoryError: Java heap space
MaxHeap=30528 KB  TotalHeap=30528 KB FreeHeap=170 KB  UsedHeap=30357 KB

2. The permanent generation is too low. Increase it with -XX:MaxPermSize=256m (java.lang.OutOfMemoryError: PermGen space)
java.lang.OutOfMemoryError: PermGen space
MaxHeap=65088 KB  TotalHeap=17616 KB      FreeHeap=9692 KB  UsedHeap=7923 KB

                Heap

 def new generation   total 1280K, used 0K [0x02a70000, 0x02bd0000, 0x02f50000)

  eden space 1152K,   0% used [0x02a70000, 0x02a70000, 0x02b90000)

  from space 128K,   0% used [0x02bb0000, 0x02bb0000, 0x02bd0000)

  to   space 128K,   0% used [0x02b90000, 0x02b90000, 0x02bb0000)

 tenured generation   total 16336K, used 7784K [0x02f50000, 0x03f44000, 0x06a70000)

                 the space 16336K,  47% used [0x02f50000, 0x036ea3f8, 0x036ea400, 0x03f44000)

 compacting perm gen  total 12288K, used 12287K [0x06a70000, 0x07670000, 0x07670000)

                 the space 12288K,  99% used [0x06a70000, 0x0766ffd8, 0x07670000, 0x07670000) 

 

3. java.lang.OutOfMemoryError: .... Out of swap space ...

The JNI heap runs low on memory, even though the Java heap and the PermGen have room. This typically happens if you are making lots of heavy JNI calls while the Java heap objects occupy little space. In that scenario the GC might not feel the urge to clean up the Java heap, while the JNI heap keeps growing until it runs out of memory.

If you use the Java NIO packages, watch out for this issue: direct buffer allocation uses the native heap.

The native heap available for direct buffers can be increased with -XX:MaxDirectMemorySize=256M (default is 128M).
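A minimal sketch of the difference (class name and sizes are arbitrary): the direct buffer below is carved out of native memory and limited by -XX:MaxDirectMemorySize, while the ordinary buffer is just a byte[] inside the Java heap limited by -Xmx.

import java.nio.ByteBuffer;

public class DirectBufferDemo {
    public static void main(String[] args) {
        // Allocated outside the Java heap: -Xmx does not cover this memory,
        // -XX:MaxDirectMemorySize does. Exhausting it raises an OutOfMemoryError
        // even though a heap profiler shows plenty of free Java heap.
        ByteBuffer direct = ByteBuffer.allocateDirect(16 * 1024 * 1024); // 16 MB

        // A heap buffer, by contrast, wraps an ordinary byte[] inside the Java heap.
        ByteBuffer onHeap = ByteBuffer.allocate(16 * 1024 * 1024);

        System.out.println("direct? " + direct.isDirect() + ", onHeap? " + onHeap.isDirect());
    }
}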

Diagnosis:

 

There are some starting points for diagnosing the problem. You may start with the '-verbose:gc' flag on the java command and watch the memory footprint as the application progresses, until you find a spike. You can analyze the logs or use a light profiler like JConsole (part of the JDK) to check the memory graph. If you need the details of the objects occupying memory at a certain point, you can use JProfiler or AppPerfect, which provide details of each object instance and all the inbound/outbound references to/from it. This is a memory-intensive procedure and not meant for production systems; depending on your application, these heavy profilers can slow it down by up to 10 times.

Below are  some of the ways you can zero-in on the issue.

A) GC outputs

-verbose:gc   

This flag starts printing additional lines to the console, like given below

[GC 65620K -> 50747K(138432K), 0.0279446 secs]
[Full GC 46577K -> 18794K(126848K), 0.2040139 secs] 
Combined size of live objects before GC (young + tenured) -> combined size of live objects after GC (total heap size in parentheses, not counting the space in the permanent generation)
-XX:+PrintHeapAtGC : More details
-XX:+PrintGCTimeStamps will additionally print a time stamp at the start of each collection.
111.042: [GC 111.042: [DefNew: 8128K->8128K(8128K), 0.0000505 secs]
111.042: [Tenured: 18154K->2311K(24576K), 0.1290354 secs]
26282K->2311K(32704K), 0.1293306 secs]
The collection starts about 111 seconds into the execution of the application. The tenured generation usage was reduced to about 10%
18154K->2311K(24576K)
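Putting these flags together, a typical launch line might look like the one below ('MyApp' and 'gc.log' are placeholders; pick whichever flags you need):

java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -Xloggc:gc.log MyApp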
 
B) hprof output file

java -Xrunhprof:heap=sites,cpu=samples,depth=10,thread=y,doe=y
The heap=sites tells the profiler to write information about memory utilization on the heap, indicating where it was allocated.
cpu=samples tells the profiler to do statistical sampling to determine CPU use.
depth=10 indicates the depth of the trace for threads.
thread=y tells the profiler to identify the threads in the stack traces.
doe=y tells the profiler to produce dump of profiling data on exit.


C) -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=C:\OOM.txt

Dump the heap on OOM, and then analyze OOM.txt (a binary file despite the extension) with the jhat tool (bundled with the JDK).

The command below launches an HTTP server on port 7777. Open a browser at 'http://localhost:7777' to see the results.

jhat -port 7777 c:\OOM.txt


D) Profiling the app


You can profile the application to figure out Memory Leaks.

Java memory leaks (or what we like to call unintentionally retained objects) are often caused by saving an object reference in a class-level collection and forgetting to remove it at the proper time. The collection might be storing 100 objects, of which 95 might never be used again. In that case those 95 objects constitute the memory leak, since the GC cannot free them while the collection still references them.
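A minimal sketch of the pattern (the class and method names are invented for the example): the leak is simply the remove() call that never happens.

import java.util.ArrayList;
import java.util.List;

public class SessionRegistry {
    // Class-level (static) collection: anything added here stays reachable forever
    // unless it is explicitly removed, so the GC can never reclaim it.
    private static final List<Object> SESSIONS = new ArrayList<Object>();

    static void open(Object session) {
        SESSIONS.add(session);
    }

    static void close(Object session) {
        // Forgetting this remove() is the classic Java "memory leak":
        // the session is dead to the application but still strongly referenced.
        SESSIONS.remove(session);
    }
}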

There are also other resource-management problems that impact performance, such as not closing JDBC Statements/ResultSets in a finally block (many JDBC drivers store a Statement reference in the Connection object).
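A hedged sketch of the JDBC clean-up idiom on Java 5/6 (no try-with-resources yet); the query string is a placeholder:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class JdbcCleanup {
    static void runQuery(Connection con) throws SQLException {
        Statement stmt = null;
        ResultSet rs = null;
        try {
            stmt = con.createStatement();
            rs = stmt.executeQuery("SELECT 1");
            while (rs.next()) {
                // process the row
            }
        } finally {
            // Close in reverse order; each close is guarded so one failure
            // does not prevent the others from running.
            if (rs != null) try { rs.close(); } catch (SQLException ignored) {}
            if (stmt != null) try { stmt.close(); } catch (SQLException ignored) {}
        }
    }
}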

A java "memory leak" is more like holding a strong reference to an object though it would never be needed anymore. The fact that you hold a strong reference to an object prevents the GC from deallocating it.. Java "memory leaks" are objects that fall into category (2). Objects that are reachable but not "live" can be considered memory leaks.

The JVMPI interface used for profiling applications gives a high level of detail.
Profilers: Hprof, JConsole, JProfiler, AppPerfect, YourKit, Eclipse Profiler, NetBeans Profiler, JMP, Extensible Java Profiler (EJP), TomcatProbe, Profiler4j

JConsole is good for summary-level info, tracking the memory footprint, checking for thread deadlocks, etc. It does not provide details of individual heap objects. For heap details you may use AppPerfect (licensed) or JProfiler.

E) For native heap issues

The JRockit JDK (from BEA) provides better tools than the Sun JDK for peeping inside the JNI heap (at least on Windows).

JRockit Runtime Analyzer - part of the JRockit install.
jrcmd <PID> print_memusage
JRMC.exe - launch it from /bin and start a recording.

Solutions:

Based on the findings from the diagnosis, you may have to take these actions:

  1. Code change - For memory leak issues, it has to be a code change.
  2. JVM parameters tuning - You need to find the behavior of your app in terms of the ratio of young to old objects, and then tune the JVM accordingly. We'll talk about when to tune each parameter as we discuss the relevant params below; a combined example command line follows the list.

    Memory parameters:

    Memory Size:  overall size, individual region sizes

    -ms, -Xms  
    sets the initial heap size (young and tenured generation ONLY, NOT Permanent)

    If the app starts with a large memory footprint, then you should set the initial heap to a large value so that the JVM does not consume cycles to keep expanding the heap.

    -mx, -Xmx
    sets the maximum heap size(young and tenured gen ONLY,NOT Perm) (default:  64mb)

    This is the most frequently tuned parameter, set to suit the maximum memory requirements of the app. A low value overworks the GC, which must constantly free space for new objects, and may lead to OOM. A very high value can starve other apps and induce swapping. Hence, profile the memory requirements to select the right value.

    -XX:PermSize=256m -XX:MaxPermSize=256m

    MaxPermSize default value (32mb for -client and 64mb for -server)
    Tune this to increase the Permanent generation max size.
  3. GC parameters:

    -Xminf [0-1], -XX:MinHeapFreeRatio [0-100]

    sets the percentage of minimum free heap space - controls heap expansion rate

    -Xmaxf [0-1], -XX:MaxHeapFreeRatio [0-100]

    sets the percentage of maximum free heap space - controls when the VM will return unused heap memory to the OS

    -XX:NewRatio

    sets the ratio of the old and new generations in the heap. A NewRatio of 5 sets the ratio of new to old at 1:5, making the new generation occupy 1/6th of the overall heap
    defaults: client 8, server 2

    -XX:SurvivorRatio

    sets the ratio of the survivor space to the eden in the new object area. A SurvivorRatio of 6 sets the ratio of the three spaces to 1:1:6, making each survivor space 1/8th of the new object region
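Putting the sizing and ratio flags together, an example launch line might look like the sketch below (the values and 'MyApp' are purely illustrative, not recommendations; profile your own application first):

java -Xms512m -Xmx512m -XX:PermSize=128m -XX:MaxPermSize=128m -XX:NewRatio=2 -XX:SurvivorRatio=8 MyApp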

 

Garbage Collector Tuning:

 Types of GarbageCollectors (not complete list)

  1. Throughput collector: (default for Server JVM)
    •parallel version of the young generation collector.
    •-XX:+UseParallelGC
    •The tenured gc is the same as the serial collector (default GC for client JVM).
    •multiple threads to execute a minor collection
    •application has a large number of threads allocating objects / large Eden
    •-XX:+UseParallelOldGC (major also in parallel)
  2. Concurrent low pause collector  :
    •collects the tenured generation and does most of the collection concurrently with the execution of the application. Attempts to reduce the pause times needed to collect the tenured generation
    •-Xincgc or -XX:+UseConcMarkSweepGC
    •The application is paused for short periods during the collection. A parallel version of the young generation copying collector is used with the concurrent collector.
    •Multiprocessor machines; apps that have a relatively large set of long-lived data (a large tenured generation)
    •Apps where response time is more important than overall throughput e.g. JAVA_OPTS= -Xms128M -Xmx1024M -XX:NewRatio=1 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:E:\loggc.txt

    Flip side: synchronization overhead and fragmentation. (Example launch lines for both collectors are sketched below.)
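The sketched launch lines below contrast the two collectors ('MyApp' and the heap size are placeholders):

Throughput (parallel):   java -Xmx1024m -XX:+UseParallelGC -XX:+UseParallelOldGC MyApp
Concurrent low pause:    java -Xmx1024m -XX:+UseConcMarkSweepGC MyApp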

 

Performance Solution

  1. Application Software profiling
  2. Server and JVM tuning
  3. Right Hardware and OS
  4. Code improvement as per the behaviour of your application and the profiling results... easier said than done
  5. Use the JVM the right way: optimal JVM params
  6. Client / server VM as appropriate
  7. -XX:+UseParallelGC if you have multiple processors

 

Some Tips

  • Unless you have problems with pauses, try granting as much memory as possible to the virtual machine
  • Set -Xms and -Xmx to the same value... but be sure about the application's behaviour
  • Be sure to increase the memory as you increase the number of processors, since allocation can be parallelized
  • Don’t forget to tune the Perm generation
  • Minimize the use of synchronization
  • Use multithreading only if it benefits. Be aware of the thread overheads. For a simple task like incrementing a counter from 1 to a billion, use a single thread; multiple threads can slow it down by a factor of 10 (I tested this on a dual-CPU WinXP box with 8 threads).
  • Avoid premature object creation. Creation should be as close to the actual place of use as possible. Very basic concept that we  tend to overlook.
  • JSPs are generally slower than servlets.
  • Too many custom class loaders and heavy use of reflection increase the Perm generation. Don't be PermGen-agnostic.
  • Soft References help against memory leakages: they enable smart caches without weighing down memory, and the GC will flush SoftReferences automatically if the JVM runs low on memory (a sketch follows this list).
  • Use StringBuffer instead of String concatenation
  • Minimize JNI calls in your code
  • XML APIs - be careful: SAX or DOM, make the correct choice. Use precompiled XPaths for better query performance.
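As mentioned in the soft-reference tip above, a minimal cache sketch might look like this (the class is hypothetical; a production cache would also prune entries whose referents have been cleared):

import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

public class SoftCache<K, V> {
    // Values are held only softly: the GC is free to clear them when the heap
    // runs low, so the cache by itself will not cause an OutOfMemoryError.
    private final Map<K, SoftReference<V>> map = new HashMap<K, SoftReference<V>>();

    public void put(K key, V value) {
        map.put(key, new SoftReference<V>(value));
    }

    public V get(K key) {
        SoftReference<V> ref = map.get(key);
        return (ref == null) ? null : ref.get(); // null if never cached or already collected
    }
}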

 

Conclusion:

There can be various bottlenecks in an application, and the JVM may be one of the culprits. The causes can range from a JVM that is not tuned optimally for your application, to memory leaks, to JNI issues. They need to be diagnosed, analyzed, and then fixed.

Published at DZone with permission of its author, Vikash Ranjan.


Comments

Ramazan VARLIKLi replied on Wed, 2009/09/02 - 6:59am

 

- JSPs are converted into servlets when they are compiled, so there shouldn't be any performance difference between the two.

- StringBuilder is a faster version of StringBuffer.

Kirk Pepperdine replied on Wed, 2009/09/02 - 7:01am in response to: Ramazan VARLIKLi

StringBuilder is an unsynchronized version of StringBuffer

qi yao replied on Thu, 2009/09/03 - 11:55pm

I think it's a very good article.

Ashish Paliwal replied on Fri, 2009/09/04 - 4:54am

A great article. It was much needed as very little information is available for debugging these frequently occurring scenarios. By any chance, would you be posting anything that can help debug concurrency issues.

Vikash Ranjan replied on Fri, 2009/09/04 - 1:57pm in response to: Ashish Paliwal

Thanks guys. If u are talking about deadlock issues, u can try JConsole - it's a simple tool available with the JDK and decent for deadlock detection. It's lightweight and can connect to any JVM. At any time it can show you the status of all the threads running in the JVM. See if the links below help:

http://java.sun.com/developer/technicalArticles/J2SE/jconsole.html#DeadlockDetection

http://java.sun.com/javase/6/docs/technotes/guides/management/jconsole.html

Go to the scenario where you suspect deadlock and check the deadlocked threads. It can even tell you the monitor details.

Raveendra Maddali replied on Thu, 2009/09/10 - 12:30am

Hi, We have a windows server machine that have enough memory to bring up a jvm with more than 1gb. We have weblogic 10.3 present on that server. when we try to bring a managed server with JRockit Xmx 1024m , it is bringing it up fine. but when we try the same with SunJDK Xmx 1024m , it is failing to allocate the amount of memory (or) object heap and exiting. What is this difference between JRockit and SunJdk.?? Please let us know , if anybody come across this kind of situation.

Raveendra Maddali replied on Wed, 2009/09/09 - 12:18am

HI, What is the best open source java profiler that is suggested for memoryleaks??
and how to setup the profiling on remote server??
ie., i wanted to profile a managed server running on solaris machine, from my local or client machine.
I tried to setup jconsole remotely , but not able to make it. Anyother profilers with easy of integration will be very much helpful.

Thanks in advance..!!!

Vikash Ranjan replied on Thu, 2009/09/10 - 1:03pm in response to: Raveendra Maddali

Not sure  why u need open source profiler .... did u mean free?

If I am not wrong, they all use the same JVMPI/TI APIs.

U can download and try some ....JProfiler is very good. So is AppPerfect, we tried it and then bought licenses.

AppPerfect @ http://www.appperfect.com/products/java-profiler.html

JProfiler @ http://www.brothersoft.com/jprofiler-download-81861.html  

Yourkit  @  http://www.yourkit.com/   

JProbe I have heard is ok too.

 

Vikash Ranjan replied on Thu, 2009/09/10 - 1:06pm in response to: Raveendra Maddali

BTW, JConsole has  the  easiest setup :)

All the  profilers  that I mentioned need some more  effort for  remote  connection.

But u should be able to troubleshoot @ http://java.sun.com/javase/6/docs/technotes/guides/management/jconsole.html

Vikash Ranjan replied on Mon, 2009/11/09 - 2:48am in response to: Raveendra Maddali

Incase u r  still looking for an answer.....:)

There are  differences in the default parameters that start the JVM, so sometimes you may need to modify your parameters to adjust the VM. One I know of is the Perm Gen, which will affect the Heap Max. JRockit dynamically  resizes the PermGen, so it can start with a lower value, leaving more space for Heap. There may be other reasons as well. U may need to check the differences in the JVM Startup params to get a comprehensive answer. 

The max heap you can set depends on the AVAILABILITY of the RAM+Swap, as  some of that may be used by other processes. Please try increasing the SWAP (that's easier to do ...dont  forget to restart m/c).


Carla Brian replied on Tue, 2012/06/19 - 5:54pm

This is good. This is really helpful as well in testing the performance of the java application. Good job on this. - Garrett Hoelscher

Suminda Dharmasena replied on Tue, 2013/04/23 - 11:45am

Hi,

Most of the objects created in Java need not be GCed. Some objects can be allocated on the stack through escape analysis. Ideally, objects can be deallocated once their use is over. Static escape analysis can be performed to see the extent of sharing and the optimal time to delete. If any object falls through this, then GC can be performed.

Also annotations can be introduced to mark timing of when GC happens. This will reduce the overhead and penalty of GC and pause.

Suminda
