I am the founder and Master Developer of Plumbr, a memory leak detection tool. I enjoy solving problems and raising the self-awareness and professional pride of the developers around me. In my out-of-office time I am a bookworm and a computer games addict. Nikita is a DZone MVB, is not an employee of DZone, and has posted 88 posts at DZone. You can read more from them at their website.

What is a Memory Leak in Java?

02.20.2012

Disclaimer: this post is a simple introduction to the problem of Java memory leaks, aimed mainly at people who have never given this topic much thought.

Let us start by outlining the difference between memory management in Java and in, for example, C. When a C programmer needs a region of dynamically allocated (heap) memory for a value, he has to allocate it manually, e.g. with malloc(). After the application finishes using that value, the region must be freed manually with free(); that is, the code freeing the memory has to be written by the developer. In Java, when a developer wants to create and use a new object, e.g. with new Integer(5), he doesn't have to allocate memory himself; this is taken care of by the Java Virtual Machine (JVM). During the life of the application, the JVM periodically checks which objects in memory are still being used and which are not. Unused objects can be discarded, and their memory reclaimed and reused. This process is called garbage collection, and the corresponding part of the JVM is called the Garbage Collector, or GC.

Java’s automatic memory management relies on the GC, which periodically looks for unused objects and removes them. And here hides the dragon. Simplifying a bit, we can say that a memory leak in Java is a situation where some objects are no longer used by the application, but the GC fails to recognize them as unused. As a result, these objects remain in memory indefinitely, reducing the amount of memory available to the application.
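To make the definition concrete, here is a minimal sketch (the class ReachabilityDemo and its REGISTRY list are hypothetical names) of how reachability, not actual usage, decides what the GC may reclaim. After setup, both byte arrays are equally useless to the program, but one is still referenced from a static field, which stays reachable from a GC root through its loaded class. A WeakReference does not keep its target alive, so it lets us observe whether the GC cleared an object. Keep in mind that System.gc() is only a hint, so the second line of output is typical rather than guaranteed.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

public class ReachabilityDemo {
  // A static field is reachable from a GC root (its loaded class),
  // so anything stored here stays alive until it is removed again.
  static final List<Object> REGISTRY = new ArrayList<>();

  public static void main(String[] args) {
    Object forgotten = new byte[1024 * 1024];
    Object dropped = new byte[1024 * 1024];
    REGISTRY.add(forgotten); // never used again, but still referenced

    // Weak references let us observe the objects without keeping them alive.
    WeakReference<Object> forgottenRef = new WeakReference<>(forgotten);
    WeakReference<Object> droppedRef = new WeakReference<>(dropped);

    // Drop the local references: only REGISTRY keeps 'forgotten' reachable.
    forgotten = null;
    dropped = null;

    System.gc(); // only a hint; the JVM decides when to actually collect

    // Guaranteed true: a strongly reachable object is never cleared.
    System.out.println("forgotten still alive: " + (forgottenRef.get() != null));
    // Usually true after System.gc(), but not guaranteed by the JVM.
    System.out.println("dropped collected: " + (droppedRef.get() == null));
  }
}
```

From the GC's point of view, the array in REGISTRY is in use simply because it can be reached; only the developer knows that nobody will ever read it.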

Here I would like to stress one very important point: the notion of “object is not used by the application any more” is totally, absolutely, 100% application specific! Apart from some special cases where the lifespan of an object can be determined logically (such as a method's local variable that under no circumstances escapes the method), object usage can be understood only by the application developer, taking into account all usage patterns of the application.

How can the GC distinguish between unused objects and the ones the application will use at some point in the future? The basic algorithm can be described as follows:

  1. There are some objects which are considered “important” by the GC. These are called GC roots and are (almost) never discarded. Examples are the local variables and input parameters of currently executing methods, application threads, references from native code, and similar “global” objects.
  2. Any object referenced from those GC roots is assumed to be in use and is not discarded. In Java, one object can reference another in different ways, the most common being when object A is stored in a field of object B. In that case, we say “B references A”.
  3. The above process is repeated until all objects that can be transitively reached from GC roots have been visited and marked as “in use”.
  4. Everything else is unused and can be thrown away.
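The four steps above amount to the “mark” phase of a mark-and-sweep style collector. The sketch below models it in plain Java under a deliberately simplified assumption: the heap is just a list of hypothetical Node objects, and references are explicit lists rather than real object fields. Production JVM collectors are far more sophisticated (generational, concurrent, and so on), so treat this only as an illustration of the reachability idea.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MarkSketch {
  // A hypothetical heap object: a name plus the objects it references.
  static final class Node {
    final String name;
    final List<Node> references = new ArrayList<>();

    Node(String name) {
      this.name = name;
    }
  }

  // Steps 1-3: start from the GC roots and follow references transitively,
  // marking every object reached as "in use".
  static Set<Node> mark(List<Node> roots) {
    Set<Node> marked = new HashSet<>();
    Deque<Node> pending = new ArrayDeque<>(roots);
    while (!pending.isEmpty()) {
      Node current = pending.pop();
      if (marked.add(current)) { // false if already marked, so cycles terminate
        pending.addAll(current.references);
      }
    }
    return marked;
  }

  // Step 4: everything that was not marked is unused and can be reclaimed.
  static List<Node> unreachable(List<Node> heap, Set<Node> marked) {
    List<Node> garbage = new ArrayList<>();
    for (Node n : heap) {
      if (!marked.contains(n)) {
        garbage.add(n);
      }
    }
    return garbage;
  }

  public static void main(String[] args) {
    Node root = new Node("root"); // e.g. a local variable of a running method
    Node a = new Node("A");
    Node b = new Node("B");
    Node orphan = new Node("orphan");
    root.references.add(b);   // the root references B
    b.references.add(a);      // B references A (A is stored in a field of B)
    orphan.references.add(a); // orphan references A, but nothing references orphan

    List<Node> heap = List.of(root, a, b, orphan);
    Set<Node> marked = mark(List.of(root));
    for (Node n : unreachable(heap, marked)) {
      System.out.println("garbage: " + n.name);
    }
  }
}
```

Running main prints `garbage: orphan`. A survives because B reaches it; the reference from orphan to A does not count, since orphan itself cannot be reached from any root.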

Now, it is fairly easy to construct a Java program that satisfies the above definition of a memory leak:


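import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class Calc {
  // Every result is stored here but never read back, so the map only grows
  private Map<Integer, Integer> cache = new HashMap<>();

  public int square(int i) {
    int result = i * i;
    cache.put(i, result); // the leak: entries are added but never used or removed
    return result;
  }

  public static void main(String[] args) {
    Calc calc = new Calc();
    Scanner in = new Scanner(System.in);
    while (true) {
      System.out.println("Enter a number between 1 and 100");
      int i = in.nextInt(); // read the next number from standard input
      System.out.println("Answer " + calc.square(i));
    }
  }
}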

 

This program reads one number at a time from its user and calculates its square. The implementation uses a primitive “cache” to store the results of the calculation. But since these results are never read from the cache, the code represents a memory leak according to our definition above. Note that a HashMap holds each key only once, so the cache grows only when a user enters a number that has not been seen before. Nothing in square() restricts its argument to the 1 to 100 range, though, so if we let this program run and interact with users long enough, the “cached” results can consume a lot of memory.
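One possible way to keep such a cache from turning into a leak, offered here as a sketch rather than as the author's recommendation, is to cap its size and actually read from it. The version below (the names BoundedCalc and MAX_ENTRIES, and the cap of 100, are assumptions for illustration) relies on java.util.LinkedHashMap: with accessOrder set to true it orders entries from least to most recently accessed, and overriding removeEldestEntry evicts the oldest entry whenever the map grows beyond the cap.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedCalc {
  // Hypothetical upper bound; choose whatever fits the application.
  private static final int MAX_ENTRIES = 100;

  // accessOrder = true keeps entries in least-recently-used order, and
  // removeEldestEntry evicts the eldest one once the cap is exceeded.
  private final Map<Integer, Integer> cache =
      new LinkedHashMap<Integer, Integer>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
          return size() > MAX_ENTRIES;
        }
      };

  public int square(int i) {
    // This time the cache is actually read, so stored results are really used.
    Integer cached = cache.get(i);
    if (cached != null) {
      return cached;
    }
    int result = i * i;
    cache.put(i, result);
    return result;
  }

  int cacheSize() {
    return cache.size();
  }

  public static void main(String[] args) {
    BoundedCalc calc = new BoundedCalc();
    for (int i = 0; i < 10_000; i++) {
      calc.square(i);
    }
    System.out.println("cache size: " + calc.cacheSize()); // prints 100, not 10000
  }
}
```

Whether a bounded LRU cache, a cache with expiry, or no cache at all is the right choice depends, once again, on how the application actually uses the stored results.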

This brings us to another important aspect of memory leaks: how big should a leak be to justify the trouble of investigating and fixing it? Technically, whenever you leave an object that you don't use anymore lying around, you create waste. Practically, a couple of kilobytes here and there don't constitute real problems for modern applications, especially the “enterprise” ones :) But a leak is a leak, even if it's just 2 bytes.

Which leads us to a simple corollary: a memory leak is like good wine; it needs aging :) If you want to demonstrate a leak or, more importantly, fix it, you really should let it grow. Tiny memory leaks are lost among all the other objects present in an application at any given point in time. Regardless of the tool you use to identify memory leaks (a profiler, a memory dump analyzer, an APM, or a special-purpose leak finder such as Plumbr), there should be a lot of objects that have outlived their usefulness. This means that your application should run for a significant period of time AND as many different parts of your application as possible should be executed. Otherwise you will be looking for a needle in a haystack.
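To watch a leak “age”, a minimal sketch like the following (LeakGrowth, its CACHE map and the round size of 100,000 are hypothetical) repeats the Calc pattern with a steady stream of new inputs and prints the approximate used heap after each round, computed as Runtime.totalMemory() minus Runtime.freeMemory(). The exact numbers depend on the JVM, the heap settings and GC timing, so only the trend is meaningful: the entry count grows by 100,000 per round, and the used heap tends to grow with it instead of levelling off.

```java
import java.util.HashMap;
import java.util.Map;

public class LeakGrowth {
  // Same pattern as Calc: results are stored but never read back.
  static final Map<Integer, Long> CACHE = new HashMap<>();

  static long square(int i) {
    long result = (long) i * i; // long avoids int overflow for large inputs
    CACHE.put(i, result);
    return result;
  }

  // Used heap = memory the JVM currently holds minus the free part of it.
  static long usedHeapBytes() {
    Runtime rt = Runtime.getRuntime();
    return rt.totalMemory() - rt.freeMemory();
  }

  public static void main(String[] args) {
    int step = 100_000;
    for (int round = 1; round <= 5; round++) {
      // New, distinct keys every round, so the map keeps growing.
      for (int i = (round - 1) * step; i < round * step; i++) {
        square(i);
      }
      System.gc(); // a hint only; helps separate retained objects from garbage
      System.out.printf("entries=%d used heap=%d MB%n",
          CACHE.size(), usedHeapBytes() / (1024 * 1024));
    }
  }
}
```

A single snapshot of a few entries tells you little; it is the steady climb across rounds that makes the leak stand out.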

If you would like to know more about Java memory leaks, especially about different ways to hunt them down and fix them in your applications, check out our series of blog posts titled "Solving OutOfMemoryError". To learn more about the tool we are building to solve the problem altogether, watch our short screencast. And follow us on Twitter at @JavaPlumbr. Till next time!

Published at DZone with permission of Nikita Salnikov-Tarnovski, author and DZone MVB.
