I am the founder and Master Developer of Plumbr, a memory leak detection tool. I enjoy solving problems and raising the self-awareness and professional pride of the developers around me. In my out-of-office time I am a bookworm and a computer games addict.

A 12-year-old Bug in JDK, Still Out There Leaking Memory in our Applications

12.18.2012

This story goes back weeks, or even decades, depending on how you mark the starting date. Anyhow, a few weeks ago one of our customers had problems interpreting a leak reported by Plumbr. Quoting his words: “It seems that Java itself is broken”.

As a matter of fact, the customer was right. Java was indeed broken. But let’s check the case and see what we can learn from it. Let’s start by looking into the report generated by Plumbr. It looked similar to the one below.

From the report we can see that the application at hand contains a classloader leak. This is a specific type of memory leak in which classloaders cannot be unloaded (for example, on Java EE application redeploys) and thus all the class definitions referenced by the classloader are left hanging in your permanent generation.
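To see why one stray reference is enough to hold all those class definitions hostage, remember that every object references its class and every class references the classloader that defined it. A tiny self-contained illustration (the class name is made up for this example):

    public class ClassLoaderPinDemo {

        public static void main(String[] args) {
            Object survivor = new ClassLoaderPinDemo();

            // Every object references its class, and every class references its defining
            // classloader. So a single reachable instance keeps the classloader alive --
            // and with it every class definition that classloader ever loaded.
            ClassLoader pinned = survivor.getClass().getClassLoader();
            System.out.println("Still pinned: " + pinned);
        }
    }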

In this specific case there are 14,343 class definitions wasting your precious PermGen:

[Screenshot: Plumbr leak report]

Those classes are all loaded by the org.apache.catalina.loader.WebAppClassloader, which cannot be garbage collected because it is still referenced through the following chain (a small sketch of the underlying pattern follows the list):

  • This classloader is referenced from the contextClassLoader field of a java.lang.Thread instance.
  • The Thread preventing our classloader from being garbage collected is referenced from the keepAliveTimer field of a sun.net.www.http.KeepAliveCache instance.
  • And last in this hierarchy is sun.net.www.http.HttpClient, which seems to be doing something clever internally, keeping a cache of something in a variable named kac.
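To make the chain concrete, here is a minimal sketch of the pattern at play – my own illustration, not the actual JDK source: a lazily started timer thread inherits the context classloader of whichever thread first triggers it, and keeps that classloader reachable for as long as it runs.

    // Sketch of the pattern behind the reference chain above -- not the real
    // sun.net.www.http.KeepAliveCache, just an illustration of the mechanism.
    public class KeepAliveCacheSketch {

        private Thread keepAliveTimer;  // plays the role of the keepAliveTimer field above

        synchronized void ensureTimerRunning() {
            if (keepAliveTimer == null) {
                keepAliveTimer = new Thread(new Runnable() {
                    public void run() {
                        // Runs for the lifetime of the JVM, so the thread -- and its
                        // contextClassLoader -- never becomes garbage.
                        while (true) {
                            try {
                                Thread.sleep(5000); // pretend to purge expired connections
                            } catch (InterruptedException e) {
                                return;
                            }
                        }
                    }
                }, "Keep-Alive-Timer");
                // A new Thread copies the creator's context classloader. If the first HTTP
                // call comes from inside a webapp, that is the WebAppClassloader.
                keepAliveTimer.setDaemon(true);
                keepAliveTimer.start();
            }
        }
    }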

Now we are indeed in a situation where all the symptoms of the problem at hand point to the JVM internals and not to the application code. Could this really be true?

Immediately after googling for “sun.net.www.http.HttpClient leak” I stumbled upon endless pages of references to the same problem, and about the same number of different workarounds for different libraries and application servers. So for some reason it indeed seems that the caching solution in this HttpClient class does not let go of the internal keep-alive cache, which in turn refuses to release a reference to the classloader it was created in.
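Most of those workarounds boil down to the same trick: make sure the Keep-Alive timer gets started while a “neutral” classloader is installed as the context classloader, so that it can never capture the webapp's loader. A rough sketch of the idea – the class name and the ping URL are mine, purely for illustration:

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URL;

    public class KeepAliveLeakWorkaround {

        // Call once on container or application startup, before any webapp code
        // makes its first HTTP request.
        public static void primeKeepAliveTimer() {
            ClassLoader original = Thread.currentThread().getContextClassLoader();
            try {
                // Swap in the system classloader so that a timer thread started as a
                // side effect of this request cannot reference a webapp classloader.
                Thread.currentThread().setContextClassLoader(ClassLoader.getSystemClassLoader());
                InputStream in = new URL("http://localhost/ping").openConnection().getInputStream();
                in.close();
            } catch (IOException ignored) {
                // The request itself is allowed to fail; we only care about the side effect.
            } finally {
                Thread.currentThread().setContextClassLoader(original);
            }
        }
    }

The swap only matters for the very first keep-alive request, since the timer thread is created lazily and then reused for the lifetime of the JVM.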

But what is the actual cause of it? Most of the Stack Overflow threads and application server vendor bug reports only offered workarounds to the problem. But there has to be a real reason why this keeps happening. Some more googling revealed a possible suspect in the Oracle Java SE public bug database – issue 7008595.

Let’s look into the issue and see what we can conclude from it. First, for those of you who are not familiar with what a nice bug report looks like – take another look at it and learn. This is how you should file a report: with a minimal test case to reproduce the problem and just two steps to go through when concluding the test. But praising aside, it seems that this problem has been present in Java at least since 1.4 was released, and was patched in a 2011 Java 7 release. That translates to at least NINE years of buggy releases and thousands (maybe even millions) of affected applications.

But now on to the code packaged along with the sample test case. It is relatively simple. At a very general level it goes through the following steps (a sketch in code follows the list):

  • After start, the application creates a new classloader and sets this newly created classloader as the context classloader of the running thread. This is done to emulate a typical web application, where the classloader of the current thread is a special classloader and not inherited from the system one. So the author sets the context classloader to his own.
  • Next it loads a new class using the newly created classloader and invokes a static getConnection() method on this class.
  • The getConnection() method opens a URL, connects to it, reads the content and closes the stream. In the very same method the author does something completely weird as well, namely allocating 20MB to a byte array that is never used. He does it solely to highlight the leak later on, so I guess we do not have to point fingers and call him mad here. Let’s be grateful instead.
  • Now all the references are set to null and System.gc() is called within the code.
  • One should now expect that the ApplicationClass declaration is garbage collected, as it is no longer reachable from anywhere.
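Here is a rough reconstruction of those steps in code. The class names, the appclasses/ directory and the URL are my own stand-ins – the actual test case attached to issue 7008595 is the authoritative version – and the assumption is that ApplicationClass is compiled separately into appclasses/ so that it is only reachable through the custom classloader.

    // --- LeakDemo.java ---
    import java.io.File;
    import java.lang.reflect.Method;
    import java.net.URL;
    import java.net.URLClassLoader;

    public class LeakDemo {

        public static void main(String[] args) throws Exception {
            // 1. Emulate a web application: a dedicated classloader becomes the
            //    context classloader of the running thread.
            URLClassLoader appLoader =
                    new URLClassLoader(new URL[] { new File("appclasses/").toURI().toURL() }, null);
            Thread.currentThread().setContextClassLoader(appLoader);

            // 2. Load ApplicationClass through that loader and call its static getConnection().
            Class<?> appClass = appLoader.loadClass("ApplicationClass");
            Method getConnection = appClass.getMethod("getConnection");
            getConnection.invoke(null);

            // 3. Drop every reference we hold and ask for a garbage collection...
            getConnection = null;
            appClass = null;
            appLoader = null;
            Thread.currentThread().setContextClassLoader(null);
            System.gc();

            // 4. ...yet the Keep-Alive timer thread spawned by the HTTP call still holds
            //    the classloader via its contextClassLoader field. Keep the JVM alive so
            //    a heap dump can be taken.
            Thread.sleep(Long.MAX_VALUE);
        }
    }

    // --- ApplicationClass.java, compiled separately into appclasses/ ---
    import java.io.InputStream;
    import java.net.URL;

    public class ApplicationClass {

        // The deliberately "wasted" 20MB that makes the leak easy to spot in a heap dump.
        static byte[] payload;

        public static void getConnection() throws Exception {
            payload = new byte[20 * 1024 * 1024];
            InputStream in = new URL("http://example.com/").openStream();
            while (in.read() != -1) {
                // drain and discard the response
            }
            in.close();
        }
    }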

After walking you through the steps the test application takes, we are now ready to compile and run it. For this run I used the latest Java 6 available, build 37. After running the application, taking a heap dump and opening it in Eclipse MAT, we see the problem staring us right in the face:

[Screenshot: Eclipse MAT view of the JDK leak and its GC root]

As we can see, our ApplicationClass with its 20MB byte array is still alive. Why? Because it is held by our custom MyClassloader, which is used as the context classloader of the Keep-Alive timer thread.

And if you are now thinking that you will never mess with custom classloaders and so this whole talk is not relevant to you, then think again. The vast majority of Java developers work with custom classloaders every day – most often with the classloaders your application servers (like Tomcat, GlassFish or JBoss) use for creating and loading web applications. If your web application opens an HTTP connection somewhere, and as a result a Keep-Alive timer is spawned, I congratulate you: you have the exact memory leak described in this article.

So indeed, we have verified the assumption that “Java is broken”. And it has been broken ever since Java 1.4 was released, which was 12 years ago. Luckily the new patches to Java 7 no longer have this problem. But as different statistics show, the vast majority of the applications out there have not migrated to Java 7 as we speak. So more often than not, your application at hand has the very same problem waiting to surface.

In either case, the story definitely serves as a great case study of how hard it is to track down a memory leak. Or how difficult it used to be without Plumbr. It took just one customer with one report, and the culprit was staring us right in the face. But this is now turning into a commercial, and that is not what you guys are here for, so I am going to stop.

If you enjoyed the post, then stay tuned for more. We get new and interesting insights from the JVM on a daily basis nowadays. Unfortunately we do have to work on our product as well every once in a while, but I do promise interesting posts on a weekly basis!

 

Published at DZone with permission of Nikita Salnikov-Tarnovski, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)


Comments

Clemens Eisserer replied on Tue, 2012/12/18 - 7:30am

So, what is this article all about?
Complaining about a specific bug which has already been fixed for Java 7, to get Oracle to backport it to Java 6? Otherwise I can't understand the language chosen – I mean, is Linux or Windows broken just because it has unfixed bugs?

As usual, Oracle has to prioritize bugs.
In case a bug is really hurting you but not enough other users, these days it's fairly simple to get a fix in. I've patched a few issues myself, and besides a friendly and helpful community around Java, you'll find the whole process quite smooth.
Fixing stuff instead of downplaying others' work is what really makes a difference!

John J. Franey replied on Tue, 2012/12/18 - 9:21am

The customer did not have an issue with the leak. Does anyone? The production environment with such an app (one that makes HTTP client calls) in a long-running, hot-deployed app server? The developer doing repeated hot-deploys with such an app? Also note, Tomcat implemented a workaround almost 3 years ago, so I wonder if other app servers have resolved this in some way as well.

The customer had an issue with explaining the Plumbr report. He would otherwise not need to know anything about this leak. I think you identified two 'bugs'. One in the JDK and one in Plumbr?

On one hand, it's cool that memory detectors can find such things, but on the other hand, it would be better if customers didn't have to drill down on issues that are not consequential.

Jose Fernandez replied on Tue, 2012/12/18 - 11:46am in response to: Clemens Eisserer

 The author is "founder and Master Developer of Plumbr", so that's what this article is really about.

Wal Rus replied on Tue, 2012/12/18 - 11:57am

I wish he didn't think everyone around him is stupid... plumber.

Nikita Salnikov... replied on Tue, 2012/12/18 - 3:22pm in response to: Clemens Eisserer

This article started as a blog post. And like most blog posts, this article is about some experience the author deemed worth sharing with the community. I can outline at least the following points that can be taken from this article:

  1. It is good to file bug reports to Oracle, as those bugs finally get fixed.
  2. A demo of a really good bug report.
  3. It is worth upgrading to the latest JDK version available. And this point is worth stressing again and again, because many, many production environments still live with Java 6 or 5.
  4. Open-sourcing Java was a really great step, as old bugs are finally getting fixed.
So you are absolutely right: fixing bugs you have encountered really makes a difference.

Nikita Salnikov... replied on Tue, 2012/12/18 - 3:26pm in response to: John J. Franey

It is our belief that all bugs should be brought to daylight. Whether to fix all of them – that is a question of prioritising resources, yes. But expose them we must.

And I do not like the idea of every application server fixing one and the same bug in the JDK. That means tons of wasted effort.

Nikita Salnikov... replied on Tue, 2012/12/18 - 3:28pm in response to: Wal Rus

Can we please keep discussion constructive? Thank you.

Clemens Eisserer replied on Tue, 2012/12/18 - 4:15pm

> It is our belief that all bugs should be brought to daylight.
> Whether to fix all of them – that is a question of prioritising resources, yes.
> But expose them we must.

That's what Oracle's bug database is for ;)

John J. Franey replied on Tue, 2012/12/18 - 6:00pm in response to: Nikita Salnikov-tarnovski

I agree.

However, I think I did not do well in making my main point clear. An analyzer could be smart enough to prevent a customer from researching an inconsequential issue, to save time and energy.

The customer now knows about a leak he did not know about before, and what the cause of this leak is. So, what will he do? Probably he will run the app in production without a fix from Oracle, because everyone else has for the last 12 years, and like everyone else, he will likely not encounter memory exhaustion as a result. In short, he wasted his time fretting over Plumbr's report.

For Plumbr, this is nothing to brag about, and blaming Oracle for having this issue unresolved for 12 years only draws attention to a shortcoming of Plumbr.


Nikita Salnikov... replied on Wed, 2012/12/19 - 12:44am in response to: John J. Franey

Or the customer will upgrade to JDK 7, making the bug go away and making life for his future developers much easier because of up-to-date technology :)

And I really disagree with your suggestion that a problem-detection tool should decide by itself which problems to show to the customer and which to hide. It is completely up to the customer to configure exclusion lists.
