I am the senior IT manager at the Macau Productivity and Technology Transfer Center. I've written popular books on agile and web technologies and created an open source testing library for Wicket.

Architectural improvements for the JVM to be enterprise ready

06.04.2010

Time-proven architecture for long-running services

I've observed that once in a while our long-running Tomcat instance will get slower and slower until it (the JVM) is restarted. The Tomcat developers are obviously experts in enterprise Java, so why does this still happen, and how can it be fixed? One could look for specific problems in the code, but a much better approach is to adopt a time-proven architecture, one particularly well known from traditional Unix daemon processing:

  1. The master daemon process is started.
  2. The master daemon process spawns some child processes to handle client requests.
  3. After handling a limited number of requests, a child process terminates itself (or is terminated by the master). The key point here is that the OS will free any resources (e.g., memory, file handles, sockets) allocated to that child process; there is no way for it to leave anything behind after its death.

This architecture ensures a long-running service without any degradation, even if the code is poorly written and has resource leaks.
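On today's JVM the closest practical approximation I know of is to run the workers as separate child JVMs. Below is a minimal sketch, with hypothetical Master and Worker classes: the master relaunches the worker JVM whenever it exits, and the worker exits after serving a fixed quota of requests, so anything it leaked dies with the process.

    import java.io.IOException;

    // Master keeps exactly one worker JVM alive; when the worker exits after its
    // quota, the OS reclaims everything it held and a fresh worker is started.
    public class Master {
        public static void main(String[] args) throws IOException, InterruptedException {
            while (true) {
                Process worker = new ProcessBuilder(
                        "java", "-cp", System.getProperty("java.class.path"), "Worker")
                        .inheritIO()
                        .start();
                int exitCode = worker.waitFor();   // block until the child terminates
                System.out.println("Worker exited with " + exitCode + ", respawning");
            }
        }
    }

    // Worker serves a bounded number of requests and then exits, relying on the
    // operating system (not the garbage collector) to clean up after it.
    class Worker {
        private static final int MAX_REQUESTS = 1000;

        public static void main(String[] args) {
            for (int served = 0; served < MAX_REQUESTS; served++) {
                handleOneRequest();                // whatever the service actually does
            }
            // Leaving main() ends this JVM; memory, file handles and sockets held
            // by the process are released by the OS, leak or no leak.
        }

        private static void handleOneRequest() { /* placeholder for real work */ }
    }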

How can Java support this architecture?

So, does Java support this currently? The answer is no. To do so, it would need the following:

  1. The JVM needs to support the concept of an isolated process.
  2. Objects created by the JVM itself and by different processes must NOT be allowed to refer to one another; otherwise there would be no way to cleanly kill a process.
  3. The JVM must support efficient forking, so that a child process can be created quickly. This way, the master process can perform all kinds of lengthy initialization (e.g., Hibernate and Spring initialization), while it is still very quick to spawn a child in that initialized state (see the hypothetical sketch after this list).
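For reference, JSR 121 (the Application Isolation API, which also comes up in the comments below) proposed roughly this kind of facility. The sketch below is purely hypothetical: the Isolate type and its methods are made-up stand-ins for what the article asks for, not an API any mainstream JVM actually ships.

    // Hypothetical sketch only: no mainstream JVM provides this API; the Isolate
    // type and its methods are illustrative stand-ins.
    public class MasterSketch {
        public static void main(String[] args) throws Exception {
            initializeSpringAndHibernate();          // lengthy setup, done exactly once
            while (true) {
                // "Fork" a child that starts from the already-initialized state.
                Isolate child = Isolate.fork("com.example.RequestWorker");
                child.start();
                child.waitForExit();                 // child quits after serving its quota
                // Nothing the child leaked can survive: objects cannot cross the
                // isolate boundary, so the whole child is reclaimed on exit.
            }
        }

        private static void initializeSpringAndHibernate() { /* expensive one-off setup */ }
    }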
Published at DZone with permission of kent tong, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)


Comments

Artur Karazniewicz replied on Fri, 2010/06/04 - 10:21am

From my experience, these kinds of problems are 99% of the time caused by badly written application code, not the runtime (Tomcat in this case). If You're sure it's Tomcat - just file a JIRA issue and ask the Tomcat guys to fix this stuff, instead of bloating the JVM with something like an in-JVM "isolated process". First You would like to implement JVM-level "processes", then we should implement some IPC stuff, then maybe a little bit of IO... oh wait, it's an OS, isn't it? And it's not trivial. Do You remember green threads, and how badly they failed? So, if You need processes... just use the native OS stuff.

http://blog.restfusion.com
Artur Karazniewicz

Casper Bang replied on Fri, 2010/06/04 - 10:46am

Personally I consider the whole Java servlet container model broken anyway because of the dreaded and all too common PermGen space issue. Until this is fixed at the JVM GC level (remove the permanent generation altogether?), I hesitate to run more than one application per container instance. This appears to be the only sure way of keeping applications well isolated and preventing a redeployment of app #1 from bringing down app #n.

Arek Stryjski replied on Fri, 2010/06/04 - 11:07am in response to: Artur Karazniewicz

these kinds of problems are 99% of the time caused by badly written application code

The article says clearly that it is a solution for exactly this kind of problem:

if the code is poorly written and has resource leaks

I'm interested in how this would compare with lightweight threads from languages like Erlang. Maybe we don't need to change the JVM at all. I'm now interested in projects like Erjang (http://wiki.github.com/krestenkrab/erjang/) and whether this could make us write better applications (even if we are not perfect ourselves).

Osvaldo Doederlein replied on Fri, 2010/06/04 - 12:52pm

The existence of innumerable enterprise-grade server apps written on this platform (including full-blown Java EE containers), running 24x7 without needing a restart, somewhat flies in the face of your comments, doesn't it?

This model of spawning per-request processes is stupid. Thank god Java is not Ruby and I don't need to do that for any reason.

PermGen is an issue, but it's an implementation detail of HotSpot that can be fixed (and Oracle has said they're going to fix it). Meanwhile it's usually easy to work around.

Isolates would obviously be a good addition to the Java arsenal, for different reasons - most notably shared hosting, or (in client apps) a shared VM for multiple desktop apps as a way to reduce per-process memory and loading-time overheads. But any server-side app that depends on such a feature just to keep running is a broken application.

Gilbert Herschberger replied on Fri, 2010/06/04 - 3:49pm

Take another look at "configuring Tomcat for high availability". We can already launch as many instances of Tomcat as we want, start and stop them as we want, and deploy applications as we want. The procedure is well documented.

Software is imperfect. It fails. Only a few applications work continuously without failure for a long period of time. I agree that it can be frustrating to deal with an application that works most of the time. Remember, "most of the time" is far from "all of the time".

What do we expect when an application does not work that one time out of a thousand? We expect our operating system to step in and protect us. For a web application, Tomcat is the operating system. We expect it to automatically restart a web application when it misbehaves, without restarting all web applications. With high availability, we restart an instance of Tomcat when its performance degrades. But it should not be necessary, should it?

Take another look at the Isolates API. It has been documented and specified, but remains largely unimplemented. The specification provides the justification for separate instances of the JVM within a single native process. Some argue that, since they do not need it, no one does. Some argue that, because they cannot see how it might help them, it must be a waste of time. For those of us who need it, we need it.

Do you remember that Java itself was a questionable idea for a while? We had ways of dealing with multiple platforms before Java, you know; it did not require us to invent a new language and build a virtual machine. When we needed a daemon to run 24/7, we built a watchdog for it. A watchdog kills a daemon if it hangs (or its performance degrades) and starts the daemon again if it stops.
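A minimal watchdog sketch in that spirit (the ping URL and restart script are made up for the example): probe the daemon periodically and relaunch it whenever the probe fails or hangs.

    import java.net.HttpURLConnection;
    import java.net.URL;

    // Probe a daemon every 30 seconds; restart it when the probe fails or times out.
    public class Watchdog {
        public static void main(String[] args) throws Exception {
            while (true) {
                if (!isHealthy("http://localhost:8080/ping")) {     // hypothetical ping URL
                    new ProcessBuilder("/opt/daemon/restart.sh").start().waitFor();
                }
                Thread.sleep(30000);
            }
        }

        private static boolean isHealthy(String url) {
            try {
                HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection();
                c.setConnectTimeout(5000);
                c.setReadTimeout(5000);
                return c.getResponseCode() == 200;
            } catch (Exception e) {
                return false;    // unreachable or hung counts as unhealthy
            }
        }
    }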

You cannot guarantee that every programmer is going to write good code. Let's say that you are building an Internet site capable of hosting 10,000 Java EE projects, something like sourceforge.net. Each project deploys its web application on Tomcat. How do you guarantee that one project doesn't crash the entire site? With cloud computing, you might run 10,000 separate virtual machines. Each virtual machine would have an operating system and just enough resources to support one instance of Tomcat.

On the other hand, the Isolates API eliminates the need for so many virtual machines. High availability becomes internal. You would have fewer operating systems, fewer instances of Tomcat, and more isolates.

So, your mission-critical web application doesn't work in Tomcat. I think that this is just the tip of the iceberg.

Alessandro Santini replied on Fri, 2010/06/04 - 5:32pm in response to: Osvaldo Doederlein

I agree with Osvaldo - the large number of enterprise-grade Java/J2EE applications running for lengthy periods of time makes me think that this problem is not so widespread.

I would replace the word "stupid" with "inefficient", but I am otherwise in his boat.

As to the PermGen space issue, I leave it to the Tomcat + Hibernate freaks (the majority of those affected by this problem, AFAIK). I steer clear of both technologies and have never witnessed such a problem, besides being orders of magnitude more productive than those using Hibercrap.

James Imber replied on Sat, 2010/06/05 - 9:12am

It might also be a memory fragmentation issue within the JVM. It could happen with the current model of garbage collection. It might be solved by G1, but I'm not sure.
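For anyone who wants to try it, G1 could be enabled on HotSpot builds of that vintage with something like the line below (it was still marked experimental on Java 6, so the exact flags may vary by JDK build):

    java -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -jar myapp.jar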

Arnout Engelen replied on Sat, 2010/06/05 - 2:25pm in response to: Casper Bang

I consider the whole Java servlet container model broken anyway because of the dreaded and all too common PermGen space issue

What issue would that be? I don't think I've seen PermGen problems that couldn't simply be fixed by, well, increasing the PermGen space :). Even for large applications, the PermGen space requirements aren't that big.
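(On HotSpot that typically means a startup flag along these lines; the 256m value is only an example and depends on the application.)

    java -XX:MaxPermSize=256m -jar myapp.jar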

Smeltet Kerne replied on Sun, 2010/06/06 - 3:52am

Java EE servers don't degrade over time, but the applications they host can. I have worked with most of the popular Java EE servers for years (and many use Tomcat internally for the web tier), and I have never had the need to restart any of them.

If you are experiencing a slow degradation of performance and eventually a complete standstill, then you most likely have a resource leak, which is an application error, not a server error. For instance, if your app uses a database connection pool and under some rare circumstances doesn't release a connection back to the pool after use, then you can get that kind of experience. But this is difficult to trigger nowadays, as many pool implementations will detect the programmer's abuse and repair themselves at runtime.
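Apache Commons DBCP, for example, can reclaim connections that a caller forgot to close. A rough configuration sketch, using DBCP 1.x property names (the JDBC URL is made up):

    import org.apache.commons.dbcp.BasicDataSource;

    // A DBCP 1.x pool configured to reclaim connections abandoned by buggy code.
    public class PoolConfig {
        public static BasicDataSource createPool() {
            BasicDataSource ds = new BasicDataSource();
            ds.setUrl("jdbc:mysql://localhost/mydb");   // hypothetical database
            ds.setRemoveAbandoned(true);                // reclaim leaked connections
            ds.setRemoveAbandonedTimeout(60);           // after 60 seconds of inactivity
            ds.setLogAbandoned(true);                   // log the offending stack trace
            return ds;
        }
    }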

It seems you are describing CGI/multiprocessing as a better architecture than a multithreaded server. CGI was dropped by most web servers almost a decade ago because it is slow - languages that don't have solid exception handling or garbage collection may still need it, but Java doesn't. Dropping processes after a number of uses would be a very unusual pattern for a Java application. If you handle resource management yourself rather than have a framework do it, the basic construct try {} finally {} makes it easy to guarantee that the resources you allocate are released after use, no matter what happens in the try block.
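For example, a minimal illustration of that construct:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    // The stream is closed even if read() throws, so the file handle cannot leak.
    public class TryFinallyExample {
        public static int readFirstByte(String path) throws IOException {
            InputStream in = new FileInputStream(path);
            try {
                return in.read();
            } finally {
                in.close();     // always runs, no matter what happened in try
            }
        }
    }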

 

Kapil Viren Ahuja replied on Sun, 2010/06/06 - 10:19pm

This architecture ensures a long-running service without any degradation, even if the code is poorly written and has resource leaks.

 

While the question is interesting, I do not agree with the very concept that we should fix/upgrade the JVM to handle bad code. It would be like an "Auto-Pilot" for applications - which, to me, does not make sense at this point.

Victor Tsoukanov replied on Mon, 2010/06/07 - 12:24am

This architecture ensures a long-running service without any degradation, even if the code is poorly written and has resource leaks.
It is interesting how much it will cost. I mean, Java threads are more lightweight than separate processes, so I think correct code with lightweight Java threads will always be faster than code that "is poorly written and has resource leaks".

Aaron Kocourek replied on Sun, 2011/05/01 - 2:18am

"While the question is interesting, I do not agree to the very concept that we should fix/upgrade the JVM to handle bad code. It would be like an "Auto-Pilot" for applications - which to me at this point does not make sense."

@Kapil Viren Ahuja: I would have to disagree. If they were to upgrade the JVM to handle crap code, it would help a lot of developers and companies out and make their jobs easier. I think it would be an intelligent move.

