Pierre-Hugues Charbonneau (nickname P-H) is working for CGI Inc. Canada for the last 10 years as a senior IT consultant and system architect. His primary area of expertise is Java EE, middleware & JVM technologies. He is a specialist in production system troubleshooting, middleware & JVM tuning and scalability / capacity improvement; including processes improvement for Java EE support teams. Pierre - Hugues is a DZone MVB and is not an employee of DZone and has posted 43 posts at DZone. You can read more from them at their website. View Full User Profile

Java Thread Deadlock: A Case Study

05.05.2012
| 8211 views |
  • submit to reddit

This article will describe the complete root cause analysis of a recent Java deadlock problem observed from a Weblogic 11g production system running on the IBM JVM 1.6.This case study will also demonstrate the importance of mastering Thread Dump analysis skills; including for the IBM JVM Thread Dump format.

Environment specification

 

  • Java EE server: Oracle Weblogic Server 11g & Spring 2.0
  • OS: AIX 5.3
  • Java VM: IBM JRE 1.6.0
  • Platform type: Portal & ordering application

Monitoring and troubleshooting tools


  • JVM Thread Dump (IBM JVM format)
  • Compuware Server Vantage (Weblogic JMX monitoring & alerting)

Problem overview
A major stuck Threads problem was observed & reported from Compuware Server Vantage and affecting 2 of our Weblogic 11g production managed servers causing application impact and timeout conditions from our end users.
Gathering and validation of facts

As usual, a Java EE problem investigation requires gathering of technical and non-technical facts so we can either derived other facts and/or conclude on the root cause. Before applying a corrective measure, the facts below were verified in order to conclude on the root cause:


·        What is the client impact? MEDIUM (only 2 managed servers / JVM affected out of 16) ·        Recent change of the affected platform? Yes (new JMS related asynchronous component) ·        Any recent traffic increase to the affected platform? No ·        How does this problem manifest itself?  A sudden increase of Threads was observed leading to rapid Thread depletion ·        Did a Weblogic managed server restart resolve the problem? Yes, but problem is returning after few hours (unpredictable & intermittent pattern)
-        Conclusion #1: The problem is related to an intermittent stuck Threads behaviour affecting only a few Weblogic managed servers at the time -        Conclusion #2: Since problem is intermittent, a global root cause such as a non-responsive downstream system is not likely

Thread Dump analysis – first pass


The first thing to do when dealing with stuck Thread problems is to generate a JVM Thread Dump. This is a golden rule regardless of your environment specifications & problem context. A JVM Thread Dump snapshot provides you with crucial information about the active Threads and what type of processing / tasks they are performing at that time.


Now back to our case study, an IBM JVM Thread Dump (javacore.xyz format) was generated which did reveal the following Java Thread deadlock condition below:

1LKDEADLOCK    Deadlock detected !!!
NULL           ---------------------
NULL          
2LKDEADLOCKTHR  Thread "[STUCK] ExecuteThread: '8' for queue: 'weblogic.kernel.Default (self-tuning)'" (0x000000012CC08B00)
3LKDEADLOCKWTR    is waiting for:
4LKDEADLOCKMON      sys_mon_t:0x0000000126171DF8 infl_mon_t: 0x0000000126171E38:
4LKDEADLOCKOBJ      weblogic/jms/frontend/FESession@0x07000000198048C0/0x07000000198048D8:
3LKDEADLOCKOWN    which is owned by:
2LKDEADLOCKTHR  Thread "[STUCK] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)'" (0x000000012E560500)
3LKDEADLOCKWTR    which is waiting for:
4LKDEADLOCKMON      sys_mon_t:0x000000012884CD60 infl_mon_t: 0x000000012884CDA0:
4LKDEADLOCKOBJ      weblogic/jms/frontend/FEConnection@0x0700000019822F08/0x0700000019822F20:
3LKDEADLOCKOWN    which is owned by:
2LKDEADLOCKTHR  Thread "[STUCK] ExecuteThread: '8' for queue: 'weblogic.kernel.Default (self-tuning)'" (0x000000012CC08B00)

This deadlock situation can be translated as per below:


-        Weblogic Thread #8 is waiting to acquire an Object monitor lock owned by Weblogic Thread #10 -        Weblogic Thread #10 is waiting to acquire an Object monitor lock owned by Weblogic Thread #8
Conclusion: both Weblogic Threads #8 & #10 are waiting on each other; forever!
Now before going any deeper in this root cause analysis, let me provide you a high level overview on Java Thread deadlocks.

Java Thread deadlock overview


Most of you are probably familiar with Java Thread deadlock principles but did you really experience a true deadlock problem?

From my experience, true Java deadlocks are rare and I have only seen ~5 occurrences over the last 10 years. The reason is that most stuck Threads related problems are due to Thread hanging conditions (waiting on remote IO call etc.) but not involved in a true deadlock condition with other Thread(s).

A Java Thread deadlock is a situation for example where Thread A is waiting to acquire an Object monitor lock held by Thread B which is itself waiting to acquire an Object monitor lock held by Thread A. Both these Threads will wait for each other forever. This situation can be visualized as per below diagram:



Thread deadlock is confirmed…now what can you do?


Once the deadlock is confirmed (most JVM Thread Dump implementations will highlight it for you), the next step is to perform a deeper dive analysis by reviewing each Thread involved in the deadlock situation along with their current task & wait condition.Find below the partial Thread Stack Trace from our problem case for each Thread involved in the deadlock condition:


** Please note that the real application Java package name was renamed for confidentiality purposes **
Weblogic Thread #8
"[STUCK] ExecuteThread: '8' for queue: 'weblogic.kernel.Default (self-tuning)'" J9VMThread:0x000000012CC08B00, j9thread_t:0x00000001299E5100, java/lang/Thread:0x070000001D72EE00, state:B, prio=1
(native thread ID:0x111200F, native priority:0x1, native policy:UNKNOWN)
Java callstack:
       at weblogic/jms/frontend/FEConnection.stop(FEConnection.java:671(Compiled Code))
       at weblogic/jms/frontend/FEConnection.invoke(FEConnection.java:1685(Compiled Code))
       at weblogic/messaging/dispatcher/Request.wrappedFiniteStateMachine(Request.java:961(Compiled Code))
       at weblogic/messaging/dispatcher/DispatcherImpl.syncRequest(DispatcherImpl.java:184(Compiled Code))
       at weblogic/messaging/dispatcher/DispatcherImpl.dispatchSync(DispatcherImpl.java:212(Compiled Code))
       at weblogic/jms/dispatcher/DispatcherAdapter.dispatchSync(DispatcherAdapter.java:43(Compiled Code))
       at weblogic/jms/client/JMSConnection.stop(JMSConnection.java:863(Compiled Code))
       at weblogic/jms/client/WLConnectionImpl.stop(WLConnectionImpl.java:843)
       at org/springframework/jms/connection/SingleConnectionFactory.closeConnection(SingleConnectionFactory.java:342)
       at org/springframework/jms/connection/SingleConnectionFactory.resetConnection(SingleConnectionFactory.java:296)
       at org/app/JMSReceiver.receive()
……………………………………………………………………
Weblogic Thread #10
"[STUCK] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)'" J9VMThread:0x000000012E560500, j9thread_t:0x000000012E35BCE0, java/lang/Thread:0x070000001ECA9200, state:B, prio=1
 (native thread ID:0x4FA027, native priority:0x1, native policy:UNKNOWN)
Java callstack:
       at weblogic/jms/frontend/FEConnection.getPeerVersion(FEConnection.java:1381(Compiled Code))
       at weblogic/jms/frontend/FESession.setUpBackEndSession(FESession.java:755(Compiled Code))
       at weblogic/jms/frontend/FESession.consumerCreate(FESession.java:1025(Compiled Code))
       at weblogic/jms/frontend/FESession.invoke(FESession.java:2995(Compiled Code))
       at weblogic/messaging/dispatcher/Request.wrappedFiniteStateMachine(Request.java:961(Compiled Code))
       at weblogic/messaging/dispatcher/DispatcherImpl.syncRequest(DispatcherImpl.java:184(Compiled Code))
       at weblogic/messaging/dispatcher/DispatcherImpl.dispatchSync(DispatcherImpl.java:212(Compiled Code))
       at weblogic/jms/dispatcher/DispatcherAdapter.dispatchSync(DispatcherAdapter.java:43(Compiled Code))
       at weblogic/jms/client/JMSSession.consumerCreate(JMSSession.java:2982(Compiled Code))
       at weblogic/jms/client/JMSSession.setupConsumer(JMSSession.java:2749(Compiled Code))
       at weblogic/jms/client/JMSSession.createConsumer(JMSSession.java:2691(Compiled Code))
       at weblogic/jms/client/JMSSession.createReceiver(JMSSession.java:2596(Compiled Code))
       at weblogic/jms/client/WLSessionImpl.createReceiver(WLSessionImpl.java:991(Compiled Code))
       at org/springframework/jms/core/JmsTemplate102.createConsumer(JmsTemplate102.java:204(Compiled Code))
       at org/springframework/jms/core/JmsTemplate.doReceive(JmsTemplate.java:676(Compiled Code))
       at org/springframework/jms/core/JmsTemplate$10.doInJms(JmsTemplate.java:652(Compiled Code))
       at org/springframework/jms/core/JmsTemplate.execute(JmsTemplate.java:412(Compiled Code))
       at org/springframework/jms/core/JmsTemplate.receiveSelected(JmsTemplate.java:650(Compiled Code))
       at org/springframework/jms/core/JmsTemplate.receiveSelected(JmsTemplate.java:641(Compiled Code))
       at org/app/JMSReceiver.receive()
……………………………………………………………
As you can see in the above Thread Strack Traces, such deadlock did originate from our application code which is using the Spring framework API for the JMS consumer implementation (very useful when not using MDB’s). The Stack Traces are quite interesting and revealing that both Threads are in a race condition against the same Weblogic JMS consumer session / connection and leading to a deadlock situation:
-        Weblogic Thread #8 is attempting to reset and close the current JMS connection -        Weblogic Thread #10 is attempting to use the same JMS Connection / Session in order to create a new JMS consumer -        Thread deadlock is triggered!
Root cause: non Thread safe Spring JMS SingleConnectionFactory implementation
A code review and a quick research from Spring JIRA bug database did reveal the following Thread safe defect below with a perfect correlation with the above analysis:
# SingleConnectionFactory's resetConnection is causing deadlocks with underlying OracleAQ's JMS connection https://jira.springsource.org/browse/SPR-5987
A patch for Spring SingleConnectionFactory was released back in 2009 which did involve adding proper synchronized{} block in order to prevent Thread deadlock in the event of a JMS Connection reset operation:
synchronized (connectionMonitor) {
  //if condition added to avoid possible deadlocks when trying to reset the target connection
  if (!started) {
    this.target.start();
    started = true;
  }
}

Solution

Our team is currently planning to integrate this Spring patch in to our production environment shortly. The initial tests performed in our test environment are positive.


Conclusion

I hope this case study has helped understand a real-life Java Thread deadlock problem and how proper Thread Dump analysis skills can allow you to quickly pinpoint the root cause of stuck Thread related problems at the code level. Please don’t hesitate to post any comment or question.

  
Published at DZone with permission of Pierre - Hugues Charbonneau, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Tags:

Comments

Alessandro De S... replied on Sun, 2014/01/05 - 3:27pm

For thread deadlocks analysis and monitoring you can take a look at SpyGlass Thread Lock Monitor .

This is a free VisualVM plugin.

You can monitor in realtime applicative locks (locked threads, locking threads and lock conditions) and deadlocked threads to understand application behavior and error conditions.

It can be used on remote systems.

Hope you'll find useful.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.