SQL Zone is brought to you in partnership with:

Pierre-Hugues Charbonneau (nickname P-H) is working for CGI Inc. Canada for the last 10 years as a senior IT consultant and system architect. His primary area of expertise is Java EE, middleware & JVM technologies. He is a specialist in production system troubleshooting, middleware & JVM tuning and scalability / capacity improvement; including processes improvement for Java EE support teams. Pierre - Hugues is a DZone MVB and is not an employee of DZone and has posted 42 posts at DZone. You can read more from them at their website. View Full User Profile

Oracle Weblogic Stuck Thread Detection

10.09.2013
| 8733 views |
  • submit to reddit
The following question will again test your knowledge of the Oracle Weblogic threading model. I’m looking forward for your comments and experience on the same.

If you are a Weblogic administrator, I’m certain that you heard of this common problem: stuck threads. This is one of the most common problems you will face when supporting a Weblogic production environment.


A Weblogic stuck thread simply means a thread performing the same request for a very long time and more than the configurable Stuck Thread Max Time.



Question:

How can you detect the presence of STUCK threads during and following a production incident?
Answer:

As we saw from our last article “Weblogic Thread Monitoring Tips”, Weblogic provides functionalities allowing us to closely monitor its internal self-tuning thread pool. It will also highlight you the presence of any stuck thread.


This monitoring view is very useful when you do a live analysis but what about after a production incident? The good news is that Oracle Weblogic will also log any detected stuck thread to the server log. Such information includes details on the request and more importantly, the thread stack trace. This data is crucial and will allow you to potentially better understand the root cause of any slowdown condition that occurred at a certain time.
<Sep 25, 2013 7:23:02 AM EST> <Error> <WebLogicServer> <Server1> <App1>
< ExecuteThread: '11' for queue: 'weblogic.kernel.Default (self-tuning)'>
<BEA-000337> <[STUCK] ExecuteThread: '35' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "608" seconds working on the request
"Workmanager: default, Version: 0, Scheduled=true, Started=true, Started time: 608213 ms
POST /App1/jsp/test.jsp HTTP/1.1
Accept: application/x-ms-application...
Referer: http://..
Accept-Language: en-US
User-Agent: Mozilla/4.0 ..
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip, deflate
Content-Length: 539
Connection: Keep-Alive
Cache-Control: no-cache
Cookie: JSESSIONID=
]", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
  <Application Execution Stack Trace>
  ...................................
  javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
  javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
  weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
  weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
  weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:301)
  weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:184)
  weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction....
  weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run()
  weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
  weblogic.security.service.SecurityManager.runAs(SecurityManager.java:120)
  weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2281)
  weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2180)
  weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1491)
  weblogic.work.ExecuteThread.execute(ExecuteThread.java:256)
  weblogic.work.ExecuteThread.run(ExecuteThread.java:221)

Here is one more tip: the generation and analysis of a JVM thread dump will also highlight you stuck threads. As we can see from the snapshot below, the Weblogic thread state is now updated to STUCK, which means that this particular request is being executed since at least 600 seconds or 10 minutes. 



This is very useful information since the native thread state will typically remain to RUNNABLE. The native thread state will only get updated when dealing with BLOCKED threads etc. You have to keep in mind that RUNNABLE simply means that this thread is healthy from a JVM perspective. However, it does not mean that it truly is from a middleware or Java EE container perspective. This is why Oracle Weblogic has its own internal ExecuteThread state.

Finally, if your organization or client is using any commercial monitoring tool, I recommend that you enable some alerting around both hogging thread and stuck thread. This will allow your support team to take some pro-active actions before the affected Weblogic managed server(s) become fully unresponsive.
Published at DZone with permission of Pierre - Hugues Charbonneau, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)