You can tune a server's stuck thread detection behavior by changing the length of time before a thread is diagnosed as stuck (Stuck Thread Max Time), and by changing the frequency with which the server checks for stuck threads (Stuck Thread Timer Interval). Check here to see how to change the Stuck Thread Max Time.
The problem or Why are Stuck Threads evil?
WebLogic Server automatically detects when a thread in an execute queue becomes "stuck." Because a stuck thread cannot complete its current work or accept new work, the server logs a message each time it diagnoses a stuck thread. If all threads in an execute queue become stuck, the server changes its health state to either "warning" or "critical" depending on the execute queue:
- If all threads in the default queue become stuck, the server changes its health state to "critical." (You can set up the Node Manager application to automatically shut down and restart servers in the critical health state. For more information, see "Node Manager Capabilities" in Configuring and Managing WebLogic Server.)
- If all threads in weblogic.admin.HTTP, weblogic.admin.RMI, or a
user-defined execute queue become stuck, the server changes its health
state to "warning."
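The two rules above can be written down as a small decision function. This is only a sketch of the documented behavior, not WebLogic code: the function name `health_state`, the label `"default"` for the default queue, and the `"ok"` return value are assumptions made for illustration.

```python
def health_state(queue_name, stuck_flags, default_queue="default"):
    """Return the health state implied by one execute queue.

    stuck_flags holds one boolean per thread in the queue
    (True means the thread has been diagnosed as stuck).
    """
    # The health state only changes when every thread in the queue is stuck.
    if not stuck_flags or not all(stuck_flags):
        return "ok"
    # Default queue fully stuck -> "critical"; any other queue -> "warning".
    return "critical" if queue_name == default_queue else "warning"


print(health_state("default", [True, True, True]))
print(health_state("weblogic.admin.HTTP", [True, True]))
print(health_state("default", [True, False, True]))
```

Note that a single free thread in the queue is enough to keep the state unchanged, which is why a server can log many stuck thread messages and still report itself as healthy.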
What can you do to keep your application from failing completely?
WebLogic Server checks for stuck threads periodically (this is the Stuck Thread Timer Interval and you can adjust it here). If all application threads are stuck, a server instance marks itself failed and, if configured to do so, exits. You can configure Node Manager or a third-party high-availability solution to restart the server instance for automatic failure recovery.
You can configure these actions to occur when not all threads are stuck, but the number of stuck threads has exceeded a configured threshold:
- Shut down the Work Manager if it has stuck threads. A Work Manager that is shut down will refuse new work and reject existing work in the queue by sending a rejection message. In a cluster, clustered clients will fail over to another cluster member.
- Shut down the application if there are stuck threads in the application. The application is shut down by bringing it into admin mode. All Work Managers belonging to the application are shut down, and behave as described above.
- Mark the server instance as failed and shut it down if there are stuck threads in the server. In a cluster, clustered clients that are connected or attempting to connect will fail over to another cluster member.
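As a rough illustration of how the two conditions differ (all threads stuck versus a configured number of stuck threads), here is a simplified model. It is not how WebLogic implements the check. The names `find_stuck`, `choose_action`, `max_stuck_time` and `stuck_thread_count` are hypothetical, and treating the threshold as "at least this many" is an assumption.

```python
def find_stuck(busy_seconds, max_stuck_time):
    """Names of threads that have been busy for at least max_stuck_time seconds."""
    return sorted(name for name, busy in busy_seconds.items() if busy >= max_stuck_time)


def choose_action(busy_seconds, max_stuck_time, stuck_thread_count):
    """Decide what one periodic check would report.

    busy_seconds maps a thread name to how long it has worked on its
    current request, in seconds.
    """
    stuck = find_stuck(busy_seconds, max_stuck_time)
    if busy_seconds and len(stuck) == len(busy_seconds):
        return "all-stuck"          # the server would mark itself failed
    if len(stuck) >= stuck_thread_count:
        return "threshold-reached"  # the configured shutdown action would apply
    return "none"


# Two of three threads exceed a 600 second limit; the threshold is 2.
print(choose_action({"t1": 700, "t2": 10, "t3": 650}, 600, 2))
```

A lower threshold reacts earlier but also shuts things down for a handful of slow requests, so it is worth choosing it together with the Stuck Thread Max Time.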
How to identify the problem?
The recommended way is to check the thread dumps. See the Middleware Magic post "Sending Email Alert For Stuck Threads With Thread Dumps" to have thread dumps mailed to you automatically when stuck threads occur.
Tools to help you with analyzing the Thread Dumps can be:
How to work around the problem?
After you have identified the code that causes the Stuck Thread, that is, the code whose execution takes longer than the Stuck Thread Max Time, you can use a Work Manager to execute it. Work Managers have an Ignore Stuck Threads option that makes it possible to run long-running jobs. See below:
Below are some posts on how to create a Work Manager
Test: How to create a Stuck Thread?
How do you create a Stuck Thread in order to test your WebLogic settings? Put a breakpoint in a backing bean or model method that is called by your request. If you wait at the breakpoint for longer than the Stuck Thread Max Time, you will notice a Stuck Thread trace in the server's log:
<16 =Ύί 2011 12:28:22 ΉΉ EET> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "134" seconds working on the request "weblogic.servlet.internal.ServletRequestImpl@6e6f4718[
GET /---/---/----/---/days.xhtml HTTP/1.1
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.120 Safari/535.2
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cookie: JSESSIONID=DYG5TDTZSnKLTFw5CMMdLCD9sPsZS4Jqlmxj9wdGNyt1BnPcfNrR!-1520792836
]", which is more than the configured time (StuckThreadMaxTime) of "60" seconds. Stack trace:
--------------------------------------------(--------------------.java:83)
javax.faces.component.UIComponentBase.encodeBegin(UIComponentBase.java:823)
com.sun.faces.renderkit.html_basic.HtmlBasicRenderer.encodeRecursive(HtmlBasicRenderer.java:285)
com.sun.faces.renderkit.html_basic.GridRenderer.renderRow(GridRenderer.java:185)
com.sun.faces.renderkit.html_basic.GridRenderer.encodeChildren(GridRenderer.java:129)
javax.faces.component.UIComponentBase.encodeChildren(UIComponentBase.java:848)
org.primefaces.renderkit.CoreRenderer.renderChild(CoreRenderer.java:55)
org.primefaces.renderkit.CoreRenderer.renderChildren(CoreRenderer.java:43)
org.primefaces.component.fieldset.FieldsetRenderer.encodeContent(FieldsetRenderer.java:95)
org.primefaces.component.fieldset.FieldsetRenderer.encodeMarkup(FieldsetRenderer.java:76)
org.primefaces.component.fieldset.FieldsetRenderer.encodeEnd(FieldsetRenderer.java:53)
javax.faces.component.UIComponentBase.encodeEnd(UIComponentBase.java:878)
javax.faces.component.UIComponent.encodeAll(UIComponent.java:1620)
javax.faces.render.Renderer.encodeChildren(Renderer.java:168)
javax.faces.component.UIComponentBase.encodeChildren(UIComponentBase.java:848)
org.primefaces.renderkit.CoreRenderer.renderChild(CoreRenderer.java:55)
org.primefaces.renderkit.CoreRenderer.renderChildren(CoreRenderer.java:43)
org.primefaces.component.panel.PanelRenderer.encodeContent(PanelRenderer.java:229)
org.primefaces.component.panel.PanelRenderer.encodeMarkup(PanelRenderer.java:152)
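If you want to pick such messages out of a server log automatically, a small script can extract the interesting numbers. The sketch below assumes the BEA-000337 message keeps the wording shown above. The function name `parse_stuck_message` and the shortened sample text are made up for illustration.

```python
import re

# Matches the parts of a BEA-000337 message that identify the thread,
# its queue, how long it has been busy and the configured limit.
STUCK_RE = re.compile(
    r"\[STUCK\] ExecuteThread: '(?P<thread>\d+)' for queue: '(?P<queue>[^']+)' "
    r'has been busy for "(?P<busy>\d+)" seconds'
    r'.*?\(StuckThreadMaxTime\) of "(?P<limit>\d+)" seconds',
    re.DOTALL,
)


def parse_stuck_message(text):
    """Return a dict describing the stuck thread, or None if nothing matches."""
    match = STUCK_RE.search(text)
    if match is None:
        return None
    return {
        "thread": match.group("thread"),
        "queue": match.group("queue"),
        "busy_seconds": int(match.group("busy")),
        "max_seconds": int(match.group("limit")),
    }


# Shortened version of the message above; the request details are left out.
sample = (
    "<Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: '2' for queue: "
    "'weblogic.kernel.Default (self-tuning)' has been busy for \"134\" seconds "
    "working on the request \"...\", which is more than the configured time "
    "(StuckThreadMaxTime) of \"60\" seconds."
)
print(parse_stuck_message(sample))
```

The stack trace that follows the message is what points to the offending code, so in practice you would keep the lines after the match as well.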
- Excellent post by Frank Munz: WebLogic Stuck Threads: Creating, Understanding and Dealing with them. Updated for WebLogic 12c. It also includes a sample app for creating a Stuck Thread.
- Excellent post by Maxence Button: http://m-button.blogspot.com/2008/07/using-wlst-to-perform-regular.html