Pierre-Hugues Charbonneau (nickname P-H) has worked for CGI Inc. Canada for the last 10 years as a senior IT consultant and system architect. His primary area of expertise is Java EE, middleware and JVM technologies. He is a specialist in production system troubleshooting, middleware and JVM tuning, and scalability / capacity improvement, including process improvement for Java EE support teams. Pierre-Hugues is a DZone MVB, is not an employee of DZone, and has posted 43 posts at DZone.

Top 10 Causes of Java EE Enterprise Performance Problems

06.19.2012

#6 - Application specific performance problems

To recap, so far we have seen the importance of proper capacity planning, load and performance testing, middleware environment specifications, JVM health, external systems integration, and the relational database environment. But what about the Java EE application itself? After all, your IT environment could have the fastest hardware on the market with hundreds of CPU cores, large amounts of RAM, and dozens of 64-bit JVM processes, but performance can still be terrible if the application implementation is deficient. This section will focus on the most severe Java EE application problems I have been exposed to in various Java EE environments.

My primary recommendation is to ensure that code reviews are part of your regular development cycle, along with your release management process. This will allow you to pinpoint major implementation problems, such as the ones described below, before the major testing and implementation phases.

 

Thread safe code problems

Proper care is required when using Java synchronization and non-final static variables / objects. In a Java EE environment, any static variable or object must be Thread safe to ensure data integrity and predictable results. Declaring a class member variable as static by mistake can lead to unpredictable results under load, since static variables and objects are shared between all Java EE container Threads (e.g., Thread B can overwrite a static value that Thread A is still using, causing unexpected and wrong behavior). State that belongs to a single request or object should be defined as non-static so that it stays within the current class instance context. Note that an instance which is itself shared between Threads (such as a Servlet) still needs Thread safe member variables.

Java synchronization is also quite important when dealing with a non-Thread safe data structure such as a java.util.HashMap. Failure to synchronize access can trigger HashMap corruption and infinite looping. Be careful when dealing with Java synchronization, since excessive usage can also lead to stuck Threads and poor performance.
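
As a minimal sketch of the point above, the class below keeps shared static state in a java.util.concurrent.ConcurrentHashMap instead of a plain HashMap. The class name, method names, and the "/home" key are hypothetical and only serve the illustration; this is one possible approach, not the only way to make shared state Thread safe.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical example: a hit counter shared by all container Threads.
final class HitCounterSketch {

    // Static, therefore shared by every Thread: it must be a Thread safe structure.
    private static final Map<String, Integer> HITS = new ConcurrentHashMap<>();

    static void recordHit(String page) {
        // merge() performs the read-modify-write atomically for the given key.
        HITS.merge(page, 1, Integer::sum);
    }

    static int hitsFor(String page) {
        return HITS.getOrDefault(page, 0);
    }

    // Simulates concurrent load and returns the final count for the page.
    static int simulateLoad(String page, int threads, int hitsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < hitsPerThread; i++) {
                    recordHit(page);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("load simulation timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return hitsFor(page);
    }
}
```

Calling HitCounterSketch.simulateLoad("/home", 8, 1000) should count every hit. With a plain HashMap and an unsynchronized get-then-put, updates could be lost or the map corrupted under the same load.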

 

Lack of communication API timeouts

It is very important to implement and test both transaction timeouts (Socket read() and write() operations) and connection timeouts (Socket connect() operation) for every communication API. Lack of proper HTTP / HTTPS / TCP/IP timeouts between the Java EE application and external system(s) can lead to severe performance degradation and outages due to stuck Threads. Proper timeout implementation will prevent Threads from waiting too long in the event of a major slowdown of your downstream systems.

Below are some examples for some older and current APIs (Apache & WebLogic):

 

Communication API: commons-httpclient 3.0.1
Vendor: Apache
Protocol: HTTP/HTTPS
Timeout code snippet:

    HttpConnectionManagerParams.setSoTimeout(txTimeout); // Transaction timeout
    HttpConnectionManagerParams.setConnectionTimeout(connTimeout); // Connection timeout

Communication API: axis.jar (v1.4 1855)
Vendor: Apache
Protocol: WS via HTTP/HTTPS
Timeout code snippet:

    *** Please note that version 1.x of AXIS is exposed to a known problem with SSL Socket creation which ignores the specified timeout value. Solution is to override the client-config.wsdd and setup the HTTPS transport to <transport name="https" pivot="java:org.apache.axis.transport.http.CommonsHTTPSender"/> ***

    ((org.apache.axis.client.Stub) port).setTimeout(timeoutMilliseconds); // Transaction & connection timeout

Communication API: WLS103 (old JAX-RPC)
Vendor: Oracle
Protocol: WS via HTTP/HTTPS
Timeout code snippet:

    // Transaction & connection timeout
    ((Stub)servicePort)._setProperty("weblogic.webservice.rpc.timeoutsecs", timeoutSecs);

Communication API: WLS103 (JAX-RPC 1.1)
Vendor: Oracle
Protocol: WS via HTTP/HTTPS
Timeout code snippet:

    ((Stub)servicePort)._setProperty("weblogic.wsee.transport.read.timeout", timeoutMills); // Transaction timeout
    ((Stub)servicePort)._setProperty("weblogic.wsee.transport.connection.timeout", timeoutMills); // Connection timeout
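
The same two timeouts can be illustrated at the lowest level with the standard java.net.Socket API. The sketch below is illustrative only and assumes a plain TCP endpoint; the class name, method name, and the returned labels are hypothetical. Socket.connect(address, timeout) bounds the connection attempt, and Socket.setSoTimeout() bounds each blocking read.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical helper showing connection and read timeouts on a raw Socket.
final class SocketTimeoutSketch {

    // Returns "DATA", "EOF", "TIMEOUT" or "ERROR" depending on what happened.
    static String readFirstByte(String host, int port, int connectTimeoutMs, int readTimeoutMs) {
        try (Socket socket = new Socket()) {
            // Connection timeout: give up if the remote system does not accept in time.
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            // Read timeout: a blocking read() fails instead of waiting forever.
            socket.setSoTimeout(readTimeoutMs);
            InputStream in = socket.getInputStream();
            int firstByte = in.read();
            return firstByte < 0 ? "EOF" : "DATA";
        } catch (SocketTimeoutException e) {
            // Raised by both a slow connect() and a slow read().
            return "TIMEOUT";
        } catch (IOException e) {
            return "ERROR";
        }
    }
}
```

With both values set, a Thread calling a slow downstream system is released after a bounded delay, so the application can fail fast or fall back instead of accumulating stuck Threads.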

 

I/O, JDBC or relational persistence API resources management problems

Sound coding practices are important when implementing a raw DAO layer or using relational persistence APIs such as Hibernate. The goal is to ensure proper Session / Connection resource closure. Such JDBC related resources must be closed in a finally {} block to properly handle any failure scenario. Failure to do so can lead to a JDBC Connection Pool leak, and eventually to stuck Threads and a full outage.

The same rule applies to I/O resources such as InputStream. When no longer used, proper closure is required; otherwise, it can lead to a Socket / File Descriptor leak and a full JVM hang.
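
The closure rule above can be sketched with a plain InputStream. The class and method names are hypothetical. The first method uses the finally {} block described in the text; the second uses try-with-resources (Java 7 and later), which closes the resource the same way and also works for JDBC Connection, Statement and ResultSet objects.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: reads a stream fully and always closes it.
final class ResourceClosureSketch {

    // Classic pattern: the finally block runs on success and on failure.
    static long countBytesAndClose(InputStream in) {
        try {
            long total = 0;
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                total += read;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            try {
                in.close();
            } catch (IOException ignored) {
                // Nothing more can be done here; the read result or failure is kept.
            }
        }
    }

    // Equivalent pattern using try-with-resources.
    static long countBytesWithTryWithResources(InputStream source) {
        try (InputStream in = source) {
            long total = 0;
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                total += read;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In both variants the stream is closed even when read() throws, which is exactly the failure scenario that causes leaks when closure is only done on the normal path.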

 

Lack of proper data caching

Performance problems can be the result of repetitive and excessive computing tasks, such as repeated I/O / disk access or repeated retrieval of content data and customer-related data from a relational database. Static data with a reasonable memory footprint should be cached properly, either in the Java Heap memory or via a data cache system.

Static files such as property files should also be cached to prevent excessive disk access. Simple caching strategies can have a very positive impact on your Java EE application performance.

Data caching is also important when dealing with Web Services and XML-related APIs. Such APIs can generate excessive dynamic Class loading and I/O / disk access. Make sure that you follow such API best practices and use proper caching strategies (Singleton, etc.) when applicable. I suggest you read JAXB Case Study on that subject.
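
A simple in-heap cache for static data can be sketched as follows. This is a minimal illustration under assumptions: the class name is hypothetical, the loader stands in for whatever expensive lookup (disk, database, context creation) the application performs, and the data is assumed to never change once loaded.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical load-once cache for static reference data.
final class StaticDataCacheSketch<K, V> {

    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // the expensive lookup to avoid repeating

    StaticDataCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent invokes the loader at most once per key, even under concurrency.
        return cache.computeIfAbsent(key, loader);
    }

    int size() {
        return cache.size();
    }
}
```

A single shared instance (for example held in a static final field) plays the role of the Singleton mentioned above: the first request for a key pays the loading cost, and later requests are served from the Java Heap.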

 

Excessive data caching

Ironically, while data caching is crucial for proper performance, it can also be responsible for major performance problems. Why? Well, if you attempt to cache too much data on the Java Heap, then you will be struggling with excessive garbage collections and OutOfMemoryError conditions. The goal is to find a proper balance (via your capacity planning process) between data caching, Java Heap size, and available hardware capacity.

Here is one example of a problem case from one of my IT clients:

  • Very poor performance was observed from the WebLogic Portal application.
  • Data caching was implemented to improve performance, with an initial positive impact.
  • As more products were added to the product catalogue, data caching requirements and Java Heap memory usage kept growing.
  • The IT team first had to upgrade to a 64-bit JVM with 8 GB per JVM process, along with more CPU cores.
  • Eventually, the situation was no longer sustainable and the design had to be reviewed.
  • The final solution ended up using a distributed data cache system, outside the Java EE middleware and JVM, on separate hardware.

 

The important point to remember from this story is that when too much data caching is required to achieve proper performance level, it is time to review the overall solution and design.
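
One common way to keep an in-heap cache from growing without limit is to bound it. The sketch below is not the solution used in the story above (which moved to a distributed cache); it only illustrates a size-bounded, least-recently-used cache built on java.util.LinkedHashMap. The class name and the maxEntries parameter are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical factory for a size-bounded LRU cache.
final class BoundedCacheSketch {

    static <K, V> Map<K, V> newLruCache(final int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        Map<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once the limit is exceeded.
                return size() > maxEntries;
            }
        };
        // LinkedHashMap is not Thread safe, so wrap it before sharing it between Threads.
        return Collections.synchronizedMap(lru);
    }
}
```

The limit makes the Java Heap footprint of the cache predictable, so it can be sized as part of the capacity planning process rather than discovered through OutOfMemoryError conditions.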

 

Excessive logging

Last but not least: excessive logging. It is a good practice to ensure proper logging within your Java EE application implementation. However, be careful with the logging level that you enable in your production environment. Excessive logging will trigger high I/O on your server and increase CPU utilization. This can especially be a problem for older environments using older hardware, or for environments dealing with very heavy concurrent volumes. I also recommend that you implement a "reloadable" logging level facility to turn extra logging ON / OFF when required in your day-to-day production support.
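
A "reloadable" logging level can be sketched with the standard java.util.logging API. The class name, the logger name "com.example.orders", and the processOrder method are hypothetical. How setLevel gets triggered in production (JMX, an admin page, a watched configuration file) is left open, and other logging frameworks offer comparable facilities.

```java
import java.util.Locale;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical facility to switch extra logging ON / OFF at runtime.
final class ReloadableLogLevelSketch {

    private static final Logger LOG = Logger.getLogger("com.example.orders");

    // Accepts standard level names such as "INFO" or "FINE"; no restart required.
    static void setLevel(String levelName) {
        LOG.setLevel(Level.parse(levelName.toUpperCase(Locale.ROOT)));
    }

    static boolean debugEnabled() {
        return LOG.isLoggable(Level.FINE);
    }

    static void processOrder(String orderId) {
        // Guard the call so the message is not even built when FINE is disabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Processing order " + orderId);
        }
    }
}
```

Running normally at INFO keeps I/O low, while a call such as ReloadableLogLevelSketch.setLevel("FINE") turns on detailed logging only for the duration of a troubleshooting session.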

 

#7 - Java EE middleware tuning problems

It is important to realize that your Java EE middleware specifications may be adequate and yet the environment may still lack proper tuning. Most Java EE containers available today provide you with multiple tuning opportunities, depending on the needs of your applications and business processes.

Failure to implement proper tuning and best practices can put your Java EE container in a non-optimal state. I highly recommend that you review and implement proper Java EE middleware vendor recommendations when applicable.

Find below a high-level view and sample check list of what to look for.

 

 

#8 - Insufficient proactive monitoring

Lack of monitoring does not actually "cause" performance problems, but it can prevent you from understanding the capacity and health of your Java EE platform. Eventually, the environment can reach a breaking point, which may expose several gaps and problems (JVM memory leak, etc.). From my experience, it is much harder to stabilize an environment after months or years of operation than to have proper monitoring, tools, and processes implemented from day one.

That being said, it is never too late to improve an existing environment. Monitoring can be implemented fairly easily. My recommendations follow.

 

  • Review your current Java EE environment monitoring capabilities and identify improvement opportunities.
  • Your monitoring solution should cover the end-to-end environment as much as possible, including proactive alerts.
  • The monitoring solution should be aligned with your capacity planning process discussed in our first section.

 

#9 - Saturated hardware on common infrastructure

Another common source of performance problems is hardware saturation. This problem is often observed when too many Java EE middleware environments, along with their JVM processes, are deployed on existing hardware. Too many JVM processes relative to the available physical CPU cores can kill your application performance. Again, your capacity planning process should also account for hardware capacity as your client's business grows.

My primary recommendation is to look at hardware virtualization. Such an approach is quite common these days and has quite a few benefits, such as fewer physical servers, a smaller data center footprint, dedicated physical resources per virtual host, fast implementation, and reduced costs for your client. Dedicated physical resources per virtual host are quite important, since the last thing you want is one Java EE container bringing down all the others due to excessive CPU utilization.

 

 

#10 - Network latency problems

Our last source of performance problems is the network. Major network problems can happen from time to time, such as router, switch, and DNS server failures. However, the more common problems observed are typically due to regular or intermittent latency when working in a highly distributed IT environment. The diagram below highlights an example of network latency gaps between two geographic regions of a WebLogic cluster communicating with an Oracle database server located in one geographic region only.

 

 

Intermittent or regular latency problems can definitely trigger some major performance problems and affect your Java EE application in different ways.

 

  • Applications using database queries with large datasets are fully exposed to network latency due to the high number of fetch iterations (back and forth across the network).
  • Applications dealing with large data payloads (such as large XML data) from external systems are also exposed to network latency, which can trigger intermittent high response times when sending requests and receiving responses.
  • The Java EE container replication process (clustering) can be affected, putting its fail-over capabilities at risk (e.g., multicast or unicast packet losses).

 

Tuning strategies such as JDBC row data "prefetch", XML data compression, and data caching can help mitigate network latency. But such latency problems should be reviewed closely when first designing the network topology of a new IT environment.
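
Two of the tuning strategies above can be sketched with standard JDK APIs. This is an illustration under assumptions: the class and method names are hypothetical, java.sql.Statement.setFetchSize() is the standard JDBC hint for how many rows to fetch per round trip (drivers may treat it differently and may offer their own prefetch settings), and GZIP is used here as one possible XML compression scheme.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helpers for reducing the impact of network latency.
final class LatencyMitigationSketch {

    // Rough number of network round trips needed to fetch all rows.
    static long estimatedRoundTrips(long rows, int fetchSize) {
        return (rows + fetchSize - 1) / fetchSize; // ceiling division
    }

    // Hint to the JDBC driver: fetch more rows per round trip.
    static void applyFetchSize(Statement statement, int fetchSize) throws SQLException {
        statement.setFetchSize(fetchSize);
    }

    // Compresses an XML payload before it is sent across the network.
    static byte[] gzip(String xml) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Restores the original XML payload on the receiving side.
    static String gunzip(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

As a worked example of the arithmetic: fetching 10,000 rows with a fetch size of 10 takes about 1,000 round trips, while a fetch size of 500 takes about 20. If each round trip crosses a high-latency link, that difference dominates the query response time.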

I hope this article has helped you understand some of the common performance problems and pressure points you can face when developing and supporting Java EE production systems. Since each IT environment is unique, I do not expect that everybody will face the exact same problems. As such, I invite you to post your comments and share your views on the subject.

Published at DZone with permission of Pierre - Hugues Charbonneau, author and DZone MVB.


Comments

Alessandro De S... replied on Sun, 2014/01/05 - 3:56pm

Complex environments can have many possible problems, and it is not so simple to define a top ten... Apart from this, the most important approach to this problem is selecting a performance monitoring tool to get more information on your application behavior.

You can use SpyGlass Tracer to monitor your application in real time and identify bottlenecks, slow queries, slow methods and so on.

