Niklas has posted 30 posts at DZone. You can read more from them at their website. View Full User Profile

Your web applications work - by sheer coincidence!

09.17.2011
| 5748 views |
  • submit to reddit

This blog describes a solution to a typical concurrency problem in a web application environment, and it illustrates that you - in all likelihood - cannot be sure that your application is thread-safe. It just works - by sheer coincidence.

Last week we had a severe problem in a critical web module of our production environment. The problem was that we had to restart a server at a time where we had very high user and transaction volumes (we do ~500.000 transactions in that module per business day). The server shutdown deleted our XSL template cache. As a consequence many threads tried to recompile the templates concurrently after restart. This in turn introduced a (CPU) system overload issue. We could only restart the server by blocking the incoming requests at the web server level. In fact we only allowed some threads to bypass the web server and to enter the application server. After some warmup time we opened the web server and the system started to work at an acceptable CPU load.

We've looked at the application code that caused the issue and decided to implement an intelligent concurrency pattern that fulfills following requirements:

- the number of threads that perform a CPU intense task concurrently (like XSL template compilation) should be limited to a configurable size (=> avoid CPU overload on startup in user load peek times)
- cache the result of each CPU intense task so that it will only execute once (we had that already in our error prone solution)
- enable the system to determine whether the CPU intense task needs to execute again (=> if the application was redeployed the templates need to recompile)

I spare me the effort (and the painfullness) to post the old non-safe code. The bugs in that code were:

firstly - it did not limit the number of threads allowed to compile templates
secondly - it did not ensure that only one thread compiles a specific template at a given time (e.g. our startup page) - instead many threads tried to compile the same template concurrently - this one is critical!

That all being said, here comes the solution to such a "too-many-threads concurrency issue". Here's the code and I also added a class diagram 'cause I believe this is a common situation in web applications.

The following snippet shows our new HTMLDocumentGenerator, its responsibility is to create and cache ConcurrentTransformer instances, one for each XSL document. The responsibility of ConcurrentTransformer is to compile a XSL template and to store the result in its template variable. HTMLDocumentGenerator owns the cache (Line 3), defines a limit of threads that can compile XSL documents (Line 5) and declares a Semaphore that implements a "thread bouncer" (Line 6). The createDocument-Method (Lines 10-20) creates and caches the new ConcurrentTransformer instances for each XSL document. Example: /start/index.xsl will have its own ConcurrentTransformer instance and /start/main_menu.xsl will also have its own unique instance.

public class HTMLDocumentGenerator implements DocumentGenerator, Initialisierbar, Reinitialisierbar, HTMLDocumentGeneratorMBean {

    private final ConcurrentHashMap<String, ConcurrentTransformer> transformerCache =
        new ConcurrentHashMap<String, ConcurrentTransformer>();
    private int concurrentParserThreadCount = 10;
    protected Semaphore bouncer;

...

public void createDocument(OutputStream document, Map<String, ?> inputData) throws DocumentGeneratorAusnahme {
    ...
    ConcurrentTransformer newTransformer;
    ConcurrentTransformer transformer =
    transformerCache.putIfAbsent(xslDocumentName, newTransformer = new ConcurrentTransformer(xslDocumentName));
    if (transformer == null) {
      transformer = newTransformer;
    }
    template = transformer.generateTemplate(loadTimeApplication);
    ...
}

...

public void initialize() throws InitialisierungsAusnahme {
     transformerCache.clear();
     if (concurrentParserThreadCount <= 0) {
         concurrentParserThreadCount = 10;
     }
     bouncer = new Semaphore(concurrentParserThreadCount);
}

...

}

Notice that we use ConcurrentHashmap.putIfAbsent() which allows cache lookup and cached object creation in a single step. This is equivalent to:

if (!map.containsKey(key))
   return map.put(key, value);
else
   return map.get(key);

This is a perfect approach in a multithreaded high volume application. You could do the same with owned locks or synchronized blocks, but it will hardly be so safe (and fast!) like the example above. You will understand my statement if you look at the implementation of ConcurrentHashmap. It locks segments internally and not the whole table! This allows very high volumes of concurrent access without lock contention. In Line 18 we call the method generateTemplate that actually performs the CPU intense task (template compilation).

In Line 5 we declared a Semaphore which acts as a bouncer for the CPU intense code sections. Using this class you can control the number of concurrent threads that perform a certain task concurrently. It's important to declare the Semaphore in the HTMLDocumentGenerator class, 'cause the thread limit applies to all cached ConcurrentTransformer instances.

Now let's look at the ConcurrentTransformer that I declared as a protected inner class of HTMLDocumentGenerator. Using an inner class ConcurrentTransformer instances have immediate access to member variables of HTMLDocumentGenerator. There is one ConcurrentTransformer for each unique XSL template. 

protected class ConcurrentTransformer {

     private final Lock lock = new ReentrantLock();
     private URL url = null;
     private volatile Templates template = null;
     private long lastCompileTime = Long.MAX_VALUE;

     protected Templates generateTemplate(long ladezeitProzessApp) throws TransformerFactoryConfigurationError, TransformerConfigurationException, InterruptedException {
         if (!mustCompile(loadTimeApplication)) {
             return template;
         }
         lock.lockInterruptibly(); 
         try {           
             if (!mustCompile(loadTimeApplication)) {
                 return template;
             }
             bouncer.acquire(); 
             try {
                 TransformerFactory transFactory = TransformerFactory.newInstance();
                 URIResolver resolver = transFactory.getURIResolver();
                 transFactory.setURIResolver(new XMLTransformResolver(resolver));
                 StreamSource source = new StreamSource(url.toExternalForm());
                 template = transFactory.newTemplates(source);
                 lastCompileTime = System.currentTimeMillis();
                 return template;
             } finally {
                 bouncer.release();
             }
         } finally {
             lock.unlock();
         }
     }

     private boolean mustCompile(long loadTimeApplication) {
         return template == null || !templateCacheActive || requiresRecompile(loadTimeApplication);
     }

}

Let's go through generateTemplate step-by-step:

Line 9: check if thread needs to compile this XSL template. If false, return the compiled template. mustCompile returns true if for example the template variable is null, which means the template wasn't compiled yet. The first thread that enters generateTemplate will get mustCompile() = true 'cause the ConcurrentTransformer was just created by that thread and the template variable is null.
Line 12: Block all subsequent threads 'cause only ONE thread can compile THIS XSL template at a given time.
Line 14: check again if thread needs to compile the XSL template. Sounds weird? Imagine a second thread that was blocked at Line 12 'cause the first call to mustCompile returned true. This thread does not need to compile again 'cause the other (faster/first) thread already compiled the template.
Line 17: check if thread exceeds the permitted number of threads allowed to do compile jobs. Because the bouncer was defined in the HTMLDocumentGenerator only a limited number of threads will enter the next code block. This actually was the tricky part of the solution: Lock threads that want to compile a specific template but also ensure that the total number of active compilation tasks does not exceed a defined limit! Because the bouncer applies to all cached ConcurrentTransformer instances this is possible.
Line 19-25: Do the actual compilation work which is the CPU intense part that aused the system overload.
Line 27: Release a permit to allow other thread (in a different ConcurrentTransformer instance) to enter the compilation code.
Line 30: Release the lock so that other threads waiting for that XSL document can continue their processing. (they will return the template that was just compiled, see line 14)

Done! This solution will enforce that (1) only one thread will try to compile a specific XSL document and that (2) only a limitted amount of threads can do compilation work.

Here is the class diagram that shows the pattern-style structure of the solution:



Sleeper bugs and why systems work by coincidence


Some weeks ago I met Dr. Heinz Kabbutz. I attended his Java Specialist Master Course and he opened my eyes for problems like this. Multi-threading was his first lesson and he started by saying something like this: "Your web applications are not thread safe, they just work - by sheer coincidence!" Although its a provocative statement, he is not wrong by saying that ... The described concurrency bug was in our code for a _long_ time, say for 8 years? It just didn't show up. What changed? We migrated from WebSphere Application Server 6 to 7. And we decided to use a new XSL parser, optimized for z/OS. The old XSL parser performed good at compilation time, but the compilation result did not perform so well. The new XSL parser promised an optimized compilation result. But obviously it did take more CPU during compilation. Now, since the compilation was CPU intense with the new parser our consurrency bug turned out to be a big problem suddenly. I'd call that a "sleeper bug". It is there for years, and suddenly it sabotages your system. I believe there are many bugs like this in todays Java applications, and I believe it's not too provocative to say that you will also have some sleeper bugs in your systems. Your systems work - but by sheer coincidence!

From http://niklasschlimm.blogspot.com/2011/09/your-web-applications-work-by-sheer.html

Published at DZone with permission of its author, Niklas Schlimm.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Tags: