Creator of the Apache Tapestry web application framework and the Apache HiveMind dependency injection container, Howard has been an active member of the Java community since 1997. He specializes in all things Tapestry, including on-site Tapestry training and mentoring, but has lately been branching out into fun new areas, including functional programming (with Clojure) and NodeJS.

Synchronized Considered Harmful

06.15.2012

News flash: concurrency is hard. Any time you have mutable data and multiple threads, you are just asking for abuse, and synchronized is simply not going to cut it.

I was recently contacted by a client who was load testing their Tapestry 5.3.3 application; they were using Tomcat 6.0.32 with 500 worker threads, on a pretty beefy machine: Intel Xeon X7460 @ 2.66 GHz, OpenJDK 64-Bit Server VM (14.0-b16, mixed mode). That's a machine with six cores and 16 MB of L2 cache.

For all that power, they were tapping out at 450 requests per second. That's not very good when you have 500 worker threads ... it means that you've purchased memory and processing power just to see all those worker threads block, and you get to see your CPU utilization stay low. When synchronization is done properly, increasing the load on the server should push CPU utilization to 100%, and response time should be close to linear with load (that is to say, all the threads should be equally sharing the available processing resources) until the hard limit is reached.

Fortunately, these people approached me not with a vague performance complaint, but with a detailed listing of thread contention hotspots.

The goal with Tapestry has always been to build the code right initially, and optimize the code later if needed. I've gone through several cycles of this over the past couple of years, optimizing page construction time, or memory usage, or throughput performance (as here). In general, I follow Brian Goetz's advice: write simple, clean code and let the compiler and HotSpot figure out the rest.

Another piece of advice from Brian is that "uncontested synchronized calls are very cheap". Many of the hotspots located by my client were, in fact, simple synchronized methods that did some lazy initialization. Here's an example:

public class InternalComponentResourcesImpl ...
{
    // Lazily initialized; guarded by the instance's monitor (synchronized)
    private Messages messages;

    public synchronized Messages getMessages()
    {
        if (messages == null)
            messages = elementResources.getMessages(componentModel);

        return messages;
    }
}

In this example, getting the messages can be relatively time consuming and expensive, and is often not necessary at all. That is, in most instances of the class, the getMessages() method is never invoked. There were a bunch of similar examples of optional behavior that is often not needed at all ... but is heavily exercised in the cases where it is used.

It turns out that "uncontested" really means virtually no thread contention whatsoever. I chatted with Brian at the Hacker Bed & Breakfast about this, and he explained that you can quickly go from "extremely cheap" to "asymptotically expensive" when there's any potential for contention. The synchronized keyword is very limited in one area: when exiting a synchronized block, all threads that are waiting for that lock must be unblocked, but only one of those threads gets to take the lock; all the others see that the lock is taken and go back to the blocked state. That's not just a lot of wasted processing cycles: often the context switch to unblock a thread also involves paging memory off the disk, and that's very, very expensive.

Enter ReentrantReadWriteLock: this is an alternative that allows any number of readers to share a lock, but only a single writer. When a thread attempts to acquire the write lock, it blocks until all reader threads have released the read lock. The cost of managing the ReentrantReadWriteLock's state is somewhat higher than synchronized, but it has the huge advantage of letting multiple reader threads operate simultaneously. That means much, much higher throughput.

In practice, this means you must acquire the shared read lock to look at a field, and acquire the write lock in order to change the field.

ReentrantReadWriteLock is smart about only waking the right thread or threads when either the read lock or the write lock is released. You don't see the same thrash you would with synchronized: if a thread is waiting for the write lock, and the current holder releases the lock, ReentrantReadWriteLock will (likely) unblock just that one waiting thread.

Using synchronized is easy; with an explicit ReentrantReadWriteLock there's a lot more code to manage:

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class InternalComponentResourcesImpl ...
{
    private final ReadWriteLock lazyCreationLock = new ReentrantReadWriteLock();

    // Lazily initialized; guarded by lazyCreationLock
    private Messages messages;

    public Messages getMessages()
    {
        lazyCreationLock.readLock().lock();

        try
        {
            if (messages == null)
            {
                obtainComponentMessages();
            }

            return messages;
        } finally
        {
            lazyCreationLock.readLock().unlock();
        }
    }

    private void obtainComponentMessages()
    {
        // The read lock must be released before the write lock can be
        // acquired; ReentrantReadWriteLock does not support upgrading.
        lazyCreationLock.readLock().unlock();
        lazyCreationLock.writeLock().lock();

        try
        {
            // Re-check: another thread may have grabbed the write lock and
            // initialized the field in the window just opened up.
            if (messages == null)
            {
                messages = elementResources.getMessages(componentModel);
            }
        } finally
        {
            // Downgrade: re-acquire the read lock before releasing the write
            // lock, so the caller still holds the read lock on return.
            lazyCreationLock.readLock().lock();
            lazyCreationLock.writeLock().unlock();
        }
    }
}

I like to avoid nested try ... finally blocks, so I broke the logic out into separate methods.

Notice the "lock dance": it is not possible to acquire the write lock if any thread, even the current thread, has the read lock. This opens up a tiny window where some other thread might pop in, grab the write lock and initialize the messages field. That's why it is desirable to double check, once the write lock has been acquired, that the work has not already been done.

Also notice that things aren't quite symmetrical: with ReentrantReadWriteLock it is allowable for the current thread to acquire the read lock before releasing the write lock. This helps to minimize context switches when the write lock is released, though it isn't expressly necessary.

Is the conversion effort worth it? Well, so far, simply by converting synchronized to ReentrantReadWriteLock, and adding a couple of additional caches (also using ReentrantReadWriteLock), we've seen significant improvements: from 450 req/sec to 2000 req/sec ... and there are still a few minor hotspots to address. I think that's been worth a few hours of work!

Published at DZone with permission of Howard Lewis Ship, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Francesco Illuminati replied on Sun, 2012/06/17 - 3:58am

Since this is simple lazy initialization, and not a variable you have to update, I would go for the double-check initialization idiom as suggested by Bloch (http://java.sun.com/developer/technicalArticles/Interviews/bloch_effective_08_qa.html):

// Double-check idiom for lazy initialization of instance fields.
private volatile FieldType field;

FieldType getField() {
    FieldType result = field;
    if (result == null) { // First check (no locking)
        synchronized(this) {
            result = field;
            if (result == null) // Second check (with locking)
                field = result = computeFieldValue();
        }
    }
    return result;
}

But I would also seriously consider doing proper initialization at construction time (if the getter is used often enough, the time spent constructing the value up front is statistically irrelevant).
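For instance, a minimal sketch of that eager approach (reusing the hypothetical FieldType and computeFieldValue() from the snippet above):

// Eager initialization at construction time: the final field is safely
// published to all threads once the constructor completes, so the getter
// needs no locking at all.
private final FieldType field = computeFieldValue();

FieldType getField() {
    return field;
}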

Stig Christensen replied on Sun, 2012/06/17 - 9:40am

Going from synchronized to the reader/writer pattern would in this case be almost negligible. After all, the only thing that was synchronized was the null instance check, and on your hardware this doesn't cost anything. The caches that you mentioned must be the real reason for the better performance.

The JRE 5+ code (using volatile and double-checked locking) posted by Francesco would also be my choice!

Philip Herbst replied on Sun, 2012/06/17 - 8:28am in response to: Francesco Illuminati

I would choose the static holder idiom for simple lazy initialization. It's easier, and involves no synchronized at all.
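For reference, a minimal sketch of the lazy initialization holder class idiom (names here are illustrative, following the FieldType example above):

// The holder class is not initialized until getField() first references it;
// the JLS guarantees that class initialization is thread-safe, so there is
// no explicit locking and the fast path is a plain field read.
private static class FieldHolder {
    static final FieldType FIELD = computeFieldValue();
}

static FieldType getField() {
    return FieldHolder.FIELD;
}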

dieter von holten replied on Sun, 2012/06/17 - 9:27am

I think the code as written has a problem: the messages field should be volatile. The semantics of synchronized make sure that fields accessed in a synchronized block are flushed when the block is left. I am not aware that the locks have the same behaviour.

As you said: concurrency is hard ...

Darryl Miles replied on Sun, 2012/06/17 - 11:49am in response to: Francesco Illuminati

I agree with comment 1 for the scenario presented in the main article, due to the lockless reads. One thing about synchronized is that every Object has a monitor by default, with no additional object count / memory overhead. Use of ReadWriteLock is better when there are likely to be frequent changes to the protected data over the lifetime of the app; in my experience, messages would not be a realistic example of that. Other than that, a well written article explaining a way to do something differently.

With comment 1, I question the use of 'volatile': it can only make sense to ensure the Java VM does not optimize away the local variable 'FieldType result' set up in line 4. I.e., instead of ensuring a local reference is taken of whatever non-null object may exist in the field 'field', the compiler decides to convert all use of the 'result' variable into direct load/store of the field itself. But I have not found any VM that would do this in a method coded the way comment 1 indicates. The characteristics of this method are that the value (and field value) is either null, or non-null by way of having been initialized; once it is assigned a non-null value, it is never changed again, a common Java idiom.

So maybe someone can explain, by pointing at the Java specs, the theoretical scenario by which volatile is a requirement for the code example to function correctly across all VMs (and the memory models used by CPUs).

There is exactly one load before (and outside) the synchronized block, and one compare. Then the synchronized lock is acquired. Up until this point, volatile seems irrelevant. Then the load is done again, overwriting the old value (or, in static single assignment terms, setting up a new version of the variable 'result'), which is then used for the compare performed inside the synchronized block.

As another commenter points out, the writes to memory made inside the synchronized block cannot be moved down to an execution point after the close of the synchronized block. So we are assured that a flush of changes to real memory will be made before control leaves the synchronized block.

What is unclear is whether the reads from memory made before (and outside) the synchronized block, from locations visible to changes from elsewhere, need to be reloaded when such a load expression is evaluated inside the synchronized block. If that is the case, and volatile was not used anywhere, and the Java VM optimized the code to completely remove the local variable 'result', the code would still be safe: a reload would happen because the old value would be considered stale (even if known to the optimizer).

So I am asking: does entering a synchronized block automatically make stale any aliases the optimizer knows for values that could have been modified in the meantime?

Re my choice of terminology: new variables on the stack are considered private to the owning thread; those references can't be modified from elsewhere. That is not to say the object(s) the variable references point to can't have their state changed, but the concrete reference to the object cannot be changed. Which is why loading into a local variable is an important aspect of the comment 1 example.

Howard Lewis Ship replied on Mon, 2012/06/18 - 12:17pm

To reiterate comments I've made on the original blog post, the example I chose was the simplest case, for ease of discussion.

One of the main faults of synchronized is that it is one lock per object; imagine 500 threads all rendering the same Tapestry page simultaneously; those threads will be invoking synchronized methods of the same component objects with some frequency (even if they aren't invoking the exact same methods on those objects); in fact, the synchronized keyword will trend towards enforcing that the threads operate in lock-step. Replaced with a ReadWriteLock, the threads will be more free to operate at their own pace, passing through locked methods without respect to what other threads are currently doing, as they walk the page's component tree.

In any case, in an example this simple it is OK to not do any locking at all, as long as only a single field is updated and the value is idempotent; i.e., if multiple threads write to the field (due to a race condition), it doesn't matter which thread's value "wins". This works in the example because the ElementResources object caches the result of the getComponentMessages() method anyway, so whichever thread invokes the method, the value assigned to the field will always be the same instance.
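A minimal sketch of that unlocked variant (safe here only because the computed value is idempotent, per the argument above):

// No locking at all: multiple threads may race past the null check and each
// invoke getMessages(), but ElementResources caches the result, so every
// thread assigns the same instance and it doesn't matter whose write "wins".
public Messages getMessages()
{
    if (messages == null)
        messages = elementResources.getMessages(componentModel);

    return messages;
}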

 Other examples in the code are more complicated, involving updates to more than one field, or providing the possibility of memory leaks. See the source for those examples. 

Howard Lewis Ship replied on Mon, 2012/06/18 - 12:23pm in response to: Philip Herbst

The static holder pattern is only useful when the initialization can be done statically; to my mind, that means your application is dependent on a static variable, something I consider a design smell. Tapestry uses no static variables (static fields exist as constants only; no static field anywhere in the framework holds a mutable value), and I believe that is one of the factors in Tapestry's demonstrated stability under load.

Howard Lewis Ship replied on Mon, 2012/06/18 - 12:36pm in response to: dieter von holten

Please refer to Brian Goetz's book; of course once you get through all the locking, the whole point is that the values assigned to the fields are properly published to other threads (i.e., atomically written, no questions about stale values in any core's L1 or L2 cache). Once you start thinking in terms of everything that has to happen to publish a change to a field, and the overall cost of it all, you start appreciating approaches that fully embrace immutability, but that's not always an available avenue.

John Russell replied on Tue, 2012/06/26 - 1:16pm in response to: Stig Christensen

Double-checked locking isn't all it's cracked up to be. In fact, since we're quoting the amazingly great Brian Goetz, it's "broken".

http://www.javaworld.com/jw-02-2001/jw-0209-double.html 

Stig Christensen replied on Thu, 2012/06/28 - 12:25pm

And that is why I mention both volatile and JRE 5+. Your link is from 2001.
