Peter is a DZone MVB and is not an employee of DZone and has posted 154 posts at DZone. You can read more from them at their website. View Full User Profile

Why Concurrency examples are confusing

11.10.2011
| 4838 views |
  • submit to reddit

I was recently at a talk were the audience was asked who used concurrency and who found it confusing. Nearly all who use concurrency found it confusing.

This was a little surprising to me, and I was left wondering what the reason might be.

I have been thinking about writing a book, so I have been watching a number of web talks and reading a number of books which talk about concurrency and I think I know at least one reason.

 

Bad Examples

The problem is; in a talk or a book you need a simple, easily understood example. Unfortunately, simple and easily understood examples almost always make very BAD examples for concurrency.

What makes them bad examples is that concurrency code comes with an overhead both to the developer in terms of complexity of the code and in CPU performance. Simple examples are likely to have far more overhead than the work done making the performance of the concurrent example far worse than just writing it single threaded and also the code more confusing.

 

Account transfer example

A common example used to illustrate how to avoid dead lock between two objects, is the account transfer problem. The problem is that to transfer between two accounts, you must lock both accounts (assuming you lock individual accounts). However, unless you lock them in the same order, you can get a deadlock where two threads hold the one of the accounts the other thread needs.
Took an average of 30.7 ns to transfer money with a lock on each Account.
This is the speed of using a thread safe account with one thread. In theory, this should scale very well and using two threads would halve this time. You might get good scalability up to the number of cores you have.

But the awkward question is, what if we didn't have one lock per account and just had one global lock. The code would be simpler, but not as scalable. But what is the cost of scalability?

Using synchronized improves performance synchronized is marginally faster for un-contended locks. (i.e. the same threadkeep acquiring the lock)

Took an average of 28.9 ns to transfer money with a lock on each Account.

 

What if I used just one global lock

Took an average of 9.7 ns to transfer money with a one global lock.
This means we would need to have at least 7 threads trying to transfer money to be as fast as using one global lock. (Assuming we achieved the theoretically best scalability) Given a real application has a lot other things to do, it is highly unlikely that you will average more than 7 thread in this section of code.

 

What is I just make the code single threaded?

Took an average of 1.8 ns to transfer money, single threaded
This implies that we need to average over 56 cores (not just threads) in the account transfer block of code, to break even. In this unlikely scenario, I would suggest you need to rethink your application. e.g. if you are working for an Uber-Bank performing one billion transfers per second, why are you using just one machine to do it?

 

A Really Bad Example

A pet hate of mine is using the Fibonacci series in examples of concurrency. This example has the advantage of being easy to define recursively and translate the code to being concurrent. It also an amazingly, staggeringly BAD idea.

This is run on a 4.6 GHz i7 2600K with 16 GB of memory. The code stops when there isn't enough resources to start any more threads.

Fibonacci 2 was 2, took 1,665 us, Time ratio=832.5 
Fibonacci 3 was 3, took 123 us, Time ratio=41.0 
Fibonacci 4 was 5, took 126 us, Time ratio=25.2 
Fibonacci 5 was 8, took 154 us, Time ratio=19.3 
Fibonacci 6 was 13, took 291 us, Time ratio=22.4 
Fibonacci 7 was 21, took 359 us, Time ratio=17.1 
Fibonacci 8 was 34, took 646 us, Time ratio=19.0 
Fibonacci 9 was 55, took 544 us, Time ratio=9.9 
Fibonacci 10 was 89, took 471 us, Time ratio=5.3 
Fibonacci 11 was 144, took 1,417 us, Time ratio=9.8 
Fibonacci 12 was 233, took 3,516 us, Time ratio=15.1 
Fibonacci 13 was 377, took 3,120 us, Time ratio=8.3 
Fibonacci 14 was 610, took 2,143 us, Time ratio=3.5 
Fibonacci 15 was 987, took 7,949 us, Time ratio=8.1 
Fibonacci 16 was 1,597, took 17,539 us, Time ratio=11.0 
Fibonacci 17 was 2,584, took 7,639 us, Time ratio=3.0 
Fibonacci 18 was 4,181, took 28,766 us, Time ratio=6.9 
Fibonacci 19 was 6,765, took 28,838 us, Time ratio=4.3 
Fibonacci 20 was 10,946, took 18,460 us, Time ratio=1.7 
Fibonacci 21 was 17,711, took 85,533 us, Time ratio=4.8 
Fibonacci 22 was 28,657, took 67,250 us, Time ratio=2.3 
Fibonacci 23 was 46,368, took 194,481 us, Time ratio=4.2 
Fibonacci 24 was 75,025, took 122,891 us, Time ratio=1.6 
Fibonacci 25 was 121,393, took 283,110 us, Time ratio=2.3 
Fibonacci 26 was 196,418, took 403,611 us, Time ratio=2.1 
Fibonacci 27 was 317,811, took 2,476,695 us, Time ratio=7.8 
Fibonacci 28 was 514,229, took 7,661,490 us, Time ratio=14.9 
Exception in thread "main" java.lang.AssertionError: java.util.concurrent.ExecutionException: java.lang.AssertionError: java.util.concurrent.ExecutionException: java.lang.AssertionError: java.util.concurrent.ExecutionException: java.lang.AssertionError: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: unable to create new native thread
The time ratio is the ratio of the time taken in micro-seconds and the solution. This is interesting because the time taken is proportional to the final value, something which rises exponentially. Unfortunately it also consumes more resources, exponentially.

Now compare this with the single threaded looping solution with -XX:+PrintCompilation turned on.

     30    1             java.lang.String::hashCode (67 bytes)
     36    2             sun.nio.cs.UTF_8$Encoder::encode (361 bytes)
Fibonacci 2 was 2, took 1 us
Fibonacci 3 was 3, took 1 us
Fibonacci 4 was 5, took 1 us
Fibonacci 5 was 8, took 1 us
Fibonacci 6 was 13, took 1 us
Fibonacci 7 was 21, took 1 us
Fibonacci 8 was 34, took 0 us
Fibonacci 9 was 55, took 0 us
Fibonacci 10 was 89, took 1 us
Fibonacci 11 was 144, took 1 us
Fibonacci 12 was 233, took 1 us
Fibonacci 13 was 377, took 1 us
Fibonacci 14 was 610, took      42    3             java.lang.String::indexOf (87 bytes)
1 us
Fibonacci 15 was 987, took 1 us
Fibonacci 16 was 1,597, took 1 us
Fibonacci 17 was 2,584, took 0 us
     42    4             java.lang.String::charAt (33 bytes)
Fibonacci 18 was 4,181, took 1 us
Fibonacci 19 was 6,765, took 1 us
Fibonacci 20 was 10,946, took 1 us
Fibonacci 21 was 17,711, took 1 us
Fibonacci 22 was 28,657, took 1 us
Fibonacci 23 was 46,368, took 1 us
Fibonacci 24 was 75,025, took 1 us
Fibonacci 25 was 121,393, took 0 us
Fibonacci 26 was 196,418, took 1 us
Fibonacci 27 was 317,811, took 1 us
Fibonacci 28 was 514,229, took 1 us
Fibonacci 29 was 832,040, took 1 us
Fibonacci 30 was 1,346,269, took 0 us
Fibonacci 31 was 2,178,309, took 1 us
Fibonacci 32 was 3,524,578, took 1 us
Fibonacci 33 was 5,702,887, took 1 us
Fibonacci 34 was 9,227,465, took 1 us
Fibonacci 35 was 14,930,352, took 1 us
Fibonacci 36 was 24,157,817, took 0 us
Fibonacci 37 was 39,088,169, took 1 us
Fibonacci 38 was 63,245,986, took 1 us
Fibonacci 39 was 102,334,155, took 1 us
Fibonacci 40 was 165,580,141, took 3 us
Fibonacci 41 was 267,914,296, took 0 us
Fibonacci 42 was 433,494,437, took 1 us
Fibonacci 43 was 701,408,733, took 1 us
Fibonacci 44 was 1,134,903,170, took 1 us
Fibonacci 45 was 1,836,311,903, took 1 us
Fibonacci 46 was 2,971,215,073, took 1 us
Fibonacci 47 was 4,807,526,976, took 1 us
Fibonacci 48 was 7,778,742,049, took 1 us
Fibonacci 49 was 12,586,269,025, took 1 us
Fibonacci 50 was 20,365,011,074, took 1 us
Fibonacci 51 was 32,951,280,099, took 1 us
Fibonacci 52 was 53,316,291,173, took 1 us
Fibonacci 53 was 86,267,571,272, took 1 us
Fibonacci 54 was 139,583,862,445, took 1 us
Fibonacci 55 was 225,851,433,717, took 1 us
Fibonacci 56 was 365,435,296,162, took 1 us
Fibonacci 57 was 591,286,729,879, took 1 us
Fibonacci 58 was 956,722,026,041, took 1 us
Fibonacci 59 was 1,548,008,755,920, took 1 us
Fibonacci 60 was 2,504,730,781,961, took 1 us
Fibonacci 61 was 4,052,739,537,881, took 1 us
Fibonacci 62 was 6,557,470,319,842, took 2 us
Fibonacci 63 was 10,610,209,857,723, took 2 us
Fibonacci 64 was 17,167,680,177,565, took 2 us
Fibonacci 65 was 27,777,890,035,288, took 1 us
Fibonacci 66 was 44,945,570,212,853, took 1 us
Fibonacci 67 was 72,723,460,248,141, took 1 us
Fibonacci 68 was 117,669,030,460,994, took 2 us
Fibonacci 69 was 190,392,490,709,135, took 2 us
Fibonacci 70 was 308,061,521,170,129, took 1 us
Fibonacci 71 was 498,454,011,879,264, took 1 us
Fibonacci 72 was 806,515,533,049,393, took 1 us
Fibonacci 73 was 1,304,969,544,928,657, took 1 us
Fibonacci 74 was 2,111,485,077,978,050, took 1 us
Fibonacci 75 was 3,416,454,622,906,707, took 1 us
Fibonacci 76 was 5,527,939,700,884,757, took 1 us
Fibonacci      49    5             java.util.regex.Matcher::77reset (83 bytes)
 was 8,944,394,323,791,464, took 1 us
Fibonacci 78 was 14,472,334,024,676,221, took 1 us
Fibonacci 79 was 23,416,728,348,467,685, took 1 us
Fibonacci 80 was 37,889,062,373,143,906, took 2 us
Fibonacci 81 was 61,305,790,721,611,591, took 3 us
Fibonacci 82 was 99,194,853,094,755,497, took 2 us
Fibonacci 83 was 160,500,643,816,367,088, took 1 us
Fibonacci 84 was 259,695,496,911,122,585, took 2 us
Fibonacci 85 was 420,196,140,727,489,673, took 2 us
Fibonacci 86 was 679,891,637,638,612,258, took 1 us
Fibonacci 87 was 1,100,087,778,366,101,931, took 2 us
Fibonacci 88 was 1,779,979,416,004,714,189, took 2 us
Fibonacci 89 was 2,880,067,194,370,816,120, took 1 us
Fibonacci 90 was 4,660,046,610,375,530,309, took 1 us
Fibonacci 91 was 7,540,113,804,746,346,429, took 2 us
Note: the method calculating Fibonacci values is not even compiled and is still running in the relatively slow interpreter. Yet it barely uses more than 2 micro-seconds. When compiled to native code it takes around 40 nano-seconds. If the multi-threaded Fibonacci took 2 micro-second per result value, implying "Fibonacci 91" might take around 180 million years. Unfortunately, the number of threads required is also proportional to the result value, requiring in the order of a trillion trillion threads.

Conclusion

When considering multi-threading a section of an application, you should take a single threaded, lock free implementation as a base line. If you can't make the multi-threaded, thread safe implementation faster, don't make it multi-threaded.

Put another way, if you are using concurrency to improve performance, make sure you are dealing with a problem which can be made concurrent and a solution which is faster than using one thread.

The Code

For Accounts, look at the ThreadedAccountMain, ThreadedAccountOneLockMain and UnthreadedAccountMain.

For Fibonaccii, look at the ThreadedFibonacciMain and UnthreadedFibonacciMain

 

Threads code

 I would like to thank Roger Lindsjö for his suggestions.

From http://vanillajava.blogspot.com/2011/11/why-concurency-examples-are-confusing.html

Published at DZone with permission of Peter Lawrey, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Tags:

Comments

Roger Lindsjö replied on Thu, 2011/11/10 - 4:41am

I think your code is somewhat broken, The IDS field in the Account classes should probably be static, otherwhise all accounts gets the same id (0). 

Also, the lock object in the ThreadedAccountOneLockMain should also be static, otherwhise each account gets their own lock, and then I don't think you are testing what you say you are testing.

If I change the  locking logic in ThreadedAccountMain to use the following code:

if (from.compareTo(to) <= 0 ) {

     from.lock().lock();

     to.lock().lock();

} else {

    to.lock().lock();

    from.lock().lock();

}

instead of creating an array and sorting it for each transfer, then I get lower overhead (and since you are comparing the overhead of locking I think as little extra logic as possible makes a better test). 

Or you could even do the following to get more similarity between the multi lock and single lock example:

    private static void transfer(Account from, Account to, int amount) {

        boolean comp = from.compareTo(to) <= 0;

        Lock firstLock = comp ? from.lock() : to.lock();

        Lock secondLock = comp ? to.lock() : from.lock();

        synchronized (firstLock) {

            synchronized (secondLock) {

                if (from.balance() < amount) throw new IllegalArgumentException();

                from.transfer(-amount);

                to.transfer(amount);

            }

        }

    }  

But I agree that the fibonacci makes a terrible example, I would have expected that all (or at least the majority of) developers would understand that that to hand a task to a pool of executers and then ask for the result, then there will be a overhead, so if the lask at hand is small, then the ratio of the overhead is great.

A much better example would be somehting that takes some significant time, such as several calculations that take at least many milliseconds to execute or multiple calls to external services that can be executed in parrallel. 

Loren Kratzke replied on Thu, 2011/11/10 - 1:55pm

I would buy that book.

Peter Lawrey replied on Fri, 2011/12/09 - 4:38am in response to: Roger Lindsjö

Thank you for your suggestions. I have updated the code and the post based on them.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.