Peter has posted 2 posts at DZone. View Full User Profile

HashMap is not a Thread-Safe Structure

05.29.2008
| 45125 views |
  • submit to reddit

Last few months I have seen too much code where a HashMap (without any extra synchronization) is used instead of a thread-safe alternative like the ConcurrentHashMap or the less concurrent but still thread-safe HashTable. This is an example of a HashMap used in a home grown cache (used in a multi-threaded environment):

 

interface ValueProvider{
V retrieve(K key);
}

public class SomeCache{

private Map map = new HashMap();
private ValueProvider valueProvider;

public SomeCache(ValueProvider valueProvider){
this.valueProvider = valueProvider;
}

public V getValue(K key){
V value = map.get(key);
if(value == null){
value = valueProvider.get(key);
if(value!=null)
map.put(key,value);
}
return value;
}
}

There is much wrong with this innocent looking piece of code. There is no happens before relation between the put of the value in the map, and the get of the value. This means that a thread that receives the value from the cache, doesn’t need to see all fields if the value has publication problems (most non thread-safe structures have publication problems). The same goes for the value and the internals (the buckets for example) of the HashMap. This means that updates to the internals of the HashMap while putting, don’t need to be visible to a thread that does the get.
So it could be that the state of the cache in main memory is not in an allowed state (some of the changes maybe are stuck in the cpu-cache), and the cache could start behaving erroneous and if you are lucky starts throwing exceptions. And last, but certainly not least, there also is a classic race problem: if 2 threads do a interleaved map.put, the internals of the HashMap can get in an inconsistent state. In most cases an application reboot/redeploy would be the only way to fix this problem.

There are other problems with the cache behavior of this code as well. The items don’t have a timeout, so once a value gets in the cache, it stays in the cache. In practice this could lead to web-page that keeps displaying some value, even though in the main repository the value has been updated. An application reboot also is the only way to solve this problem. Using a Common Of The Shelf (COTS) cache would be a much saver solution, even though a new library needs to be added.

It is important to realize that a HashMap can be used perfectly in a multi-threaded environment if extra synchronization is added. But without extra synchronization, it is a time-bomb waiting to go off.

5
Your rating: None Average: 5 (1 vote)
Published at DZone with permission of its author, Peter Veentjer.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Tags:

Comments

Collin Fagan replied on Thu, 2008/05/29 - 11:39am

Yeah this is very wrong. The javadocs are explicit on this too. I've build some of my own caches based on SoftReferences or WeekReferences and ConcurrentHashMap, but I've never used anything off the shelf. What are they like?

Edward Ribeiro replied on Thu, 2008/05/29 - 12:14pm

A really useful cache you should use Weak/Soft references, but I would take a look at ehcache before trying to build my own. It implements LRU, FIFO, etc. Just my opinion. And instead of using Hashtable, that is a legacy class, it's better to use ConcurrentHashMap ou a synchronized map like this:

 

Map m = Collections.synchronizedMap(new HashMap());

Edward

 

Ronald Miura replied on Thu, 2008/05/29 - 1:15pm

Just using ConcurrentHashMap or Hashtable isn't enough for most situations. They guarantee atomicity of a single method call, but if you call, for example, { if (!map.containsKey(key)) map.put(key, new Object()); }, by the time you 'put' the new element, another thread could already have put another, and your application's state would be inconsistent (you have two instances when you want only one).

Proper synchronization is almost always necessary. Even if you do use thread-safe Collection/Map implementations, remember to always think twice about the thread-safety of YOUR code that uses it.

Peter Veentjer replied on Thu, 2008/05/29 - 1:31pm in response to: Ronald Miura

In this case the only bad thing that can happen is that if multiple threads want to retrieve the value for the same key at rougly the same moment, and the value is not in the cache, that they all are going to retrieve the value. If the retrieval of the value takes a long time, this could be an issue (an issue I have observed this week because the retrieval of the value was way to slow). 2 instances in the cache are not possible for the same key btw. The last thread that retrieves the value, is the 'winner'. His value will remain in the cache, the others will be garbage collected.

The ConcurrentHashMap has atomic 'check act' methods like the 'putIfAbsent'. So no extra pessimistic locks are required to make the action atomic.

But in a lot of cases you are right. Just using some synchronized collection is not going to fix your problems.

Pankaj Jaiswal replied on Fri, 2008/05/30 - 12:43am

Yes, HashMap should not be used in the multithreaded environment.

Either the code should be in proper synchronized block or you can used SynchronizedMap. 

 

Achim Abeling replied on Fri, 2008/05/30 - 2:54am

I was even more surprised about the HashMap when I found out that non synchronized access can result in infinite loops, as described here: http://lightbody.net/blog/2005/07/hashmapget_can_cause_an_infini.html

Dmitriy Setrakyan replied on Fri, 2008/05/30 - 4:55am

I really don't understand why the unsafe map access code provided in the article had not thrown java.util.ConcurrentModificationException.

If this exception had not happened, and you claim that the code is used in multithreaded environment in production, are you really sure that this code is really accessed by multiple threads?

Best,
Dmitriy Setrakyan
GridGain - Grid Computing Made Simple

Artur Biesiadowski replied on Fri, 2008/05/30 - 6:16am in response to: Dmitriy Setrakyan

[quote=dsetrakyan]I really don't understand why the unsafe map access code provided in the article had not thrown java.util.ConcurrentModificationException.[/quote]

Which kinds of proves the point of the article that many people are getting multithreading behaviour in java wrong ;) 

ConcurrentModificationException is thrown from iterators only. It is not even so much about multithreaded access, it is about modifying the underlying collection directly while iterating over it, which invalidates the iterator. In many cases I see it happening from single-threaded code (you iterate over elements and then call map.remove(key) instead of iterator.remove()).

 

Peter Veentjer replied on Fri, 2008/05/30 - 7:15am in response to: Dmitriy Setrakyan

Hi Dimitry

 >> I really don't understand why the unsafe map access code provided in the article had not thrown java.util.ConcurrentModificationException.

Only when using an iterator you get the ConcurrentModificationException. So a concurrent put is not going to throw this exception.

For more information you could have a look at the source to help you understand what really is going on.

Besides race probem, the HashMap has other concurrency problem as well. So even 'relying' on a concurrent modification exception to save you when a concurrent action happens, is not going to get you out of the woods anyway.

 

 

 

 

JD Evora replied on Fri, 2008/05/30 - 9:04am in response to: Peter Veentjer

Hi,

Are you 100% sure about that?

My understanding is that in a multicore environment it could happen because each core (or CPU) can have their own copy of the variable and not share it with the other CPUs until it hits a "synchronized keyword"

Cheers

 JD

 

Peter Veentjer replied on Fri, 2008/05/30 - 9:30am in response to: JD Evora

You are correct about the 'each cpu can have its own copy'. But the magic word is 'can'. It can't be compared to a database transaction that prevents isolation problems!

As long as no 'happens before' relation is encountered between 2 actions (a unlock/lock on a monitor, a volatile write/read etc) no guarantees are made which variables are visibile and invisible. So it could be that some variables are visible to other threads, and others not: it is complely undefined which ones.

In this case it could happen that some part of the hashmap are visible to other threads, and other parts are not. 

Dmitriy Setrakyan replied on Fri, 2008/05/30 - 12:08pm in response to: Artur Biesiadowski

Oops. My mistake.

ConcurrentModificationException is indeed only thrown in iterator code.

I never suggested that user should rely on ConcurrentModificationException, but I just thought it would happen every now and then... should have known better.

Best,
Dmitriy Setrakyan
GridGain - Grid Computing Made Simple

Christian Vest ... replied on Sat, 2008/05/31 - 6:28pm

A HashMap can be used safely in a concurrent environment without using synchronization if a) the instances are thread-locally scoped and thus never shared between threads, or b) the (preferably single) reference to the instance is final and the map is only ever mutated by the constructor of the class holding that final reference, and references to the map does not escape the instances of said class. However, these constraints thward any attempt to use the HashMap as a mutable cache.

 Another concurrent Map implementation is Cliff Clicks NonBlockingHashMap in the high-scale-lib (on sf.net). This implementation is (i believe) wait-free (lock-free for certain) and exhibits linear scalability to hundreds of cpu-cores :)

Jean-Marie Dautelle replied on Sun, 2008/06/01 - 10:15pm

For caching purpose, a shared FastMap instance is particularly efficient (non-blocking, internal synchronization on write only and RTSJ-Safe).

Christian Vest ... replied on Mon, 2008/06/02 - 3:39am in response to: Jean-Marie Dautelle

Jean-Marie,

Do you have any presentation or documentation of some sort on how you create these non-blocking, thread-safe and RTSJ-safe implementations? I'm interrested in learning how to create non-blocking and thread-safe data structures myself.

Slava Imeshev replied on Wed, 2008/06/04 - 5:34pm in response to: Collin Fagan

[quote=cfagan]Yeah this is very wrong. The javadocs are explicit on this too. I've build some of my own caches based on SoftReferences or WeekReferences and ConcurrentHashMap, but I've never used anything off the shelf. [/quote]

 

Agreed, someone has just discovered that there can be problems when accessing shared data ouside of a critical section :)

[quote=cfagan]

What are they like?

[/quote]

 

You might want to check out our Cacheonix. It is a commercial clustered cache and data grid for Java. Local cache in Cacheonix is free.

 

Hope this helps.

 

Regards,

Slava Imeshev

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.