Muhammad Khojaye is an experienced consultant who has worked on both large scale Agile development projects for top financial clients and public sector innovative research and development projects. In his spare time, Muhammad likes to work on independent open source programs. Muhammad is a DZone MVB and is not an employee of DZone and has posted 13 posts at DZone. You can read more from them at their website. View Full User Profile

Java Hashing

10.27.2009
| 43664 views |
  • submit to reddit

Overriding hashCode() Equal() contract

Every Java object has two very important methods i.e. hashCode() and an equals() method. These methods are designed to be overridden according to their specific general contract. This article describes why and how to override the hashCode() method that preserves the contract of HashCode while using HashMap, HashSet or any Collection.

Contract For HashCode

The contract for hashCode says

If two objects are equal, then calling hashCode() on both objects must return the same value.

Now the question that should come into your mind is that; is it necessary that the above statement should always be true?

Consider the fact that we have provided a correct implementation of equal function for our class, then what would happen if we do not obey the above contract.

To answer the above question, let us consider the two situations,

  1. Objects that are equal but return different hashCodes
  2. Objects that are not equal but return the same hashCode

 

Objects that are equal but return different hashCodes

What would happen if the two objects are equal but return different hashCodes?  Your code would run perfectly fine. You will never come in trouble unless and until you have not stored your object in a collection like HashSet or HashMap. But when you do that, you might get strange problems at runtime. 

To understand this better, you have to first understand how collection classes such as HashMap and HashSet work. These collections classes depend on the fact that the objects that you put as a key in them must obey the above contract. You will get strange and unpredictable results at runtime if you do not obey the contract and try to store them in a collection.

Consider an example of HashMap. When you store the values in HashMap, the values are actually stored in a set of buckets. Each of those buckets has been assigned a number which is use to identify it. When you put a value in the HashMap, it stores the data in one of those buckets. Which bucket is used depends on the hashCode that will return by your object. Let’s say, if hashCode() method return 49 for an object, then it gets stored into the bucket 49 in the HashMap.

Later when you try to check whether that collection contains element or not by invoking Contains(element) method, the HashMap first gets the hashCode of that “element “. Afterwards it will look into the bucket that corresponds with the hashCode. If the bucket is empty, then it means we are done and its return false which means the HashMap does not contain the element.

If there are one or more objects in the bucket, then it will compare “element” with all other elements in that bucket using your defined equal() function.

Objects that are not equal but return the same hashCode

The hashCode contract does not say anything about the above statement. Therefore different objects might return the same hashCode value, but collections like HashMap will work inefficiently if different objects return the same hashCode value.

Why Buckets

The reason why bucket mechanism is used is its efficiency. You can imagine that if all the objects you put in the HashMap would be stored in to one big list, then you have to compare your input with all the objects in the list when you want to check if a particular element is in the Map. With the use of buckets, you will now campare only the elements of specific bucket and any bucket usually holds only a small portion of all the elements in the HashMap.

Overriding hashCode Method

Writing a good hashCode() method is always a tricky task for a new class.

Return Fixed Value

You can implement your hashCode() method such that you always return a fix value, for example like this:

//bad performance
@Override
public int hashCode() {
return 1;
}

The above method satisfies all the requirements and is considered legal according to the hash code contract but it would not be very efficient. If this method is used, all objects will be stored in the same bucket i.e. bucket 1 and when you try to ensure whether the specific object is present in the collection, then it will always have to check the entire content of the collection. 

On the other hand if you override the hashCode() method for your class and if the method breaks the contract then calling contains() method may return false for the element which is present in the collection but in a different bucket.

Method From Effective Java

Joshua Bloch in Effective Java provides an good guidelines for generating an hashCode() value

       1. Store some constant nonzero value; say 17, in an int variable called result.
       2. For each significant field f in your object (each field taken into account by the equals( )), do the following

                a.  Compute an int hashCode c for the field:
                          i.      If the field is a boolean, compute
                                             c = (f ? 1 : 0).
                          ii.     If the field is a byte, char, short, or int, compute c = (int) f.
                          iii.    If the field is a long, compute c = (int) (f ^ (f >>> 32)).
                          iv.    If the field is a float, compute c = Float.floatToIntBits(f).
                          v.     If the field is a double, compute
                                             long l = Double.doubleToLongBits(f),
                                             c = (int)(l ^ (l >>> 32))
                          vi.    If the field is an object reference then equals( ) calls equals( ) for this field. compute
                                             c =  f.hashCode()
                          vii.   If the field is an array, treat it as if each element were a separate field.
                                 That is, compute a hashCode for each significant element by applying above rules to  each
                                 element                           

                b.  Combine the hashCode c computed in step 2.a into result as follows:
                                             result = 37 * result + c;

       3. Return result.
       4. Look at the resulting hashCode() and make sure that equal instances have equal hash codes.

Here is an example of a class that follows the above guidelines
public class HashTest {
private String field1;
private short field2;
----

@Override
public int hashCode() {
int result = 17;
result = 37*result + field1.hashCode();
result = 37*result + (int)field2;
return result;
}
}

You can see that a constant 37 is chosen. The purpose to choose this number is that it is a prime number. We can choose any other prime number. Using prime number the objects will be distributed better over the buckets. I encourage user to explore the topic further by checking out other resources.

Apache HashCodeBuilder

Writing a good hashCode() method is not always easy. Since it can be difficult to implement hashCode() correctly, it would be helpful if we have some reusable implementations of these. 

The Jakarta-Commons org.apache.commons.lang.builder package is providing a class named HashCodeBuilder which is designed to help implementing hashCode() method. Usually developers struggle hard with implementing hashCode() method and this class aims to simplify the process.

Here is how you would implement a hashCode algorithm for our above class 

public class HashTest {
private String field1;
private short field2;
----

@Override
public int hashCode() {
return new HashCodeBuilder(83, 7)
.append(field1)
.append(field2)
.toHashCode();
}
}

Note that the two numbers for the constructor are simply two different, non-zero, odd numbers - these numbers help to avoid collisions in the hashCode value across objects.

If required, the superclass hashCode() can be added using appendSuper(int).

You can see how easy it is to override HashCode() using Apache HashCodeBuilder.

Mutable Object As Key

It is a general advice that you should use immutable object as a key in a Collection. HashCode work best when calculated from immutable data. If you use Mutable object as key and change the state of the object so that the hashCode changes, then the store object will be in the wrong bucket in the Collection 

The most important thing you should consider while implementing hashCode() is that regardless of when this method is called, it should produces the same value for a particular object every time when it is called. If you have a scenario like the object produces one hashCode() value when  it is put() in to a HaspMap and produces another value during a get(), in that case you would not be able to retrieve that object. Therefore, if you hashCode() depends on mutable data in the object, then made changing those data will surely produce a different key by generating a different hashCode().

Look at the example below

public class Employee {

private String name;
private int age;

public Employee() {
}

public Employee(String name, int age) {
this.name = name;
this.age = age;
}

public String getName() {
return name;
}

public void setName(String name) {
this.name = name;
}

public int getAge() {
return age;
}

public void setAge(int age) {
this.age = age;
}

@Override
public boolean equals(Object obj) {
//Remember: Some Java gurus recommend you avoid using instanceof
if (obj instanceof Employee) {
Employee emp = (Employee)obj;
return (emp.name == name && emp.age == age);
}
return false;
}

@Override
public int hashCode() {
return name.length() + age;
}

public static void main(String[] args) {
Employee e = new Employee("muhammad", 24);
Map<Object, Object> m = new HashMap<Object, Object>();
m.put(e, "Muhammad Ali Khojaye");

// getting output
System.out.println(m.get(e));

e.name = "abid";

// it fails to get
System.out.println(m.get(e));

e.name = "amirrana";

// it fails again
System.out.println(m.get(new Employee("muhammad", 24)));
}
}

So you can see in the above examples that how we are getting some unpredictable results.

You can easily fix the above by overriding the hashCode() using either Joshu Recipe or using HashCodeBuilder class.

Here is an example,

Joshu Recommendation
@Override
public int hashCode() {
int result = 17;
result = 37*result + name.hashCode();
result = 37*result + age;
return result;
}
Using HashCodeBuilder
@Override
public int hashCode() {
return new HashCodeBuilder(83, 7)
.append(name)
.append(age)
.toHashCode();
}

Another Example of Mutable Field as Key

Let consider the example 

public class HashTest {    
private int mutableField;
private final int immutableField;

public HashTest(int mutableField, int immutableField) {
this.mutableField = mutableField;
this.immutableField = immutableField;
}

public void setMutableField(int mutableField) {
this.mutableField = mutableField;
}

@Override
public boolean equals(Object o) {
if(o instanceof HashTest) {
return (mutableField == ((HashTest)o).mutableField)
&& (immutableField == ((HashTest)o).immutableField);
}else {
return false;
}
}

@Override
public int hashCode() {
int result = 17;
result = 37 * result + this.mutableField;
result = 37 * result + this.immutableField;
return result;
}

public static void main(String[] args) {
Set<HashTest> set = new HashSet<HashTest>();
HashTest obj = new HashTest(6622458, 626304);
set.add(obj);
System.out.println(set.contains(obj));
obj.setMutableField(3867602);
System.out.println(set.contains(obj));
}
}

After changing mutableField, the computed hashCode is no longer pointing to the old bucket and the contains() returns false. 

We can tackle such situation using either of these methods

  • Hashcode is best when calculated from immutable data; therefore ensure that only immutable object would be used as key with Collections.
  • Implement the hashCode() using our first technique i.e. return a constant value but you must aware that it would kills all those advantage of bucket mechanism.
  • If you need mutable fields included in the hashCode method then you can calculate and store the hash value when the object is created and whenever you update mutable field, you must first remove it from the collection(set/map) and then add it back to the collection after updating it. 

References and More Information

Effective Java

Object Class

HashCodeBuilder

http://www.javaranch.com/


- See more at: http://muhammadkhojaye.blogspot.com/2010/02/java-hashing.html 

Published at DZone with permission of Muhammad Khojaye, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Yair Ogen replied on Tue, 2009/10/27 - 2:19am

you might want to add that eclipse has a built-in hash code method generator...

Florent Ramiere replied on Tue, 2009/10/27 - 3:44am

You can see also other strategies when working with Hibernate here http://blog.springfuse.com/2009/10/testing-various-equals-and-hashcode_3597.html

Slava Lo replied on Tue, 2009/10/27 - 5:22am

Is it a bug in Employee.equals method implementation when 'name' field is compared using reference equality ?

Should it really be:


35.               return (name.equals(emp.name) && emp.age == age);

?

Muhammad Khojaye replied on Tue, 2009/10/27 - 6:19am in response to: Slava Lo

Yes. It should compare with equal, otherwise

e = new Employee(new String("muhammad"), 24);

would fail to get.

 

 

Developer Dude replied on Tue, 2009/10/27 - 11:10am

Thanks. I have used EqualsBuilder and HashCodeBuilder for years, in part due to collections issues, but I had not thought about what happens when you change a mutable object that is in a collection.

I try to use immutable objects for some method arguments for thread safety, but for other reasons (DI, iBatis, etc.) many of my value objects, where I usually override equals() and hashcode() and often these wind up in collections, are mutable and the builder pattern of making immutable objects seems like overkill in most cases.

I am not saying my way is optimum, but it works for the apps I write.

Unfortunately, most of the value objects I have seen written by others, including many that wind up in collections, rarely have equals() or hashcode() overriden, and if they do it is usually done improperly. It is sad that such basic best practices and principles that have been known for years are ignored (even unknown) while everyone tries to follow all the bleeding edge tech and methodologies/paradigms.

Aljoscha Rittner replied on Tue, 2009/10/27 - 12:32pm

In Netbeans we have an equals/hashcode wizard, too.

br, josh.

Alex Miller replied on Wed, 2009/10/28 - 8:29am

Probably also of interest is that a new java.util.Objects class is currently under discussion for JDK 7.  Some of the methods under consideration are to help with generating hash codes safely in the presence of nulls, good hash codes for arrays, etc.  You can see more on the core-libs-dev mailing list.

Amit Jagtap replied on Wed, 2009/11/04 - 4:20am

Hi,

Great article. Thanks for posting this.

To counter the - Objects that are not equal but return the same hashCode 

I was looking at the way hashcode value is computed for String objects and found it uses an almost similar formula, the one described by Joshua Bloch in Effective Java. Can we take advantage of String class's hashcode implementation for computing our hashcode? Some thing like the following -

 

public int hashCode() { 
return this.toString().hashCode();
}

 This check in addendum to the above discussed can help in circumventing the not-equal-same-hashcode situtation.I would like to hear your thoughts on this.

 thanks

 

Piotr Powalowski replied on Wed, 2009/11/04 - 7:27am

Have you seen the Objects.hashCode(Object...) method from Google collections? It is basically a helper that can calculate hashCode for given set of important object's values - the same as apache HashCodeBuilder, but simplier.

Mohamed El-beltagy replied on Wed, 2009/11/04 - 8:12am

Reading the code above in the "Mutable Object As Key" section and Joshu reciepe for fixing it, I noticed that Joshu relys on the String.hasCode() implementation. So I decided to check if a reference to two different strings would generate the same hashCode (which is of course non logic). Here's the code I tried:

public static void main(String[] args) {
int age = 45;

String s = "Initial text to test";
System.out.println(s.hashCode());
int result = 17;
result = 37 * result + s.hashCode();
result = 37 * result + age;
System.out.println(result);

s = "We are going to change the string and see what happens now";
System.out.println(s.hashCode());
result = 17;
result = 37 * result + s.hashCode();
result = 37 * result + age;
System.out.println(result);
}

 I printed both the string's hashCode and the hashCode implementation's result using Joshu receipe and, as expected, got different values. Of course, using any of these hashCodes for storing into a hash collection will mean storing two differenet items in the collection.

So, depending on a String's hashcode while calculating mutable objects in a hashed collection would break the contract anyway. Unless of course, the string we are using in our calculations would not change once initiated in our object.

BTW, depending on a string that might change in an equal comparison, would break the equality contract as well.

 

Thinking of Amit suggesstion, this.toString().hashCode(), we have three possible scenarios:

1- The same VM: would work fine.

2- Two different VMs (for example, serialization using RMI or clustered environment): the result would change as the default toString method returns the object's full class name appended with the its full name's hash code. The toString() default implementation, accourding to Java API documentation, is

getClass().getName() + '@' + Integer.toHexString(hashCode())

And the hashCode() according to the Java doc says:

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer).

So, based on the dependency on the internal address of the object and due to the fact that we are on two different VMs; default hashCode would generate different values. (Analytically speaking. I hope someone with real life practice would share his info with us and correct me if I'm wrong).

3- toString() method is overridden with custom impl. In this case, if the string returned is always the same, it would generate the same hash code, otherwise....

As a conclusion, IMHO and generally speaking,  if the string we are using in our calculations would not change once initiated in our object; we are safe. Otherwise, do not depend on it.

Or in other words, to implement hashCode method always depend on unmodifiable state of the object.

Another solution could be to store the hash code once generated and always return it afterwords. Haven't thought of the implefications of such a solution, but I guss if the hashed collection would only use the hash code to select the bucket, then we are safe with this solution (as it still have to call the equals method to all keys in that bucket to decide if the two keys are equal).

From the Java API docs:

The general contract of hashCode is:

  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

 According to the last point, and as a response to the next case you mentioned (two unequal objects return the same hash code), it would improve the performance of hash tables. But that does not mean that it will behave in a wrong way (as the hashed collection would still call the equals method against all elements in that bucket).

Note: I did not try HashCodeBuilder from Apache

Frank Lemke replied on Sun, 2012/04/15 - 10:54am

If you want to use collections but be independent of the equals()- and hashCode()-implementations, look at he-collections on sourceforge.net. He-collections can delegate to other methods than equals()- and hashCode()...

Shwetank Sharma replied on Tue, 2013/05/28 - 3:49pm

hi  Muhammad,


In Topic "Mutable object as key" , You describe that why mutable object are not working with HashMap, when we are changing object name its hashcode also changed.

But after that you provided a solution for above problem your statement:

You can easily fix the above by overriding the hashCode() using either Joshu Recipe or usingHashCodeBuilder class.

But by using both solution which you suggest, not providing solution, hash map get method is is returning null.

my opinion: HashMap is always work with immutable object otherwise it will be return null while we will use mutable object & changes its property.


Muhammad Khojaye replied on Thu, 2013/07/18 - 5:53pm in response to: Shwetank Sharma

Yes you are right and It mention in another example in the article "Another Example of Mutable Field as Key ". In order to work with Mutable field as key, you can calculate and store the hash value when the object is created and whenever you update mutable field, you must first remove it from the collection(set/map) and then add it back to the collection after updating it.

Kushal Bajaj replied on Tue, 2013/07/23 - 4:10am

 the best explanation over www .thanks mate

Lokesh Gupta replied on Wed, 2013/07/24 - 2:13am

 I am in great confusion now. How the default hashcode() method compute the hashCode for any object. Is it internal structure, or its state or its memory location. What's it??

http://howtodoinjava.com/2012/10/09/working-with-hashcode-and-equals-methods-in-java/

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.