Equals and HashCode
From Software By Jeff
It happens far too often that developers shortcut creating equals() and hashCode() in their objects. This discussion hopes to enlighten those short-cutters, and introduce a simple solution to their problem.
Implementing equals() and hashCode() rests on one assumption: that an object is distinguished by its content, not its instantiation. If instances of a class never need to be distinguished by their content, then there is no reason to implement these methods. If, however, the contents of the object do need to be compared, then implementing them is not only strongly encouraged, but quite likely required.
Equals and HashCode
There are way too many other resources explaining the details of the "equals-hashCode contract" to go over it in much detail here, but I'll hit the high points just to make sure that everyone is on the same page.
To start with, a basic understanding of Object (http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Object.html) is assumed, particularly the implications of equals().
Equals
The basic idea of equals() is to determine if two objects have the same value. The trouble comes in determining what the "value" of an object is. An object, for this discussion, is an aggregation of properties (member variables) and optionally methods (member "functions" and "procedures") that act on those properties. An object without properties doesn't concern this discussion, and the presence or absence of methods is not going to affect the evaluation of its properties.
Some properties of equals() must hold true for every notion of equality. These are discussed in detail in the Object JavaDoc, but here is the gist of what must be maintained. They're essentially the same as the properties of equality in mathematics.
If A is equal to B, then B must be equal to A. In Java, that means that a.equals(b) must have the same result as b.equals(a).
If A is equal to B and B is equal to C, then A must also be equal to C. In Java, that means that if a.equals(b) and b.equals(c) are both true, then a.equals(c) must be true as well.
Finally, the same results should happen for all instances of the object with the same compared values. That is, the results need to be consistent. Every time we make a new instance of the object, and fill its members with the same values, we should get the same result out of the equals() evaluation.
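The three properties above can be checked mechanically. The following is a minimal sketch that uses String (which already compares by value) as a stand-in; the class name EqualsPropertiesDemo and its helper methods are made up for this illustration.

```java
class EqualsPropertiesDemo {
    // Symmetry: a.equals(b) and b.equals(a) must agree.
    static boolean symmetric(Object a, Object b) {
        return a.equals(b) == b.equals(a);
    }

    // Transitivity: if a equals b and b equals c, then a must equal c.
    static boolean transitive(Object a, Object b, Object c) {
        return !(a.equals(b) && b.equals(c)) || a.equals(c);
    }

    public static void main(String[] args) {
        // new String(...) forces three distinct instances with the same content.
        String a = new String("jeff");
        String b = new String("jeff");
        String c = new String("jeff");

        System.out.println(a == b);              // false: different instances
        System.out.println(symmetric(a, b));     // true
        System.out.println(transitive(a, b, c)); // true
        // Consistency: fresh instances with the same content still compare equal.
        System.out.println(new String("jeff").equals(new String("jeff"))); // true
    }
}
```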
With that understanding, consider the Object.equals() that every object automatically inherits. This implementation technically meets those criteria; however, it does not truly reflect the equality of an object based on its values.
Object.equals() compares the object references, and returns true if, and only if, they refer to the same object. That is, even if different variables are used, the variables must refer to the same instance. Essentially, Object.equals() only returns true when a.equals(a) is the case being tested.
Object a = new Object();
Object b = a;
a.equals(b); // This is true
b = new Object();
a.equals(b); // This is false
One other criterion of equals() is that if a class implements equals(), it must also implement hashCode(). The explanation is pretty straightforward.
HashCode
An object's hashCode() relates to equals() in one simple statement.
Equal objects must have equal hashCode.
That's all. It is often misunderstood that hashCode() needs to somehow uniquely identify an object or reflect differences in equals(), but that is expressly not the case. It is very simply the case that if a.equals(b) then it must be true that a.hashCode() == b.hashCode().
It is also required that hashCode() needs to be consistent. Therefore, for any set of values in an object, the same hashCode() needs to be presented. That's all.
If the objects do not equal each other, their hashCode() is irrelevant; they may be the same or different. It is not necessary to go to any lengths to ensure that if a.equals(b) == false that a.hashCode() != b.hashCode().
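As a small illustration of both halves of that rule, the sketch below uses String, whose hashCode() is documented in the JDK. The strings "Aa" and "BB" are a well-known pair that are not equal yet share a hash code; the class name HashCollisionDemo is made up for this example.

```java
class HashCollisionDemo {
    public static void main(String[] args) {
        String a = "Aa";
        String b = "BB";
        // Unequal objects are allowed to share a hash code.
        System.out.println(a.equals(b));                  // false
        System.out.println(a.hashCode() == b.hashCode()); // true (both are 2112)

        // Equal objects are required to share a hash code.
        String c = new String("Aa");
        System.out.println(a.equals(c));                  // true
        System.out.println(a.hashCode() == c.hashCode()); // true
    }
}
```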
One may argue that this means that hashCode() could simply return a constant value and the contract would be met. This is indeed true, but it is also fair to say that, as much as is reasonably possible, distinct values in an object should return different hashCode() values. The word "reasonable" allows some flexibility and keeps us from going crazy with our hashCode() implementations.
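To make the constant-value case concrete, here is a minimal sketch of a class whose hashCode() always returns 42. The class ConstantHash and the helper distinctCount are hypothetical names for this illustration. A HashSet still behaves correctly, because it falls back on equals() when hash codes collide; what is lost is the ability to spread objects across buckets, so lookups degrade as the set grows.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class: equality is by value, but every instance hashes to 42.
final class ConstantHash {
    final int value;

    ConstantHash(int value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof ConstantHash && ((ConstantHash) other).value == value;
    }

    @Override
    public int hashCode() {
        return 42; // meets the contract, but every object collides
    }
}

class ConstantHashDemo {
    // Adds each value twice and reports how many distinct entries the set kept.
    static int distinctCount(int n) {
        Set<ConstantHash> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add(new ConstantHash(i));
            set.add(new ConstantHash(i)); // equal by value, so not added again
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount(100)); // 100
    }
}
```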
Object gets in the way here, too, because if no hashCode() is provided by a class, the one Object provides is essentially a representation of the object in memory. Again, this meets the equals-hashCode contract, because when an object is compared to itself, equals() will be true and the hashCode() will be the same.
Reasons to Use
Almost as important as understanding what equals() and hashCode() have to do with each other and some of their pitfalls, is understanding why a developer might need to implement these methods. This can be summed up with one word: Collection
OK, so a few words should be used, but this gives a good starting point. If an object will ever be put into a collection of any kind (Map, Set, List, etc.), then it should implement equals() and hashCode().
This again supposes that the uniqueness of an object is determined by the values of its members. In order for an object to be found by a Collection.contains() method, or to be correctly distinct in a Set or as a key in a Map, equals() needs to be implemented.
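The difference is easy to see with two tiny beans, one that inherits the Object methods and one that overrides them. This is only a sketch; PlainId, ValueId, and the helper methods are hypothetical names invented for the illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical bean that inherits Object.equals() and Object.hashCode().
class PlainId {
    final int id;

    PlainId(int id) {
        this.id = id;
    }
}

// Hypothetical bean that compares by value.
class ValueId {
    final int id;

    ValueId(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof ValueId && ((ValueId) other).id == id;
    }

    @Override
    public int hashCode() {
        return id;
    }
}

class CollectionLookupDemo {
    // contains() relies on equals(), so a new-but-identical PlainId is not found.
    static boolean listFindsPlain() {
        List<PlainId> list = new ArrayList<>();
        list.add(new PlainId(7));
        return list.contains(new PlainId(7));
    }

    static boolean listFindsValue() {
        List<ValueId> list = new ArrayList<>();
        list.add(new ValueId(7));
        return list.contains(new ValueId(7));
    }

    // A Set keeps one entry per distinct value only when equals()/hashCode() say so.
    static int setSizeAfterDuplicateAdd() {
        Set<ValueId> set = new HashSet<>();
        set.add(new ValueId(7));
        set.add(new ValueId(7));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(listFindsPlain());           // false
        System.out.println(listFindsValue());           // true
        System.out.println(setSizeAfterDuplicateAdd()); // 1
    }
}
```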
Many other good examples exist, but this is fairly trivial and tremendously common, and gives good reason for nearly every object with properties to implement equals() and hashCode().
Sample
Take for this discussion this very trivial object.
public class Simple {
    public Integer objectInteger = null;
    public int primitiveInt = 0;
}
This bean (for purposes of this and most other discussions, an object with properties and trivial methods) simply contains a small number of, well, numbers. We have one each of the object Integer and the primitive int. They're all made public to keep the source short; in a proper bean they'd be private and have getter and setter methods to access them.
If we take this bean as written, there is no way to determine if two instances are equal based on their values.
Simple a = new Simple();
Simple b = new Simple();
a.primitiveInt = 1;
b.primitiveInt = 1;
a.equals(b); // This is false!
Since the snippet is not executed here, the comment must be trusted; go ahead and make the necessary files and test it out, and the result will be as noted. The code sample shows the instantiation of two Simple objects, each given the same int value of one.
It should probably be the case that these objects would be evaluated as equal, however. Looking at it logically, the object members are both null, and the int value in each instance has been set to one. These objects likely represent the same concept for which it was written.
The equals method for this object would be pretty simple to write. One of the members is an object, so care needs to be taken to take null into account, but this class could be corrected in one pass.
public class Simple {
    public Integer objectInteger = null;
    public int primitiveInt = 0;

    @Override
    public boolean equals(Object object) {
        // instanceof is also false for null, so this rejects null arguments.
        if (!(object instanceof Simple))
            return false;
        Simple simple = (Simple) object;
        // Same instance, or same primitive and null-safe equal Integer.
        return (this == simple) ||
            ((primitiveInt == simple.primitiveInt) &&
            ((objectInteger == null)
                ? (simple.objectInteger == null)
                : objectInteger.equals(simple.objectInteger)));
    }
}
Looking at the new equals() method in detail, the first thing done is to ensure that the comparison object is of the same type, and if that is not the case, surely there is no match. One helpful thing about the use of instanceof is that it also catches the case of a null value passed, and an instance of a class is definitely not equal to null. The next line simply casts the parameter as our type. This can surely be done in-line, but since the comparison uses the value a few times, this is more readable.
The last line (the return) is one likely to be assaulted by style hawks. The bean is simple, with two members, so rather than a series of if statements, one properly grouped boolean expression can do the job.
The first bit compares to see if this instance is the same as the passed instance. Since the next operator is an or, the operation will return true at this point if someone is doing some derivative of a.equals(a).
Should a different instance be checked, the next part starts comparing the members. The and operator between the member checks will cause the evaluation to stop at the first failure, so the primitives are compared first. Should they be the same, the object is compared. The object comparison is done with a ternary operator to first compare for null values, then to compare the values of the objects. This provides a nice null-safe comparison that will return true if the primitives have the same value and the objects are either both null or represent the same Integer value.
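On Java 7 and later, the same null-safe comparison is available from the standard library as java.util.Objects.equals(), which returns true when both arguments are null, false when exactly one is null, and otherwise delegates to the first argument's equals(). The sketch below assumes a Java 7+ runtime (the article itself links to the 1.5 docs); the class and method names are made up for this illustration.

```java
import java.util.Objects;

class NullSafeEqualsDemo {
    // Same outcome as the ternary: true if both are null,
    // or both are non-null and represent the same value.
    static boolean sameInteger(Integer left, Integer right) {
        return Objects.equals(left, right);
    }

    public static void main(String[] args) {
        System.out.println(sameInteger(null, null));                                   // true
        System.out.println(sameInteger(null, Integer.valueOf(1)));                     // false
        System.out.println(sameInteger(Integer.valueOf(1), null));                     // false
        System.out.println(sameInteger(Integer.valueOf(1000), Integer.valueOf(1000))); // true
    }
}
```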
In the case that an object has some members that don't factor into the equality of the object, simply leave them out of the boolean equation, and they'll be ignored. A good example of this would be a bean that represented a row in a database; in this case, the equals comparison should only include any members that would match the database primary key (e.g., a unique ID).
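A rough sketch of the database-row case might look like the following. The CustomerRow class and its fields are hypothetical; id stands in for the primary key and name for any column that should not affect equality. Long.hashCode(long) assumes Java 8 or later.

```java
// Hypothetical bean for a database row; `id` stands in for the primary key.
class CustomerRow {
    final long id;
    String name; // deliberately left out of equals() and hashCode()

    CustomerRow(long id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof CustomerRow))
            return false;
        return id == ((CustomerRow) other).id; // only the key participates
    }

    @Override
    public int hashCode() {
        return Long.hashCode(id); // uses the same member as equals()
    }
}

class CustomerRowDemo {
    public static void main(String[] args) {
        CustomerRow before = new CustomerRow(17L, "Jeff");
        CustomerRow after = new CustomerRow(17L, "Jeffrey");
        System.out.println(before.equals(after));                  // true: same key
        System.out.println(before.hashCode() == after.hashCode()); // true
        System.out.println(before.equals(new CustomerRow(18L, "Jeff"))); // false
    }
}
```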
Implementing hashCode() is just about as simple. The key is to try to get a consistent result out of any calculation done within. The sample below shows a hashCode() that will work for the related equals().
public class Simple {
    public Integer objectInteger = null;
    public int primitiveInt = 0;

    @Override
    public boolean equals(Object object) {
        if (!(object instanceof Simple))
            return false;
        Simple simple = (Simple) object;
        return (this == simple) ||
            ((primitiveInt == simple.primitiveInt) &&
            ((objectInteger == null)
                ? (simple.objectInteger == null)
                : objectInteger.equals(simple.objectInteger)));
    }

    @Override
    public int hashCode() {
        // Start from a class-specific seed, then add each member used by equals().
        long hashCode = 10002003L + primitiveInt;
        if (objectInteger != null)
            hashCode += objectInteger.hashCode();
        // Truncate the long to an int for return.
        return (int) hashCode;
    }
}
One key to note is that the hashCode() includes both of the member variables since our equals does as well. Since objects that result in equals() resolving to true must have the same hashCode() it is recommended that the hashCode() calculation include a bit for each represented variable.
Another thing that is easy to spot is that the hashCode() calculation uses a long, but returns only an int. This is done simply as a precaution against overflow while summing (Java integer arithmetic wraps around silently rather than raising an error, so the long is a matter of caution rather than necessity). It is also obvious that the calculation does not start with zero. It is recommended that each class start with a distinct value. It's unclear exactly why, but it's easy to accommodate. An easy trick for Serializable classes is to use the serialVersionUID as the seed.
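For the curious, the effect of summing in a long and truncating can be compared with summing directly in an int. The sketch below is only an illustration with made-up names (OverflowDemo, viaLong, viaInt); it relies on the fact that a narrowing cast keeps the low 32 bits and that int addition wraps around, so for additive calculations like this one the two approaches give the same answer.

```java
class OverflowDemo {
    // Sum in a long, then truncate: the cast keeps the low 32 bits.
    static int viaLong(int seed, int a, int b) {
        long sum = (long) seed + a + b;
        return (int) sum;
    }

    // Sum directly in an int: overflow wraps around, no exception is thrown.
    static int viaInt(int seed, int a, int b) {
        return seed + a + b;
    }

    public static void main(String[] args) {
        int seed = 10002003;
        // Values chosen to push the sum past Integer.MAX_VALUE.
        System.out.println(viaLong(seed, Integer.MAX_VALUE, 5)
                == viaInt(seed, Integer.MAX_VALUE, 5)); // true
    }
}
```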
It is easy to see that (after being checked for null) the member object is asked for its hashCode(), not its value. It happens to be the case that for an Integer the hashCode() will be its value, but if this pattern is repeated, it will become easy to add any kind of object, including a String, or even a member variable of type Simple.
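On Java 7 and later, the "ask each member for its hashCode(), null-safely" pattern is also packaged in java.util.Objects.hash(). The sketch below is an alternative to the hand-rolled sum, not the approach used above; it assumes a Java 7+ runtime, and the class SimpleWithName and its name member are hypothetical additions for the illustration.

```java
import java.util.Objects;

// Hypothetical variant of Simple with an extra String member.
class SimpleWithName {
    Integer objectInteger = null;
    int primitiveInt = 0;
    String name = null;

    @Override
    public boolean equals(Object object) {
        if (!(object instanceof SimpleWithName))
            return false;
        SimpleWithName other = (SimpleWithName) object;
        return primitiveInt == other.primitiveInt
                && Objects.equals(objectInteger, other.objectInteger)
                && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        // Objects.hash() treats null members as 0 and combines the rest
        // from each member's own hashCode().
        return Objects.hash(primitiveInt, objectInteger, name);
    }
}

class SimpleWithNameDemo {
    public static void main(String[] args) {
        SimpleWithName a = new SimpleWithName();
        SimpleWithName b = new SimpleWithName();
        a.name = "jeff";
        b.name = new String("jeff");
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```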
A word of caution: where a class contains a member reference of its own type (a self-made linked list, for example) and that member refers back to the same instance, an infinite loop would be created. A simple check protects the code from that; it is as simple as adding the following to the calculation (assuming our Simple class had a member of type Simple named simple):
if (simple != null && simple != this) hashCode += simple.hashCode();
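Here is a minimal, self-contained sketch of that guard in a hypothetical LinkNode class (the class and its next member are made up for this illustration, and equals() is omitted for brevity). Note that the check only protects against a node that points directly at itself; a longer cycle, such as a pointing to b and b pointing back to a, would still recurse without end and needs a different strategy.

```java
// Hypothetical linked node; `next` plays the role of the self-typed member.
class LinkNode {
    int value;
    LinkNode next;

    @Override
    public int hashCode() {
        long hashCode = 10002003L + value;
        // Skip the member when it is absent or is this very instance.
        if (next != null && next != this)
            hashCode += next.hashCode();
        return (int) hashCode;
    }
}

class SelfReferenceDemo {
    public static void main(String[] args) {
        LinkNode node = new LinkNode();
        node.value = 5;
        node.next = node; // direct self-reference
        // Without the guard this call would never return normally.
        System.out.println(node.hashCode()); // 10002008
    }
}
```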
Finally, the long used in the calculation is truncated to an int for return.
Now, looking at this, it will be the case that the same hashCode() is possible for objects with different but similar values. This is perfectly acceptable, and does not violate any part of the equals-hashCode contract. The following example shows how the above Simple class would have the same hashCode() for objects that are not evaluated as equal.
Simple a = new Simple();
Simple b = new Simple();
a.primitiveInt = 1;
b.objectInteger = Integer.valueOf(1);
a.equals(b); // This is false!
a.hashCode() == b.hashCode(); // This is OK.
Yes, it is known that the last line won't compile as written; it's an example. For those who wonder why: it is a boolean expression in the middle of nowhere, so put it in an if statement or assign it to a variable to make it compile.