Identity Field, Equality and Hash Code

In this post I'll describe a possible base class for domain entities that uses a surrogate key as its identity field and provides Equals and GetHashCode implementations based on that key.

Introduction

Martin Fowler writes in his PoEAA book: "The identity field saves a database ID field in an object to maintain identity between an in-memory object and a database row."

And further he states: "The first concern is whether to use meaningful or meaningless keys. A meaningful key is something like the U.S. Social Security Number... A meaningless key is essentially a random number the database dreams up that's never intended for human use."

There are many reasons why meaningful keys are often NOT good candidates for an identity field. Above all, they are often not immutable (human errors get corrected after the fact) and not guaranteed to be unique. Hence Martin Fowler states: "... As a result, meaningful keys should be distrusted. ..."

Having provided some background on the ongoing dispute about what makes a good identity field, I'll now make my choice: I always use meaningless keys as identity fields. Such fields are often called surrogate keys. Important: "The surrogate key is not derived from application data."

My favorite type of surrogate key is a GUID (globally unique identifier). The algorithm used to generate a new GUID makes it practically impossible to generate the same ID twice (the probability of a collision is vanishingly small).

NHibernate supports GUID as one possible type for the identity field.

Problem Description

When working with NHibernate one often uses a special type of collection known as a set. A set is a collection that contains no duplicate elements. More formally, a set contains no pair of elements e1 and e2 such that e1.Equals(e2), and at most one null element. As the .NET framework (before version 3.5) does not provide a set, NHibernate uses the Iesi.Collections library, which contains a set implementation.
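To see Equals acting as the "no duplicates" predicate, here is a small sketch. It uses HashSet<T> from System.Collections.Generic (available since .NET 3.5) as a stand-in for the Iesi set, on the assumption that both decide duplicates the same way (GetHashCode to find candidates, then Equals). Strings are used because they already override Equals to compare contents.

```csharp
using System;
using System.Collections.Generic;

public static class SetDemo
{
    public static void Main()
    {
        // Two distinct string instances with the same characters.
        string first = new string(new[] { 'a', 'b', 'c' });
        string second = new string(new[] { 'a', 'b', 'c' });

        var set = new HashSet<string>();
        bool addedFirst = set.Add(first);   // the set was empty
        bool addedSecond = set.Add(second); // rejected: first.Equals(second) is true

        Console.WriteLine(ReferenceEquals(first, second)); // False: different instances
        Console.WriteLine(addedFirst);                     // True
        Console.WriteLine(addedSecond);                    // False
        Console.WriteLine(set.Count);                      // 1
    }
}
```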

The definition above also names the predicate that decides whether two elements are the same: the Equals method. By default (for reference types) Equals compares references, so if two variables e1 and e2 refer to two different instances of a class, Equals returns false even if both instances hold the same data. But we want the identity field to be the relevant part of the comparison: if two different instances have the same identity field, then they are equal (that is, they refer to the same database record).
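A minimal sketch of the default behavior, using a hypothetical PlainCustomer class that overrides neither Equals nor GetHashCode. Two instances carrying the same Id (as if the same row had been loaded twice) are still considered different:

```csharp
using System;

// Hypothetical entity that does NOT override Equals or GetHashCode.
public class PlainCustomer
{
    public Guid Id { get; set; }
}

public static class DefaultEqualityDemo
{
    public static void Main()
    {
        Guid id = Guid.NewGuid();

        // Two instances that represent the same database row.
        var loadedOnce = new PlainCustomer { Id = id };
        var loadedTwice = new PlainCustomer { Id = id };

        // System.Object.Equals compares references, not Ids.
        Console.WriteLine(loadedOnce.Equals(loadedTwice)); // False
        Console.WriteLine(loadedOnce.Equals(loadedOnce));  // True (same instance)
    }
}
```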

Implementation

The default implementation of Equals lives in the System.Object class, from which every class in .NET inherits, implicitly or explicitly. Fortunately Equals is virtual, so we are able to override it. But when we override Equals we must also override GetHashCode, because two objects that are equal must return the same hash code; hash-based collections such as sets rely on this rule.
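The following sketch illustrates why Equals and GetHashCode must be overridden together. Both classes are hypothetical and compare instances by Id; only the second one also derives its hash code from Id, so only the second one lets a hash-based set recognize the duplicate:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class: overrides Equals but forgets GetHashCode
// (the C# compiler warns about this with CS0659).
public class EqualsOnly
{
    public Guid Id { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as EqualsOnly;
        return other != null && other.Id == Id;
    }
}

// Hypothetical class: overrides both, consistently based on Id.
public class EqualsAndHash
{
    public Guid Id { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as EqualsAndHash;
        return other != null && other.Id == Id;
    }

    // Equal objects (same Id) always produce the same hash code.
    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}

public static class HashContractDemo
{
    public static void Main()
    {
        Guid id = Guid.NewGuid();

        var broken = new HashSet<EqualsOnly>
        {
            new EqualsOnly { Id = id },
            new EqualsOnly { Id = id }
        };
        // The default hash codes of the two instances almost certainly differ,
        // so Equals is usually never consulted and the "duplicate" stays in.
        Console.WriteLine(broken.Count); // usually 2

        var correct = new HashSet<EqualsAndHash>
        {
            new EqualsAndHash { Id = id },
            new EqualsAndHash { Id = id }
        };
        Console.WriteLine(correct.Count); // 1
    }
}
```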

Assuming that we use a GUID called Id as the identity field, we can define the following base class, from which all our domain classes will inherit directly or indirectly:

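using System;

public class IdentityFieldProvider<T>
    where T : IdentityFieldProvider<T>
{
    private Guid _id;
 
    public virtual Guid Id
    {
        get { return _id; }
        set { _id = value; }
    }

    // Equals, GetHashCode and the == / != operators shown below
    // are members of this class as well.
}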

Now let's override the Equals method. A possible solution is:

public override bool Equals(object obj)
{
    // case 1: obj is null or not of type T, so never equal
    T other = obj as T;
    if (other == null)
        return false;
 
    // case 2: comparing two NEW (transient) objects
    bool otherIsTransient = Equals(other.Id, Guid.Empty);
    bool thisIsTransient = Equals(Id, Guid.Empty);
    if (otherIsTransient && thisIsTransient)
        return ReferenceEquals(other, this);
 
    // case 3: compare the identity fields
    return other.Id.Equals(Id);
}

We have to distinguish three cases. In the first, the user/developer compares the object with null or with an object of a different type. This case is trivial: the answer is ALWAYS "not equal". In the second, both objects are new (also called transient), that is, neither has an Id yet; they are then equal only if the two references point to the same instance. In the third, we simply use the Equals method of the Guid type to compare the Ids. This also covers the case where only one of the objects is transient: its Id is Guid.Empty while the other's is not, so the result is "not equal".

Now we also have to override the GetHashCode method, which is likewise inherited from System.Object.

private int? _oldHashCode;
 
public override int GetHashCode()
{
    // Once we have a hash code we'll never change it
    if (_oldHashCode.HasValue)
        return _oldHashCode.Value;
 
    bool thisIsTransient = Equals(Id, Guid.Empty);
    
    // When this instance is transient, we use the base GetHashCode()
    // and remember it, so an instance can NEVER change its hash code.
    if (thisIsTransient)
    {
        _oldHashCode = base.GetHashCode();
        return _oldHashCode.Value;
    }
    return Id.GetHashCode();
}

Now, why this kind of code, you might ask? Well, an object should never ever change its hash code during its life, that is, from the moment it is instantiated until it is discarded. If an object is restored from the database there is no problem, since every existing database record has a well-defined and unique identity field. Thus we can derive the hash code from this Id field, which is what the last line of the code snippet above does.

A little more problematic is the case where a new object is created in memory: its identity field is then undefined (the object has not been saved to the database yet and is therefore considered transient). In our case, undefined means that the Id field has the value Guid.Empty. In this case we use the default implementation (from System.Object) of GetHashCode to generate a hash code, but we store it in an instance variable for later reference.

Later in its life cycle the instance may be persisted to the database (while still living in memory). At that moment NHibernate assigns a new unique value to the Id field of the instance. Now the object is no longer transient, but the first two statements of the method ensure that its hash code does not change. It is still the same object as before; it has just been made persistent.
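To see what the remembered hash code protects against, here is a sketch of the naive alternative: a hypothetical NaiveEntity whose hash code is always derived from the current Id. The save is simulated by assigning a fixed Guid by hand (an assumption standing in for NHibernate). Once the Id changes, the set can no longer find the very instance it contains:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity whose hash code always follows the current Id.
public class NaiveEntity
{
    public Guid Id { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as NaiveEntity;
        if (other == null) return false;
        // Two transient instances are equal only if they are the same instance.
        if (Id == Guid.Empty && other.Id == Guid.Empty) return ReferenceEquals(this, other);
        return Id == other.Id;
    }

    // Changes as soon as Id changes: this is the problem.
    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}

public static class ChangingHashDemo
{
    public static void Main()
    {
        var entity = new NaiveEntity();          // transient: Id == Guid.Empty
        var set = new HashSet<NaiveEntity> { entity };
        Console.WriteLine(set.Contains(entity)); // True

        // Simulate the ORM assigning an Id when the entity is saved.
        entity.Id = new Guid(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);

        // The set stored the entity under its old hash code, so it can no
        // longer find it, even though it is the very same instance.
        Console.WriteLine(set.Contains(entity)); // False
    }
}
```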

Finally we can also overload the two operators '==' and '!=' so that two instances can be compared with these operators instead of only with the Equals method.

public static bool operator ==(IdentityFieldProvider<T> x, IdentityFieldProvider<T> y)
{
    return Equals(x, y);
}
 
public static bool operator !=(IdentityFieldProvider<T> x, IdentityFieldProvider<T> y)
{
    return !(x == y);
}
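Putting the pieces together, the following sketch assembles the members from this post into one class and exercises them with a hypothetical Customer entity. Assigning the Id by hand stands in for what NHibernate does on save; no NHibernate code is involved.

```csharp
using System;
using System.Collections.Generic;

// The members from this post, assembled into one class.
public class IdentityFieldProvider<T>
    where T : IdentityFieldProvider<T>
{
    private Guid _id;
    private int? _oldHashCode;

    public virtual Guid Id
    {
        get { return _id; }
        set { _id = value; }
    }

    public override bool Equals(object obj)
    {
        T other = obj as T;
        if (other == null)
            return false;

        // Two transient objects are equal only if they are the same instance.
        bool otherIsTransient = Equals(other.Id, Guid.Empty);
        bool thisIsTransient = Equals(Id, Guid.Empty);
        if (otherIsTransient && thisIsTransient)
            return ReferenceEquals(other, this);

        return other.Id.Equals(Id);
    }

    public override int GetHashCode()
    {
        // A hash code handed out while transient is remembered and reused.
        if (_oldHashCode.HasValue)
            return _oldHashCode.Value;

        if (Equals(Id, Guid.Empty))
        {
            _oldHashCode = base.GetHashCode();
            return _oldHashCode.Value;
        }
        return Id.GetHashCode();
    }

    public static bool operator ==(IdentityFieldProvider<T> x, IdentityFieldProvider<T> y)
    {
        return Equals(x, y); // static object.Equals handles nulls
    }

    public static bool operator !=(IdentityFieldProvider<T> x, IdentityFieldProvider<T> y)
    {
        return !(x == y);
    }
}

// Hypothetical entity used only for this demo.
public class Customer : IdentityFieldProvider<Customer>
{
}

public static class EntityDemo
{
    public static void Main()
    {
        var a = new Customer();
        var b = new Customer();
        Console.WriteLine(a == b);          // False: two different transient instances

        var set = new HashSet<Customer> { a };

        // Simulate NHibernate assigning an Id when 'a' is saved.
        a.Id = Guid.NewGuid();
        Console.WriteLine(set.Contains(a)); // True: the hash code did not change

        var reloaded = new Customer { Id = a.Id };
        Console.WriteLine(a == reloaded);   // True: same Id
        Console.WriteLine(a == null);       // False
    }
}
```

One trade-off worth keeping in mind with this design: after the save, a and reloaded are equal, but a still returns its transient-era hash code while reloaded returns Id.GetHashCode(), so a hash-based set that contains a will most likely not report that it contains reloaded.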

That's it. You can now use this class as the base for every entity class in your domain and never have to think about the identity field and the equality of objects again. It just happens...

Enjoy

Gabriel


Posted on Friday, April 04, 2008 2:28 AM

Comments on this post

# re: Identity Field, Equality and Hash Code

So far I have been using Hibernate (not the N version), but I approached the same problem in a different way - Every entity has a surrogate key (a sequence generated from our Oracle db), but it also has its own natural key, which contains one or more immutable properties that participate in the equals method.
The hashCode method uses a subset of those properties, since the hash code doesn't have to be unique in the entire system, just within the set this entity participates in, and sometimes those immutable properties are other entities, and I don't want any lazy loading of other entities just for the sake of hash code calculation.

That way an object is always equal to "itself" no matter if one instance is transient, and the other is saved. In case a detached client has a set containing a new object, and it persists it, the returned object from the save operation will take the original's place, and I won't have any doubles.
Left by Noam Gal on Apr 04, 2008 6:53 AM

# re: Identity Field, Equality and Hash Code

This is very close to the base class I use, however I didn't know that an object's hash code shouldn't change.

Thanks for the tip!
Left by David Newman on Apr 04, 2008 12:38 PM

# re: Identity Field, Equality and Hash Code

if you put an object inside a set, then change the hashcode, a call to
set.contains(obj);
will return false, which is obviously not true.
Left by Noam Gal on Apr 04, 2008 9:08 PM

# re: Identity Field, Equality and Hash Code

I tried the List<T> type. I found that its .Contains() method simply ignored the changed hash code. Why?
Left by jack on Apr 06, 2008 2:39 PM

# re: Identity Field, Equality and Hash Code

However, the IESI hashed set is sensitive to the hash code...
Left by jack on Apr 06, 2008 3:24 PM

# re: Identity Field, Equality and Hash Code

@jack: a List<T> can contain the very same instance more than once. In a Set an instance must be unique...
Left by Gabriel Schenker on Apr 06, 2008 4:30 PM

# re: Identity Field, Equality and Hash Code

@Gabriel: do u mean that, if i use List<T> as my collection type, then it is not necessary to provide these two overrides?
Left by jack on Apr 09, 2008 4:19 AM

# re: Identity Field, Equality and Hash Code

@Gabriel: another question, about the operator== implementation provided above, you use Equals(x, y). why not use x.Equals(y)?
Left by jack on Apr 09, 2008 4:21 AM

# re: Identity Field, Equality and Hash Code

jack,

> do u mean that, if i use List<T> as my collection type, then it is not necessary to provide these two overrides?

Isn't that a bad argument to not override? Maybe you use it in a set in the future and you encounter strange problems because you didn't override.
List doesn't use GetHashCode but it uses Equals (sort of). The default Equals and GetHashCode look at the instance of a class, not the contents of it.

> another question, about the operator== implementation provided above, you use Equals(x, y). why not use x.Equals(y)?

If x is null an exception is raised. You can check for null before, but it's easier to let object.Equals do that ;) object.Equals(x,y) calls x.Equals if it is not null.
Left by alwin on Apr 17, 2008 1:55 AM

# re: Identity Field, Equality and Hash Code

Gabriel, a few comments:

1) You write "By default the Equals function takes the hash code of two objects and compares it". That is not true... it compares the identity of the objects (that they point to the same physical object in mem). It's perfectly legal to have objects which are not equal but give back the same hashcode... (depending on the hashing algorithm)

2) Your solution doesn't handle the situation where one object is transient - and the other not. Very possible problem...

3) I tend to use a much simpler approach which is 100% failsafe:
I also use Guids as identity fields but mark them as "assigned" in the mapping file. I create and assign a new sequential Guid in the constructor of my entities. This makes my Equals and GetHashCode very simple - they need not check for unassigned ids.
In order for NHibernate to know whether to insert or update I use the "version" mechanism with unsaved-value = negative (and have a version field in my base class set to -1).

<id name="Id">
<generator class="assigned" />
</id>
<version name="Version" unsaved-value="negative"/>

Another plus is that this gives you optimistic-locking functionality out of the box.

Kind regards Carsten
Left by Carsten Hes on Apr 25, 2008 7:39 PM

# re: Identity Field, Equality and Hash Code

@Carsten:
1) you are right regarding the Equals function! Microsoft states in the online help: "the Object.Equals method determines whether two Object instances are equal."
2) Either I don't understand your question or you are wrong... sorry
3) You are correct. But I don't want to introduce a version field in every single case, even if it is not explicitly needed. It only complicates the picture. Please keep in mind that these posts all use simplified models so as not to obscure what I want to point out with too many implementation details. In a real business application, though, you will always have a more complex model.

Left by Gabriel Schenker on Apr 26, 2008 7:14 AM

# re: Identity Field, Equality and Hash Code

@Gabriel

You are right about 2) in my previous post being my bad.

Keep up the good work!
/Carsten
Left by Carsten Hess on May 26, 2008 12:01 AM

# re: Identity Field, Equality and Hash Code

Hello.
One question: did you look at Billy's solution based on domain signature?
http://devlicio.us/blogs/billy_mccafferty/archive/2007/04/25/using-equals-gethashcode-effectively.aspx

he has improved this on his demo project which is also available on his site. feedback on these approaches?
Left by Luis Abreu on May 27, 2008 11:08 PM
