Sunday, November 09, 2008

First and Second Level caching in NHibernate

I'll try to dive deep into the caching of NHibernate in this article. This post has been inspired by the talk given by Oren Eini (aka Ayende) at the Kaizen Conference in Austin TX.

Caching is a topic that IMHO has so far been described only superficially, especially where the second level cache is concerned. Most of the time one finds plenty of information on how to configure a specific cache provider, but how and when the cache is actually used is rarely explained. I hope to provide some of the missing pieces with this post.

The full source code used for this post can be found here.

First Level Cache

When using NHibernate the first level cache is automatically enabled as long as one uses the standard session object. We can avoid using a cache altogether by using the stateless session provided by NHibernate. The stateless session is especially useful for reporting or for batch processing. When NHibernate loads an entity by its unique id from the database, the entity is automatically put into the so-called identity map. This identity map represents the first level cache.
The lifetime of the first level cache is coupled to the current session. As soon as the current session is closed, the content of its first level cache is cleared. Once an entity is in the first level cache, any subsequent operation that loads the very same entity inside the current session retrieves it from the cache, and no roundtrip to the database is needed.
One of the main reasons behind the identity map is to avoid the situation where two different instances in memory represent the same database record (or entity). The NHibernate session object provides two ways to retrieve an entity by its unique id from the database. There are subtle but important differences between them.
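The identity map idea described above can be illustrated with a small toy model. This is only a sketch and not NHibernate code: `ToySession`, `AccountRow` and the `loadFromDatabase` delegate are hypothetical names, and the "database" is simulated by a lambda.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a mapped entity.
public class AccountRow
{
    public int Id { get; set; }
}

// Toy session with an identity map. Illustration only, not NHibernate's implementation.
public class ToySession
{
    // One entry per (entity type, id); lives only as long as this session object.
    private readonly Dictionary<(Type, object), object> identityMap =
        new Dictionary<(Type, object), object>();
    private readonly Func<Type, object, object> loadFromDatabase;

    public int DatabaseHits { get; private set; }

    public ToySession(Func<Type, object, object> loadFromDatabase)
    {
        this.loadFromDatabase = loadFromDatabase;
    }

    public T Get<T>(object id) where T : class
    {
        var key = (typeof(T), id);
        if (identityMap.TryGetValue(key, out var cached))
            return (T)cached;                 // served from the first level cache

        DatabaseHits++;                       // simulated roundtrip to the database
        var entity = (T)loadFromDatabase(typeof(T), id);
        if (entity != null)
            identityMap[key] = entity;        // remember the instance for later calls
        return entity;
    }
}

public static class Demo
{
    public static void Main()
    {
        var session = new ToySession((type, id) => new AccountRow { Id = (int)id });
        var acc1 = session.Get<AccountRow>(1);
        var acc2 = session.Get<AccountRow>(1);
        Console.WriteLine(ReferenceEquals(acc1, acc2)); // True: same instance
        Console.WriteLine(session.DatabaseHits);        // 1: only one simulated select
    }
}
```

A new `ToySession` starts with an empty map, which mirrors the statement that the first level cache does not outlive its session.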

Let's implement an Account class for our samples

public class Account
{
    public virtual int Id { get; private set; }
    public virtual string Name { get; private set; }
    public virtual decimal Balance { get; private set; }
 
    protected Account()
    {
    }
 
    public Account(string name, decimal balance)
    {
        Name = name;
        Balance = balance;
    }
 
    public virtual void Credit(decimal amount)
    {
        Balance += amount;
    }
}

the corresponding XML mapping file is

<?xml version="1.0" encoding="utf-8" ?>
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2"
                   namespace="Caching"
                   assembly="Caching">
  <class name="Account">
    <id name="Id">
      <generator class="hilo"/>
    </id>
    <property name="Name"/>
    <property name="Balance"/>
  </class>
</hibernate-mapping>

Get an entity from database

With the session.Get(id) method we can retrieve an entity from the database. If no record with the given id is found in the database, null is returned.
If, on the other hand, a record with the given unique id exists, NHibernate loads this record, instantiates a fully populated entity in memory and immediately puts this entity into the identity map (the first level cache).
If a specific record has already been loaded inside the current session, a subsequent Get(id) operation returns the cached entity to the caller. We can see this in the following output produced by the unit test. There is only ONE select statement produced by NHibernate.

[Image: FirstLevelCache]

The above output was produced by this unit test

[Test]
public void trying_to_get_the_same_account_a_second_time_should_get_the_account_from_1st_level_cache()
{
    Console.WriteLine("------ now getting entity for the first time");
    var acc1 = Session.Get<Account>(account.Id);
    Console.WriteLine("------ now getting entity for the second time");
    var acc2 = Session.Get<Account>(account.Id);
 
    acc1.ShouldBeTheSameAs(acc2);
}

Load an entity from database

When using the session.Load(id) method NHibernate only instantiates a proxy for the given entity. As long as we only access the id of the entity, the entity itself is not loaded from the database. Only when we access one of the other properties does NHibernate load the entity from the database. (If no record with that id exists, this is also the moment at which NHibernate throws an ObjectNotFoundException.) We can see the deferred loading clearly in the following output produced by the unit test. I have added some comments to the output to make it easier to verify the result.

Here is a unit test

[Test]
public void trying_to_load_an_existing_entity_should_return_a_lazy_proxy()
{
    var acc1 = Session.Load<Account>(account.Id);
    acc1.ShouldNotBeNull();
    Console.WriteLine("------ now accessing the id of the entity");
    Console.WriteLine("The id is equal to {0}", acc1.Id);
    acc1.Id.ShouldEqual(account.Id);
    Console.WriteLine("------ now accessing a property (other than the ID) of the entity");
    Console.WriteLine("The name of the account is: {0}", acc1.Name);
    acc1.Name.ShouldEqual(account.Name);
}

and the output produced by the above code

[Image: FirstLevelCache2]
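The deferred loading performed by the proxy can be mimicked with a small toy class. This is a sketch under stated assumptions: `ToyAccountProxy`, `AccountData` and the loader delegate are hypothetical names, and NHibernate's real proxies are generated at runtime rather than written by hand.

```csharp
using System;

// Hypothetical plain data holder standing in for the loaded entity state.
public class AccountData
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Toy stand-in for the proxy returned by Load. Illustration only.
public class ToyAccountProxy
{
    private readonly Lazy<AccountData> target;

    public ToyAccountProxy(int id, Func<int, AccountData> loadFromDatabase)
    {
        Id = id;
        // The loader is only invoked the first time target.Value is read.
        target = new Lazy<AccountData>(() => loadFromDatabase(id));
    }

    // The id is known up front, so reading it never triggers a load.
    public int Id { get; }

    public bool IsInitialized => target.IsValueCreated;

    // Any other property forces the load.
    public string Name => target.Value.Name;
}

public static class Demo
{
    public static void Main()
    {
        var proxy = new ToyAccountProxy(42, id =>
        {
            Console.WriteLine("select ... where Id = " + id); // simulated SQL
            return new AccountData { Id = id, Name = "USD Account" };
        });

        Console.WriteLine(proxy.Id);            // 42, no select so far
        Console.WriteLine(proxy.IsInitialized); // False
        Console.WriteLine(proxy.Name);          // triggers the simulated select
        Console.WriteLine(proxy.IsInitialized); // True
    }
}
```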

Using Load to optimize data access

The behavior mentioned above is especially useful when creating or updating complex entities which have relations to other entities. Assume that the account entity references a customer entity and I want to create a new account for a customer from which I only know its unique id. Then my code might look as follows

var newAccount = new Account("EUR Account 1", 1250m);
newAccount.Customer = Session.Load<Customer>(customerId);
Session.Save(newAccount);

Note that I use the Load method to get the (existing) customer entity. NHibernate will not physically load the customer, since the account table only needs the id of the customer (as a foreign key). And as I mentioned above, as long as you only use the id of an entity retrieved with the Load method, the corresponding entity is not physically loaded from the database.

Second level cache

The life time of the second level cache is tied to the session factory and not to an individual session. Once an entity is loaded by its unique id and the second level cache is active the entity is available for all other sessions (of the same session factory). Thus once the entity is in the second level cache NHibernate won't load the entity from the database until it is removed from the cache.
To enable the second level cache we have to adjust our configuration file. We have to define which cache provider we want to use. There are various implementations of a second level cache. For our sample we use a Hashtable-based cache which is included in the core NHibernate assembly. Please note that you should never use this cache provider for production code, only for testing. Please refer to the section "Second Level Cache Providers" below to decide which implementation best fits your needs. You won't have to change your code if you change the cache provider, though.

We have to add the following line to the configuration file

    <property name="cache.provider_class">NHibernate.Cache.HashtableCacheProvider</property>

this will instruct NHibernate to use the previously mentioned Hashtable based cache provider as a provider for the second level cache.
Now let's have a look at the following unit test.

[Test]
public void trying_to_load_an_existing_item_twice_in_different_sessions_should_use_2nd_level_cache()
{
    using(var session = SessionFactory.OpenSession())
    {
        var acc = session.Get<Account>(account.Id);
        acc.ShouldNotBeNull();
    }
 
    using(var session = SessionFactory.OpenSession())
    {
        var acc = session.Get<Account>(account.Id);
        acc.ShouldNotBeNull();
    }
}

In the above test we open a first session and load an existing entity from the database. Then we open a second session and try to load the very same entity from the database again. Without a second level cache we would expect that NHibernate loads the entity two times from the database since we are using 2 different sessions and thus the first level cache can not be used to avoid a roundtrip to the database. So let's have a look at the result produced.

[Image: SecondLevelCache]

Wait a moment - we clearly see two select statements instead of only one. What did we do wrong? Nothing: this is not an error, it's a feature. NHibernate does not enable the second level cache by default, since it would have too many undesired implications. One has to explicitly enable the second level cache. If we add the following statement to the configuration file

    <property name="cache.use_second_level_cache">true</property>

we activate the second level cache. But that is still not enough. We have to also enable our entity to be cached in the second level cache.
This can be done by adding the following statement to the entity's mapping file

    <cache usage="read-write"/>
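For reference, here is a sketch of how the class element of the Account mapping shown earlier might look with caching enabled. The assumption is that the `<cache>` element goes directly inside `<class>`, before `<id>`, which is also where the Blog mapping later in this post places it.

```xml
<class name="Account">
  <!-- enables second level caching for Account entities -->
  <cache usage="read-write"/>
  <id name="Id">
    <generator class="hilo"/>
  </id>
  <property name="Name"/>
  <property name="Balance"/>
</class>
```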

If we now run the unit test again we obtain the expected result. The entity is loaded only once from the database. The second time it is loaded from the second level cache.
Now if we try to update an entity which is already in the second level cache then this entity should also be automatically updated in the second level cache. The following unit test should prove this behavior.

[Test]
public void when_updating_the_entity_then_2nd_level_cache_should_also_be_updated()
{
    using(var session = SessionFactory.OpenSession())
    using (var tx = session.BeginTransaction())
    {
        var acc = session.Get<Account>(account.Id);
        acc.Credit(200m);
        tx.Commit();
    }
 
    using(var session = SessionFactory.OpenSession())
    {
        var acc = session.Get<Account>(account.Id);
        acc.Balance.ShouldEqual(1200m);
    }
}

and indeed it does as we can see in the test output. Again the entity is loaded from the cache the second time it is requested although it was updated (no select statement after the update statement). The last line in the test code verifies that the entity was indeed updated.
[Image: SecondLevelCache2]

Second Level Cache Providers

All second level cache providers are part of the NHibernate Contrib project. The following list gives a short description of each provider.

  • Velocity: uses Microsoft Velocity which is a highly scalable in-memory application cache for all kinds of data.
  • Prevalence: uses Bamboo.Prevalence as the cache provider. Bamboo.Prevalence is a .NET implementation of the object prevalence concept brought to life by Klaus Wuestefeld in Prevayler. Bamboo.Prevalence provides transparent object persistence to deterministic systems targeting the CLR. It offers persistent caching for smart client applications.
  • SysCache: Uses System.Web.Caching.Cache as the cache provider. This means that you can rely on ASP.NET caching feature to understand how it works.
  • SysCache2: Similar to NHibernate.Caches.SysCache, uses ASP.NET cache. This provider also supports SQL dependency-based expiration, meaning that it is possible to configure certain cache regions to automatically expire when the relevant data in the database changes.
  • MemCache: uses memcached; memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load. Basically a distributed hash table.
  • SharedCache: high-performance, distributed and replicated memory object caching system. See here and here for more info

Saving a transient entity

Recently the following question came up in the NHibernate user list: "When I create a transient object and then save it to my session and commit to my database, should it be added to my second level cache as well?" The answer is of course YES. Let's write a unit test

[TestFixture]
public class when_saving_a_transient_account : FixtureBase
{
    private Account newAccount;
 
    protected override void Context()
    {
        base.Context();
        using (var session = SessionFactory.OpenSession())
        using (var tx = session.BeginTransaction())
        {
            newAccount = new Account("CHF Account", 5500m);
            session.Save(newAccount);
            tx.Commit();
        }
    }
 
    [Test]
    public void account_should_be_in_second_level_cache()
    {
        using (var session = SessionFactory.OpenSession())
        {
            Console.WriteLine("--> Now loading account");
            var acc = session.Get<Account>(newAccount.Id);
            acc.ShouldNotBeNull();
            acc.Name.ShouldEqual(newAccount.Name);
        }
    }
}

In the method Context I create and save a new account entity. In the test method I open a new session and try to load the previously created entity from the database. I expect NHibernate to take the entity from the second level cache. And indeed it does. This is the resulting output

[Image: SecondLevelCache3]

as we can see, no select statement is sent to the database when the entity is loaded from a new session after it has been created and saved beforehand.

Inside the second level cache

An important point is that the second-level cache does not cache instances of the object type being cached; instead it caches the individual values of the properties of that object. This provides two benefits. One, NHibernate doesn't have to worry that your client code will manipulate the objects in a way that will disrupt the cache. Two, the relationships and associations do not become stale, and are easy to keep up-to-date because they are simply identifiers. The cache is not a tree of objects but rather a map of arrays.
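The "map of arrays" idea can be made concrete with a toy model. This is a sketch only: `ToyEntityCache` and `ToyAccount` are hypothetical names, and the real cache entries contain more than the bare property values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity used only for this illustration.
public class ToyAccount
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Balance { get; set; }
}

// Toy entity cache that stores property values, never object instances.
public class ToyEntityCache
{
    // key: entity id, value: the "disassembled" state as a plain array
    private readonly Dictionary<int, object[]> entries = new Dictionary<int, object[]>();

    public void Put(ToyAccount account)
    {
        entries[account.Id] = new object[] { account.Name, account.Balance };
    }

    public ToyAccount Get(int id)
    {
        object[] state;
        if (!entries.TryGetValue(id, out state))
            return null;

        // "Assemble" a fresh instance from the cached values on every hit.
        return new ToyAccount { Id = id, Name = (string)state[0], Balance = (decimal)state[1] };
    }
}

public static class Demo
{
    public static void Main()
    {
        var cache = new ToyEntityCache();
        var original = new ToyAccount { Id = 1, Name = "EUR Account", Balance = 1000m };
        cache.Put(original);

        original.Balance = 0m;                  // client code tampers with its own instance

        var fromCache = cache.Get(1);
        Console.WriteLine(ReferenceEquals(original, fromCache)); // False: a new instance
        Console.WriteLine(fromCache.Balance);                    // cached value is unaffected
    }
}
```

Because only values are stored, changes made to an instance by client code cannot leak into the cache, which is the first benefit named above.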

If you are interested in some more details about the inner workings of the second level cache then the following text (taken from Ayende and only slightly edited by me) will be of interest to you:

NHibernate is designed as an enterprise OR/M product, and as such, it has very good support for running in web farm scenarios. This support includes running alongside distributed caches, including immediate farm-wide updates.  NHibernate goes to great lengths to ensure cache consistency in these scenarios...

The way it works, NHibernate keeps three caches.

  • The entities cache - the entity data is disassembled and then put in the cache, ready to be assembled to entities again.
  • The queries cache - the identifiers of entities returned from queries, but not the data itself (since this is in the entities cache).
  • The update timestamp cache - the last time a table was written to.

The last cache is very important, since it ensures that the cache will not serve stale results.

Now, when we come to actually using the cache, we have the following semantics.

  • Each session is associated with a timestamp on creation.
  • Every time we put query results in the cache, the timestamp of the executing session is recorded.
  • The timestamp cache is updated whenever a table is written to, but in a tricky sort of way:
    • When we perform the actual writing, we write a value that is somewhere in the future to the cache. So all queries that hit the cache now will not find it, and then hit the DB to get the new data. Since we are in the middle of a transaction, they would wait until we finish the transaction. If we are using a low isolation level, and another thread / machine attempts to put the old results back in the cache, it wouldn't hold, because the update timestamp is in the future.
    • When we perform the commit on the transaction, we update the timestamp cache with the current value.

Now, let us think about the meaning of this, shall we?

If a session has performed an update to a table, committed the transaction and then executed a cached query, the query result is not valid for the cache. That is because the timestamp written to the update cache is the transaction commit timestamp, while the query timestamp is the session's timestamp, which obviously comes earlier.

The update timestamp cache is not updated until you commit the transaction! This is to ensure that you will not read "uncommitted values" from the cache.

Please note that if you open a session with your own connection, it will not be able to put anything in the cache (all its cached queries will have an invalid timestamp!)

In general, those are not things that you need to concern yourself with, but I spent some time today just trying to get tests for the second level caching working, and it took me time to realize that in the tests I didn't use transactions and I used the same session for querying as for performing the updates.
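The timestamp rules described above can be played through with a toy model. This is a sketch under stated assumptions, not NHibernate's implementation: `ToyUpdateTimestamps`, its method names, the plain `long` clock values and the safety margin of 60 are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Toy version of the update timestamp cache. Illustration only.
public class ToyUpdateTimestamps
{
    // table name -> time of the last (possibly still pending) write
    private readonly Dictionary<string, long> lastUpdate = new Dictionary<string, long>();

    // When the write is performed: stamp the table with a time in the future.
    public void BeginWrite(string table, long now, long safetyMargin)
    {
        lastUpdate[table] = now + safetyMargin;
    }

    // On commit: replace the future stamp with the current time.
    public void Commit(string table, long now)
    {
        lastUpdate[table] = now;
    }

    // A cached query result is usable only if it is newer than the last write.
    public bool IsUpToDate(string table, long resultTimestamp)
    {
        long updated;
        if (!lastUpdate.TryGetValue(table, out updated))
            return true;                       // table was never written to
        return updated < resultTimestamp;
    }
}

public static class Demo
{
    public static void Main()
    {
        var timestamps = new ToyUpdateTimestamps();
        long writerSessionTimestamp = 100;     // session A was opened at t=100

        timestamps.BeginWrite("Blog", 110, 60);                // stamp is now 170
        Console.WriteLine(timestamps.IsUpToDate("Blog", 115)); // False: write pending

        timestamps.Commit("Blog", 120);                        // stamp is now 120
        Console.WriteLine(timestamps.IsUpToDate("Blog", writerSessionTimestamp)); // False
        Console.WriteLine(timestamps.IsUpToDate("Blog", 130)); // True: session opened later
    }
}
```

The second check shows the trap described in the quoted text: the session that did the update was opened before the commit, so query results stamped with its timestamp are never considered valid.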

Collections and the second level cache

Let's assume the following scenario: we have a blog which can have many posts. See the diagram below.
[Image: Blog diagram]
The corresponding code to define the entities is as follows

public class Blog
{
    public virtual int Id { get; set; }
    public virtual string Author { get; set; }
    public virtual string Name { get; set; }
    public virtual IList<Post> Posts { get; set; }
 
    public Blog()
    {
        Posts = new List<Post>();
    }
 
}
 
public class Post
{
    public virtual int Id { get; private set; }
    public virtual string Title { get; set; }
    public virtual string Body { get; set; }
}

and we can define the mapping of the blog and the post entity like this

<?xml version="1.0" encoding="utf-8" ?>
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2"
                   namespace="Caching"
                   assembly="Caching">
  <class name="Blog">
    <cache usage="read-write"/>
    <id name="Id">
      <generator class="hilo"/>
    </id>
    <property name="Author"/>
    <property name="Name"/>
    <bag name="Posts" cascade="all" lazy="true">
      <cache usage="read-write"/>
      <key column="BlogId"/>
      <one-to-many class="Post"/>
    </bag>
  </class>
  
  <class name="Post">
    <cache usage="read-write"/>
    <id name="Id">
      <generator class="hilo"/>
    </id>
    <property name="Title"/>
    <property name="Body"/>
  </class>
</hibernate-mapping>

Note that I have added a <cache> element to the mapping for the Blog entity. This is enough to cache all simple Blog property values (e.g. Id, Name and Author) but not the state of associated entities or collections. Collections require their own <cache> element. In our case I added a <cache> element to the Posts collection (which is mapped as a bag). This cache will be used when enumerating the collection blog.Posts, for example. Please be aware that a collection cache only holds the identifiers of the associated post instances. That is, if we have a blog with three posts having ids 1, 2 and 3 respectively, then the second level cache will contain the values of the simple properties of the blog and in addition an array with the ids {1,2,3}. If we require the post instances themselves to be cached, then we must enable caching of the Post class by adding a <cache> element to its mapping.

To summarize: by adding a <cache> element to the Blog, the Posts collection and the Post itself I have declared that I want NHibernate to cache not only my blog entities but also the associated Post collection in full detail.
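Why the Post class needs its own cache entry can be seen in a toy model of the collection cache. This is only a sketch: `ToySecondLevelCache`, the `postsAreCacheable` flag and the loader delegate are hypothetical, and posts are reduced to their titles to keep the example short.

```csharp
using System;
using System.Collections.Generic;

// Toy model: a collection cache that holds ids only, plus an optional Post cache.
public class ToySecondLevelCache
{
    private readonly bool postsAreCacheable;
    private readonly Dictionary<int, int[]> collectionCache = new Dictionary<int, int[]>();
    private readonly Dictionary<int, string> postCache = new Dictionary<int, string>();

    public int DatabaseHits { get; private set; }

    public ToySecondLevelCache(bool postsAreCacheable)
    {
        this.postsAreCacheable = postsAreCacheable;
    }

    public void PutCollection(int blogId, int[] postIds)
    {
        collectionCache[blogId] = postIds;     // only the identifiers are stored
    }

    public List<string> GetPostTitles(int blogId, Func<int, string> loadPostFromDatabase)
    {
        var titles = new List<string>();
        foreach (var postId in collectionCache[blogId])
        {
            string title;
            if (!postCache.TryGetValue(postId, out title))
            {
                DatabaseHits++;                // each uncached post costs a roundtrip
                title = loadPostFromDatabase(postId);
                if (postsAreCacheable)
                    postCache[postId] = title;
            }
            titles.Add(title);
        }
        return titles;
    }
}

public static class Demo
{
    public static void Main()
    {
        Func<int, string> load = id => "Post " + id;

        var collectionOnly = new ToySecondLevelCache(false);
        collectionOnly.PutCollection(1, new[] { 1, 2, 3 });
        collectionOnly.GetPostTitles(1, load);
        collectionOnly.GetPostTitles(1, load);
        Console.WriteLine(collectionOnly.DatabaseHits); // 6: every post is loaded each time

        var withPostCache = new ToySecondLevelCache(true);
        withPostCache.PutCollection(1, new[] { 1, 2, 3 });
        withPostCache.GetPostTitles(1, load);
        withPostCache.GetPostTitles(1, load);
        Console.WriteLine(withPostCache.DatabaseHits);  // 3: second pass is served from cache
    }
}
```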

Attention: dragons ahead

A common error (it happened to me as well!) is to omit the transaction, or to forget to commit it, when adding an entity/aggregate to the database or changing it. If we then access the entity/aggregate from another session, the 2nd level cache will not be able to provide the cached instances and NHibernate makes an unexpected round trip to the database. The reason for this is described in the section "Inside the second level cache" above.

What does this mean? Let's have a look at the code I use to set up the context for our unit tests regarding the Blog-->Posts problem.

blog = new Blog{ Author = "Gabriel", Name = "Keep on running"};
blog.Posts.Add(new Post{Title = "First post", Body = "Some text"});
blog.Posts.Add(new Post { Title = "Second post", Body = "Some other text" });
blog.Posts.Add(new Post { Title = "Third post", Body = "Third post text" });
using (var session = SessionFactory.OpenSession())
using(var tx = session.BeginTransaction())
{
    session.Save(blog);
    tx.Commit();        // important otherwise caching does NOT work!
}

In the above code I create a new blog having three assigned posts. The blog instance is then saved to the database inside a transaction. If I omitted the transaction or forgot to commit it, the above samples would not work as expected and the 2nd level cache would not be used as desired.

Caching queries in the second level cache

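We can cache not only entities loaded by their unique id but also the results of any query. For this we have to define the query as cacheable and set the desired cache mode. The query cache also has to be switched on in the configuration file by setting the property cache.use_query_cache to true. Let's have a look at a typical sample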

[Test]
public void trying_to_cache_a_query()
{
    using (var session = SessionFactory.OpenSession())
    {
        Console.WriteLine("---> using query first time");
        var query = session
            .CreateQuery("from Blog b where b.Author = :author")
            .SetString("author", "Gabriel")
            .SetCacheable(true);
        var list = query.List<Blog>();
    }
    using (var session = SessionFactory.OpenSession())
    {
        Console.WriteLine("---> using query second time");
        var query2 = session
            .CreateQuery("from Blog b where b.Author = :author")
            .SetString("author", "Gabriel")
            .SetCacheable(true);
        var list2 = query2.List<Blog>();
    }
}

In the above sample I use the same query from different sessions. Please note that I have set cacheable to true for the query. In this case the result of the query is cached in the second level cache the first time it is executed. Any subsequent execution of the very same query will not hit the database. It is important to note, however, that if I change the value of the parameter(s) then the query is executed against the database again. So it is the query together with the set of parameter values that defines the key under which the result is stored in the 2nd level cache.
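The role of the parameter values in the cache key can be shown with a toy query cache. This is a sketch, not NHibernate's key format: `ToyQueryCache` and the way the key string is built are assumptions made for illustration. As described earlier in this post, only the ids of the matching entities are stored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy query cache keyed by query text plus parameter values. Illustration only.
public class ToyQueryCache
{
    private readonly Dictionary<string, List<int>> results = new Dictionary<string, List<int>>();

    public int DatabaseHits { get; private set; }

    // Hypothetical key format: the HQL text followed by the sorted name=value pairs.
    private static string KeyFor(string hql, IDictionary<string, object> parameters)
    {
        var pairs = parameters
            .OrderBy(p => p.Key, StringComparer.Ordinal)
            .Select(p => p.Key + "=" + p.Value);
        return hql + "|" + string.Join(",", pairs);
    }

    public List<int> List(string hql, IDictionary<string, object> parameters, Func<List<int>> runQuery)
    {
        var key = KeyFor(hql, parameters);
        List<int> ids;
        if (!results.TryGetValue(key, out ids))
        {
            DatabaseHits++;                    // simulated roundtrip
            ids = runQuery();
            results[key] = ids;                // store the ids of the result only
        }
        return ids;
    }
}

public static class Demo
{
    public static void Main()
    {
        var cache = new ToyQueryCache();
        const string hql = "from Blog b where b.Author = :author";
        var gabriel = new Dictionary<string, object> { { "author", "Gabriel" } };
        var ayende = new Dictionary<string, object> { { "author", "Ayende" } };

        cache.List(hql, gabriel, () => new List<int> { 1 });
        cache.List(hql, gabriel, () => new List<int> { 1 });
        Console.WriteLine(cache.DatabaseHits); // 1: same query, same parameter value

        cache.List(hql, ayende, () => new List<int> { 2 });
        Console.WriteLine(cache.DatabaseHits); // 2: a different value is a different key
    }
}
```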

The result of the above test looks as follows

[Image: CacheQuery1]

Of course I can also use named queries and cache them. A named query is defined inside a mapping file, e.g.

<query cacheable="true" cache-mode="normal" name="query1">
  <![CDATA[from Blog b where b.Name like :name]]>
  <query-param name="name" type="String"/>
</query>

The above query is called "query1" and has a single parameter called "name". The cache mode for this query is set to "normal". I can use such a query as follows

[Test]
public void trying_named_query()
{
    using (var session = SessionFactory.OpenSession())
    {
        Console.WriteLine("---> using named query first time");
        var list = session.GetNamedQuery("query1")
            .SetString("name", "Keep%")
            .List<Blog>();
    }
    using (var session = SessionFactory.OpenSession())
    {
        Console.WriteLine("---> using named query second time");
        var list2 = session.GetNamedQuery("query1")
            .SetString("name", "Keep%")
            .List<Blog>();
    }
}

The session object has a method GetNamedQuery to retrieve the query. The output produced by the above test is then

[Image: CacheQuery2]
If the content of the table on which the cached query is based is changed then the query is evicted from the second level cache and the next time the query is executed the query must be reloaded.

Cache Regions

If we don't use cache regions, the cached queries can only be evicted all together. If you need to clear only part of the query cache, use regions. Regions are distinguished by their name. One can put any number of different queries into a named cache region; a query is assigned to a region by calling SetCacheRegion("My Region") on it in addition to SetCacheable(true). The command to clear a cache region is as follows
    SessionFactory.EvictQueries("My Region");
where SessionFactory is the session factory instance currently used and "My Region" is the name of the cache region.

Source Code

The full source code used for this post can be found here.

Summary

NHibernate provides two types of caches: the first level cache and the second level cache. The first level cache is also called the identity map and is used not only to reduce the number of round trips to the database, and thus to improve the speed of an application, but also to guarantee that there are never two distinct instances of an object having the very same id. NHibernate provides two methods to load an entity by its unique id from the database. The Get method returns null if an entity with the given id does not exist, and otherwise returns the fully loaded entity to the caller. The Load method on the other hand returns a proxy to the caller and only loads the entity from the database if a property other than the id is accessed. One can also call this a deferred load.

The second level cache is not used by default and should be used with caution. It can provide a huge scalability gain if used wisely, but it can also reduce the performance of the overall system and introduce unnecessary complexity if used wrongly.

The second level cache is related to the session factory, that is, all session instances of a given session factory use the same 2nd level cache. This differs from the behavior of the 1st level cache, which is related to an individual session instance. One can cache individual entities or whole aggregates in the 2nd level cache. But one can also cache (complex and/or time consuming) queries in the 2nd level cache. The 2nd level cache can be divided into regions for more fine-grained control.

Enjoy.

Blog Signature Gabriel

posted @ Sunday, November 09, 2008 5:57 AM