Loading a complex object graph

Lately I had the joy to load a complex object graph from a (legacy) MS Access database. Some findings:

  • MS Access does not support the multi-query or multi-criteria API.
  • My first implementation using lazy load was very very slow since it created way to many database hits

The domain model

Let's have a look at the (simplified) domain model. The object graph I am talking about represents a chart with several data series in it. Each data series (or curve) consists of one to many segments. Between two successive segments there is always a segment transition.

image

In this simplified model I only have one curve in my chart. Each segment of the curve has some additional segment data and plot details. All but the last segment of the curve have also a transition (to a next segment). The transition also has plot details. Each segment and transition have a collection of data points. The SegmentPoint and TransitionPoint classes contain some additional data like a sort index, whether or not the data point is active or not, etc.

Now I can load this object graph in a default way and make it populate by NHibernate (which in turn uses lazy loading), or I can tune my NHibernate queries and use eager loading to populate the object graph.

Assume that my curve has 3 segments (and thus 2 transitions between the segments) and a total of 20 data points (distributed to the 3 segments and 2 transitions).

Using Lazy Loading

Let us use lazy loading as below

int chartId = ...;  //some id
var chart = _session.Get<Chart>(chartId);
DumpChart(chart);

Note that the DumpChart() method just transverses the whole object tree an accesses some of the properties of each object. This causes the objects to be (lazy) loaded if necessary.

In doing so I get A LOT of queries hitting the database! In total I have counted the incredible number of 35 select statements. If my curve would have had more segments and more data points the situation would have been even worse!

NHibernate: SELECT chart0_.Id as Id0_0_, chart0_.MainCurveId as MainCurv2_0_0_ FROM Chart chart0_ WHERE chart0_.Id=@p0; @p0 = '1'
NHibernate: SELECT curve0_.Id as Id1_0_, curve0_.Name as Name1_0_ FROM Curve curve0_ WHERE curve0_.Id=@p0; @p0 = '1'
NHibernate: SELECT segments0_.CurveId as CurveId1_, segments0_.Id as Id1_, segments0_.Id as Id2_0_, segments0_.Name as Name2_0_, segments0_.TransitionId as Transiti3_2_0_, segments0_.SegmentDataId as SegmentD4_2_0_, segments0_.PlotDetailsId as PlotDeta5_2_0_ FROM Segment segments0_ WHERE segments0_.CurveId=@p0; @p0 = '1'
NHibernate: SELECT points0_.SegmentId as SegmentId1_, points0_.Id as Id1_, points0_.Id as Id5_0_, points0_.DataPointId as DataPoin2_5_0_ FROM SegmentPoint points0_ WHERE points0_.SegmentId=@p0; @p0 = '1'
NHibernate: SELECT datapoint0_.Id as Id7_0_, datapoint0_.X as X7_0_, datapoint0_.Y as Y7_0_ FROM DataPoint datapoint0_ WHERE datapoint0_.Id=@p0; @p0 = '1'
NHibernate: SELECT datapoint0_.Id as Id7_0_, datapoint0_.X as X7_0_, datapoint0_.Y as Y7_0_ FROM DataPoint datapoint0_ WHERE datapoint0_.Id=@p0; @p0 = '2'
NHibernate: SELECT datapoint0_.Id as Id7_0_, datapoint0_.X as X7_0_, datapoint0_.Y as Y7_0_ FROM DataPoint datapoint0_ WHERE datapoint0_.Id=@p0; @p0 = '3'
NHibernate: SELECT datapoint0_.Id as Id7_0_, datapoint0_.X as X7_0_, datapoint0_.Y as Y7_0_ FROM DataPoint datapoint0_ WHERE datapoint0_.Id=@p0; @p0 = '4'
NHibernate: SELECT datapoint0_.Id as Id7_0_, datapoint0_.X as X7_0_, datapoint0_.Y as Y7_0_ FROM DataPoint datapoint0_ WHERE datapoint0_.Id=@p0; @p0 = '5'
NHibernate: SELECT datapoint0_.Id as Id7_0_, datapoint0_.X as X7_0_, datapoint0_.Y as Y7_0_ FROM DataPoint datapoint0_ WHERE datapoint0_.Id=@p0; @p0 = '6'
NHibernate: SELECT plotdetail0_.Id as Id4_0_, plotdetail0_.LineColor as LineColor4_0_ FROM PlotDetails plotdetail0_ WHERE plotdetail0_.Id=@p0; @p0 = '1'
NHibernate: SELECT segmentdat0_.Id as Id8_0_, segmentdat0_.Data as Data8_0_ FROM SegmentData segmentdat0_ WHERE segmentdat0_.Id=@p0; @p0 = '1'
NHibernate: SELECT transition0_.Id as Id3_0_, transition0_.Name as Name3_0_, transition0_.PlotDetailsId as PlotDeta3_3_0_ FROM Transition transition0_ WHERE transition0_.Id=@p0; @p0 = '1'
NHibernate: SELECT points0_.TransitionId as Transiti3_1_, points0_.Id as Id1_, points0_.Id as Id6_0_, points0_.DataPointId as DataPoin2_6_0_ FROM TransitionPoint points0_ WHERE points0_.TransitionId=@p0; @p0 = '1'
NHibernate: SELECT datapoint0_.Id as Id7_0_, datapoint0_.X as X7_0_, datapoint0_.Y as Y7_0_ FROM DataPoint datapoint0_ WHERE datapoint0_.Id=@p0; @p0 = '7'
NHibernate: SELECT datapoint0_.Id as Id7_0_, datapoint0_.X as X7_0_, datapoint0_.Y as Y7_0_ FROM DataPoint datapoint0_ WHERE datapoint0_.Id=@p0; @p0 = '8'
NHibernate: SELECT datapoint0_.Id as Id7_0_, datapoint0_.X as X7_0_, datapoint0_.Y as Y7_0_ FROM DataPoint datapoint0_ WHERE datapoint0_.Id=@p0; @p0 = '9'
NHibernate: SELECT points0_.SegmentId as SegmentId1_, points0_.Id as Id1_, points0_.Id as Id5_0_, points0_.DataPointId as DataPoin2_5_0_ FROM SegmentPoint points0_ WHERE points0_.SegmentId=@p0; @p0 = '2'
NHibernate: SELECT datapoint0_.Id as Id7_0_, datapoint0_.X as X7_0_, datapoint0_.Y as Y7_0_ FROM DataPoint datapoint0_ WHERE datapoint0_.Id=@p0; @p0 = '10'
NHibernate: SELECT datapoint0_.Id as Id7_0_, datapoint0_.X as X7_0_, datapoint0_.Y as Y7_0_ FROM DataPoint datapoint0_ WHERE datapoint0_.Id=@p0; @p0 = '11'
NHibernate: SELECT datapoint0_.Id as Id7_0_, datapoint0_.X as X7_0_, datapoint0_.Y as Y7_0_ FROM DataPoint datapoint0_ WHERE datapoint0_.Id=@p0; @p0 = '12'
NHibernate: SELECT datapoint0_.Id as Id7_0_, datapoint0_.X as X7_0_, datapoint0_.Y as Y7_0_ FROM DataPoint datapoint0_ WHERE datapoint0_.Id=@p0; @p0 = '13'
NHibernate: SELECT plotdetail0_.Id as Id4_0_, plotdetail0_.LineColor as LineColor4_0_ FROM PlotDetails plotdetail0_ WHERE plotdetail0_.Id=@p0; @p0 = '3'
NHibernate: SELECT segmentdat0_.Id as Id8_0_, segmentdat0_.Data as Data8_0_ FROM SegmentData segmentdat0_ WHERE segmentdat0_.Id=@p0; @p0 = '2'
NHibernate: SELECT transition0_.Id as Id3_0_, transition0_.Name as Name3_0_, transition0_.PlotDetailsId as PlotDeta3_3_0_ FROM Transition transition0_ WHERE transition0_.Id=@p0; @p0 = '2'
NHibernate: SELECT points0_.TransitionId as Transiti3_1_, points0_.Id as Id1_, points0_.Id as Id6_0_, points0_.DataPointId as DataPoin2_6_0_ FROM TransitionPoint points0_ WHERE points0_.TransitionId=@p0; @p0 = '2'
NHibernate: SELECT datapoint0_.Id as Id7_0_, datapoint0_.X as X7_0_, datapoint0_.Y as Y7_0_ FROM DataPoint datapoint0_ WHERE datapoint0_.Id=@p0; @p0 = '14'
NHibernate: SELECT datapoint0_.Id as Id7_0_, datapoint0_.X as X7_0_, datapoint0_.Y as Y7_0_ FROM DataPoint datapoint0_ WHERE datapoint0_.Id=@p0; @p0 = '15'
NHibernate: SELECT datapoint0_.Id as Id7_0_, datapoint0_.X as X7_0_, datapoint0_.Y as Y7_0_ FROM DataPoint datapoint0_ WHERE datapoint0_.Id=@p0; @p0 = '16'
NHibernate: SELECT points0_.SegmentId as SegmentId1_, points0_.Id as Id1_, points0_.Id as Id5_0_, points0_.DataPointId as DataPoin2_5_0_ FROM SegmentPoint points0_ WHERE points0_.SegmentId=@p0; @p0 = '3'
NHibernate: SELECT datapoint0_.Id as Id7_0_, datapoint0_.X as X7_0_, datapoint0_.Y as Y7_0_ FROM DataPoint datapoint0_ WHERE datapoint0_.Id=@p0; @p0 = '17'
NHibernate: SELECT datapoint0_.Id as Id7_0_, datapoint0_.X as X7_0_, datapoint0_.Y as Y7_0_ FROM DataPoint datapoint0_ WHERE datapoint0_.Id=@p0; @p0 = '18'
NHibernate: SELECT datapoint0_.Id as Id7_0_, datapoint0_.X as X7_0_, datapoint0_.Y as Y7_0_ FROM DataPoint datapoint0_ WHERE datapoint0_.Id=@p0; @p0 = '19'
NHibernate: SELECT datapoint0_.Id as Id7_0_, datapoint0_.X as X7_0_, datapoint0_.Y as Y7_0_ FROM DataPoint datapoint0_ WHERE datapoint0_.Id=@p0; @p0 = '20'
NHibernate: SELECT plotdetail0_.Id as Id4_0_, plotdetail0_.LineColor as LineColor4_0_ FROM PlotDetails plotdetail0_ WHERE plotdetail0_.Id=@p0; @p0 = '5'

But fortunately I can do better!

Using Eager Loading

By using eager loading I can reduce the number of queries down to 3 (in this simplified sample). That's a factor of 10! How do I do that? Here is the code

var chartId = ...;    // some id of an existing chart
var chart = _session.CreateQuery("from Chart chart" +
                                 " inner join fetch chart.MainCurve mc" +
                                 " left join fetch mc.Segments s" +
                                 " left join fetch s.SegmentData sd" +
                                 " left join fetch s.PlotDetails spd" +
                                 " left join fetch s.Transition t" +
                                 " left join fetch t.PlotDetails tpd" +
                                 " where chart.Id=:id")
    .SetInt32("id", chartId)
    .UniqueResult<Chart>();
 
var chart2 = _session.CreateQuery("from Chart chart" +
                                  " inner join fetch chart.MainCurve mc" +
                                  " left join fetch mc.Segments s" +
                                  " left join fetch s.Points sp" +
                                  " left join fetch sp.DataPoint dp" +
                                  " where chart.Id=:id")
    .SetInt32("id", chartId)
    .UniqueResult<Chart>();
 
var chart3 = _session.CreateQuery("from Chart chart" +
                                  " inner join fetch chart.MainCurve mc" +
                                  " left join fetch mc.Segments s" +
                                  " left join fetch s.Transition t" +
                                  " left join fetch t.Points tp" +
                                  " left join fetch tp.DataPoint dp" +
                                  " where chart.Id=:id")
    .SetInt32("id", chartId)
    .UniqueResult<Chart>();
 
In the first query I load the curve with all its segments and transitions. I also load the additional segment data of each segment and the plot details of each segment and each transition.

In the second query I load all the data points assigned to each segment of the curve.

And in the third query I load all the data points assigned to each transition of the curve.

When I now call the DumpChart() method NO additional query is generated by NHibernate since the object graph is already fully populated by the above 3 queries.

Please note: the variables chart, chart2 and chart3 in the above code represent all the same object. NHibernate recognizes this since the 3 queries are all referencing the same chart object (that is: the chart object with the same id).

Sample Code

As usual you can find the code to this sample here.

Summary

NHibernate provides us an easy way to populate even a very complex object graph by using lazy loading. One just has to navigate through the object graph and when accessing a specific object (or its properties) NHibernate dynamically populates the object from the database if needed. The disadvantage of this is that NHibernate possibly generates a huge number of database requests until the whole object graph is fully populated.

By using the eager loading techniques provided by NHibernate I could dramatically improve the speed to populate a complex object graph.

Enjoy

Blog Signature Gabriel .

Print | posted on Tuesday, May 06, 2008 10:29 PM

Comments on this post

# re: Loading a complex object graph

Requesting Gravatar...
Hi Gabriel,

isn't it helpful to set the batch size for lazy loaded objects. That will reduce the database round trips also.

Regards
Left by Dirk on May 06, 2008 11:55 PM

# re: Loading a complex object graph

Requesting Gravatar...
@Dirk: yes, one can also "play" with the batch size to reduce the number of queries produced. But I prefer to eager load the object graph.
Left by Gabriel Schenker on May 07, 2008 12:07 AM

# re: Loading a complex object graph

Requesting Gravatar...
Gabriel,

I have to say that I disagree with this recommendation. Eagerly fetching this many relationships should be considered long and hard before doing it. Keep in mind that when doing an eager fetch you are effectively just doing an inner join sql query and then parsing the resulting table in C#.

Now lets say for simplicity that your chart class has 10 properties with an average size of 20 bytes. You have one main curve (obviously), the main curve has 20 segments, each segment has 20 sets of segment data, each segment data has 30 plot details, each segment has 10 transitions, each transition has 10 plot details.

The result of this query will be that the 200 bytes of data for the single chart will be repeated 20 * 20 * 30 * 10 * 10 = 1.2 Million times! The resulting size of the data (not including all of the data from the tables besides the chart) will be 240 MB.

The resulting result set would actually be much larger than that. Obviously transferring 240MB where all but 200 bytes is unnecessary duplication is less than ideal and just made the query performance far far worse than the result of running many queries.

I don't think the number of items in these collections are very large either.

All I want to say is that people should be careful when using this approach. In most cases you really do not want to eagerly fetch more than one or two collections. Usually I try to limit myself to just one.

Btw, for those who are curious, if we were to assume that each class was 10 properties each of about 20 bytes in size, the total result set of the query provided would be 252.622MB.
Left by John Chapman on May 07, 2008 1:06 PM

# re: Loading a complex object graph

Requesting Gravatar...
@John: I completely agree with you that one should think twice before using eager fetching. But in my situation it has improved the situation a lot. You might have noticed that I only eager load one (possibly big)collection at a time. These are the segment- and the transition points. On the other hand the number of segments is limited (always below 5) and so is the number of transitions (below 4). Each segment and each transition exactly has ONE plot-details (the size of each is less than 100 bytes) and each segment as exactly one segment data object (less that 200 bytes).

So your calculation is not correct for the sample given in the post!
On the other hand the situation you describe can arise if one does not pay attention to the given sample and tries to e.g. load several (possibly big) disjunct collection in one single query

But back to the sample of the post; let's see:

If we have one curve with 4 segments in the chart then the first query gives me

1 chart * 1 curve * 4 segments = 4 records (!)

even if each record is several kB in size this is no problem for the db (and the preformance)

If I now have 20 data points per segment and 10 per transition then the second and third query will consist of

1 chart * 4 segments * 20 data points = 80 records

1 chart * 4 segments * 1 transition * 10 data points = 40 records

again I see no problem.
Left by Gabriel Schenker on May 08, 2008 1:03 AM

# re: Loading a complex object graph

Requesting Gravatar...
Gabriel,

I'm not saying it can't work. I'm really trying to stress that this type of things should be avoided in most scenarios.

Additionally, I know you are aware of this (there was an e-mail to the effect) in your case all 3 chart objects are the same reference. You don't actually need 3 of them. This is a function of the identity map.
Left by John Chapman on May 08, 2008 6:24 AM

# re: Loading a complex object graph

Requesting Gravatar...
Hi John,

There is a third strategy for loading your object graph: subselects. By using it, you avoid the N+1 query problem - in fact, you will have a separate query for each entity - and you work around the exponential growth in size due to the duplication of the "denormalized" data.

Subselect fetching works quite well in NHibernate and it's my personal favorite. But, as with so many problems, the the optimal solution highly depends on the situation.
Left by Tolomaüs on May 12, 2008 8:02 AM

Your comment:

 (will show your gravatar)
 
Please add 4 and 3 and type the answer here: