Linq GroupJoin and Join differences<!-- --> | <!-- -->Patrick Desjardins Blog
Patrick Desjardins Blog
Patrick Desjardins picture from a conference

Linq GroupJoin and Join differences

Posted on: August 29, 2011

Linq let you query a collection and one of its possible action is the GroupJoin.

A GroupJoin is in SQL what we call a "Left Outer JOIN" while a Join in SQL refer to "Inner Join". In short, a GroupJoin will do a link between 2 entities even if the right side of the link has nothing to link to. In contrast, the Join will link 2 entities only if both entities contain a link between them.

To demonstrate the GroupJoin and the Join, a small example with Northwind Database will be used. This database schema and data is free at Microsoft. Once downloaded, this will create a SQL file that require to be executed. Once executed, you will see multiples tables. For our example, only the table Customers and Orders will be used.

customersandordersNorthwind 257x300

The Customers can contain multiple Orders. The Order contain a link to the Customer.

We could have been used any type of project, but to be short I have used a ASP.NET Webform and have created a ASPX page with the containing HTML:

2<h1>Group EntityFramework</h1>

On the code-behind the code is executed:

1NorthwindEntities db = new NorthwindEntities();
3var source = db.Customers.Where(c => c.CustomerID.StartsWith("A"))
4 .GroupJoin(db.Orders, c => c.CustomerID, o => o.CustomerID, (c, o) => new { Customer = c, Order = o });
6string html = string.Empty;
7foreach(var cust in source){
8 html+="</pre> <h2>"+cust.Customer.CompanyName+"</h2> ";
9 html+="";
10 foreach (var order in cust.Order) {
11 html+="";
12 }
13 html += " <table> <tbody> <tr> <th>Order id</th> <th>Order date</th> </tr> <tr> <td>" + order.OrderID + "</td> <td>"+order.OrderDate+"</td> </tr> </tbody> </table> <pre>";
16this.groupEF.Controls.Add(new LiteralControl(html));
18sqlGroupef.Text = (source as ObjectQuery).ToTraceString();

Alright, so the LINQ query select all client that start with the letter A to filter a little bit the amount of returning data and then join to this result all orders. The second parameter of the GroupJoin is the "On" statement that we use to see in SQL. It tells how to join the two entities. Once it's done, the code within the foreach statement loop all customers and the inner foreach loop and order.

The output look like this:

join 139x300

This is generated with almost the same code as before but instead of using GroupJoin only the Join is used. Also, since no grouping is done, the second foreach is not needed.

1NorthwindEntities db = new NorthwindEntities();
2var source = db.Customers.Where(c => c.CustomerID.StartsWith("A"))
3 .Join(db.Orders, c => c.CustomerID, o => o.CustomerID, (c, o) => new { Customer = c, Order = o });
4string html = string.Empty;
5foreach(var cust in source) {
6 html+="</pre> <h2>"+cust.Customer.CompanyName+"</h2> ";
7 html+="";
8 html+="";
9 html+=" <table> <tbody> <tr> <th>Order id</th> <th>Order date</th> </tr> <tr> <td>" + cust.Order.OrderID + "</td> <td>"+ cust.Order.OrderDate+"</td> </tr> </tbody> </table> <pre> ";
11this.groupEF.Controls.Add(new LiteralControl(html));
12sqlGroupef.Text = (source as ObjectQuery).ToTraceString();

The output produce a cross join without any grouping. This is why we can see the Customer repeated in the output. If we want to group the result of the join without using GroupJoin it's possible. In fact, it's possible to create the same output with Join if using On statement and Equal the 2 entities together. Here is the change required :

1NorthwindEntities db = new NorthwindEntities();
2var source = from c in db.Customers where c.CustomerID.StartsWith("A") join o in db.Orders on c.CustomerID equals o.CustomerID into g select new { Customer = c, Order = g };
3string html = string.Empty; foreach(var cust in source) {
4 html+="</pre> <h2>"+cust.Customer.CompanyName+"</h2> ";
5 html+="";
6 foreach (var order in cust.Order) {
7 html+="";
8 }
9 html += " <table> <tbody> <tr> <th>Order id</th> <th>Order date</th> </tr> <tr> <td>" + order.OrderID + "</td> <td>"+ order.OrderDate+"</td> </tr> </tbody> </table> <pre> ";
11this.groupEF.Controls.Add(new LiteralControl(html));
13sqlGroupef.Text = (source as ObjectQuery).ToTraceString();

To be sure that what we state is real, let compare the generated MSIL for the two Linq. The first one is the GroupJoin and the second one is the Join with a On and Into.

1var source = db.Customers.Where(c => c.CustomerID.StartsWith("A"))
2 .GroupJoin(db.Orders, c => c.CustomerID, o => o.CustomerID, (c, o) => new { Customer = c, Order = o });
1var source2 = from c in db.Customers
2 where c.CustomerID.StartsWith("A")
3 join o in db.Orders on c.CustomerID equals o.CustomerID
4 into g select new { Customer = c, Order = g };

Both generate the exact same MSIL:

1IQueryable arg_244_0 = db.Customers;
2 ParameterExpression parameterExpression = Expression.Parameter(typeof(Customers), "c");
3 IQueryable arg_381_0 = arg_244_0.Where(Expression.Lambda>(Expression.Call(Expression.Property(parameterExpression, (MethodInfo)MethodBase.GetMethodFromHandle(ldtoken(get_CustomerID()))), (MethodInfo)MethodBase.GetMethodFromHandle(ldtoken(StartsWith())), new Expression[] { Expression.Constant("A", typeof(string)) }), new ParameterExpression[] { parameterExpression })); IEnumerable arg_381_1 = db.Orders;
4 parameterExpression = Expression.Parameter(typeof(Customers), "c");
5 Expression> arg_381_2 = Expression.Lambda>(Expression.Property(parameterExpression, (MethodInfo)MethodBase.GetMethodFromHandle(ldtoken(get_CustomerID()))), new ParameterExpression[] { parameterExpression });
6 parameterExpression = Expression.Parameter(typeof(Orders), "o");
7 Expression> arg_381_3 = Expression.Lambda>(Expression.Property(parameterExpression, (MethodInfo)MethodBase.GetMethodFromHandle(ldtoken(get_CustomerID()))), new ParameterExpression[] { parameterExpression });
8 parameterExpression = Expression.Parameter(typeof(Customers), "c");
9 ParameterExpression parameterExpression2 = Expression.Parameter(typeof(IEnumerable), "o");
10 var source = arg_381_0.GroupJoin(arg_381_1, arg_381_2, arg_381_3, Expression.Lambda(Expression.New((ConstructorInfo)MethodBase.GetMethodFromHandle(ldtoken(.ctor()), typeof(<>f__AnonymousType0>).TypeHandle), new Expression[] { parameterExpression, parameterExpression2 }, new MethodInfo[] { (MethodInfo)MethodBase.GetMethodFromHandle(ldtoken(get_Customer()), typeof(<>f__AnonymousType0>).TypeHandle), (MethodInfo)MethodBase.GetMethodFromHandle(ldtoken(get_Order()), typeof(<>f__AnonymousType0>).TypeHandle) }), new ParameterExpression[] { parameterExpression, parameterExpression2 }));

The file compare result show exact same code:

filematch 300x164

To conclude, the GroupJoin is a short lamba version of the Join statement with the On and Into. It's possible to do a simple Join with only Join statement. Linq let you do other type of Join like Cross Join.