Linq to Object with Cross Join

It’s possible to perform a Cross Join with Linq. A Cross Join is a Cartesian product. It means that if between 2 sets of value all value will join one time the other set.

For example:

If we have one set with the letter A and B, and a set with C, D, E than the result should be : A-C, A-D, A-E, B-C, B-D, B-E. Let do it in code.

var x = new string[] { "A", "B" };
var y = new string[] { "C", "D", "E" };
var cross = from x1 in x
			from y1 in y
			select new {x1,y1};
foreach (var output in cross)
{
	Console.WriteLine(string.Format("{0} - {1}",output.x1, output.y1));
}
Console.Read();

This console application output:

CrossJoin's Result

CrossJoin's Result

To conclude, cross join are pretty straight forward with Linq. This example was done with Linq to Object but can be also be done with Linq to Sql query.

Linq GroupJoin and Join differences

Linq let you query a collection and one of its possible action is the GroupJoin.

A GroupJoin is in SQL what we call a “Left Outer JOIN” while a Join in SQL refer to “Inner Join”. In short, a GroupJoin will do a link between 2 entities even if the right side of the link has nothing to link to. In contrast, the Join will link 2 entities only if both entities contain a link between them.

To demonstrate the GroupJoin and the Join, a small example with Northwind Database will be used. This database schema and data is free at Microsoft. Once downloaded, this will create a SQL file that require to be executed. Once executed, you will see multiples tables. For our example, only the table Customers and Orders will be used.

Northwind's Customers and Orders schema

Northwind's Customers and Orders schema

The Customers can contain multiple Orders. The Order contain a link to the Customer.

We could have been used any type of project, but to be short I have used a ASP.NET Webform and have created a ASPX page with the containing HTML:

</pre>
<h1>Group EntityFramework</h1>
<pre>

On the code-behind the code is executed:


NorthwindEntities db = new NorthwindEntities();

var source = db.Customers.Where(c => c.CustomerID.StartsWith("A"))

			.GroupJoin(db.Orders, c => c.CustomerID, o => o.CustomerID, (c, o)

			 => new { Customer = c, Order = o });

string html = string.Empty;

foreach(var cust in source)

{

	html+="</pre>
<h2>"+cust.Customer.CompanyName+"</h2>
"; html+=""; foreach (var order in cust.Order) { html+=""; } html += "
<table>
<tbody>
<tr>
<th>Order id</th>
<th>Order date</th>
</tr>
<tr>
<td>" + order.OrderID + "</td>
<td>"+order.OrderDate+"</td>
</tr>
</tbody>
</table>
<pre>";

}

this.groupEF.Controls.Add(new LiteralControl(html));

sqlGroupef.Text = (source as ObjectQuery).ToTraceString();

Alright, so the LINQ query select all client that start with the letter A to filter a little bit the amount of returning data and then join to this result all orders. The second parameter of the GroupJoin is the “On” statement that we use to see in SQL. It tells how to join the two entities. Once it’s done, the code within the foreach statement loop all customers and the inner foreach loop and order.

The output look like this:

Join Statement

Linq Join without On

This is generated with almost the same code as before but instead of using GroupJoin only the Join is used. Also, since no grouping is done, the second foreach is not needed.

NorthwindEntities db = new NorthwindEntities();
var source = db.Customers.Where(c => c.CustomerID.StartsWith("A"))
			.Join(db.Orders, c => c.CustomerID, o => o.CustomerID, (c, o)
			 => new { Customer = c, Order = o });
string html = string.Empty;
foreach(var cust in source)
{
	html+="</pre>
<h2>"+cust.Customer.CompanyName+"</h2>
"; html+=""; html+=""; html+="
<table>
<tbody>
<tr>
<th>Order id</th>
<th>Order date</th>
</tr>
<tr>
<td>" + cust.Order.OrderID + "</td>
<td>"+ cust.Order.OrderDate+"</td>
</tr>
</tbody>
</table>
<pre>
";
}
this.groupEF.Controls.Add(new LiteralControl(html));

sqlGroupef.Text = (source as ObjectQuery).ToTraceString();

The output produce a cross join without any grouping. This is why we can see the Customer repeated in the output. If we want to group the result of the join without using GroupJoin it’s possible. In fact, it’s possible to create the same output with Join if using On statement and Equal the 2 entities together. Here is the change required :

NorthwindEntities db = new NorthwindEntities();
var source = from c in db.Customers
			 where c.CustomerID.StartsWith("A")
			 join o in db.Orders on c.CustomerID equals o.CustomerID into g
			 select new { Customer = c, Order = g };
string html = string.Empty;
foreach(var cust in source)
{

	html+="</pre>
<h2>"+cust.Customer.CompanyName+"</h2>
"; html+=""; foreach (var order in cust.Order) { html+=""; } html += "
<table>
<tbody>
<tr>
<th>Order id</th>
<th>Order date</th>
</tr>
<tr>
<td>" + order.OrderID + "</td>
<td>"+ order.OrderDate+"</td>
</tr>
</tbody>
</table>
<pre>
";

}
this.groupEF.Controls.Add(new LiteralControl(html));

sqlGroupef.Text = (source as ObjectQuery).ToTraceString();

To be sure that what we state is real, let compare the generated MSIL for the two Linq. The first one is the GroupJoin and the second one is the Join with a On and Into.

var source = db.Customers.Where(c => c.CustomerID.StartsWith("A"))
			.GroupJoin(db.Orders, c => c.CustomerID, o => o.CustomerID, (c, o) => new { Customer = c, Order = o });
var source2 = from c in db.Customers
			 where c.CustomerID.StartsWith("A")
			 join o in db.Orders on c.CustomerID equals o.CustomerID into g
			 select new { Customer = c, Order = g };

Both generate the exact same MSIL:

IQueryable arg_244_0 = db.Customers;
ParameterExpression parameterExpression = Expression.Parameter(typeof(Customers), "c");
IQueryable arg_381_0 = arg_244_0.Where(Expression.Lambda>(Expression.Call(Expression.Property(parameterExpression, (MethodInfo)MethodBase.GetMethodFromHandle(ldtoken(get_CustomerID()))), (MethodInfo)MethodBase.GetMethodFromHandle(ldtoken(StartsWith())), new Expression[]
{
	Expression.Constant("A", typeof(string))
}), new ParameterExpression[]
{
	parameterExpression
}));
IEnumerable arg_381_1 = db.Orders;
parameterExpression = Expression.Parameter(typeof(Customers), "c");
Expression> arg_381_2 = Expression.Lambda>(Expression.Property(parameterExpression, (MethodInfo)MethodBase.GetMethodFromHandle(ldtoken(get_CustomerID()))), new ParameterExpression[]
{
	parameterExpression
});
parameterExpression = Expression.Parameter(typeof(Orders), "o");
Expression> arg_381_3 = Expression.Lambda>(Expression.Property(parameterExpression, (MethodInfo)MethodBase.GetMethodFromHandle(ldtoken(get_CustomerID()))), new ParameterExpression[]
{
	parameterExpression
});
parameterExpression = Expression.Parameter(typeof(Customers), "c");
ParameterExpression parameterExpression2 = Expression.Parameter(typeof(IEnumerable), "o");
var source = arg_381_0.GroupJoin(arg_381_1, arg_381_2, arg_381_3, Expression.Lambda(Expression.New((ConstructorInfo)MethodBase.GetMethodFromHandle(ldtoken(.ctor()), typeof(<>f__AnonymousType0>).TypeHandle), new Expression[]
{
	parameterExpression,
	parameterExpression2
}, new MethodInfo[]
{
	(MethodInfo)MethodBase.GetMethodFromHandle(ldtoken(get_Customer()), typeof(<>f__AnonymousType0>).TypeHandle),
	(MethodInfo)MethodBase.GetMethodFromHandle(ldtoken(get_Order()), typeof(<>f__AnonymousType0>).TypeHandle)
}), new ParameterExpression[]
{
	parameterExpression,
	parameterExpression2
}));

The file compare result show exact same code:

MSIL are exactly the same for GroupJoin or Join

MSIL are exactly the same for GroupJoin or Join

To conclude, the GroupJoin is a short lamba version of the Join statement with the On and Into. It’s possible to do a simple Join with only Join statement. Linq let you do other type of Join like Cross Join.

Differences between Join, Inner Join, Left Join, Left Outer Join, Right Join, etc

Since few months I have notice that some people near of me still do their join between table in the Where clause. It works even if it has its limitations but the biggest problem come for the maintenance. Later, a big query become hard to modify because the criterion to refine the upcoming data are mixed up with joining where clauses.

To solve this problem, it’s simpler to use Join keyword for joining table between each of them. The Where clause is still important to refine the data. In this post, I will write the difference between all possible Join from a SQL perspective and write a little bit about their Linq syntax homologous. After all, we are a .Net blog!

In Sql, some statement are the same. The only difference is how it’s written.

Join and Inner Join

To begin with, a query with Join is the same thing as Inner Join. What is a Join or a Inner Join? It’s what people are doing with the Where clause, it compares two keys and join them only if a relationship exist. Here is a simple example with the Join done inside the Where clause:

SELECT EmployeeTerritories.EmployeeID AS 'ET.EMPID'
	,  EmployeeTerritories.TerritoryID AS 'ET.TID'
	,  Territories.TerritoryID AS 'T.TID'
FROM EmployeeTerritories, Territories
WHERE EmployeeTerritories.TerritoryID = Territories.TerritoryID
AND EmployeeTerritories.EmployeeID

These tables can be found in the Microsoft NorthWind database. The line 3 does the join when the line 4 refines the query. The database’s table schema below illustrate you the relationship between the table EmployeeTerritories and Territories.

Table schema for Territories

Tables Schema

This produce :

 
ET.EMPID|ET.TID|T.TID
---------------------
1	06897	06897
1	19713	19713
2	01581	01581
2	01730	01730
2	01833	01833
2	02116	02116
2	02139	02139
2	02184	02184
2	40222	40222
3	30346	30346
3	31406	31406
3	32859	32859
3	33607	33607

The same result is output using the Join keyword or the Inner Join.

SELECT EmployeeTerritories.EmployeeID AS 'ET.EMPID'
	,  EmployeeTerritories.TerritoryID AS 'ET.TID'
	,  Territories.TerritoryID AS 'T.TID'
FROM EmployeeTerritories
JOIN Territories
ON Territories.TerritoryID = EmployeeTerritories.TerritoryID
WHERE EmployeeTerritories.EmployeeID <= 3

Or

SELECT EmployeeTerritories.EmployeeID AS 'ET.EMPID'
	,  EmployeeTerritories.TerritoryID AS 'ET.TID'
	,  Territories.TerritoryID AS 'T.TID'
FROM EmployeeTerritories
INNER JOIN Territories
ON Territories.TerritoryID = EmployeeTerritories.TerritoryID
WHERE EmployeeTerritories.EmployeeID <= 3

Left Join and Left Outer Join

These two are also the same. In fact, the Outer keywords is implicit everytime in a SQL query.

The following query return exactly the same as the Join but it has a reason why.

SELECT EmployeeTerritories.EmployeeID AS 'ET.EMPID'
	,  EmployeeTerritories.TerritoryID AS 'ET.TID'
	,  Territories.TerritoryID AS 'T.TID'
FROM EmployeeTerritories
LEFT JOIN Territories
ON Territories.TerritoryID = EmployeeTerritories.TerritoryID
WHERE EmployeeTerritories.EmployeeID <= 3

The reason is that all EmployeeTerritories have their territories defined in the Territories table. Let say that in the future a territory is deleted. The territory 19713 is deleted but not in EmployeeTerritories table. The result will be :

ET.EMPID|ET.TID|T.TID
---------------------
1	06897	06897
1	19713	NULL
2	01581	01581
2	01730	01730
2	01833	01833
2	02116	02116
2	02139	02139
2	02184	02184
2	40222	40222
3	30346	30346
3	31406	31406
3	32859	32859
3	33607	33607

You can see that the T.TID return NULL. Left Join or Left Inner Join let you return all data from the Where criterion and try to Join. If data is found, it will output them, otherwise, it will return NULL. This is different from Inner Join or Join that would have simply not return this row.

Right Join or Right Outer Join

The Right Join statement also known as Right Outer Join is the mirror of the Left Outer Join. That mean that all results are from from the Joined table and then the join is try. If no data is found, NULL is returned.

Cross Join

Cross Join is a Cartesian product. That mean that all data from the left side and join them to the right. The previous example with the Where statement that is query to ‘<=3' has 3 EmployeeTerritories. The Join Territories table contains 53 territories. If we execute a Group Statement to know how many Territories the EmployeeTerritories have per EmployeeID this will give you the number to multiply to 53 to know the amount or row returned. [sql] SELECT EmployeeID, count(TerritoryID) FROM EmployeeTerritories WHERE EmployeeTerritories.EmployeeID <= 3 GROUP BY EmployeeID [/sql]

EmployeeID|count(TerritoryID)
——————————
1	    2
2	    7
3	    4

This give 13 totals amount of Employees/Territories that will be Cross Joined to 53 territories. For a total of 689 rows. Here is the Cross Join query:

SELECT distinct(EmployeeTerritories.EmployeeID) AS 'ET.EMPID'
	,  EmployeeTerritories.TerritoryID AS 'ET.TID'
	,  Territories.TerritoryID AS 'T.TID'
FROM EmployeeTerritories
CROSS JOIN Territories
WHERE EmployeeTerritories.EmployeeID <=3

As you can see, the CROSS JOIN does not require to have the ON keyword because all rows are joined.

Full Join and Full Outer Join

This join is combining the Left Join and Right Join. In fact, even some database does not support Full Join (like SqlLite) but Microsoft SQL Server does.

The same query modified for Full Join return 13 rows also.

SELECT EmployeeTerritories.EmployeeID AS 'ET.EMPID'
	,  EmployeeTerritories.TerritoryID AS 'ET.TID'
	,  Territories.TerritoryID AS 'T.TID'
FROM EmployeeTerritories
FULL JOIN Territories
ON Territories.TerritoryID = EmployeeTerritories.TerritoryID 
WHERE EmployeeTerritories.EmployeeID <= 3

This return the same thing because as the Left and Right example, it had always a joined value on the other table. But with a Full Join, it can be possible to see returned value with NULL in the left or in the right column. Result could be like this:

1    - 2
NULL - 3
4    - NULL

Conclusion

All those Joins can be executed in Linq. Most of them have a similar syntax except the Full Join that is more tricky. In an other post, default will be provided for all those joins.

ILSpy is an open-source .NET assembly browser and decompiler that is free

Few years ago, Lutz Roeder has developed an assembly decompiler that was very popular. So much popular that Red Gate Software has bought the software and since than it’s not anymore free.

Good news, a free open source alternative exist. This alternative is ILSpy.

IlSpy ScreenShot

Screenshot of IlSpy in action

This tool let you select an assembly and it will decompile it. This can also work with Microsoft Framework assembly. To select one of the .Net Framework, you need to load the desired DLL of from : C:\Windows\Microsoft.NET\Framework\v4.0.30319 for .Net 4 framework DLL.

For framework 2.0, the location is : C:\Windows\Microsoft.NET\Framework\v2.0.50727. It’s also possible to get 64 bits version of the code. 64 bits .Net DLL are located to the directory C:\Windows\Microsoft.NET\Framework64\.

Last thing, this tool let you also see the IL code. IL Disassembly mean Intermediary Language. It’s also known as MSIL for MicroSoft Intermediary Language. Seeing the IL is interesting to see how the C# compiler has produce code that will be read by the Virtual Machine later. This is a good way to optimized query by comparing the produced code.

For your personal information, Microsoft .Net Framework come with a tool called : Ildasm.exe. This tool can be used as a command line or with its graphical interface.

ILAsm tool

ILASM tool

 

How to log with Entity Framework 4.0?

Logging with Entity Framework is not the same as logging with Linq to Sql. First, you need to have a .edx file with at least one entity declared on it because to be able to log, a query must be executed. Like the Linq to Sql, to have a log the query needs to be run and for that it needs to be executed. Here is a snippet of code showing how to log with Entity Framework.

NorthwindEntities db = new NorthwindEntities();
var source = db.Customers.Where(c => c.CustomerID.StartsWith("A"));
this.gridEF.DataSource = source;
this.gridEF.DataBind();
sqlEF.Text = (source as ObjectQuery).ToTraceString();

The first line contain the Entity Framework ObjectContext that need to be created first. This is done by adding with Visual Studio a new Item of ADO.NET Entity Data Model. To get the log file, after the execution a call to ToTraceString() is required. This require a cast to ObjectQuery to the Query. The reason is that the Where clause return a IQueryable. The ObjectQuery inherit the IQueryable and also a lot other interface :

public class ObjectQuery<T> : ObjectQuery,
	IOrderedQueryable<T>, IQueryable<T>, IEnumerable<T>, IOrderedQueryable, IQueryable,
	IEnumerable, IListSource

The ToTraceString() come from the inherited class ObjectQuery.

The output of ToTraceString() is the SQL statement executed. Here is an example:

SELECT [Extent1].[CustomerID] AS [CustomerID], [Extent1].[CompanyName] AS [CompanyName]
FROM [dbo].[Customers] AS [Extent1]
WHERE [Extent1].[CustomerID] LIKE N'A%'

This is interesting if we compare the generated Sql from the Entity Framework and from the Linq to Sql. The  main difference is that Linq to SQL will use parameters substitution instead of directly add the value to the query. You can see it in the Where clause of the Sql query. The Linq to Sql code looks like:

WHERE [t0].[CustomerID] LIKE @p0

And the Entity Framework looks like:

WHERE [Extent1].[CustomerID] LIKE N'A%'

To conclude, if you want to log Entity Framework generated Sql Query do not forget to add a condition statement when you move into your production environment.

How to log with Linq to Sql?

It’s possible to log all SQL Query generated by Linq to Sql with C# code. This can be useful when debugging or to optimize how the Linq is done.

DataClasses1DataContext db = new DataClasses1DataContext();
StringWriter sw = new StringWriter();
db.Log = sw;

this.gridLinqToSql.DataSource = db.Customers.Where(c => c.CustomerID.StartsWith("A"));
this.gridLinqToSql.DataBind(); //Here is when the Linq query will be executed.

sqlLinqToSql.Text = sw.GetStringBuilder().ToString();

Linq to Sql use a DataContext object that is generated with the .dbml file. For example, in this test project, the .dbml file name is DataClasses1.dbml. From there, this object contain a Log property that takes a TextWriter. The TextWriter class come from the System.IO namespace and it’s an abstract class. The namespace contain two concrete classes that inherit from this class. The first one is StringWriter that has been used in the example above. This class has a StringBuilder and when using the property GetStringBuilder() of the StringWriter let you get a reference on it. From there, using the ToString() method give a String with all the SQL used to get the information.

The result will look like this:

SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle]
FROM [dbo].[Customers] AS [t0]
WHERE [t0].[CustomerID] LIKE @p0

-- @p0: Input NVarChar (Size = 4000; Prec = 0; Scale = 0) [A%]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 4.0.30319.1

The Log return the SQL statement (or many SQL statement if many calls has been done) with a parameter way (@p…). After the SQL Statement, each parameter details are written like the type, and the size of the SQL parameter.

To conclude, using the Log property is interesting to get the SQL without using a SQL Profiler. Of course, a good practice would be to not log all the queries in production environment.