Submitted by BillB on 4/21/2010
(If you have a minute, please add a comment when you're done so I know if I'm presenting this well enough.)
Extension Methods and LINQ
Contents
- Extension Method Syntax
- Chained Calls and a Fluent Interface
- Query Syntax
- Query Syntax Transformed into Extension Method Calls
- Extension Methods on IEnumerable
- Yield and Deferred Execution
- Anonymous Types and Var
- Play Time
Extension methods in C# 3.0 aren't just about extending the functionality of other types, though in that role they add much to C#. They're also about LINQ, where they lie beneath the query syntax and allow the piping, aka chaining, of the result of one operation into another. I learned about this from Jon Skeet's great book, C# in Depth. As I understand it, the compiler "transforms" (Skeet's word) SQL-like syntax, called query syntax, into corresponding extension method calls on whatever list of data is being worked with. (Note that I'm talking about LINQ to Objects, as opposed to LINQ to SQL.) The data must be of a type that implements the IEnumerable<T> interface.
I'll start with a quick review of extension methods and throw in the concept of fluent interfaces (just because I thought the concept was interesting and, after all, LINQ is one). Then I'll show a little query syntax and how it's transformed into extension method calls on IEnumerable<T>.
Extension Method Syntax 
The type that the extension method operates on is indicated by the first parameter, which is marked with the this keyword. So you won't find an extension method with no parameters; there has to be at least one. Also, the method has to be in a static class that's non-generic and non-nested. Since it's in a static class, the method itself also has to be static. The return type can be anything. There are a few other requirements I won't bore you with.
Here's an extension method on the int type, which doubles the int it operates on:
public static class IntExtensions
{
public static int DoubleMe(this int i)
{
return (i * 2);
}
}
You call the extension method as if it were an instance method. You don't supply an argument for the first parameter yourself; the value to the left of the dot is passed as that argument.
// On an int variable
int myInteger = 2;
int myIntDoubled = myInteger.DoubleMe();
// Directly on an int
int m = 4.DoubleMe();
// Like a regular static method
int n = IntExtensions.DoubleMe(6);
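An extension method can take more parameters after the first one. Here's one (it goes in the same static class) that adds a value to the int it operates on:
public static int AddThisToMe(this int i, int toBeAdded)
{
return (i + toBeAdded);
}
You pass only the extra argument when you call it:
int myIntegerPlus9 = myInteger.AddThisToMe(9);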
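Since an extension method call is really a static method call with the receiver passed as the first argument, it can even be made on a null reference. Here's a minimal sketch of that; StringExtensions and SafeLength are names I made up for the illustration, not anything from the framework.

```csharp
using System;

public static class StringExtensions
{
    // Hypothetical helper for this sketch: returns 0 for a null string
    // instead of throwing.
    public static int SafeLength(this string s)
    {
        return s == null ? 0 : s.Length;
    }
}

public static class Program
{
    public static void Main()
    {
        string name = null;

        // The compiler turns this into StringExtensions.SafeLength(name),
        // so no NullReferenceException is thrown.
        Console.WriteLine(name.SafeLength());                    // prints 0

        Console.WriteLine("Skeet".SafeLength());                 // prints 5

        // The same method called like a regular static method.
        Console.WriteLine(StringExtensions.SafeLength("LINQ"));  // prints 4
    }
}
```

A real instance method call on name would have thrown; the extension method gets the chance to check for null first.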
Chained Calls and a Fluent Interface 
Remember I said that the return value on an extension method can be anything? That's true but if you want to be able to chain your extension method calls, the return value has to be of the type that the next chained call will operate on.
Since our DoubleMe method returns an int and operates on an int, calls to DoubleMe can be chained to one another:
int myIntegerTimes8 = myInteger.DoubleMe().DoubleMe().DoubleMe();
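The links in a chain don't all have to work on the same type; each call just has to return whatever type the next call operates on. Here's a small sketch of that. ToLabel and Shout are hypothetical methods I added for the illustration.

```csharp
using System;

public static class ChainExtensions
{
    // int -> int, so it can be followed by another int extension method
    public static int DoubleMe(this int i)
    {
        return i * 2;
    }

    // int -> string (hypothetical), so the next call must operate on a string
    public static string ToLabel(this int i)
    {
        return "Value: " + i;
    }

    // string -> string (hypothetical)
    public static string Shout(this string s)
    {
        return s.ToUpperInvariant() + "!";
    }
}

public static class Program
{
    public static void Main()
    {
        // int -> int -> int -> string -> string
        string result = 4.DoubleMe().DoubleMe().ToLabel().Shout();
        Console.WriteLine(result); // prints VALUE: 16!
    }
}
```

Swapping the order, as in 4.ToLabel().DoubleMe(), wouldn't compile, because ToLabel returns a string and DoubleMe is only defined for int.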
There's a concept called a fluent interface, which is an API designed so that the code using it resembles natural language (English), making it easy for people to read. Let's say you want to code a system to convert lengths between different measuring systems: inches to centimeters, meters to miles, etc. Using extension methods, you might come up with something like this. (The numbers are of type double, so the first dot, as in 10.0, is part of the value, not a dot operator. The dot in .inches IS a dot operator.)
double tenInchesInCentimeters = 10.0.inches().inCentimeters();
// or
double tenMetersInMiles = 10.0.meters().inMiles();
Here's a small working version with only the inches-to-centimeters path filled in:
using System;
namespace LengthConversionExtensionMethods
{
class Program
{
static void Main(string[] args)
{
double myNumber = 10;
double myNumberInCentimeters = myNumber.inches().inCentimeters();
Console.WriteLine("My number in cm: " + myNumberInCentimeters);
// And I can do this:
Console.WriteLine("20.5 inches in cm: " + 20.5.inches().inCentimeters());
Console.ReadKey();
}
}
public struct LengthValue
{
public double value;
public string units;
public string system;
}
public static class LengthConversions
{
public static LengthValue inches(this double numInches)
{
LengthValue val = new LengthValue();
val.value = numInches;
val.system = "English";
val.units = "inches";
return val;
}
public static double inCentimeters(this LengthValue value)
{
double returnVal = 0;
if (value.system == "English")
{
if (value.units == "inches")
{
returnVal = value.value * 2.54;
}
}
// Other systems and units would be handled here.
return returnVal;
}
}
}
Kind of dumb but you get the idea. This was just an experiment for me. The point is that you can build a fluent interface out of extension methods. That's exactly what LINQ is. But LINQ goes one step better with query syntax, because all those dots and parentheses detract from fluency. Still, it's kind of cool to be able to create your own fluent interface; it's almost a language in itself (a Domain Specific Language, or DSL). Let's look at some LINQ syntax and then we'll see how the compiler transforms it into calls to the extension methods found in the Enumerable class.
Query Syntax 
The SQL language for relational databases is an example of a fluent interface, and LINQ to Objects was designed with SQL very much in mind. Here's some SQL:
Select CustomerName From Customers Where CustomerCity = 'Seattle'
It looks a lot like an English sentence. You can see why Microsoft used ANSI SQL as a model for C#'s LINQ syntax; they wanted to bridge the gap between C# and SQL (the so-called impedance mismatch) AND provide a common language for dealing with data no matter where it's stored or in what format. Here's an example of LINQ to Objects, where we're selecting from a list of Person objects all the objects where the person's age is over 70.
IEnumerable<Person> olderPeople = from person in people
where person.Age > 70
select person;
Console.WriteLine("\r\n----- People Going Strong -----");
foreach (Person person in olderPeople)
{
Console.WriteLine(person.ToString());
}
The pieces of that query have names:
- from - a query keyword; every query expression starts with one
- from person in people - a from clause
- people - data source
- person - range variable (you can add more variables with a let clause)
Query Syntax Transformed into Extension Method Calls 
Here's where extension methods come in. The compiler makes a first pass during which query syntax is transformed into LINQ extension method calls. Then the extension method calls created by the first pass are compiled as if you had coded them directly.
Here's an example.
IEnumerable<Person> olderPeople = from p in people
where p.Age > 70
select p;
Console.WriteLine("\r\n----- Older People -----");
foreach (Person p in olderPeople)
{
Console.WriteLine(p.ToString());
}
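Here's the same query written directly as chained extension method calls, which is the form the compiler transforms query syntax into:
IEnumerable<Person> olderPeople2 = people
.Where(p => p.Age > 70)
.Select(p => p);
// The same Select, using an anonymous method instead of a lambda:
//.Select( delegate (Person p) { return p; } );
(One wrinkle: when a select just returns the range variable and there's a where in front of it, the compiler drops that Select call, so the real translation of this particular query is people.Where(p => p.Age > 70). I've written the Select out to show how the calls chain.)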
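To convince yourself that the two forms are interchangeable, you can run both and compare the results. This is a minimal sketch using plain ints for the ages instead of Person objects; the QueryForms class and its method names are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryForms
{
    // Query syntax
    public static IEnumerable<int> QuerySyntax(IEnumerable<int> ages)
    {
        return from age in ages
               where age > 70
               select age * 2;
    }

    // The same query written as chained extension method calls
    public static IEnumerable<int> MethodSyntax(IEnumerable<int> ages)
    {
        return ages.Where(age => age > 70).Select(age => age * 2);
    }
}

public static class Program
{
    public static void Main()
    {
        int[] ages = { 50, 60, 72, 83 };

        Console.WriteLine(string.Join(", ", QueryForms.QuerySyntax(ages)));   // prints 144, 166
        Console.WriteLine(string.Join(", ", QueryForms.MethodSyntax(ages)));  // prints 144, 166

        // SequenceEqual compares the two results item by item.
        Console.WriteLine(
            QueryForms.QuerySyntax(ages).SequenceEqual(QueryForms.MethodSyntax(ages))); // prints True
    }
}
```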
Time to look at a few of these extension methods and make sense of these arguments, like p => p. I have another article on delegates if you need more background.
Extension Methods on IEnumerable 

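The LINQ to Objects extension methods on IEnumerable<T> are defined in the static Enumerable class in the System.Linq namespace.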
Here's the Where extension method:
public static IEnumerable<TSource> Where<TSource>(
this IEnumerable<TSource> source,
Func<TSource, bool> predicate
)
It returns an IEnumerable<TSource> and takes a Func<TSource, bool>, which is a delegate that takes one argument and returns a bool. There's a delegate type with the same shape already in .NET, System.Predicate<T>, that could have been used too, but MSDN shows Func<TSource, bool> instead. (See this post on StackOverflow for why.) The example returns people over 70, so the predicate is a check for p.Age > 70.
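Here's a rough sketch of the different ways you can come up with that Func<TSource, bool> argument, again using plain ints for the ages. PredicateForms, IsOver70 and Run are names I invented for the example. It also shows that a Predicate<int> variable can't be handed to Where directly, even though its shape matches; it has to be wrapped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PredicateForms
{
    // A named method whose signature matches Func<int, bool>
    public static bool IsOver70(int age)
    {
        return age > 70;
    }

    // Runs Where with whatever predicate it's given and formats the result
    public static string Run(IEnumerable<int> ages, Func<int, bool> predicate)
    {
        return string.Join(", ", ages.Where(predicate));
    }
}

public static class Program
{
    public static void Main()
    {
        int[] ages = { 50, 60, 72, 83 };

        Func<int, bool> lambda = age => age > 70;
        Func<int, bool> anonymous = delegate(int age) { return age > 70; };
        Func<int, bool> methodGroup = PredicateForms.IsOver70;

        Console.WriteLine(PredicateForms.Run(ages, lambda));       // prints 72, 83
        Console.WriteLine(PredicateForms.Run(ages, anonymous));    // prints 72, 83
        Console.WriteLine(PredicateForms.Run(ages, methodGroup));  // prints 72, 83

        // Predicate<int> is a different delegate type, so ages.Where(pred)
        // won't compile. Wrapping it in a lambda works.
        Predicate<int> pred = age => age > 70;
        Console.WriteLine(PredicateForms.Run(ages, age => pred(age))); // prints 72, 83
    }
}
```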
Here's one overload of Select():
public static IEnumerable<TResult> Select<TSource, TResult>(
this IEnumerable<TSource> source,
Func<TSource, TResult> selector
)
It returns an IEnumerable<TResult> and takes a Func<TSource, TResult>, which is a delegate that takes one argument and returns something. The example returns Person objects, so it takes a Person object and returns the same Person object. If we just wanted names back, it would have been p => p.Name.
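Here's what the names-only version might look like as a small runnable sketch. The Person class is a cut-down copy of the one used in this article, and Projections.NamesOver is a hypothetical helper. In this call TSource is Person and TResult is string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class Projections
{
    // Where filters the Person objects; Select turns each remaining
    // Person into a string (TSource = Person, TResult = string).
    public static List<string> NamesOver(IEnumerable<Person> people, int age)
    {
        return people
            .Where(p => p.Age > age)
            .Select(p => p.Name)
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { Name = "Mary", Age = 50 },
            new Person { Name = "Ken", Age = 60 },
            new Person { Name = "Fred", Age = 72 },
            new Person { Name = "Marge", Age = 83 }
        };

        Console.WriteLine(string.Join(", ", Projections.NamesOver(people, 70))); // prints Fred, Marge
    }
}
```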
Yield and Deferred Execution 
The IEnumerable<Person> olderPeople doesn't contain the results after the LINQ query statement has run; it only describes the query. It's the foreach that actually starts producing data, one item at a time, as the query is iterated over. Operators like Where and Select can work this way because they behave like iterators written with the C# 2.0 yield keyword. Deferred execution helps with queries on massive collections because items stream through the chain without an intermediate copy of the results being built, and no work is done for items you never ask for.
You can imagine that the efficiency of a query depends on the query itself. A simple where/select goes through the collection once, returning (yielding) a value for each matching item as it reads it. An OrderBy has to read the whole collection before it can return anything. Often, temporary data structures are created in memory to process the query.
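To see deferred execution for yourself, you can write your own simplified version of Where with yield and log what happens when. This is only a sketch of the idea, not the framework's actual implementation; MyLinq, MyWhere and Log are names I made up.

```csharp
using System;
using System.Collections.Generic;

public static class MyLinq
{
    // Records what happens and in what order
    public static List<string> Log = new List<string>();

    // A simplified stand-in for Enumerable.Where (hypothetical name).
    // Because it contains yield return, none of this body runs until
    // somebody starts iterating over the result.
    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        Log.Add("MyWhere started");
        foreach (T item in source)
        {
            if (predicate(item))
            {
                Log.Add("yielding " + item);
                yield return item;
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        int[] ages = { 50, 60, 72, 83 };

        // Building the query doesn't run it.
        IEnumerable<int> older = ages.MyWhere(a => a > 70);
        Console.WriteLine("Log entries after building the query: " + MyLinq.Log.Count); // prints 0

        // The foreach pulls items through one at a time.
        foreach (int a in older)
        {
            MyLinq.Log.Add("loop got " + a);
        }

        foreach (string line in MyLinq.Log)
        {
            Console.WriteLine(line);
        }
        // Prints:
        // MyWhere started
        // yielding 72
        // loop got 72
        // yielding 83
        // loop got 83
    }
}
```

Notice how the "yielding" and "loop got" lines alternate: each item is handed to the loop as soon as it's found, rather than the whole filtered list being built first.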
Anonymous Types and Var 
What if you want to return something that's not really part of the data you're querying? This is where anonymous types are handy.
If you declare a local variable with var, the compiler works out its type from the expression you assign to it. Here I declare the variable allPeople using var instead of IEnumerable<Person>:
var allPeople = from p in people
select p;
foreach (var p in allPeople)
{
Console.WriteLine(p.ToString());
}
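The compiler gives allPeople the type IEnumerable<Person>, so var only saves some typing there. With an anonymous type there's no type name you could write, so var is the only way to declare the variable:
var results = from p in people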
select new { Age_Doubled = (p.Age * 2) };
foreach (var thingy in results)
{
Console.WriteLine(thingy.ToString());
}
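// Output (with the four people from the Play Time listing below):
// { Age_Doubled = 100 }
// { Age_Doubled = 120 }
// { Age_Doubled = 144 }
// { Age_Doubled = 166 }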
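An anonymous type can carry more than one property, and each one is accessible by name inside the method where the query lives. Here's a minimal sketch; Person is a cut-down copy of the class used in this article, and Summaries.Describe is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class Summaries
{
    public static List<string> Describe(IEnumerable<Person> people)
    {
        // p.Name becomes a property called Name; AgeDoubled is named explicitly.
        var summaries = from p in people
                        where p.Age > 70
                        select new { p.Name, AgeDoubled = p.Age * 2 };

        var lines = new List<string>();
        foreach (var s in summaries)
        {
            // The properties are strongly typed, even though the type has no name.
            lines.Add(s.Name + " -> " + s.AgeDoubled);
        }
        return lines;
    }
}

public static class Program
{
    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { Name = "Mary", Age = 50 },
            new Person { Name = "Ken", Age = 60 },
            new Person { Name = "Fred", Age = 72 },
            new Person { Name = "Marge", Age = 83 }
        };

        foreach (string line in Summaries.Describe(people))
        {
            Console.WriteLine(line);
        }
        // Prints:
        // Fred -> 144
        // Marge -> 166
    }
}
```

The helper returns plain strings because an anonymous type can't be named in a method signature; it's most useful inside the method that creates it.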
Play Time
Here's a little console app I started playing with to solidify the concepts for myself. You can copy and paste it into a console app and start messing with it to get a feel for this stuff. It's pretty much what you've already seen above but in a form that's easily pasted.
The code takes the following course:
- Basic Extension method example
- Extension methods chained
- Basic LINQ query syntax example
- Why var can be handy (Just threw this in for kicks)
- Transforming the query syntax to regular C# extension method calls
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ExtensionMethods2
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine("====== A Quick Extension Method Example ======");
int myInteger = 2;
// Extension method called as if it were an instance method on the type int
Console.WriteLine("Doubling " + myInteger + " gives " + myInteger.DoubleMe());
// Now let's chain the results.
// Chaining can be done because DoubleMe() returns an int. And
// remember that extension methods are called as if they were instance methods
// on the type they're extending. So, each part of the expression below that
// is to the left of the dot is an int and each DoubleMe call is an
// extension method call on an int.
// So, myInteger is an int. myInteger.DoubleMe() returns an int, so
// myInteger.DoubleMe().DoubleMe() works just fine. And so on.
int c = myInteger.DoubleMe().DoubleMe().DoubleMe();
Console.WriteLine("3 DoubleMe's chained : " + c);
// Here's a call to a regular static method, not an extension method,
// just to show that you can't chain regular static methods with dots.
int d = DoubleMeStatic(2);
Console.WriteLine("DoubleMeStatic - can't chain a regular static method: " + d);
// You can only nest the calls: DoubleMeStatic(DoubleMeStatic(2)).
// This chained form won't compile:
//int e = DoubleMeStatic(2).DoubleMeStatic(2);
//*********************************************************//
// Code to play with LINQ query syntax and how the compiler
// transforms it into regular looking C# code, switching
// out the query words like select for the extension method names
// like Enumerable.Select.
//*********************************************************//
Console.WriteLine
("\r\n====== LINQ Query Syntax Transformed into Extension Methods ======");
// First, we need some data to work with.
// Near the bottom of this listing you'll find a Person class
// with a GetPeople() method that returns a List of Person
// objects.
//
// The List class implements IEnumerable<T>, as you can see here:
// public class List<T> : IList<T>, ICollection<T>,
// IEnumerable<T>, IList, ICollection, IEnumerable
// Implementing IEnumerable<T> is important because the LINQ extension
// methods (defined in the Enumerable class) operate on IEnumerable<T>.
// Before we get to the extension methods, however, let's just look at some
// basic query syntax.
// First, get the data we'll be working with
List<Person> people = Person.GetPeople();
// Now we need an example query to start off with.
// Here, we're grabbing all the person objects from the people List
IEnumerable<Person> allPeople = from p in people
select p;
// List the people with a foreach loop
Console.WriteLine("\r\n----- All People -----");
foreach (Person p in allPeople)
{
Console.WriteLine(p.ToString());
}
// Var
var allPeople2 = from p in people
select p;
Console.WriteLine("\r\n----- All People with var -----");
foreach (var p in allPeople2)
{
Console.WriteLine(p.ToString());
}
// No big deal, right? I don't even like the looks of var in statically
// typed code. BUT wait...
// Here's why var comes in handy - when you're returning
// some brand new object from the query, maybe something you
// derive from the Person object.
var results = from p in people
select new { Age_Doubled = (p.Age * 2) };
Console.WriteLine("\r\n----- List of doubled ages -----");
foreach (var x in results)
{
Console.WriteLine(x.ToString());
}
// So much for var. Back to some more query syntax.
// Now I'll throw in a where.
IEnumerable<Person> olderPeople = from p in people
where p.Age > 70
select p;
Console.WriteLine("\r\n----- Older People -----");
foreach (Person p in olderPeople)
{
Console.WriteLine(p.ToString());
}
// The next bit of code looks like the query syntax code but instead of
// query syntax it's actually using extension method calls.
// You have to look closely but you'll see that there are dot operators
// in front of the extension method names
// and the method names begin with capital letters
// (for example, .Where, as opposed to where).
// The code with the extension method calls and the code using query syntax
// do exactly the same thing. In fact, the compiler transforms
// query syntax into normal extension method calls, so the normal code
// saves the compiler a pass.
// When the code uses extension method calls, you can see how the results
// are chained. Just like I showed with the
// DoubleMe extension method above.
// I used multiple lines but you could take out the CRLFs and put it all on
// one line: people.Where(p => p.Age > 70).Select(p => p)
IEnumerable<Person> olderPeople2 = people
.Where(p => p.Age > 70)
.Select(p => p);
// using an anonymous method
//.Select( delegate (Person p) { return p; } );
Console.WriteLine("\r\n----- Older People 2 Chained Extension methods on IEnumerable -----");
foreach (Person p in olderPeople2)
{
Console.WriteLine(p.ToString());
}
// BTW, IQueryable<T>, another interface in the System.Linq namespace, also
// has these LINQ extension methods, but IQueryable<T> is intended for use with
// LINQ to SQL or other LINQ queries that are converted into expression trees.
// That's another whole world that I don't understand very well yet. Again,
// this article is about LINQ to Objects, not LINQ to SQL.
Console.ReadKey();
}
// A non-extension method version of DoubleMe, just to contrast with an
// extension method.
public static int DoubleMeStatic(int i)
{
return (i * 2);
}
}
// A class for extension methods on the int data type.
// Used to illustrate basic extension method use to extend
// a type.
// An extension method must be static.
// The first param is the type that is being extended, preceded by the keyword this.
// You don't send a param that corresponds to the first param when calling the method.
// An extension method call looks like an instance method call, so in this case :
// int i; int ii = i.DoubleMe();
public static class IntExtensions
{
public static int DoubleMe(this int i)
{
return (i * 2);
}
}
// I put this here because it's my favorite extension method, or the one I use the most.
// I got it from a post on StackOverflow. It's a great example of how useful extension
// methods are when used to just extend an existing class, in this case the Object class,
// which covers a lot of ground. Bye-bye, null reference exceptions.
public static class ObjectExtensions
{
public static string NullSafeToString(this object obj)
{
return obj != null ? obj.ToString() : String.Empty;
}
}
// Person Class, used as a data source for the LINQ expressions to work on
public class Person
{
// Auto props (automatically implemented properties)
public string Name { get; set; }
public int Age { get; set; }
public static List<Person> GetPeople()
{
return new List<Person>
{
// Property Based initialization
new Person { Name="Mary", Age = 50},
new Person { Name="Ken", Age = 60},
new Person { Name="Fred", Age = 72},
new Person { Name="Marge", Age = 83}
};
}
public override string ToString()
{
return string.Format("{0}, Age: {1}", Name, Age);
}
}
}