Submitted by BillB on 4/21/2010


Extension Methods and LINQ

4/21/2010


Extension methods in C# 3.0 aren't just about extending the functionality of other types, though in that role they add a lot to C#. They're also the foundation of LINQ. They lie beneath the query syntax and allow the result of one operation to be piped, or chained, into the next. I learned this from Jon Skeet's excellent book, C# in Depth. As I understand it, the compiler "transforms" (Skeet's word) the SQL-like query syntax into corresponding extension method calls on whatever sequence of data is being queried. (Note that I'm talking about LINQ to Objects, as opposed to LINQ to SQL.) The data must be of a type that implements the IEnumerable<T> interface, such as List<T>. The compiler then does its normal job of compiling the ordinary method calls produced by that transformation.

I'll start with a quick review of extension methods and throw in the concept of fluent interfaces (just because I find the concept interesting and, after all, LINQ is one). Then I'll show a little query syntax and how it's transformed into extension method calls on IEnumerable<T> objects. This isn't a LINQ tutorial. I just want to demystify LINQ a bit and present a couple of interesting notions around it, like how a fluent interface can be built with extension methods. Most of the example code snippets in this article are repeated at the bottom as a full console app that you can paste into a new project and experiment with.

Extension Method Syntax

The type that an extension method operates on is given by its first parameter, which is marked with the this modifier. So you won't find an extension method with no parameters; there's always at least one. The method also has to live in a static class that's non-generic and non-nested. Every member of a static class must be static, so the method is static too. The return type can be anything, including void. There are a few other requirements I won't bore you with.

Here's an extension method on the int type, which doubles the int it operates on:

public static class IntExtensions
{        
    public static int DoubleMe(this int i)
    {
        return (i * 2);
    }      
}

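You call an extension method as if it were an instance method. You don't supply an argument for the first parameter. Instead, the compiler passes the value to the left of the dot (myInteger in the first example below) as that argument. You can also call it like the ordinary static method it really is.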

int myInteger = 2;
int myIntDoubled = myInteger.DoubleMe(); 
// Directly on an int
int m = 4.DoubleMe();
// Like a regular static method
int n = IntExtensions.DoubleMe(6);
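
An extension method call compiles down to an ordinary static method call. The value to the left of the dot is simply passed in as the first argument, and nothing is dereferenced at the call site. One consequence is worth knowing: you can call an extension method on a null reference without an exception, provided the method itself checks for null. Here's a small sketch; StringExtensions and Describe are hypothetical names made up for illustration:

```csharp
// Hypothetical helper class, just for this sketch.
public static class StringExtensions
{
    // 's' receives whatever is to the left of the dot, including null.
    public static string Describe(this string s)
    {
        if (s == null)
        {
            return "(null)";
        }
        return "\"" + s + "\" has " + s.Length + " characters";
    }
}
```

With this in place, calling Describe on a null string returns "(null)" instead of throwing. The NullSafeToString method in the full listing at the end of this article relies on the same behavior.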

Just to clarify, here's an extension method with two parameters. To call it, you supply an argument for just the second parameter, toBeAdded:

// Declared in the same static IntExtensions class as DoubleMe
public static int AddThisToMe(this int i, int toBeAdded)
{
    return (i + toBeAdded);
}

int myIntegerPlus9 = myInteger.AddThisToMe(9);
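
The non-generic rule applies only to the containing class. The extension method itself can be generic, which is how the LINQ methods on IEnumerable<T> are declared. Here's a sketch; GenericExtensions and IsOneOf are hypothetical names:

```csharp
using System;

// The class must be static, non-generic and non-nested...
public static class GenericExtensions
{
    // ...but the method can be generic. T is inferred from the receiver,
    // so this works on ints, strings, enums, or anything else.
    public static bool IsOneOf<T>(this T value, params T[] candidates)
    {
        // Array.IndexOf<T> compares elements using EqualityComparer<T>.Default.
        return Array.IndexOf(candidates, value) >= 0;
    }
}
```

So 3.IsOneOf(1, 2, 3) is true, and "pear".IsOneOf("apple", "plum") is false. The compiler would reject the method if you declared the class itself as generic, as in static class Extensions<T>.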

Chained Calls and a Fluent Interface

Remember I said that the return type of an extension method can be anything? That's true. But if you want to chain extension method calls, each method's return type has to be the type that the next call in the chain operates on.

Since our DoubleMe method returns an int and operates on an int, calls to DoubleMe can be chained to one another:

int myIntDoubled3Times = myInteger.DoubleMe().DoubleMe().DoubleMe();
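
Chaining doesn't require every link to return the same type. It only requires that each return type be one the next method extends. In this sketch, the chain starts on int and ends on string. IntChainExtensions, Plus and Parity are hypothetical names, and DoubleMe is repeated so the block stands alone.

```csharp
// Hypothetical extension methods for illustrating chaining.
public static class IntChainExtensions
{
    // int -> int
    public static int DoubleMe(this int i)
    {
        return i * 2;
    }

    // int -> int
    public static int Plus(this int i, int toBeAdded)
    {
        return i + toBeAdded;
    }

    // int -> string: after this call, only string members and
    // string extension methods can continue the chain.
    public static string Parity(this int i)
    {
        return i + (i % 2 == 0 ? " is even" : " is odd");
    }
}
```

Here 3.DoubleMe().Plus(1).Parity() gives "7 is odd". Once Parity has returned a string, only string members and string extension methods can continue the chain. So 3.Parity().DoubleMe() won't compile.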

There's a concept called a fluent interface: code designed to read like natural language (English, here), which makes it easy for people to follow. Say you want to code a system that converts lengths between measuring systems: inches to centimeters, meters to miles, and so on. Using extension methods, you might come up with something like the code below. (The numbers are of type double. So the first dot, as in 10.0, is part of the literal, not a dot operator. The dot in .inches IS a dot operator.)

double twentyInchesInCentimeters = 20.0.inches().inCentimeters();
// or 
double tenMetersInMiles = 10.0.meters().inMiles();

It reads a little like English: "What's 20.0 inches in centimeters?" Here's a start on the code to support it. Notice that you have to design things so each method returns the appropriate type for the next operation in the chain.

using System;

namespace LengthConversionExtensionMethods
{
    class Program
    {
        static void Main(string[] args)
        {
            double myNumber = 10;
            double myNumberInCentimeters = myNumber.inches().inCentimeters();
            Console.WriteLine("My number in cm: " + myNumberInCentimeters);
            // And I can call it directly on a literal:
            Console.WriteLine("20.5 inches in cm: " + 20.5.inches().inCentimeters());

            Console.ReadKey();
        }
    }

    public struct LengthValue
    {
        public double value;
        public string units;
        public string system;
    }

    public static class LengthConversions
    {
        public static LengthValue inches(this double numInches)
        {
            LengthValue val = new LengthValue();
            val.value = numInches;
            val.system = "English";
            val.units = "inches";
            return val;
        }

        public static double inCentimeters(this LengthValue value)
        {
            double returnVal = 0;
            if (value.system == "English")
            {
                if (value.units == "inches")
                {
                    returnVal = value.value * 2.54;
                }
            }
            // Other systems and units would be handled here; anything not
            // recognized falls through and returns 0.
            return returnVal;
        }
    }   
}
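
As written, inCentimeters only knows about inches, and every new unit would add more if tests. One way to avoid that is to normalize every length to a single base unit as soon as it enters the chain, so each output method is a single multiplication or division. This is my own assumption about a reasonable design, not anything from a library. In this sketch, Length is a hypothetical struct that stores meters. The lowercase method names mirror the fluent calls above rather than the usual PascalCase convention. The factors are the exact international definitions (1 inch = 0.0254 m, 1 mile = 1609.344 m).

```csharp
// Hypothetical value type: every length is stored in meters internally.
public struct Length
{
    public readonly double Meters;

    public Length(double meters)
    {
        Meters = meters;
    }
}

public static class LengthConversions
{
    // Exact definitions of the international inch and mile.
    private const double MetersPerInch = 0.0254;
    private const double MetersPerMile = 1609.344;

    // Entry points: double -> Length (convert to meters once, here).
    public static Length inches(this double value) { return new Length(value * MetersPerInch); }
    public static Length meters(this double value) { return new Length(value); }
    public static Length miles(this double value) { return new Length(value * MetersPerMile); }

    // Exit points: Length -> double (convert out of meters).
    public static double inCentimeters(this Length length) { return length.Meters * 100.0; }
    public static double inMeters(this Length length) { return length.Meters; }
    public static double inMiles(this Length length) { return length.Meters / MetersPerMile; }
}
```

With this design, 20.0.inches().inCentimeters() gives 50.8, give or take floating-point rounding. Adding a new unit means one entry method and one exit method, rather than a case for every pair of units.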

Kind of silly, but you get the idea. This was just an experiment for me. The point is that you can build a fluent interface with extension methods, and LINQ's method syntax is exactly that. LINQ goes one step further, though. All those dots and parentheses detract from fluency, so it also offers query syntax. Still, it's kind of cool to be able to create your own fluent interface; it's almost a language in itself (a Domain Specific Language, or DSL). Let's look at some LINQ query syntax. Then we'll see how the compiler transforms it into calls to the extension methods in the System.Linq.Enumerable class.

Query Syntax

SQL, the query language for relational databases, reads much like a fluent interface. LINQ to Objects was designed with SQL very much in mind. Here's some SQL:

Select CustomerName From Customers Where CustomerCity = 'Seattle'

It looks a lot like an English sentence. You can see why Microsoft modeled LINQ's query syntax on ANSI SQL. They wanted to bridge the gap between C# and SQL (the so-called impedance mismatch). They also wanted to provide a common language for working with data no matter where it's stored or in what format. Here's an example of LINQ to Objects. From a list of Person objects, it selects every person whose age is greater than 70.

List<Person> people = Person.GetPeople();
IEnumerable<Person> olderPeople = from person in people
                                  where person.Age > 70
                                  select person;

Console.WriteLine("\r\n----- People Going Strong -----");
foreach (Person person in olderPeople)
{
    Console.WriteLine(person.ToString());
}

If you know SQL, the first thing you'll notice is that the clause order is different from what you'd write in a SQL query. Why does from come first? MSDN says it's because the compiler needs a variable to be declared before it can be used. The variable in question is the range variable (person above), which is first used in the where clause that follows the from clause. I'll have to take their word for it. I'd like to say the order of the clauses reflects the order of processing. For a simple from-where-select query like this one, the clauses do map, in order, onto the chained method calls shown in the next section. For more complicated queries, I haven't dug into the details. If I find out more, I'll update this article. Now on to the gist of this article: how extension methods and LINQ are related.
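
To compare directly with the SQL statement earlier in this section, here's roughly the same query over an in-memory list. Customer, with its Name and City properties, is a hypothetical class standing in for the Customers table. CustomerQueries.NamesInCity is just a wrapper to keep the sketch self-contained. Note that from comes first and select comes last. Also note that C# uses == and double-quoted strings, where SQL uses = and single quotes.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row in the Customers table.
public class Customer
{
    public string Name { get; set; }
    public string City { get; set; }
}

public static class CustomerQueries
{
    // SQL: Select CustomerName From Customers Where CustomerCity = 'Seattle'
    public static IEnumerable<string> NamesInCity(IEnumerable<Customer> customers, string city)
    {
        return from c in customers
               where c.City == city
               select c.Name;
    }
}
```

Given a list of customers, NamesInCity(customers, "Seattle") returns the names of just the Seattle customers, in list order.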

Query Syntax Transformed into Extension Method Calls

Here's where extension methods come in. The compiler first transforms the query syntax into ordinary method calls (Where, Select and so on). For LINQ to Objects, these resolve to the LINQ extension methods on IEnumerable<T>. Those method calls are then compiled exactly as if you had written them yourself.

Here's an example.

List<Person> people = Person.GetPeople();
IEnumerable<Person> olderPeople = from p in people
                                  where p.Age > 70
                                  select p;
Console.WriteLine("\r\n----- Older People -----");
foreach (Person p in olderPeople)
{
    Console.WriteLine(p.ToString());
}
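
Here are the equivalent extension method calls. (Strictly speaking, when the select just returns the range variable after a where clause, as select p does here, the compiler drops that redundant Select. It emits only people.Where(p => p.Age > 70). I've kept the Select to show how each clause maps to a method call; the results are the same.)
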
IEnumerable<Person> olderPeople2 = people
                                  .Where(p => p.Age > 70)
                                  .Select(p => p);
                                //.Select( delegate (Person p) { return p; } );

The dots and parentheses are back, and Where and Select are capitalized. You can write it either way, query syntax or method calls, but query syntax is said to read better: more fluent and more like English. To help clarify lambda syntax a bit, I added the commented-out last line. It passes an anonymous method instead of the lambda, though you'd be unlikely to see anyone code it that way. Notice again that the order of from, where and select differs from what you'd write in ANSI SQL. The extension method calls reveal the order in which the operations are applied to each member of the collection.
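
The transformation is purely syntactic, so the compiler doesn't actually care about IEnumerable<T>. It rewrites the query into calls named Where and Select and then binds them like any other method call. It uses instance methods if the type has them, and extension methods in scope otherwise. The sketch below exploits that to watch the transformation happen. LoggedSource<T> is a hypothetical wrapper, deliberately not an IEnumerable<T>. Its Where and Select record their own names, then delegate to the real Enumerable methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper used only to observe which query methods the
// compiler calls. It deliberately does NOT implement IEnumerable<T>.
public sealed class LoggedSource<T>
{
    private readonly IEnumerable<T> items;
    private readonly List<string> calls;

    public LoggedSource(IEnumerable<T> items, List<string> calls)
    {
        this.items = items;
        this.calls = calls;
    }

    // The compiler binds a 'where' clause to this method.
    public LoggedSource<T> Where(Func<T, bool> predicate)
    {
        calls.Add("Where");
        // 'items' is an IEnumerable<T>, so this is the real Enumerable.Where.
        return new LoggedSource<T>(items.Where(predicate), calls);
    }

    // The compiler binds a 'select' clause to this method.
    public LoggedSource<TResult> Select<TResult>(Func<T, TResult> selector)
    {
        calls.Add("Select");
        return new LoggedSource<TResult>(items.Select(selector), calls);
    }

    public List<T> ToList()
    {
        return new List<T>(items);
    }
}
```

The query from n in source where n > 2 select n * 10 logs Where, then Select, in clause order. The query where n > 2 select n logs only Where, because the trailing identity select is dropped. A lone from n in source select n still logs Select, so the query never hands back the source object itself.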

Time to look at a few of these extension methods and make sense of these arguments, like p => p. I have another article on delegates if you need more background.

Extension Methods on IEnumerable

The LINQ extension methods for IEnumerable<T> are all implemented in the public static Enumerable class in System.Linq. I'll only look at Where and Select, which both return an IEnumerable<T>. There are plenty of others, though, most of them reminiscent of SQL. For example, you'll find about 20 overloads of Average(). Some operate directly on sequences of numbers (Int32, Int64, Single, Double, Decimal and their nullable versions). The rest operate on a sequence of any type and also take a delegate that says which number to average for each element. MSDN calls this a transform function to apply to each element. The documentation for the Enumerable class is worth browsing.
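
To make the transform function idea concrete, here are both kinds of Average overload side by side. The parameterless overload works on a sequence of numbers directly. The selector overload works on any element type; you just say which number to pull out of each element. Person is a cut-down, hypothetical version of the class in the full listing, and AverageExamples is a made-up wrapper name.

```csharp
using System.Collections.Generic;
using System.Linq;

// Cut-down Person class, just for this sketch.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class AverageExamples
{
    // Average(this IEnumerable<int>): the elements are already numbers.
    public static double AverageOf(IEnumerable<int> ages)
    {
        return ages.Average();
    }

    // Average<TSource>(this IEnumerable<TSource>, Func<TSource, int>):
    // the lambda is the transform function applied to each element.
    public static double AverageAge(IEnumerable<Person> people)
    {
        return people.Average(p => p.Age);
    }
}
```

With the four sample people from the full listing (ages 50, 60, 72 and 83), both methods return 66.25.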

Here's the Where extension method:

public static IEnumerable<TSource> Where<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, bool> predicate
)

It returns an IEnumerable<TSource> and takes a Func<TSource, bool>, a delegate that takes one argument and returns a bool. There's already a delegate type in .NET, System.Predicate<T>, that would fit too. The LINQ methods use Func<TSource, bool> instead; a StackOverflow post discusses why. Our example returns people over 70, so the predicate is the check p.Age > 70.
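
To show there's nothing magic about that signature, here's a hand-rolled equivalent. Filter is a hypothetical name chosen so it doesn't collide with the real Where. The real Enumerable.Where also validates its arguments and has special cases for arrays and lists, which this sketch skips. Writing it as an iterator block with yield return is what makes it lazy.

```csharp
using System;
using System.Collections.Generic;

public static class MyLinq
{
    // Same shape as Enumerable.Where: extends IEnumerable<TSource>
    // and takes a Func<TSource, bool> predicate.
    public static IEnumerable<TSource> Filter<TSource>(
        this IEnumerable<TSource> source,
        Func<TSource, bool> predicate)
    {
        foreach (TSource item in source)
        {
            // Hand back each matching item as soon as it's found.
            if (predicate(item))
            {
                yield return item;
            }
        }
    }
}
```

For example, new[] { 1, 2, 3, 4, 5, 6 }.Filter(n => n % 2 == 0) yields 2, 4 and 6. Predicate<T> and Func<T, bool> are distinct delegate types, so you can't pass a variable of type Predicate<int> straight to Filter or Where. The same lambda converts to either delegate type, though.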

Here's one overload of Select():

public static IEnumerable<TResult> Select<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, TResult> selector
)

It returns an IEnumerable<TResult> and takes a Func<TSource, TResult>, a delegate that takes one argument and returns something, possibly of a different type. Our example returns Person objects, so the selector takes a person and returns that same person. If we just wanted names back, the selector would have been p => p.Name (select p.Name in query syntax).
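
Here are two quick projections to go with that signature. The selector p => p.Name changes the element type from Person to string, so TResult is inferred as string. Enumerable.Select also has a second overload whose selector is a Func<TSource, int, TResult>, where the extra int is the element's zero-based position. Person is again a cut-down, hypothetical class, and SelectExamples is a made-up wrapper name.

```csharp
using System.Collections.Generic;
using System.Linq;

// Cut-down Person class, just for this sketch.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class SelectExamples
{
    // TSource = Person, TResult = string (inferred from the lambda).
    public static IEnumerable<string> Names(IEnumerable<Person> people)
    {
        return people.Select(p => p.Name);
    }

    // The index overload: the second lambda parameter is the position.
    public static IEnumerable<string> NumberedNames(IEnumerable<Person> people)
    {
        return people.Select((p, index) => (index + 1) + ". " + p.Name);
    }
}
```

In query syntax the first one is just select p.Name. The index overload has no query-syntax equivalent, so you'd use the method call form for it.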

Yield and Deferred Execution

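After the LINQ query statement runs, the IEnumerable<Person> olderPeople doesn't contain any results yet. Methods like Where and Select are written as iterators, using the C# 2.0 yield keyword. Nothing is evaluated until the foreach loop starts iterating over the query; only then are the results produced, one at a time. This is called deferred execution. It means no intermediate collections are built between the chained calls, and no work is done for elements you never ask for, which can matter for very large collections. The flip side is that the query runs again every time you enumerate it.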
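
A quick way to see deferred execution for yourself is to count predicate calls. In this sketch, DeferredDemo and OverTen are hypothetical names. Building the query invokes the predicate zero times; the work happens only when something enumerates the result. The sketch also makes two easy-to-miss consequences visible. The query sees items added to the list after it was built, and each enumeration runs the predicate again.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Incremented every time the predicate below runs.
    public static int PredicateCalls;

    public static IEnumerable<int> OverTen(List<int> source)
    {
        // Where only wraps 'source'; the lambda isn't called yet.
        return source.Where(n =>
        {
            PredicateCalls++;
            return n > 10;
        });
    }
}
```

Start with a list of 5, 20 and 30, and build the query with OverTen. At that point the predicate hasn't run at all. Now add 40 to the list and enumerate: you get 20, 30 and 40 after four predicate calls. Enumerate again, and the predicate runs four more times.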

You can imagine that the efficiency of a query depends on the query itself. A simple where/select goes through the collection once, returning (yielding) each element that passes the where test as soon as it's found. An OrderBy, on the other hand, has to read the whole collection before it can return anything. Operators like that build temporary data structures in memory to process the query.
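
Here's one way to watch the difference between a streaming operator and a buffering one. StreamingDemo.Source is a hypothetical iterator that counts how many elements have been pulled from it. FirstOverFive and Smallest are made-up wrapper names. Where followed by First stops reading as soon as it finds a match. OrderBy has to pull every element before it can hand back even the first one.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class StreamingDemo
{
    // How many elements Source() has handed out so far.
    public static int Pulled;

    // A hypothetical data source written as an iterator block, so we can
    // see exactly how far each query reads into it.
    public static IEnumerable<int> Source()
    {
        foreach (int n in new[] { 7, 3, 9, 1, 5 })
        {
            Pulled++;
            yield return n;
        }
    }

    // Streaming: Where passes the first match along as soon as it sees it.
    public static int FirstOverFive()
    {
        return Source().Where(n => n > 5).First();
    }

    // Buffering: OrderBy must read every element before returning the smallest.
    public static int Smallest()
    {
        return Source().OrderBy(n => n).First();
    }
}
```

FirstOverFive pulls a single element (the 7). Smallest pulls all five before it can return 1.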

Anonymous Types and Var

What if you want to return something that's not really part of the data you're querying? This is where anonymous types are handy.

If you declare a variable with var, the compiler infers its type from the expression on the right. Here I declare allPeople using var instead of IEnumerable<Person>:

var allPeople = from p in people
                 select p;
            
foreach (var p in allPeople)
{
    Console.WriteLine(p.ToString());
}

The compiler gives allPeople the type IEnumerable<Person>. I don't like to use var this way because I like to be explicit. Keep in mind that var is still statically typed: the type is fixed at compile time, it's just not written out. I don't think there's a real consensus. C# gets repetitive when declaring new objects, so I can understand the argument for using var to keep the code a little cleaner. Where var becomes really useful is with anonymous types. Here we're projecting an anonymous type with a value derived from a Person property: the age doubled.

var results = from p in people
              select new { Age_Doubled = (p.Age * 2) };

foreach (var thingy in results)
{
    Console.WriteLine(thingy.ToString());
}

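// Output with the sample data from Person.GetPeople() in the full listing:
// { Age_Doubled = 100 }
// { Age_Doubled = 120 }
// { Age_Doubled = 144 }
// { Age_Doubled = 166 }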
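
Two more things about anonymous types are worth a quick sketch. First, if you project an existing property without naming it, as in new { p.Name }, the compiler reuses the property's name. Second, the result is still strongly typed. Inside the loop, r.Name is a string and r.DoubledAge is an int, and IntelliSense knows it, even though the type itself has no name you could write. AnonymousDemo, Describe and the cut-down Person class are hypothetical names for this sketch.

```csharp
using System.Collections.Generic;
using System.Linq;

// Cut-down Person class, just for this sketch.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class AnonymousDemo
{
    public static List<string> Describe(IEnumerable<Person> people)
    {
        // var is required here: the element type is anonymous, so there's
        // no type name we could write instead.
        var results = from p in people
                      select new { p.Name, DoubledAge = p.Age * 2 };

        var lines = new List<string>();
        foreach (var r in results)
        {
            // Both properties are strongly typed: string and int.
            lines.Add(r.Name + " -> " + r.DoubledAge);
        }
        return lines;
    }
}
```

An anonymous type has no name, so you can't use it as a method's return type. That's why this sketch turns each result into a string before returning.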

Play Time

Here's a little console app I started playing with to solidify the concepts for myself. You can paste it into a new console project and start tinkering to get a feel for this stuff. It's mostly what you've already seen above, but in a form that's easy to paste.

The code takes the following course:

  • Basic Extension method example
  • Extension methods chained
  • Basic LINQ query syntax example
  • Why var can be handy (Just threw this in for kicks)
  • Transforming the query syntax to regular C# extension method calls

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ExtensionMethods2
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("======  A Quick Extension Method Example  ======");
            int myInteger = 2;
            // Extension method called as if it were an instance method on the type int
            Console.WriteLine("Doubling " + myInteger + " gives " + myInteger.DoubleMe());            

            // Now let's chain the results.
            // Chaining can be done because DoubleMe() returns an int.  And
            // remember that extension methods are called as if they were instance methods
            // on the type they're extending.  So, each part of the expression below that
            // is to the left of the dot is an int and each DoubleMe call is an 
            // extension method call on an int.  
            // So, myInteger is an int. myInteger.DoubleMe() returns an int, so
            // myInteger.DoubleMe().DoubleMe() works just fine. And so on.
            int c = myInteger.DoubleMe().DoubleMe().DoubleMe();
            Console.WriteLine("3 DoubleMe's chained : " + c);  

            // Here's a call to a regular static method, not an extension method,
            // just to show that a regular static method can't be chained.
            int d = DoubleMeStatic(2);
            Console.WriteLine("DoubleMeStatic - can't chain a regular static method: " + d);
            // This won't compile, because DoubleMeStatic isn't an extension method:
            //int d2 = DoubleMeStatic(2).DoubleMeStatic();
            // You'd have to nest the calls instead: DoubleMeStatic(DoubleMeStatic(2))

            //*********************************************************//
            // Code to play with LINQ query syntax and how the compiler
            // transforms it into regular-looking C# code, swapping out
            // query keywords like select for extension method names
            // like Enumerable.Select.
            //*********************************************************//
            Console.WriteLine
                ("\r\n======  LINQ Query Syntax Transformed into Extension Methods  ======");

            // First, we need some data to work with. 
            // Near the bottom of this listing you'll find a Person class
            // with a GetPeople() method that returns a List of Person
            // objects.
            // 
            // The List class implements IEnumerable<T>, as you can see here:
            //    public class List<T> : IList<T>, ICollection<T>, 
            //         IEnumerable<T>, IList, ICollection, IEnumerable
            // Implementing IEnumerable<T> is important because the LINQ extension
            // methods (defined in the Enumerable class) extend IEnumerable<T>.
            // Before we get to the extension methods, however, let's just look at some
            // basic query syntax.
            
            // First, get the data we'll be working with
            List<Person> people = Person.GetPeople();

            // Now we need an example query to start off with.
            // Here, we're grabbing all the person objects from the people List
            IEnumerable<Person> allPeople = from p in people
                                            select p;

            // List the people with a foreach loop
            Console.WriteLine("\r\n----- All People -----");
            foreach (Person p in allPeople)
            {
                Console.WriteLine(p.ToString());
            }

            // Var
            var allPeople2 = from p in people
                             select p;
            Console.WriteLine("\r\n----- All People with var -----");
            foreach (var p in allPeople2)
            {
                Console.WriteLine(p.ToString());
            }

            // No big deal, right?  I don't even like the looks of var in statically
            // typed code. BUT wait...
            // Here's why var comes in handy - when you're returning
            // some brand new object from the query, maybe something you 
            // derive from the Person object.
           
            var results = from p in people
                          select new { Age_Doubled = (p.Age * 2) };

            Console.WriteLine("\r\n----- List of doubled ages -----");
            foreach (var x in results)
            {
                Console.WriteLine(x.ToString());
            }

            // So much for var. Back to some more query syntax.
            // Now I'll throw in a where.    
            IEnumerable<Person> olderPeople = from p in people
                                              where p.Age > 70
                                              select p;
            Console.WriteLine("\r\n----- Older People -----");
            foreach (Person p in olderPeople)
            {
                Console.WriteLine(p.ToString());
            }
            
            // The next bit of code looks like the query syntax code, but instead of
            // query syntax it uses extension method calls.
            // You have to look closely, but you'll see that there are dot operators
            // in front of the extension method names
            // and the method names begin with capital letters
            // (for example, .Where, as opposed to where).
            // The code with the extension method calls and the code using query syntax
            // produce the same results. In fact, the compiler transforms
            // query syntax into normal method calls, so writing the calls yourself
            // just skips that translation step.
            
            // When the code uses extension method calls, you can see how the results
            // are chained.  Just like I showed with the
            // DoubleMe extension method above.
            // I used multiple lines but you could take out the CRLFs and put it all on
            // one line: people.Where(p => p.Age > 70).Select(p => p)
            IEnumerable<Person> olderPeople2 = people
                                            .Where(p => p.Age > 70)                                            
                                            .Select(p => p);
                                            // using an anonymous method
                                            //.Select( delegate (Person p) { return p; } ); 


            Console.WriteLine("\r\n----- Older People 2 Chained Extension methods on IEnumerable -----");
            foreach (Person p in olderPeople2)
            {
                Console.WriteLine(p.ToString());
            }
            
            // BTW, IQueryable<T>, another interface in the System.Linq namespace, also
            // has these LINQ operators (as extension methods in the Queryable class), but
            // IQueryable<T> is intended for use with
            // LINQ to SQL or other LINQ queries that are converted into expression trees.
            // That's another whole world that I don't understand very well yet.  Again,
            // this article is about LINQ to Objects, not LINQ to SQL.
            
            Console.ReadKey();
        }


        // A non-extension method version of DoubleMe, just to contrast with an
        // extension method.
        public static int DoubleMeStatic(int i)
        {
            return (i * 2);
        }       
    } 


    // A class for extension methods on the int data type.
    // Used to illustrate basic extension method use to extend
    // a type.
    // An extension method must be static.
    // The first param is the type that is being extended, preceded by the keyword this.
    // You don't pass an argument for the first param when calling the method.
    // An extension method call looks like an instance method call, so in this case:
    // int i = 3;  int ii = i.DoubleMe();
    public static class IntExtensions
    {        
        public static int DoubleMe(this int i)
        {
            return (i * 2);
        }

    }

    // I put this here because it's my favorite extension method, or the one I use the most.
    // I got it from a post on StackOverflow. It's a great example of how useful extension
    // methods are for simply extending an existing class, in this case the Object class,
    // which covers a lot of ground. An extension method call is really a static method
    // call, so this works even when obj is null: no more NullReferenceException from ToString().
    public static class ObjectExtensions
    {
        public static string NullSafeToString(this object obj)
        {            
            return obj != null ? obj.ToString() : String.Empty;
        }
    }


    // Person Class, used as a data source for the LINQ expressions to work on
    public class Person
    {
        // Auto props (automatically implemented properties)
        public string Name { get; set; }
        public int Age { get; set; }

        public static List<Person> GetPeople()
        {
            return new List<Person>
            {
                // Object initializers (property-based initialization)
                new Person { Name = "Mary", Age = 50 },
                new Person { Name = "Ken", Age = 60 },
                new Person { Name = "Fred", Age = 72 },
                new Person { Name = "Marge", Age = 83 }
            };
        }

        public override string ToString()
        {
            return string.Format("{0}, Age: {1}", Name, Age);
        }
    }
    
}
