C# 2.0 has support for closures.
This article discusses how closures are implemented in the C# language and how this has been done using pure compiler magic, without any CLR changes. I had previously mentioned C# closures in an earlier blog entry and had linked to Antonio Cisternino’s blog entry:
Closures in CLR 2.0
This is an interesting entry and worth a read. Unfortunately, IMO, Mr. Cisternino’s entry is not correctly titled. The support for closures is a C#-only feature, not a CLR-level addition.
I intend to start off where his blog entry ends. Here I shall look into how it is done.
Features such as closures have been taken to legendary fame in languages like Scheme and Ruby. Recent languages like Groovy also ship with closure constructs.
Microsoft, for some reason, has been calling this new language feature anonymous methods. I don’t know why the feature hasn’t been publicly spoken of as closure or lambda support in the C# language. Maybe there are subtle differences between the theoretical definition of closures and what the C# language achieves.
That said – let’s jump into the matter.
This is C# code that shows off a closure:
Code Showing One Anonymous Method / Closure that shares variables with its scope
//closure.cs
//Compile: csc closure.cs
using System;
class CMain
{
    delegate void Closure(int n);
    static Closure CreateClosure()
    {
        int c = 0;
        return delegate(int n) {
            Console.WriteLine("Closure: n = {0}, c = {1}",n,c++);
        };
    }
    static void Main()
    {
        Closure c1 = CreateClosure();
        Closure c2 = CreateClosure();
        c1(1);
        c1(1);
        c1(1);
        c2(2);
        c2(2);
        c2(2);
    }
}
This code is very similar to the Groovy snippet I had posted some time back. When executed, this is the output:
> closure.exe
Closure: n = 1, c = 0
Closure: n = 1, c = 1
Closure: n = 1, c = 2
Closure: n = 2, c = 0
Closure: n = 2, c = 1
Closure: n = 2, c = 2
If you don’t know what closures are about, the easiest way to appreciate them is to take a careful look at the C# code above. Notice the function CreateClosure() is returning a delegate to a block of code that is part of the function itself. (Normally you would create a delegate to a function). If you don’t understand what a delegate is, let it be. What you need to understand is that the function returns a block of code that is a part of the function.
The block of code accepts an integer parameter. Also, you will notice that the local variable of the function (variable ‘c’) is being used in the code block.
When the block of code is returned to the caller (in this case Main()), the caller can invoke the variable that represents the block of code, causing the code to run.
Once upon a time you could create a delegate to an independent function only and not to a code block within a function. A delegate to a function had to have the same signature as the function. The delegate could be thought of as an old C style function pointer. When the delegate is invoked, the function pointed to is called. Delegates came with the additional merit that they were type safe function pointers – most people did not think much more than that about delegates.
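For contrast with the anonymous method above, here is a minimal sketch of that older style: a delegate bound to an independent, named method. The class name ClassicDelegateDemo and the method PrintNumber are made up for this illustration.

```csharp
// classic_delegate.cs - a C# 1.x style delegate to a named method (illustrative names)
using System;

class ClassicDelegateDemo
{
    // The delegate type fixes the signature: one int parameter, no return value.
    public delegate void Closure(int n);

    // An independent method whose signature matches the delegate type.
    public static void PrintNumber(int n)
    {
        Console.WriteLine("Named method: n = {0}", n);
    }

    static void Main()
    {
        // The delegate acts like a type safe function pointer to PrintNumber.
        Closure d = new Closure(PrintNumber);
        d(42); // prints "Named method: n = 42"
    }
}
```

There is no state to remember here: the delegate only knows which method to call.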
Now it is a little different. The anonymous method within the CreateClosure() method looks like a nested function that simply has no explicit name, but it is not that simple. You might venture to guess that the compiler creates a new method (by extracting this code block) and simply creates a delegate to the new method, the same way delegates once used to behave.
However, notice that the code block uses variable ‘c’ that is defined locally to the enclosing function. If this code block is going to be carved out of the enclosing function, how can it access variable ‘c’? Better yet, take a closer look at the output.
It looks like a closure returned from a call to CreateClosure() remembers its own value of the variable ‘c’. Somehow, the state of the function CreateClosure() is captured in the delegate/closure that it returns. So much so that the state of the two invocations of CreateClosure() is maintained independently of each other.
This is in violation of the way simple C-like functions work, where the function’s state is stored on the stack, and the function’s stack frame is torn down and its state lost when the function returns. (Refer ‘The Big Deal about Iterators’)
Functions that return closures seem to maintain state even after they have returned. The closure object maintains a reference to that state. This requirement of maintaining state is similar to the implementation of iterators (if you give it some thought).
Implementation
This is what our closure.cs looks like under ILDASM.
Notice the new class called __LocalsDisplayClass$0…1 that has been created. This is the interesting culprit.
In essence, closures work in C# 2.0 like this: the compiler creates a new class whose member variables correspond to the local variables of CreateClosure() that are used in the closure/anonymous method it defines. Thus the local variable ‘c’ of method CreateClosure() can be seen in class __LocalsDisplayClass$0…1.
So calling the CreateClosure method creates an instance of the __Locals… class. It then creates an old (classical) delegate to the __AnonymousMethod$0… method of the class. So the actual support for delegates in the CLR hasn’t changed at all. The delegate that is returned from the CreateClosure() method is a normal (old C# 1.x type) delegate.
All accesses to variables that are shared between the CreateClosure() method and the anonymous method are accesses to members of the __Locals… class.
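To make the transformation concrete, here is a hand-written approximation of what the compiler does to closure.cs. The names LocalsDisplayClass and AnonymousMethod are stand-ins chosen for readability (the generated names contain characters that are not legal in C# source), so treat this as a sketch of the idea and not as the compiler's exact output.

```csharp
// closure_by_hand.cs - hand-written sketch of the compiler's rewrite of closure.cs
using System;

class CMain
{
    public delegate void Closure(int n);

    // Stand-in for the hidden class; the name is illustrative only.
    class LocalsDisplayClass
    {
        // The captured local 'c' becomes a field, so it lives on the heap
        // and survives after CreateClosure() has returned.
        public int c;

        // Stand-in for the extracted body of the anonymous method.
        public void AnonymousMethod(int n)
        {
            Console.WriteLine("Closure: n = {0}, c = {1}", n, c++);
        }
    }

    public static Closure CreateClosure()
    {
        // One fresh state object per call, which is why the two closures
        // created in Main() count independently of each other.
        LocalsDisplayClass locals = new LocalsDisplayClass();
        locals.c = 0;
        // An ordinary C# 1.x style delegate bound to an instance method.
        return new Closure(locals.AnonymousMethod);
    }

    static void Main()
    {
        Closure c1 = CreateClosure();
        Closure c2 = CreateClosure();
        c1(1); c1(1); c1(1);
        c2(2); c2(2); c2(2);
    }
}
```

This version compiles without using anonymous methods at all and prints the same six lines as the closure.exe output shown earlier.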
Here is the IL code of CreateClosure():
.method private hidebysig static class CMain/Closure
CreateClosure() cil managed
{
// Code size 30 (0x1e)
.maxstack 3
.locals init (class CMain/__LocalsDisplayClass$00000001 V_0,
class CMain/Closure V_1)
IL_0000: newobj instance void CMain/__LocalsDisplayClass$00000001::.ctor()
IL_0005: stloc.0
IL_0006: ldloc.0
IL_0007: ldc.i4.0
IL_0008: stfld int32 CMain/__LocalsDisplayClass$00000001::c
IL_000d: ldloc.0
IL_000e: ldftn instance void CMain/__LocalsDisplayClass$00000001::__AnonymousMethod$00000000(int32)
IL_0014: newobj instance void CMain/Closure::.ctor(object,
native int)
IL_0019: stloc.1
IL_001a: br.s IL_001c
IL_001c: ldloc.1
IL_001d: ret
} // end of method CMain::CreateClosure
Notice some things in the above code:
- An object of __LocalsDisplayClass$00000001 is created.
- Access to the variable is actually access to the member ‘c’ of this class.
- The delegate is created to the instance of the __LocalsDisplayClass$00000001 class that was created and its __AnonymousMethod$00000000 method.
This is the code of the anonymous method, which has now become CMain/__LocalsDisplayClass$00000001::__AnonymousMethod$00000000(int32)
.method public hidebysig instance void __AnonymousMethod$00000000(int32 n) cil managed
{
// Code size 39 (0x27)
.maxstack 5
.locals init (int32 V_0)
IL_0000: ldstr "Closure: n = {0}, c = {1}"
IL_0005: ldarg.1
IL_0006: box [mscorlib]System.Int32
IL_000b: ldarg.0
IL_000c: dup
IL_000d: ldfld int32 CMain/__LocalsDisplayClass$00000001::c
IL_0012: dup
IL_0013: stloc.0
IL_0014: ldc.i4.1
IL_0015: add
IL_0016: stfld int32 CMain/__LocalsDisplayClass$00000001::c
IL_001b: ldloc.0
IL_001c: box [mscorlib]System.Int32
IL_0021: call void [mscorlib]System.Console::WriteLine(string,
object,
object)
IL_0026: ret
} // end of method __LocalsDisplayClass$00000001::__AnonymousMethod$00000000
This is almost exactly the code that we had written in the CreateClosure() function, except that ‘c’ is not a local variable any more: it is read and written as a field of the generated class.
What do you think will happen when there is a function that defines two anonymous methods within it?
Code Showing Multiple Anonymous Methods / Closures that share variables with its scope
//closure2.cs
//Compile: csc closure2.cs
using System;
class CMain
{
    delegate void Closure(int n);
    static Closure t1,t2;
    static void CreateClosure()
    {
        int c1 = 0;
        int c2 = 0;
        int c3 = 0;
        int c4 = 0;
        Console.WriteLine("c4 = {0}",c4);
        t1 = delegate(int n) {
            Console.WriteLine("Closure: n={0}, c1={1}, c2={2}",
                n,c1++,c2++);
        };
        t2 = delegate(int n) {
            Console.WriteLine("Closure: n={0}, c1={1}, c3={2}",
                n,c1++,c3++);
        };
    }
    static void Main()
    {
        CreateClosure();
    }
}
This is what happens:
Notice:
- There is still only one class that maintains state.
- The class has two methods (one for each of the anonymous methods).
- The variables that are used by either of the methods are part of the class (c1, c2, c3).
- Variables that are not used by either closure are not part of the class (c4 is omitted).
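One visible consequence of the single state class is that two anonymous methods created in the same call see each other's changes to a shared variable. The following small sketch shows this from the C# side; the class name SharedStateDemo, the delegate type Counter and the method CreateCounters are invented for this example.

```csharp
// shared_state.cs - two anonymous methods capturing the same local (illustrative names)
using System;

class SharedStateDemo
{
    public delegate int Counter();

    // Hands back two delegates that both capture the local 'shared'.
    public static void CreateCounters(out Counter first, out Counter second)
    {
        int shared = 0;
        first = delegate() { return ++shared; };
        second = delegate() { return ++shared; };
    }

    static void Main()
    {
        Counter a, b;
        CreateCounters(out a, out b);
        Console.WriteLine(a()); // 1
        Console.WriteLine(b()); // 2, because b sees the increment made through a
        Console.WriteLine(a()); // 3
        // True when both delegates are bound to one shared state object,
        // which is what the implementation described above would produce.
        Console.WriteLine(object.ReferenceEquals(a.Target, b.Target));
    }
}
```

The counting behaviour follows from the language rules; the last line only inspects how a particular compiler chose to implement them.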
One final look: what if the closure does not use any variables of its enclosing scope? How would this work?
Code Showing an Anonymous Method / Closure that is stateless
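//closure3.cs (only CreateClosure() is shown; the rest is as in closure.cs)
//Compile: csc closure3.cs
static Closure CreateClosure()
{
    return delegate(int n) { Console.WriteLine("Closure: n = {0}",n); };
}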
The code generated looks like this:
As expected, there is NO hidden class generated in this case. Why? Because there is no need for the anonymous method to save state. The anonymous method itself is added to the class that contains CreateClosure().
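Written out by hand, the stateless case reduces to a plain delegate to a method of the containing class. This is only a sketch of the behaviour described here: the method name AnonymousMethod is a stand-in for the generated name, the choice of a static method is an assumption, and other compiler versions may emit something different.

```csharp
// closure3_by_hand.cs - hand-written sketch of the stateless case (illustrative names)
using System;

class CMain
{
    public delegate void Closure(int n);

    // Stand-in for the extracted anonymous method. It touches no captured
    // variables, so it can live directly in CMain with no state class.
    static void AnonymousMethod(int n)
    {
        Console.WriteLine("Closure: n = {0}", n);
    }

    public static Closure CreateClosure()
    {
        // A delegate to a static method carries no target object.
        return new Closure(AnonymousMethod);
    }

    static void Main()
    {
        Closure c1 = CreateClosure();
        c1(1); // prints "Closure: n = 1"
    }
}
```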
One more case to look at: what happens when the closure accesses both a local variable of the enclosing method as well as a member variable of the class that the enclosing method is a part of? Because state is persisted in a new class instance, how will the member variables of the original class be accessed?
This is C# code that shows this situation:
Code where closure accesses locals and class members
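//closure4.cs (members of class CMain; the using directive and delegate type are as in closure.cs)
//Compile: csc closure4.cs
int a = 10;
Closure CreateClosure()
{
    int c = 0;
    return delegate(int n) {
        Console.WriteLine("Closure: n = {0}, a = {1}, c = {2}",n,a++,c++);
    };
}
static void Main()
{
    CMain main = new CMain();
    Closure c1 = main.CreateClosure();
    Closure c2 = main.CreateClosure();
}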
This is what the generated code looks like:
Notice that there is a member in the _Locals… class that is called <this>. This reference points back to the instance of the original enclosing class where the anonymous method belonged. Thus the anonymous method can access the member variables of that instance.
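The same idea can be written by hand. In this sketch the field outerThis plays the role of the generated back-reference, and LocalsDisplayClass and AnonymousMethod are again stand-in names, not the names the compiler actually emits.

```csharp
// closure4_by_hand.cs - hand-written sketch of capturing both a local and 'this'
using System;

class CMain
{
    public delegate void Closure(int n);

    int a = 10;

    // Stand-in for the hidden class; a nested class may touch CMain's private field 'a'.
    class LocalsDisplayClass
    {
        public CMain outerThis; // back-reference to the enclosing instance
        public int c;           // the captured local

        public void AnonymousMethod(int n)
        {
            Console.WriteLine("Closure: n = {0}, a = {1}, c = {2}",
                n, outerThis.a++, c++);
        }
    }

    public Closure CreateClosure()
    {
        LocalsDisplayClass locals = new LocalsDisplayClass();
        locals.outerThis = this; // every closure from this instance shares 'a'
        locals.c = 0;            // but each closure gets its own 'c'
        return new Closure(locals.AnonymousMethod);
    }

    static void Main()
    {
        CMain main = new CMain();
        Closure c1 = main.CreateClosure();
        Closure c2 = main.CreateClosure();
        c1(1); // Closure: n = 1, a = 10, c = 0
        c2(2); // Closure: n = 2, a = 11, c = 0
        c1(1); // Closure: n = 1, a = 12, c = 1
    }
}
```

The member ‘a’ keeps counting across both closures because they point at the same CMain instance, while each closure has its own ‘c’.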
Notes
I think C# closures were created as a by-product of trying to implement iterators in the language. The implementation of iterators involved this sort of temporary class creation that preserved the state of functions. (A little like Bjarne Stroustrup’s Function Objects).
I don’t know if there are reasons why this implementation of anonymous methods cannot be called proper closures. Anonymous methods cannot use any ref or out type parameters of their enclosing function – this is a limitation. Does this limitation imply that they cannot be called closures? I don’t know. But other than that, they seem to serve the purpose of closures pretty well.
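The ref/out limitation is easy to work around when a copy of the value is good enough. A minimal sketch, with the names RefCaptureDemo, Next and MakeCounter invented for the example:

```csharp
// ref_capture.cs - working around the ref/out restriction (illustrative names)
using System;

class RefCaptureDemo
{
    public delegate int Next();

    public static Next MakeCounter(ref int start)
    {
        // Using 'start' directly inside the anonymous method does not compile,
        // so the value is copied into an ordinary local, which can be captured.
        int current = start;
        return delegate() { return current++; };
    }

    static void Main()
    {
        int seed = 5;
        Next next = MakeCounter(ref seed);
        Console.WriteLine(next()); // 5
        Console.WriteLine(next()); // 6
        Console.WriteLine(seed);   // 5, the closure works on its own copy
    }
}
```

The price of the workaround is that the closure no longer aliases the caller's variable, which is exactly the thing a ref parameter would have provided.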
The fact that closures have been implemented as a compiler-level hack means that there is no overhead at all for code that does not take advantage of this sort of feature. So there is no performance penalty on the CLR itself. Maybe in time, when these features gain adequate popularity, there will also exist efficient means of integrating them into the CLR in a way that does not affect the functioning of traditional code.
Once we have closures, we can implement a large variety of constructs in the language (iterators being one of them). However, since the syntax is a little clunky and the actual implementation a little slow (because there is a hidden managed class involved), this may never happen. All the same, this is one amazing feature to have in a mainstream language like C#. Maybe even a little too advanced for some folk, so don’t be surprised if the coding guidelines of your company say NO to anonymous methods, the way they once said NO to C++ macros.
Disclaimer The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.
© Copyright 2010, Roshan James