Thursday, June 03, 2004

Antonio has a post about generalizing environment classes (the classes that capture the state of a closure) by using generic types, so that the class itself captures only the arity of the environment.

 

To quote:

http://rotor.di.unipi.it/cisterni/Lists/My%20Blog/DispForm.aspx?ID=15

In general we may think that the compiler generates several environment classes, for the needed arities; for instance:

 

class Environment3<A, B, C> {

  A a;

  B b;

  C c;

}

 

There is one issue that I can think of, off the top of my head. The CLR has an excellent approach to generics (among the best I have seen), and it is well described in Don Syme’s paper here:

The Design and Implementation of Generics for the .NET Common Language Runtime

http://research.microsoft.com/projects/clrgen/generics.pdf

 

The thing about CLR generics is that they are very efficient for all reference types, because for reference types there is no per-instantiation specialization (unlike classical C++ templates). All reference-type instantiations share the same class definition at run time.

 

However, value types cause the runtime to generate specialized classes, one for each value type used as a type argument of a generic entity. Class definitions are shared between value types only when they have the same footprint with respect to the GC.

 

So it would be better if the compiler itself generated specialized classes to hold environment state whenever it knows that the types the environment needs to hold are value types. This simply provides a performance benefit, because the specialization of the class will not happen at run time; it will be done at compile time instead.
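To make the two options concrete, here is a minimal sketch. Environment3 is the class from Antonio's post; EnvIntIntInt, GenericEnvDemo and SameDefinition are hypothetical names I made up for the illustration. The runtime's specialization itself is not directly observable from C#, so the sketch only shows that a value-type instantiation is a distinct constructed type, next to the kind of hand-specialized class a compiler could emit instead.

```csharp
using System;

// Generic environment: one definition for every arity-3 closure.
class Environment3<A, B, C>
{
    public A a;
    public B b;
    public C c;
}

// Hypothetical hand-specialized environment a compiler could emit
// when it already knows all three captured variables are ints.
class EnvIntIntInt
{
    public int a;
    public int b;
    public int c;
}

static class GenericEnvDemo
{
    // Returns true when two constructed types come from the same generic definition.
    public static bool SameDefinition(Type x, Type y)
    {
        return x.IsGenericType && y.IsGenericType
            && x.GetGenericTypeDefinition() == y.GetGenericTypeDefinition();
    }

    static void Main()
    {
        Type refEnv = typeof(Environment3<string, object, string>);
        Type valEnv = typeof(Environment3<int, int, int>);

        // Both are instantiations of Environment3<,,> but are distinct runtime types.
        Console.WriteLine(SameDefinition(refEnv, valEnv)); // True
        Console.WriteLine(refEnv == valEnv);               // False

        // The specialized class needs no generic instantiation at run time.
        EnvIntIntInt env = new EnvIntIntInt();
        env.a = 1; env.b = 2; env.c = 3;
        Console.WriteLine(env.a + env.b + env.c);          // 6
    }
}
```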

 

 

Antonio also discusses a private member access issue; again, I don’t think I fully get him. Assuming the new delegate mechanism is in place, we could have classes that look like this:

 

class Env

{

        //have only public members

}

 

class Foo

{

        //original method

        void bar()

        {

        }

       

        //anonymous compiler generated method

        void anon_bar()

        {

                //access all members of Foo here

                //access all public members of Env here

        }

}

 

Is there a need to make the members of Env private? The entire point of having Env is simply to act as a placeholder for some values. Better yet, if friend declarations are possible (I don’t know if the old friend mechanism works), then the anonymous method can be declared as a friend in class Env. This does not add to the class definition in any way; it would simply allow for member access.

 

 

You might want to look at these links to follow the sequence of these posts –

1)       Closures in CLR 2.0

2)       Implementation of Closures (Anonymous Methods) in C# 2.0 (Part 6)

3)       More on CLR 2.0 closures

4)       Closure implementation enhancement in CLR 2.0 using the new delegate mechanism

5)       Again on closures

6)       this post

Thursday, June 03, 2004 12:58:34 AM (Eastern Standard Time, UTC-05:00)
 Wednesday, June 02, 2004

This is a tremendously exciting time to be thinking about programming languages and language research. Recently I have come across a lot of material that has made me think a lot.

 

Polyphonic C#

http://research.microsoft.com/%7Enick/polyphony/

This is a C#-like language built for concurrency control. It is an amazing thought exercise. I recommend looking at Modern Concurrency Abstractions for C#.

 

Xen/X# from Microsoft Research

Xen basically proposes to extend the C# language to build better data handling support into the language.

Unifying Tables Objects and Documents

This should give you a good idea of X#. This is by Erik Meijer of MSR.

Programming with Circles, Triangles and Rectangles

More – interesting reading.

 

C Omega

http://research.microsoft.com/Comega/

A combination of Xen and Polyphonic C#.

You might want to download this ppt that discusses C Omega, by none other than Damien Watkins of MSR.

 

Groovy

http://groovy.codehaus.org/

Groovy is the Ruby-like language for the JVM. Ruby itself draws on the dynamic object-orientation that was so characteristic of Smalltalk and builds a powerful, expressive language on it. The thing is that Groovy also builds in Xen-like concepts.

You might want to download this ppt that discusses Groovy, by James Strachan, co-author of Groovy.

 

Self

http://research.sun.com/self/language.html

An old language from Sun Microsystems. Reading up about Self makes you appreciate the spirit of the many pure object-oriented systems based on message passing, prototypes and cloning.

 

 

 

I am not mentioning functional languages here because it is not fair to put up stuff I have no clue about. I must say this is an amazing time to be interested in programming languages.

 

Wednesday, June 02, 2004 7:35:22 AM (Eastern Standard Time, UTC-05:00)
 Friday, May 28, 2004

I had dropped a link to Antonio Cisternino’s blog entry about Closure support in CLR 2.0 when I was writing a description of how the C# compiler (Whidbey release) implements closures.

 

I was completely wrong about what Antonio’s blog entry was about. For some reason, my head being as full of C# as it is, I assumed he was talking about how closures were implemented in C# for the Whidbey release. Closures in the Whidbey release (Visual Studio 2005) are officially called anonymous methods. So apologies Antonio, I had missed the point.

 

That said, what Antonio was referring to was a subtle change in the implementation of the delegates in CLR 2.0 so that the future CLR can be better used to implement closures and support functional languages and constructs. I exchanged some mails with him and I think I get what he was talking about.

 

Before I go any further, I would recommend reading Antonio’s original entry, Closures in CLR 2.0; my entry about the implementation of closures in C# 2.0 using delegates as they are available in CLR 1.x; and Antonio’s follow-up entry that speaks of the distinguishing aspect of the new delegate implementation, More about Closures in CLR 2.0.

 

The following are derived from mail exchanges with Antonio.

 

A delegate can be thought of as an entity that holds a pointer to a function and, optionally, a reference to the object on which that function is to be invoked as an instance function. In CLR 2.0 a simple and subtle change has taken place: the constraint that the object reference must refer to an instance of the class that declares the function has been removed. The function pointer and the object reference don’t have to be related to each other.

 

In functional languages, closures are often implemented as a pair: a pointer to a function, which is the block of code that belongs to the closure, and a pointer to an instance of the environment. The environment is the entity that holds the state for the code in the closure block.

 

In C# 2.0 the environment object is implemented by creating a class for every function that wraps a closure and shares state with it. The generated class gets a mangled name of the form __LocalsDisplayClassXXX.

 

The code that is part of the closure is now part of the environment class. When a closure is created, an instance of the environment class is created, and a delegate bound to that instance and to its member function that wraps the closure code is returned. This is all that could be done with CLR 1.x delegates, as the function had to be a member of the environment class.

 

With CLR 2.0 delegates, compiler writers now have the option of generating a common environment class for all functions that wrap closures and need to capture similar information about their environment.

 

Here is an example and some notes courtesy of Antonio:

 

Closures in functional programming languages are often implemented with pointer pairs (env, func), where env is the pointer to the environment of the closure and func is a pointer to a function whose first argument will be env.

 

With CLR 1.0 this cannot be achieved because delegates are pairs but with an additional constraint: func should be declared in the class of env.

The problem with this additional constraint on delegates is that you tend to define a class for each closure you make. In a functional programming language you'll pay a significant overhead because for each closure you have to introduce a private class with the environment and the code.

 

By removing that constraint (as has been done in CLR 2.0) you can define a class with plenty of static methods, one per closure, and define a type for each possible environment. This reduces the number of types you need to have.

 

For instance:

 

void foo() {
 string s = "world"; // placeholder value
 Cmd d = delegate { Console.WriteLine(s); };
 //...
}

void baz() {
 string s = "world"; // placeholder value
 Cmd d = delegate { Console.WriteLine("Hello {0}", s); };
 //...
}

 

The current compiler generates two classes, both having a single string field.

With CLR 2.0 the compiler could use a single class as follows:

 

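// Needs: using System; using System.Reflection;

class Env_String {
 public string s; // the captured variable
}

// Static methods: the delegate's object reference is bound to the first argument.
static void f(Env_String env) { Console.WriteLine(env.s); }
static void g(Env_String env) { Console.WriteLine("Hello {0}", env.s); }

void foo() {
 Env_String env = new Env_String();
 Cmd d = (Cmd)Delegate.CreateDelegate(typeof(Cmd), env,
   GetType().GetMethod("f", BindingFlags.Static | BindingFlags.NonPublic));
 //...
}

void baz() {
 Env_String env = new Env_String();
 Cmd d = (Cmd)Delegate.CreateDelegate(typeof(Cmd), env,
   GetType().GetMethod("g", BindingFlags.Static | BindingFlags.NonPublic));
 //...
}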

 

In the above case only one environment class is needed, because in both closures the environment needs to capture the state of only a string variable. The environment class itself can be kept free of closure-specific code, and the methods that are generated for the closures can be placed in the class that the original enclosing method was a part of.
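For anyone who wants to try the shared-environment idea, here is a small self-contained sketch of how I understand it. It assumes the .NET 2.0 overload Delegate.CreateDelegate(Type, object, MethodInfo), which can bind an object as the first argument of a static method. The names EnvString, SharedEnvDemo, Bind and the last field are mine, purely for illustration, and this is hand-written code, not what any compiler actually emits.

```csharp
using System;
using System.Reflection;

public delegate void Cmd();

// One environment type shared by every closure that captures a single string.
public class EnvString
{
    public string s;
}

public class SharedEnvDemo
{
    public static string last; // records output so the behaviour is easy to check

    // Closure bodies: static methods whose first argument is the environment.
    public static void F(EnvString env) { last = env.s; Console.WriteLine(last); }
    public static void G(EnvString env) { last = "Hello " + env.s; Console.WriteLine(last); }

    // Binds 'env' as the first argument of the named static method.
    public static Cmd Bind(string methodName, EnvString env)
    {
        MethodInfo m = typeof(SharedEnvDemo).GetMethod(methodName);
        return (Cmd)Delegate.CreateDelegate(typeof(Cmd), env, m);
    }

    static void Main()
    {
        EnvString env = new EnvString();
        env.s = "world";
        Bind("F", env)(); // prints "world"
        Bind("G", env)(); // prints "Hello world"
    }
}
```

Note that EnvString knows nothing about F or G. The delegate's object reference and its function pointer are unrelated types, which is exactly what CLR 1.x did not allow.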

 

This subtle thing had escaped my thinking for some time. I guess when you are doing your PhD at a university that hosts one of the world’s largest .NET user groups and are working on matters related to MSR Cambridge, you tend to pick up subtle things a lot more easily. :-)

 

One thing that has me thinking is the implementation of closures in the case where the closures are defined inside instance methods. If you refer here, I show a screenshot of an ildasm of that case.

 

When an instance function is being used, the environment will have to have a ‘this’ pointer member that the closure block can use to reach any class data members. When the C# compiler generates one class per function wrapping a closure, the ‘this’ can be statically typed to the class that contains the parent method.

 

 

If the environment class is to be shared across multiple classes that implement closures, then what will be the type of the ‘this’ pointer?

 

I am guessing here, but I would expect that they might choose to create the environment class as a generic class that is parameterized on the type of the ‘this’ pointer: generic environment classes. Does that sound right? What are the possible fallouts of that approach?

 

As a matter of fact, when you dive into the possibilities that generics open up, it’s rather interesting. We could have the entire environment as a class that contains generic types for every reference type member. This might be efficient to implement because the implementation of generics in the .NET framework does not involve specialization of the runtime class for reference types. Even for value types, I believe it does optimizations to avoid class duplication if the value types have similar footprints as far as the GC is concerned.
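Here is a rough sketch of what I imagine such a generic environment might look like. This is my guess, not anything the C# compiler does: Env, Counter, AddBody, MakeAdder and Step are all hypothetical names, and the binding again assumes the .NET 2.0 overload Delegate.CreateDelegate(Type, object, MethodInfo).

```csharp
using System;

// Hypothetical shared environment: the type of the captured 'this' and of
// one captured local are both type parameters.
public class Env<TThis, T1>
{
    public TThis self;
    public T1 v1;
}

public class Counter
{
    public int total = 0;

    public delegate int Step();

    // Closure body kept in the enclosing class; 'env' plays the role of the environment.
    public static int AddBody(Env<Counter, int> env)
    {
        env.self.total += env.v1; // statically typed access through 'self'
        return env.self.total;
    }

    public Step MakeAdder(int amount)
    {
        Env<Counter, int> env = new Env<Counter, int>();
        env.self = this;
        env.v1 = amount;
        return (Step)Delegate.CreateDelegate(typeof(Step), env,
            typeof(Counter).GetMethod("AddBody"));
    }

    static void Main()
    {
        Counter c = new Counter();
        Step add5 = c.MakeAdder(5);
        Console.WriteLine(add5()); // 5
        Console.WriteLine(add5()); // 10
    }
}
```

Any other class could reuse Env with its own type in place of Counter, so the ‘this’ member stays statically typed without a new environment class per enclosing type.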

 

One thing is for sure, the future looks interesting for functional and dynamic languages leveraging the CLR.

 

Among other things, I am looking forward to the MVP India summit that should be happening this weekend 28th May to 31st May.

 

 

Friday, May 28, 2004 1:16:56 AM (Eastern Standard Time, UTC-05:00)
 Sunday, May 23, 2004

How many of us recognize the name of this man?

 

Mitch Kapor was the founder of the Lotus corporation. He was the man who designed Lotus 1-2-3. If you know your history, Lotus was one of the few large applications companies that was a serious challenger to Microsoft in its early years. Years were spent on the battle for the spreadsheet, fought on both the old Mac and the old DOS machines.

 

Microsoft’s offering in those days was called Multiplan. Multiplan was fairly beaten by Lotus 1-2-3 on almost all fronts. Microsoft eventually thought through their faults and strengths and released Excel, and the spreadsheet battle was over.

 

Mitch Kapor himself is one person I think of as being fairly amazing.

 

He was a co-founder of the EFF, the Electronic Frontier Foundation, along with John Perry Barlow. The EFF was the first organization to stand up for hackers’ rights and digital rights. This was of epic significance in the early 90s, when hacker arrests and crackdowns were gaining a witch-hunt-like momentum.

 

“The EFF is a non-profit civil liberties organization working in the public interest to protect privacy, free expression, and access to public resources and information online, as well as to promote responsibility in new media.”

 

The EFF was the first organization to take the American Secret Service to court, over the prosecution of the ‘hacker’ Knight Lightning. The EFF won, and it brought an end to an era in how people thought of ‘hackers’ and the rules for information security.

 

I highly recommend reading this book called the Hacker Crackdown by Bruce Sterling. The book reflects the ethos of a time when the parameters of information security were very different from how we think of them now. Considering the license of the book, what it intends to convey and what I hope it may change about your thinking, I would recommend downloading a softcopy of the book.

 

Today I happened to come across Mitchell Kapor’s website and blog.

Website: http://www.kei.com/homepages/mkapor/

Blog: http://blogs.osafoundation.org/mitch/

 

I found this entry, right on top and I couldn’t help smiling:

 

May 09, 2004

Now I'm Mad

 

Some idiot Atkins Diet spammer just posted 53 bogus comments in this blog. I'm disabling comments (globally) shortly and figuring out if there's any recourse.

 

They don't know it yet, but they picked the wrong person to do this to.

 

 

Sunday, May 23, 2004 8:24:41 AM (Eastern Standard Time, UTC-05:00)
 Saturday, May 22, 2004

C# 2.0 has support for closures.

 

This article discusses the implementation of the closures in the C# language and how this has been done using pure compiler magic without any CLR changes. I had previously mentioned C# closures in an earlier blog entry and had linked to Antonio Cisternino’s blog entry:

Closures in CLR 2.0

This is an interesting entry and worth a read. Unfortunately, IMO, Mr. Cisternino’s entry is not correctly titled. The support for closures is a C#-only thing, not a CLR-level addition.

 

I intend to start off where his blog entry ends. Here I shall look into how it is done.

 

Features such as closures have been taken to legendary fame in languages like Scheme and Ruby. Recent language developments like Groovy also ship with closure constructs.

 

Microsoft, for some reason, has been calling the new language feature anonymous methods. I don’t know why this feature hasn’t been publicly spoken of as closure or lambda support in the C# language. Maybe there are subtle differences between the theoretical definition of closures and what the C# language achieves.

 

That said – let’s jump into the matter.

This is C# code that shows off a closure:

 

Code Showing One Anonymous Method / Closure that shares variables with its scope

//closure.cs

//Compile: csc closure.cs

using System;

 

class CMain

{

      delegate void Closure(int n);

     

      static Closure CreateClosure()

      {

            int c = 0;

            return delegate(int n) {

                  Console.WriteLine("Closure: n = {0}, c = {1}",n,c++);

            };

      }

     

      static void Main()

      {

            Closure c1 = CreateClosure();

            Closure c2 = CreateClosure();

            c1(1);

            c1(1);

            c1(1);

            c2(2);

            c2(2);

            c2(2);

      }

}

 

This code is very similar to the Groovy snippet I had posted some time back. When executed, this is the output:

 

> closure.exe

Closure: n = 1, c = 0

Closure: n = 1, c = 1

Closure: n = 1, c = 2

Closure: n = 2, c = 0

Closure: n = 2, c = 1

Closure: n = 2, c = 2

 

If you don’t know what closures are about, the easiest way to appreciate them is to take a careful look at the C# code above. Notice the function CreateClosure() is returning a delegate to a block of code that is part of the function itself. (Normally you would create a delegate to a function). If you don’t understand what a delegate is, let it be. What you need to understand is that the function returns a block of code that is a part of the function.

 

The block of code accepts an integer parameter. Also, you will notice that the local variable of the function (variable ‘c’) is being used in the code block.

 

When the block of code is returned to the caller (in this case Main()), the caller can invoke the variable that represents the block of code, causing the code to run.

 

Once upon a time you could create a delegate to an independent function only and not to a code block within a function. A delegate to a function had to have the same signature as the function. The delegate could be thought of as an old C style function pointer. When the delegate is invoked, the function pointed to is called. Delegates came with the additional merit that they were type safe function pointers – most people did not think much more than that about delegates.
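For readers who have not used delegates at all, this is the "old" usage in a nutshell: a minimal sketch in which the delegate wraps a named method with a matching signature. ClassicDelegateDemo, IntOp, Twice and Apply are names made up for this example.

```csharp
using System;

class ClassicDelegateDemo
{
    // The delegate type fixes the signature: takes an int, returns an int.
    public delegate int IntOp(int n);

    public static int Twice(int n) { return n * 2; }

    // Any method with the IntOp signature can be passed in, type-safely.
    public static int Apply(IntOp op, int value) { return op(value); }

    static void Main()
    {
        // C# 1.x style: the delegate wraps a named method with a matching signature.
        IntOp op = new IntOp(Twice);
        Console.WriteLine(Apply(op, 21)); // 42
    }
}
```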

 

Now it is a little different. The anonymous method within the CreateClosure() method looks like a nested function that simply has no explicit name, but it’s not that simple. You might venture to guess that the compiler creates a new method (by extracting this code block) and simply creates a delegate to the new function, the same way delegates once used to behave.

 

However, notice that the code block uses variable ‘c’ that is defined locally to the enclosing function. If this code block is going to be carved out of the enclosing function, how can it access variable ‘c’? Better yet, take a closer look at the output.

 

It looks like a closure returned from a call to CreateClosure() remembers its value of the variable ‘c’. Somehow, the state of the function CreateClosure() is captured in the delegate/closure that it returns. So much so that the states of two invocations of CreateClosure() are maintained independently of each other.

 

This is in violation of the way simple C-like functions work, where the function state is stored on the stack, and the function’s stack frame is torn down and its state is lost when the function returns. (Refer to ‘The Big Deal about Iterators’.)

 

Functions that return closures seem to maintain state even after they have returned. The closure object maintains a reference to that state. This requirement of maintaining state is similar to the implementation of iterators (if you give it some thought).

 

Implementation

This is what our closure.cs looks like under ILDASM.

 

 

Notice the new class called __LocalsDisplayClass$0…1 that has been created. This is the interesting culprit.

 

In essence, closures work in C# 2.0 as follows: the compiler creates a new class whose member variables correspond to the local variables of CreateClosure() that are used in the closure/anonymous method that it defines. Thus the local variable ‘c’ of method CreateClosure() can be seen in class __LocalsDisplayClass$0…1.

 

So calling the CreateClosure method creates an instance of the __Locals… class. It then creates an old (classical) delegate to the __AnonymousMethod$0… method of the class. So the actual support for delegates in the CLR hasn’t changed at all. The delegate that is returned from the CreateClosure() method is a normal (old C# 1.x type) delegate.

 

All accesses to variables that are shared between the CreateClosure() method and the anonymous method are accesses to members of the __Locals… class.

 

Here is IL code of CreateClosure()

 

.method private hidebysig static class CMain/Closure

        CreateClosure() cil managed

{

  // Code size       30 (0x1e)

  .maxstack  3

  .locals init (class CMain/__LocalsDisplayClass$00000001 V_0,

           class CMain/Closure V_1)

  IL_0000:  newobj     instance void CMain/__LocalsDisplayClass$00000001::.ctor()

  IL_0005:  stloc.0

  IL_0006:  ldloc.0

  IL_0007:  ldc.i4.0

  IL_0008:  stfld      int32 CMain/__LocalsDisplayClass$00000001::c

  IL_000d:  ldloc.0

  IL_000e:  ldftn      instance void CMain/__LocalsDisplayClass$00000001::__AnonymousMethod$00000000(int32)

  IL_0014:  newobj     instance void CMain/Closure::.ctor(object,

                                                          native int)

  IL_0019:  stloc.1

  IL_001a:  br.s       IL_001c

  IL_001c:  ldloc.1

  IL_001d:  ret

} // end of method CMain::CreateClosure

 

Notice some things in the above code

- An object of __LocalsDisplayClass$00000001 is created.
- Access to the variable is actually access to the member ‘c’ of this class.
- The delegate is created to the instance of the __LocalsDisplayClass$00000001 class that was created and its __AnonymousMethod$00000000 method.
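If reading IL is not your thing, the three points above can be written out by hand in plain C# 1.x style. This is my own reconstruction from the IL, not compiler output: the real generated names contain characters that are not legal in C# identifiers, so LocalsDisplayClass, AnonymousMethod and CMainLowered are stand-in names.

```csharp
using System;

class CMainLowered
{
    public delegate void Closure(int n);

    // Stand-in for the compiler-generated __LocalsDisplayClass$00000001.
    public class LocalsDisplayClass
    {
        public int c; // the captured local 'c'

        // Stand-in for __AnonymousMethod$00000000.
        public void AnonymousMethod(int n)
        {
            Console.WriteLine("Closure: n = {0}, c = {1}", n, c++);
        }
    }

    public static Closure CreateClosure()
    {
        LocalsDisplayClass locals = new LocalsDisplayClass(); // newobj
        locals.c = 0;                                         // stfld c
        return new Closure(locals.AnonymousMethod);           // ldftn + newobj
    }

    static void Main()
    {
        Closure c1 = CreateClosure();
        Closure c2 = CreateClosure();
        c1(1); // Closure: n = 1, c = 0
        c1(1); // Closure: n = 1, c = 1
        c2(2); // Closure: n = 2, c = 0
    }
}
```

Each call to CreateClosure() allocates a fresh LocalsDisplayClass, which is why c1 and c2 count independently.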

 

This is the code of the anonymous method, which has now become CMain/__LocalsDisplayClass$00000001::__AnonymousMethod$00000000(int32)

 

.method public hidebysig instance void  __AnonymousMethod$00000000(int32 n) cil managed

{

  // Code size       39 (0x27)

  .maxstack  5

  .locals init (int32 V_0)

  IL_0000:  ldstr      "Closure: n = {0}, c = {1}"

  IL_0005:  ldarg.1

  IL_0006:  box        [mscorlib]System.Int32

  IL_000b:  ldarg.0

  IL_000c:  dup

  IL_000d:  ldfld      int32 CMain/__LocalsDisplayClass$00000001::c

  IL_0012:  dup

  IL_0013:  stloc.0

  IL_0014:  ldc.i4.1

  IL_0015:  add

  IL_0016:  stfld      int32 CMain/__LocalsDisplayClass$00000001::c

  IL_001b:  ldloc.0

  IL_001c:  box        [mscorlib]System.Int32

  IL_0021:  call       void [mscorlib]System.Console::WriteLine(string,

                                                                object,

                                                                object)

  IL_0026:  ret

} // end of method __LocalsDisplayClass$00000001::__AnonymousMethod$00000000

 

This is almost exactly the code that we had written into the CreateClosure() function, except that the local variable ‘c’ is not a local variable any more.

 

What do you think will happen when there is a function that defines two anonymous methods within it?

 

Code Showing Multiple Anonymous Methods / Closures that share variables with its scope

//closure2.cs

//Compile: csc closure2.cs

using System;

 

class CMain

{

       delegate void Closure(int n);

       static Closure t1,t2;

      

       static void CreateClosure()

       {

              int c1 = 0;

              int c2 = 0;

              int c3 = 0;

              int c4 = 0;

              Console.WriteLine("c4 = {0}",c4);

             

              t1 =  delegate(int n) {

                     Console.WriteLine("Closure: n={0}, c1={1}, c2={2}",

                                n,c1++,c2++);

              };

              t2 = delegate(int n) {

                     Console.WriteLine("Closure: n={0}, c1={1}, c3={2}",

                                n,c1++,c3++);

              };

       }

      

       static void Main()

       {

              CreateClosure();

       }

}

 

 

This is what happens:

 

 

Notice:

- There is still only one class that maintains state.
- The class has two methods (one for each of the anonymous methods)
- The variables that are being used by either of the methods are part of the class (c1, c2, c3).
- Variables that are not being used from either closure are not part of the class (c4 is omitted).
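Written out by hand, the points above would look roughly like the following. Again this is a sketch with stand-in names (Locals, Anon0, Anon1, CMain2Lowered), not the compiler's actual output. Main invokes the two delegates to show that they share c1.

```csharp
using System;

class CMain2Lowered
{
    public delegate void Closure(int n);
    public static Closure t1, t2;

    // One shared state class for both anonymous methods.
    public class Locals
    {
        public int c1, c2, c3; // c4 is not captured, so it stays a plain local

        public void Anon0(int n)
        {
            Console.WriteLine("Closure: n={0}, c1={1}, c2={2}", n, c1++, c2++);
        }

        public void Anon1(int n)
        {
            Console.WriteLine("Closure: n={0}, c1={1}, c3={2}", n, c1++, c3++);
        }
    }

    public static void CreateClosure()
    {
        Locals locals = new Locals();
        int c4 = 0;
        Console.WriteLine("c4 = {0}", c4);
        t1 = new Closure(locals.Anon0);
        t2 = new Closure(locals.Anon1);
    }

    static void Main()
    {
        CreateClosure();
        t1(1); // Closure: n=1, c1=0, c2=0
        t2(2); // Closure: n=2, c1=1, c3=0 (c1 is shared with t1)
    }
}
```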

 

One final look, what if the closure does not use any variables of its enclosing scope, how would this work?

 

Code Showing an Anonymous Method / Closure that is stateless

//closure3.cs

//Compile: csc closure3.cs

using System;

 

class CMain

{

       delegate void Closure(int n);

      

       static Closure CreateClosure()

       {

              int c = 0;

              return delegate(int n) {

                     Console.WriteLine("Closure: n = {0}",n);

              };

       }

      

       static void Main()

       {

              Closure c1 = CreateClosure();

              Closure c2 = CreateClosure();

              c1(1);

              c2(2);

       }

}

 

The code generated looks like this:

 

 

As expected, there is NO hidden class generated in this case. Why? Because there is no need for the anonymous method to save state. The anonymous method itself is added to the class that contains CreateClosure().
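In hand-written form the stateless case reduces to an ordinary delegate to a static method. A sketch with stand-in names (CMain3Lowered, AnonymousMethod), assuming the generated method is static because CreateClosure() is static:

```csharp
using System;

class CMain3Lowered
{
    public delegate void Closure(int n);

    // Stand-in for the generated method: a plain static member of the class.
    static void AnonymousMethod(int n)
    {
        Console.WriteLine("Closure: n = {0}", n);
    }

    public static Closure CreateClosure()
    {
        return new Closure(AnonymousMethod); // no state object is needed
    }

    static void Main()
    {
        Closure c1 = CreateClosure();
        c1(1); // Closure: n = 1
        Console.WriteLine(c1.Target == null); // True: a static method has no target instance
    }
}
```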

 

 

One more case to look at: what happens when the closure accesses both a local variable of the enclosing method as well as a member variable of the class that the enclosing method is a part of? Because state is persisted in a new class instance, how will the member variables of the original class be accessed?

 

This is C# code that shows this situation:

Code where closure accesses locals and class members

//closure4.cs

//Compile: csc closure4.cs

using System;

 

class CMain

{

       delegate void Closure(int n);

       int a = 10;

      

       Closure CreateClosure()

       {

              int c = 0;

              return delegate(int n) {

                     Console.WriteLine("Closure: n = {0}, a = {1}, c = {2}",n,a++,c++);

              };

       }

      

       static void Main()

       {

              CMain main = new CMain();

              Closure c1 = main.CreateClosure();

              Closure c2 = main.CreateClosure();

              c1(1);

              c2(2);

       }

}

 

This is what the generated code looks like:

 

 

Notice that there is a member in the _Locals… class that is called < this >. The this pointer/reference refers back to the original enclosing class where the anonymous method belonged. Thus it can access member variables.
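A hand-written equivalent makes the role of that extra member clearer. This is a sketch with stand-in names (Locals, outerThis, CMain4Lowered); the field a is made public here only so it is easy to inspect from outside.

```csharp
using System;

class CMain4Lowered
{
    public delegate void Closure(int n);
    public int a = 10;

    public class Locals
    {
        public CMain4Lowered outerThis; // stand-in for the generated <this> field
        public int c;                   // the captured local 'c'

        public void AnonymousMethod(int n)
        {
            Console.WriteLine("Closure: n = {0}, a = {1}, c = {2}",
                              n, outerThis.a++, c++);
        }
    }

    public Closure CreateClosure()
    {
        Locals locals = new Locals();
        locals.outerThis = this; // lets the closure reach instance members
        locals.c = 0;
        return new Closure(locals.AnonymousMethod);
    }

    static void Main()
    {
        CMain4Lowered main = new CMain4Lowered();
        Closure c1 = main.CreateClosure();
        Closure c2 = main.CreateClosure();
        c1(1); // Closure: n = 1, a = 10, c = 0
        c2(2); // Closure: n = 2, a = 11, c = 0
    }
}
```

The member a lives in the one CMain4Lowered instance and is shared by both closures, while each closure gets its own c.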

 

Notes

I think C# closures were created as a by-product of trying to implement iterators in the language. The implementation of iterators involved this sort of temporary class creation that preserved the state of functions. (A little like Bjarne Stroustrup’s Function Objects.)

 

I don’t know if there are reasons why this implementation of anonymous methods cannot be called proper closures. Anonymous methods cannot use any ref or out parameters of their enclosing function; this is a limitation. Does this limitation imply that they cannot be called closures? I don’t know. But other than that, it seems to serve the purpose of closures pretty well.
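A small sketch of the ref limitation and the usual way around it: copy the parameter into a local and capture the local instead. RefCaptureDemo, Next and MakeCounter are hypothetical names for this example. The closure then works on its own copy and no longer tracks the caller's variable.

```csharp
using System;

class RefCaptureDemo
{
    public delegate int Next();

    // 'start' is a ref parameter, so the anonymous method may not use it directly
    // (the compiler rejects that). Copy it into a local, which can be captured.
    public static Next MakeCounter(ref int start)
    {
        int current = start;      // snapshot of the ref parameter
        start = start + 100;      // later changes to 'start' are not seen by the closure
        return delegate { return current++; };
    }

    static void Main()
    {
        int seed = 7;
        Next next = MakeCounter(ref seed);
        Console.WriteLine(next()); // 7
        Console.WriteLine(next()); // 8
        Console.WriteLine(seed);   // 107
    }
}
```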

 

The fact that closures have been implemented as a compiler-level hack means that there is no overhead at all for code that does not take advantage of this sort of feature. So there is no performance penalty on the CLR itself. Maybe in time, when these features gain adequate popularity, there will also exist efficient means of integrating these features into the CLR in a way that does not affect the functioning of traditional code.

 

Once we have closures, we can implement a large variety of constructs in the language (iterators being one of them). However, since the syntax is a little clunky and the actual implementation a little slow (because there is a hidden managed class involved), this may never happen. All the same, this is one amazing feature to have in a mainstream language like C#. Maybe even a little too advanced for some folk, so don’t be surprised if the coding guidelines of your company say NO to anonymous methods, the way they said NO to C++ macros once upon a time.

 

 

Saturday, May 22, 2004 1:10:56 AM (Eastern Standard Time, UTC-05:00)
 Friday, May 21, 2004

Today I had a rather shocking realization. I realized that C# 2.0 supports closures.

 

It was rather shocking, because here I was running up and down obscure languages looking for features like this and bang C# has it. I was pointed to this blog entry by a good friend of mine at Microsoft: Antonio Cisternino's Blog: Closures in CLR 2.0.

A lot of the content on Mr Cisternino’s blog is rather interesting and I would recommend a visit to

http://rotor.di.unipi.it/cisterni/Lists/My%20Blog/AllItems.aspx

 

The entry on closures is an interesting read. A quick search on Google showed me that the rest of the world seemed to have realized that C# has closures a long time before I did.

 

 

 

Looking at closures brought back something from hazy old memory from a time when I was more ignorant:

 

Function Objects in C++

 

What is a function object?

 

An object that in some way behaves like a function, of course. Typically, that would mean an object of a class that defines the application operator - operator().

A function object is a more general concept than a function because a function object can have state that persist across several calls (like a static local variable) and can be initialized and examined from outside the object (unlike a static local variable). For example: