Iterators in Ruby (Part - 1)
Warming up to using Iterators (Part 2)< I am yet to write a part 3 >SICP, Fiber api and ITERATORS ! (Part 4)
I had a look at the implementation of iterators in C#. What follows is based on code generator I have seen on the C# Whidbey post PDC release. Things might have changed by now as things are moving to technology preview phase.
This is example code that is present in MSDN:
// yield-example.cs
using System;
using System.Collections;
public class List
{
public static IEnumerable Power(int number, int exponent)
int counter =0;
int result = 1;
while(counter++ < exponent)
result = result * number;
yield result;
}
static void Main()
// Display powers of 2 up to the exponent 8:
foreach(int i in Power(2, 8))
Console.Write("{0} ", i);
Notice the introduction of a nice little ‘yield’ keyword? The behavior of C# iterators in this context is a lot like ruby iterators that I have been talking about in previous articles. Knowing a little about the state management requirements for iterators and the fact that the CLR is stack based, how are iterators implemented in C#?
The implementation of iterators in C# is not driven by the CLR in any way, it is completely implemented in the language as a compiler construct.
Let me explain what the compiler tries to do – the compiler examines the function that does the yield
Lets just ignore the yield statement for now and look at the method as thought it contained only a loop. It would basically look like this:
[initial code]
while([loop condition])
[loop body]
[post loop code]
This is then generated into a sequence of IL statement blocks that have the following jumps between them:
Now, what yield would require is that the method exit at each point a yield statement occurs. The next time the method is invoked, execution continues immediately past the yield statement with all variables preserving their values.
In the CLR, when a method returns its stack frame is torn down. So there is no way that the local variables can actually preserve state. The solution taken by the C# team is turn the method that implements the yield statement into a class.
Such a class would
· Have all local variables of the method as members of the class
· Have a special variable that hold the value that is being yielded.
· Have a special variable to indicate where the method should continue from, the next time it is invoked.
When the caller of such a method runs and encounters the foreach loop that invokes the iterator an object of this class gets created. This object is maintained as long as the foreach lop is running. When the loop exits the object is disposed.
That’s how iterators are implemented in C#. :)
Now here are some details:
The method that implements the iterators generates not one but two classes. Both the classes are generated as nested/inner classes to the class that contains the method. The classes are named as (method name)$(number )_IEnumerableImpl(method name)$(number)_IEnumeratorImpl
I am not sure about the exact reasoning behind the generation of two classes. The earlier standard for writing enumerators in C# probably required, but from the standpoint of implementing iterators, I don’t understand the need.
The first of these, the IEnumerableImpl simply creates an instance of the second class and returns it to the caller.
The second class IEnumeratorImpl is the interesting one. This class has data members for all the local variables as well as our two special data members.
Compare the data members (the cyan colored diamond shapes) to the original local variables of the method.
The parameters number and exponent are there as such and the local variables counter and result are there with some name mangling (I would expect this is to avoid clashes with duplicate names in nested scopes, though that is not allowed in C# (duh?)).
The two new members on the class are
· $PC
· $_current
$_current is the member that holds the yielded value. In the case of the above method, $_current holds the value of ‘result’. It is an ‘object’ type for there will be a nice boxing and gc overhead when moving around an int type – I don’t know why something was not done for special casing value types.
$PC is the interesting variable. Remember our little diagram above that showed execution through IL. In the case of the iterators, the method does not simply execute in a loop like shown there, but executes one iteration of the loop on each call. The $PC is the variable that keeps track of where the code should jump to, the next time the method is called. Understandably $PC is someone idea of program counter ;)
The code method called MoveNext() in the class actually does the work that the method power() originally did. This is what is look like in IL code.
Sorry I am not very good with diagrams, but if you look at it you will see that the code is simply built for repeated invocation. Each time according to $PC the code braches to a new location and executes. It then sets the value of $PC to a new location of entry before the function exits.
In C# the yielded value is assigned to the $_current member variable and the MoveNext() itself exits by returning a true or false. A true indicates that the method returned through a yield and the false indicates that the method has completed execution. Subsequent calls to the method, after it has returned false will simply cause the method to return false and the $_current will not be updated.
So how does the caller of an iterative method behave? In this case its the main. The caller simply does the following –
Invoke Power$00000000__IEnumerableImpl. GetEnumerator() to get an instance of Power$00000000__IEnumeratorImpl
Invoke Power$00000000__IEnumeratorImpl.MoveNext()
If result is true, use invoke Power$00000000__IEnumeratorImpl.get_Current() which will return the $_current. The foreach loop will cause the MoveNext() to be invoked again after the reurned value is consumed.
If result is false, break out of foreach loop.
After the loop breaks out the instance of Power$00000000__IEnumeratorImpl is dispose and is available for garbage collection.
I am looking forward to the beta preview to see f things have changed. There is a lot of rather redundant code generated by the compiler here that I have not mentioned. When you are reading IL you might want to skip over those parts.
Probably in the future the CLR will contain constructs that enable true iterators and closures. Present day processors don’t natively support such constructs and so implementation will have to be hacks on the C stack or using some kind of class-object mechanism like shown here.
As a foot note I would like to mention that the Python also implements iterators in a manner similar to C#. There the function that yields is also converted into a class that maintains state. I believe the MoveNext() equivalent in Python is simply next(). Python however raises an exception to signal end of iteration. C# uses a Boolean return value to indicate this.
If this topic holds your interest then I recommend reading:
Coroutines in C by Simon Tatham
C# 2.0 Create Elegant Code with Anonymous Methods, Iterators, and Partial Classes by Juval Lowy (MSDN Mag)
Charming Python: Iterators and simple generators - New constructs in Python 2.2by David Mertz (developerWorks)
Remember Me
a@href@title, strike
Powered by: newtelligence dasBlog 2.0.7226.0
Disclaimer The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.
© Copyright 2008, Roshan James
E-mail