Tuesday, October 11, 2005

After a long time, one missing piece in the yield (iterator) puzzle fell in place in my head. It should have been obvious because it had been staring at me in the face for a while now. I am not sure, but it seems to me at this moment, that some of this might be of lasting importance beyond the scope of this blog entry.

 

Time warp 3 or 4 years back to when Sid and I were discussing ruby iterators and python list comprehensions. Sid had this notion that there was something missing in the Python implementation that made ruby iterators more powerful. We however could not place it at that time and we let it be. And we missed it for years after that. I am going to make some claims that some languages can and cant do certain things – I might be wrong and I hope that I am – if I am do show me how and maybe we can take this forward.

 

The most powerful yield (iterator) construct that I can think of, needs all of these –

1)     the parent function that takes arguments

2)     the parent function yields values

3)     the parent function can return a one or more values finally and terminate

4)     the parameter block of the yield should be able to ‘break’

5)     the yield should be able return values into the iterator

6)     it should possible to iterate two or more iterators in an interleaved manner.

 

1 and 2 are obvious. Almost all implementations of yield do this.

 

3 means that there is a notion of the function returning a value as the exit case of the function (such that the function instance terminates and that it will not produce any more values).

 

4 means that you should be able to do this

x.each {|y| break }

 

5 means that you should be able to do this

def foo

      n = yield 5

      puts n

end

 

6 means that you should be able to do this

a = foo1()

b = foo2()

puts a.next() + b.next()

puts b.next() + a.next()

 

Now, here is the search for yield the magnificent – I don’t a language that gives me a yield that can do all of the above and I am upset.

 

C# and Python rely on transformation of iterator (the callee) into an object that models the the state machine of iterator code. They use the objects state to maintain the environment for their closure (their state machine transformed iterator function). Read more about C# 2.0 iterator implementation here. The ruby way as I understand is that the callers block is passed to the callee as a closure. Scheme does not intrinsically have a yield, but you make an equivalent by passing a closure to a callee function. I am aware that there are other languages that implement the yield – Comega and Groovy, Icon and CLU – I don’t much about these (except that I believe that Cw is similar to C# in this regard), and so I wont discuss them here. Ever since ruby popularized the construct, it has found its way into many languages.

 

All of these support 1 and 2 above.

 

C# and Python support 4 and 6. They don’t support returning values from the function and do not support sending values into the iterator. This means that you cannot write code similar to the powerful inject and other similar ‘reduce’ like constructs. Here is what the inject (courtesy of smalltalk looks like)

 

class Array

      def inject n

            each { |v| n = yield n, v }

            return n

      end

end

 

def sum ls

      v = ls.inject(0){|n, v| n + v }

      return v

end

 

Inject is very powerful as it enables creating a large number of reduce like behavior. Inject itself is only one example of many things you can do if you can control the execution of iterator by passing it values from its consumer. The argument in favor of the return is weaker, but it is sufficient to say that it it important to separate a final value, from that of the last yielded value.

 

(I am not sure that Py cannot return a final value. If I am correct it terminates iteration of next() calls by throwing an exception. It don’t know if it captures a return value in the function object)

 

 

Scheme iterators, the simplistic approach that I have in mind is

(define iterator

       (lambda (x cls)

             (cls (+ 1 x)) ; yield 1

             (cls (+ 2 x)))) ; yield 2

(define caller

       (lambda ()

             (iterator 10

                    (lambda (n) ; this lambda is in the lexical scope of caller

                           (display n)))))

 

Should be equivalent to

def iterator  x

      yield x + 1

      yield x + 2

end

def caller

      iterator {|n|

            puts n

      }

end

 

The limitation in scheme is that you cannot do a ‘break’ the block of the caller. Atleast there is no easy way to do this without grabbing a continuation at the point of the call of the call to ‘iterator’, and then invoking the abortive continuation in the closure argument cls.

                       

If if it given that the abortive continuation is a natural way to do this, it is not obvious about how two iterators can created to lockstep iterate into a single block of code. I will discuss this further below.

 

So scheme can do 1, 2, 3 and 5. It can’t ‘break’ without continuations and cannot interleave or lockstep iteration in any obvious way.

 

Ruby, by far comes out with a lot of the above – the only thing it can’t do is 6, that you cant interleave iterations. The fact that you cannot interleave or lockstep iterations means that there whole classes of problems that you cannot solve. The simple case is the classical example of co-routines or continuations, the problem of comparing leaves of two trees. To compare the leaves of two trees the standard solution is write a routine that traverses the tree recursively and yields leaf values out of the recursive structure. You can then write a simple compare routine that initializes two instances of this procedure with two trees you want to compare, and then it merely compares the yielded values. This is the solution in Comega –

 

//the int* represents a stream of 0 or more ints. It is not a int pointer.

static int* getleaves(node n)

{

       if(n != null)

       {

             if(n.value != null)

                    yield return n.value;

             else

             {

                    if( n.left != null) yield return getleaves( n.left );

                    if( n.right != null) yield return getleaves( n.right );

             }

       }

}

 

static bool samefringe(node tree1, node tree2)

{

       en1 = getleaves(tree1).GetEnumerator();

       en2 = getleaves(tree2).GetEnumerator();

 

       t1 = en1.MoveNext();

       t2 = en2.MoveNext();

      

       //this is the lockstep iteration

       for(; t1 && t2; t1 = en1.MoveNext(),  t2 = en2.MoveNext())

       {

             if( en1.Current != en2.Current )

                    return false;

       }

       return !t1 && !t2;

}

 

As you can see, the class of problems that fall into the above category is fairly large and its annoying that they cant be solved in ruby.

 

Note: Ruby provides something called a generator to solve the problem of lockstep iteration. I like the language they describe this in –

Generator converts an internal iterator (i.e. an Enumerable object) to an external iterator.

Note that it is not very fast since it is implemented using continuations, which are currently slow.

 

The first problem is that it grabs continuations over something that already manages state. Lets assume that that’s transparent to us and is a non-issue, there is a larger problem that generator created iterators violate some of the other requirements, specifically 5: how you can send values into the iterator.

g = Generator.new { |g|

      n = "hello"

      loop {

             n = g.yield( n ) 

             puts n

             break if n == nil

      }

}

g.each {|n|

      puts n

      n + "."

}

This code does not work as expected (does not generate hellohello.hello..hello…). If there is something amiss above that you can point out, let me know.

 

All of this brings us closer to the conclusion – you will notice that there are two approaches to the implementation of iterators – one where the callee (the iterator) is closure converted (like C#, Python…) and the approach to grab a closure in the caller with a light-weight or heavy-weight abortive construct (as in Ruby and Scheme). Here is a pleasant discussion on iteration styles.

 

Finding Yield the Magnificent

 

Fixing the C# and Py approach requires two potentially simple changes. Make the C# MoveNext() or the Py next() functions take an argument. This is not strictly semantically correct, because this means that if MoveNext or next are used to start the computation then the first value passed is ‘hanging in space’. This fixes requirement 5. To fix 3 it is enough to create a new member in the closure object that retains the return value.

 

While this might be all very well in Py, its probably not as easy in C# or other strongly typed systems since you hit a syntactic limitation about where you express the return type of the function and the types of the value that yield returns to the function.

 

 

Fixing lockstep in a blocks/internal iterators/scheme-ruby-style approach can be done if it is possible to create a combinator that takes multiple iterators and a block that takes multiple values and does an iterator equivalent of scheme’s ‘apply’. I don’t if such an apply-iterator operator exists. Maybe this was the difficulty they weren’t built into ruby in the first place. I don’t have a solution to interleaved iteration when it’s the caller that is closure converted – there is probably no valid general purpose transformation of this sort possible.

 

 

There, I am glad all of this is said finally. I knew it all at the back of my mind as a shapeless dark area in my understanding of yield from which I could derive working knowledge, but nothing quantified. What started off all of this is that I finally needed a yield that did all of the above – that seemed to be the only natural way to express a particular algorithm. Now that all of this is said its quantified, weather it is all right or wrong. I will be happy to see what ideas you can contribute to the above discussion. I would also like work of the apply-iterator operator or if such a beast exists and you know about it, let me know.

 

I will get down to proof-reading this soon.

Tuesday, October 11, 2005 12:10:46 AM (Eastern Standard Time, UTC-05:00)
This is linked to from the 'IteratorStyles' page that I had dropped refernce to and I had just missed link earlier. This is very close to what I wanted to do.
http://onestepback.org/articles/same_fringe/index.html

As a matter of fact two schocking things, the problem I originally wanted to solve is the sameFringe problem. This paper also mentions Prof Dan Friedman's paper on continuations in its refernces.

The problem where this breaks is the 'repmin' problem - http://knorretje.hku.nl/wiki/repmin.
Here is solution I consider 'natural' in psuedo ruby code -
def repmin t
if t is leaf
m = yield t.val
return tree(m)
else
#lock step iteration required
t1, t2 = repmin( t.left ), repmin (t.right ) {|lm, rm|
yield (min (lm, rm) )
}
return tree(t1, t2)
end
end

This is the solution that no yield I know of can satisfy.
Roshan
Wednesday, October 12, 2005 10:48:17 AM (Eastern Standard Time, UTC-05:00)
About the ruby generator the g.yield(n) is returning g and not the value returned by the code block. This is probably to maintain the contract withe ruby's iterator interface.

sidharth
Wednesday, October 12, 2005 2:35:34 PM (Eastern Standard Time, UTC-05:00)
repmin would make a good example for Mozart's dataflow variables. I'll try and pull up an example of it.

Wouldn't it be more efficient to implement repmin in two passes?

You dont have to use yield for everything

def inject(f, l, b):#{
for i in l:#{
b = f(b)
#}
return b
#}

def sum(l):#{
return inject(l, lambda a, b: a + b, 0)
#}

#!@*%^$ no pre in comments!!!! X-(

interestingly my inject looks a lot like pythons reduce :-P

Your repmin implementation is an advertisement for asprin. I have a feeling that the only place you could express something like that in a clean way is using a dataflow language like mozart. Even in mozart you'd need to use multiple threads.

btw check out my blog, new post for you. :-P

sidharth
Wednesday, October 12, 2005 4:06:29 PM (Eastern Standard Time, UTC-05:00)
I think of reduce and inject as the same so it should not come as a surpise.

The yield implementation indeed does two passes, but the second pass is implicit and is obvious if you are used to thinking about yield. When I say 'used to thinking about yield', I mean discovering something something similar to 'recursing back down a recursion heirarchy'. Its subtle and powerful.

I am begining to think of it as the 'structured programming' of continuations, in the same way loops are the structured programming of gotos. And that, I think, is something beautiful.

One of these days I would see value in formalising all of this a bit.

(Sid has a nice explanation of the yield-to-class transformation in Py.
Take a look: http://blogs.applibase.net/sidharth/?p=30#more-30)
Wednesday, October 12, 2005 4:11:11 PM (Eastern Standard Time, UTC-05:00)
Repmin traverses a tree and replaces each leaf with the minmum value of leaves of the tree.
Here is repmin, a little more pretty formatted:

def repmin t
---if t is leaf
------return tree( yield t.val )
---else
------#lock step iteration required
------t1, t2 = repmin( t.left ), repmin (t.right ) {|lm, rm|
---------yield (min (lm, rm) ) #here the return of the yield, is the return to both the iterators
------}
------return tree(t1, t2)
---end
end

t2 = repmin(t1){|m| m }
Wednesday, October 12, 2005 4:27:25 PM (Eastern Standard Time, UTC-05:00)
I missed responding to this :

Sid:
You dont have to use yield for everything
def inject(f, l, b):
---for i in l:
------b = f(b)
---return b
def sum(l):
---return inject(l, lambda a, b: a + b, 0)

This similar to what I am saying above about how ruby and scheme iterators work: by passing a closure (which is your lambda) to the caller.

(define iterator
----(lambda (x cls)
--------(cls (+ 1 x)) ; yield 1
--------(cls (+ 2 x)))) ; yield 2

(define caller
----(lambda ()
--------(iterator 10
------------(lambda (n) (display n)
------------; the lambda above is in the lexical scope of caller
------------))))

Which brings me to limitations of not being able to interleave iterators.
Wednesday, October 12, 2005 5:54:08 PM (Eastern Standard Time, UTC-05:00)
Here is your repmin transliterated to mozart. Turns out that in this case I didnt need to use threads. My Bad. Seems that mozart might make a goog vehicle to explore your ideas. :-P

functor
import
Browser(browse:Browse)
define
fun {RepMin Tree Yield}
case Tree
of tree(Left Right) then
LYield RYield LVal RVal YieldVal NewTree in
fun {LYield LeftVal}
LVal = LeftVal
YieldVal
end
fun {RYield RightVal}
RVal = RightVal
YieldVal
end
NewTree = tree({RepMin Left LYield } {RepMin Right RYield})
{Browse LVal}
YieldVal = {Yield {Min LVal RVal}}
NewTree
else
{Yield Tree}
end
end
Tree = tree(5 tree(tree(tree(1 4) 3) 6))
{Browse Tree}
{Browse {RepMin Tree fun{$ Val} Val end}}
end
sidharth
Wednesday, October 12, 2005 6:06:24 PM (Eastern Standard Time, UTC-05:00)
Rosh I just want to point that I blown that you could think of something as convoluted as repmin. It took me over a while to figure out what you were trying to say there. Even after that I was unable to accept that it would work I just had to go by past experience in disagreeing with you :-P

The reason I was picking on your use of yield was that I though you were unnecessarily restricting the idea of closures as iterators to a single closure per call like ruby. This is not really necessary. And yes I know that the single closure restriction doesn't hurt your implementation.

Another thing is that your idea is still based on the closure based iterator concept. Which is very different to what is done in python. So there isn't really any reason to bring python into this discussion. Hense by blog entry.


sidharth
Wednesday, October 12, 2005 8:33:58 PM (Eastern Standard Time, UTC-05:00)
I dont understand what you are saying about closures Sid. There are two approaches to doing yields - the closures approach and the one that transforms the iterators into classes. Of these the later can be more easily modified to have all the 6 required properties. The closure based approach is not so easily ammenable to this sort of change. Thats what I am trying to say under 'Finding Yield the Magnificient'.

The fact that there can be only one closure per call is the only way to do it. The challenge is to map that one closure to multiple iterators. If you are making a different point, I didnt get it.

Its relevant to have Py because Py does yields and between C# and Py it might be easier to fix Py because it does have to have the onus to specifying the types of '= yield' and the funtion return. C# doesnt have syntactic space to keep these type qualifiers.
Friday, October 14, 2005 2:29:14 AM (Eastern Standard Time, UTC-05:00)
Took me a while but I finally got what you are talking about. Teach me to jump to conclusions.

btw you can do this in python by providing an alternating method of comunicating back into the iterator. Might not look pretty but its definitely possible.
sidharth
Friday, October 14, 2005 12:40:18 PM (Eastern Standard Time, UTC-05:00)
Here is my python generator implementation of repmin
http://blogs.applibase.net/sidharth/?p=31
sidharth
Saturday, August 16, 2008 10:24:56 AM (Eastern Standard Time, UTC-05:00)
my name is roshan

Saturday, August 16, 2008 10:25:33 AM (Eastern Standard Time, UTC-05:00)
roshan web side roshan
Name
E-mail
Home page

Comment (Some html is allowed: a@href@title, strike) where the @ means "attribute." For example, you can use <a href="" title=""> or <blockquote cite="Scott">.  

Enter the code shown (prevents robots):

Live Comment Preview