What is the Y combinator? I have seen lots of discussions on the web but nothing that seemed like a simple explanation of how it works or why it is needed. I finally understood the Y combinator a few days back and a lot of things snapped in place. So here I will try and explain it from my perspective. In this entry I am taking liberties with various syntaxes – please bear with me.
This is the Y-combinator for a normal order lazy language –
Y = \f.((\x.(f (x x))) (\x.(f (x x))))
(Here, the ‘\’ denotes a lambda). Scheme is an applicative order language and the above combinator cannot be used as such in scheme because it will lead to an infinite expansion. More on this later.
Let’s start from some basics – say we are writing a mathematical expression and we said
a = a + 1
While the above means something in imperative programming languages (like C), it does not mean anything in mathematics.
Now say I want to write the factorial function in mathematics, I would say something like this
fact(0) = 1
fact(n) = n * fact(n-1)
And this definition is perfectly valid. However you notice that it is similar to the a = a + 1 expression in the sense that in describing the value of something, we refer to that very thing.
This general principle, called a recursive definition, has a problem in the context of pure functions (like the lambda calculus). The problem is as follows – a function is a nameless entity – the only named entities that a function can use are function’s parameters (the variables introduced by the function itself) or parameters of a parent function that it is define in the scope of.
For example
(lambda (x y)
; can use x and y here
(lambda (a)
; can use a, x and y here
)
)
Now for a recursive definition, a function needs to be able to refer to itself (or like we say in some languages, a functions needs to ‘call’ itself).
So, the only way that one can define a recursive function would be to receive the function itself as one of the functions arguments. Think about this for a bit – if you are not used to programming in a language where you can pass a function, or return a function as though it were a value (similar to how you treat an integer), then this might be a hard idea. If you are from a programming background that wherein you are used to ‘higher order’ programming, then I am only stating the obvious here. (The general idea with higher order programming is that functions are first class values, like integers – which means that they can be passed as arguments, returned from functions, stored in data structures etc).
So we define a function such that it takes an additional argument which is itself.
Lets say
factorial0(factorialx, n) = if n == 0 then 1 else n * factorialx(factorialx, n -1)
factorial(n) = factorial0(factorial0, n)
Does this make sense? If it does not, then don’t bother read further. You notice that the factorial0 function takes 2 arguments one of which is the function itself.
Let me rewrite this in scheme-style syntax
(let ((factorial0 (lambda (fact, n)
(if (zero? n)
1
(* n (fact fact (sub1 n)))))))
(let ((factorial (lambda (n)
(factorial0 factorial n))))
; I can use factorial here
))
Make sense? Ok, so now let me write the interesting function factorial0 as a nested lambda as follows -
(lambda (fact) (lambda (n)
(if (zero? n) 1 (* n ((fact fact) (sub1 n)))))
You notice that fact is applied to fact, to generate the inner lambda which is then invoked with n-1.
So now, suppose one has a handy function of the following form
omega = (lambda (x) (x x))
One can define factorial as follows
(let ((factorial
(omega (lambda (fact)
(lambda (n)
(if (zero? n)
1
(* n ((omega fact) (sub1 n)))))))))
; use factorial here
)
There you go – so now we have a way of defining recursive function definitions even if your language did not directly allow a recursive definition.
So why do we need the Y combinator? You notice that in the definition of factorial we had to do something that looks very strange in the definition of the function – we call the function omega to return a function that we can call. The Y combinator does away with this for us.
To start thinking about the Y combinator, think about things this way – suppose the parameter ‘fact’ was already ‘(omega fact)’ then we wouldn’t have to call omega inside the definition of factorial correct. Right?
If we wanted that we simply need to rewrite omega this way right –
(lambda (x) (x (x x)))
Such that the first parameter is already applied to itself. But then this is not enough, because when the recursive call is made, the second time the parameter is only fact and not (fact fact) or (omega fact). Well, so we can fix this by having one more level in the definition of omega, such that omega looks like this –
(lambda (x) (x (x (x x))))
But then the third time this would not be the correct parameter… ideally we need an infinite application of x, because we don’t know how many times the function will call itself. In other words the omega should look a little like this
(lambda (x) (x (x (x (x (x … ))))
This is exactly what the Y combinator does. To see how it does this consider OMEGA defined as
OMEGA = (omega omega)
(or expanding for little omega we get)
OMEGA = ((lambda (x) (x x)) (lambda (x) (x x)))
If you haven’t seen this before, look at this device for a bit – do you see how it is infinite self application? If you do then consider the following
((lambda (x) (f (x x))) (lambda (x) (f (x x)))
It doesn’t matter what f is right now, but see what this does. After one reduction this would look like this -
(f (term term))
Where term = (lambda (x) (f (x x)))
The second reduction would look like this and so on
(f (f (term term)))
(f (f (f (term term))))
This, if you notice is similar to what we required earlier, where instead of ‘f’ we used ‘x’.
This is the basic mechanism for the Y combinator. And the Y combinator is simply
Y = (lambda (f) ((lambda (x) (f (x x))) (lambda (x) (f (x x)))
So how do we define a recursive function using the Y? Here is factorial
(let ((factorial (Y (lambda (fact)
(lambda (n)
(if (zero? n)
1
(* n (fact (sub1 n)))))))))
; use factorial here
)
Nice?
Now in the beginning I said that this Y is the Y for the last normal order language. Scheme is applicative order and non lazy. What that means is that if you use the Y combinator defined above in scheme, during the definition of factorial it will try to do the infinite expansion and will never terminate.
Instead one just eta-expands the definition to get the applicative order Y combinator. Suppose you say that term is (lambda (x) (f (lambda (y) ((x x) y))))
And we define the combinator as
Y = (lambda (f) (term term))
Then we have the applicative order Y that is usable from scheme and other applicative order languages.
Look at its expansion to see what I mean
(f (lambda (y) ((term term) y))
So the first parameter to ‘f’ (recursive function we are interested in is a lambda – the application inside the lambda is done only if f calls it.
Let me define both the Y combinators described here in lambda form
Y = \f.((\x.(f (x x)) (\x.(f (x x)))
Y = \f.((\x.(f (\y ((x x) y))) (\x.(f (\y ((x x) y))))
Once you see why the Y combinator is needed and you understand omega – its easy to see what the Y is and how it useful in defining recursive functions in a mathematically pure way.
The Y combinator is described as a fixed point function or combinator. The idea is that is lets a function describe its behavior in terms of itself until the function reaches a fixed point – like in the case of factorial the function reaches 0 and does not need to recurse any more.
References –
http://en.wikipedia.org/wiki/Y_combinator
http://www.dreamsongs.com/NewFiles/WhyOfY.pdf
Though I myself had looked at the Y many times before I never really understood it until I had to write recursive functions in the pure lambda calculus and I used omega to do it. I then happened to mentioned to Prof Friedman that I did not understand the Y and in a minute he set my thinking right.
I understand that in the description above I have taken some liberties with syntax and something are described in a hand wavy sort of way. However if you have suggestions about how I can better explain this, let me know. Of course I require that you understand scheme style syntax and higher order programming (at least minimally).