Monday, June 05, 2006

Work at MSRC has been moving slowly, though I have been busy.

 

Parallel GC for Haskell

Not much to say on the Haskell parallel GC front except that I am yet to actually start writing any code. I have been trying to understand the existing code base and it has been a little over whelming. This is certainly not one of the best code bases that I have seen. Actually quiet far from it. There are functions of several hundred lines in length. The main GC function itself is 800+ lines long. I have been told that several people have got their PhDs of that GC.c file. I was expecting a pure functional language to contain a couple of core object types and everything else to be built around them. Far from it, there are 70+ types of objects in the system – they have special cased almost everything they can think of for efficiency sakes. Being a pure lazy language these things matter I guess.

 

That aside the source is littered with lots of interesting ideas. The problem with something so interconnected is that its hard to build incrementally on this. So I am little stuck. The GC has a copy collector and a mark compact collector in the same code base – they share a large amount of code. For a long time this confused me. The system switches from a copy collector to a mark compact collector depending upon memory usage of every step of every GC. The number of generations is runtime configurable. The number of steps per generation is runtime configurable. It’s a crazy, interesting system.

 

I have been making some notes over here, which I hope to continue updating over the period of my internship –

http://cvs.haskell.org/trac/ghc/wiki/GarbageCollectorNotes

 

Papers - Quantum

That aside, I helped complete – or rather didn’t help complete – a paper on Quantum Computing since I landed in Cambridge. I actually didn’t do much for that paper. I had some interesting ideas, but they remained undeveloped enough to not go into the final paper (though we initially thought it would). So well, in some sense my first paper is not really my first paper. :)

 

Papers - Yield

I finished my second paper since landing at Cambridge this past weekend. The paper is about yield, (yes, finally!). It is coauthored with Simon Peyton Jones and Amr Sabry, though most of the actual writing was done by me. So in many senses this paper will not be of comparable caliber to most Amr Sabry or Simon Peyton Jones papers. Writing this paper was quiet an experience – as it is, it was my first real paper. Usually, I am told, people don’t write their first papers completely themselves, they write a section or so. Secondly I had the names of two very respected computer scientists on this and I was afraid of writing accidental stupidity. They however did give me lots of comments over email. And at the end of it I think I have come out braver, a tab bit less afraid of writing papers.

 

The paper has been submitted to the Haskell Workshop 2006. Haskell is probably the most difficult language to suggest a new control flow operator. Being the pinnacle of pure functional programming, with monads and lazy evaluation it is a very very hard language to make a value proposition to in terms of adding control flow operators. All the same, I think we may have one or two interesting arguments in favor of yield.

 

Breadth First Renumbering

As an example of control flow based on lazy evaluation consider this problem: You are given a binary and asked to renumber the nodes in breadth first order, thus creating a new tree of the same structure but with changed node values.

 

 to

 

Think about it. See if you can do this by only one traversal of the tree and use constant space etc. Basically the best solution you can think up.

 

In Haskell, lazy evaluation is used as a control flow operation. But one that is an implicit control flow operation – in the sense that the programmer doesn’t have to specify the order of evaluation the system will figure it out, driven by need.

 

Here is the lazy Haskell solution -

 

data T a = B a (T a) (T a) | L a

       deriving Show

 

bfn :: ([Int], T a) -> ([Int], T Int)

bfn (k : ks, L a) = (k + 1 : ks, L k)

bfn (k : ks, B x a b) = (k+1 : ks'', B k a' b')

    where (ks', a') = bfn (ks, a)

          (ks'', b') = bfn (ks', b)

 

bfnum :: T a -> T Int

bfnum t = t'

    where (ks, t') = bfn (1 : ks, t)

 

It is just mind boggling that something like this can work – and work so perfectly. If you know how to read Haskell and you haven’t seen this solution before, get ready for a mild surprise. :) The implementation is adapted from the appendix of Chris Okasaki’s  paper –

http://portal.acm.org/citation.cfm?id=351253&dl=ACM&coll=ACM

 

Even if you don’t understand Haskell, try to solve this in your pet language and see what you come up with.

 

Haskell

Since I am in a mood for writing a bit – here is a little bit about Haskell. A tiny tiny intro. Haskell is a pure functional language. If you don’t know what that means, there is good chance that you haven’t seen one before. Also Haskell is one of the few languages that are lazy evaluated. Values are computed only on a need basis – hence one gets to write all sorts of crazy circular dependencies between function parameters and their results – if one is careful. Also it is strongly typed – Haskell is a laboratory for stuffy of type systems, if nothing else.

 

Lets do some examples. Immediately following the example, I write what it means.

 

n = 5

n is a function that takes no parameters and returns the value 5.

 

 

f n = n+1

f is a function that takes one parameter, namely n and return n+1. Hence if one writes (f 1) it is the equivalent of calling f with the parameter 1 and will return 2.

 

If you write code like the above, it will work. But usually people write the types of everything they write. If you miss out the types, Haskell has a type inferencer which will automatically figure out the types (and might occasionally complain if there are ambiguities).

 

So here is the definition of the function f with its type signature.

 

f :: Int -> Int

f n = n+1

So the type of f is written by putting two colons after f and is “from Int to Int”. In other words the function f takes an Int and returns an Int.

 

Here is the type of the function even, a function that checks if a number is even.

even :: Int -> Bool

It takes an Int and returns a Bool.

 

So what is the type of the function n?

n :: Int

n = 5

Simple? The type of n is simply Int.

 

Here is a function g that takes two parameters, namely x and y and returns their sum.

g :: Int -> Int -> Int

g x y = x+y

The type says that g takes an Int and yet another Int and returns an Int. The value of (g 5 10) is 15.

 

The interesting this is that in haskell, unlike in many other languages, the expression (g 5)  by itself has a meaning.

g5 :: Int -> Int

g5 = g 5

(g 5)  is a function that takes an Int  and returns an Int. Here I assigned the name g5 to be (g 5) hence is I say (g5 20) the result will be 25. The general idea behind applying only some of the parameters of a function thereby result in a new function that awaits the remaining argument is called “currying” – named after Haskell B Curry.

 

Does this interest you? If so you should consider reading up about Haskell a bit. Maybe I will drop a few simpler examples of lazy evaluation based niceties sometime.

 

Name
E-mail
Home page

Comment (Some html is allowed: a@href@title, strike) where the @ means "attribute." For example, you can use <a href="" title=""> or <blockquote cite="Scott">.  

Enter the code shown (prevents robots):

Live Comment Preview