Wednesday, September 12, 2007

I just finished reading Mirza Muhammed Hadi Ruswa's 1905 Urdu novel "Umrao Jan Ada" as translated by the grand old man of Delhi, Kushwant Singh. What a terrible book!

Umrao Jan is based on the life of a courtesan in old Lucknow who went by the name of Umrao Jan. The book traces her life as narrated by her to Mirza Ruswa. In the process it gives you a glimpse into the life and times of the glory days of Lucknow and surrounding parts from the 1840s to the early 1900s.

The inside of the dust jacket says:

Old Lucknow is now but a dream, but this work preserves for posterity the full flavor of that golden age in the history of the city, which comes to life in the pages of this book.

We enter the palaces of wealthy nawabs, the luxurious abodes of cultured courtesans, and the hideouts of colourful vagabonds. The courtesans - out immediate concern - were on the whole well-read and exceedingly proficient in the arts of dancing and singing.

In the pages of this charming book, we find pieced together an exquisite civilization that will never recur, and warm human emotions that are eternal.

While this is all true, what is captured is not particularly beautiful. This is a book that seems to be a poor presentation of someone's (rather eventful) life. To its credit, I can say that it exists and so the story isnt lost. The book is littered with attempts at poetic verse that is, in comparison with some of the poetry that I have been hit by in recent days, just painful. (I guess you are set up for disappointment once you have developed a taste for the likes of Ghalib and Dylan.) The book also seemed to lack focus in some ways, unimportant characters held unnecessarily long conversations about unnecessary things. Lots of shallow moralizing.

In many ways Umrao Jan Ada has parallels with Geisha. Many similarities.

I havent seen the movie Umrao Jan, but I would expect it to be a far more pleasent experience than the book. A story about being "Lucknawi" truly has to be told in visuals with lots of flair and charm. There is not one, but two Umrao Jaan movies. The first one, the 1981 version, with Rekha as Umrao.

http://www.imdb.com/title/tt0083248/

The second, a glitzy 2006 rendition, with Aishwarya Rai as Umrao.

http://www.imdb.com/title/tt0485522/

I expect these to be better than the book - can't be too hard to manage that.
Enough said.

Wednesday, September 12, 2007 5:53:45 PM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  | 
 Monday, September 10, 2007

I walk into the Computer Science department today morning and I couldn't help notice little containers randomly placed on tables. I had a class on Computational Complexity and as usual I was running a little late so I decided to investigate later.

I walk into class, nod my usual apology to the lecturer and sit down. There on my table was one of them. Then I looked around and there was one of these on every table! Whats happening here?

They are little containers with unmarked white pills in them. Hmm... unmarked white pills? There is no text on the containers saying what the pills might be about. Strange. The only text on them says "JUST BE" in largish red letters on a white label under which there is a url in small font. Aaah.... the whole department was exposed to this stuff. And what a lot of victims they are, they are computer scientists - they hardly ever notice what they are eating. I would not have noticed myself had it not been for my temporary heightened sense of precense I acheive out of being late from time to time.

So what can this be? What sort of tablets would someone label "JUST BE"? A halucinogen? Some new form of crack? Birth control pills? After i went over all the usual possibilities, I started to panic. What if this is one of the secret government test thingies? Maybe a a neural inhibitor? or better a yet, a thought tranfer or a mind control pill? Then it struck me that this reminded me a lot of something I saw on the X-files many years back called "PURITY CONTROL"... JUST BE - PURITY CONTROL ... muuaaahhhaaa...

What sort of agency would expose the department? Maybe the whole campus to these? I decided to pocket one of these to examine it in detail later for clues. As I was doing so, I noticed the url - the url was for a cs department webpage for "WIC". I have heard that name before.. they did it! The "Women In Computing". The WIC exposed the department to unmarked white pills, which many of us would unsuspectingly eat. Why would the WIC want to do that? And what could be in those tablets? Come to think of it I dont know what the WIC really do... they are an organisation of some sort, yes (in most cases their membership is based on some marginal excuses). Why? Why?

This general general thought caused my mind to think about some WIC members. Aah.. well, lets skip that bit for a public blog shall we? And why has there been so much care taken to disguise these tablets to look like common mint? What would happen to me if I were to eat one? I decide to risk it all and try it. After all better take the risks here in the precense of people than to accidentally eat one one when I am alone at home with these tabs.

I flip the container over to try one and then I see this -

What? It expires today? Also it looks like it has been ripped open.. someone has eaten one of these! Something happened here because the victim is missing!

Then the sense and the safety in all of this becomes clear to me. It is a feminism thing.. and these tablet things are most likely just mint or mind control pills or something safe like that after all. And they were there as a show of good will. :) Some small details seem to have been missed, in this case the expiry date... details. After all who cares about details?

All this brought my Complexity lecture to an end. I guess I will have to find another muse for another day.

Monday, September 10, 2007 5:55:42 PM (Eastern Standard Time, UTC-05:00)  #    Comments [1]  | 
 Wednesday, September 05, 2007

Khusrau rain suhaag ki, jo main jaagi pi ke sang,
Tan mora man piya ka, jo dono ek hi rang.

Thus, with the words of Amir Khusro, starts the 1978 opus of Shyam Benegal. I was so moved by this movie, all the little details in it, the characters, the poetry, the zeitgeist. The film aparently introduces the gorgeous Nafisa Ali. It also has Naseeruddin Shah, vibrant as ever, Shashi Kapoor, Shabana Azmi, Jennifer Kendall ...

The movie captures an angst in visuals and poetry, over the period of 1857 to 1858 - the days of the "Sepoy Mutiny" in a little town in Rohilkhand.

http://en.wikipedia.org/wiki/Junoon_(film)
http://www.imdb.com/title/tt0077783/

 

Tuesday, September 04, 2007 11:17:50 PM (Eastern Standard Time, UTC-05:00)  #    Comments [3]  | 
 Sunday, August 05, 2007

I asked my soul, "What is Delhi?"
She replies, "The world is the body, Delhi its life"
                                                                                  - Ghalib


Sunday, August 05, 2007 1:01:11 AM (Eastern Standard Time, UTC-05:00)  #    Comments [2]  | 
 Wednesday, July 25, 2007
  sudo! 
This is awesome:




Wednesday, July 25, 2007 7:36:01 PM (Eastern Standard Time, UTC-05:00)  #    Comments [5]  | 
 Tuesday, July 24, 2007

In a preachy mood: Today I spent some time talking to someone about how a piece of code that is going to affect his hapiness (aka he is responsible for) has some really bad design decisions in it. We went over how there is this little bug that shows as a consequence of this design flaw. The bug itself has an easy fix and such, but the underlying reason is that several layers of leaky abstractions interact with each other in unhappy ways.

In short I was trying to tell him that its not a bug, but that its bad design waiting for a victim - this should be better described as a pit or a trap than a bug. Bugs are visible warts, pits are lurking demons. Bugs are plain as day, easy to justify - pits are lurkers, those who can tell good code from bad, feel uneasy in their precense.

Do you have pits in your code? Do you have a sense of what parts of your code are pits? If you had to choose between fixing a pit and a bug, what would you do?

I believe everyone's code has pits to varying degrees and that they are completely unavoidable beacuse there is simply no such thing as a perfect apriori design. That said, here are some slightly more interesting questions:

If you had to implement something new - there is solution 1 that is going to introduce a pit but will deliver said target surely and quickly and there is approach 2 that will potentially eliminate the pit, but is going to be far more expensive to implement - How would you choose which approach to take?

I usually go ahead and implement the pit when I dont have enough information to decide between the two. Often there are pits that will never surface and no one gets trapped in them. When I dont know enough implementing the pit is a good way to (1) get my work done (2) ensure that I am collecting data so that the next time I know what the real costs are.

Its vastly more valuable to document the fact that a choice was made based on the lack of sufficient information, than the excrutiating details of choice itself. Pits turn into chronic illnesses of your code base when no one knows why ceratin decisions were made, and hence no one ever feels comfortable back tracking over a bad decision. Once I know an ill-informed choice was made, and once I know the price I am paying for it, its a lot easier to back track over it.

In other words, dont just document your design, document your choices in the context of your design space - it will keep you honest. Well written code to a large extent is its own documentation - what it is not a good documentation of is the choices that you did not take and the reasons why you did not take them.

Thoughts?

Tuesday, July 24, 2007 2:33:39 AM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  | 
 Friday, July 20, 2007

As an indirect result of some of conversations in the past few days at Google and Microsoft, I was left thinking that I need a better way to explain to someone why basic research of languages and language constructs is necessary. This crops up every now and then, and it would be useful if I had a simple answer to this. I think I have an approximate answer now, what do you think:

I think basic research about language constructs is necessary because it helps separate concerns. It separates of the quality of the abstraction that the construct provides from the details of the implementation of the construct. In this way one can study and construct and make observation about how valuable it would be as a tool for expressing computation. Once we have a great abstraction, we can then focus our attention on the best possible implementation of the abstraction or some select component of it, now with a clear understanding of the expressive power of the end result.

As an example our present notions of concurrent programming with threads and locks/semaphores is a broken idiom - they are hard to program with, are brittle and don’t compose well. The notion that it would be nicer to write concurrent programs as memory transactions came about and gained favor because the quality of the abstractions that memory transactions provided was far greater than that of locks and semaphores. By quality I mean that memory transactions compose far better, provide guarantees about progress, fairness etc and are not prone to the sorts of pitfalls that lock-based approaches suffer from. This view of transactions matured into a set of key combinators that gave us "composable" memory transactions. Once the idea that "memory transactions are good" gained ground it spurred a cottage industry of people trying to efficiently implement these abstractions.

The underlying point is that the search for abstractions to represent computation is not hindered by details of implementation. What we refer to as an implementation is only the expression of an idea in terms of other abstractions that previous languages have imposed on us. Limiting ourself to the scope of these, baises us against ideas that dont appear natural to them. I like to think of this as a form of "the evil of premature optimisation" at the level of ideas.

Friday, July 20, 2007 12:31:34 AM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  | 
 Wednesday, July 04, 2007

One of the things that I have been working on in my free time is a compiler for a subset of the Scheme programming language. This compiler is written in Scheme and is a small, relatively fast compiler.

Languages and Usage

It supports a very basic core language:

  1. fixed size integers
  2. vectors
  3. scheme pairs (via cons)
  4. quoted constants
  5. let, letrec, top level define
  6. booleans, if
  7. set! (mutable variables)
  8. lambdas, higher order functions

The core language that it supports is a higher order programming language (with first class functions) and proper tail calls.

The compiler takes input in the supported subset of scheme and emits x86 assembly code. Here is the "life-cycle" of a compilation: The emitted assembly has entry point into scheme, a sort of scheme_main(). To use the compiler one needs to compile the generated assembly into .obj using any intel-syntax-compatible assembler - I use ml.exe, which is Microsoft's assembler. The .obj then needs to be compiled and linked against a minimal bootstrap C source file which essentially allocates some amount of memory from the OS and them jumps into the scheme code. I use cl.exe, Microsoft's C++ compiler, for this purpose. The assembler and C compiler are free tools that maybe downloaded from Microsoft's website.

All of the core compilation, including the register allocator and code generator are written in Scheme. The generated code is rather efficient (read: not terrible), though a lot can be done improve its quality. In fact the quality of code it generates substantially better than one or two compilation systems that I know of that are used commercially. The code generator is a macro-based instruction selector and the register allocator is a linear allocator as opposed to a graph-based allocator.

The compiler code base is purely functional (for the most part ...) - the minor variations are control operators like yield, recursive functions etc. which are borderline-functional depending upon the school of thought that you belong to.

Organisation

At the point of this writing, the bulk of the compiler source is limited to one file called compiler.ss. This file contains all of the traditional compiler passes as well as the register allocator and the code generator. This file is a little above two thousand lines of scheme code. While that is pretty small for a compiler, I believe it could be made smaller and prettier still.

The compiler is a collection of "passes". The input to the first pass is the source language. The output from the last pass is x86 assembly. Each pass does a small transformation on the source tree - for example a pass could remove one-armed ifs and replace them with ifs that have a do-nothing else branch. The compiler has a fairly large number of passes (greater than 15 or so), each doing a simple and relatively specific transformation. One could think of a pass as a mini-compiler that turns a specific input grammar into a specific output grammar. I believe that it was Kent Dybvigs original observation that the traditional wisdom about expensive-compiler-passes is essentially misguided. Hence his own production quality compiler has many many passes, is blazingly fast, produces efficient code and has a manageable code base.

In compiler.ss, passes are written as a collection of a "visitors". A visitor is a small code transformer that takes one syntactic form and does a transformation on it. For example a visitor could be one that takes a one-armed if and returns a proper if that has an else branch. A pass is constucted by composing several visitors together. The infrastructure that enables us to write the passes as a collection of visitors has most of the basic elements of a term rewriting system. The advantage of writing the compiler as a collection of visitors makes things easy -  I can completely isolate each code transformation from the rest of the system and be free to focus on solving that little problem well.

The source tree is represented as a scheme s-expression and hence requires no parsing and none of the traditonal noise that gets in the way of compilation. At some point towards the middle of the passes the intermediate language that is output is referred to as an "IL". This IL is rich enough to represent all of the details of the scheme program while not have any scheme specific components to it. It is very much like a machine-independant assembly language - almost like C. The hope is that I maybe able to compile other languages to this IL and them share the backend of the compiler - the code generator, register allocator etc.

Most of the other files are essentially infrastructure pieces (sometimes rather generic) and helpers. The source walking visitor infrastructure is in micro-pass.ss. The yield control operator is implemented via callcc and set! in yield.ss. The super handy match macro is implemented in match.ss (I am not the original author of the match pattern matcher).

Usage and Installation notes

You will need the following to use the compiler

  1. The compiler sources
  2. Petite Chez Scheme (Just get the latest from www.scheme.com)
  3. Microsoft's C++ compiler suite or something compatible (You need the compiler and assembler, these are free downloads) 
  4. Ruby (Only needed if you wish the run the test suite)

Install Petite and the Microsoft tools. Makes sure that you have petite, cl.exe and ml.exe in your command line path. Compile a little hello-world with cl and see if things are installed and working fine. Extract the compiler sources to some folder. Thats it you are done. If you want to run the test suite, have ruby also in your path - this maybe a good idea if you want to play around with the sources.

There are two batch files for your convenience:

test-all.cmd: Runs the test suite. The test suite consist of the files in the tests folder.

compile.cmd: This compiles a specific file for you.

Consider a little program that recursively removes a value from a list:

(define foldr (lambda (op unit xs)
(if (null? xs)
unit
(op (car xs) (foldr op unit (cdr xs))))))
(define remove (lambda (xs x)
(foldr (lambda (v ls)
(if (pair? v)
(cons (remove v x) ls)
(if (eq? x v)
ls
(cons v ls))))
()
xs)))



(let* ((vals '(0 1 2 0 0 0 3
(1 2 0 0 3 4)
(1 2 0
(1 2 0 3 4) 4)
4 5 0
(1 2 0 0 3 4 0)
7 8 0))
(res (remove vals 0)))
(cons vals (cons res ())))

We compile foo.ss as follows:
> compile foo.ss
produces foo.asm and foo.exe, among other things. foo.exe when run produces:
((0 1 2 0 0 0 3 (1 2 0 0 3 4) (1 2 0 (1 2 0 3 4) 4) 4 5 0 (1 2 0 0 3 4 0) 7 8 0) (1 2 3 (1 2 3 4) (1 2 (1 2 3 4) 4) 4 5 (1 2 3 4) 7 8))

The generated foo.asm is the result of the compilation. Its the .asm file that you want to eyeball to make observations about how inifintely retarded the quality of my output asm is in comparison to your-favourite-compiler.

Source Language

The best way to find the supported subset of the language is to grep the tests folder. However, here is an attempt:

  • 30-bit numbers, ascii characters, vectors, lists, #t, #f, (), (void)
  • quoted constants composed of the above
  • cons, car, cdr, set-car!, set-cdr!
  • make-vector, vector-ref, vector-set!, vector-length
  • let, letrec, let*, set!
  • fx+, fx-, fx*, fxadd1, fxsub1, fxlognot, fxlogor, fxlogand
  • fixnum->char, char->fixnum,
  • pair?, null?, fixnum?, char?, vector?, boolean?, fxzero?, procedure?
  • fx=, fx<, fx<=, fx>, fx>=, char=?, char<=?, char>?, char>=?, eq?, not
  • lambda, if, begin, define (top level only)

If you are unfamiliar with these operators you can use Chez Scheme's index as a reference. Here are the disappointing bits: there is no quasiquote, no input-output, no forgein function interface, no file loads or includes ... yet. Note that while this is only the core subset the compiler supports, this is mostly rich enough to build a library of commonly used stuff.

Since the supported language has no output statements, the result of of program execution (the final result) is merely printed out at the console. This can be of any of the supported data type, including any s-expression (Note: I dont do anything to handle circular expression so brace yourself to see lots of parenthesis if thats the case).

XEmacs nonsense:
The best way to read the sources would be to use xemacs with folding mode enabled to ;{{{ and ;}}}. To do this, you need to add the following to your init.el file:
(folding-add-to-marks-list 'scheme-mode "{{{" "}}}" nil t)

The most useful folding mode commands are folding-mode, folding-context-next-action. It maybe nice to bind them to key combinations like so:
(global-set-key (kbd "C-c C-m") 'folding-mode)
(global-set-key (kbd "C-c C-f") 'folding-context-next-action)

Future Work and What it lacks

There are many things I would like to add to the compiler. Since I am writing this compiler entirely for the fun of it, it is one of my objectives to make choices that are interesting as opposed to useful. I believe most programming languages today are designed with the aim of being useful which is partly why they all suck; they are crammed with hasty decisions substantiated by all sorts of reasons. Eventually I suspect my compiler will fall prey to usefulness and/or boredom, until then I hope to have fun.

Most of the items in the menu below haven't been implemented yet because of time or because I am not compelled enough by the best ideas I have.

GC: The topmost on my list of things to add is a GC. While it is fairly easy to do a copy-collector in the C-runtime system, I have been toying with the idea of writing the GC in the subset of scheme that I compile. It is interesting to see the minimal set of primitives that will have to be added to the language and how cleanly this can be expressed.

Threading: It would be interesting to write a compiler that supports a mutlithreaded language. There are easy and not so nice ways to accomplish in the present compiler, and there are deeper more elegant ways to this as well.

Errors: The compiler assumes good input programs. This is never the case in "real" life and an early error checking pass is required. Traditionally good error reporting is considered hard in compilers and with good reason. However I may never get down to doing this because this leans more towards useful than interesting in my present world view.

STM: I have been doing some work on STM over the past year or so and I have some pet algorithms of my own that need compiler support to implement well. Once I have STM, like Satnam Singh shows, it maybe possible to express Join patterns and such in terms of them.

Large Objects and Strings: The compiler doesnt have any native support for strings. One may treat vectors of characters of strings, but I wonder if I can do better - for example have a fast string-copy. Similarly the compiler works with a fixed size heap at this point. Allocation requests that overshoot this heap size would simply run into a guard page and segfault the system. Support extensible and large object heaps would be nice.

Pattern Language: I would really like to add compilation support for a pattern language. At some level I suspect that pattern languages are more basic than lambdas. The existence of Barry Jay's Pattern calculus strengthens this view. Scheme's best pattern matcher 'match' is added as a macro to the language. It would be nice if the compiler to primitively support a generic enough pattern matcher.

Continuations: I would like to natively support delimited continuations in some form - most likely my yield operator or a more traditional shift-reset. These can however be added nicely into the compilation once the GC and the threading issues have been addressed because those will affect the runtime stack architecture a lot.

Module System: Some sort of a module system.

Macros: Though this is strictly not an addition to the compiler, it would be intersting to support a pattern language of some sort. Some system where the compiler itself was part if the runtime system would be awesome - maybe some approximation of Walid Taha's MetaML. The thing is that MetaML does not have an catamorphism operator and is hence not very useful for describing syntax extensions: maybe this can be coupled with a pattern system in some way.

Types: At some point I would be intersted in adding a type inferencer to the system. The compilation at this point is rather blind and fast. If one attempts to do a primitive addition between a cons cell and a number, the compiler provides absolutely no security - not even a runtime exception. Scheme solution, as is most "dynamic" languages, is to verify the types of things at runtime before doing primitive operations. These checks are expensive and it would be interesting to see to what extent I can substitute these with compile time assurances.

Parser: With all of these additions, the input language is no longer Scheme. Atleast no self respecting Schemer would allow me the pleasure of their company if I said so. This calls for a different input syntax, one that is more amenable to type decorations as well. While writing a parser is a easy, I am fairly stumped by what a good syntax would be. The syntax should also lend itself to macro generation and possibly tot a Haskell-like do-synatx so that I can hide the set! behind monadic state - maybe ST or some such state monad. If you have any good ideas about syntax design, let me know.

As a matter of fact if you are interested in toying with any part of the compiler let me know. Chances are that I am going to disagree with you and reject your code in the first place, but all the same it would be fun to know someone else think.

Credits

I started work on this compiler in the course of Prof. Kent Dybvig's famous compiler class, in Spring 2007. Kent is the author of the amazing Chez Scheme compiler. This compiler would not have happened without Kent and his awesome grad student Abdulaziz Ghulom.

Why is the compiler written in Scheme? The compiler is written in Scheme because Scheme is a beautiful, expressive language. It is my opinion that writting your system in the most expressive medium that you have leads to better and more manageable code. A direct conseuence of having mangeable code is that it encourages one to implement and experiment with more interesting compilation strategies - in time the extra abstraction privded by an expressive language pays off in spades. I have seen too many systems die the death of complexity by being prematurely optimised into a C-like language - in time the code becomes expert-only and eventually just unmanageable.

Download Compiler Source

Wednesday, July 04, 2007 4:50:44 PM (Eastern Standard Time, UTC-05:00)  #    Comments [2]  |