Saturday, November 03, 2007

One of my long standing irritations with Haskell is that there is no real way to generate fresh names if your code is non-monadic. Haskell is a fairly "pure" side-effect free language. Hence when you write interpreters and such in it, and you need a new name for something (say a new variable name), there is no way to generate one.

The normal (and correct) solution for this is to write your code in a State monad of some sort and pass a counter around. Every time you need a fresh name you retrieve the counter, increment it, update it and such. This is all very well, but it brings the baggage of the monadic syntax with it. It would be nice to be able to retain direct-style syntax and have fresh names.

Often when you want to test an idea, the clutter of fresh name generation simply gets in the way. Generating fresh names is essentially an effectful operation, which is essentially why Haskell does not allow it. All effects should be made explicit in the types of terms in Haskell.

Using the hack for fresh names below you side step this requirement. The direct consequence of this is that your code suddenly becomes effectful. It is no longer pure. Writing code that relies on fresh names does not automatically mean that code has a purely functional representation in the obvious way.

The other down side is that code written using fresh name generation may not even work correctly in GHC most of time. This is because GHC assumes that there are no side effects in your code. Hence it freely moves code around and does common sub-expression elimination. To clearly see what this means, consider a pure function that computes the 100th prime number. The function takes no arguments and returns a prime number to the caller. If the function is called from 5 different parts of your program, the sharing semantics of lazy evaluation ensures that it is only computed once (if at all). Subsequent calls, simply use the previously computed value.

Now imagine you write a function that returns a fresh name. If GHC decides that the function does not need to be re-evaluated, in every context where the function is called the original computed value will be returned. This means that you will get only one name ever and this is not what you want. In other words adding fresh names breaks "referential transparency".

Hence writing code that uses fresh names, as defined here, without any direct support from the compiler, is inherently at risk from various compiler optimizations. Use it based on your judgement of things. (Don't use it for the the next missile guidance system even if you think you know what you are doing. )

module FreshName(Opaque, freshName) where
-- Generate FreshNames hack
-- Roshan James, Nov 2007
 

import Data.IORef
import Debug.Trace
import Foreign

data Opaque = Opaque Int
              deriving (Eq, Show)

freshCounter :: IORef Int
freshCounter = unsafePerformIO $ newIORef 0

freshName :: (Opaque -> a) -> a
freshName f = unsafePerformIO $ do c <- readIORef freshCounter
                                   writeIORef freshCounter (c + 1)
                                   return (f (Opaque c))

In the above module, Opaque is the type of a fresh name. The Opaque data type cannot be inspected by consumers of the module. The only operation that's defined on Opaque is the equality test i.e. Opaques can be compared to each other, but cannot be used for anything else. For convenience  I have defined a Show on Opaque as well for pretty printing, in the absence of which one would need to have a lookup table to correctly display Opaques.

In some senses freshness is a strange effect. It is an effect that breaks (programs change their results) in the presence of sharing or code motion. It is completely impervious to order-of-evaluation unlike state, control operations, jumps, exceptions etc - we can switch the order of reductions any way we like and the semantics of the program is preserved.

The general rule, to my best understanding, is this: If the type of a term exposes the type of the fresh name then the term cannot be freely shared. It is not a referentially transparent term. When I say the type exposes the Opaque type, I mean that some sub-term of the type is the Opaque type or is defined in terms of the Opaque type. Terms that do not expose the Opaque type, but potentially uses them internally, are referentially transparent. Maybe this insight can form a good intuitive guideline for writing code in GHC using fresh names. 

Here is an sample usage of the fresh names module. I define an alpha conversion function using it:

substitute :: Var -> Var -> Exp -> Exp
makeVariable :: Opaque -> Var

alphaConvert :: Exp -> Exp
alphaConvert (Lam x e)  = freshName $ \op ->
                         let x' = makeVariable op
                         in (Lam x' (substitute x x' e))

Drop me a note if you find this useful or if you have a better technique.

Saturday, November 03, 2007 1:18:43 PM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  | 
Name
E-mail
Home page

Comment (Some html is allowed: a@href@title, strike) where the @ means "attribute." For example, you can use <a href="" title=""> or <blockquote cite="Scott">.  

Enter the code shown (prevents robots):

Live Comment Preview