Monday, January 17, 2005

I recently had some C# code that had to be made localizable. Most articles about localization/internationalization that you find on the web talk about how nice Visual Studio is for code internationalization and show nice examples of the many ways the forms designer extracts strings out into a resx file. I am perfectly OK with Studio doing all the work for you. However, there are very often strings in your actual code that Studio does not externalize to resx files.

 

Strings.rb is a ruby script that will parse your C# code base, identify literal string definitions in it and move them to your resx file. The code was hacked up to fill a personal need, so your mileage may vary. The tool certainly isn’t foolproof and there are certain cases that it doesn’t handle too well. If, however, you are on the smart-scripter side of things, then you may find it useful.

 

The script needs to be set up for your specific project. Once that is done you can run it several times on your code base and it will incrementally catch strings and externalize them for you. This is handy while your code is still undergoing changes, so new strings can be identified as they pop up and moved out.

 

Getting Started

 

Downloads

1) First, download the script (strings.rb) and put it in your project folder.

 

2) Download and install ruby from here – http://rubyforge.org/frs/?group_id=167. It’s about 12 MB and the installation happens in a snap.

 

3) Download and install the REXML library for XML handling in Ruby from here –

http://www.germane-software.com/archives/rexml_3.1.2.zip

http://www.germane-software.com/software/rexml/docs/tutorial.html

 

 

Patching Strings.rb for your project

1) You need to patch the script file to have the correct path to your resx file and the path to your wrapper class that will be used to read strings from your resx file.

 

Open the script file in a text editor. (If you have ruby installed you should find an editor called scite in the ruby installation folder – that’s a nice editor. Alternatively you can install scite separately - http://scintilla.sourceforge.net/SciTEDownload.html - about 600k).

 

In your project identify your resx file. It will usually be in Properties\Resources.resx.

Change the following line in the rb file to reflect the path to your resx file.

strings.rb:4:$resx_fn = "properties/Resources.resx"

(The actual line number might change a bit)

 

2) Now create a new class in your project called Strings. VS should typically create an empty class definition file that looks like this.

 

#region Using directives

 

using System;

using System.Collections.Generic;

using System.Text;

 

#endregion

 

namespace <Some Namespace>

{

    public class Strings

    {

 

 

    }

}

 

Patch the file with the following additions

- Add a using directive for your ‘Properties’ namespace.

- Add a comment that says //start and one that says //stop. These act as delimiters between which the script will generate the string definitions.

 

 

#region Using directives

 

using System;

using System.Collections.Generic;

using System.Text;

using <Some namespace>.Properties;

 

#endregion

 

namespace <Some Namespace>

{

    public class Strings

    {

 

//start

//stop

 

    }

}

 

3) This is the wrapper class into which the script will generate string definitions. You need to patch the script with the path to this class file. Basically patch this line –

strings.rb:5:$stringsclass_fn = "helper/Strings.cs"
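
Before the first run it can help to confirm that both patched paths are usable. The following is a minimal sketch and not part of strings.rb: the helper name check_setup is hypothetical, and it assumes the markers are written exactly as //start and //stop.

```ruby
# Hypothetical pre-flight check; not part of strings.rb.
# Returns a list of problems found with the two configured paths.
def check_setup(resx_fn, stringsclass_fn)
  problems = []
  problems << "resx file not found: #{resx_fn}" unless File.file?(resx_fn)

  if File.file?(stringsclass_fn)
    source   = File.read(stringsclass_fn)
    start_at = source.index("//start")
    stop_at  = source.index("//stop")
    problems << "missing //start marker" if start_at.nil?
    problems << "missing //stop marker" if stop_at.nil?
    if start_at && stop_at && stop_at < start_at
      problems << "//stop appears before //start"
    end
  else
    problems << "Strings class not found: #{stringsclass_fn}"
  end

  problems
end
```

Calling check_setup("properties/Resources.resx", "helper/Strings.cs") from the project folder returns an empty array when everything is in place.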

 

Done

If you have got this far then your installation is done and you are ready to go.

For the sake of completeness let me just list things out again –

1) download the script and put it into the project folder

2) install ruby

3) install the REXML library for Ruby

4) patch the script with the path to the resx file of the project

5) create an empty Strings class and add the using directive and comment markers to it

6) patch the script to have the correct path to your Strings.cs file.

 

What does the script do?

The script does a few basic things.

1) it parses your *.cs files in all subdirectories and looks for strings.

2) when it finds a string it prompts the user for an action

3) if it is a string that should be localized the user can provide a pseudonym for the string. On getting this name the script will -

            1) add the string and the name to the resx file

            2) add a property to the Strings class that will read the string from the resx file

            3) replace the string literal in the code with a call to the property.
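
The three steps above can be sketched for a single line of C#. This is an illustration under assumptions and not the code in strings.rb: the method name externalize and the in-memory hash standing in for the resx file are hypothetical. Only the regex is taken from the script, as quoted in the Limitations section.

```ruby
# Regex quoted in the article: an ordinary "..." literal not preceded by '@'.
STRING_PATTERN = /[^@]("(\\.|[^\\"])*")/

# Hypothetical helper: replaces the first string literal on `line` with a
# reference to Strings.<name> and records the literal's text in `resources`.
def externalize(line, name, resources)
  match = STRING_PATTERN.match(line)
  return line if match.nil?

  literal = match[1]                 # the literal, including its quotes
  resources[name] = literal[1..-2]   # text without the quotes, still escaped
  # Block form so backslashes in the replacement are taken literally.
  line.sub(literal) { "Strings.#{name}" }
end

resources = {}
puts externalize('            string a = "hello world";', "HelloString", resources)
puts resources["HelloString"]
```

A real run also has to write the resx file and the Strings class, which this sketch leaves out.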

 

Running the script

To run the script after all the previous setup, simply go to the command line and type strings.rb

 

Here is a sample run of the Strings.rb script

Let me take up a simple project and show you how the internationalization script works.

 

Here is a project that has only one Program.cs file –

#region Using directives

 

using System;

using System.Collections.Generic;

using System.Text;

 

#endregion

 

namespace ConsoleApplication1

{

    class Program

    {

        static void Main(string[] args)

        {

            string a = "hello world";

            string x = "skip this line";

            string b = "escape sequences  \n\r\t\\\"";

            string c = @"cant handle this one";

        }

    }

}

 

The resx file looks like this –

<?xml version="1.0" encoding="utf-8"?>

<root>

  <resheader name="resmimetype">

    <value>text/microsoft-resx</value>

  </resheader>

  <resheader name="version">

    <value>2.0</value>

  </resheader>

  <resheader name="reader">

    <value>System.Resources.ResXResourceReader, System.Windows.Forms, Version=2.0.3600.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value>

  </resheader>

  <resheader name="writer">

    <value>System.Resources.ResXResourceWriter, System.Windows.Forms, Version=2.0.3600.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value>

  </resheader>

</root>

(I have removed some unnecessary details from the original resx file here)

 

I created this Strings class –

#region Using directives

 

using System;

using System.Collections.Generic;

using System.Text;

using ConsoleApplication1.Properties;

 

#endregion

 

namespace ConsoleApplication1

{

    public class Strings

    {

 

//start

//stop

 

    }

}

 

This is what happens when you run the strings.rb script –

C:\work\vcsexpress\Sample1\Sample1>strings

Error reading skip data! continuing with no skip data.

HelloString = hello world

EscString = escape sequences  \n\n\t\\\"

Program.cs:0:n++#region Using directives

Program.cs:1:

Program.cs:2:using System;

Program.cs:3:using System.Collections.Generic;

Program.cs:4:using System.Text;

Program.cs:5:

Program.cs:6:#endregion

Program.cs:7:

Program.cs:8:namespace ConsoleApplication1

Program.cs:9:{

Program.cs:10:    class Program

Program.cs:11:    {

Program.cs:12:        static void Main(string[] args)

Program.cs:13:        {

Program.cs:14:            string a = "hello world";

"hello world">?

Help ----------

        =<name> = the string will be externalised as <name>

        sf = skip file : file will not processed on next run

        if = ignore file : file will be processed on next run

        sl = skip line : line will be processed on next run

        il = ignore line : line will be processed on next run (default)

        x, exit = exit script

        all skip information in stored in "skip_list.txt"

Program.cs:14:            string a = "hello world";

"hello world">=HelloString

            string a = Strings.HelloString;

Program.cs:15:            string x = "skip this line";

"skip this line">sl

Program.cs:16:            string b = "escape sequences  \n\r\t\\\"";

"escape sequences  \n\r\t\\\"">=EscString

            string b = Strings.EscString;

Program.cs:17:            string c = @"cant handle this one";

Program.cs:18:        }

Program.cs:19:    }

Program.cs:20:}

Writing Resource File "properties/Resources.resx" : done

Writing Strings class "Strings.cs" : done

Writing Skip data "skip_list.txt" : done

 

Effectively you can see the script run through the source file (actually it runs through all the cs files) and prompt you with each string. It also shows a little help on the actions possible.

 

To replace a string, you need to give it a name. Simply type =<name> and the string will get replaced.

 

If you don’t want to do anything about a particular line, type ‘sl’ for skip line and it will skip that line. It also adds the line to a file called skip_list.txt so that subsequent runs of strings.rb will not keep prompting you to patch the same line.

 

You can similarly choose to skip a whole file using the ‘sf’ option. You may typically want to skip the *.designer.cs files, the Strings.cs file etc.

 

All skip information is human readable and is stored in a text file called skip_list.txt.

 

Strings.rb is designed to be run multiple times over the same project through its development so that it can catch new strings incrementally, as they appear in your code base. The resx and Strings.cs files are recreated at each run.

 

To show you the output of the process, this is what happened.

 

This is the new Program.cs file –

#region Using directives

 

using System;

using System.Collections.Generic;

using System.Text;

 

#endregion

 

namespace ConsoleApplication1

{

    class Program

    {

        static void Main(string[] args)

        {

            string a = Strings.HelloString;

            string x = "skip this line";

            string b = Strings.EscString;

            string c = @"cant handle this one";

        }

    }

}

 

This is the new resx file –

<?xml version="1.0"?>

<root>

  <resheader name="resmimetype">

    <value>text/microsoft-resx</value>

  </resheader>

  <resheader name="version">

    <value>2.0</value>

  </resheader>

  <resheader name="reader">

    <value>System.Resources.ResXResourceReader, System.Windows.Forms, Version=2.0.3600.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value>

  </resheader>

  <resheader name="writer">

    <value>System.Resources.ResXResourceWriter, System.Windows.Forms, Version=2.0.3600.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value>

  </resheader>

  <data name="HelloString">

    <value xml:space="preserve">hello world</value>

  </data>

  <data name="EscString">

    <value xml:space="preserve">escape sequences 

 

       \"</value>

  </data>

</root>

 

Notice that the two strings have appeared here.

 

And this is the new Strings.cs file –

#region Using directives

 

using System;

using System.Collections.Generic;

using System.Text;

using ConsoleApplication1.Properties;

 

#endregion

 

namespace ConsoleApplication1

{

    public class Strings

    {

 

//start

              // "escape sequences  \n\r\t\\\""

              public static string EscString { get { return Resources.ResourceManager.GetString("EscString"); } }

 

              // "hello world"

              public static string HelloString { get { return Resources.ResourceManager.GetString("HelloString"); } }

 

//stop

 

    }

}
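
The generated region between //start and //stop could be produced along these lines. This is a hedged sketch, not the script's actual code: property_for and regenerate are hypothetical names, and sorting the entries by name is an assumption based on the order seen in the output above.

```ruby
# Hypothetical sketch: rebuilds the region between //start and //stop in the
# Strings class source from a name => text map.
def property_for(name, text)
  "        // \"#{text}\"\n" \
  "        public static string #{name} { get { return Resources.ResourceManager.GetString(\"#{name}\"); } }\n"
end

def regenerate(source, strings)
  body = strings.sort.map { |name, text| property_for(name, text) }.join("\n")
  # Block form keeps backslashes in `body` from being read as back-references.
  source.sub(%r{//start.*?//stop}m) { "//start\n#{body}//stop" }
end

template = "public class Strings\n{\n//start\n//stop\n}\n"
puts regenerate(template, { "HelloString" => "hello world", "ByeString" => "goodbye" })
```

Because the whole region is replaced each time, running it again with a larger map simply rewrites the block, which matches the "recreated at each run" behaviour described earlier.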

 

Also, if you are interested in seeing the skip data, this is the skip_list.txt that got created –

Program.cs:::string x = "skip this line";
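
The entry above suggests a simple file:::line layout. Here is a small sketch of reading such a file, assuming every entry is a file name, the ":::" separator and the stripped source line. How skipped files (the ‘sf’ option) are recorded is not shown here, so this sketch does not handle them. The names parse_skip_list and skipped? are hypothetical.

```ruby
# Hypothetical reader for skip_list.txt, assuming each entry has the form
# "<file>:::<source line>" as in the sample entry above.
def parse_skip_list(text)
  skips = Hash.new { |hash, file| hash[file] = [] }
  text.each_line do |entry|
    file, line = entry.chomp.split(":::", 2)
    next if file.nil? || line.nil?     # blank or malformed entry
    skips[file] << line.strip
  end
  skips
end

# True when `source_line` of `file` was marked with 'sl' in an earlier run.
def skipped?(skips, file, source_line)
  skips.key?(file) && skips[file].include?(source_line.strip)
end

skips = parse_skip_list(%(Program.cs:::string x = "skip this line";\n))
puts skipped?(skips, "Program.cs", '            string x = "skip this line";')  # prints true
```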

 

Limitations

1) The string matching that is done by the script is fairly limited. Basically it identifies strings in the C# code by matching against the following regex –

strings.rb:15:$string_pattern = /[^@]("(\\.|[^\\"])*")/

This does not cleanly cover all the escape sequences that a string can have. It also does not support verbatim @"" strings. But, well, it covers a large number of the strings that you will face, so it’s good enough to get along. Also, if you can get me a better pattern match, I would be happy.

 

The script iterates over all strings on a line of cs code using –

      line.scan($string_pattern).each {|str,e1|

            # str is the string literal, including its quotes

      }
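
One possible direction for a better pattern is sketched below. It is only a sketch and has not been run against a real code base: BETTER_PATTERN and find_literals are hypothetical names. It adds verbatim @"..." literals, where a doubled "" stands for an embedded quote, but it still ignores char literals such as '"', string-like text inside comments and verbatim strings that span several lines.

```ruby
# Hypothetical alternative to $string_pattern. The first branch matches a
# verbatim literal (@"..." with "" as an embedded quote); the second matches
# an ordinary literal with backslash escapes.
BETTER_PATTERN = /@"(?:[^"]|"")*"|"(?:\\.|[^\\"])*"/

def find_literals(line)
  # No capture groups, so scan returns the whole matches as strings.
  line.scan(BETTER_PATTERN)
end

puts find_literals('string c = @"say ""hi"""; string d = "tab\t";')
```

Unlike the original pattern, the match here does not include the character before the opening quote, so the result can be used directly as the text to replace.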

 

 

2) The resx file tags that are generated by the script are those that are valid for the Visual C# Express Edition Beta 1 format. I don’t know if this resx format is valid for other versions of Studio. I would expect that it is. Even if it is not, you can easily patch it for your version of Studio. This is how –

 

The resx file has a tag added for each string definition that looks like this –

  <data name="HelloString">

    <value xml:space="preserve">Hello world</value>

  </data>

 

If your Studio generates tags like this, then you are OK. If it does not, just patch the following block of ruby code to generate your tags. It’s fairly easy –

            el = doc.root.add_element "data"

            el.add_attribute("name", key)

            val = el.add_element("value")

            val.add_attribute("xml:space","preserve")

            val.text = remove_esc_seq($map[key])

This is part of the writeresx() function.
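
For experimenting with the tag layout outside the script, the same REXML calls can be wrapped in a small standalone program. This is a sketch under assumptions: add_string is a hypothetical name, the resx skeleton is cut down to a bare root element, and the text is passed in already unescaped rather than going through remove_esc_seq.

```ruby
require "rexml/document"

# Sketch of adding one <data> entry with REXML, mirroring the snippet above.
def add_string(doc, key, text)
  el = doc.root.add_element("data")
  el.add_attribute("name", key)
  val = el.add_element("value")
  val.add_attribute("xml:space", "preserve")
  val.text = text
  el
end

doc = REXML::Document.new("<root></root>")
add_string(doc, "HelloString", "hello world")
doc.write($stdout)   # writes the document without extra formatting
puts
```

If your version of Studio expects a different layout, add_string is the only place that would need to change.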

 

3) The escape sequence handling in the script is a hack. It’s funny, it’s limited, and it’s actually a little sad:

def add_esc_seq(str)

       str.gsub("\\", "<double_back_slash>").gsub("\"", "\\\"").gsub("\n", "\\n").gsub("\t", "\\t").gsub("\r", "\\r").gsub("<double_back_slash>", '\\\\\\')

end

 

def remove_esc_seq(str)

       str.gsub("\\\\","<back_slash>").gsub("\\n", "\n").gsub("\\t", "\t").gsub("\\r", "\r").gsub("\\\"", "\"").gsub("<back_slash>","\\")

end

 

These are however good enough for \r \n \t \\ \" etc.
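
A less hacky variant is possible with a lookup table and the block form of gsub, so that every escape is decided in one left-to-right pass and no placeholder text is needed. This is a sketch, not a drop-in replacement from the script: the _1pass names are hypothetical and it covers only the same five sequences.

```ruby
# Hypothetical single-pass versions of the two helpers.
UNESCAPE = {
  "\\n" => "\n", "\\t" => "\t", "\\r" => "\r",
  "\\\\" => "\\", "\\\"" => "\""
}.freeze
ESCAPE = UNESCAPE.invert.freeze

# C# source form -> actual characters (for writing the resx value).
def remove_esc_seq_1pass(str)
  str.gsub(/\\[ntr\\"]/) { |seq| UNESCAPE[seq] }
end

# Actual characters -> C# source form (for reading values back).
def add_esc_seq_1pass(str)
  str.gsub(/[\n\t\r\\"]/) { |ch| ESCAPE[ch] }
end

raw = remove_esc_seq_1pass('line one\nline two')
puts raw                       # two lines of output
puts add_esc_seq_1pass(raw)    # one line, with the \n escape restored
```

Because a backslash and the character after it are always consumed together, a source sequence like \\n comes out as a backslash followed by n, not as a newline.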

 

4) The resx XML doesn’t look too nice, though it works. This is because the REXML library produces badly formatted XML. You can download the XML Pretty Printing program of mine and run it on the output resx file for pretty XML formatting.

 

5) “The setup is a little contrived and all this requires me to know ruby programming”

If you actually said that then this script is not for you. For the simple reason that this is something home-grown and not meant to be a polished product in any way. You don’t need to know ruby much to just get it working. You need to know ruby only if you need to extend it in non-obvious ways. Secondly the setup isn’t that contrived if you have been using ruby. You would, most likely, have most of the tools in place already.

 

Finally, Why Ruby?

My only real answer to the question is that I wanted to get the job done. For an example take a look at the engine code and peaceful separation that it gives me from the prompt/ui code.  

 

That’s it. So if you are geeky enough and consider it below your dignity to get down to doing a menial job of looking through source files and copying out strings to the resx files – then this script might help you.

 

Download Strings.rb

 

Ps. It’s a lot of effort documenting any ruby program that is more than 200 lines. It just does too many things.

 

Monday, January 17, 2005 8:40:18 AM (Eastern Standard Time, UTC-05:00)
 Thursday, December 02, 2004

Here is another command line tool. Strangely, a couple of quick web searches could not come up with a command line tool for resizing images, so I wrote my own. If you have photographs from a digital camera that you want to mail out and the images are too large for email, then most of the time you end up opening each image in some sort of image editing software and resizing it by hand.

 

Image Manipulation Utility v0.1

(c) Roshan James, Dec 1 2004

 Img v0.1 is built on the .Net 2.0 GDI+ API and supports only creation

 of JPG image files. Exif/Iptc metadata are lost during convertions.

 

Syntax:

     imgmanip [/S] <filepattern> [additional patterns] <image size>

         /S               - recurse subdirs

         <filepattern>    - any wildcard combination

         <image size>     - format <Width>x<Height>, Ex: 800x600

 

(Don’t tell me it looks cheesy – I know it does – but it solves the problem)

A part of this source I found on the web, so appropriate mention is given to the original article.

 

Here are a few usage examples

 

> img *.jpg 800x600

File1.800x600.jpg

File2.800x600.jpg

This basically converts all jpg files to images of 800 x 600 resolution.
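
The size argument and the output naming seen in this example can be sketched in a few lines of Ruby. The tool itself is written in C#, so the helpers below (parse_size and output_name) are hypothetical and only mirror what the example output suggests.

```ruby
# Hypothetical helpers showing the argument format and output naming seen in
# the usage example; not taken from the tool's source.
def parse_size(arg)
  match = /\A(\d+)x(\d+)\z/i.match(arg)
  raise ArgumentError, "expected <Width>x<Height>, got #{arg.inspect}" if match.nil?
  [match[1].to_i, match[2].to_i]
end

def output_name(path, size_arg)
  ext = File.extname(path)                  # ".jpg"
  "#{path.chomp(ext)}.#{size_arg}#{ext}"    # size goes before the extension
end

width, height = parse_size("800x600")
puts "#{width} by #{height}"                # prints "800 by 600"
puts output_name("File1.jpg", "800x600")    # prints "File1.800x600.jpg"
```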

 

To process subdirectories recursively:

> img /S *.jpg 800x600

File1.800x600.jpg

File2.800x600.jpg

Simple?

 

If the original images have any metadata, it is not retained in the new ones. What is this? Well, most cameras insert information about the camera into the image file. You can also add your own information like a title, description or comments to the image. To see this information (on WinXP) simply right-click the image file and take a look at the properties -> summary tab. Also, if you tinker around with the column settings of explorer in detail view you can display some of this info directly in explorer.

 

I can think of a bunch of simple, useful things to add – format conversions, cropping, borders, grayscale etc. Let’s see…

 

The code is a simple use of the .Net GDI+ API. The download exe is compiled for .Net 2.0 – but you can recompile from source for the version you want. To compile, run the following from a .Net SDK 2.0 command line –

>csc img.cs

 

Download

 

Speaking of image metadata, if you are a Ruby programmer, take a look at the exif library available. EXIF is a metadata tagging standard for image files.

http://raa.ruby-lang.org/list.rhtml?name=rexif

Thursday, December 02, 2004 1:14:33 AM (Eastern Standard Time, UTC-05:00)
 Saturday, June 19, 2004

A few days back I found what seemed to be a book about Ruby. This was being discussed on the Ruby mailing list. It’s called “A Little Ruby” or more precisely “A Little Ruby. A Lot of Objects”. You can find it here:

http://web.archive.org/web/20030618203059/visibleworkings.com/little-ruby/

(Someday it will be available here: http://www.visibleworkings.com/little-ruby/ )

 

Instead of writing the whole thing myself or copy-pasting it, I ask you to simply go read the book. That is my blog entry for the day.

 

The “Little Ruby” book is a conversation between two people where some sublime ideas about the design philosophy of the Ruby language are discussed. The book itself is a pleasure to read and, more importantly, to think about. (It is an incomplete book, only 3 chapters – the author Brian Marick said on the Ruby list that he hopes to complete it sometime).

 

Reading “Little Ruby” put a phrase into my thinking – “Model of Computation”. I don’t know if this sounds sober, but I think this is what I am really looking for.

In all my tinkering around languages, compilers, runtimes and other things – I am looking for a Model of Computation, a fundamental set of programmatic thought abstractions that are beautiful and can encompass various forms of programming.

 

The Little Ruby book talks about a model of computation where all computation is simply built around the idea of passing messages to objects. It is a simple, concrete idea with which the rest of the Ruby world is built (apart from syntactic sugar). I don’t know if you are used to thinking in this way – but it is a powerful form of thought.
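
As a small illustration of the message-passing view in plain Ruby (my own example, not taken from the book; the Echo class is hypothetical):

```ruby
# Every operation below is a message sent to an object.
puts 3 + 4                    # operator syntax, sugar for the next line
puts 3.send(:+, 4)            # the same message, sent explicitly
puts 3.respond_to?(:+)        # objects can be asked which messages they accept

# An object can also decide at run time how to answer an unknown message.
class Echo
  def method_missing(name, *args)
    "got #{name} with #{args.inspect}"
  end

  def respond_to_missing?(name, include_private = false)
    true
  end
end

puts Echo.new.anything(1, 2)  # prints "got anything with [1, 2]"
```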

 

Let me quote from one of the conversations toward the end of the third chapter (the last chapter that is written so far):

 

“A language that provides lots of features

will always be missing that one feature you

need.”

 

“But a language that chooses the right

simple rules for you to combine lets you

build the features you need.”

 

This is the basic idea of composition – small integral units that compose to produce powerful behavioral entities. Have you ever wondered why a unix command shell guy never really thought much of a Win/Dos user? It is because the way the shell forces you to think in terms of composing small do-one-thing-well tools into powerful meta-tools is a greater thought pattern.

 

You might have heard this being said about tools in the old unix culture (I say ‘old’ because I have different opinions of ‘unix’ culture as it is now)

 

"This is the Unix philosophy. Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface."

--Doug McIlroy

 

The “Little Ruby” book is inspired by the old book “The Little LISPer”, something that is now on my reading list – I can’t seem to get a copy of it anywhere. The present edition of the book is called “The Little Schemer”. The book is co-written by Prof Daniel P Friedman of Indiana University and Prof Matthias Felleisen of Rice University. The Little Schemer discusses a different model of computation from what “Little Ruby” describes.

 

I did not know this then, but sometime last year I was in email correspondence with Prof Friedman. At that time, had I known that he is the author of a respected LISP text book, I might have been frightened off the prospect of asking this -  but in one of the mails I had asked “why Lisp?”

 

Roshan,

 

The most fundamental building block of computation is composition. If the language does not support composition in a trivial way, then I have no use for it.  ML, Haskell, LISP, and Scheme each give a kind of composition.  Composition is the building block of Category Theory, which is a unifying tool that helps clarify much of mathematics. and logic.  So, thinking that it would be okay to use a language that does not support composition is impossible for me.

 

(I quote this here presently without his permission, I believe he would be ok though).

I didn’t understand him then. But now after a year, I think I am closer to understanding him.

 

What would a unified model of computation be? Can such a thing exist? Can we think of all computation using a minimal and powerful set of abstractions such that every other form of computation can be built out of it? Can this be one that is so easy and fun to use that we could interact with this force on a day to day basis?

 

And what forms the underlying foundation for computation then might also form the underlying basis for other systems of organized thought as well. This is like the dream of Grand Unified Field Theory in physics. Can something like that exist in the computational systems as well?

 

I don’t know enough to guess. However, I believe that as long as we keep pursuing computing in a way that is fun and simple, we are probably on the right track.

 

 

To end this entry I want to quote from the preface of the little ruby:

 

Welcome to my little book. In it, my goal is to teach you a way to think about computation, to show you how far you can take a simple idea: that all computation consists of sending messages to objects. Object-oriented programming is no longer unusual, but taking it to the extreme - making everything an object - is still supported by only a few programming languages.

 

Can I justify this book in practical terms? Will reading it make you a better programmer, even if you never use "call with current continuation" or indulge in "metaclass hackery"? I think it might, but perhaps only if you're the sort of person who would read this sort of book even if it had no practical value.

 

The real reason for reading this book is that the ideas in it are neat. There's an intellectual heritage here, a history of people building idea upon idea. It's an academic heritage, but not in the fussy sense. It's more a joyous heritage of tinkerers, of people buttonholing their friends and saying, "You know, if I take that and think about it like this, look what I can do!"

 

As a closing note, sometime last year I was looking to do research under someone working with the SSCLI code base and work on virtual machines and runtimes. I wanted to do my Masters.

 

At that time the best way I could describe what I wanted to do was to say that I was looking at runtimes and virtual machines research with a specific interest in SSCLI. Now, maybe I can describe myself a little better.

 

The only way I could think of doing this at that time was to ask around in online forums and mailing lists about universities doing work with Rotor, accompanied by a barrage of mails to everyone who I thought might know or could point me in the right direction. One name that came up was Prof Ralph Johnson of UIUC. Just now I was looking for Brian Marick (author of Little Ruby) on Google; Brian is a research student doing his PhD under Prof. Johnson.

 

Saturday, June 19, 2004 2:50:25 AM (Eastern Standard Time, UTC-05:00)
 Wednesday, April 21, 2004

This is a Wish List for Ruby. Ruby is an excellent language, however here are some small things that I would like to see added to Ruby:

 

  • Threading
    I wish ruby had real threads. The threading support currently provided is really sad. It would be awesome if Rite could have OS threads as Ruby threads, like in the .Net framework, instead of doing them as interpreter threads. Right now, writing any sort of meaningful multithreaded application in ruby is pointless.

  • C/C++ style operators
    I wish ruby had ++, -- operators. They really do not contribute to unmanageable code and on the whole are nice things to have.
  • Use of Curly Braces { }
    I wish that Ruby would allow curly braces to delimit blocks of code other than just the blocks passed to methods that yield. I would like to use {} to enclose methods, classes, if statements, loops etc.

    Right now code is written like:

    def func(a)
       [1,2,3].each {|n|
          if(n % 2 == 0)
             print "This is even"
          else
             print "Odd"
             print "Multiple of 3" if (n%3==0)
          end
       }
    end


    Being very C-ish in my ways, I would really like it if I could avoid all those clumsy ends.

    def func(a) {
      [1,2,3].each {|n|
        if(n%2 == 0)
          print "This is even"
        else {
          print "Odd"
          print "Multiple of 3" if (n%3==0)
        }
      }
    }

     
    These days since the Python bug has bitten a bit, I am warming up to the idea of scope by indentation.

    def func(a)
      [1,2,3].each |n|
        if(n%2 == 0)
          print "This is even"
        else
          print "Odd"
          print "Multiple of 3" if (n%3==0)

    This actually looks quite nice, but it may not be a good thing to have, because such code often gets messed up real bad when you copy-paste it around and spoil the indentation.


  • Better Win32Ole libraries
    This is something that I must have. I use scripting to be able to talk to WMI (Windows Management Instrumentation).

    The libraries that Ruby ships for this are really sad: very unstable. At the time of this writing the current Ruby distribution has removed the win32ole libraries from Ruby. I hope they will come back, more stable.

    The reason why Win32Ole is important to me is that it is the mechanism used to talk to WMI, and WMI can let you do some really awesome stuff.

    WMI Primer on MSDN

  • Auto Initialization of variables
    When I write code like this

    sum = 0
    10.times{|n| sum = sum + n }


    I wish I did not have to initialize ‘sum’. I wish there was some unambiguous way of saying that ‘I know sum hasn’t been defined before, so please use its initial value as ’. I would just like to be able to say 10.times {|n| sum = sum + n } and things should just work, assuming that sum gets initialized as 0. I wish there was some shorthand for initializing a variable at its first appearance in an expression.

    Like I could probably replace:

    sum  = 0
    prod = 1
    10.times {|n| sum = sum + n; prod = prod * n }

    with

    10.times {|n| sum = sum<0> + n; prod = prod<1> * n; }

    or better if I had support for C++ style operators, I could write

    10.times{|n| sum<0> += n; prod<1> *= n; }
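
The sum<0> syntax above is a wish and not part of Ruby. For comparison, here is a sketch of how the same accumulation can be written today with Enumerable#inject, where the initial value is given once as an argument. The method names total and product are hypothetical, and the product runs over 1..10 because 10.times starts at 0, which would make any product 0.

```ruby
# The seed passed to inject plays the role of the wished-for <0> and <1>.
def total(values)
  values.inject(0) { |acc, n| acc + n }
end

def product(values)
  values.inject(1) { |acc, n| acc * n }
end

puts total(10.times)    # prints 45 (0 + 1 + ... + 9)
puts product(1..10)     # prints 3628800
```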


  •  Run on .Net
    I wish ruby could run on .Net. There are Python variants that run on Java, and now Python is coming up for .Net (IronPython). Imagine having the flexibility of Ruby with the power and expanse of the .Net framework.

    Maybe more work needs to be done before this is possible.

  •  Currying of Methods and Partial Evaluations
    I wish I could have currying/partial evaluation possible for ruby methods.
    In many functional languages, functions are defined like this:

    - fn add x y = x + y
    > int -> int -> int

    Consider the ML like code above. The first line I have defines a function called ‘add’ that takes x and y and does x+y. The second line is what the interpreter echoes back to me about the function.

    It is simply trying to say that the function consumes two integers and produces an integer. The two integers, however, are not used up in one go; rather, they are used up sequentially. First the integer x is taken and bound to the function, and then the value y.

    By being able to do that, we can define other function instances of add that have one of the variables bound.

    - fn add10 = add 10
    - fn add5 = add 5
    - add10 2
    > 12
    - add5 2
    > 7

    This shows off some very powerful features of what currying can do. Here add10 and add5 are created as new functions, but with the value of x substituted as 10 and 5 respectively. Now we can treat add10 and add5 as proper functions that take only one parameter. 


    What these languages let us do is apply a subset of the parameters of a function and create a curried or partially evaluated function instance. Such an instance can, if the runtime is optimizing enough, already do all the processing possible in the code up front. Whenever the remaining parameters are supplied, it can just go on to complete the operations.

Imagine that the method we were calling is this

fn mult x y = 10 * x * y

and then we wish to do


mult 10 2

mult 10 3

mult 10 4

 

These calls will now cause it to do 10 * 10 * 2, 10 * 10 * 3 and 10 * 10 * 4

However if we could partially evaluate a function we could say

 

fn mult10 = mult 10
mult10 2

mult10 3

mult10 4

 

When mult10 is created it is already evaluated to being “100 * y”. So, subsequent calls would cause it to do only 100 * 2, 100 * 3 and 100 * 4.
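
Ruby will not do this optimization automatically, but the effect can be written out by hand with a closure. This is a minimal sketch, with mult as a hypothetical Ruby counterpart of the ML-like function above:

```ruby
# The work that depends only on x (10 * x) runs once, when the first
# argument is supplied; the returned lambda only does the final multiply.
def mult(x)
  partial = 10 * x
  ->(y) { partial * y }
end

mult10 = mult(10)        # partial is now 100
puts mult10.call(2)      # prints 200
puts mult10.call(3)      # prints 300
puts mult10.call(4)      # prints 400
```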

 

Adding this sort of support to Ruby would require large changes to the language. A simpler implementation would be to create a method object (yes, that’s possible in Ruby) together with a hash of the partial list of parameters. The call itself would be executed only when all formal parameters are satisfied by the parameter hash.
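
A rough sketch of that simpler implementation follows. It is only an illustration under assumptions: the class name Partial is hypothetical, it collects positional arguments in an array rather than a hash, and it assumes a method with a fixed number of parameters. Later Ruby versions also ship a built-in Method#curry (2.2 and later), shown on the last line.

```ruby
# Hypothetical wrapper: holds a Method object plus the arguments supplied so
# far, and only calls the method once its arity is satisfied.
class Partial
  def initialize(meth, args = [])
    @meth = meth
    @args = args
  end

  def call(*more)
    args = @args + more
    return @meth.call(*args) if args.length >= @meth.arity
    Partial.new(@meth, args)     # still waiting for more arguments
  end
end

def add(x, y)
  x + y
end

add10 = Partial.new(method(:add)).call(10)
puts add10.call(2)               # prints 12
puts method(:add).curry[5][2]    # prints 7, using the built-in Method#curry
```

Each call returns a new Partial instead of mutating the old one, so add10 can be reused with different second arguments.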

 

If you are still reading, this might interest you:

http://www.svendtofte.com/code/curried_javascript/

 

Whew!

Well, that’s about it for now. But as you can see most of what I am asking for here are simple things and superficial changes. I would however really like to see the win32ole, threads and ++ operators in Ruby, even if none of the others work out.

 

Matz (Yukihiro Matsumoto), the creator of Ruby, is planning to introduce some significant changes to the language and, more importantly, is going to get it running on a formal virtual machine that he is writing for Ruby, called Rite.

Here are some of the plans for Rite and Ruby:

http://www.rubygarden.org/ruby?Rite

 

I found this on one of the websites, this is about how Matz wanted to work on Rite:

 

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/76588
|* will Rite be developed publicly.. Or will you keep it souce secret?

From my experience and observation, an open source software needs to

have running code before the ball rolling to success.  I think I need to work alone until the first running version.

 

|* still use Ruby license scheme?

It will be open source software for sure.  License terms may be

changed.

 

|* do you need help?  Say what we should do and we will do it :-)

This is very important.  Listen carefully.

 

From the reason I stated above, I feel like I will work alone. But if someone shows his talent, and comes up with his own _good_ implementation of new Ruby earlier than me, and if he is willing to contribute his code, and if he allows me to hack and chop his code to make it "Rite", I will name it "Rite".  And he will be honored for ever.

Wednesday, April 21, 2004 4:21:08 AM (Eastern Standard Time, UTC-05:00)  #    Comments [1]  | 
 Monday, April 19, 2004

Last night we did it again.

We went for this movie (50 First Dates) and came home feeling a little giddy. I was feeling a little giddy even before the movie, after nearly having my head ripped off sitting on a Torra Torra at a fair in Bangalore.

 

So after the movie and the drive back home, what do we decide to do, like the nice normal people we are? We decide that we need to drink coffee at 12am and discuss programming. So we head off to Leela Palace where there is a late night Barista.

 

Something about the way coffee affects my head when drunk late at night, especially after a movie, needs some investigation. Sidharth was my comrade in arms, or rather comrade in coffee. So what do we do? We go there and sit down and drink coffee, and I start off on SICP (Structure and Interpretation of Computer Programs), which I have been postponing for several years now.

 

I think part of why I was so adamant about starting out on SICP in the middle of the night is that I feel life (like usual) isn’t going anywhere. It turns out that a lot of smart people at various universities decided that I wasn’t smart enough to warrant a formal higher education in Computer Science, and the place I want to be the most doesn’t seem to want me around because of some technicality (for the fifth time). So since life wasn’t going anywhere, I figured I’d just have to teach myself the things I want to know, my own way.

 

A little fast-forward in time and what finally ends up happening is that Sidharth and I end up talking about a certain MSDN article.

Implementing Coroutines for .NET by Wrapping the Unmanaged Fiber API

http://msdn.microsoft.com/msdnmag/issues/03/09/CoroutinesinNET/default.aspx

We ended up in a rather (heated) philosophic discussion about how iterators could be implemented, till 4am, which is what this blog entry is about.

 

If you have been reading about iterators in my previous blog entries

Iterators in Ruby (Part - 1)

Warming up to using Iterators (Part 2)

Then the idea is probably growing on you already. What Sidharth and I did is put in some thinking about how iterators could be implemented. This entry is going to break the logical flow of these two articles, but I am letting it be. I will probably have a part 3 post that will bridge the gap between Parts 1 and 2 and what I am going to say here about iterators.

Also, like a lot of things on this blog, I am not an authority on the subject so I am just guessing at how these things actually work.

 

 

Iterators

The thing about iterators is that there are two functions involved that have to maintain execution state at the same time. For example, when a function calls another function, the caller is frozen and the callee executes – so the caller maintains execution state during the run time of the callee.

 

def callee

      yield 1

      yield 2

      yield 3

end

 

def caller

      callee { |n|

            # parameter block to the iterator

            puts n

      }

end

 

When the callee is an iterator, control actually leaves the callee and returns to the caller while execution is in the parameter block of the iterator. However, we don’t see this sort of behavior with a normal C stack. Why? Because when a function on the C stack returns to its caller, the function’s activation record on the stack is destroyed.
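To make that back-and-forth visible, here is a small sketch that records who is running at each step. The `trace` array and the local `step` are names I made up for illustration; the point is only that `step` keeps its value across every yield.

```ruby
# Records the order in which the iterator and its block execute.
def callee(trace)
  step = 0                       # local state that must survive every yield
  while step < 3
    step += 1
    trace << "callee: about to yield #{step}"
    yield step                   # control moves into the caller's block
    trace << "callee: resumed after yield #{step}"
  end
end

trace = []
callee(trace) { |n| trace << "block: got #{n}" }
puts trace
```

The trace alternates between the callee and the block, which is exactly what a plain call-and-return on a C stack cannot give you without saving the callee's state somewhere.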

 

How do we do this?

The approach in the MSDN article uses an API called the fiber API.

 

Fiber Approach

Fibers can be thought of as threads that don’t have a scheduler attached to them. So unless a fiber is explicitly passed control it will not be executed, unlike a thread, which is invoked by the scheduler for a time slice.

 

What Ajai Shankar (the author of the MSDN article) does is use fibers to represent iterators. So in the above snippet, the function callee() would actually execute on a different fiber from caller(). So when control needs to shift to the parameter block, which is to be executed in the caller() function, a fiber switch occurs.

 

When the parameter block has finished execution, a context switch occurs again.
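Ruby itself (1.9 and later) ships a Fiber class, so the switching described above can be sketched without touching the Win32 fiber API that the article uses. This is only an illustration of explicit control transfer, not the article's implementation.

```ruby
# The callee runs on its own fiber; nothing in it executes until resumed.
callee = Fiber.new do
  Fiber.yield 1    # switch back to the caller, handing over 1
  Fiber.yield 2
  Fiber.yield 3
  nil              # value of the final resume, used here as an end marker
end

values = []
while (n = callee.resume)   # each resume is an explicit switch into the fiber
  values << n               # this is the "parameter block" on the caller's side
end
puts values.inspect
```

No scheduler is involved anywhere: the fiber only advances when the caller says `resume`, and it only gives control back when it says `Fiber.yield`.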

 

What further happens is that the author has wrapped up all this dirty jumping around into a managed C++ class that invokes the OS API. He then goes on to write C# code (really!) that uses yield, almost the same way Ruby would use it.

 

(pasted)

class CorIter {

    public void Next() {

        object[] arr = new object[] {1, 2, 3, 4};

        for(int ndx = 0; true; ++ndx)

            Yield(arr[ndx]);

    }

}

 

If you get the general idea, then let’s move on.

 

The problems with using the fiber API, among other problems, are

·         Every fiber is like a thread, which means that the more iterators there are, the more fiber-specific stacks and related structures get created – which means more bloat for code like this.

·         Using the fiber API makes this a very OS-specific solution – other OSes that the CLR may wish to target may not have provisions for building such an API.

·         Exceptions: exceptions in the Windows world are tied to the TLS (Thread Local Storage) of the thread of execution – this may behave rather oddly when fibers are mixed into the picture.

 

Let us ignore everything else and just examine the first problem, the issue of creating a separate stack per fiber and thus bloating the system – if we could solve this one, then I think (and I might be wrong) it would bring more credit to this approach.

 

Wrapping State in a Caller Object

One other approach to supporting iterators is to ensure that one of the two functions (the caller or the callee) maintains state using some mechanism other than the C stack.

 

Let’s take a look at the caller:

 

def caller

      callee { |n|

            puts n

      }

end

 

or maybe a C# equivalent.

 

void caller()
{

      foreach(int n in callee())

      {

            Console.WriteLine(n);

      }

}

 

This method can actually be thought of as consisting of three parts:

 

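void caller()
{

      // part 1: everything before the loop

      foreach(int n in callee())

      {

            // part 2: the code block (the loop body)

            Console.WriteLine(n);

      }

      // part 3: everything after the loop

}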

 

We could create an object to hold the state of the function that would hold these three parts. Something like this:

 

class caller_object

{

      // declare all the local variables of caller() as member variables here

      void do_part1()

      {

      }

 

      void do_codeblock() // part 2

      {

      }

 

      void do_part3()

      {

      }

}

 

The idea is that we create an object that has member variables that represent the local variables of the caller. So we execute the caller as three parts:

 

void caller()

{

      caller_object co = new caller_object();

      co.do_part1();

      callee(co);

      co.do_part3();

}

 

The caller method now is simply a wrapper around the class that represents the caller function as an object. When the method do_part1() is called on the class, the object will have the same state as the original caller() function when it has just run till the point where the iterator is invoked.

 

Then the callee() is invoked and the object that represents the caller’s state is passed to the callee. The callee then goes on to invoke the object’s do_codeblock() every time a yield is required.

 

Since the callee never returns till it has completed execution, it maintains state on the runtime stack, like a normal function. The do_codeblock() method has the same code that the body of the foreach loop had, and it can also record any state changes in the object. Finally, when callee() exits, the object’s do_part3() is invoked.

 

This is similar to what the iterators accomplish. Here the state is stored in an object and not on the stack. However, here a full managed type that represents the caller has to be created. I didn’t like that too much.
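The three-part split above can be sketched in Ruby with no yield at all. `CallerObject`, `run_caller` and the running `@sum` are hypothetical names chosen for this illustration; the callee counts with a plain while loop so that no built-in iterator is hiding in the example.

```ruby
# Hypothetical object standing in for the caller's locals and its three parts.
class CallerObject
  attr_reader :sum

  def do_part1              # everything before the loop
    @sum = 0
  end

  def do_codeblock(n)       # part 2: the body of the loop
    @sum += n
  end

  def do_part3              # everything after the loop
    puts "sum = #{@sum}"
  end
end

# The callee keeps its own state on the ordinary stack and never returns early;
# every place that would have been a yield is a call on the caller object.
def callee(co)
  n = 1
  while n <= 3
    co.do_codeblock(n)
    n += 1
  end
end

def run_caller
  co = CallerObject.new
  co.do_part1
  callee(co)
  co.do_part3
  co
end

result = run_caller
```

The local variable that the loop body touches (`sum`) had to move into the object, which is the cost this approach imposes on the caller.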

 

Wrapping State in a Callee Object

This is similar to the above approach, except that roles are reversed. We create an object that can represent the callee. The callee then returns to the caller at every yield statement.

 

The callee state is maintained in the object representing it. There is an excellent write up you can read about a similar approach here:

Coroutines in C

http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html

 

The idea there is that the state of the function is retained in a state variable. The state variable is used to jump back to the point where the function had previously yielded from. Code would look a little like this:

 

(pasted)

int function(void) {

    static int i, state = 0;

    switch (state) {

        case 0: /* start of function */

        for (i = 0; i < 10; i++) {

            state = 1; /* so we will come back to "case 1" */

            return i;

            case 1: /* resume control straight after the return */

        }

    }

}

 

Now this example uses static variables but it is easy to imagine this being extended such that each variable is the member of some object.

 

(pasted)

It's a little bit ugly, because suddenly you have to use ctx->i as a loop counter where you would previously just have used i; virtually all your serious variables become elements of the coroutine context structure. But it removes the problems with re-entrancy, and still hasn't impacted the structure of the routine.
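The same idea, with the callee's state moved into an object, might look like this in Ruby. `CalleeObject` and `next_value` are hypothetical names; the instance variable `@i` plays the role that `ctx->i` plays in the quoted passage.

```ruby
# Hypothetical callee object: the loop counter lives in the object, so the
# method can return after every value and pick up where it left off.
class CalleeObject
  def initialize
    @i = 0                     # was a local loop counter in the C version
  end

  # Returns the next value, or nil once the sequence 0..9 is exhausted.
  def next_value
    return nil if @i >= 10
    value = @i
    @i += 1
    value
  end
end

it = CalleeObject.new
collected = []
while (v = it.next_value)      # 0 is truthy in Ruby, so only nil ends the loop
  collected << v
end
puts collected.inspect
```

Because each object carries its own counter, two of these can be stepped independently, which the static-variable version cannot do.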

 

(Kudos to Pooja, for coming up with this idea at one sitting).

 

 

 

When C# announced the coming of iterators in the language and a new yield keyword, I was excited. In the mood of the MSDN coroutines article, I had expected CLR-level support for iterators.

 

It turns out that the C# team’s approach is similar to that of saving the callee state in an object. (I am not very sure whether it’s the caller or the callee; in case I am wrong in assuming that it’s the callee, which seems to be the more logical choice, I will blog about it.)

 

In the Coroutines in C article, the author talks of writing macros that wrap up the behavior. Since, in the case of C#, the compiler does the temporary object creation and hides all the mess from you, it seems like a reasonable alternative.

 

 

A modified form of the Fiber API idea

The reason I don’t really like the way C# does iterators right now is that it is a hack. They did not want to change the CLR for a feature that may not catch on. So I guess they used a less expensive approach. If I am wrong, I would like to be corrected. I would expect that more serious CLR-level support will come up for iterators if the ideas introduced in Whidbey C# become popular.

 

The other reason I don’t really like the approach, the real reason, is that the .Net type system is a fairly comprehensive type system designed to propagate an idea of types as a level playing field for language agnostic components to interact. Introducing a type into the system just to retain a function’s state does not seem consistent with this philosophy.

 

Fibers, on the other hand, lend themselves more naturally to the way I would choose to think of iterators – as functions that can be frozen during execution and be continued.

 

Now this might seem like a weak argument, but it seems better to use the processor’s ability to do a context switch to actually freeze execution of a block of code than to write the code so that it manages members of an object (only so that the object can be used to retain the state of the code).

 

The Fiber API like approach seemed to do this more naturally. I would expect that the CLR in future would internally provide some API similar to that of the OS provided fibers so that it can do iterators and closures and probably even continuations.

 

Some basic requirements would be that implementing such features doesn’t slow down execution of code that doesn’t require any of them. Such features should also be reasonably efficient with respect to time as well as space.

 

Let me try and discuss the space issues here. With the fiber API there would be a need to create a totally new, independent stack for each fiber. This is wasteful.

 

Would it be possible to have a modified API which behaves like fibers, shares stack space with the common C stack, and uses the processor’s context-switching abilities to freeze function execution, rather than saving state as a managed object?

 

A little bit of brainstorming last night and we had this:

 

In the .Net world, we have the luxury of being able to predict the stack usage of a function under execution with IL directives like “.maxstack”. Which is to say - we know how much space the function will use on the managed stack.

 

The stack frame for regular method calls would look like this:

  

 

This is obvious to anyone who understands how methods are laid out on the stack. The only advantage that we have here is that, in the .Net world, we know exactly how much stack space a given method will use.

 

Now if the method calls an iterator that has a yield, we create a fiber, but a special sort that would use the main stack itself for its stack frame. So the newly created method instance (the iterator itself) will reside on the call stack, above the caller.

 

 

Now the usual semantics of stack usage are allowed on this fiber. The fiber behaves like any other thread would behave, owning the stack. To allow methods to keep track of their callees we add a reference to the activation record of the callee.

 

  

 

The interesting part is when the iterator needs to yield a value. When it does, control is switched back to the original fiber. The activation record of the iterator is still maintained on the stack. Further method calls would, however, place their activation records above the iterator’s activation record and behave as though it were a normal C stack.

 

 

Thus I think it is possible to have fiber-API-like constructs to implement iterators, share stack space, and still have reasonably efficient implementations. The only real overhead introduced here is a level of indirection when activation records are torn down from the stack.

 

I feel that this is a more coroutine-like approach than the one that involves creating hidden managed objects.

 

I would like to think that this idea can be extended to implement proper continuations as well, but that is not very easy. Here the stack management is simple because at any point a sleeping fiber will contain only one activation record on the stack. A continuation would require that activation records live and die on the managed stack as though they were proper objects, and some sort of garbage collection routine would be required on the stack.

 

I am extremely open to opinions about this entry, because I am treading on many areas that I am not very well versed in. I am hoping that the idea of freezing execution state via fiber-like constructs is more efficient than the approach that involves creating full managed objects.

 

Monday, April 19, 2004 7:48:40 AM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  | 
 Friday, April 16, 2004

In the past article Iterators in Ruby (Part - 1) I talked about the concept of iterators and how iterators are available in Ruby. In this part I will dwell on how iterators are used, so that the concept grows on you.

 

For someone used to the C/C++ world, the constructs provided by those languages suffice to express any idea of their choosing. While that is true, programming in a C-like language causes us to close our minds to other styles of programming and other constructs that might exist. Programming in C after a point is about writing the next big program optimized with lots of data-structure usage and trying to tie a new algorithm down into C. Sometimes the joy of programming, where the language lets you do your job - express ideas as code, is lost. Sometimes we spend our time servicing our language syntax and spoon-feeding our compilers. The fact that languages might actually evolve so that you can get on with your job, was alien for a long time to me.

 

If you have read the first part you might be wondering how iterators are used in Ruby. Admittedly, the idea would seem a little complex and maybe contrived to the uninitiated.

 

In Ruby, iterators are used pervasively. They are all over the place, and once you get started on Ruby, you will probably end up using an iterator without realizing that you are using one. The Ruby libraries are rich with iterators of various sorts.

 

Simple Loops

When you start off on Ruby code, you might see loops of the sort:

 

10.times {

      print "hello world"

}

 

This, as you might expect, prints ‘hello world’ 10 times.

 

How does this work? Ruby is a pure object-oriented language. The number 10 is an integer, and the integer class exposes a method called ‘times’. The times method is an iterator that yields values from 0 up to one less than the integer’s value.

 

Since it yields values, can we catch them? Yes.

 

10.times {|n|

      print n

}

 

And this prints all the values from 0 to 9.
'times' is an iterator.
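To take some of the mystery out of it, here is a guess at how a method like times could be written by hand. `my_times` is a hypothetical stand-in, not the way the Ruby library actually implements it.

```ruby
# Hypothetical hand-written version of times: yields 0, 1, ... count - 1.
def my_times(count)
  i = 0
  while i < count
    yield i        # hand the current value to the caller's block
    i += 1
  end
end

seen = []
my_times(3) { |n| seen << n }
puts seen.inspect
```

Anything that calls yield in a loop like this is an iterator as far as the caller is concerned.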

 

File Handling

Let’s look at some file handling in Ruby. The following code will open a file, read each line of the file, and print the line along with its line number.

 

file = File.new("filename.txt")

c = 0
file.each_line {|line|

      c = c + 1

      print "#{c}: #{line}"

}

 

The code is simple. I open a file and create a file object. I ask the object to yield each line to me. As I get each line I print it out along with the line number. This is as logically expressive as I have seen in any language that I have used. All the mess stays out of your way and you get to focus on the job at hand.

 

The each_line is a method of the File class and it yields each line in the file. The variable ‘line’ will hold the value of each line. Slick?

 

(If you are wondering what "#{c}: #{line}" means – in a string #{ } is a substitution. You can write any expression inside the curly braces. Here the values of c and line get substituted into the string.)
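If you want to try this without having a filename.txt lying around, the following sketch writes a throwaway file first. The file contents are made up, and it uses File.foreach with with_index in place of the hand-maintained counter, which is a swapped-in variation rather than the code above.

```ruby
require "tempfile"

numbered = []
Tempfile.create("lines") do |tmp|
  tmp.write("first\nsecond\nthird\n")   # made-up sample contents
  tmp.flush
  # File.foreach yields one line at a time; with_index(1) counts from 1.
  File.foreach(tmp.path).with_index(1) do |line, c|
    numbered << "#{c}: #{line}"
  end
end
print numbered.join
```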

 

Arrays / Collections

Similarly collection types expose an “each” method which yields every member of the collection. So if I had to iterate over an array I would write:

 

array = [1,2,3,4]

array.each {|m| puts m }

 

The above code creates an array of 4 elements and accesses each element using the iterator “each”.

 

In similar fashion, a lot of the Ruby library exposes functionality as iterators. So much so, that I rarely write for loops in Ruby.
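Your own classes can join in too. In this sketch, `Playlist` is a made-up collection class; it defines each with a yield and mixes in Ruby's Enumerable module, which then builds methods such as map on top of each.

```ruby
# Hypothetical collection class that exposes its members through "each".
class Playlist
  include Enumerable           # supplies map, select, to_a, ... using each

  def initialize(*songs)
    @songs = songs
  end

  def each
    i = 0
    while i < @songs.length
      yield @songs[i]          # hand every member to the caller's block
      i += 1
    end
  end
end

list = Playlist.new("Iris", "Hundred Miles")
list.each { |song| puts song }
puts list.map { |song| song.upcase }.inspect
```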

 

Recursive Directory Enumeration

Now let us try and write code of our own. Something you may all have written is code that will find all the text files in a folder and its subfolders. The usual approach is to write a recursive function.

 

The function will try and remember a list of the text files in the current directory and the list of subdirectories it has. It will then recursively call itself on each of the subdirectories, each of which will do the same task. The problem is that if some processing is to be done every time a text file is found, things get very complicated. The usual approach is to find all the text files and create a big list of filenames, which is then processed later.

 

Here is an approach with iterators. Try and implement this in your favorite language that does not have iterators and see how it looks.

 

def textfiles(dir)

        Dir.chdir(dir)

        Dir["*"].each do |entry|

                yield dir+"\\"+entry if /^.*\.txt$/ =~ entry

                if FileTest.directory?(entry)

                        textfiles(entry){|file| yield dir+"\\"+file}

                end

        end

        Dir.chdir("..")

end

 

textfiles("c:\\"){|file|

        puts file

}

 

What the above code does is simple. I have defined a method called textfiles() that takes a directory name as a parameter.

 

The code looks exactly like you would explain it algorithmically.

  1. Go to the folder (chdir)
  2. Take a look at the contents (Dir["*"])
  3. See if an entry is a text file, if so yield it (yield dir+"\\"+entry if /^.*\.txt$/ =~ entry)
  4. See if an entry is a directory, if so, recurse into it
    (
    if FileTest.directory?(entry)
       textfiles(entry){|file| yield dir+"\\"+file}
    end)

 

Simple? Notice that the beauty of the code is that the yield actually passes each filename back up through the recursive hierarchy.
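For comparison, here is a variation on the same walk that avoids chdir and builds paths with File.join, so it does not depend on backslashes. `each_textfile` is a hypothetical name, the temporary directory tree is made up for the demonstration, and unlike the version above it hands the original block straight to the recursive call instead of re-yielding at every level.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical variant: yields the path of every .txt file under dir.
def each_textfile(dir, &block)
  Dir.children(dir).sort.each do |entry|
    path = File.join(dir, entry)
    if File.directory?(path)
      each_textfile(path, &block)   # recurse, passing the same block down
    elsif entry.end_with?(".txt")
      yield path                    # hand the file to the caller immediately
    end
  end
end

found = []
Dir.mktmpdir do |root|
  # Build a tiny made-up tree: a.txt, notes.md and sub/b.txt.
  FileUtils.mkdir_p(File.join(root, "sub"))
  File.write(File.join(root, "a.txt"), "")
  File.write(File.join(root, "notes.md"), "")
  File.write(File.join(root, "sub", "b.txt"), "")
  each_textfile(root) { |file| found << file.sub(root + "/", "") }
end
puts found
```

Either way, the caller gets to process each file as it is found, with no big list of filenames built up first.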

 

As a disclaimer, if you are using Ruby, then you might as well finish off in one line by saying:


Dir["**/*.txt"].each{|file| puts file }

 

 

Friday, April 16, 2004 3:28:18 AM (Eastern Standard Time, UTC-05:00)  #    Comments [1]  | 
 Thursday, April 15, 2004

One of the things that’s rather high on my mind’s sort order these days is Ruby programming. I have been thinking about what makes ruby so neat a language to use, simply by virtue of what the language lets me do.

 

One of the early things that got me hooked on Ruby was its support for iterators. If you have been spoiled by many years of C programming, like me, then it’s time to wake up and take a look at a few things that C can’t pull off, at least not very easily.

 

What is an iterator?

Let’s take a look at code like this, where there is a piece of code that produces values and a piece of code that consumes values.

 

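#include <stdio.h>

void consumer(int v);   /* declared here because produce() calls it first */

 

void produce()
{

      for (int i = 0; i < 100; i++)
            if (i % 5 == 0)
                  consumer(i);

}

 

void consumer(int v)
{

      printf("%d", v);
}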

 

All things considered this code is fine, except that the producer invokes the consumer. And simply because of that the consumer cannot maintain state. The only way the consumer can maintain state, i.e. remember something between two calls, is to save variables into static variables, globals or some object.

 

The same argument would apply if the consumer function tried invoking the producer: the producer would then need a very contrived piece of code to remember variable values between calls.

 

From one perspective, an iterator solves exactly this problem. Here is the Ruby code:

 

def producer

      for i in 0..99

            if (i%5 == 0)

                  yield i

            end  

      end

end

 

def consumer

      producer() do |v|

            print v

      end

end

 

The ‘def’ keyword starts a function/method declaration. The code above for the producer should be rather easy to understand, except for the yield statement.

 

What does the yield do? The yield causes the function producer() to exit with the return value of the function as the parameter of the yield, in this case ‘i’.

 

The difference between yielding a value and actually doing a return is that the function can continue execution from the point of the yield statement.

 

The consumer function then simply invokes the producer() function and catches each of the yielded values. That is why there is a ‘do’ statement and a corresponding ‘end’ statement in the consumer code. The parameter for the do-end block is the ‘v’ that is enclosed in ||. Every time the producer yields a value, the value is available in ‘v’ and the do-end block is executed. When the block finishes, the producer continues after the point of the yield.
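Going back to the earlier worry about a consumer that wants to call the producer whenever it likes: Ruby's Enumerator class (part of the core library since 1.9) can wrap an iterator so that values are pulled one at a time with next. This is a sketch of that idea and uses nothing beyond the producer already shown.

```ruby
def producer
  for i in 0..99
    yield i if i % 5 == 0
  end
end

# Wrap the iterator; the block feeds every yielded value into the enumerator.
values = Enumerator.new do |out|
  producer { |v| out << v }
end

first = values.next    # runs the producer up to its first yield
second = values.next   # continues it from where it stopped
puts first, second
```

The producer needs no contrived state-saving code of its own; it is still the plain loop with a yield in the middle.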

 

So if you want to, say, calculate the sum of all the values that the producer yields, then you can write:

 

def consumer

      sum = 0

      producer() do |v|

            sum = sum + v

      end

      print sum

end

 

 

Now that you have been introduced to the idea of iterators, I suggest you do some thinking about it, especially if you have done a fair bit of C programming. Imagine how these functions would have to maintain state, what their call stacks would look like and such.

 

Now let me clean up on a few things. In ruby all functions are called methods, formally. So let’s start calling them methods. Secondly a lot of the Ruby libraries are built to support iterators so you will see the idea being used a lot. Thirdly, the do-end block can also be written as { } curly braces.

 

The methods that I have written have been written in a drawn-out C-like style, so that the ideas are clear despite the slight difference in syntax. So let’s just rewrite the two methods slightly more ruby-ishly and close this blog entry.

 

def producer

      100.times{|i| yield i if i%5 == 0}

end

 

def consumer

      sum = 0

      producer {|v| sum = sum + v}

      print sum

end

 

You can get Ruby from here, for your windows box:

http://rubyinstaller.sourceforge.net

Apr 13 2004 Tuesday 11-05AM

Thursday, April 15, 2004 1:31:50 AM (Eastern Standard Time, UTC-05:00)  #    Comments [3]  | 

I started out by saving my blog files on my hard disk. Here is a little Ruby hack that I have to help manage my blog entries on my local disk. Presently I just write blog entries as small Word doc files and save the files with the title of the blog entry as the filename. Then I figured that it would be more useful if, for sorting purposes, I could also name the files by date and time.

 

A naming like this would be convenient:
Apr 13 2004 Tuesday 01-18AM - First Blog Entry.doc

 

Now I would not want to type in a name like this by hand, I’d just like to save the file and then have some little proggie do the rename for me. Here is a ruby script to do just that.

 

dater.rb

$format = '%b %d %Y %A %I-%M%p '

$format_pattern =       /^\w+\s\d+\s200\d\s\w+\s\d\d-\d\d(A|P)M\s.*/

 

def rename file,count=1

      $filename = Time.now.strftime($format) + "- " + file

      $alternate_filename = Time.now.strftime($format) + "[#{count}] - " + file

      if count == 1

            File.rename(file,$filename)

      else

            File.rename(file,$alternate_filename)

      end

end

 

if ARGV.length == 1

      rename ARGV[0]

else

      #~ Let's try and find the file to rename

      count = 1

      Dir["*.doc"].each do |file|

            puts file

            #~ See if this file needs renaming

            unless $format_pattern =~ file

                  puts "Renaming : " + file

                  if count == 1

                        rename file

                  elsif count == 2

                        #~ More than one file, so shift to "[count]" syntax

                        File.rename($filename,$alternate_filename)

                        rename file,count

                  else

                        rename file,count

                  end

                  count = count + 1

            end

      end

end

 

This snippet does the following

-          if a filename is given as an argument it simply renames the file.

-          if no file name is given it does search and rename in the current folder.

It basically looks for all doc files and sees which ones seem to be already renamed appropriately and skips those. Then it renames any file that it finds by prefixing the date time.

 

As an addition it also sees if there are multiple files to be renamed, in which case it puts a special count prefix also which is enclosed in []. Now all I need to do is to double click on my ruby script every time I save a blog entry into my entries folder.
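To see what the format string and the pattern do without renaming anything, here is a small check that uses a fixed time instead of Time.now. The date and the file name are just the example from above. One thing to note is that the pattern only recognises years 2000 to 2009, because of the "200\d" in it.

```ruby
format = '%b %d %Y %A %I-%M%p '
format_pattern = /^\w+\s\d+\s200\d\s\w+\s\d\d-\d\d(A|P)M\s.*/

# A fixed timestamp (13 April 2004, 01:18) so the result is repeatable.
stamp = Time.new(2004, 4, 13, 1, 18).strftime(format)
name = stamp + "- " + "First Blog Entry.doc"
puts name

# A stamped name matches the pattern and would be skipped by the script.
puts "already renamed" if format_pattern =~ name
# A plain name does not match, so the script would rename it.
puts "needs renaming" unless format_pattern =~ "First Blog Entry.doc"
```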

 

That’s just what I am going to do with this one. :)

 

There is a Peter and Gordon West singing Hundred Miles in the background. Now the music has changed to Iris – Goo Goo dolls. I am feeling wishful. I also think I need to sleep now.

 

And I don’t want the world to see me

‘cause I don’t think that they’d understand

When everything’s made to be broken

I just want you to know who I am

- Iris, Goo Goo dolls

Apr 13 2004 Tuesday 03-25AM

Thursday, April 15, 2004 1:30:56 AM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  |