Friday, April 16, 2004

In the previous article, Iterators in Ruby (Part 1), I talked about the concept of iterators and how they are available in Ruby. In this part I will dwell on how iterators are used, so that the concept grows on you.

 

For someone used to the C/C++ world, the constructs those languages provide are enough to express any idea. While that is true, programming in a C-like language can close our minds to other styles of programming and to other constructs that might exist. After a point, programming in C becomes about writing the next big program, optimized with lots of data structures, and about tying a new algorithm down into C. Sometimes the joy of programming, where the language simply lets you do your job of expressing ideas as code, is lost. We spend our time servicing the language's syntax and spoon-feeding the compiler. For a long time, the idea that a language might evolve so that you can just get on with your job was alien to me.

 

If you have read the first part, you might be wondering how iterators are actually used in Ruby. Admittedly, the idea can seem a little complex, and maybe contrived, to the uninitiated.

 

In Ruby, iterators are used pervasively. They are all over the place, and once you get started with Ruby you will probably use iterators without even realizing it. The Ruby libraries are rich with iterators of various sorts.

 

Simple Loops

When you start off with Ruby code, you might see loops of this sort:

 

10.times {
  puts "hello world"
}

 

This, as you might expect, prints ‘hello world’ 10 times.

 

How does this work? Ruby is a pure object-oriented language, so the number 10 is an object: an integer. The Integer class exposes a method called ‘times’. The times method is an iterator that yields the values from 0 up to one less than the number itself.

 

Since it yields values, can we catch them? Yes.

 

10.times {|n|
  print n
}

 

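This prints the values 0 through 9, one after another (print adds no newline, so they appear as 0123456789). The block variable n catches each value that ‘times’ yields; in other words, ‘times’ is an iterator.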
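To make "yields values" concrete, here is a minimal sketch of a times-like iterator written in plain Ruby. The method name my_times is hypothetical and exists only for illustration; the real Integer#times is provided by Ruby itself, but from the caller's side it behaves the same way: it calls the block once per value and hands each value over with yield.

```ruby
# Hypothetical helper for illustration: behaves like Integer#times,
# calling the block once for each value from 0 up to count - 1.
def my_times(count)
  i = 0
  while i < count
    yield i   # hand the current value to the caller's block
    i += 1
  end
  count       # Integer#times also returns its receiver
end

seen = []
my_times(5) { |n| seen << n }
p seen   # prints [0, 1, 2, 3, 4]
```

The block is what "catches" the values: every yield inside my_times transfers control to it with the current value of i.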

 

File Handling

Let’s look at some file handling in Ruby. The following code opens a file, reads it line by line, and prints each line along with its line number.

 

file = File.new("filename.txt")
c = 0
file.each_line {|line|
  c = c + 1
  print "#{c}: #{line}"
}
file.close

 

The code is simple. I open a file, which gives me a File object. I ask the object to yield each line to me, and as each line arrives I print it out along with its line number. This is as logically expressive as anything I have seen in the languages I have used. All the mess stays out of your way, and you get to focus on the job at hand.

 

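each_line is a method that File inherits from the IO class, and it yields each line of the file in turn. The block variable ‘line’ holds the current line, including its trailing newline, which is why print rather than puts is enough here. Slick?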
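A variation worth knowing: File.open accepts a block and closes the file automatically when the block finishes, and because IO includes the Enumerable module, each_with_index can supply the line number instead of a hand-maintained counter. The sketch below assumes a hypothetical helper, numbered_lines, and writes a throwaway sample file with Tempfile so that it runs on its own.

```ruby
require "tempfile"

# Hypothetical helper: returns the lines of the file at `path`,
# each prefixed with its 1-based line number.
def numbered_lines(path)
  result = []
  # With a block, File.open closes the file when the block returns.
  File.open(path) do |file|
    # each_with_index comes from Enumerable; for an IO it walks the lines.
    file.each_with_index do |line, index|
      result << "#{index + 1}: #{line}"
    end
  end
  result
end

# Create a small sample file so the example is self-contained.
sample = Tempfile.new("lines")
sample.write("first\nsecond\nthird\n")
sample.close

numbered_lines(sample.path).each { |numbered| print numbered }
```

The iterator style is the same as before; the block form of File.open simply moves the clean-up out of your way as well.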

 

(If you are wondering what “#{c}: #{line}” means: inside a double-quoted string, #{ } is a substitution. You can write any expression inside the curly braces, and its value is inserted into the string. Here the values of c and line are substituted.)
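A few quick cases of substitution, using made-up values for c and line, show what ends up in the string. Note that single-quoted strings are left alone.

```ruby
c = 3
line = "some text\n"

# Any expression may appear inside #{ }; its string value is inserted.
label = "#{c}: #{line}"
sum = "#{2 + 3} apples"

# Single-quoted strings do not substitute.
literal = '#{c}'

print label   # prints "3: some text" followed by a newline
puts sum      # prints "5 apples"
puts literal  # prints #{c} unchanged
```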

 

Arrays / Collections

Similarly, collection types expose an “each” method that yields every member of the collection. So if I had to iterate over an array, I would write:

 

array = [1,2,3,4]

array.each {|m| puts m }

 

The above code creates an array of 4 elements and accesses each element using the iterator “each”.
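Your own classes can join in too. As a minimal sketch, assuming a hypothetical Playlist class: defining an each method that yields every member is enough, and mixing in Ruby's Enumerable module then supplies select, map, include? and friends, all built on top of that each.

```ruby
# Hypothetical collection class used only for illustration.
class Playlist
  include Enumerable   # select, map, include?, sort, ... are built on #each

  def initialize(*songs)
    @songs = songs
  end

  # Yield every member of the collection, just like Array#each.
  def each
    @songs.each { |song| yield song }
    self
  end
end

playlist = Playlist.new("Blue", "Red", "Green")
playlist.each { |song| puts song }

short = playlist.select { |song| song.length <= 4 }
p short   # prints ["Blue", "Red"]
```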

 

In similar fashion, much of the Ruby library exposes its functionality as iterators, so much so that I rarely write for loops in Ruby.
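To give a flavour of what takes the place of the for loop, here are a few iterators from Ruby's Enumerable module applied to the same array. Each takes a block and yields the elements in turn; they differ only in what they do with the block's result.

```ruby
array = [1, 2, 3, 4]

# each_with_index yields the element together with its position.
array.each_with_index { |m, i| puts "#{i}: #{m}" }

# map collects the block's result for every element.
squares = array.map { |m| m * m }             # [1, 4, 9, 16]

# select keeps the elements for which the block returns true.
evens = array.select { |m| m.even? }          # [2, 4]

# inject threads an accumulator through the elements.
total = array.inject(0) { |sum, m| sum + m }  # 10

p squares, evens, total
```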

 

Recursive Directory Enumeration

Now let us try writing an iterator of our own. Something you have probably all written is code that finds all the text files in a folder and its subfolders. The usual approach is to write a recursive function.

 

The function collects the text files in the current directory along with a list of its subdirectories. It then calls itself on each subdirectory, and each of those calls does the same task. The problem is that if some processing has to be done every time a text file is found, things get very complicated. So the usual approach is to find all the text files first, build one big list of filenames, and process that list later.
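For contrast, here is roughly what that list-building version looks like, written in Ruby so the two approaches can be compared. It is a sketch under a few assumptions: the method name collect_textfiles is hypothetical, it uses File.join and Dir.entries instead of changing the working directory, and it builds a tiny throwaway tree with Dir.mktmpdir (File.write assumes a reasonably modern Ruby).

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical list-building version: walks `dir` recursively and
# returns one big array of text-file paths to be processed later.
def collect_textfiles(dir, found = [])
  Dir.entries(dir).each do |entry|
    next if entry == "." || entry == ".."   # skip the self and parent links
    path = File.join(dir, entry)
    if File.directory?(path)
      collect_textfiles(path, found)        # recurse, appending to the same list
    elsif entry =~ /\.txt$/
      found << path
    end
  end
  found
end

# Build a tiny throwaway tree so the sketch runs anywhere.
root = Dir.mktmpdir
FileUtils.mkdir_p(File.join(root, "sub"))
File.write(File.join(root, "a.txt"), "")
File.write(File.join(root, "b.rb"), "")
File.write(File.join(root, "sub", "c.txt"), "")

files = collect_textfiles(root)
# Only now, after the whole walk, does the processing start.
files.sort.each { |file| puts file }
FileUtils.remove_entry(root)
```

The list has to be threaded through every call and held in memory until the walk is over; that is the bookkeeping the iterator version avoids.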

 

Here is an approach with iterators. Try and implement this in your favorite language that does not have iterators and see how it looks.

 

def textfiles(dir)
  Dir.chdir(dir)
  Dir["*"].each do |entry|
    yield dir+"\\"+entry if /^.*\.txt$/ =~ entry
    if FileTest.directory?(entry)
      textfiles(entry){|file| yield dir+"\\"+file}
    end
  end
  Dir.chdir("..")
end

 

textfiles("c:\\"){|file|
  puts file
}

 

What the above code does is simple. I have defined a method called textfiles() that takes a directory name as a parameter and yields the path of every text file found under that directory.

 

The code reads exactly the way you would explain the algorithm:

  1. Go to the folder (Dir.chdir(dir)).
  2. Take a look at the contents (Dir["*"]).
  3. See if an entry is a text file; if so, yield it (yield dir+"\\"+entry if /^.*\.txt$/ =~ entry).
  4. See if an entry is a directory; if so, recurse into it, passing every file it yields back up with this folder's name prepended:
       if FileTest.directory?(entry)
         textfiles(entry){|file| yield dir+"\\"+file}
       end
  5. Go back to the parent folder (Dir.chdir("..")).

 

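Simple? Notice that the beauty of this code is that yield carries each filename back up through the recursive calls: each call yields paths prefixed with its own directory, the enclosing call prepends its directory in turn and yields again, and the full path finally reaches the block passed to the top-level call.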
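The same idea can be written without changing the working directory and without hard-coding the Windows separator. This is a sketch under those assumptions, not a drop-in copy of the code above: each_textfile is a hypothetical name, File.join builds the paths, and the inner block passes every filename from the nested call straight back up to the caller's block, exactly as in textfiles(). The sample tree is a throwaway one created with Dir.mktmpdir.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical iterator: yields the path of every .txt file under `dir`.
def each_textfile(dir)
  Dir.entries(dir).sort.each do |entry|
    next if entry == "." || entry == ".."   # skip the self and parent links
    path = File.join(dir, entry)
    if File.directory?(path)
      # Recurse; each file found below is yielded up to our own caller.
      each_textfile(path) { |file| yield file }
    elsif entry =~ /\.txt$/
      yield path
    end
  end
end

# A small sample tree so the sketch is self-contained.
root = Dir.mktmpdir
FileUtils.mkdir_p(File.join(root, "docs", "old"))
File.write(File.join(root, "readme.txt"), "")
File.write(File.join(root, "docs", "notes.txt"), "")
File.write(File.join(root, "docs", "old", "draft.txt"), "")
File.write(File.join(root, "docs", "image.png"), "")

found = []
# Each file is processed the moment it is found; no big list is needed.
each_textfile(root) { |file| found << file.sub(root + "/", "") }
found.each { |file| puts file }
FileUtils.remove_entry(root)
```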

 

In fairness, if you are using Ruby, you might as well finish the whole job in one line:


Dir["**/*.txt"].each{|file| puts file }
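Dir.glob, of which Dir[] is a shorthand, is itself an iterator: given a block, it yields each matching path instead of returning an array. A small self-contained sketch, using a throwaway directory from Dir.mktmpdir so the pattern has something to match:

```ruby
require "tmpdir"
require "fileutils"

matches = []
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "a", "b"))
  File.write(File.join(root, "top.txt"), "")
  File.write(File.join(root, "a", "b", "deep.txt"), "")
  File.write(File.join(root, "a", "skip.rb"), "")

  # Dir.chdir with a block returns to the original directory afterwards.
  Dir.chdir(root) do
    # "**/" matches zero or more directory levels, so top.txt is included.
    Dir.glob("**/*.txt") { |file| matches << file }
  end
end

matches.sort.each { |file| puts file }
```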

 

 
