Wednesday, May 19, 2004

This is an entry I have kept pending for a long time. I should have had this out much earlier.

 

Here I am going to talk about why iterators are a hard feature to implement in a conventional stack-based language. This should help you understand some of the design decisions behind implementing iterators in a language and, on the whole, make you a better programmer with respect to this construct.

 

I am assuming that you have read Parts 1 and 2

Iterators in Ruby (Part - 1)

Warming up to using Iterators (Part 2)

 

After you have read this entry you should be in a better position to grasp Parts 4 and 5.

SICP, Fiber api and ITERATORS ! (Part 4)

Implementation of Iterators in C# 2.0 (Part 5)

 

 

Method Calls and the Stack Frame

Since you have seen some samples of iterator-based code in Part 2, let's take a look at how an iterator works. Before we delve into the related issues, let us first look at how method calls work on the C stack. (We could have taken a stack-based language here, for example .Net IL, but I figured it is easier to stick to C.)

 

Consider the following functions

 

int callee(int a, int b)
{
        int temp = a + b;
        return temp;
}

void caller()
{
        int a;
        int b;
        int sum = callee(a, b);
}

 

That is amazingly simple. However, what happens here? I mean to ask what happens at the system stack level. To find out, let's take a look at the disassembly of these functions. If you were to compile this code with the VC++ compiler, you could use the following switch to generate assembly code.

>cl /Facode.cpp.asm code.cpp

 

?callee@@YAHHH@Z PROC NEAR                      ; callee

; File d:\roshanj\work\cpp\iter\func.cpp

; Line 2

      push  ebp

      mov   ebp, esp

      push  ecx

; Line 3

      mov   eax, DWORD PTR _a$[ebp]

      add   eax, DWORD PTR _b$[ebp]

      mov   DWORD PTR _temp$[ebp], eax

; Line 4

      mov   eax, DWORD PTR _temp$[ebp]

; Line 5

      mov   esp, ebp

      pop   ebp

      ret   0

?callee@@YAHHH@Z ENDP                           ; callee

 

?caller@@YAXXZ PROC NEAR                        ; caller

; Line 8

      push  ebp

      mov   ebp, esp

      sub   esp, 12                             ; 0000000cH

; Line 11

      mov   eax, DWORD PTR _b$[ebp]

      push  eax

      mov   ecx, DWORD PTR _a$[ebp]

      push  ecx

      call  ?callee@@YAHHH@Z              ; callee

      add   esp, 8

      mov   DWORD PTR _sum$[ebp], eax

; Line 12

      mov   esp, ebp

      pop   ebp

      ret   0

?caller@@YAXXZ ENDP

 

 

The stack frame of the caller function looks like this:

 

 

And when caller() is calling callee(), both methods have their stack frames mounted.

 

 

On Intel-based systems the C stack grows downward in memory, which means that each item added to the stack causes the top of the stack (SP) to have a lower address. So the right way to draw these diagrams would have been to draw them upside down. But that detail is not relevant here, so I have depicted them as a conventional stack that grows upwards.

 

When the callee() returns, the stack looks like the initial stack diagram and the variable ‘sum’ contains its required value.

 

 

To reiterate, what happens is that a method that is currently running uses the stack for storing its local variables – or more generally the state of the running method is preserved on the stack. When a method is called, it builds its own stack frame on top of whatever was already on the stack. The called method uses the stack frame to save its state, irrespective of what other methods are already on the stack.

 

You might have heard of the concept called a stack overflow. That happens when method calls nest so deeply that there is no more free space left on the stack for the stack frame of a new method to be created.

 

Whenever a method returns, the part of the stack that was used for its variables is freed up; so to speak, the method’s stack frame is torn down. So when a method returns, its state information that was on the stack is completely lost. This is necessary because the parent function or method might go on to call other methods, which subsequently use the same stack space.

 

So let us say there is a calling pattern like this

 

function a()

        return

       

function b()

        call a()

        return

       

function caller()

        call a()

        call b()

        return

 

The stack usage will look like this –

 

 

Now that we have a reasonable idea of how the stack is used across method calls (though in reality there are many, many approaches), let's move on to iterators.

 

Iterators

Look at the following Ruby snippet that shows an iterator. If you have an interest in C/C++/C# and couldn’t care less about Ruby, don’t throw your hands up in the air: the language is used here only as an example of a language that implements iterators (and rather well at that), which I am using to communicate the idea.

 

def callee()

        for i in 1..10

                yield i

        end

end

 

def process(v)

        return (v * 10)

end

       

       

def caller()

        callee() {|value|

                value = process(value)

                print value

        }

end

 

If you recall the behavior of the iterator from Parts 1 and 2, you will remember that when caller() calls callee() and callee() executes the yield statement, control goes back to caller(). In this case, the value of ‘i’ is yielded from callee() and received in caller() as value.

 

In C code, when the control returns to the calling function the assumption is that the called function is dead on the stack and that the stack frame is free for subsequent method calls, as shown in the stack diagram above.

 

If we were to draw a similar diagram for this Ruby code, how would we draw it?

 

 

This is where we hit our first wall.

 

Assume that caller() calls callee() and callee() has yielded the value 1. Now the stack frame of callee() is torn down in the old C way and the process() method is called, which goes on to use the same area of the stack that was once used by callee(). After caller() has done whatever it needs to do with the value it received from callee(), it tries to invoke callee() again so that it can get the next value. This is where we hit the wall. callee() cannot return the next value, simply because the previous value it had for ‘i’ was lost when its stack frame got torn down.

 

In other words, on a conventional C stack, the callee() state is lost and therefore cannot resume execution from the point of a yield.
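To make the wall concrete, here is a small C sketch of what goes wrong. The function names are made up for this illustration; next_value_naive() tries to behave like callee() while keeping its counter in an ordinary local variable.

```c
#include <assert.h>

/* Hypothetical helper: tries to hand out 1, 2, 3, ... one value per call. */
int next_value_naive(void)
{
    int i = 0;   /* created afresh in this call's stack frame */
    i = i + 1;   /* "advance the loop" by one step */
    return i;    /* the frame, and i with it, is discarded on return */
}

/* Every call starts from a brand new frame, so no progress is ever made. */
void check_state_is_lost(void)
{
    assert(next_value_naive() == 1);
    assert(next_value_naive() == 1);
}
```

However many times the caller asks, it gets the first value again, because nothing of the previous call survives the return.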

 

What workarounds do we have for this issue?

Let us assume that what Ruby does is simply a compiler hack: that it never really tears down the stack frame of callee() at all, but instead ensures that the function returns to caller() with the callee() stack frame still in place. That would look like this:

 

 

While this is a nice diagram to look at, what does this mean? Where will the stack frame of the subsequent call to process() go? It cannot over-write the callee() stack frame – so it has to go above it. Like this?

 

 

Whoa, now wait a minute: how does the caller() function know how much of the stack the callee() function is using, to be able to do this? This is a difficult question to answer. Remember that callee() is a proper function that can be using a variable amount of the stack at any point in time. So caller() cannot predict the stack usage of callee(), but let's assume that callee() passes back information about its stack usage along with the value that it is yielding.

 

OK, so that would solve the problem of how the process() function uses the stack. But there is one more issue. What if the code block inside caller() (the one that receives the yielded value) needs to push a value onto the stack?

 

If the code block inside the caller needs to push a value onto the stack, the pushed value will overwrite the stack frame of the callee above it. The obvious way to fix the problem is to move the stack pointer to the top of the stack, above the callee() frame as well. You can visualize it like this:

 

 

While this could work, there is one more issue: the local variables of a function are accessed via EBP offsets. This is fine when local variables occur at known distances from the base of the function’s stack frame. To see this happen, refer back to the assembly code I posted towards the beginning of this article. You will notice that most of the variables are accessed as _a$[ebp] or _b$[ebp], or with similar syntax. The _a$ here does nothing but add a fixed offset to the EBP pointer (positive for parameters, negative for locals). For the access of local variables when code is in the yield’s parameter block region of caller(), these rules would have to change, because an additional offset comes into play. The additional offset is introduced because the stack frame of callee() is now sitting squarely in the middle of what should have been the stack frame of caller().

 

These issues crop up even when using a single iterator. With multiple or nested iterators, used in conjunction with stack-intensive operations like recursion, the picture becomes very complicated.

 

Alternative approaches of State maintenance and the concept of Method instances

While it should be clear that, to implement iterators, a language must support the idea of functions maintaining their state even after they surrender control to their calling methods, it is not very clear how this can be implemented on a conventional C stack.

 

One approach is to go for a ‘stackless’ implementation. What that means is that function activations, or stack frames, are treated as memory blocks allocated on the heap, and each function instance (when you call a function it needs to create a stack frame, and that can be considered a function or method instance) lives on the heap like any other dynamically allocated object. These mini stacks are specific to function instances and so have no real possibility of colliding with each other, as they would when sharing a larger stack maintained by the OS or platform.

 

I believe languages like Lisp/Scheme behave in this manner with respect to the language stack. (I have been told this and I hope it is correct.)

 

 

Another alternative is to approach the problem like this. Assume that the callee uses only static variables. Immediately, the issue of maintaining the state of the callee’s variables on the stack is eliminated. The only remaining issue is to resume execution of the function from the correct place (immediately past the yield) when the function is called again. This can be achieved easily by saving the resume position in an additional variable and implementing a switch/case/goto construct at the beginning of the function.
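Here is a rough C sketch of the static-variable idea, assuming the callee is the 1..10 loop from the Ruby snippet. The names callee_next_static and check_sequence are mine, and returning -1 to signal "no more values" is just a convention I picked for the sketch.

```c
#include <assert.h>

/* Hypothetical C rendering of callee(): yields 1..10, one value per call,
   then returns -1. All state lives in statics instead of the stack frame. */
int callee_next_static(void)
{
    static int resume = 0;  /* 0 = fresh start, 1 = suspended at the yield */
    static int i;           /* the loop variable, moved off the stack */

    switch (resume) {
    case 0:
        for (i = 1; i <= 10; i++) {
            resume = 1;     /* remember where to continue */
            return i;       /* the "yield i" */
    case 1:;                /* a later call re-enters the loop body here */
        }
    }
    resume = 2;             /* finished: no case matches from now on */
    return -1;
}

/* The "caller": pulls the values one by one and checks their order. */
void check_sequence(void)
{
    int k;
    for (k = 1; k <= 10; k++)
        assert(callee_next_static() == k);
    assert(callee_next_static() == -1);
}
```

The case label sitting inside the for loop looks odd, but a switch in C is allowed to jump into the middle of a loop body, and that jump is what plays the role of "resume just past the yield".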

 

Now, since it persists state in static variables, we cannot use this recursively, or have more than one iteration in progress at a time. But we could fix this if we created a structure/class that contains member variables corresponding to the local variables of the function, and used an instance of this structure in the function instead of proper static variables.
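The structure-based variant could be sketched in C roughly as follows. The struct and function names are hypothetical; each struct callee_state value plays the part of one method instance, holding the callee's locals and its resume position.

```c
#include <assert.h>
#include <stddef.h>

/* One "method instance": the callee's locals plus its resume point. */
struct callee_state {
    int resume;   /* 0 = not started, 1 = suspended at yield, 2 = finished */
    int i;        /* the loop variable of the callee */
};

void callee_init(struct callee_state *s)
{
    assert(s != NULL);
    s->resume = 0;
    s->i = 0;
}

/* Stores the next value in *out and returns 1, or returns 0 when done. */
int callee_next(struct callee_state *s, int *out)
{
    assert(s != NULL && out != NULL);
    switch (s->resume) {
    case 0:
        for (s->i = 1; s->i <= 10; s->i++) {
            s->resume = 1;
            *out = s->i;    /* the "yield i" */
            return 1;
    case 1:;                /* re-entry point for the next call */
        }
    }
    s->resume = 2;
    return 0;
}
```

Because the state travels with the struct, two iterations can be in flight at once: initialise two callee_state values and interleave calls to callee_next() on them, and each one keeps its own position.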

 

Languages such as C# and Python use a similar approach to maintaining the state of iterators between calls. You can read more about this in Part 5 of this series.

 

For constructs like iterators to be supported natively in .Net and similar runtimes, substantial changes will be required in the way the runtime manages stack frames and similar entities. Basically, languages and runtimes need to be retrofitted with a concept of function and code block instances, the same way object-oriented programming is fitted with the concept of class instances called objects.

 

Languages like Sun Microsystems’ Self build on similar concepts for method/function instantiation.

 

If you give some of the ideas here a little thought, you should be well on your way to dreaming about new languages and fancy programming constructs.

Wednesday, May 19, 2004 9:09:21 AM (Eastern Standard Time, UTC-05:00)
 Tuesday, May 18, 2004

 

Trashbin finally has a patch.

 

Before trashbin was written I used to wonder a bit about this mythical entity called metadata that is often talked about in the .Net framework. Metadata was mentioned in almost every discourse on .Net, and libraries like the reflection libraries were based squarely on it. Metadata formed the central pillar of the design of the self-describing type/component system that is the .Net framework.

 

However, I could find almost nothing that showed me actual metadata in its physical form. As a consequence I decided to try and understand what metadata was all about. Thus trashbin was written. Trashbin first saw the light of day when it was announced in a Bangalore User Group post sent in the wee hours of the morning.

 

From: spark

Subject: metadata viewer: trashbin v0.1, src+bin release

Date: Mon, 09 Jun 2003 13:28:37 -0700

-----------------------------------------------------------

 

hi group, 

i had been spending some of my odd freetime and sometimes lost sleep into exploring the .net exe/dll format and peeking into metadata glue.

well, the glue is rather interesting to look into and gives you a small insight into where information for things like the class loader and the reflection api get their data. trashbin is small viewer for metadata info that i am releasing with src. this is the first version anywhere and so is expected to be buggy - do mail. if internals interests you, take a look:

http://www.thinkingms.com/pensieve/homepage/work/trashbin/trashbin.htm

cheers :)

rosh

 

Since its release, trashbin has more or less worked fine for me and I have been using it for about a year now.
In essence, this is what trashbin does:

 

>trashbin

Spark (?)  Managed(.Net)/Native PE-COFF file viewer. Version 0.2

May 2003, contact: rosh@mvps.org

Last update: May 2004

 

usage: trashbin [options]

 

        portable executable info:

        /dos     display dos header

        /sig     display the file signature

        /coff    display coff header

        /pe      display pe/optional header

        /dd      display data directories in pe header

        /sec     display section headers

        /exp     display export table

        /imp     display import table

        /reloc   display relocation information

        /tls     display Thread Local Storage information

 

        managed info:

        /corhdr          display the common language runtime header

        /mdhdr           display metadata headers

        /md:Strings      display metadata stream #Strings

        /md:Blob         display metadata stream #Blob

        /md:US           display metadata stream #US (user strings)

        /md:GUID         display metadata stream #GUID

        /md:#~           display optimised metadata tables stream-header

        /mdtab           display optimised metadata tables

 

        other:

        /type    indicates the type of the PE file

        /csv     enable excel compatible, CSV output

 

        ps. The name trashbin is 'inspired' from dumpbin :)

 

Since most people who are reading this entry might be interested in what metadata is and what the PE file format is like, here goes:

 

The PE file format is Microsoft’s Portable Executable file format. Essentially, most exes and dlls that you will see on a Windows system have this file format. Yes, exes and DLLs have the same format. The difference largely lies in the fact that a DLL file does not necessarily have an entry point defined. Here is a little about the exe/dll format:

 

Once upon a time, there used to be old DOS exes that came with what was the DOS exe header. Microsoft retained the DOS exe header in all subsequent exe formats so that the executables would be compatible across their operating systems. This is why you can start any Windows or .Net exe on any Microsoft operating system (including DOS) and see it run. Of course, these programs would not do anything in the DOS environment other than display a message saying that the exe needs to be run under Windows. The point, however, is that the exe did validly execute on a 15- or 20-year-old system that was built for processors that had no concept of memory beyond 1 MB.

 

The DOS header is known for its special signature bytes, MZ. Open any exe file and notice that the very first two bytes are MZ. MZ stands for Mark Zbikowski, the person who developed the DOS exe file format. Prior to that, the executable format was the COM file format. Those of you who have had the chance to work on DOS would not have forgotten what a pleasure some of those COM files used to be. The COM format came from the then-popular CP/M operating system of Digital Research, the company of the great Gary Kildall. Gary Kildall was a pioneer in a way few people were... anyway, that is a story for later.

 

The following is a dump of the initial few bytes of an exe file showing the then new MZ DOS header:

 

>hexv HelloWorld.exe

 

0000:0000│ 4D 5A 90 00 03 00 00 00 │ 04 00 00 00 FF FF 00 00 │ MZÉ▒♥▒▒▒│♦▒▒▒  ▒▒

0000:0010│ B8 00 00 00 00 00 00 00 │ 40 00 00 00 00 00 00 00 │ ╕▒▒▒▒▒▒▒│@▒▒▒▒▒▒▒

0000:0020│ 00 00 00 00 00 00 00 00 │ 00 00 00 00 00 00 00 00 │ ▒▒▒▒▒▒▒▒│▒▒▒▒▒▒▒▒

 

In a sense Mr Zbikowski has the most popular initials on the planet. The DOS exe header itself is available as a structure defined in the winnt.h header file, which is available in almost every Windows-based C/C++ dev environment.

 

Now the DOS exe header did not suffice to hold a lot of the new information that the exe had to present to the operating system when Windows came along. So new structures were introduced, which were akin to the old Unix-based Common Object File Format (COFF). There is plenty of literature available about this on the net.

 

The PE file is denoted by the signature bytes “PE” followed by two zero bytes. If you download trashbin, the source code has some embedded URLs that give you information about the PE file format itself. Those may prove valuable for your understanding of the actual exe file format.
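If you want to check the two signatures yourself, a sketch in C might look like the one below. It is not taken from the trashbin sources, and the function names are made up. It assumes the whole file has already been read into a byte buffer, and it relies on the e_lfanew field of the DOS header, which sits at offset 0x3C and holds the file offset of the PE signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a little-endian 32-bit value from p. */
static uint32_t read_u32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the image starts with the DOS signature bytes 'M' 'Z'. */
int has_mz_signature(const unsigned char *img, size_t len)
{
    assert(img != NULL);
    return len >= 2 && img[0] == 0x4D && img[1] == 0x5A;
}

/* Returns 1 if "PE\0\0" is found at the offset stored in the DOS header.
   0x3C is the position of the e_lfanew field in IMAGE_DOS_HEADER. */
int has_pe_signature(const unsigned char *img, size_t len)
{
    uint32_t off;

    if (!has_mz_signature(img, len) || len < 0x40)
        return 0;
    off = read_u32le(img + 0x3C);
    if (off > len || len - off < 4)
        return 0;               /* signature would lie outside the buffer */
    return img[off] == 'P' && img[off + 1] == 'E' &&
           img[off + 2] == 0 && img[off + 3] == 0;
}
```

Both functions only look at raw bytes, so they work the same for native and managed files; the difference between those two shows up later, in the data directory.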

 

Just to connect all that I have been talking about to trashbin and how you can actually examine an exe file with it, these are the relevant switches.

 

        /dos     display dos header

        /sig     display the file signature

        /coff    display coff header

        /pe      display pe/optional header

 

Now that we have covered that ground, let's move on. The PE file has a data structure called the Data Directory, which is displayed through the /dd option.

 

        /dd      display data directories in pe header

 

The data directory basically contains pointers to various data structures inside the PE file. The DD has 16 entries and a dump of the DD looks like this:

 

C:\WINNT\system32>trashbin tracert.exe /dd

_IMAGE_DATA_DIRECTORY

0       VirtualAddress = 0

        Size = 0

1       VirtualAddress = 0x19ac

        Size = 0x78

2       VirtualAddress = 0x3000

        Size = 0x11b8

3       VirtualAddress = 0

        Size = 0

4       VirtualAddress = 0

        Size = 0

5       VirtualAddress = 0

        Size = 0

6       VirtualAddress = 0x1090

        Size = 0x1c

7       VirtualAddress = 0

        Size = 0

8       VirtualAddress = 0

        Size = 0

9       VirtualAddress = 0

        Size = 0

10      VirtualAddress = 0

        Size = 0

11      VirtualAddress = 0x240

        Size = 0x7c

12      VirtualAddress = 0x1000

        Size = 0x88

13      VirtualAddress = 0

        Size = 0

14      VirtualAddress = 0

        Size = 0

15      VirtualAddress = 0

        Size = 0

 

 

This is trashbin examining the tracert program that comes with Windows. The tracert program is a native exe (not a .Net exe). Entries in the DD have predefined meanings; these are defined in winnt.h as follows:

 

#define IMAGE_DIRECTORY_ENTRY_EXPORT          0   // Export Directory

#define IMAGE_DIRECTORY_ENTRY_IMPORT          1   // Import Directory

#define IMAGE_DIRECTORY_ENTRY_RESOURCE        2   // Resource Directory

#define IMAGE_DIRECTORY_ENTRY_EXCEPTION       3   // Exception Directory

#define IMAGE_DIRECTORY_ENTRY_SECURITY        4   // Security Directory

#define IMAGE_DIRECTORY_ENTRY_BASERELOC       5   // Base Relocation Table

#define IMAGE_DIRECTORY_ENTRY_DEBUG           6   // Debug Directory

//      IMAGE_DIRECTORY_ENTRY_COPYRIGHT       7   // (X86 usage)

#define IMAGE_DIRECTORY_ENTRY_ARCHITECTURE    7   // Architecture Specific Data

#define IMAGE_DIRECTORY_ENTRY_GLOBALPTR       8   // RVA of GP

#define IMAGE_DIRECTORY_ENTRY_TLS             9   // TLS Directory

#define IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG    10   // Load Configuration Directory

#define IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT   11   // Bound Import Directory in headers

#define IMAGE_DIRECTORY_ENTRY_IAT            12   // Import Address Table

#define IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT   13   // Delay Load Import Descriptors

#define IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR 14   // COM Runtime descriptor

 

You can compare these against what is present in the tracert program dump.

Now let's do a DD dump of a .Net exe:

 

>trashbin HelloWorld.exe /dd

_IMAGE_DATA_DIRECTORY

0       VirtualAddress = 0

        Size = 0

1       VirtualAddress = 0x2370

        Size = 0x4b

2       VirtualAddress = 0x4000

        Size = 0x340

3       VirtualAddress = 0

        Size = 0

4       VirtualAddress = 0

        Size = 0

5       VirtualAddress = 0x6000

        Size = 0xc

6       VirtualAddress = 0

        Size = 0

7       VirtualAddress = 0

        Size = 0

8       VirtualAddress = 0

        Size = 0

9       VirtualAddress = 0

        Size = 0

10      VirtualAddress = 0

        Size = 0

11      VirtualAddress = 0

        Size = 0

12      VirtualAddress = 0x2000

        Size = 0x8

13      VirtualAddress = 0

        Size = 0

14      VirtualAddress = 0x2008

        Size = 0x48

15      VirtualAddress = 0

        Size = 0

 

The interesting thing to note is that in a managed exe, entry 14 (the 15th entry, since numbering starts at zero) is non-zero. This entry, IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR, points to the Common Language Runtime header, or CorHdr. The CorHdr structure is defined in corhdr.h.
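As a rough illustration, this check can be written in a few lines of C. The names below are hypothetical and not from trashbin. The sketch assumes you already have the raw bytes of the data directory in memory: 16 entries of 8 bytes each, a 32-bit VirtualAddress followed by a 32-bit Size, both little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DD_ENTRY_COUNT    16
#define DD_COM_DESCRIPTOR 14   /* IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR */

struct dd_entry {
    uint32_t virtual_address;
    uint32_t size;
};

/* Reads a little-endian 32-bit value from p. */
static uint32_t dd_read_u32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decodes entry `index` from the raw 16 x 8 byte data directory array. */
struct dd_entry dd_get(const unsigned char *dd, unsigned index)
{
    struct dd_entry e;

    assert(dd != NULL && index < DD_ENTRY_COUNT);
    e.virtual_address = dd_read_u32le(dd + index * 8);
    e.size = dd_read_u32le(dd + index * 8 + 4);
    return e;
}

/* A file looks managed when the COM descriptor entry is filled in. */
int dd_looks_managed(const unsigned char *dd)
{
    struct dd_entry e = dd_get(dd, DD_COM_DESCRIPTOR);
    return e.virtual_address != 0 && e.size != 0;
}
```

For the HelloWorld.exe dump, dd_get(dd, 14) would give VirtualAddress 0x2008 and Size 0x48, while for tracert.exe both fields are zero.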

 

The CorHdr is where (so to speak) the managed world starts. So .Net exe files are regular PE files that have all the .Net-specific content at a particular offset in the exe file. The idea was that .Net was designed to be platform neutral, so Microsoft could not assume that the facilities provided by the exe file format would be available in the file formats of other operating systems where .Net would one day run. For this reason the .Net-specific content assumes only that, whatever the parent file format enclosing it, it can be laid out as one large, well-defined binary blob.

 

A description of the CorHdr, the metadata layout and other intricacies of the underlying system can be found (in varying detail) in the ECMA specification of the Common Language Infrastructure. Partition 2, which describes metadata, is the one that you would want to look at in this regard.

http://msdn.microsoft.com/net/ecma/

 

Another interesting resource for the technical mind is Serge Lidin’s book “Inside the .Net IL Assembler”. The book is also available as an Indian edition, so it is affordable.

 

Trashbin lets you view this part of the managed exe file.

These are the relevant switches:

 

        /corhdr          display the common language runtime header

        /mdhdr           display metadata headers

        /md:Strings      display metadata stream #Strings

        /md:Blob         display metadata stream #Blob

        /md:US           display metadata stream #US (user strings)

        /md:GUID         display metadata stream #GUID

        /md:#~           display optimised metadata tables stream-header

        /mdtab           display optimised metadata tables

 

Trashbin is not a disassembler; it simply lets you view the other data that is present in the exe/dll file. I figured I don’t need to write a disassembler because ILdasm and several other tools do the job so well. ILdasm, by the way, is written by Serge Lidin.

 

The metadata in the managed file is divided into a set of streams. A stream is like an area of memory reserved for content of a specific type. The streams are:

#Strings

#US

#Blob

#GUID

#~ and #-

 

The #Strings stream keeps all the strings in the application that name things: classes, methods, parameters, namespaces, assemblies and so on. So this stream basically contains all the identifier strings that are part of the source code itself; this is where the names used by the reflection library come from.

 

The #US stream is the one for user defined strings. So when you say Console.WriteLine(“Hello World”) in your program, the “Hello World” goes into #US and the Console.WriteLine goes into #Strings. The strings present in the #US set are Unicode strings – so each character is two bytes.

 

Here are some nice and friendly hex dumps of these regions from an exe file, each followed by the corresponding information as extracted by trashbin:

 

#Strings

 

0000:0460│ 00 00 00 00 00 00 00 00 │ 00 3C 4D 6F 64 75 6C 65 │ ▒▒▒▒▒▒▒▒│▒<Module

0000:0470│ 3E 00 48 65 6C 6C 6F 57 │ 6F 72 6C 64 2E 65 78 65 │ >▒HelloW│orld.exe

0000:0480│ 00 6D 73 63 6F 72 6C 69 │ 62 00 53 79 73 74 65 6D │ ▒mscorli│b▒System

0000:0490│ 00 4F 62 6A 65 63 74 00 │ 43 61 6C 63 00 48 65 6C │ ▒Object▒│Calc▒Hel

0000:04A0│ 6C 6F 57 6F 72 6C 64 53 │ 61 6D 70 6C 65 00 48 65 │ loWorldS│ample▒He

0000:04B0│ 6C 6C 6F 57 6F 72 6C 64 │ 00 41 64 64 00 2E 63 74 │ lloWorld│▒Add▒.ct

0000:04C0│ 6F 72 00 4D 61 69 6E 00 │ 53 79 73 74 65 6D 2E 44 │ or▒Main▒│System.D

0000:04D0│ 69 61 67 6E 6F 73 74 69 │ 63 73 00 44 65 62 75 67 │ iagnosti│cs▒Debug

0000:04E0│ 67 61 62 6C 65 41 74 74 │ 72 69 62 75 74 65 00 61 │ gableAtt│ribute▒a

0000:04F0│ 00 62 00 61 72 67 73 00 │ 43 6F 6E 73 6F 6C 65 00 │ ▒b▒args▒│Console▒

0000:0500│ 57 72 69 74 65 4C 69 6E │ 65 00 45 78 63 65 70 74 │ WriteLin│e▒Except

0000:0510│ 69 6F 6E 00 67 65 74 5F │ 4D 65 73 73 61 67 65 00 │ ion▒get_│Message▒

 

>trashbin HelloWorld.exe /md:Strings

METADATA STREAM #Strings

        Offset : "String"

        0x1    : "<Module>"

        0xA    : "HelloWorld.exe"

        0x19   : "mscorlib"

        0x22   : "System"

        0x29   : "Object"

        0x30   : "Calc"

        0x35   : "HelloWorldSample"

        0x46   : "HelloWorld"

        0x51   : "Add"

        0x55   : ".ctor"

        0x5B   : "Main"

        0x60   : "System.Diagnostics"

        0x73   : "DebuggableAttribute"

        0x87   : "a"

        0x89   : "b"

        0x8B   : "args"

        0x90   : "Console"

        0x98   : "WriteLine"

        0xA2   : "Exception"

        0xAC   : "get_Message"
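Looking a name up in #Strings is nothing more than indexing into the heap and reading up to the next zero byte. Here is a small sketch in C. The function name is mine, and the sample array holds only the first 81 bytes of the heap, typed in from the hex dump above on the assumption that the heap starts at the zero byte at file offset 0x468.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Start of the #Strings heap from the dump: offset 0 is the empty
   string, every following name is terminated by a zero byte. */
static const unsigned char sample_strings_heap[] =
    "\0<Module>\0HelloWorld.exe\0mscorlib\0System\0Object\0Calc"
    "\0HelloWorldSample\0HelloWorld";

/* Returns the string stored at `offset`, or NULL if the offset is out of
   range or the string is not terminated inside the heap. */
const char *strings_heap_get(const unsigned char *heap, size_t heap_len,
                             size_t offset)
{
    assert(heap != NULL);
    if (offset >= heap_len)
        return NULL;
    if (memchr(heap + offset, 0, heap_len - offset) == NULL)
        return NULL;
    return (const char *)(heap + offset);
}
```

With this sample, offset 0x30 gives "Calc" and offset 0x46 gives "HelloWorld", the same offsets that trashbin prints in its listing.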

 

#US

 

0000:04B0│ 65 00 00 00 00 17 48 00 │ 65 00 6C 00 6C 00 6F 00 │ e▒▒▒▒↨H▒│e▒l▒l▒o▒

0000:04C0│ 20 00 57 00 6F 00 72 00 │ 6C 00 64 00 00 00 00 00 │  ▒W▒o▒r▒│l▒d▒▒▒▒▒

 

>trashbin test.exe /md:US

METADATA STREAM #US

0x1, (23 bytes)

    Txt: H.e.l.l.o...W.o.r.l.d..

    Hex: 48 00 65 00 6c 00 6c 00 6f 00 20 00 57 00 6f 00 72 00 6c 00 64 00 00
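The layout of a #US entry, as I understand it from the ECMA spec, is a length prefix (0x17, that is 23, in the dump above), then the characters as two-byte little-endian Unicode, then one trailing flag byte that is counted in the length. The sketch below decodes such an entry in C. The function name is hypothetical, and to stay short it only handles one-byte length prefixes and replaces anything outside plain ASCII with '?'.

```c
#include <assert.h>
#include <stddef.h>

/* Decodes the #US entry starting at `entry` into `out` as ASCII text.
   Returns the number of characters written, or -1 on any problem. */
int us_entry_to_ascii(const unsigned char *entry, size_t avail,
                      char *out, size_t out_size)
{
    size_t blob_len, nchars, i;

    assert(entry != NULL && out != NULL);
    if (avail < 1 || (entry[0] & 0x80) != 0)
        return -1;              /* multi-byte length prefix: not handled */
    blob_len = entry[0];
    if (blob_len > avail - 1)
        return -1;              /* entry runs past the buffer */
    /* the last byte of the blob is a flag, the rest are 2-byte chars */
    nchars = blob_len > 0 ? (blob_len - 1) / 2 : 0;
    if (nchars + 1 > out_size)
        return -1;              /* output buffer too small */
    for (i = 0; i < nchars; i++) {
        unsigned char lo = entry[1 + 2 * i];
        unsigned char hi = entry[2 + 2 * i];
        out[i] = (hi == 0 && lo < 0x80) ? (char)lo : '?';
    }
    out[nchars] = '\0';
    return (int)nchars;
}
```

Fed the 24 bytes from the dump (the 0x17 prefix plus the 23 bytes trashbin prints), this gives back the 11 characters of Hello World.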

 

I will just skip over the #GUID and #Blob streams for now. The #~ stream is the interesting one: it is the stream that actually contains the metadata tables. There is an alternate stream that can be present instead, the #- stream. The #- stream also contains metadata tables, but these are called the un-optimized tables because certain sort orders are not maintained in them. The Microsoft compilers always emit optimized tables, and since I have not been using any other compilers (Mono too seems to emit optimized tables) I don’t have support for #- in trashbin.

 

Let's focus on #~. The #~ stream is the real metadata, if you would like to think of it that way. It is actually a small relational database that is compressed down to the last bit. There are a large number of tables (with predefined schemas) that can occur here. These tables provide information about the exe or dll (or, more precisely, the assembly) that they describe.

 

These tables cross reference each other as well as reference entries in the other streams, such as the #GUID and #Strings streams. The trashbin option /md:#~ gives you the header of #~ stream, which is a kind of summary view of what the stream contains:

 

>trashbin HelloWorld.exe /md:#~

METADATA STREAM #~

        TABLES HEADER

                MajorVersion = 1

                MinorVersion = 0

                HeapSizes = 0

                        #String Index = 2 bytes wide

                        #GUID Index = 2 bytes wide

                        #Blob Index = 2 bytes wide

                Valid  = 0x00000900021547

                Sorted = 0x0002003301fa00

 

        METADATA Tables

                RID.               TableName    [No of Rows]

                0.                    Module    [1]     Row=10 bytes

                1.                   TypeRef    [4]     Row=6 bytes

                2.                   TypeDef    [3]     Row=14 bytes

                6.                    Method    [4]     Row=14 bytes

                8.                     Param    [3]     Row=6 bytes

                10.                MemberRef    [5]     Row=6 bytes

                12.          CustomAttribute    [1]     Row=6 bytes

                17.            StandAloneSig    [2]     Row=2 bytes

                32.                 Assembly    [1]     Row=22 bytes

                35.              AssemblyRef    [1]     Row=20 bytes

        Table Count = 10

 

The above listing says that helloworld.exe contains 10 tables in its metadata headers.
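The table list comes straight out of the Valid field: it is a 64-bit mask in which bit n is set when table number n is present. A short C sketch of that decoding follows; the function names are made up for the example. The last helper reflects my reading of the ECMA spec for HeapSizes, where bit 0x01 stands for #Strings, 0x02 for #GUID and 0x04 for #Blob, and a set bit means 4-byte indexes instead of 2-byte ones.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the table with the given number is marked in `valid`. */
int table_present(uint64_t valid, unsigned table_id)
{
    return table_id < 64 && ((valid >> table_id) & 1u) != 0;
}

/* Counts how many tables the Valid mask announces. */
int count_tables(uint64_t valid)
{
    int n = 0;
    unsigned id;

    for (id = 0; id < 64; id++)
        n += table_present(valid, id);
    return n;
}

/* Width in bytes of an index into one heap, given the HeapSizes byte and
   that heap's flag bit (0x01 #Strings, 0x02 #GUID, 0x04 #Blob). */
int heap_index_width(uint8_t heap_sizes, uint8_t heap_flag)
{
    assert(heap_flag == 0x01 || heap_flag == 0x02 || heap_flag == 0x04);
    return (heap_sizes & heap_flag) ? 4 : 2;
}
```

For the Valid value shown above, 0x900021547, the set bits are 0, 1, 2, 6, 8, 10, 12, 17, 32 and 35, which are exactly the ten table numbers in the listing, and HeapSizes = 0 gives 2-byte indexes for all three heaps.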

To give you a taste of what I am talking about, let's look at some of these tables:

 

To look at the metadata tables, use the option

 

        /mdtab           display optimised metadata tables

 

This is the table called TypeDef that has a listing of all the types defined in this assembly.

 

[RID=2] Table TypeDef

     [  DATA]Flags    [STRING]Name     [STRING]Namespace [CI:64 ]Extends  [RID: 4]FieldList [RID: 6]MethodList

   1.[  0x  ]0        [  0x  ]1        [  0x  ]0         [RID: 2] 0       [RID: 4] 1        [RID: 6] 1        

   2.[  0x  ]100001   [  0x  ]30       [  0x  ]35        [RID: 1] 1       [RID: 4] 1        [RID: 6] 1        

   3.[  0x  ]100001   [  0x  ]46       [  0x  ]35        [RID: 1] 1       [RID: 4] 1        [RID: 6] 3        

 

The table has a field called Name, which is the name of the type. The values of the Name field are actually offsets into the #Strings stream. So if you care to compare (I have provided a listing of the #Strings stream earlier), what the table is saying is that this assembly has 3 types, namely <Module>, Calc and HelloWorld. At this point let me drop in the source code of the HelloWorld.cs file from which this assembly was compiled:

 

//HelloWorld.cs

namespace HelloWorldSample

{

        using System;

       

        public class Calc

        {

                public int Add(int a, int b)

                {

                        return a+b;

                }

        }

       

        public class HelloWorld

        {

                public static void Main(string[] args)

                {

                        try

                        {

                                Calc c = new Calc();

                                Console.WriteLine(c.Add(10,20));

                        }

                        catch(Exception e)

                        {

                                Console.WriteLine(e.Message);

                        }

                }

        }

}

 

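As you can see, the source does define two classes called Calc and HelloWorld. The third entry, <Module> (offset 0x1 in #Strings), is a placeholder type that the compiler emits for module-level members. A type in .Net terms is a very strict entity that encompasses value types and reference types, and type entities like classes, structures, enums etc.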

 

Here is a table that shows me what types are being referenced in this assembly:

 

[RID=1] Table TypeRef

     [CI:75 ]ResolutionScope [STRING]Name     [STRING]Namespace

   1.[RID:35] 1              [  0x  ]29       [  0x  ]22       

   2.[RID:35] 1              [  0x  ]73       [  0x  ]60       

   3.[RID:35] 1              [  0x  ]90       [  0x  ]22       

   4.[RID:35] 1              [  0x  ]A2       [  0x  ]22       

 

The types are Console, Object, Exception etc.

 

Here is a list of all the methods that are defined within this assembly:

 

[RID=6] Table Method

     [  DATA]RVA      [  DATA]ImplFlags [  DATA]Flags    [STRING]Name     [  BLOB]Signature [RID: 8]ParamList

   1.[  0x  ]2050     [  0x  ]0         [  0x  ]86       [  0x  ]51       [  0x  ]A         [RID: 8] 1       

   2.[  0x  ]2064     [  0x  ]0         [  0x  ]1886     [  0x  ]55       [  0x  ]10        [RID: 8] 3       

   3.[  0x  ]2078     [  0x  ]0         [  0x  ]96       [  0x  ]5B       [  0x  ]14        [RID: 8] 3       

   4.[  0x  ]20C8     [  0x  ]0         [  0x  ]1886     [  0x  ]55       [  0x  ]10        [RID: 8] 4       

 

You can see the method names Add, Main etc. here, as well as auto-generated methods such as .ctor (the constructor).

 

In short, if you examine the metadata headers carefully you get an idea of the extent of meta information being stored about your program, and of what any reflection based API can support. A complete listing of the metadata tables is provided in the ECMA specs, so that is your best reference. Mr Lidin’s book also does a good job of this, and I used both while I was writing trashbin.

 

So there – I have said what I wanted to say about trashbin. This might be a good starting point to explore the exe and dll file formats.

 

Before I stop, I must add, do take a look at the switches

        /exp     display export table

        /imp     display import table

 

these show the functions that an exe/dll exports (so that it may be dynamically loaded and invoked from another process) and the functions that it imports from other modules. This is kind of the ‘metadata’ of the old unmanaged world.

 

Typing trashbin ntdll.dll /exp in the system32 folder will give you a listing of the native NT api, which might make for good reading again (if that sort of stuff gets your interest).

 

Most of the content of this blog entry was presented at the Bangalore .Net User group as one of the UG meeting talks.

 

 

 

All that said, here is the real reason I started out on this blog entry: Recently Kaushik Srenevasan pointed out a bug in trashbin’s #Strings parsing routine. Kaushik is a ‘MS Student Ambassador’ and has a blog here: http://dotnetjunkies.com/WebLog/kaushik/

 

I had, for some reason, assumed the presence of padding 0s when I wrote the original code. That more or less always worked, because the #Strings stream was expected to be padded with zeros up to a 4 byte boundary.

 

It has been patched according to Kaushik’s suggestion, so thank you, Kaushik.
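For the curious, this is roughly what a padding-agnostic walk over a #Strings style heap can look like. It is a minimal sketch in C under my own assumptions, not the patched trashbin routine: the name heap_offsets and its output array are made up for illustration. The point is only that the loop is bounded by the stream size and treats trailing zero bytes as optional.

```c
#include <stddef.h>
#include <string.h>

/* Records the start offset of every non-empty string in the heap and
   returns how many were found. Zero bytes between or after strings
   (the empty string at offset 0, or alignment padding) are skipped,
   but nothing here requires padding to be present. */
size_t heap_offsets(const unsigned char *heap, size_t heap_size,
                    size_t *offsets, size_t max_offsets)
{
    size_t count = 0;
    size_t pos = 0;

    while (pos < heap_size) {
        const unsigned char *end;

        if (heap[pos] == '\0') {        /* empty string or padding byte */
            pos++;
            continue;
        }
        end = memchr(heap + pos, '\0', heap_size - pos);
        if (end == NULL)                /* truncated last string: stop here */
            break;
        if (count < max_offsets)
            offsets[count] = pos;
        count++;
        pos = (size_t)(end - heap) + 1; /* continue after the terminator */
    }
    return count;
}
```

The same function gives the same answer whether the stream ends right after the last terminator or carries a few extra zero bytes, which is the property the original code was missing.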

 

 

 

Trashbin can be downloaded from its homepage here:

http://www.thinkingms.com/pensieve/homepage/work/trashbin/trashbin.htm

 

 

As a footnote I must say that the Microsoft Dumpbin program that ships with Visual Studio does most of the things that trashbin does and a few additional things. Dumpbin however does not display metadata information from managed PE files yet.

 

 

Tuesday, May 18, 2004 7:20:34 AM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  | 
 Thursday, May 13, 2004

It’s good to get back to blogging after what seems like a long break. I have been rather busy of late. Life or the lack of one, has been blowing out of proportion to take up the remaining time. I thought I’d write about something different this time than my usual languages hoopla.

 

After much reluctance on my part I have joined the personalized wireless world and have succumbed to adding a tracking device to myself. Now I can be tracked, examined, demanded and terrorized. I have a mobile phone (yes people, I haven’t had one before and this is my first). Such are the pleasures of life.

 

 

 

The Samsung C100

Of course buying a Samsung C100 these days, might be considered incredibly caveman-like in certain circles and worlds, but I still belong to an utterly insignificant little blue-green planet (some of) whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.

 

The Samsung C100 is a lovely phone and for its price (which was only 6.6k for me) it is a rather good deal. There are however some things I don’t like too much about the C100. I have been with it only for a short time, so there’s a lot more I need to know. This is what I know of right now – I might be wrong, in which case, expect a comment below in due time. Also, drop me a comment if you know better.

 

Things I Don’t Like

·         There is no MMS support

·         The battery seems to take a long time to charge (fully, from near empty). It’s been plugged in for 2+ hours now. Aaah, it’s done.

·         I can’t seem to find an option to add words to the T9 dictionary! I don’t think there is such an option. That’s pretty bad.

·         Cannot find a way to mass transfer content from Phone memory to SIM and vice versa.

·         Since the C100 has only an IrDA port, I need a 30+ MB piece of software from Samsung to work with it. I am yet to understand what sort of cable can work here.

·         The battery doesn’t last too long – a little over a day. I don’t actually blame the phone or the battery too much for this. The phone is very feature rich and has a really good display and sound quality, both of which I think eat up the battery real fast. To add to that I just can’t seem to stop tinkering with it – so that’s where the battery drains out.

 

What’s actually bothering me is that Samsung has stopped making this model (which is what I heard from the dealer) and has moved on to a similar but more expensive X100 phone. Maybe that could mean that the issues are fixed only in the X100 and they are left open issues for the C100.

 

I really wonder what sort of file system structure this device has and if I can programmatically interface to it somehow. There seem to be lots of questions about how to get things working on this phone on sites like this one:

Samsung SGH-C100

http://www.techtree.com/techtree/jsp/showstory.jsp?storyid=3781

 

 

 

WML anyone?

That said, let me come to why I am actually writing this. I was fairly excited about the fact that I can browse on this phone. The phone has a 65k color screen with a 128*128 resolution and a good pixel density making for good viewing.

 

The Hutch corporate connection I am on charges 100 bucks per month for unlimited GPRS access, which seemed pretty neat. The only caveat was that downloads may be charged according to the web site.

 

I didn’t make much of the ‘downloads can be charged’ stuff and visited hutchworld, which is the Hutch GPRS homepage. Hutch world has a collection of goodies: wallpapers, ringtones, games etc. The games were priced at 50 bucks each so that didn’t seem very nice, especially considering that I would be charged whether the game worked or not.

 

The wallpapers I found interesting. There seemed to be no price mentioned. So I went about trying various wallpapers. Some fit my screen; some were too large and so on. The fun continued till I had downloaded about two dozen of these (after much patience, because this is a real slow connection), when I scrolled down to the bottom of the page.

 

There at the very bottom was a strategically placed link that said ‘cost’. The cost link said – all wallpapers shall be charged at rupees 10 each. What ????

 

I would have never touched those things with a pole at that price, ok. So now I easily owed Hutch about 200 bucks or more for absolutely nothing useful that I can think of. Come to think of it, these are small 128*128 size (approx) images which are hardly a few KB and most look corny. Why would anyone want to pay 10 bucks for that?

 

So here is the moral of the story – don’t go about downloading ‘wallpapers’ from the hutch network. Or as I was about to learn – don’t go about downloading content from any website – you’ve got to look very patiently before you notice someplace where they say how much you are being charged.

 

But why were these websites charging so much for almost meaningless content? Was it such a big deal to be able to dish out content over a GPRS network?

 

I wanted to take a jab at it. Since I had heard of MMIT (the Mobile Internet Toolkit) for ASP.net I figured that it would be rather simple to do my own site that dishes out WML content. But unfortunately I didn’t have any site that hosted MMIT at my disposal so I was stuck.

 

So in my infinite wisdom I decided to try doing WAP/WML in plain ASP.net. It was easily done. One of the time consuming parts was figuring out that I had to set the Content-Type HTTP header in the response to text/vnd.wap.wml. This little fact I did not find documented anywhere.

 

I wrote a single aspx page that contained only C# code that would generate WML content.

I am pasting the code here, because some of you may find it useful and might want to host the code on your own servers. This page can be pointed at one of the folders on your web-server where you want to provide content that can be accessed via your WAP enabled phone. The page lets you view the contents of the folder as well as browse through any subfolders, much as Explorer lets you browse folders.

 

(a space has been added after every angular bracket in the code below deliberately – because this blog engine has some issues with tags being displayed).

 

< %@ Page Language="C#" Debug="true" %>

< %

          string wml = @"< ?xml version=""1.0""?> < !DOCTYPE wml PUBLIC ""-//WAPFORUM//DTD WML 1.1//EN"" ""http://www.wapforum.org/DTD/wml_1.1.xml"">< wml>< card title=""File Browser"">{0}< /card>< /wml>";

        string up_content = @"< p>< a href=""list.aspx?path={0}"">up< /a>< /p>";

        string dir_content = @"< p>< b>Subfolders< /b>< /p>< p>< i>{0}< /i>< /p>";

          string gif_content = @"< p>< b>Available GIFs< /b>< /p>< p>< i>{0}< /i>< /p>";

          string entry = @"< a href=""{0}"">{1}< /a>< br/>";

          string content="";

         

        string server_path = "content/w/";

        string sub_path = Request.QueryString["path"];

        string new_path = server_path;

        bool up = false;

       

        if(!(sub_path == null || sub_path == ""))

        {

                new_path = server_path + sub_path;

                up = true;

                // sub_path is left untouched: the subfolder links and the "up" link below are built from it

        }

 

        //////////////////////////////////////////

        //This builds the GIF specific content

        content = "";

          string[] files = System.IO.Directory.GetFileSystemEntries(Server.MapPath(new_path),"*.gif");

        if(files.Length != 0)

        {

                foreach(string file in files)

                        content += String.Format(entry,

                                new_path+System.IO.Path.GetFileName(file),

                                System.IO.Path.GetFileNameWithoutExtension(file));

                gif_content = String.Format(gif_content, content);

        }

        else

                gif_content = @"< p>No GIFs< /p>";

 

        //////////////////////////////////////////

        //This builds the DIR specific content

        content = "";

          files = System.IO.Directory.GetDirectories(Server.MapPath(new_path));

        if(files.Length != 0)

        {

                foreach(string file in files)

                        content += String.Format(entry,

                                "list.aspx?path="+sub_path+System.IO.Path.GetFileName(file)+"/",

                                System.IO.Path.GetFileName(file));

                dir_content = String.Format(dir_content, content);

        }

        else

                dir_content = "";

       

        ///////////////////////////////////////////////////

        //Up the Dir tree link

        if(up)

        {

                int prev = 0;

                char[] arr = sub_path.ToCharArray();

                for(int i = 0; i < arr.Length - 1; i++)

                        if(arr[i] == '/')

                                prev=i+1;

                string path = sub_path.Substring(0,prev);

                if (path.Length == 0)

                        up_content = @"< p>< a href=""list.aspx"">up< /a>< /p>";

                else

                        up_content = String.Format(up_content,path);

        }

        else

                up_content = "";

        

        //////////////////////////////////////////

        //Summarise

          string wml_content = up_content + dir_content + gif_content;

          wml = String.Format(wml,wml_content);

 

        //////////////////////////////////////////

        //Write back

          Response.ContentType="text/vnd.wap.wml";

          Response.Charset = "";

          Response.Write(wml);

          Response.End();

%>

 

To deploy this onto your own web-server simply copy it into some vdir. Save the page as list.aspx, since the links it generates point back to that name. The server needs to be ASP.Net enabled of course. You can change the server path (the server_path variable near the top of the code, set to "content/w/" above) to point to some folder where you have provided content you want to browse and download from your phone.
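The only mildly fiddly bit in the page is working out where the ‘up’ link should point. Here is that step on its own as a minimal sketch in C. It assumes, as the page does, that every sub path ends in a '/'; the function name parent_path and the sample paths are mine and purely illustrative.

```c
#include <stddef.h>
#include <string.h>

/* Writes the parent of `sub_path` (expected to end in '/') into `out`.
   "wall/cars/" gives "wall/", and "wall/" gives "" (the root).
   Returns 0 on success, -1 if `out` is too small. */
int parent_path(const char *sub_path, char *out, size_t out_size)
{
    size_t len = strlen(sub_path);
    size_t keep = 0;
    size_t i;

    /* Stop one character early so the trailing '/' is not counted;
       `keep` ends up just past the last '/' before it. */
    for (i = 0; i + 1 < len; i++)
        if (sub_path[i] == '/')
            keep = i + 1;

    if (keep + 1 > out_size)
        return -1;
    memcpy(out, sub_path, keep);
    out[keep] = '\0';
    return 0;
}
```

An empty result means the parent is the top folder, which is the case where the page links to plain list.aspx without a path parameter.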

 

With a little bit of tinkering I figured that the Samsung C100 accepts images as GIF types. The screen display area for images is about 128*100, so just about any GIF of that dimension should display fine on this phone. I am yet to work out why animated GIFs don’t work and how the animated content on my phone works.

 

Of course all of this may not seem very interesting to someone who has done all this before or has found documentation about this – but if you haven’t then it will save you lots of trial and error time.

 

I could post a link to a URL to a WAP site I have up now, but that has some sensitive content. So I will set up something a little more generic and post a URL here so that people with Samsung C100 phones can pull off some content. (Of course I will figure out a way by which you will have to leave ‘thank you’ comments on my blog per download you make . . . kidding).

 

Having done this much, a thought bothered me – it is understandable how hutch can charge me, I am their customer and it is their network so there must be some easy way to identify who is downloading what. But what about the open network out there? How can people on a random server out there identify me so that they can send a bill that will be added up on my hutch monthly bill?

 

I called up the Hutch customer care line and the lady at the other end of the line had no clue how this happened. All she could say was that ‘all websites know who you are’ and that all download bills will show up on your monthly bill.

 

Hmm… this ‘all websites know who you are’ part didn’t sound too comfy. How could they know who I was? When I visit a website with a browser there is no way they can tell who I am. So how can they know in this case?

 

Which caused me to write this:

 

(a space has been added after every angular bracket in the code below deliberately – because this blog engine has some issues with tags being displayed).

 

< %@ Page Language="C#" Debug="true" %>

< %

          string wml = @"< ?xml version=""1.0""?> < !DOCTYPE wml PUBLIC ""-//WAPFORUM//DTD WML 1.1//EN"" ""http://www.wapforum.org/DTD/wml_1.1.xml"">< wml>< card title=""Debug"">{0}< /card>< /wml>";

        string hdr_content = @"< p>< b>Headers< /b>< /p>< p>< i>{0}< /i>< /p>";

          string entry = @"{0} = {1}< br/>";

        string content="";

       

        foreach(string key in Request.Headers.AllKeys)

                content += String.Format(entry,key,Server.HtmlEncode(Request.Headers[key]));

          hdr_content = string.Format(hdr_content,content);

       

        //////////////////////////////////////////

        //Summarise

          wml = String.Format(wml,hdr_content);

 

        //////////////////////////////////////////

        //Write back

          Response.ContentType="text/vnd.wap.wml";

          Response.Charset = "";

          Response.Write(wml);

          Response.End();

%>

 

This code, as would be obvious to ASP.net folk, simply returns all the HTTP headers. I set this up on my site and when I visited this from the phone – surprise! This is what the headers contained:

 

Connection = close

Via = Jataayu CWS Gateway 3.0.0

Accept = text/vnd.wap.wml, text/vnd.wap.wmlscript, image/vnd.wap.wbmp, application/vnd.wap.wmlc, application/vnd.wap.wmlc, application/vnd.wap.wmlc, application/vnd.wap.wmlc, application/vnd.wap.wmlc, application/vnd.wap.wmlscriptc, application/vnd.wap.multipart.related, application/vnd.wap.multipart.mixed, application/x-up-device, application/vnd.phonecom.mmc-wbxml, application/vnd.phonecom.mmc-wbxml, application/vnd.phonecom.im, application/octet-stream, application/vnd.openwave.pp, application/vnd.wap.sic, application/vnd.wap.slc, application/vnd.wap.coc, application/vnd.uplanet.bearer-choice-wbxml, image/vnd.wap.wbmp, image/png, image/gif, application/x-mmc.wallpaper, application/x-mmc.wallpaper, application/x-mmc.picture, application/x-mmc.picture, application/x-mmc.ringtone, text/vnd.sun.j2me.app-descriptor, application/java-archive, application/vnd.smaf, */*

Accept-Charset = utf-8

Accept-Language = en

Host = www.thinkingms.com/pensieve

User-Agent = SEC-SGHC100G/1.0 UP.Browser/5.0.5.1 (GUI)

X-MSISDN = 9198867xxxxx (-- my number was here)

X-Network-Info = UDP, 10.16.2.135

 

While a request from my browser would look like this:

 

Cache-Control = no-cache

Connection = Keep-Alive

Accept = */*

Accept-Encoding = gzip, deflate

Accept-Language = en-us

Cookie = portalroles=C7A51492F7A60F5D8E518..(truncated)
Host = www.thinkingms.com/pensieve

User-Agent = Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322)

 

The request from the phone actually seems to be sending out a lot of data – most of this data, I would expect, does not originate from the phone (does it?) but could be added by the WAP gateway of Hutch. I really need to read up some more on these standards.

 

The request from the browser (in this case IE 6) sends out no personally identifiable data, whereas the request from my phone has my phone number in it!!! (I have edited the actual number in the display.) Creeps! So much for privacy in the wireless world. To put this in context – every single website you visit through this gateway can actually get your personal phone number. I really need to see how this information can be used in regards to other data.
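To show how little effort it takes a site to pick that number up, here is a minimal sketch in C that pulls one header out of a dump in the same ‘Name = value’ layout as the listings above. The function names and the dump format are my own assumptions for illustration; a real server would read the header straight from the request object rather than from text.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of the first n bytes
   (HTTP header names are not case sensitive). */
static int same_name(const char *a, const char *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Looks up `name` in a dump of "Name = value" lines and copies the
   value into `out`. Returns 1 if found and it fits, 0 otherwise. */
int header_value(const char *dump, const char *name,
                 char *out, size_t out_size)
{
    size_t name_len = strlen(name);
    const char *line = dump;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t line_len = eol ? (size_t)(eol - line) : strlen(line);

        /* A match is the name followed immediately by " = ". */
        if (line_len >= name_len + 3 &&
            same_name(line, name, name_len) &&
            memcmp(line + name_len, " = ", 3) == 0) {
            size_t value_len = line_len - name_len - 3;
            if (value_len + 1 > out_size)
                return 0;
            memcpy(out, line + name_len + 3, value_len);
            out[value_len] = '\0';
            return 1;
        }
        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return 0;
}
```

Asking this for X-MSISDN on the phone's headers returns the number, while the same call on the browser's headers finds nothing, which is the whole difference in a nutshell.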

 

Looking at the headers also gives lots of other valuable information, such as the model of my phone and its browser, and what content types it can accept. This last one is very interesting: notice the phone can accept GIF as well as PNG (among other things).
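A site could use the Accept header to decide which image format to send. The following is a minimal sketch in C of such a check, written under simplifying assumptions that I should state loudly: the name accepts_type is made up, the comparison is case sensitive, parameters such as ;q=0.5 are ignored, and wildcards like */* are not expanded.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `type` (for example "image/png") appears as one of the
   comma-separated entries of an Accept header value, 0 otherwise. */
int accepts_type(const char *accept, const char *type)
{
    size_t type_len = strlen(type);
    const char *p = accept;

    while (*p != '\0') {
        size_t len, trimmed;

        while (*p == ' ' || *p == ',')      /* skip separators */
            p++;
        if (*p == '\0')
            break;
        len = strcspn(p, ",;");             /* entry ends at ',' or ';' */
        trimmed = len;
        while (trimmed > 0 && p[trimmed - 1] == ' ')
            trimmed--;                      /* drop trailing spaces */
        if (trimmed == type_len && strncmp(p, type, type_len) == 0)
            return 1;
        p += len;
        p += strcspn(p, ",");               /* skip any ";q=..." parameters */
    }
    return 0;
}
```

Run against a shortened version of my phone's header, this says yes to image/gif and image/png and no to image/jpeg, which agrees with what I found by trial and error.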

 

I will stop this blog entry at that note. Hopefully in the future I will have more to write about my steps into the wireless world and about the C100. If there are reasons why this whole thing is naïve or obvious information – please do point me at the right documentation, I presently have none. I recommend that you take things here with a grain of salt.

 

You can download the aspx files from here.

 

Foot Note:

I just got Pandu, who has a Reliance phone (that runs on the CDMA network), to access the debug aspx page. This is what his headers looked like:

 

Via = Jataayu CWS Gateway

Accept = text/vnd.wap.wml, image/png, image/gif, application/vnd.wap.wmlscript, image/vnd.wap.wbmp, image/bmp

Accept-Charset = iso-8859-1, utf-8

Host = www.thinkingms.com/pensieve

User-Agent = jBrowser/J2ME Profile/MIDP-1.0 Configuration/CLDC-1.0

Proxy-Connection = Close

X-Client-IP = 97.243.29.205

X-Rapmin-Id = 1067263483

 

His IP address seems to be a constant and there seems to be no direct ID available here (wonder what the X-Rapmin-Id is).

 

Thursday, May 13, 2004 7:26:19 PM (Eastern Standard Time, UTC-05:00)  #    Comments [15]  | 
 Tuesday, May 11, 2004
  Today 

Today life has changed for ever.
Or the converse.

I've got to stop arguing for my limitations now. Because everytime I do that, I win. And then I lose.
I need to change the limitations I believe in and I've just got to start doing better now. There were so many things in my mind - and now it's just silence again, maybe I am not ready.

The losses of tomorrow can be much worse than today - I hope I am bracing myself for them. I think I need to go out and buy a book now.

Tuesday, May 11, 2004 7:49:29 AM (Eastern Standard Time, UTC-05:00)  #    Comments [6]  | 
  Life 

node * reverse(node * root)
{
        node *t1 = root, *t2 = 0, *p = 0;
        if(root) {
                t2 = root -> next;
                t1 -> next = 0;       
        }
        while(t2){
                p = t2 -> next;
                t2 -> next = t1;
                t1 = t2;
                t2 = p;
        }
        return t1;
}

This will terminate even if the list is a loop.

“Shall we write some code?” and I remember thinking in my mind that this will be the easy part.

 

He had already done this with his statement that lack of knowledge of what Quality is constitutes incompetence. It's an old rule of logic that the competence of a speaker has no relevance to the truth of what he says, and so talk of incompetence was pure sand.

Robert M. Pirsig
Zen and the Art of Motorcycle Maintenance

Tuesday, May 11, 2004 2:39:43 AM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  | 
 Friday, May 07, 2004

Hi Folk, Pooja is away on a vacation to Rajasthan.

 

"am leaving for a vacation trip today. Its after 8+ years  that I am going to be traveling to Rajasthan, am quite excited. I am back on 17th May. I am going to Gujarat, from there to Udaipur. On the way back will be visiting Nadwara and Bombay.

 

Will have lots to say when I am back. I wonder how life is going to be without a computer for 2 weeks at a stretch :)). I wish I had a digital cam, that is one of the things I am going to get soon after I am back.

 

Adios!"

 

(abridged from another short opinionated big-nosed man of French descent)

I don't know about other countries, but here in India, we have a ritual we call "LOTAing."

 

When you "LOTA" something, it means that you cover it in toilet paper(umm.... )... wash the floor with a small lota of water. Typically, you would "LOTA" the house of one of your friends for a practical joke, although you might also just "LOTA" the house of the codger down the street who always tells you not to try to chew gum while walking and carrying a loaded machine gun with the safety off at the same time (some people are really uptight).

 

Anyway, Pooja is currently out of town and she does not have access to a computer, which means that her blog is wide-open for a good "LOTAing."

 

I was thinking we could "LOTA" the comments section of this post.

Just so that you don't have any excuse not to, here's a link to the page where you can post your "LOTA" comment (just write something like “LOTA!“ as the comment).

 

Also, you're probably thinking that this is pretty immature.

 

Well, you're bloody-well right. But do it anyway :) It'll be fun... It isn't often that you get a chance to LOTA the comments section of one of The Greats.

cheers!

 

Friday, May 07, 2004 12:56:14 AM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  | 
 Monday, May 03, 2004

Abilon

Found Abilon today, a nice RSS viewer and more. I have shifted from Beaver to Abilon for good (I think).

http://www.activerefresh.com/abilon/

 

A Benchmark I find very hard to believe

Here is a benchmark that shows C# to be faster than gcc C (???)

http://www.osnews.com/story.php?news_id=5602

 

               int math  long math  double math   trig    I/O    TOTAL
Visual C++          9.6       18.8          6.4    3.5   10.5     48.8
Visual C#           9.7       23.9         17.7    4.1    9.9     65.3
gcc C               9.8       28.8          9.5   14.9   10.0     73.0
Visual Basic        9.8       23.7         17.7    4.1   30.7     85.9
Visual J#           9.6       23.9         17.5    4.2   35.1     90.4
Java 1.3.1         14.5       29.6         19.0   22.1   12.3     97.6
Java 1.4.2          9.3       20.2          6.5   57.1   10.1    103.1
Python/Psyco       29.7      615.4        100.4   13.1   10.5    769.1
Python            322.4      891.9        405.7   47.1   11.9   1679.0

 

 

Altair BASIC programming language

This still sounds so fairytale like

http://encyclopedia.thefreedictionary.com/Altair%20BASIC%20programming%20language

 

 

Informal Language Comparison Chart(s)

This is something rather interesting to have a link to. Follow the link to see comparisons of several programming languages in all sorts of ways. If you are a languages/runtimes connoisseur then this is a highly recommended link to try.

http://www.smallscript.org/Language%20Comparison%20Chart.asp

 

Defining the Game (Miguel De Icaza)

This is a link to a blog entry by Miguel De Icaza – coauthor of MC, Mono, Gnome. It is nice, even refreshing, to see open source in this form: nothing better than a good technical argument with both feet on the ground. And I do appreciate this man for giving credit to MS where credit is due and yet being willing to stand his ground and compete at the risk of losing.

http://primates.ximian.com/~miguel/archive/2004/Apr-24.html

 

Joel Pobar’s Blog

Joel Pobar’s blog is a good mature resource for info about Rotor, the CLR and other stuff. If you have an interest in Rotor I recommend that you take a look.

http://blogs.msdn.com/joelpob/

 

 

Monday, May 03, 2004 6:06:17 AM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  | 
 Saturday, May 01, 2004

This is about a small tool I have been using for a while now to let me talk to the WMI (Windows Management Instrumentation) object model exposed by the Windows operating system. WMI is used for various management related tasks and is a very powerful API.

 

I wanted to write the script at one sitting (and not a long one) and so a scripting language was an obvious choice. This could also have been written in C++ or C# or any COM interface aware language. I started off with a Ruby version, but decided to quit on that because the Ruby libraries for interfacing with OLE were not very stable when it came to WMI programming. I believe that is being fixed now. Perl was a good choice because it was sufficiently dynamic and, more importantly, it had stable libraries. The source here can be readily translated to VBScript or JScript versions also. If you do so, drop me a mail.

 

I had mailed about this script to the user groups in the past, but I got a feeling that I had done no justice to what this script and WMI could actually achieve. Hence this article.

 

Download and Docs

You can download the script here, or simply copy paste it and save it from this article. To make this work, you need to have Perl on your machine. Perl can be downloaded for free from here.

 

If you are new to WMI and want to have a good understanding of what is available I suggest you take a look at:

·         WMI Scripting Primer Part 1 Part 2 Part 3

·         The MSDN site on scripting is a good resource: http://msdn.microsoft.com/scripting

·         Scriptomatic: playing around with this HTA application for auto-generating WMI scripts is also lots of fun

·         My previous blog entry Bangalore User Group – WSH, WMI and System.Management has some beginner level scripts that I had demoed at one of the UG meetings.

 

What can you do with wmi.pl ?

 

The following is a brief description of what you can do with the wmi.pl.

 

Listing Instances

> wmi.pl

Lists all the running processes on your computer. The output looks a little like this:

D:\scr>wmi.pl

1       System Idle Process

2       System

3       SMSS.EXE

4       CSRSS.EXE

5       WINLOGON.EXE

6       SERVICES.EXE

 

Specifying properties to display

> wmi.pl -executablepath

Lists all the processes as well as their physical paths on disk.

D:\scr>wmi.pl -executablepath

1       System Idle Process

2       System

3       SMSS.EXE        C:\WINNT\System32\smss.exe

4       CSRSS.EXE

5       WINLOGON.EXE    C:\WINNT\system32\winlogon.exe

6       SERVICES.EXE    C:\WINNT\system32\services.exe

7       LSASS.EXE       C:\WINNT\system32\lsass.exe

8       svchost.exe     C:\WINNT\system32\svchost.exe

 

> wmi.pl -< property name >

Lists all the processes with the specified property of the process displayed. This requires some explanation. The processes I refer to here are instances of the WMI class called Win32_Process. You can look at the documentation for the class to see what properties this class supports. Or you can use some of the reflection capabilities of WMI, which I discuss in the next usage sample. The ‘executablepath’ that was shown earlier was one such property. There are several others that the Win32_Process class supports.

This shows the process id of each process as well as how many threads each process currently has. Powerful?

D:\scr>wmi.pl -processid -threadcount

1       System Idle Process     0       1

2       System  8       46

3       SMSS.EXE        152     6

4       CSRSS.EXE       176     12

5       WINLOGON.EXE    172     19

6       SERVICES.EXE    224     37

7       LSASS.EXE       236     18

8       svchost.exe     420     12

9       spoolsv.exe     452     11
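If you think in queries, a command line like the one above corresponds roughly to a WQL statement of the form SELECT ... FROM Win32_Process. The sketch below, in C, only illustrates that mapping; it is not how wmi.pl itself is written, and the names build_wql and append are made up for the example. The WQL syntax it emits (a property list or *, then FROM and a class name) is the standard one.

```c
#include <stddef.h>
#include <string.h>

/* Appends `text` to `out`, tracking the used length.
   Returns 0 on success, -1 if it would not fit. */
static int append(char *out, size_t out_size, size_t *used, const char *text)
{
    size_t len = strlen(text);
    if (*used + len + 1 > out_size)
        return -1;
    memcpy(out + *used, text, len + 1);   /* copies the terminator too */
    *used += len;
    return 0;
}

/* Builds "SELECT <props> FROM <class>" in `out`. With no properties
   it selects "*". Returns 0 on success, -1 if `out` is too small. */
int build_wql(const char *class_name, const char *const *props,
              size_t n_props, char *out, size_t out_size)
{
    size_t used = 0;
    size_t i;

    if (append(out, out_size, &used, "SELECT ") != 0)
        return -1;
    if (n_props == 0 && append(out, out_size, &used, "*") != 0)
        return -1;
    for (i = 0; i < n_props; i++) {
        if (i > 0 && append(out, out_size, &used, ", ") != 0)
            return -1;
        if (append(out, out_size, &used, props[i]) != 0)
            return -1;
    }
    if (append(out, out_size, &used, " FROM ") != 0)
        return -1;
    return append(out, out_size, &used, class_name);
}
```

Passing Name, ProcessId and ThreadCount with the class Win32_Process produces SELECT Name, ProcessId, ThreadCount FROM Win32_Process, which asks WMI for the same three columns that the listing above shows.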

 

Examining Classes for Properties and Methods

> wmi.pl ?

Lists all the properties and methods available for the process class. This uses the reflection capabilities of WMI. It examines the Win32_Process class and tells you what properties and methods the class provides. So you can use the information retrieved to know what property name parameters you can use. The output here is edited; the Win32_Process class supports many more properties than shown here.

D:\scr>wmi.pl ?

Win32_Process Properties : -----------------------------

1       Caption

2       CreationClassName

3       CreationDate

4       CSCreationClassName

5       CSName

6       Description

7       ExecutablePath

8       ExecutionState

44      WriteTransferCount

Win32_Process Methods : -----------------------------

1       Create

2       Terminate

3       GetOwner

4       GetOwnerSid

So it is easy to see that ‘wmi.pl -creationdate’ is a valid query. Exploring this API can give you lots of useful information. I shall shortly show you what we can do with the methods.

 

Displaying all the Properties of a Class

> wmi.pl -*

This will list all available information about every process on the system. Information such as memory usage, kernel usage, thread count etc. is available. In case you want to see a lot of information about every process, the tabular view (generated by specifying property names in the command line) may be inadequate. The output shown here is again highly truncated.

51

 Caption = perl.exe

 CreationClassName = Win32_Process

 CreationDate = 20040430233435.908750+330

 CSCreationClassName = Win32_ComputerSystem

 CSName = RMZ10F01

 Description = perl.exe

 ExecutablePath = D:\RoshanJ\Progs\perl\bin\perl.exe

 ExecutionState =

 Handle = 3368

 HandleCount = 106

 PeakWorkingSetSize = 5586944

 Priority = 8

 PrivatePageCount = 2265088

 ProcessId = 3368

 QuotaNonPagedPoolUsage = 4780

 QuotaPagedPoolUsage = 20416

 QuotaPeakNonPagedPoolUsage = 5176

 QuotaPeakPagedPoolUsage = 20420

 ReadOperationCount = 54
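The CreationDate value in this listing deserves a note. WMI reports dates in the CIM datetime layout yyyymmddHHMMSS.mmmmmmsUUU, where the last part is a signed offset from UTC in minutes, so +330 here means UTC+05:30. Here is a minimal sketch in C of taking such a value apart; the struct and function names are my own, and the sketch does not handle the wildcard or interval forms the format also allows.

```c
#include <stdio.h>
#include <string.h>

/* Fields of a CIM datetime value (names are hypothetical). */
struct cim_time {
    int year, month, day, hour, minute, second;
    long microsecond;
    int utc_offset_minutes;   /* signed, minutes east of UTC */
};

/* Parses "yyyymmddHHMMSS.mmmmmmsUUU" into `out`.
   Returns 1 on success, 0 if the text does not have that shape. */
int parse_cim_datetime(const char *text, struct cim_time *out)
{
    char sign;
    int offset;

    if (strlen(text) != 25)
        return 0;
    if (sscanf(text, "%4d%2d%2d%2d%2d%2d.%6ld%c%3d",
               &out->year, &out->month, &out->day,
               &out->hour, &out->minute, &out->second,
               &out->microsecond, &sign, &offset) != 9)
        return 0;
    if (sign != '+' && sign != '-')
        return 0;
    out->utc_offset_minutes = (sign == '-') ? -offset : offset;
    return 1;
}
```

For the value above this gives 30 April 2004, 23:34:35 local time, with an offset of 330 minutes.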

 

 

Specifying classes to use

> wmi.pl < class name>

So far we have seen that wmi.pl retrieves information about the Win32_Process class. What about all the other information that WMI exposes? The Perl script basically has Win32_Process hard coded as the default class to use. To get all the behavior that you saw above for some other class (other than Win32_Process), simply specify the name of that class at the command line.

 

Doing so will cause the rest of the parameters to act on the given class name. For example Win32_Share is the class name for shares on your computer. So ‘wmi.pl win32_share’ will list all the shares and ‘wmi.pl win32_share -path’ will list the share names and their physical paths on your computer.

D:\scr>wmi.pl win32_share

2       System.DDL

4       ut

7       dotnet-talk

8       install

10      ftproot

and (notice we pass -path here, so ‘path’ must be a property of win32_share)

D:\scr>wmi.pl win32_share -path

2       System.DDL      D:\RoshanJ\Homepage\work\System.DDL

4       ut      D:\ut

7       dotnet-talk     D:\dotnet-talk

8       install D:\RoshanJ\install

10      ftproot C:\Inetpub\ftproot

To take a look at what Win32_Share provides

D:\scr>wmi.pl win32_share ?

win32_share Properties : -----------------------------

1       AccessMask

2       AllowMaximum

3       Caption

4       Description

5       InstallDate

6       MaximumAllowed

7       Name

8       Path

9       Status

10      Type

win32_share Methods : -----------------------------

1       Create

2       SetShareInfo

3       Delete

 

Win32_Service represents services on your computer, Win32_LogicalDisk represents drives on your system, and so on. All available properties of Win32_LogicalDisk may be queried by typing ‘wmi.pl win32_logicaldisk ?’, just like the sample above with Win32_Share. Simple?

 

Now you may ask how you can find out which classes are available. Read on.

 

Examining Namespaces for Classes and Namespaces

> wmi.pl dir

In WMI, classes are organized into namespaces. The script wmi.pl is configured to use the namespace ‘root/cimv2’ by default. The above line will list all the classes and nested namespaces in a namespace. This is a rather huge list, so you may want to save it to a text file. Once you find a class that you think is useful, you can query information about it by specifying the class name and using the ‘?’ parameter.

D:\scr>wmi.pl dir

root/cimv2 Classes : -----------------------------

441     \\RMZ10F01\ROOT\CIMV2:Win32_DeviceBus

442     \\RMZ10F01\ROOT\CIMV2:Win32_CIMLogicalDeviceCIMDataFile

443     \\RMZ10F01\ROOT\CIMV2:Win32_ShareToDirectory

444     \\RMZ10F01\ROOT\CIMV2:Win32_NetworkAdapterConfiguration

445     \\RMZ10F01\ROOT\CIMV2:Win32_NetworkAdapterSetting

446     \\RMZ10F01\ROOT\CIMV2:Win32_PortableBattery

447     \\RMZ10F01\ROOT\CIMV2:Win32_SystemSlot

448     \\RMZ10F01\ROOT\CIMV2:Win32_PortConnector

449     \\RMZ10F01\ROOT\CIMV2:Win32_PhysicalMemory

450     \\RMZ10F01\ROOT\CIMV2:Win32_SystemEnclosure

451     \\RMZ10F01\ROOT\CIMV2:Win32_BaseBoard

Win32_Process Namespaces : -----------------------------

1       Applications

2       ms_409

To redirect content to text file

D:\scr>wmi.pl dir > filename.txt

 

Specifying other Namespaces

> wmi.pl /ns:< namespace> < other parameters>

You can change the namespace in which the script looks for the WMI class you mention by using the /ns: specification. This might sound a little abstract, but consider this: the previous option ‘dir’ showed you only the contents of the ‘root/cimv2’ namespace. If you wanted to see the contents of the ‘root’ namespace, then you could use this feature.

D:\scr>wmi.pl /ns:root dir

root Classes : -----------------------------

1       \\RMZ10F01\ROOT:__SystemClass

2       \\RMZ10F01\ROOT:__NAMESPACE

3       \\RMZ10F01\ROOT:__Provider

4       \\RMZ10F01\ROOT:__Win32Provider

5       \\RMZ10F01\ROOT:__ProviderRegistration

6       \\RMZ10F01\ROOT:__ObjectProviderRegistration

7       \\RMZ10F01\ROOT:__ClassProviderRegistration

8       \\RMZ10F01\ROOT:__InstanceProviderRegistration

9       \\RMZ10F01\ROOT:__PropertyProviderRegistration

Win32_Process Namespaces : -----------------------------

1       DEFAULT

2       SECURITY

3       CIMV2

4       WMI

5       directory

6       MSAPPS10

7       NetFrameworkv1

8       MSAPPS

9       MSAPPS11

As you can see, the root namespace contains a namespace called ‘cimv2’.

 

Here is an example that lists instances of the Win32_OleDbProvider class, which is part of the ‘root/msapps’ namespace.

D:\scr>wmi.pl /ns:root/msapps win32_oledbprovider

1       VSEE Versioning Enlistment Manager Proxy Data Source

2       MediaCatalogDB OLE DB Provider

3       Microsoft OLE DB Provider for SQL Server

4       Microsoft OLE DB Provider for DTS Packages

5       SQL Server Replication OLE DB Provider for DTS

6       MediaCatalogMergedDB OLE DB Provider

7       Microsoft ISAM 1.1 OLE DB Provider

8       Microsoft OLE DB Provider For Data Mining Services

9       MSDataShape

10      VSEE Versioning Enlistment Manager Proxy Data Source

 

Collecting data from remote computers

> wmi.pl @< computer name > [some other parameters]

All the commands that you saw so far were getting information about your computer. If you are on a network, WMI can let you gather such information about a remote computer as well.

Specifying a computer name or IP address preceded by an @ will cause the whole command to run against a remote system. For this to succeed, your currently logged on user must have admin rights on the remote system. Such a security check is essential because WMI can do powerful and potentially dangerous things, such as stopping and starting services and processes, or accessing disk partitions, the BIOS etc.

The following command lists shares on a remote computer and shows you the physical paths of the share on that computer.

D:\scr>wmi.pl @machine01 win32_share -path

22      v0.2_dist       D:\v0.2_dist

23      ecma    D:\rotor_text\ecma

25      rotor_text      D:\rotor_text

26      ruby1.8 D:\RoshanJ\Progs\ruby1.8

27      1.0.0001.0020   D:\blog\blogx\1.0.0001.0020
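
Under the hood, the @ and /ns: options only change the moniker string that the script hands to Win32::OLE->GetObject. Here is a minimal sketch of that mapping, using the same defaults as the script ('.' for the local machine and ‘root/cimv2’). The name wmi_moniker is a hypothetical helper for illustration and does not appear in wmi.pl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of wmi.pl): build the moniker string
# that the script passes to Win32::OLE->GetObject.
sub wmi_moniker {
    my (%opt) = @_;
    my $system    = defined $opt{system}    ? $opt{system}    : '.';           # '.' means the local machine
    my $namespace = defined $opt{namespace} ? $opt{namespace} : 'root/cimv2';  # the script's default namespace
    my $moniker   = "winmgmts://$system/$namespace";
    $moniker .= ":$opt{class}" if defined $opt{class};    # class form, as used for the '?' listing
    return $moniker;
}

print wmi_moniker(), "\n";
print wmi_moniker(system => 'machine01'), "\n";
print wmi_moniker(namespace => 'root/msapps', class => 'win32_oledbprovider'), "\n";
```

The first line printed is the local default, winmgmts://./root/cimv2. The second is what ‘@machine01’ amounts to, and the third combines a /ns: override with a class name.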

 

Calling Methods on WMI classes

wmi.pl terminate()

Will terminate every process on your system (that you have rights to terminate). I am not running this as a demo right now, just take my word for it.

If you remember the output of the ‘?’ parameter, you will recall that it displays the properties and methods available for a class. terminate() is a method of the Win32_Process class, and calling it on a process, as expected, tries to terminate it.

 

wmi.pl  -< property name >=< value > < method name>()

Often we do not want to call a method on every instance of a class. For example, we would not want to terminate() all the processes, but only a set of them. This is when the above syntax comes in useful. It will call the specified method on every instance of the specified class where the given property has the given value.

So if you want to terminate all the instances of Internet explorer that are running on your system you would say:

D:\scr>wmi.pl -name=iexplore.exe terminate()

Query = select * from Win32_Process where   name='iexplore.exe'

0 IEXPLORE.EXE->terminate

1 IEXPLORE.EXE->terminate

2 IEXPLORE.EXE->terminate
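
The ‘Query =’ line in the output shows what happens to the -name=iexplore.exe argument: it becomes a WQL where clause. Here is a small sketch of that step on its own. The name build_query is a hypothetical helper and not the script's own code, and the sketch assumes the values contain no quote characters, since it does no escaping.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of wmi.pl): turn property=value pairs
# into a WQL query. Assumes values contain no quotes (no escaping is done).
sub build_query {
    my ($class, %where) = @_;
    my $query = "select * from $class";
    # Sort the keys so the generated query is the same on every run.
    my @conds = map { "$_='$where{$_}'" } sort keys %where;
    $query .= ' where ' . join(' and ', @conds) if @conds;
    return $query;
}

print build_query('Win32_Process'), "\n";
print build_query('Win32_Process', name => 'iexplore.exe'), "\n";
```

With several pairs, the conditions are joined with ‘and’, so only instances matching every given property are selected.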

 

Similarly if you want to terminate all instances of IE on a remote system ‘machine01’, you could say:

D:\scr>wmi.pl @machine01 -name=iexplore.exe terminate()

Query = select * from Win32_Process where   name='iexplore.exe'

0 IEXPLORE.EXE->terminate

1 IEXPLORE.EXE->terminate

2 IEXPLORE.EXE->terminate

 

Here is another example.

wmi.pl @machine01 win32_operatingsystem reboot()

This will reboot the remote machine. Of course, your current logged on user needs admin permissions on the remote system for this to happen.

 

 

That’s it for now.

I haven’t finished shaking my head at the power of this API that has been ignored to obscurity, and my neck’s beginning to hurt. Try your own stuff and in the meanwhile I will be adding to the script.

 

There are some things I want to add to the script, but haven’t done so out of sheer laziness and the thought of getting back to Perl’s rather arcane syntax.

·         Support for impersonation

·         Support for regular expressions

·         Support for method calls with parameters

·         Support for displaying the CIM class definitions

 

Here is the Perl source (those of you who have used WMI in some way will notice how consistent the API is across languages). How large did you expect the source to be? Since I am a Perl newbie, this code may be substantially clunkier than what experienced hands could write.

 

use Win32;

use Win32::OLE qw (in);

 

$system = ".";

$classname = "Win32_Process";

@props = ("name");

%prop_value=();

$namespace = "root/cimv2";

$call = "instances";

$serial_no = 1;

 

sub list_instances {

       $serv = Win32::OLE->GetObject("winmgmts://$system/$namespace");

       $objs = $serv->InstancesOf("$classname");

       $i = 1;

       foreach $obj (in($objs)) {

              if($serial_no == 1){

                     $str = "$i ";

              }

              else {

                     $str = "";

              }

              if ($all_props) {

                     foreach $prop (in($obj->{Properties_})) {

                           $str = "$str\n $prop->{name} = $obj->{$prop->{name}}";

                     }

                     $str = "$str\n"

              }

              else{

                     foreach $prop (in(@props)) {

                           $str = "$str\t$obj->{$prop}";

                     }

              }

              $str = "$str\n";

              print $str;

              $i = $i + 1;

       }

}

 

sub list_classinfo {

       $obj = Win32::OLE->GetObject("winmgmts://$system/$namespace:$classname");

       print "$classname Properties : -----------------------------\n";

       $i = 1;

       foreach $prop (in($obj->{Properties_})) {

              print "$i\t$prop->{name}\n";

              $i = $i + 1;

       }

       print "$classname Methods : -----------------------------\n";

       $i = 1;

       foreach $m (in($obj->{Methods_})) {

              print "$i\t$m->{name}\n";

              $i = $i + 1;

       }

}

 

sub list_namespaceinfo {

       $serv = Win32::OLE->GetObject("winmgmts://$system/$namespace");

       print "$namespace Classes : -----------------------------\n";

       $i = 1;

       foreach $class (in($serv->SubClassesOf())) {

              $path = $class->{Path_}->{Path};

              print "$i\t$path\n";

              $i = $i + 1;

       }

       print "$classname Namespaces : -----------------------------\n";

       $i = 1;

       foreach $ns (in($serv->InstancesOf("__NAMESPACE"))) {

              print "$i\t$ns->{name}\n";

              $i = $i + 1;

       }

}

 

 

sub call_method {

       $serv = Win32::OLE->GetObject("winmgmts://$system/$namespace");

      

       if ((scalar keys (%prop_value)) == 0){

              $query="select * from $classname";

       } else{

              $query="select * from $classname where ";

              $and = "";

              for $key (keys %prop_value) {

                     $value = $prop_value{$key};

                     $query = "$query $and $key=\'$value\'";

                     $and = "and";

              }

       }

      

       print "Query = $query\n";

       $objs = $serv->ExecQuery($query);

       $i = 0;

       foreach $obj (in ($objs)) {

              print "$i $obj->{name}->$methodname\n";

              $obj->{$methodname};

              $i = $i + 1

       }

}

 

 

 

foreach $arg (in(@ARGV)) {

       if ($arg =~ m#/sys:(.*)#) {

              $system = $1;

       } elsif ($arg =~ m#@(.*)#) {

              $system = $1;

       } elsif ($arg =~ m#\/class\:(.*)#) {

              $classname = $1;

       } elsif ($arg =~ m#\-\*#) {

              $all_props = 1;

       } elsif ($arg =~ m#\-(.*)=(.*)#) {

              $prop_value{$1} = $2;

       } elsif ($arg =~ m#\-(.*)#) {

              push @props, $1;

       } elsif ($arg =~ m#^\?$#) {

              $call = "classinfo";

       } elsif ($arg =~ m#^dir$#) {

              $call = "dir";

       } elsif ($arg =~ m#(.*)\(.*\)#) {

              $methodname = $1;

              $call = "method";

       } elsif ($arg =~ m#\/ns\:(.*)#) {

              $namespace = $1;

       } elsif ($arg eq "/no_serial") {

              $serial_no = 0;

       } elsif (($arg eq "help") || ($arg eq '/?')) {

              print "WMI command line Tool (c)Roshan James, 2004\n\n";

              print "Help is available at:\n";

              print "http://www.thinkingms.com/pensieve/CommentView,guid,64df1ee9-a582-474c-960a-0063cd848609.aspx\n";

              print "Or mail spark\@mvps.org\n\n";

              exit 0

       } else {

              $classname = $arg;

       }

}

 

if ($call eq "instances") {

       list_instances();

} elsif ($call eq "dir") {

       list_namespaceinfo();

} elsif ($call eq "classinfo") {

       list_classinfo();

} elsif ($call eq "method") {

       call_method();

}
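
The argument loop above tests each pattern anywhere inside the argument, so an argument with a hyphen or the letters ‘dir’ in the middle can land in the wrong branch. As a sketch of one way to tighten this, here is the same classification with every pattern anchored to the whole argument. The name classify_arg and the kind labels it returns are hypothetical, not part of the script, and the /sys:, /class:, /no_serial and help options are left out for brevity.

```perl
use strict;
use warnings;

# Hypothetical variation on the argument loop: return a (kind, value)
# pair for one command-line argument, matching the whole argument only.
sub classify_arg {
    my ($arg) = @_;
    return ('system',    $1)        if $arg =~ m{^\@(.+)$};           # @machine01
    return ('namespace', $1)        if $arg =~ m{^/ns:(.+)$};         # /ns:root
    return ('all_props', 1)         if $arg eq '-*';                  # every property
    return ('filter',    [$1, $2])  if $arg =~ m{^-([^=]+)=(.*)$};    # -name=iexplore.exe
    return ('property',  $1)        if $arg =~ m{^-(.+)$};            # -path
    return ('classinfo', 1)         if $arg eq '?';
    return ('dir',       1)         if $arg eq 'dir';
    return ('method',    $1)        if $arg =~ m{^(\w+)\(\)$};        # terminate()
    return ('class',     $arg);                                       # anything else is a class name
}

for my $arg ('@machine01', 'win32_share', '-path', '/ns:root/directory') {
    my ($kind) = classify_arg($arg);
    print "$arg => $kind\n";
}
```

With the anchors in place, ‘/ns:root/directory’ is read as a namespace rather than as the ‘dir’ command, and a class name that happens to contain a hyphen is no longer taken for a property.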

 

Saturday, May 01, 2004 1:55:11 AM (Eastern Standard Time, UTC-05:00)