Wednesday, May 19, 2004

This is an entry I have kept pending for a long time. I should have had this out much earlier.

 

Here I am going to talk about why iterators are a hard feature to implement in a conventional stack based language. This will probably help you understand some of the design decisions with respect to implementing iterators in a language and, on the whole, make you a better programmer with respect to this programming construct.

 

I am assuming that you have read Parts 1 and 2

Iterators in Ruby (Part - 1)

Warming up to using Iterators (Part 2)

 

After you have read this entry you should be in better light to grasp Parts 4 and 5.

SICP, Fiber api and ITERATORS ! (Part 4)

Implementation of Iterators in C# 2.0 (Part 5)

 

 

Method Calls and the Stack Frame

Since you have seen some samples of iterator based code in Part 2 lets take a look at how an iterator works. Before we delve into the related issues, first let us take a look at how method calls work on the C stack. (We could have taken an stack based language here, for example .Net IL, but I figured it is easier to stick to C).

 

Consider the following functions

 

void caller()

{

        int a;

        int b;

        int sum = callee(a, b);

}

 

 

int callee(int a, int b)

{

        int temp = a + b;

        return temp;

}

 

That is amazingly simple. However what happens here? When I say what happens here, I mean to ask what happens here at the system stack level.  To do that lets take a look at the dissembly of these functions. If you were to compile this code with the vc++ compiler, you could use the following switch to generate assembly code.

>cl /Facode.cpp.asm code.cpp

 

?callee@@YAHHH@Z PROC NEAR                      ; callee

; File d:\roshanj\work\cpp\iter\func.cpp

; Line 2

      push  ebp

      mov   ebp, esp

      push  ecx

; Line 3

      mov   eax, DWORD PTR _a$[ebp]

      add   eax, DWORD PTR _b$[ebp]

      mov   DWORD PTR _temp$[ebp], eax

; Line 4

      mov   eax, DWORD PTR _temp$[ebp]

; Line 5

      mov   esp, ebp

      pop   ebp

      ret   0

?callee@@YAHHH@Z ENDP                           ; callee

 

?caller@@YAXXZ PROC NEAR                        ; caller

; Line 8

      push  ebp

      mov   ebp, esp

      sub   esp, 12                             ; 0000000cH

; Line 11

      mov   eax, DWORD PTR _b$[ebp]

      push  eax

      mov   ecx, DWORD PTR _a$[ebp]

      push  ecx

      call  ?callee@@YAHHH@Z              ; callee

      add   esp, 8

      mov   DWORD PTR _sum$[ebp], eax

; Line 12

      mov   esp, ebp

      pop   ebp

      ret   0

?caller@@YAXXZ ENDP

 

 

The stack frame of the caller function looks like this:

 

 

And the when the caller() is calling the callee() then both the methods have their stack-frames mounted.

 

 

In Intel based systems the C stack grows downward in memory, which means that each item that is added to the stack causes the top of the stack (SP) to have a lesser address value. So the right way to draw these diagram would have been to draw them upside down. But that detail is not relevant here and so I have depicted them as a conventional stack that grows upwards.

 

When the callee() returns, the stack looks like the initial stack diagram and the variable ‘sum’ contains its required value.

 

 

To reiterate, what happens is that a method that is currently running uses the stack for storing its local variables – or more generally the state of the running method is preserved on the stack. When a method is called, it builds its own stack frame on top of whatever was already on the stack. The called method uses the stack frame to save its state, irrespective of what other methods are already on the stack.

 

You might have heard of this concept called a stack-overflow. That happens when methods calls happen to such an extent that there is not more free space left on the stack for the stack frame of a new method to be created.

 

Whenever a method returns, the part of the stack that is used for its variables is freed up – or so to speak, the methods stack frame is torn down. So when a method returns, its state information, that was on the stack is completely lost. This is necessary, because the parent function or method, might go on to call other methods that go on to use same stack space subsequently.

 

So let us say there is a calling pattern like this

 

function a()

        return

       

function b()

        call a()

        return

       

function caller()

        call a()

        call b()

        return

 

The stack usage will look like this –

 

 

Now that we have a reasonable idea of how the stack is used across method calls (though in reality there are so many many approaches), lets move on to iterators.

 

Iterators

Look at the following ruby snippet that shows an iterator. If you have an interest in C/Cpp/C# and couldn’t care less about Ruby, don’t throw your hands up in the air – the language is used here as an example of a language that implements iterators (and rather well at that), which I am using to communicate the idea.

 

def callee()

        for i in 1..10

                yield i

        end

end

 

def process(v)

        return (v * 10)

end

       

       

def caller()

        callee() {|value|

                value = process(value)

                print value

        }

end

 

If you recall the behavior of the iterator from Part 1 and 2, you will remember that when caller() calls callee() and the callee() invokes the yield statement, the control is back at the caller(). In this case, the value of ‘i’ is yielded from callee() which is received in caller() as value.

 

In C code, when the control returns to the calling function the assumption is that the called function is dead on the stack and that the stack frame is free for subsequent method calls, as shown in the stack diagram above.

 

If we have a similar diagram for the this ruby code, how would we draw it?

 

 

This is where we hit our first wall.

 

Assume that the caller() calls callee() and the callee() has yielded value 1. Now the stack frame of callee() is torn down in the old C way and the process() method is called which goes on to use the same area of the stack that was one used by callee(). After that caller() has done whatever it needs to do with the value it received from callee() it tries to invoke callee() again so that it can get the next value. This is where we hit the wall. The callee() cannot return the next value simply because the previous value it has for ‘i’ is lost when its stack frame got torn down.

 

In other words, on a conventional C stack, the callee() state is lost and therefore cannot resume execution from the point of a yield.

 

What work arounds do we have to this issue?

Let us assume that what Ruby does is simply a compiler hack – let us assume that it never really tears down the stack frame of the callee() at all, but instead it ensures that function returns back to the caller() but with the callee() stack frame still in place. That would look like this

 

 

While this is a nice diagram to look at, what does this mean? Where will the stack frame of the subsequent call to process() go? It cannot over-write the callee() stack frame – so it has to go above it. Like this?

 

 

Woo, now wait a minute, how does the caller() function know how much of the stack the callee() function is using to be able to do this? This is a difficult question to answer. Remember that the callee() is a proper function that can be using a variable amount of the stack at any point of time. So the caller() cannot predict the stack usage of the callee() but lets assume that the callee() passes back the information of its stack usage back along with the value that it is yielding.

 

Ok, so that would solve the problem of how the process() function uses the stack. But there is one more issue. What is the code block inside the caller() (the one that receives the yielded value) needs to push a value onto the stack?

 

If the code block inside the caller, needs to push a value onto the stack, then the pushed value will go on to overwrite the stack-frame of the callee above it. The obvious way to fix the problem is to shift the entire stack pointer to the top of the stack, above the callee() also. You can visualize it like this:

 

 

While this could work, there is one more issue – the local member variables of a function are accessed via EBP offsets. Now this is fine when local variables occur at known distances from the base of the stack frame for the function. To see this happen, you might want to refer back to the assembly code I posted towards the beginning of this article. You notice that most of the member variables are being accessed as _a$[ebp] or _b$[ebp], or similar syntax. The _a$ here does nothing but add a fixed positive offset to the EBP pointer. For the access of local variables when code is in the yield’s parameter block region of the caller() these rules would have to change, as there is an issue of adding an additional offset. The additional offset is introduced because now there is the stack frame of the callee() sitting squarely in the middle of what should have been the stack frame of the caller().

 

These issues crop up when using single iterators. If using multiple iterators or nested iterators and when used in conjunction with stack intensive operations like recursion, the picture because very complicated.

 

Alternative approaches of State maintenance and the concept of Method instances

While it should be clear that, to implement iterators in a language the must support the idea of functions maintaining their state even after they surrender control to their calling methods – it is not very clear as to how this can be implemented on a conventional C stack.

 

One approach is to go for a ‘stackless’ implementation. What that means is that function activations or stack frames are treated as allocated memory blocks on the heap and each function instance (when you call a function it needs to create a stack frame and that can be considered a function or method instance) lives on the heap like any other dynamically allocated object. These mini stacks as specific to function instances and so have no real concept of colliding with each other due to sharing a larger OS/platform maintained stack.

 

I believe languages like lisp/scheme behave in this manner with respect to the language stack. (I have been told this and I hope this is correct).

 

 

Another alternative is to approach the problem like this. Assume that the callee used only static variables. Immediately, the issue of maintaining state of variable of the callee on the stack is eliminated. The only issue then, is to resume execution of the function from the correct place (immediately past the yield) when the function is called again. This can be easily achieved by saving the resume position into an additional variable and implementing a switch case goto construct at the beginning of the function.

 

Now since it is persisting state in static variables we cannot use this recursively or in any other fashion. But we could fix this if we created a structure/class that contains member variables corresponding to the local variables of the function and use an instance of this structure in the function instead of using proper static variables.

 

Languages such as C# and Python use a similar approach to maintaining state of iterators between calls. You can read more about this in the Part 5 of this series.

 

For constructs like iterators to be supported in .Net and similar runtimes, in a native way, will require substantial changes in the way the runtime manages stack-frames and similar entities. Basically languages and runtimes need to be retro-fitted with a concept of function and code block instances the same way object oriented programming is fitted with concepts of class instances called objects.

 

Languages like Sun Microsystems’ Self build on similar concepts for methods/function instantiation.

 

If you give some of the ideas here a little thought, you should be well on your way to dreaming about new languages and fancy programming constructs.

Wednesday, May 19, 2004 9:09:21 AM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  | 
 Tuesday, May 18, 2004

 

Trashbin finally has patch.

 

Before trashbin was written I used to wonder a bit out this mythical entity called metadata that is often talked about in the .Net framework. Metadata was mentioned on almost every discourse on .Net and libraries like reflection libs were based squarely on it. Metadata formed the central pillar of design of the self describing type/component system that is called the .Net framework.

 

However, I could find almost nothing that showed me actual metadata, in its physical form. As a consequence I decided to try and understand what metadata was all about. Thus trashbin was written. Trashbin first saw light of day when announced in a Bangalore User Group post sent in the wee hours of morning.

 

From: spark

Subject: metadata viewer: trashbin v0.1, src+bin release

Date: Mon, 09 Jun 2003 13:28:37 -0700

-----------------------------------------------------------

 

hi group, 

i had been spending some of my odd freetime and sometimes lost sleep into exploring the .net exe/dll format and peeking into metadata glue.

well, the glue is rather interesting to look into and gives you a small insight into where information for things like the class loader and the reflection api get their data. trashbin is small viewer for metadata info that i am releasing with src. this is the first version anywhere and so is expected to be buggy - do mail. if internals interests you, take a look:

http://www.thinkingms.com/pensieve/homepage/work/trashbin/trashbin.htm

cheers :)

rosh

 

Since its release, trashbin has more or less worked fine for me and I have been using it for about a year now.
In essence, this is what trashbin does:

 

>trashbin

Spark (?)  Managed(.Net)/Native PE-COFF file viewer. Version 0.2

May 2003, contact: rosh@mvps.org

Last update: May 2004

 

usage: trashbin [options]

 

        portable executable info:

        /dos     display dos header

        /sig     display the file signature

        /coff    display coff header

        /pe      display pe/optional header

        /dd      display data directories in pe header

        /sec     display section headers

        /exp     display export table

        /imp     display import table

        /reloc   display relocation information

        /tls     display Thread Local Storage information

 

        managed info:

        /corhdr          display the common language runtime header

        /mdhdr           display metadata headers

        /md:Strings      display metadata stream #Strings

        /md:Blob         display metadata stream #Blob

        /md:US           display metadata stream #US (user strings)

        /md:GUID         display metadata stream #GUID

        /md:#~           display optimised metadata tables stream-header

        /mdtab           display optimised metadata tables

 

        other:

        /type    indicates the type of the PE file

        /csv     enable excel compatible, CSV output

 

        ps. The name trashbin is 'inspired' from dumpbin :)

 

Since most people who are reading this entry might be interested in what metadata is and what the PE file format is like, here goes:

 

The PE file format is Microsoft’s Portable Executable File Format. Essentially most exes and dlls that you will see on a windows system have this file format. Yes Exes and DLLs have the same format. The difference largely lies in the fact that a DLL file does not necessarily have an entry point defined. Here is a little about the Exe/Dll format:

 

Once upon a time, there used to be old DOS exes that came with what was the DOS exe header. Microsoft retained the DOS exe header in all subsequent exe formats so that the executables would be compatible across their operating systems. Which is why, you can run any windows or .Net exe on any Microsoft operating system (including dos) and see it run. Of course these programs would not do anything in the dos environment other than display a message saying that the exe would run under windows. The point however is that the exe did validly execute on a 15 or twenty year old system that was built for processors that did not have a concept of memory beyond 1Mb.

 

The DOS header is known for it special signature bytes MZ. Open any exe file and notice that the very first two bytes are MZ. MZ stands for Mark Zbikowski, the person who developed the DOS exe file format. Prior to that the executable format was called the Com file format. Those of you who have had the chance to work on DOS would not have forgotten what a pleasure some of those COM files used to be. The COM format belonged to the then popular CP/M operating system of Digital of the great Gary Kildall. Gary Kildall was pioneer in a way few people were.. anyway that is story for later.

 

The following is a dump of the initial few bytes of an exe file showing the then new MZ DOS header:

 

>hexv HelloWorld.exe

 

0000:0000│ 4D 5A 90 00 03 00 00 00 │ 04 00 00 00 FF FF 00 00 │ MZÉ▒♥▒▒▒│♦▒▒▒  ▒▒

0000:0010│ B8 00 00 00 00 00 00 00 │ 40 00 00 00 00 00 00 00 │ ╕▒▒▒▒▒▒▒│@▒▒▒▒▒▒▒

0000:0020│ 00 00 00 00 00 00 00 00 │ 00 00 00 00 00 00 00 00 │ ▒▒▒▒▒▒▒▒│▒▒▒▒▒▒▒▒

 

In a sense Mr Zbikowsky has the most popular initials on the planet. The DOS exe header itself is available as a structure that is defined in the winnt.h header file that is available on almost every windows based c/cpp dev environment.

 

Now the DOS exe header did not suffice to hold a lot of the new information that the exe had to present to the operating system, when windows came along. So new structures were introduced which were the akin to the old unix based Common Object File Format (COFF). There is plenty of literature available about this on the net.

 

The PE file is denoted by the signature bytes “PE  ". If you download trashbin the source code has some embedded urls that give you information about the PE file format itself. Those may prove valuable for your understanding of the actual exe file format.

 

Just to connect all that I have been talking about to trashbin and how you can actually examine an exe file with it, these are the relevant switches.

 

        /dos     display dos header

        /sig     display the file signature

        /coff    display coff header

        /pe      display pe/optional header

 

Now that we have covered that ground, lets move on. The PE file has a data structure called the Data Directory which is displayed through the  /dd option.

 

        /dd      display data directories in pe header

 

The data directory basically contains pointers to various data structures inside the PE file. The DD has 16 entries and a dump of the DD looks like this:

 

C:\WINNT\system32>trashbin tracert.exe /dd

_IMAGE_DATA_DIRECTORY

0       VirtualAddress = 0

        Size = 0

1       VirtualAddress = 0x19ac

        Size = 0x78

2       VirtualAddress = 0x3000

        Size = 0x11b8

3       VirtualAddress = 0

        Size = 0

4       VirtualAddress = 0

        Size = 0

5       VirtualAddress = 0

        Size = 0

6       VirtualAddress = 0x1090

        Size = 0x1c

7       VirtualAddress = 0

        Size = 0

8       VirtualAddress = 0

        Size = 0

9       VirtualAddress = 0

        Size = 0

10      VirtualAddress = 0

        Size = 0

11      VirtualAddress = 0x240

        Size = 0x7c

12      VirtualAddress = 0x1000

        Size = 0x88

13      VirtualAddress = 0

        Size = 0

14      VirtualAddress = 0

        Size = 0

15      VirtualAddress = 0

        Size = 0

 

 

This is trashbin examining tracert program that comes with windows. The tracert program is a native exe (not a .net exe). Entries in the DD have a predefined meaning, these are defined in winnt.h as follows:

 

#define IMAGE_DIRECTORY_ENTRY_EXPORT          0   // Export Directory

#define IMAGE_DIRECTORY_ENTRY_IMPORT          1   // Import Directory

#define IMAGE_DIRECTORY_ENTRY_RESOURCE        2   // Resource Directory

#define IMAGE_DIRECTORY_ENTRY_EXCEPTION       3   // Exception Directory

#define IMAGE_DIRECTORY_ENTRY_SECURITY        4   // Security Directory

#define IMAGE_DIRECTORY_ENTRY_BASERELOC       5   // Base Relocation Table

#define IMAGE_DIRECTORY_ENTRY_DEBUG           6   // Debug Directory

//      IMAGE_DIRECTORY_ENTRY_COPYRIGHT       7   // (X86 usage)

#define IMAGE_DIRECTORY_ENTRY_ARCHITECTURE    7   // Architecture Specific Data

#define IMAGE_DIRECTORY_ENTRY_GLOBALPTR       8   // RVA of GP

#define IMAGE_DIRECTORY_ENTRY_TLS             9   // TLS Directory

#define IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG    10   // Load Configuration Directory

#define IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT   11   // Bound Import Directory in headers

#define IMAGE_DIRECTORY_ENTRY_IAT            12   // Import Address Table

#define IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT   13   // Delay Load Import Descriptors

#define IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR 14   // COM Runtime descriptor

 

You can compare these against what is present in the tracert program dump.

Now lets do a DD dump of a .net exe

 

>trashbin HelloWorld.exe /dd

_IMAGE_DATA_DIRECTORY

0       VirtualAddress = 0

        Size = 0

1       VirtualAddress = 0x2370

        Size = 0x4b

2       VirtualAddress = 0x4000

        Size = 0x340

3       VirtualAddress = 0

        Size = 0

4       VirtualAddress = 0

        Size = 0

5       VirtualAddress = 0x6000

        Size = 0xc

6       VirtualAddress = 0

        Size = 0

7       VirtualAddress = 0

        Size = 0

8       VirtualAddress = 0

        Size = 0

9       VirtualAddress = 0

        Size = 0

10      VirtualAddress = 0

        Size = 0

11      VirtualAddress = 0

        Size = 0

12      VirtualAddress = 0x2000

        Size = 0x8

13      VirtualAddress = 0

        Size = 0

14      VirtualAddress = 0x2008

        Size = 0x48

15      VirtualAddress = 0

        Size = 0

 

The interesting thing to note is that in a managed exe, the 15th entry is non zero. The 14th entry point to the Common Runtime Header or the CorHdr. The CorHdr structure is defined in corhdr.h.

 

The CorHdr is where (so to speak) the managed world starts. So .Net exe files are regular PE files which have all the .Net specific content in a particular offset in the exe file. The idea was that .Net was designed to be platform neutral. So Microsoft could assume that the facilities provided by the exe file format would be available in file formats of other operating systems where .Net would one day run. For this sake .net specific content assumes that the whatever the parent file format that is enclosing it, the .Net specific content can be made laid out in a large well defined binary blob.

 

A description of the CorHdr, Metadata layout and other intricacies of the underlying system can be found described (in varying detail) in the ECMA specification of the Common Language Infrastructure. Partition 2 that describes metadata is the one that you would want to look it, in this regard.

http://msdn.microsoft.com/net/ecma/

 

Another interesting resource for the technical mind is Serge Lidin’s book “Inside the .Net IL Assembler” . The book is available as an Indian India edition also, so it is affordable.

 

Trashbin lets you view this part of the managed exe file.

These are the relevant switches:

 

        /corhdr          display the common language runtime header

        /mdhdr           display metadata headers

        /md:Strings      display metadata stream #Strings

        /md:Blob         display metadata stream #Blob

        /md:US           display metadata stream #US (user strings)

        /md:GUID         display metadata stream #GUID

        /md:#~           display optimised metadata tables stream-header

        /mdtab           display optimised metadata tables

 

Trashbin is not a dissembler – it simply lets you view the other data that is present in the exe/dll file. I figured I don’t need to do a dissembler because ILdasm and several other tools do the job so well. ILdasm btw is written by Serge Lidin.

 

The metadata in the managed file is divided into a set of streams. A stream is like an area of memory reserved for content of a specific type. These are namely –

#Strings

#US

#Blob

#GUID

#~ and #-

 

The #Strings stream keeps all the strings in the application that include things like names of classes, methods, parameters, namespaces, assemblies etc. So this stream basically contains all kind of strings that are part of the source code itself – these are where the names that are used by the reflection library are available.

 

The #US stream is the one for user defined strings. So when you say Console.WriteLine(“Hello World”) in your program, the “Hello World” goes into #US and the Console.WriteLine goes into #Strings. The strings present in the #US set are Unicode strings – so each character is two bytes.

 

Here are some nice and friendly hex dumps of these regions from a exe file followed by the corresponding information being ripped by trashbin:

 

#Strings

 

0000:0460│ 00 00 00 00 00 00 00 00 │ 00 3C 4D 6F 64 75 6C 65 │ ▒▒▒▒▒▒▒▒│▒

0000:0470│ 3E 00 48 65 6C 6C 6F 57 │ 6F 72 6C 64 2E 65 78 65 │ >▒HelloW│orld.exe

0000:0480│ 00 6D 73 63 6F 72 6C 69 │ 62 00 53 79 73 74 65 6D │ ▒mscorli│b▒System

0000:0490│ 00 4F 62 6A 65 63 74 00 │ 43 61 6C 63 00 48 65 6C │ ▒Object▒│Calc▒Hel

0000:04A0│ 6C 6F 57 6F 72 6C 64 53 │ 61 6D 70 6C 65 00 48 65 │ loWorldS│ample▒He

0000:04B0│ 6C 6C 6F 57 6F 72 6C 64 │ 00 41 64 64 00 2E 63 74 │ lloWorld│▒Add▒.ct

0000:04C0│ 6F 72 00 4D 61 69 6E 00 │ 53 79 73 74 65 6D 2E 44 │ or▒Main▒│System.D

0000:04D0│ 69 61 67 6E 6F 73 74 69 │ 63 73 00 44 65 62 75 67 │ iagnosti│cs▒Debug

0000:04E0│ 67 61 62 6C 65 41 74 74 │ 72 69 62 75 74 65 00 61 │ gableAtt│ribute▒a

0000:04F0│ 00 62 00 61 72 67 73 00 │ 43 6F 6E 73 6F 6C 65 00 │ ▒b▒args▒│Console▒

0000:0500│ 57 72 69 74 65 4C 69 6E │ 65 00 45 78 63 65 70 74 │ WriteLin│e▒Except

0000:0510│ 69 6F 6E 00 67 65 74 5F │ 4D 65 73 73 61 67 65 00 │ ion▒get_│Message▒

 

>trashbin HelloWorld.exe /md:Strings

METADATA STREAM #Strings

        Offset : "String"

        0x1    : ""

        0xA    : "HelloWorld.exe"

        0x19   : "mscorlib"

        0x22   : "System"

        0x29   : "Object"

        0x30   : "Calc"

        0x35   : "HelloWorldSample"

        0x46   : "HelloWorld"

        0x51   : "Add"

        0x55   : ".ctor"

        0x5B   : "Main"

        0x60   : "System.Diagnostics"

        0x73   : "DebuggableAttribute"

        0x87   : "a"

        0x89   : "b"

        0x8B   : "args"

        0x90   : "Console"

        0x98   : "WriteLine"

        0xA2   : "Exception"

        0xAC   : "get_Message"

 

#US

 

0000:04B0│ 65 00 00 00 00 17 48 00 │ 65 00 6C 00 6C 00 6F 00 │ e▒▒▒▒↨H▒│e▒l▒l▒o▒

0000:04C0│ 20 00 57 00 6F 00 72 00 │ 6C 00 64 00 00 00 00 00 │  ▒W▒o▒r▒│l▒d▒▒▒▒▒

 

>trashbin test.exe /md:US

METADATA STREAM #US

0x1, (23 bytes)

    Txt: H.e.l.l.o...W.o.r.l.d..

    Hex: 48 00 65 00 6c 00 6c 00 6f 00 20 00 57 00 6f 00 72 00 6c 00 64 00 00

 

I will just skip over the #GUID and #Blob streams for now. The #~ stream is the interesting one that is the stream that actually contains the metadata tables. These is an alternate stream thaty can be present which is the #- stream. The #- stream again contains metadata tables but these are called the un-optimized tables because certain sort orders are not maintained in these tables. The Microsoft compiles always emit optimized tables and since I have not been using any other compilers (Mono too seems to emit optimized tables) I don’t have support for #- in trashbin.

 

Lets focus on #~. The #~ is the real metadata, if you would like to think of it that way, It is actually a small relational database that is compressed down to the last bit. There are a large number of tables (with predefined schemas) that can occur here. These tables provide information about the exe or dll (or precisely the assembly) that they are trying to describe.

 

These tables cross reference each other as well as reference entries in the other streams, such as the #GUID and #Strings streams. The trashbin option /md:#~ gives you the header of #~ stream, which is a kind of summary view of what the stream contains:

 

>trashbin HelloWorld.exe /md:#~

METADATA STREAM #~

        TABLES HEADER

                MajorVersion = 1

                MinorVersion = 0

                HeapSizes = 0

                        #String Index = 2 bytes wide

                        #GUID Index = 2 bytes wide

                        #Blob Index = 2 bytes wide

                Valid  = 0x00000900021547

                Sorted = 0x0002003301fa00

 

        METADATA Tables

                RID.               TableName    [No of Rows]

                0.                    Module    [1]     Row=10 bytes

                1.                   TypeRef    [4]     Row=6 bytes

                2.                   TypeDef    [3]     Row=14 bytes

                6.                    Method    [4]     Row=14 bytes

                8.                     Param    [3]     Row=6 bytes

                10.                MemberRef    [5]     Row=6 bytes

                12.          CustomAttribute    [1]     Row=6 bytes

                17.            StandAloneSig    [2]     Row=2 bytes

                32.                 Assembly    [1]     Row=22 bytes

                35.              AssemblyRef    [1]     Row=20 bytes

        Table Count = 10

 

The above listing says that helloworld.exe contains 10 tables in its metadata headers.

To give you a taste of what I am talking about lets look at some of these tables:

 

To look at the metadata tables, use the option

 

        /mdtab           display optimised metadata tables

 

This is the table called TypeDef that has a listing of all the types defined in this assembly.

 

[RID=2] Table TypeDef

     [  DATA]Flags    [STRING]Name     [STRING]Namespace [CI:64 ]Extends  [RID: 4]FieldList [RID: 6]MethodList

   1.[  0x  ]0        [  0x  ]1        [  0x  ]0         [RID: 2] 0       [RID: 4] 1        [RID: 6] 1        

   2.[  0x  ]100001   [  0x  ]30       [  0x  ]35        [RID: 1] 1       [RID: 4] 1        [RID: 6] 1        

   3.[  0x  ]100001   [  0x  ]46       [  0x  ]35        [RID: 1] 1       [RID: 4] 1        [RID: 6] 3        

 

The table has a field called name – which is the name of the type. The value of the names field are actually offsets into the #Strings stream. So if you care to compare (I have provided a listing of the #Strings stream earlier), what the table is saying is that this assembly has 3 types which are namely , Calc and HelloWorld. At this point let me drop the source code of the HelloWorld.cs file from which this assembly was compiled:

 

//HelloWorld.cs

namespace HelloWorldSample

{

        using System;

       

        public class Calc

        {

                public int Add(int a, int b)

                {

                        return a+b;