
Trashbin finally has a patch.
Before trashbin was written I used to wonder a bit about this mythical entity called metadata that is so often talked about in the .Net framework. Metadata was mentioned in almost every discourse on .Net, and libraries like the reflection libs were based squarely on it. Metadata formed the central pillar of the design of the self-describing type/component system that is called the .Net framework.
However, I could find almost nothing that showed me actual metadata in its physical form. As a consequence I decided to try and understand what metadata was all about. Thus trashbin was written. Trashbin first saw the light of day when it was announced in a Bangalore User Group post sent in the wee hours of the morning.
From: spark
Subject: metadata viewer: trashbin v0.1, src+bin release
Date: Mon, 09 Jun 2003 13:28:37 -0700
-----------------------------------------------------------
hi group,
i had been spending some of my odd freetime and sometimes lost sleep into exploring the .net exe/dll format and peeking into metadata glue.
well, the glue is rather interesting to look into and gives you a small insight into where information for things like the class loader and the reflection api get their data. trashbin is small viewer for metadata info that i am releasing with src. this is the first version anywhere and so is expected to be buggy - do mail. if internals interests you, take a look:
http://www.thinkingms.com/pensieve/homepage/work/trashbin/trashbin.htm
cheers :)
rosh
Since its release, trashbin has more or less worked fine for me and I have been using it for about a year now.
In essence, this is what trashbin does:
>trashbin
Spark (?) Managed(.Net)/Native PE-COFF file viewer. Version 0.2
May 2003, contact: rosh@mvps.org
Last update: May 2004
usage: trashbin [options]
portable executable info:
/dos display dos header
/sig display the file signature
/coff display coff header
/pe display pe/optional header
/dd display data directories in pe header
/sec display section headers
/exp display export table
/imp display import table
/reloc display relocation information
/tls display Thread Local Storage information
managed info:
/corhdr display the common language runtime header
/mdhdr display metadata headers
/md:Strings display metadata stream #Strings
/md:Blob display metadata stream #Blob
/md:US display metadata stream #US (user strings)
/md:GUID display metadata stream #GUID
/md:#~ display optimised metadata tables stream-header
/mdtab display optimised metadata tables
other:
/type indicates the type of the PE file
/csv enable excel compatible, CSV output
ps. The name trashbin is 'inspired' from dumpbin :)
Since most people who are reading this entry might be interested in what metadata is and what the PE file format is like, here goes:
The PE file format is Microsoft’s Portable Executable file format. Essentially, most exes and dlls that you will see on a Windows system have this file format. Yes, exes and dlls have the same format. The difference largely lies in the fact that a dll file does not necessarily have an entry point defined. Here is a little about the exe/dll format:
Once upon a time, there were old DOS exes that came with what was the DOS exe header. Microsoft retained the DOS exe header in all subsequent exe formats so that the executables would be compatible across their operating systems. Which is why you can run any Windows or .Net exe on any Microsoft operating system (including DOS) and see it run. Of course, these programs would not do anything in the DOS environment other than display a message saying that the exe needs Windows to run. The point, however, is that the exe did validly execute on a 15 or 20 year old system that was built for processors that had no concept of memory beyond 1 MB.
The DOS header is known for its special signature bytes, MZ. Open any exe file and notice that the very first two bytes are MZ. MZ stands for Mark Zbikowski, the person who developed the DOS exe file format. Prior to that, the executable format was the COM file format. Those of you who have had the chance to work on DOS would not have forgotten what a pleasure some of those COM files used to be. The COM format came from the then popular CP/M operating system of Digital Research, the company of the great Gary Kildall. Gary Kildall was a pioneer in a way few people were... anyway, that is a story for later.
The following is a dump of the initial few bytes of an exe file showing the then new MZ DOS header:
>hexv HelloWorld.exe
0000:0000│ 4D 5A 90 00 03 00 00 00 │ 04 00 00 00 FF FF 00 00 │ MZÉ▒♥▒▒▒│♦▒▒▒ ▒▒
0000:0010│ B8 00 00 00 00 00 00 00 │ 40 00 00 00 00 00 00 00 │ ╕▒▒▒▒▒▒▒│@▒▒▒▒▒▒▒
0000:0020│ 00 00 00 00 00 00 00 00 │ 00 00 00 00 00 00 00 00 │ ▒▒▒▒▒▒▒▒│▒▒▒▒▒▒▒▒
In a sense Mr Zbikowski has the most popular initials on the planet. The DOS exe header itself is available as a structure (IMAGE_DOS_HEADER) defined in the winnt.h header file that is available in almost every Windows based C/C++ dev environment.
Now, when Windows came along, the DOS exe header did not suffice to hold a lot of the new information that the exe had to present to the operating system. So new structures were introduced, which were akin to the old Unix based Common Object File Format (COFF). There is plenty of literature available about this on the net.
The PE file is denoted by the signature bytes “PE\0\0” (the letters PE followed by two zero bytes). If you download trashbin, the source code has some embedded URLs that give you information about the PE file format itself. Those may prove valuable for your understanding of the actual exe file format.
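A minimal Python sketch of how the two signatures connect. It relies on one detail not spelled out above: the e_lfanew field of IMAGE_DOS_HEADER in winnt.h, a 32-bit little-endian value at offset 0x3C that holds the file offset of the PE signature. The function name and the synthetic image (with e_lfanew set to 0x80) are made up for illustration; this is not trashbin's code.

```python
import struct

def find_pe_signature(data: bytes) -> int:
    """Return the file offset of the PE signature, or raise ValueError."""
    if len(data) < 0x40:
        raise ValueError("file too short to hold a DOS header")
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: 32-bit little-endian offset stored at 0x3C in the DOS header
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    return e_lfanew

# Synthetic image: only the bytes that matter are filled in (hypothetical values)
image = bytearray(0x84)
image[0:2] = b"MZ"
struct.pack_into("<I", image, 0x3C, 0x80)
image[0x80:0x84] = b"PE\0\0"

pe_offset = find_pe_signature(bytes(image))
print(hex(pe_offset))
```

On a real file you would pass the bytes read from disk in place of the synthetic image.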
Just to connect all that I have been talking about to trashbin and how you can actually examine an exe file with it, these are the relevant switches.
/dos display dos header
/sig display the file signature
/coff display coff header
/pe display pe/optional header
Now that we have covered that ground, let’s move on. The PE file has a data structure called the Data Directory, which is displayed through the /dd option.
/dd display data directories in pe header
The data directory basically contains pointers to various data structures inside the PE file. The DD has 16 entries and a dump of the DD looks like this:
C:\WINNT\system32>trashbin tracert.exe /dd
_IMAGE_DATA_DIRECTORY
0 VirtualAddress = 0
Size = 0
1 VirtualAddress = 0x19ac
Size = 0x78
2 VirtualAddress = 0x3000
Size = 0x11b8
3 VirtualAddress = 0
Size = 0
4 VirtualAddress = 0
Size = 0
5 VirtualAddress = 0
Size = 0
6 VirtualAddress = 0x1090
Size = 0x1c
7 VirtualAddress = 0
Size = 0
8 VirtualAddress = 0
Size = 0
9 VirtualAddress = 0
Size = 0
10 VirtualAddress = 0
Size = 0
11 VirtualAddress = 0x240
Size = 0x7c
12 VirtualAddress = 0x1000
Size = 0x88
13 VirtualAddress = 0
Size = 0
14 VirtualAddress = 0
Size = 0
15 VirtualAddress = 0
Size = 0
This is trashbin examining the tracert program that comes with Windows. The tracert program is a native exe (not a .Net exe). Entries in the DD have predefined meanings, which are defined in winnt.h as follows:
#define IMAGE_DIRECTORY_ENTRY_EXPORT 0 // Export Directory
#define IMAGE_DIRECTORY_ENTRY_IMPORT 1 // Import Directory
#define IMAGE_DIRECTORY_ENTRY_RESOURCE 2 // Resource Directory
#define IMAGE_DIRECTORY_ENTRY_EXCEPTION 3 // Exception Directory
#define IMAGE_DIRECTORY_ENTRY_SECURITY 4 // Security Directory
#define IMAGE_DIRECTORY_ENTRY_BASERELOC 5 // Base Relocation Table
#define IMAGE_DIRECTORY_ENTRY_DEBUG 6 // Debug Directory
// IMAGE_DIRECTORY_ENTRY_COPYRIGHT 7 // (X86 usage)
#define IMAGE_DIRECTORY_ENTRY_ARCHITECTURE 7 // Architecture Specific Data
#define IMAGE_DIRECTORY_ENTRY_GLOBALPTR 8 // RVA of GP
#define IMAGE_DIRECTORY_ENTRY_TLS 9 // TLS Directory
#define IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG 10 // Load Configuration Directory
#define IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT 11 // Bound Import Directory in headers
#define IMAGE_DIRECTORY_ENTRY_IAT 12 // Import Address Table
#define IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT 13 // Delay Load Import Descriptors
#define IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR 14 // COM Runtime descriptor
You can compare these against what is present in the tracert program dump.
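The comparison can be sketched in Python as below. The short names are the winnt.h constants above with the IMAGE_DIRECTORY_ENTRY_ prefix dropped; "RESERVED" for index 15 is a placeholder of mine, since winnt.h defines no name for it. The (VirtualAddress, Size) pairs are copied from the tracert.exe dump.

```python
# Short names for the 16 data directory slots, in index order
DIRECTORY_NAMES = [
    "EXPORT", "IMPORT", "RESOURCE", "EXCEPTION", "SECURITY", "BASERELOC",
    "DEBUG", "ARCHITECTURE", "GLOBALPTR", "TLS", "LOAD_CONFIG",
    "BOUND_IMPORT", "IAT", "DELAY_IMPORT", "COM_DESCRIPTOR", "RESERVED",
]

# (VirtualAddress, Size) pairs copied from the tracert.exe /dd dump
TRACERT_DD = [
    (0, 0), (0x19AC, 0x78), (0x3000, 0x11B8), (0, 0), (0, 0), (0, 0),
    (0x1090, 0x1C), (0, 0), (0, 0), (0, 0), (0, 0), (0x240, 0x7C),
    (0x1000, 0x88), (0, 0), (0, 0), (0, 0),
]

def present_directories(entries):
    """Return (name, rva, size) for every slot that is not all zeros."""
    return [
        (DIRECTORY_NAMES[i], rva, size)
        for i, (rva, size) in enumerate(entries)
        if rva or size
    ]

for name, rva, size in present_directories(TRACERT_DD):
    print(f"{name:<14} rva={rva:#x} size={size:#x}")
```

For tracert this leaves the import, resource, debug, bound import and IAT directories.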
Now let’s do a DD dump of a .Net exe:
>trashbin HelloWorld.exe /dd
_IMAGE_DATA_DIRECTORY
0 VirtualAddress = 0
Size = 0
1 VirtualAddress = 0x2370
Size = 0x4b
2 VirtualAddress = 0x4000
Size = 0x340
3 VirtualAddress = 0
Size = 0
4 VirtualAddress = 0
Size = 0
5 VirtualAddress = 0x6000
Size = 0xc
6 VirtualAddress = 0
Size = 0
7 VirtualAddress = 0
Size = 0
8 VirtualAddress = 0
Size = 0
9 VirtualAddress = 0
Size = 0
10 VirtualAddress = 0
Size = 0
11 VirtualAddress = 0
Size = 0
12 VirtualAddress = 0x2000
Size = 0x8
13 VirtualAddress = 0
Size = 0
14 VirtualAddress = 0x2008
Size = 0x48
15 VirtualAddress = 0
Size = 0
The interesting thing to note is that in a managed exe, entry 14 (the 15th entry, since the numbering starts at zero) is non-zero. This entry, IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR, points to the Common Language Runtime header, or the CorHdr. The CorHdr structure is defined in corhdr.h.
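As a rough sketch, the managed/native test can be written in a few lines of Python. It assumes the data directory is available as 16 packed records of two little-endian 32-bit values each (VirtualAddress, then Size), which is how IMAGE_DATA_DIRECTORY is laid out in winnt.h. The function names are mine, and the raw bytes are rebuilt from the HelloWorld.exe dump above rather than read from a file.

```python
import struct

COM_DESCRIPTOR = 14  # IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR from winnt.h

def parse_data_directory(raw: bytes):
    """Split 16 packed IMAGE_DATA_DIRECTORY records into (rva, size) tuples."""
    if len(raw) < 16 * 8:
        raise ValueError("need 128 bytes of data directory")
    return [struct.unpack_from("<II", raw, i * 8) for i in range(16)]

def is_managed(entries) -> bool:
    """A PE file is treated as managed when entry 14 is populated."""
    rva, size = entries[COM_DESCRIPTOR]
    return rva != 0 and size != 0

# Non-zero entries copied from the HelloWorld.exe /dd dump
nonzero = {
    1: (0x2370, 0x4B), 2: (0x4000, 0x340), 5: (0x6000, 0xC),
    12: (0x2000, 0x8), 14: (0x2008, 0x48),
}
hello_raw = b"".join(
    struct.pack("<II", *nonzero.get(i, (0, 0))) for i in range(16)
)
hello_dd = parse_data_directory(hello_raw)
print(is_managed(hello_dd))
```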
The CorHdr is where (so to speak) the managed world starts. So .Net exe files are regular PE files which have all the .Net specific content at a particular offset in the exe file. The idea was that .Net was designed to be platform neutral, so Microsoft could not assume that the facilities provided by the PE file format would be available in the file formats of other operating systems where .Net would one day run. For this reason, the .Net specific content assumes only that, whatever the parent file format enclosing it, it can be laid out as one large, well-defined binary blob.
A description of the CorHdr, the metadata layout and other intricacies of the underlying system can be found (in varying detail) in the ECMA specification of the Common Language Infrastructure. Partition II, which describes metadata, is the one that you would want to look at in this regard.
http://msdn.microsoft.com/net/ecma/
Another interesting resource for the technical mind is Serge Lidin’s book “Inside Microsoft .NET IL Assembler”. The book is also available as an Indian edition, so it is affordable.
Trashbin lets you view this part of the managed exe file.
These are the relevant switches:
/corhdr display the common language runtime header
/mdhdr display metadata headers
/md:Strings display metadata stream #Strings
/md:Blob display metadata stream #Blob
/md:US display metadata stream #US (user strings)
/md:GUID display metadata stream #GUID
/md:#~ display optimised metadata tables stream-header
/mdtab display optimised metadata tables
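To give a feel for what a switch like /corhdr has to read, here is a small Python sketch that unpacks the leading fields of the CLR header. The field order follows the IMAGE_COR20_HEADER structure in corhdr.h (cb, runtime version, the metadata directory, flags, entry point token, followed by six more directory entries, 72 = 0x48 bytes in all, which matches the Size of entry 14 in the dump above). The sample values other than cb are invented for illustration and are not taken from HelloWorld.exe.

```python
import struct
from collections import namedtuple

CorHeader = namedtuple(
    "CorHeader",
    "cb major minor metadata_rva metadata_size flags entry_point_token",
)

COR_HEADER_SIZE = 72  # 0x48, the full size of IMAGE_COR20_HEADER

def parse_cor_header(raw: bytes) -> CorHeader:
    """Unpack the first 24 bytes of the CLR header; the rest is ignored here."""
    if len(raw) < COR_HEADER_SIZE:
        raise ValueError("CLR header is 72 bytes long")
    return CorHeader(*struct.unpack_from("<IHHIIII", raw, 0))

# Hypothetical header: only cb = 0x48 is grounded in the dump above
sample = struct.pack(
    "<IHHIIII", 0x48, 2, 0, 0x20E4, 0x2D0, 0x1, 0x06000003
).ljust(COR_HEADER_SIZE, b"\0")

hdr = parse_cor_header(sample)
# A metadata token is a table number in the top byte plus a row number (RID)
print(hex(hdr.cb), hdr.entry_point_token >> 24, hdr.entry_point_token & 0xFFFFFF)
```

The entry point token in the sample is chosen to point at row 3 of table 6 (Method); that value is an assumption for the example, not something read from the file.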
Trashbin is not a disassembler; it simply lets you view the other data that is present in the exe/dll file. I figured I don’t need to do a disassembler because ILdasm and several other tools do the job so well. ILdasm, btw, is written by Serge Lidin.
The metadata in the managed file is divided into a set of streams. A stream is like an area of memory reserved for content of a specific type. The streams are:
#Strings
#US
#Blob
#GUID
#~ and #-
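The streams are located through a small directory that follows the metadata root header. The Python sketch below walks that directory. The layout it assumes is my reading of ECMA Partition II and is not described in this post: a "BSJB" signature, a length-prefixed version string, a stream count, then one (offset, size, name) header per stream, with each name zero-padded to a 4-byte boundary. The builder function and the offsets and sizes in the sample are invented purely to have something to parse.

```python
import struct

METADATA_SIGNATURE = 0x424A5342  # the bytes "BSJB" read as a little-endian integer

def pad4(data: bytes) -> bytes:
    """Zero-pad data to a multiple of 4 bytes."""
    return data + b"\0" * (-len(data) % 4)

def build_metadata_root(version: bytes, streams) -> bytes:
    """Build a synthetic metadata root + stream headers (test helper only)."""
    ver = pad4(version + b"\0")
    out = struct.pack("<IHHII", METADATA_SIGNATURE, 1, 1, 0, len(ver)) + ver
    out += struct.pack("<HH", 0, len(streams))
    for name, offset, size in streams:
        out += struct.pack("<II", offset, size) + pad4(name.encode("ascii") + b"\0")
    return out

def parse_stream_headers(md: bytes):
    """Return a list of (name, offset, size) read from a metadata root."""
    sig, _major, _minor, _reserved, ver_len = struct.unpack_from("<IHHII", md, 0)
    if sig != METADATA_SIGNATURE:
        raise ValueError("not a metadata root")
    pos = 16 + ver_len                      # skip the padded version string
    _flags, count = struct.unpack_from("<HH", md, pos)
    pos += 4
    streams = []
    for _ in range(count):
        offset, size = struct.unpack_from("<II", md, pos)
        pos += 8
        end = md.index(b"\0", pos)          # name is null terminated
        name = md[pos:end].decode("ascii")
        pos += (len(name) + 1 + 3) & ~3     # name + terminator, rounded up to 4
        streams.append((name, offset, size))
    return streams

# Hypothetical offsets and sizes, chosen only for the example
SAMPLE_STREAMS = [
    ("#~", 0x6C, 0x144), ("#Strings", 0x1B0, 0xB8), ("#US", 0x268, 0x1C),
    ("#GUID", 0x284, 0x10), ("#Blob", 0x294, 0x3C),
]
root = build_metadata_root(b"v1.1.4322", SAMPLE_STREAMS)
parsed_streams = parse_stream_headers(root)
for name, offset, size in parsed_streams:
    print(f"{name:<9} offset={offset:#x} size={size:#x}")
```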
The #Strings stream keeps all the identifier strings in the application: names of classes, methods, parameters, namespaces, assemblies etc. So this stream basically contains all the kinds of names that are part of the source code itself; this is where the names used by the reflection library come from.
The #US stream is the one for user defined strings. So when you say Console.WriteLine(“Hello World”) in your program, the “Hello World” goes into #US and the names Console and WriteLine go into #Strings. The strings present in #US are Unicode (UTF-16) strings, so each character is two bytes.
Here are some nice and friendly hex dumps of these regions from an exe file, followed by the corresponding information being ripped by trashbin:
#Strings
0000:0460│ 00 00 00 00 00 00 00 00 │ 00 3C 4D 6F 64 75 6C 65 │ ▒▒▒▒▒▒▒▒│▒<Module
0000:0470│ 3E 00 48 65 6C 6C 6F 57 │ 6F 72 6C 64 2E 65 78 65 │ >▒HelloW│orld.exe
0000:0480│ 00 6D 73 63 6F 72 6C 69 │ 62 00 53 79 73 74 65 6D │ ▒mscorli│b▒System
0000:0490│ 00 4F 62 6A 65 63 74 00 │ 43 61 6C 63 00 48 65 6C │ ▒Object▒│Calc▒Hel
0000:04A0│ 6C 6F 57 6F 72 6C 64 53 │ 61 6D 70 6C 65 00 48 65 │ loWorldS│ample▒He
0000:04B0│ 6C 6C 6F 57 6F 72 6C 64 │ 00 41 64 64 00 2E 63 74 │ lloWorld│▒Add▒.ct
0000:04C0│ 6F 72 00 4D 61 69 6E 00 │ 53 79 73 74 65 6D 2E 44 │ or▒Main▒│System.D
0000:04D0│ 69 61 67 6E 6F 73 74 69 │ 63 73 00 44 65 62 75 67 │ iagnosti│cs▒Debug
0000:04E0│ 67 61 62 6C 65 41 74 74 │ 72 69 62 75 74 65 00 61 │ gableAtt│ribute▒a
0000:04F0│ 00 62 00 61 72 67 73 00 │ 43 6F 6E 73 6F 6C 65 00 │ ▒b▒args▒│Console▒
0000:0500│ 57 72 69 74 65 4C 69 6E │ 65 00 45 78 63 65 70 74 │ WriteLin│e▒Except
0000:0510│ 69 6F 6E 00 67 65 74 5F │ 4D 65 73 73 61 67 65 00 │ ion▒get_│Message▒
>trashbin HelloWorld.exe /md:Strings
METADATA STREAM #Strings
Offset : "String"
0x1 : "<Module>"
0xA : "HelloWorld.exe"
0x19 : "mscorlib"
0x22 : "System"
0x29 : "Object"
0x30 : "Calc"
0x35 : "HelloWorldSample"
0x46 : "HelloWorld"
0x51 : "Add"
0x55 : ".ctor"
0x5B : "Main"
0x60 : "System.Diagnostics"
0x73 : "DebuggableAttribute"
0x87 : "a"
0x89 : "b"
0x8B : "args"
0x90 : "Console"
0x98 : "WriteLine"
0xA2 : "Exception"
0xAC : "get_Message"
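A listing like the one above can be produced by walking the heap from one zero byte to the next. The Python sketch below is one way to do it, not trashbin's actual routine. It deliberately makes no assumption about trailing padding: runs of zero bytes are skipped, and a final string without a terminator is still reported. The sample heap is rebuilt from the first few names in the dump.

```python
def walk_strings_heap(heap: bytes):
    """Yield (offset, text) for each non-empty string in a #Strings heap."""
    pos = 1  # offset 0 holds the empty string
    while pos < len(heap):
        end = heap.find(b"\0", pos)
        if end == -1:       # unterminated tail: take whatever is left
            end = len(heap)
        if end > pos:       # an empty entry is just a padding zero, skip it
            yield pos, heap[pos:end].decode("utf-8")
        pos = end + 1

# First six names from the HelloWorld.exe dump, each null terminated
names = ["<Module>", "HelloWorld.exe", "mscorlib", "System", "Object", "Calc"]
heap = b"\0" + b"".join(n.encode("utf-8") + b"\0" for n in names)
heap += b"\0" * (-len(heap) % 4)   # pad to a 4-byte boundary, as compilers do

listing = list(walk_strings_heap(heap))
for offset, text in listing:
    print(f"{offset:#x} : \"{text}\"")
```

The offsets printed for these six names agree with the trashbin output above (0x1, 0xA, 0x19, 0x22, 0x29, 0x30).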
#US
0000:04B0│ 65 00 00 00 00 17 48 00 │ 65 00 6C 00 6C 00 6F 00 │ e▒▒▒▒↨H▒│e▒l▒l▒o▒
0000:04C0│ 20 00 57 00 6F 00 72 00 │ 6C 00 64 00 00 00 00 00 │ ▒W▒o▒r▒│l▒d▒▒▒▒▒
>trashbin test.exe /md:US
METADATA STREAM #US
0x1, (23 bytes)
Txt: H.e.l.l.o...W.o.r.l.d..
Hex: 48 00 65 00 6c 00 6c 00 6f 00 20 00 57 00 6f 00 72 00 6c 00 64 00 00
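The 0x17 byte in front of the text is a length prefix. A Python sketch of decoding one #US entry follows, assuming the compressed-length encoding from ECMA Partition II (one, two or four bytes depending on the top bits of the first byte) and assuming that the final byte of each entry is a flag rather than text. The function names are mine; the heap bytes are the ones shown in the dump above.

```python
def read_compressed_uint(buf: bytes, pos: int):
    """Decode an ECMA-335 compressed unsigned integer; return (value, next_pos)."""
    first = buf[pos]
    if (first & 0x80) == 0:            # 0xxxxxxx: one byte
        return first, pos + 1
    if (first & 0xC0) == 0x80:         # 10xxxxxx: two bytes
        return ((first & 0x3F) << 8) | buf[pos + 1], pos + 2
    if (first & 0xE0) == 0xC0:         # 110xxxxx: four bytes
        value = ((first & 0x1F) << 24) | (buf[pos + 1] << 16) \
            | (buf[pos + 2] << 8) | buf[pos + 3]
        return value, pos + 4
    raise ValueError("invalid compressed integer")

def read_user_string(heap: bytes, offset: int) -> str:
    """Read the UTF-16 string stored at the given #US offset."""
    length, start = read_compressed_uint(heap, offset)
    if length == 0:
        return ""
    raw = heap[start:start + length]
    return raw[:-1].decode("utf-16-le")  # drop the trailing flag byte

# The #US bytes from the dump: empty entry, then length 0x17 and the text
us_heap = bytes.fromhex(
    "00 17 48 00 65 00 6c 00 6c 00 6f 00 20 00 57 00"
    " 6f 00 72 00 6c 00 64 00 00"
)
hello_text = read_user_string(us_heap, 1)
print(hello_text)
```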
I will just skip over the #GUID and #Blob streams for now. The #~ stream is the interesting one: it is the stream that actually contains the metadata tables. There is an alternate stream that can be present instead, which is the #- stream. The #- stream again contains metadata tables, but these are called the un-optimized tables because certain sort orders are not maintained in them. The Microsoft compilers always emit optimized tables, and since I have not been using any other compilers (Mono too seems to emit optimized tables) I don’t have support for #- in trashbin.
Let’s focus on #~. The #~ stream is the real metadata, if you would like to think of it that way. It is actually a small relational database that is compressed down to the last bit. There are a large number of tables (with predefined schemas) that can occur here. These tables provide information about the exe or dll (or precisely the assembly) that they are trying to describe.
These tables cross reference each other as well as reference entries in the other streams, such as the #GUID and #Strings streams. The trashbin option /md:#~ gives you the header of #~ stream, which is a kind of summary view of what the stream contains:
>trashbin HelloWorld.exe /md:#~
METADATA STREAM #~
TABLES HEADER
MajorVersion = 1
MinorVersion = 0
HeapSizes = 0
#String Index = 2 bytes wide
#GUID Index = 2 bytes wide
#Blob Index = 2 bytes wide
Valid = 0x00000900021547
Sorted = 0x0002003301fa00
METADATA Tables
RID. TableName [No of Rows]
0. Module [1] Row=10 bytes
1. TypeRef [4] Row=6 bytes
2. TypeDef [3] Row=14 bytes
6. Method [4] Row=14 bytes
8. Param [3] Row=6 bytes
10. MemberRef [5] Row=6 bytes
12. CustomAttribute [1] Row=6 bytes
17. StandAloneSig [2] Row=2 bytes
32. Assembly [1] Row=22 bytes
35. AssemblyRef [1] Row=20 bytes
Table Count = 10
The above listing says that helloworld.exe contains 10 tables in its metadata headers.
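The table list comes straight out of the Valid field: each set bit n means table number n is present. The Python sketch below decodes the Valid value from the listing, and also the HeapSizes byte, assuming (per ECMA Partition II) that bits 0x01, 0x02 and 0x04 switch the #Strings, #GUID and #Blob indexes from 2 to 4 bytes. The name map only covers the tables that appear in this dump.

```python
# Names for the table numbers that show up in the HelloWorld.exe listing
TABLE_NAMES = {
    0: "Module", 1: "TypeRef", 2: "TypeDef", 6: "Method", 8: "Param",
    10: "MemberRef", 12: "CustomAttribute", 17: "StandAloneSig",
    32: "Assembly", 35: "AssemblyRef",
}

def tables_present(valid: int):
    """Return the table numbers whose bit is set in the 64-bit Valid mask."""
    return [bit for bit in range(64) if (valid >> bit) & 1]

def heap_index_widths(heap_sizes: int):
    """Return the index width in bytes for each heap, from the HeapSizes byte."""
    return {
        "#Strings": 4 if heap_sizes & 0x01 else 2,
        "#GUID": 4 if heap_sizes & 0x02 else 2,
        "#Blob": 4 if heap_sizes & 0x04 else 2,
    }

VALID = 0x00000900021547   # copied from the /md:#~ output
present = tables_present(VALID)
for number in present:
    print(number, TABLE_NAMES.get(number, "?"))
print(heap_index_widths(0))
```

Counting the set bits gives the same 10 tables that trashbin reports.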
To give you a taste of what I am talking about, let’s look at some of these tables.
To look at the metadata tables, use the option:
/mdtab display optimised metadata tables
This is the table called TypeDef that has a listing of all the types defined in this assembly.
[RID=2] Table TypeDef
[ DATA]Flags [STRING]Name [STRING]Namespace [CI:64 ]Extends [RID: 4]FieldList [RID: 6]MethodList
1.[ 0x ]0 [ 0x ]1 [ 0x ]0 [RID: 2] 0 [RID: 4] 1 [RID: 6] 1
2.[ 0x ]100001 [ 0x ]30 [ 0x ]35 [RID: 1] 1 [RID: 4] 1 [RID: 6] 1
3.[ 0x ]100001 [ 0x ]46 [ 0x ]35 [RID: 1] 1 [RID: 4] 1 [RID: 6] 3
The table has a field called Name, which is the name of the type. The values of the Name field are actually offsets into the #Strings stream. So if you care to compare (I have provided a listing of the #Strings stream earlier), what the table is saying is that this assembly has 3 types, namely <Module>, Calc and HelloWorld. At this point let me drop in the source code of the HelloWorld.cs file from which this assembly was compiled:
//HelloWorld.cs
namespace HelloWorldSample
{
    using System;
    public class Calc
    {
        public int Add(int a, int b)
        {
            return a+b;
        }
    }
    public class HelloWorld
    {
        public static void Main(string[] args)
        {
            try
            {
                Calc c = new Calc();
                Console.WriteLine(c.Add(10,20));
            }
            catch(Exception e)
            {
                Console.WriteLine(e.Message);
            }
        }
    }
}
As you can see, the source does define two classes called Calc and HelloWorld. The third type, <Module>, is added by the compiler to every module to hold global fields and methods. A type in .Net terms is a very strict entity that encompasses value types and reference types, and type entities like classes, structures, enums etc.
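The lookup from a TypeDef row to a readable type name can be sketched in Python as follows. The Flags, Name and Namespace values are the three TypeDef rows shown earlier, and the heap is rebuilt from the names in the #Strings listing so that the offsets line up. The function names are hypothetical.

```python
def read_heap_string(heap: bytes, offset: int) -> str:
    """Return the null-terminated string that starts at the given offset."""
    end = heap.index(b"\0", offset)
    return heap[offset:end].decode("utf-8")

# #Strings heap rebuilt from the listing, up to and including "HelloWorld"
NAMES = [
    "<Module>", "HelloWorld.exe", "mscorlib", "System", "Object",
    "Calc", "HelloWorldSample", "HelloWorld",
]
STRINGS_HEAP = b"\0" + b"".join(n.encode("utf-8") + b"\0" for n in NAMES)

# (Flags, Name offset, Namespace offset) from the TypeDef table dump
TYPEDEF_ROWS = [
    (0x0, 0x1, 0x0),
    (0x100001, 0x30, 0x35),
    (0x100001, 0x46, 0x35),
]

def full_type_names(rows, heap):
    """Join each row's namespace and name into a dotted type name."""
    result = []
    for _flags, name_offset, namespace_offset in rows:
        name = read_heap_string(heap, name_offset)
        namespace = read_heap_string(heap, namespace_offset)
        result.append(f"{namespace}.{name}" if namespace else name)
    return result

type_names = full_type_names(TYPEDEF_ROWS, STRINGS_HEAP)
print(type_names)
```

Offset 0 resolves to the empty string, which is why <Module> has no namespace.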
Here is a table that shows me what types are being referenced in this assembly:
[RID=1] Table TypeRef
[CI:75 ]ResolutionScope [STRING]Name [STRING]Namespace
1.[RID:35] 1 [ 0x ]29 [ 0x ]22
2.[RID:35] 1 [ 0x ]73 [ 0x ]60
3.[RID:35] 1 [ 0x ]90 [ 0x ]22
4.[RID:35] 1 [ 0x ]A2 [ 0x ]22
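The referenced types are Object, DebuggableAttribute, Console and Exception. All four are resolved through row 1 of the AssemblyRef table (table 35), which is the reference to mscorlib.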
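Trashbin prints columns such as ResolutionScope and Extends already decoded into a table number and a row. In the file itself they are stored as coded indexes: the low bits select a table from a small fixed list and the remaining bits hold the row number. A Python sketch of that decoding, using the tag orders I take from ECMA Partition II, is below. The raw values 6 and 5 are what this encoding would give for the rows shown above; they are computed for the example and do not appear in the dumps.

```python
# Candidate tables for two coded index kinds, in tag order
RESOLUTION_SCOPE = ("Module", "ModuleRef", "AssemblyRef", "TypeRef")
TYPE_DEF_OR_REF = ("TypeDef", "TypeRef", "TypeSpec")

def decode_coded_index(value: int, tables):
    """Split a coded index into (table name, row number)."""
    tag_bits = (len(tables) - 1).bit_length()   # bits needed to pick a table
    tag = value & ((1 << tag_bits) - 1)
    if tag >= len(tables):
        raise ValueError("tag does not name a table")
    return tables[tag], value >> tag_bits

def encode_coded_index(table: str, row: int, tables) -> int:
    """Inverse of decode_coded_index."""
    tag_bits = (len(tables) - 1).bit_length()
    return (row << tag_bits) | tables.index(table)

# ResolutionScope of every TypeRef row above: AssemblyRef, row 1
scope = decode_coded_index(6, RESOLUTION_SCOPE)
# Extends of Calc and HelloWorld in the TypeDef table: TypeRef, row 1 (Object)
base = decode_coded_index(5, TYPE_DEF_OR_REF)
print(scope, base)
```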
Here is a list of all the methods that are defined within this assembly:
[RID=6] Table Method
[ DATA]RVA [ DATA]ImplFlags [ DATA]Flags [STRING]Name [ BLOB]Signature [RID: 8]ParamList
1.[ 0x ]2050 [ 0x ]0 [ 0x ]86 [ 0x ]51 [ 0x ]A [RID: 8] 1
2.[ 0x ]2064 [ 0x ]0 [ 0x ]1886 [ 0x ]55 [ 0x ]10 [RID: 8] 3
3.[ 0x ]2078 [ 0x ]0 [ 0x ]96 [ 0x ]5B [ 0x ]14 [RID: 8] 3
4.[ 0x ]20C8 [ 0x ]0 [ 0x ]1886 [ 0x ]55 [ 0x ]10 [RID: 8] 4
You can see the method names Add, Main etc. here, as well as auto-generated methods such as .ctor (the constructor).
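Which type owns which method is not stored per method. Each TypeDef row only records the first Method row that belongs to it (the MethodList column), and its run extends up to the next type's MethodList. A small Python sketch of that rule, using the MethodList values (1, 1, 3) and the method names from the two dumps, follows. The function name is mine.

```python
def methods_by_type(type_rows, method_names):
    """Map each type to its methods.

    type_rows is a list of (type name, MethodList) pairs, where MethodList
    is the 1-based row number of the type's first method.
    """
    owners = {}
    for i, (type_name, start) in enumerate(type_rows):
        if i + 1 < len(type_rows):
            end = type_rows[i + 1][1]       # run stops where the next type starts
        else:
            end = len(method_names) + 1     # last type runs to the end of the table
        owners[type_name] = method_names[start - 1:end - 1]
    return owners

TYPES = [("<Module>", 1), ("Calc", 1), ("HelloWorld", 3)]
METHODS = ["Add", ".ctor", "Main", ".ctor"]   # Method table rows 1 to 4

owners = methods_by_type(TYPES, METHODS)
for type_name, methods in owners.items():
    print(type_name, methods)
```

<Module> gets an empty run because its MethodList equals that of the next type.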
In short, if you examine the metadata tables carefully you get an idea of the extent of the meta information being stored about your program and of what can be supported by any reflection based API. A complete listing of the metadata tables is provided in the ECMA specs, so that’s your best reference. Mr Lidin’s book also does a good job of this, and I used both of these while I was writing trashbin.
So there – I have said what I wanted to say about trashbin. This might be a good starting point to explore the exe and dll file formats.
Before I stop, I must add, do take a look at the switches
/exp display export table
/imp display import table
These show the functions that are exported from an exe/dll (so that it may be dynamically loaded and invoked by another program) and the functions that it imports from other dlls. This is kind of the ‘metadata’ of the old unmanaged world.
Typing trashbin ntdll.dll /exp in the system32 folder will give you a listing of the native NT api, which might make for good reading again (if that sort of stuff gets your interest).
Most of the content of this blog entry was presented at the Bangalore .Net User Group as one of the UG meeting talks.
All that said, here is the real reason I started out on this blog entry: Recently Kaushik Srenevasan pointed out a bug in trashbin’s #Strings parsing routine. Kaushik is a ‘MS Student Ambassador’ and has a blog here: http://dotnetjunkies.com/WebLog/kaushik/
I had, for some reason, assumed the presence of padding 0s when I wrote the original code. That more or less always worked, because the #Strings stream is expected to be padded with zeros till it reaches a 4-byte boundary.
It has been patched according to Kaushik’s suggestion, so thank you, Kaushik.
Trashbin can be downloaded from its homepage here:
http://www.thinkingms.com/pensieve/homepage/work/trashbin/trashbin.htm
As a footnote, I must say that the Microsoft Dumpbin program that ships with Visual Studio does most of the things that trashbin does and a few additional things. Dumpbin, however, does not display metadata information from managed PE files yet.