15th Jan 2003, 12.30 am, Wednesday
Roshan James
rosh_spark@gmx.net
rosh_spark@hotmail.com
This implementation is based on the .Net GC as described in Jeffrey Richter's MSDN article. I started out writing the GC because (a) I had heard that the .Net GC was one the best GCs ever written (b) I was mortally afraid of something like this because I had defined the limitations of my understanding to not be able to include something as complex as a GC (c) the article was very good and made me feel like I understood the whole thing. Eventually I began wondering if there are any unanswered questions left. There were many regarding the basic implementation, as described by Mr. Richter, and they have popped up in a more pertinent way now that I have tried to implement a GC of my own, they probably wouldn't have if I hadn't tried this. So largely writing the GC was out of my own curiosity and to possibly get some company of people at my level, who would like to try a more childish implementation before they move onto the real thing.
Jeffrey Richter's GC Algorithm articles at MSDN Magazine
http://msdn.microsoft.com/msdnmag/issues/1100/GCI/
http://msdn.microsoft.com/msdnmag/issues/1200/GCI2/
Spirit
To avoid potential show stoppers let me say that at the end of this article
you are not going to go home with a usable GC. Also this entire thing is just
probably a piece of wasted engineering through which I wanted to stretch my own
imagination and in the process get some questions about the .Net gc ironed out.
Unless the GC interests you this is probably a waste of time to read.
This has been developed when I have had no formal education that covers GCs in
any way and I haven't peeped into rotor to see how it is actually being done. I
just wanted to keep it real simple and understandable from the non-compscience-proffessor
perspective.
This implementation is a long way of asking some basic questions, that is why
the questions index.
Please accept it in the spirit in which it is offered.
How does the GC locate the roots of the applications ?
How does the GC know the references that a given object will contain ?
How does back patching pointers work ?
How is the finalize queue implemented ?
How is weak reference table implemented ?
Generations : Is this on the right track ?
Disclaimer.
Implementation:
How objects are allocated on the heap
References to Objects and Object that hold references :
The Roots of the application
Code
Mark and sweep
The ref table and fref stack
Finalizers - finalize queue and freachable queue
Weak references
Generations
Finally
Acknowledgements
This is only meant to be my interpretation of the GC algo and is written in the
spirit of the .Net GC. This is not meant to be factually correct about the
actual GC in terms of algo, design choices or data structures.
This GC in its current form is not practically useful and cannot be plugged into a
'runtime'
I will assume that you have read the articles in msdn magazine and continue
from there. If you haven't, please do so now. If you have lets get straight into
the matter.
In the first few sections I will describe how I have created the equivalent of a
heap, objects, reference within objects and application roots.
The central entity in the gc is the managed heap. It is where the gc allocates space for the memory requests by the application. In the managed environment every block of memory requested by the program is used to hold some specific data type.

The heap represents the managed heap stack and heap_top is top pointer. In my GC the memory used for the heap is allocated by the function gc_init() dynamically, This function also initializes the other data structures of the gc. Of course in the real runtime the way space is initially allocated to the managed heap would be rather different.
How objects are allocated on the heap
The main structure that you will need to be familiar with is 'object'. An 'object' describes an instance of an object/a block of allocated memory on the managed heap.

The important member of 'object' to note here is object::size. Whenever a memory block is to be allocated on the managed heap, the gc is to be told the number of bytes required for it. This size is the valued contained in the object::size.
The GC creates an instance of object on the heap, immediately following which
is the memory area of size bytes. Whenever the gc allocates memory on the
managed heap for an application, the total memory used on managed heap (in
bytes) will be:
space allocated = sizeof(object) + size_requested

The 'object' class is thus a header information for each memory block, that defines certain characteristics of the
memory region located immediately after it. On the managed heap the request of
a memory allocation would look as shown above.
The managed heap consists of many such allocations - each allocation
starting with a block defined by the 'object' structure that defines the memory
region following it.
Notice from the diagram that the allocation of memory starts from the bottom of
the heap and proceeds upwards. The first 'object' will have the same base
address as the heap base. The next object will be at location
sizeof(object)+object::size and so on.

References to Objects and Objects that hold references :
Now any object can contain references to other objects right ? Since this is meant to be a toy gc I decided to implement references within objects as follows. Every memory allocation is treated as an object. A reference to any object is created through a root_entry. Every object (optionally) contains an array of root entries. A root_entry in physical terms is a structure that contains a pointer to an object and certain other members. The number of root entries that can be contained by an object is simply object::size/sizeof(root_entry). I use the memory allocated per object to (ie the dark blue area immediately above each red area in the above diagram) to be the region used for creating the objects root_entries.
A root_table is an array of root_entries. Each object can be asked to return a root_table which acts as an array of root_entries. The maximum number of root_entries that the object can contain is determined by its size. Each root entry can contain a short name string and a pointer to another object. Thus an object (object structure + its attached memory region) would now look as follows :

Now, I know that this is not how it works in a real gc at all.
This is just to simulate the effect of an object that has references to other
objects. This way I can add any number of pointers to an object (limited by the
amount of space allocated to it) and reference other objects from this one.
In a real situation the references contained in an object will be defined by the
kind of class that the object is an instance of. The runtime will have this
knowledge according to the definition of the class.
If the gc is going to be used for some practical purpose I will have to remove
this idea of creating root_entrys in the user requested space. I will need
another mechanism to detect the presence of the references within an object. But
we need not cross that bridge now, for our purpose of simulating objects that
have references to other objects this will do.
The Roots of the application :
The roots of the applications are the set of all the pointers held by the application to various objects used by the application. The complete root set of the application are all pointers that reside in applications global area/ static heap/ stack - local to method calls .. etc.
Now the way I look at it, there is a mechanism by which the runtime can access all the roots of the application and thus decide which objects they reference on the managed heap. The set of all reachable objects on the managed heap would be the set of all objects reachable from the applications roots, and objects referenced by each of those objects, and so on recursively.
I just needed some mechanism to have set of pointers to some
objects on the stack. In the toy gc I have chosen to allocate a rather large
object outside of the heap and used that as a global main object. The root
entries in the memory region of that 'main' object is treated as the applications
roots.
So if I am to simulate the effect of a method creating an instance of object on the managed heap, In
my GC I would (1) call gc_alloc() to allocate the object on the
managed heap and (2) use the address returned by gc_alloc as an entry in the root_table
exposed by the main_object - in effect adding one more
pointer to my application roots.
The point is that the exact way in which the gc simulates .Net runtime's accessing the program's roots is not an important factor here, as long as I can simulate the effect of the process.

In reality each object on the heap would be referenced not just from the roots of the application but also from other objects, and might also contain references to other objects within itself. The following is more realistic :

The present setup is similar to what Mr. Richter had at the beginning of the article. This shall for the basis on which I build up the rest of the gc and questions and decisions of the rest of the implementation.
Question : How does the real GC keep track
of the roots of the application ? The roots are spread out over many memory
areas of an application and further since the application has actually run
through the JIT at the time of execution, these need to be location accessed by
actual machine code and at times even parameters of operations. How are these
tracked by the GC ?
How are the references in a object of a particular type tracked by the GC ? this
is a known set unlike the roots of the application, and can be determined by
looking at the
definition of the class. However does the GC maintain a list of locations for
each class, that are references, in the classes instances ? How are these tracked ?
Maintaining such lists are both expensive and memory hogging for the GC aren't
they ?
These are the definitions of some of the important things mentioned here : possibly worth a look.
| #object | |||
struct object
{
int flag_type_indicator:2;
int flag_used:1;
int flag_suppress_finalize:1;
int flag_modified:1;
object ** pthis;
object ** ppwref;
int size;
int rtab_count;
int rtab_max;
};
|
|||
| #root_entry | |||
// the following structures are used for simulating
// pointers/refrences/roots
//
//root_entry - denotes single root
struct root_entry
{
char name[5]; //name of pointer
object ** ppob; //pointer to object
int flag_valid:1;
};
//
// root_table - an array of roots
//(usually located in the extra bytes after
// an object)abstracted as a table.
struct root_table
{
object * parent_object;
root_entry * rtab;
int * rtab_max;
int * rtab_count;
};
|
|||
| #weak references | |||
//weak_reference[]s will form the wref tables
struct weak_reference
{
int flag_type_indicator:2;
int flag_used:1;
weak_reference ** pthis;
object ** ppob;
};
|
|||
| #gc calls | |||
// GC
//
extern void gc_collect(int generation=0);
extern void gc_init();
extern object ** gc_alloc(object *pob);
extern void gc_register_for_finalize(object ** ppob);
extern void gc_clear_freachable();
extern object ** gc_add_wref(object ** ppob,int flag_wref_type);
extern void gc_reset();
//contains the root table (global) of the application
extern root_table main_rtab;
|
|||
The roots of the application and the references of objects to each other form a graph of objects in the computer memory. All these objects reside in the managed heap. Marking is basically the process by which the GC determines which objects are used. The basic approach is that the GC does a recursive depth first search of the all objects visitable from any given object starting at the roots of the application. Every time an object is encountered it is marked as used, by setting a flag in the 'object' structure.

Consider the above example. The graph formed among the objects will look like this :

A DFS (depth first search) of this graph would not visit the two unreferenced objects and these will not have their 'used' flag set. After the mark phase, the GC will have a colored graph of the heap, something like this :

Here the unreferenced objects have been shown as grey. The GC now goes for its compaction phase - the sweep of the managed heap. In the sweep phase, it collects toward the bottom of the heap, all the objects that have their used flag set. Cool ? Since I can traverse through all the objects on the heap by the simple formula :
object * pob =
(object *) heap; //first object
pob = (object *)((char *)pob + pob->size + sizeof(object));
//next object
Now that I can visit each object, sequentially starting from the bottom-most, I need to visit them in O(n) time and decide to push them down in memory by a simple memcpy() (lets just take that as atomic for now). Most of this however would be known to you if you have been through Mr. Richter's article. After my compaction phase I would not have these grey objects left on the stack, instead of which I would have only the objects that have their used flag set during the 'mark' phase. Thus the GC has swept all garbage away !
Now the only thing to do is to patch all the pointers in the application so that they point to the new locations of the objects..... !!! what ? In the above example its only one object that moved down in memory. But that is not the case in a real situation, there are many objects that have moved. How am I going to find all the pointers that point to each of those objects ?
Question : How are pointers/references back
patched after the objects have been moved ?
(1) Does each object maintain an array of all the addresses of all pointers that
point to it ? If this is so (a) this will have to dynamically grow as references
to am object change (b) memory management of these lists per object would become
a major issue (c) its an expensive task as the runtime has to constantly keep
track of need to change these.
(2) Does the GC create such a structure temporarily during its mark phase ? per
object, per reference ? So when an object dies that list is to be searched for all pointers
to that object ? What is the efficiency of such a search algorithm, what are the
data structures used ? Further the application will have to be frozen when this
happens, because if the application creates references to an object after the
references to it have been considered, the new reference will not get
back patched...
(3) Does the GC search the entire applications memory graph to find references to
each of the objects ? Unlikely, the DFS is expensive, doing it twice is a rather
sad thing.
How, what is the mechanism used ?
The following is the approach that I have chosen to work around these - now you know why I call this a toy GC. :)
I decided to route the pointers to objects through an indirect reference. Each object on the stack has only one unique location that knows about its actual address in memory. The table of all such pointers to objects on the heap is called the ref table (short for reference table). The following picture shows the relation of an object with its ref table entry.

Every object contains a pointer to its reference table entry called the 'pthis' (makes me smile). If the object is moved around in memory the object goes on to to change the value in *pthis to this and the reference is patched. So when the GC moves objects around in memory, it calls the back_patch_ref() method of each object when it moves. This method will modify the reference table entry and thus the ref table will always to point to the valid location of the object.
Now all references in the application will be actually references to the ref table, which will contain the valid address of the object in memory. So the above diagrams to pointers among objects and app roots will actually look like this, by adding the ref table to the picture.

I have not shown the links from the object to ref table (through pthis).
This solves the problem of back patching references, but brings
along some baggage of its own.
Lets for a minute assume that the MS implementation actually used a ref table.
Does that mean that every access to an object in .Net is through an indirect
memory addressing scheme, two memory fetches per access, every access ? Isn't
that a little expensive ? Lets argue this both ways, if this was actually the
case, the ref table for a 32bit processor is 32bits per ref. That means that if
an application has 10 thousand objects, the ref table would come to < 40k, this
can reside in the processor cache, frequently used sections will, so this will
be rather fast.
Of course, this is only as long as they have 'ref table'.
Another possibility is that during DFS traversal, they would have changed the
addressing mechanism of the application to actually call a stub. The stub when
invoked will go on to back patch the actual reference in the application with
the latest address available in the ref table. This is simply empty fantasizing,
they have to have an asm instruction that they can back patch safely in the
context of the application so that the application doesn't break when the stub
is invoked. Back patching wouldn't be a hassle in a Java style fully interpreted
framework, but in an actual runtime demand compiled JIT its not a simple stunt.
That said, lets consider another aspect, when an object is being allocated by the gc using gc_alloc(). The gc return the address of the ref table entry for the object, ie the value of the objects 'pthis'. How does the entry in the ref table get created ?
The ref table entries cannot be moved around in the ref table because there are actual pointers in the application that point to these entries. As objects are created and they die the ref table will have used and unused entries scattered throughout. So I will have to search the ref table linearly till I find a free entry. In worst case this will be O(n). That is not good. So I seem to have moved the problem of compaction from one place to another and seem to have removed famous O(n) allocation.
Is there a workaround ? Here is mine. I decided that if MS is moving along these tracks then they would still have a O(1) allocation. I just need to find free locations in my ref table in O(1) time. Stacks are O(1) data structures. I decided to give the ref table a supporting stack to keep track of free locations in the ref table. This stack is called fref stack.

The grey areas are vacant slots in the ref table. So every time I need to get a free location in ref table, I need to pop from fref and use the value as an index in my ref table.
void * free_ref
= ref_table + fref_stack [ fref_top -- ];
pob->pthis = (object **)free_ref; //get pthis to point to the
free entry
*(pob->pthis) = this; //make it point to
'this'
Cool ? So that's about my approach to the problem, that comes along with a serious performance hit regarding the indirect addressing for accessing objects.
Freeing an object's ref table entry : When an object is being released by the GC, the GC uses the object's pthis to locate the ref entry, calculates the offset of the entry in the ref table and pushes the offset in the fref stack, so that it can be used by other variables.
While not relevant here, what if the real GC also works like this ? what sizes will ref and fref have, after all these finite arrays. Well, I would say that multiple ref+fref combinations can be cascaded as a linklist. You could search the linklist serially to find a fref_top that is non zero to find a free entry somewhere. Optionally you can ensure that your first node in the linklist is always one that has a free node - this is simple to implement. Every time an object releases its entry the ref table can check if its the first node in the ref link list, else make itself the first. Similarly if a object is allocated and that causes fref top to be zero, then the node is moved to the end of the linklist, and if the next node has a zero fref (it means that all nodes between these have zero fref top) an empty node is added to the start of the linklist.
So at this point we have (to summarize) :
A method for allocating objects on the heap in O(1) time complexity
A method for marking the heap using DFS of
the application's graphs = time complexity of the DFS
A compaction method of the heap that is O(n) for the heap, O(1) per object -
assuming atomic memcpy()
A O(1) method for backpatching references
A O(1) method for deallocating an object - actually to remove the ref table
entry to the object
A indirect memory reference over head to access every object on the heap
Data overheads of the object structure per heap object, for the ref table and
fref stack.
That includes pros and cons. Lets move on.
Finalizers - finalize queue and freachable queue
The finalizer queue is an array of pointers to objects on the heap (indirectly through the ref table of course, from this point on when I mean pointer to an abject I will mean a pointer to the ref table entry that points to the heap, if otherwise I will say so explicitly). Now the finalizer queue needs the following properties.
It must be efficient to add an entry to this list
It must easy to remove an entry from this list
It must be easy to move over all entries in this list easily.
It ok to move the position of entries in this list because relative/absolute positions are not important.
No one links to the entries in this list unlike in the case of the ref table.
The Finalize queue is implemented as follows. It fundamentally
behaves like a stack, at least in terms of adding elements. It pushes new
entries onto the stack, the the finalize_top is the top pointer.
Unlike a conventional stack where elements are removed only from the top, here elements can be removed from anywhere - every time an
element at a interior position is to be cleared, I pop from the stack and insert the
popped element at the vacant middle
position. When come to think about it, its again an O(1) operation - though here
the cost of copying an entry makes a relatively expensive fixed cost step. Then
again if you compare that with the overhead of copying 4 bytes to the cost
implementing the finalize() for the object, The copy expense can probably be
made up for easily if the finalize handling was a little optimized - this is
open to debate... please do.

The freachable is implemented in the same way as the finalize is - I cant think of a case where the freachable needs to drop entries from random locations. That need arose in finalize because any object that needs a finalizer call can go out of use at any time. In comparison, all the freachable entries do need finalization - the .Net framework assures no order in which finalizers will be called among the various objects, so this can be a simple stack implementation. So I am not drawing anything here.
Question : How is the finalize queue implemented in the real framework ? Is this how it happens in the real framework ?
This is one more thick thing, bear with me.
The thing about weak references is that there are actual pointers from the application that point to these entries, so they cannot be moved around in the weak reference table (which shall now on be called the wref table).
It must be cheap to create a new weak ref. Thus it must be easy to make an entry in wref - it must be cheap to find a free slot in wref.
It must be cheap to remove weak references - this happens when there are no references in the program to the wref entry.
It must be cheap to set a wref entry to 0 and still retain its value on the wref table (because there are pointers in the application to the wref table entry that have yet to check to see if their object is still in memory)
Finally it must be easy to search through all the wref entries efficiently (to see which objects are marked as used)
Now if I use a model like the finalize queue, I have no way of preserving the positions of the entries, those are important here - app roots refer to these. Further the entries in the finalize queue had no need of knowing if they were being referred by anybody, so it was cheap for them to die when they don't point to a valid object. Here after the object dies, the value 0 has to be maintained as long as the program has a reference to the wref entry.
I cant use a simple idea like ref table earlier - the problem is that I need to access all the used entries fast. In the ref table I will have the traverse the entire array to hit all the used entries. It didn't sound like a good idea to maintain two supporting stacks, one holding used entries and one holding free entries to make insertion and removal cheap in a array where I cant move around the entries. Further it still doesnt let me know how long I will have to preserve entries there.
When you come to think about it, wref table entries behave a lot like actual objects of the heap. They need to know their life time - ie how long they are referenced by the application. Further if I rephrase the 'cannot be moved around' to need to be accessed from a fixed location - a lot of problems seem to lift.
The weak ref entries are treated as objects, similar to objects
on the heap. But they are slightly special objects in the sense that they reside
on a separate region, not the heap - a region where only wref objects reside.
Life time of these objects are determined during the mark phase of the GC, same
as for other objects. Weak references that are not marked as used are lost during compaction
of the wref table. To keep it easy to access all the used entries in the wref
table, I need to compact the entries. If I compact them, how can I have pointers to the
entries ? By treating them similar to the way as I treat other objects - I reference them
indirectly through the ref table. :)

Its the same diagram as before, the heap has been sidelined a bit and some entries are added to the ref table to point to the wref table entries. ref table entries to the wref table have been given a dark shade just to make them more noticeable - they are in effect not treated any differently from ref table entries to normal objects on the heap. Similarly, there can be any arbitrary number of references to a ref table entry, of a wref object.
Now what would a wref object look like ? It would essentially need a reference to the object that it is a weak reference of. Next it would need a 'pthis' so that it can back patch its address after it is compacted in the wref table. And it needs a flag that can be set during the mark phase.

The above picture shows the object and its ref table relation in grey lines (you are already familiar with that). Further it shows the same kind of relation for the wref object in blue lines. The pob member of the wref object is the pointer to the object. Similarly the object keeps a pwref back to the wref object, I will cover this later. These are shown in red.
How does this work ?
The wref table works as follows. During the mark phase wref
objects are marked as used, same as normal objects on the managed heap. Next after the mark is over wref is scanned
sequentially from 0 to wref top. This is a simple linear scan. Here objects that
don't have their used flag set are overwritten by objects found higher up the
stack that are moved down. This can be done in the heap object style
'compaction' or the finalizer queue style pop and overwrite (the later would be
cheaper). Each used object has its pob checked to see if the heap
object it is referring to is still valid. If not, pob is set to zero. Further, if
the wref object is being moved down in memory it back patches the ref table
using its 'pthis'.
After compaction is over the wref top is set to the new value. Or if the pop
method (which I have implemented) is used, the top will be already be at its correct value.
Similarly inserting into the wref table is just a push
operation.
The 'pwref' member for the object has been added because I see no need for a
heap object to have multiple wref objects. In case the user requests a wref
object, the system checks for nonzero pwref. If so it just returns to the user
the value of pwref. If pwref is zero then a new wref object is created, pwref is
set and then returned. Having this is at the cost of having an additional 4
bytes in the object. It gives us the benefit of not having multiple wref entries
to the same heap object.
Thus all the operations required on wref table can complete in bounded fixed finite time per wref entry. Thus all operations are O(1) for each wref object.
Short and Long weak ref tables are essentially the same, the difference is only according to when the table scan is called. One table is scanned after the initial mark phase. The other table is scanned after the freachable also marks used objects. Both can use the same approach.
Question : How does the .Net GC handle weak references ? How different is it from this ?
The last part of the GC. Sorry I don't have any nice pictures on this section, I don't have access to Visio anymore, you will have to bear with my writing skills.
'Generations' is largely a construct for improving the performance of the GC. Like Mr. Richter points out its faster to do all the GC operations for a part of the heap than for the whole heap. Objects are classified to belong to various generations depending upon how long they have lived. How long they have lived in this context is the equivalent of how many sweeps of the GC they have survived. Since objects are created in time-space order on the heap-stack, all the objects of a generation are close together. The heap is partition into three areas, generation 0, 1 and 2.
In the implementation, one will find the pointers :
heap_base - which always has the base address of the heap
heap_top - which has the location of the next free byte on the heap
heap0 - all objects between heap_top and heap0 are gen 0 objects
heap1 - all objects between heap0 and heap1 are gen1 and those between heap1 and heap_base are gen2
So the task of implementing generations resolves down into 2 functions for the GC (1) how are heap0 and heap1 set ? and (2) how do you sweep a particular part of the heap only.
Lets cover 2 first :
Sweep: Lets assume that the GC has just finished a sweep. All objects left have
survived so they are at least generation 1 objects. Now, program execution
continues and more objects are added to the heap in generation 0. Lets assume that
none of the old objects are modified at all during this part of the run.
When the
GC sweeps generation 0 objects :
Remember none of the objects in gen1 or 2 have been modified. Therefore none of
them will have any references to a gen0 object. So the reason for survival of a
gen0 object is only due to a root reference to it or a reference for another
object in gen0. Therefore the GC's DFS (recursive depth first search) can just
ignore any references from a gen0 object to higher generation as these paths
will never lead back to any gen0 object and its only gen0 objects that we mean
to collect. So the DFS is pruned so to speak and is now faster. It will also
mark only gen0 objects. Now all the above algorithms hold good as long as we
just assume that the managed heap actually begins from the current address of
heap0 (heap0 - heap_top is the address range for gen0 objects remember).
The same argument applies for generation 1 sweep. Here we assume that no
generation 2 objects are modified, and so we don't DFS into them.
Now, what if older generation objects change ? Coming back to our gen0 sweep, if a gen1 object has been written into, it might contain a reference to a gen0 object. If this object is not marked as used through a traversal of only gen0 objects it would have been collected by the GC. Hence the GC does separate DFS on each of the modified older generation objects. From each of these objects only paths through gen0 objects (if any will be traversed, because our DFS will not traverse down any references to a gen1 or gen2 object ). These individual DFS from these objects are usually short tree spans - the size of which is purely determined by the region of the graph in the gen0 region. Thus now even if objects of any generation are changed we have a method for doing a gen0 sweep.
In steps:
Do a DFS from the application roots, ignore all references to gen1,2 objects
Do DFS from each modified gen1 and gen2 object, ignore all member references to gen1,2 objects
Similar logic can be applied to a generation 1 sweep. A generation 2 sweep will do a full DFS of the entire heap and a compaction of the entire heap.
Just a minute, how do we know which objects are modified ? Good
question, there seems to be some magic happening here - Mr. Richter mention the
GetWriteWatch() kernel call. Let me just say that the msdn
reference about the function says that it returns a list of modified memory
pages of a region set up for memory watch. Well what I think that means is that
the OS API can help in tracking changes :) I didn't use that in my
implementation because (a) I wanted it to be portable to places where people can
just see the algo in effect even is they didnt have OS support for it and (b) I
just didn't understand how to use the function (we all (some of us) have our
memory blocks about what we are capable of doing, and this just frightened me
off), probably when I find a sample usage of the function I will be more
comfortable.
In the toy GC when a root_entry in an object is updated I mark
the object as modified. If the objects flag is newly set I add a direct
reference to this object in of two arrays depending on (and if) it belongs to
gen1 or gen2.
If a gen0 sweep is done, then the modified flag and entries are reset for all
gen1 objects. If a gen1 sweep occurs modified flags and entries of gen1 and gen2
objects are reset. If a gen2 sweep occurs, modified objects aren't important,
these flags are unset during the compaction phase.
Come to think of it, the way the GC is implemented as 3 generations, how rarely they expect gen1 sweeps to occur in comparison to gen0 sweeps, the three generation architecture is nice. More generations would have added to the overhead of keep the modification information, multiple DFSs etc and would probably not have added much to performance. These are not things that can be vacantly claimed, statistics and algorithmic analysis should prove it.
Question : Is this on the right track ?
That was a lot of typing and I feel tired. I usually have to be dragged around to write documents - but this was somehow a pleasure to work on. Basically because I am curious about the GC and secondly because it was easy to draw these diagrams using Visio. Amazing tool, as each diagram kept coming out, I felt I had to write about it.
The GC implementation isn't complete in the sense that I haven't thought much about optimizing the DFS by not traversing unmodified sub-branches. I haven't thought threading and synchronization issues. I haven't thought about references to actual objects that might only be a part of a allocated memory block ... etc, to name a few.
I am hoping to get a few answers through this effort and possibly consolidate that into an addendum to the MSDN article and get to know the GC better.
Thanks, Sid for the weak reference
brainstorming. Thanks Pooj for the finalize queue thoughts. Thanks Don and Arun
Dev for letting me have someone to talk to, in the initial days when I was up to
this.
Thanks Pooj for the pile of errors you pointed out and the patience.
The article will be hosted at
http://www27.brinkster.com/sparksite/work/gc/v1/gc.htm
You can check here for updates.
Some Garbage Collector related urls
A Real-time Garbage Collector based on lifetime of objects - MIT
http://lieber.www.media.mit.edu/people/lieber/Lieberary/GC/Realtime/Realtime.html
Boehm-Demers-Weiser Conservative GC for C++
http://www.hpl.hp.com/personal/Hans_Boehm/gc/
Reference Objects and Garbage collection - feature description - Sun/Java
http://developer.java.sun.com/developer/technical
A garbage collection framework for c++
http://www.codeproject.com/cpp/garbage_collect.asp
Gorik's Garbage collection page
http://www.cs.rpi.edu/~gorik/garbage
The Garbage Collection page - Richard Jones
http://www.cs.ukc.ac.uk/people/staff/rej/gc.html