| This implementation is based on the .NET
GC as described in Jeffrey Richter's MSDN article. I started out
writing the GC because (a) I had heard that the .NET GC was
one the best GCs ever written (b) I was mortally afraid of something
like this because I had defined the limitations of my understanding
to not be able to include something as complex as a GC (c) the article
was very good and made me feel like I understood the whole thing.
Eventually I began wondering if there are any unanswered questions
left. There were many regarding the basic implementation, as described
by Mr. Richter, and they have popped up in a more pertinent way
now that I have tried to implement a GC of my own, they probably
wouldn't have if I hadn't tried this. So largely writing the GC
was out of my own curiosity and to possibly get some company of
people at my level, who would like to try a more childish
implementation before they move onto the real thing.
Jeffrey Richter's GC Algorithm articles at MSDN Magazine
http://msdn.microsoft.com/msdnmag/issues/1100/GCI/
http://msdn.microsoft.com/msdnmag/issues/1200/GCI2/
Spirit
To avoid potential show stoppers let me say that at the end of
this article you are not going to go home with a usable GC. Also
this entire thing is just probably a piece of wasted engineering
through which I wanted to stretch my own imagination and in the
process get some questions about the .NET gc ironed out. Unless
the GC interests you this is probably a waste of time to read.
This has been developed when I have had no formal education that
covers GCs in any way and I haven't peeped into rotor to see how
it is actually being done. I just wanted to keep it real simple
and understandable from the non-compscience-proffessor perspective.
This implementation is a long way of asking some basic questions,
that is why the questions index.
Please accept it in the spirit in which it is offered.
Questions Index
How
does the GC locate the roots of the applications ?
How does the GC
know the references that a given object will contain ?
How
does back patching pointers work ?
How
is the finalize queue implemented ?
How
is weak reference table implemented ?
Generations
: Is this on the right track?
Index
Disclaimer
Implementation
How objects are allocated
on the heap
References to Objects and Object
that hold references
The Roots of the application
Code
Mark and sweep
The ref table and fref
stack
Finalizers
- finalize queue and freachable queue
Weak references
Generations
Finally
Acknowledgements
This is only meant to be my interpretation of the GC algo and is
written in the spirit of the .NET GC. This is not meant to be factually
correct about the actual GC in terms of algo, design choices or
data structures.
This GC in its current form is not practically useful and cannot
be plugged into a 'runtime'
Implementation
I will assume that you have read the articles in msdn magazine
and continue from there. If you haven't, please do so now. If you
have lets get straight into the matter.
In the first few sections I will describe how I have created the
equivalent of a heap, objects, reference within objects and application
roots.
The central entity in the gc is the managed heap. It is where the
gc allocates space for the memory requests by the application. In
the managed environment every block of memory requested by the program
is used to hold some specific data type.

The heap represents the managed heap stack and heap_top is top
pointer. In my GC the memory used for the heap is allocated by the
function gc_init() dynamically, This function also initializes the
other data structures of the gc. Of course in the real runtime the
way space is initially allocated to the managed heap would be rather
different.
How objects are allocated
on the heap
The main structure that you will need to be familiar with is 'object'.
An 'object' describes an instance of an object/a block of allocated
memory on the managed heap.

The important member of 'object' to note here is object::size.
Whenever a memory block is to be allocated on the managed heap,
the gc is to be told the number of bytes required for it. This size
is the valued contained in the object::size.
The GC creates an instance of object on the heap, immediately following
which is the memory area of size bytes. Whenever the gc allocates
memory on the managed heap for an application, the total memory
used on managed heap (in bytes) will be:
space allocated = sizeof(object) + size_requested

The 'object' class is thus a header information for each memory
block, that defines certain characteristics of the memory region
located immediately after it. On the managed heap the request of
a memory allocation would look as shown above.
The managed heap consists of many such allocations
- each allocation starting with a block defined by the 'object'
structure that defines the memory region following it.
Notice from the diagram that the allocation of memory starts from
the bottom of the heap and proceeds upwards. The first 'object'
will have the same base address as the heap base. The next object
will be at location sizeof(object)+object::size and so on.

References to Objects and
Objects that hold references :
Now any object can contain references to other objects right ?
Since this is meant to be a toy gc I decided to implement references
within objects as follows. Every memory allocation is treated as
an object. A reference to any object is created through a
root_entry. Every object (optionally) contains an array of root
entries. A root_entry in physical terms is a structure that contains
a pointer to an object and certain other members. The number of
root entries that can be contained by an object is simply object::size/sizeof(root_entry).
I use the memory allocated per object to (ie the dark blue area
immediately above each red area in the above diagram) to be the
region used for creating the objects root_entries.
A root_table is an array of root_entries. Each object can be asked
to return a root_table which acts as an array of root_entries. The
maximum number of root_entries that the object can contain is determined
by its size. Each root entry can contain a short name string and
a pointer to another object. Thus an object (object structure +
its attached memory region) would now look as follows :

Now, I know that this is not how it works in a real
gc at all. This is just to simulate the effect of an object that
has references to other objects. This way I can add any number of
pointers to an object (limited by the amount of space allocated
to it) and reference other objects from this one.
In a real situation the references contained in an object will be
defined by the kind of class that the object is an instance of.
The runtime will have this knowledge according to the definition
of the class.
If the gc is going to be used for some practical purpose I will
have to remove this idea of creating root_entrys in the user requested
space. I will need another mechanism to detect the presence of the
references within an object. But we need not cross that bridge now,
for our purpose of simulating objects that have references to other
objects this will do.
The Roots
of the application :
The roots of the applications are the set of all the
pointers held by the application to various objects used by the
application. The complete root set of the application are all pointers
that reside in applications global area/ static heap/ stack
- local to method calls .. etc.
Now the way I look at it, there is a mechanism by
which the runtime can access all the roots of the application and
thus decide which objects they reference on the managed heap. The
set of all reachable objects on the managed heap would be the set
of all objects reachable from the applications roots, and objects
referenced by each of those objects, and so on recursively.
I just needed some mechanism to have set of pointers
to some objects on the stack. In the toy gc I have chosen to allocate
a rather large object outside of the heap and used that as a global
main object. The root entries in the memory region of that 'main'
object is treated as the applications roots.
So if I am to simulate the effect of a method creating an instance
of object on the managed heap, In my GC I would (1) call gc_alloc()
to allocate the object on the managed heap and (2) use the address
returned by gc_alloc as an entry in the root_table exposed by the
main_object - in effect adding one more pointer to my application
roots.
The point is that the exact way in which the gc simulates
.NET runtime's accessing the program's roots is not an important
factor here, as long as I can simulate the effect of the process.

In reality each object on the heap would be referenced
not just from the roots of the application but also from other objects,
and might also contain references to other objects within itself.
The following is more realistic :

The present setup is similar to what Mr. Richter had
at the beginning of the article. This shall for the basis on which
I build up the rest of the gc and questions and decisions of the
rest of the implementation.
Question
: How does the real GC keep track of the roots
of the application? The roots are spread out over many
memory areas of an application and further since the application
has actually run through the JIT at the time of execution, these
need to be location accessed by actual machine code and at times
even parameters of operations. How are these tracked by the GC?
How are the
references in a object of a particular type
tracked by the GC? this is a known set unlike the roots
of the application, and can be determined by looking at the definition
of the class. However does the GC maintain a list of locations for
each class, that are references, in the classes instances? How are
these tracked?
Maintaining such lists are both expensive and memory hogging for
the GC aren't they?
Code
These are the definitions of some of the important
things mentioned here : possibly worth a look.
| |
|
#object |
|
| |
|
struct object
{
int flag_type_indicator:2;
int flag_used:1;
int flag_suppress_finalize:1;
int flag_modified:1;
object ** pthis;
object ** ppwref;
int size;
int rtab_count;
int rtab_max;
};
|
|
| |
|
|
|
| |
|
#root_entry |
|
| |
|
// the following structures are used for simulating
// pointers/refrences/roots
//
//root_entry - denotes single root
struct root_entry
{
char name[5]; //name of pointer
object ** ppob; //pointer to object
int flag_valid:1;
};
//
// root_table - an array of roots
//(usually located in the extra bytes after
// an object)abstracted as a table.
struct root_table
{
object * parent_object;
root_entry * rtab;
int * rtab_max;
int * rtab_count;
};
|
|
| |
|
|
|
| |
|
#weak
references |
|
| |
|
//weak_reference[]s will form the wref tables
struct weak_reference
{
int flag_type_indicator:2;
int flag_used:1;
weak_reference ** pthis;
object ** ppob;
};
|
|
| |
|
|
|
| |
|
#gc
calls |
|
| |
|
// GC
//
extern void gc_collect(int generation=0);
extern void gc_init();
extern object ** gc_alloc(object *pob);
extern void gc_register_for_finalize(object ** ppob);
extern void gc_clear_freachable();
extern object ** gc_add_wref(object ** ppob,int flag_wref_type);
extern void gc_reset();
//contains the root table (global) of the application
extern root_table main_rtab;
|
|
| |
|
|
|
Mark and sweep
The roots of the application and the references of objects to each
other form a graph of objects in the computer memory. All these
objects reside in the managed heap. Marking is basically the process
by which the GC determines which objects are used. The basic approach
is that the GC does a recursive depth first search of the all objects
visitable from any given object starting at the roots of the application.
Every time an object is encountered it is marked as used, by setting
a flag in the 'object' structure.

Consider the above example. The graph formed among the objects
will look like this :

A DFS (depth first search) of this graph would not
visit the two unreferenced objects and these will not have their
'used' flag set. After the mark phase, the GC will have a colored
graph of the heap, something like this :

Here the unreferenced objects have been shown as grey.
The GC now goes for its compaction phase - the sweep of the managed
heap. In the sweep phase, it collects toward the bottom of the heap,
all the objects that have their used flag set. Cool ? Since I can
traverse through all the objects on the heap by the simple formula
:
object
* pob = (object *) heap; //first object
pob = (object *)((char *)pob + pob->size +
sizeof(object)); //next object
Now that I can visit each object, sequentially starting
from the bottom-most, I need to visit them in O(n) time and decide
to push them down in memory by a simple memcpy() (lets just take
that as atomic for now). Most of this however would be known to
you if you have been through Mr. Richter's article. After
my compaction phase I would not have these grey objects left on
the stack, instead of which I would have only the objects that have
their used flag set during the 'mark' phase. Thus the GC has swept
all garbage away !
Now the only thing to do is to patch all the
pointers in the application so that they point to the new locations
of the objects..... !!! what ? In the above example its only one
object that moved down in memory. But that is not the case in a
real situation, there are many objects that have moved. How am I
going to find all the pointers that point to each of those objects
?
Question
: How are pointers/references back patched
after the objects have been moved?
(1) Does each object maintain an array of all the addresses of all
pointers that point to it? If this is so (a) this will have to dynamically
grow as references to am object change (b) memory management of
these lists per object would become a major issue (c) its an expensive
task as the runtime has to constantly keep track of need to change
these.
(2) Does the GC create such a structure temporarily during its mark
phase ? per object, per reference? So when an object dies that list
is to be searched for all pointers to that object ? What is
the efficiency of such a search algorithm, what are the data structures
used ? Further the application will have to be frozen when this
happens, because if the application creates references to an object
after the references to it have been considered, the new reference
will not get back patched...
(3) Does the GC search the entire applications memory graph to find
references to each of the objects? Unlikely, the DFS is expensive,
doing it twice is a rather sad thing.
How, what is the mechanism used ?
The following is the approach that I have chosen to
work around these - now you know why I call this a toy GC. :)
The ref
table and fref stack
I decided to route the pointers to objects through
an indirect reference. Each object on the stack has only one unique
location that knows about its actual address in memory. The table
of all such pointers to objects on the heap is called the ref table
(short for reference table). The following picture shows the relation
of an object with its ref table entry.

Every object contains a pointer to its reference table
entry called the 'pthis' (makes me smile). If the object is moved
around in memory the object goes on to to change the value in *pthis
to this and the reference is patched. So when the GC moves objects
around in memory, it calls the back_patch_ref() method of each object
when it moves. This method will modify the reference table entry
and thus the ref table will always to point to the valid location
of the object.
Now all references in the application will be actually
references to the ref table, which will contain the valid address
of the object in memory. So the above diagrams to pointers among
objects and app roots will actually look like this, by adding the
ref table to the picture.

I have not shown the links from the object to ref
table (through pthis).
This solves the problem of back patching references,
but brings along some baggage of its own.
Lets for a minute assume that the MS implementation actually used
a ref table. Does that mean that every access to an object in .NET
is through an indirect memory addressing scheme, two memory fetches
per access, every access ? Isn't that a little expensive ? Lets
argue this both ways, if this was actually the case, the ref table
for a 32bit processor is 32bits per ref. That means that if an application
has 10 thousand objects, the ref table would come to < 40k, this
can reside in the processor cache, frequently used sections will,
so this will be rather fast.
Of course, this is only as long as they have 'ref table'.
Another possibility is that during DFS traversal, they would have
changed the addressing mechanism of the application to actually
call a stub. The stub when invoked will go on to back patch the
actual reference in the application with the latest address available
in the ref table. This is simply empty fantasizing, they have to
have an asm instruction that they can back patch safely in the context
of the application so that the application doesn't break when the
stub is invoked. Back patching wouldn't be a hassle in a Java style
fully interpreted framework, but in an actual runtime demand compiled
JIT its not a simple stunt.
That said, lets consider another aspect, when an object
is being allocated by the gc using gc_alloc(). The gc return the
address of the ref table entry for the object, ie the value of the
objects 'pthis'. How does the entry in the ref table get created
?
The ref table entries cannot be moved around in the
ref table because there are actual pointers in the application that
point to these entries. As objects are created and they die the
ref table will have used and unused entries scattered throughout.
So I will have to search the ref table linearly till I find a free
entry. In worst case this will be O(n). That is not good. So I seem
to have moved the problem of compaction from one place to another
and seem to have removed famous O(n) allocation.
Is there a workaround ? Here is mine. I decided that
if MS is moving along these tracks then they would still have a
O(1) allocation. I just need to find free locations in my ref table
in O(1) time. Stacks are O(1) data structures. I decided to give
the ref table a supporting stack to keep track of free locations
in the ref table. This stack is called fref stack.

The grey areas are vacant slots in the ref table.
So every time I need to get a free location in ref table, I need
to pop from fref and use the value as an index in my ref table.
void
* free_ref = ref_table + fref_stack [ fref_top -- ];
pob->pthis = (object **)free_ref; //get pthis
to point to the free entry
*(pob->pthis) = this; //make
it point to 'this'
Cool? So that's about my approach to the problem,
that comes along with a serious performance hit regarding the indirect
addressing for accessing objects.
Freeing an object's ref table entry : When an object
is being released by the GC, the GC uses the object's pthis to locate
the ref entry, calculates the offset of the entry in the ref table
and pushes the offset in the fref stack, so that it can be used
by other variables.
While not relevant here, what if the real GC also
works like this ? what sizes will ref and fref have, after all these
finite arrays. Well, I would say that multiple ref+fref combinations
can be cascaded as a linklist. You could search the linklist serially
to find a fref_top that is non zero to find a free entry somewhere.
Optionally you can ensure that your first node in the linklist is
always one that has a free node - this is simple to implement. Every
time an object releases its entry the ref table can check if its
the first node in the ref link list, else make itself the first.
Similarly if a object is allocated and that causes fref top to be
zero, then the node is moved to the end of the linklist, and if
the next node has a zero fref (it means that all nodes between these
have zero fref top) an empty node is added to the start of the linklist.
So at this point we have (to summarize) :
A method for allocating objects on the heap in O(1) time complexity
A method for marking the heap using DFS of the application's graphs
= time complexity of the DFS
A compaction method of the heap that is O(n) for the heap, O(1)
per object - assuming atomic memcpy()
A O(1) method for backpatching references
A O(1) method for deallocating an object - actually to remove the
ref table entry to the object
A indirect memory reference over head to access every object on
the heap
Data overheads of the object structure per heap object, for the
ref table and fref stack.
That includes pros and cons. Lets move on.
Finalizers
- finalize queue and freachable queue
The finalizer queue is an array of pointers to objects
on the heap (indirectly through the ref table of course, from this
point on when I mean pointer to an abject I will mean a pointer
to the ref table entry that points to the heap, if otherwise I will
say so explicitly). Now the finalizer queue needs the following
properties.
-
It must be efficient to add an entry to this list
-
It must easy to remove an entry from this list
-
It must be easy to move over all entries in this
list easily.
-
It ok to move the position of entries in this
list because relative/absolute positions are not important.
-
No one links to the entries in this list unlike
in the case of the ref table.
The Finalize queue is implemented as follows. It fundamentally
behaves like a stack, at least in terms of adding elements. It pushes
new entries onto the stack, the the finalize_top is the top
pointer.
Unlike a conventional stack where elements are removed only from
the top, here elements can be removed from anywhere - every time
an element at a interior position is to be cleared, I pop from the
stack and insert the popped element at the vacant middle position.
When come to think about it, its again an O(1) operation - though
here the cost of copying an entry makes a relatively expensive fixed
cost step. Then again if you compare that with the overhead of copying
4 bytes to the cost implementing the finalize() for the object,
The copy expense can probably be made up for easily if the finalize
handling was a little optimized - this is open to debate... please
do.

The freachable is implemented in the same way as the
finalize is - I cant think of a case where the freachable needs
to drop entries from random locations. That need arose in finalize
because any object that needs a finalizer call can go out of use
at any time. In comparison, all the freachable entries do need finalization
- the .NET framework assures no order in which finalizers will be
called among the various objects, so this can be a simple stack
implementation. So I am not drawing anything here.
Question
: How is the finalize queue implemented
in the real framework? Is this how it happens in the real
framework?
Weak references
This is one more thick thing, bear with me.
The thing about weak references is that there are
actual pointers from the application that point to these entries,
so they cannot be moved around in the weak reference table (which
shall now on be called the wref table).
-
It must be cheap to create a new weak ref. Thus
it must be easy to make an entry in wref - it must be cheap
to find a free slot in wref.
-
It must be cheap to remove weak references - this
happens when there are no references in the program to the wref
entry.
-
It must be cheap to set a wref entry to 0 and
still retain its value on the wref table (because there are
pointers in the application to the wref table entry that have
yet to check to see if their object is still in memory)
-
Finally it must be easy to search through all
the wref entries efficiently (to see which objects are marked
as used)
Now if I use a model like the finalize queue, I have
no way of preserving the positions of the entries, those are important
here - app roots refer to these. Further the entries in the finalize
queue had no need of knowing if they were being referred by anybody,
so it was cheap for them to die when they don't point to a valid
object. Here after the object dies, the value 0 has to be maintained
as long as the program has a reference to the wref entry.
I cant use a simple idea like ref table earlier -
the problem is that I need to access all the used entries fast.
In the ref table I will have the traverse the entire array to hit
all the used entries. It didn't sound like a good idea to maintain
two supporting stacks, one holding used entries and one holding
free entries to make insertion and removal cheap in a array where
I cant move around the entries. Further it still doesnt let me know
how long I will have to preserve entries there.
When you come to think about it, wref table entries
behave a lot like actual objects of the heap. They need to know
their life time - ie how long they are referenced by the application.
Further if I rephrase the 'cannot be moved around' to need to be
accessed from a fixed location - a lot of problems seem to lift.
The weak ref entries are treated as objects, similar
to objects on the heap. But they are slightly special objects in
the sense that they reside on a separate region, not the heap -
a region where only wref objects reside.
Life time of these objects are determined during the mark phase
of the GC, same as for other objects. Weak references that are not
marked as used are lost during compaction of the wref table. To
keep it easy to access all the used entries in the wref table, I
need to compact the entries. If I compact them, how can I have pointers
to the entries ? By treating them similar to the way as I treat
other objects - I reference them indirectly through the ref table.
:)

Its the same diagram as before, the heap has been
sidelined a bit and some entries are added to the ref table to point
to the wref table entries. ref table entries to the wref table have
been given a dark shade just to make them more noticeable - they
are in effect not treated any differently from ref table entries
to normal objects on the heap. Similarly, there can be any arbitrary
number of references to a ref table entry, of a wref object.
Now what would a wref object look like ? It would
essentially need a reference to the object that it is a weak reference
of. Next it would need a 'pthis' so that it can back patch its address
after it is compacted in the wref table. And it needs a flag that
can be set during the mark phase.

The above picture shows the object and its ref table
relation in grey lines (you are already familiar with that). Further
it shows the same kind of relation for the wref object in blue lines.
The pob member of the wref object is the pointer to the object.
Similarly the object keeps a pwref back to the wref object, I will
cover this later. These are shown in red.
How does this work ?
The wref table works as follows. During the mark phase
wref objects are marked as used, same as normal objects on the managed
heap. Next after the mark is over wref is scanned sequentially from
0 to wref top. This is a simple linear scan. Here objects that don't
have their used flag set are overwritten by objects found higher
up the stack that are moved down. This can be done in the heap object
style 'compaction' or the finalizer queue style pop and overwrite
(the later would be cheaper). Each used object has its pob checked
to see if the heap object it is referring to is still valid. If
not, pob is set to zero. Further, if the wref object is being moved
down in memory it back patches the ref table using its 'pthis'.
After compaction is over the wref top is set to the new value. Or
if the pop method (which I have implemented) is used, the top will
be already be at its correct value.
Similarly inserting into the wref table is just a
push operation.
The 'pwref' member for the object has been added because I see no
need for a heap object to have multiple wref objects. In case the
user requests a wref object, the system checks for nonzero pwref.
If so it just returns to the user the value of pwref. If pwref is
zero then a new wref object is created, pwref is set and then returned.
Having this is at the cost of having an additional 4 bytes in the
object. It gives us the benefit of not having multiple wref entries
to the same heap object.
Thus all the operations required on wref table can
complete in bounded fixed finite time per wref entry. Thus all operations
are O(1) for each wref object.
Short and Long weak ref tables are essentially the
same, the difference is only according to when the table scan is
called. One table is scanned after the initial mark phase. The other
table is scanned after the freachable also marks used objects. Both
can use the same approach.
Question
: How does the .NET GC handle weak references? How different
is it from this?
Generations
The last part of the GC.
Sorry I don't have any nice pictures on this section, I don't have
access to Visio anymore, you will have to bear with my writing skills.
'Generations' is largely a construct for improving
the performance of the GC. Like Mr. Richter points out its faster
to do all the GC operations for a part of the heap than for the
whole heap. Objects are classified to belong to various generations
depending upon how long they have lived. How long they have lived
in this context is the equivalent of how many sweeps of the GC they
have survived. Since objects are created in time-space order on
the heap-stack, all the objects of a generation are close together.
The heap is partition into three areas, generation 0, 1 and 2.
In the implementation, one will find the pointers
:
-
heap_base - which always has the base address
of the heap
-
heap_top - which has the location of the next
free byte on the heap
-
heap0 - all objects between heap_top and heap0
are gen 0 objects
-
heap1 - all objects between heap0 and heap1 are
gen1 and those between heap1 and heap_base are gen2
So the task of implementing generations resolves down
into 2 functions for the GC (1) how are heap0 and heap1 set ? and
(2) how do you sweep a particular part of the heap only.
Lets cover 2 first :
Sweep: Lets assume that the GC has just finished a sweep. All objects
left have survived so they are at least generation 1 objects. Now,
program execution continues and more objects are added to the heap
in generation 0. Lets assume that none of the old objects are modified
at all during this part of the run.
When the GC sweeps generation 0 objects :
Remember none of the objects in gen1 or 2 have been modified.
Therefore none of them will have any references to a gen0 object.
So the reason for survival of a gen0 object is only due to a root
reference to it or a reference for another object in gen0. Therefore
the GC's DFS (recursive depth first search) can just ignore any
references from a gen0 object to higher generation as these paths
will never lead back to any gen0 object and its only gen0 objects
that we mean to collect. So the DFS is pruned so to speak and is
now faster. It will also mark only gen0 objects. Now all the above
algorithms hold good as long as we just assume that the managed
heap actually begins from the current address of heap0 (heap0 -
heap_top is the address range for gen0 objects remember).
The same argument applies for generation 1 sweep. Here we assume
that no generation 2 objects are modified, and so we don't DFS into
them.
Now, what if older generation objects change ? Coming
back to our gen0 sweep, if a gen1 object has been written into,
it might contain a reference to a gen0 object. If this object is
not marked as used through a traversal of only gen0 objects it would
have been collected by the GC. Hence the GC does separate DFS on
each of the modified older generation objects. From each of these
objects only paths through gen0 objects (if any will be traversed,
because our DFS will not traverse down any references to a gen1
or gen2 object ). These individual DFS from these objects are usually
short tree spans - the size of which is purely determined by the
region of the graph in the gen0 region. Thus now even if objects
of any generation are changed we have a method for doing a gen0
sweep.
In steps:
-
Do a DFS from the application roots, ignore all
references to gen1,2 objects
-
Do DFS from each modified gen1 and gen2 object,
ignore all member references to gen1,2 objects
Similar logic can be applied to a generation 1 sweep.
A generation 2 sweep will do a full DFS of the entire heap and a
compaction of the entire heap.
Just a minute, how do we know which objects are modified
? Good question, there seems to be some magic happening here - Mr.
Richter mention the GetWriteWatch() kernel call. Let me just say
that the msdn
reference about the function says that it returns a list of
modified memory pages of a region set up for memory watch. Well
what I think that means is that the OS API can help in tracking
changes :) I didn't use that in my implementation because (a) I
wanted it to be portable to places where people can just see the
algo in effect even is they didnt have OS support for it and (b)
I just didn't understand how to use the function (we all (some of
us) have our memory blocks about what we are capable of doing, and
this just frightened me off), probably when I find a sample usage
of the function I will be more comfortable.
In the toy GC when a root_entry in an object is updated
I mark the object as modified. If the objects flag is newly set
I add a direct reference to this object in of two arrays depending
on (and if) it belongs to gen1 or gen2.
If a gen0 sweep is done, then the modified flag and entries are
reset for all gen1 objects. If a gen1 sweep occurs modified flags
and entries of gen1 and gen2 objects are reset. If a gen2 sweep
occurs, modified objects aren't important, these flags are unset
during the compaction phase.
Come to think of it, the way the GC is implemented
as 3 generations, how rarely they expect gen1 sweeps to occur in
comparison to gen0 sweeps, the three generation architecture is
nice. More generations would have added to the overhead of keep
the modification information, multiple DFSs etc and would probably
not have added much to performance. These are not things that
can be vacantly claimed, statistics and algorithmic analysis should
prove it.
Question
: Is this on the right track?
Finally That was a lot
of typing and I feel tired. I usually have to be dragged around
to write documents - but this was somehow a pleasure to work on.
Basically because I am curious about the GC and secondly because
it was easy to draw these diagrams using Visio. Amazing tool, as
each diagram kept coming out, I felt I had to write about it.
The GC implementation isn't complete in the sense
that I haven't thought much about optimizing the DFS by not
traversing unmodified sub-branches. I haven't thought threading
and synchronization issues. I haven't thought about references to
actual objects that might only be a part of a allocated memory block
... etc, to name a few.
I am hoping to get a few answers through this effort
and possibly consolidate that into an addendum to the MSDN article
and get to know the GC better.
Acknowledgements
Thanks, Sid for the weak reference
brainstorming. Thanks Pooj for the finalize queue thoughts. Thanks
Don and Arun Dev for letting me have someone to talk to, in the
initial days when I was up to this.
Thanks Pooj for the pile of errors you pointed out and the patience.
Some Garbage Collector
related urls
A Real-time Garbage Collector
based on lifetime of objects - MIT
http://lieber.www.media.mit.edu/people/lieber/Lieberary/GC/Realtime/Realtime.html
Boehm-Demers-Weiser Conservative GC for C++
http://www.hpl.hp.com/personal/Hans_Boehm/gc/
Reference Objects and Garbage collection - feature description
- Sun/Java
http://developer.java.sun.com/developer/technical
A garbage collection framework for c++
http://www.codeproject.com/cpp/garbage_collect.asp
Gorik's Garbage collection page
http://www.cs.rpi.edu/~gorik/garbage
The Garbage Collection page - Richard Jones
http://www.cs.ukc.ac.uk/people/staff/rej/gc.html
|