When a system has a garbage collector it is usually part of the language run-time system and integrated into the language. The language is said to be garbage collected. Garbage collection was invented by John McCarthy as part of the first Lisp system. It is a bit sloppy to talk of a garbage collected language, however; being garbage collected is a property of an implementation of a language.
The basic principle of how a garbage collector works is:
Table of contents |
2 Reference counting 3 Languages which use automatic garbage collection 4 External links |
Tracing garbage collectors are the most common type of garbage collector. They focus on locating reachable objects, which are objects that may be referenced in the future, and then discard all remaining objects.
Some variations on the algorithm do not preserve the tricolour invariant but they use a modified form for which all the important properties hold.
Statistically speaking, the objects most recently created in the runtime system are also those most likely to quickly become unreachable. A Generational GC divides objects into generations and will generally only place the objects of a subset of all the generations into the initial white set (the grey set being everything else). This can result in faster cycles.
Tracing garbage collectors
Reachability of an object
More formally, objects can be reachable in only two ways:
Informally, a reachable object is one that the program could get to by starting at an object it can reach directly, and then following a chain of pointer references.Basic algorithm
Tracing garbage collectors perform garbage collection cycles. A cycle is started when the collector decides (or is notified) that it needs to reclaim storage, which in particular happens when the system is low on memory. A cycle consists of the following steps:
The tricolour invariant is an important property of the objects and their colours. It is simply this:
Notice that the algorithm above preserves the tricolour invariant.
The initial partition has no black objects, so the invariant trivially holds.
Subsequently whenever an object is made black any white objects that it references are made grey, ensuring that the invariant remains true.
When the last step of the algorithm is reached, because the tricolour invariant holds, none of the objects in the black set point to any of the objects in the white set (and there are no grey objects) so the white objects must be unreachable from the roots,
and the system calls their finalisers and frees their storage.Variants of the basic algorithm
The basic algorithm has several variants.
Mark and sweep
Tracing collectors can also be divided by considering how the three sets (of white, grey, and black objects) are implemented. A Mark sweep GC maintains a bit (or two) with each object to record whether it is white or black; the grey set is either maintained as a separate list or using another bit. A Copying GC identifies grey and black objects by copying them to another area of memory (the copy space) and often distinguishes black from grey objects by dividing the copy space into two portions (in the simple case by maintaining a single pointer that marks the division between black and grey objects).Generational GC
There is the issue of when to perform a cycle and what objects to place in the initial grey set. A simple collector will always put only the roots in the initial grey set, and everything else will be initially white. Reference counting
Reference counting is a form of garbage collection where each object stores a count of the number of references to it, and the object is reclaimed when the count reaches zero. Although it requires additional mechanisms to deal with cycles of references, it's typically able to recover memory more quickly. See reference counting for more information.Languages which use automatic garbage collection
External links