The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

docs/pdds/pdd09_gc.pod - Garbage Collection Subsystems

ABSTRACT

This PDD specifies Parrot's garbage collection subsystems.

VERSION

$Revision: 26722 $

DEFINITIONS

Garbage collection (GC)

Garbage collection is a process of freeing up memory that is no longer used by the interpreter, by determining which objects will not be referenced again and can be reclaimed.

Simple mark

All reachable objects are marked as alive, first marking a root set, and then recursively marking objects reachable from other reachable objects. Objects not reached are considered dead. After collection, all objects are reset to unmarked, and the process starts again.

Tri-color mark

Instead of a simple separation of marked (as live) and unmarked (dead), the object set is divided into three parts: white, gray, and black. The white objects are presumed dead. The gray objects have been marked as live by some other object, but haven't yet marked the objects they refer to. The black objects are live, and have marked all objects they directly refer to.

In the initial run, all objects start as white and the root set is marked gray. The marking process changes white objects to gray (marking them from another gray object), and gray objects to black (when all objects they refer to are marked). When the gray set is empty, all live objects have been marked and the white set can be collected. After a collection run, all black objects are reset to white, the root set to gray, and the process begins again.

The advantage of a tri-color mark over a simple mark is that it can be broken into smaller stages.

Mark-and-sweep

In this GC scheme, after all reachable objects are marked as live, a sweep through the object arenas collects all unmarked objects.

Mark-and-don't-sweep

In this scheme, all objects are marked black (live) when created. White objects are free memory available for allocation. When no white objects remain (out of memory), the black objects are all changed to white, and a marking process runs to mark all reachable objects as live. Any unreachable objects are left white, and available for allocation.

In some implementations, the change from black to white is made by simply changing the interpretation of the mark bit, for example, from 1 == black to 1 == white.

Copying collection

In this scheme, live objects are copied into a new memory region. The entire old memory region can then be reclaimed.

Compacting collection

In this scheme, live objects are moved closer together, eliminating fragments of free space between live objects. This compaction makes later allocation of new objects faster, since the allocator doesn't have to scan for fragments of free space.

Reference counting

In this scheme, all objects have a count of how often they are referred to by other objects. If that count reaches zero, the object's memory can be reclaimed. This scheme doesn't cope well with reference loops--loops of dead objects, all referencing one another but not reachable from elsewhere, never get collected.

Stop-the-world

A common disadvantage of a simple mark implemenation is that the entire system (including all threads that use the same memory pools) must be suspended while the whole memory set is examined during marking and collection. Normal operation continues only after the whole GC cycle is performed. This can lead to arbitrarily long pauses during program execution.

Incremental

Rather than suspending the system for marking and collection, GC is done in small increments intermittent with normal program operation. Some implementations perform the marking as part of ordinary object access.

Real-time

The pauses caused by GC don't exceed a certain limit.

Generational

The object space is divided between a young generation (short-lived temporaries) and one or more old generations. Only young generations are reset to white (presumed dead). Avoiding scanning the old generations repeatedly can considerably speed up GC.

Generational collection does not guarantee that all unreachable objects will be reclaimed, so in large systems it is sometimes combined with a mark-and-sweep or copying collection scheme, one for light collection runs performed frequently, and the other for more complete runs performed rarely.

Concurrent

GC marking and collection runs as a separate thread, sometimes with multiple threads participating in GC. On a multi-processor machine, concurrent GC may be truly parallel.

SYNOPSIS

Not applicable.

DESCRIPTION

- Parrot provides swappable garbage collection schemes. The GC scheme can be selected at configure/compile time. The GC scheme cannot be changed on-the-fly at runtime, but in the future may be selected with a command-line option at execution time.
- All live PMCs must be reachable from the root set of objects in the interpreter.
- Garbage collection must be safe for objects shared across multiple interpreters/threads.
- The phrase "dead object detection" and abbreviation "DOD" are deprecated.

IMPLEMENTATION

Parrot supports pluggable garbage collection cores, so ultimately any garbage collection model devised can run on it. However, different GC models are more or less appropriate for different application areas. The current default stop-the-world mark-and-sweep model is not well suited for concurrent/parallel execution. We will keep the simple mark-and-sweep implementation, but it will no longer be primary.

Parrot really has two independent GC models, one used for objects (PMCs) and the other used for buffers (including strings). The core difference is that buffers cannot contain other buffers, so incremental marking is unnecessary. Currently, PMCs are not allowed to move after creation, so the GC model used there is not copying nor compacting.

The primary GC model for PMCs, at least for the 1.0 release, will use a tri-color incremental marking scheme, combined with a concurrent sweep scheme.

Initial Marking

Each PMC has a flags member which, among other things, facilitates garbage collection. At the beginning of the mark phase, the PObj_is_live_FLAG and PObj_is_fully_marked_FLAG are both unset, which flags the PMC as presumed dead (white). The initial mark phase of the collection cycle goes through each PMC in the root set and sets the PObj_is_live_FLAG bit in the flags member (the PMC is gray). It does not set the PObj_is_fully_marked_FLAG bit (changing the PMC to black), because in the initial mark, the PMCs or buffers contained by a PMC are not marked. It also appends the PMC to the end of a list used for further marking. However, if the PMC has already been marked as black, the current end of list is returned (instead of appending the already processed PMC) to prevent endless looping.

The fourth combination of the two flags, where PObj_is_live_FLAG is unset and PObj_is_fully_marked_FLAG is set, is reserved for PMCs of an older generation not actively participating in the GC run.

The root set for the initial marking phase includes the following core storage locations:

Global stash
System stack
Current PMC register set
Stashes
PMC register stack
General/User stack

Incremental Marking

After the root set of PMCs have been marked, a series of incremental mark runs are performed. These may be performed frequently, between other operations. The incremental mark runs work to move gray PMCs to black. They take a PMC from the list for further marking, mark any PMCs or buffers it contains as gray (the PObj_is_live_FLAG is set and the PObj_is_fully_marked_FLAG is left unset), and add the contained PMCs or buffers to the list for further marking. If the PMC has a custom mark function in its vtable, it is called at this point.

After all contained PMCs or buffers have been marked, the PMC itself is marked as black (the PObj_is_live_FLAG and PObj_is_fully_marked_FLAG are both set). A limit may be placed on the number of PMCs handled in each incremental mark run.

Buffer Marking

The initial marking phase also marks the root set of buffers. Because buffers cannot contain other buffers, they are immediately marked as black and not added to the list for further marking. Because PMCs may contain buffers, the buffer collection phase can't run until the incremental marking of PMCs is completed.

The root set for buffers includes the following locations:

Current String register set
String register set stack
General/User stack
Control stack

Once a buffer is found to be live, the flags member of the buffer structure has the PObj_live_FLAG and PObj_is_fully_marked_FLAG bits set.

Collection

When the list for further marking is empty (all gray PMCs have changed to black), the collection stage is started. First, PMCs are collected, followed by buffers. In both cases (PMC and buffer), the "live" and "fully_marked" flags are reset after examination for reclamation.

Collecting PMCs

To collect PMCs, each PMC arena is examined from the most recently created backwards. Each PMC is examined to see if it is live, already on the free list, or constant. If it is not, then it is added to the free list and marked as being on the free list with the PObj_on_free_list_FLAG.

Collecting buffers

To collect buffers, each Buffer arena is examined from the most recently created backwards. If the buffer is not live, not already on the free list and it is not a constant or copy on write, then it is added to the free pool for reuse and marked with the PObj_on_free_list_FLAG.

Concurrent collection

For the most part, the variable sets between concurrent tasks don't interact. They have independent root sets and don't require information on memory usage from other tasks before performing a collection phase. In Parrot, tasks tend to be short-lived, and their variables can be considered young generations from a generational GC perspective. Because of this, a full heavyweight task will maintain its own small memory pools, quickly born and quickly dying.

Shared variables, on the other hand, do require information from multiple concurrent tasks before they can be collected. Because of this, they live in the parent interpreter's global pools, and can only be collected after all concurrent tasks have completed a full mark phase without marking the shared variable as live. Because GC in the concurrent tasks happens incrementally between operations, a full collection of the shared variables can happen lazily, and does not require a stop-the-world sweep through all concurrent tasks simultaneously.

Internal Structures

The different GC cores are independent, but they share some code and resources. The arena structures and arena creation routines are common across most GC cores, and some GC cores also share mark routines.

The main interpreter structure has an arena_base member, which is a pointer to an Arenas struct.

The Arenas structure

The Arenas structure contains pointers to a variety of memory pools, each used for a specific purpose. Two are Memory_Pool pointers (memory_pool, constant_string_pool), and six are Small_Object_Pool structures (pmc_pool, pmc_ext_pool, constant_pmc_pool, buffer_header_pool, constant_string_header_pool).

The Arenas structure holds function pointers for the core defined interface of the currently active GC subsystem: init_pool, do_gc_mark, finalize_gc_system. It holds various accounting information for the GC subsystem, including how many GC runs have been completed, amount of memory allocated since the last run, and total memory allocated. This accounting information is updated by the GC system. The current block level for GC mark and sweep phases is stored in the Arenas structure. (See "Blocking GC".)

The pointer void *gc_private is reserved for use by the currently active GC subsystem (with freedom for variation between GC implementations).

The Memory_Pool structure

The Memory_Pool structure is a simple memory pool. It contains a pointer to the top block of the allocated pool, the total allocated size of the pool, the block size, and some details on the reclamation characteristics of the pool.

The Small_Object_Pool structure

The Small_Object_Pool structure is a richer memory pool for object allocation. It tracks details like the number of allocated and free objects in the pool, a list of free objects, and for the generational GC implementation maintains linked lists of white, black, and gray PMCs. It contains a pointer to a simple Memory_Pool (the base storage of the pool). It holds function pointers for adding and retrieving free objects in the pool, and for allocating objects.

Internal API

Currently only one GC system is active at a time, selected at configure or compile time. Future versions will support switching GC systems at execution-time to accommodate different work loads.

Each GC core provides a standard interface for interaction with the core.

Initialization

Each GC core declares an initialization routine, which is called from src/memory.c:mem_setup_allocator() after creating arena_base in the interpreter struct.

void Parrot_gc_XXX_init(Interp *)

A routine to initialize the GC system named XXX.

The initialization code is responsible for the creation of the header pools and fills the function pointer slots in the interpreter's arena_base member.

Arenas structure function pointers

Each GC system declares 3 function pointers, stored in the Arenas structure.

void (*do_gc_mark) (Interp *, int flags)

Trigger or perform a GC run. With an incremental GC core, this may only start/continue a partial mark phase, rather than marking the entire tree of live objects.

{{DEPRECATION NOTE: do_gc_mark used to be do_dod_run.}}.

Flags is one of:

GC_trace_normal | GC_trace_stack_FLAG

Run a normal GC cycle. This is normally called by resource shortage in the buffer memory pools before a collection is run. The bit named GC_trace_stack_FLAG indicates that the C-stack (and other system areas like the processor registers) have to be traced too.

The implementation might or might not actually run a full GC cycle. If an incremental GC system just finished the mark phase, it would do nothing. OTOH if no objects are currently marked live, the implementation should run the mark phase, so that copying of dead objects is avoided.

{{DEPRECATION NOTE: GC_trace_normal used to be DOD_trace_normal. GC_trace_stack_FLAG used to be DOD_trace_stack_FLAG.}}

GC_lazy_FLAG

Do a timely destruction run. The goal is either to detect all objects that need timely destruction or to do a full collection. In the former case the collection can be interrupted or postponed. This is called from the Parrot run-loop. No system areas have to be traced.

{{DEPRECATION NOTE: GC_lazy_FLAG used to be DOD_lazy_FLAG.}}

GC_finish_FLAG

Called during interpreter destruction. The GC subsystem must clear the live state of all objects and perform a sweep in the PMC header pool, so that destructors and finalizers get called.

{{DEPRECATION NOTE: GC_finish_FLAG used to be DOD_finish_FLAG.}}

GC_no_trace_volatile_roots

Trace root set except volatile roots. That is skip e.g. registers.

{{DEPRECATION NOTE: GC_no_trace_volatile_roots used to be DOD_no_trace_volatile_roots.}}

void (*finalize_gc_system) (Interp *)

Called during interpreter destruction. Free used resources and memory pools.

{{DEPRECATION NOTE: finalize_gc_system used to be de_init_gc_system.}}

void (*init_pool) (Interp *, struct Small_Object_Pool *)

A function to initialize the given pool. This function should set the following object allocation functions for the given pool.

Small_Object_Pool function pointers

Each GC core defines 4 function pointers stored in the Small_Object_Pool structures.

PObj * (*get_free_object) (Interp *, struct Small_Object_Pool*)

Each header pool provides one function pointer to get a new object from that pool. It should return one free object from the given pool (removing it from the pool's free list). Object flags are returned clear, except flags that are used by the garbage collector itself. If the pool is a buffer header pool, all other object memory is zeroed.

void (*add_free_object) (Interp *, struct Small_Object_Pool *, PObj *);

Add a freed object to the pool's free list.

void (*alloc_objects) (Interp *, struct Small_Object_Pool *);

Initial allocation of objects for the pool.

void (*more_objects) (Interp *, struct Small_Object_Pool *);

Reallocation for additional objects. It has the same signature as alloc_objects, and in some GC cores the same function pointer is used for both. In some GC cores, more_objects may do a GC run.

Write Barrier

Each GC core has to provide these (possibly empty) macros:

GC_WRITE_BARRIER(Interp *, PMC *agg, PMC *old, PMC *new)

This macro is invoked when in aggregate agg the element old is getting overritten by new. Both old and new may be NULL.

{{DEPRECATION NOTE: used to be DOD_WRITE_BARRIER.}}

GC_WRITE_BARRIER_KEY(Interp *, PMC *agg, PMC *old, PObj *old_key, PMC *new, PObj *new_key)

Like above. Invoked when a hash key is inserted, possibly replacing an old key.

{{DEPRECATION NOTE: used to be DOD_WRITE_BARRIER_KEY.}}

Blocking GC

Being able to block GC is important, so newly allocated Buffers or PMCs won't be collected before they're attached to the live tree. The following routines control GC:

Parrot_block_GC_mark(Interp *interpreter)

Block the GC mark phase for the passed interpreter. (But not the sweep phase)

{{DEPRECATION NOTE: used to be Parrot_block_DOD.}}

Parrot_block_GC_sweep(Interp *interpreter)

Block the GC sweep phase for the passed interpreter. (But not the mark phase)

{{DEPRECATION NOTE: used to be Parrot_block_GC.}}

Parrot_unblock_GC_mark(Interp *interpreter)

Unblock the GC mark phase for the passed interpreter. (But not the sweep phase)

{{DEPRECATION NOTE: used to be Parrot_unblock_DOD.}}

Parrot_unblock_GC_sweep(Interp *interpreter)

Unblock the GC sweep phase for the passed interpreter. (But not the mark phase)

{{DEPRECATION NOTE: used to be Parrot_unblock_GC.}}

Note that the blocking is recursive--if you call Parrot_block_GC_sweep() three times in succession, you need to call Parrot_unblock_GC_sweep() three times to re-enable the GC sweep phase.

PMC/Buffer API

Flags

For PMCs and Buffers to be collected properly, you must set the appropriate flags on them.

Directly manipulating these flags is not recommended. Always use the macros defined in include/parrot/pobj.h.

PObj_active_destroy_FLAG

The PMC has some sort of active destructor, and will have that destructor called when the PMC is destroyed.

PObj_custom_mark_FLAG

The mark vtable slot will be called during the GC mark phase. The mark function must call pobject_lives for all non-NULL objects (Buffers and PMCs) that PMC refers to.

Please note that pobject_lives may be a macro.

PObj_data_is_PMC_array_FLAG

Set if the data pointer points to an array of objects. The length of the array is PMC_int_val(SELF).

PObj_external_FLAG

Set if the buffer points to memory that came from outside Parrot's memory system.

PObj_sysmem_FLAG

Set if the memory came from the system malloc. When the buffer is considered dead, the memory will be freed back to the system.

PObj_COW_FLAG

The buffer's memory is copy on write. Any changes to the buffer must first have the buffer's memory copied. The COW flag should then be removed.

The following flags can be used by the GC subsystem:

PObj_live_FLAG

The system considers the object to be alive for collection purposes.

PObj_on_free_list_FLAG

The object is unused, and on the free list for later allocation.

PObj_custom_GC_FLAG

Mark the buffer as needing GC.

ATTACHMENTS

None.

FOOTNOTES

None.

REFERENCES

"A unified theory of garbage collection": http://portal.acm.org/citation.cfm?id=1028982

"Scalable Locality-Conscious Multithreaded Memory Allocation": http://people.cs.vt.edu/~scschnei/papers/ismm06.pdf

"Parallel and concurrent garbage collectors": http://chaoticjava.com/posts/parallel-and-concurrent-garbage-collectors/

"Region-Based Memory Management": http://www.irisa.fr/prive/talpin/papers/ic97.pdf

Dan's first musings on the GC subsystem: http://www.mail-archive.com/perl6-all@perl.org/msg14072.html

Semi-timely and ordered destruction: http://www.sidhe.org/~dan/blog/archives/000199.html