Parallel Garbage Collection

Timmie Smith
CPSC 689
Spring 2002
Outline




Sequential Garbage Collection Methods
Multi-threaded Methods
Parallel Methods for Shared Memory
Parallel Methods for Distributed Memory
Motivation

Good software design requires it

Modular programming, and object-oriented programming even
more so, requires that components be independent

Explicit memory management requires modules to know what
others are doing so they can deallocate objects safely.
This introduces bookkeeping that makes modules brittle, hard to
reuse, and hard to extend

Garbage collection frees modules from worrying about
memory management

Modules don’t need bookkeeping code
Reusability and extensibility are improved immediately
Memory leaks are avoided
Sequential Garbage Collection

Basic Collection Techniques







Reference Counting
Mark-Sweep
Mark-Compact
Copying
Non-Copying Implicit Collection
Incremental Tracing Techniques
Generational Techniques
Garbage Collection Abstraction


An object is not garbage if it is live: it is in the
root set or reachable from another live object.
Collection is abstracted as two phases: garbage
detection followed by collection.

Detection determines which objects are live.

Root Set – all global objects, local objects, and objects on
the stack
Starting from the Root Set, iteratively add reachable objects
to the live set until nothing new is added
Collection frees any object that is not live.
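
The two-phase abstraction can be sketched in a few lines. This is a minimal illustration, not any particular collector: the `references` dictionary is a hypothetical stand-in for real pointer fields, and object names are plain strings.

```python
def find_live(root_set, references):
    """Detection: return every object reachable from root_set.

    references maps an object name to the names it points to
    (an assumed, simplified model of pointer fields).
    """
    live = set(root_set)
    worklist = list(root_set)
    while worklist:                      # stops when nothing new is added
        obj = worklist.pop()
        for child in references.get(obj, ()):
            if child not in live:
                live.add(child)
                worklist.append(child)
    return live


def collect(heap, root_set, references):
    """Detection followed by collection: anything not live is garbage."""
    live = find_live(root_set, references)
    garbage = set(heap) - live
    return live, garbage


heap = {"a", "b", "c", "d", "e"}
references = {"a": ["b"], "b": ["c"], "d": ["e"], "e": ["d"]}
live, garbage = collect(heap, {"a"}, references)
```

Here `d` and `e` point at each other but are unreachable from the root set, so both end up in `garbage`.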
Reference Counting

Object headers store number of references to object





Object collected as soon as there are no references to it
Count updates on every pointer assignment make the technique expensive
Objects in reference cycles are never reclaimed, since their counts never reach zero
Method can be incremental to limit program pauses
Overhead of method is proportional to work done by program
[Figure: objects reachable from the Root Set, each labeled with its reference count]
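
A small sketch of reference counting, assuming a toy object model (the `Obj` class and the `add_ref`, `drop_ref`, and `write_field` helpers are hypothetical names used only for illustration). It shows both the immediate reclamation and the cycle problem.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.count = 0        # reference count kept in the object header
        self.fields = {}      # field name -> referenced Obj (or None)


freed = []                    # names of reclaimed objects, in order


def add_ref(obj):
    if obj is not None:
        obj.count += 1


def drop_ref(obj):
    if obj is None:
        return
    obj.count -= 1
    if obj.count == 0:        # reclaimed as soon as the last reference goes
        freed.append(obj.name)
        for child in obj.fields.values():
            drop_ref(child)
        obj.fields.clear()


def write_field(obj, field, target):
    """Every pointer store pays for up to two count updates."""
    add_ref(target)           # increment first so self-assignment is safe
    drop_ref(obj.fields.get(field))
    obj.fields[field] = target


# A chain root -> a -> b is reclaimed when the root reference is dropped.
a, b = Obj("a"), Obj("b")
add_ref(a)                    # reference from the root set
write_field(a, "next", b)
drop_ref(a)

# A cycle x <-> y survives: each object keeps the other's count at 1.
x, y = Obj("x"), Obj("y")
add_ref(x)
write_field(x, "next", y)
write_field(y, "next", x)
drop_ref(x)
```

After this runs, `freed` holds only `a` and `b`; `x` and `y` are unreachable but still have a count of 1.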
Mark-Sweep Collectors

Traces from the root set and marks all live objects,
then sweeps heap to collect unmarked objects


Collected objects linked to free lists used by allocator
Disadvantages include fragmentation, cost of
collection, and decrease of locality



Fragmentation caused by objects not being compacted
Cost of collection is proportional to size of the heap
Spatial locality lost as objects allocated among older objects
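
The following sketch illustrates mark-sweep with a free list, under the simplifying assumption that the heap is an array of equal-sized slots (the `Heap` class is hypothetical). It shows why sweep cost follows heap size and how new objects end up allocated among older ones.

```python
class Heap:
    def __init__(self, size):
        self.cells = [None] * size          # slot -> list of child slots
        self.free_list = list(range(size))  # slots available to the allocator

    def allocate(self, children=()):
        if not self.free_list:
            raise MemoryError("heap exhausted; a collection is needed")
        slot = self.free_list.pop(0)
        self.cells[slot] = list(children)
        return slot

    def mark(self, roots):
        marked = set()
        stack = list(roots)
        while stack:
            slot = stack.pop()
            if slot not in marked:
                marked.add(slot)
                stack.extend(self.cells[slot])
        return marked

    def sweep(self, marked):
        # Visits every slot, so the cost follows heap size, not live data.
        for slot, cell in enumerate(self.cells):
            if cell is not None and slot not in marked:
                self.cells[slot] = None
                self.free_list.append(slot)   # link onto the free list

    def collect(self, roots):
        self.sweep(self.mark(roots))


heap = Heap(4)
a = heap.allocate()          # slot 0
b = heap.allocate()          # slot 1, becomes garbage
c = heap.allocate([a])       # slot 2, points to a
d = heap.allocate()          # slot 3, becomes garbage
heap.collect(roots=[c])
e = heap.allocate()          # reuses the hole between the two live objects
```

After the collection the free slots (1 and 3) are not contiguous, and the new object `e` lands in slot 1, between the older objects in slots 0 and 2.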
Mark-Compact Collectors

Sweep phase of Mark-Sweep modified




Collected objects not linked to free list
Marked objects copied into contiguous memory
Pointer to end of contiguous space maintained for new
allocation
Overhead of Sweep not improved


Entire heap still swept to find unreachable objects
Live objects must be swept several times



First pass relocates objects
Additional passes required to update pointers
Mechanisms to handle pointers also add overhead


Lookup table kept while objects being relocated
Indirection through forwarding pointers used if program not stopped
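
A minimal sketch of compaction with a lookup table, assuming the program is stopped and the heap is a list of equal-sized slots (the dictionary layout and function names are hypothetical). The first pass slides live objects down; a second pass rewrites pointers using the table.

```python
def mark(heap, roots):
    marked, stack = set(), list(roots)
    while stack:
        addr = stack.pop()
        if addr not in marked:
            marked.add(addr)
            stack.extend(heap[addr]["refs"])
    return marked


def compact(heap, roots):
    """heap: list of {"name", "refs"} dicts or None; refs are slot indices."""
    marked = mark(heap, roots)
    forwarding = {}          # lookup table: old address -> new address
    next_free = 0
    # First pass: sweep the entire heap, sliding each live object down.
    for addr in range(len(heap)):
        if addr in marked:
            forwarding[addr] = next_free
            heap[next_free] = heap[addr]
            next_free += 1
    for addr in range(next_free, len(heap)):
        heap[addr] = None
    # Second pass: rewrite pointers in relocated objects and in the roots.
    for addr in range(next_free):
        heap[addr]["refs"] = [forwarding[r] for r in heap[addr]["refs"]]
    new_roots = [forwarding[r] for r in roots]
    return new_roots, next_free   # next_free is the new allocation pointer


heap = [
    {"name": "a", "refs": [3]},
    {"name": "dead1", "refs": []},
    {"name": "dead2", "refs": [0]},
    {"name": "b", "refs": [0]},
    None,
]
new_roots, alloc_ptr = compact(heap, roots=[3])
```

Allocation after compaction only needs to bump `alloc_ptr`, since all free space lies beyond it.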
Copying Collectors


Heap is split into “from space” and “to space”
Collection triggered when object cannot be allocated in the current space

Program stopped to avoid pointer inconsistencies
Forwarding pointers used to handle objects referenced multiple times
Work proportional to number of live objects
Collection frequency decreased by increasing size of memory spaces
[Figure: Root Set, From Space, and To Space]
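
A sketch of a copying collection, assuming a breadth-first (Cheney-style) scan, which is one common way to implement it and not necessarily the variant the slide has in mind. The `Obj` class and `copy_collect` function are hypothetical; to-space is modeled as a Python list.

```python
class Obj:
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)   # references to other Obj instances
        self.forward = None      # forwarding pointer, set once copied


def copy_collect(roots):
    """Copy everything reachable from roots into a fresh to-space."""
    to_space = []

    def forward(obj):
        if obj.forward is None:              # first visit: copy the object
            clone = Obj(obj.name, obj.refs)
            obj.forward = clone
            to_space.append(clone)
        return obj.forward                   # later visits reuse the copy

    new_roots = [forward(r) for r in roots]
    scan = 0
    while scan < len(to_space):              # scan pointer chases the copies
        obj = to_space[scan]
        obj.refs = [forward(child) for child in obj.refs]
        scan += 1
    return new_roots, to_space


shared = Obj("shared")
a = Obj("a", [shared])
b = Obj("b", [shared])
dead = Obj("dead", [a])
root = Obj("root", [a, b])
new_roots, to_space = copy_collect([root])
names = [o.name for o in to_space]
```

Only live objects are ever touched: `dead` is never visited, and `shared` is copied once even though two objects reference it.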
Non-copying Collectors

Spaces of copying collector treated as sets

Tracing moves live objects to second set
After tracing, objects remaining in first set are garbage
Each set is implemented as a linked list, so objects are relinked rather than copied
Subject to same locality and fragmentation
issues as Mark-Sweep collectors
Incremental Tracing Collectors

Collection interleaved with program execution




No “Stop the World” pause in program execution.
Program can change reachability of objects while
collector is running.
Program is referred to as the mutator.
Collector must be conservative to be correct


Restarting the trace to catch garbage created by mutator
changes doesn’t help, since more changes can occur meanwhile.
Some garbage is left “floating” until the next collection
Tri-color marking system

Object traversal status kept by object coloring


Simple mark-sweep or copying need only two
colors because collection occurs when mutator
paused.
Incremental approaches require third color to
handle changes in reachability.




Black – object is live and all children have been traversed
Grey – object is live, children have not been traversed
White – object not yet reached
Mutator must coordinate with collector if a pointer
to a white object is added to a black object.
Tri-color Marking Example
[Figure: objects A, B, C, and D shown before and after the mutator’s change]
Mutator modifies A and B while garbage collector
examines B’s descendants
Mutator must coordinate with garbage collector to
prevent D being collected.
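
The hazard can be reproduced in a small simulation. This is a sketch under assumptions: the starting graph (A pointing to B and C, B pointing to D) is a guess at the scenario, and the `Collector` class is hypothetical. No coordination is performed here, so the reachable object D is wrongly left white.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Collector:
    def __init__(self, graph, roots):
        self.graph = graph                        # name -> list of child names
        self.color = {name: WHITE for name in graph}
        self.grey = []                            # objects awaiting traversal
        for r in roots:
            self.shade(r)

    def shade(self, name):
        if self.color[name] == WHITE:
            self.color[name] = GREY
            self.grey.append(name)

    def step(self):
        """Traverse one grey object; return False when none remain."""
        if not self.grey:
            return False
        name = self.grey.pop(0)
        for child in self.graph[name]:
            self.shade(child)
        self.color[name] = BLACK
        return True

    def garbage(self):
        while self.step():
            pass
        return {n for n, c in self.color.items() if c == WHITE}


graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
gc = Collector(graph, roots=["A"])
gc.step()                     # A is now black; B and C are grey

# The mutator runs between collector steps, with no barrier in place:
graph["A"].append("D")        # black A gains a pointer to white D
graph["B"].remove("D")        # the only path the collector would follow is gone

lost = gc.garbage()
```

`lost` contains D even though A still points to it, which is exactly the case the coordination mechanisms are meant to prevent.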
Mutator/Collector Coordination

Coordination must update collector when a
pointer is overwritten.



Read Barrier – detects when mutator accesses a
pointer to a white object and immediately colors
the object grey.
Write Barrier – attempts by the mutator to write a pointer
into an object are trapped.
There are two different write barrier approaches.
Write Barrier Approaches

Snapshot-at-the-Beginning



Ensures a pointer to an object is not destroyed
before the collector traverses it.
Pointers are saved before they are overwritten.
Incremental Update


When a pointer is written into a black object, the
object is changed to grey and is rescanned before
collection is completed.
No extra bookkeeping structure needed.
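
Both write barriers can be sketched against the same toy tri-color marker. The `Heap` class, its `write` method, and the `barrier` switch are hypothetical; the snapshot variant is modeled by shading the overwritten target grey, which is one common way to "save" the old pointer.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Heap:
    def __init__(self, graph, roots, barrier):
        self.graph = graph                  # name -> list of slots (name or None)
        self.color = {n: WHITE for n in graph}
        self.grey = []
        self.barrier = barrier              # "snapshot", "incremental", or "none"
        for r in roots:
            self.shade(r)

    def shade(self, name):
        if self.color[name] == WHITE:
            self.color[name] = GREY
            self.grey.append(name)

    def step(self):
        if not self.grey:
            return False
        name = self.grey.pop(0)
        for child in self.graph[name]:
            if child is not None:
                self.shade(child)
        self.color[name] = BLACK
        return True

    def write(self, src, index, new_target):
        """Mutator stores new_target into slot index of src (trapped)."""
        old_target = self.graph[src][index]
        if self.barrier == "snapshot" and old_target is not None:
            # Save the pointer about to be overwritten so the collector
            # still traverses the object it led to.
            self.shade(old_target)
        self.graph[src][index] = new_target
        if self.barrier == "incremental" and self.color[src] == BLACK:
            # Revert the black object to grey so it is rescanned.
            self.color[src] = GREY
            self.grey.append(src)

    def finish(self):
        while self.step():
            pass
        return {n for n, c in self.color.items() if c == WHITE}


def run(barrier):
    graph = {"A": ["B", None], "B": ["D"], "D": []}
    heap = Heap(graph, ["A"], barrier)
    heap.step()                   # A is now black, B is grey
    heap.write("A", 1, "D")       # black A gains a pointer to white D
    heap.write("B", 0, None)      # the other path to D is destroyed
    return heap.finish()          # names the collector would free


results = {b: run(b) for b in ("snapshot", "incremental", "none")}
```

With either barrier D survives; with no barrier it would be freed while still reachable. The incremental-update variant reuses the existing grey worklist, so it needs no extra structure.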
Generational Collectors


Based on empirical evidence that most objects are
short lived.
Heap space split into generational spaces



Younger generation spaces are smaller
Spaces collected when allocation in the space fails
Live objects found during collection of a generation
advanced to older generation


Long-lived objects copied fewer times than in copying
collector
Heuristics used to determine when to advance objects to
next generation
Intergenerational References

Method must be able to collect one
generation without collecting others

Pointers from older generations to younger
generation.



Table to store pointers in older objects used in root set
Write barrier technique used in incremental collectors
Pointers from young generations into older
generations


Write barrier technique to trap all pointer assignments
Use live objects in all younger generations in root set
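
A sketch of a young-generation collection that uses a write barrier and a table of old-to-young pointers as extra roots. Everything here is illustrative: the `GenHeap` class, the `PROMOTE_AGE` threshold, and the two-generation layout are assumptions, and the table is rebuilt by a full scan for brevity where a real collector would maintain it incrementally.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.gen = "young"
        self.age = 0             # collections survived in the young space


class GenHeap:
    PROMOTE_AGE = 2              # hypothetical advancement heuristic

    def __init__(self):
        self.young, self.old = [], []
        self.remembered = set()  # old objects holding pointers to young ones

    def new(self, name):
        obj = Obj(name)
        self.young.append(obj)
        return obj

    def write(self, src, target):
        """Write barrier: record old-to-young pointers as they are created."""
        src.refs.append(target)
        if src.gen == "old" and target.gen == "young":
            self.remembered.add(src)

    def minor_collect(self, roots):
        """Collect only the young generation."""
        # Roots: the real root set plus pointers stored in old objects.
        stack = [r for r in roots if r.gen == "young"]
        for src in self.remembered:
            stack.extend(t for t in src.refs if t.gen == "young")
        live = set()
        while stack:
            obj = stack.pop()
            if obj not in live:
                live.add(obj)
                stack.extend(t for t in obj.refs if t.gen == "young")
        survivors = []
        for obj in self.young:
            if obj not in live:
                continue                     # young garbage is dropped
            obj.age += 1
            if obj.age >= self.PROMOTE_AGE:
                obj.gen = "old"              # advance to the older generation
                self.old.append(obj)
            else:
                survivors.append(obj)
        self.young = survivors
        # Simplification: rebuild the table by scanning the old generation.
        self.remembered = {o for o in self.old
                           if any(t.gen == "young" for t in o.refs)}


heap = GenHeap()
keeper = heap.new("keeper")
temp = heap.new("temp")          # never referenced: dies young
heap.minor_collect([keeper])     # keeper survives once
heap.minor_collect([keeper])     # keeper survives twice and is promoted
child = heap.new("child")
heap.write(keeper, child)        # old -> young pointer, caught by the barrier
heap.minor_collect([keeper])     # child is found only through the table
```

The last collection never traces the old generation, yet `child` survives because `keeper` was recorded when the pointer was written.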
Multi-threaded Methods



Attempt to reduce pauses caused by
“stopping the world” [2]
Garbage collector is a separate thread that is
run concurrently with the application.
Coordination with application is minimized



Marking proceeds while application running
Pages are marked dirty when the application modifies an object on them
Dirty pages rescanned before collection
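
The dirty-page idea can be sketched as a sequential simulation, in which mutator writes are interleaved by hand rather than run on a real second thread. The `Heap` class and the `PAGE_SIZE` of two objects per page are hypothetical; an actual implementation would rely on virtual-memory dirty bits.

```python
PAGE_SIZE = 2   # objects per page (assumed; real pages are measured in bytes)


class Heap:
    def __init__(self, n):
        self.refs = [[] for _ in range(n)]   # object id -> child ids
        self.marked = set()
        self.dirty = set()                   # pages written since tracing began

    def mutator_write(self, src, target):
        self.refs[src].append(target)
        self.dirty.add(src // PAGE_SIZE)     # the page holding src is now dirty

    def trace(self, start):
        stack = list(start)
        while stack:
            obj = stack.pop()
            if obj not in self.marked:
                self.marked.add(obj)
                stack.extend(self.refs[obj])

    def rescan_dirty(self):
        """Re-trace from marked objects on dirty pages before collecting."""
        stale = [o for o in self.marked if o // PAGE_SIZE in self.dirty]
        self.dirty.clear()
        for obj in stale:
            self.trace(self.refs[obj])


heap = Heap(4)
heap.refs[0] = [1]
heap.trace([0])                  # collector traces while the application runs
heap.mutator_write(1, 3)         # application stores a pointer after 1 was visited
before = set(heap.marked)
heap.rescan_dirty()              # catches the pointer the first trace missed
garbage = set(range(4)) - heap.marked
```

Without the rescan, object 3 would be treated as garbage even though object 1 now points to it.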
Parallel Garbage Collection

Parallelization of sequential methods

Mark-and-Sweep
Reference Counting

Different issues in each environment

Shared variable access in shared memory systems
Disjoint address spaces in distributed memory systems

Scheduling in both environments involves stopping
application threads during tracing.

Long pauses avoided by incremental collection
Improves performance in SPMD programs since application
has frequent global synchronizations.
Shared Memory

Reference Counting



References to object updated by all processors
Locks on object headers limit scalability
Mark-Sweep


Each processor begins marking from a local root set and
marks each object atomically
Poor scalability unless some mechanism for load balancing
is implemented



Without it, a processor must mark all descendants of an object it marks
Work stealing allows load rebalancing and improved results
Splitting large objects also allows for better load balance.
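
A sketch of parallel marking with work stealing, using Python threads as stand-in processors. The function and variable names are hypothetical, and a single global lock is used for brevity; as noted for reference counting, such a lock would itself limit scalability in a real collector (and CPython threads do not run bytecode in parallel anyway).

```python
import threading
from collections import deque


def parallel_mark(graph, local_roots):
    """graph: node -> children; local_roots: one root list per worker."""
    n = len(local_roots)
    stacks = [deque() for _ in range(n)]
    marked = set()
    lock = threading.Lock()          # makes test-and-mark atomic
    pending = [0]                    # objects marked but not yet fully scanned

    def try_mark(node, worker):
        with lock:
            if node in marked:
                return
            marked.add(node)
            pending[0] += 1
            stacks[worker].append(node)

    def take(worker):
        with lock:
            if stacks[worker]:
                return stacks[worker].pop()          # own work: newest first
            for victim in range(n):
                if stacks[victim]:
                    return stacks[victim].popleft()  # steal the oldest entry
        return None

    def run(worker):
        while True:
            node = take(worker)
            if node is None:
                with lock:
                    if pending[0] == 0:
                        return       # nothing queued and nobody scanning
                continue             # others are still scanning; retry
            for child in graph[node]:
                try_mark(child, worker)
            with lock:
                pending[0] -= 1

    # Seed every worker's stack before starting, so termination is safe.
    for worker, roots in enumerate(local_roots):
        for root in roots:
            try_mark(root, worker)
    threads = [threading.Thread(target=run, args=(w,)) for w in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return marked


# A binary tree of 63 nodes reachable from worker 0's root; worker 1 has no
# roots of its own and can only contribute by stealing. Node 100 is garbage.
graph = {i: ([2 * i + 1, 2 * i + 2] if 2 * i + 2 < 63 else [])
         for i in range(63)}
graph[100] = [0]
marked = parallel_mark(graph, [[0], []])
```

Whatever the interleaving, the result is the set of nodes reachable from the roots; stealing only changes which worker does the scanning.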
Distributed Memory

Biggest challenge is representing cross-processor
references.

Remote Processor – the pointer refers to a stub entry, which holds:

Processor id of the object owner
Complement of the remote object address

Local Processor – an entry table maintains all exported references

First export of an object reference enters object in table
Object is never reclaimed without cooperation of processors

Fields of stub and entry table objects are the same

Flag – distinguishes type of object
Count – a count of the number of unreceived messages
referencing the object.
Distributed Memory

Marking Phase


Processors begin with local root set and mark all local
objects
When local marking is complete, “mark messages” are sent
to remote processors for each marked stub



Remote processor receives message, adds object to its mark
stack, and continues local marking.
When local marking is complete and no more messages are
received, remote processor acknowledges the messages sent to it.
Marking is complete when the acknowledgement for the first
message sent is received.
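
The marking phase can be sketched as an in-process simulation. This is a simplification under stated assumptions: a single queue stands in for the network, termination is detected by the queue draining rather than by the acknowledgement protocol described above, and the `Processor` class and the `("stub", owner, name)` tuples are hypothetical representations.

```python
from collections import deque


class Processor:
    def __init__(self, pid, objects, roots):
        self.pid = pid
        self.objects = objects     # name -> list of refs; a ref is a local
                                   # name or a stub ("stub", owner_pid, name)
        self.roots = roots
        self.marked = set()
        self.marked_stubs = set()
        self.sent_stubs = set()

    def mark_local(self, start):
        stack = list(start)
        while stack:
            name = stack.pop()
            if name in self.marked:
                continue
            self.marked.add(name)
            for ref in self.objects[name]:
                if isinstance(ref, tuple):
                    self.marked_stubs.add(ref)   # remote: mark the stub only
                else:
                    stack.append(ref)

    def pending_messages(self):
        """Mark messages for stubs marked since the last call."""
        new = self.marked_stubs - self.sent_stubs
        self.sent_stubs |= new
        return [(owner, name) for _, owner, name in sorted(new)]


def global_mark(processors):
    """Run marking to completion; return the number of mark messages sent."""
    network = deque()
    for p in processors.values():
        p.mark_local(p.roots)                    # start from local root sets
        network.extend(p.pending_messages())
    sent = len(network)
    while network:
        owner, name = network.popleft()
        target = processors[owner]
        target.mark_local([name])                # add to mark stack, continue
        more = target.pending_messages()
        sent += len(more)
        network.extend(more)
    return sent


p0 = Processor(0, {"r": ["x", ("stub", 1, "y")], "x": [], "z0": [], "g0": []},
               roots=["r"])
p1 = Processor(1, {"y": [("stub", 0, "z0")], "dead": [("stub", 0, "g0")]},
               roots=[])
messages = global_mark({0: p0, 1: p1})
```

The object `z0` on processor 0 is live only through a reference held on processor 1, so it is found by the second mark message; `g0` stays unmarked because its only referrer, `dead`, is never marked.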
Distributed Memory

Collection Phase

Expand the heap



Processors notified of largest local heap at end of each
collection. H < cM, where c < 1 and M is the max heap size.
Local collection occurs when the heap cannot be expanded.
Global collection occurs when local collection is insufficient.


Global collection allows entry tables to be cleared.
Infrequent global collections minimize impact of collector on
application performance.
Summary

Non-copying methods are the safest for languages
where pointers are not identifiable



Fragmentation and loss of locality limit performance of these
methods
Copying collectors are preferred in cases where memory is
limited and pointers can be found
Parallel Garbage Collection can be based on
parallelization of sequential methods.


Parallel collectors subject to same issues as their sequential
counterparts
Parallel collectors also subject to synchronization and
communication issues while maintaining references and
performing collection.
References
[1] Hans Boehm and Mark Weiser. Garbage Collection in an Uncooperative Environment. Software Practice and
Experience. September 1988.
[2] Hans-J. Boehm, Alan J. Demers, and Scott Shenker. Mostly Parallel Garbage Collection. Proceedings of the
Conference on Programming Language Design and Implementation (PLDI). 1991.
[3] Hans-J. Boehm. Fast Multiprocessor Memory Allocation and Garbage Collection. External Technical Report
HPL-2000-165, HP Labs. December 2000.
[4] David L. Detlefs, Al Dosser, and Benjamin Zorn. Memory Allocation Costs in Large C and C++ Programs.
Technical Report CU-CS-665-93, University of Colorado - Boulder. 1993.
[5] John R. Ellis and David L. Detlefs. Safe, Efficient Garbage Collection for C++. Technical report, Xerox Palo
Alto Research Center. June 1993.
[6] Kenjiro Taura and Akinori Yonezawa. An Effective Garbage Collection Strategy for Parallel Programming
Languages on Large Scale Distributed-Memory Machines. Proceedings of the Symposium on Principles
and Practice of Parallel Programming (PPOPP). 1997.
[7] Paul R. Wilson. Uniprocessor Garbage Collection Techniques. Proceedings of the International Workshop on
Memory Management (IWMM). 1992.
[8] Toshio Endo, Kenjiro Taura, and Akinori Yonezawa. A Scalable Mark-Sweep Garbage Collector on Large-Scale
Shared-Memory Machines. Proceedings of High Performance Networking and Computing (SC97).
November 1997.
[9] Hirotaka Yamamoto, Kenjiro Taura, and Akinori Yonezawa. Comparing Reference Counting and Global
Mark-and-Sweep on Parallel Computers. Languages, Compilers, and Run-time Systems (LCR98),
Lecture Notes in Computer Science (LNCS), volume 1511, pp. 205-218. May 1998.