
Page 1: Parallel Garbage Collection

Parallel Garbage Collection

Timmie Smith, CPSC 689

Spring 2002

Page 2: Parallel Garbage Collection

Outline

Sequential Garbage Collection Methods
Multi-threaded Methods
Parallel Methods for Shared Memory
Parallel Methods for Distributed Memory

Page 3: Parallel Garbage Collection

Motivation

Good software design requires it
Modular programming, and OO programming even more so, mandates that components be independent
Explicit memory management requires modules to know what the others are doing so they can deallocate objects safely
This introduces bookkeeping that makes modules brittle, hard to reuse, and hard to extend

Garbage collection allows modules to ignore memory management
Modules don't need bookkeeping code, so reusability and extensibility improve immediately
Memory leaks are avoided

Page 4: Parallel Garbage Collection

Sequential Garbage Collection

Basic Collection Techniques: Reference Counting, Mark-Sweep, Mark-Compact, Copying, Non-Copying, Implicit Collection
Incremental Tracing Techniques
Generational Techniques

Page 5: Parallel Garbage Collection

Garbage Collection Abstraction

An object is not garbage if it is live or is reachable from any live object
A 2-phase abstraction is used: garbage detection followed by collection
Detection determines which objects are live
Root Set – all global objects, local objects, and objects on the stack
Objects reachable from the Root Set are found and added to it iteratively until nothing more is added
Collection frees any object that is not live

Page 6: Parallel Garbage Collection

Reference Counting

Object headers store the number of references to the object
An object is collected as soon as there are no references to it
The operations needed to update the count make the technique expensive
Reference cycles between objects limit its effectiveness
The method can be made incremental to limit program pauses
The overhead of the method is proportional to the work done by the program

[Figure: a graph of objects labeled with reference counts of 1 and 2, with references originating from the Root Set]
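As a rough illustration of the counting operations described above (not code from the presentation), the following C sketch shows how a pointer store pays for two count updates; the object layout and helper names are invented for the example.

    #include <stdlib.h>

    typedef struct Object {
        size_t refcount;            /* number of references to this object */
        struct Object **fields;     /* pointers held by this object */
        size_t nfields;
    } Object;

    static void inc_ref(Object *obj) {
        if (obj != NULL) obj->refcount++;
    }

    /* Drop a reference; reclaim the object as soon as the count reaches zero. */
    static void dec_ref(Object *obj) {
        if (obj == NULL) return;
        if (--obj->refcount == 0) {
            for (size_t i = 0; i < obj->nfields; i++)
                dec_ref(obj->fields[i]);    /* children may become garbage too */
            free(obj->fields);
            free(obj);
        }
    }

    /* Every pointer store pays for two count updates -- the expense noted above.
       Incrementing before decrementing keeps self-assignment safe. */
    static void write_field(Object *src, size_t i, Object *newval) {
        inc_ref(newval);
        dec_ref(src->fields[i]);
        src->fields[i] = newval;
    }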

Page 7: Parallel Garbage Collection

Mark-Sweep Collectors

Trace from the root set and mark all live objects, then sweep the heap to collect unmarked objects
Collected objects are linked onto free lists used by the allocator
Disadvantages include fragmentation, the cost of collection, and decreased locality
Fragmentation is caused by objects not being compacted
The cost of collection is proportional to the size of the heap
Spatial locality is lost as new objects are allocated among older objects
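A minimal stop-the-world mark-sweep sketch in C, assuming every allocated object sits on a single allocation list and has a fixed number of child pointers; the names are illustrative rather than taken from any of the cited collectors.

    #include <stddef.h>

    typedef struct Obj {
        struct Obj *next;           /* links every allocated object for the sweep */
        struct Obj *children[2];    /* outgoing pointers traced during marking */
        int marked;
    } Obj;

    static Obj *all_objects;        /* head of the allocation list */
    static Obj *free_list;          /* collected objects are linked here for reuse */

    /* Mark phase: depth-first trace from the roots; work is proportional to
       the number of live objects. */
    static void mark(Obj *o) {
        if (o == NULL || o->marked) return;
        o->marked = 1;
        for (int i = 0; i < 2; i++)
            mark(o->children[i]);
    }

    /* Sweep phase: the entire heap is visited, so cost is proportional to
       heap size, and unmarked objects go back on the allocator's free list. */
    static void sweep(void) {
        Obj **link = &all_objects;
        while (*link != NULL) {
            Obj *o = *link;
            if (o->marked) {
                o->marked = 0;          /* clear the bit for the next collection */
                link = &o->next;
            } else {
                *link = o->next;        /* unlink the garbage object */
                o->next = free_list;
                free_list = o;
            }
        }
    }

    static void collect(Obj **roots, size_t nroots) {
        for (size_t i = 0; i < nroots; i++)
            mark(roots[i]);
        sweep();
    }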

Page 8: Parallel Garbage Collection

Mark-Compact Collectors

The sweep phase of Mark-Sweep is modified
Collected objects are not linked onto a free list
Marked objects are copied into contiguous memory
A pointer to the end of the contiguous space is maintained for new allocation
The overhead of the sweep is not improved
The entire heap is still swept to find unreachable objects
Live objects must be visited several times: a first pass relocates objects, and additional passes are required to update pointers
The mechanisms used to handle pointers also add overhead
A lookup table is kept while objects are being relocated
Indirection through forwarding pointers is used if the program is not stopped
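One common realization of this is a sliding (LISP-2 style) compactor; the sketch below, built on an invented toy heap of fixed-size objects, separates the forwarding-address pass, the pointer-update pass, and the relocation pass.

    #include <stddef.h>

    #define HEAP_CAP 1024

    typedef struct Obj {
        int marked;
        struct Obj *forward;        /* new address computed in the first pass */
        struct Obj *children[2];
    } Obj;

    static Obj heap[HEAP_CAP];      /* toy heap of fixed-size objects */
    static size_t heap_top;         /* heap[0..heap_top) is allocated */

    /* Pass 1: give each marked object its post-compaction address. */
    static size_t compute_forwarding(void) {
        size_t new_top = 0;
        for (size_t i = 0; i < heap_top; i++)
            if (heap[i].marked)
                heap[i].forward = &heap[new_top++];
        return new_top;
    }

    /* Pass 2: rewrite the roots and every live pointer field through the
       forwarding addresses (the "lookup table" role mentioned above). */
    static void update_pointers(Obj **roots, size_t nroots) {
        for (size_t i = 0; i < nroots; i++)
            if (roots[i] != NULL)
                roots[i] = roots[i]->forward;
        for (size_t i = 0; i < heap_top; i++) {
            if (!heap[i].marked) continue;
            for (int j = 0; j < 2; j++)
                if (heap[i].children[j] != NULL)
                    heap[i].children[j] = heap[i].children[j]->forward;
        }
    }

    /* Pass 3: slide live objects toward the bottom of the heap; allocation
       then continues from the new heap_top. */
    static void compact(Obj **roots, size_t nroots) {
        size_t new_top = compute_forwarding();
        update_pointers(roots, nroots);
        for (size_t i = 0; i < heap_top; i++)
            if (heap[i].marked) {
                *heap[i].forward = heap[i];   /* destinations never overtake sources */
                heap[i].forward->marked = 0;
            }
        heap_top = new_top;
    }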

Page 9: Parallel Garbage Collection

Copying Collectors

The heap is split into a "from space" and a "to space"
Collection is triggered when an object cannot be allocated in the current space
The program is stopped to avoid pointer inconsistencies
Forwarding pointers are used to handle objects that are referenced multiple times
Work is proportional to the number of live objects
Collection frequency is decreased by increasing the size of the memory spaces

[Figure: the Root Set pointing into From Space, with live objects copied into To Space]
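A Cheney-style breadth-first copy is a standard way to implement this scheme; the sketch below uses an invented two-space toy heap, with to-space doubling as the scan queue.

    #include <stddef.h>

    #define SPACE_OBJS 4096

    typedef struct Obj {
        struct Obj *forward;        /* set once the object has been copied */
        struct Obj *children[2];
    } Obj;

    static Obj from_space[SPACE_OBJS];  /* new objects are allocated here */
    static size_t from_top;
    static Obj to_space[SPACE_OBJS];
    static size_t to_top;               /* next free slot in to-space */

    /* Bump allocation in from-space; a real allocator would trigger collect()
       when from_top reaches SPACE_OBJS. */
    static Obj *alloc_obj(void) {
        return &from_space[from_top++];
    }

    /* Copy one object into to-space, or return its existing copy. */
    static Obj *copy(Obj *o) {
        if (o == NULL) return NULL;
        if (o->forward != NULL) return o->forward;    /* already moved */
        Obj *new_o = &to_space[to_top++];
        *new_o = *o;
        new_o->forward = NULL;
        o->forward = new_o;            /* forwarding pointer for later references */
        return new_o;
    }

    /* Stop-the-world collection: copy the roots, then scan to-space
       breadth-first; work is proportional to the number of live objects. */
    static void collect(Obj **roots, size_t nroots) {
        to_top = 0;
        for (size_t i = 0; i < nroots; i++)
            roots[i] = copy(roots[i]);
        for (size_t scan = 0; scan < to_top; scan++)
            for (int j = 0; j < 2; j++)
                to_space[scan].children[j] = copy(to_space[scan].children[j]);
        /* from_space is now entirely garbage; the two spaces swap roles and
           allocation resumes in the empty space. */
    }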

Page 10: Parallel Garbage Collection

Non-copying Collectors

The spaces of a copying collector are treated as sets
Tracing moves live objects to the second set
After tracing, the objects left in the first set are garbage
The sets are implemented as linked lists
Subject to the same locality and fragmentation issues as Mark-Sweep collectors

Page 11: Parallel Garbage Collection

Incremental Tracing Collectors

Collection is interleaved with program execution
There is no "stop the world" pause in program execution
The program can change the reachability of objects while the collector is running
The program is referred to as the mutator
The collector must be conservative to be correct
Restarting collection to pick up the garbage created by such changes doesn't help; some garbage "floats" until the next collection

Page 12: Parallel Garbage Collection

Tri-color Marking System

Object traversal status is kept by coloring objects
Simple mark-sweep or copying collectors need only two colors because collection occurs while the mutator is paused
Incremental approaches require a third color to handle changes in reachability
Black – the object is live and all of its children have been traversed
Grey – the object is live, but its children have not been traversed
White – the object has not yet been reached
The mutator must coordinate with the collector if a pointer to a white object is added to a black object
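The invariant can be made concrete with a small sketch: here the grey objects live on an explicit worklist, and a hypothetical pointer-store routine restores the invariant when a black object gains a white child. None of these names come from the presentation.

    #include <stddef.h>

    typedef enum { WHITE, GREY, BLACK } Color;

    typedef struct Obj {
        Color color;
        struct Obj *children[2];
    } Obj;

    #define WORKLIST_CAP 1024
    static Obj *grey_list[WORKLIST_CAP];    /* grey objects still awaiting a scan */
    static size_t grey_count;

    static void shade(Obj *o) {             /* white -> grey */
        if (o != NULL && o->color == WHITE) {
            o->color = GREY;
            grey_list[grey_count++] = o;
        }
    }

    /* One increment of collector work: blacken a single grey object. */
    static int trace_step(void) {
        if (grey_count == 0) return 0;      /* no grey objects left: marking done */
        Obj *o = grey_list[--grey_count];
        for (int i = 0; i < 2; i++)
            shade(o->children[i]);
        o->color = BLACK;
        return 1;
    }

    /* The coordination point from the slide: storing a pointer to a white
       object into a black object must re-establish the invariant (this
       variant simply greys the stored object). */
    static void store_child(Obj *parent, int i, Obj *child) {
        if (parent->color == BLACK)
            shade(child);
        parent->children[i] = child;
    }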

Page 13: Parallel Garbage Collection

Tri-color Marking Example

[Figure: objects A, B, C, and D shown before and after the mutator changes the pointers among A, B, and their children]

The mutator modifies A and B while the garbage collector is examining B's descendants
The mutator must coordinate with the garbage collector to prevent D from being collected

Page 14: Parallel Garbage Collection

Mutator/Collector Coordination

Coordination must update the collector when a pointer is overwritten
Read Barrier – detects when the mutator accesses a pointer to a white object and immediately colors the object grey
Write Barrier – the mutator's attempts to write a pointer into an object are trapped
There are two different write barrier approaches

Page 15: Parallel Garbage Collection

Write Barrier Approaches

Snapshot-at-the-Beginning
Ensures a pointer to an object is not destroyed before the collector traverses it
Pointers are saved before they are overwritten

Incremental Update
When a pointer is written into a black object, the object is changed back to grey and is rescanned before collection is completed
No extra bookkeeping structure is needed
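The two approaches differ only in what the barrier does before the store goes through. The fragments below reuse the Obj type, grey worklist, and shade() helper from the tri-color sketch earlier; they are illustrative variants, not the barriers of any particular cited collector.

    /* Snapshot-at-the-beginning: shade the pointer being destroyed, so every
       object reachable when collection started remains visible to the tracer. */
    static void store_sab(Obj *obj, int i, Obj *newval) {
        shade(obj->children[i]);        /* save the overwritten pointer's target */
        obj->children[i] = newval;
    }

    /* Incremental update: a black object that gains a pointer is reverted to
       grey and put back on the worklist so it is rescanned before marking ends. */
    static void store_incremental(Obj *obj, int i, Obj *newval) {
        if (obj->color == BLACK) {
            obj->color = GREY;
            grey_list[grey_count++] = obj;
        }
        obj->children[i] = newval;
    }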

Page 16: Parallel Garbage Collection

Generational Collectors

Based on empirical evidence that most objects are short lived
Heap space is split into generational spaces
Older generation spaces are smaller
Spaces are collected when allocation in the space fails
Live objects found during the collection of a generation are advanced to the older generation
Long-lived objects are copied fewer times than in a copying collector
Heuristics are used to determine when to advance objects to the next generation

Page 17: Parallel Garbage Collection

Intergenerational References

The method must be able to collect one generation without collecting the others
Pointers from older generations into a younger generation:
A table of such pointers stored in older objects is used as part of the root set
The write barrier technique used in incremental collectors records them

Pointers from younger generations into older generations:
A write barrier technique can trap all pointer assignments
Alternatively, the live objects in all younger generations are used as part of the root set
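A common realization of the first case is a remembered set filled by a write barrier. The sketch below shows such a barrier in C; the structures and names are invented for illustration, and duplicate entries are not filtered out.

    #include <stddef.h>

    typedef struct GenObj {
        int generation;                  /* 0 = youngest, higher = older */
        struct GenObj *children[2];
    } GenObj;

    #define REMEMBERED_CAP 1024
    static GenObj *remembered[REMEMBERED_CAP];  /* old objects that point at younger ones */
    static size_t remembered_count;

    /* Write barrier: trap stores that create an older -> younger pointer and
       record the source object.  When only the young generation is collected,
       the remembered objects' fields are scanned along with the normal roots. */
    static void store_field(GenObj *parent, int i, GenObj *child) {
        if (child != NULL && parent->generation > child->generation)
            remembered[remembered_count++] = parent;
        parent->children[i] = child;
    }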

Page 18: Parallel Garbage Collection

Multi-threaded Methods

Attempt to reduce the pauses caused by "stopping the world" [2]
The garbage collector is a separate thread that runs concurrently with the application
Coordination with the application is minimized
The sweep proceeds while the application is running
The application marks pages when an object is modified
Dirty pages are rescanned before collection

Page 19: Parallel Garbage Collection

Parallel Garbage Collection

Parallelization of sequential methods: Mark-and-Sweep, Reference Counting
Different issues arise in each environment:
Shared variable access in shared memory systems
Disjoint address spaces in distributed memory systems
Scheduling in both environments involves stopping application threads during tracing
Long pauses are avoided by incremental collection
This improves performance in SPMD programs, since the application has frequent global synchronizations

Page 20: Parallel Garbage Collection

Shared Memory

Reference Counting
Reference counts on an object are updated by all processors
Locks on object headers limit scalability

Mark-Sweep
Each processor begins marking from a local root set and atomically marks each object
Scalability is poor unless some mechanism for load balancing is implemented
A processor must mark all descendants of an object it marks
Work stealing allows load rebalancing and improves results
Splitting large objects also allows for better load balance
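The atomic-marking idea can be sketched with C11 atomics, assuming one mark stack per processor; the compare-and-swap guarantees that exactly one processor claims each object, and the real load-balancing machinery is only indicated in a comment. The names are illustrative, not taken from [8].

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct Obj {
        atomic_bool marked;          /* claimed atomically so each object is marked once */
        struct Obj *children[2];
    } Obj;

    #define STACK_CAP 4096

    typedef struct {
        Obj *objs[STACK_CAP];        /* this processor's private mark stack */
        size_t top;
    } MarkStack;

    /* Atomically claim an object; the processor that wins the race becomes
       responsible for marking the object's descendants. */
    static bool try_mark(Obj *o) {
        bool expected = false;
        return atomic_compare_exchange_strong(&o->marked, &expected, true);
    }

    /* Marking loop run by each processor, starting from its local root set. */
    static void mark_local(MarkStack *s, Obj **roots, size_t nroots) {
        for (size_t i = 0; i < nroots; i++)
            if (roots[i] != NULL && try_mark(roots[i]))
                s->objs[s->top++] = roots[i];
        while (s->top > 0) {
            Obj *o = s->objs[--s->top];
            for (int i = 0; i < 2; i++)
                if (o->children[i] != NULL && try_mark(o->children[i]))
                    s->objs[s->top++] = o->children[i];
        }
        /* Load balancing -- stealing entries from another processor's stack,
           or splitting a large object into several marking tasks -- is what
           keeps the processors busy; that machinery is not shown here. */
    }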

Page 21: Parallel Garbage Collection

Distributed Memory

The biggest challenge is representing cross-processor references

Remote processor – the pointer points to a stub entry holding:
The processor id of the object's owner
The complement of the remote object's address

Local processor – an entry table maintains all exported references:
The first export of an object reference enters the object in the table
The object is never reclaimed without the cooperation of the other processors

The fields of stub and entry table objects are the same:
Flag – distinguishes the type of object
Count – the number of unreceived messages referencing the object
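One possible C layout for these bookkeeping records is sketched below; the field names, types, and the address-complement helpers are guesses made for illustration, not the actual structures from [6].

    #include <stdint.h>

    typedef enum { STUB, ENTRY } RecordKind;

    typedef struct {
        RecordKind flag;     /* distinguishes stub entries from entry-table entries */
        int        count;    /* number of still-unreceived messages referencing the object */
        int        owner;    /* processor id of the object's owner (meaningful for stubs) */
        uintptr_t  address;  /* complement of the remote object's address */
    } RemoteRef;

    /* Storing the complement rather than the raw address helps keep a remote
       reference from being mistaken for an ordinary local pointer. */
    static uintptr_t encode_remote(void *addr)  { return ~(uintptr_t)addr; }
    static void     *decode_remote(uintptr_t a) { return (void *)~a; }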

Page 22: Parallel Garbage Collection

Distributed Memory

Marking Phase
Processors begin with the local root set and mark all local objects
When local marking is complete, "mark messages" are sent to remote processors for each marked stub
The remote processor receives the message, adds the object to its mark stack, and continues local marking
When local marking is complete and no more messages are received, the remote processor acknowledges the messages it was sent
Marking is complete when the acknowledgement for the first message sent is received
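The message flow can be illustrated with a toy, single-address-space simulation in which "processors" are array indices and a mark message is just an entry appended to the owner's inbox; every name and structure here is invented, roots would be seeded by calling mark_or_send for each local root, and the acknowledgement-based termination step is only described in a comment.

    #include <stddef.h>

    #define NPROC  2
    #define MAXOBJ 64

    typedef struct Obj {
        int owner;                   /* processor that owns this object */
        int marked;
        struct Obj *children[2];     /* a child owned elsewhere plays the stub's role */
    } Obj;

    typedef struct {
        Obj *stack[MAXOBJ]; size_t top;      /* local mark stack */
        Obj *inbox[MAXOBJ]; size_t inbox_n;  /* incoming "mark messages" */
    } Proc;

    static Proc procs[NPROC];

    /* Mark a local object, or "send" a mark message to the owner's inbox. */
    static void mark_or_send(int self, Obj *o) {
        if (o == NULL || o->marked) return;
        o->marked = 1;
        if (o->owner == self)
            procs[self].stack[procs[self].top++] = o;
        else
            procs[o->owner].inbox[procs[o->owner].inbox_n++] = o;
    }

    /* Drain the local stack, then pick up whatever mark messages have arrived.
       A driver would call this for each processor until every stack and inbox
       is empty, standing in for asynchronous delivery; the real protocol then
       acknowledges the received messages, and marking terminates once the
       acknowledgements for the sent messages come back. */
    static void local_marking(int self) {
        Proc *p = &procs[self];
        do {
            while (p->top > 0) {
                Obj *o = p->stack[--p->top];
                for (int i = 0; i < 2; i++)
                    mark_or_send(self, o->children[i]);
            }
            while (p->inbox_n > 0)
                p->stack[p->top++] = p->inbox[--p->inbox_n];
        } while (p->top > 0);
    }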

Page 23: Parallel Garbage Collection

Distributed Memory

Collection Phase
Expand the heap:
Processors are notified of the largest local heap at the end of each collection
The heap is expanded while H < cM, where c < 1 and M is the maximum heap size
Local collection occurs when the heap cannot be expanded
Global collection occurs when local collection is insufficient
Global collection allows the entry tables to be cleared
Infrequent global collections minimize the impact of the collector on application performance
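A sketch of this escalation policy in C is below; the threshold constant, the reading of H as the current local heap size, and the placeholder helpers are assumptions made for illustration, not details taken from [6].

    #include <stddef.h>

    #define MAX_HEAP   (1u << 20)   /* M: maximum heap size (arbitrary) */
    #define EXPAND_CAP 0.75         /* c < 1: expansion allowed while H < cM */

    static size_t heap_size;        /* H: current local heap size */

    static int try_expand_heap(size_t new_size) {
        /* placeholder: a real collector would ask the memory manager here */
        heap_size = new_size;
        return 1;
    }

    static void local_collection(void)  { /* collect only this processor's heap */ }
    static void global_collection(void) { /* all processors; entry tables cleared */ }

    static void on_allocation_failure(size_t needed) {
        size_t wanted = heap_size + needed;
        if ((double)wanted < EXPAND_CAP * MAX_HEAP && try_expand_heap(wanted))
            return;                  /* cheapest option: just grow the heap */
        local_collection();          /* otherwise reclaim locally first */
        /* if local collection is still insufficient, fall back to an
           (infrequent) global collection involving every processor */
        global_collection();
    }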

Page 24: Parallel Garbage Collection

Summary

Non-copying methods are the safest for languages where pointers are not identifiable
Fragmentation and loss of locality limit the performance of these methods
Copying collectors are preferred in cases where memory is limited and pointers can be found
Parallel garbage collection can be based on the parallelization of sequential methods
Parallel collectors are subject to the same issues as their sequential counterparts
Parallel collectors are also subject to synchronization and communication issues while maintaining references and performing collection

Page 25: Parallel Garbage Collection

References

[1] Hans Boehm and Mark Weiser. Garbage Collection in an Uncooperative Environment. Software: Practice and Experience, September 1988.

[2] Hans-J. Boehm, Alan J. Demers, and Scott Shenker. Mostly Parallel Garbage Collection. Proceedings of the Conference on Programming Language Design and Implementation (PLDI), 1991.

[3] Hans-J. Boehm. Fast Multiprocessor Memory Allocation and Garbage Collection. External Technical Report HPL-2000-165, HP Labs, December 2000.

[4] David L. Detlefs, Al Dosser, and Benjamin Zorn. Memory Allocation Costs in Large C and C++ Programs. Technical Report CU-CS-665-93, University of Colorado, Boulder, 1993.

[5] John R. Ellis and David L. Detlefs. Safe, Efficient Garbage Collection for C++. Technical report, Xerox Palo Alto Research Center, June 1993.

[6] Kenjiro Taura and Akinori Yonezawa. An Effective Garbage Collection Strategy for Parallel Programming Languages on Large Scale Distributed-Memory Machines. Proceedings of the Symposium on Principles and Practice of Parallel Programming (PPoPP), 1997.

[7] Paul R. Wilson. Uniprocessor Garbage Collection Techniques. Proceedings of the International Workshop on Memory Management (IWMM), 1992.

[8] Toshio Endo, Kenjiro Taura, and Akinori Yonezawa. A Scalable Mark-Sweep Garbage Collector on Large-Scale Shared-Memory Machines. Proceedings of High Performance Networking and Computing (SC97), November 1997.

[9] Hirotaka Yamamoto, Kenjiro Taura, and Akinori Yonezawa. Comparing Reference Counting and Global Mark-and-Sweep on Parallel Computers. Languages, Compilers, and Run-time Systems (LCR98), Lecture Notes in Computer Science, volume 1511, pp. 205-218, May 1998.