conservative garbage collection

Conservative Garbage Collection Stephan Lesch January 9, 2002 [email protected]


Page 1: Conservative Garbage Collection

Conservative Garbage Collection

Stephan Lesch, January 9, 2002

[email protected]

Page 2: Conservative Garbage Collection

Contents

• Intro
• Conservative GC
• Mostly Copying Collection
• Hidden Pointer Problems
• GC for C++

Page 3: Conservative Garbage Collection

So Far

Type-accurate GC:
– locations of pointers are known
– no pointer arithmetic
– often tailored to one software product
– usually supported by compiler/runtime system

Page 4: Conservative Garbage Collection

Ambiguous Roots Collection
• every register/word is a potential pointer
• non-supportive environment
• little/no knowledge about
– register usage
– object/stack layout
• should work with any C/C++ program
• programmers don’t want to pay for GC unless needed
• must coexist with explicit memory management

The middle way:

• programmer/compiler provide information to recognize pointers

Page 5: Conservative Garbage Collection

Conservative GC

Boehm/Demers/Weiser (Xerox PARC) [1988]
• non-moving mark-and-deferred-sweep collector
• fully conservative, no reliance on compiler
– no extra bits to distinguish pointer/non-pointer
– no additional object headers
• for C and C++
• for Unix, OS/2, Mac, Win95/NT
• supports incremental/generational collection
• can function as space leak detector

Page 6: Conservative Garbage Collection

Heap Layout

Two logically distinct heaps:

Standard heap
• malloc / free
• compatible with existing code
• no pointers to collected heap! (the standard heap is not scanned)

Collected heap
• GC_malloc
• GC_free to free known garbage
• pointers to standard heap ignored

Page 7: Conservative Garbage Collection

Layout of Collected Heap

• made up of blocks (e.g. 4 K, aligned to 4 K boundaries)

• one object size per block

• for each object size:
– bitmap to mark allocated objects

– freelist (linked list of heap block slots)

– reclaimable blocks queue (deferred sweep)

• heap-block free-list

Page 8: Conservative Garbage Collection

Finding headers & bit maps
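With blocks of 4 KB aligned on 4 KB boundaries, the block containing an arbitrary address is found by clearing its low 12 bits, and the object index inside the block follows from the block's single object size. A minimal sketch of that arithmetic and of a per-block mark bitmap, assuming an 8-byte minimum object size; `block_header`, `block_start`, `object_index`, `set_mark` and `is_marked` are hypothetical names, not BDW's actual data structures:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed parameters for illustration. */
#define LOG_BLOCK_SIZE 12                              /* 4 KB blocks */
#define BLOCK_SIZE ((uintptr_t)1 << LOG_BLOCK_SIZE)
#define MAX_OBJS_PER_BLOCK (BLOCK_SIZE / 8)            /* 8-byte minimum object */

/* Hypothetical per-block header: one object size per block plus mark bits. */
typedef struct {
    size_t obj_size;                                   /* 0: block not in use */
    uint8_t mark_bits[MAX_OBJS_PER_BLOCK / 8];
} block_header;

/* Start of the 4K-aligned block that contains address a. */
static uintptr_t block_start(uintptr_t a) {
    return a & ~(BLOCK_SIZE - 1);
}

/* Index of the object slot containing address a, given the block's object size. */
static size_t object_index(uintptr_t a, size_t obj_size) {
    return (size_t)(a - block_start(a)) / obj_size;
}

static void set_mark(block_header *h, size_t idx) {
    h->mark_bits[idx / 8] |= (uint8_t)(1u << (idx % 8));
}

static bool is_marked(const block_header *h, size_t idx) {
    return (h->mark_bits[idx / 8] >> (idx % 8)) & 1u;
}
```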

Page 9: Conservative Garbage Collection

Allocation

For small objects:
• pop free-list for this size
• free-list is empty: resume sweep phase
• still empty: GC
• not enough space reclaimed: expand heap

For objects > 1/2 block:
• allocate chunk of blocks (heap-block free list)
• none available: GC
• not enough space reclaimed: expand heap

Clear object after allocation!
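The small-object allocation path can be sketched as follows. This is a minimal sketch under stated assumptions: the sweep, mark and heap-growth phases are reduced to hypothetical hooks (`resume_sweep`, `collect`, `expand_heap`), the toy hook implementations exist only so the path can be exercised, and "not enough space reclaimed" is simplified to "nothing reclaimed for this size".

```c
#include <stddef.h>
#include <string.h>

/* A free slot holds the free-list link in its first word. */
typedef struct free_slot { struct free_slot *next; } free_slot;

/* Hypothetical state for one object size. The three hooks stand in for
   collector phases that this sketch does not implement. */
typedef struct size_class {
    size_t obj_size;                              /* multiple of sizeof(void *) */
    free_slot *free_list;
    int  (*resume_sweep)(struct size_class *sc);  /* sweep one queued block; 0 if none left */
    void (*collect)(struct size_class *sc);       /* mark phase; requeues blocks for sweeping */
    int  (*expand_heap)(struct size_class *sc);   /* add a fresh block's slots; 0 on failure */
} size_class;

static void *pop_free(size_class *sc) {
    free_slot *s = sc->free_list;
    if (s != NULL)
        sc->free_list = s->next;
    return s;
}

/* Free-list, then lazy sweep, then GC (and sweep again), then heap growth. */
void *alloc_small(size_class *sc) {
    void *p = pop_free(sc);
    while (p == NULL && sc->resume_sweep(sc))
        p = pop_free(sc);
    if (p == NULL) {
        sc->collect(sc);
        while (p == NULL && sc->resume_sweep(sc))
            p = pop_free(sc);
    }
    if (p == NULL && sc->expand_heap(sc))
        p = pop_free(sc);
    if (p != NULL)
        memset(p, 0, sc->obj_size);   /* stale contents could act as false pointers */
    return p;
}

/* Toy hooks: nothing to sweep, a collection that only counts itself, and a
   "heap" of four 4 KB blocks in a static arena. */
#define TOY_BLOCK 4096
static void *toy_arena[4 * TOY_BLOCK / sizeof(void *)];  /* void * keeps slots aligned */
static size_t toy_blocks_used = 0;
static int toy_collections = 0;

static int toy_sweep(size_class *sc) { (void)sc; return 0; }
static void toy_collect(size_class *sc) { (void)sc; toy_collections++; }

static int toy_expand(size_class *sc) {
    if (toy_blocks_used == sizeof toy_arena / TOY_BLOCK)
        return 0;
    unsigned char *blk = (unsigned char *)toy_arena + TOY_BLOCK * toy_blocks_used++;
    for (size_t off = 0; off + sc->obj_size <= TOY_BLOCK; off += sc->obj_size) {
        free_slot *s = (free_slot *)(blk + off);
        s->next = sc->free_list;
        sc->free_list = s;
    }
    return 1;
}
```

With `size_class sc = {32, NULL, toy_sweep, toy_collect, toy_expand};`, the first `alloc_small(&sc)` runs one collection, then grows the heap by one block and returns a zeroed 32-byte slot; later calls simply pop the free-list.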

Page 10: Conservative Garbage Collection

Finding Roots & Pointers
• possible roots: registers, stack, static areas
• no cooperation from compiler
– treat every word as potential pointer
– ignore interior pointers (standard)
– prefer marking from false pointers over ignoring valid pointers

Conservative pointer identification, given word p:
– does p refer to the collected heap?
– does it point into a heap block allocated by the collector?
– does it point to the beginning of an object in that block?

If yes:
– mark object in block header
– push object onto mark stack

Finally: reset mark bits of objects on free-lists
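The three validity tests and the mark-and-push step can be written down directly. A minimal sketch, assuming a tiny contiguous heap of 4 KB blocks whose headers live in an array, small objects only (objects spanning several blocks are not handled), and hypothetical names `gc_heap`, `block_hdr` and `mark_if_pointer`:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_BLOCK 12
#define BLOCK_BYTES ((uintptr_t)1 << LOG_BLOCK)   /* 4 KB blocks */
#define NBLOCKS 16                                 /* tiny heap (assumed) */
#define MAX_SLOTS 512                              /* 4096 / 8-byte minimum object */
#define STACK_CAP 256

typedef struct {
    size_t obj_size;          /* 0: block not allocated by the collector */
    bool mark[MAX_SLOTS];     /* mark bits kept in the block header */
} block_hdr;

typedef struct {
    uintptr_t lo, hi;         /* collected heap is [lo, hi); lo is 4K-aligned */
    block_hdr hdr[NBLOCKS];
    uintptr_t mark_stack[STACK_CAP];
    size_t sp;
    bool overflowed;          /* real collectors recover by rescanning marked objects */
} gc_heap;

/* Apply the three tests to an arbitrary word p. On success, mark the object
   and push it for later scanning. The word is never dereferenced here. */
bool mark_if_pointer(gc_heap *h, uintptr_t p) {
    if (p < h->lo || p >= h->hi)                 /* 1. inside the collected heap? */
        return false;
    block_hdr *b = &h->hdr[(p - h->lo) >> LOG_BLOCK];
    if (b->obj_size == 0)                        /* 2. block allocated by the collector? */
        return false;
    uintptr_t off = (p - h->lo) & (BLOCK_BYTES - 1);
    if (off % b->obj_size != 0)                  /* 3. start of an object? */
        return false;
    size_t idx = off / b->obj_size;
    if (idx >= BLOCK_BYTES / b->obj_size)        /* partial slot at the block's end */
        return false;
    if (!b->mark[idx]) {                         /* push each object only once */
        b->mark[idx] = true;
        if (h->sp < STACK_CAP)
            h->mark_stack[h->sp++] = p;
        else
            h->overflowed = true;
    }
    return true;
}
```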

Page 11: Conservative Garbage Collection

Misidentification
• integers accidentally fulfilling validity tests
• avoid need to trace from interior pointers...
• ... or unaligned pointers: the adjacent words 00000009 0000000A, read at a 2-byte offset on a big-endian machine, yield 00090000
– avoid addresses with lots of trailing 0’s

• try to avoid generating false references:
– collector clears non-atomic objects after alloc
– GC_malloc_atomic for objects without pointers
– programmer initializes structures
– programmer destroys obsolete pointers (“dead pointers on stack are often the most significant source of leaks”)

Page 12: Conservative Garbage Collection

Black Listing

Idea: don’t allocate in heap blocks at addresses likely to collide with invalid pointers:
– black-list references to the vicinity of the heap which fail validity tests
– extra run before first allocation finds false references in static data

• additional space overhead < 10%
• but: difficult to allocate >100K without spanning black-listed blocks
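Black-listing needs only a per-block flag that the block allocator consults. A minimal sketch, assuming 4 KB blocks, a fixed 64-block address range for the heap and its possible growth, and hypothetical names `block_map`, `note_false_reference` and `alloc_blocks`:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_BLOCK 12
#define NBLOCKS 64            /* address range the heap occupies or may grow into (assumed) */

typedef struct {
    uintptr_t base;           /* start of that address range */
    bool in_use[NBLOCKS];     /* block currently allocated */
    bool black[NBLOCKS];      /* some word that failed the validity tests pointed here */
} block_map;

/* Called for a candidate word that failed the pointer-validity tests. If it
   lands on a block that is free now but might be allocated later, black-list
   that block so a future object there is not retained by this false pointer. */
void note_false_reference(block_map *m, uintptr_t p) {
    if (p < m->base)
        return;
    uintptr_t i = (p - m->base) >> LOG_BLOCK;
    if (i < NBLOCKS && !m->in_use[i])
        m->black[i] = true;
}

/* First-fit search for n contiguous blocks that are free and not black-listed.
   Returns the index of the first block, or -1. A single black-listed block
   splits a run, which is why large requests are hard to place. */
long alloc_blocks(block_map *m, size_t n) {
    size_t run = 0;
    for (size_t i = 0; i < NBLOCKS; i++) {
        run = (m->in_use[i] || m->black[i]) ? 0 : run + 1;
        if (n > 0 && run == n) {
            size_t first = i + 1 - n;
            for (size_t j = first; j <= i; j++)
                m->in_use[j] = true;
            return (long)first;
        }
    }
    return -1;
}
```

With one black-listed block in every 20, no run of 25 free blocks (100 KB) exists, which is the large-allocation difficulty noted on the slide.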

Page 13: Conservative Garbage Collection

Influence of Data Structures

Problems with:
• large structures + interior pointers
• strongly connected structures

Lisp:
– small disjoint garbage structures
– lists constructed of cons-cells
=> conservative GC worked well, memory leaks remained bounded (<8% leakage, constant amount)

KRC:
– large, strongly connected structures
– next pointers in objects
=> collector thrashed

[Wentworth, 1990]

Page 14: Conservative Garbage Collection

Efficiency (1)

Comparative studies by Zorn, 1992; Detlefs et al. 1994

• “real-world” C programs (perl, xfig, GhostScript)

• comparing BDW w. explicit managers

• replace malloc() w. GC_malloc(), remove free()

• no further adaptation

• used outdated versions (4.3 vs. 1.6/2.6)

Page 15: Conservative Garbage Collection

Efficiency (2)

• realistic alternative to explicit memory management (20% average execution-time overhead over the best managers, up to 57% in the worst case)

• marks 3 MB/s on a SPARCstation 2

• up to 3 times heap usage for small heaps (fixed cost for collector’s internal structs)

• needs substantially more space to avoid over-frequent GC

• works best w. programs using very small objects

• might co-exist poorly with cache management (heap blocks aligned on 4K boundaries)

Page 16: Conservative Garbage Collection

Incremental/Generational Mode

• marking in small steps interleaved with mutator
• need to detect later changes to connectivity in traced parts of graph:
– read dirty bits for pages
– write-protect memory and catch faults
• when mark stack is empty: trace from all marked objects on dirty heap blocks
• reduces avg. pause times, increases total exec time
• generational: GC uses knowledge of which pages were recently modified
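The dirty-page mechanism can be illustrated on a toy object graph. In this minimal sketch the dirty bit is set by a software store function (`write_field`), standing in for the hardware dirty bits or write-protection faults named above; objects are array indices, two objects share a page, there is a single fixed root (a real collector would rescan all roots as well), and all names are hypothetical:

```c
#include <stdbool.h>

#define NOBJ 8
#define NFIELDS 2
#define OBJS_PER_PAGE 2          /* objects 2k and 2k+1 share page k (assumed) */
#define NPAGES (NOBJ / OBJS_PER_PAGE)
#define NIL (-1)

/* Toy heap: each object has NFIELDS pointer fields holding object indices. */
typedef struct {
    int field[NOBJ][NFIELDS];
    bool marked[NOBJ];
    bool dirty[NPAGES];
    int stack[NOBJ];             /* enough: an object is pushed only when first marked */
    int sp;
} gc_state;

void init_state(gc_state *g) {
    for (int o = 0; o < NOBJ; o++) {
        g->marked[o] = false;
        for (int f = 0; f < NFIELDS; f++)
            g->field[o][f] = NIL;
    }
    for (int p = 0; p < NPAGES; p++)
        g->dirty[p] = false;
    g->sp = 0;
}

static void push(gc_state *g, int o) {
    if (o != NIL && !g->marked[o]) {
        g->marked[o] = true;
        g->stack[g->sp++] = o;
    }
}

/* Mutator store: records that the object's page changed. */
void write_field(gc_state *g, int obj, int f, int val) {
    g->field[obj][f] = val;
    g->dirty[obj / OBJS_PER_PAGE] = true;
}

void start_marking(gc_state *g, int root) {
    for (int o = 0; o < NOBJ; o++) g->marked[o] = false;
    for (int p = 0; p < NPAGES; p++) g->dirty[p] = false;
    g->sp = 0;
    push(g, root);
}

/* One increment, interleaved with the mutator: trace at most `budget` objects. */
void mark_step(gc_state *g, int budget) {
    while (budget-- > 0 && g->sp > 0) {
        int o = g->stack[--g->sp];
        for (int f = 0; f < NFIELDS; f++)
            push(g, g->field[o][f]);
    }
}

/* Mark stack empty: with the mutator stopped, retrace from every marked
   object on a dirty page, then drain the stack again. */
void finish_marking(gc_state *g) {
    for (int p = 0; p < NPAGES; p++) {
        if (!g->dirty[p]) continue;
        g->dirty[p] = false;
        for (int o = p * OBJS_PER_PAGE; o < (p + 1) * OBJS_PER_PAGE; o++)
            if (g->marked[o])
                for (int f = 0; f < NFIELDS; f++)
                    push(g, g->field[o][f]);
    }
    while (g->sp > 0)
        mark_step(g, NOBJ);
}
```

The hazard this guards against: the mutator stores a pointer to an unmarked object into an object that has already been traced and deletes the object's other reference. Incremental `mark_step` calls alone then miss it; `finish_marking` finds it because the store dirtied the page.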

Page 17: Conservative Garbage Collection

Mostly Copying Collection

• Joel Bartlett, 1988 (Digital)

• hybrid conservative / copying collector:
– roots are treated conservatively (referenced objects are not moved)
– objects only accessible from heap-allocated objects are copied (assumes pointers in heap-allocated data can be found accurately)

• faster allocation
• fewer problems with pointer identification
• more accurate GC

Page 18: Conservative Garbage Collection

Object layout

[Figure: object layout: header (size, #pointers), followed by user data: pointers first, then non-pointers]

– programmer has no control over object layout

– what if object layout should match hardware registers or file structures?

Page 19: Conservative Garbage Collection

Heap layout

current_space = 1, next_space = 1

[Figure: heap blocks labelled with space identifiers; a root points into one block; some blocks currently unused]

Page 20: Conservative Garbage Collection

Allocation

• within a block:
– increment free-pointer
– decrement free-slots count
• if necessary: search for a free block (space_id ≠ current_space and ≠ next_space) and set its space_id to next_space
• current_space = next_space during allocation

Page 21: Conservative Garbage Collection

Collection

• GC when heap is half full (half of heap blocks have space_id = current_space)
• next_space = (current_space + 1) mod n
• Fromspace = current_space blocks
• Tospace = next_space blocks
• scan roots conservatively for pointers into heap
• move potentially referenced objects to Tospace without copying them:
– change space_id of their blocks to next_space
– add block to Tospace scan list
• copy graphs accessible from blocks on scan list

Page 22: Conservative Garbage Collection

Heap after Collection

current_space = 2, next_space = 2

[Figure: heap after collection, blocks relabelled with space identifiers; a root points into one block; some blocks currently unused]

Page 23: Conservative Garbage Collection

Bartlett’s GC algorithm (1)

gc() =
    next_space = (current_space + 1) mod 077777
    Tospace_queue = empty
    for R in Roots
        promote(block(R))
    while Tospace_queue != empty
        blk = pop(Tospace_queue)
        for obj in blk
            for S in Children(obj)
                S = copy(S)
    current_space = next_space

Page 24: Conservative Garbage Collection

Bartlett’s GC algorithm (2)

promote(block) =
    if Heap_bottom <= block <= Heap_top and space(block) == current_space
        space(block) = next_space
        allocatedBlocks = allocatedBlocks + 1
        push(block, Tospace_queue)

copy(p) =
    if p == nil or space(p) == next_space
        return p
    if forwarded(p)
        return forwarding_address(p)
    np = move(p, free)
    free = free + size(p)
    forwarding_address(p) = np
    return np
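The promote/copy scheme can be tried out on a toy heap. A minimal sketch, assuming fixed-size objects with two pointer fields, a forwarding field in every object, and hypothetical names (`obj`, `new_obj`, `copy`, `gc`). One deliberate deviation from the pseudocode: instead of a Tospace queue, the scan repeatedly walks all next_space blocks until no unscanned object remains, which also covers blocks that receive copies while being scanned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBLOCKS 8
#define SLOTS 4                  /* objects per block (tiny, assumed) */
#define NPTRS 2

/* Simplified fixed-size object: pointers before data, as on the object
   layout slide; fwd holds the forwarding address once the object is copied. */
typedef struct obj {
    struct obj *fwd;
    struct obj *ptr[NPTRS];
    int data;
} obj;

static obj heap[NBLOCKS * SLOTS];
static unsigned space[NBLOCKS];          /* space_id per block; 0 = never used */
static int count[NBLOCKS];               /* objects allocated in each block */
static int scanned[NBLOCKS];             /* objects already scanned during GC */
static unsigned current_space = 1, next_space = 1;
static int alloc_blk = -1;

static int block_of(const obj *p) { return (int)((p - heap) / SLOTS); }

/* Bump allocation in a next_space block; outside GC next_space == current_space.
   A free block is one whose space_id is neither current_space nor next_space. */
obj *new_obj(void) {
    if (alloc_blk < 0 || space[alloc_blk] != next_space || count[alloc_blk] == SLOTS) {
        alloc_blk = -1;
        for (int b = 0; b < NBLOCKS && alloc_blk < 0; b++)
            if (space[b] != current_space && space[b] != next_space) {
                space[b] = next_space;
                count[b] = scanned[b] = 0;
                alloc_blk = b;
            }
        if (alloc_blk < 0)
            return NULL;          /* a real collector would collect or grow the heap */
    }
    obj *o = &heap[alloc_blk * SLOTS + count[alloc_blk]++];
    memset(o, 0, sizeof *o);
    return o;
}

/* Move p into Tospace unless it is nil or already there (e.g. in a promoted block). */
static obj *copy(obj *p) {
    if (p == NULL || space[block_of(p)] == next_space)
        return p;
    if (p->fwd != NULL)
        return p->fwd;
    obj *np = new_obj();          /* assumes GC starts while half the blocks are free */
    *np = *p;                     /* p->fwd is NULL here, so np->fwd starts NULL */
    p->fwd = np;
    return np;
}

/* roots: ambiguous words (e.g. a copy of the stack); none is trusted as a pointer. */
void gc(const uintptr_t *roots, int nroots) {
    next_space = current_space % 077777 + 1;      /* stays in 1..077777; 0 never used */
    uintptr_t lo = (uintptr_t)heap, hi = (uintptr_t)(heap + NBLOCKS * SLOTS);
    for (int r = 0; r < nroots; r++)              /* promote: re-tag the block, never move */
        if (roots[r] >= lo && roots[r] < hi) {
            int b = (int)((roots[r] - lo) / (SLOTS * sizeof(obj)));
            if (space[b] == current_space) {
                space[b] = next_space;
                scanned[b] = 0;
            }
        }
    for (int progress = 1; progress; ) {          /* scan Tospace blocks until nothing new */
        progress = 0;
        for (int b = 0; b < NBLOCKS; b++)
            while (space[b] == next_space && scanned[b] < count[b]) {
                obj *o = &heap[b * SLOTS + scanned[b]++];
                for (int i = 0; i < NPTRS; i++)
                    o->ptr[i] = copy(o->ptr[i]);
                progress = 1;
            }
    }
    current_space = next_space;                   /* unpromoted old blocks are now free */
}
```

A promoted block keeps every object in it, live or not, which is the price of treating the roots conservatively.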

Page 25: Conservative Garbage Collection

Generational Mode (1)

• One bit in space_id indicates young/old generation
• Other bits approximate age of objects/blocks
• Minor collection:
– when 50% of free space after last GC is full
– young objects reachable from roots/remembered set are promoted en masse (change space_id/copy)
– remembered set: maintained via memory protection

Page 26: Conservative Garbage Collection

Generational Mode (2)

• Major collection (mark-compact):
– when old generation occupies >85% of heap
– mark accessible objects in old generation
– pass 1: find old generation blocks <1/3 filled, copy objects to free space leaving forwarding addresses
– pass 2: rescan old generation, correct pointers using forwarding addresses
– expand heap if >75% full
• maintaining remembered set costs time, but often saves more time during GC (20% time improvement on Scheme compiler); also reduces pause times in interactive programs

Page 27: Conservative Garbage Collection

Efficiency (1)

• no thorough studies
• space overhead: space_ids, type info, block links, promotion bits: 2% for 512-byte blocks; tagging data increases overhead
• Mostly Copying vs. BDW: Mostly Copying is probably better with many short-lived objects, benefiting from faster allocation

Page 28: Conservative Garbage Collection

Experiences

• generational version: 20% runtime improvement for Scheme-to-C compiler

• significant performance increase in CAD program (reduced paging)

• bad results for non-generational collector for Modula-2 w. very large heaps (10s of Megabytes)

• choose GC strategy that fits behaviour of mutator

Page 29: Conservative Garbage Collection

The Optimising Compiler/User Devil

• conservative GC defeated by temporarily hidden pointers: parts of graph may be unreachable during a GC:
– pointer arithmetic
– adding tag bits
• e.g. optimized array traversal:

    for (i = 0; i < SIZE; i++) ...x[i]...;
    ...x...;

  may be compiled as

    xend = x + SIZE;
    for (; x < xend; x++) ...*x...;
    x -= SIZE;
    ...x...;

inside the loop x is an interior pointer; afterwards x points one past the end
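The two loop forms, written out as C functions that compute the same result. Only the comments carry the GC argument; `sum_indexed` and `sum_strength_reduced` are hypothetical names, and `SIZE` is fixed at 100 for illustration:

```c
#include <stddef.h>

#define SIZE 100

/* Source form: x is never modified, so a pointer to the start of the array
   stays available (register or stack slot) throughout the loop. */
long sum_indexed(const int *x) {
    long s = 0;
    for (int i = 0; i < SIZE; i++)
        s += x[i];
    return s;
}

/* Strength-reduced form: inside the loop the only copy of the array address
   may be an interior pointer, and on exit it points one past the end until
   x -= SIZE restores it. A collector that recognizes only pointers to object
   starts could run at either moment and reclaim the array. */
long sum_strength_reduced(const int *x) {
    long s = 0;
    const int *xend = x + SIZE;
    for (; x < xend; x++)
        s += *x;
    x -= SIZE;
    (void)x;      /* the slide's code uses x again at this point */
    return s;
}
```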

Page 30: Conservative Garbage Collection

Machine-specific Optimizations

struct l_thing {
    char thing[35000];
    struct l_thing *next;
};

struct l_thing *
tail(struct l_thing *x) {
    return (x->next);
}

on IBM RISC System/6000, tail() translates to

    AIU  r3=r3,1               ; r3 += 65536
    L    r3=SHADOW(r3,-30536)  ; = r3 + 35000
    BA   lr

after AIU, r3 holds x + 65536, which lies beyond the end of the object
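The instruction sequence can be checked against the struct layout: the field offset 35000 exceeds the largest signed 16-bit displacement (32767), so the compiler splits it into +65536 followed by -30536. A small sketch of that arithmetic, with hypothetical helper names `after_aiu` and `load_address`, assuming a C11 compiler and an ABI that places `next` at offset 35000 (the case whenever pointers need at most 8-byte alignment):

```c
#include <stddef.h>
#include <stdint.h>

struct l_thing {
    char thing[35000];
    struct l_thing *next;
};

/* The offset does not fit a signed 16-bit displacement; the remainder does. */
_Static_assert(35000 > INT16_MAX, "offset needs two instructions");
_Static_assert(-30536 >= INT16_MIN, "remaining displacement fits in 16 bits");

/* Model of the two address steps in the compiled tail(). */
uintptr_t after_aiu(uintptr_t r3) { return r3 + 65536; }               /* AIU r3=r3,1 */
uintptr_t load_address(uintptr_t r3) { return after_aiu(r3) - 30536; }  /* L r3=SHADOW(r3,-30536) */
```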

Page 31: Conservative Garbage Collection

Boehm and Chase’s Solution (1)

• local root set of function f at any point in execution:
– register/auto variables
– previously computed values of direct sub-expressions of incompletely evaluated expressions, e.g. malloc’s return value in malloc(size) + 4
• global root set:
– declared static and extern variables
– local root sets of all call sites in call chain
– any values stored in other areas scanned by collector
• valid base pointer:
– pointer to anywhere inside an object or one past its end
– BDW can handle such pointers
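A collector that accepts such base pointers must map an interior or one-past-the-end address back to its object. A minimal sketch for a block of equal-sized objects, assuming (as an illustrative choice, not a statement about BDW internals) that each slot is padded by one byte so that one past the end of an object never coincides with the start of the next; `obj_block` and `base_of` are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block of equal-sized objects, each slot padded by one byte. */
typedef struct {
    uintptr_t start;      /* address of the first slot */
    size_t obj_size;      /* bytes requested by the program */
    size_t nslots;
} obj_block;

/* Map a candidate pointer (object start, interior, or one past the end)
   to the base address of the object it is derived from; 0 if none. */
uintptr_t base_of(const obj_block *b, uintptr_t p) {
    size_t slot = b->obj_size + 1;              /* +1: room for the one-past-end address */
    if (p < b->start || p >= b->start + slot * b->nslots)
        return 0;
    return b->start + (p - b->start) / slot * slot;
}
```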

Page 32: Conservative Garbage Collection

Boehm and Chase’s Solution (2)

• every object on garbage-collected heap must be accessible from global root set through a chain of base pointers
=> conservative collection safe with strictly ANSI-compatible programs
• suggested implementation:
– preprocess source using macros that prevent code generator from discarding live base pointers prematurely
– compile normally
– post-process assembly code, removing macro artifacts
• transparent to programmer & compiler
• may interfere with instruction scheduling
• may increase register pressure

Page 33: Conservative Garbage Collection

Ellis and Detlefs’ solution

• annotate operations on pointers with names of base pointers from which they’re derived

• compiler treats these operations as uses of the original base pointers, extending their live ranges

• code generation must respect live ranges
• requires changes to compiler
• does not alter sources
• does not rely on behaviour of volatile declarations

Page 34: Conservative Garbage Collection

GC for C++

• object-oriented languages often use more heap-allocated data
• generate more complex data structures
• GC uncouples memory management from class interfaces instead of dispersing it through code

Page 35: Conservative Garbage Collection

Conservative GC for C++

• requires no changes to language
• restriction on coding style holds: no hidden pointers (e.g. converted to int)
– existing code may violate the restriction
– aggressive optimisers may as well
– safety must be enforced in code generator
• some support for finalization (GC_register_finalizer), assuming few objects need finalization

Page 36: Conservative Garbage Collection

Mostly Copying for C++

• storing all pointers at beginning of objects interferes with inheritance (fast field lookup)
• here: user supplies callback methods to identify pointers

    class Tree {
    public:
        Tree* left;
        Tree* right;
        int data;
        Tree(int x);
        GCCLASS(Tree);
        ...
    };

    GCPOINTERS(Tree) {
        gcpointer(left);
        gcpointer(right);
    }

The GCPOINTERS macro generates the callback method Tree::GCPointers.

• currently no support for finalisation
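The callback idea carries over to C as a per-type function that visits each pointer field. A minimal sketch, assuming a hypothetical descriptor struct `gc_class` reached from a header word in each object; the visitor returns the address to store back, so the same callback serves both marking and pointer updating in a copying collector. All names are hypothetical:

```c
#include <stddef.h>

/* The visitor receives a referent and returns the address to store back,
   so a copying collector can update the field after moving the target. */
typedef void *(*ref_visitor)(void *ref, void *env);

/* Hypothetical per-class descriptor; gc_pointers plays the role of the
   method that GCPOINTERS generates in the C++ version. */
typedef struct {
    size_t size;
    void (*gc_pointers)(void *obj, ref_visitor visit, void *env);
} gc_class;

typedef struct Tree {
    const gc_class *cls;         /* hypothetical header word pointing at the descriptor */
    struct Tree *left, *right;
    int data;
} Tree;

static void tree_gc_pointers(void *obj, ref_visitor visit, void *env) {
    Tree *t = obj;
    t->left = visit(t->left, env);
    t->right = visit(t->right, env);
}

static const gc_class tree_class = { sizeof(Tree), tree_gc_pointers };

/* Example visitor: count non-null references, leaving them unchanged. */
static void *count_refs(void *ref, void *env) {
    if (ref != NULL)
        ++*(int *)env;
    return ref;
}

/* Example visitor: redirect references from one address to another, as a
   copying collector does after moving an object. */
typedef struct { void *from, *to; } redirect;

static void *redirect_ref(void *ref, void *env) {
    const redirect *r = env;
    return ref == r->from ? r->to : ref;
}
```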

Page 37: Conservative Garbage Collection

Benefits of pointer locating methods

• programmer may solve unsure reference problem:

    union {
        int n;
        thing *ptr;
    } x;

• enables semantically accurate marking, e.g. stacks, queues:
– automatic GC retains uncleared references to removed elements
– programmer-supplied callbacks can omit them
=> even better than type-accurate GC
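Both benefits can be sketched with pointer-locating callbacks. The tag field added to the slide's union and the names `tagged`, `stack`, `push`, `pop`, `count_calls` and the `*_gc_pointers` callbacks are hypothetical; the point is that the callback, not the collector, decides what counts as a reference:

```c
#include <stddef.h>

typedef void (*ref_visitor)(void *ref, void *env);

/* The slide's union with an added tag telling the callback which member is
   live, so the collector never has to guess. */
typedef struct thing thing;
typedef struct {
    enum { IS_INT, IS_PTR } tag;
    union { int n; thing *ptr; } x;
} tagged;

static void tagged_gc_pointers(const tagged *t, ref_visitor visit, void *env) {
    if (t->tag == IS_PTR)
        visit(t->x.ptr, env);
}

/* A stack whose pop leaves the vacated slot uncleared. Its callback reports
   only slots below top, so popped elements are not kept alive; a collector
   scanning every slot could not tell them apart from live ones. */
#define STACK_CAP 8
typedef struct {
    void *slot[STACK_CAP];
    size_t top;
} stack;

static int push(stack *s, void *p) {
    if (s->top == STACK_CAP)
        return 0;
    s->slot[s->top++] = p;
    return 1;
}

static void *pop(stack *s) {
    return s->top == 0 ? NULL : s->slot[--s->top];   /* slot is not cleared */
}

static void stack_gc_pointers(const stack *s, ref_visitor visit, void *env) {
    for (size_t i = 0; i < s->top; i++)
        visit(s->slot[i], env);
}

/* Example visitor: count how many references a callback reports. */
static void count_calls(void *ref, void *env) {
    (void)ref;
    ++*(int *)env;
}
```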

Page 38: Conservative Garbage Collection

Using Object Descriptors

• Detlefs, 1991: extension to Mostly Copying
• insert descriptor into object headers
• Bitmap format:
– 1 word with 32 bits indicating pointer/non-pointer words
– use if only first 32 words of user data contain pointers; can’t handle unsure references
• Indirect format:
– pointer to byte array encoding sure/unsure references and non-pointer values
– array can be compressed using repeat counts
• Fast indirect format:
– array of ints; 1st number indicates repetitions of rest
– subsequent numbers = number of words to skip to reach next pointer; negative number indicates unsure reference
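The bitmap format, and an uncompressed form of the indirect format, can be decoded in a few lines. A minimal sketch: the one-byte-per-word codes ('P' sure pointer, 'U' unsure reference, anything else non-pointer) are made up here and are not Detlefs' actual encoding, repeat-count compression and the fast indirect format are omitted, and all names are hypothetical. A mostly-copying collector would treat unsure references like roots: the referenced object is kept but not moved.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { SURE, UNSURE } ref_kind;
typedef void (*ref_visitor)(uintptr_t word, ref_kind kind, void *env);

/* Bitmap format: bit i set => word i of the user data is a pointer. Only the
   first 32 words are covered, and every reported reference is "sure". The
   object must have a word for every set bit. */
void scan_bitmap(const uintptr_t *words, uint32_t desc, ref_visitor visit, void *env) {
    for (unsigned i = 0; i < 32; i++)
        if ((desc >> i) & 1u)
            visit(words[i], SURE, env);
}

/* Indirect format without compression: one code byte per word. */
void scan_indirect(const uintptr_t *words, const char *desc, size_t nwords,
                   ref_visitor visit, void *env) {
    for (size_t i = 0; i < nwords; i++) {
        if (desc[i] == 'P')
            visit(words[i], SURE, env);
        else if (desc[i] == 'U')
            visit(words[i], UNSURE, env);
    }
}

/* Example visitor: tally sure and unsure references. */
typedef struct { int sure, unsure; } tally;

static void tally_visit(uintptr_t word, ref_kind kind, void *env) {
    tally *t = env;
    (void)word;
    if (kind == SURE)
        t->sure++;
    else
        t->unsure++;
}
```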

Page 39: Conservative Garbage Collection

Conclusion

• GC effective for traditional imperative languages
• realistic alternative to explicit memory management for most applications
• not yet suitable for real-time / safety-critical applications
• no big constraints on coding style, except the hidden pointer problem
• garbage-collecting allocators competitive even with code not written for GC
• GC should have hooks for client/programmer to communicate their knowledge:
– explicit deallocation calls
– atomic objects
– hints of appropriate times to collect