0 parallel and concurrent real-time garbage collection part ii: memory allocation and sweeping david...

50
1 Parallel and Concurrent Real-time Garbage Collection Part II: Memory Allocation and Sweeping David F. Bacon T.J. Watson Research Center QuickTime™ and a decompressor are needed to see this picture.

Upload: jonathan-russell

Post on 18-Dec-2015

228 views

Category:

Documents


1 download

TRANSCRIPT

1

Parallel and ConcurrentReal-time Garbage Collection

Part II: Memory Allocation and Sweeping

David F. Bacon

T.J. Watson Research Center

QuickTime™ and a decompressor

are needed to see this picture.

2

Page Data Synchronization, Take 1

1616 6464 256256 freefree

3

Page Data, Take 2

1616 6464 256256 freefree

1616 6464 256256

Thread 1

1616 6464 256256

Thread 2

4

freefree?

free16

free16

free64

free64

free256free256

2562561616 6464

tempfulltempfull

fullfull

5

Compare-and-Swap*

boolean CAS(int* addr, int old, int new) { atomic { if (* addr == old) { *addr = new; return true; } else { return false; } }}

• IBM 370 since 1970, Intel 80486 since 1989• Load-Reserved: ARM 6 since 1991, IBM POWER2 since 1993

6

Non-locking* Stackpush(Page page) { do { Page t = top; page->next = t; } while (! CAS(&top, t, page);}

Page pop() { Page t = null; do { t = top; if (t == null) return; Page n = t->next; } while (! CAS(&top, t, n);

return t;}

toptop

7

AA

BB

CC

AA

BB

CC

CC

AA

BB

Thread 1x=pop()

start

Thread 2a=pop()b=pop()

toptop

toptoptoptop

Thread 2push(A)

CC

BB

toptop

Thread 1pop()finish

1 2

3 4

n

t

n

t

n

t

AA

b

a

b

8

Correct Non-locking Stack

push(PageId page) { do { <PageId t, short v> = top; page->next = t; } while (! CAS(&top, <t,v>, <page,(v+1) % 0xffff>);}

PageId pop() { PageId t = 0; do { <PageId t, short version> = top; if (t==0) return 0; } while (! CAS(&top, <t,v>, <next(t),(v+1) % 0xffff>);

return t;}

9

Really Correct Non-locking Stack

push(Page page) { do { Page t = LOAD_AND_RESERVE(&top); page->next = t; } while (! STORE_CONDITIONAL(&top, page));}

Page pop() { Page t = null; do { Page t = LOAD_AND_RESERVE(&top); if (t==null) return null; } while (! STORE_CONDITIONAL(&top, t->next));

return t;}

10

Coalescing

• Ideal:– getMultiplePages(n) returns best fit– Does not cause mutator activities to block– Does not cause collection activities to block– Does not suffer pathology under contention

• Reality: tradeoffs between– Quality of fit– Probability of blocking– Overhead (number of CAS’s)

freefree?

11

Cartesian Tree + Try-lock

4444?

x.left.size < x.size >= x.right.sizex.left.address < x.address < x.right.address

4545 4646 4747

3333 3434 3535

6262

51515050

1111

2525

2424

22222121 6363

5959

12

Non-locking Rebalancing Tree

• Homework…

• …but I won’t use it.

• Solution would be– Too complex to be reliable– Require too many CAS operations to be fast

13

The Fix: Transactional Memory

AtomicCartesianTree extends CartesianTree {

atomic Block getBlock() { return super.getBlock(); }

atomic void releaseBlock(Block b) { super.releaseBlock(b); }

atomic Block[] getMultipleBlocks(int n) { return super.getMultipleBlocks(n); }}

14

Hierarchical Non-locking Buddy?

4444 4545

4646 4747

3333

3434 3535

6262

51515050

1111

25252424

2222

2121

63635959

15

So much for allocation…

Garbage Collection

16

Handling the Concurrency

?

rr

pp TTa b

XXa b UU

a b

ZZa b

WWa b

YYa b

YYa b

XXa b

GCThreads

Application(“mutator”)Threads

• Trace• Collect• Format

• Load pointer• Store pointer• Allocate

17

Yuasa Snapshot Algorithm (1990)• Logically

– Mark-and-Sweep Style Algorithm– Take a “copy-on-write” heap snapshot– Collect the garbage in that snapshot

• Physically– Stop all threads– Copy their stacks (“roots”)– Force them to save over-written pointers– Trace roots and over-written pointers

* Dijktstra’75, Steele’76

18

1: Take Logical Snapshot

Stack

rr

pp TTa

b

XXa

b UUa

b

ZZa

b

WWa

b

YYa

b

19

2(a): Copy Over-written Pointers

Stack

pp

rr

TTa

b

XXa

b UUa

b

ZZa

b

WWa

b

YYa

b

20

2(b): Trace

Stack

pp TTa

b

XXa

b UUa

b

ZZa

b

WWa

b

YYa

b

WWa

b

ZZa

b

YYa

b

XXa

b

* Color is per-object mark bit

21

2(c): Allocate “Black”

Stack

pp TTa

b

XXa

b UUa

b

ZZa

b

WWa

b

YYa

b

ss VVa

b

WWa

b

ZZa

b

YYa

b

XXa

b

22

3(a): Sweep Garbage

Stack

pp TTa

b

XXa

b UUa

b

ZZa

b

WWa

b

YYa

b

WWa

b

ZZa

b

YYa

b

XXa

b

ss VVa

b

freefree

23

3(b): Allocate “White”

Stack

pp TTa

b

XXa

b UUa

b

ZZa

b

YYa

b

WWa

b

ZZa

b

YYa

b

XXa

b

ss VVa

b

freefree

V2V2a

b

tt

24

4: Clear Marks

Stack

pp TTa

b

XXa

b UUa

b

ZZa

b

YYa

b

ZZa

b

YYa

b

XXa

b

ss VVa

b

freefree

V2V2a

b

tt

VVa

b

WWa

b

WWa

b

25

Yuasa Algorithm Phases

• Snapshot stack and global roots

• Trace

• Flip

• Sweep

• Flip

• Clear Marks

* Synchronous

26

Metronome-2 Concurrency

GCWorkerThreads

GCMasterThread

ApplicationThreads

(may do GC work)

27

Metronome-2 Phases• Initiation

– Setup• turn double barrier on

• Root Scan– Active Finalizer scan– Class scan– Thread scan**

• switch to single barrier, color to black

– Debugger, JNI, Class Loader scan

• Trace – Trace*– Trace Terminate***

• Re-materialization 1– Weak/Soft/Phantom Reference List Transfer– Weak Reference clearing** (snapshot)

• Re-Trace 1– Trace Master – (Trace*)– (Trace Terminate***)

• Re-materialization 2– Finalizable Processing

• Clearing– Monitor Table clearing– JNI Weak Global clearing– Debugger Reference clearing– JVMTI Table clearing– Phantom Reference clearing

• Re-Trace 2– Trace Master – (Trace*)– (Trace Terminate***)– Class Unloading

• Completion– Finalizer Wakeup– Class Unloading Flush– Clearable Compaction**– Book-keeping

* Parallel** Callback*** Single actor symmetric

• Flip – Move Available Lists to Full List* (contention)

• turn write barrier off

– Flush Per-thread Allocation Pages**• switch allocation color to white• switch to temp full list

• Sweeping– Sweep*– Switch to regular Full List**– Move Temp Full List to regular Full List* (contention)

28

Part 3: Sweep• Initiation

– Setup• turn double barrier on

• Root Scan– Active Finalizer scan– Class scan– Thread scan**

• switch to single barrier, color to black

– Debugger, JNI, Class Loader scan

• Trace – Trace*– Trace Terminate***

• Re-materialization 1– Weak/Soft/Phantom Reference List Transfer– Weak Reference clearing** (snapshot)

• Re-Trace 1– Trace Master – (Trace*)– (Trace Terminate***)

• Re-materialization 2– Finalizable Processing

• Clearing– Monitor Table clearing– JNI Weak Global clearing– Debugger Reference clearing– JVMTI Table clearing– Phantom Reference clearing

• Re-Trace 2– Trace Master – (Trace*)– (Trace Terminate***)– Class Unloading

• Completion– Finalizer Wakeup– Class Unloading Flush– Clearable Compaction**– Book-keeping

* Parallel** Callback*** Single actor symmetric

• Flip – Move Available Lists to Full List* (contention)

• turn write barrier off

– Flush Per-thread Allocation Pages**• switch allocation color to white• switch to temp full list

• Sweeping– Sweep*– Switch to regular Full List**– Move Temp Full List to regular Full List* (contention)

29

Assume Trace Has Finished

Stack

pp TTa

b

XXa

b UUa

b

ZZa

b

WWa

b

YYa

b

ss VVa

b

WWa

b

ZZa

b

YYa

b

XXa

b

30

JVM Application Thread Structure

1616 6464 256256

Thread 1 inVM

color

1616 6464 256256

Thread 2

epoch 37

inVM

color

writebuf

43

pthread pthread_tpthread_t

writebuf

41

pthread pthread_tpthread_t

next

next

threadlist

dblb

dblb sweep

epoch 43

sweep

BarrierOnBarrierOntrue

BarrierOnBarrierOnfalse

31

“Move Available” Phase

32

freefree?

free16

free16

free64

free64

free256free256

tempfulltempfull

fullfull

2562561616 6464

available

33

freefree?

free16

free16

free64

free64

free256free256

tempfulltempfull

fullfull

2562561616 6464

available

contention?

34

Generic Structure of Move Phase:Shared Monotonic Work Pool

GCWorkerThreads

GCMasterThread

ApplicationThreads

(may do GC work)

pool of work

35

Performing a GC Work Quantum

every (500us) tryToDoSomeWorkForGC();

tryToDoSomeWorkForGC() { short phase = enterPhase(); if (phase > 0) doQuantum(phase); leavePhase();}

36

Phase Entry

short enterPhase() { thread->epoch = Epoch.newest;

while (true) <short phase, short workers> = PhaseInfo;

if (workers == phaseTable[phase].maxWorkers) return -1;

if (CAS(PhaseInfo,<phase,workers>,<phase,workers+1>)) return phase;}

PhaseInfo

phase workers

PhaseInfo

phase workersmoveavail 3

ABA?

37

Phase Exit

short leavePhase() { while (true) <short phase, short workers> = PhaseInfo; if (workers > 1 || moreWorkForPhase(phase)) newPhase = <phase,workers-1>; else if (phase == TRACE_PHASE) newPhase = <phaseTable[phase].nextPhase, 1>; else newPhase = <phaseTable[phase].nextPhase, 0>;

if (CAS(PhaseInfo, <phase,workers>, newPhase)) return newPhase;}

PhaseInfo

phase workers

PhaseInfo

phase workersmoveavail 3

38

“Per-Thread Flush” Phase

39

Flush Per-Thread Pages

1616 6464 256256

Thread 1 inVM

color

1616 6464 256256

Thread 2

epoch 37

inVM

writebuf

43

pthread pthread_tpthread_t

writebuf

41

pthread pthread_tpthread_t

next

next

threadlist

dblb

dblb

epoch 43

BarrierOnBarrierOntrue

BarrierOnBarrierOnfalse

color

PhaseInfo

phase workers

PhaseInfo

phase workers

flush 1

color

color

sweep

sweep sweep

sweep

40

freefree?

free16

free16

free64

free64

free256free256

tempfulltempfull

fullfull

2562561616 6464

available

Put Full White Pages on temp-full List

41

How is this Phase Different?

GCWorkerThreads

GCMasterThread

ApplicationThreads

(may do GC work)

pool of work

Per-Thread State Update

42

Callback Mechanismbool doCallbacks(int callbackId) { bool alldone = true;

LOCK(threadlist); for each (Thread thread in threadlist) if (hasCallbackWork(thread, callbackId); if (! thread.inVM && TRY_LOCK(thread)) doCallbackWorkForThread(thread, callbackId); UNLOCK(thread); else postCallback(thread, callbackId); alldone = false; UNLOCK(threadlist);

return alldone;}

* Non-locking list with cursor?

43

“Sweep” Phase

44

freefree?

free16

free16

free64

free64

free256free256

tempfulltempfull

fullfull

6464

available

Sweep Phase

2562561616

45

“Switch Full List” Phase

46

Switch Back to Normal Full List

1616 6464 256256

Thread 1 inVM

color

1616 6464 256256

Thread 2

epoch 37

inVM

writebuf

43

pthread pthread_tpthread_t

writebuf

41

pthread pthread_tpthread_t

next

next

threadlist

dblb

dblb

epoch 43

BarrierOnBarrierOntrue

BarrierOnBarrierOnfalse

color

PhaseInfo

phase workers

PhaseInfo

phase workers

switchfull 1

color

color

sweep

sweep sweep

sweep

47

“Drain Temp Full” Phase

48

freefree?

free16

free16

free64

free64

free256free256

tempfulltempfull

fullfull

6464

available

Drain Temp Full

2562561616

49

Done with Collection!!

• But…– That Was the Easy Part

• And there are still some problems– What if threads go out to lunch?

50

http://www.research.ibm.com/metronome

https://sourceforge.net/projects/tuningforkvp