0 parallel and concurrent real-time garbage collection part ii: memory allocation and sweeping david...
Post on 18-Dec-2015
229 Views
Preview:
TRANSCRIPT
1
Parallel and ConcurrentReal-time Garbage Collection
Part II: Memory Allocation and Sweeping
David F. Bacon
T.J. Watson Research Center
QuickTime™ and a decompressor
are needed to see this picture.
5
Compare-and-Swap*
boolean CAS(int* addr, int old, int new) { atomic { if (* addr == old) { *addr = new; return true; } else { return false; } }}
• IBM 370 since 1970, Intel 80486 since 1989• Load-Reserved: ARM 6 since 1991, IBM POWER2 since 1993
6
Non-locking* Stackpush(Page page) { do { Page t = top; page->next = t; } while (! CAS(&top, t, page);}
Page pop() { Page t = null; do { t = top; if (t == null) return; Page n = t->next; } while (! CAS(&top, t, n);
return t;}
toptop
7
AA
BB
CC
AA
BB
CC
CC
AA
BB
Thread 1x=pop()
start
Thread 2a=pop()b=pop()
toptop
toptoptoptop
Thread 2push(A)
CC
BB
toptop
Thread 1pop()finish
1 2
3 4
n
t
n
t
n
t
AA
b
a
b
8
Correct Non-locking Stack
push(PageId page) { do { <PageId t, short v> = top; page->next = t; } while (! CAS(&top, <t,v>, <page,(v+1) % 0xffff>);}
PageId pop() { PageId t = 0; do { <PageId t, short version> = top; if (t==0) return 0; } while (! CAS(&top, <t,v>, <next(t),(v+1) % 0xffff>);
return t;}
9
Really Correct Non-locking Stack
push(Page page) { do { Page t = LOAD_AND_RESERVE(&top); page->next = t; } while (! STORE_CONDITIONAL(&top, page));}
Page pop() { Page t = null; do { Page t = LOAD_AND_RESERVE(&top); if (t==null) return null; } while (! STORE_CONDITIONAL(&top, t->next));
return t;}
10
Coalescing
• Ideal:– getMultiplePages(n) returns best fit– Does not cause mutator activities to block– Does not cause collection activities to block– Does not suffer pathology under contention
• Reality: tradeoffs between– Quality of fit– Probability of blocking– Overhead (number of CAS’s)
freefree?
11
Cartesian Tree + Try-lock
4444?
x.left.size < x.size >= x.right.sizex.left.address < x.address < x.right.address
4545 4646 4747
3333 3434 3535
6262
51515050
1111
2525
2424
22222121 6363
5959
12
Non-locking Rebalancing Tree
• Homework…
• …but I won’t use it.
• Solution would be– Too complex to be reliable– Require too many CAS operations to be fast
13
The Fix: Transactional Memory
AtomicCartesianTree extends CartesianTree {
atomic Block getBlock() { return super.getBlock(); }
atomic void releaseBlock(Block b) { super.releaseBlock(b); }
atomic Block[] getMultipleBlocks(int n) { return super.getMultipleBlocks(n); }}
14
Hierarchical Non-locking Buddy?
4444 4545
4646 4747
3333
3434 3535
6262
51515050
1111
25252424
2222
2121
63635959
16
Handling the Concurrency
?
rr
pp TTa b
XXa b UU
a b
ZZa b
WWa b
YYa b
YYa b
XXa b
GCThreads
Application(“mutator”)Threads
• Trace• Collect• Format
• Load pointer• Store pointer• Allocate
17
Yuasa Snapshot Algorithm (1990)• Logically
– Mark-and-Sweep Style Algorithm– Take a “copy-on-write” heap snapshot– Collect the garbage in that snapshot
• Physically– Stop all threads– Copy their stacks (“roots”)– Force them to save over-written pointers– Trace roots and over-written pointers
* Dijktstra’75, Steele’76
20
2(b): Trace
Stack
pp TTa
b
XXa
b UUa
b
ZZa
b
WWa
b
YYa
b
WWa
b
ZZa
b
YYa
b
XXa
b
* Color is per-object mark bit
21
2(c): Allocate “Black”
Stack
pp TTa
b
XXa
b UUa
b
ZZa
b
WWa
b
YYa
b
ss VVa
b
WWa
b
ZZa
b
YYa
b
XXa
b
22
3(a): Sweep Garbage
Stack
pp TTa
b
XXa
b UUa
b
ZZa
b
WWa
b
YYa
b
WWa
b
ZZa
b
YYa
b
XXa
b
ss VVa
b
freefree
23
3(b): Allocate “White”
Stack
pp TTa
b
XXa
b UUa
b
ZZa
b
YYa
b
WWa
b
ZZa
b
YYa
b
XXa
b
ss VVa
b
freefree
V2V2a
b
tt
24
4: Clear Marks
Stack
pp TTa
b
XXa
b UUa
b
ZZa
b
YYa
b
ZZa
b
YYa
b
XXa
b
ss VVa
b
freefree
V2V2a
b
tt
VVa
b
WWa
b
WWa
b
25
Yuasa Algorithm Phases
• Snapshot stack and global roots
• Trace
• Flip
• Sweep
• Flip
• Clear Marks
* Synchronous
27
Metronome-2 Phases• Initiation
– Setup• turn double barrier on
• Root Scan– Active Finalizer scan– Class scan– Thread scan**
• switch to single barrier, color to black
– Debugger, JNI, Class Loader scan
• Trace – Trace*– Trace Terminate***
• Re-materialization 1– Weak/Soft/Phantom Reference List Transfer– Weak Reference clearing** (snapshot)
• Re-Trace 1– Trace Master – (Trace*)– (Trace Terminate***)
• Re-materialization 2– Finalizable Processing
• Clearing– Monitor Table clearing– JNI Weak Global clearing– Debugger Reference clearing– JVMTI Table clearing– Phantom Reference clearing
• Re-Trace 2– Trace Master – (Trace*)– (Trace Terminate***)– Class Unloading
• Completion– Finalizer Wakeup– Class Unloading Flush– Clearable Compaction**– Book-keeping
* Parallel** Callback*** Single actor symmetric
• Flip – Move Available Lists to Full List* (contention)
• turn write barrier off
– Flush Per-thread Allocation Pages**• switch allocation color to white• switch to temp full list
• Sweeping– Sweep*– Switch to regular Full List**– Move Temp Full List to regular Full List* (contention)
28
Part 3: Sweep• Initiation
– Setup• turn double barrier on
• Root Scan– Active Finalizer scan– Class scan– Thread scan**
• switch to single barrier, color to black
– Debugger, JNI, Class Loader scan
• Trace – Trace*– Trace Terminate***
• Re-materialization 1– Weak/Soft/Phantom Reference List Transfer– Weak Reference clearing** (snapshot)
• Re-Trace 1– Trace Master – (Trace*)– (Trace Terminate***)
• Re-materialization 2– Finalizable Processing
• Clearing– Monitor Table clearing– JNI Weak Global clearing– Debugger Reference clearing– JVMTI Table clearing– Phantom Reference clearing
• Re-Trace 2– Trace Master – (Trace*)– (Trace Terminate***)– Class Unloading
• Completion– Finalizer Wakeup– Class Unloading Flush– Clearable Compaction**– Book-keeping
* Parallel** Callback*** Single actor symmetric
• Flip – Move Available Lists to Full List* (contention)
• turn write barrier off
– Flush Per-thread Allocation Pages**• switch allocation color to white• switch to temp full list
• Sweeping– Sweep*– Switch to regular Full List**– Move Temp Full List to regular Full List* (contention)
29
Assume Trace Has Finished
Stack
pp TTa
b
XXa
b UUa
b
ZZa
b
WWa
b
YYa
b
ss VVa
b
WWa
b
ZZa
b
YYa
b
XXa
b
30
JVM Application Thread Structure
1616 6464 256256
Thread 1 inVM
color
1616 6464 256256
Thread 2
epoch 37
inVM
color
writebuf
43
pthread pthread_tpthread_t
writebuf
41
pthread pthread_tpthread_t
next
next
threadlist
dblb
dblb sweep
epoch 43
sweep
BarrierOnBarrierOntrue
BarrierOnBarrierOnfalse
32
freefree?
free16
free16
free64
free64
free256free256
tempfulltempfull
fullfull
2562561616 6464
available
33
freefree?
free16
free16
free64
free64
free256free256
tempfulltempfull
fullfull
2562561616 6464
available
contention?
34
Generic Structure of Move Phase:Shared Monotonic Work Pool
GCWorkerThreads
GCMasterThread
ApplicationThreads
(may do GC work)
pool of work
35
Performing a GC Work Quantum
every (500us) tryToDoSomeWorkForGC();
tryToDoSomeWorkForGC() { short phase = enterPhase(); if (phase > 0) doQuantum(phase); leavePhase();}
36
Phase Entry
short enterPhase() { thread->epoch = Epoch.newest;
while (true) <short phase, short workers> = PhaseInfo;
if (workers == phaseTable[phase].maxWorkers) return -1;
if (CAS(PhaseInfo,<phase,workers>,<phase,workers+1>)) return phase;}
PhaseInfo
phase workers
PhaseInfo
phase workersmoveavail 3
ABA?
37
Phase Exit
short leavePhase() { while (true) <short phase, short workers> = PhaseInfo; if (workers > 1 || moreWorkForPhase(phase)) newPhase = <phase,workers-1>; else if (phase == TRACE_PHASE) newPhase = <phaseTable[phase].nextPhase, 1>; else newPhase = <phaseTable[phase].nextPhase, 0>;
if (CAS(PhaseInfo, <phase,workers>, newPhase)) return newPhase;}
PhaseInfo
phase workers
PhaseInfo
phase workersmoveavail 3
39
Flush Per-Thread Pages
1616 6464 256256
Thread 1 inVM
color
1616 6464 256256
Thread 2
epoch 37
inVM
writebuf
43
pthread pthread_tpthread_t
writebuf
41
pthread pthread_tpthread_t
next
next
threadlist
dblb
dblb
epoch 43
BarrierOnBarrierOntrue
BarrierOnBarrierOnfalse
color
PhaseInfo
phase workers
PhaseInfo
phase workers
flush 1
color
color
sweep
sweep sweep
sweep
40
freefree?
free16
free16
free64
free64
free256free256
tempfulltempfull
fullfull
2562561616 6464
available
Put Full White Pages on temp-full List
41
How is this Phase Different?
GCWorkerThreads
GCMasterThread
ApplicationThreads
(may do GC work)
pool of work
Per-Thread State Update
42
Callback Mechanismbool doCallbacks(int callbackId) { bool alldone = true;
LOCK(threadlist); for each (Thread thread in threadlist) if (hasCallbackWork(thread, callbackId); if (! thread.inVM && TRY_LOCK(thread)) doCallbackWorkForThread(thread, callbackId); UNLOCK(thread); else postCallback(thread, callbackId); alldone = false; UNLOCK(threadlist);
return alldone;}
* Non-locking list with cursor?
44
freefree?
free16
free16
free64
free64
free256free256
tempfulltempfull
fullfull
6464
available
Sweep Phase
2562561616
46
Switch Back to Normal Full List
1616 6464 256256
Thread 1 inVM
color
1616 6464 256256
Thread 2
epoch 37
inVM
writebuf
43
pthread pthread_tpthread_t
writebuf
41
pthread pthread_tpthread_t
next
next
threadlist
dblb
dblb
epoch 43
BarrierOnBarrierOntrue
BarrierOnBarrierOnfalse
color
PhaseInfo
phase workers
PhaseInfo
phase workers
switchfull 1
color
color
sweep
sweep sweep
sweep
48
freefree?
free16
free16
free64
free64
free256free256
tempfulltempfull
fullfull
6464
available
Drain Temp Full
2562561616
49
Done with Collection!!
• But…– That Was the Easy Part
• And there are still some problems– What if threads go out to lunch?
top related