A Mostly Non-Copying Real-Time Collector with Low Overhead and Consistent Utilization
David Bacon, Perry Cheng (presenting), V.T. Rajan
IBM T.J. Watson Research

Roadmap
- What is Real-time Garbage Collection?
  - Pause Time, CPU utilization (MMU), and Space Usage
- Heap Architecture
  - Types of Fragmentation
  - Incremental Compaction
  - Read Barriers
  - Barrier Performance
- Scheduling: Time-Based vs. Work-Based
- Empirical Results
  - Pause Time Distribution
  - Minimum Mutator Utilization (MMU)
  - Pause Times
- Summary and Conclusion

Problem Domain
- Real-time Embedded Systems
- Memory usage important
- Uniprocessor

3 Styles of Uniprocessor Garbage Collection: Stop-the-World vs. Incremental vs. Real-Time
[Figure: execution timelines for STW, Inc, and RT collectors; x-axis: time]

Pause Times (Average and Maximum)
- STW: pauses of 1.5 s and 1.7 s; average 1.6 s
- Inc: pauses of 0.5 s, 0.7 s, 0.3 s, 0.5 s, 0.9 s, 0.3 s; average 0.5 s
- RT: pauses of 0.15 - 0.19 s; average 0.18 s

Coarse-Grained Utilization vs. Time
[Figure: utilization (0 to 1) vs. time (0 to 8 s) for STW, Inc, and RT, measured over a 2.0 s window]

Fine-Grained Utilization vs. Time
[Figure: utilization (0 to 1) vs. time (0 to 8 s) for STW, Inc, and RT, measured over a 0.4 s window]

Minimum Mutator Utilization (MMU)
[Figure: MMU (0 to 100) vs. window size (s, logarithmic scale) for STW, Inc, and RT]
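As a rough illustration of the metric on this slide: MMU for a window size w is the smallest fraction of any window of length w during which the program (the mutator) gets to run. The sketch below is not taken from the talk; the function name `mmu`, the pause-interval representation, and the sample pauses are assumptions made for illustration.

```python
def mmu(pauses, total_time, window):
    """Minimum mutator utilization for one window size.

    pauses: non-overlapping (start, end) collector intervals, in seconds.
    """
    if not 0 < window <= total_time:
        raise ValueError("window must be in (0, total_time]")

    def gc_time(t):
        # Collector time that falls inside the window [t, t + window].
        return sum(max(0.0, min(end, t + window) - max(start, t))
                   for start, end in pauses)

    # Collector time in a sliding window is piecewise linear in t, so its
    # maximum is reached where a window edge lines up with a pause edge
    # (or at the first or last possible window).
    candidates = {0.0, total_time - window}
    for start, end in pauses:
        for edge in (start, end):
            for t in (edge, edge - window):
                if 0.0 <= t <= total_time - window:
                    candidates.add(t)

    worst = max(gc_time(t) for t in candidates)
    return 1.0 - worst / window


# Hypothetical 8-second run with two long stop-the-world pauses.
stw_pauses = [(1.0, 2.5), (5.0, 6.7)]
for w in (1.0, 2.0, 8.0):
    print(w, round(mmu(stw_pauses, 8.0, w), 3))
```

A window that fits entirely inside a pause yields an MMU of zero, which is why long pauses push the curve to the floor for small window sizes.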

Space Usage over Time
[Figure: used space (Mb, 0 to 80) vs. time (0 to 8 s) for STW, Inc, and RT, with reference lines labeled max live, trigger, and 2 X max live]

Problems with Existing RT Collectors
[Figure: Non-moving Collector: space (Mb, 0 to 100) vs. time (0 to 8 s), with reference lines at max live, 2 X max live, 3 X max live, and 4 X max live]
[Figure: MMU (0 to 100) vs. time (s)]
[Figure: Replicating Collector: space (Mb, 0 to 100) vs. time (0 to 8 s), with reference lines at max live, 2 X max live, 3 X max live, and 4 X max live]
- Not fully incremental
- Tight coupling
- Work-based scheduling

Our Collector
Goals and Results
- Real-Time: ~10 ms
- Low Space Overhead: ~2X
- Good Utilization during GC: ~40%
Solution
- Incremental Mark-Sweep Collector
- Write barrier: snapshot-at-the-beginning [Yuasa]
- Segregated free list heap architecture
- Read Barrier: to support defragmentation [Brooks]
- Incremental defragmentation
- Segmented arrays: to bound fragmentation
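To make the snapshot-at-the-beginning idea concrete, here is a minimal sketch of an incremental marker with a Yuasa-style write barrier: while marking is in progress, every pointer store first shades the value being overwritten, so anything reachable when the collection started still gets marked. The classes `Obj` and `SnapshotMarker` and their methods are hypothetical names for this illustration, not the collector's actual code.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.fields = {}      # field name -> Obj or None
        self.marked = False


class SnapshotMarker:
    def __init__(self):
        self.marking = False
        self.worklist = []

    def shade(self, obj):
        # Mark an object and queue it so its children get traced later.
        if obj is not None and not obj.marked:
            obj.marked = True
            self.worklist.append(obj)

    def start(self, roots):
        self.marking = True
        for root in roots:
            self.shade(root)

    def write(self, obj, field, new_value):
        # Write barrier: preserve the OLD target before it is overwritten.
        if self.marking:
            self.shade(obj.fields.get(field))
        obj.fields[field] = new_value

    def mark_step(self, budget):
        # Incremental marking: trace at most `budget` objects, then yield.
        while budget > 0 and self.worklist:
            obj = self.worklist.pop()
            for child in obj.fields.values():
                self.shade(child)
            budget -= 1
        if not self.worklist:
            self.marking = False
        return not self.marking   # True once marking has finished


# a -> b -> c; the mutator moves c from b to the already-scanned a mid-mark.
a, b, c = Obj("a"), Obj("b"), Obj("c")
a.fields["next"], b.fields["next"] = b, c
gc = SnapshotMarker()
gc.start([a])
gc.mark_step(1)                 # scans a only
moved = b.fields["next"]
gc.write(b, "next", None)       # barrier shades c here
gc.write(a, "other", moved)
gc.mark_step(10)
print(c.marked)
```

Without the barrier, c would be reachable only from an object the marker has already scanned and would be swept while still live.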

Fragmentation and Compaction
- Intuitively: available but unusable memory
- Avoidance and coalescing: no guarantees
- Compaction
[Figure: memory layout with regions labeled used, needed, and free]
Heap Architecture Segregated Free Lists
– heap divided into pages– each page has equally-sizes blocks (1 object
per block)– Large arrays are segmented
used free
sz 24
sz 32
external
internal page-internal

Controlling Internal and Page-Internal Fragmentation
- Choose page size ($page$) and block sizes ($s_k$)
- If $s_k = s_{k-1}(1 + \rho)$, internal fragmentation $\le \rho$
- Page-internal fragmentation $\le s_{max} / page$
- E.g. if $page$ = 16K, $\rho = 1/8$, $s_{max}$ = 2K, this limits maximum non-external fragmentation to 12.5%.
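The arithmetic behind the 12.5% figure can be checked with a short sketch. It assumes integer block sizes that grow by a factor of at most 1 + rho (rounded down), where rho stands for the growth parameter on this slide, and it takes the page-internal bound to be the largest block size divided by the page size, since the unused tail of a page is always smaller than one block. The function names and the smallest block size of 16 bytes are hypothetical choices for illustration.

```python
import math


def size_classes(s_min, s_max, rho):
    """Block sizes where each class is at most (1 + rho) times the previous."""
    if s_min * rho < 1:
        raise ValueError("s_min too small for this rho with integer sizes")
    sizes = [s_min]
    while sizes[-1] < s_max:
        nxt = math.floor(sizes[-1] * (1 + rho))
        sizes.append(min(nxt, s_max))
    return sizes


def worst_internal(sizes):
    # Worst case: an object one byte larger than the previous class.
    return max((big - (small + 1)) / big
               for small, big in zip(sizes, sizes[1:]))


def worst_page_internal(page, sizes):
    # Unused tail of a page after packing it with blocks of one size.
    return max((page % s) / page for s in sizes)


page, s_max, rho = 16 * 1024, 2 * 1024, 1 / 8
sizes = size_classes(16, s_max, rho)
print(len(sizes), "size classes")
print("internal      <=", rho, "measured", worst_internal(sizes))
print("page-internal <=", s_max / page, "measured",
      worst_page_internal(page, sizes))
```

Both measured values stay under 1/8 for these parameters, which is the 12.5% quoted in the example.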

Fragmentation - small heap ($\rho$ = 1/8 vs. $\rho$ = 1/2)
[Figure: stacked bars (0 to 1) for db, jack, javac, jess, mtrt, mpegaudio, and compress, one bar each for $\rho$ = 1/8 and $\rho$ = 1/2; categories: Internal, Page-Internal, External, Recently Dead, Live]

Incremental Compaction
- Compact only a part of the heap
  - Requires knowing what to compact ahead of time
- Key Problems
  - Popular objects
  - Determining references to moved objects
[Figure: heap with used blocks]

Incremental Compaction: Redirection
- Access all objects via per-object redirection pointers
- Redirection is initially self-referential
- Move an object by updating ONE redirection pointer
[Figure: an original object whose redirection pointer refers to its replica]
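A minimal sketch of the redirection scheme described on this slide, assuming a hypothetical `HeapObject` class with a `redirect` field: every access goes through the redirection pointer, so moving an object only requires copying it and updating that one pointer, and existing references to the original keep working.

```python
class HeapObject:
    def __init__(self, payload):
        self.payload = dict(payload)
        self.redirect = self          # initially self-referential


def read_field(ref, name):
    # All accesses go through the redirection pointer.
    return ref.redirect.payload[name]


def write_field(ref, name, value):
    ref.redirect.payload[name] = value


def move(obj):
    # Sketch only: assumes the object has not been moved before.
    assert obj.redirect is obj
    replica = HeapObject(obj.payload)  # copy the contents elsewhere
    obj.redirect = replica             # the ONE pointer update
    return replica


original = HeapObject({"count": 1})
alias = original                       # some other reference to the object
replica = move(original)
write_field(alias, "count", 2)         # lands in the replica
print(read_field(original, "count"), original.payload["count"])
```

The stale copy left in the original is never observed, because no access bypasses the redirection pointer.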

Consistency via Read Barrier [Brooks]
- Correctness requires always using the replica
- E.g. field selection must be modified
  - normal access: x[offset]
  - read barrier access: x[redirect][offset]

Some Important Details
- Our read barrier is decoupled from collection
- Complication: In Java, any reference might be null
  - actual read barrier for GetField(x,offset) must be augmented:
      tmp = x[offset];
      return (tmp == null) ? null : tmp[redirect]
  - CSE, code motion (LICM and sinking), null-check combining
- Barrier Variants - when to redirect
  - lazy - easier for collector
  - eager - better for optimization
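The null-checking barrier and the two variants can be sketched as follows. This is an interpretation rather than the collector's code: "eager" is taken to mean redirecting a reference as soon as it is loaded (as in the GetField snippet on this slide), and "lazy" to mean redirecting only when the reference is dereferenced. `Node`, `get_field_eager`, and `get_field_lazy` are hypothetical names.

```python
class Node:
    def __init__(self, **fields):
        self.fields = dict(fields)
        self.redirect = self           # self-referential until moved


def get_field_eager(x, name):
    # x is assumed to be already redirected; redirect the loaded value now.
    tmp = x.fields[name]
    return None if tmp is None else tmp.redirect


def get_field_lazy(x, name):
    # Redirect x at the point of use; return the loaded value untouched.
    return x.redirect.fields[name]


child = Node(value=1)
parent = Node(left=child, right=None)

# Simulate the collector moving `child` and updating its contents' home.
replica = Node(value=1)
child.redirect = replica

print(get_field_eager(parent, "left") is replica)   # redirected on load
print(get_field_lazy(parent, "left") is child)      # still the original
print(get_field_lazy(get_field_lazy(parent, "left"), "value"))
print(get_field_eager(parent, "right"))             # null stays null
```

With the eager form, references held in locals always point at the current copy, which gives the compiler more freedom to reuse them; with the lazy form, the collector never has to fix up references the mutator already holds.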

Barrier Overhead to Mutator
- Conventional wisdom says read barriers are too expensive
  - Studies found overhead of 20-40% (Zorn, Nielsen)
- Our barrier has 4-6% overhead with optimizations
[Figure: bar chart (0 to 12) of Lazy and Eager barrier overhead for compress, jess, db, javac, mpegaudio, mtrt, jack, and Geo. Mean]

[Figure sequence: Stack and Heap (one size only) through two collections]
1. Program Start
2. Program is allocating (blocks: free, allocated)
3. GC starts (blocks: free, unmarked)
4. Program allocating and GC marking (blocks: free, unmarked, marked or allocated)
5. Sweeping away blocks (blocks: free, unmarked, marked or allocated)
6. GC moving objects and installing redirection (blocks: free, allocated, evacuated)
7. 2nd GC starts tracing and redirection fixup (blocks: free, unmarked, evacuated, marked or allocated)
8. 2nd GC complete (blocks: free, allocated)

Scheduling the Collector
- Scheduling Issues
  - bad CPU utilization and space usage
  - loose program and collector coupling
- Time-Based
  - Trigger the collector to run for C_T seconds whenever the program runs for Q_T seconds
- Work-Based
  - Trigger the collector to collect C_W work whenever the program allocates Q_W bytes
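A back-of-the-envelope sketch of why the two policies behave differently, using the quanta named on this slide. Under the time-based policy, the mutator's share of each cycle depends only on the two time quanta. Under the work-based policy, the time the mutator needs to allocate its byte quantum shrinks as the allocation rate rises, so its share of the cycle shrinks too. The function names, the `alloc_rate` and `collect_rate` parameters, and all numbers are assumptions for illustration.

```python
def time_based_utilization(q_t, c_t):
    # Mutator runs q_t seconds, then the collector runs c_t seconds.
    return q_t / (q_t + c_t)


def work_based_utilization(q_w, c_w, alloc_rate, collect_rate):
    # Mutator runs until it has allocated q_w bytes, then the collector
    # performs c_w bytes' worth of work. Rates are in bytes per second.
    mutator_time = q_w / alloc_rate
    collector_time = c_w / collect_rate
    return mutator_time / (mutator_time + collector_time)


# Hypothetical parameters: equal time quanta; 1 KB allocation quantum.
print("time-based:", time_based_utilization(0.010, 0.010))
for alloc_rate in (1e6, 1e7, 1e8):
    u = work_based_utilization(1000, 500, alloc_rate, 1e6)
    print("work-based at", alloc_rate, "B/s:", round(u, 3))
```

The trade-off runs the other way for space: a time-based collector does a fixed amount of collection per unit time, so a burst of allocation shows up as extra memory use rather than as lost utilization.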

Time-Based Scheduling
Trigger the collector to run for C_T seconds whenever the program runs for Q_T seconds
[Figure: space (Mb, 0 to 100) vs. time (s) for Smooth Alloc, Uneven Alloc, and High Alloc]
[Figure: MMU (CPU utilization, 0 to 1) vs. window size (s); a single curve labeled Any]

Work-Based Scheduling
Trigger the collector to collect C_W bytes whenever the program allocates Q_W bytes
[Figure: MMU (CPU utilization, 0 to 1) vs. window size (s) for Smooth Alloc, Uneven Alloc, and High Alloc]
[Figure: space (Mb, 0 to 100) vs. time (s); a single curve labeled Any]

Pause Time Distribution for javac (Time-Based vs. Work-Based)
[Figure: two pause time distributions, each annotated with 12 ms]

Utilization vs. Time for javac (Time-Based vs. Work-Based)
[Figure: two panels of utilization (0 to 1.0) vs. time (s), with a level of 0.45 marked]

Minimum Mutator Utilization for javac (Time-Based vs. Work-Based)

Space Usage for javac (Time-Based vs. Work-Based)

Intrinsic Tradeoff
- 3 inter-related factors:
  - Space Bound (tradeoff)
  - Utilization (tradeoff)
  - Allocation Rate (lower is better)
- Other factors:
  - Collection rate (higher is better)
  - Pointer density (lower is better)

Summary: Mostly Non-moving RT GC
- Read Barriers
  - Permits incremental defragmentation
  - Overhead is 4-6% with compiler optimizations
- Low Space Overhead
  - Space usage is only about 2 X max live data
  - Fragmentation still bounded
- Consistent Utilization
  - Always at least 45% at 12 ms resolution

Conclusions
- Real-time GC is real
- There are tradeoffs just like in traditional GC
- Scheduling should be primarily time-based
  - Fallback to work-based due to user's incorrect parameter estimations
- Incremental defragmentation is possible
- Compiler support is important!

Future Work
- Lowering the real-time resolution
  - Sub-millisecond worst-case pause
  - Main issue: breaking up stack scan
- Segmented array optimizations
  - Reduce segmented array cost below ~2%
  - Opportunistic contiguous layout
  - Type-based specialization with invalidation
  - Strip-mining