TRANSCRIPT
Steve Blackburn
Department of Computer Science
Australian National University

Perry Cheng
TJ Watson Research Center
IBM Research

Kathryn McKinley
Department of Computer Sciences
University of Texas at Austin

IBM Research

Myths & Realities
The Performance Impact of Garbage Collection
Tuesday, April 18, 2023
Myths & Realities: The performance impact of garbage collection

Background
• No prior apples-to-apples comparisons
• MMTk
• Canonical policies implemented (SS, MS, RC, genX, etc.)
– Shared mechanisms
– Good performance (match/beat old Watson GCs)
– Ideal platform for apples-to-apples comparisons

Some Questions
• Architecture
– How well do modern OO languages play to modern architectures?
• Collection
– Is generational GC “a waste of time”?
– Are write barriers expensive?
• Allocation
– Free list or bump pointer?
• “Locality is everything”
– Really???
– Is it different for young & old? Why?
• Locality and architecture
– What is the impact, what is the trend?

Methodology
• Jikes RVM & MMTk
• Platforms
• 1.6GHz G5 (PowerPC 970)
• 1.9GHz AMD Athlon 2600+
• 2.6GHz Intel P4
• Linux 2.6.0 with perfctr patch & libraries
– Separate accounting of GC & Mutator perf counts
• SPECjvm98 & pseudojbb
Architecture

Relative Performance
             Athlon 2600+   P4        G5
             1.9GHz         2.6GHz    1.6GHz
compress     0.93           1.00      1.18
jess         0.88           1.00      1.20
raytrace     0.71           1.00      0.73
db           0.97           1.00      1.68
javac        0.67           1.00      1.37
mtrt         0.69           1.00      0.75
jack         0.62           1.00      1.11
pseudojbb    0.77           1.00      1.24

Architecture - Q & A
How big is the mismatch between modern architectures & modern languages?
Allocation

Allocation Choices
• Bump pointer
– ~70 bytes of IA32 instructions, 726MB/s
• Free list
– ~140 bytes of IA32 instructions, 654MB/s
• Bump pointer 11% faster in tight loop
– < 1% in practical setting
– No significant difference (?)
• Second order effects?
– Locality??
– Collection mechanism??
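
The two allocation disciplines compared above can be sketched in a few lines. This is a toy Python model, not MMTk's implementation: the class names `BumpPointer` and `FreeList`, the size classes (16, 32, 64 bytes), and the region sizes are all assumptions made for illustration. The last lines reproduce the 11% tight-loop figure from the two measured throughputs.

```python
class BumpPointer:
    """Contiguous allocation: each request simply advances a cursor."""

    def __init__(self, start, size):
        self.cursor = start
        self.limit = start + size

    def alloc(self, nbytes):
        addr = self.cursor
        if addr + nbytes > self.limit:
            return None  # region exhausted; a real system would collect here
        self.cursor = addr + nbytes
        return addr


class FreeList:
    """Segregated free lists: one list of fixed-size cells per size class."""

    def __init__(self, start, size, size_classes=(16, 32, 64)):
        self.size_classes = sorted(size_classes)
        self.free = {}
        share = size // len(self.size_classes)  # equal share per class (assumption)
        for i, sc in enumerate(self.size_classes):
            base = start + i * share
            # Store highest address first so pop() hands out the lowest cell.
            self.free[sc] = [base + k * sc for k in reversed(range(share // sc))]

    def _class_for(self, nbytes):
        for sc in self.size_classes:
            if nbytes <= sc:
                return sc
        return None  # larger than the biggest class; not handled in this sketch

    def alloc(self, nbytes):
        sc = self._class_for(nbytes)
        if sc is None or not self.free[sc]:
            return None
        return self.free[sc].pop()

    def release(self, addr, nbytes):
        self.free[self._class_for(nbytes)].append(addr)


requests = (24, 8, 40)  # three objects allocated back to back

bp = BumpPointer(start=0, size=1024)
bp_addrs = [bp.alloc(n) for n in requests]   # [0, 24, 32]: adjacent in memory

fl = FreeList(start=0, size=3072)
fl_addrs = [fl.alloc(n) for n in requests]   # [1024, 0, 2048]: spread by size class

# Tight-loop throughputs quoted on the slide, in MB/s.
speedup = 726 / 654 - 1                      # about 0.11, i.e. 11%
```

In this model, objects allocated together by the bump pointer end up adjacent, while the free list places them according to size class. That difference is one plausible source of the second-order locality effects the slide asks about.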

Implications for Locality
• Compare SS & MS mutator
– Mutator time = total – GC time
– Mutator memory performance: L1, L2 & TLB
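
The accounting rule above (mutator = total – GC) applies equally to time and to each hardware counter. A minimal sketch follows, assuming per-run totals and GC-only readings are already available; the `Counters` type, its field names, and the sample numbers are hypothetical and are not measurements from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counters:
    """One set of readings: elapsed time plus three miss counters."""
    time_s: float
    l1_misses: int
    l2_misses: int
    tlb_misses: int

    def __sub__(self, other):
        # Field-by-field difference of two sets of readings.
        return Counters(
            self.time_s - other.time_s,
            self.l1_misses - other.l1_misses,
            self.l2_misses - other.l2_misses,
            self.tlb_misses - other.tlb_misses,
        )


def mutator(total, gc):
    """Mutator share of a run: whole-run readings minus GC-only readings."""
    return total - gc


# Hypothetical readings for a single run (illustrative values only).
total = Counters(time_s=10.0, l1_misses=5_000, l2_misses=800, tlb_misses=300)
gc = Counters(time_s=2.5, l1_misses=1_000, l2_misses=300, tlb_misses=100)

mut = mutator(total, gc)
# mut == Counters(time_s=7.5, l1_misses=4000, l2_misses=500, tlb_misses=200)
```

Computing the same difference for a MarkSweep run and a SemiSpace run, then dividing one by the other, gives MS/SS mutator ratios of the kind reported later in the talk.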

jess
[Figures: jess mutator time, L1 misses, L2 misses, and TLB misses, each normalized and plotted against normalized heap size (1 to 6), MarkSweep vs. SemiSpace]

javac
[Figures: javac mutator time, L1 misses, L2 misses, and TLB misses, each normalized and plotted against normalized heap size (1 to 6), MarkSweep vs. SemiSpace]

pseudojbb
[Figures: jbb mutator time, L1 misses, L2 misses, and TLB misses, each normalized and plotted against normalized heap size (1 to 6), MarkSweep vs. SemiSpace]

db
[Figures: db L1 misses, mutator time, L2 misses, and TLB misses, each normalized and plotted against normalized heap size (1 to 6), MarkSweep vs. SemiSpace]
Locality

Bump Pointer & Free List
• Is the locality differential age-dependent?
• Re-run experiment with GenCopy & GenMS
– Generational variants of MarkSweep & SemiSpace
– Young objects treated identically
– Mature objects either SemiSpace or MarkSweep

Bump Pointer & Free List
             Mutator Time    Mutator L1      Mutator L2      Mutator TLB
             MS/SS           MS/SS           MS/SS           MS/SS
             Whole   Gen     Whole   Gen     Whole   Gen     Whole   Gen
jess         1.26    1.02    1.73    0.87    2.27    0.53    1.91    1.07
javac        1.13    1.05    1.32    1.02    1.38    1.25    1.53    1.20
pseudojbb    1.15    1.08    1.25    1.14    1.44    1.22    1.45    1.26
db           1.10    1.10    1.07    1.09    1.01    1.05    1.17    1.17
• Why?
• Mature space locality
• Nursery absorbs most allocs – lower fragmentation
• Relatively frequent copying in SS
• Contiguous allocation in nursery

Bump Pointer & Free List
Run SS & MS in “infinite” heap
             MarkSweep                 SemiSpace
             1.5X            1.5/      1.5X            1.5/
jess          3.37    3.44   1.02       2.63    3.00   1.14
javac         8.51    8.34   0.98       7.38    7.60   1.03
pseudojbb    10.82   11.04   1.02       9.58    9.68   1.01
db           14.12   14.40   1.02      13.06   11.88   0.91
geomean                      1.01                      1.02
• Infinite heap does not degrade locality (!?)
– Exceptions: jess (degrades), db (improves); why?
– Is spatial locality unimportant in mature space???
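
The third number in each group of the table above matches the second number divided by the first (for example 3.44 / 3.37 ≈ 1.02), and the geomean row is the geometric mean of those ratios. The sketch below recomputes both from the table values. The variable and function names are illustrative assumptions, and the code does not say which column is the “infinite” heap, since the column labels did not survive extraction.

```python
import math

# (first value, second value) per benchmark, copied from the table.
mark_sweep = {
    "jess": (3.37, 3.44),
    "javac": (8.51, 8.34),
    "pseudojbb": (10.82, 11.04),
    "db": (14.12, 14.40),
}
semi_space = {
    "jess": (2.63, 3.00),
    "javac": (7.38, 7.60),
    "pseudojbb": (9.58, 9.68),
    "db": (13.06, 11.88),
}


def ratios(rows):
    """Second value divided by first, per benchmark."""
    return {name: second / first for name, (first, second) in rows.items()}


def geomean(values):
    """Geometric mean: the n-th root of the product of n values."""
    values = list(values)
    return math.prod(values) ** (1.0 / len(values))


ms_ratios = ratios(mark_sweep)
ss_ratios = ratios(semi_space)
ms_geomean = geomean(ms_ratios.values())  # rounds to 1.01, as on the slide
ss_geomean = geomean(ss_ratios.values())  # rounds to 1.02, as on the slide
```

The geometric mean suits ratios like these because a benchmark at 1.14 and one at 0.91 pull the summary in opposite directions multiplicatively. That is how the two SemiSpace outliers (jess and db) largely cancel.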

BP & FL Locality Implications
• Is spatial locality unimportant in mature space??
– No [Huang et al OOPSLA 2004]
– But perhaps temporal locality is more significant
• Seems clear contiguous allocation is good
– Vast majority of objects < cache line
– h/w prefetcher may be significant
• Hard to improve over alloc order, easy to mess up?
– Unlikely to be true: MarkSweep < Compacting < SemiSpace

Locality & Architecture

MS/SS Crossover
[Figure, built up over successive slides: normalized total time vs. heap size relative to minimum (1 to 6) for SemiSpace and MarkSweep on 1.6GHz PPC, 1.9GHz AMD, 2.6GHz P4, and 3.2GHz P4; the final build adds annotations for 1.6GHz, 1.9GHz, 2.6GHz, and 3.2GHz and the labels “locality” and “space”]

Conclusions
• Need for (re)evaluation of GC performance
– Key GC insights > 20 years old
– Technology has changed
– Absence of apples-to-apples comparisons
– Highly architecturally sensitive
• MMTk + perf counters
– High performance infrastructure
– Multiple GCs, shared mechanisms
• Some myths exposed & new realities