Post on 17-Jul-2020
Michael Bond Milind Kulkarni Man Cao Minjia Zhang Meisam Fathi Salmi Swarnendu Biswas Aritra Sengupta Jipeng Huang
Ohio State
Purdue
Parallel programming is mainstream
Shared memory with locks
Challenge: performance & correctness
• Help express parallelism better
• Eliminate concurrency errors
• Diagnose production bugs
• Deal with nondeterminism
Need practical runtime support
Track dependences:
• Atomicity checking
• Data race detection
• Record & replay
Control dependences:
• Transactional memory
• DRF/SC enforcement
• Deterministic execution
Commodity (software-only) approaches slow programs by several times
Each access is paired with a check:
o.f = …   [check]
… = o.f   [check]
Any access could race, so add synchronization at every access
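The cost of synchronizing at every access can be illustrated with a minimal sketch. It is not the implementation evaluated in the talk: `CheckedObject`, its per-object lock, and the `lock_acquisitions` counter are hypothetical names introduced here. Each thread touches only its own object, so no access actually conflicts, yet every access still acquires a lock.

```python
import threading


class CheckedObject:
    """Hypothetical object whose field f is guarded by a lock on every access."""

    def __init__(self):
        self.f = 0
        self.lock = threading.Lock()   # assumed per-object lock
        self.lock_acquisitions = 0     # counts how often the lock was taken

    def write_f(self, value):
        with self.lock:                # synchronization on every write
            self.lock_acquisitions += 1
            self.f = value

    def read_f(self):
        with self.lock:                # synchronization on every read
            self.lock_acquisitions += 1
            return self.f


def run(n_threads=4, n_accesses=1000):
    """Each thread accesses only its own object; returns total lock acquisitions."""
    objs = [CheckedObject() for _ in range(n_threads)]

    def work(o):
        for i in range(n_accesses):
            o.write_f(i)
            o.read_f()

    threads = [threading.Thread(target=work, args=(o,)) for o in objs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(o.lock_acquisitions for o in objs)


print(run(4, 1000))  # one lock acquisition per access, although nothing is shared
```

In this sketch the number of lock acquisitions equals the number of accesses, even though no object is ever shared between threads.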
Octet
• Framework for runtime support
• HB edges for all dependences
• Atomicity of analysis & access
• Concurrency control mechanism
• Synchronization only on cross-thread dependences
• Qualitative performance improvement
Proofs!
Octet tracks ownership
Each object’s state ∈ { WrEx_T, RdEx_T, RdSh_c }
Example timeline (threads T1 to T4 access object o, in time order):
T1: write check, then wr o.f. o’s state = WrEx_T1
T2: read check while o’s state = WrEx_T1
T1: safe point (may be an implicit safe point)
o’s state = RdEx_T2
T2: rd o.f
T3: read check while o’s state = RdEx_T2, then rd o.f. o’s state = RdSh_c
T4: read check, then rd o.f. o’s state = RdSh_c
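The state changes in this timeline can be replayed with a small single-threaded model. This is a sketch under stated assumptions, not Octet's implementation: the `State` class and `access` function are hypothetical, the counter c in RdSh_c is not modeled, and the rule for which transitions need coordination with another thread (here, only those that take access away from a current owner or from shared readers) is an assumption that matches the slides' example, where only T2's read of a WrEx_T1 object involves a safe point.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class State:
    kind: str                      # "WrEx", "RdEx", or "RdSh"
    owner: Optional[str] = None    # owning thread for WrEx/RdEx; None for RdSh


def access(state, thread, is_write):
    """Return (new_state, needs_coordination) for one access in this model."""
    if state.kind == "WrEx" and state.owner == thread:
        return state, False                      # owner may read or write freely
    if state.kind == "RdEx" and state.owner == thread:
        if is_write:
            return State("WrEx", thread), False  # owner upgrades its own state
        return state, False
    if state.kind == "RdSh" and not is_write:
        return state, False                      # any thread may read
    if state.kind == "RdEx" and not is_write:
        return State("RdSh"), False              # second reader: read-shared
    # Remaining cases take access away from another thread (assumed to need
    # coordination, e.g. waiting for the old owner to reach a safe point).
    new_state = State("WrEx", thread) if is_write else State("RdEx", thread)
    return new_state, True


# Replay the slides' example: T1 writes, then T2, T3, T4 read.
state = State("WrEx", "T1")
for thread, is_write in [("T1", True), ("T2", False), ("T3", False), ("T4", False)]:
    state, coordinated = access(state, thread, is_write)
    print(thread, "wr" if is_write else "rd", state.kind, state.owner, coordinated)
```

In this model the only step that reports coordination is T2's read, which corresponds to the safe point in the timeline.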
Sharing detection [von Praun & Gross ’01]
Comparison in our paper
Distributed shared memory: Shasta [Scales et al. ’96]
Biased locking [Kawachiya et al. ’02] [Russell & Detlefs ’06] [Hindman & Grossman ’06]
Practical runtime support
Track dependences:
• Atomicity checking
• Data race detection
• Record & replay
Control dependences:
• Transactional memory
• DRF/SC enforcement
• Deterministic execution
Framework for runtime support
Concurrency control mechanism
Dependence recorder records happens-before edges
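To make "happens-before edges" concrete, the following sketch records one edge per cross-thread dependence on an object. The slides do not show how the recorder is built, so this is a simplified model: the `DependenceRecorder` class and its bookkeeping on every access are assumptions for illustration only, and say nothing about how the recorder uses Octet's state transitions.

```python
class DependenceRecorder:
    """Hypothetical recorder of cross-thread dependences as (source, sink, object)."""

    def __init__(self):
        self.last_writer = {}   # object name -> thread that last wrote it
        self.readers = {}       # object name -> threads that read since last write
        self.edges = []         # recorded happens-before edges, in order

    def _add(self, src, dst, obj):
        edge = (src, dst, obj)
        if edge not in self.edges:
            self.edges.append(edge)

    def on_read(self, thread, obj):
        writer = self.last_writer.get(obj)
        if writer is not None and writer != thread:
            self._add(writer, thread, obj)       # read-after-write dependence
        self.readers.setdefault(obj, set()).add(thread)

    def on_write(self, thread, obj):
        writer = self.last_writer.get(obj)
        if writer is not None and writer != thread:
            self._add(writer, thread, obj)       # write-after-write dependence
        for reader in sorted(self.readers.get(obj, ())):
            if reader != thread:
                self._add(reader, thread, obj)   # write-after-read dependence
        self.last_writer[obj] = thread
        self.readers[obj] = set()


# Replay the slides' example: T1 writes o.f, then T2, T3, T4 read it.
rec = DependenceRecorder()
rec.on_write("T1", "o")
for t in ("T2", "T3", "T4"):
    rec.on_read(t, "o")
print(rec.edges)
```

In this model each reader depends on T1's write, so three edges are recorded, all with T1 as the source.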
Implementation in Jikes RVM
Publicly available: http://jikesrvm.org/Research+Archive
Parallel programs: DaCapo Benchmarks 2006 & 2009; SPEC JBB 2000 & 2005
Parallel platform: 32 cores (AMD Opteron 6272)
[Chart: Overhead (%), axis 0 to 1000. Series: Pessimistic. Off-scale labels: 34,600% and 3,000%.]
[Chart: Overhead (%), axis 0 to 120. Series: Octet w/o coord, Octet, Recorder.]
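For readers comparing the overhead percentages with the earlier "several times" slowdown figure, a small conversion helps. It assumes that overhead is measured relative to an uninstrumented baseline, which the slides do not state; `overhead_to_slowdown` is a hypothetical helper.

```python
def overhead_to_slowdown(overhead_percent):
    """Convert overhead in percent to a slowdown factor.

    Assumes overhead = (instrumented_time / baseline_time - 1) * 100.
    """
    return 1 + overhead_percent / 100


# The two off-scale labels from the pessimistic chart, plus 100% for reference.
for pct in (34600, 3000, 100):
    print(f"{pct}% overhead -> {overhead_to_slowdown(pct):.0f}x slowdown")
```

Under that assumption, an overhead of 100% corresponds to a program that runs twice as long as the baseline.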
Octet helps enable practical runtime support for reliable, scalable concurrency
• Framework for runtime support
• HB edges for all dependences
• Atomicity of analysis & access
• Concurrency control mechanism
• Synchronization only on cross-thread dependences
• Qualitative performance improvement