Post on 22-Mar-2016
Octet: Capturing and Controlling Cross-Thread Dependences Efficiently
Michael Bond, Milind Kulkarni, Man Cao, Minjia Zhang, Meisam Fathi Salmi, Swarnendu Biswas, Aritra Sengupta, Jipeng Huang
Ohio State
Purdue
Parallel programming is mainstream
Shared memory with locks
Challenge: performance & correctness
• Help express parallelism better
• Eliminate concurrency errors
• Diagnose production bugs
• Deal with nondeterminism
Need practical runtime support
Track dependences
• Atomicity checking
• Data race detection
• Record & replay
Control dependences
• Transactional memory
• DRF/SC enforcement
• Deterministic execution
Each access is instrumented with a check:
check: o.f = …
check: … = o.f
Commodity (software-only) approaches slow programs by several times
Any access could race → add synchronization at every access
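To make the cost on this slide concrete, here is a minimal Java sketch of the pessimistic scheme it describes: every instrumented access takes a per-object lock so that the analysis check and the access happen atomically. The names `PessimisticField`, `read`, `write`, `lastAccessor`, and `crossThreadCount` are hypothetical and do not come from Octet or Jikes RVM. A real system would insert such checks in the compiler rather than through wrapper methods.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical wrapper for a single field o.f guarded by pessimistic checks.
final class PessimisticField {
    private final ReentrantLock lock = new ReentrantLock(); // per-object metadata lock
    private int value;                 // the program's field o.f
    private long lastAccessor = -1;    // analysis metadata: id of the last accessing thread
    private int crossThreadCount = 0;  // accesses that followed another thread's access

    // check + "o.f = v", performed atomically
    void write(int v) {
        lock.lock();                   // synchronization on EVERY access, racy or not
        try {
            check();
            value = v;
        } finally {
            lock.unlock();
        }
    }

    // check + "... = o.f", performed atomically
    int read() {
        lock.lock();
        try {
            check();
            return value;
        } finally {
            lock.unlock();
        }
    }

    // Stand-in for analysis-specific work: count accesses that follow
    // an access by a different thread.
    private void check() {
        long me = Thread.currentThread().getId();
        if (lastAccessor != -1 && lastAccessor != me) {
            crossThreadCount++;
        }
        lastAccessor = me;
    }

    int crossThreadCount() {
        lock.lock();
        try {
            return crossThreadCount;
        } finally {
            lock.unlock();
        }
    }
}
```

The lock is acquired even when only one thread ever touches the object, which is the per-access cost the slide attributes to software-only approaches.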
Octet
Framework for runtime support
• HB edges for all dependences
• Atomicity of analysis & access
Concurrency control mechanism
• Synchronization only at cross-thread dependences
• Qualitative performance improvement
Proofs!
Octet tracks ownership
Each object’s state ∈ { WrEx_T, RdEx_T, RdSh_c }
Example (threads T1 to T4, time flows downward):
• T1: write check, then wr o.f; o’s state = WrEx_T1
• T2: read check while o’s state = WrEx_T1
• T1: safe point (or implicit safe point)
• o’s state = RdEx_T2; T2: rd o.f
• T3: read check, then rd o.f; o’s state = RdSh_c
• T4: read check, then rd o.f; o’s state = RdSh_c
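The state sequence in the example can be modeled with a short Java sketch. This is a model of the states only, under loud assumptions: the class `OctetStateSketch`, its methods, and the counter `GLOBAL_RD_SH_COUNTER` are hypothetical names, and the change from WrEx_T1 to RdEx_T2 is done here with a bare compare-and-set, whereas the slides show T1 reaching a safe point first. The sketch therefore shows which accesses take a fast path and which change state, not how to change state safely.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of one object's ownership state.
final class OctetStateSketch {
    enum Kind { WR_EX, RD_EX, RD_SH }

    // Immutable state value, swapped atomically as a whole.
    static final class State {
        final Kind kind;
        final long owner;   // thread id for WR_EX / RD_EX; unused for RD_SH
        final long counter; // the c in RdSh_c; unused otherwise

        State(Kind kind, long owner, long counter) {
            this.kind = kind;
            this.owner = owner;
            this.counter = counter;
        }

        @Override public String toString() {
            if (kind == Kind.RD_SH) return "RdSh_" + counter;
            return (kind == Kind.WR_EX ? "WrEx_T" : "RdEx_T") + owner;
        }
    }

    // Assumption: one global counter supplies c (may skip values if a CAS retries).
    private static final AtomicLong GLOBAL_RD_SH_COUNTER = new AtomicLong();
    private final AtomicReference<State> state;

    // Assumption: a new object starts write-exclusive for its allocating thread.
    OctetStateSketch(long allocatingThread) {
        state = new AtomicReference<>(new State(Kind.WR_EX, allocatingThread, 0));
    }

    // Returns true on the fast path (state unchanged), false if the state changed.
    boolean readCheck(long t) {
        while (true) {
            State s = state.get();
            if (s.kind == Kind.RD_SH || s.owner == t) return true;
            State next = (s.kind == Kind.WR_EX)
                ? new State(Kind.RD_EX, t, 0)   // WrEx_T1 -> RdEx_T2
                : new State(Kind.RD_SH, -1, GLOBAL_RD_SH_COUNTER.incrementAndGet()); // RdEx_T2 -> RdSh_c
            if (state.compareAndSet(s, next)) return false;
        }
    }

    // Returns true on the fast path (already WrEx for t), false if the state changed.
    boolean writeCheck(long t) {
        while (true) {
            State s = state.get();
            if (s.kind == Kind.WR_EX && s.owner == t) return true;
            if (state.compareAndSet(s, new State(Kind.WR_EX, t, 0))) return false;
        }
    }

    String describe() {
        return state.get().toString();
    }
}
```

Replaying the slide's example with thread ids 1 to 4: `writeCheck(1)` is a fast path, `readCheck(2)` changes the state to RdEx_T2, `readCheck(3)` changes it to RdSh_c, and `readCheck(4)` is a fast path again.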
Sharing detection [von Praun & Gross ’01] (comparison in our paper)
Distributed shared memory: Shasta [Scales et al. ’96]
Biased locking [Kawachiya et al. ’02] [Russell & Detlefs ’06] [Hindman & Grossman ’06]
Practical runtime support
Track dependences
• Atomicity checking
• Data race detection
• Record & replay
Control dependences
• Transactional memory
• DRF/SC enforcement
• Deterministic execution
Octet: framework for runtime support, concurrency control mechanism
Dependence recorder records happens-before edges
Example: T1 performs wr o.f, then T2, T3, and T4 each perform rd o.f (write check before the write, read check before each read, safe point on T1)
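As an illustration of what "recording happens-before edges" means for this example, the Java sketch below logs the direct cross-thread dependences on a single field. It is not Octet's recorder: it is a plain synchronized tracker, and the names `DependenceLogSketch`, `onRead`, `onWrite`, and `edges` are hypothetical. A recorder built on Octet would hook into its checks instead of locking on every access, and may log a different set of edges that still implies these dependences.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical per-field log of cross-thread dependences ("Tsrc -> Tdst").
final class DependenceLogSketch {
    private final List<String> edges = new ArrayList<>();
    private long lastWriter = -1;                               // -1: no write yet
    private final Set<Long> readersSinceWrite = new TreeSet<>(); // sorted for stable output

    // Called before "... = o.f" by thread t.
    synchronized void onRead(long t) {
        // Write-read dependence, logged once per reader per write.
        if (lastWriter != -1 && lastWriter != t && !readersSinceWrite.contains(t)) {
            edges.add("T" + lastWriter + " -> T" + t);
        }
        readersSinceWrite.add(t);
    }

    // Called before "o.f = ..." by thread t.
    synchronized void onWrite(long t) {
        // Read-write dependences from every other thread that read the last value.
        for (long r : readersSinceWrite) {
            if (r != t) edges.add("T" + r + " -> T" + t);
        }
        // Write-write dependence from the previous writer.
        if (lastWriter != -1 && lastWriter != t) {
            edges.add("T" + lastWriter + " -> T" + t);
        }
        readersSinceWrite.clear();
        lastWriter = t;
    }

    // Snapshot of the edges logged so far, in logging order.
    synchronized List<String> edges() {
        return new ArrayList<>(edges);
    }
}
```

For the example on this slide, calling `onWrite(1)` followed by `onRead(2)`, `onRead(3)`, and `onRead(4)` logs one edge from T1 to each reader.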
Implementation in Jikes RVM
Publicly available: http://jikesrvm.org/Research+Archive
Parallel programs: DaCapo Benchmarks 2006 & 2009, SPEC JBB 2000 & 2005
Parallel platform: 32 cores (AMD Opteron 6272)
[Figure: Overhead (%) per benchmark (eclipse6, hsqldb6, lusea..., xalan6, avrora9, jython9, luindex9, lusea..., pmd9, sunflow9, xalan9, jbb2000, jbb2005, geo). Series: Pessimi...; y-axis 0 to 1000, with two off-scale bars labeled 34,600% and 3,000%.]
[Figure: Overhead (%) for the same benchmarks; y-axis 0 to 120. Series added across successive slides: Octet w/o coord, then Octet, then Recorder.]
Octet helps enable practical runtime support for reliable, scalable concurrency
Framework for runtime support
• HB edges for all dependences
• Atomicity of analysis & access
Concurrency control mechanism
• Synchronization only at cross-thread dependences
• Qualitative performance improvement