michael bond milind kulkarni minjia zhang meisam fathi...

Post on 17-Jul-2020

5 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Michael Bond Milind Kulkarni Man Cao Minjia Zhang Meisam Fathi Salmi Swarnendu Biswas Aritra Sengupta Jipeng Huang

Ohio State

Purdue

Parallel programming is mainstream

Shared memory with locks

Challenge: performance & correctness

• Help express parallelism better • Eliminate concurrency errors • Diagnose production bugs • Deal with nondeterminism

Need practical runtime support

• Atomicity checking • Data race detection • Record & replay

• Transactional memory • DRF/SC enforcement • Deterministic execution

Need practical runtime support

• Atomicity checking • Data race detection • Record & replay

• Transactional memory • DRF/SC enforcement • Deterministic execution

Track dependences Control dependences

Need practical runtime support

o.f = …

… = o.f

• Atomicity checking • Data race detection • Record & replay

• Transactional memory • DRF/SC enforcement • Deterministic execution

Track dependences Control dependences

Need practical runtime support

o.f = …

… = o.f

• Atomicity checking • Data race detection • Record & replay

• Transactional memory • DRF/SC enforcement • Deterministic execution

Track dependences Control dependences

Need practical runtime support

o.f = …

… = o.f

Commodity (software-only) approaches slow programs by several times

• Atomicity checking • Data race detection • Record & replay

• Transactional memory • DRF/SC enforcement • Deterministic execution

Track dependences Control dependences

o.f = …

check

… = o.f

check

Need practical runtime support

Commodity (software-only) approaches slow programs by several times

• Atomicity checking • Data race detection • Record & replay

• Transactional memory • DRF/SC enforcement • Deterministic execution

Track dependences Control dependences

o.f = …

check

… = o.f

check

Need practical runtime support

Any access could race add synchronization at every access

Octet

Framework for runtime support HB edges all dependences Atomicity of analysis & access Concurrency control mechanism Synchronization cross-thread dependence Qualitative performance improvement

Octet

Framework for runtime support HB edges all dependences Atomicity of analysis & access Concurrency control mechanism Synchronization cross-thread dependence Qualitative performance improvement

Proofs!

Octet tracks ownership

Each object’s state Є { WrExT , RdExT , RdShc }

wr o.f

T1 T2

write check

o’s state = WrExT1 T

ime

wr o.f

T1 T2

read check

write check

o’s state = WrExT1 T

ime

wr o.f

T1 T2

read check

write check

o’s state = WrExT1 T

ime

wr o.f

T1 T2

safe point

write check

read check

o’s state = WrExT1 T

ime

wr o.f

T1 T2

safe point

write check

read check

o’s state = WrExT1

Implicit safe point T

ime

wr o.f

T1 T2

safe point

write check

read check

o’s state = WrExT1 T

ime

wr o.f

T1 T2

safe point

write check

read check

o’s state = RdExT2 T

ime

wr o.f

T1 T2

rd o.f

safe point

write check

read check

o’s state = RdExT2 T

ime

wr o.f

T1 T2

rd o.f

safe point

read check

write check

o’s state = RdExT2

T3 T4

wr o.f

T1 T2

rd o.f

T3

safe point

T4

read check

write check

read check

o’s state = RdExT2

wr o.f

T1 T2

rd o.f

rd o.f

T3

safe point

T4

read check

write check

read check

o’s state = RdShc

wr o.f

T1 T2

rd o.f

rd o.f

T3

safe point

T4

read check

write check

read check

read check

o’s state = RdShc

wr o.f

T1 T2

rd o.f

rd o.f

T3

safe point

rd o.f

T4

read check

write check

read check

read check

o’s state = RdShc

wr o.f

T1 T2

rd o.f

rd o.f

T3

safe point

rd o.f

T4

read check

write check

read check

read check

o’s state = RdShc

Sharing detection [von Praun & Gross ’01] Comparison in our paper Distributed shared memory Shasta [Scales et al. ’96] Biased locking [Kawachiya et al. ’02] [Russell & Detlefs ’06] [Hindman & Grossman ’06]

• Atomicity checking • Data race detection • Record & replay

• Transactional memory • DRF/SC enforcement • Deterministic execution

Practical runtime support

Track dependences Control dependences

Framework for runtime support Concurrency control mechanism

wr o.f

T1 T2

rd o.f

rd o.f

T3

safe point

rd o.f

T4

read check

write check

read check

read check

Dependence recorder records happens-before edges

Implementation in Jikes RVM Publicly available http://jikesrvm.org/Research+Archive

Parallel programs DaCapo Benchmarks 2006 & 2009

SPEC JBB 2000 & 2005

Parallel platform 32 cores (AMD Opteron 6272)

0

100

200

300

400

500

600

700

800

900

1000

Ove

rhe

ad

(%

)

Pessimistic

34,600% 3,000%

0

20

40

60

80

100

120O

verh

ea

d (

%)

Octet w/o coord

Octet w/o coord

0

20

40

60

80

100

120O

verh

ea

d (

%)

Octet

Octet w/o coord

0

20

40

60

80

100

120O

verh

ea

d (

%)

Recorder

Octet

Octet w/o coord

Octet helps enable practical runtime support for reliable, scalable concurrency Framework for runtime support HB edges all dependences Atomicity of analysis & access Concurrency control mechanism Synchronization cross-thread dependence Qualitative performance improvement

top related