Octet: Capturing and Controlling Cross-Thread Dependences Efficiently

Posted on 22-Mar-2016



Michael Bond, Milind Kulkarni, Man Cao, Minjia Zhang, Meisam Fathi Salmi, Swarnendu Biswas, Aritra Sengupta, Jipeng Huang

Ohio State

Purdue

Parallel programming is mainstream

Shared memory with locks

Challenge: performance & correctness

• Help express parallelism better
• Eliminate concurrency errors
• Diagnose production bugs
• Deal with nondeterminism

Need practical runtime support

Track dependences:
• Atomicity checking
• Data race detection
• Record & replay

Control dependences:
• Transactional memory
• DRF/SC enforcement
• Deterministic execution


o.f = …
… = o.f


Commodity (software-only) approaches slow programs by several times


o.f = …   [check]
… = o.f   [check]


Any access could race, so add synchronization at every access
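To make that cost concrete, here is a minimal sketch, assuming a Java-style analysis that tracks write-to-read dependences on a single field; the class `PessimisticField`, its methods, and the dependence-log format are hypothetical and not taken from the Octet implementation. Because any access could race, the sketch takes a per-object lock around both the analysis metadata update and the access itself, so every access pays for synchronization, even one by a thread that never shares the object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical pessimistic instrumentation for one field o.f:
// every access acquires a lock so the analysis and the access happen atomically.
final class PessimisticField {
    private final ReentrantLock lock = new ReentrantLock(); // per-object analysis lock
    private int f;                // the program's field o.f
    private int lastWriter = -1;  // analysis metadata: thread that last wrote o.f
    private long syncOps = 0;     // number of lock acquisitions by write/read
    final List<String> deps = new ArrayList<>(); // recorded write->read dependences

    void write(int thread, int value) {
        lock.lock();              // synchronization even if no other thread touches o
        try {
            syncOps++;
            lastWriter = thread;  // analysis update ...
            f = value;            // ... and the access, made atomic by the lock
        } finally {
            lock.unlock();
        }
    }

    int read(int thread) {
        lock.lock();
        try {
            syncOps++;
            if (lastWriter != -1 && lastWriter != thread) {
                deps.add("T" + lastWriter + " -> T" + thread); // cross-thread dependence
            }
            return f;
        } finally {
            lock.unlock();
        }
    }

    long syncOps() {
        lock.lock();
        try { return syncOps; } finally { lock.unlock(); }
    }
}
```

For example, T1 writing o.f and T2 then reading it records one dependence, `T1 -> T2`, after two lock acquisitions; if T2 then writes and reads o.f itself, no dependence is recorded, yet each of those thread-local accesses still acquires the lock. That per-access synchronization is the overhead a scheme that synchronizes only at cross-thread dependences aims to avoid.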

Octet

Framework for runtime support
• Happens-before (HB) edges for all cross-thread dependences
• Atomicity of analysis & access

Concurrency control mechanism
• Synchronization only at cross-thread dependences
• Qualitative performance improvement


Proofs!

Octet tracks ownership

Each object’s state ∈ { WrEx_T, RdEx_T, RdSh_c }: write-exclusive for thread T, read-exclusive for thread T, or read-shared (with counter c)

Example (threads T1, T2, T3, T4; time flows downward):
• T1: wr o.f (write check); o’s state = WrEx_T1.
• T2: rd o.f. The read check finds o in WrEx_T1, so T2 must coordinate with T1, which responds at a safe point (or at an implicit safe point, e.g. if T1 is blocked); o’s state = RdEx_T2.
• T3: rd o.f. The read check finds o in RdEx_T2; o’s state = RdSh_c.
• T4: rd o.f. The read check finds o already in RdSh_c; the state does not change.
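The walk-through above can be replayed with a small, single-threaded Java model of the per-object states. This is a sketch under simplifying assumptions: the names `OctetModel`, `Obj`, `read`, and `write` are hypothetical; real threads, safe points, atomic updates, and the coordination protocol are omitted (a conflicting transition simply changes the state); and the per-thread counter bookkeeping for RdSh_c is simplified. It classifies each access into four kinds: same-state (fast path, no synchronization), upgrading (RdEx_T to WrEx_T, or RdEx_T1 to RdSh_c), fence (a read of RdSh_c by a thread that has not yet seen counter c), and conflicting (requires coordination with the current owner).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-threaded model of Octet-style per-object states.
// Real runtimes use atomic operations and a coordination protocol; both are omitted here.
final class OctetModel {
    enum Kind { WR_EX, RD_EX, RD_SH }
    enum Transition { SAME_STATE, UPGRADING, FENCE, CONFLICTING }

    // One object's state: WrEx_T / RdEx_T use `owner`, RdSh_c uses `counter`.
    static final class Obj {
        Kind kind = Kind.WR_EX;
        int owner;
        long counter;

        Obj(int allocatingThread) { owner = allocatingThread; } // new object: WrEx of its allocator

        @Override public String toString() {
            if (kind == Kind.RD_SH) return "RdSh_" + counter;
            return (kind == Kind.WR_EX ? "WrEx_T" : "RdEx_T") + owner;
        }
    }

    private long gRdShCount = 0;                                  // global RdSh counter
    private final Map<Integer, Long> rdShCount = new HashMap<>(); // per thread: latest RdSh counter seen

    Transition read(Obj o, int t) {
        switch (o.kind) {
            case WR_EX:
                if (o.owner == t) return Transition.SAME_STATE;
                o.kind = Kind.RD_EX;            // WrEx_T1 -> RdEx_T2: must coordinate with T1
                o.owner = t;
                return Transition.CONFLICTING;
            case RD_EX:
                if (o.owner == t) return Transition.SAME_STATE;
                o.kind = Kind.RD_SH;            // RdEx_T1 -> RdSh_c with a fresh counter c
                o.counter = ++gRdShCount;
                rdShCount.put(t, o.counter);    // the upgrading thread has seen c
                return Transition.UPGRADING;
            default:                            // RD_SH: any thread may read
                if (rdShCount.getOrDefault(t, 0L) >= o.counter) return Transition.SAME_STATE;
                rdShCount.put(t, o.counter);    // t had not yet seen counter c: fence transition
                return Transition.FENCE;
        }
    }

    Transition write(Obj o, int t) {
        if (o.kind == Kind.WR_EX && o.owner == t) return Transition.SAME_STATE;
        if (o.kind == Kind.RD_EX && o.owner == t) {
            o.kind = Kind.WR_EX;                // RdEx_T -> WrEx_T
            return Transition.UPGRADING;
        }
        o.kind = Kind.WR_EX;                    // WrEx_T1/RdEx_T1 -> WrEx_T2, or RdSh_c -> WrEx_T
        o.owner = t;
        return Transition.CONFLICTING;
    }
}
```

Replaying the example, with o allocated by T1, gives same-state for T1's write, conflicting for T2's read, upgrading for T3's read (ending in RdSh_1), and a fence for T4's first read; only the conflicting transition would need the round-trip with T1.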

• Sharing detection [von Praun & Gross ’01]: comparison in our paper
• Distributed shared memory: Shasta [Scales et al. ’96]
• Biased locking [Kawachiya et al. ’02] [Russell & Detlefs ’06] [Hindman & Grossman ’06]


Octet: framework for runtime support + concurrency control mechanism


Dependence recorder records happens-before edges
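A dependence recorder logs the happens-before edges that cross-thread accesses induce. The following Java sketch is hypothetical and deliberately simplified: rather than hooking into Octet's coordination protocol, it conservatively logs an edge whenever an object is accessed by a thread other than the last thread to access it, identifying each endpoint by a per-thread access count. The names `DependenceRecorder`, `access`, and `Edge` are illustrative, and `record` requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical recorder: logs a happens-before edge whenever an object's last accessor changes.
final class DependenceRecorder {
    // Edge from (srcThread, srcEvent) to (dstThread, dstEvent); event = per-thread access count.
    record Edge(int srcThread, long srcEvent, int dstThread, long dstEvent) {}

    private final Map<Integer, Long> events = new HashMap<>();    // per-thread access counters
    private final Map<String, Integer> owner = new HashMap<>();    // object id -> last accessing thread
    private final Map<String, Long> ownerEvent = new HashMap<>();  // object id -> that thread's event
    final List<Edge> edges = new ArrayList<>();

    // Called on each instrumented access to object `obj` by thread `t`.
    void access(String obj, int t) {
        long now = events.merge(t, 1L, Long::sum);
        Integer prev = owner.get(obj);
        if (prev != null && prev != t) {
            // Conservative: any change of accessor counts as a cross-thread dependence.
            edges.add(new Edge(prev, ownerEvent.get(obj), t, now));
        }
        owner.put(obj, t);
        ownerEvent.put(obj, now);
    }
}
```

Because it fires on every change of accessor, including a read following a read, this sketch records a superset of the true dependences: sound, but less precise than necessary.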

Implementation in Jikes RVM
• Publicly available: http://jikesrvm.org/Research+Archive
• Parallel programs: DaCapo Benchmarks 2006 & 2009, SPEC JBB 2000 & 2005
• Parallel platform: 32 cores (AMD Opteron 6272)

[Figure: Overhead (%) of the "Pessimi…" configuration (legend label truncated) per benchmark: eclipse6, hsqldb6, lusea…, xalan6, avrora9, jython9, luindex9, lusea…, pmd9, sunflow9, xalan9, jbb2000, jbb2005, geo. Y-axis 0 to 1000%; off-scale values annotated as 34,600% and 3,000%.]

[Figure, built up over three slides: Overhead (%) on the same benchmarks, y-axis 0 to 120%; series added in turn: Octet w/o coord, Octet, Recorder.]

Octet helps enable practical runtime support for reliable, scalable concurrency

