A Case for an Interleaving Constrained Shared-Memory Multi-Processor

Jie Yu and Satish Narayanasamy

University of Michigan




Why is Parallel Programming Hard?

• Is single-threaded programming relatively easy?
  – Verification is NP-hard
  – BUT, properties such as a function's pre/post-conditions and loop invariants are verifiable in polynomial time

• Parallel programming is harder
  – Verifying properties for even small code regions is NP-hard
  – Reason: an unbounded number of legal thread interleavings is exposed to the parallel runtime
  – Impractical to test/verify properties for all legal interleavings
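To see why testing every legal interleaving is impractical, count them under a simplifying assumption that does not come from the slides: every global order that respects each thread's program order is legal (no locks or other synchronization). Threads executing n_1, …, n_k operations then admit (n_1 + ⋯ + n_k)! / (n_1! ⋯ n_k!) interleavings; two threads of 10 operations each already have C(20, 10) = 184,756. The helper `count_interleavings` is hypothetical and computes this multinomial coefficient as a product of binomial coefficients:

```python
from math import comb

def count_interleavings(ops_per_thread):
    """Count global orders that respect every thread's program order.

    Assumes no synchronization, so every such order is legal; real programs
    with locks admit fewer. Equals the multinomial coefficient
    (n_1 + ... + n_k)! / (n_1! * ... * n_k!).
    """
    total, placed = 1, 0
    for n in ops_per_thread:
        placed += n
        # Choose which of the first `placed` slots hold this thread's ops.
        total *= comb(placed, n)
    return total

for threads in (2, 4, 8):
    print(threads, "threads x 10 ops:", count_interleavings([10] * threads))
```

The count grows combinatorially with both the number of threads and the length of each thread, which is the gap between "tested" and "legal" interleavings that the next slides address.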


Legal Thread Interleavings

Too much freedom given to the parallel runtime?

• Tested correct interleavings
• Incorrect interleavings found during testing
• Incorrect interleavings eliminated by adding synchronization constraints
• Untested interleavings: the cause of concurrency bugs


Solution: Limit Freedom

• Programmer tests as many legal interleavings as practically possible
• Interleaving constraints from correct test runs are encoded in the program binary
• Runtime system avoids untested interleavings, i.e., avoids corner cases


Result of Constraining Interleavings

• A majority of concurrency bugs are avoidable
  – Data races, atomicity violations, and also order violations

• Performance overhead is low
  – Untested interleavings in well-tested programs are likely to manifest rarely
  – Processor support helps reduce the cost of enforcing interleaving constraints


Challenges

• How to encode tested interleavings in a program's binary?
  – Predecessor Set (PSet) interleaving constraints

• How to efficiently enforce interleaving constraints at runtime?
  – Detect violations of PSet constraints using processor support
  – Avoid violations by stalling or using rollback-and-re-execution support


Encoding Tested Interleavings

• Interleaving constraints from test runs
  – Too specific to a test input: performance loss for a different input
  – Too generic: might allow untested interleavings

• Predecessor Set (PSet)
  – PSet(m) is defined for each static memory operation m
  – pred ∈ PSet(m) if m is immediately and remotely (i.e., from a different thread) memory dependent on pred in at least one tested execution
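The PSet definition can be sketched as a pass over one tested execution. This is a minimal sketch, not the authors' tool: `Access` and `build_psets` are hypothetical names, and "immediately dependent" is modeled by an assumed rule (a read depends on the last write to its location; a write depends on every read since the last write, or on that write if no read intervened), chosen so that it reproduces the MySQL example PSets later in the deck. Because pred only has to appear in at least one tested execution, PSets from several runs would then be merged by set union.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Access:
    thread: int
    op: str        # static memory operation label, e.g. "W1"
    addr: str      # memory location
    is_write: bool

def build_psets(trace):
    """Derive PSet constraints from one tested execution.

    `trace` lists dynamic accesses in observed global order. Assumed rule for
    "immediately dependent": a read depends on the last write to its location;
    a write depends on every read since the last write, or on that write if no
    read intervened. Only remote (cross-thread) dependences are recorded.
    """
    psets = {a.op: set() for a in trace}
    last_write = {}                   # location -> most recent write
    reads_since = defaultdict(list)   # location -> reads after that write
    for a in trace:
        if a.is_write:
            preds = reads_since[a.addr] or ([last_write[a.addr]] if a.addr in last_write else [])
            last_write[a.addr] = a
            reads_since[a.addr] = []
        else:
            preds = [last_write[a.addr]] if a.addr in last_write else []
            reads_since[a.addr].append(a)
        psets[a.op].update(p.op for p in preds if p.thread != a.thread)
    return psets

# MySQL "Correct Interleaving #1": R1 (Thread 2) runs before W1 and W2 (Thread 1).
run1 = [Access(2, "R1", "log_status", False),
        Access(1, "W1", "log_status", True),
        Access(1, "W2", "log_status", True)]
print(build_psets(run1))  # {'R1': set(), 'W1': {'R1'}, 'W2': set()}
```

W2 gets an empty PSet because its only immediate predecessor, W1, runs on the same thread; local dependences are already ordered by program order and need no constraint.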


A Test Run

[Figure: Threads 1, 2, and 3 executing memory operations W1, W2, W3 and R1, R2, R3, R4]

PSet(W1) = {}
PSet(R1) = {}
PSet(R2) = {W1}
PSet(R3) = {W1}
PSet(R4) = {}
PSet(W2) = {R3, R4}
PSet(W3) = {W2}


Enforcing Tested Interleavings

• Processor support for detecting and avoiding PSet constraint violations

• Detecting PSet constraint violations
  – For each memory location, track its last accessor
    • Cache extension
  – Detect a PSet constraint violation
    • Piggyback the last accessor on the cache coherence reply
    • The processor performs the PSet membership test by executing additional micro-ops

• Overcoming a PSet constraint violation
  – Stall
  – Re-execute using checkpoint-and-rollback support (e.g., SafetyNet, ReVive)
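A software model of the stall-based enforcement loop, under stated assumptions: a backward scan of an execution log stands in for the per-location last-accessor metadata kept by the cache extension, `violates` plays the role of the PSet membership micro-ops, and a thread whose next access would violate its PSet is stalled while another thread runs. `Access`, `violates`, and `run_with_stall` are hypothetical names, and the "give up" branch is only a placeholder for whatever fallback a real design uses. The usage applies the PSets from the MySQL case study to its buggy schedule W1, R1, W2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Access:
    thread: int
    op: str        # static operation label, e.g. "R1"
    addr: str      # memory location
    is_write: bool

def last_conflicting(executed, a):
    """Latest executed access to a.addr that conflicts with `a` (at least one
    is a write). A log scan stands in for per-location last-accessor metadata."""
    for b in reversed(executed):
        if b.addr == a.addr and (a.is_write or b.is_write):
            return b
    return None

def violates(executed, a, psets):
    """PSet membership test: a remote predecessor must be listed in PSet(a)."""
    pred = last_conflicting(executed, a)
    return (pred is not None and pred.thread != a.thread
            and pred.op not in psets.get(a.op, set()))

def run_with_stall(programs, attempt_order, psets):
    """Run per-thread programs, stalling any thread whose next access would
    violate its PSet. `attempt_order` lists the thread the unconstrained
    runtime would pick at each step. Returns (executed, number of stalls)."""
    pcs = {t: 0 for t in programs}
    executed, stalls = [], 0
    picks = iter(attempt_order)
    while any(pcs[t] < len(programs[t]) for t in programs):
        preferred = next(picks, None)
        live = [t for t in programs if pcs[t] < len(programs[t])]
        candidates = ([preferred] if preferred in live else []) + [t for t in live if t != preferred]
        for t in candidates:
            a = programs[t][pcs[t]]
            if violates(executed, a, psets):
                stalls += 1          # hold this thread back, try another
                continue
            executed.append(a)
            pcs[t] += 1
            break
        else:
            # Placeholder: a real design needs a fallback (e.g., rollback) here.
            raise RuntimeError("every runnable thread would violate a PSet")
    return executed, stalls

# MySQL example: the unconstrained schedule W1, R1, W2 is the buggy one.
W1 = Access(1, "W1", "log_status", True)   # log_status = LOG_CLOSED
W2 = Access(1, "W2", "log_status", True)   # log_status = LOG_OPEN
R1 = Access(2, "R1", "log_status", False)  # log_status != LOG_CLOSED ?
psets = {"W1": {"R1"}, "W2": set(), "R1": {"W2"}}
order, stalls = run_with_stall({1: [W1, W2], 2: [R1]}, [1, 2, 1], psets)
print([a.op for a in order], stalls)  # ['W1', 'W2', 'R1'] 1
```

Stalling suffices here because R1's tested predecessor, W2, has not executed yet, so holding R1 back lets Thread 1 reach it.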


Two Case Studies

• Case Study 1: An atomicity violation bug in MySQL
  – Avoided using stall

• Case Study 2: An order violation bug in Mozilla
  – Neither a data race nor an atomicity violation
  – Avoided using rollback and re-execution



An Atomicity Violation Bug in MySQL

Thread 1 (sql/log.cc):

MYSQL_LOG::new_file() {
  …
  close();      // sets …log_status = LOG_CLOSED;…   (W1)
  open(…);      // sets …log_status = LOG_OPEN;…     (W2)
  …
}

Thread 2 (sql/sql_insert.cc):

mysql_insert(…) {
  …
  if (log_status != LOG_CLOSED) {   // (R1)
    // write into a log file
  }
  …
}


Correct Interleaving #1: "frequent", therefore likely to be tested

[Figure: Thread 2 executes R1 (log_status != LOG_CLOSED ?) before Thread 1 executes W1 (log_status = LOG_CLOSED) and W2 (log_status = LOG_OPEN)]

PSet(W1) = {R1}
PSet(W2) = {}
PSet(R1) = {}


Correct Interleaving #2: "frequent", therefore likely to be tested

[Figure: Thread 1 executes W1 (log_status = LOG_CLOSED) and W2 (log_status = LOG_OPEN) before Thread 2 executes R1 (log_status != LOG_CLOSED ?)]

PSet(W1) = {R1}
PSet(W2) = {}
PSet(R1) = {W2}   (grows from {} to {W2}; PSets accumulate across tested runs)
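PSet(W1) = {R1} on this slide carries over from Interleaving #1, matching the "in at least one tested execution" clause of the definition. A minimal sketch of that accumulation as a per-operation set union; the helper `merge_psets` is hypothetical, and reading Interleaving #2 on its own as W1, W2, R1 (so that it contributes only W2 ∈ PSet(R1)) is an assumption drawn from its PSets:

```python
def merge_psets(*runs):
    """Union per-operation PSets from several tested executions: pred belongs
    to PSet(m) if the dependence was observed in at least one run."""
    merged = {}
    for run in runs:
        for op, preds in run.items():
            # setdefault creates a fresh set, so the input dicts are not mutated.
            merged.setdefault(op, set()).update(preds)
    return merged

# PSets contributed by each MySQL interleaving on its own (assumed orders).
run1 = {"W1": {"R1"}, "W2": set(), "R1": set()}   # R1, W1, W2
run2 = {"W1": set(), "W2": set(), "R1": {"W2"}}   # W1, W2, R1
print(merge_psets(run1, run2))  # {'W1': {'R1'}, 'W2': set(), 'R1': {'W2'}}
```

Union is the natural merge because each tested run only adds dependences that are known to be safe; testing more interleavings can only relax the constraints.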


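Incorrect Interleaving: rare, and therefore likely to be untested

[Figure: Thread 1 executes W1 (log_status = LOG_CLOSED); Thread 2 then executes R1 (log_status != LOG_CLOSED ?) before Thread 1 executes W2 (log_status = LOG_OPEN)]

PSet(W1) = {R1}
PSet(W2) = {}
PSet(R1) = {W2}

Constraint violation: W1 ∉ PSet(R1), whereas W2 ∈ PSet(R1); stalling R1 until W2 has executed avoids the bug.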
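The violation on this slide is a membership test against the accumulated PSets. A minimal replay checker, assuming a trace of (thread, op, location, is_write) tuples and taking each access's predecessor to be the most recent conflicting access (at least one write) to the same location; `first_violation` is a hypothetical name:

```python
def first_violation(trace, psets):
    """Replay a global order of (thread, op, location, is_write) tuples and
    return (op, pred) for the first access whose remote predecessor is not in
    its PSet, or None if every cross-thread dependence was seen in testing."""
    history = []
    for thread, op, addr, is_write in trace:
        pred = None
        # Most recent earlier access to the same location involving a write.
        for t2, op2, addr2, w2 in reversed(history):
            if addr2 == addr and (is_write or w2):
                pred = (t2, op2)
                break
        if pred is not None and pred[0] != thread and pred[1] not in psets.get(op, set()):
            return op, pred[1]
        history.append((thread, op, addr, is_write))
    return None

psets = {"W1": {"R1"}, "W2": set(), "R1": {"W2"}}   # accumulated MySQL PSets
incorrect = [(1, "W1", "log_status", True),
             (2, "R1", "log_status", False),
             (1, "W2", "log_status", True)]
print(first_violation(incorrect, psets))  # ('R1', 'W1')
```

Both tested interleavings replay without a violation, while the rare one is flagged exactly at R1, before it reads the stale LOG_CLOSED value.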



Correct Test Run

TimerThread.cpp:

TimerThread::Run() {
  ...
  Lock(lock);
  mProcessing = TRUE;
  while (mProcessing) {
    ...
    mWaiting = TRUE;          // W
    Wait(cond, lock);
    mWaiting = FALSE;
  }
  Unlock(lock);
  ...
}

TimerThread.cpp:

TimerThread::Shutdown() {
  ...
  Lock(lock);
  mProcessing = FALSE;
  if (mWaiting)               // R
    Notify(cond, lock);
  Unlock(lock);
  ...
  mThread->Join();
  return NS_OK;
}

[Figure: W (mWaiting = TRUE) executes in one thread before R (if (mWaiting) ?) executes in the other]

PSet(W) = {}
PSet(R) = {W}


Avoiding Order Violation

[Figure: R (if (mWaiting) ?) executes before W (mWaiting = TRUE)]

PSet(W) = {}
PSet(R) = {W}

Constraint violation: R ∉ PSet(W), so execution is rolled back and re-executed so that W precedes R, as in the tested run.
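Unlike the MySQL case, the violation here is detected at W after R has already executed, so stalling W cannot restore the tested order; execution has to be rolled back. The sketch below models checkpoint-and-rollback in software under simplifying assumptions: a checkpoint is assumed to exist just before the offending predecessor, rolling back discards the accesses executed since then, and the violating access is re-executed ahead of them. `Access`, `last_conflicting`, and `run_with_rollback` are hypothetical names, the thread ids are illustrative, and the sketch does not model any specific mechanism such as SafetyNet or ReVive.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Access:
    thread: int
    op: str        # static operation label, e.g. "W"
    addr: str      # memory location
    is_write: bool

def last_conflicting(executed, a):
    """Index of the latest executed access that conflicts with `a`, or None."""
    for i in range(len(executed) - 1, -1, -1):
        b = executed[i]
        if b.addr == a.addr and (a.is_write or b.is_write):
            return i
    return None

def run_with_rollback(schedule, psets, max_rollbacks=10):
    """Run `schedule` (a global order of accesses). On a PSet violation, roll
    back to just before the offending predecessor and re-execute with the
    violating access ordered ahead of it. Returns (executed, rollbacks)."""
    pending, executed, rollbacks = list(schedule), [], 0
    while pending:
        a = pending.pop(0)
        i = last_conflicting(executed, a)
        pred = executed[i] if i is not None else None
        if pred is not None and pred.thread != a.thread and pred.op not in psets.get(a.op, set()):
            if rollbacks == max_rollbacks:
                raise RuntimeError("rollback limit reached")
            rollbacks += 1
            undone, executed = executed[i:], executed[:i]  # restore the checkpoint
            # Preserve program order: a's own undone accesses come before a.
            mine = [b for b in undone if b.thread == a.thread]
            others = [b for b in undone if b.thread != a.thread]
            pending = mine + [a] + others + pending
            continue
        executed.append(a)
    return executed, rollbacks

# Mozilla TimerThread example (thread ids are illustrative).
W = Access(1, "W", "mWaiting", True)    # mWaiting = TRUE in Run()
R = Access(2, "R", "mWaiting", False)   # if (mWaiting) in Shutdown()
psets = {"W": set(), "R": {"W"}}
order, rollbacks = run_with_rollback([R, W], psets)
print([a.op for a in order], rollbacks)  # ['W', 'R'] 1
```

After the single rollback, R re-executes after W and finds W in its PSet, so the re-execution follows the tested order.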


Methodology

• Pin-based analysis

• 17 documented bugs analyzed
  – MySQL, Apache, Mozilla, pbzip, aget, pfscan
  – Plus Parsec and Splash for the performance study

• Applications tested using regression test suites when available, otherwise random test inputs


PSet Constraints from Test Runs

• Concurrent workload
  – MySQL: run the regression test suite in parallel with OSDB
  – FFT, pbzip2: random test input


Bug Avoidance Capability

• 17 bugs from MySQL, Apache, Mozilla, pbzip, aget, pfscan

• 15/17 bugs avoided by enforcing PSet constraints
  – Including a bug that is neither a data race nor an atomicity violation

• 2/17 false negatives
  – A multi-variable atomicity violation
  – A context-sensitive deadlock bug

• 6 bugs are avoided using the stalling mechanism; the other 9 require the rollback mechanism


PSet Violations in Bug-Free Execution

• 2 PSet constraint violations in MySQL not avoided
  – MySQL: bmove512 unrolls a loop 128 times


PSet Size of Instructions

• Over 95% of the instructions have PSets of size zero
• Less than 2% of static memory instructions have a PSet of size greater than two


Summary

• Multi-threaded programming is hard
  – The existing shared-memory programming model exposes too many legal interleavings to the runtime
  – Most interleavings remain untested in production code

• Interleaving constrained shared-memory multiprocessor
  – Avoids untested (rare) interleavings to avoid concurrency bugs

• Predecessor Set interleaving constraints
  – 15/17 concurrency bugs are avoidable
  – Acceptable performance and space overhead


Thanks

• Q & A


Memory Space Overhead

Program        App. Size   # PSet Pairs   Overhead w.r.t. App. Size
Pbzip2         39 KB       201            2.16%
Aget           90 KB       365            1.69%
Pfscan         17 KB       295            7.34%
Apache         2435 KB     4119           0.69%
MySQL          4284 KB     6604           0.64%
FFT            24 KB       158            2.74%
FMM            73 KB       1764           10.13%
LU             24 KB       244            4.31%
Radix          21 KB       255            5.00%
Blackscholes   54 KB       41             0.32%
Canneal        59 KB       752            5.24%

Space overhead: in the worst case, a 10% code size increase
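The slide does not say how a PSet pair is encoded. As a consistency check on the table, the sketch below back-computes each program's overhead in bytes and divides by its pair count. Assuming 1 KB = 1024 bytes (and noting that the percentages are rounded), every row lands at roughly 4.2 to 4.4 bytes per pair, so the overhead tracks the number of pairs rather than the application size; that is why programs with many pairs relative to their size, such as FMM and Pfscan, show the largest percentages. The names `ROWS` and `bytes_per_pair` are hypothetical.

```python
# Rows from the Memory Space Overhead table:
# (program, application size in KB, number of PSet pairs, overhead in %)
ROWS = [
    ("Pbzip2", 39, 201, 2.16),
    ("Aget", 90, 365, 1.69),
    ("Pfscan", 17, 295, 7.34),
    ("Apache", 2435, 4119, 0.69),
    ("MySQL", 4284, 6604, 0.64),
    ("FFT", 24, 158, 2.74),
    ("FMM", 73, 1764, 10.13),
    ("LU", 24, 244, 4.31),
    ("Radix", 21, 255, 5.00),
    ("Blackscholes", 54, 41, 0.32),
    ("Canneal", 59, 752, 5.24),
]

def bytes_per_pair(size_kb, pairs, overhead_pct):
    """Back-compute the implied storage cost of one PSet pair.
    Assumes 1 KB = 1024 bytes; the table's percentages are rounded."""
    overhead_bytes = size_kb * 1024 * overhead_pct / 100
    return overhead_bytes / pairs

for name, kb, pairs, pct in ROWS:
    print(f"{name:<13}{bytes_per_pair(kb, pairs, pct):5.2f} bytes/pair")
```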