
Analysis of Multithreaded Programs

Martin Rinard
Laboratory for Computer Science
Massachusetts Institute of Technology

What is a multithreaded program?

• Multiple parallel threads of control
• Shared mutable memory (read/write)
• Lock acquire and release

NOT general parallel programs:
• No message passing
• No tuple spaces
• No functional programs
• No concurrent constraint programs

NOT just multiple threads of control:
• No continuations
• No reactive systems

Why do programmers use threads?

• Performance (parallel computing programs)
  • Single computation
  • Execute subcomputations in parallel
  • Example: parallel sort

• Program structuring mechanism (activity management programs)
  • Multiple activities
  • Thread for each activity
  • Example: web server

• Properties have big impact on analyses

Practical Implications

• Threads are useful and increasingly common
  • POSIX threads standard for C, C++
  • Java has built-in thread support
  • Widely used in industry

• Threads introduce complications
  • Programs viewed as more difficult to develop
  • Analyses must handle new model of execution
  • Lots of interesting and important problems!

Outline

• Examples of multithreaded programs
  • Parallel computing program
  • Activity management program

• Analyses for multithreaded programs
• Handling data races
• Future directions

Parallel Sort

Example - Divide and Conquer Sort

Input:    4 7 6 1 5 3 8 2
Divide:   4 7 | 6 1 | 5 3 | 8 2
Conquer:  4 7 | 1 6 | 3 5 | 2 8
Combine:  1 4 6 7 | 2 3 5 8
Result:   1 2 3 4 5 6 7 8

Divide and Conquer Algorithms

• Lots of Recursively Generated Concurrency
• Recursively Solve Subproblems in Parallel
• Combine Results in Parallel

“Sort n Items in d, Using t as Temporary Storage”

void sort(int *d, int *t, int n) {
  if (n > CUTOFF) {
    spawn sort(d, t, n/4);
    spawn sort(d+n/4, t+n/4, n/4);
    spawn sort(d+2*(n/4), t+2*(n/4), n/4);
    spawn sort(d+3*(n/4), t+3*(n/4), n-3*(n/4));
    sync;
    spawn merge(d, d+n/4, d+n/2, t);
    spawn merge(d+n/2, d+3*(n/4), d+n, t+n/2);
    sync;
    merge(t, t+n/2, t+n, d);
  } else insertionSort(d, d+n);
}

Divide array into subarrays and recursively sort subarrays in parallel
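The Cilk-style spawn/sync pseudocode maps naturally onto Java's fork/join framework. A minimal sketch, assuming a binary split and a simple two-way merge instead of the slide's four-way division (the class and method names here are illustrative, not from the talk):

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class ParSort extends RecursiveAction {
    static final int CUTOFF = 8;
    final int[] d, t;   // data array and temporary storage, as in the slide
    final int lo, hi;   // sort the half-open range d[lo..hi)

    ParSort(int[] d, int[] t, int lo, int hi) {
        this.d = d; this.t = t; this.lo = lo; this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= CUTOFF) {        // "use a simple sort for small problem sizes"
            Arrays.sort(d, lo, hi);
            return;
        }
        int mid = (lo + hi) >>> 1;
        invokeAll(new ParSort(d, t, lo, mid),    // "spawn ... spawn ... sync"
                  new ParSort(d, t, mid, hi));
        merge(d, lo, mid, hi, t);                // merge sorted halves into t
        System.arraycopy(t, lo, d, lo, hi - lo); // write result back into d
    }

    static void merge(int[] d, int lo, int mid, int hi, int[] t) {
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) t[k++] = d[i] <= d[j] ? d[i++] : d[j++];
        while (i < mid) t[k++] = d[i++];
        while (j < hi)  t[k++] = d[j++];
    }

    public static void main(String[] args) {
        int[] a = {4, 7, 6, 1, 5, 3, 8, 2, 9, 0};
        ForkJoinPool.commonPool().invoke(new ParSort(a, new int[a.length], 0, a.length));
        System.out.println(Arrays.toString(a));
    }
}
```

As in the slide, the subproblems are identified by index ranges into a single shared array, and the parallel tasks update disjoint regions without synchronization.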

Subproblems Identified Using Pointers Into the Middle of the Array

d        d+n/4    d+n/2    d+3*(n/4)
4 7      6 1      5 3      8 2

Sorted Results Written Back Into the Input Array

d        d+n/4    d+n/2    d+3*(n/4)
4 7      1 6      3 5      2 8

“Merge Sorted Quarters of d Into Halves of t”

d: 4 7 1 6 3 5 2 8

t          t+n/2
1 4 6 7    2 3 5 8

“Merge Sorted Halves of t Back Into d”

d: 1 2 3 4 5 6 7 8

t          t+n/2
1 4 6 7    2 3 5 8

“Use a Simple Sort for Small Problem Sizes”

When n is at most CUTOFF, the whole range is sorted directly:

  } else insertionSort(d, d+n);

d                  d+n
4 7 6 1 5 3 8 2

Key Properties of Parallel Computing Programs

• Structured form of multithreading
  • Parallelism confined to small region
  • Single thread coming in
  • Multiple threads exist during computation
  • Single thread going out

• Deterministic computation
  • Tasks update disjoint parts of data structure in parallel without synchronization

• May also have parallel reductions

Web Server

Main Loop:
  Accept new connection
  Start new client thread
  (repeat)

Client Threads (one per accepted connection):
  Wait for input
  Produce output
  (repeat)

Main Loop

class Main {
  static public void loop(ServerSocket s) {
    Counter c = new Counter();
    while (true) {
      Socket p = s.accept();
      Worker t = new Worker(p, c);
      t.start();
    }
  }
}

Accept new connection
Start new client thread

Worker Threads

class Worker extends Thread {
  Socket s; Counter c;
  public void run() {
    out = s.getOutputStream();
    in = s.getInputStream();
    while (true) {
      inputLine = in.readLine();
      if (inputLine == null) break;
      c.increment();
      out.writeBytes(inputLine + "\n");
    }
  }
}

Wait for input

Increment counter

Produce output

Synchronized Shared Counter

class Counter {
  int contents = 0;
  synchronized void increment() {
    contents++;
  }
}

Acquire lock

Increment counter

Release lock
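The synchronized increment is exactly the acquire/increment/release sequence shown above, so concurrent increments are never lost. A minimal runnable sketch (the thread and iteration counts are arbitrary choices for this example):

```java
class SyncCounter {
    private int contents = 0;

    // Acquire lock, increment counter, release lock: one atomic step.
    synchronized void increment() { contents++; }
    synchronized int get() { return contents; }

    public static void main(String[] args) throws InterruptedException {
        SyncCounter c = new SyncCounter();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < 10_000; k++) c.increment();
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        // Every increment is atomic, so no updates are lost.
        System.out.println(c.get());
    }
}
```

Without the synchronized keyword, the read-modify-write of contents++ could interleave across threads and the final count would be timing-dependent.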

Simple Activity Management Programs

• Fixed, small number of threads
• Based on functional decomposition

User Interface Thread
Device Management Thread
Compute Thread

Key Properties of Activity Management Programs

• Threads manage interactions
  • One thread per client or activity
  • Blocking I/O for interactions

• Unstructured form of parallelism

• Object is unit of sharing
  • Mutable shared objects (mutual exclusion)
  • Private objects (no synchronization)
  • Read shared objects (no synchronization)
  • Inherited objects passed from parent to child

Why analyze multithreaded programs?

• Discover or certify absence of errors (multithreading introduces new kinds of errors)

• Discover or verify application-specific properties (interactions between threads complicate analysis)

• Enable optimizations (new kinds of optimizations with multithreading; complications with traditional optimizations)

Classic Errors in Multithreaded Programs

Deadlocks
Data Races

Deadlock

Thread 1:        Thread 2:
lock(l);         lock(m);
lock(m);         lock(l);
x = x + y;       y = y * x;
unlock(m);       unlock(l);
unlock(l);       unlock(m);

Deadlock if circular waiting for resources (typically mutual exclusion locks):
1. Threads 1 and 2 start execution
2. Thread 1 acquires lock l
3. Thread 2 acquires lock m
4. Thread 1 holds l and waits for m, while Thread 2 holds m and waits for l
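The standard programming fix is a global lock-acquisition order: if every thread acquires l before m, the circular wait cannot arise. A hedged sketch of the repaired program (the initial values of x and y are chosen just for illustration):

```java
class LockOrdering {
    static final Object l = new Object(), m = new Object();
    static int x = 1, y = 2;

    public static void main(String[] args) throws InterruptedException {
        // Both threads follow the same order: acquire l first, then m.
        Thread t1 = new Thread(() -> {
            synchronized (l) { synchronized (m) { x = x + y; } }
        });
        Thread t2 = new Thread(() -> {
            synchronized (l) { synchronized (m) { y = y * x; } }
        });
        t1.start(); t2.start();
        t1.join(); t2.join();   // always terminates: no circular wait is possible
        // y is 2 whenever x = x + y executes (either order), so x is always 3;
        // the final y is 2 or 6 depending on which critical section runs first.
        System.out.println(x);
    }
}
```

Note that the ordering discipline removes the deadlock but not the nondeterminism: the two critical sections may still run in either order.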

Data Races

A[i] = v;  ||  A[j] = w;

Data race if two parallel threads access the same memory location and at least one access is a write:
• Concurrent accesses to the same location (i == j): data race
• Accesses to different locations, or accesses ordered by synchronization: no data race
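The no-race case of the definition can be made concrete: when the two indices differ, the writes touch disjoint locations, so the outcome is deterministic. A small runnable illustration (the array size and values are arbitrary):

```java
class DisjointWrites {
    public static void main(String[] args) throws InterruptedException {
        int[] A = new int[4];
        final int i = 0, j = 3, v = 10, w = 20;   // i != j: disjoint locations, no data race

        Thread t1 = new Thread(() -> A[i] = v);
        Thread t2 = new Thread(() -> A[j] = w);
        t1.start(); t2.start();
        t1.join(); t2.join();   // join orders both writes before the reads below

        // Deterministic result: each thread wrote its own element.
        System.out.println(A[0] + " " + A[3]);
    }
}
```

With i == j the two writes would race and the final value would depend on timing.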

Synchronization and Data Races

Thread 1:        Thread 2:
lock(l);         lock(l);
x = x + 1;       x = x + 2;
unlock(l);       unlock(l);

No data race if synchronization separates accesses.

Synchronization protocol:
• Associate lock with data
• Acquire lock to update data atomically

Why are data races errors?

• Correct programs exist which contain races

• But most races are programming errors
  • Code intended to execute atomically
  • Synchronization omitted by mistake

• Consequences can be severe
  • Nondeterministic, timing-dependent errors
  • Data structure corruption
  • Complicates analysis and optimization

Overview of Analyses for Multithreaded Programs

Key problem: interactions between threads

• Flow-insensitive analyses
  • Escape analyses

• Dataflow analyses
  • Explicit parallel flow graphs
  • Interference summary analysis

• State space exploration

Escape Analyses

(figure: program call graph with allocation sites in methods main(i,j), compute(d,e), multiplyAdd(a,b,c), multiply(m), add(u,v), evaluate(i,j), abs(r), scale(n,m))

Program With Allocation Sites

Correlate lifetimes of objects with lifetimes of computations: objects allocated at a given site do not escape the computation of a given method.

Classical Approach

• Reachability analysis
  • If an object is reachable only from local variables of the current procedure, then the object does not escape that procedure
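The reachability test can be sketched as a plain graph search: mark everything reachable from escape roots (static fields, and in the multithreaded setting, thread objects); allocation sites whose objects are reachable only from locals are captured. The object graph below is a made-up toy example, not from the talk:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class EscapeCheck {
    public static void main(String[] args) {
        // Toy points-to graph: object -> objects its fields reference.
        Map<String, List<String>> fields = Map.of(
            "staticRoot", List.of("shared"),
            "shared",     List.of("node"),
            "local",      List.of("temp"));

        // An object escapes if it is reachable from an escape root
        // (here a single static field; thread objects would also be roots).
        Set<String> escaped = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(List.of("staticRoot"));
        while (!work.isEmpty()) {
            String o = work.pop();
            if (!escaped.add(o)) continue;
            for (String succ : fields.getOrDefault(o, List.of())) work.push(succ);
        }

        System.out.println(escaped.contains("node")); // reachable from a static field
        System.out.println(escaped.contains("temp")); // reachable only from a local
    }
}
```

Objects like "temp" that the search never reaches are the ones the later slides call captured: accesses to them can be ruled out as sources of data races.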

Escape Analysis for Multithreaded Programs

• Extend analysis to recognize when objects do not escape to parallel thread (OOPSLA 1999)
  • Blanchet
  • Bogda, Hoelzle
  • Choi, Gupta, Serrano, Sreedhar, Midkiff
  • Whaley, Rinard

• Analyze interactions to recapture objects that do not escape a multithreaded subcomputation
  • Salcianu, Rinard (PPoPP 2001)

Applications

• Synchronization elimination
• Stack allocation
• Region-based allocation
• Data race detection
  • Eliminate accesses to captured objects as a source of data races

Analysis via Parallel Flow Graphs

Parallel Flow Graphs

Thread 1:        Thread 2:
p = &x           q = &a
*p = &y          *q = &b
p = &z

• Intrathread control-flow edges within each thread
• Interthread control-flow edges between statements of parallel threads
• Heap: variables p, q and locations x, y, z, a, b

Basic Idea: Do dataflow analysis on the parallel flow graph

Infeasible Paths Issue

Thread 1:        Thread 2:
p = &x           q = &a
*p = &y          *q = &b
p = &z

Infeasible paths cause the analysis to lose precision: following an infeasible path through the interthread edges, the analysis derives spurious points-to edges in the heap.

Analysis Time Issue: Potential Solutions

• Partial order approaches: remove edges between statements in independent regions
  • How to recognize independent regions?
  • Seems like it might need analysis…

Thread 1:        Thread 2:
p = &x           q = &a
*p = &y          *q = &b
p = &z

Potential Solutions

• Partial order approaches
• Control flow/synchronization analysis
  • Synchronization may prevent m from immediately preceding n in execution
  • If so, no edge from m to n

Thread 1:        Thread 2:
y = 1            x = 1
lock(a)          lock(a)
y = y + w        x = x + v
x = x + 1        y = y + 1
unlock(a)        unlock(a)

No edges between the statements inside the two critical sections.

Experience

• Lots of research in field over last two decades
  • Deadlock detection
  • Data race detection
  • Control analysis for multithreaded programs (mutual exclusion, precedence properties)
  • Finite-state properties

• Scope: simple activity management programs
  • Inlinable programs
  • Bounded threads and objects

References

• FLAVERS
  • Dwyer, Clarke (FSE 1994)
  • Naumovich, Avrunin, Clarke (FSE 1999)
  • Naumovich, Clarke, Cobleigh (PASTE 1999)

• Masticola, Ryder
  • ICPP 1990 (deadlock detection)
  • PPoPP 1993 (control-flow analysis)

• Duesterwald, Soffa (TAV 1991)
  • Handles procedures

• Blieberger, Burgstaller, Scholz (Ada Europe 2000)
  • Symbolic analysis for dynamic thread creation

Scope:
• Inlinable programs
• Bounded objects and threads

Interference Approaches

Dataflow Analysis for Bitvector Problems

• Knoop, Steffen, Vollmer (TOPLAS 1996)
• Bitvector problems
  • Dataflow information is a vector of bits
  • Transfer function for one bit does not depend on values of other bits
  • Examples: reaching definitions, available expressions

• As efficient and precise as sequential version!

Available Expressions Example

parbegin
  Thread 1:          Thread 2:
  a = x + y          x = b
  c = x + y          b = x + y
parend
d = x + y

Where is x+y available?
• Available after a = x + y (just computed)
• Not available at c = x + y: in some interleavings, x = b from the parallel thread executes in between (killed by x = b)
• Available after b = x + y (just computed)
• Not available at d = x + y (killed by x = b)

Considering the interleavings of the two threads shows that the single parallel statement x = b determines availability at c = x + y.

Key Concept: Interference

• x = b interferes with x + y
• x + y not available at any statement that executes in parallel with x = b

• Nice algorithm:
  • Precompute interference
  • Propagate information along sequential control-flow edges only!
  • Handle parallel joins specially
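The precompute-then-propagate idea for a single expression can be sketched in a few lines: first check whether any statement of the parallel thread redefines an operand; if so, the expression is unavailable at every point of this thread except immediately after it is recomputed. This is a toy illustration with a made-up statement encoding, not the actual bitvector algorithm:

```java
import java.util.List;

class InterferenceDemo {
    // Does the statement "lhs = rhs" redefine an operand of the expression?
    static boolean kills(String stmt, String expr) {
        String lhs = stmt.split("=")[0].trim();
        return expr.contains(lhs);
    }

    // Does the statement compute the expression?
    static boolean generates(String stmt, String expr) {
        return stmt.split("=")[1].trim().equals(expr);
    }

    public static void main(String[] args) {
        List<String> thread1 = List.of("a = x + y", "c = x + y");
        List<String> thread2 = List.of("x = b", "b = x + y");
        String expr = "x + y";

        // Precompute interference once, instead of analyzing interleavings.
        boolean interference = thread2.stream().anyMatch(s -> kills(s, expr));

        // Propagate availability along thread1's sequential edges only.
        boolean availBefore = false;
        for (String stmt : thread1) {
            System.out.println("before " + stmt + " : " + availBefore);
            availBefore = generates(stmt, expr) && !kills(stmt, expr) && !interference;
        }
        System.out.println("interference = " + interference);
    }
}
```

Because x = b interferes, availability never survives a sequential edge of thread1; without the interfering statement, x + y would be available at c = x + y.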

Limitations

• No procedures
• Bitvector problems only (no pointer analysis)
• But can remove these limitations
  • Integrate interference into abstraction
  • Adjust rules to flow information from end of thread to start of parallel threads
  • Iteratively compute interactions
  • Summary-based approach for procedures
  • Lose precision for non-bitvector problems

Pointer Analysis for Multithreaded Programs

• Dataflow information is a triple <C, I, E>:
  • C = current points-to information
  • I = interference points-to edges from parallel threads
  • E = set of points-to edges created by current thread

• Interference: I_k = ∪_{j ≠ k} E_j, where t_1 … t_n are the n parallel threads

• Invariant: I ⊆ C
  • Within each thread, interference points-to edges are always added to the current information

Analysis for Example

p = &x;
parbegin
  Thread 1: *p = 1;
  Thread 2: p = &y; *p = 2;
parend

Where does p point to at *p = 1? Where does p point to at *p = 2?

Analysis of Parallel Threads

• After p = &x, the parent triple is <{p→x}, ∅, {p→x}>
• Thread 2 creates the edge p→y; this edge becomes interference for Thread 1
• At *p = 1 in Thread 1: C = {p→x, p→y}, so p may point to x or to y (the parallel p = &y may execute first)
• At *p = 2 in Thread 2: C = {p→y}, since p = &y replaces p's edges within the thread

Analysis of Thread Joins

• At parend the thread results are combined; conservatively, p may point to x or to y after the parallel construct

Final Result

• <{p→x, p→y}, ∅, {p→x, p→y}> at the end of the parent thread

General Dataflow Equations

Parent thread before parbegin: <C, I, E>

Thread 1 entry: <C ∪ E2, I ∪ E2, ∅>
Thread 2 entry: <C ∪ E1, I ∪ E1, ∅>

Thread 1 exit: <C1, I ∪ E2, E1>
Thread 2 exit: <C2, I ∪ E1, E2>

Parent thread after parend: <C1 ∪ C2, I, E ∪ E1 ∪ E2>
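The parend rule can be exercised on the running example with plain set operations. A sketch in which edge strings like "p->x" stand for points-to edges; the concrete sets are taken from the p = &x / p = &y example above:

```java
import java.util.HashSet;
import java.util.Set;

class ParendRule {
    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> r = new HashSet<>(a);
        r.addAll(b);
        return r;
    }

    public static void main(String[] args) {
        // Parent before parbegin, after p = &x: <C, I, E>
        Set<String> C = Set.of("p->x");
        Set<String> I = Set.of();
        Set<String> E = Set.of("p->x");

        // Edges created by each thread: Thread 1 (*p = 1) creates none,
        // Thread 2 (p = &y) creates p->y.
        Set<String> E1 = Set.of();
        Set<String> E2 = Set.of("p->y");

        // Thread 1 entry: <C ∪ E2, I ∪ E2, ∅>; p may point to x or y at *p = 1.
        Set<String> C1 = union(C, E2);
        // Thread 2: p = &y kills p->x within the thread, so C2 = {p->y} ∪ E1.
        Set<String> C2 = union(Set.of("p->y"), E1);

        // Parent after parend: <C1 ∪ C2, I, E ∪ E1 ∪ E2>.
        Set<String> afterC = union(C1, C2);
        Set<String> afterE = union(E, union(E1, E2));

        System.out.println(afterC.contains("p->x") && afterC.contains("p->y"));
        System.out.println(afterE.size());
    }
}
```

The final C component contains both p->x and p->y, matching the conservative answer that p may point to either location after the parallel construct.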

Compositionality Extension

• Compositional at thread level
  • Analyze each thread once in isolation
  • Abstraction captures potential interactions
  • Compute interactions whenever information is needed
• Combine with escape analysis to obtain partial program analysis

Experience & Expectations

• Limited implementation experience
  • Pointer analysis (Rugina, Rinard, PLDI 2000)
  • Compositional pointer and escape analysis (Salcianu, Rinard, PPoPP 2001)
  • Small but real programs

• Promising approach
  • Scales like analyses for sequential programs
  • Partial program analyses

Issues

• Developing abstractions
  • Need interference abstraction
  • Need fork/join rules
  • Need interaction analysis

• Analysis time
• Precision for richer abstractions

State Space Exploration

State Space Exploration for Multithreaded Programs

/* a controls x, b controls y */
lock a, b;
int x, y;

Thread 1:        Thread 2:
lock(a)          lock(b)
lock(b)          lock(a)
t = x            s = y
x = y            y = x
y = t            x = s
unlock(b)        unlock(a)
unlock(a)        unlock(b)

State Space Exploration

(state graph: from the initial state, interleavings of the transitions 1: lock(a), 1: lock(b), 2: lock(b), 2: lock(a); the states in which Thread 1 holds a and Thread 2 holds b are deadlocked)

Deadlocked States
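The state graph above can be searched mechanically. A minimal explicit-state search over the two threads' acquisition sequences, reporting whether a state is reachable in which every unfinished thread is blocked (lock releases are omitted to keep the sketch short):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class DeadlockSearch {
    // Lock-acquisition order of each thread: Thread 1 takes a then b,
    // Thread 2 takes b then a (the deadlock-prone program above).
    static final char[][] PROG = { {'a', 'b'}, {'b', 'a'} };

    public static void main(String[] args) {
        // owners[0] = holder of lock a, owners[1] = holder of lock b (-1 = free)
        boolean found = dfs(new int[]{0, 0}, new int[]{-1, -1}, new HashSet<>());
        System.out.println(found);
    }

    static boolean dfs(int[] pc, int[] owners, Set<String> seen) {
        if (!seen.add(Arrays.toString(pc) + Arrays.toString(owners))) return false;
        boolean someMove = false, deadlock = false, allDone = true;
        for (int t = 0; t < PROG.length; t++) {
            if (pc[t] >= PROG[t].length) continue;  // thread t finished
            allDone = false;
            int lock = PROG[t][pc[t]] - 'a';
            if (owners[lock] != -1) continue;       // lock held: thread t blocked
            someMove = true;
            int[] pc2 = pc.clone(), ow2 = owners.clone();
            ow2[lock] = t;
            pc2[t]++;
            if (dfs(pc2, ow2, seen)) deadlock = true;
        }
        // Deadlocked state: some thread still has work, but nobody can move.
        if (!allDone && !someMove) return true;
        return deadlock;
    }
}
```

The search prints true because the circular-wait interleaving (Thread 1 holds a, Thread 2 holds b) is reachable; changing Thread 2's order to {'a', 'b'} makes it unreachable.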

Strengths

• Conceptually simple (at least at first…)
• Harmony with other areas of computer science (simple search often beats more sophisticated approaches)
• Can test for lots of properties and errors
• Lots of technology and momentum in this area
  • Packaged model checkers
  • Big successes in hardware verification

Challenges
• Analysis time
• Unbounded program features
  • Dynamic thread creation
  • Dynamic object creation

• Potential solutions
  • Sophisticated abstractions (increases complexity…)
    • Cousot, Cousot (1984)
    • Chow, Harrison (POPL 1992)
    • Yahav (POPL 2001)
  • Granularity coarsening/partial-order techniques
    • Chow, Harrison (ICCL 1994)
    • Valmari (CAV 1990)
    • Godefroid, Wolper (LICS 1991)

Granularity Coarsening

Thread 1:        Thread 2:
x = 1            a = 3
y = 2            b = 4

Basic Idea: Eliminate analysis of interleavings from independent statements by treating each independent sequence as a single coarse step.

Issue: Aliasing

x = 1  ||  *p = 3

Are these two statements independent? Depends on whether p may point to x…

Potential Solution: layered analysis (Ball, Rajamani, PLDI 2001)

Program → Pointer Analysis → Model Extraction → Model Checking → Properties

Potential Problem: information from later analyses may be needed or useful in earlier analyses

Experience

• Program analysis style
  • Has been used for very detailed properties
  • Analysis time issues limit it to tiny programs

• Explicit model extraction/model checking style
  • Still exploring how to work for software in general, not just multithreaded programs
  • No special technology required for multithreaded programs (at first…)

Expectations

In principle, approach should be quite useful

• Multithreaded programs typically have sparse interaction patterns
  • Just not obvious from code
  • Need some way to target tool to only those interactions that can actually occur/are interesting

• Pointer preanalysis seems like a promising approach

Application to safety problems

• Deadlock detection
  • Variety of existing approaches
  • Complex programs can have very simple synchronization behavior
  • Ripe for model extraction/model checking

• Data race detection
  • More complicated problem
  • Largely unsolved
  • Very important in practice

Why data races are so important

• Inadvertent atomicity violations
  • Timing-dependent data structure corruption
  • Nondeterministic, irreproducible failures

• Architecture effects
  • Data races expose weak memory consistency models
  • Destroy abstraction of single shared memory

• Compiler optimization effects
  • Data races expose effect of standard optimizations
  • Compiler can change meaning of program

• Analysis complications

Atomicity Violations

class list {
  static int length = 0;
  static list head = null;
  list next; int value;
  static void insert(int i) {
    list n = new list(i);
    n.next = head;
    head = n;
    length++;
  }
}

Initial list: head points to the node containing 4, length = 1.
Now run insert(5) || insert(6). In one interleaving:

1. Both threads allocate their nodes (5 and 6) and both execute n.next = head, so both new nodes point at the node containing 4
2. Both threads execute head = n; the second assignment overwrites the first, so one of the new nodes is lost from the list
3. Both threads execute length++, so length ends at 3 even though only one insertion is still reachable

Atomicity Violation Solution

class list {
  static int length = 0;
  static list head = null;
  list next; int value;
  static synchronized void insert(int i) {
    list n = new list(i);
    n.next = head;
    head = n;
    length++;
  }
}
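With the synchronized fix, a stress test behaves deterministically: the number of reachable nodes always matches length. A runnable sketch (the thread and iteration counts are arbitrary, and the class is renamed to avoid clashing with the slide's list):

```java
class SafeList {
    static int length = 0;
    static SafeList head = null;
    SafeList next;
    int value;

    SafeList(int v) { value = v; }

    // The class-level lock makes the whole insert sequence atomic, as in the fix above.
    static synchronized void insert(int i) {
        SafeList n = new SafeList(i);
        n.next = head;
        head = n;
        length++;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] ts = new Thread[4];
        for (int t = 0; t < ts.length; t++) {
            ts[t] = new Thread(() -> {
                for (int k = 0; k < 1000; k++) insert(k);
            });
            ts[t].start();
        }
        for (Thread t : ts) t.join();

        int reachable = 0;
        for (SafeList n = head; n != null; n = n.next) reachable++;
        // No insertions are lost: 4 threads * 1000 inserts each.
        System.out.println(reachable + " " + length);
    }
}
```

Removing the synchronized keyword reintroduces the atomicity violation: the two counts can then disagree and nodes can be lost, depending on timing.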

Analysis Complications

Analysis unsound if it does not take the effect of data races into account

• Desirable to analyze program at granularity of atomic operations
  • Reduces state space
  • Required to extract interesting properties

• But must verify that operations are atomic!
  • Complicated analysis problem
  • Extract locking protocol
  • Verify that program obeys protocol

Architecture Effects

Weak Memory Consistency Models

Initially: x = 0, y = 1

Thread 1:        Thread 2:
y = 0            z = x + y
x = 1

What is the value of z?

Three interleavings (of z = x + y relative to Thread 1's writes):
• Before y = 0: z = 0 + 1 = 1
• Between y = 0 and x = 1: z = 0 + 0 = 0
• After x = 1: z = 1 + 0 = 1

So z can be 0 or 1? INCORRECT REASONING!

z can be 0 or 1 OR 2!

The memory system can reorder writes as long as it preserves the illusion of sequential execution within each thread. Different threads can observe different orders: if Thread 2 observes x = 1 before it observes y = 0, then z = 1 + 1 = 2.

Analysis Complications

• Interleaving semantics is incorrect
  • No soundness guarantee for current analyses
  • Formal semantics of weak memory consistency models still under development
    • Maessen, Arvind, Shen (OOPSLA 2000)
    • Manson, Pugh (Java Grande/ISCOPE 2001)
  • Unclear how to prove ANY analysis sound…

• State space is larger than one might think
  • Complicates state space exploration
  • Complicates human reasoning

How does one write a correct program?

Initially: x = 0, y = 1

Thread 1:        Thread 2:
y = 0            lock(l)
x = 1            unlock(l)
lock(l)          z = x + y
unlock(l)

What is value of z? z is 1.

• Operations not reordered across synchronizations
• If synchronization separates conflicting actions from parallel threads, then reorderings are not visible
• Race-free programs can use interleaving semantics

Compiler Optimization Effects

• Standard optimizations assume single thread
• With interleaving semantics, optimizations may change meaning of program
  • Even if optimizations are applied only within serial parts of program!
  • Superset of reordering effects
  • Midkiff, Padua (ICPP 1990)

Options

• Rethink and reimplement all compilers
  • Lee, Padua, Midkiff (PPoPP 1999)

• Transform program to restore sequential memory consistency model
  • Shasha, Snir (TOPLAS 1998)
  • Lee, Padua (PACT 2000)

• No optimizations across synchronizations
  • Java memory model (Pugh, JavaGrande 1999)
  • Semantics no longer interleaving semantics

Program Analysis

Analyze program, verify absence of data races

• Appealing option
• Unlikely to be feasible for full range of programs
  • Must reconstruct the association between locks, the data they protect, and the threads that access the data
    • Dynamic object and thread creation
    • References and pointers
    • Diversity of locking protocols
  • Whole-program analysis
• Exception: simple activity management programs

Eliminate races at language level

• Type system formalizes sharing patterns
• Check that accesses are properly synchronized
• Not as difficult as fully automatic approach
  • Separate analysis of each module
  • No need to reconstruct locking protocol
  • Types provide locking information

• Limits sharing patterns program can use
• Key question: Is the limitation worth the benefit?
  • Depends on expressiveness, flexibility, intrusiveness, perceived value of system

Standard Sharing Patterns for Activity Management Programs

• Private data: single thread ownership
• Mutual exclusion data: lock protects data, acquire lock to get ownership
• Migrating data: ownership moves between threads in response to data structure insertions and removals
• Published data: distributed for read-only access

General Principle of Ownership

• Formalize as ownership relation
  • Relation between data items and threads

• Basic requirement for reads
  • When a thread reads a data item, it must own the item (but can share ownership with other threads)

• Basic requirement for writes
  • When a thread writes a data item, it must be the sole owner of the item

Typical Actions to Change Ownership

• Object creation (creator owns new object)
• Synchronization operations
  • Lock acquire (acquire data that lock protects)
  • Lock release (release data)
  • Similarly for post/wait, Ada accept, …
• Thread creation (thread inherits data from parent)
• Thread termination (parent gets data back)
• Unique reference acquisition and release (acquire or release referenced data)
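The two basic requirements can be phrased directly as set conditions on the ownership relation: reading needs membership, writing needs sole ownership. A toy dynamic checker that makes this concrete (the item names and thread ids are invented for this sketch):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class OwnershipCheck {
    // Ownership relation: data item -> set of owning threads.
    static final Map<String, Set<Integer>> owners = new HashMap<>();

    static boolean canRead(String item, int thread) {
        // Reads: must own the item, possibly sharing ownership.
        return owners.getOrDefault(item, Set.of()).contains(thread);
    }

    static boolean canWrite(String item, int thread) {
        // Writes: must be the sole owner.
        return owners.getOrDefault(item, Set.of()).equals(Set.of(thread));
    }

    public static void main(String[] args) {
        owners.put("x", new HashSet<>(Set.of(1)));  // thread 1 creates x: sole owner

        System.out.println(canWrite("x", 1));       // sole owner may write
        owners.get("x").add(2);                     // publish x for shared reading
        System.out.println(canRead("x", 2));        // shared owner may read
        System.out.println(canWrite("x", 1));       // no longer sole owner
    }
}
```

The actions listed above (creation, lock acquire/release, thread creation and termination, unique-reference transfer) are exactly the points where such a relation would be updated; a static type system checks the same conditions at compile time instead of at run time.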

Proposed Systems

• Monitors + copy in/copy out
  • Concurrent Pascal (Brinch Hansen, TSE 1975)
  • Guava (Bacon, Strom, Tarafdar, OOPSLA 2000)

• Mutual exclusion data + private data
  • Flanagan, Abadi (ESOP 2000)
  • Flanagan, Freund (PLDI 2000)

• Mutual exclusion data + private data + linear/ownership types
  • DeLine, Fahndrich (PLDI 2001)
  • Boyapati, Rinard (OOPSLA 2001)

Thread + Private Data

•Private data identified as such in type system

•Type system ensures reachable only from•Local variables•Other private data

Lock + Shared Data•Type system identifies correspondence

•Type system ensures• Threads hold lock

when access data• Data accessible only

from other data protected by same lock

Copy model of communicati

on

Basic Approach

Extension: Unique References

• Type system ensures at most one reference to the object
• To move a uniquely referenced object from one thread to another:
  • Step One: Grab lock
  • Step Two: Transfer reference
  • Step Three: Release lock
• Result: Object transferred; the ownership relation changes over time
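The three-step transfer can be sketched in plain Java, with the type system's role noted in comments (names here are hypothetical, not from the cited papers):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of transferring a unique reference between threads through a
// lock-protected container. A unique-reference type system would check
// statically that the sender retains no alias after the transfer.
class Message {
    final String payload;
    Message(String payload) { this.payload = payload; }
}

class TransferQueue {
    private final Deque<Message> queue = new ArrayDeque<>(); // shared, guarded by 'this'

    void put(Message m) {
        synchronized (this) {         // Step One: grab lock
            queue.addLast(m);         // Step Two: transfer the reference
        }                             // Step Three: release lock
        // The sender must not touch 'm' here: ownership has moved.
    }

    Message take() {
        synchronized (this) {
            return queue.pollFirst(); // receiver now privately owns the message
        }                             // (null if the queue is empty)
    }
}
```

Once the receiver removes the message, it holds the only reference, so it may read and write the object without further synchronization.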

Prospects

• Remaining challenge: general data structures
  • Objects with multiple references
  • Ownership changes correlated with movement between data structures
  • Recognize insertions and deletions
• Language-level solutions are the way to go for activity management programs
  • Tractable for typical sharing patterns
  • Big impact in practice

Benefits of ownership formalization

• Identification of atomic regions
  • Weak memory invisible to the programmer
  • Enables coarse-grain program analysis
• Promotes lots of new and interesting analyses
  • Component interaction analyses
  • Object propagation analyses
• Better understanding of software structure
  • Analysis and transformation
  • Software engineering

What about parallel computing programs?

Parallel Computing Sharing Patterns

• Specialized sharing patterns
  • Unsynchronized accesses to disjoint regions of a single aggregate structure
    • Threads update disjoint regions of an array
    • Threads update disjoint subtrees
• Generalized reductions
  • Commuting updates
  • Reduction trees
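The disjoint-regions pattern can be sketched as follows (a minimal illustration with hypothetical names): two threads update non-overlapping halves of one array with no synchronization at all.

```java
// Sketch of the "disjoint regions" pattern: two threads square the elements
// of disjoint halves of one array without synchronization. Race freedom
// rests on the index ranges never overlapping, a property of the updated
// data structure rather than of any lock discipline.
class DisjointUpdate {
    static void squareAll(int[] a) {
        int mid = a.length / 2;
        Thread left  = new Thread(() -> { for (int i = 0;   i < mid;      i++) a[i] *= a[i]; });
        Thread right = new Thread(() -> { for (int i = mid; i < a.length; i++) a[i] *= a[i]; });
        left.start(); right.start();
        try { left.join(); right.join(); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```

No lock protects `a`, yet the program is race-free; proving that requires reasoning about the index ranges, which is exactly why this pattern resists language-level solutions.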

Parallel Computing Prospects

• No language-level solution likely to be feasible
  • Race freedom depends on arbitrarily complicated properties of the updated data structures
• Impact of data races not as large
  • Parallelism confined to specific algorithms
• Range of targeted analysis algorithms
  • Parallel loops with dense matrices
  • Divide and conquer programs
  • Generalized reduction recognition
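The target of generalized reduction recognition can be sketched like this (hypothetical names): each thread folds into a private partial sum, and the partials are combined afterwards with a commuting operator, so the result is schedule-independent.

```java
// Sketch of the pattern reduction recognition looks for: private
// accumulation per thread, then a combine step using a commuting
// operator (+), so the final result does not depend on interleaving.
class SumReduction {
    static long sum(long[] data, int nThreads) {
        long[] partial = new long[nThreads];           // one private slot per thread
        Thread[] ts = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            final int id = t;
            ts[t] = new Thread(() -> {
                long s = 0;                            // thread-private accumulator
                for (int i = id; i < data.length; i += nThreads) s += data[i];
                partial[id] = s;                       // disjoint slot: no race
            });
            ts[t].start();
        }
        try { for (Thread th : ts) th.join(); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        long total = 0;
        for (long p : partial) total += p;             // commuting combine step
        return total;
    }
}
```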

Future Directions

Integrating Specifications

• Past focus: discovering properties
• Future focus: verifying properties
  • Understanding atomicity structure crucial
• Assume race-free programs
  • Type system or previous analysis
• Enable Owicki/Gries style verification
  • Assume the property holds
  • Show that each atomic action preserves it
  • Consider only actions that affect the property
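The Owicki/Gries recipe on a race-free class can be sketched as follows (class and invariant chosen for illustration, not taken from the talk): assume the invariant holds, then check that each atomic action, considered in isolation, preserves it.

```java
// Sketch of Owicki/Gries style reasoning. Invariant: balance >= 0.
// Every field access is inside a synchronized method, so each method
// body is an atomic action; verification checks each action alone.
class Account {
    private int balance = 0;   // invariant: balance >= 0, guarded by 'this'

    synchronized void deposit(int amount) {
        if (amount > 0) balance += amount;     // adds a positive value: preserves invariant
    }

    synchronized boolean withdraw(int amount) {
        if (amount > 0 && amount <= balance) { // guard ensures balance stays >= 0
            balance -= amount;
            return true;
        }
        return false;                          // action refused: invariant untouched
    }

    synchronized int balance() { return balance; }
}
```

Race freedom is what makes this compositional: since no action can observe another mid-flight, only whole atomic actions need to be checked against the invariant.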

Failure Containment

• Threads as the unit of partial failure
  • Partial executions of failed atomic actions
  • Rollback mechanism
  • Optimization opportunity
• New analyses and transformations
  • Failure propagation analysis
  • Failure response transformations

Model Checking

• Avalanche of model checking research
  • Layered analyses for model extraction
  • Flow-insensitive pointer analysis
• Initial focus on control problems
  • Deadlock detection
  • Operation sequencing constraints
  • Checking finite-state properties
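The kind of finite-state control property these checkers target can be sketched with lock ordering (a minimal illustration, names hypothetical): if every thread acquires the two locks in the same global order, no cyclic wait, and hence no deadlock, can arise; a model checker flags code paths that acquire them in opposite orders.

```java
// Sketch of a lock-ordering discipline, the sort of finite-state control
// property deadlock detection checks: A is always taken before B.
class Ordered {
    static final Object A = new Object();
    static final Object B = new Object();

    static void withBothLocks(Runnable work) {
        synchronized (A) {          // every thread takes A first,
            synchronized (B) {      // then B: no cyclic wait is possible
                work.run();
            }
        }
    }
}
```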

Steps towards practicality

• Java threads prompt experimentation
  • Threads as a standard part of a safe language
  • Available multithreaded benchmarks
  • Open Java implementation platforms
• More implementations
  • Interprocedural analyses
  • Scalability emerges as a key concern
  • Directs analyses to relevant problems

Summary

• Multithreaded programs common and important
• Two kinds of multithreaded programs
  • Parallel computing programs
  • Activity management programs
• Data races as the key analysis problem
  • Programming errors
  • Complicate analysis and transformation
• Different solutions for different programs
  • Language solution for activity management
  • Targeted analyses for parallel computing
• Future directions: specifications, failure containment, model checking, practical implementations
