Pointer Analysis for Multithreaded Programs

Radu Rugina and Martin Rinard
MIT Laboratory for Computer Science

Outline

• Example
• Review of Pointer Analysis for Sequential Programs
• Pointer Analysis for Multithreaded Programs
• Experimental Results
• Conclusions

PLDI 99, R. Rugina, M. Rinard

Example

• two concurrent threads
• two questions:
  – Q1: what location is written by *p = 1?
  – Q2: what location is written by *p = 2?
  OR:
  – Q1: p → ? in the thread executing *p = 1
  – Q2: p → ? after both threads have completed

    p = &x;
    q = &p;
    parbegin
      { *p = 1; }   ||   { *q = &y; }
    parend
    *p = 2;

Two Possible Executions

• execution 1: p = &x; q = &p; *p = 1; *q = &y; *p = 2;
  – at *p = 1: p → x
  – at *p = 2: p → y
• execution 2: p = &x; q = &p; *q = &y; *p = 1; *p = 2;
  – at *p = 1: p → y
  – at *p = 2: p → y

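Analysis Result

• Result = a points-to graph at each program point:
  – after p = &x; q = &p:   p → x, q → p
  – in the thread executing *p = 1:   p → {x, y}, q → p
  – after *q = &y:   p → y, q → p
  – after parend (at *p = 2):   p → y, q → p
• answers: Q1: *p = 1 writes x or y; Q2: *p = 2 writes y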

Analysis of Multithreaded Programs

• Problem:
  – analyze interactions between concurrent threads
• Straightforward solution:
  – analyze all possible interleavings and merge the results
  – fails because of exponential complexity
  – for n threads with $s_1, \ldots, s_n$ statements:
    Number of interleavings = $\binom{s_1 + \cdots + s_n}{s_1, \ldots, s_n} = \frac{(s_1 + \cdots + s_n)!}{s_1! \cdots s_n!}$
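The following minimal C sketch evaluates the multinomial coefficient above and shows how fast it grows. The function names `binomial` and `interleavings` are hypothetical. The code assumes the result fits in 64 bits and does not check for overflow.

```c
#include <stdint.h>

/* C(n, k) by the multiplicative formula. After step i, r == C(n - k + i, i)
   (with k already reduced), so every division is exact. */
static uint64_t binomial(uint64_t n, uint64_t k)
{
    if (k > n - k)
        k = n - k;                  /* use the smaller of k and n - k */
    uint64_t r = 1;
    for (uint64_t i = 1; i <= k; i++)
        r = r * (n - k + i) / i;
    return r;
}

/* Interleavings of n threads with s[0..n-1] statements:
   (s1 + ... + sn)! / (s1! ... sn!)
   = C(s1, s1) * C(s1 + s2, s2) * ... * C(s1 + ... + sn, sn). */
uint64_t interleavings(const unsigned s[], int n)
{
    uint64_t total = 0, count = 1;
    for (int i = 0; i < n; i++) {
        total += s[i];
        count *= binomial(total, s[i]);
    }
    return count;
}
```

Two threads of 10 statements each already give 184,756 interleavings. Three threads of 2 statements each give 90. Writing the coefficient as a product of binomials avoids computing the large factorials directly.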


Our Approach

• We introduce interference information:
  – interference = points-to edges created by the other concurrent threads
  – models the effect of "all possible interleavings"
• Efficiency: polynomial complexity in program size
• Derive dataflow equations:
  – recursive equations
  – fixed-point algorithms to solve the equations
  – theoretically less precise than "all interleavings"
  – in practice: no loss of precision on the benchmarks measured


Algorithm Overview

• intra-procedural:
  – flow-sensitive (dataflow analysis)
  – handles unstructured flow of control
  – defines dataflow equations for:
    • pointer assignments
    • parallel constructs
• inter-procedural:
  – context-sensitive
  – handles recursive functions


Review of Pointer Analysis for Sequential Programs

Points-to Graphs

• Points-to graphs [EGH94]
  – nodes = program variables
  – edges = points-to relationships
• Example: [figure: points-to graph over the variables p, q, x, y]
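One compact way to store such a graph is a bit matrix, assuming at most 32 abstract locations. Each variable gets one word, and the set bits of that word are its targets. The variable numbering and helper names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical variable numbering for a tiny program (at most 32 variables). */
enum { VAR_P, VAR_Q, VAR_X, VAR_Y, NVARS };

/* A points-to graph: bit t of out[v] is set iff there is an edge v -> t. */
typedef struct {
    uint32_t out[NVARS];
} PtGraph;

static void pt_add(PtGraph *g, int from, int to) { g->out[from] |= 1u << to; }

static bool pt_has(const PtGraph *g, int from, int to)
{
    return (g->out[from] >> to) & 1u;
}

/* Number of targets of v. A load or store through v has a unique target
   exactly when this is 1. */
static int pt_targets(const PtGraph *g, int v)
{
    int n = 0;
    for (uint32_t bits = g->out[v]; bits; bits &= bits - 1)
        n++;                        /* clear the lowest set bit each round */
    return n;
}
```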


Basic Pointer Assignments

• Four types of pointer assignments:
  – x = &y (address-of assign)
  – x = y (copy assign)
  – x = *y (load assign)
  – *x = y (store assign)
• More complex assignments:
  – transformed into a sequence of basic statements, e.g.
    *z = &t;   becomes   tmp = &t; *z = tmp;
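The lowering step can be sketched with a hypothetical statement record that uses integer variable ids. The caller must supply a fresh temporary. The second function, which lowers `x = **y`, is an extra illustration and does not appear on the slide.

```c
/* The four basic pointer assignments:
   ADDR_OF x = &y, COPY x = y, LOAD x = *y, STORE *x = y. */
typedef enum { ADDR_OF, COPY, LOAD, STORE } Kind;

/* A basic statement over integer variable ids (hypothetical encoding). */
typedef struct { Kind kind; int x, y; } BasicStmt;

/* "*z = &t"  ==>  tmp = &t;  *z = tmp;   (tmp must be a fresh variable) */
int lower_store_address(int z, int t, int tmp, BasicStmt out[2])
{
    out[0] = (BasicStmt){ ADDR_OF, tmp, t };
    out[1] = (BasicStmt){ STORE, z, tmp };
    return 2;                       /* number of basic statements emitted */
}

/* "x = **y"  ==>  tmp = *y;  x = *tmp;   (extra illustration) */
int lower_double_load(int x, int y, int tmp, BasicStmt out[2])
{
    out[0] = (BasicStmt){ LOAD, tmp, y };
    out[1] = (BasicStmt){ LOAD, x, tmp };
    return 2;
}
```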


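Generated Edges

• address-of: x = &y   generates x → y; kills the old edges x → z
• copy: x = y   generates x → t for every edge y → t; kills the old edges x → z
• load: x = *y   generates x → u for every path y → t → u; kills the old edges x → z
• store: *x = y   generates z → t for every pair of edges x → z and y → t; kills the old edges z → w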


Strong vs. Weak Updates

• strong updates:
  – kill existing points-to relationships
  – result in more precise analysis results
• weak updates:
  – leave existing points-to edges in place
  – reasons for weak updates:
    • control flow uncertainty:
        if (cond) p = &q; else p = &r;
        *p = &x;
    • arrays of pointers:
        v[i] = &x;
    • heap-allocated pointers:
        p = malloc(sizeof(int*));
        *p = &x;

[figure: after the if statement p → {q, r}, so *p = &x adds q → x and r → x without killing the existing edges of q and r]
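The strong/weak decision can be sketched for the slide's control-flow case under two assumptions. First, q → y and r → z before the store; these prior edges are hypothetical. Second, a caller-filled `is_summary` table marks nodes that stand for array elements or heap cells. The encoding and names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Variables of the control-flow example (ids are hypothetical). */
enum { P, Q, R, X, Y, Z, NVARS };

typedef struct { uint32_t out[NVARS]; } PtGraph;   /* bit t of out[v]: v -> t */

/* Assumed input: nodes standing for many locations (array elements,
   heap cells); these never receive strong updates. */
static bool is_summary[NVARS];

/* Apply "*p = &x". Every target z of p gets the edge z -> x. If p has
   exactly one target and it is a single concrete location, the store
   definitely overwrites it (strong update: kill z's old edges). Otherwise
   the old edges stay (weak update). */
static void store_address_of(PtGraph *g, int p, int x)
{
    uint32_t targets = g->out[p];
    bool single = targets != 0 && (targets & (targets - 1)) == 0;
    for (int z = 0; z < NVARS; z++) {
        if (!((targets >> z) & 1u))
            continue;
        if (single && !is_summary[z])
            g->out[z] = 0;                 /* strong update: kill */
        g->out[z] |= 1u << x;              /* gen z -> x */
    }
}
```

With p → {q, r}, the store keeps q → y and r → z and adds the x edges. With p → {q} alone, q ends up pointing only to x, unless q is marked as a summary node.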


Dataflow Information

address-of: x = &y
  $gen = \{(x, y)\}$
  $kill = \{(x, z) \mid (x, z) \in C\}$
  $strong = \neg(\mathit{array\_elem}(x) \lor \mathit{heap}(x))$

copy: x = y
  $gen = \{(x, t) \mid (y, t) \in C\}$
  $kill = \{(x, z) \mid (x, z) \in C\}$

load: x = *y
  $gen = \{(x, u) \mid (y, t) \in C \land (t, u) \in C\}$
  $kill = \{(x, z) \mid (x, z) \in C\}$

store: *x = y
  $gen = \{(z, t) \mid (x, z) \in C \land (y, t) \in C\}$
  $kill = \{(z, w) \mid (x, z) \in C \land (z, w) \in C\}$
  $strong = (\{z \mid (x, z) \in C\} = \{v\}) \land \neg(\mathit{array\_elem}(v) \lor \mathit{heap}(v))$
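The following C sketch implements these gen/kill equations for the sequential case, without interference. It assumes bit-matrix graphs with at most 8 variables and a caller-filled `summary` table for array-element and heap nodes. It also assumes that copy and load use the same strong condition as address-of. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NVARS 8                                   /* hypothetical bound */
typedef struct { uint32_t out[NVARS]; } PtGraph;  /* bit t of out[v]: v -> t */
typedef enum { ADDR_OF, COPY, LOAD, STORE } Kind; /* x=&y, x=y, x=*y, *x=y */

/* Variables that stand for array elements or heap cells (assumed input). */
static bool summary[NVARS];

/* Union of the targets of every variable in the set vs. */
static uint32_t targets_of(const PtGraph *c, uint32_t vs)
{
    uint32_t t = 0;
    for (int v = 0; v < NVARS; v++)
        if (vs >> v & 1u)
            t |= c->out[v];
    return t;
}

/* gen and kill for one basic statement over C; returns the strong flag. */
static bool gen_kill(const PtGraph *c, Kind k, int x, int y,
                     PtGraph *gen, PtGraph *kill)
{
    *gen = (PtGraph){{0}};
    *kill = (PtGraph){{0}};
    if (k == STORE) {
        int n = 0, only = -1;
        for (int z = 0; z < NVARS; z++)
            if (c->out[x] >> z & 1u) {          /* (x, z) in C */
                gen->out[z] = c->out[y];          /* (z, t) for (y, t) in C */
                kill->out[z] = c->out[z];         /* (z, w) for (z, w) in C */
                n++;
                only = z;
            }
        return n == 1 && !summary[only];        /* {z | (x, z) in C} = {v} */
    }
    if (k == ADDR_OF)
        gen->out[x] = 1u << y;                  /* (x, y) */
    else if (k == COPY)
        gen->out[x] = c->out[y];                /* (x, t) for (y, t) in C */
    else
        gen->out[x] = targets_of(c, c->out[y]); /* (x, u) for y -> t -> u */
    kill->out[x] = c->out[x];                   /* (x, z) for (x, z) in C */
    return !summary[x];
}

/* Sequential transfer: C' = (C - kill) U gen if strong, C U gen otherwise. */
static void apply(PtGraph *c, Kind k, int x, int y)
{
    PtGraph gen, kill;
    bool strong = gen_kill(c, k, x, y, &gen, &kill);
    for (int v = 0; v < NVARS; v++) {
        if (strong)
            c->out[v] &= ~kill.out[v];
        c->out[v] |= gen.out[v];
    }
}
```

The example program, run sequentially as `p = &x; q = &p; tmp = &y; *q = tmp;` with ids p=0, q=1, x=2, y=3, tmp=4, leaves p → y. The store has the single target p, so it is a strong update.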

Dataflow Analysis

• P = set of points-to graphs, Stat = set of program statements
• the dataflow information is $\langle C, I, E \rangle \in P^3$:
  – C = the current points-to relationships
  – I = the interference information from other threads
  – E = the edges created by the current thread
• as a set of edges, $P^3$ is a lattice:
  – partial order relation = set inclusion
  – merge operator = set union: $\langle C_1, I_1, E_1 \rangle \sqcup \langle C_2, I_2, E_2 \rangle = \langle C_1 \cup C_2, I_1 \cup I_2, E_1 \cup E_2 \rangle$
• the abstract semantics is defined by a functional: $[\![\cdot]\!] : Stat \to (P^3 \to P^3)$

Abstract Interpretation

• $[\![stmt]\!]\langle C, I, E \rangle = \langle C', I', E' \rangle$, where
  $C' = (C \setminus kill) \cup gen \cup I$ if strong, and $C' = C \cup gen \cup I$ if not strong
  $I' = I$
  $E' = E \cup gen$
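Here is a minimal C sketch of this transfer step and of the lattice merge. It assumes gen, kill and the strong flag have already been computed from C, as in the sequential analysis. The bit-matrix representation and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NVARS 8                                   /* hypothetical bound */
typedef struct { uint32_t out[NVARS]; } PtGraph;  /* bit t of out[v]: v -> t */

/* Dataflow fact <C, I, E>: current edges, interference, created edges. */
typedef struct { PtGraph c, i, e; } Fact;

/* Lattice merge: componentwise union (the order is set inclusion). */
static Fact join(Fact a, Fact b)
{
    for (int v = 0; v < NVARS; v++) {
        a.c.out[v] |= b.c.out[v];
        a.i.out[v] |= b.i.out[v];
        a.e.out[v] |= b.e.out[v];
    }
    return a;
}

/* Transfer for one statement:
     C' = (C - kill) U gen U I   if strong
     C' =  C         U gen U I   otherwise
     I' = I,   E' = E U gen                                         */
static Fact transfer(Fact in, const PtGraph *gen, const PtGraph *kill,
                     bool strong)
{
    Fact out = in;                                /* copies I unchanged */
    for (int v = 0; v < NVARS; v++) {
        uint32_t c = in.c.out[v];
        if (strong)
            c &= ~kill->out[v];
        out.c.out[v] = c | gen->out[v] | in.i.out[v];
        out.e.out[v] = in.e.out[v] | gen->out[v];
    }
    return out;
}
```

Adding I back after the kill keeps a strong update sound. Suppose another thread may create p → x at any moment. That edge then survives even when the current thread overwrites p with p = &y.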

Parallel par Statements

• syntax: par { {t1}, ..., {tn} }
  – concurrent execution
  – interleaving semantics
  – may be nested
• interference:
  – is the union of points-to edges created by all other concurrent threads
  – may be different for different concurrent threads

[figure: the analyzed thread t_i receives interference from the concurrent threads t_1, ..., t_(i-1), t_(i+1), ..., t_n]


Analysis of Individual Threads

• Interference information:
  – I = "global" interference, generated by the enclosing par's
  – $L_i$ = "local" interference, generated by the current par: $L_i = \bigcup_{k \neq i} E_k$
  – E = points-to edges created by the current thread
• Analysis result for thread $t_i$:
  $\langle C_i', I_i, E_i \rangle = [\![t_i]\!]\langle C_i, I_i, \emptyset \rangle$, where $I_i = I \cup L_i$ and $C_i = C \cup L_i$

[figure: ⟨C, I, E⟩ enters at parbegin; thread t_i starts from ⟨C_i, I_i, ∅⟩ and ends with ⟨C_i', I_i, E_i⟩; the edges created by the other threads t_k, k ≠ i, form L_i]
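Computing a thread's starting fact is a plain union. The C sketch below uses hypothetical names and bit-matrix graphs. The usage assumes the running example: thread 2 (`*q = &y`) has created p → y, and thread 1 (`*p = 1`) has created nothing.

```c
#include <stdint.h>

#define NVARS 8                                   /* hypothetical bound */
typedef struct { uint32_t out[NVARS]; } PtGraph;  /* bit t of out[v]: v -> t */

/* Starting fact of thread i in a par with n threads, given the edges e[k]
   created so far by each thread k:
     L_i = U_{k != i} E_k,   C_i = C U L_i,   I_i = I U L_i
   Thread i is then analyzed from <C_i, I_i, {}>. */
static void thread_inputs(const PtGraph *c, const PtGraph *i_glob,
                          const PtGraph e[], int n, int i,
                          PtGraph *ci, PtGraph *ii)
{
    for (int v = 0; v < NVARS; v++) {
        uint32_t local = 0;                       /* L_i restricted to v */
        for (int k = 0; k < n; k++)
            if (k != i)
                local |= e[k].out[v];
        ci->out[v] = c->out[v] | local;
        ii->out[v] = i_glob->out[v] | local;
    }
}
```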

Parend Analysis

• Analysis result:
  $\langle C', I', E' \rangle = [\![par]\!]\langle C, I, E \rangle$, where $\langle C_i', I_i, E_i \rangle = [\![t_i]\!]\langle C_i, I_i, \emptyset \rangle$ and
  $I' = I$,   $E' = E \cup \left(\bigcup_i E_i\right)$,   $C' = \bigcap_i C_i'$

[figure: the thread results ⟨C_i', I_i, E_i⟩ are combined at parend into ⟨C', I, E'⟩]
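The parend rule is a pointwise intersection of the final graphs plus a union of the created edges. The C sketch below uses hypothetical names and representation. Its usage plugs in the final graphs of the running example: p → {x, y} at the end of the `*p = 1` thread, and p → y at the end of the `*q = &y` thread. One informal way to see why intersection does not drop feasible edges: each $C_i'$ already contains the interference edges $L_i$, that is, everything the other threads may create.

```c
#include <stdint.h>

#define NVARS 8                                   /* hypothetical bound */
typedef struct { uint32_t out[NVARS]; } PtGraph;  /* bit t of out[v]: v -> t */

/* Combine n >= 1 thread results at parend:
     C' = C_1' intersect ... intersect C_n'
     E' = E U E_1 U ... U E_n          (I' = I is left unchanged) */
static void parend_combine(const PtGraph c_fin[], const PtGraph e_thr[], int n,
                           const PtGraph *e_in, PtGraph *c_res, PtGraph *e_res)
{
    for (int v = 0; v < NVARS; v++) {
        uint32_t c = ~0u, e = e_in->out[v];
        for (int k = 0; k < n; k++) {
            c &= c_fin[k].out[v];
            e |= e_thr[k].out[v];
        }
        c_res->out[v] = c;
        e_res->out[v] = e;
    }
}
```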


Analysis of Entire par Construct

• Recursive dataflow equations:
  $L_i = \bigcup_{k \neq i} E_k$
  $C_i = C \cup L_i$,   $I_i = I \cup L_i$
  $\langle C_i', I_i, E_i \rangle = [\![t_i]\!]\langle C_i, I_i, \emptyset \rangle$   (thread rule)
  $E' = E \cup \left(\bigcup_i E_i\right)$,   $C' = \bigcap_i C_i'$
  $\langle C', I, E' \rangle = [\![par]\!]\langle C, I, E \rangle$   (par rule)
• ⟨C, I, E⟩ is the information flowing INTO the par construct; ⟨C', I, E'⟩ is the information flowing OUT of the par construct
• the equations are recursive: $L_i$ depends on the edges $E_k$ of the other threads, and each $E_k$ in turn depends on $L_k$
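The equations can be solved by iteration. The following self-contained C sketch starts from $E_k = \emptyset$ and re-analyzes each thread with its current interference until no thread creates a new edge. It then applies the par rule. It assumes bit-matrix graphs, at most 8 variables and 8 threads, and no array or heap summary nodes. The statement encoding and function names are hypothetical. For the example program, the usage lowers `*q = &y` to `tmp = &y; *q = tmp;` and gets p → y after parend.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NV 8                                        /* hypothetical bounds */
#define MAXT 8
typedef struct { uint32_t out[NV]; } PtGraph;       /* bit t of out[v]: v -> t */
typedef struct { PtGraph c, i, e; } Fact;           /* <C, I, E> */
typedef enum { ADDR_OF, COPY, LOAD, STORE } Kind;   /* x=&y, x=y, x=*y, *x=y */
typedef struct { Kind k; int x, y; } Stmt;

/* One basic statement under <C, I, E> (no array/heap summaries assumed). */
static void step(Fact *f, Stmt s)
{
    uint32_t lhs = s.k == STORE ? f->c.out[s.x] : 1u << s.x;  /* written */
    uint32_t val = 0;                                           /* assigned */
    if (s.k == ADDR_OF) {
        val = 1u << s.y;
    } else if (s.k == LOAD) {
        for (int t = 0; t < NV; t++)
            if (f->c.out[s.y] >> t & 1u)
                val |= f->c.out[t];
    } else {
        val = f->c.out[s.y];                                    /* COPY, STORE */
    }
    bool strong = lhs != 0 && (lhs & (lhs - 1)) == 0;
    for (int v = 0; v < NV; v++) {
        uint32_t c = f->c.out[v];
        if (lhs >> v & 1u) {
            c = strong ? val : (c | val);                       /* kill + gen */
            f->e.out[v] |= val;                                 /* E' = E U gen */
        }
        f->c.out[v] = c | f->i.out[v];                          /* ... U I */
    }
}

/* Solve the par equations; returns C' (intersection of the C_i') and the
   union of the E_i (the caller forms E' = E U that union). */
static void analyze_par(const PtGraph *c, const PtGraph *ig,
                        const Stmt *const thr[], const int len[], int n,
                        PtGraph *c_out, PtGraph *e_out)
{
    PtGraph e[MAXT] = {{{0}}}, fin[MAXT];
    bool changed = true;
    while (changed) {
        changed = false;
        for (int t = 0; t < n; t++) {
            Fact f = {{{0}}};
            for (int v = 0; v < NV; v++) {
                uint32_t l = 0;                                 /* L_t */
                for (int k = 0; k < n; k++)
                    if (k != t)
                        l |= e[k].out[v];
                f.c.out[v] = c->out[v] | l;                     /* C_t = C U L_t */
                f.i.out[v] = ig->out[v] | l;                    /* I_t = I U L_t */
            }
            for (int s = 0; s < len[t]; s++)
                step(&f, thr[t][s]);
            fin[t] = f.c;
            for (int v = 0; v < NV; v++)
                if (f.e.out[v] & ~e[t].out[v]) {                /* new edges */
                    e[t].out[v] |= f.e.out[v];
                    changed = true;
                }
        }
    }
    for (int v = 0; v < NV; v++) {
        c_out->out[v] = ~0u;
        e_out->out[v] = 0;
        for (int t = 0; t < n; t++) {
            c_out->out[v] &= fin[t].out[v];
            e_out->out[v] |= e[t].out[v];
        }
    }
}
```

Iteration terminates because each $E_k$ only grows and is bounded by the finite set of possible edges. The final pass sees no new edges, so every $C_i'$ is computed from the same converged interference.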

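Example Analysis

• entering the par construct: C = {p → x, q → p}, I = ∅
• thread *p = 1: L_1 = E_2 = {p → y}, so C_1 = {p → x, p → y, q → p} and I_1 = {p → y}; it contains no pointer assignment, so C_1' = C_1 and E_1 = ∅
• thread *q = &y: L_2 = E_1 = ∅, so C_2 = {p → x, q → p}; q has the single target p, so the store is a strong update: C_2' = {p → y, q → p}, E_2 = {p → y}
• at parend: C' = C_1' ∩ C_2' = {p → y, q → p}, so *p = 2 writes y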

Inter-Procedural Analysis

• Context-sensitive:
  – procedures re-analyzed at each call site
• Ghost variables:
  – replace variables not in the scope of the procedure
  – distinguish locals in different activations of recursive functions
• Sequential Partial Transfer Functions (Seq-PTFs) [WL95]
  – associate a points-to output graph to an input context
  – can be reused when there is a match for the input context

[figure: Seq-PTF1 maps input context C1 to output O1 and Seq-PTF2 maps C2 to O2; input space P, output space P]


Multithreaded Extensions

• Multithreaded input context = input points-to information + interference information
• Multithreaded PTF = associates an output points-to graph + created edges to an input context
• Mapping and unmapping:
  – map the interference information I
  – unmap the created points-to edges E

[figure: Par-PTF1 maps (C1, I1) to (O1, E1) and Par-PTF2 maps (C2, I2) to (O2, E2); input space P², output space P²]
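PTF reuse can be sketched as a table lookup. The sketch assumes exact matching on the input context (C, I). The real analysis first maps the caller's graph into the procedure's namespace using ghost variables, and this sketch omits that step. The fixed-capacity table and all names are hypothetical. A sequential PTF is the special case where I and E stay empty.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NVARS 8                                   /* hypothetical bounds */
#define MAXPTF 32
typedef struct { uint32_t out[NVARS]; } PtGraph;  /* bit t of out[v]: v -> t */

/* One multithreaded PTF: input context (C, I) -> output (O, E). */
typedef struct { PtGraph c, i, o, e; } ParPtf;

/* All PTFs recorded so far for one procedure. */
typedef struct { ParPtf entry[MAXPTF]; int n; } PtfTable;

static bool same(const PtGraph *a, const PtGraph *b)
{
    return memcmp(a->out, b->out, sizeof a->out) == 0;
}

/* Reuse a PTF whose input context matches exactly; NULL if none does. */
static const ParPtf *ptf_lookup(const PtfTable *t, const PtGraph *c,
                                const PtGraph *i)
{
    for (int k = 0; k < t->n; k++)
        if (same(&t->entry[k].c, c) && same(&t->entry[k].i, i))
            return &t->entry[k];
    return NULL;
}

/* Record the result of analyzing the procedure in a new context. */
static bool ptf_insert(PtfTable *t, const PtGraph *c, const PtGraph *i,
                       const PtGraph *o, const PtGraph *e)
{
    if (t->n == MAXPTF)
        return false;                             /* table full */
    t->entry[t->n++] = (ParPtf){ *c, *i, *o, *e };
    return true;
}
```

At a call site, the analysis would try `ptf_lookup` first. Only on a miss would it re-analyze the callee and record the result with `ptf_insert`.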


Other Parallel Constructs

• Parallel for loops:
    for (i = 0; i < n; i++) spawn thread(i); sync;
  – generate a symmetric dataflow equation:
    $[\![t_1]\!]\langle C \cup E_1, I \cup E_1, \emptyset \rangle = \langle C_1', I \cup E_1, E_1 \rangle$
• Conditional thread creation:
    if (c1) spawn thread1(); if (c2) spawn thread2(); sync;
  – merge the analysis result with the initial points-to graph:
    $C' = \bigcap_i (C_i' \cup C_i)$
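Both rules can be sketched in C under the same hypothetical bit-matrix encoding: at most 8 variables, no array or heap summaries, all names illustrative. The loop body used in the usage, `r = p; p = &y;`, is made up. It needs more than one round before $E_1$ stabilizes, because one iteration may read the p → y edge written by a concurrent iteration.

```c
#include <stdbool.h>
#include <stdint.h>

#define NV 8                                        /* hypothetical bound */
typedef struct { uint32_t out[NV]; } PtGraph;       /* bit t of out[v]: v -> t */
typedef struct { PtGraph c, i, e; } Fact;           /* <C, I, E> */
typedef enum { ADDR_OF, COPY, LOAD, STORE } Kind;   /* x=&y, x=y, x=*y, *x=y */
typedef struct { Kind k; int x, y; } Stmt;

/* One basic statement under <C, I, E> (no array/heap summaries assumed). */
static void step(Fact *f, Stmt s)
{
    uint32_t lhs = s.k == STORE ? f->c.out[s.x] : 1u << s.x;  /* written */
    uint32_t val = 0;                                           /* assigned */
    if (s.k == ADDR_OF) {
        val = 1u << s.y;
    } else if (s.k == LOAD) {
        for (int t = 0; t < NV; t++)
            if (f->c.out[s.y] >> t & 1u)
                val |= f->c.out[t];
    } else {
        val = f->c.out[s.y];                                    /* COPY, STORE */
    }
    bool strong = lhs != 0 && (lhs & (lhs - 1)) == 0;
    for (int v = 0; v < NV; v++) {
        uint32_t c = f->c.out[v];
        if (lhs >> v & 1u) {
            c = strong ? val : (c | val);
            f->e.out[v] |= val;
        }
        f->c.out[v] = c | f->i.out[v];
    }
}

/* Parallel loop: every thread runs `body`, so E1 is also its own
   interference. Iterate [[t1]]<C U E1, I U E1, {}> from E1 = {} until E1
   stops growing; returns C1' and leaves the final E1 in *e1. */
static PtGraph par_for(PtGraph c, PtGraph i, const Stmt body[], int len,
                       PtGraph *e1)
{
    *e1 = (PtGraph){{0}};
    for (;;) {
        Fact f = { c, i, {{0}} };
        for (int v = 0; v < NV; v++) {
            f.c.out[v] |= e1->out[v];
            f.i.out[v] |= e1->out[v];
        }
        for (int s = 0; s < len; s++)
            step(&f, body[s]);
        bool grew = false;
        for (int v = 0; v < NV; v++)
            if (f.e.out[v] & ~e1->out[v]) {
                e1->out[v] |= f.e.out[v];
                grew = true;
            }
        if (!grew)
            return f.c;                             /* C1' at the fixed point */
    }
}

/* Conditional creation: thread k may not run, so merge its result with its
   initial graph before intersecting: C' = intersect_k (C_k' U C_k), n >= 1. */
static PtGraph cond_merge(const PtGraph c_in[], const PtGraph c_fin[], int n)
{
    PtGraph r;
    for (int v = 0; v < NV; v++) {
        r.out[v] = ~0u;
        for (int k = 0; k < n; k++)
            r.out[v] &= c_in[k].out[v] | c_fin[k].out[v];
    }
    return r;
}
```

Start from p → x, with ids p=0, r=1, x=2, y=3. The loop result is p → y and r → {x, y}. The extra target y for r reflects that some iteration may execute `r = p` after another iteration has already executed `p = &y`.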


Advanced Features

• Recursive procedures:
  – result in recursive dataflow equations
  – fixed-point algorithm to solve the recursion
• Function pointers:
  – result in a dynamic call graph
  – handled using the computed pointer information
  – methodology: analyze all possible callees and merge the results
• Thread-private global variables:
  – at parbegin nodes: save their values in the parent thread and make them point to unknown in the child threads
  – at parend nodes: restore the saved values in the parent thread


Algorithm Evaluation

• Soundness:
  – the multithreaded algorithm conservatively approximates all possible interleavings of the concurrent threads' statements
• Termination of fixed-point algorithms:
  – follows from the monotonicity of the abstract semantics functional
• Complexity of fixed-point algorithms:
  – worst-case size of points-to graphs: $O(n^2)$, where $n = |Stat|$
  – n program points imply worst-case $O(n^3)$ iterations (each of the n graphs can grow at most $O(n^2)$ times)
  – worst-case polynomial complexity: $O(n^4)$
• Precision of analysis:
  – if the concurrent threads do not (pointer-)interfere, then this algorithm gives the same result as the "ideal algorithm"


Experimental Results

• Implementation:
  – SUIF infrastructure; Cilk benchmarks
• Benchmark characteristics:

Program  | Lines of Code | Pointer Location Sets | Number of par Constructs | Mean Number of Iterations per par Construct | Description
Barnes   | 1149 | 125 | 12  | 2.00 | Barnes-Hut N-body Simulation
Block    | 342  | 9   | 13  | 1.00 | Blocked Matrix Multiply
Cholesky | 932  | 29  | 109 | 1.83 | Sparse Cholesky Factorization
Cilksort | 499  | 14  | 8   | 1.00 | Parallel Mergesort
Ck       | 505  | 38  | 3   | 1.00 | Checkers Program
Fft      | 3255 | 335 | 182 | 1.73 | Fast Fourier Transform
Fib      | 53   | 0   | 1   | 1.00 | Fibonacci Calculation
Game     | 195  | 8   | 3   | 1.00 | Simple Game
Heat     | 360  | 12  | 8   | 1.62 | Heat Diffusion on Mesh
Knapsack | 122  | 6   | 1   | 1.00 | Knapsack, Branch and Bound
Knary    | 114  | 0   | 1   | 1.00 | Synthetic Benchmark
Lu       | 594  | 13  | 10  | 1.00 | LU Decomposition
Magic    | 965  | 74  | 24  | 1.00 | Magic Squares
Mol      | 4478 | 387 | 99  | 1.18 | Viral Protein Simulation
Notemp   | 341  | 6   | 15  | 1.00 | Blocked Matrix Multiply
Pousse   | 1379 | 118 | 9   | 1.22 | Pousse Game Program
Queens   | 106  | 3   | 8   | 2.25 | N Queens Program
Space    | 458  | 13  | 15  | 1.00 | Blocked Matrix Multiply


Precision Measurements

• Pointer values at loads/stores:
  – usually a unique target: 83% of the loads and 88% of the stores
  – few potentially uninitialized pointers
  – very few pointers with more than two targets
• Comparison of three analyses: Multithreaded (MT), Interleaved, Sequential (Seq)
  – computed points-to sets: Seq ⊆ Interleaved ⊆ MT
  – results: Multithreaded = Sequential
  – conclusion: Multithreaded = Interleaved (the Interleaved result is sandwiched between the two)


[Chart: number of load instructions (0 to 12000) vs. number of target location sets (1 to 4), split into definitely initialized and potentially uninitialized pointers]

[Chart: number of store instructions (0 to 4000) vs. number of target location sets (1 to 4), split into definitely initialized and potentially uninitialized pointers]

Applications

• Current uses:
  – MIT RAW project:
    • memory disambiguation for static promotion (ISCA 99)
    • C-to-silicon compiler generating small memories (FCCM 99)
  – automatic parallelization of divide-and-conquer algorithms (PPoPP 99)
• Future uses:
  – data race detection in multithreaded programs
  – static elimination of array bounds checks


Future

• Multithreaded programs:
  – are becoming very common
  – are hard to debug
  – are hard to analyze
• The current algorithm:
  – gives precise MT pointer information
  – may be used as a foundation for other MT analyses
  – gives a framework for other MT analyses


Additional Slides

Challenging Benchmark Set

• Applications heavily optimized by hand
  – Pousse: timed competition, won the ICFP '98 contest
• Pointer arithmetic
• Casts
• Divide and conquer algorithms
  – Recursion
  – Pointers into heap-allocated arrays
  – Pointer-based data structures (octrees, hash tables, ...)
  – Recursive linked data structures allocated on the stack


Related Work

• Pointer analysis
  – existing pointer analyses focus on sequential programs: [LR92], [LRZ93], [CBC93], [EGH94], [Ruf95], [WL95], [And94], [Ste96], [SH97]
  – flow-sensitive vs. flow-insensitive analysis
  – context-sensitive vs. context-insensitive analysis
• Multithreaded program analysis:
  – relatively unexplored field
  – flow-sensitive analysis:
    • dataflow framework for bitvector problems [KSV96]
    • does not apply to pointer analysis
  – flow-insensitive analysis:
    • trivially models the interleaving semantics of concurrent threads
    • locality analysis [ZH97] (uses type-inference techniques)

