Pointer Analysis for Multithreaded Programs
Radu Rugina and Martin Rinard
MIT Laboratory for Computer Science
Outline
• Example
• Review of Pointer Analysis for Sequential Programs
• Pointer Analysis for Multithreaded Programs
• Experimental Results
• Conclusions
PLDI 99, R. Rugina, M. Rinard
Example
• two concurrent threads
• two questions:
  – Q1: what location is written by *p = 1?
  – Q2: what location is written by *p = 2?
• or, equivalently:
  – Q1: what does p point to in the left thread?
  – Q2: what does p point to after both threads have completed?
p = &x; q = &p;
parbegin
  thread: *p = 1;
  thread: *q = &y;
parend
*p = 2;
Two Possible Executions
• Execution 1: p = &x; q = &p; *q = &y; *p = 1; *p = 2;
  – at *p = 1: p → y
  – at *p = 2: p → y
• Execution 2: p = &x; q = &p; *p = 1; *q = &y; *p = 2;
  – at *p = 1: p → x
  – at *p = 2: p → y
Analysis Result
Result = a points-to graph at each program point
[Figure: the example program (p = &x; q = &p; parbegin ... parend; *p = 2) annotated with the points-to graph over p, q, x, y at each program point.]
Analysis of Multithreaded Programs
• Problem:
  – analyze interactions between concurrent threads
• Straightforward solution:
  – analyze all possible interleavings and merge results
  – fails because of exponential complexity
  – for n threads with $s_1, \dots, s_n$ statements:
    $\text{Number of interleavings} = \binom{s_1 + \dots + s_n}{s_1, \dots, s_n} = \frac{(s_1 + \dots + s_n)!}{s_1! \cdots s_n!}$
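To make the blow-up concrete, here is a minimal sketch (not from the slides) that evaluates the multinomial formula and cross-checks it by brute-force enumeration. The function names `count_interleavings` and `enumerate_interleavings` are illustrative assumptions.

```python
from math import factorial


def count_interleavings(sizes):
    """Multinomial coefficient (s1 + ... + sn)! / (s1! * ... * sn!)."""
    total = factorial(sum(sizes))
    for s in sizes:
        total //= factorial(s)  # exact: every partial quotient is an integer
    return total


def enumerate_interleavings(threads):
    """Brute-force list of all interleavings; only usable for tiny inputs."""
    if all(len(t) == 0 for t in threads):
        return [[]]
    result = []
    for i, t in enumerate(threads):
        if t:
            # Run the next statement of thread i, then interleave the rest.
            rest = threads[:i] + [t[1:]] + threads[i + 1:]
            for tail in enumerate_interleavings(rest):
                result.append([t[0]] + tail)
    return result


# The running example: two threads with one statement each.
example = [["*p = 1"], ["*q = &y"]]
assert len(enumerate_interleavings(example)) == count_interleavings([1, 1]) == 2
```

For the running example the count is 2, matching the two executions listed earlier. Three threads of ten statements each already yield more than 10^12 interleavings, which is why enumerating them is not a practical analysis strategy.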
Our Approach
• We introduce interference information:
  – interference = points-to edges created by the other concurrent threads
  – models the effect of “all possible interleavings”
• Efficiency: polynomial complexity in program size
• Derive dataflow equations:
  – recursive equations
  – fixed-point algorithms to solve the equations
  – theoretically less precise than “all interleavings”
  – in practice: no loss of precision
Algorithm Overview
• intra-procedural:
  – flow-sensitive (dataflow analysis)
  – handles unstructured flow of control
  – defines dataflow equations for:
    • pointer assignments
    • parallel constructs
• inter-procedural:
  – context-sensitive
  – handles recursive functions
Review of Pointer Analysis for Sequential Programs
Points-to Graphs
• Points-to graphs [EGH94]
  – nodes = program variables
  – edges = points-to relationships
• Example:
  [Figure: an example points-to graph over the variables p, q, x, y.]
Basic Pointer Assignments
• Four types of pointer assignments:
  • x = &y (address-of assign)
  • x = y (copy assign)
  • x = *y (load assign)
  • *x = y (store assign)
• More complex assignments:
  – transformed into a sequence of basic statements
  – example: *z = &t; becomes tmp = &t; *z = tmp;
Generated Edges
[Figure: four panels showing the points-to edges generated by each basic assignment: address-of (x = &y), copy (x = y), load (x = *y), store (*x = y).]
Strong vs. Weak Updates
• strong updates:
  – kill existing points-to relationships
  – result in more precise analysis results
• weak updates:
  – leave existing points-to edges in place
  – reasons for weak updates:
    • control flow uncertainty: if (cond) p = &q; else p = &r; *p = &x;
    • arrays of pointers: v[i] = &x;
    • heap-allocated pointers: p = malloc( sizeof(int*) ); *p = &x;
[Figure: points-to graph over p, q, r, x, y, z illustrating a weak update.]
Dataflow Information
– the dataflow information is a triple $\langle C, I, E \rangle \in P^3$:
  • C = the current points-to relationships
  • I = the interference information from other threads
  • E = edges created by the current thread
– as a tuple of sets of edges, $P^3$ is a lattice:
  • partial order relation = set inclusion
  • merge operator = set union: $\langle C_1, I_1, E_1 \rangle \sqcup \langle C_2, I_2, E_2 \rangle = \langle C_1 \cup C_2,\ I_1 \cup I_2,\ E_1 \cup E_2 \rangle$
– gen, kill, and strong for each basic assignment:
  address-of: x = &y
    $gen = \{ (x, y) \}$
    $kill = \{ (x, z) \mid (x, z) \in C \}$
    $strong = \neg(\mathrm{array\_elem}(x) \lor \mathrm{heap}(x))$
  copy: x = y
    $gen = \{ (x, t) \mid (y, t) \in C \}$
    $kill = \{ (x, z) \mid (x, z) \in C \}$
    $strong = \neg(\mathrm{array\_elem}(x) \lor \mathrm{heap}(x))$
  load: x = *y
    $gen = \{ (x, u) \mid (y, t) \in C \land (t, u) \in C \}$
    $kill = \{ (x, z) \mid (x, z) \in C \}$
    $strong = \neg(\mathrm{array\_elem}(x) \lor \mathrm{heap}(x))$
  store: *x = y
    $gen = \{ (z, t) \mid (x, z) \in C \land (y, t) \in C \}$
    $kill = \{ (z, w) \mid (x, z) \in C \land (z, w) \in C \}$
    $strong = (\{ z \mid (x, z) \in C \} = \{ v \}) \land \neg(\mathrm{array\_elem}(v) \lor \mathrm{heap}(v))$
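The gen, kill, and strong definitions for the four basic assignments can be sketched as follows. This is an illustrative reading, not the authors' implementation: the tuple encoding of statements, the function name `gen_kill_strong`, and the `weak_locs` set (standing in for the array_elem and heap predicates) are all assumptions.

```python
def gen_kill_strong(stmt, C, weak_locs=frozenset()):
    """Return (gen, kill, strong) for one basic pointer assignment.

    stmt is a hypothetical encoding: ("addr", x, y) for x = &y,
    ("copy", x, y) for x = y, ("load", x, y) for x = *y,
    ("store", x, y) for *x = y.
    C is a set of (source, target) points-to edges.
    weak_locs holds locations that are array elements or heap-allocated.
    """
    kind, x, y = stmt

    def targets(v):
        return {t for (s, t) in C if s == v}

    if kind == "store":
        zs = targets(x)  # locations that *x may denote
        gen = {(z, t) for z in zs for t in targets(y)}
        kill = {(z, w) for (z, w) in C if z in zs}
        # Strong only if x has exactly one target and that target is
        # neither an array element nor a heap location.
        strong = len(zs) == 1 and not (zs & weak_locs)
        return gen, kill, strong

    if kind == "addr":
        gen = {(x, y)}
    elif kind == "copy":
        gen = {(x, t) for t in targets(y)}
    elif kind == "load":
        gen = {(x, u) for t in targets(y) for u in targets(t)}
    else:
        raise ValueError(f"unknown statement kind: {kind}")
    kill = {(s, t) for (s, t) in C if s == x}
    strong = x not in weak_locs
    return gen, kill, strong


# *q = tmp with q -> p, p -> x, tmp -> y: a strong update of p.
C = {("q", "p"), ("p", "x"), ("tmp", "y")}
gen, kill, strong = gen_kill_strong(("store", "q", "tmp"), C)
assert (gen, kill, strong) == ({("p", "y")}, {("p", "x")}, True)
```

When the stored-through pointer has two possible targets, the same function reports `strong == False`, which corresponds to the control-flow-uncertainty case of weak updates.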
Dataflow Analysis
• P = set of points-to graphs
• Stat = set of program statements
• abstract semantics is defined by a functional:
  $[\![ \cdot ]\!] : \mathit{Stat} \to (P^3 \to P^3)$
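Abstract Interpretation
$\langle C', I', E' \rangle = [\![ stmt ]\!] \langle C, I, E \rangle$
$C' = \begin{cases} (C - kill) \cup gen \cup I & \text{if } strong \\ C \cup gen \cup I & \text{if not } strong \end{cases}$
$I' = I$
$E' = E \cup gen$
[Figure: statement stmt with input $\langle C, I, E \rangle$ above it and output $[\![ stmt ]\!] \langle C, I, E \rangle$ below it.]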
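The statement rule can be sketched as a small function on triples. This is a minimal illustration, assuming gen, kill, and strong have already been computed for the statement; the names `transfer` and `merge` are hypothetical.

```python
def transfer(state, gen, kill, strong):
    """Apply one statement to a triple (C, I, E) of edge sets."""
    C, I, E = state
    if strong:
        C_out = (C - kill) | gen | I  # killed edges are removed, then I is re-added
    else:
        C_out = C | gen | I           # weak update: nothing is removed
    return C_out, I, E | gen          # I is unchanged; E accumulates created edges


def merge(a, b):
    """Lattice merge at control-flow joins: component-wise set union."""
    return tuple(x | y for x, y in zip(a, b))


# p -> z is an interference edge, so a strong update of p cannot remove it.
state = ({("p", "x"), ("p", "z"), ("q", "p")}, {("p", "z")}, set())
C1, I1, E1 = transfer(state, gen={("p", "y")},
                      kill={("p", "x"), ("p", "z")}, strong=True)
assert C1 == {("q", "p"), ("p", "y"), ("p", "z")}
assert E1 == {("p", "y")}
```

The example shows why I is unioned back in after the kill: another thread may recreate an interference edge at any point, so the analysis cannot assume it stays dead.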
Parallel par Statements
• syntax: par { {t1}, ..., {tn} }
  – concurrent execution
  – interleaving semantics
  – may be nested
• interference:
  – is the union of points-to edges created by all other concurrent threads
  – may be different for different concurrent threads
[Figure: the analyzed thread $t_i$ among the concurrent threads $t_1, \dots, t_{i-1}, t_{i+1}, \dots, t_n$, which together produce its interference.]
Analysis of Individual Threads
• Interference information:
  – I = “global” interference, generated by enclosing par’s
  – $L_i$ = “local” interference, generated by the current par
  – E = points-to edges created by the current thread
• Analysis result for thread $t_i$:
  $\langle C_i', I_i, E_i \rangle = [\![ t_i ]\!] \langle C_i, I_i, \emptyset \rangle$
  $C_i = C \cup L_i$
  $I_i = I \cup L_i$
  $L_i = \bigcup_{k \neq i} E_k$
[Figure: between parbegin and parend, thread $t_i$ receives $\langle C_i, I_i, \emptyset \rangle$ and produces $\langle C_i', I_i, E_i \rangle$; the other threads $t_k$, $k \neq i$, contribute $L_i$.]
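Parend Analysis
Analysis result:
  $\langle C', I', E' \rangle = [\![ par ]\!] \langle C, I, E \rangle$
  $\langle C_i', I_i, E_i \rangle = [\![ t_i ]\!] \langle C_i, I_i, \emptyset \rangle$
  $I' = I$
  $E' = E \cup \left( \bigcup_i E_i \right)$
  $C' = \bigcap_i C_i'$
[Figure: the par construct receives $\langle C, I, E \rangle$ at parbegin and produces $\langle C', I, E' \rangle$ at parend; each thread $t_i$ maps $\langle C_i, I_i, \emptyset \rangle$ to $\langle C_i', I_i, E_i \rangle$, with $L_i = \bigcup_{k \neq i} E_k$ coming from the threads $t_k$, $k \neq i$.]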
Analysis of Entire par Construct
Recursive dataflow equations:
• information flowing INTO par construct (thread rule):
  $L_i = \bigcup_{k \neq i} E_k$
  $C_i = C \cup L_i$, $I_i = I \cup L_i$
  $\langle C_i', I_i, E_i \rangle = [\![ t_i ]\!] \langle C_i, I_i, \emptyset \rangle$
• information flowing OUT of par construct (par rule):
  $E' = E \cup \left( \bigcup_i E_i \right)$, $C' = \bigcap_i C_i'$
  $\langle C', I, E' \rangle = [\![ par ]\!] \langle C, I, E \rangle$
[Figure: the par construct between parbegin and parend, with input $\langle C, I, E \rangle$, per-thread triples $\langle C_i, I_i, \emptyset \rangle$ and $\langle C_i', I_i, E_i \rangle$, and output $\langle C', I, E' \rangle$.]
Example Analysis
[Figure: the example program (p = &x; q = &p; parbegin ... parend; *p = 2) annotated with the triple $\langle C, I, E \rangle$ computed at each program point.]
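The equations for a par construct are recursive because each $E_i$ depends on the other threads' $E_k$ through $L_i$. A minimal fixed-point sketch on the running example follows. It is an illustration under stated assumptions, not the authors' implementation: statements are modeled as Python functions on triples, only the store-of-address form needed by the example is handled, no array or heap locations are considered, and the names `store_addr`, `no_pointer_effect`, and `analyze_par` are hypothetical.

```python
def store_addr(ptr, target):
    """Transfer function for `*ptr = &target` (assumes no array/heap locations)."""
    def run(C, I, E):
        zs = {t for (s, t) in C if s == ptr}
        gen = {(z, target) for z in zs}
        kill = {(s, t) for (s, t) in C if s in zs}
        strong = len(zs) == 1
        C_out = ((C - kill) if strong else C) | gen | I
        return C_out, I, E | gen
    return run


def no_pointer_effect(C, I, E):
    """Transfer function for a statement such as `*p = 1` that writes no pointer."""
    return C | I, I, E


def analyze_par(threads, C, I, E):
    """Iterate the thread rule until every E_i is stable, then apply the par rule."""
    n = len(threads)
    Es = [set() for _ in range(n)]
    while True:
        ins, outs, new_Es = [], [], []
        for i, thread in enumerate(threads):
            L_i = set().union(*(Es[k] for k in range(n) if k != i))
            state = (C | L_i, I | L_i, set())
            ins.append(state[0])
            for stmt in thread:
                state = stmt(*state)
            outs.append(state[0])
            new_Es.append(state[2])
        if new_Es == Es:
            break
        Es = new_Es
    # par rule: C' is the intersection of thread outputs, E' collects all E_i.
    return set.intersection(*outs), I, E.union(*Es), ins


# p = &x; q = &p; parbegin { *p = 1 } { *q = &y } parend
C0 = {("p", "x"), ("q", "p")}
C_out, I_out, E_out, thread_inputs = analyze_par(
    [[no_pointer_effect], [store_addr("q", "y")]], C0, set(), set(C0))
assert {t for (s, t) in thread_inputs[0] if s == "p"} == {"x", "y"}  # at *p = 1
assert C_out == {("q", "p"), ("p", "y")}                             # at *p = 2
```

In this sketch, p may point to x or y at `*p = 1` and only to y at `*p = 2`, which agrees with the two possible executions of the example. The iteration terminates because the sets E_i only grow and the number of possible edges is finite.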
Inter-Procedural Analysis
• Context-sensitive:
  – procedures re-analyzed at each call site
• Ghost variables:
  – replace variables not in the scope of the procedure
  – distinguish locals in different activations of recursive functions
• Sequential Partial Transfer Functions (Seq-PTFs) [WL95]
  – associate a points-to output graph to an input context
  – can be reused when there is a match for the input context
[Figure: Seq-PTF1 maps input context C1 to output O1 and Seq-PTF2 maps C2 to O2, from the input space (P) to the output space (P).]
Multithreaded Extensions
• Multithreaded Input Context = input points-to information + interference information
• Multithreaded PTF = associates output points-to graph + created edges to an input context
• Mapping and unmapping:
  – map the interference information I
  – unmap created points-to edges E
[Figure: Par-PTF1 maps input context (C1, I1) to output (O1, E1) and Par-PTF2 maps (C2, I2) to (O2, E2), from the input space ($P^2$) to the output space ($P^2$).]
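PTF reuse can be pictured as memoization keyed on the input context. The sketch below is only an illustration: it assumes a context matches when the pair (C, I) is exactly equal, it does not model ghost variables or mapping and unmapping, and the names `ParPTFCache` and `toy_body` are hypothetical.

```python
class ParPTFCache:
    """Cache of multithreaded PTFs for one procedure: (C, I) -> (O, E)."""

    def __init__(self, analyze_body):
        # analyze_body(C, I) -> (O, E) is assumed to analyze the procedure body.
        self._analyze_body = analyze_body
        self._ptfs = {}
        self.analyses_run = 0

    def apply(self, C, I):
        key = (frozenset(C), frozenset(I))
        if key not in self._ptfs:
            # No matching input context: analyze the body and record a new PTF.
            O, E = self._analyze_body(set(C), set(I))
            self._ptfs[key] = (frozenset(O), frozenset(E))
            self.analyses_run += 1
        O, E = self._ptfs[key]
        return set(O), set(E)


def toy_body(C, I):
    """Stand-in procedure body that always creates the edge r -> g."""
    gen = {("r", "g")}
    return C | gen | I, gen


cache = ParPTFCache(toy_body)
cache.apply({("a", "b")}, set())
cache.apply({("a", "b")}, set())        # same context: PTF reused
cache.apply({("a", "b")}, {("c", "d")})  # different interference: new PTF
assert cache.analyses_run == 2
```

Dropping I from the key and E from the value gives the sequential analogue, where a PTF maps an input points-to graph to an output points-to graph.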
Other Parallel Constructs
• Parallel for loops
  – generate a symmetric dataflow equation:
    $[\![ t_1 ]\!] \langle C \cup E_1,\ I \cup E_1,\ \emptyset \rangle = \langle C_1',\ I \cup E_1,\ E_1 \rangle$
  – example: for(i=0; i<n; i++) spawn thread(i); sync;
• Conditional Thread Creation
  – merge analysis result with initial points-to graph:
    $C' = \bigcap_i (C_i' \cup C_i)$
  – example: if (c1) spawn thread1(); if (c2) spawn thread2(); sync;
Advanced Features
• Recursive procedures:
  – result in recursive dataflow equations
  – fixed-point algorithm to solve recursion
• Function pointers:
  – result in a dynamic call-graph
  – handled using the computed pointer information
  – methodology: analyze all possible callees and merge results
• Thread-private global variables:
  – at parbegin nodes: save their values in the parent thread and make them point to unknown in the child threads
  – at parend nodes: restore saved values in the parent thread
Algorithm Evaluation
• Soundness:
  – the multithreaded algorithm conservatively approximates all possible interleavings of concurrent threads’ statements
• Termination of fixed-point algorithms:
  – follows from the monotonicity of the abstract semantics functional
• Complexity of fixed-point algorithms:
  – worst-case size of points-to graphs: $O(n^2)$, where n = |Stat|
  – n program points imply worst-case $O(n^3)$ iterations
  – worst-case polynomial complexity: $O(n^4)$
• Precision of analysis:
  – if the concurrent threads do not (pointer-)interfere, then this algorithm gives the same result as the “ideal algorithm”
Experimental Results
• Implementation:
  – SUIF infrastructure; Cilk benchmarks
• Benchmark characteristics:
Program    Lines of Code   Pointer Location Sets   Number of par Constructs   Mean Number of Iterations per par Construct   Description
Barnes     1149            125                     12                         2.00                                          Barnes-Hut N-body Simulation
Block      342             9                       13                         1.00                                          Blocked Matrix Multiply
Cholesky   932             29                      109                        1.83                                          Sparse Cholesky Factorization
Cilksort   499             14                      8                          1.00                                          Parallel Mergesort
Ck         505             38                      3                          1.00                                          Checkers Program
Fft        3255            335                     182                        1.73                                          Fast Fourier Transform
Fib        53              0                       1                          1.00                                          Fibonacci Calculation
Game       195             8                       3                          1.00                                          Simple Game
Heat       360             12                      8                          1.62                                          Heat Diffusion on Mesh
Knapsack   122             6                       1                          1.00                                          Knapsack, Branch and Bound
Knary      114             0                       1                          1.00                                          Synthetic Benchmark
Lu         594             13                      10                         1.00                                          Heat Diffusion on Mesh
Magic      965             74                      24                         1.00                                          Magic Squares
Mol        4478            387                     99                         1.18                                          Viral Protein Simulation
Notemp     341             6                       15                         1.00                                          Blocked Matrix Multiply
Pousse     1379            118                     9                          1.22                                          Pousse Game Program
Queens     106             3                       8                          2.25                                          N Queens Program
Space      458             13                      15                         1.00                                          Blocked Matrix Multiply
Precision Measurements
• Pointer values at load/store:
  – usually unique target: 83% of the loads, 88% of the stores
  – few potentially uninitialized pointers
  – very few pointers with more than two targets
• Comparison:
  – Multithreaded, Interleaved, Sequential: $MT \supseteq Interleaved \supseteq Seq$
  – results: Multithreaded = Sequential
  – conclusion: Multithreaded = Interleaved
[Figure: number of load instructions (axis 0 to 12000) by number of target location sets (1 to 4), split into definitely initialized and potentially uninitialized.]
[Figure: number of store instructions (axis 0 to 4000) by number of target location sets (1 to 4), split into definitely initialized and potentially uninitialized.]
Applications
• Current Uses:
  – MIT RAW project
    • memory disambiguation for static promotion (ISCA 99)
    • C-to-silicon compiler generating small memories (FCCM 99)
  – automatic parallelization of divide-and-conquer algorithms (PPoPP 99)
• Future Uses:
  – data race detection in multithreaded programs
  – static elimination of array bounds checks
Future
• Multithreaded programs:
  – are becoming very common
  – are hard to debug
  – are hard to analyze
• The current algorithm:
  – gives precise MT pointer information
  – may be used as a foundation for other MT analyses
  – gives a framework for other MT analyses
Additional Slides
Challenging Benchmark Set
• Applications Heavily Optimized By Hand
  – Pousse - timed competition, won ICFP '98 contest
• Pointer Arithmetic
• Casts
• Divide and Conquer Algorithms
  – Recursion
  – Pointers Into Heap-Allocated Arrays
  – Pointer-Based Data Structures (octrees, hash tables, ...)
  – Recursive Linked Data Structures Allocated On Stack
Related Work
• Pointer analysis:
  – existing pointer analyses focus on sequential programs: [LR92], [LRZ93], [CBC93], [EGH94], [Ruf95], [WL95], [And94], [Ste96], [SH97]
  – flow-sensitive vs. flow-insensitive analysis
  – context-sensitive vs. context-insensitive analysis
• Multithreaded program analysis:
  – relatively unexplored field
  – flow-sensitive analysis:
    • dataflow framework for bitvector problems [KSV96]
    • does not apply to pointer analysis
  – flow-insensitive analysis:
    • trivially models the interleaving semantics of concurrent threads
    • locality analysis [ZH97] (uses type-inference techniques)