pointer analysis

41
Pointer Analysis G. Ramalingam Microsoft Research, India

Upload: lars-sharp

Post on 02-Jan-2016

36 views

Category:

Documents


1 download

DESCRIPTION

Pointer Analysis. G. Ramalingam Microsoft Research, India. A Constant Propagation Example. x = 3; y = 4; z = x + 5;. x is always 3 here can replace x by 3 and replace x+5 by 8 and so on. A Constant Propagation Example With Pointers. x = 3; *p = 4; z = x + 5;. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Pointer Analysis

Pointer Analysis

G. RamalingamMicrosoft Research, India

Page 2: Pointer Analysis

A Constant Propagation Example

x = 3;

y = 4;

z = x + 5;

• x is always 3 here• can replace x by 3• and replace x+5 by 8• and so on

Page 3: Pointer Analysis

A Constant Propagation ExampleWith Pointers

x = 3;

*p = 4;

z = x + 5;

• Is x always 3 here?

Page 4: Pointer Analysis

p = &y; x = 3; *p = 4; z = x + 5;

A Constant Propagation ExampleWith Pointers

x is always 3

p = &x; x = 3; *p = 4; z = x + 5;

if (?) p = &x; else p = &y; x = 3; *p = 4; z = x + 5;

x is always 4

x may be 3 or 4(i.e., x is unknown in our lattice)

pointers affectmost program analyses

Page 5: Pointer Analysis

p = &y; x = 3; *p = 4; z = x + 5;

A Constant Propagation ExampleWith Pointers

p alwayspoints-to

y

p = &x; x = 3; *p = 4; z = x + 5;

if (?) p = &x; else p = &y; x = 3; *p = 4; z = x + 5;

p alwayspoints-to

x

p may point-to x or y

Page 6: Pointer Analysis

Points-to Analysis

• Determine the set of targets a pointer variable could point-to (at different points in the program)– “p points-to x” == “*p has l-value x”– targets could be variables or locations in

the heap (dynamic memory allocation)• p = &x;• p = new Foo(); or p = malloc (…);

–must-point-to vs. may-point-to

Page 7: Pointer Analysis

A Constant Propagation ExampleWith Pointers

*q = 3;

*p = 4;

z = *q + 5; what values

canthis take?

Can *p denote thesame location as *q?

Page 8: Pointer Analysis

More Terminology

• *p and *q are said to be aliases (in a given concrete state) if they represent the same location

• Alias analysis– determine if a given pair of references

could be aliases at a given program point

– *p may-alias *q– *p must-alias *q

Page 9: Pointer Analysis

Pointer Analysis

• Points-To Analysis– may-point-to– must-point-to

• Alias Analysis– may-alias– must-alias

Page 10: Pointer Analysis

Points-To Analysis:A Simple Example

p = &x;q = &y;if (?) { q = p;}x = &a;y = &b;z = *q;

Page 11: Pointer Analysis

Points-To Analysis:A Simple Example

p = &x;q = &y;if (?) { q = p;}x = &a;y = &b;z = *q;

p q x y z

null null null null null

x null null null null

x y null null null

x y null null null

x x null null null

x {x,y} null null null

x {x,y} a null null

x {x,y} a b null

x {x,y} a b {a,b}

p q x y z

null null null null null

x null null null null

x y null null null

x y null null null

x x null null null

x {x,y} null null null

x {x,y} a null null

x {x,y} a b null

p q x y z

null null null null null

x null null null null

x y null null null

x y null null null

x x null null null

x {x,y} null null null

x {x,y} a null null

p q x y z

null null null null null

x null null null null

x y null null null

x y null null null

x x null null null

x {x,y} null null null

p q x y z

null null null null null

x null null null null

x y null null null

x y null null null

x x null null null

p q x y z

null null null null null

x null null null null

x y null null null

x y null null null

p q x y z

null null null null null

x null null null null

x y null null null

p q x y z

null null null null null

x null null null null

p q x y z

null null null null null

p q x y z

Page 12: Pointer Analysis

Points-To Analysisx = &a;y = &b;if (?) { p = &x;} else { p = &y;}

*x = &c;*p = &c;

How should we handlethis statement?

x: a y: bp: {x,y}

a: null

x: a y: bp: {x,y}

a: c

x: {a,c}

y: {b,c}

p: {x,y}

a: c

Strong update

Weak updateWeak update

Page 13: Pointer Analysis

Questions

• When is it correct to use a strong update? A weak update?

• Is this points-to analysis precise?

• What does it mean to say– p must-point-to x at pgm point u– p may-point-to x at pgm point u– p must-not-point-to x at u– p may-not-point-to x at u

Page 14: Pointer Analysis

Points-To Analysis, Formally

• We must formally define what we want to compute before we can answer many such questions

Page 15: Pointer Analysis

Static Program Analysis

• A static program analysis computes approximate information about the runtime behavior of a given program1. The set of valid programs is defined by the

programming language syntax2. The runtime behavior of a given program

is defined by the programming language semantics

3. The analysis problem defines what information is desired

4. The analysis algorithm determines what approximation to make

Page 16: Pointer Analysis

Programming Language: Syntax

• A program consists of– a set of variables Var– a directed graph (V,E,entry) with a distinguished

entry vertex, with every edge labelled by a primitive statement

• A primitive statement is of the form• x = null• x = y• x = *y• x = &y;• *x = y• skip(where x and y are variables in Var)

Omitted (for now)• Dynamic memory allocation• Pointer arithmetic• Structures and fields• Procedures

Page 17: Pointer Analysis

Example Program

x = &a;y = &b;if (?) { p = &x;} else { p = &y;}

*x = &c;*p = &c;

1

2

3

4 5

x = &a

y = &b

6

7

8

*x = &c

*p = &c

p = &yp = &x

skipskip

Vars = {x,y,p,a,b,c}

Page 18: Pointer Analysis

Programming Language:Operational Semantics

• Operational semantics == an interpreter (defined mathematically)

• State– Data-State ::= Var -> (Var U {null})– PC ::= V (the vertex set of the CFG)– Program-State ::= PC x Data-State

• Initial-state:– (entry, \x. null)

Page 19: Pointer Analysis

Example States

1

2

3

4 5

x = &a

y = &b

6

7

8

*x = &c

*p = &c

p = &yp = &x

skipskip

Vars = {x,y,p,a,b,c}x: N, y:N, p:N, a:N, b:N, c:N

Initial data-state

Initial program-statex: N, y:N, p:N, a:N, b:N, c:N

<1, >

Page 20: Pointer Analysis

Programming Language:Operational Semantics

• Meaning of primitive statements– CS[stmt] : Data-State -> Data-State

• CS[ x = y ] s = s[x s(y)]• CS[ x = *y ] s = s[x s(s(y))]• CS[ *x = y ] s = s[s(x) s(y)]• CS[ x = null ] s = s[x null]• CS[ x = &y ] s = s[x y]

must say what happens if null is dereferenced

Page 21: Pointer Analysis

Programming Language:Operational Semantics

• Meaning of program– a transition relation on program-states– Program-State X Program-State– state1 state2 means that the execution of

some edge in the program can transform state1 into state2

• Defining

– (u,s) (v,s’) iff the program contains a control-flow edge u->v labelled with a statement stmt such that M[stmt]s = s’

Page 22: Pointer Analysis

Programming Language:Operational Semantics

• A sequence of states s1s2 … sn is said to be an execution (of the program) iff – s1 is the Initial-State

– si si+1 for 1 <= I < n

• A state s is said to be a reachable state iff there exists some execution s1s2 … sn

is such that sn = s.

• Define RS(u) = { s | (u,s) is reachable }

Page 23: Pointer Analysis

Programming Language:Operational Semantics

• A sequence of states s1s2 … sn is said to be an execution (of the program) iff – s1 is the Initial-State

– si | si+1 for 1 <= I < n

• A state s is said to be a reachable state iff there exists some execution s1s2 … sn

is such that sn = s.

• Define RS(u) = { s | (u,s) is reachable }

All of this formalism for this

one definition

Page 24: Pointer Analysis

Ideal Points-To Analysis:Formal Definition

• Let u denote a vertex in the CFG

• Define IdealMustPT (u) to be { (p,x) | forall s in RS(u). s(p) == x }

• Define IdealMayPT (u) to be{ (p,x) | exists s in RS(u). s(p) == x }

Page 25: Pointer Analysis

May-Point-To Analysis:Formal Requirement Specification

For every vertex u in the CFG,compute a set R(u) such that

R(u) { (p,x) | $sÎRS(u). s(p) == x }

Compute R: V -> 2Vars’ such thatR(u) IdealMayPT(u)

(where Var’ = Var U {null})

May Point-To Analysis

Page 26: Pointer Analysis

May-Point-To Analysis:Formal Requirement Specification

• An algorithm is said to be correct if the solution R it computes satisfies

"uÎV. R(u) IdealMayPT(u)• An algorithm is said to be precise if the solution R it

computes satisfies"uÎV. R(u) = IdealMayPT(u)

• An algorithm that computes a solution R1 is said to be more precise than one that computes a solution R2 if

"uÎV. R1(u) Í R2(u)

Compute R: V -> 2Vars’ such thatR(u) IdealMayPT(u)

Page 27: Pointer Analysis

Back To OurMay-Point-To Algorithm

p = &x;q = &y;if (?) { q = p;}x = &a;y = &b;z = *q;

p q x y z

null null null null null

x null null null null

x y null null null

x y null null null

x x null null null

x {x,y} null null null

x {x,y} a null null

x {x,y} a b null

x {x,y} a b {a,b}

Page 28: Pointer Analysis

(May-Point-To Analysis)Algorithm A

• Is this algorithm correct?• Is this algorithm precise?

• Let’s first completely and formally define the algorithm.

Page 29: Pointer Analysis

Algorithm A: A Formal DefinitionThe “Data Flow Analysis” Recipe

• Define semi-lattice of abstract-values– AbsDataState ::= Var -> 2Var’

– f1 f2 = \x. (f1 (x) È f2 (x))

– bottom = \x.{}

• Define initial abstract-value– InitialAbsState = \x. {null}

• Define transformers for primitive statements

• AS[stmt] : AbsDataState -> AbsDataState

Page 30: Pointer Analysis

Algorithm A: A Formal DefinitionThe “Data Flow Analysis” Recipe

• Let st(v,u) denote stmt on edge v->u

• Compute the least-fixed-point of the following “dataflow equations”– x(entry) = InitialAbsState

– x(u) = v->u AS(st(v,u)) x(v)

v w

u

st(w,u)st(v,u)

x(v) x(w)

x(u)

Page 31: Pointer Analysis

Algorithm A:The Transformers

• Abstract transformers for primitive statements– AS[stmt] : AbsDataState -> AbsDataState

• AS[ x = y ] s = s[x s(y)]• AS[ x = null ] s = s[x {null}]• AS[ x = &y ] s = s[x {y}]• AS[ x = *y ] s = s[x s*(s(y))] where s*({v1,…,vn}) = s(v1) È … È s(vn)

• AS[ *x = y ] s = ???

Page 32: Pointer Analysis

Correctness & Precision

• We have a complete & formal definition of the problem.

• We have a complete & formal definition of a proposed solution.

• How do we reason about the correctness & precision of the proposed solution?

Page 33: Pointer Analysis

Enter: The French Recipe(Abstract Interpretation)

Abstract Domain

• A semi-lattice (A, b)• Transfer Functions• For every statement

st,AS[st] : A -> A

Concrete (Collecting) Domain

• A semi-lattice (2C, b)• Transfer Functions• For every statement st,

CS*[st] : 2C -> 2C

2Data-State 2Var x Var’

Concrete Domain• Concrete states: C• Semantics: For every statement st,

CS[st] : C -> C

g

a

Page 34: Pointer Analysis

Points-To Analysis(Abstract Interpretation)

a(Y) = { (p,x) | exists s in Y. s(p) == x }

RS(u)

2Data-State 2Var x Var’

IdealMayPT(u)

MayPT(u)

Íaa

IdealMayPT (u) = a ( RS(u) )

Page 35: Pointer Analysis

Approximating Transformers:Correctness Criterion

C A

correctly approximated by

c1

c2

f

a1

a2

f#

correctly approximated by

c is said to be correctly approximated by a

iffa(c) Í a

Page 36: Pointer Analysis

Approximating Transformers:Correctness Criterion

C A

c1

c2

f

a1

a2

f#

concretizationg

abstractiona

requirement:f#(a1) ≥ a (f( g(a1))

Page 37: Pointer Analysis

Concrete Transformers

• CS[stmt] : Data-State -> Data-State

• CS[ x = y ] s = s[x s(y)]• CS[ x = *y ] s = s[x s(s(y))]• CS[ *x = y ] s = s[s(x) s(y)]• CS[ x = null ] s = s[x null]

• CS*[stmt] : 2Data-State -> 2Data-State

• CS*[st] X = { CS[st]s | s Î X }

Page 38: Pointer Analysis

Abstract Transformers

• AS[stmt] : AbsDataState -> AbsDataState

• AS[ x = y ] s = s[x s(y)]• AS[ x = null ] s = s[x {null}]• AS[ x = *y ] s = s[x s*(s(y))] where s*({v1,…,vn}) = s(v1) È … È s(vn)

• AS[ *x = y ] s = ???

Page 39: Pointer Analysis

Algorithm A: TranformersWeak/Strong Update

x: {&y} y: {&x,&z} z: {&a}

x: &b y: &x z: &a

x: &y y: &z z: &bx: {&y,&b} y: {&x,&z} z: {&a,&b}

x: &y y: &x z: &a

x: &y y: &z z: &a

*y = &b;f#*y = &b;f

a

g

Page 40: Pointer Analysis

Algorithm A: TranformersWeak/Strong Update

x: {&y} y: {&x,&z} z: {&a}

x: &y y: &b z: &a

x: &y y: &b z: &ax: {&y} y: {&b} z: {&a}

x: &y y: &x z: &a

x: &y y: &z z: &a

*x = &b;f#*x = &b;f

a

g

Page 41: Pointer Analysis

Algorithm A: TransformersWeak/Strong Update

• Transformer for “*p = q”