Semi-Sparse Flow-Sensitive Pointer Analysis. Ben Hardekopf and Calvin Lin, The University of Texas at Austin. POPL ’09. Simplified by Eric Villasenor.


Page 1: Semi-Sparse Flow-Sensitive Pointer Analysis

Semi-Sparse Flow-Sensitive Pointer Analysis

Ben Hardekopf, Calvin Lin
The University of Texas at Austin

POPL ’09
Simplified by Eric Villasenor

Page 2: Semi-Sparse Flow-Sensitive Pointer Analysis

Overview

• Background
• Flow-Sensitive Analysis
• Semi-Sparse Flow-Sensitive Analysis
• Questions

Page 3: Semi-Sparse Flow-Sensitive Pointer Analysis

Uses

• Pointer analysis gathers points-to information; more precise information enables more optimizations
• Flow sensitivity is beneficial for:
  – Security analysis
  – Deep error checking
  – Hardware synthesis
  – Multi-threaded programs

Page 4: Semi-Sparse Flow-Sensitive Pointer Analysis

Types of Analysis

• Types of pointer analysis
  – Flow-sensitive
    • Considers statement ordering in the code
    • Little progress made in scalability
  – Context-sensitive
    • Considers procedure calls
    • Good progress in scalability
• The two give complementary improvements in precision

Page 5: Semi-Sparse Flow-Sensitive Pointer Analysis

Analysis Tradeoffs

• Scalability vs. precision
  – It takes time to analyze code
  – It takes memory to hold the analysis results
• Insensitive vs. sensitive
  – Insensitive: less complex, less precise
  – Sensitive: more complex, more precise
• Larger pieces of code are generally more complex to analyze

Page 6: Semi-Sparse Flow-Sensitive Pointer Analysis

Traditional Flow-Sensitive Analysis

• Lattice of dataflow facts
• Meet operator on lattice
• Transfer functions map lattice elements to other lattice elements
• Use CFG = <N,E>
  – N nodes (program points)
  – E edges (flow)

Page 7: Semi-Sparse Flow-Sensitive Pointer Analysis

Traditional Flow-Sensitive Analysis

• Iterative algorithm
  – Runs until convergence

• Adds successor nodes to work list when output set changes

• Propagates pointer information to all reachable nodes

• Prohibitive in memory and computation complexity
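The iterative scheme on this slide can be sketched as a generic work-list solver. This is a minimal illustration, not the authors' implementation: the function name `solve_forward`, the node names, and the encoding of facts as `(pointer, target)` pairs are all assumptions made for the example.

```python
from collections import deque

def solve_forward(nodes, edges, gen, kill):
    """Iterate OUT sets over a CFG until nothing changes (hypothetical helper)."""
    succs = {n: [] for n in nodes}
    preds = {n: [] for n in nodes}
    for src, dst in edges:
        succs[src].append(dst)
        preds[dst].append(src)

    out = {n: frozenset() for n in nodes}
    worklist = deque(nodes)
    while worklist:
        n = worklist.popleft()
        # Meet operator: union of the OUT sets of all predecessors.
        in_n = frozenset().union(*(out[p] for p in preds[n]))
        new_out = gen[n] | (in_n - kill[n])
        if new_out != out[n]:
            out[n] = new_out
            # Output changed, so successors must be revisited.
            worklist.extend(s for s in succs[n] if s not in worklist)
    return out

# Diamond CFG: n0 is `p = &a`, n1 is `p = &b`, n2 does nothing, n3 joins.
p_facts = frozenset({("p", "a"), ("p", "b")})
nodes = ["n0", "n1", "n2", "n3"]
edges = [("n0", "n1"), ("n0", "n2"), ("n1", "n3"), ("n2", "n3")]
gen = {"n0": frozenset({("p", "a")}), "n1": frozenset({("p", "b")}),
       "n2": frozenset(), "n3": frozenset()}
kill = {"n0": p_facts, "n1": p_facts, "n2": frozenset(), "n3": frozenset()}
out = solve_forward(nodes, edges, gen, kill)
```

Every node stores its own copy of the facts, including `n2`, which neither defines nor uses `p`. That per-node storage is the memory and time cost the slide calls prohibitive.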

Page 8: Semi-Sparse Flow-Sensitive Pointer Analysis

Contributions

• Two ideas
  – Semi-sparse analysis
  – Novel use of Binary Decision Diagrams
• Two new optimizations
  – Top-level pointer equivalence
  – Local points-to graph equivalence

Page 9: Semi-Sparse Flow-Sensitive Pointer Analysis

Static Single Assignment

• Def/use relation captured

• Let us use it to reduce information sent to nodes

w = a; x = b; y = &c; z = y; y = &d;

w1 = a1; x1 = b1; y1 = c1; z1 = y1; y2 = d1;

w = a; x = b; y = c; z = y; y = d;

w1 = a1; x1 = b1; y1 = ?; z1 = ?; y2 = ?;

Pointer Analysis SSA
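The renaming shown on this slide can be sketched for straight-line copy statements. This is a minimal sketch under stated assumptions: the helper `to_ssa` is hypothetical, it handles only `lhs = rhs` copies with no branches (so no phi functions are needed), and a variable read before any definition is treated as version 1.

```python
def to_ssa(statements):
    """Rename straight-line copies `lhs = rhs` so each name is defined once."""
    version = {}
    renamed = []
    for lhs, rhs in statements:
        # A variable read before any definition is assumed to be version 1.
        if rhs not in version:
            version[rhs] = 1
        rhs_name = f"{rhs}{version[rhs]}"
        # Each new definition of lhs gets a fresh version number.
        version[lhs] = version.get(lhs, 0) + 1
        renamed.append((f"{lhs}{version[lhs]}", rhs_name))
    return renamed

program = [("w", "a"), ("x", "b"), ("y", "c"), ("z", "y"), ("y", "d")]
ssa = to_ssa(program)
for lhs, rhs in ssa:
    print(f"{lhs} = {rhs};")
```

Because every use names exactly one definition, the def/use relation can be read directly off the renamed program.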

Page 10: Semi-Sparse Flow-Sensitive Pointer Analysis

Partial Static Single Assignment

• Two classes of variable
  – Address-taken
    • In memory
    • Use ALLOC/STORE
  – Top-level
    • Never expose address
    • Not dynamically allocated

int a, b, *c, *d;

int* w = &a;
int* x = &b;
int** y = &c;
int** z = y;

c = 0;
*y = w;
*z = x;
y = &d;
z = y;
*y = w;
*z = x;

w1 = ALLOCa

x1 = ALLOCb

y1 = ALLOCc

z1 = y1

STORE 0 y1

STORE w1 y1

STORE x1 z1

y2 = ALLOCd

z2 = y2

STORE w1 y2

STORE x1 z2
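As a rough illustration of the two variable classes, the sketch below splits the variables of the C example by whether their address is ever taken with `&`. The helper `classify` and its regex-based scan are assumptions made for the example; a real compiler would decide this on its IR rather than on source text.

```python
import re

def classify(variables, statements):
    """Split variables into (top_level, address_taken) by scanning for `&name`."""
    address_taken = set()
    for stmt in statements:
        # Assumption: `&` only appears as the address-of operator here.
        address_taken.update(re.findall(r"&\s*([A-Za-z_]\w*)", stmt))
    top_level = set(variables) - address_taken
    return top_level, address_taken

variables = ["a", "b", "c", "d", "w", "x", "y", "z"]
statements = [
    "w = &a", "x = &b", "y = &c", "z = y",
    "c = 0", "*y = w", "*z = x", "y = &d", "z = y", "*y = w", "*z = x",
]
top_level, address_taken = classify(variables, statements)
print(sorted(top_level))      # variables that can be put in SSA form
print(sorted(address_taken))  # variables reached only through memory
```

In this example `w`, `x`, `y`, and `z` come out as top-level, which matches the versioned names `w1`, `x1`, `y1`, `y2`, `z1`, `z2` in the instruction list, while `a`, `b`, `c`, and `d` appear only as ALLOC targets.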

Page 11: Semi-Sparse Flow-Sensitive Pointer Analysis

Partial Static Single Assignment

• Advantages
  – Single global points-to graph for top-level variables
    • They have the same pointer information over the entire program
  – Top-level def/use info immediately available
  – Local points-to graphs only contain address-taken information

Page 12: Semi-Sparse Flow-Sensitive Pointer Analysis

Dataflow Graph

• DFG: combination of sparse evaluation graph (SEG) and def-use chains
  – Optimized version of CFG
    • Omits nodes that neither define nor use pointer info
  – Connects address-taken statements so defs reach uses
• Two-stage construction
  – First stage considers DEFadr and USEadr
  – Second stage connects top-level defs to uses

Page 13: Semi-Sparse Flow-Sensitive Pointer Analysis

Dataflow Graph

Inst Type   Example       Def-Use Info
ALLOC       x = ALLOCi    DEFtop
COPY        x = y z       DEFtop, USEtop
LOAD        x = *y        DEFtop, USEtop, USEadr
STORE       *x = y        USEtop, DEFadr, USEadr
CALL        x = foo(y)    DEFtop, USEtop, DEFadr, USEadr
RET         return x      USEtop, USEadr
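The def-use table can be read as a lookup that decides which instructions belong to the SEG portion of the graph: only those that define or use address-taken information. The sketch below encodes the table as a dictionary; the names `DEF_USE` and `touches_address_taken` are hypothetical, and the instruction list is the partial SSA example from the earlier slide.

```python
# Def-use info per instruction type, copied from the table.
DEF_USE = {
    "ALLOC": {"DEFtop"},
    "COPY": {"DEFtop", "USEtop"},
    "LOAD": {"DEFtop", "USEtop", "USEadr"},
    "STORE": {"USEtop", "DEFadr", "USEadr"},
    "CALL": {"DEFtop", "USEtop", "DEFadr", "USEadr"},
    "RET": {"USEtop", "USEadr"},
}

def touches_address_taken(inst_type):
    """True if the instruction defines or uses address-taken pointer info."""
    return bool(DEF_USE[inst_type] & {"DEFadr", "USEadr"})

program = [
    ("ALLOC", "w1 = ALLOCa"), ("ALLOC", "x1 = ALLOCb"),
    ("ALLOC", "y1 = ALLOCc"), ("COPY", "z1 = y1"),
    ("STORE", "STORE 0 y1"), ("STORE", "STORE w1 y1"),
    ("STORE", "STORE x1 z1"), ("ALLOC", "y2 = ALLOCd"),
    ("COPY", "z2 = y2"), ("STORE", "STORE w1 y2"),
    ("STORE", "STORE x1 z2"),
]
# Only these nodes need IN/OUT points-to graphs for address-taken variables.
seg_nodes = [text for kind, text in program if touches_address_taken(kind)]
print(seg_nodes)
```

The ALLOC and COPY instructions drop out of this list because they touch only top-level variables, whose defs and uses are already linked by SSA form.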

Page 14: Semi-Sparse Flow-Sensitive Pointer Analysis

Dataflow Graph

[Figure: the partial SSA instruction list from the previous example (w1 = ALLOCa through STORE x1 z2) shown next to its dataflow graph, with one node per instruction]

Page 15: Semi-Sparse Flow-Sensitive Pointer Analysis

Semi-Sparse Analysis

• Each function has a program statement work list
  – Initialized to statements that define variables
• Each program statement that uses or defines address-taken variables has two points-to graphs
  – IN = incoming address-taken info
  – OUT = outgoing address-taken info
• Global points-to graph holds pointer info for top-level variables
• Function work list holds functions waiting to be processed
  – Initialized to contain all functions in the program

Page 16: Semi-Sparse Flow-Sensitive Pointer Analysis

Semi-Sparse Analysis

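• Iterative algorithm
• Computes IN and OUT for all nodes until convergence
• $IN_k = \bigcup_{x \in pred(k)} OUT_x$
• $OUT_k = GEN_k \cup (IN_k - KILL_k)$
• KILL set determines strong or weak update
  – Strong update (precise): the location written through the left-hand side is known exactly, so its old points-to information is killed
  – Weak update (conservative): the left-hand side may refer to several locations, so nothing is killed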
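The strong/weak distinction can be sketched as a transfer function for a single STORE. This is a simplified illustration, not the paper's algorithm: the function `store_transfer` is hypothetical, and it assumes a strong update is safe whenever the pointer has exactly one target, ignoring cases such as heap objects that stand for many concrete locations.

```python
def store_transfer(in_graph, top_pts, ptr, val):
    """OUT graph for `STORE val ptr` (that is, *ptr = val).

    in_graph: points-to sets of address-taken variables (the IN graph).
    top_pts:  global points-to sets of top-level variables.
    """
    targets = top_pts[ptr]
    new_vals = top_pts[val]
    out = {loc: set(pts) for loc, pts in in_graph.items()}
    # Assumption: a single target means the written location is known exactly.
    strong = len(targets) == 1
    for loc in targets:
        if strong:
            out[loc] = set(new_vals)  # KILL the old set, GEN the new one
        else:
            out.setdefault(loc, set()).update(new_vals)  # KILL is empty
    return out

top_pts = {"w1": {"a"}, "x1": {"b"}, "y1": {"c"}, "p": {"c", "d"}}
in_graph = {"c": {"a"}}
strong_out = store_transfer(in_graph, top_pts, "y1", "x1")
weak_out = store_transfer(in_graph, top_pts, "p", "x1")
print(strong_out)  # c now points only to b
print(weak_out)    # c keeps a and gains b; d gains b
```

Here `p` is an extra top-level pointer invented for the example so that a weak update can be shown next to the strong one.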

Page 17: Semi-Sparse Flow-Sensitive Pointer Analysis

Top-Level Pointer Equivalence

• Optimization
  – Reduces number of top-level variables in DFG
  – x equiv y iff x and y have identical points-to sets (for every z, x points to z iff y points to z)
• Key idea
  – Replace variables that have identical points-to sets with a single set representative
  – A member of the set is selected as the representative
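The key idea can be sketched by grouping variables on their points-to sets and rewriting instructions to use one representative per group. This is an illustration only: the points-to sets are supplied by hand rather than computed, the helper `representatives` is hypothetical, and picking the alphabetically smallest member is an arbitrary choice.

```python
from collections import defaultdict

def representatives(points_to):
    """Map each variable to one representative of its equivalence class."""
    classes = defaultdict(list)
    for var, targets in points_to.items():
        classes[frozenset(targets)].append(var)
    rep = {}
    for members in classes.values():
        chosen = min(members)  # arbitrary but deterministic choice
        for var in members:
            rep[var] = chosen
    return rep

points_to = {
    "w1": {"a"}, "x1": {"b"},
    "y1": {"c"}, "z1": {"c"},
    "y2": {"d"}, "z2": {"d"},
}
rep = representatives(points_to)

def rewrite(inst):
    """Replace every variable in an instruction with its representative."""
    return " ".join(rep.get(tok, tok) for tok in inst.split())

print(rewrite("STORE x1 z1"))
print(rewrite("STORE x1 z2"))
```

After the rewrite, the copies `z1 = y1` and `z2 = y2` become self-copies and can be dropped, which is how the optimization shrinks the DFG.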

Page 18: Semi-Sparse Flow-Sensitive Pointer Analysis

Top-Level Pointer Equivalence

[Figure: the example before and after the optimization. z1 is equivalent to y1 and z2 to y2, so STORE x1 z1 becomes STORE x1 y1, STORE x1 z2 becomes STORE x1 y2, and the copies z1 = y1 and z2 = y2 are removed.]

Resulting instructions:

w1 = ALLOCa

x1 = ALLOCb

y1 = ALLOCc

STORE 0 y1

STORE w1 y1

STORE x1 y1

y2 = ALLOCd

STORE w1 y2

STORE x1 y2

Page 19: Semi-Sparse Flow-Sensitive Pointer Analysis

Local Points-to Graph Equivalence

• Optimization
  – Eliminates nodes in DFG with identical points-to graphs
    • Share a single points-to graph
  – Used in SEG portion of graph
• Key idea
  – Non-preserving nodes
    • Only STORE and CALL modify address-taken pointer info
  – Preserving nodes
    • Propagate pointer info to other nodes

Page 20: Semi-Sparse Flow-Sensitive Pointer Analysis

Local Points-to Graph Equivalence

• Process takes $O(n^3)$
  – n is the number of nodes in the SEG portion of the DFG
    • (DEFadr or USEadr)
• Further optimized to only use STORE
  – 0.1% precision loss
• Similar to RTL
  – STORE to STORE collapsible

[Figure: STORE, LOAD, and RET nodes with their points-to graphs, and the collapsed points-to graph]

Page 21: Semi-Sparse Flow-Sensitive Pointer Analysis

BDDs

• Compressed representation of set relations
  – Operations performed without decompression
• Set operations can be performed in polynomial time
• Useful to store CFG and points-to graph
• Transfer functions are BDD operations
  – Set operations

Page 22: Semi-Sparse Flow-Sensitive Pointer Analysis

Semi-Sparse Symbolic Analysis

• Encode top-level points-to information in BDD
  – Most variables are top-level
• BDDs cannot operate on individual statements efficiently
  – Use iterative algorithm for address-taken points-to information
    • Strong and weak updates
    • Allows BDD to operate efficiently

Page 23: Semi-Sparse Flow-Sensitive Pointer Analysis

Results of the Analysis

Pointer Information   Semi-Sparse Flow-Sensitive (SS)   Semi-Sparse Flow-Sensitive Optimized (SSO)   SSO vs SS
Representation        (against baseline)                (against baseline)                           (against SS)

bitmap                75x faster                        183x faster                                  2.5x faster
                      26x less memory                   47x less memory                              6.8x less memory

BDD                   44.8x faster                      114x faster                                  4.4x faster
                      1.4x less memory                  1.4x less memory                             1.03x less memory

Page 24: Semi-Sparse Flow-Sensitive Pointer Analysis

Questions