Swerve: Semester in Review

Post on 21-Dec-2015


Topics

Symbolic pointer analysis
Model checking
– C programs
– Abstract counterexamples
Symbolic simulation and execution
Cousot: the Galois connection

Pointer Analysis (in 2001)

P.A. Terminology

Context-sensitivity: do we take calling context into account?
– Doing so leads to very precise but very non-polynomial algorithms
Flow-sensitivity: sensitive to control flow
– Equality = unification-based = Steensgaard
  – Almost linear, but not very precise
– Subset = inclusion-based = Andersen
  – Polynomial but more precise
– Sensitive analyses even more expensive

P.A.: Flow-sensitivity

P.A.: Problem Formulation

Phase one: find constraints in the code
– Depends on sensitivities (context, flow)
– Examine stores, loads, etc.

Phase two: solve system of constraints for the complete points-to relation
– Explicit: Steensgaard using union-find
– Implicit: Andersen-style using BDDs
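To make the "explicit, union-find" option concrete, here is a minimal sketch of a unification-based solver. It is not Steensgaard's full algorithm: it assumes only two statement kinds (`p = &x` and `p = q`), and the class and method names are hypothetical.

```python
class Steensgaard:
    """Toy unification-based points-to analysis (address-of and copy only)."""

    def __init__(self):
        self.parent = {}   # union-find parent pointers
        self.target = {}   # class representative -> node that class points to
        self.fresh = 0     # counter for placeholder target nodes

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        work = [(a, b)]
        while work:
            a, b = work.pop()
            a, b = self.find(a), self.find(b)
            if a == b:
                continue
            self.parent[b] = a
            tb = self.target.pop(b, None)
            if tb is not None:
                if a in self.target:
                    # Both classes had a target: the targets must be unified too.
                    work.append((self.target[a], tb))
                else:
                    self.target[a] = tb

    def pts(self, x):
        """Representative of the single class that x points to."""
        r = self.find(x)
        if r not in self.target:
            self.target[r] = ("tmp", self.fresh)  # placeholder target
            self.fresh += 1
        return self.find(self.target[r])

    def address_of(self, p, x):   # p = &x
        self.union(self.pts(p), x)

    def copy(self, p, q):         # p = q
        self.union(self.pts(p), self.pts(q))

    def pointees(self, p, names):
        t = self.pts(p)
        return {n for n in names if self.find(n) == t}


s = Steensgaard()
s.address_of("p", "x")
s.address_of("q", "y")
s.copy("p", "q")
print(s.pointees("q", ["p", "q", "x", "y"]))
```

After `p = q` the targets of `p` and `q` are merged, so `q` is reported as possibly pointing to `x` as well as `y`. An inclusion-based analysis would keep `q` pointing only to `y`, which is the precision gap between the equality and subset formulations.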

Pointer Analysis Example

h1: v1 = new Object();
h2: v2 = new Object();
v1.f = v2;
v3 = v1.f;

Input Relations
vPointsTo(v1, h1)
vPointsTo(v2, h2)
Store(v1, f, v2)
Load(v1, f, v3)

Output Relations
hPointsTo(h1, f, h2)
vPointsTo(v3, h2)

[Figure: points-to graph with edges v1 → h1, v2 → h2, v3 → h2, and h1 → h2 labeled f]
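The output relations can be reproduced with a small fixpoint computation over explicit sets. This is only a sketch: the store rule is the one stated later in the deck, while the load rule (a load through `base.f` copies whatever `base`'s objects hold in `f`) is assumed here because the example implies it. The function name `solve` is hypothetical.

```python
def solve(v_points_to, stores, loads):
    """Iterate the store and load rules until no new facts appear."""
    vpt = set(v_points_to)   # (variable, heap object)
    hpt = set()              # (heap object, field, heap object)
    changed = True
    while changed:
        # Store(v1, f, v2), vPointsTo(v1, h1), vPointsTo(v2, h2) => hPointsTo(h1, f, h2)
        new_hpt = {(h1, f, h2)
                   for (v1, f, v2) in stores
                   for (va, h1) in vpt if va == v1
                   for (vb, h2) in vpt if vb == v2}
        # Load(base, f, dst), vPointsTo(base, h1), hPointsTo(h1, f, h2) => vPointsTo(dst, h2)
        new_vpt = {(dst, h2)
                   for (base, f, dst) in loads
                   for (vb, h1) in vpt if vb == base
                   for (ha, g, h2) in hpt if ha == h1 and g == f}
        changed = not (new_hpt <= hpt and new_vpt <= vpt)
        hpt |= new_hpt
        vpt |= new_vpt
    return vpt, hpt


vpt, hpt = solve(
    v_points_to={("v1", "h1"), ("v2", "h2")},
    stores={("v1", "f", "v2")},
    loads={("v1", "f", "v3")},
)
print(sorted(hpt))
print(sorted(vpt))
```

The first pass derives the heap edge from the store, the second pass uses that edge to resolve the load, and the third pass finds nothing new and stops.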

Zhu: Symbolic P.A.

Points-to relation can be huge, but BDDs are great at implicitly representing relations

Berndl et al: Symbolic P.A.

Subset-based formulation using BDDs
Variable ordering experiments
– Sets of heap objects (“pointed to”) tend to be large and regular: putting them together at the end of the ordering helps
– Interleaving the bits for sets of variables (“pointers”) helps a little
– In general, important to partition the bits of the different sets in the relations
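Why ordering matters at all can be seen on a textbook function rather than on a points-to relation. The sketch below is a toy, not the experiment from the slides: it counts the internal nodes of a reduced ordered BDD by brute force over truth tables, and the function `pairs` and both orderings are chosen purely for illustration.

```python
from itertools import product


def robdd_size(f, order):
    """Number of internal nodes in the reduced ordered BDD of f under `order`.

    f takes a dict {variable name: 0 or 1}. Brute force, so toy sizes only.
    """
    total = 0
    for level, var in enumerate(order):
        rest = order[level + 1:]
        distinct = set()
        for prefix in product((0, 1), repeat=level):
            env = dict(zip(order, prefix))

            def table(bit):
                # Truth table of the cofactor with `var` fixed to `bit`.
                rows = []
                for tail in product((0, 1), repeat=len(rest)):
                    e = dict(env)
                    e[var] = bit
                    e.update(zip(rest, tail))
                    rows.append(f(e))
                return tuple(rows)

            low, high = table(0), table(1)
            if low != high:              # cofactor really depends on `var`
                distinct.add((low, high))
        total += len(distinct)           # one node per distinct subfunction
    return total


def pairs(e):
    # (x0 AND y0) OR (x1 AND y1) OR (x2 AND y2)
    return int(any(e[f"x{i}"] and e[f"y{i}"] for i in range(3)))


interleaved = ["x0", "y0", "x1", "y1", "x2", "y2"]
separated = ["x0", "x1", "x2", "y0", "y1", "y2"]
print(robdd_size(pairs, interleaved), robdd_size(pairs, separated))
```

With related bits adjacent the BDD needs 6 internal nodes; with the x bits grouped ahead of the y bits it needs 14, and the gap grows exponentially with the number of pairs.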

Whaley & Lam: Datalog, bddbddb

All these symbolic pointer analyses are devoting a lot of implementation time to get the BDD part correct and fast
Datalog: a declarative language for expressing (possibly recursive) relations
bddbddb: a tool to convert Datalog operations (join, project, rename, recursion) into BDD operations
Points-to analyses can now be described much more concisely in Datalog

Inference Rule in Datalog

Stores:
v1.f = v2;

hPointsTo(h1, f, h2) :- Store(v1, f, v2),
                        vPointsTo(v1, h1),
                        vPointsTo(v2, h2).

[Figure: points-to graph with edges v1 → h1, v2 → h2, and h1 → h2 labeled f]
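The store rule is a chain of rename, join, and project steps. The sketch below runs those steps on plain Python lists of dicts instead of BDDs; the column names (`base`, `field`, `src`, `var`, `heap`) and the helper functions are assumptions for illustration and are not bddbddb's interface.

```python
def join(r, s):
    """Natural join: combine rows that agree on every shared column."""
    out = []
    for a in r:
        for b in s:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                out.append({**a, **b})
    return out


def rename(r, mapping):
    """Rename columns according to `mapping`, leaving the others alone."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in r]


def project(r, cols):
    """Keep only `cols`, as a set of tuples (duplicates collapse)."""
    return {tuple(row[c] for c in cols) for row in r}


store = [{"base": "v1", "field": "f", "src": "v2"}]
v_points_to = [{"var": "v1", "heap": "h1"}, {"var": "v2", "heap": "h2"}]

# Store(v1, f, v2) joined with vPointsTo(v1, h1)
step = join(store, rename(v_points_to, {"var": "base", "heap": "h1"}))
# ... joined with vPointsTo(v2, h2)
step = join(step, rename(v_points_to, {"var": "src", "heap": "h2"}))
# Head of the rule: hPointsTo(h1, f, h2)
h_points_to = project(step, ["h1", "field", "h2"])
print(h_points_to)
```

Each shared variable in the rule body becomes a shared column in a join, and variables that do not appear in the head are projected away.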

Whaley & Lam: With Context

Context sensitive analysis by cloning methods and doing a context insensitive analysis on the new call graph
Can use Datalog to express constraints necessary to determine the call graph
Cloned call graph is exponentially bigger, but clever encoding lets BDDs handle it well
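A rough way to see the blow-up: if every distinct call path to a method gets its own clone, the number of clones is the number of paths from the entry point. The sketch below counts those paths; it assumes an acyclic call graph (recursion would have to be collapsed first), and the method names are made up.

```python
from collections import defaultdict


def count_contexts(call_edges, root):
    """Number of call paths from `root` to each method.

    call_edges holds one (caller, callee) pair per call site; the graph
    is assumed to be acyclic.
    """
    callers = defaultdict(list)
    methods = {root}
    for caller, callee in call_edges:
        callers[callee].append(caller)
        methods.update((caller, callee))

    memo = {}

    def paths_to(m):
        if m not in memo:
            own = 1 if m == root else 0
            memo[m] = own + sum(paths_to(c) for c in callers[m])
        return memo[m]

    return {m: paths_to(m) for m in methods}


# A chain of 10 layers where each method calls the next from two call sites.
edges = [(f"m{i}", f"m{i + 1}") for i in range(10) for _ in range(2)]
contexts = count_contexts(edges, "m0")
print(contexts["m10"])
```

Twenty call sites already yield 1024 clones of the last method, which is why an explicit cloned graph is impractical and a compact symbolic encoding is needed.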

CBMC: Prototype Tool

[Figure: tool flow. The ANSI-C model and the VHDL/Verilog product each go through parsing and type checking and are converted to bit-vector (BV) logic trees; comparing the two gives a BV logic decision problem, which is converted to CNF and passed to Chaff.]

Equivalence reduced to bit vector logic decision problem
Tool requires decision procedure for large bit vector problems
BV problems are HUGE – directly passed to Chaff in CNF
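The shape of the decision problem can be shown at toy scale: ask whether any input makes the two descriptions disagree. The sketch below answers that by exhaustive enumeration over 4-bit inputs, standing in for the CNF encoding and SAT solver the tool uses; both functions and the bit width are hypothetical.

```python
from itertools import product

WIDTH = 4
MASK = (1 << WIDTH) - 1


def c_model(a, b):
    # Hypothetical reference model: 2 * (a + b) on WIDTH-bit unsigned values.
    return ((a + b) * 2) & MASK


def hw_product(a, b):
    # Hypothetical implementation: shift each operand, then add.
    return (((a << 1) & MASK) + ((b << 1) & MASK)) & MASK


def hw_buggy(a, b):
    # Same as above but with OR instead of ADD, so carries are lost.
    return ((a << 1) & MASK) | ((b << 1) & MASK)


def find_mismatch(f, g):
    """Return the first input on which f and g differ, or None if equivalent."""
    for a, b in product(range(1 << WIDTH), repeat=2):
        if f(a, b) != g(a, b):
            return (a, b)
    return None


print(find_mismatch(c_model, hw_product))
print(find_mismatch(c_model, hw_buggy))
```

A `None` result corresponds to an unsatisfiable decision problem (the designs are equivalent); a returned input pair is a counterexample. Enumeration stops being feasible almost immediately as widths grow, which is the reason for handing the problem to a SAT solver.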

Example

Explaining Counterexamples

Counterexamples provided by model checkers are often difficult to understand and locate within the code
Previous work: find a concrete execution “close to” the counterexample by some distance metric
This work: find an abstract execution, which provides more meaningful explanations

Distance Metric

Execution = (state, action) sequence
– State = (control location, predicate)
Metric: compare two executions a and b
– Don’t just compare a_i to b_i since small changes in control flow can yield “misalignment”
– Distance is defined as the number of changes (in predicates and actions) to convert a to b
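One way to read "number of changes to convert a to b" is as an edit distance over the two sequences. The sketch below assumes unit-cost insert, delete, and substitute operations on whole steps, which is coarser than counting predicate and action changes separately; the step tuples are invented for illustration.

```python
def distance(a, b):
    """Edit distance between two executions (sequences of hashable steps)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or keep
        prev = cur
    return prev[-1]


def positional(a, b):
    """Naive index-by-index comparison, for contrast."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


# Each step is (control location, predicate, action).
a = [("L1", "x>0", "inc"), ("L2", "x>0", "call"), ("L3", "x>0", "ret")]
b = [("L1", "x>0", "inc"), ("L1b", "x>0", "log"),
     ("L2", "x>0", "call"), ("L3", "x>0", "ret")]
print(distance(a, b), positional(a, b))
```

A single extra step in `b` shifts everything after it, so the index-by-index comparison reports 3 differences while the edit distance reports 1. That shift is the misalignment the slide warns about.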

Symbolic Execution

Quasi-symbolic simulation

Symbolic simulation externally, scalar values internally
– simulation run requires constant memory.
Key ideas
– Don’t compute exact value unless necessary.
  – many don’t cares in large designs.
– Trade time for memory.
  – Multiple runs to generate exact values.
Reliability of directed testing with efficiency closer to that of symbolic methods

Basic Algorithm

[Figure: network of AND gates driven by symbolic variables Xa, Xb, Xc. Labels: symbolic variable; Xa and its complement combine to 0 (“Obeys law of excluded middle!”); X, the “traditional” X value, as a conservative approximation; don’t care variables; don’t care logic.]
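The value domain suggested by the figure labels can be sketched as follows. This is a reading of the labels, not a confirmed description of the algorithm: each signal is 0, 1, a single tagged variable or its complement, or the untagged X, and all names are hypothetical.

```python
ZERO, ONE, X = "0", "1", "X"


def sym(name, positive=True):
    """Tagged unknown: the value of variable `name`, or its complement."""
    return (name, positive)


def q_not(v):
    if v == ZERO:
        return ONE
    if v == ONE:
        return ZERO
    if v == X:
        return X
    name, positive = v
    return (name, not positive)


def q_and(a, b):
    if a == ZERO or b == ZERO:
        return ZERO
    if a == ONE:
        return b
    if b == ONE:
        return a
    if a == X or b == X:
        return X
    if a == b:
        return a
    if a[0] == b[0]:
        return ZERO   # Xa AND NOT Xa: same variable, opposite polarity
    return X          # two different variables: conservative approximation


xa, xb = sym("a"), sym("b")
print(q_and(xa, q_not(xa)), q_and(xa, xb), q_and(xa, xa))
```

Every signal holds one scalar value, so memory per run stays constant. Precision is lost only when two different variables meet, where the result falls back to the untagged X.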

Decision Procedure

Variable selection heuristic: pick relevant variable by propagating from inputs.

[Figure: when the output is X, case split on a symbolic variable (a=0, a=1) and re-simulate the example gate network with inputs Xa and Xb under each branch. Caption: Test is Unsatisfiable!]

BDDs with Approximate Values

Generic Approximate BDD apply algorithm.

Approx_Apply(F,G)
  find top variable V
  compute L=left(F,G), R=right(F,G)
  if node(V,L,R) exists, return it
  else if (want_exact(V,L,R))
    create node (V,L,R)
    return node
  else /* approximate */
    return X
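A runnable version of the pseudocode, specialised to AND, might look like the sketch below. Several details are assumptions: `want_exact` is modelled as a simple node budget, nodes are plain `(var, low, high)` tuples, and the usual reduction step (return the child when both children are equal) is added.

```python
class ApproxBDD:
    """Toy BDD manager with a third terminal "X" for approximated results."""

    def __init__(self, budget):
        self.budget = budget   # assumed policy: max number of exact nodes
        self.nodes = set()     # unique table of (var, low, high) tuples

    def var(self, index):
        node = (index, 0, 1)
        self.nodes.add(node)   # input variables are always kept exact
        return node

    def want_exact(self):
        return len(self.nodes) < self.budget

    def apply_and(self, f, g):
        f_is_node, g_is_node = isinstance(f, tuple), isinstance(g, tuple)
        if not f_is_node and not g_is_node:
            if f == 0 or g == 0:
                return 0
            if f == 1 and g == 1:
                return 1
            return "X"
        # Top variable V: smallest index among the operands that are nodes.
        v = min(n[0] for n in (f, g) if isinstance(n, tuple))
        f0, f1 = (f[1], f[2]) if f_is_node and f[0] == v else (f, f)
        g0, g1 = (g[1], g[2]) if g_is_node and g[0] == v else (g, g)
        low = self.apply_and(f0, g0)     # L = left(F, G)
        high = self.apply_and(f1, g1)    # R = right(F, G)
        if low == high:
            return low                   # redundant test, no node needed
        node = (v, low, high)
        if node in self.nodes:
            return node
        if self.want_exact():
            self.nodes.add(node)
            return node
        return "X"                       # approximate


roomy = ApproxBDD(budget=10)
exact = roomy.apply_and(roomy.var(0), roomy.var(1))

tight = ApproxBDD(budget=2)
approx = tight.apply_and(tight.var(0), tight.var(1))
print(exact, approx)
```

With room in the budget the result is the exact node for `x0 AND x1`; with the budget already used up by the two input variables the same call returns X. Returning X is always safe because it only says "unknown".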

Classification Algorithm

Simulator’s classification
– Care
– Don’t Care
Algorithm
– Initially, all variables are Don’t Care.
– Simulate using sub-domain values only.
– Re-classify 1 variable as Care.
– Repeat until sufficient variables classified.

Review

What we’ve done:
– Symbolic pointer analysis
– Symbolic simulations and executions
– Model checking
  – C programs
  – Abstract explanations
Where do we go from here?