swerve: Semester in Review
Posted on 21-Dec-2015
Topics
Symbolic pointer analysis
Model checking
– C programs
– Abstract counterexamples
Symbolic simulation and execution
Cousot: the Galois connection
P.A. Terminology
Context-sensitivity: do we take the calling context into account?
– Doing so leads to very precise but very non-polynomial algorithms
Flow-sensitivity: is the analysis sensitive to control flow? The two classic flow-insensitive formulations:
– Equality = unification-based = Steensgaard
   Almost linear, but not very precise
– Subset = inclusion-based = Andersen
   Polynomial (cubic in the worst case), but more precise
– Flow- or context-sensitive analyses are even more expensive
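The precision gap between the two formulations already shows up on the three-statement program `p = &a; q = &b; p = q;`. The Python sketch below is illustrative only: the tuple encoding of statements and the names `andersen` and `Steensgaard` are hypothetical, and only address-of and copy statements are handled (no loads or stores).

```python
# Hypothetical statement encoding: ("addr", p, x) is p = &x; ("copy", p, q) is p = q.
PROGRAM = [("addr", "p", "a"), ("addr", "q", "b"), ("copy", "p", "q")]


def andersen(program):
    """Subset constraints: p = q means pts(q) is a subset of pts(p); iterate to a fixed point."""
    pts = {}
    for kind, lhs, rhs in program:
        if kind == "addr":
            pts.setdefault(lhs, set()).add(rhs)
    changed = True
    while changed:
        changed = False
        for kind, lhs, rhs in program:
            if kind == "copy":
                src = pts.get(rhs, set())
                dst = pts.setdefault(lhs, set())
                if not src <= dst:
                    dst |= src
                    changed = True
    return pts


class Steensgaard:
    """Equality constraints: each location class has at most one pointee class."""

    def __init__(self):
        self.parent = {}  # union-find forest over location names
        self.target = {}  # class root -> a member of the class it points to
        self.fresh = 0

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        tx, ty = self.target.pop(rx, None), self.target.pop(ry, None)
        self.parent[ry] = rx
        if tx is not None or ty is not None:
            self.target[rx] = tx if tx is not None else ty
        if tx is not None and ty is not None:
            self.union(tx, ty)  # unifying two locations also unifies their pointees

    def target_of(self, v):
        r = self.find(v)
        if r not in self.target:
            self.fresh += 1
            self.target[r] = f"_t{self.fresh}"  # placeholder pointee location
        return self.target[r]

    def solve(self, program):
        for kind, lhs, rhs in program:
            if kind == "addr":  # lhs = &rhs: rhs joins lhs's pointee class
                self.union(self.target_of(lhs), rhs)
            else:               # lhs = rhs: the two pointee classes are unified
                self.union(self.target_of(lhs), self.target_of(rhs))
        pointers = {lhs for _, lhs, _ in program} | {
            rhs for kind, _, rhs in program if kind == "copy"}
        named = [x for x in list(self.parent) if not x.startswith("_t")]
        result = {}
        for v in pointers:
            t = self.find(self.target_of(v))
            result[v] = {x for x in named if self.find(x) == t}
        return result


print(andersen(PROGRAM))
print(Steensgaard().solve(PROGRAM))
```

Both analyses give p → {a, b}. Andersen keeps the copy directional, so q → {b}; Steensgaard unifies the pointees of p and q into one class, so q also appears to point to a.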
P.A.: Problem Formulation
Phase one: find constraints in the code
– Depends on sensitivities (context, flow)
– Examine stores, loads, etc.
Phase two: solve the system of constraints for the complete points-to relation
– Explicit: Steensgaard using union-find
– Implicit: Andersen-style using BDDs
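Phase one can be sketched as a pass that maps each pointer statement to one of four constraint kinds (address-of, copy, load, store). This minimal sketch assumes a hypothetical C-like statement syntax (`p = &x`, `p = q`, `p = *q`, `*p = q`) and a hypothetical function name `extract_constraints`; a real front end would walk the compiler's intermediate representation rather than strings.

```python
import re

# Hypothetical surface syntax; the order matters because "copy" is the most general pattern.
PATTERNS = [
    ("addr", re.compile(r"^(\w+)\s*=\s*&(\w+)$")),    # p = &x
    ("load", re.compile(r"^(\w+)\s*=\s*\*(\w+)$")),   # p = *q
    ("store", re.compile(r"^\*(\w+)\s*=\s*(\w+)$")),  # *p = q
    ("copy", re.compile(r"^(\w+)\s*=\s*(\w+)$")),     # p = q
]


def extract_constraints(statements):
    """Phase one: map each statement to a (kind, lhs, rhs) constraint."""
    constraints = set()  # a set, because a flow-insensitive analysis ignores statement order
    for stmt in statements:
        stmt = stmt.strip().rstrip(";")
        for kind, pattern in PATTERNS:
            m = pattern.match(stmt)
            if m:
                constraints.add((kind, m.group(1), m.group(2)))
                break
        else:
            raise ValueError(f"unsupported statement: {stmt!r}")
    return constraints


print(sorted(extract_constraints(["p = &a;", "q = p;", "*q = p;", "r = *q;"])))
# [('addr', 'p', 'a'), ('copy', 'q', 'p'), ('load', 'r', 'q'), ('store', 'q', 'p')]
```

Phase two would then hand this constraint set to a solver such as union-find (Steensgaard) or a BDD-based fixpoint (Andersen-style).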
Pointer Analysis Example
h1: v1 = new Object();
h2: v2 = new Object();
v1.f = v2;
v3 = v1.f;
Input Relations
vPointsTo(v1, h1)
vPointsTo(v2, h2)
Store(v1, f, v2)
Load(v1, f, v3)
Output Relations
hPointsTo(h1, f, h2)
vPointsTo(v3, h2)
Figure: points-to graph with v1 → h1, v2 → h2, a field edge h1 →f→ h2, and v3 → h2.
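The output relations follow from two inference rules applied until no new facts appear. The store rule is the one given in Datalog on the "Inference Rule in Datalog" slide; the load rule used here, vPointsTo(v3, h2) :- Load(v1, f, v3), vPointsTo(v1, h1), hPointsTo(h1, f, h2), is an assumption written in the same style. A naive-fixpoint sketch over explicit Python sets (the function name is hypothetical):

```python
def points_to_fixpoint(v_points_to, store, load):
    """Apply the store and load rules until nothing new is derived (naive fixpoint)."""
    vpt, hpt = set(v_points_to), set()
    while True:
        # Store rule: Store(v1,f,v2), vPointsTo(v1,h1), vPointsTo(v2,h2) => hPointsTo(h1,f,h2)
        new_h = {(h1, f, h2)
                 for (v1, f, v2) in store
                 for (a, h1) in vpt if a == v1
                 for (b, h2) in vpt if b == v2}
        # Assumed load rule: Load(v1,f,v3), vPointsTo(v1,h1), hPointsTo(h1,f,h2) => vPointsTo(v3,h2)
        new_v = {(v3, h2)
                 for (v1, f, v3) in load
                 for (a, h1) in vpt if a == v1
                 for (h, g, h2) in hpt if h == h1 and g == f}
        if new_h <= hpt and new_v <= vpt:
            return vpt, hpt
        hpt |= new_h
        vpt |= new_v


vpt, hpt = points_to_fixpoint(
    v_points_to={("v1", "h1"), ("v2", "h2")},
    store={("v1", "f", "v2")},
    load={("v1", "f", "v3")},
)
print(sorted(hpt))          # [('h1', 'f', 'h2')]
print(("v3", "h2") in vpt)  # True
```

The loop needs two rounds: the store rule first derives hPointsTo(h1, f, h2), and only then can the load rule derive vPointsTo(v3, h2).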
Zhu: Symbolic P.A.
The points-to relation can be huge, but BDDs are great at implicitly representing relations
Berndl et al.: Symbolic P.A.
Subset-based formulation using BDDs
Variable ordering experiments
– Sets of heap objects (“pointed to”) tend to be large and regular: putting them together at the end of the ordering helps
– Interleaving the bits for sets of variables (“pointers”) helps a little
– In general, it is important to partition the bits of the different sets in the relations
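A small illustration of why bit ordering matters: the equality relation x = y between two 3-bit values needs 9 internal BDD nodes when the bits of x and y are interleaved, but 21 when all bits of x come before all bits of y. The sketch builds a reduced ordered BDD by brute-force Shannon expansion with a unique table; it is a toy for tiny functions (exponential in the number of variables), not the BDD package used in these experiments, and the bit names `x0..x2`, `y0..y2` are hypothetical.

```python
def build_robdd(f, order):
    """Build a reduced ordered BDD for f; return (root, number of internal nodes).

    Terminals are the ints 0 and 1; internal nodes get ids from 2 upward.
    """
    unique = {}  # (var, low, high) -> node id; sharing identical nodes keeps the BDD reduced

    def mk(var, low, high):
        if low == high:  # the test on var is redundant, so no node is created
            return low
        key = (var, low, high)
        if key not in unique:
            unique[key] = len(unique) + 2
        return unique[key]

    def build(i, assignment):
        if i == len(order):
            return 1 if f(assignment) else 0
        var = order[i]
        low = build(i + 1, {**assignment, var: 0})
        high = build(i + 1, {**assignment, var: 1})
        return mk(var, low, high)

    return build(0, {}), len(unique)


def equal_bits(a):
    """The relation x == y for 3-bit x and y (hypothetical bit names)."""
    return all(a[f"x{i}"] == a[f"y{i}"] for i in range(3))


interleaved = ["x0", "y0", "x1", "y1", "x2", "y2"]
separated = ["x0", "x1", "x2", "y0", "y1", "y2"]
print(build_robdd(equal_bits, interleaved)[1])  # 9
print(build_robdd(equal_bits, separated)[1])    # 21
```

With the separated order, the diagram must remember all of x before it sees any bit of y, so the node count grows exponentially in the bit width; interleaving lets each bit pair be checked and forgotten.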
Whaley & Lam: Datalog, bddbddb
All these symbolic pointer analyses devote a lot of implementation effort to getting the BDD part correct and fast
Datalog: a declarative language for expressing (possibly recursive) relations
bddbddb: a tool to convert Datalog operations (join, project, rename, recursion) into BDD operations
Points-to analyses can now be described much more concisely in Datalog
Inference Rule in Datalog
Stores: v1.f = v2;
hPointsTo(h1, f, h2) :- Store(v1, f, v2),
                        vPointsTo(v1, h1),
                        vPointsTo(v2, h2).
Figure: v1 → h1 and v2 → h2 imply the heap edge h1 →f→ h2.
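The store rule is exactly the kind of expression bddbddb turns into relational operations. The sketch below evaluates it on explicit Python sets as two joins followed by a projection; bddbddb performs the equivalent operations on BDDs, and the rename step from the operation list is implicit here in the column positions. The helper names `join` and `project` are hypothetical.

```python
def join(r, s, r_col, s_col):
    """Equi-join two sets of tuples on r[r_col] == s[s_col], concatenating matches."""
    return {x + y for x in r for y in s if x[r_col] == y[s_col]}


def project(r, cols):
    """Keep only the listed columns, in the given order."""
    return {tuple(x[c] for c in cols) for x in r}


store = {("v1", "f", "v2")}                 # Store(v1, f, v2)
v_points_to = {("v1", "h1"), ("v2", "h2")}  # vPointsTo(v, h)

# Store(v1,f,v2) JOIN vPointsTo(v1,h1): columns (v1, f, v2, v1, h1)
step1 = join(store, v_points_to, 0, 0)
# ... JOIN vPointsTo(v2,h2): columns (v1, f, v2, v1, h1, v2, h2)
step2 = join(step1, v_points_to, 2, 0)
# Project onto (h1, f, h2) to obtain hPointsTo
h_points_to = project(step2, [4, 1, 6])
print(h_points_to)  # {('h1', 'f', 'h2')}
```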
Whaley & Lam: With Context
Context-sensitive analysis by cloning methods and running a context-insensitive analysis on the new call graph
Datalog can express the constraints necessary to determine the call graph
The cloned call graph is exponentially bigger, but a clever encoding lets BDDs handle it well
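To see why the cloned call graph is exponentially bigger, count contexts as distinct call paths from the entry method. In a chain of k "diamonds" (each stage reaches the next method through two callers), the last method has 2^k clones. This sketch assumes an acyclic call graph (recursive cycles would have to be collapsed first); the graph shape and the names `count_contexts` and `diamond_chain` are hypothetical.

```python
from functools import lru_cache


def count_contexts(call_graph, entry):
    """Number of distinct call paths from entry to each method in an acyclic call graph."""
    callers = {m: [] for m in call_graph}
    for caller, callees in call_graph.items():
        for callee in callees:  # a repeated callee counts as a separate call site
            callers.setdefault(callee, []).append(caller)

    @lru_cache(maxsize=None)
    def paths(m):
        if m == entry:
            return 1
        return sum(paths(c) for c in callers.get(m, []))

    return {m: paths(m) for m in callers}


def diamond_chain(k):
    """m0 -> {a0, b0} -> m1 -> {a1, b1} -> ... -> mk (hypothetical example graph)."""
    g = {}
    for i in range(k):
        g[f"m{i}"] = [f"a{i}", f"b{i}"]
        g[f"a{i}"] = [f"m{i + 1}"]
        g[f"b{i}"] = [f"m{i + 1}"]
    g[f"m{k}"] = []
    return g


print(count_contexts(diamond_chain(10), "m0")["m10"])  # 1024
```

The original graph here has only 31 methods, yet cloning every context of m10 alone yields 1024 copies, which is why an explicit representation is impractical and a shared BDD encoding pays off.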
CBMC: Prototype Tool
Figure: tool flow. The ANSI-C model and the VHDL/Verilog product are each parsed, type checked, and converted into bit-vector logic trees (+, *, =); together they form a bit-vector logic decision problem, which is passed to Chaff as CNF.
Equivalence is reduced to a bit-vector logic decision problem
The tool requires a decision procedure for large bit-vector problems
BV problems are HUGE: they are passed directly to Chaff in CNF
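A toy version of this reduction: bit-blast two bit-vector expressions into CNF with Tseitin-style gate clauses, then build a miter asserting that some output bit differs. The two expressions are equivalent exactly when the miter is unsatisfiable. This is a hedged sketch, not CBMC's encoder; all names are hypothetical, and the brute-force `satisfiable` stands in for a SAT solver such as Chaff, so it only works for a handful of variables.

```python
from itertools import product


class CNF:
    """Clauses over integer literals (DIMACS style: -v means NOT v)."""

    def __init__(self):
        self.nvars = 0
        self.clauses = []

    def new_var(self):
        self.nvars += 1
        return self.nvars

    # Tseitin-style encodings: a fresh output o is constrained to equal the gate's value.
    def and_gate(self, a, b):
        o = self.new_var()
        self.clauses += [[-o, a], [-o, b], [o, -a, -b]]
        return o

    def or_gate(self, a, b):
        o = self.new_var()
        self.clauses += [[o, -a], [o, -b], [-o, a, b]]
        return o

    def xor_gate(self, a, b):
        o = self.new_var()
        self.clauses += [[-o, a, b], [-o, -a, -b], [o, -a, b], [o, a, -b]]
        return o

    def const_false(self):
        o = self.new_var()
        self.clauses.append([-o])
        return o


def ripple_add(cnf, xs, ys):
    """n-bit addition with wraparound, bit 0 first; the final carry-out is dropped."""
    out, carry = [], None
    for i, (x, y) in enumerate(zip(xs, ys)):
        s = cnf.xor_gate(x, y)
        out.append(s if carry is None else cnf.xor_gate(s, carry))
        if i < len(xs) - 1:
            g = cnf.and_gate(x, y)
            carry = g if carry is None else cnf.or_gate(g, cnf.and_gate(s, carry))
    return out


def shift_left(cnf, xs):
    return [cnf.const_false()] + xs[:-1]


def satisfiable(cnf):
    """Brute force over all assignments: a stand-in for a real SAT solver."""
    for bits in product((False, True), repeat=cnf.nvars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in cnf.clauses):
            return True
    return False


def equivalent(n, lhs, rhs):
    """Miter: require some output bit to differ; the expressions are equivalent iff UNSAT."""
    cnf = CNF()
    a = [cnf.new_var() for _ in range(n)]
    diffs = [cnf.xor_gate(p, q) for p, q in zip(lhs(cnf, a), rhs(cnf, a))]
    miter = diffs[0]
    for d in diffs[1:]:
        miter = cnf.or_gate(miter, d)
    cnf.clauses.append([miter])
    return not satisfiable(cnf)


print(equivalent(2, lambda c, a: ripple_add(c, a, a), shift_left))      # True
print(equivalent(2, lambda c, a: ripple_add(c, a, a), lambda c, a: a))  # False
```

The first check confirms that a + a equals a << 1 on 2-bit values; the second fails because a + a differs from a whenever a is nonzero.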
Explaining Counterexamples
Counterexamples provided by model checkers are often difficult to understand and to locate within the code
Previous work: find a concrete execution “close to” the counterexample by some distance metric
This work: find an abstract execution, which provides more meaningful explanations
Distance Metric
Execution = (state, action) sequence
– State = (control location, predicate)
Metric: compare two executions a and b
– Don’t just compare a_i to b_i, since small changes in control flow can yield “misalignment”
– Distance is defined as the number of changes (in predicates and actions) needed to convert a into b
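The misalignment problem appears even for a single inserted step. The sketch contrasts a position-by-position comparison with a unit-cost edit distance over (location, predicate, action) steps. It is a simplification for illustration: the slide's metric counts individual predicate and action changes, whereas here any differing step costs 1; the steps and function names are hypothetical.

```python
def positional_diff(a, b):
    """Naive metric: compare a[i] with b[i]; extra steps count as differences."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def edit_distance(a, b):
    """Fewest step insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # keep x, or substitute y for it
        prev = cur
    return prev[-1]


# Hypothetical steps: (control location, predicate, action)
s1 = ("L1", "x > 0", "x := x - 1")
s2 = ("L2", "x >= 0", "y := x")
s3 = ("L3", "y >= 0", "call f")
s4 = ("L4", "y >= 0", "assert y > 0")
extra = ("L5", "x >= 0", "skip")

a = [s1, s2, s3, s4]
b = [s1, extra, s2, s3, s4]  # the same run with one extra step after s1
print(positional_diff(a, b))  # 4
print(edit_distance(a, b))    # 1
```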
Quasi-symbolic simulation
Symbolic simulation externally, scalar values internally
– A simulation run requires constant memory
Key ideas
– Don’t compute the exact value unless necessary: there are many don’t cares in large designs
– Trade time for memory: multiple runs generate exact values
Reliability of directed testing with efficiency closer to that of symbolic methods
Figure: don’t care logic.
Basic Algorithm
Figure: an AND network over don’t care inputs Xa, Xb, Xc. With symbolic variables, Xa AND X¬a evaluates to 0, as Boolean logic requires; with the “traditional” X value, the conservative approximation yields X AND X = X.
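The difference between the two kinds of X can be shown with a three-valued AND. In this sketch (the `Sym` class and all function names are hypothetical), a traditional X loses the link between a and ¬a, while a named symbolic value still evaluates Xa AND X¬a to 0; an AND of two different symbolic variables is approximated back to X rather than building a formula, in the spirit of not computing exact values unless necessary.

```python
from dataclasses import dataclass

X = "X"  # the traditional unknown value


def and3(a, b):
    """Three-valued AND over 0, 1, X: a 0 input decides the result."""
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return X


def not3(a):
    return X if a == X else 1 - a


@dataclass(frozen=True)
class Sym:
    """A named don't care input: Xa (positive) or X¬a (negated)."""
    name: str
    positive: bool = True


def sym_not(a):
    return Sym(a.name, not a.positive) if isinstance(a, Sym) else not3(a)


def sym_and(a, b):
    if isinstance(a, Sym) and isinstance(b, Sym):
        if a.name == b.name:
            return a if a.positive == b.positive else 0  # Xa AND X¬a = 0
        return X  # different variables: approximate instead of building a formula
    if isinstance(a, Sym):
        a, b = b, a  # put the symbolic operand, if any, in b
    if isinstance(b, Sym):
        if a == 0:
            return 0
        if a == 1:
            return b
        return X
    return and3(a, b)


print(and3(X, not3(X)))                      # X
print(sym_and(Sym("a"), sym_not(Sym("a"))))  # 0
```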
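Decision Procedure
Figure: the output first simulates to X (“?”), so the procedure splits on a = 0 and a = 1 and re-simulates each case; both branches drive the output to 0 while b stays X. Test is Unsatisfiable!
Variable selection heuristic: pick a relevant variable by propagating from the inputs.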
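A minimal sketch of the decision procedure: simulate with every input as X, and only while the output is still X, split on an unassigned input and re-simulate both branches. The circuit encoding and function names are hypothetical, and the selection heuristic (first unassigned input in depth-first order from the output) is a simpler stand-in for propagating from the inputs.

```python
X = "X"  # unknown / don't care value


def simulate(node, env):
    """Three-valued simulation; inputs missing from env are don't cares (X).

    A node is an input name (str), ("not", child), or ("and", child, child, ...).
    """
    if isinstance(node, str):
        return env.get(node, X)
    op, *children = node
    vals = [simulate(c, env) for c in children]
    if op == "not":
        return X if vals[0] == X else 1 - vals[0]
    if op == "and":
        if 0 in vals:
            return 0  # a controlling 0 decides the gate even with X inputs
        return 1 if all(v == 1 for v in vals) else X
    raise ValueError(f"unknown gate {op!r}")


def inputs(node):
    """Inputs in depth-first order from the output (repeats allowed)."""
    if isinstance(node, str):
        return [node]
    return [name for child in node[1:] for name in inputs(child)]


def satisfiable(circuit, env=None):
    """Can the output be 1? Split on a variable only while simulation returns X."""
    env = env or {}
    out = simulate(circuit, env)
    if out != X:
        return out == 1
    var = next(name for name in inputs(circuit) if name not in env)
    return any(satisfiable(circuit, {**env, var: value}) for value in (0, 1))


test = ("and", ("and", "a", "b"), ("not", "a"))
print(satisfiable(test))               # False
print(satisfiable(("and", "a", "b")))  # True
```

In the first circuit both branches on a already force the output to 0, so b is never split, matching the figure's “Test is Unsatisfiable!”.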
BDDs with Approximate Values
Generic approximate BDD apply algorithm:
Approx_Apply(F, G)
   find top variable V
   compute L = left(F, G), R = right(F, G)
   if node(V, L, R) exists, return it
   else if (want_exact(V, L, R))
      create node (V, L, R)
      return node
   else /* approximate */
      return X
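A runnable sketch of this apply for the AND operator, assuming a simple want_exact policy that caps the unique table at `max_nodes` (hypothetical; the slide leaves the policy open). Nodes are (var, low, high) tuples and X is an extra terminal. The class and method names are hypothetical, and the sketch omits the computed-table cache a real BDD package would use.

```python
ZERO, ONE, X = "0", "1", "X"  # terminals; X is the approximate "unknown" value


class ApproxBDD:
    def __init__(self, order, max_nodes):
        self.level = {v: i for i, v in enumerate(order)}
        self.max_nodes = max_nodes
        self.unique = {}  # (var, low, high) -> node; a node is its own key tuple

    def want_exact(self, v, low, high):
        # Hypothetical policy: stay exact while the unique table is under budget.
        return len(self.unique) < self.max_nodes

    def mk(self, v, low, high, exact=False):
        if low == high:
            return low
        key = (v, low, high)
        if key in self.unique:          # node(V, L, R) exists: return it
            return self.unique[key]
        if exact or self.want_exact(v, low, high):
            self.unique[key] = key      # create node (V, L, R)
            return key
        return X                        # approximate

    def var(self, name):
        return self.mk(name, ZERO, ONE, exact=True)

    def top_level(self, f):
        return self.level[f[0]] if isinstance(f, tuple) else len(self.level)

    def cofactors(self, f, v):
        if isinstance(f, tuple) and f[0] == v:
            return f[1], f[2]
        return f, f  # f does not test v

    def apply_and(self, f, g):
        if ZERO in (f, g):
            return ZERO
        if f == ONE:
            return g
        if g == ONE:
            return f
        if X in (f, g):
            return X  # unknown AND a non-zero function stays unknown
        v = min(f, g, key=self.top_level)[0]  # find top variable V
        f0, f1 = self.cofactors(f, v)
        g0, g1 = self.cofactors(g, v)
        return self.mk(v, self.apply_and(f0, g0), self.apply_and(f1, g1))


exact = ApproxBDD(["a", "b"], max_nodes=10)
print(exact.apply_and(exact.var("a"), exact.var("b")))  # ('a', '0', ('b', '0', '1'))
tight = ApproxBDD(["a", "b"], max_nodes=2)
print(tight.apply_and(tight.var("a"), tight.var("b")))  # X
```

With a generous budget the result is the exact BDD for a AND b; with the table already full, the new node is refused and the whole result collapses to X.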
Classification Algorithm
Simulator’s classification
– Care
– Don’t Care
Algorithm
– Initially, all variables are Don’t Care.
– Simulate using sub-domain values only.
– Re-classify 1 variable as Care.
– Repeat until sufficient variables are classified.
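The classification loop can be sketched with the same kind of three-valued simulation. Here “sufficient” is assumed to mean that the output is definite (0 or 1) for every assignment of the Care variables, and candidates are promoted in a fixed order; the slide does not say how the next variable is chosen, and all names are hypothetical.

```python
from itertools import product

X = "X"  # value of a Don't Care input


def simulate(node, env):
    """Three-valued simulation: node is an input name, ("not", c), or ("and", c, ...)."""
    if isinstance(node, str):
        return env.get(node, X)  # Don't Care inputs simulate as X
    op, *children = node
    vals = [simulate(c, env) for c in children]
    if op == "not":
        return X if vals[0] == X else 1 - vals[0]
    if op == "and":
        if 0 in vals:
            return 0
        return 1 if all(v == 1 for v in vals) else X
    raise ValueError(f"unknown gate {op!r}")


def classify(circuit, candidates):
    """Start with every input Don't Care; promote one at a time until sufficient."""
    care = []
    for var in candidates:
        sufficient = all(
            simulate(circuit, dict(zip(care, values))) != X
            for values in product((0, 1), repeat=len(care))
        )
        if sufficient:
            break
        care.append(var)  # re-classify one more variable as Care
    return care


print(classify(("and", "a", "b"), ["a", "b", "d"]))  # ['a', 'b']
```

The irrelevant input d never leaves the Don’t Care class, which is where the savings come from in a large design with many don’t cares.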
Review
What we’ve done:
– Symbolic pointer analysis
– Symbolic simulations and executions
– Model checking
   C programs
   Abstract explanations
Where do we go from here?