Learning Symbolic Interfaces of Software Components
Zvonimir Rakamarić
This Work
Published at Static Analysis Symposium 2012
Joint work with Dimitra Giannakopoulou (NASA) and Vishwanath Raman (CMU/NASA)
Introduction
Motivating Example
class Example {
  private static int x = 0;
  private static int y = 0;

  public static void init(int p, int q) { x = p; y = q; }
  public static void a() { if (x == 0) y = 10; else y = 11; }
  public static void b() { if (y == 10) assert false; }
}
• init can be called unconditionally
• a can be called unconditionally
• b can be called after init only when y != 10
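The observations above can be checked by running the class on concrete inputs. The sketch below is a minimal harness and not part of the original slides: ExampleComponent mirrors Example but throws an IllegalStateException instead of using assert false, so the failure is visible without the -ea flag, and reset and bFails are hypothetical helpers added for illustration.

```java
// Concrete re-run of the slide's Example class (assumption: an explicit
// exception replaces `assert false` so the check does not depend on -ea).
final class ExampleComponent {
    private static int x = 0;
    private static int y = 0;

    static void reset() { x = 0; y = 0; }
    static void init(int p, int q) { x = p; y = q; }
    static void a() { if (x == 0) y = 10; else y = 11; }
    static void b() {
        if (y == 10) throw new IllegalStateException("b called while y == 10");
    }
}

final class ExampleRuns {
    // Hypothetical helper: runs init(p, q), optionally a(), then b(),
    // and reports whether b() raised the error.
    static boolean bFails(int p, int q, boolean callA) {
        ExampleComponent.reset();
        ExampleComponent.init(p, q);
        if (callA) ExampleComponent.a();
        try {
            ExampleComponent.b();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(bFails(0, 10, false)); // true: init sets y to 10
        System.out.println(bFails(0, 5, false));  // false: y is 5
        System.out.println(bFails(0, 5, true));   // true: x == 0, so a() sets y to 10
        System.out.println(bFails(1, 10, true));  // false: x != 0, so a() sets y to 11
    }
}
```

Whether b is safe depends on the values passed to init, which is the information a purely method-name-based interface cannot express.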
Goal
Learn temporal interfaces of software components
• Legal and illegal sequences of method calls, defined as an automaton
Why?
• Documentation
• Reverse engineering
• Model-based testing
• Regression testing
• Compositional verification
• …
Limitations of Prior Approaches
Since method b in Example cannot be called unconditionally after init, prior approaches either
• consider calling b after init an error, regardless of the values of the parameters it depends on, or
• expect init to be manually partitioned
Our Contribution
class Example { ...}
Background
Symbolic Execution
Key idea: execution of programs using symbolic input values instead of concrete data
Concrete vs symbolic
• Concrete execution
  • Program takes only one path determined by input values
• Symbolic execution
  • Program can take any feasible path (coverage!)
  • Limited by the power of the constraint solver
  • Scalability issues when faced with a large (exponential) number of paths (path explosion)
Symbolic Program State
• Symbolic values of program variables
• Path condition (PC)
  • Logical formula over symbolic inputs
  • Accumulates constraints that inputs have to satisfy for the particular path to be executed
  • If a path is feasible, its PC is satisfiable
• Program location
Symbolic Execution Tree
Characterizes execution paths constructed during symbolic execution
• Nodes are symbolic program states
• Edges are labeled with program transitions
Example
int x, y;
if (x > y) {
  x = x + y;
  y = x - y;
  x = x - y;
  if (x > y)
    assert false;
}

Symbolic execution tree (X and Y are the symbolic inputs):
x:X, y:Y, PC: true
├─ if (x > y) true:  x:X, y:Y, PC: X>Y  [SAT]
│    x = x + y:  x:X+Y, y:Y, PC: X>Y
│    y = x - y:  x:X+Y, y:X, PC: X>Y
│    x = x - y:  x:Y, y:X, PC: X>Y
│    ├─ if (x > y) true:   x:Y, y:X, PC: X>Y ∧ Y>X  [UNSAT]
│    └─ if (x > y) false:  x:Y, y:X, PC: X>Y ∧ Y<=X  [SAT]
└─ if (x > y) false: x:X, y:Y, PC: X<=Y  [SAT]
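The SAT/UNSAT labels in the tree can be reproduced with a bounded search standing in for a constraint solver. This is only a sketch: the names PathCondition and satisfiable are hypothetical, and a search over a small input range can confirm that a witness exists but can only suggest, not prove, unsatisfiability.

```java
final class PathConditions {
    // A path condition over the two symbolic inputs X and Y.
    interface PathCondition { boolean holds(int x, int y); }

    // Bounded stand-in for a solver: looks for a witness with both
    // inputs in [-bound, bound]. Returning false means "no witness found
    // in that range", which is weaker than a real UNSAT answer.
    static boolean satisfiable(PathCondition pc, int bound) {
        for (int x = -bound; x <= bound; x++) {
            for (int y = -bound; y <= bound; y++) {
                if (pc.holds(x, y)) return true;
            }
        }
        return false;
    }

    // The four path conditions that appear at branch points in the tree.
    static final PathCondition THEN_BRANCH = (x, y) -> x > y;
    static final PathCondition ELSE_BRANCH = (x, y) -> x <= y;
    static final PathCondition ASSERT_REACHED = (x, y) -> x > y && y > x;
    static final PathCondition ASSERT_SKIPPED = (x, y) -> x > y && y <= x;

    public static void main(String[] args) {
        System.out.println(satisfiable(THEN_BRANCH, 3));    // true
        System.out.println(satisfiable(ELSE_BRANCH, 3));    // true
        System.out.println(satisfiable(ASSERT_REACHED, 3)); // false
        System.out.println(satisfiable(ASSERT_SKIPPED, 3)); // true
    }
}
```

The only path condition without a witness is the one guarding assert false, which matches the UNSAT leaf.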
Active Automata Learning
D. Angluin, 1987: “Learning Regular Sets from Queries and Counterexamples”
• Algorithm is called L*
• L* learns an unknown regular language U (over alphabet Σ) and produces a minimal DFA A such that L(A) = U
• Complexity of the original algorithm is O(|Σ| · |A|^3)
Active Automata Learning cont.
L* learner communicates with a teacher using two types of queries
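• Membership queries: Should word w be included in L(A)?
  • Expected answer: yes/no
• Equivalence queries: Here is a conjectured DFA A; is L(A) = U?
  • Expected answer: yes/no + counterexample
[Figure: the L* learner sends a word w to the teacher and receives yes/no; it sends a conjectured DFA A and receives yes, or no plus a counterexample (cex); the final output is a DFA A.]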
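The two query types can be written down as a small interface. The sketch below is illustrative only: Teacher, EvenATeacher, and the method names are hypothetical, the target language (words over {a, b} with an even number of a's) is chosen just to have something concrete, and the equivalence check is approximated by comparing all words up to a maximum length rather than being exact.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// The two kinds of queries an L*-style learner may ask.
interface Teacher {
    boolean isMember(List<String> word);
    // Empty result means "yes"; otherwise the word is a counterexample.
    Optional<List<String>> findCounterexample(Predicate<List<String>> conjecture);
}

// Toy teacher for U = words over {a, b} with an even number of a's.
final class EvenATeacher implements Teacher {
    private static final List<String> ALPHABET = List.of("a", "b");
    private final int maxLength;

    EvenATeacher(int maxLength) { this.maxLength = maxLength; }

    @Override
    public boolean isMember(List<String> word) {
        return word.stream().filter("a"::equals).count() % 2 == 0;
    }

    @Override
    public Optional<List<String>> findCounterexample(Predicate<List<String>> conjecture) {
        // Approximation: compare conjecture and target on every word up to
        // maxLength, shortest words first.
        List<List<String>> frontier = new ArrayList<>();
        frontier.add(new ArrayList<>());
        for (int length = 0; length <= maxLength; length++) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> word : frontier) {
                if (conjecture.test(word) != isMember(word)) return Optional.of(word);
                for (String symbol : ALPHABET) {
                    List<String> extended = new ArrayList<>(word);
                    extended.add(symbol);
                    next.add(extended);
                }
            }
            frontier = next;
        }
        return Optional.empty();
    }
}
```

For example, a conjecture that accepts every word is refuted by the one-letter word a, while a conjecture equal to the target yields no counterexample within the bound.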
PSYCO Algorithm
Interface Learning with L*
L* uses a teacher to answer the following queries
• Membership queries
  • Whether a given sequence of method calls leads to an error in the implementation
• Equivalence queries
  • Whether a conjectured DFA captures all the behaviors of the implementation
Answering Membership Queries
Running Example
class Example {
  private static int x = 0;
  private static int y = 0;

  public static void init(int p, int q) { x = p; y = q; }
  public static void a() { if (x == 0) y = 10; else y = 11; }
  public static void b() { if (y == 10) assert false; }
}
Executing query <init;b>
p:P, q:Q, PC: true
└─ init(p, q): x:P, y:Q, PC: true
     ├─ b(): OK, PC: Q != 10
     └─ b(): ERROR, PC: Q == 10
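A membership query on a parameterized sequence therefore has three possible outcomes: every input is fine, every input fails, or the result depends on the parameters. The sketch below imitates this with a bounded enumeration of (p, q) in place of symbolic execution. It is an assumption-laden illustration: the verdict name NEEDS_REFINEMENT is hypothetical, every init call in a sequence receives the same (p, q), and the bounds are arbitrary.

```java
import java.util.List;

final class BoundedMembership {
    enum Verdict { OK, ERROR, NEEDS_REFINEMENT }

    // Runs the call sequence on concrete values and reports whether the
    // assertion in b() would fail. Mirrors the Example class from the slides.
    static boolean fails(List<String> calls, int p, int q) {
        int x = 0, y = 0;
        for (String call : calls) {
            switch (call) {
                case "init": x = p; y = q; break;
                case "a": y = (x == 0) ? 10 : 11; break;
                case "b": if (y == 10) return true; break;
                default: throw new IllegalArgumentException("unknown method: " + call);
            }
        }
        return false;
    }

    // Bounded stand-in for symbolic execution: tries all p, q in [lo, hi].
    static Verdict query(List<String> calls, int lo, int hi) {
        boolean sawOk = false, sawError = false;
        for (int p = lo; p <= hi; p++) {
            for (int q = lo; q <= hi; q++) {
                if (fails(calls, p, q)) sawError = true; else sawOk = true;
            }
        }
        if (sawOk && sawError) return Verdict.NEEDS_REFINEMENT;
        return sawError ? Verdict.ERROR : Verdict.OK;
    }

    public static void main(String[] args) {
        System.out.println(query(List.of("b"), 0, 12));         // OK
        System.out.println(query(List.of("a", "b"), 0, 12));    // ERROR
        System.out.println(query(List.of("init", "b"), 0, 12)); // NEEDS_REFINEMENT
    }
}
```

The query <init;b> lands in the third case because both the OK and the ERROR leaf are reachable, which is what triggers a split of init.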
Refinement: Split init
public static void init(int p, int q) { x = p; y = q; }

public static void init_0(int p, int q) { assume q != 10; init(p, q); }
public static void init_1(int p, int q) { assume q == 10; init(p, q); }

init_0 := init[q != 10]
init_1 := init[q == 10]
Restart Learning
• new learner alphabet: {init_0, init_1, a, b}
• learning restarts, re-using results from previous iterations
Executing query <init_0;a;b>
class Example {
  private static int x = 0;
  private static int y = 0;

  public static void init_0(int p, int q) { assume q != 10; x = p; y = q; }
  public static void a() { if (x == 0) y = 10; else y = 11; }
  public static void b() { if (y == 10) assert false; }
}

p:P, q:Q, PC: true
└─ init_0(p, q): x:P, y:Q, PC: true
     ├─ a(), x == 0: x:P, y:10, PC: P = 0
     │    └─ b(): ERROR, PC: P = 0
     └─ a(), x != 0: x:P, y:11, PC: P != 0
          └─ b(): OK, PC: P != 0
Refinement: Split init_0
public static void init_0(int p, int q) { assume q != 10; x = p; y = q; }

public static void init_0_0(int p, int q) { assume p == 0 && q != 10; init(p, q); }
public static void init_0_1(int p, int q) { assume p != 0 && q != 10; init(p, q); }

init_0_0 := init[q != 10 && p == 0]
init_0_1 := init[q != 10 && p != 0]
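Both refinement steps can be replayed with guards written as predicates. The following is a hedged sketch rather than PSYCO's implementation: Guard, Verdict, and run are hypothetical names, the assume statement is modeled by skipping inputs that violate the guard, and a bounded enumeration again replaces symbolic execution.

```java
final class GuardedSymbols {
    // A method guard over init's parameters, e.g. init[q != 10].
    interface Guard { boolean holds(int p, int q); }

    enum Verdict { OK, ERROR, NEEDS_REFINEMENT, INFEASIBLE }

    // Runs init[guard]; (optionally a;) b for all p, q in [lo, hi].
    // Inputs that violate the guard are skipped, which plays the role
    // of `assume` in the slides.
    static Verdict run(Guard guard, boolean callA, int lo, int hi) {
        boolean sawOk = false, sawError = false;
        for (int p = lo; p <= hi; p++) {
            for (int q = lo; q <= hi; q++) {
                if (!guard.holds(p, q)) continue;
                int x = p, y = q;                      // init(p, q)
                if (callA) y = (x == 0) ? 10 : 11;     // a()
                if (y == 10) sawError = true;          // b() hits assert false
                else sawOk = true;
            }
        }
        if (sawOk && sawError) return Verdict.NEEDS_REFINEMENT;
        if (sawError) return Verdict.ERROR;
        return sawOk ? Verdict.OK : Verdict.INFEASIBLE;
    }

    public static void main(String[] args) {
        // First split: <init_0;b> and <init_1;b> are now unambiguous.
        System.out.println(run((p, q) -> q != 10, false, 0, 12)); // OK
        System.out.println(run((p, q) -> q == 10, false, 0, 12)); // ERROR
        // <init_0;a;b> still depends on p, so init_0 is split again.
        System.out.println(run((p, q) -> q != 10, true, 0, 12));  // NEEDS_REFINEMENT
        System.out.println(run((p, q) -> p == 0 && q != 10, true, 0, 12)); // ERROR
        System.out.println(run((p, q) -> p != 0 && q != 10, true, 0, 12)); // OK
    }
}
```

After the second split, each guarded symbol gives a single verdict for the sequences shown, so the learner can treat it as an ordinary alphabet symbol.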
Restart Learning
• new learner alphabet: {init_0_0, init_0_1, init_1, a, b}
• learning restarts
Answering Equivalence Queries
Unbounded Loops in Conjectures
Components have no loops, but conjectures do!
We unroll unbounded loops in conjectures a bounded number of times
Answering Equivalence Queries
Walk the conjectured automaton and extract, to a given depth k,
• all legal method sequences
• all illegal method sequences
  • for each illegal sequence of depth n, also extract the legal sequence of depth n - 1 (its prefix)
We then use membership queries to check the outcome of each sequence
• If a sequence is misclassified by the learner, we have a counterexample for L*
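The walk described above can be sketched as a breadth-first traversal of the conjectured DFA. The code makes several simplifying assumptions that are not from the slides: the alphabet is restricted to the parameterless methods a and b, membership queries are answered by a concrete run (legalOnComponent), and the two conjectures are made up for illustration. Illegal sequences are checked but not extended, so each one is reached only through its legal prefix.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

final class ConjectureWalk {
    static final String ERROR = "ERROR";

    // Stand-in for membership queries on the fragment {a, b}: x stays 0,
    // so a() always sets y = 10 and any later b() fails.
    static boolean legalOnComponent(List<String> calls) {
        int x = 0, y = 0;
        for (String call : calls) {
            if (call.equals("a")) y = (x == 0) ? 10 : 11;
            else if (call.equals("b") && y == 10) return false;
        }
        return true;
    }

    // Returns the first sequence (shortest first) up to depth k that the
    // conjecture and the membership oracle classify differently.
    static Optional<List<String>> counterexample(
            Map<String, Map<String, String>> delta, String initial,
            List<String> alphabet, int k, Predicate<List<String>> isLegal) {
        Deque<List<String>> words = new ArrayDeque<>();
        Deque<String> states = new ArrayDeque<>();
        words.add(new ArrayList<>());
        states.add(initial);
        while (!words.isEmpty()) {
            List<String> word = words.remove();
            String state = states.remove();
            boolean legalInConjecture = !ERROR.equals(state);
            if (legalInConjecture != isLegal.test(word)) return Optional.of(word);
            // Do not extend illegal sequences or go beyond depth k.
            if (!legalInConjecture || word.size() == k) continue;
            for (String method : alphabet) {
                List<String> extended = new ArrayList<>(word);
                extended.add(method);
                words.add(extended);
                states.add(delta.get(state).get(method));
            }
        }
        return Optional.empty();
    }

    // Hypothetical conjecture 1: a single state in which everything is legal.
    static Map<String, Map<String, String>> firstConjecture() {
        return Map.of("q0", Map.of("a", "q0", "b", "q0"));
    }

    // Hypothetical conjecture 2: b becomes illegal once a has been called.
    static Map<String, Map<String, String>> secondConjecture() {
        return Map.of(
                "q0", Map.of("a", "q1", "b", "q0"),
                "q1", Map.of("a", "q1", "b", ERROR));
    }
}
```

With alphabet {a, b}, the first conjecture is refuted at depth 2 by the sequence a, b, while the second survives a walk to depth 3.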
Running Example: Depth is 2
Running Example: Depth is 3
Implementation and Experiments
Architecture of PSYCO
Implementation of PSYCO
Implemented on top of Java PathFinder (JPF) software model checking infrastructure
http://babelfish.arc.nasa.gov/trac/jpf
PSYCO-related modules
• jpf-psyco: interface generation for Java classes, including parameters
  • uses jpf-learn and jpf-jdart
• jpf-learn: implements L*
• jpf-jdart: symbolic execution in JPF
  • actually DART/concolic
Experiments
Example            Methods  k-max  k-min  Conjectures  Refinements  Alphabet  States
SIGNATURE          5        7      2      2            0            5         4
PIPEDOUTPUTSTREAM  4        7      2      2            1            5         3
INTMATH            8        1      1      1            7            16        3
ALTBIT             2        27     4      8            3            5         5
CEV-FLIGHTRULE     3        3      3      3            2            5         3
CEV                18       3      3      10           6            24        9

k-max is the maximum exploration depth reached in one hour
k-min is the depth when we realized the expected interface
Automata do not change between k-min and k-max, and are k-max-full
Summary
• Combined automata learning and symbolic techniques for temporal interface generation
  • Generating richer interfaces with symbolic method guards
• Implemented a prototype tool in Java PathFinder
  • Works well on realistic examples
  • Equivalence queries are a potential bottleneck
Our Contribution cont.
We learn 3-valued Deterministic Finite Automata
[Figure: 3-valued DFA with states INITIAL, ERROR, and DON’T KNOW. Transition labels: mod(p, q)[q > 0 && p >= 0]; mod(p, q)[q <= 0 || p < 0]; div(p, q)[q == 0]; div(p, q)[q != 0].]
Using 3-Valued DFA
[Figure: DFA with states INITIAL and ERROR. Transition labels: mod(p, q)[q > 0 && p >= 0]; mod(p, q)[q <= 0 || p < 0]; div(p, q)[q == 0]; div(p, q)[q != 0].]
Underlying solver returns “Don’t Know”
Using 3-Valued DFA cont.
We learn 3-valued Deterministic Finite Automata
[Figure: 3-valued DFA with states INITIAL, ERROR, and DON’T KNOW. Transition labels: mod(p, q)[q > 0 && p >= 0]; mod(p, q)[q <= 0 || p < 0]; div(p, q)[q == 0]; div(p, q)[q != 0].]
Definition of k-full Interface
• Interface is k-safe if all legal sequences in the automaton to depth k are also legal executions in the component
• Interface is k-permissive if all illegal sequences in the automaton to depth k also lead to errors in the component
• Interface is k-tight if all sequences to depth k leading to the don’t know state in the automaton cannot be resolved in the component
• Interface that is k-safe, k-permissive, and k-tight is k-full
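The three properties share one shape: whatever the automaton says about a sequence up to depth k must also hold in the component. The sketch below encodes that reading directly. It is an interpretation, not the paper's formalization: Outcome, preserved, and kFull are hypothetical names, "cannot be resolved" is modeled as the component also returning UNKNOWN, and the example component is the parameterless {a, b} fragment of the running example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class KFullCheck {
    enum Outcome { LEGAL, ILLEGAL, UNKNOWN }

    // All sequences over the alphabet of length 0..k, shortest first.
    static List<List<String>> upToDepth(List<String> alphabet, int k) {
        List<List<String>> all = new ArrayList<>();
        List<List<String>> frontier = new ArrayList<>();
        frontier.add(new ArrayList<>());
        for (int depth = 0; depth <= k; depth++) {
            all.addAll(frontier);
            List<List<String>> next = new ArrayList<>();
            for (List<String> word : frontier) {
                for (String method : alphabet) {
                    List<String> extended = new ArrayList<>(word);
                    extended.add(method);
                    next.add(extended);
                }
            }
            frontier = next;
        }
        return all;
    }

    // True if every sequence the automaton maps to `outcome` gets the same
    // outcome in the component. LEGAL gives k-safe, ILLEGAL gives
    // k-permissive, UNKNOWN gives k-tight (under the stated assumption).
    static boolean preserved(Outcome outcome, List<String> alphabet, int k,
            Function<List<String>, Outcome> automaton,
            Function<List<String>, Outcome> component) {
        for (List<String> word : upToDepth(alphabet, k)) {
            if (automaton.apply(word) == outcome && component.apply(word) != outcome) {
                return false;
            }
        }
        return true;
    }

    static boolean kFull(List<String> alphabet, int k,
            Function<List<String>, Outcome> automaton,
            Function<List<String>, Outcome> component) {
        return preserved(Outcome.LEGAL, alphabet, k, automaton, component)
                && preserved(Outcome.ILLEGAL, alphabet, k, automaton, component)
                && preserved(Outcome.UNKNOWN, alphabet, k, automaton, component);
    }

    // Example component: b is illegal once a has been called.
    static Outcome exampleComponent(List<String> calls) {
        boolean aSeen = false;
        for (String call : calls) {
            if (call.equals("a")) aSeen = true;
            else if (call.equals("b") && aSeen) return Outcome.ILLEGAL;
        }
        return Outcome.LEGAL;
    }

    // Example of an over-permissive interface: everything is legal.
    static Outcome everythingLegal(List<String> calls) {
        return Outcome.LEGAL;
    }
}
```

An interface that declares everything legal is trivially k-permissive but fails k-safety at depth 2, since it accepts a followed by b.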
Guarantees of PSYCO Algorithm
Theorem: If the behavior of a component C can be characterized by an interface DFA, then PSYCO terminates with a k-full interface for C.
• Proof is in the SAS paper
• No unbounded loops/recursion in components
• No “mixed parameters”