miniatur at fse 2007

Finding Bugs Efficiently with a SAT Solver

Julian Dolby, Mandana Vaziri, Frank Tip

IBM Thomas J. Watson Research Center

FSE 2007—Dubrovnik, Croatia—September 6, 2007

Outline

• Background

• Miniatur Contributions

• Example

• Miniatur Evaluation

• Related Work

• Conclusions and Future Work

Background—Finding Bugs

• Finding bugs must balance coverage and precision

– Tools must find bugs to be useful

– Tools must minimize false positives to be useful


• Testing approaches find real bugs

– Each test is an actual execution, so all bugs real

– Coverage depends upon test suite, so often spotty


• Conservative static analysis approaches find all bugs

– Conservative approximation of all executions

– Can find false positives due to approximation

Background—Systematic Underapproximation

• Systematic underapproximation blends testing and analysis

– Explores all possible concrete executions within a finite set

– Real bugs and total coverage within that set of executions

Background—Systematic Underapproximation

• Key issue is choosing a “good” set of executions

– Set must be tractable

– Set must cover interesting range of all executions

• Use “small scope hypothesis” to bound set of executions

– Hypothesis: most data structure bugs need only few objects

– E.g. Collection bugs can be seen by inserting few elements

– Explore executions with small heaps (also bound loops)

[Figure labels: (small scope), (other approximation)]
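The small scope idea can be illustrated with a minimal sketch that runs every insertion sequence up to a small bound against a collection and checks an invariant after each one. This is explicit enumeration in plain Java, not the SAT-based search the slides describe; `BuggySet` and `SmallScopeDemo` are hypothetical names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class: a tiny set whose size counter is wrong for duplicates.
final class BuggySet {
    private final List<Integer> elems = new ArrayList<>();
    private int size = 0;

    void add(int x) {
        if (!elems.contains(x)) {
            elems.add(x);
        }
        size++; // bug: should only be incremented when x was actually added
    }

    boolean sizeIsConsistent() {
        return size == elems.size();
    }
}

final class SmallScopeDemo {
    // Tries every insertion sequence of length <= maxLen over keys 0..numKeys-1.
    // Returns the first sequence found that breaks the invariant, or null if none does.
    static List<Integer> findCounterexample(int numKeys, int maxLen) {
        return search(new ArrayList<Integer>(), numKeys, maxLen);
    }

    private static List<Integer> search(List<Integer> prefix, int numKeys, int maxLen) {
        // Replay the current sequence on a fresh collection and check the invariant.
        BuggySet s = new BuggySet();
        for (int k : prefix) {
            s.add(k);
        }
        if (!s.sizeIsConsistent()) {
            return new ArrayList<>(prefix);
        }
        if (prefix.size() == maxLen) {
            return null; // scope exhausted along this branch
        }
        for (int k = 0; k < numKeys; k++) {
            prefix.add(k);
            List<Integer> found = search(prefix, numKeys, maxLen);
            prefix.remove(prefix.size() - 1); // removes by index, undoing the add
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findCounterexample(2, 3)); // prints [0, 0]
    }
}
```

The bug shows up with only two insertions of the same key, which is the kind of case the hypothesis has in mind; with a bound of one insertion the search finds nothing.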

Background—Finding Bugs with SAT

• Given a program and a specification, find a concrete counterexample

• Use relational first-order logic (FOL) (Alloy [Jackson et al])

– Universe of atoms, and bounded relations over atoms

– Relational operators, first-order formulae over relations

– Tool (Kodkod [Torlak et al]) translates to CNF for SAT solver

• Java in Relational FOL [Vaziri et al] [Taghdiri et al] [Dennis et al]

– Heap objects as atoms, types as unary relations

– Fields as binary relations, code as formulae

– Encode integer operations using bit sets

– Solve FOL formula program ∧ ¬(spec)
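As a rough illustration of "fields as binary relations", the sketch below models relations as sets of tuples and implements the relational join (the dot operator) directly in Java. It uses plain `java.util` collections, not the Kodkod API; `RelationDemo`, `t`, `rel`, and `join` are hypothetical names, and the atom names are borrowed from the deck's HashMap example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RelationDemo {
    // A tuple is a list of atom names; a relation is a set of tuples.
    static List<String> t(String... atoms) {
        return Arrays.asList(atoms);
    }

    @SafeVarargs
    static Set<List<String>> rel(List<String>... tuples) {
        return new HashSet<>(Arrays.asList(tuples));
    }

    // Relational join p.q: match the last column of p against the first
    // column of q and drop the matched columns.
    static Set<List<String>> join(Set<List<String>> p, Set<List<String>> q) {
        Set<List<String>> out = new HashSet<>();
        for (List<String> a : p) {
            for (List<String> b : q) {
                if (a.get(a.size() - 1).equals(b.get(0))) {
                    List<String> joined = new ArrayList<>(a.subList(0, a.size() - 1));
                    joined.addAll(b.subList(1, b.size()));
                    out.add(joined);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<List<String>> thisObj = rel(t("H1"));     // unary relation holding one HashMap atom
        Set<List<String>> table = rel(t("H1", "A1")); // field table as a binary relation
        System.out.println(join(thisObj, table));     // prints [[A1]]
    }
}
```

Under this reading, the Java expression `this.table` corresponds to joining the singleton set for `this` with the `table` relation.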

Miniatur Contributions

• Supports much larger range of integers than previous work

– Integers represented by one atom per power of 2

– Enables e.g. use of hashCode in updating a collection

• Novel sparse representation of array objects

– Allows subset of array indices to have values

– Enables Miniatur to handle array-based collections

• Novel sliced translation based on control dependence

• General theoretical condition for slicing relational formulae

• Demonstration on real collections and a few real programs

– Arrays, integers enable analyzing all Java collections

– Integers allow checking Java equality contracts

Example—HashMap Data

class HashMap ... {
  Entry[] table; ...
  class Entry {
    Object key;
    Object value;
    int hash;
    Entry next;
    ...
  }
}

HashMap ← {H1}
Array ← {A1}
Entry ← {E1}
Object ← {A1, E1, H1, K1, V1}
table ← {〈H1, A1〉}
key ← {〈E1, K1〉}
value ← {〈E1, V1〉}
hash ← {〈E1, #65〉}
I ← {〈A1, N1, #5〉}
V ← {〈A1, N1, E1〉}

Example—HashMap Data

class HashMap ... {
  Entry[] table; ...
  class Entry {
    Object key;
    Object value;
    int hash;
    Entry next;
    ...
  }
}

HashMap ← {H1}
Array ← {A1}
Entry ← {E1}
Object ← {A1, E1, H1, K1, V1}
table ← {〈H1, A1〉}
key ← {〈E1, K1〉}
value ← {〈E1, V1〉}
hash ← {〈E1, #1〉, 〈E1, #64〉}
I ← {〈A1, N1, #1〉, 〈A1, N1, #4〉}
V ← {〈A1, N1, E1〉}
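The step from `#65` to `{#1, #64}` and from `#5` to `{#1, #4}` is a decomposition into powers of two. A minimal sketch of that encoding follows; `PowerOfTwoInts`, `toAtoms`, and `sum` are hypothetical names, and the sketch covers non-negative values only, since the slides do not say how negative integers are represented.

```java
import java.util.Set;
import java.util.TreeSet;

final class PowerOfTwoInts {
    // Splits a non-negative int into the powers of two that sum to it,
    // mirroring "one atom per power of 2".
    static Set<Integer> toAtoms(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("sketch covers non-negative values only");
        }
        Set<Integer> atoms = new TreeSet<>();
        // bit > 0 stops the loop once the shift overflows into the sign bit.
        for (int bit = 1; bit > 0 && bit <= n; bit <<= 1) {
            if ((n & bit) != 0) {
                atoms.add(bit);
            }
        }
        return atoms;
    }

    // Recovers the integer value from its set of power-of-two atoms.
    static int sum(Set<Integer> atoms) {
        int total = 0;
        for (int a : atoms) {
            total += a;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(toAtoms(65)); // prints [1, 64]
        System.out.println(toAtoms(5));  // prints [1, 4]
    }
}
```

With this encoding the number of integer atoms grows with the number of bits rather than with the number of values, which is presumably what makes values such as hash codes affordable.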

Example—HashMap.put

public Object put(Object k, Object v) {
  int h = k.hashCode();
  int i = h % table.length;
  for (Entry e = table[i]; e != null; e = e.next) {
    ...
  }
  modCount++;
  table[i] = new Entry(h, k, v, table[i]);
  return null;
}





Entry e = table[i];
if (e != null) { ...
  e = e.next;
  if (e != null) { ...
    e = e.next;
    if (e != null) goto end;
} }
modCount++;
table[i] = new Entry(h, k, v, table[i]);
return null;





Entry e = table[i]₀;
if (e != null) { ...
  e1 = e.next;
  if (e1 != null) { ...
    e2 = e1.next;
    if (e2 != null) goto end;
} }
modCount1 = modCount0 + 1;
table[i]₁ = new Entry(h, k, v, table[i]₀);
return null;





Entry e = table[i]₀;
if (e != null) { ...
  e1 = e.next;
  if (e1 != null) { ...
    e2 = e1.next;
    if (e2 != null) goto end;
} }
modCount1 = modCount0 + 1;
table[i]₁ = new Entry(h, k, v, table[i]₀);
A: assert table[i]₁.next == null; // perfect hashing
return null;
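The unrolling shown on these slides can be mimicked in runnable Java. The sketch below compares a loop over an entry chain with a version unrolled to two iterations; it is an illustration, not Miniatur's translation. `UnrollDemo` and `OUT_OF_BOUND` are hypothetical, and returning a marker value is one assumed reading of the slide's `goto end` for chains longer than the bound.

```java
final class UnrollDemo {
    static final int OUT_OF_BOUND = -1; // hypothetical marker for executions beyond the unrolling bound

    static final class Entry {
        final Object key;
        final Entry next;

        Entry(Object key, Entry next) {
            this.key = key;
            this.next = next;
        }
    }

    // Original form: an unbounded loop over the chain.
    static int chainLengthLoop(Entry head) {
        int n = 0;
        for (Entry e = head; e != null; e = e.next) {
            n++;
        }
        return n;
    }

    // The same traversal unrolled to two iterations, with a fresh variable
    // per assignment (e, e1, e2) as on the slide.
    static int chainLengthUnrolled2(Entry head) {
        int n = 0;
        Entry e = head;
        if (e != null) {
            n++;
            Entry e1 = e.next;
            if (e1 != null) {
                n++;
                Entry e2 = e1.next;
                if (e2 != null) {
                    return OUT_OF_BOUND; // plays the role of "goto end"
                }
            }
        }
        return n;
    }

    public static void main(String[] args) {
        Entry two = new Entry("k2", new Entry("k1", null));
        System.out.println(chainLengthUnrolled2(two)); // prints 2
    }
}
```

For chains of up to two entries the two methods agree; longer chains fall outside the explored scope, which is the underapproximation the earlier slides describe.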

Expression for assert table[i]₁.next == null

¬(Guard(A) → E[table[i].next == null])

E[table[i].next == null] ← E[table[i].next] = {Null}
E[table[i].next] ← E[table[i]].E[next]
E[table[i]] ← {E[table].x.E[V1] | sum(E[table].x.E[I1]) = E[i]}
. . .

sum(x) ← Σ_{xi ∈ x} int(xi)
int(j): int(#1) ← 1, int(#2) ← 2, int(#4) ← 4, . . .

Example bindings:

table ← {A1}, next ← {〈E1, Null〉}
T1 ← {〈A1, N1〉}, i ← 5
I1 ← {〈A1, N1, #1〉, 〈A1, N1, #4〉}
V1 ← {〈A1, N1, E1〉}

Evaluation of E[table[i]] under these bindings, step by step:

E[table[i]] ← {E[table].x.E[V1] | sum(E[table].x.E[I1]) = E[i]}
E[table[i]] ← {{A1}.x.E[V1] | sum({A1}.x.E[I1]) = 5}
E[table[i]] ← {{A1}.{N1}.E[V1] | sum({A1}.{N1}.E[I1]) = 5}
E[table[i]] ← {{A1}.{N1}.E[V1] | sum({#1, #4}) = 5}
E[table[i]] ← {{A1}.{N1}.E[V1] | 5 = 5}
E[table[i]] ← {E1}
E[table[i].next] ← {Null}
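The evaluation of `E[table[i]]` can be replayed concretely. The sketch below stores the two array relations as nested maps (array atom, then slot atom) and selects the slot whose index atoms sum to `i`. It is an assumed, simplified reading of the sparse array representation: `SparseArrayDemo`, `lookup`, `exampleI1`, and `exampleV1` are hypothetical names, and the nested-map layout is a convenience rather than the tool's data structure.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class SparseArrayDemo {
    // I1 from the slide: array atom -> slot atom -> power-of-two atoms of the slot's index.
    static Map<String, Map<String, Set<Integer>>> exampleI1() {
        Map<String, Set<Integer>> slots = new HashMap<>();
        slots.put("N1", new HashSet<Integer>(Arrays.asList(1, 4))); // {#1, #4}, i.e. index 5
        Map<String, Map<String, Set<Integer>>> i1 = new HashMap<>();
        i1.put("A1", slots);
        return i1;
    }

    // V1 from the slide: array atom -> slot atom -> stored value atom.
    static Map<String, Map<String, String>> exampleV1() {
        Map<String, String> slots = new HashMap<>();
        slots.put("N1", "E1");
        Map<String, Map<String, String>> v1 = new HashMap<>();
        v1.put("A1", slots);
        return v1;
    }

    // Returns the value of the slot x of `array` with sum(array.x.I1) == i,
    // or null when no slot of the sparse array carries that index.
    static String lookup(Map<String, Map<String, Set<Integer>>> i1,
                         Map<String, Map<String, String>> v1,
                         String array, int i) {
        Map<String, Set<Integer>> slots =
                i1.getOrDefault(array, Collections.<String, Set<Integer>>emptyMap());
        for (Map.Entry<String, Set<Integer>> slot : slots.entrySet()) {
            int sum = 0;
            for (int atom : slot.getValue()) {
                sum += atom;
            }
            if (sum == i) {
                return v1.getOrDefault(array, Collections.<String, String>emptyMap())
                         .get(slot.getKey());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(lookup(exampleI1(), exampleV1(), "A1", 5)); // prints E1
    }
}
```

Only the slot `N1` exists, so index 5 resolves to `E1` and every other index resolves to nothing, matching the idea that only a subset of array indices needs to have values.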

Guard for assert table[i]₁.next == null

E[e2 != null] = False ∨ E[e1 != null] = False ∨ E[e != null] = False

Evaluation

• Miniatur tool embodies our techniques

– Uses WALA for control dependence and other analyses

– Uses Kodkod to generate CNF

– Uses Minisat as backend SAT solver

• Evaluated structural assertions in java.util collections

– Check that size fields accurately reflect data structure

– Use driver that inserts arbitrary objects into collection

• Evaluated Java equality contracts on open-source code

– (1) reflexive, (2) symmetric, (3) transitive, (4) non-null,

– (5) hashCode, (6-9) compareTo properties

Evaluation Results: Testing java.util

class        formula                                                   Time (s)   Lines
LinkedList   this.size = |this.header.next* − null|                    8.2        111
TreeMap      this.size = |count(this.root.(left + right)* − null|      67.4       12592
TreeSet      this.m.size = |this.m.root.(left + right)* − null|        80.0       12667
HashMap      this.size = |this.table[].next* − null|                   30.0       1107
HashSet      this.map.size = |(this.map.table[].next* − null|          34.8       1129

• Encode correctness of size fields

• Check expressive properties against complex collections

• Heap size per type, Loop unrolling
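To make the shape of these size formulas concrete, the sketch below checks an analogue of `this.size = |this.header.next* − null|` on a toy singly linked list: it counts the non-null nodes reachable from the header through `next` and compares that count with the size field. `SizeInvariantDemo` and its `Node` class are hypothetical and deliberately simpler than the `java.util.LinkedList` layout.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class SizeInvariantDemo {
    static final class Node {
        Node next;

        Node(Node next) {
            this.next = next;
        }
    }

    // |start.next* − null|: number of distinct non-null nodes reachable from
    // start through zero or more next steps. The identity set also
    // terminates the walk on cyclic structures.
    static int reachableViaNext(Node start) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        for (Node n = start; n != null && seen.add(n); n = n.next) {
            // the work happens in the loop condition
        }
        return seen.size();
    }

    // The structural assertion: the size field matches the reachable node count.
    static boolean sizeInvariantHolds(Node header, int size) {
        return size == reachableViaNext(header);
    }

    public static void main(String[] args) {
        Node list = new Node(new Node(null));
        System.out.println(sizeInvariantHolds(list, 2)); // prints true
        System.out.println(sizeInvariantHolds(list, 3)); // prints false
    }
}
```

In the relational setting the closure `next*` is a single operator, so the whole check is one formula rather than a loop.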

Evaluation Results: Testing Equals Contracts

• Test each property of each equals method using harness

public static void equalsTester(Object a, Object b) {
  if (a.equals(b)) assert b.equals(a);
}

• Evaluated several common, open-source benchmarks

– Antlr, BCEL, Hsqldb, java cup

– Found 20 concrete violations (of properties 2, 3, 5, 6, 7)

– Most tests ran within 2 minutes

• Bugs illustrated by concrete counter-examples

argument a                             argument b
ArrayType3                             ArrayType2
type=54304                             type=9181
basic type=UninitializedObjectType3    basic type=anonType3
...                                    ...
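A symmetry violation of the kind this harness looks for is easy to reproduce by hand. The sketch below uses the well-known subclass pitfall; `Point` and `ColorPoint` are hypothetical classes, not taken from the benchmarks, and the check returns a boolean instead of using `assert` so that it runs without the `-ea` flag.

```java
final class EqualsContractDemo {
    static class Point {
        final int x;

        Point(int x) {
            this.x = x;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x;
        }

        @Override
        public int hashCode() {
            return x;
        }
    }

    static class ColorPoint extends Point {
        final String color;

        ColorPoint(int x, String color) {
            super(x);
            this.color = color;
        }

        // Stricter than Point.equals, which is what breaks symmetry.
        @Override
        public boolean equals(Object o) {
            return o instanceof ColorPoint && super.equals(o)
                    && ((ColorPoint) o).color.equals(color);
        }

        @Override
        public int hashCode() {
            return 31 * x + color.hashCode();
        }
    }

    // Property (2) for one pair: a.equals(b) implies b.equals(a).
    static boolean symmetricFor(Object a, Object b) {
        return !a.equals(b) || b.equals(a);
    }

    public static void main(String[] args) {
        Object p = new Point(1);
        Object cp = new ColorPoint(1, "red");
        System.out.println(symmetricFor(p, cp)); // prints false
    }
}
```

Here `p.equals(cp)` holds while `cp.equals(p)` does not, so the pair `(p, cp)` is a concrete counterexample in the same spirit as the one tabulated above.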

Related Work

• Most closely related work is checking Java with Alloy

– Vaziri et al, Taghdiri et al, Dennis et al

– Miniatur handles more of Java, scales better

• SATURN checks C using SAT

– Uses manually-tailored summaries

– Checks less-rich properties

– Better scaling than Miniatur

• SMT solvers

– Mix SAT with theories for arithmetic, arrays, etc.

– Can prove properties within particular theories

– Current SMT solvers less expressive

Conclusions and Future Work

• Miniatur makes SAT-based checking for Java practical

– Extends prior work to better handle integers, arrays

– New, sound slicing mechanism for scalability

→ Handles real properties on real programs

→ Deep structural properties on real data structures

• Future work

– Refinement-based approach to unrolling loops

– Model concurrency
