50.530: Software Engineering
Sun Jun, SUTD
Course Outline
Date     Topic                        Remarks
Sep 15   Introduction
Sep 22   Automatic Testing
Sep 29   Delta Debugging
Oct 13   Bug Localization
Oct 20   Specification Mining
Nov 3    Race Detection
Nov 10   Hoare Logic and Proving
Nov 17   Symbolic Execution
Nov 24   Invariant Generation
Dec 1    Software Model Checking
Dec 12   Rely Guarantee Reasoning     Dec 15, 10 - 12
Dec 19   Final Exam
Debugging
Verification
Where is the bug?
Where the bug is depends on what the programmer wants at each step. How do we know what the programmer wants?
We “find out” what the programmer wants, borrowing ideas and techniques from machine learning.
The Idea
Delta Debugging can be inefficient and hard to scale because it compares a pair of concrete program states: there are too many differences, and they are too detailed.
Good Bad
The Idea
2. Generate likely invariants
At L, x = 1 and y = -2
At L, x = 2 and y = 0
At L, x = 3 and y = 1
1 <= x <= 3 and -2 <= y <= 1
What forms of invariants do I use?
The Idea
3. Test the likely invariant with the failed test
1 <= x <= 3 and -2 <= y <= 1
Bad
At L, x = 50 and y = 0
L is a candidate root cause of the bug!
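Steps 2 and 3 can be sketched in a few lines of Java. This is only an illustration of the idea, not the tool from the paper: the class name RangeInvariants and its methods are made up for this example, and the only invariant form it knows is a per-variable range.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: learn range invariants at one program location L, then test a failing state. */
final class RangeInvariants {
    // variable name -> {min, max} seen at L in passing runs
    private final Map<String, int[]> ranges = new LinkedHashMap<>();

    /** Widens the learned ranges so that they cover one passing state. */
    void observe(Map<String, Integer> state) {
        for (Map.Entry<String, Integer> e : state.entrySet()) {
            int v = e.getValue();
            int[] r = ranges.get(e.getKey());
            if (r == null) {
                ranges.put(e.getKey(), new int[] {v, v});
            } else {
                r[0] = Math.min(r[0], v);
                r[1] = Math.max(r[1], v);
            }
        }
    }

    /** Returns true if some variable lies outside its learned range. */
    boolean violatedBy(Map<String, Integer> state) {
        for (Map.Entry<String, Integer> e : state.entrySet()) {
            int[] r = ranges.get(e.getKey());
            if (r != null && (e.getValue() < r[0] || e.getValue() > r[1])) {
                return true;
            }
        }
        return false;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, int[]> e : ranges.entrySet()) {
            if (sb.length() > 0) sb.append(" and ");
            sb.append(e.getValue()[0]).append(" <= ").append(e.getKey())
              .append(" <= ").append(e.getValue()[1]);
        }
        return sb.toString();
    }

    /** Helper (hypothetical) that builds a state over the two variables of the example. */
    static Map<String, Integer> state(int x, int y) {
        Map<String, Integer> s = new LinkedHashMap<>();
        s.put("x", x);
        s.put("y", y);
        return s;
    }

    public static void main(String[] args) {
        RangeInvariants atL = new RangeInvariants();
        atL.observe(state(1, -2));   // three passing runs reaching L
        atL.observe(state(2, 0));
        atL.observe(state(3, 1));
        System.out.println(atL);                          // 1 <= x <= 3 and -2 <= y <= 1
        System.out.println(atL.violatedBy(state(50, 0))); // true: L is a candidate root cause
    }
}
```

A range is cheap to learn and to check, but it says nothing about relations between variables (for example x <= y); that is one way to approach the question "What forms of invariants do I use?".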
The Idea
4. Reduce the candidate root causes
• Dynamic program slicing: finding out which statements affect the candidate root cause
• Dynamic dependence filtering: given two candidate root causes A and B, if B is affected by A and A comes earlier, A is more likely the real cause.
Overall Picture
How to generate inputs?
What invariants to generate?
How to conclude one candidate root cause is more likely than the other?
1. Generate Inputs
• The inputs should be “close” to the failure input, in the same spirit of “nearest neighbor”.
• Systematically generate inputs based on the DDmin algorithm.
The initial good inputs + good inputs generated from DDmin
A queue of good inputs to generate more good inputs from.
A list of good inputs
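The sketch below is a simplified stand-in for the input generation step, not a transcription of Algorithm 1 from the paper. It borrows one move from ddmin: delete one chunk of the failing input at a time, at ever finer granularity, and keep every variant that passes. The class name NearbyGoodInputs and the passes oracle are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Sketch: derive good inputs that stay close to a failing input, ddmin-style. */
final class NearbyGoodInputs {
    /**
     * Removes one chunk of the failing input at a time. The chunk size starts at
     * about half the input and is halved until single characters are removed.
     * Every variant accepted by the oracle is collected as a good input.
     */
    static List<String> generate(String failing, Predicate<String> passes) {
        Set<String> good = new LinkedHashSet<>();   // keeps order, drops duplicates
        int n = failing.length();
        for (int size = (n + 1) / 2; size >= 1; size /= 2) {
            for (int start = 0; start < n; start += size) {
                int end = Math.min(n, start + size);
                String variant = failing.substring(0, start) + failing.substring(end);
                if (passes.test(variant)) {
                    good.add(variant);
                }
            }
        }
        return new ArrayList<>(good);
    }

    public static void main(String[] args) {
        // Hypothetical program under test: it fails only when both '<' and '>' occur.
        Predicate<String> passes = s -> !(s.contains("<") && s.contains(">"));
        System.out.println(generate("a<b>c", passes)); // [>c, a<b, ab>c, a<bc]
    }
}
```

Each collected input differs from the failing one by a single deleted chunk, which is one concrete reading of "close". A queue-based version would feed the collected inputs back in to generate further variants.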
Algorithm 1
Algorithm 1
Consider the input SELECT DATE_FORMAT("0000-01-01", '%W %d %M %Y') for the MySQL example: does it work?
If a specification of the input format is given, we can generate better and meaningful inputs.
Research Discussion
How do we guarantee to generate inputs which are close to the failure input?
Can we generate inputs at program points closer to the failure?
2. Generate Invariants
• The invariant should rightly “guess” what the programmer wants somewhere in the program.
 – Where do we generate invariants?
 – What form should the invariants take?
2. Generate Invariants
• Where do we generate invariants?
 – (in the paper) load, store and function return instructions.
   • Load: array[i] * 5 + 2
   • Store: array[i] = array[k] + 100;
   • Return: return x + y;
How would you justify this? What is the consequence?
2. Generate Invariants
• What form should the invariants take?
 – (in the paper) a range invariant, e.g., x in [1..5]
How would you justify this?
4. Reduce Candidate Causes
• Using dynamic program slicing: given a statement S, the backward slice of S contains all statements which S depends on.
 – A data dependency is a situation in which S refers to the data of a preceding statement.
 – S is control dependent on a preceding statement if the outcome of the latter determines whether S should be executed or not.
Remove all those candidate causes which the initial failure statement does not depend on.
Dynamic Program Slicing
int[] previous = new int[5];

public int max (int[] list) {
    int max = list[0];
    for (int i = 1; i < list.length-1; i++) {
        if (max < list[i]) {
            max = list[i];
        }
    }
    previous[0] = max;
    return max;
}
public int max (int[] list)
int max = list[0];
int i = 1
i < list.length-1
if (max < list[i]) {
max = list[i]
i++
previous[0] = max
i < list.length-1
return max
So if the value of returned max caused a failure, “previous[0] = max” should not be a candidate cause.
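To see how the slice is computed, the sketch below replays one execution of max and walks the trace backwards from "return max". Everything here is an assumption made for illustration: the input list {3, 7, 5}, the hand-written trace, and the Step record of what each executed statement writes, reads, and is controlled by. A real slicer would obtain this from instrumentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: dynamic backward slicing over a recorded execution trace. */
final class DynamicSlice {
    /** One executed statement instance. */
    static final class Step {
        final String def;        // location written, or null if none
        final Set<String> uses;  // locations read
        final int controlledBy;  // trace index of the controlling branch, or -1
        Step(String def, int controlledBy, String... uses) {
            this.def = def;
            this.controlledBy = controlledBy;
            this.uses = new HashSet<>(Arrays.asList(uses));
        }
    }

    /** Returns the trace indices that the step at 'criterion' depends on (itself included). */
    static List<Integer> backwardSlice(List<Step> trace, int criterion) {
        Set<Integer> slice = new TreeSet<>();
        Set<Integer> pending = new HashSet<>(); // steps needed through control dependence
        Set<String> needed = new HashSet<>();   // locations whose latest writer is still wanted
        pending.add(criterion);
        for (int i = criterion; i >= 0; i--) {
            Step s = trace.get(i);
            boolean definesNeeded = s.def != null && needed.contains(s.def);
            if (!pending.contains(i) && !definesNeeded) continue;
            slice.add(i);
            if (s.def != null) needed.remove(s.def); // earlier writes are overwritten
            needed.addAll(s.uses);
            if (s.controlledBy >= 0) pending.add(s.controlledBy);
        }
        return new ArrayList<>(slice);
    }

    /** Hand-written trace of max(new int[] {3, 7, 5}); the loop body runs once. */
    static List<Step> maxTrace() {
        return Arrays.asList(
            new Step("max", -1, "list[0]"),            // 0: int max = list[0]
            new Step("i", -1),                         // 1: int i = 1
            new Step(null, -1, "i", "list.length"),    // 2: i < list.length-1 (true)
            new Step(null, 2, "max", "i", "list[1]"),  // 3: if (max < list[i])
            new Step("max", 3, "i", "list[1]"),        // 4: max = list[i]
            new Step("i", 2, "i"),                     // 5: i++
            new Step(null, -1, "i", "list.length"),    // 6: i < list.length-1 (false)
            new Step("previous[0]", -1, "max"),        // 7: previous[0] = max
            new Step(null, -1, "max"));                // 8: return max
    }

    public static void main(String[] args) {
        System.out.println(backwardSlice(maxTrace(), 8)); // [0, 1, 2, 3, 4, 8]
    }
}
```

Step 7 (previous[0] = max) is not in the slice, which matches the observation above. Steps 5 and 6 are also absent because, in this particular run, the returned value does not depend on the second evaluation of the loop condition.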
Exercise 1
int sum = 0;
int i = 0;
while (i < 1100) {
    sum += i;
    i++;
}
assert(sum >= 0);
Use program slicing on the assertion.
4. Reduce Candidate Causes
• Using dependency filtering: if a faulty statement that is the bug’s root cause triggers an invariant failure, then any statement using the faulty value computed by that statement might also trigger an invariant failure.
• If candidate statement T (control- or data-)depends on candidate statement S, remove T from the candidate set.
Is this justified?
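A minimal sketch of dependency filtering, assuming the dependences between executed statements are already available as a map from each statement to the statements it directly depends on. The class name DependenceFilter and the string labels are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Sketch: drop every candidate cause that depends on another candidate. */
final class DependenceFilter {
    /** Everything that 'start' depends on, directly or transitively. */
    static Set<String> dependencies(String start, Map<String, Set<String>> dependsOn) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            String cur = work.pop();
            for (String d : dependsOn.getOrDefault(cur, Collections.<String>emptySet())) {
                if (seen.add(d)) work.push(d);
            }
        }
        return seen;
    }

    /** Keeps candidate T only if no other candidate S exists that T depends on. */
    static Set<String> filter(Set<String> candidates, Map<String, Set<String>> dependsOn) {
        Set<String> kept = new LinkedHashSet<>();
        for (String t : candidates) {
            Set<String> deps = dependencies(t, dependsOn);
            deps.remove(t);              // ignore self-dependence through a cycle
            deps.retainAll(candidates);  // only other candidates matter
            if (deps.isEmpty()) kept.add(t);
        }
        return kept;
    }
}
```

For example, with candidates {A, B, C} where B depends on A and C depends only on a non-candidate D, the filter keeps A and C. Whether dropping B is safe is exactly the question raised above: B could contain an independent fault of its own.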
4. Reduce Candidate Causes
• If there are multiple failed test cases, with the same cause of failure, intersect the candidate cause set for each failed test case.
Is this justified?
Case Study
• Objects of analysis
 – The Squid HTTP proxy server
 – The MySQL database server
 – The Apache HTTP web server
• Selected 8 real software bugs
 – Have to be software versions which can be supported by the tool developed by the authors
 – No concurrency bugs. Why?
 – No missing code bugs. Why?
Case Study: Effectiveness
Q1: Can the approach find the true root causes of bugs?
• For each bug, the correction patch in the bug report is used to identify the minimal set of statements which should be changed or deleted to remove the failure symptom.
Q2: How many false positives does it generate?
Is this justified?
Case Study: Effectiveness Results
Given a set of remaining causes, find out the statements the causes depend on.
Do you know why many languages are strongly typed?
“Language type systems probably find more bugs on a daily basis than any other approach.”
--- Engler et al. SOSP 01
Typestate
• Typestates define valid sequences of operations that can be performed upon an instance of a given type.
 – method A must be invoked before method B is invoked, and method C may not be invoked in between
• Typestates associate state information with variables of that type. This state information is used to determine at compile-time which operations are valid to be invoked upon an instance of the type.
Example: FileWriter
java.io.FileWriter:
FileWriter(File)
write(String)
close()
error
accepting state
write(String)
Exercise: Look at the Java API documentation, try to complete this typestate.
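A typestate can be written down as a transition function over states and method calls. The sketch below encodes only the transitions visible on the slide and deliberately leaves the rest of the exercise open. The state names START, OPEN and CLOSED, the UNSPECIFIED placeholder, and the choice to make the error state absorbing are assumptions of this example.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: the partial FileWriter typestate from the slide as a transition function. */
final class FileWriterTypestate {
    enum State { START, OPEN, CLOSED, ERROR, UNSPECIFIED }

    /** One transition; anything the slide does not show maps to UNSPECIFIED. */
    static State step(State s, String event) {
        if (s == State.START && event.equals("FileWriter(File)")) return State.OPEN;
        if (s == State.OPEN && event.equals("write(String)")) return State.OPEN;
        if (s == State.OPEN && event.equals("close()")) return State.CLOSED;
        if (s == State.CLOSED && event.equals("write(String)")) return State.ERROR;
        if (s == State.ERROR) return State.ERROR; // assumption: error is absorbing
        return State.UNSPECIFIED;                 // left open for the exercise
    }

    /** Runs a whole call sequence from the start state. */
    static State run(List<String> events) {
        State s = State.START;
        for (String e : events) {
            s = step(s, e);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(
            "FileWriter(File)", "write(String)", "close()")));                  // CLOSED
        System.out.println(run(Arrays.asList(
            "FileWriter(File)", "close()", "write(String)")));                  // ERROR
    }
}
```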
Motivation
Programmers don’t document the typestate when they define a data-structure. So we learn it!
Running Example
Class: java.util.Stack<E>
Methods:
• empty(): test if this stack is empty
• peek(): look at the top element in the stack
• pop(): remove the top element in the stack
• push(E item): push an item onto the stack
At every point in a call sequence, the number of pop() calls so far must be no more than the number of push() calls.
Problem Definition
• Task: learn a model of Stack which tells what are good/bad sequences of method calls
• What models do we learn?
Deterministic Finite State Automata
Stateful Typestate
Learn DFA
• Assume that the typestate is in the form of a DFA.
• There are a number of algorithms designed to learn DFA efficiently.
 – Passive learning: use only existing test cases
 – Active learning: generate new test cases on demand
The L* Learning Algorithm
Teacher knows the model which is a DFA.
Student asks two kinds of questions in order to learn.
Membership Query: Is <push, pop, push> valid?
Equivalence Query: Is this your DFA?
First Round: Membership Queries
Is the sequence <> good?
Yes.
Is the sequence <push> good?
Yes.
Is the sequence <pop> good?
No.
Is the sequence <pop,push> good?
No.
Is the sequence <pop,pop> good?
No.
First Round: Observation Table
The table is closed and consistent. I think I know now.
Consistent: if row(tr) = row(tr’), then row(tr^<e>) = row(tr’^<e>) for all events e.
Closed: for every tr above the blue line and every event e, row(tr^<e>) = row(tr’) for some tr’ above the blue line.
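Both conditions can be checked mechanically. In the sketch below a row is the list of membership answers for a trace followed by each column suffix. The table is closed if every one-event extension of an upper trace has the same row as some upper trace, and consistent if upper traces with equal rows still have equal rows after any one-event extension. The class ObservationTable and the stackTeacher oracle (a stand-in teacher that knows the Stack rule) are assumptions of this example, not a full L* implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Sketch of an L*-style observation table with closedness and consistency checks. */
final class ObservationTable {
    final List<List<String>> upper;        // traces above the line
    final List<List<String>> suffixes;     // column labels
    final List<String> alphabet;           // method names
    final Predicate<List<String>> member;  // membership oracle (the teacher)

    ObservationTable(List<List<String>> upper, List<List<String>> suffixes,
                     List<String> alphabet, Predicate<List<String>> member) {
        this.upper = upper;
        this.suffixes = suffixes;
        this.alphabet = alphabet;
        this.member = member;
    }

    static List<String> trace(String... methods) {
        return Arrays.asList(methods);
    }

    static List<String> concat(List<String> a, List<String> b) {
        List<String> r = new ArrayList<>(a);
        r.addAll(b);
        return r;
    }

    /** Membership answers for tr followed by each column suffix. */
    List<Boolean> row(List<String> tr) {
        List<Boolean> r = new ArrayList<>();
        for (List<String> e : suffixes) {
            r.add(member.test(concat(tr, e)));
        }
        return r;
    }

    boolean isClosed() {
        Set<List<Boolean>> upperRows = new HashSet<>();
        for (List<String> s : upper) upperRows.add(row(s));
        for (List<String> s : upper) {
            for (String a : alphabet) {
                if (!upperRows.contains(row(concat(s, trace(a))))) return false;
            }
        }
        return true;
    }

    boolean isConsistent() {
        for (List<String> s1 : upper) {
            for (List<String> s2 : upper) {
                if (!row(s1).equals(row(s2))) continue;
                for (String a : alphabet) {
                    if (!row(concat(s1, trace(a))).equals(row(concat(s2, trace(a))))) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    /** Stand-in teacher: a trace is good if pop never outnumbers push in any prefix. */
    static boolean stackTeacher(List<String> tr) {
        int size = 0;
        for (String m : tr) {
            if (m.equals("push")) size++;
            else if (m.equals("pop") && --size < 0) return false;
        }
        return true;
    }
}
```

With upper traces <> and <pop>, the single column <>, and the alphabet {push, pop}, the rows come from exactly the five membership queries of the first round, and both checks succeed. That is why the student stops and asks an equivalence query, even though the two-state hypothesis is still wrong.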
First Round: Equivalence Query
No, <push, pop> is good
<push,pop> is represented by <pop> previously. It is obviously wrong.
What is wrong with L*?
• It is designed to learn DFA, whereas programs are beyond DFA.
• L* requires a perfect teacher, which is infeasible.
 – What if the methods have non-trivial parameters?
The Approach
• Learn stateful typestates where the predicates are conjunctions of linear inequalities.
• Learn from test cases.
 – A test case is a failure if it causes an unhandled exception or an assertion failure.
• Learn using techniques from the machine learning community.
Is it justified?
TzuYu
Is <methodA,methodB> good or bad?
Tester: I don’t know, let me test it out.
Possible answers: yes; no; or conditional, e.g., if x > 5, then yes; otherwise no.
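A tester of this kind can answer a membership query by simply running the call sequence on a fresh object. The sketch below does this for java.util.Stack and treats an EmptyStackException as a failed test case. The class name StackTester, the string encoding of method names, and the pushed value 0 are assumptions of this example.

```java
import java.util.Arrays;
import java.util.EmptyStackException;
import java.util.List;
import java.util.Stack;

/** Sketch: answer membership queries by executing the calls on a real Stack. */
final class StackTester {
    /** Returns true if the call sequence runs without an exception from the stack. */
    static boolean isGood(List<String> calls) {
        Stack<Integer> stack = new Stack<>();
        try {
            for (String call : calls) {
                switch (call) {
                    case "push":  stack.push(0);  break; // pushed value is arbitrary
                    case "pop":   stack.pop();    break;
                    case "peek":  stack.peek();   break;
                    case "empty": stack.empty();  break;
                    default:
                        throw new IllegalArgumentException("unknown method: " + call);
                }
            }
            return true;
        } catch (EmptyStackException e) {
            return false; // the test case failed with an unhandled exception
        }
    }

    public static void main(String[] args) {
        System.out.println(isGood(Arrays.asList("push")));        // true
        System.out.println(isGood(Arrays.asList("pop")));         // false
        System.out.println(isGood(Arrays.asList("pop", "push"))); // false
        System.out.println(isGood(Arrays.asList("push", "pop"))); // true
    }
}
```

Unlike the L* teacher, this tester only reports what happened in one execution. With methods that take non-trivial parameters, the same sequence can pass for some arguments and fail for others, which is where the conditional answers come from.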
First Round: Membership Queries
Is the sequence <> good?
Yes.
Is the sequence <push> good?
Yes.
Is the sequence <pop> good?
No.
Is the sequence <pop,push> good?
No.
Is the sequence <pop,pop> good?
No.
First Round: Observation Table
The table is closed and consistent. I think I know now.
Consistent: if row(tr) = row(tr’), then row(tr^<e>) = row(tr’^<e>) for all events e.
Closed: for every tr above the blue line and every event e, row(tr^<e>) = row(tr’) for some tr’ above the blue line.
First Round: Equivalence Query
Heh, <push,pop> seemed good and <pop> seemed bad. They can’t both be reaching state B; there must be something different before invoking pop()!
Numerical Value Graph
How to distinguish these two objects?
What is a stack object
The stack object after <push>
The stack object after <>
Numerical Value Graph
• How to distinguish these two objects?
                    Stack after <push>                              Stack after <>
Level 0 features    [not null]                                      [not null]
Level 1 features    [not null, eleCount = 1, array is not null]     [not null, eleCount = 0, array is not null]
Level 1 features distinguishable!
SVM: Support Vector Machine
• For the stack objects: 2*eleCount >= 1
[Figure: scatter plot of X and O points over feature 1 and feature 2, with the support vectors marked]
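Why 2*eleCount >= 1 rather than eleCount >= 1? With a single feature, the maximum-margin separator sits halfway between the closest points of the two classes. The sketch below computes that midpoint directly; it is not a general SVM solver, and the class name MaxMargin1D and the sample values are assumptions of this example.

```java
import java.util.Arrays;

/** Sketch: maximum-margin threshold for one feature, the 1-D special case of a linear SVM. */
final class MaxMargin1D {
    /**
     * Returns t such that "value >= t" separates good from bad samples with the
     * largest margin. Both arrays must be non-empty and every bad value must be
     * smaller than every good value.
     */
    static double threshold(int[] bad, int[] good) {
        int maxBad = Arrays.stream(bad).max().getAsInt();
        int minGood = Arrays.stream(good).min().getAsInt();
        if (maxBad >= minGood) {
            throw new IllegalArgumentException("not separable by a single threshold");
        }
        return (maxBad + minGood) / 2.0; // midpoint between the two support vectors
    }

    public static void main(String[] args) {
        // eleCount observed just before pop(): 0 when pop() failed, 1..3 when it succeeded
        double t = threshold(new int[] {0}, new int[] {1, 2, 3});
        System.out.println("eleCount >= " + t); // eleCount >= 0.5
    }
}
```

eleCount >= 0.5 is the same predicate as 2*eleCount >= 1 once it is written with integer coefficients. The samples 0 and 1 are the support vectors: they alone determine the threshold.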
SVM: Support Vector Machine
[Figure: scatter plot of X and O points over feature 1 and feature 2, with additional O points]
What do we do if the vectors are located like this?
First Round: Equivalence Query
I know now that whether eleCount >= 1 is important. Can you restart the learning from the beginning using three events: push, [eleCount >= 1]pop, [!(eleCount >= 1)]pop?
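Restarting with the refined alphabet means every concrete pop() call is relabelled according to the predicate that held just before it. A minimal sketch of that relabelling follows. The class name GuardedEvents is hypothetical, and stack.size() is used as a stand-in for the eleCount feature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Stack;

/** Sketch: relabel concrete calls with the guarded events of the refined alphabet. */
final class GuardedEvents {
    static List<String> abstractTrace(List<String> calls) {
        Stack<Integer> stack = new Stack<>();
        List<String> events = new ArrayList<>();
        for (String call : calls) {
            if (call.equals("push")) {
                events.add("push");
                stack.push(0); // pushed value is arbitrary
            } else if (call.equals("pop")) {
                if (stack.size() >= 1) {
                    events.add("[eleCount>=1]pop");
                    stack.pop();
                } else {
                    events.add("[!(eleCount>=1)]pop");
                    break; // the real pop() would throw here, ending the test case
                }
            } else {
                throw new IllegalArgumentException("unknown method: " + call);
            }
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(abstractTrace(Arrays.asList("push", "pop"))); // [push, [eleCount>=1]pop]
        System.out.println(abstractTrace(Arrays.asList("pop")));         // [[!(eleCount>=1)]pop]
    }
}
```

The traces <push,pop> and <pop> now end in different events, so the learner no longer has to send both to the same state.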
Research Discussion
Is the typestate learned guaranteed to be correct?
If no, how do we make it correct or more likely correct?
Research Discussion
What if the vectors are located like this?
[Figure: scatter plot over feature 1 and feature 2, with the X points in one cluster and O points spread around them]
Research Discussion
As an expert programmer, how do you learn what the programmer wants?
int[] previous = new int[5];

public int max (int[] list) {
    int max = list[0];
    for (int i = 1; i < list.length-1; i++) {
        if (max < list[i]) {
            max = list[i];
        }
    }
    previous[0] = max;
    return max;
}
What does this program do and how do you know?
Research Discussion
if (card == NULL) {
    printk(KERN_ERR "capidrv-%d: … %d!\n", card->contrnr, id);
}
How do you know there is a bug in the program?
Research Discussion
int mxser_write (struct tty_struct *tty, …) {
    struct mxser_struct *info = tty->driver_data;
    unsigned long flags;

    if (!tty || !info->xmit_buf) {
        return (0);
    }
}
There is a potential problem. What is it, and why?