1
Tracking Down Bugs
Benny Vaksendiser
2
Overview
Motivation
Isolating Cause-Effect Chains from Computer Programs
Visualization of Test Information to Assist Fault Localization
Dynamically Discovering Likely Program Invariants to Support
Program Evolution
Automatic Extraction of Object-Oriented Component Interfaces
Comparison
Summary
References
3
Motivation
Improve software quality.
Reduce the number of delivered faults.
Lower maintenance cost.
4
Motivation Lowering Maintenance Cost
[Chart: software maintenance as a share of total software cost, 1979–2000, rising from roughly 50% to over 90%]
Maintaining software accounts for more than 90% of total software cost.
Debugging is a highly time-consuming task.
5
Why Is Debugging So Hard?
Complexity of software.
Fixing unfamiliar code.
Finding the cause of a bug isn’t trivial.
“Corner cases”.
Undocumented code.
Documentation often incomplete or wrong.
Lack of a guiding tool.
6
Isolating Cause-Effect Chains from Computer Programs
Andreas Zeller, professor at Universität des Saarlandes in Saarbrücken, Germany
7
Isolating Cause-Effect Chains from Computer Programs
Andreas Zeller
A passing run and a failing run
Delta Debugging algorithm
Isolating the cause of the failing run
8
Failing Run

double mult(double z[], int n) {
    int i, j;
    i = 0;
    for (j = 0; j < n; j++) {
        i = i + j + 1;
        z[i] = z[i] * (z[0] + 1.0);
    }
    return z[n];
}

Compiling fail.c, the GNU compiler (GCC) crashes:

    linux$ gcc-2.95.2 -O fail.c
    gcc: Internal error: program cc1 got fatal signal 11
What’s the error that causes this failure?
9
Cause
What’s the cause of the GCC failure?
The cause of any event (“effect”) is a preceding event without which the effect would not have occurred. (Microsoft Encarta)
To prove causality, we must show that:
1. The effect occurs when the cause occurs – failing run.
2. The effect does not occur when the cause does not occur – passing run.
General technique: experimentation – constructing a theory from a series of experiments (runs).
Can’t we automate experimentation?
10
Isolating Failure Causes
11
Isolating Failure Causes
12
Isolating Failure Causes
13
Isolating Failure Causes
14
Isolating Failure Causes
+1.0 is the failure cause – after only 19 tests
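The search above can be sketched as a minimal, illustrative variant of Zeller’s ddmin algorithm, here shrinking a failure-inducing code fragment character by character. The `test` predicate stands in for actually invoking the compiler; the real Delta Debugging implementation (which also diffs against the passing run and isolates program states) is considerably more elaborate.

```python
def ddmin(failing, test):
    """Shrink a failure-inducing input while test(input) still fails.

    Simplified ddmin: repeatedly try to remove chunks of the input,
    keeping any smaller input on which the failure still shows up.
    """
    n = 2  # granularity: number of chunks to split into
    while len(failing) >= 2:
        chunk = len(failing) // n
        reduced = False
        for i in range(n):
            # Remove the i-th chunk and test the remainder ("complement").
            candidate = failing[:i * chunk] + failing[(i + 1) * chunk:]
            if test(candidate):
                failing = candidate       # failure persists: keep shrinking
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if n >= len(failing):
                break                     # single characters tried: 1-minimal
            n = min(n * 2, len(failing))  # refine granularity and retry
    return failing

# Toy "compiler" that crashes whenever the input contains "+1.0":
print(ddmin("z[i]=z[i]*(z[0]+1.0);", lambda s: "+1.0" in s))  # -> +1.0
```

Each iteration either shrinks the input or refines the granularity, so the loop terminates with an input from which no single chunk can be removed without losing the failure.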
15
What’s going on in GCC?
16
What’s going on in GCC?
17
What’s going on in GCC?
18
What’s going on in GCC?
19
What’s going on in GCC?
To fix the failure, we must break this cause-effect chain.
20
Small Cause, Big Effect
How do we isolate the relevant state differences?
21
Memory Graphs
Vertices are variables.
Edges are references.
22
The GCC Memory Graph
23
The Process in a Nutshell
24
The Process in a Nutshell
25
The Process in a Nutshell
26
The GCC Cause-Effect Chain
27
www.askigor.org
Submit buggy program
Specify invocations
Click on “Debug it”
Diagnosis comes via e-mail
28
Visualization of Test Information to Assist Fault Localization
Mary Jean Harrold, John T. Stasko, James A. Jones
College of Computing, Georgia Institute of Technology
24th International Conference on Software Engineering, USA, May 2002
29
Visualization of Test Information to Assist Fault Localization
James A. Jones, Mary Jean Harrold, John Stasko
The higher the frequency with which a statement is executed by failing test cases, the higher the probability that the statement contains the fault.
30
Discrete Approach
Input:
Source code
For each test case: its pass/fail status and the statements it executes
Display statements in the program according to the test cases that execute them:
Only failed test cases
Both passed & failed test cases
Only passed test cases
31
Example

Test cases: (3,3,5) (1,2,3) (3,2,1) (5,5,5) (5,3,4) (2,1,3)

mid() {
    int x, y, z, m;
1:  read(x, y, z);
2:  m = z;
3:  if (y < z)
4:    if (x < y)
5:      m = y;
6:    else if (x < z)
7:      m = y;
8:  else
9:    if (x > y)
10:     m = y;
11:   else if (x > z)
12:     m = x;
13: print(m);
}

Results: P P P P P F
32
Example

                     (3,3,5) (1,2,3) (3,2,1) (5,5,5) (5,3,4) (2,1,3)
1:  read(x, y, z);      ●       ●       ●       ●       ●       ●
2:  m = z;              ●       ●       ●       ●       ●       ●
3:  if (y < z)          ●       ●       ●       ●       ●       ●
4:   if (x < y)         ●       ●                       ●       ●
5:    m = y;                    ●
6:   else if (x < z)    ●                               ●       ●
7:    m = y;            ●                                       ●
8:  else                                ●       ●
9:   if (x > y)                         ●       ●
10:   m = y;                            ●
11:  else if (x > z)                            ●
12:   m = x;
13: print(m);           ●       ●       ●       ●       ●       ●
                        P       P       P       P       P       F
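The example can be reproduced in executable form. A sketch transcribing mid() (with its seeded bug at statement 7) into Python, recording per-test statement coverage; the middle-value oracle used to assign P/F is the assumed specification.

```python
def mid(x, y, z, trace):
    """Transcription of the slide's mid() with the seeded bug at
    statement 7 (m = y instead of m = x); `trace` records which
    numbered statements execute."""
    trace += [1, 2, 3]
    m = z
    if y < z:
        trace.append(4)
        if x < y:
            trace.append(5); m = y
        else:
            trace.append(6)
            if x < z:
                trace.append(7); m = y   # bug: should be m = x
    else:
        trace += [8, 9]
        if x > y:
            trace.append(10); m = y
        else:
            trace.append(11)
            if x > z:
                trace.append(12); m = x
    trace.append(13)
    return m

tests = [(3, 3, 5), (1, 2, 3), (3, 2, 1), (5, 5, 5), (5, 3, 4), (2, 1, 3)]
for x, y, z in tests:
    t = []
    got = mid(x, y, z, t)
    expected = sorted([x, y, z])[1]          # oracle: the middle value
    print((x, y, z), "P" if got == expected else "F", t)
```

Running this reproduces the P P P P P F verdicts and the coverage rows of the slide: only test (2,1,3) fails, and it executes statements 1, 2, 3, 4, 6, 7, 13.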
34
Problem
Not very helpful!
Does not capture the relative frequency.
Statement 1 (read(x,y,z)) is executed by all six test cases, while statement 7 (m=y;) is executed by only one passed test case and the one failed test case – yet both are executed by both passed and failed test cases, so the discrete approach colors them identically.
35
Continuous Approach
Distribute statements executed by both passed and failed test cases over a spectrum.
Indicate the relative success rate of each statement by its hue.
Discrete approach: three categories (only failed / both passed & failed / only passed test cases).
Continuous approach: a continuous spectrum between those extremes.
36
Continuous Approach - Hue

Indicate the relative success rate of each statement by its hue:

    color(s) = low_color(red) + (%passed(s) / (%passed(s) + %failed(s))) * color_range
37
Continuous Approach - Brightness
    bright(s) = max(%passed(s), %failed(s))

Statements executed more frequently are rendered brighter.
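The hue and brightness formulas can be computed directly. A sketch, with hue mapped onto a 0–1 scale (0 = red, 1 = green) and the coverage counts for the mid() example filled in by hand:

```python
def tarantula_hue(passed_s, failed_s, total_passed, total_failed):
    """Hue in [0, 1] for statement s: 0 = pure red (suspicious),
    1 = pure green (exercised mostly by passing tests)."""
    pct_passed = passed_s / total_passed if total_passed else 0.0
    pct_failed = failed_s / total_failed if total_failed else 0.0
    if pct_passed + pct_failed == 0:
        return None  # statement never executed: no color assigned
    return pct_passed / (pct_passed + pct_failed)

def brightness(passed_s, failed_s, total_passed, total_failed):
    """Statements executed more frequently are rendered brighter."""
    return max(passed_s / total_passed if total_passed else 0.0,
               failed_s / total_failed if total_failed else 0.0)

# mid() example: 5 passing tests, 1 failing test.
# Statement 7 (the faulty m=y;) runs under 1 passing and the failing test:
print(tarantula_hue(1, 1, 5, 1))   # ~0.167: strongly red
# Statement 1 (read) runs under every test:
print(tarantula_hue(5, 1, 5, 1))   # 0.5: in between
# Statement 5 runs under a passing test only:
print(tarantula_hue(1, 0, 5, 1))   # 1.0: green
```

The faulty statement 7 gets the reddest hue of the three, which is exactly the visual cue the technique relies on.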
38
Back To Example

                     (3,3,5) (1,2,3) (3,2,1) (5,5,5) (5,3,4) (2,1,3)
1:  read(x, y, z);      ●       ●       ●       ●       ●       ●
2:  m = z;              ●       ●       ●       ●       ●       ●
3:  if (y < z)          ●       ●       ●       ●       ●       ●
4:   if (x < y)         ●       ●                       ●       ●
5:    m = y;                    ●
6:   else if (x < z)    ●                               ●       ●
7:    m = y;            ●                                       ●
8:  else                                ●       ●
9:   if (x > y)                         ●       ●
10:   m = y;                            ●
11:  else if (x > z)                            ●
12:   m = x;
13: print(m);           ●       ●       ●       ●       ●       ●
                        P       P       P       P       P       F
39
Scalability
mid() {
int x,y,z,m;
1: read(x,y,z);
2: m=z;
3: if(y<z)
4: if(x<y)
5: m=y;
6: elseif(x<z)
7: m=y;
8: else
9: if(x>y)
10: m=y;
11: elseif(x>z)
12: m=x;
13: print(m);
}
40
Limitations

double() {
    int x, d;            4   2  -5
1:  read(x);             ●   ●   ●
2:  d = abs(x + x);      ●   ●   ●
3:  print(d);            ●   ●   ●
}                        P   P   F

Every statement is executed by every test case, so coverage gives no discrimination.
Data-related bugs.
Very test-case dependent.
41
Future Work
What other views and analyses would be useful?
What is the maximum practical number of faults for which this technique works?
Visualization on higher-level representations of the code.
Using visualization in other places.
42
Tarantula
43
Dynamically Discovering Likely Program Invariants to Support
Program Evolution
Michael D. Ernst, William G. Griswold, David Notkin, Jake Cockrell
Dept. of Computer Science & Engineering, University of Washington
Dept. of Computer Science & Engineering, University of California San Diego
IEEE Transactions on Software Engineering 2001
44
Dynamically Discovering Likely Program Invariants to Support Program Evolution
Michael D. Ernst, William G. Griswold, Jake Cockrell, David Notkin
Problem: invariants are useful, but programmers (usually) don’t write them down.
Solution: dynamic invariant detection, implemented in an automatic tool, “Daikon”.
45
Example
The example is taken from “The Science of Programming” by Gries, 1981.
46
Example
100 randomly-generated arrays:
Length uniformly distributed from 7 to 13
Elements uniformly distributed from -100 to 100
Daikon discovers invariants by running the program on this test set and monitoring the values of the variables.
47
Invariants produced by Daikon
48
Architecture
49
Instrumentation
At program points of interest:
Function entry points
Loop heads
Function exit points
Output values of all “interesting” variables:
Scalar values (locals, globals, array subscript expressions, etc.)
Arrays of scalar values
Object addresses/ids
More kinds of invariants are checked for numeric types.
50
Types of Invariants
Variables x, y, z; constants a, b, c

Invariants over any variable x:
Constant value: x = a
Uninitialized: x = uninit
Small value set: x ∈ {a, b, c} (the variable takes a small set of values)

Invariants over a single numeric variable:
Range limits: x ≥ a, x ≤ b, a ≤ x ≤ b
Nonzero: x ≠ 0
Modulus: x = a (mod b)
Nonmodulus: x ≠ a (mod b), reported only if x mod b takes on every value other than a
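A hedged sketch of how a detector can check a few of these single-variable invariants over a trace of observed values; this is only a tiny, illustrative subset of Daikon’s actual candidate grammar, and the function name and dictionary shape are made up for the example.

```python
def single_var_invariants(samples):
    """Check a small subset of single-variable invariants against a
    list of observed values of x; return the ones that survive."""
    inv = {}
    lo, hi = min(samples), max(samples)
    if lo == hi:
        inv["constant"] = lo            # x = a
    inv["range"] = (lo, hi)             # a <= x <= b
    if all(x != 0 for x in samples):
        inv["nonzero"] = True           # x != 0
    for b in range(2, 8):               # x = a (mod b) for small moduli
        residues = {x % b for x in samples}
        if len(residues) == 1:
            inv.setdefault("modulus", []).append((next(iter(residues)), b))
    return inv

print(single_var_invariants([4, 10, 16, 22]))
# range (4, 22), nonzero, and x = 0 (mod 2), x = 1 (mod 3), x = 4 (mod 6)
```

As in Daikon, every candidate is assumed to hold until a sample falsifies it; what is printed is simply whatever survived the whole trace.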
51
Types of Invariants
Invariants over two numeric variables x, y:
Linear relationship: y = ax + b
Ordering comparison: x < y, x > y, x ≤ y, x ≥ y, x = y, x ≠ y
Functions: y = fn(x) or x = fn(y), where fn is absolute value, negation, or bitwise complement
Invariants over x+y: invariants over a single numeric variable, with x+y substituted for the variable
Invariants over x-y
52
Types of Invariants
Invariants over three numeric variables:
Linear relationship: z = ax + by + c, y = ax + bz + c, x = ay + bz + c
Functions: z = fn(x, y), where fn is min, max, multiplication, and, or, greatest common divisor, comparison, exponentiation, floating-point rounding, division, modulus, or left and right shifts
All permutations of x, y, z are tested (three permutations for symmetric functions, six for asymmetric functions)
53
Types of Invariants
Invariants over a single sequence variable:
Range: minimum and maximum sequence values, ordered lexicographically
Element ordering: nondecreasing, nonincreasing, equal
Invariants over all sequence elements: such as each value in an array being nonnegative
54
Types of Invariants
Invariants over two sequence variables x, y:
Linear relationship: y = ax + b, elementwise
Comparison: x < y, x > y, x ≤ y, x ≥ y, x = y, x ≠ y, performed lexicographically
Subsequence relationship: x is a subsequence of y
Reversal: x is the reverse of y
Invariants over a sequence x and a numeric variable y:
Membership: y ∈ x
55
Derived Variables
Variables not appearing in the source text:
array: length, sum, min, max
array and scalar: element at index, subarray
number of calls to a procedure
Enable inference of more complex relationships.
Derivation and invariant inference are staged, to:
avoid deriving meaningless values
avoid computing tautological invariants
56
Invariant Confidence
To make the tool useful, invariants must be supported by a statistically significant number of different values.
Daikon checks the likelihood that an invariant would occur by chance.
Invariants are filtered based on a minimum confidence parameter.
57
Invariant Confidence – show x ≠ 0
x is in a range of size r.
The probability that a random x is not 0 is 1 - 1/r.
Given s samples, the probability that x is never 0 purely by chance is (1 - 1/r)^s.
If this probability is less than a user-defined confidence level, then x ≠ 0 is reported as an invariant.
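The test on this slide can be computed in a few lines. A sketch, with `limit` standing in for the user-defined confidence parameter:

```python
def report_nonzero(r, s, limit=0.01):
    """Decide whether x != 0 should be reported, per the slide:
    r = size of x's observed range, s = number of samples.
    The probability that s random values are all nonzero purely by
    chance is (1 - 1/r)**s; report only if it falls below `limit`."""
    return (1 - 1 / r) ** s < limit

print(report_nonzero(100, 10))    # False: too few samples
print(report_nonzero(100, 500))   # True: coincidence is very unlikely
```

With r = 100 and limit 0.01, roughly 459 samples are needed before x ≠ 0 becomes statistically justified; this is why Daikon withholds invariants that are supported by only a handful of values.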
58
Efficiency
Efficiency of instrumentation:
Values of tracked variables are output at each instrumentation point
Significant program slowdown; large amounts of trace data produced
Efficiency of analysis:
Potentially cubic in the number of variables at any program point
Influenced more strongly by the size of the trace data
59
Limitations
The instrumentation requires large amounts of disk space.
A large test suite is needed.
Human intervention is needed.
60
Future Work
Combining this dynamic invariant detection with a static one.
Extending the types of invariants.
Increasing relevance.
61
Automatic Extraction of Object-Oriented Component Interfaces
Monica S. Lam, John Whaley, Michael C. Martin
ISSTA 2002
Stanford University
ACM SIGSOFT Distinguished Paper Award, 2002
62
Automatic Extraction of Object-Oriented Component Interfaces
J. Whaley, M. C. Martin, M. S. Lam
Documentation:
Based on the actual code, so no divergence
Rules for static or dynamic checkers:
Find errors in API usage
Find API bugs: discrepancy between code & intended API
Dynamic extraction:
Evaluation of test coverage
63
Interfaces?
Interfaces are constraints on the orderings of method calls.
Examples:
Method m1 can be called only after a call to method m2.
Both methods m1 and m2 have to be called before method m3 is called.
64
Specification
Use a Finite State Machine (FSM) to express ordering constraints:
States correspond to methods
Transitions imply the ordering constraints
M1 → M2: method M2 can be called after method M1 is called
65
Example: File
START → open → read / write → close → END
66
A Simple OO Component Model
Each object follows an FSM model.
One state per method, plus START & END states.
A method call causes a transition to a new state.
m1 → m2: the sequence m1 ; m2 is legal, and the new state is m2.
67
Problem 1
An object has two fields, a and b.
Each field must be set before being read.
A single FSM with one state per method (set_a, get_a, set_b, get_b) cannot express this: the current state records only the last method called, so it cannot track a and b independently (after set_a ; set_b, for example, the model has forgotten that a was set).
69
Solution: Splitting by Fields
Separate the methods by the fields they access into different, independent submodels:
Submodel for a: START → set_a → get_a → END
Submodel for b: START → set_b → get_b → END
70
Problem 2
getFileDescriptor is state-preserving.
Model for Socket: START → create → connect → close → END, with getFileDescriptor callable along the way.
In the naive model, calling getFileDescriptor moves the object into the getFileDescriptor state, losing the information that connect was already called.
72
Solution: State-Preserving Methods
State-preserving methods (such as getFileDescriptor) do not change the model state.
m1 is state-modifying, m2 is state-preserving: m1 ; m2 is legal, and the new state is m1.
73
Extraction Techniques

            Static                           Dynamic
Scope       For all possible program         For one particular program
            executions                       execution
Precision   Conservative                     Exact (for that execution)
Analyzes    The implementation               Component usage
Detects     Illegal transitions              Legal transitions
Result      Superset of ideal model          Subset of ideal model
            (upper bound)                    (lower bound)
74
Extracting Interfaces Statically
The static algorithm has two main steps:
1. For each method m, identify the fields and predicates that guard whether exceptions can be thrown.
2. Find the methods m’ that set those fields to values that can cause the exception. Immediate transitions from m’ to m are therefore illegal.
The complement of the illegal transitions forms the model of transitions accepted by the static analysis.
75
Detecting Illegal Transitions
Only simple predicates are supported: comparisons with constants, implicit null-pointer checks.
Find <source, target> pairs such that:
Source must execute:  field = const;
Target must execute:  if (field == const) throw exception;
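The pair search can be sketched as a join over two per-method summaries. The summaries below are written by hand for illustration (in the real analysis they are extracted from each method’s code), and the dictionary shapes are assumptions of this sketch.

```python
def illegal_transitions(assigns, guards):
    """assigns: method -> set of (field, value) pairs it stores.
    guards:  method -> set of (field, value) pairs it throws on.
    A <source, target> pair is illegal when source leaves a field
    holding exactly the value that makes target throw."""
    return {(src, tgt)
            for src, stored in assigns.items()
            for tgt, guarded in guards.items()
            if stored & guarded}

# Hand-written summaries for a Socket-like class:
# close() sets connection = null; read() throws if connection == null.
assigns = {"close":   {("connection", None)},
           "connect": {("connection", "socket-object")}}
guards  = {"read":    {("connection", None)}}
print(illegal_transitions(assigns, guards))   # {('close', 'read')}
```

Everything not in this illegal set is accepted, which is exactly how the complement described above becomes the statically extracted model.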
76
Static Model Extractor
Defensive programming: the implementation throws exceptions (user- or system-defined) on illegal input.

public void connect() {
    connection = new Socket();
}
public void read() {
    if (connection == null)
        throw new IOException();
}

Extracted constraint: START → connect → read
77
Dynamic Extractor
Goal: find the legal transitions that occur during an execution of the program.
Java bytecode instrumentation.
For each thread and each instance of a class: track the last state-modifying method for each submodel.
The same mechanism is used for dynamic checking: instead of adding to the model, flag an exception.
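The extractor’s per-object bookkeeping can be sketched as follows. Method names echo the earlier Socket example, and the `state_preserving` set is assumed to be known from the submodel construction; the real tool does this inside instrumented Java bytecode, per thread and per instance.

```python
def extract_model(trace, state_preserving=frozenset()):
    """trace: sequence of (object_id, method) call events.
    Returns the set of observed legal transitions (previous, called)."""
    last = {}                      # object_id -> last state-modifying method
    transitions = set()
    for obj, method in trace:
        prev = last.get(obj, "START")
        transitions.add((prev, method))
        if method not in state_preserving:
            last[obj] = method     # state-modifying calls advance the FSM
    return transitions

trace = [(1, "connect"), (1, "getFileDescriptor"), (1, "read"), (1, "close")]
model = extract_model(trace, state_preserving={"getFileDescriptor"})
# getFileDescriptor does not erase the connect state, so the model
# records ('connect', 'read') rather than ('getFileDescriptor', 'read').
```

Dynamic checking reuses the same loop: instead of adding an unseen (prev, method) pair to the model, it would flag the pair as a violation.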
78
Limitations
The model is too simple – it keeps only one state of history.
79
Future Work
Interfaces between classes, e.g.:
START → ServerSocket.accept() → Socket.getOutputStream() → Socket.close() → END
80
Comparison

                   Delta Debugging      Tarantula            Daikon            Interface Extraction
Main use           Isolating the cause  Visualization of     Extract           Extract
                   of a failing run     fault localization   invariants        documentation
Test cases         1 pass case,         Many pass/fail       Many pass         Many pass
                   1 fail case          cases                cases             cases
Examines           Program state        Source code          Program state     Program state and
                                                                               source code
Human involvement  Low                  High                 Medium            Medium
81
Comparison

                   Delta Debugging      Tarantula       Daikon       Interface Extraction
Efficiency         Medium               High            Low          High
Detailed result    Medium – High        Low             High         Medium
Tool availability  Available (Java      Not available   Available    Not available
                   support coming soon)
82
Summary
Programmers aren’t going to become obsolete in the near future.
Automatic tools can guide humans in the debugging process.
83
References
Isolating Cause-Effect Chains from Computer Programs:
Presentation of Andreas Zeller
Presentation of Jinlin Yang
Visualization of Test Information to Assist Fault Localization:
Paper presentation
Presentation of Jinlin Yang
Tarantula homepage
84
References
Dynamically Discovering Likely Program Invariants to Support Program Evolution:
Daikon homepage
Presentation of Tevfik Bultan
Presentation of Marcelo D’Amorim
Presentation of Joel Winstead
Talk slides
Presentation of David Hovemeyer
85
References
Automatic Extraction of Object-Oriented Component Interfaces:
Presentation of John Whaley
Presentation of Tevfik Bultan