star: stack trace based automatic crash reproduction


Upload: sung-kim

Posted on 05-Dec-2014


DESCRIPTION

Ning's PhD Thesis Defense

TRANSCRIPT

Page 1: STAR: Stack Trace based Automatic Crash Reproduction

STAR: STACK TRACE BASED AUTOMATIC CRASH REPRODUCTION

PhD Thesis Defence


November 05, 2013

Ning Chen

Advisor: Sunghun Kim

Outline

1. Motivation & Related Work

2. Approaches of STAR
   1) Crash Precondition Computation
   2) Input Model Generation
   3) Test Input Generation

3. Evaluation Study

4. Challenges & Future Work

5. Contributions

Motivation

Failure reproduction is a difficult and time-consuming task, but it is necessary for fixing the corresponding bug.

For example, https://issues.apache.org/jira/browse/COLLECTIONS-70 remained unfixed for five months because the bug was difficult to reproduce. After a test case was submitted, it was fixed soon afterwards, with the comment:

“As always, a good test case makes all the difference.”

Problem Statement

The intention of this research is to propose a stack trace based automatic crash reproduction framework that is efficient and applicable to real-world object-oriented programs.

Sub-problem 1: Propose an efficient crash precondition computation approach that is applicable to non-trivial real-world programs.

Sub-problem 2: Propose a novel method sequence composition approach that can generate crash reproducible test cases for object-oriented programs.

Contributions

Study the scalability challenge of automatic crash reproduction, and propose approaches to improve its efficiency.

Study the object creation challenge in reproducing crashes of object-oriented programs, and propose a novel method sequence composition approach to address it.

A novel framework, STAR, which combines the proposed approaches to achieve automatic crash reproduction using only the crash stack trace.

A detailed empirical evaluation investigating the usefulness of STAR.


Related Work

Record-and-replay approaches: Jrapture (2000), BugNet (2005), ReCrash/ReCrashJ (2008), LEAP/LEAN (2010)

Post-failure-process approaches: Microsoft PSE (2004), IBM SnuggleBug (2009), XyLem (2009), ESD (2010), BugRedux (2012)

Record-and-replay Approaches

Approach:

Monitoring phase: captures and stores runtime heap and stack objects.

Test generation phase: generates tests that load the stored objects as the parameters of the crashed methods.

Diagram: Original Program Execution -> (store from heap & stack) -> Stored Objects -> (load as crashed method params) -> Recreated Test Case

Record-and-replay Approaches

Framework | Instrumentation | Data Collection | Memory Overhead | Performance Overhead
Jrapture’00 | Required | All interactions | N/A | N/A
BugNet’05 | Required / Hardware | All inputs / Executed code | N/A | N/A
ReCrash’08 | Required | Stack objects | 7% - 90% | 31% - 60%
LEAP’10 | Required | SPE access / Thread info | N/A | 7% - 600%

Limitations:

They require up-front instrumentation or special hardware deployment.

They collect client-side data, which may raise privacy concerns [Clause et al., 2010].

They incur non-trivial memory and runtime overheads.

Post-failure-process Approaches

These approaches analyze crashes only after they have occurred.

Advantages:

They usually do not record runtime data.

They incur no or very little performance overhead.

Crash Explanation Approaches (Post-failure-process)

Microsoft PSE [Manevich et al., 2004]

IBM SnuggleBug [Chandra et al., 2009]

XyLem [Nanda et al., 2009]

These approaches assist crash debugging by providing hints on the target crashes, such as potential crash traces and potential crash conditions.

However, they cannot reproduce the target crashes.

Crash Reproduction Approaches (Post-failure-process)

Core dump-based approaches: Cdd [Leitner et al., 2009], RECORE [Rößler et al., 2013]

Symbolic execution-based approaches: ESD [Zamfir et al., 2009], BugRedux [Jin et al., 2012]

These approaches aim to reproduce crashes using only post-failure data, such as crash stack traces or the memory core dump at the time of the crash.

Crash Reproduction Approaches: Core Dump-based

E.g., Cdd [Leitner et al., 2009] and RECORE [Rößler et al., 2013]

These approaches leverage the memory core dump, and even some developer-written contracts, to guide the crash reproduction process.

Advantage: a higher chance of reproducing a crash, as more data is provided.

Limitations: they require not just the stack trace but the entire memory core dump at the time of the crash, and they are less applicable in practice because memory core dumps are often unavailable.

Crash Reproduction Approaches: Symbolic Execution-based

E.g., ESD [Zamfir et al., 2009] and BugRedux [Jin et al., 2012]

These approaches perform symbolic execution-based analysis to identify crash paths and generate crash reproducible test cases.

Crash Reproduction Approaches

Advantages: they use only the crash stack trace to achieve crash reproduction, and no runtime overhead is incurred on the client side.

Limitations:

Existing approaches rely on forward symbolic execution to compute crash preconditions, which is less efficient.

They cannot be fully optimized due to the nature of forward symbolic execution.

They cannot reproduce non-trivial crashes from object-oriented programs due to the object creation challenge.

Crash Reproduction Approaches: STAR (Stack Trace based Automatic crash Reproduction)

Advantages:

Approach | Limitation | Advantage of STAR
Record-replay | Data collection | No runtime data collection
Record-replay | Performance overhead | No performance overhead
Core dump-based | Requires memory core dump and developer-written contracts | Requires only the crash stack trace
Symbolic exec.-based | Lack of optimizations | Optimizations that greatly improve the crash reproduction process
Symbolic exec.-based | Lack of support for object-oriented programs | Capable of reproducing non-trivial crashes for object-oriented programs

Overview of STAR

Figure: STAR takes the program and the crash stack trace as input and runs three steps: (1) Crash Precondition Computation produces crash preconditions; (2) Input Model Generation turns them into crash models; (3) Test Input Generation produces crash reproducible test cases.

Crash Precondition Computation

Crash Precondition

A crash precondition is the condition on the inputs at a method entry that can trigger the crash.

It specifies in what memory state the crash can be reproduced.

Crash Precondition Computation

Existing approaches such as ESD and BugRedux use forward symbolic execution to compute the crash preconditions:

The program is executed in the same direction as normal executions.

Inputs and variables are represented as symbolic values instead of concrete values.

Limitations of forward symbolic execution:

Not demand-driven: it needs to execute many paths unrelated to the crash.

Limited optimization: it is difficult to perform optimizations using the crash information.

Crash Precondition Computation

STAR performs a backward symbolic execution to compute the crash precondition: the program is executed from the crash location back to the method entry.

Advantages of backward symbolic execution:

Demand-driven: only paths related to the crash are executed.

Optimizations: optimizations can be performed using the crash information.

Backward Symbolic Execution

Given a program P, a crash location L, and the crash condition C at L, we execute P backward from L to a method entry, with C as the initial crash precondition.

The precondition is updated along the execution path according to the executed statements. For example:

int var3 = var1 + var2;
-> all occurrences of var3 are replaced by var1 + var2

if (var1 != null)
-> coming from the true branch: var1 != null is added to the precondition
-> coming from the false branch: var1 == null is added to the precondition

The preconditions at method entries are saved as the final crash preconditions.
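The two update rules above can be sketched in Java. This is a minimal illustration, not STAR's implementation: the expression classes, the operator-flip table, and the made-up crash condition var3 > 10 are assumptions, and the branch example uses var1 > 0 instead of var1 != null so that every variable stays an int.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: updating a crash precondition (a conjunction of constraints)
// while stepping backward over statements. Not STAR's actual implementation.
public class BackwardPrecondition {
    interface Expr {}

    // A variable (e.g. var1) or a literal (e.g. 10, null).
    record Atom(String text) implements Expr {
        @Override public String toString() { return text; }
    }

    // A binary expression (e.g. var1 + var2, var3 > 10).
    record Bin(String op, Expr left, Expr right) implements Expr {
        @Override public String toString() { return "(" + left + " " + op + " " + right + ")"; }
    }

    // Operator flips used to negate a relational branch condition.
    static final Map<String, String> NEGATED = Map.of(
            "==", "!=", "!=", "==", "<", ">=", ">=", "<", ">", "<=", "<=", ">");

    // Replace every occurrence of variable `name` in `e` by `value`.
    static Expr substitute(Expr e, String name, Expr value) {
        if (e instanceof Atom a) {
            return a.text().equals(name) ? value : a;
        }
        Bin b = (Bin) e;
        return new Bin(b.op(), substitute(b.left(), name, value), substitute(b.right(), name, value));
    }

    // Backward step over `name = value;`: rewrite every conjunct.
    static List<Expr> overAssignment(List<Expr> pre, String name, Expr value) {
        List<Expr> out = new ArrayList<>();
        for (Expr c : pre) {
            out.add(substitute(c, name, value));
        }
        return out;
    }

    // Backward step over `if (cond)`: add cond when arriving from the true branch,
    // otherwise add its negation.
    static List<Expr> overBranch(List<Expr> pre, Bin cond, boolean fromTrueBranch) {
        List<Expr> out = new ArrayList<>(pre);
        out.add(fromTrueBranch ? cond : new Bin(NEGATED.get(cond.op()), cond.left(), cond.right()));
        return out;
    }

    // Made-up crash condition var3 > 10, then backward over `int var3 = var1 + var2;`
    // and over `if (var1 > 0)`, reached through its false branch.
    static String demo() {
        List<Expr> pre = List.of(new Bin(">", new Atom("var3"), new Atom("10")));
        pre = overAssignment(pre, "var3", new Bin("+", new Atom("var1"), new Atom("var2")));
        pre = overBranch(pre, new Bin(">", new Atom("var1"), new Atom("0")), false);
        return pre.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [((var1 + var2) > 10), (var1 <= 0)]
    }
}
```

The precondition is kept as a list of conjuncts, so an assignment rewrites each conjunct in place and a branch only appends one more.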

Backward Symbolic Execution: Example

Code, from method entry to the crash:

int i = this.last;
if (i < buffer.length)
    buffer[i] = 0;   // AIOBE (ArrayIndexOutOfBoundsException)

Precondition after each backward step:

At buffer[i] = 0 (crash): {buffer != null} {i < 0 or i >= buffer.length}

At if (i < buffer.length), coming from the true (T) branch: {buffer != null} {i < 0 or i >= buffer.length} {i < buffer.length}

At int i = this.last, the method entry (precondition TRUE before the crash condition is applied): {buffer != null} {last < 0 or last >= buffer.length} {last < buffer.length}
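Since the precondition at the method entry requires last to be out of bounds and also below buffer.length, it reduces to last < 0. The hypothetical Java reconstruction below (the class name BufferExample and its constructor are assumptions; only the three statements come from the slide) checks that this precondition holds exactly when the method throws an AIOBE, over a small range of inputs.

```java
// Hypothetical reconstruction of the slide's example method.
public class BufferExample {
    int[] buffer;
    int last;

    BufferExample(int length, int last) {
        this.buffer = new int[length];
        this.last = last;
    }

    void write() {
        int i = this.last;
        if (i < buffer.length) {
            buffer[i] = 0; // crash location: ArrayIndexOutOfBoundsException
        }
    }

    // Crash precondition at the method entry, as computed on the slide.
    static boolean precondition(int[] buffer, int last) {
        return buffer != null
                && (last < 0 || last >= buffer.length)
                && last < buffer.length;
    }

    // Runs write() and reports whether it crashed with an AIOBE.
    static boolean crashes(int length, int last) {
        try {
            new BufferExample(length, last).write();
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Compare the precondition with the actual behavior on small inputs.
        for (int len = 0; len <= 4; len++) {
            for (int last = -3; last <= 6; last++) {
                if (precondition(new int[len], last) != crashes(len, last)) {
                    throw new AssertionError("mismatch at len=" + len + ", last=" + last);
                }
            }
        }
        System.out.println("precondition matches the crash on all sampled inputs");
    }
}
```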

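Challenge: Path Explosion

Running example (control-flow diagram): the diagram contains buffer = new int[16]; a branch on isDebugging() (T: debugLog(…), F: print(…)); a branch on index >= buffer.length (T: i = 0, F: i = index); and the crash location buffer[i] = 0, which throws an AIOBE.

Each branch multiplies the number of paths that symbolic execution may explore, even when the branch is irrelevant to the crash.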
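A hypothetical Java reconstruction of this running example follows. The statement order, the boolean debugging parameter standing in for isDebugging(), and the StringBuilder standing in for debugLog(…) and print(…) are assumptions. Running it shows that the crash occurs exactly when index < 0, whichever logging branch is taken, which is why exploring both logging branches is wasted work.

```java
// Hypothetical reconstruction of the running example (statement order assumed).
public class RunningExample {
    static final StringBuilder LOG = new StringBuilder();

    static void run(int index, boolean debugging) {
        int[] buffer = new int[16];
        if (debugging) {                                            // isDebugging()
            LOG.append("debug: index=").append(index).append('\n'); // debugLog(...)
        } else {
            LOG.append("index=").append(index).append('\n');        // print(...)
        }
        int i;
        if (index >= buffer.length) {
            i = 0;
        } else {
            i = index;
        }
        buffer[i] = 0; // crash location: ArrayIndexOutOfBoundsException
    }

    static boolean crashes(int index, boolean debugging) {
        try {
            run(index, debugging);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The logging branch never changes the outcome: the crash happens iff index < 0.
        for (int index = -20; index <= 40; index++) {
            for (boolean debugging : new boolean[] {true, false}) {
                if (crashes(index, debugging) != (index < 0)) {
                    throw new AssertionError("unexpected outcome for index=" + index);
                }
            }
        }
        System.out.println("crash iff index < 0, independent of the logging branch");
    }
}
```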

Optimizations

STAR introduces three approaches to improve the crash precondition computation process:

Static path reduction

Heuristic backtracking

Early detection of inner contradictions

Static Path Reduction

Observation: only a subset of the conditional branches and method calls contribute to the target crash. For example:

Methods that perform runtime logging can be safely skipped.

Branches that do not modify the crash-related variables can be safely skipped.

Optimization: during symbolic execution, STAR detects and skips branches or method calls that do not contribute to the target crash.

Static Path Reduction: Example

On the running example:

Method isDebugging() does not contribute to the crash.

The conditional branch on isDebugging() (debugLog(…) vs. print(…)) does not contribute to the crash either.

STAR can therefore detect and skip over the methods and branches that do not contribute to the crash.

Static Path Reduction

A conditional branch or a method call contributes to the crash if:

It can modify any stack location referenced in the current crash precondition formula, or

It can modify any heap location referenced in the current crash precondition formula.

However, in backward execution, the actual heap locations may not be decidable until they are explicitly defined.

Static Path Reduction

For any reference whose heap location cannot be decided:

Check whether the modified heap location and the reference have compatible data types.

Check whether the modified heap location and the reference have the same field name (except for arrays).

If both criteria are satisfied, the heap locations are considered the same.

Rationale: in Java, the same heap location can only be accessed through the same field name, except for array elements.
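The contributiveness test and the heap-matching rule above can be sketched as follows. This is a simplified illustration under stated assumptions: types and fields are plain strings, subtyping is a hand-written list of related pairs (RingBuffer extending Buffer is hypothetical), and a null field name marks an array element access.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the "is this branch/call contributive?" test used by
// static path reduction. Types and fields are plain strings for illustration.
public class PathReduction {
    // A heap access: declared type of the accessed object plus the field name,
    // or field == null for an array element access.
    record HeapRef(String type, String field) {}

    // Stand-in for a type-compatibility check: equal types, or a pair listed in
    // `related` (a real analysis would consult the class hierarchy).
    static boolean compatible(String a, String b, Set<List<String>> related) {
        return a.equals(b) || related.contains(List.of(a, b)) || related.contains(List.of(b, a));
    }

    // Undecided references are treated as the same location if their types are
    // compatible and they use the same field name; array elements match on type alone.
    static boolean sameLocation(HeapRef written, HeapRef referenced, Set<List<String>> related) {
        if (!compatible(written.type(), referenced.type(), related)) {
            return false;
        }
        if (written.field() == null || referenced.field() == null) {
            return written.field() == null && referenced.field() == null;
        }
        return written.field().equals(referenced.field());
    }

    // Contributive if it may write a stack variable or a heap location that the
    // current precondition references; otherwise it can be skipped.
    static boolean contributive(Set<String> writtenLocals, List<HeapRef> writtenHeap,
                                Set<String> preLocals, List<HeapRef> preHeap,
                                Set<List<String>> related) {
        for (String v : writtenLocals) {
            if (preLocals.contains(v)) {
                return true;
            }
        }
        for (HeapRef w : writtenHeap) {
            for (HeapRef r : preHeap) {
                if (sameLocation(w, r, related)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Hypothetical precondition that reads local i and the heap field Buffer.last.
    static final Set<String> PRE_LOCALS = Set.of("i");
    static final List<HeapRef> PRE_HEAP = List.of(new HeapRef("Buffer", "last"));
    // Hypothetical class hierarchy: RingBuffer is a subtype of Buffer.
    static final Set<List<String>> RELATED = Set.of(List.of("RingBuffer", "Buffer"));

    // Does a call that writes only `written` (and no locals) contribute to the crash?
    static boolean callContributes(HeapRef written) {
        return contributive(Set.of(), List.of(written), PRE_LOCALS, PRE_HEAP, RELATED);
    }

    public static void main(String[] args) {
        System.out.println(callContributes(new HeapRef("Logger", "count")));    // false: skip
        System.out.println(callContributes(new HeapRef("RingBuffer", "last"))); // true: keep
    }
}
```

A call writing Logger.count is skipped, while a call writing RingBuffer.last is kept because its type is compatible with Buffer and the field name matches.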

Heuristic Backtracking

Observation: backtracking the execution to the most recent branching point is likely inefficient, as the contradictions are usually introduced much earlier.

Optimization: STAR can efficiently backtrack to the most relevant branches, where the contradictions may still be avoided.

Heuristic Backtracking: Example

On the running example:

An executed path is found unsatisfiable by the SMT solver.

Typical backtracking (to the most recent branching point) is not efficient.

STAR can quickly backtrack to the most relevant branch; in the diagram, this is the branch on index >= buffer.length, which is retried through i = index.

Heuristic Backtracking

Unsatisfiable core of the last unsatisfied path conditions: a subset of the path conditions that is still unsatisfiable by itself.

A branching point is considered relevant to the last unsatisfiable result, and is backtracked to, only if:

A condition in the unsatisfiable core was added in this branch, or

The concrete value of a variable in the unsatisfiable core was decided in this branch, or

The actual heap location of a variable in the unsatisfiable core was decided in this branch.
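The relevance test above can be sketched and applied to the running example. The string encoding of conditions and the contents of the unsatisfiable core are assumptions made for illustration; a real implementation would obtain the core from the SMT solver.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: choosing where to backtrack after the solver reports an
// unsatisfiable path. Conditions and variables are plain strings for illustration.
public class HeuristicBacktracking {
    // What a branching point on the current path decided.
    record BranchPoint(String label, Set<String> addedConditions,
                       Set<String> concretizedVars, Set<String> heapResolvedVars) {}

    // Relevant if it added a condition in the unsat core, or decided the concrete
    // value or the heap location of a variable mentioned in the core.
    static boolean relevant(BranchPoint b, Set<String> coreConditions, Set<String> coreVars) {
        return !Collections.disjoint(b.addedConditions(), coreConditions)
                || !Collections.disjoint(b.concretizedVars(), coreVars)
                || !Collections.disjoint(b.heapResolvedVars(), coreVars);
    }

    // Scan from the most recent decision to the oldest and return the first relevant
    // one; chronological backtracking would simply return the most recent decision.
    static Optional<BranchPoint> target(List<BranchPoint> decisions,
                                        Set<String> coreConditions, Set<String> coreVars) {
        for (int k = decisions.size() - 1; k >= 0; k--) {
            BranchPoint b = decisions.get(k);
            if (relevant(b, coreConditions, coreVars)) {
                return Optional.of(b);
            }
        }
        return Optional.empty();
    }

    // Running example, decisions in the order the backward execution made them.
    static String demo() {
        List<BranchPoint> decisions = List.of(
                // took i = 0, the true side of index >= buffer.length
                new BranchPoint("index >= buffer.length", Set.of("index >= buffer.length"),
                        Set.of("i"), Set.of()),
                // took debugLog(...), the true side of isDebugging()
                new BranchPoint("isDebugging()", Set.of("isDebugging()"), Set.of(), Set.of()));
        // Assumed unsat core: the crash condition with i = 0 and buffer.length = 16.
        Set<String> coreConditions = Set.of("i < 0 || i >= buffer.length", "buffer.length == 16");
        Set<String> coreVars = Set.of("i", "buffer");
        return target(decisions, coreConditions, coreVars).map(BranchPoint::label).orElse("none");
    }

    public static void main(String[] args) {
        System.out.println("backtrack to: " + demo()); // backtrack to: index >= buffer.length
    }
}
```

Chronological backtracking would revisit the isDebugging() branch first, which cannot remove the contradiction; the relevance test instead returns the branch that fixed i = 0.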

Inner Contradiction Detection

STAR quickly discovers inner contradictions in the current precondition during execution.

Example (running example, path through i = index): crash precondition {index < 0 or index >= 16} {index < 16}
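One way to picture inner contradiction detection is to test the partial precondition for satisfiability after every backward step, instead of waiting until a full path reaches the method entry. In the sketch below, a bounded brute-force search over index stands in for the SMT solver, and the two constraint sets are a hypothetical encoding of the running example's two paths after stepping over buffer = new int[16].

```java
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical sketch: check the partial precondition at every backward step.
// A bounded brute-force search over `index` stands in for the SMT solver.
public class InnerContradiction {
    static boolean satisfiable(List<IntPredicate> conjuncts, int lo, int hi) {
        for (int index = lo; index <= hi; index++) {
            boolean all = true;
            for (IntPredicate c : conjuncts) {
                if (!c.test(index)) {
                    all = false;
                    break;
                }
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    // Path through i = 0: {0 < 0 || 0 >= 16} {index >= 16}; the first conjunct is false.
    static final List<IntPredicate> VIA_I_ZERO = List.of(
            index -> 0 < 0 || 0 >= 16,
            index -> index >= 16);

    // Path through i = index: {index < 0 || index >= 16} {index < 16}; index < 0 works.
    static final List<IntPredicate> VIA_I_INDEX = List.of(
            index -> index < 0 || index >= 16,
            index -> index < 16);

    public static void main(String[] args) {
        // Abandon the i = 0 path as soon as it is contradictory; keep exploring the other.
        System.out.println("i = 0 path feasible: " + satisfiable(VIA_I_ZERO, -100, 100));
        System.out.println("i = index path feasible: " + satisfiable(VIA_I_INDEX, -100, 100));
    }
}
```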

Other Details

Loops and recursive calls: options for the maximum loop unrolling and the maximum recursive call depth.

Call graph construction: users can specify the pointer analysis algorithm to use, with an option for the maximum number of call targets.

String operations: strings are treated as arrays of characters. Complex string operations and regular expressions are not supported, as they require more specialized constraint solvers such as Z3-str or HAMPI.

Input Model Generation

Input Model Generation

After computing the crash precondition, we need to compute a model (an object state) that satisfies this precondition.

However, many models can satisfy one precondition. For example, infinitely many models satisfy the precondition {ArrayList.size != 0}.

Generating Feasible Input Models

Object creation challenge [Xiao et al., 2011]: not every model satisfying a precondition is feasible to generate.

For the precondition ArrayList.size != 0, the input model ArrayList.size == -1 satisfies it, but such an object can never be generated.

Therefore, we want input models whose objects are actually feasible to generate.

Generating Practical Input Models

For different input models, the difficulty of generating the corresponding objects can vary greatly:

Model 1: ArrayList.size == 100 requires calling add() 100 times.

Model 2: ArrayList.size == 1 requires calling add() once.

Therefore, we also want input models whose values are as close to the initial values as possible.

Class Information

STAR's input model generation approach generates models that are both feasible and practical. It extracts class semantic information and uses it to guide the input model generation process:

The initial value of each class member field.

The potential value range of each numerical field, e.g., ArrayList.size >= 0.

Input Model Generation: Example

Crash precondition: ArrayList.size != 0

Class information: value range ArrayList.size >= 0; initial value ArrayList.size starts from 0

The SMT solver combines these into ArrayList.size == 1, a feasible and practical model.
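The choice of a practical model can be approximated by searching outward from the field's initial value and accepting the first value that lies inside the value range and satisfies the precondition. This is a simplified, single-field stand-in for the SMT solver query in the example; the method name closestModel and its search bound are assumptions.

```java
import java.util.function.IntPredicate;

// Hypothetical single-field stand-in for the solver query: the feasible value that is
// closest to the field's initial value.
public class InputModel {
    static int closestModel(int initial, IntPredicate validRange, IntPredicate precondition,
                            int maxDistance) {
        for (int d = 0; d <= maxDistance; d++) {
            // Try initial + d before initial - d.
            int[] candidates = {initial + d, initial - d};
            for (int candidate : candidates) {
                if (validRange.test(candidate) && precondition.test(candidate)) {
                    return candidate;
                }
            }
        }
        throw new IllegalStateException("no feasible model within distance " + maxDistance);
    }

    public static void main(String[] args) {
        // ArrayList.size: initial value 0, value range size >= 0, precondition size != 0.
        System.out.println(closestModel(0, s -> s >= 0, s -> s != 0, 100)); // 1
        // Without the value range, an infeasible model is picked for size < 0 || size > 2.
        System.out.println(closestModel(0, s -> true, s -> s < 0 || s > 2, 100)); // -1
        System.out.println(closestModel(0, s -> s >= 0, s -> s < 0 || s > 2, 100)); // 3
    }
}
```

The second call shows why the value range matters: without it, the closest satisfying value is the infeasible size -1.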

Test Input Generation

Test Input Generation

Given a crash model, we need to generate test inputs that satisfy it.

However, generating object test inputs can be challenging [Xiao et al., 2011]:

Non-public fields are not assignable.

Class invariants are easily broken if objects are generated using reflection.

What is needed is a legitimate method sequence that can create and mutate an object to satisfy the target model (target object state).

Test Input Generation

Randomized techniques: Randoop [Pacheco et al., 2007]

Dynamic analysis: Palulu [Artzi et al., 2009], Palus [Zhang et al., 2011]

Codebase mining: MSeqGen [Thummalapenta et al., 2009]

These techniques are not efficient, as their input generation processes are not demand-driven, and they may rely on existing code bases.

Test Input Generation

STAR proposes a novel demand-driven test input generation approach.

Summary Extraction

STAR uses forward symbolic execution to obtain the summary of each method.

Summary Extraction

The summary of a method is the collection of the summaries of its individual paths.

The summary of a method path is a pair (path condition, path effect), where:

Path condition: a conjunction of constraints over the method inputs (heap locations read by the method).

Path effect: the postcondition of the path, a conjunction of constraints over the method outputs (heap locations written by the method). Essentially, it is the final effect of this method path.

Summary Extraction: Example

We perform a forward symbolic execution of the target method. Its control flow: from the method entry, if obj != null, then list[size] = obj and size += 1; otherwise e = new Exception() and throw e; then the method exits.

Path 1: path condition obj != null; path effect list[size] = obj, size += 1

Path 2: path condition obj == null; path effect throw new Exception

Method Sequence Deduction

STAR introduces a deductive-style approach to construct method sequences that can achieve the target object state.

Method Sequence Deduction

Given a target object state $\Phi_{target}$ and the path summaries of each method, the approach finds, by recursive deduction, a method sequence that produces an object satisfying $\Phi_{target}$.

Figure: the deductive engine selects a candidate method and one of its paths, and the constraint solver checks whether $\Phi_{path} \wedge \Phi_{target}$ is satisfiable. If it is, taking this path can achieve the target object state, and the solution gives the object states required of the input parameters, which are then deduced recursively.

Example

public class Container {
    public Container();
    public void add(Object obj);
    public void remove(Object obj);
    public void clear();
}

Desired object state (input model): Container.size == 10

Example: Summary Extraction

Container(): Path 1: condition TRUE; effect size = 0

clear(): Path 1: condition TRUE; effect remove all in list, size = 0

add(obj): Path 1: condition obj != null; effect list[size] = obj, size += 1. Path 2: condition obj == null; effect throw an exception

remove(obj): Path 1: condition obj in list; effect remove from list, size -= 1. Path 2: condition obj not in list; no effect
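For experimentation, the hypothetical Container below has paths that match these summaries. Only the four method signatures come from the slides; the fields, the size() accessor, and the use of IllegalArgumentException (unchecked, so add needs no throws clause) for the obj == null path are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Container whose paths match the summaries above (an illustration only).
public class Container {
    private final List<Object> list = new ArrayList<>();
    private int size; // mirrors the `size` field used in the summaries

    // Container(): TRUE ==> size = 0
    public Container() {
        size = 0;
    }

    public void add(Object obj) {
        // Path 2: obj == null ==> throw an exception
        if (obj == null) {
            throw new IllegalArgumentException("obj must not be null");
        }
        // Path 1: obj != null ==> list[size] = obj; size += 1
        list.add(obj);
        size += 1;
    }

    public void remove(Object obj) {
        // Path 1: obj in list ==> remove from list; size -= 1
        // Path 2: obj not in list ==> no effect
        if (list.remove(obj)) {
            size -= 1;
        }
    }

    // clear(): TRUE ==> remove all in list; size = 0
    public void clear() {
        list.clear();
        size = 0;
    }

    // Accessor added for testing; not part of the slide's interface.
    public int size() {
        return size;
    }

    public static void main(String[] args) {
        Container container = new Container();
        for (int k = 0; k < 10; k++) {
            container.add(new Object());
        }
        System.out.println(container.size()); // 10
    }
}
```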

Example: Sequence Deduction

Target $\Phi_{target}$: Container.size == 10

Can clear() produce the target state? No, not satisfiable.

Can add(obj) produce the target state? Yes, if this.size == 9 && obj != null. Select add(obj); the new target is Container.size == 9.

(For each candidate method, the deductive engine queries the constraint solver against $\Phi_{target}$.)

Example: Sequence Deduction (continued)

Container.size == 10: can add(obj) produce it? Yes, if this.size == 9 && obj != null. Select add(obj).

Container.size == 9: can add(obj) produce it? Yes, if this.size == 8 && obj != null. Select add(obj).

…

Container.size == 0: can Container() produce it? Yes, with no parameter requirement. Select Container().
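The deduction can be sketched as a depth-first search over path summaries, working backward from the target state. The sketch assumes the summaries are already reduced to their effect on size (for example, add(obj) with obj != null maps a target size t to a required pre-state size t - 1) and tries constructors first so that the search stops at Container(); it is an illustration, not STAR's deductive engine, which queries a constraint solver.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;
import java.util.function.IntFunction;

// Hypothetical sketch of backward method-sequence deduction, with Container's path
// summaries abstracted to their effect on `size`. Not STAR's actual engine.
public class SequenceDeduction {
    // For a target size, `preSize` returns the size the receiver must have before the
    // call for this path to produce the target, or empty if the path cannot produce it.
    record Summary(String call, boolean isConstructor, IntFunction<OptionalInt> preSize) {}

    static final List<Summary> SUMMARIES = List.of(
            // Container(): TRUE ==> size = 0 (no receiver, no parameter requirement)
            new Summary("new Container()", true,
                    t -> t == 0 ? OptionalInt.of(0) : OptionalInt.empty()),
            // clear(): TRUE ==> size = 0 (any prior size works; 0 is a representative)
            new Summary("clear()", false,
                    t -> t == 0 ? OptionalInt.of(0) : OptionalInt.empty()),
            // add(obj), path 1: obj != null ==> size += 1; a fresh object meets obj != null
            new Summary("add(new Object())", false,
                    t -> t >= 1 ? OptionalInt.of(t - 1) : OptionalInt.empty()));

    // Returns the calls in execution order, or null if none is found within maxDepth.
    static List<String> deduce(int target, int maxDepth) {
        // Prefer a constructor that directly yields the target state.
        for (Summary s : SUMMARIES) {
            if (s.isConstructor() && s.preSize().apply(target).isPresent()) {
                List<String> seq = new ArrayList<>();
                seq.add(s.call());
                return seq;
            }
        }
        if (maxDepth == 0) {
            return null;
        }
        // Otherwise pick a method path that yields the target and deduce the receiver's state.
        for (Summary s : SUMMARIES) {
            if (s.isConstructor()) {
                continue;
            }
            OptionalInt pre = s.preSize().apply(target);
            if (pre.isEmpty()) {
                continue; // e.g. clear() cannot produce size == 10
            }
            List<String> prefix = deduce(pre.getAsInt(), maxDepth - 1);
            if (prefix != null) {
                prefix.add(s.call()); // combine in reverse order of deduction
                return prefix;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> seq = deduce(10, 20);
        System.out.println(seq.size() + " calls: " + seq);
    }
}
```

The depth bound keeps the search finite when no summary chain reaches a constructor.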

Example: Final Sequence

The selected calls are combined in reverse order of deduction to form the whole sequence:

void sequence() {
    Container container = new Container();
    // add(obj) was selected 10 times; the generated test repeats these two lines
    // 10 times with fresh objects (written as a loop here for brevity)
    for (int k = 0; k < 10; k++) {
        Object o = new Object();
        container.add(o);
    }
}

Other Details

The forward symbolic execution in method summary extraction uses settings similar to those of the precondition computation. For example, loops and recursive calls are expanded only a limited number of times or to a limited depth, so the extracted path summaries may not cover all method paths.

The incompleteness of the method path summaries does not affect the precision of method sequence composition: the generated method sequences are still correct. However, some method sequences may not be generated because of missing path summaries.

Optimizations have been applied to reduce the number of methods and method paths to examine.


Evaluation

Research Questions

Research Question 1: For how many crashes can STAR compute the crash triggering preconditions?

Research Question 2: How many crashes can STAR reproduce based on the crash triggering preconditions?

Research Question 3: How many crash reproductions by STAR are useful for revealing the actual causes of the crashes?

Evaluation Setup

Subjects:

Apache Commons Collections (ACC): a data container library that implements additional data structures over the JDK. 60 KLOC.

Ant (ANT): a Java build tool that supports a number of built-in and extension tasks, such as compiling, testing and running Java applications. 100 KLOC.

Log4j (LOG): a logging package for printing log output to different local and remote destinations. 20 KLOC.

Evaluation Setup

Crash report collection: crashes were collected from the issue tracking system of each subject.

Only confirmed and fixed crashes were collected.

Crashes with no or incorrect stack trace information were discarded.

Three major types of crashes were considered: custom thrown exceptions, NPE and AIOBE (these cover 80% of crashes [Nam et al., 2009]).

52 crashes were obtained from the three subjects:

Subject | # of Crashes | Versions | Avg. Fix Time | Report Period
ACC | 12 | 2.0 – 4.0 | 42 days | Oct. 03 – Jun. 12
ANT | 21 | 1.6.1 – 1.8.3 | 25 days | Apr. 04 – Aug. 12
LOG | 19 | 1.0.0 – 1.2.16 | 77 days | Jan. 01 – Oct. 09

Evaluation Setup

Our evaluation study has the largest number of crashes compared to previous studies:

Approach | Number of Crashes
RECRASH | 11
ESD | 6
BugRedux | 17
RECORE | 7
STAR | 52

Research Question 1

For how many crashes can STAR compute the crash preconditions?

For how many crashes can STAR compute crash preconditions without the optimization approaches?

For how many crashes can STAR compute crash preconditions with the optimization approaches?

We applied STAR to compute the preconditions for each crash.

Research Question 1

Figure: percentage of crashes whose preconditions were computed by STAR (y-axis: crashes with preconditions, %)

Subject | Without Optimizations | With Optimizations
ACC | 66.7 | 75.0 (+8.3)
ANT | 14.3 | 71.4 (+57.1)
LOG | 36.8 | 73.7 (+36.9)
Overall | 34.6 | 73.1 (+38.5)

Research Question 1

Figure: average time spent computing the crash preconditions, in seconds (lower is better)

Subject | Without Optimizations | With Optimizations
ACC | 18.5 | 2.1
ANT | 90.4 | 4.9
LOG | 55.1 | 2.4
Overall | 59.3 | 3.3

Research Question 1

Figure: percentage of crashes whose preconditions were computed by STAR, broken down by optimization (y-axis: crashes with preconditions, %)

Configuration | ACC | ANT | LOG | Overall
No optimization | 66.7 | 14.3 | 36.8 | 34.6
Static path reduction | 75.0 | 23.8 | 47.4 | 44.2
Heuristic backtracking | 66.7 | 23.8 | 36.8 | 38.5
Contradiction detection | 66.7 | 14.3 | 42.1 | 36.5
All optimizations | 75.0 | 71.4 | 73.7 | 73.1

Research Question 1

STAR successfully computed crash preconditions for 38 (73.1%) of the 52 crashes.

STAR's optimization approaches improved the overall result by 20 crashes (38.5 percentage points).

Static path reduction is the most effective single optimization, but applying all three optimizations together achieves a much higher improvement.

Research Question 2

How many crashes can STAR reproduce based on the crash preconditions?

Criterion of reproduction [ReCrash, 2008]: a crash is considered reproduced if the generated test case triggers the same type of exception at the same crash line.

We applied STAR to generate crash reproducible test cases for each computed crash precondition.
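The criterion can be checked mechanically by comparing the exception type and the top stack frame. The sketch below assumes the crash line is identified by class name, method name, and line number; the class org.example.Buffer and line 42 are made up, and the stack trace is set by hand so the example is deterministic.

```java
// Hypothetical sketch of the reproduction criterion: same exception type, and the top
// stack frame is the reported crash line.
public class ReproductionCheck {
    static boolean reproduces(Throwable observed, String expectedType,
                              String expectedClass, String expectedMethod, int expectedLine) {
        if (!observed.getClass().getName().equals(expectedType)) {
            return false;
        }
        StackTraceElement[] frames = observed.getStackTrace();
        if (frames.length == 0) {
            return false;
        }
        StackTraceElement top = frames[0];
        return top.getClassName().equals(expectedClass)
                && top.getMethodName().equals(expectedMethod)
                && top.getLineNumber() == expectedLine;
    }

    // Gives an exception a fixed top frame so the example does not depend on real line numbers.
    static Throwable withTopFrame(Throwable t, String cls, String method, int line) {
        t.setStackTrace(new StackTraceElement[] {new StackTraceElement(cls, method, null, line)});
        return t;
    }

    public static void main(String[] args) {
        Throwable e = withTopFrame(new ArrayIndexOutOfBoundsException(),
                "org.example.Buffer", "write", 42);
        System.out.println(reproduces(e, "java.lang.ArrayIndexOutOfBoundsException",
                "org.example.Buffer", "write", 42)); // true
    }
}
```

Only the top frame is compared, which mirrors the partial-match nature of this criterion.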

Research Question 2

Overall crash reproductions achieved by STAR for each subject:

Subject | # of Crashes | # with Precondition | # Reproduced | Ratio: reproduced / all crashes (reproduced / with precondition)
ACC | 12 | 9 | 8 | 66.7% (88.9%)
ANT | 21 | 15 | 12 | 57.1% (80.0%)
LOG | 19 | 14 | 11 | 57.9% (78.6%)
Total | 52 | 38 | 31 | 59.6% (81.6%)

Research Question 2

More statistics for the test case generation process by STAR:

Subject | Avg. # of Objects | Avg. # of Candidate Methods | Min - Max Sequence Length | Avg. Sequence Length
ACC | 1.5 | 35.5 | 2 - 19 | 9.4
ANT | 1.4 | 11.7 | 2 - 14 | 6.2
LOG | 1.5 | 21.8 | 2 - 17 | 8.1
Total | 1.5 | 21.4 | 2 - 19 | 7.7

Research Question 3

The criterion of reproduction does not require a crash reproduction to match the complete stack trace: a partial match of only the top stack frames is still considered a valid reproduction of the target crash.

The root causes of more than 60% of crashes lie in the top three stack frames [Schröter et al., 2010], so it is not necessary to reproduce the complete stack trace to reveal the root cause of a crash.

Research Question 3

Drawbacks of the criterion of reproduction:

The crash reproduction may not be the same crash.

The crash reproduction may not be useful for revealing the crash triggering bug.

(Figure labels: "Buggy frame" and "Reproduced"; the figure contrasts the reproduced stack frames with the frame that contains the bug.)

Research Question 3

How many crash reproductions by STAR are useful for revealing the actual causes of the crashes?

Criterion of useful crash reproduction: a crash reproduction is considered useful if it triggers the same incorrect behavior at the buggy location, which eventually causes the crash to reappear.

We manually examined the original and fixed versions of the program to identify the actual buggy location of each crash.

Research Question 3

Overall useful crash reproductions achieved by STAR for each subject:

Subject | # Reproduced | # Useful | Ratio: useful / reproduced (useful / all crashes)
ACC | 8 | 7 | 87.5% (58.3%)
ANT | 12 | 7 | 58.3% (33.3%)
LOG | 11 | 8 | 72.7% (42.1%)
Total | 31 | 22 | 71.0% (42.3%)

Comparison Study

We compared STAR with two other crash reproduction frameworks:

Randoop: a feedback-directed test input generation framework, capable of generating thousands of test inputs that may reproduce the target crashes. Randoop was given a maximum of 1000 seconds to generate test cases (10 times that of STAR), and we manually provided the list of crash-related classes to increase its chances.

BugRedux: a state-of-the-art crash reproduction framework that can compute crash preconditions and generate crash reproducible test cases.

We applied both frameworks to the same set of crashes used in our evaluation.

Comparison Study

Figure: the number of crashes reproduced by the three approaches (y-axis: number of crashes)

Approach | Precondition | Reproduction | Usefulness
Randoop | 0 | 12 | 8
BugRedux | 18 | 10 | 7
STAR | 38 | 31 | 22

Page 83: STAR: Stack Trace based Automatic Crash Reproduction

Comparison Study

[Figure: comparison of the crashes reproduced by Randoop (12 crashes), BugRedux (10 crashes), and STAR; the diagram also contains a label "5 crashes"]

Page 84: STAR: Stack Trace based Automatic Crash Reproduction

Comparison Study: STAR outperformed Randoop because:

Randoop generates method sequences with a randomized search. It can produce many sequences, but the search is not guided toward the target crash.

Due to the large search space of real-world programs, the probability of generating a crash-reproducible sequence is low.

STAR outperformed BugRedux because of:

Several effective optimizations that improve the efficiency of the crash precondition computation process.

A method sequence composition approach that can generate complex input objects satisfying the crash preconditions.
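The Randoop point above can be illustrated with a toy version of feedback-directed random generation: keep a pool of sequences that executed normally, extend a randomly chosen one with a random call, and discard sequences that behave illegally. Everything here is a hypothetical model (a counter object with `inc`, `dec`, and `reset`, a made-up target crash at value 6, negative values treated as illegal); it is not Randoop's API and omits its real heuristics such as duplicate elimination. No step uses the target crash to decide what to try next, which is the lack of guidance the slide refers to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Toy sketch of feedback-directed random sequence generation (hypothetical model). */
public class FeedbackDirectedSketch {
    static final String[] METHODS = {"inc", "dec", "reset"};
    static final int TARGET = 6; // hypothetical crash: check() fails when the counter equals 6

    /** Replays a call sequence on a fresh counter and returns its final value. */
    static int run(List<String> seq) {
        int value = 0;
        for (String call : seq) {
            switch (call) {
                case "inc": value++; break;
                case "dec": value--; break;
                default: value = 0; // "reset"
            }
        }
        return value;
    }

    /** Runs the generator for budget steps; returns how many sequences hit the target crash. */
    static int generate(int budget, long seed) {
        Random rnd = new Random(seed);
        List<List<String>> pool = new ArrayList<>();
        pool.add(new ArrayList<>()); // the empty sequence seeds the pool
        int hits = 0;
        for (int i = 0; i < budget; i++) {
            // Unguided step: random pool entry extended by a random call.
            List<String> seq = new ArrayList<>(pool.get(rnd.nextInt(pool.size())));
            seq.add(METHODS[rnd.nextInt(METHODS.length)]);
            int value = run(seq);
            if (value == TARGET) {
                hits++;          // crash-reproducing sequence found
            } else if (value >= 0) {
                pool.add(seq);   // feedback: normal executions are reused later
            }
            // value < 0 is treated as an illegal call and the sequence is dropped
        }
        return hits;
    }

    public static void main(String[] args) {
        int budget = 10_000;
        int hits = generate(budget, 42L);
        System.out.println(hits + " of " + budget + " generated sequences hit the target crash");
    }
}
```

In a real program the pool draws from hundreds of methods and argument values, so the fraction of random extensions that happen to satisfy a particular crash precondition is far smaller than in this three-method toy.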


Page 85: STAR: Stack Trace based Automatic Crash Reproduction

Case Study: https://issues.apache.org/jira/browse/collections-411

An IndexOutOfBoundsException could be raised in method ListOrderedMap.putAll() due to an incorrect index increment.

The developers soon fixed this bug by adding checks to ensure that the index is incremented only in certain cases.


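```java
// ListOrderedMap<K, V>.putAll(int, Map) before the ACC-411 fix (method excerpt)
public void putAll(int index, Map<? extends K, ? extends V> map) {
    for (Map.Entry<? extends K, ? extends V> entry : map.entrySet()) {
        put(index, entry.getKey(), entry.getValue());
        ++index; // buggy increment: the index advances unconditionally
    }
}
```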
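A simplified, hypothetical model of an insertion-ordered map shows how an unconditional increment can push the index past the end of the list. It assumes that a positional `put()` of an existing key moves that key instead of adding a new entry; the class `OrderedMapSketch` and its `guardedPutAll()` variant are illustrative and are not the Commons Collections source or the developers' actual patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of an insertion-ordered map with positional put. */
public class OrderedMapSketch<K, V> {
    private final Map<K, V> values = new HashMap<>();
    private final List<K> order = new ArrayList<>();

    /** Puts key at position index; an existing key is moved rather than duplicated. */
    public V put(int index, K key, V value) {
        if (index < 0 || index > order.size()) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + order.size());
        }
        if (values.containsKey(key)) {
            int pos = order.indexOf(key);
            order.remove(pos); // remove(int): removes by position
            if (pos < index) {
                index--; // removal shifted the target position left by one
            }
        }
        order.add(index, key);
        return values.put(key, value);
    }

    public List<K> keys() {
        return new ArrayList<>(order);
    }

    /** Mirrors the slide: the index always advances, even if no entry was added. */
    public void buggyPutAll(int index, Map<? extends K, ? extends V> map) {
        for (Map.Entry<? extends K, ? extends V> entry : map.entrySet()) {
            put(index, entry.getKey(), entry.getValue());
            ++index; // may now exceed order.size() when the key already existed
        }
    }

    /** Hypothetical guarded variant: advance by one only for new keys. */
    public void guardedPutAll(int index, Map<? extends K, ? extends V> map) {
        for (Map.Entry<? extends K, ? extends V> entry : map.entrySet()) {
            boolean existed = values.containsKey(entry.getKey());
            put(index, entry.getKey(), entry.getValue());
            // For a moved key, continue right after its new position instead.
            index = existed ? order.indexOf(entry.getKey()) + 1 : index + 1;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> update = new LinkedHashMap<>();
        update.put("a", 10);
        update.put("b", 20);

        OrderedMapSketch<String, Integer> guarded = new OrderedMapSketch<>();
        guarded.put(0, "a", 1);
        guarded.put(1, "b", 2);
        guarded.guardedPutAll(2, update);
        System.out.println("guarded: " + guarded.keys());

        OrderedMapSketch<String, Integer> buggy = new OrderedMapSketch<>();
        buggy.put(0, "a", 1);
        buggy.put(1, "b", 2);
        try {
            buggy.buggyPutAll(2, update);
        } catch (IndexOutOfBoundsException ex) {
            System.out.println("buggy: " + ex.getMessage());
        }
    }
}
```

With keys a and b already present, `buggyPutAll(2, {a, b})` moves a, advances the index to 3, and then fails on the second put because the list still holds only two keys; `guardedPutAll()` instead continues right after the key it just moved.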

Page 86: STAR: Stack Trace based Automatic Crash Reproduction

Case Study: We applied STAR to generate a crash-reproducible test case for this crash. Surprisingly, it generated a test case that crashed both the original and the fixed (latest) versions of the program.

We reported this potential issue to the project developers: https://issues.apache.org/jira/browse/collections-474

We also attached STAR's auto-generated test case to our bug report.


Page 87: STAR: Stack Trace based Automatic Crash Reproduction

Case Study: The developers quickly confirmed the issue:

The original patch for bug ACC-411 was actually incomplete. It missed a corner case that could still crash the program.

Neither the developers nor the original bug reporter had identified this corner case in over a year.

It took the developers only a few hours to confirm and fix the bug after STAR's test case demonstrated this corner case.

The developers added STAR's crash-reproducible test case to the official test suite of the Apache Commons Collections project: http://svn.apache.org/r1496168


Page 88: STAR: Stack Trace based Automatic Crash Reproduction

Case Study: STAR can identify and reproduce crashes that are difficult even for experienced developers.

STAR can be used to confirm the completeness of bug fixes: if a bug fix is incomplete, STAR may generate a crash-reproducible test case that demonstrates the missing corner case.


Page 89: STAR: Stack Trace based Automatic Crash Reproduction

Challenges & Future Work

Page 90: STAR: Stack Trace based Automatic Crash Reproduction

Challenges: We manually examined each crash that was not reproduced to identify the major challenges of reproduction:

Environment dependency (36.7%): file input, network input.

SMT solver limitations (23.3%): complex string constraints (e.g., regular expressions), non-linear arithmetic.

Concurrency and non-determinism (16.7%): some crashes are reproducible only non-deterministically or under concurrent execution.

Path explosion (6.7%).


Page 91: STAR: Stack Trace based Automatic Crash Reproduction

Future Work

Improving reproducibility:

Support for environment simulation, e.g., file inputs.

Incorporating specialized SMT solvers, e.g., string solvers such as Z3-str.

Automatic fault localization: existing fault localization approaches require both passing and failing test cases to locate faulty statements.

STAR's ability to generate failing test cases can help automate the fault localization process.

Crash reproduction for mobile applications: Android applications are similar to desktop Java programs in many aspects.
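To illustrate why failing tests matter for fault localization, the sketch below implements the Ochiai suspiciousness score, one well-known spectrum-based technique chosen here purely as an example (it is not a technique proposed by STAR). With only passing tests every score is zero, whereas a single failing test, such as one a crash reproduction tool could generate, already ranks the statements. The class name and the coverage matrix are made up.

```java
import java.util.Arrays;

/**
 * Minimal spectrum-based fault localization sketch using the Ochiai formula:
 * score(s) = failed(s) / sqrt(totalFailed * (failed(s) + passed(s))).
 * coverage[t][s] is true if test t executes statement s; failed[t] marks failing tests.
 */
public class OchiaiSketch {

    static double[] suspiciousness(boolean[][] coverage, boolean[] failed) {
        int statements = coverage[0].length;
        int totalFailed = 0;
        for (boolean f : failed) if (f) totalFailed++;

        double[] scores = new double[statements];
        for (int s = 0; s < statements; s++) {
            int ef = 0, ep = 0; // failing / passing tests that execute statement s
            for (int t = 0; t < coverage.length; t++) {
                if (coverage[t][s]) {
                    if (failed[t]) ef++; else ep++;
                }
            }
            double denom = Math.sqrt((double) totalFailed * (ef + ep));
            scores[s] = denom == 0 ? 0.0 : ef / denom; // no failing tests => no signal
        }
        return scores;
    }

    public static void main(String[] args) {
        // Three statements, three tests; only the last test (e.g. a generated crash test) fails.
        boolean[][] coverage = {
            {true, true, false},
            {true, false, false},
            {true, true, true},
        };
        boolean[] withFailing = {false, false, true};
        boolean[] passingOnly = {false, false, false};
        System.out.println(Arrays.toString(suspiciousness(coverage, withFailing)));
        System.out.println(Arrays.toString(suspiciousness(coverage, passingOnly)));
    }
}
```

In this made-up example, the statement executed only by the failing test receives the top score of 1.0, while the passing-only run leaves every statement at 0.0.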


Page 92: STAR: Stack Trace based Automatic Crash Reproduction

Conclusions: We proposed STAR, an automatic crash reproduction framework based on stack traces.

STAR successfully reproduced 31 (59.6%) of 52 real-world crashes from three non-trivial programs.

The reproduced crashes can effectively help developers reveal the underlying crash-triggering bugs, or even identify previously unknown bugs.

A comparison study demonstrates that STAR significantly outperforms existing approaches: it reproduced 31 crashes, compared with 12 for Randoop and 10 for BugRedux.


Page 93: STAR: Stack Trace based Automatic Crash Reproduction

Thank You!

Page 94: STAR: Stack Trace based Automatic Crash Reproduction

Appendix

Page 95: STAR: Stack Trace based Automatic Crash Reproduction

Subject Sizes

Our evaluation uses subjects that are among the largest compared with previous studies:

| Approach | Subject Sizes (LOC) | Average Subject Size (LOC) |
|---|---|---|
| RECRASH | 200–86,000 | 47,000 |
| ESD | 100–100,000 | N/A |
| BugRedux | 500–241,000 | 27,000 |
| RECORE | 68–62,000 | 35,000 |
| STAR | 20,000–100,000 | 60,000 |


Page 96: STAR: Stack Trace based Automatic Crash Reproduction

Research Question 1

[Figure: Average time to compute the crash preconditions (the lower the better), broken down by optimization (bar chart; y-axis: average time spent, in seconds)]

| Configuration | ACC | ANT | LOG | Overall |
|---|---|---|---|---|
| No Optimization | 18.5 | 90.4 | 55.1 | 59.3 |
| Static Path Reduction | 11.8 | 67.5 | 28.3 | 39.2 |
| Heuristic Backtracking | 15.9 | 74.8 | 47.8 | 50 |
| Contradiction Detection | 13.8 | 86.8 | 48.2 | 54.3 |
| All Optimizations | 2.1 | 4.9 | 2.4 | 3.3 |

Page 97: STAR: Stack Trace based Automatic Crash Reproduction

Comparison Study

[Figure: Average time to reproduce crashes (the lower the better), computed only over the crashes reproduced by both approaches (bar chart; y-axis: average time spent, in seconds)]

| Approach | ACC | ANT | LOG | Overall |
|---|---|---|---|---|
| BugRedux | 2.4 | 29.9 | 4.275 | 8.7 |
| STAR | 2.3 | 10.8 | 3.75 | 4.6 |

Page 98: STAR: Stack Trace based Automatic Crash Reproduction

User Survey

ACC-53: “The auto-generated test case would reproduce the bug... I think that having such a test case would have been useful.”

| Surveys Sent | Responses | Confirmed Correctness | Confirmed Usefulness |
|---|---|---|---|
| 31 | 6 (19%) | 5 | 3 |

Page 99: STAR: Stack Trace based Automatic Crash Reproduction

Comparison Study

[Figure: Branch coverage achieved by different test case generation approaches (bar chart; y-axis: branch coverage, %)]

| Approach | ACC | JSAP | SAT4J |
|---|---|---|---|
| Sample Execution | 16 | 30 | 12 |
| Randoop | 29 | 40 | 20 |
| Palulu | 19 | 58 | 22 |
| RecGen | 22 | 54 | 0 |
| Palus | 29 | 61 | 36 |
| STAR | 69 | 74 | 54 |