Statistical Debugging CS 6340

Upload: brice-allen

Post on 18-Jan-2018



TRANSCRIPT

Page 1: Statistical Debugging CS 6340. Motivation Bugs will escape in-house testing and analysis tools Dynamic analysis (i.e. testing) is unsound Static analysis

Statistical Debugging

CS 6340

Page 2:

Motivation

• Bugs will escape in-house testing and analysis tools
  – Dynamic analysis (i.e. testing) is unsound
  – Static analysis is incomplete
  – Limited resources (time, money, people)

• Software ships with unknown (and even known) bugs

Page 3:

An Idea: Statistical Debugging

• Monitor deployed code
  – Online: Collect information from user runs
  – Offline: Analyze information to find bugs

• Effectively a “black box” for software

Page 4:

Benefits of Statistical Debugging

Actual runs are a vast resource!

• Crowdsource-based Testing
  – Number of real runs >> number of testing runs

• Reality-directed Debugging
  – Real-world runs are the ones that matter most

Page 5:

Two Key Questions

• How do we get the data?

• What do we do with it?

Page 6:

Practical Challenges

1. Complex systems
   – Millions of lines of code
   – Mix of controlled and uncontrolled code

2. Remote monitoring constraints
   – Limited disk space, network bandwidth, power, etc.

3. Incomplete information
   – Limit performance overhead
   – Privacy and security

[Figure: a remote client sends data over the Internet to a central server]

Page 7:

The Approach

• Guess behaviors that are “potentially interesting”
  – Compile-time instrumentation of program

• Collect sparse, fair subset of these behaviors
  – Generic sampling framework
  – Feedback profile + outcome label (success vs. failure) for each run

• Analyze behavioral changes in successful vs. failing runs to find bugs
  – Statistical debugging

Page 8:

Overall Architecture

[Figure: architecture diagram with the components Program Source, Guesses, Compiler, Sampler, Deployed Application, Profile + success/failure label, and Statistical Debugging; the output is the top bugs with likely causes]

Page 9:

A Model of Behavior

• Assume any interesting behavior is expressible as a predicate P on a program state at a particular program point.

  Observation of behavior = observing P

• Instrument the program to observe each predicate

• Which predicates should we observe?

Page 10:

Branches Are Interesting

  ++branch_17[p != 0];
  if (p) …
  else …

Track predicates (counts in branch_17):

  p == 0    63
  p != 0     0

Page 11:

Return Values Are Interesting

  n = fopen(...);
  ++call_41[(n==0)+(n>=0)];

Track predicates (counts in call_41):

  n < 0     23
  n > 0      0
  n == 0    90

Page 12:

What Other Behaviors Are Interesting?

• Depends on the problem you wish to solve!

• Examples:
  – Number of times each loop runs
  – Scalar relationships between variables (e.g. i < j, i > 42)
  – Pointer relationships (e.g. p == q, p != null)

Page 13:

QUIZ: Identify the Predicates

List all predicates tracked for this program, assuming only branches are potentially interesting:

  void main() {
    int z;
    for (int i = 0; i < 3; i++) {
      char c = getc();
      if (c == 'a') z = 0;
      else z = 1;
      assert(z == 1);
    }
  }

Page 14:

QUIZ: Identify the Predicates

List all predicates tracked for this program, assuming only branches are potentially interesting:

  void main() {
    int z;
    for (int i = 0; i < 3; i++) {
      char c = getc();
      if (c == 'a') z = 0;
      else z = 1;
      assert(z == 1);
    }
  }

Answer:

  c == 'a'
  c != 'a'
  i < 3
  i >= 3

Page 15:

Summarization and Reporting

branch_17:

  p == 0    63
  p != 0     0

call_41:

  n < 0     23
  n > 0      0
  n == 0    90

Combined feedback vector:

  Predicate   Count   State
  p == 0      63      1
  p != 0       0      0
  n < 0       23
  n > 0        0
  n == 0      90

Page 16:

Summarization and Reporting

• Feedback report per run is:
  – Vector of predicate states: ‒, 0, 1, *
  – Success/failure outcome label

• No time dimension, for good or ill

  Predicate   Count   State
  p == 0      63      1
  p != 0       0      0
  n < 0       23
  n > 0        0
  n == 0      90

  Outcome: S / F

Page 17:

QUIZ: Abstracting Predicate Counts

  ‒   Never observed
  0   False at least once, never true
  1   True at least once, never false
  *   Both true and false at least once

  Predicate   Count   State
  p == 0      63      1
  p != 0       0      0
  n < 0       23
  n > 0        0
  n == 0      90

Page 18:

QUIZ: Abstracting Predicate Counts

  ‒   Never observed
  0   False at least once, never true
  1   True at least once, never false
  *   Both true and false at least once

  Predicate   Count   State
  p == 0      63      1
  p != 0       0      0
  n < 0       23      *
  n > 0        0      0
  n == 0      90      *
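The abstraction in this quiz can be sketched in C. This is a minimal illustration, not the course's implementation: the names pred_state, abstract_counts, and state_symbol are hypothetical, and the two arguments are assumed to be how many times the predicate was seen true and seen false within one run.

```c
#include <assert.h>

/* The four abstract states from the legend (names are hypothetical). */
typedef enum { NEVER_OBSERVED, ONLY_FALSE, ONLY_TRUE, BOTH } pred_state;

/* Collapse the per-run counts for one predicate into an abstract state. */
pred_state abstract_counts(unsigned long times_true, unsigned long times_false) {
    if (times_true == 0 && times_false == 0) return NEVER_OBSERVED;
    if (times_true == 0) return ONLY_FALSE;
    if (times_false == 0) return ONLY_TRUE;
    return BOTH;
}

/* Printable symbol for a state; '-' stands in for the dash in the legend. */
char state_symbol(pred_state s) {
    static const char symbols[] = { '-', '0', '1', '*' };
    assert(s >= NEVER_OBSERVED && s <= BOTH);
    return symbols[s];
}
```

For call_41, n < 0 was true 23 times and false 90 times (the sum of the other two counters), so abstract_counts(23, 90) yields BOTH, which matches the * in the answer.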

Page 19:

QUIZ: Populate the Predicates

Populate the predicate vectors and outcome labels for the two runs:

  void main() {
    int z;
    for (int i = 0; i < 3; i++) {
      char c = getc();
      if (c == 'a') z = 0;
      else z = 1;
      assert(z == 1);
    }
  }

                        “bba”   “bbb”
  c == 'a'
  c != 'a'
  i < 3
  i >= 3
  Outcome label (S/F)

Page 20:

QUIZ: Populate the Predicates

Populate the predicate vectors and outcome labels for the two runs:

  void main() {
    int z;
    for (int i = 0; i < 3; i++) {
      char c = getc();
      if (c == 'a') z = 0;
      else z = 1;
      assert(z == 1);
    }
  }

                        “bba”   “bbb”
  c == 'a'              *       0
  c != 'a'              *       1
  i < 3                 1       *
  i >= 3                0       *
  Outcome label (S/F)   F       S
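One way to check these answers is to instrument the quiz program by hand. The sketch below rests on several assumptions: it reads from a string instead of getc(), it returns instead of aborting when the assertion would fail, and the report type and run_instrumented name are hypothetical. Each counter pair is indexed by the predicate's value, as with branch_17 earlier: [0] counts false, [1] counts true.

```c
#include <assert.h>
#include <stddef.h>

/* Per-run feedback: counter pairs indexed by the predicate's value
   ([0] = false, [1] = true), plus the outcome label. */
typedef struct {
    unsigned c_eq_a[2];  /* site: if (c == 'a') */
    unsigned i_lt_3[2];  /* site: loop condition i < 3 */
    int failed;          /* 1 = F, 0 = S */
} report;

/* Runs the quiz program on `input`, which is assumed to hold at least
   three characters. */
report run_instrumented(const char *input) {
    report r = { {0, 0}, {0, 0}, 0 };
    size_t pos = 0;
    int z;
    assert(input != NULL);
    for (int i = 0; ; i++) {
        ++r.i_lt_3[i < 3];       /* record the loop branch */
        if (!(i < 3)) break;
        char c = input[pos++];
        ++r.c_eq_a[c == 'a'];    /* record the if branch */
        if (c == 'a') z = 0;
        else z = 1;
        if (z != 1) {            /* stands in for the failing assert(z == 1) */
            r.failed = 1;
            return r;
        }
    }
    return r;
}
```

On “bba” the counters come out as c_eq_a = {2, 1} and i_lt_3 = {0, 3}, which abstract to * and 1. The complementary predicates c != 'a' and i >= 3 read the same pairs with the indices swapped.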

Page 21:

The Need for Sampling

• Tracking all predicates is expensive
• Decide to examine or ignore each instrumented site:
  – Randomly
  – Independently
  – Dynamically

• Why?
  – Fairness
  – We need an accurate picture of rare events

  Predicate   State
  p == 0      1
  p != 0      0
  n < 0       *
  n > 0       0
  n == 0      *

Page 22:

A Naive Sampling Approach

• Toss a coin at each instrumentation site

Without sampling:

  ++count_42[p != NULL];
  p = p->next;

  ++count_43[i < max];
  total += sizes[i];

With a coin toss at each site:

  if (rand(100) == 0) ++count_42[p != NULL];
  p = p->next;

  if (rand(100) == 0) ++count_43[i < max];
  total += sizes[i];

Too slow!

Page 23:

Some Other Problematic Approaches

• Sample every kth site
  – Violates independence
  – Might miss predicates “out of phase”

• Periodic hardware timer or interrupt
  – Might miss rare events
  – Not portable

Page 24:

Amortized Coin Tossing

• Observation: Samples are rare (e.g. 1/100)

• Idea: Amortize sampling cost by predicting time until next sample

• Implement as countdown values selected from geometric distribution

• Models how many tails (0) before next head (1) for biased coin toss

• Example with sampling rate 1/5:

  0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, …

  Next sample after: 5, 3, 4 sites
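The countdown idea can be sketched as follows. Both function names are hypothetical. gaps_before_heads reproduces the 5, 3, 4 example from a recorded toss sequence, and next_countdown is one assumed way to draw a countdown: it tosses a 1-in-denominator coin until heads, so the tosses happen once per sample rather than at every instrumentation site. It uses rand() with a modulo, which is slightly biased and is only meant for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of tails (0) seen before each head (1) in a recorded toss sequence.
   Writes at most max_out gaps to out and returns how many were written. */
size_t gaps_before_heads(const int *tosses, size_t n,
                         unsigned *out, size_t max_out) {
    size_t written = 0;
    unsigned tails = 0;
    for (size_t i = 0; i < n && written < max_out; i++) {
        if (tosses[i]) {
            out[written++] = tails;  /* a head ends the current gap */
            tails = 0;
        } else {
            tails++;
        }
    }
    return written;
}

/* Hypothetical next(): how many sites to skip before the next sample,
   for a sampling rate of 1/denominator. */
unsigned next_countdown(unsigned denominator) {
    unsigned tails = 0;
    assert(denominator > 0);
    while ((unsigned)rand() % denominator != 0)
        tails++;
    return tails;
}
```

With a rate of 1/100, next_countdown(100) returns about 99 on average, so the fast path only has to decrement and compare a counter between samples.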

Page 25:

An Efficient Approach

With a countdown:

  if (countdown >= 2) {
    countdown -= 2;
    p = p->next;
    total += sizes[i];
  } else {
    if (countdown-- == 0) {
      ++count_42[p != NULL];
      countdown = next();
    }
    p = p->next;
    if (countdown-- == 0) {
      ++count_43[i < max];
      countdown = next();
    }
    total += sizes[i];
  }

Naive version, for comparison:

  if (rand(100) == 0) ++count_42[p != NULL];
  p = p->next;

  if (rand(100) == 0) ++count_43[i < max];
  total += sizes[i];

Page 26:

Feedback Reports with Sampling

• Feedback report per run is:
  – Vector of sampled predicate states (‒, 0, 1, *)
  – Success/failure outcome label

• Certain of what we did observe
  – But may miss some events

• Given enough runs, samples ≈ reality
  – Common events seen most often
  – Rare events seen at proportionate rate

Page 27:

QUIZ: Uncertainty Due to Sampling

Check all possible states that a predicate P might take due to sampling. The first column shows the actual state of P (without sampling).

  P     ‒     0     1     *
  0
  1     ✅          ✅
  *

Page 28:

QUIZ: Uncertainty Due to Sampling

Check all possible states that a predicate P might take due to sampling. The first column shows the actual state of P (without sampling).

  P     ‒     0     1     *
  ‒     ✅
  0     ✅    ✅
  1     ✅          ✅
  *     ✅    ✅    ✅    ✅
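The rule behind this answer is that sampling can only drop observations, never invent them. A minimal sketch of that rule follows, with states encoded as the characters '-', '0', '1', '*' ('-' standing in for "never observed"); the function names are hypothetical.

```c
#include <assert.h>

/* Valid state characters: '-' (never observed), '0', '1', '*'. */
static int is_state(char s) {
    return s == '-' || s == '0' || s == '1' || s == '*';
}

/* Returns 1 if a predicate whose actual state is `actual` could be
   reported as `sampled` once some observations are skipped. */
int sampled_state_possible(char actual, char sampled) {
    assert(is_state(actual) && is_state(sampled));
    return sampled == '-'      /* every observation may have been skipped */
        || sampled == actual   /* enough observations were kept */
        || actual == '*';      /* keeping only trues or only falses gives 1 or 0 */
}
```

Enumerating all sixteen pairs with this function marks exactly nine combinations as possible, which is the number of check marks in the answer.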

Page 29:

Overall Architecture Revisited

[Figure: architecture diagram with the components Program Source, Guesses, Compiler, Sampler, Deployed Application, Profile + success/failure label, and Statistical Debugging; the output is the top bugs with likely causes]

Page 30:

Finding Causes of Bugs

• We gather information about many predicates
  – ≈ 300K for bc (“bench calculator” program on Unix)

• Most of these are not predictive of anything

• How do we find the few useful predicates?

Page 31:

Finding Causes of Bugs

How likely is failure when predicate P is observed to be true?

F(P) = # failing runs where P is observed to be true

S(P) = # successful runs where P is observed to be true

Failure(P) = F(P) / (F(P) + S(P))

Example: F(P) = 20, S(P) = 30 ⇒ Failure(P) = 20/50 = 0.4

Page 32:

Tracking Failure Is Not Enough

  if (f == NULL) {
    x = foo();
    *f;
  }

  int foo() {
    return 0;
  }

Failure(f == NULL) = 1.0

Failure(x == 0) = 1.0

Predicate x == 0 is an innocent bystander
• Program is already doomed

Page 33:

Tracking Context

How likely is failure when predicate P is observed at all, whether true or false?

F(P observed) = # failing runs observing P

S(P observed) = # successful runs observing P

Context(P) = F(P observed) / (F(P observed) + S(P observed))

Example: F(P observed) = 40, S(P observed) = 80 ⇒ Context(P) = 40/120 = 0.333…

Page 34:

A Useful Measure: Increase()

Does P being true increase chance of failure over the background rate?

Increase(P) = Failure(P) - Context(P)

• A form of likelihood ratio testing

• Increase(P) ≈ 1 ⇒ High correlation with failing runs
• Increase(P) ≈ -1 ⇒ High correlation with successful runs
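The three measures translate directly into code. The sketch below assumes the four run counts have already been tallied; the struct and function names are hypothetical, and the asserts simply refuse the undefined case of a zero denominator.

```c
#include <assert.h>

typedef struct {
    unsigned f_true;      /* F(P): failing runs where P was observed true */
    unsigned s_true;      /* S(P): successful runs where P was observed true */
    unsigned f_observed;  /* F(P observed): failing runs that observed P */
    unsigned s_observed;  /* S(P observed): successful runs that observed P */
} pred_counts;

/* Failure(P) = F(P) / (F(P) + S(P)) */
double failure_score(pred_counts c) {
    assert(c.f_true + c.s_true > 0);  /* undefined if P was never true */
    return (double)c.f_true / (double)(c.f_true + c.s_true);
}

/* Context(P) = F(P observed) / (F(P observed) + S(P observed)) */
double context_score(pred_counts c) {
    assert(c.f_observed + c.s_observed > 0);  /* undefined if P was never observed */
    return (double)c.f_observed / (double)(c.f_observed + c.s_observed);
}

/* Increase(P) = Failure(P) - Context(P) */
double increase_score(pred_counts c) {
    return failure_score(c) - context_score(c);
}
```

Taking the numbers from the two earlier examples together (F(P) = 20, S(P) = 30, F(P observed) = 40, S(P observed) = 80) gives Failure(P) = 0.4 and Context(P) ≈ 0.333, so Increase(P) ≈ 0.067: P being true raises the chance of failure only slightly above the background rate.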

Page 35:

Increase() Works

  if (f == NULL) {
    x = foo();
    *f;
  }

  int foo() {
    return 0;
  }

1 failing run: f == NULL
2 successful runs: f != NULL

  Predicate    Failure   Context   Increase
  f == NULL    1.00      0.33      0.67
  x == 0       1.00      1.00      0.00

Page 36:

QUIZ: Computing Increase()

                        “bba”   “bbb”   Failure   Context   Increase
  c == 'a'              *       0
  c != 'a'              *       1       0.5       0.5       0.0
  i < 3                 1       *
  i >= 3                0       *
  Outcome label (S/F)   F       S

Page 37:

QUIZ: Computing Increase()

                        “bba”   “bbb”   Failure   Context   Increase
  c == 'a'              *       0       1.0       0.5       0.5
  c != 'a'              *       1       0.5       0.5       0.0
  i < 3                 1       *       0.5       0.5       0.0
  i >= 3                0       *       0.0       0.5       -0.5
  Outcome label (S/F)   F       S

Page 38:

Isolating the Bug

              Increase
  c == 'a'    0.5
  c != 'a'    0.0
  i < 3       0.0
  i >= 3      -0.5

  void main() {
    int z;
    for (int i = 0; i < 3; i++) {
      char c = getc();
      if (c == 'a') z = 0;
      else z = 1;
      assert(z == 1);
    }
  }

Page 39:

A First Algorithm

1. Discard predicates having Increase(P) ≤ 0
   – e.g. bystander predicates, predicates correlated with success
   – Exact value is sensitive to few observations
   – Use lower bound of 95% confidence interval

2. Sort remaining predicates by Increase(P)
   – Again, use 95% lower bound
   – Likely causes with determinacy metrics

Page 40:

Isolating the Bug

  void main() {
    int z;
    for (int i = 0; i < 3; i++) {
      char c = getc();
      if (c == 'a') z = 0;
      else z = 1;
      assert(z == 1);
    }
  }

              Increase
  c == 'a'    0.5
  c != 'a'    0.0
  i < 3       0.0
  i >= 3      -0.5

Page 41:

Isolating a Single Bug in bc

  void more_arrays()
  {
    ...
    /* Copy the old arrays. */
    for (indx = 1; indx < old_count; indx++)
      arrays[indx] = old_ary[indx];

    /* Initialize the new elements. */
    for (; indx < v_count; indx++)
      arrays[indx] = NULL;
    ...
  }

#1: indx > scale
#2: indx > use_math
#3: indx > opterr
#4: indx > next_func
#5: indx > i_base

Page 42:

It Works!

• At least for programs with a single bug

• Real programs typically have multiple, unknown bugs

• Redundancy in the predicate list is a major problem

Page 43:

Using the Information

• Multiple useful metrics: Increase(P), Failure(P), F(P), S(P)

• Organize all metrics in compact visual (bug thermometer)

Page 44:

Sample Report

Page 45:

Multiple Bugs: The Goal

Find the best predictor for each bug,

without prior knowledge of the number of bugs,

sorted by the importance of the bugs.

Page 46:

Multiple Bugs: Some Issues

• A bug may have many redundant predictors
  – Only need one
  – But would like to know correlated predictors

• Bugs occur on vastly different scales
  – Predictors for common bugs may dominate, hiding predictors of less common problems

Page 47:

An Idea

• Simulate the way humans fix bugs

• Find the first (most important) bug

• Fix it, and repeat

Page 48:

Revised Algorithm

Repeat
  Step 1: Compute Increase(), F(), etc. for all predicates
  Step 2: Rank the predicates
  Step 3: Add the top-ranked predicate P to the result list
  Step 4: Remove P and discard all runs where P is true
    • Simulates fixing the bug corresponding to P
    • Discard reduces rank of correlated predicates
Until no runs are left
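The loop above can be sketched in C under some simplifying assumptions that the slide does not specify: predicates are ranked by plain Increase(P) rather than a confidence-interval lower bound, the loop also stops once no predicate has Increase(P) > 0 (which covers the case of no runs being left), and the data layout and the names isolate_bugs, increase_over_active, and MAX_RUNS are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RUNS 64  /* arbitrary cap for this sketch */

/* Tables are row-major: entry [r * num_preds + p] is nonzero if predicate p
   was observed (observed) or observed true (seen_true) in run r. */
static double increase_over_active(size_t p, size_t num_preds, size_t num_runs,
                                   const int *observed, const int *seen_true,
                                   const int *failed, const int *active) {
    unsigned f_true = 0, s_true = 0, f_obs = 0, s_obs = 0;
    for (size_t r = 0; r < num_runs; r++) {
        if (!active[r]) continue;  /* run was discarded earlier */
        if (observed[r * num_preds + p]) { if (failed[r]) f_obs++; else s_obs++; }
        if (seen_true[r * num_preds + p]) { if (failed[r]) f_true++; else s_true++; }
    }
    if (f_true + s_true == 0 || f_obs + s_obs == 0)
        return -1.0;  /* no evidence left: rank last */
    return (double)f_true / (f_true + s_true) - (double)f_obs / (f_obs + s_obs);
}

/* Writes the chosen predicate indices to result (room for num_preds entries)
   and returns how many were chosen. */
size_t isolate_bugs(size_t num_preds, size_t num_runs,
                    const int *observed, const int *seen_true,
                    const int *failed, size_t *result) {
    int active[MAX_RUNS];
    size_t chosen = 0;
    assert(num_runs <= MAX_RUNS);
    for (size_t r = 0; r < num_runs; r++) active[r] = 1;

    for (;;) {
        /* Steps 1 and 2: score every predicate and keep the best one. */
        size_t best = 0;
        double best_score = 0.0;
        int found = 0;
        for (size_t p = 0; p < num_preds; p++) {
            double s = increase_over_active(p, num_preds, num_runs,
                                            observed, seen_true, failed, active);
            if (s > best_score) { best = p; best_score = s; found = 1; }
        }
        if (!found) break;  /* assumed stop rule: nothing predicts failure */

        /* Step 3: report the top-ranked predicate. */
        result[chosen++] = best;

        /* Step 4: discard every run in which that predicate was true. */
        for (size_t r = 0; r < num_runs; r++)
            if (seen_true[r * num_preds + best]) active[r] = 0;
    }
    return chosen;
}
```

Each pass removes at least one failing run, so the loop terminates. A predicate that is merely correlated with an already-reported one loses its failing runs in Step 4 and drops out of the ranking, which is the effect the slide describes.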

Page 49:

Ranking by Increase(P)

Problem: High Increase() scores but few failing runs!

Sub-bug predictors: covering special cases of more general bugs

Page 50:

Ranking by F(P)

Problem: Many failing runs but low Increase() scores!

Super-bug predictors: covering several different bugs together

Page 51:

A Helpful Analogy

• In the language of information retrieval
  – Precision = fraction of retrieved instances that are relevant
  – Recall = fraction of relevant instances that are retrieved

• In our setting:
  – Retrieved instances ~ predicates reported as bug predictors
  – Relevant instances ~ predicates that are actual bug predictors

• Trivial to achieve only high precision or only high recall
• Need both high precision and high recall

Page 52:

Combining Precision and Recall

• Increase(P) has high precision, low recall

• F(P) has high recall, low precision

• Standard solution: take the harmonic mean of both

• Rewards high scores in both dimensions

  Harmonic mean = 2 / (1/Increase(P) + 1/F(P))
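A small sketch of the harmonic mean as written on this slide. The slide does not say how F(P), a run count, is scaled against Increase(P), so the function takes both inputs as given; returning 0 for a non-positive input is an assumption made here to avoid dividing by zero, and the function name is hypothetical.

```c
#include <assert.h>

/* Harmonic mean of Increase(P) and F(P): 2 / (1/Increase(P) + 1/F(P)). */
double harmonic_mean(double increase, double f) {
    assert(f >= 0.0);  /* F(P) counts runs, so it cannot be negative */
    if (increase <= 0.0 || f <= 0.0)
        return 0.0;    /* assumption: no credit without a positive score in both */
    return 2.0 / (1.0 / increase + 1.0 / f);
}
```

For inputs on a common scale, harmonic_mean(0.5, 0.5) is 0.5 while harmonic_mean(0.9, 0.1) is only 0.18: a predicate has to do well on both measures to rank highly.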

Page 53:

Sorting by the Harmonic Mean

It works!

Page 54:

What Have We Learned?

• Monitoring deployed code to find bugs

• Observing predicates as model of program behavior

• Sampling instrumentation framework

• Metrics to rank predicates by importance: Failure(P), Context(P), Increase(P), ...

• Statistical debugging algorithm to isolate bugs

Page 55:

Key Takeaways 1 of 2

• A lot can be learned from actual executions
  – Users are executing them anyway
  – We should capture some of that information

Page 56:

Key Takeaways 2 of 2

• Crash reporting is a step in the right direction
  – But stack is useful for only about 50% of bugs
  – Doesn’t characterize successful runs

• But this is changing ...

[Figure: screenshot of permission to report crash]

[Figure: screenshot of permission to monitor all runs]