IMPROVED FITNESS FUNCTIONS FOR AUTOMATED PROGRAM REPAIR ZACHARY P. FRY

Post on 24-Dec-2015

IMPROVED FITNESS FUNCTIONS

Automatic program repair can fix bugs.

[Diagram: GenProg takes Bugs and produces Fixes; Fitness Functions are shown as part of GenProg]

• The current fitness model is imprecise

Ideas:

• Not all test cases are created equal

• Test cases may not describe all relevant program behavior

• Different types of bugs might benefit from different kinds of fixes

We propose to address the naivety of the current fitness representation.

FITNESS DISTANCE CORRELATION

“Quantifying the extent to which a GA fitness function approaches an ideal of heuristic search” [1]

Informally, does a given fitness function produce values that correlate with some grounded notion of “closeness to a fix”?

[1] T. Jones and S. Forrest. Fitness distance correlation as a measure of problem difficulty for genetic algorithms. In International Conference on Genetic Algorithms, pages 184–192, 1995.
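
The correlation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming fitness distance correlation is computed as the Pearson correlation between each sampled candidate's fitness and its distance to a known fix, following Jones and Forrest [1]. The function name and the sample values are hypothetical and are not taken from the talk.

```python
import math

def fitness_distance_correlation(fitness, distance):
    """Pearson correlation between fitness values and distances to a known fix."""
    n = len(fitness)
    if n != len(distance) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_f = sum(fitness) / n
    mean_d = sum(distance) / n
    # Population covariance and standard deviations.
    cov = sum((f - mean_f) * (d - mean_d) for f, d in zip(fitness, distance)) / n
    std_f = math.sqrt(sum((f - mean_f) ** 2 for f in fitness) / n)
    std_d = math.sqrt(sum((d - mean_d) ** 2 for d in distance) / n)
    if std_f == 0 or std_d == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / (std_f * std_d)

# Hypothetical sample: fitness rises exactly as distance to the fix falls.
ideal = fitness_distance_correlation([1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0])
print(round(ideal, 6))  # -1.0
```

When fitness is maximized, a well-behaved fitness function gives a value near -1 against distance, which is the same as a value near +1 against "closeness to a fix".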

• Measuring proximity to a fix
  • Inserting, deleting, and swapping lines in the program

[Figure: mutants M1 and M2 placed on an axis from NO FIX to FIX]

Fix: i(251,205)    i(774,111)    s(598,324)    d(135)
M1:  i(251,205) ✓  i(774,111) ✓  s(598,324) ✓  d(63) ✗       75%
M2:  d(84) ✗       s(844,265) ✗  i(774,111) ✓  i(735,431) ✗  25%
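
The percentages on this slide can be reproduced with a short sketch. It assumes that the fix consists of the four edits shown (three edits plus d(135)) and that proximity is the fraction of the fix's edits that also appear in a mutant. The function name is hypothetical, and the talk may define the measure differently.

```python
from collections import Counter

def proximity_to_fix(mutant_edits, fix_edits):
    """Fraction of the fix's edits that also appear in the mutant."""
    if not fix_edits:
        raise ValueError("fix must contain at least one edit")
    # Multiset intersection, so a repeated edit is only matched as often
    # as it occurs in both lists.
    shared = Counter(mutant_edits) & Counter(fix_edits)
    return sum(shared.values()) / len(fix_edits)

# Edit lists copied from the slide: i = insert, d = delete, s = swap.
fix = ["i(251,205)", "i(774,111)", "s(598,324)", "d(135)"]
m1 = ["i(251,205)", "i(774,111)", "s(598,324)", "d(63)"]
m2 = ["d(84)", "s(844,265)", "i(774,111)", "i(735,431)"]

print(proximity_to_fix(m1, fix))  # 0.75
print(proximity_to_fix(m2, fix))  # 0.25
```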

The current model of fitness does not correlate well with proximity to a fix (correlation: 0.145).

Hypothesis: By taking into account previously unused information about test cases, bugs, and fixes, we can better inform the evolutionary bug-fixing process so that it fixes bugs faster and more often.

Approach: weight test cases based on known fixes

[Figure: Test Case 1 and Test Case 2, each shown with mutants M1–M4 on a FIX / NO FIX scale; the two test cases are assigned weights of 0.8 and 0.2]
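
A weighted fitness function along these lines might look like the sketch below. It assumes fitness is the weighted fraction of passing test cases. The weights 0.8 and 0.2 come from the slide, but which test case receives which weight, as well as all names and the mutant outcomes, are assumptions made for illustration.

```python
def weighted_fitness(passed, weights):
    """Weighted fraction of passing tests.

    Reduces to the plain pass rate when all weights are equal.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    earned = sum(w for name, w in weights.items() if passed.get(name, False))
    return earned / total

# Assumption: Test Case 1 gets 0.8 and Test Case 2 gets 0.2.
weights = {"test_case_1": 0.8, "test_case_2": 0.2}
uniform = {"test_case_1": 0.5, "test_case_2": 0.5}

# Hypothetical mutants that each pass exactly one test.
mutant_a = {"test_case_1": True, "test_case_2": False}
mutant_b = {"test_case_1": False, "test_case_2": True}

print(weighted_fitness(mutant_a, uniform), weighted_fitness(mutant_b, uniform))
print(weighted_fitness(mutant_a, weights), weighted_fitness(mutant_b, weights))
```

Under uniform weights the two mutants are indistinguishable; under the learned weights the mutant that passes the more informative test ranks higher, which is the extra signal the approach aims to give the search.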

Evaluation:

• How much can we speed up fixes?
  • Computational time and monetary cost
  • Preliminary results

• How many more bugs can we fix?
  • Fraction of previously unfixed bugs
  • Future work

PRELIMINARY RESULTS

For a sample of 15 bugs from one program, 31.3% of test cases show no correlation with actual fitness (closeness to a fix).

Bug                            Avg. percent time savings
libtiff-bug-0fb6cf7-b4158fa    24.47%
libtiff-bug-01209c9-aaf9eb3    50.49%
libtiff-bug-10a4985-5362170    35.53%
libtiff-bug-5b02179-3dfb33b    22.08%
libtiff-bug-8f6338a-4c5a9ec    62.24%
Total                          38.96%
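
The "Total" figure appears to be the arithmetic mean of the five per-bug savings rather than a sum. A quick check, using only the numbers listed above (the variable names are illustrative):

```python
# Average percent time savings per bug, copied from the table above.
savings = {
    "libtiff-bug-0fb6cf7-b4158fa": 24.47,
    "libtiff-bug-01209c9-aaf9eb3": 50.49,
    "libtiff-bug-10a4985-5362170": 35.53,
    "libtiff-bug-5b02179-3dfb33b": 22.08,
    "libtiff-bug-8f6338a-4c5a9ec": 62.24,
}

mean_savings = sum(savings.values()) / len(savings)
print(f"{mean_savings:.2f}%")  # 38.96%
```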

Some test cases are over 23x more correlated with actual fitness than others
• Suggests that an adequate weighting scheme using machine learning could fix more bugs, faster

This work and additional efforts to investigate other strategies for improving fitness functions are ongoing.
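
One simple way to turn per-test correlations into weights is sketched below. It is a plain correlation-based heuristic, not the machine-learning scheme the slide alludes to. It assumes historical mutants with known proximity to a fix, correlates each test's pass/fail outcomes with that proximity, clips negative correlations to zero, and normalizes. All names and data are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def weights_from_history(outcomes, proximity):
    """Map each test name to a normalized weight.

    outcomes: test name -> list of 0/1 pass results, one per historical mutant.
    proximity: closeness of each of those mutants to the known fix.
    """
    # Assumption: tests that correlate negatively carry no useful signal.
    raw = {name: max(0.0, pearson(results, proximity))
           for name, results in outcomes.items()}
    total = sum(raw.values())
    if total == 0:
        return {name: 1.0 / len(raw) for name in raw}  # fall back to uniform
    return {name: r / total for name, r in raw.items()}

# Hypothetical history of five mutants.
proximity = [0.0, 0.25, 0.5, 0.75, 1.0]
outcomes = {
    "t_informative": [0, 0, 1, 1, 1],
    "t_weak":        [0, 1, 0, 1, 1],
    "t_always_pass": [1, 1, 1, 1, 1],
}
learned = weights_from_history(outcomes, proximity)
print({name: round(w, 2) for name, w in learned.items()})
```

A test that always passes gets zero weight here, which mirrors the observation that some test cases show no correlation with actual fitness.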

APPLICABILITY

• When might this work?
  • Programs with expensive test suites, e.g. PHP (12,000+ tests)
  • When there is heavy overlap between test cases
  • Test suites/cases that fail to specify the bug

• Assumptions?
  • Presence of historical bug fix data to mine
  • Test suites do not evolve drastically from bug to bug
  • Bugs for a given program are related on some level

GOALS

• By providing GenProg a better signal for mutants’ fitness we hope to:
  • Better direct the search: arrive at fixes faster, lowering cost (38.96% average time savings in the preliminary results)
  • In the limit, find more fixes for previously unfixed bugs

QUESTIONS?