

Automated Debugging of Programs with Contracts

Yi Wei, Yu Pei, Carlo A. Furia, Lucas S. Silva, Stefan Buchholz, Bertrand Meyer
Chair of Software Engineering, ETH Zürich, Switzerland
{yi.wei, yu.pei, caf, Bertrand.Meyer}@inf.ethz.ch, {slucas, bustefan}@student.ethz.ch

Andreas Zeller
Software Engineering Chair, Saarland University, Saarbrücken, Germany
[email protected]

    ABSTRACT

In program debugging, finding a failing run is only the first step; what about correcting the fault? Can we automate the second task as well as the first? The AutoFix-E tool automatically generates and validates fixes for software faults. The key insights behind AutoFix-E are to rely on contracts present in the software to ensure that the proposed fixes are semantically sound, and on state diagrams using an abstract notion of state based on the boolean queries of a class.

Out of 42 faults found by an automatic testing tool in two widely used Eiffel libraries, AutoFix-E proposes successful fixes for 16 faults. Submitting some of these faults to experts shows that several of the proposed fixes are identical or close to fixes proposed by humans.

    Categories and Subject Descriptors

D.2.5 [Software Engineering]: Testing and Debugging – debugging aids, diagnostics; D.2.4 [Software Engineering]: Software Verification – programming by contract, assertion checkers, reliability; F.3.1 [Logics and Meanings of Programs]: Specifying and Verifying and Reasoning about Programs – pre- and post-conditions; I.2.2 [Artificial Intelligence]: Automatic Programming – program modification

    General Terms

    Reliability, Verification

    Keywords

Automated debugging, automatic fixing, program synthesis, dynamic invariants

    1. INTRODUCTION

The programmer's ever-recommencing fight against error involves two tasks: finding faults, and correcting them. Both are in dire need of at least partial automation.


Automatic testing tools are becoming available to address the first goal. Recently, some progress has also been made towards tools that can propose automatic corrections. Such is in particular the goal of the AutoFix project, of which the tool described here, AutoFix-E, is one of the first results. The general idea is, once a testing process (automatic, manual, or some combination) has produced one or more execution failures, reflecting a fault in the software, to propose corrections (fixes) for the fault. The important question of how to use such corrections (as suggestions to the programmer, or, more boldly, for automatic patching) will not be addressed here; we focus on the technology for finding good potential corrections.

The work reported here is part of a joint project, AutoFix, between ETH Zurich and Saarland University. The tool is called AutoFix-E, to reflect that it has been developed for Eiffel, although the concepts are applicable to any other language natively equipped with contracts (such as Spec#) or a contract extension of an existing language (such as JML for Java). The AutoFix project builds on the previous work on automatic testing at ETH, leading to the AutoTest tool [20], and on work on automatic debugging at Saarbrücken, leading in particular to the Pachika tool [5].

While AutoFix-E is only an initial step towards automatic fault correction, the first results, detailed in Section 4, are encouraging. In one experiment, we ran AutoFix-E on 42 errors found in two important libraries (EiffelBase and Gobo) by the AutoTest automatic testing framework [20]. These libraries are production software, available for many years and used widely by Eiffel applications, including ones with up to 2 million lines of code. The use of an automatic testing tool (rather than a manual selection of faults found by human testers) is a protection against bias. AutoFix-E succeeded in proposing valid corrections for 16 of the faults. We then showed some failed executions to a few highly experienced programmers and asked them to propose their own corrections; in several cases the results were the same.

The remainder of this paper is organized as follows: Section 2 illustrates AutoFix-E from a user's perspective on an example program. Section 3 describes the individual steps of our approach. Section 4 evaluates the effectiveness of the proposed fixes. After discussing related work in Section 5, Section 6 presents future work. All the elements necessary to examine and reproduce these and other results described in this article are available for download, as discussed in Section 7: the source code of AutoFix-E, detailed experimental results, and instructions to rerun the experiments.


Listing 1: Calling duplicate when before holds violates the precondition of item.

     1  duplicate (n: INTEGER): like Current
     2      -- Copy of subset beginning at cursor position
     3      -- and having at most n elements.
     4    require
     5      non_negative: n >= 0
     6    local
     7      pos: CURSOR
     8      counter: INTEGER
     9    do
    10      pos := cursor
    11      create Result.make
    12      from until (counter = n) or else after loop
    13        Result.put_left (item)
    14        forth
    15        counter := counter + 1
    16      end
    17      go_to (pos)
    18    end
    19
    20  item: G
            -- Current item
    21    require
    22      not_off: (not before) and (not after)

2. AN AUTOFIX EXAMPLE

Let us first demonstrate AutoFix-E from a user's perspective. Listing 1 shows an excerpt of the EiffelBase class TWO_WAY_SORTED_SET, a collection of objects, with a cursor iterating over the collection. The cursor may point before the first or after the last element; these conditions can be queried as boolean functions before and after.

The duplicate routine takes a nonnegative argument n and returns a new set containing at most n elements starting from the current position of cursor. After saving the current position of cursor as pos and initializing an empty set as Result, duplicate enters a loop that navigates the set data structure and iteratively adds the current item to Result, calls forth to move to the next element in the set, and updates the counter.

During automated testing, AutoTest discovered a defect in duplicate. The precondition of routine item (also shown in Listing 1) requires that not before and not after both hold, that is, the cursor is neither before the first element nor after the last. The loop exit condition, however, only enforces the second condition. So if a call to duplicate occurs when before holds, the call to item will violate item's precondition, resulting in a failure.

For this failure, AutoFix-E automatically generates the fix shown in Listing 2, replacing lines 13–15 in Listing 1. The fix properly checks for the previously failing condition (before) and, by calling forth, brings the program into a state such that the problem does not occur.

How does AutoFix-E generate this fix? Two properties in particular distinguish AutoFix-E from previous work in the area, namely the reliance on contracts and on boolean query abstraction:

Contracts associate specifications, usually partial, with software elements such as classes or routines (methods). The work on AutoTest has already shown that contracts significantly boost automatic testing [4].

Listing 2: Fixed loop body generated by AutoFix-E.

    13  if before then
    14    forth
    15  else
    16    Result.put_left (item)
    17    forth
    18    counter := counter + 1
    19  end

AutoFix-E illustrates a similar benefit for automatic correction: as a failure is defined by a contract violation, knowledge about the contract helps generate a fix not just by trying out various syntactical possibilities, but by taking advantage of deep semantic knowledge about the intent of the faulty code. For example, if the evidence for the failure is that an assertion q is violated, and analysis determines that a certain routine r has q as part of its postcondition, adding a call to r may be a valid fix.

Boolean queries of a class (boolean-valued functions without arguments) yield a partition of the corresponding object state into a much smaller set of abstract states. This partition has already proved useful for testing [15], and it also helps for automatic correction, by providing the basis for state/transition models.

These two properties form the core of our approach, detailed in the following section.

3. HOW AUTOFIX WORKS

Let us now describe the detailed steps of our approach. Figure 1 summarizes the individual steps from failure to fix, detailed in the following subsections.

3.1 Contracts and Correctness

AutoFix-E works on Eiffel classes [8] equipped with contracts (made of assertions [19]). Contracts constitute the specification of a class and consist of preconditions (require), postconditions (ensure), intermediate assertions (check), and class invariants (translated into additional postcondition and precondition clauses in all the examples).

Contracts provide a criterion to determine the correctness of a routine: every execution of a routine starting in a state satisfying the precondition (and the class invariant) must terminate in a state satisfying the postcondition (and the class invariant); every intermediate assertion must hold in any execution that reaches it; every call to another routine must occur in a state satisfying that routine's precondition.

3.2 Testing and Fault Discovery

Testing a routine with various inputs can reveal a fault in the form of an assertion violation. Testing can be manual, with the programmer providing a number of test cases exercising a routine in ways that may make it fail, or automatic. In previous work, we developed AutoTest [20], a fully automated random testing framework for Eiffel classes. AutoTest generates as many calls as possible in the available testing time, and reports faults in the form of minimal-length failing runs. In practice, AutoTest regularly finds faults in production software [4].


[Figure 1 diagram: Eiffel Class (3.1) → Test Case Runs (3.2) → Object States (3.3) → Fault Profile (3.4) and Behavior Models (3.5) → Fix Candidates (3.6–7) → Validated Fixes (3.8)]

Figure 1: How AutoFix-E works. We take an Eiffel class (Section 3.1) and generate test cases with AutoTest (Section 3.2). From the runs, we extract object states using boolean queries (Section 3.3). By comparing the states of passing and failing runs at the failing location, we generate a fault profile (Section 3.4), an indication of what went wrong in terms of abstract object state. From the state transitions in passing runs, we generate a finite-state behavioral model (Section 3.5), capturing the normal behavior in terms of control. Both control and state guide the generation of fix candidates (Section 3.6), with special treatment of linear assertions (Section 3.7). Only those fixes passing the regression test suite remain (Section 3.8).

The faults found by AutoTest are a natural input to AutoFix-E; the integration of the two tools is intended to provide a completely automated push-button solution to the testing and debugging phases of software development. AutoFix-E can, however, work with any set of test cases exposing a fault, whether these cases have been derived automatically or manually.

3.3 Assessing Object State

To generate fixes, we first must reason about where the failure came from. For this purpose, we assess object state by means of boolean queries.

    3.3.1 Argument-less Boolean Queries

A class is usually equipped with a set of argument-less, boolean-valued functions (called boolean queries from now on), defining key properties of the object state: a list is empty or not, the cursor is on an item or off all items, a checking account is overdrawn or not. n boolean queries for a class define a partition of the corresponding object state space into 2^n abstract states. As n is rarely higher than about 15 in practice, the resulting size is not too large, unlike the concrete state space, which is typically intractable. Intuitively, such a partition is attractive because, for well-designed classes, boolean queries characterize fundamental properties of the objects. Our earlier work [15] confirms this intuition for testing applications, showing that high boolean query coverage correlates with high testing effectiveness; our earlier work on fix generation [5] also relied on similar abstractions. The reasons why state abstraction is effective appear to be the following:

Being argument-less, boolean queries describe the object state absolutely, as opposed to in relation to some given arguments.

Boolean queries usually do not have any precondition, hence they possess a definite value in any state. This feature is crucial for obtaining a detailed object state at any point in a run.

Boolean queries are widely used in Eiffel contracts, which suggests that they model important properties of a class.

In all, it is very likely that there exists some boolean query whose value in failing runs is different than in error-free runs.

We have seen some examples of useful boolean queries in Section 2: before and after describe whether the internal cursor is before the first element or after the last element in an ordered set. is_empty is another boolean query of the class which, as the name indicates, is true whenever the set contains no elements.
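To make the abstraction concrete, here is a minimal Python sketch (an illustration only; AutoFix-E itself works on Eiffel classes). The SortedSet class and its cursor representation are hypothetical stand-ins for TWO_WAY_SORTED_SET:

    # Boolean-query state abstraction: n argument-less boolean queries
    # map each concrete object to one of at most 2**n abstract states.

    def abstract_state(obj, queries):
        """Tuple of boolean-query values for the given object."""
        return tuple(getattr(obj, q)() for q in queries)

    class SortedSet:
        # Hypothetical stand-in for the Eiffel class.
        def __init__(self, items):
            self.items = list(items)
            self.index = 0                       # 0 = before; len + 1 = after
        def before(self):   return self.index == 0
        def after(self):    return self.index == len(self.items) + 1
        def is_empty(self): return not self.items

    print(abstract_state(SortedSet([]), ['before', 'after', 'is_empty']))
    # (True, False, True)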

3.3.2 Complex Predicates

The values of boolean queries are often correlated. Our experience suggests that implications can express most correlations between boolean queries. For example, executing the routine forth on an empty set changes after to true; in other words, the implication

    is_empty implies after

characterizes the effect of routine forth. Such implications are included as predicates for the object state model, because they are likely to make the abstraction more accurate.

Trying out all possible implications between pairs of boolean queries is impractical and soon leads to a huge number of often irrelevant predicates. We therefore use two sources for implications:

Contracts. We mine the contracts of the class under analysis for implications; these implications are likely to capture connections between boolean queries.

Mutations. We also found it useful to generate three mutations for each implication, obtained by negating the antecedent, the consequent, or both. These mutations are often helpful in capturing the object state in faulty runs.

In our ongoing example, the implication is_empty implies after mutates into: (1) not is_empty implies after; (2) is_empty implies not after; and (3) not is_empty implies not after.

The resulting set of predicates is called the predicate set and denoted by P; P̄ denotes instead the set P ∪ {not p | p ∈ P}, including all predicates in the predicate set and their negations.
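The mutation step can be spelled out in a few lines; the sketch below assumes implications are represented as plain antecedent/consequent strings, a simplification of the real predicate representation:

    from itertools import product

    def mutations(antecedent, consequent):
        """The three mutants of 'antecedent implies consequent',
        obtained by negating antecedent, consequent, or both."""
        result = []
        for neg_a, neg_c in product([False, True], repeat=2):
            if not (neg_a or neg_c):
                continue                         # skip the original implication
            a = ('not ' if neg_a else '') + antecedent
            c = ('not ' if neg_c else '') + consequent
            result.append(a + ' implies ' + c)
        return result

    print(mutations('is_empty', 'after'))
    # ['is_empty implies not after', 'not is_empty implies after',
    #  'not is_empty implies not after']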

    3.3.3 Predicate Pruning

The collection of boolean queries and implications usually contains redundancies in the form of predicates that are co-implied (i.e., they are always both true or both false) or contradictory (i.e., if one of them is true the other is false and vice versa). Some predicates may also be subsumed by


class invariants. Redundant predicates increase the size of the predicate set without providing additional information.

To prune redundant predicates out of the predicate set, an automated theorem prover is the tool of choice. We use Z3 [6] to check whether a collection of predicates is contradictory or valid, and iteratively remove redundant predicates until we obtain a suitable set, neither contradictory nor valid. While there is in general no guarantee that this procedure gives a minimal set, the relationships among predicates are sufficiently simple that this unsophisticated approach gives good results.
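As an illustration of such checks, here is a minimal sketch using Z3's Python bindings; the background axioms, standing for facts mined from contracts and invariants, are assumed given:

    from z3 import Bool, Implies, Not, Solver, unsat

    def valid_under(axioms, formula):
        """A formula is valid under the axioms iff its negation is
        unsatisfiable together with them."""
        s = Solver()
        s.add(axioms)
        s.add(Not(formula))
        return s.check() == unsat

    before, off = Bool('before'), Bool('off')
    axioms = [Implies(before, off)]        # e.g. mined from a class invariant

    print(valid_under(axioms, Implies(before, off)))   # True: redundant
    print(valid_under(axioms, before == off))          # False: not co-implied

A predicate pair is co-implied when their equivalence is valid, and contradictory when their conjunction is unsatisfiable; redundant members of such pairs are then dropped iteratively.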

3.4 Fault Analysis

Once we have extracted the predicates, we need to characterize the crucial difference between the passing and failing runs. To characterize this difference, AutoFix-E runs Daikon [11] to find out which of the predicates in P̄ (determined in Section 3.3) hold over all passing and failing runs, respectively.¹ When these state invariants are taken immediately before the instruction whose execution triggers the failure, they characterize the passing state and the failing state, respectively; their difference is what characterizes the failure cause.

    3.4.1 Fault Profiles

We first consider all passing runs and infer a state invariant at each program location that is executed. Then, we perform the same task with the set of failing runs for the bug currently under analysis; the failing runs cannot continue past the location of failure, hence the sequence of invariants will also stop at that location. The results of this phase are two sequences of invariants, a pair for each executed location up to the point of failure.

Comparing these pairs of failing and passing states results in a fault profile. The fault profile contains all predicates that hold in the passing run, but not in the failing run. This is an alleged indication of what went wrong in the failing run in terms of abstract object state.

In the case of routine duplicate, AutoFix-E finds out that

before the location of the exception, the following state invariants hold only in failing test cases, and thus are part of the fault profile:

    before        exhausted        off

Any of these three predicates may be the cause of the precondition violation. To narrow down the cause further, we use two heuristics:

1. We keep only predicates in the failing state invariants that imply the negation of the violated assertion.

2. Among those remaining predicates, we attempt to find a strongest one (if possible).

Again, Z3 does the reasoning. In our example, all three predicates imply the negation of the violated precondition, so the first heuristic does not help. The second heuristic, though, finds out that before implies off and before implies exhausted. Being the antecedent in both implications, before is very likely to be the cause of the assertion violation.

¹ While Daikon also reports other properties as invariants, we are only interested in the boolean query predicates as discussed in Section 3.3.

Formally, the fault profile is constructed as follows. For a given location ℓ, let I⁺_ℓ and I⁻_ℓ be the sets of predicates characterizing the passing and failing state, respectively. The fault profile at location ℓ is the set of all predicates p for which p ∈ I⁺_ℓ and p ∉ I⁻_ℓ holds; it characterizes potential causes of errors at ℓ.
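In code, the profile is a plain set difference; this sketch assumes the per-location invariant sets have already been inferred by the Daikon step:

    def fault_profile(passing_invariants, failing_invariants):
        """Predicates holding in the passing state but not in the
        failing state: alleged causes of the failure."""
        return set(passing_invariants) - set(failing_invariants)

    passing = {'not before', 'not off'}
    failing = {'before', 'off'}
    print(fault_profile(passing, failing))   # {'not before', 'not off'}
    # Since the predicate set is closed under negation, these are the
    # negations of the failing-only predicates of the running example.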

    3.4.2 Fixing as a Program Synthesis Problem

Using the fault profile at the program point where a failure occurs, we can formulate the problem of finding a fix as a program synthesis problem. Let fail_state and pass_state be the invariants characterizing the failing and passing runs, respectively. Any piece of code that can drive the program state from fail_state to pass_state is a candidate fix which, if executed at the point of failure, would permit normal continuation of the failing run. Formally, we are looking for a program fix that satisfies the specification:

    require fail_state do fix ensure pass_state

The underlying assumption is that program fix is in general sufficiently simple, consisting of just a few lines of code, and that, consequently, it can be generated automatically.

3.5 Behavioral Models

Now that we can characterize the difference between passing and failing state, we need to know how to reach the passing state, that is, how to synthesize a fix. Building on our previous experiences with Pachika [5], AutoFix-E extracts a simple finite-state behavioral model from all error-free runs of the class under analysis.

The behavioral model represents a predicate abstraction of the class behavior. It is a finite-state automaton whose states are labeled with predicates that hold in that state. Transitions are labeled with routine names: a routine m with specification require pre ensure post appears on transitions from a state where pre holds to a state where post holds.

    Figure 2: State transitions of forth

As an example, Figure 2 shows the behavioral model for the forth routine from our running example. We can see that if the initial state was is_empty, after will always hold after a call to forth. Also, after invoking forth, not before will always hold. Therefore, if we want to reach the not before state, as we would do in our running example to avoid violating the precondition, invoking forth is a possible option.

In general, the built abstraction is neither complete nor sound because it is based on a finite number of test runs. Nonetheless, our experiments showed that it is often sufficiently precise to guide the generation of valid fixes.

In the remainder of this paper, we call routines mutators if they change the state according to the behavioral model; a sequence of mutators (reaching a specific state) is called a snippet.
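A minimal sketch of the model extraction from passing runs follows, assuming traces are lists of (abstract state, routine) steps, with abstract states the boolean-query tuples of Section 3.3:

    from collections import defaultdict

    def build_model(traces):
        """Map (state, routine) to the set of observed successor states."""
        model = defaultdict(set)
        for trace in traces:
            for (state, routine), (next_state, _) in zip(trace, trace[1:]):
                model[(state, routine)].add(next_state)
        return model

    # One passing run of forth, with states abstracted as (before, after):
    traces = [[((True, False), 'forth'), ((False, True), None)]]
    print(dict(build_model(traces)))
    # {((True, False), 'forth'): {(False, True)}}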


    3.5.1 Reaching States

To derive a possible fix from a behavioral model, we need to determine sequences of routine calls which can change the object state appropriately. Formally, this works as follows.

For every predicate p in a fault profile, M_p denotes the set of mutators: the routines that, according to the finite-state model, drive the object from a state where p does not hold to a state satisfying p. For a set of predicates Π, M_Π denotes the set of common mutators ∩_{p ∈ Π} M_p.

Sequences of mutators are called snippets. A snippet for a set of predicates {p1, p2, ...} is any sequence of mutators m1, m2, ... that drive the object from a state where none of {p1, p2, ...} holds to one where all of them hold. It may be the case that a routine is the mutator for more than one predicate, hence the length of the snippet need not be the same as the size of the set of predicates it affects. For the running example, the predicate off is equivalent to before or after, hence any mutator for before or after is also a mutator for off.

S[Π] denotes the set of all snippets for the set of predicates Π:

    S[Π] = ⋃_{{Π1, Π2, ...} ∈ Part(Π)} { ⟨m1, m2, ...⟩ | m1 ∈ M_Π1, m2 ∈ M_Π2, ... }

where Part(Π) denotes the set of all partitions (of any size) of Π.

Since the finite-state abstraction is built from observing individual mutators, not all snippets may actually be executable or achieve the correct result. Invalid snippets will, however, be quickly discarded in the validation phase. We found that being permissive in the snippet generation phase is a better trade-off than adopting more precise, but significantly more expensive, techniques such as symbolic execution or precise predicate abstraction.
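Continuing the sketch above, mutator extraction amounts to a scan of the model's transitions; predicates are modeled here as functions over abstract states:

    def mutators(model, predicate):
        """Routines observed to drive the object from a state where the
        predicate is false to one where it is true."""
        found = set()
        for (state, routine), successors in model.items():
            if not predicate(state) and any(predicate(s) for s in successors):
                found.add(routine)
        return found

    # With states abstracted as (before, after), 'not before' is:
    not_before = lambda state: not state[0]
    model = {((True, False), 'forth'): {(False, True)}}
    print(mutators(model, not_before))       # {'forth'}

Snippets are then built by chaining mutators for the predicates of each partition block; chains that turn out not to be executable are weeded out later by validation.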

3.6 Generating Candidate Fixes

Now that we know about fault-characterizing states (Section 3.4) and mutators between these states (Section 3.5), we can leverage them to actually generate fixes. AutoFix-E builds a set of possible candidate fixes in two phases. First, it selects a fix schema, a template that abstracts common instruction patterns. Then, it instantiates the fix schema with actual conditions, as extracted from the fault profile (Section 3.4), and routine calls, as obtained from the behavioral model (Section 3.5). The set of possible instantiations constitutes a set of candidate fixes.

    3.6.1 Fix Schemas

To restrict the search space for code generation, AutoFix-E uses a set of predefined templates called fix schemas. The four fix schemas currently supported are shown in Table 1. In these schemas, fail is instantiated by a predicate, snippet is a sequence of mutators to reach a state, and old_stmt some statements, in the original program, related to the point of failure.²

² The AutoFix-E framework can accommodate different implementations of these two components and, more generally, different techniques for the generation of fix candidates from the specification. Thus, advances in any of these techniques (e.g., [23]) immediately translate into a more powerful automated debugging technique.

    (a)  snippet
         old_stmt

    (b)  if fail then
           snippet
         end
         old_stmt

    (c)  if not fail then
           old_stmt
         end

    (d)  if fail then
           snippet
         else
           old_stmt
         end

Table 1: Fix schemas implemented in AutoFix-E.

    (b)  if before then
           forth
         end
         Result.put_left (item)
         forth
         counter := counter + 1

    (d)  if before then
           forth
         else
           Result.put_left (item)
           forth
           counter := counter + 1
         end

Table 2: Fix candidates. By instantiating the fix schemas (b) and (d) from Table 1 for the duplicate example (Listing 1), we obtain two fix candidates.

For the running example, AutoFix-E generates two candidate fixes by instantiating the fix schemas (b) and (d), respectively; they correspond to the routine duplicate with the two sequences of statements in Table 2 replacing the original loop body.

    3.6.2 Instantiating Schemas

The instantiation of schemas starts from the location where the fault occurred, and may backtrack to previous locations in the failing run if no valid fix can be generated at the point of failure.

The instantiation of schemas for a location ℓ is done exhaustively as follows:

fail takes one of the following values:

1. not p, for a single predicate p ∈ P̄;

2. not p1 and not p2 and ..., for a selection of predicates p1, p2, ... ∈ P̄;

3. not violated_clause, where violated_clause is the originally violated assertion clause revealing the fault.

The first case is the most common, and is sufficient in the large majority of cases. The second case does not necessarily lead to a huge number of selections, because P̄ is often a small set. The third case is useful when the predicates in P̄ cannot precisely characterize the violated assertion that caused the failure; in these cases, we simply replicate verbatim the very clause that was violated.

snippet takes any value in S[Π].

old_stmt takes one of the following values:

1. the lone statement at location ℓ;


2. the block of statements that immediately contains ℓ.

Accordingly, the instantiated schema replaces the statement at location ℓ or the whole block. A sketch of this exhaustive enumeration follows.
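In outline, the enumeration looks as follows, with schemas, conditions, and statements simplified to plain strings (the real system manipulates Eiffel syntax trees):

    from itertools import product

    SCHEMAS = {                                   # Table 1, as text templates
        'a': '{snippet}\n{old_stmt}',
        'b': 'if {fail} then\n  {snippet}\nend\n{old_stmt}',
        'c': 'if not {fail} then\n  {old_stmt}\nend',
        'd': 'if {fail} then\n  {snippet}\nelse\n  {old_stmt}\nend',
    }

    def candidates(fails, snippets, old_stmts):
        """Exhaustively instantiate every schema with every choice of
        fail condition, snippet, and enclosed statements."""
        for schema, fail, snippet, old in product(
                SCHEMAS.values(), fails, snippets, old_stmts):
            yield schema.format(fail=fail, snippet=snippet, old_stmt=old)

    for fix in candidates(['before'], ['forth'], ['old loop body']):
        print(fix + '\n')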

3.7 Linearly Constrained Assertions

In contract-based development, many assertions take the form of linear constraints, that is, assertions consisting of boolean combinations of linear inequalities over program variables and constants. As an example of such linearly constrained assertions (or linear assertions for short), consider the precondition of the routine put_i_th, which inserts item v at position i in a sorted data structure:

    put_i_th (v: G; i: INTEGER)
      require
        i >= 1 and i <= count

The precondition requires that i denote a valid position in the range [1..count], where count is the total number of elements in the structure.

As linear assertions are common in contracts, they require special techniques for fix generation in case they are violated. The rest of this section describes these techniques succinctly and illustrates how they are integrated within the general automated debugging framework.

    3.7.1 Fault Analysis

The violation of a linear assertion occurs when a variable takes a value that does not satisfy the constraint. In the example, calling routine put_i_th with i = 0 triggers a precondition violation.

A fix for a linear assertion violation should check whether the variable subject to the linear constraint takes a value compatible with the constraint itself and, if it does not, change the value into a valid one. This means that we cannot treat linear constraints as atomic propositions, but we must open the box and reason about their structure. More precisely, it is important to determine what is a variable and what is a constant in a linear assertion. A variable is the parameter that must be changed (i in the example), while everything else is a constant with respect to the constraint and should not be modified (count and, obviously, 1 in the example).

If we can tell variables and constants apart, constraint solving can determine a valid value for variables: an extremal solution to the constraint, expressed as a symbolic expression of the constants. Finally, injecting the valid value into the routine using suitable fix schemas produces a candidate fix.

    3.7.2 Variable Selection

Some heuristics can guess which identifiers occurring in a linear assertion are variables and which are constants. The heuristics assign a weight to each identifier according to the following guidelines; the identifier with the least weight will be the variable, and everything else will be a constant.

Routine arguments in preconditions receive lower weights than other identifiers.

In any assertion, an identifier receives a weight inversely proportional to the number of its occurrences in the assertion (e.g., appearing twice weighs less than appearing once).

Identifiers that the routine body can assign to receive less weight than expressions that cannot change.

To account for the imperfection of this weighting scheme, it is useful to introduce a bit of slack in the selection of the variable and to try out more than one candidate among the identifiers with low weights. If two different selections both work as a fix, the one with the lowest weight will eventually be chosen as the best fix in the final ranking of valid fixes.

    3.7.3 Fixes for Linear Assertions

Fixes for linear assertions are generated in two phases: first, we select a value for the variable that satisfies the constraint, and then we plug the value into a fix schema and inject it in the routine under analysis.

Given a linear assertion φ and a variable v, we look for extremal values of v that satisfy φ. AutoFix-E uses Mathematica [17] to solve for maximal and minimal values of v as a function of the other parameters (numeric or symbolic) in φ. To increase the quality of the solution, we strengthen φ with linear assertions from the class invariants which share identifiers with φ. In the example of put_i_th, the class invariant count >= 0 is added to φ when looking for extrema. The solution consists of the extremal values 1 and count.

We use the following schema to build candidate fixes for linear assertions.

    if not constraint then
      new_stmt
    else
      old_stmt
    end

The violated linear assertion replaces constraint in the schema. The rest of the schema is instantiated as follows, for an extremal value ext for variable v.

Precondition violation: old_stmt is the routine invocation triggering the fault; new_stmt is a call to the same routine with ext replacing v; the instantiated schema replaces the faulty routine invocation.

Postcondition or other assertion violation: old_stmt is empty, new_stmt is the assignment v := ext, and the instantiated schema precedes the check statement or the end of the routine.

A faulty call put_i_th (x, j) is a case of precondition violation, which we would handle by replacing it with:

    if not (j >= 1 and j <= count) then
      put_i_th (x, 1)
    else
      put_i_th (x, j)
    end
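The extremal-value step can be reproduced with a constraint solver; in this sketch, sympy stands in for the Mathematica-based solving the paper describes (an assumption), under the further assumption that the assertion is a flat conjunction of linear inequalities:

    import sympy as sp

    def extremal_values(assertion, v):
        """Boundary values of v across the inequalities of a flat
        conjunction: the candidate extremal solutions."""
        values = []
        for rel in assertion.args:                # the individual conjuncts
            values.extend(sp.solve(sp.Eq(rel.lhs, rel.rhs), v))
        return values

    i, count = sp.symbols('i count', integer=True)
    pre = sp.And(i >= 1, i <= count)              # precondition of put_i_th
    inv = count >= 0                              # invariant sharing 'count'
    print(extremal_values(sp.And(pre, inv), i))   # [1, count] (order may vary)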

3.8 Validating Fixes

The previous sections have shown how to generate a number of candidate fixes. But do these candidates actually correct the fault at hand? For this purpose, AutoFix-E runs all the candidates through the full set of test cases and retains those that pass all runs. A fix is valid if it passes all the (previously) failing test cases and still passes the original passing test cases generated in the fault analysis. Given that the contracts constitute the specification, we can even call the fix a correction.

In the example of routine duplicate, we generated two candidate fixes, shown in Table 2. These two candidates are now put to the test:


The left fix (b) still does not pass test cases where before holds. In test cases where the set is empty, the fix fails because the first call to forth makes after true, which causes the violation of the precondition require not after of routine forth at its second call.

The left fix also fails on non-empty sets, because two consecutive calls to forth skip one element of the set; this misbehavior will cause a violation of the postcondition of routine duplicate.

The right fix (d) of routine duplicate is instead a valid fix: it does not change the behavior of the loop if before is false, and it moves the cursor to the first element if before is true, so that all references to item are error-free.

    Consequently, fix (d) is retained, and finally suggested tothe programmer as a valid fix (Listing 2).

3.9 Ranking Fixes

AutoFix-E often finds several valid fixes for a given fault. While it is ultimately the programmer's responsibility to select which one to deploy, flooding her with many fixes defeats the purpose of automated debugging, because understanding what the various fixes actually do and deciding which one is the most appropriate is tantamount to the effort of designing a fix in the first place. To facilitate the selection, AutoFix-E ranks the valid fixes according to two simple metrics, combining dynamic and static information.

The experiments in Section 4 will show that these metrics are sufficient to guarantee that, in the large majority of cases, a proper fix appears within the first five fixes in the ranking. Here "proper" refers to the expectations of a real programmer who is familiar with the code base under analysis.

    3.9.1 Dynamic Metric

Dynamic (semantic) metrics prefer fixes which modify the run-time behavior of passing test cases as little as possible; the intuition is that a good fix does not significantly affect the behavior of correct runs of the program.

The dynamic metric estimates the difference in runtime behavior between the fix and the original (faulty) program over the set of originally passing runs. It is based on state distance, defined as the number of argument-less boolean and integer queries whose values differ in two object states.

AutoFix-E sums the state distances over all routine exit points in all passing runs for the original and the fixed programs; the final figure gives the value of the dynamic metric.
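A sketch of the state-distance computation, assuming object states are snapshots mapping query names to values:

    def state_distance(state_a, state_b):
        """Number of argument-less boolean and integer queries whose
        values differ between two object states."""
        return sum(1 for q in state_a if state_a[q] != state_b[q])

    original = {'before': True, 'after': False, 'count': 3}
    fixed    = {'before': False, 'after': False, 'count': 3}
    print(state_distance(original, fixed))        # 1

The dynamic metric is then the sum of such distances over all routine exit points of all passing runs.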

    3.9.2 Static Metric

Static (syntactic) metrics favor fixes with simpler textual changes: the smaller the metric's value, the smaller the textual change in the source code. Fixes that introduce only minor textual changes are likely easier to understand and maintain, hence preferable, all else being equal, over more complex ones.

We use a simple static metric which combines the following syntactic measures according to the formula:

    OS + 5·SN + 2.5·BF

where each weighted factor is the normalized value of the corresponding syntactic measure. The weights have been determined empirically, while the measures are defined as follows, with reference to the fix schemas in Table 1.

Old Statements (OS): zero for the fix schemas (a), (b), and the number of statements in old_stmt for the fix schemas (c), (d). This factor measures the number of original instructions that the fix encloses; schemas (a) and (b) execute old_stmt unconditionally, hence they score zero for this factor.

Snippet Size (SN): number of statements in snippet. This factor measures the complexity of the new instructions used by the fix.

Branching Factor (BF): number of branches to reach old_stmt from the point of injection of the instantiated fix schema. This factor measures how deeply the fix buries the original instructions within the fix schema. A small sketch combining the three factors follows.
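Putting the factors together, with each factor assumed already normalized:

    def static_metric(os_norm, sn_norm, bf_norm):
        """OS + 5*SN + 2.5*BF; lower means a textually simpler fix."""
        return os_norm + 5 * sn_norm + 2.5 * bf_norm

    # Illustrative normalized factors for a schema (d) instantiation:
    print(static_metric(os_norm=0.3, sn_norm=0.1, bf_norm=0.5))   # 2.05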

4. EXPERIMENTAL EVALUATION

This section reports on experiments that applied AutoFix-E to several faults found in production software and provides a preliminary assessment of the quality of the automatically generated fixes.

4.1 Experimental Setup

All the experiments ran on a Windows 7 machine with a 2.53 GHz Intel dual-core CPU and 4 GB of memory. AutoFix-E was the only computationally intensive process running during the experiments. On average, AutoFix-E runs for 2.6 minutes per fault.

    4.1.1 Selection of Faults

We ran AutoFix-E on 42 faults detected by AutoTest in 10 data structure classes from the EiffelBase and Gobo libraries [9, 12]. Table 3 lists, for each class, its length in lines of code (LOC), its number of routines (#R), its number of boolean queries (#B), and the number of faulty routines (#F).

Table 3: Classes used in the experiments.

    Class                   LOC    #R   #B  #F
    ACTIVE_LIST             2,547  155  16   3
    ARRAYED_CIRCULAR        1,912  131  15   9
    ARRAYED_LIST            2,357  149  16   1
    ARRAYED_QUEUE           1,654  109  10   1
    ARRAYED_SET             2,705  162  12   2
    ARRAY                   1,354   93  11   5
    BOUNDED_QUEUE             876   63  11   4
    DS_ARRAYED_LIST         2,762  166   8   7
    LINKED_PRIORITY_QUEUE   2,375  123  10   3
    TWO_WAY_SORTED_SET      2,871  139  16   4

The selection covers varied types of faults (Table 4 groups them according to the type of assertion they violate), in routines of diverse complexity (Table 5, where the same routine can originate more than one fault).

    4.1.2 Selection of Test Cases

The selection of test cases for passing (and failing) runs can significantly affect the performance of automated debugging. To reduce this bias, and to show that the whole testing-and-debugging chain can be fully automated, our experiments use only test cases generated automatically by AutoTest.


Table 4: Types of faults and fixes.

    Type of fault     # Faults  # Fixed     # Proper
    Precondition      24        11 (46%)    11 (46%)
    Postcondition      8         0           0
    Check              1         1 (100%)    0
    Class invariant    9         4 (44%)     2 (22%)
    Total             42        16 (38%)    13 (30%)

Table 5: Lines of code (LOC) of faulty routines.

    LOC       1–5  6–10  11–20  21–30  31–40  Total
    # Rout.    15    11      9      3      1     39

AutoFix-E is allowed to discard test cases if they are redundant and do not add information to the finite-state abstraction; this filters out several test cases produced by AutoTest without affecting the quality of the generated fixes. In our experiments, the average number of passing and failing test cases for a fault is 9 and 6.5, respectively.

4.2 Results

Valid fixes. The column # Fixed in Table 4 shows the number of faults for which AutoFix-E built at least one valid fix. The data shows that AutoFix-E works better on precondition and class invariant violations than on other types of fault. The reason probably lies in the greater simplicity of most preconditions, often consisting of a single boolean query, compared to postconditions.

Candidate fixes. The number of candidate fixes generated by AutoFix-E for a given fault varies wildly, ranging from just a couple for linear assertions up to over a thousand for assertion violations occurring within a complex loop. For the 16 faults which were fixed automatically, AutoFix-E generated an average of 165.75 candidates and 12.1 valid fixes per fault, corresponding to an average of 25% valid fixes per candidate.

Assertion forms. Table 6 correlates the form of a violated assertion (whether it is a single boolean query, an implication, a linear constraint, etc.) and the success of AutoFix-E in finding valid fixes for that form. For each assertion form, the second column reports the number of automatically fixed faults. AutoFix-E is most effective for simpler assertions consisting of a single boolean query or a linear constraint. In both cases, it is straightforward to track down the causes of a fault and there is a higher chance of finding routine calls that restore a valid state.

Table 6: Relationship between assertion form and valid fixes.

    Assertion form        # Faults  # Fixed
    Single boolean query   6        5 (83%)
    Implication            2        1 (50%)
    Linear                 7        5 (71%)
    Other                 27        5 (19%)

Quality of fixes from a programmer's perspective. A valid fix is only as good as the test suite it passes. As a sanity check on the automatically generated fixes, we manually inspected the top five valid fixes for each fault, according to the ranking criterion of AutoFix-E. The rightmost column of Table 4 counts the number of faults for which we could find at least one proper fix among the top five. A proper fix is one that fixes the bug without obviously introducing other bugs; it might still not be the best correction, but it is a significant improvement over the buggy code and not just a trivial patch. In all the experiments, whenever a proper fix exists it ranks first among the valid fixes, which gives us some confidence in the ranking criteria implemented in AutoFix-E.

As an additional preliminary assessment of the quality of the automatically generated fixes, we selected a few faults and asked two experienced programmers from Eiffel Software to write their own fixes for the faults. In 4 out of 6 of the cases, the programmers submitted fixes which are identical (or semantically equivalent) to the best fixes produced automatically by AutoFix-E.

As an example of multiple valid fixes for the precondition violation in routine duplicate, consider the following two fragments, replacing the loop (lines 12–16 in Listing 1).

    from
      if before then
        forth
      end
    until ... loop
      -- original loop body
    end

    from
      if before then
        start
      end
    until ... loop
      -- original loop body
    end

AutoFix-E discovered these two valid fixes, but ranked them lower than the one presented in Section 2: the static metric prefers corrections that modify locations closest to the failing statement (inside the loop body in the example) over those modifying other statements (the from clause). The two new fixes are perfectly equivalent, because forth and start have the very same effect on an object where before holds. Semantically, they are also equivalent to the one previously shown, as forth is executed only once because it is guarded by before. These two fixes, however, achieve a better run-time performance (in the absence of sophisticated compiler optimizations) as they check before only once. The expert programmers indeed suggested the second one as a correction. The human preference for using start over the (in this context) equivalent forth is probably due to the names of the two routines, where start suggests a closer relation to the notion of before than forth does.

4.3 Threats to Validity

The following threats may influence the generalization of our results:

All the classes used in the experiments are data structure related. Although they have varied semantics and syntactic complexity, they may not be representative of programs in general. The effectiveness of AutoFix-E may be quite different when applied to other classes.

The assertions being violated in the examples may also reflect the particular contracting style of these libraries; for example, they often refer to the notion of cursor in a data structure. We do not know if this might bias the results.

The evaluation of proper fixes was done under the assumption that the contracts present in classes are correct


and strong enough. This may not be true from the point of view of the library developers; hence, they may disagree with some fixes considered proper here.

We have not yet performed a large-scale retro-analysis of inferred fixes against fixes actually performed in the history of a project. This is part of planned future work.

5. RELATED WORK

This section reviews techniques that correct programming errors automatically by combining diverse software engineering techniques.

5.1 Restricted Models

Automated debugging is more tractable for restricted models of computation; a number of works deal with fixing finite-state programs automatically (see, e.g., [18, 24]).

Abraham and Erwig developed automated debugging techniques for spreadsheets, where the user may introduce erroneous formulas. In [1], they present a technique based on annotating cells with simple information about their expected value. Whenever the computed value of a cell contradicts its expected value, the system suggests changes to the cell formula that would restore its value to within the expected range. Their method can be combined with automated testing techniques to reduce the need for manual annotations [2].

5.2 Dynamic Patching

Some fixing techniques work dynamically, that is, at runtime, with the goal of counteracting the adverse effects of some malfunctioning functionality and prolonging the uptime of deployed software.

Data structure repair. Demsky and Rinard [7] show how to dynamically repair data structures that violate their consistency constraints. The programmer specifies the constraints, which are monitored at runtime, in a domain language based on sets and relations. The system reacts to violations of the constraints by running repair actions that try to restore the data structure to a consistent state.

Elkarablieh and Khurshid developed the Juzi tool for Java programs [10]. A user-defined repOk boolean query checks whether the data structure is in a coherent state. Juzi monitors repOk at runtime and performs some repair action whenever the state is corrupted. The repair action is based on symbolic execution and a systematic search through the object space. In a paper presenting ongoing work [16], the same authors outline how the dynamic fixes generated by the Juzi tool could be abstracted and propagated back to the source code.

Memory error repair. The ClearView framework [22] dynamically corrects buffer overflows and illegal control-flow transfers in binaries. It exploits a variant of Daikon to extract invariants in normal executions. When the inferred invariants are violated, the system tries to restore them from a faulty state by looking at the differences between the two states. ClearView can prevent the damaging effects of malicious code injections.

Exterminator [21] is another tool for dynamic patching of memory errors such as out-of-bounds accesses. It monitors repeated runs of a program and allocates extra memory to accommodate out-of-bounds references appropriately.

5.3 Static Debugging

Static approaches to automated debugging target the source code to permanently remove the buggy behavior from a program.

Model-driven debugging. Some automated debugging methods, including the one in the present paper, rely on the availability of a finite-state abstraction of the program's behavior to detect errors and build patches.

Weimer [25] presents an algorithm to produce patches of Java programs according to finite-state specifications of a class. The main differences with respect to our work are the need for user-provided finite-state machine specifications, and the focus on security policies: patches may harm other functionalities of the program and are not intended to be applied automatically [25].

Machine-learning-based debugging. Machine-learning techniques can work in the absence of user-provided annotations, which is a big advantage when analyzing pre-existing legacy code in languages such as C or Java.

Jeffrey et al. [14] present BugFix, a tool that summarizes existing fixes in the form of association rules. BugFix then tries to apply existing association rules to new bugs. The user can also provide feedback, in the form of new fixes or validations of fixes provided by the algorithm, thus improving the performance of the algorithm over time.

Other authors have applied genetic algorithms to generate suitable fixes. Arcuri and Yao [3] use a co-evolutionary algorithm where an initially faulty program and some test cases compete to evolve the program into one that satisfies its formal specification.

Weimer et al. [26] present a genetic algorithm that takes as input an unannotated program, a set of successful test cases, and a single failing one. After rounds of evolution, the program changes into one that passes all test cases, including the previously failing one. [26] mitigates the limited scalability of genetic algorithms by only producing changes that reuse portions of code available elsewhere in the program and that are limited to the portions of code executed when the bug occurs. This, however, requires that the failing execution path be different from the successful execution path, a restriction which does not apply to our approach. Another limitation of [26] resides in its sensitivity to the quality (and size) of the provided test suite, an effect which is much more limited in our approach, where random testing techniques can generate a suitable test suite automatically.

Axiomatic reasoning. He and Gupta [13] present a technique that compares two program states at a faulty location in the program. Unlike our work, [13] computes these characterizations statically with weakest-precondition reasoning. The comparison between the two program states illustrates the source of the error; a change to the program that reconciles the two states fixes the bug. Besides the apparent high-level similarities between the approach of [13] and ours, there are several important details that differ. First, weakest-precondition reasoning requires a very detailed postcondition (e.g., full functional specifications in first-order logic), which also limits scalability. In addition, [13] computes patches by syntactically comparing the two program states; this restricts the fixes that can be automatically generated to limited changes in expressions (for example, in off-by-one errors).


5.4 Own Earlier Work

In [5], Dallmeier, Zeller, and Meyer presented Pachika, a tool that automatically builds finite-state behavioral models from a set of passing and failing test cases of a Java class. A fix candidate is then a modification of the model of failing runs which makes it compatible with the model of passing runs. In practice, Pachika can insert new transitions or delete existing transitions to change the behavior of the failing model.

While Pachika and AutoFix-E share common origins as part of the joint AutoFix project (such as behavior models, state abstraction, and creating fixes from transitions), AutoFix-E brings several advances, which constitute the contributions of this paper. The main contribution is that AutoFix-E leverages user-provided contracts, integrating them with dynamically inferred information; this shapes and impacts all stages of fix generation (Section 3). In addition, the concepts of fix schemas (Section 3.6), linear assertions (Section 3.7), and ranking fixes (Section 3.9) are unique to this paper. All these contributions imply a greater flexibility in accommodating different templates, more complex types of fix actions, and a much higher effectiveness and efficiency in generating fixes.

6. FUTURE WORK

Despite our successes, much remains to be done in automatic fix generation, as in our own tool. Our future work will focus on the following topics:

Retro-analysis. For further validation of the approach, we plan to run a systematic application of AutoFix-E on the history of fixed bugs in representative projects, to compare the results with the fixes that developers actually applied.

Contract-less environments. In the absence of contracts, defects are much harder to locate. We want to leverage dynamic invariants and state models to identify likely violations and derive locations for potential fixes.

Predicate selection and instantiation. Including only predicates that are relevant to a particular fault in the state model can result in fewer, yet more relevant, fixes being generated. Also, the current implementation of AutoFix-E does not instantiate fix schemas with more than one predicate at a time; all these options are on our agenda.

Improved behavior models. Our technique relies on boolean queries to build a state model (Section 3.3); hence it works poorly for classes without boolean queries, or whose boolean queries do not effectively capture salient features of the object state.

Dependence on executions. We build a finite-state behavioral model using test runs randomly generated by AutoTest (Section 3.5). This might lead to inconsistent or biased behavioral models, which would then generate ineffective or needlessly complicated snippets. The current implementation reduces the chances of this happening to a minimum with validation (Section 3.8) and by empirically limiting the maximum admitted length of a snippet to a small number (typically two or three instructions). We want to apply more sophisticated techniques to build the behavioral model, maybe with some soundness or minimality guarantee, to improve performance.

Enriching the execution space. As causes for a fault are tracked down by comparing abstract object states during passing and failing runs, it is impossible to work on a single fault report without a sufficient number of passing runs. Of course, it is always possible to use automated random testing to build a sufficient number of passing test cases. This limitation does not apply to linear constraints, which we handle differently (Section 3.7).

Alternate fault types. We only handle faults in the form of assertion violations. We plan to extend our approach to consider different kinds of faults, such as Void dereferencing, integer overflow, I/O errors, etc.

Faults in contracts. We assume contracts are correct, but some faults may require fixing the contracts themselves. We are working on ways to track down this type of fault.

Improved numerical constraints. For numerically constrained assertions, we only generate fixes involving minimal or maximal values. While this choice is empirically effective, more subtle faults may require the usage of non-extremal values.

Finding the best fix. We use very simple metrics to rank valid fixes; more research in this line is expected to boost the quality of the ranking, and to determine what makes one fix better than another. One interesting approach is to look at known histories of changes and fixes, and to leverage this history for generating similar fixes.

7. CONCLUSIONS

In the past decade, automated debugging has made spectacular advances: first, we have seen the rise of methods to isolate failure causes automatically; then, statistical methods were devised that highlight likely failure locations. Now, we have reached a state where automated debugging truly deserves its name: we can actually attempt to automatically generate fixes.

Our experiences with AutoFix-E show that this goal is within reach: out of 42 faults found by AutoTest in two widely used Eiffel libraries, AutoFix-E proposes successful fixes for 16 faults. Since the entire process is fully automatic, this means that in 16 cases, the programmer's load could be reduced to almost zero. Note that we say "almost zero" here, as we still assume that a human should assess the generated fixes and keep authority over the code. One may also think of systems that generate and apply fixes automatically; the risk of undesired behavior may still be preferred to no behavior at all, and can be alleviated by a strong specification in terms of contracts. The present work is aimed at building the base technology and not at resolving such questions, obviously important, about how best to use it as part of a development environment and a development process.

Availability. The AutoFix-E source code, and all data and results cited in this article, are available at:


    http://se.inf.ethz.ch/research/autofix/

Acknowledgments. This work is funded by the Deutsche Forschungsgemeinschaft, grant Ze509/4-1, and the Hasler-Stiftung, grant no. 2327, under the title "AutoFix: Programs that fix themselves". The concept of generating fixes from differences in passing and failing runs was conceived with Andreas Leitner. The authors thank Jocelyn Fiat and Alexander Kogtenkov for their help in assessing the quality of automatically generated fixes. Valentin Dallmeier provided helpful feedback on an earlier version of this paper.

8. REFERENCES

[1] R. Abraham and M. Erwig. Goal-directed debugging of spreadsheets. In 2005 IEEE Symposium on Visual Languages and Human-Centric Computing, pages 37–44, 2005.

[2] R. Abraham and M. Erwig. Test-driven goal-directed debugging in spreadsheets. In IEEE Symposium on Visual Languages and Human-Centric Computing, pages 131–138, 2008.

[3] A. Arcuri and X. Yao. A novel co-evolutionary approach to automatic software bug fixing. In IEEE Congress on Evolutionary Computation, pages 162–168. IEEE, 2008.

[4] I. Ciupa, A. Leitner, M. Oriol, and B. Meyer. Experimental assessment of random testing for object-oriented software. In ISSTA '07, pages 84–94, New York, NY, USA, 2007. ACM.

[5] V. Dallmeier, A. Zeller, and B. Meyer. Generating fixes from object behavior anomalies. In 24th IEEE/ACM International Conference on Automated Software Engineering. IEEE, 2009.

[6] L. M. de Moura and N. Bjørner. Z3: An efficient SMT solver. In Tools and Algorithms for the Construction and Analysis of Systems, 14th International Conference, volume 4963 of Lecture Notes in Computer Science, pages 337–340. Springer, 2008.

[7] B. Demsky and M. C. Rinard. Automatic detection and repair of errors in data structures. In Proceedings of the 2003 ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages and Applications, pages 78–95. ACM, 2003.

[8] ECMA International. Standard ECMA-367. Eiffel: Analysis, Design and Programming Language. 2nd edition, June 2006.

[9] http://freeelks.svn.sourceforge.net.

[10] B. Elkarablieh and S. Khurshid. Juzi: a tool for repairing complex data structures. In 30th International Conference on Software Engineering, pages 855–858, 2008.

[11] M. D. Ernst, J. Cockrell, W. G. Griswold, and D. Notkin. Dynamically discovering likely program invariants to support program evolution. IEEE Transactions on Software Engineering, 27(2):99–123, 2001.

[12] http://sourceforge.net/projects/gobo-eiffel/.

[13] H. He and N. Gupta. Automated debugging using path-based weakest preconditions. In Proceedings of the 7th International Conference on Fundamental Approaches to Software Engineering, volume 2984 of Lecture Notes in Computer Science, pages 267–280, 2004.

[14] D. Jeffrey, M. Feng, N. Gupta, and R. Gupta. BugFix: A learning-based tool to assist developers in fixing bugs. In The 17th IEEE International Conference on Program Comprehension, pages 70–79. IEEE Computer Society, 2009.

[15] L. L. Liu, B. Meyer, and B. Schoeller. Using contracts and Boolean queries to improve the quality of automatic test generation. In Tests and Proofs, First International Conference, TAP 2007, volume 4454 of Lecture Notes in Computer Science, pages 114–130, 2007.

[16] M. Z. Malik, K. Ghori, B. Elkarablieh, and S. Khurshid. A case for automated debugging using data structure repair. In 24th IEEE/ACM International Conference on Automated Software Engineering. IEEE, 2009.

[17] http://www.wolfram.com.

[18] W. Mayer and M. Stumptner. Evaluating models for model-based debugging. In 23rd IEEE/ACM International Conference on Automated Software Engineering, pages 128–137. IEEE, 2008.

[19] B. Meyer. Object-Oriented Software Construction. Prentice Hall, 2nd edition, 1997.

[20] B. Meyer, A. Fiva, I. Ciupa, A. Leitner, Y. Wei, and E. Stapf. Programs that test themselves. IEEE Software, pages 22–24, 2009.

[21] G. Novark, E. D. Berger, and B. G. Zorn. Exterminator: Automatically correcting memory errors with high probability. Communications of the ACM, 51(12):87–95, 2008.

[22] J. H. Perkins, S. Kim, S. Larsen, S. P. Amarasinghe, J. Bachrach, M. Carbin, C. Pacheco, F. Sherwood, S. Sidiroglou, G. Sullivan, W.-F. Wong, Y. Zibin, M. D. Ernst, and M. C. Rinard. Automatically patching errors in deployed software. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles, pages 87–102. ACM, 2009.

[23] S. Srivastava, S. Gulwani, and J. S. Foster. From program verification to program synthesis. In Proceedings of the 37th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 313–326, 2010.

[24] S. Staber, B. Jobstmann, and R. Bloem. Finding and fixing faults. In 13th IFIP WG 10.5 Advanced Research Working Conference on Correct Hardware Design and Verification Methods, volume 3725 of Lecture Notes in Computer Science, pages 35–49. Springer, 2005.

[25] W. Weimer. Patches as better bug reports. In Proceedings of the 5th International Conference on Generative Programming and Component Engineering, pages 181–190. ACM, 2006.

[26] W. Weimer, T. Nguyen, C. Le Goues, and S. Forrest. Automatically finding patches using genetic programming. In 31st International Conference on Software Engineering, pages 364–374. IEEE, 2009.