  • Learning Programs from Noisy Data

    Veselin Raychev, Pavol Bielik, Martin Vechev, Andreas Krause

    ETH Zurich

  • Why learn programs from examples?

    Given input/output examples, learn a function p such that p(input) = output for each example.

    It is often easier to provide examples than a specification (e.g. in FlashFill).

    But the user may make a mistake in the examples.

    Actual goal: produce the p that the user really wanted and tried to specify.

    Key problem of synthesis: it overfits and is not robust to noise.

  • Learning Programs from Data: Defining Dimensions

                                  Number of    Handling errors    Learned program
                                  examples     in the dataset     complexity
    Program synthesis (PL)        tens         no                 interesting programs
    Deep learning (ML)            millions     yes                simple, but unexplainable functions
    This paper (bridges a gap)    millions     yes                interesting programs

    This paper expands the capabilities of existing synthesizers and reaches new state-of-the-art precision for programming tasks, bridging the gap between ML and PL and advancing both areas.

  • In this paper

    ● A general framework that
      ○ handles errors in the training dataset
      ○ learns statistical models on data
      ○ handles synthesis with millions of examples

    ● Instantiated with two synthesizers
      ○ generalizes existing works

  • Contributions

    ● Handling noise
    ● New probabilistic models
    ● Handling large datasets

    [Overview figure: input/output examples (some incorrect) feed a loop of a representative dataset sampler and a program generator, which (1) synthesizes a program p and (2) uses a probabilistic model parametrized with p.]

    First: handling noise.

  • Synthesis with noise: usage model

    Given input/output examples and a domain-specific language (DSL), the synthesizer returns a program p such that p(input) = output for each example.

    If an example is incorrect (e.g. contains a typo), the synthesizer reports p(input) ≠ output for it. This is a new kind of feedback from the synthesizer:
    ● tell the user to remove the suspicious example, or
    ● ask for more examples.
  • Handling noise: problem statement

    D: a dataset of input/output examples, possibly containing incorrect examples.
    P: the space of possible programs in the DSL.

    A first attempt: synthesize pbest = arg min_{p ∈ P} errors(D, p).

    Issue: a program of the form "if ... return ...; if ... return ...; ..." with one branch per example makes no errors, and it may be the only solution to this minimization problem. Such a program is too long and hardcodes the inputs/outputs; synthesis must penalize such answers.

    Our problem formulation:

      pbest = arg min_{p ∈ P} errors(D, p) + λ·r(p)

    where errors(D, p) measures how many examples p gets wrong (the error rate), r(p) is a regularizer that penalizes long programs, and λ is the regularization constant.
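
    As a minimal illustration of this objective (not the paper's implementation), here is a Python sketch that scores a given set of candidate programs; the candidate callables and the size(p) helper are hypothetical stand-ins for a DSL program and its instruction count:

      def errors(D, p):
          # number of examples (x, y) in D that program p gets wrong
          return sum(1 for x, y in D if p(x) != y)

      def synthesize(D, candidates, size, lam=0.6):
          # arg min over the candidates of errors(D, p) + lam * size(p)
          return min(candidates, key=lambda p: errors(D, p) + lam * size(p))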

  • Noisy synthesis using SMT

    pbest = arg min_{p ∈ P} errors(D, p) + λ·r(p)

    where r(p) is the number of instructions and the sum is the total solution cost.

    Encoding Ψ (the formula given to the SMT solver):

      err_1 = if p(input_1) = output_1 then 0 else 1
      err_2 = if p(input_2) = output_2 then 0 else 1
      err_3 = if p(input_3) = output_3 then 0 else 1
      errors = err_1 + err_2 + err_3
      p ∈ P_r (programs with r instructions)

    Ask a number of SMT queries in increasing value of solution cost. For example, for λ = 0.6 the costs (λ·r + errors) are:

      number of errors:   0      1      2      3
      r = 1               0.6    1.6    2.6    3.6
      r = 2               1.2    2.2    3.2    4.2
      r = 3               1.8    2.8    3.8    4.8

    Queried in increasing cost order (0.6, 1.2, 1.6, 1.8, 2.2, ...): UNSAT, UNSAT, UNSAT, UNSAT, SAT.
    Here the best program has two instructions and makes one error (cost 2.2).
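
    A Python sketch of this search, assuming a hypothetical helper exists_program(r, k, D) that issues one SMT query asking whether some program with r instructions makes at most k errors on D (returning such a program or None):

      def synthesize_with_noise(D, exists_program, max_r=3, lam=0.6):
          # visit the (r, errors) cells of the cost table in increasing cost lam*r + errors
          cells = sorted((lam * r + k, r, k)
                         for r in range(1, max_r + 1)
                         for k in range(len(D) + 1))
          for total_cost, r, k in cells:
              p = exists_program(r, k, D)   # one SMT query per cell
              if p is not None:             # the first SAT answer has minimal total cost
                  return p, total_cost
          return None, None

    Because the cells are visited in increasing cost order, the first satisfiable query already yields the optimal pbest.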

  • Noisy synthesizer: example

    Take an actual synthesizer and show that we can make it handle noise.

  • Implementation: BitSyn

    Synthesizes short, loop-free BitStream programs using Z3; similar to Jha et al. [ICSE'10] and Gulwani et al. [PLDI'11].

    Example program:

      function check_if_power_of_2(int32 x) {
        var o = add(x, 1)
        return bitwise_and(x, o)
      }

    Question: how well does our synthesizer discover noise (in programs from prior work)?

    [Plot: results as a function of the regularization constant λ; there is a best area to be in, and λ is picked empirically there.]

  • So far… handling noise

    ● Problem statement and regularization
    ● Synthesis procedure using SMT
    ● Presented one synthesizer

    Handling noise enables us to solve new classes of problems beyond normal synthesis.

  • Contributions

    Next contribution: handling large datasets.

  • Fundamental problem

    With a large number of examples:

      pbest = arg min_{p ∈ P} cost(D, p)

    where D contains millions of input/output examples. Computing cost(D, p) is O(|D|), which makes synthesis practically intractable.

    Key idea: iterative synthesis on a fraction of the examples.

  • Our solution: two components

    Program generator: a synthesizer for a small number of examples. Given a dataset d, it finds the best program pbest = arg min_{p ∈ P} cost(d, p).

    Dataset sampler: picks a dataset d ⊆ D. We introduce a representative dataset sampler, which generalizes a user providing input/output examples.

  • In a loop

    Start with a small random sample d ⊆ D, then iteratively alternate between the two components (see the sketch below):

      program generator: d → p1
      representative dataset sampler: p1 → new d
      program generator: new d → p1, p2
      representative dataset sampler: ...
      ... until the program generator returns pbest

    The algorithm generalizes synthesis-by-examples techniques.
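
    A minimal Python sketch of this loop; generate(d) and sample(D, programs, n) stand in for the program generator and the representative dataset sampler, and the stopping criterion shown (stop when the generator repeats a candidate) is one simple choice, not necessarily the paper's:

      import random

      def iterative_synthesis(D, generate, sample, n=50, rounds=20):
          d = random.sample(D, n)            # start from a small random sample d ⊆ D
          programs = []
          for _ in range(rounds):
              p = generate(d)                # program generator on the current small dataset
              if p in programs:              # generator repeats a candidate: take it as pbest
                  return p
              programs.append(p)
              d = sample(D, programs, n)     # representative dataset sampler picks the next d
          return programs[-1]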

  • Representative dataset sampler

    Idea: pick a small dataset d on which the already generated programs p1,...,pn behave as they do on the full dataset:

      d = arg min_{d ⊆ D} max_{i ∈ 1..n} | cost(d, pi) - cost(D, pi) |

    [Figure: the costs of p1 and p2 on the small dataset d mirror their costs on the full dataset D.]

    Theorem: this sampler shrinks the candidate program search space.
    In the evaluation: significant speedup of synthesis.
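
    A greedy Python sketch of this objective (an illustration only; the paper's sampling algorithm may differ): repeatedly add the example that keeps the largest cost gap over p1,...,pn smallest.

      def cost(d, p):
          # fraction of examples (x, y) in d that program p gets wrong
          return sum(1 for x, y in d if p(x) != y) / max(len(d), 1)

      def representative_sample(D, programs, n):
          full = [cost(D, p) for p in programs]     # each program's cost on the full dataset
          d = []
          for _ in range(n):
              remaining = [e for e in D if e not in d]
              # add the example whose inclusion minimizes max_i |cost(d, p_i) - cost(D, p_i)|
              best = min(remaining, key=lambda e: max(
                  abs(cost(d + [e], p) - f) for p, f in zip(programs, full)))
              d.append(best)
          return d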

  • So far... handling large datasets

    ● Iterative combination of synthesis and sampling
    ● A new way to perform approximate empirical risk minimization
    ● Guarantees (in the paper)

  • Contributions

    Next contribution: new probabilistic models.

  • Statistical programming tools

    A new breed of tools: learn from large existing codebases (e.g. "Big Code") to make predictions about programs.

    1. Train a machine learning model.
    2. Make predictions with the model.

    Problem: the model is hard-coded, leading to low precision.

  • Existing machine learning models

    They essentially remember a mapping from a context in the training data to a prediction (with probabilities).

    Hindle et al. [ICSE'12]: learn a mapping "+ name ." → slice. The model predicts slice when it sees the completion position after "+ name .". This model comes from NLP.

    Raychev et al. [PLDI'14]: learn a mapping charAt → slice. The model predicts slice when it sees the completion position after charAt. Relies on static analysis.

  • Problem of existing systems

    Precision: they rarely predict the next statement.

    Hindle et al. [ICSE'12] ("+ name ." → slice): very low precision.
    Raychev et al. [PLDI'14] (charAt → slice): low precision for JavaScript.

    Core problem: existing machine learning models are limited and not expressive enough.

  • Key idea: second-order learning

    Learn a program that parametrizes a probabilistic model that makes predictions:

    1. Synthesize a program describing a model (i.e. learn the mapping).
    2. Train the model.
    3. Make predictions with this model.

    Prior models are described by simple hard-coded programs. Our approach: learn a better program.

  • Training and evaluation

    Training example: an input program whose output (completion) is slice. Compute a context with program p (here, toUpperCase) and learn the mapping toUpperCase → slice.

    Evaluation example: compute the context with the same program p and use the learned mapping to predict the completion (slice). See the sketch below.
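
    A minimal Python sketch of this training/prediction scheme; the conditioning program p is assumed to map the code before the completion point to a context (all names and data are illustrative):

      from collections import Counter, defaultdict

      def train(examples, p):
          # examples: pairs (code_before_completion, completed_api)
          model = defaultdict(Counter)
          for code, completion in examples:
              model[p(code)][completion] += 1      # remember context -> completion counts
          return model

      def predict(model, p, code):
          ctx = p(code)                            # compute the context with the same program p
          counts = model.get(ctx)
          return counts.most_common(1)[0][0] if counts else None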

  • Observation

    Synthesis of the probabilistic model can be done with the same optimization problem as before:

      pbest = arg min_{p ∈ P} errors(D, p) + λ·r(p)

    where D is now the evaluation data (input/output examples), errors(D, p) plays the role of cost(D, p), r(p) is a regularizer that penalizes long programs, and λ is the regularization constant.

  • So far...

    ● Handling noise
    ● Synthesizing a model
    ● Representative dataset sampler

    These techniques are generally applicable to program synthesis.

    Next, an application for "Big Code" called DeepSyn.

  • DeepSyn: Training

    Trained on 100,000 JavaScript files from GitHub.

    The program synthesizer (representative dataset sampler + program generator) runs on the dataset D to find the best conditioning program p; the model is then trained on the full data D with this best program p.

  • DeepSyn: Evaluation

    50,000 evaluation files (not used in training or synthesis); API completion task.

      Conditioning program p                                          Accuracy
      Last two tokens, Hindle et al. [ICSE'12]                        22.2%
      Last two APIs per object, Raychev et al. [PLDI'14]              30.4%
      Program synthesis with noise (this work)                        46.3%
      Program synthesis with noise + dataset sampler (this work)      50.4%

    We can explain the best program: it looks at the APIs preceding the completion position and at the tokens prior to those APIs.

  • Q&A

    Summary:
    ● Handling noise: extending synthesizers to handle noise
    ● Handling large datasets (scalability): representative dataset sampler + program generator
    ● Second-order learning: synthesis of probabilistic models parametrized with a synthesized program p

    Bridges the gap between ML and PL; advances both areas.

  • What did we synthesize?

    The best conditioning program found (a sequence of instructions that move over the program tree and write out parts of the context):

      Left PrevActor WriteAction WriteValue PrevActor WriteAction PrevLeaf WriteValue PrevLeaf WriteValue