TRANSCRIPT
-
Learning Programs from Noisy Data
Veselin Raychev, Pavol Bielik, Martin Vechev, Andreas Krause
ETH Zurich
-
Why learn programs from examples?
Input/output examples: learn a function p such that
p(input_1) = output_1
...
p(input_n) = output_n
It is often easier to provide examples than a specification (e.g., in FlashFill), but the user may make a mistake in the examples.
Actual goal: produce the p that the user really wanted and tried to specify.
Key problem of synthesis: it overfits and is not robust to noise.
-
Learning Programs from Data: Defining Dimensions
Three defining dimensions: handling errors in the dataset, number of examples, and learned program complexity.
● Program synthesis (PL): does not handle errors; tens of examples; learns interesting programs.
● Deep learning (ML): handles errors; millions of examples; learns simple but unexplainable functions.
● This paper bridges a gap: handles errors; millions of examples; learns interesting programs.
This expands the capabilities of existing synthesizers and gives new state-of-the-art precision for programming tasks.
Bridges the gap between ML and PL; advances both areas.
-
In this paper
● A general framework that
○ handles errors in the training dataset
○ learns statistical models on data
○ handles synthesis with millions of examples
● Instantiated with two synthesizers
○ generalizing existing works
-
Contributions
● Handling noise
● New probabilistic models
● Handling large datasets
Overview of the approach: input/output examples (some possibly incorrect) are fed to a representative dataset sampler and a program generator, which 1. synthesize p and 2. use a probabilistic model parametrized with p.
-
Synthesis with noise: usage model
Input/output examples and a Domain-Specific Language are given to the synthesizer, which returns p such that
p(input_1) = output_1
...
p(input_n) = output_n
If one example is incorrect (e.g., a typo), then p(input_k) ≠ output_k for that example. This is a new kind of feedback from the synthesizer (✔ ❌ ✔ ✔ ✔), which lets us
● tell the user to remove the suspicious example, or
● ask for more examples.
-
Handling noise: problem statement
D: dataset of input/output examples, possibly containing incorrect examples.
A plain synthesizer solves: pbest = arg min_{p ∈ P} errors(D, p), where P is the space of possible programs in the DSL.
Issue: the program "if input_1 return output_1; if input_2 return output_2; ...; return output_n" makes no errors, and it may be the only solution to this minimization problem. It is too long and hardcodes the input/outputs; synthesis must penalize such answers.
Our problem formulation:
pbest = arg min_{p ∈ P} errors(D, p) + λ·r(p)
where errors(D, p) is the error rate of p on D, r(p) is a regularizer that penalizes long programs, and λ is the regularization constant.
-
Noisy synthesis using SMT
pbest = arg min_{p ∈ P} errors(D, p) + λ·r(p)
Here r(p) is the number of instructions, and errors(D, p) + λ·r(p) is the total solution cost.
Encoding Ψ, the formula given to the SMT solver:
err_1 = if p(input_1) = output_1 then 0 else 1
err_2 = if p(input_2) = output_2 then 0 else 1
err_3 = if p(input_3) = output_3 then 0 else 1
errors = err_1 + err_2 + err_3
p ∈ P_r (programs with r instructions)
Ask a number of SMT queries in increasing order of solution cost. E.g., for λ = 0.6 the costs (errors + λ·r) are:

cost      number of errors
           0     1     2     3
r = 1     0.6   1.6   2.6   3.6
r = 2     1.2   2.2   3.2   4.2
r = 3     1.8   2.8   3.8   4.8

Querying in increasing cost order: 0.6 UNSAT, 1.2 UNSAT, 1.6 UNSAT, 1.8 UNSAT, 2.2 SAT.
The best program has two instructions and makes one error.
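To make the query loop concrete, here is a minimal runnable sketch using Z3's Python bindings (z3py) on a toy DSL of straight-line programs over one integer input with instructions +1, *2, and negate. The DSL, dataset, and names are illustrative assumptions for this sketch, not the paper's BitSyn encoding; the loop itself follows the slide (four UNSAT queries, then SAT at cost 2.2).

# A minimal sketch of the cost-ordered SMT query loop (illustrative, not BitSyn).
from z3 import Int, If, Sum, Solver, sat

D = [(1, 4), (2, 6), (3, 8), (4, 99)]   # input/output pairs; (4, 99) is noise
LAMBDA = 0.6                            # regularization constant

def run(instrs, x):
    # Symbolic semantics: each variable in instrs selects one instruction.
    for op in instrs:
        x = If(op == 0, x + 1, If(op == 1, x * 2, -x))
    return x

def exists_program(r, max_errors):
    # SAT iff some r-instruction program makes at most max_errors errors on D.
    s = Solver()
    instrs = [Int(f'op{i}') for i in range(r)]
    for op in instrs:
        s.add(0 <= op, op <= 2)
    errs = [If(run(instrs, i) == o, 0, 1) for i, o in D]   # err_k indicators
    s.add(Sum(errs) <= max_errors)
    return s.check() == sat, s

# Ask SMT queries in increasing order of solution cost = errors + LAMBDA * r.
for cost, r, e in sorted((e + LAMBDA * r, r, e)
                         for r in range(1, 4) for e in range(len(D) + 1)):
    found, s = exists_program(r, e)
    print(f'r={r}, errors<={e}, cost={cost:.1f}:', 'SAT' if found else 'UNSAT')
    if found:
        print(s.model())   # best program here: [+1, *2], i.e. 2x+2, one error
        break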
-
Noisy synthesizer: example
Take an actual synthesizer and show that we can make it handle noise.
-
Implementation: BitSyn
For BitStream programs, using Z3; similar to Jha et al. [ICSE'10] and Gulwani et al. [PLDI'11]. BitSyn synthesizes short, loop-free programs.
Example program:
function check_if_power_of_2(int32 x) {
  var o = add(x, -1)  // x - 1: x & (x - 1) is zero iff x is a power of two (for x > 0)
  return bitwise_and(x, o)
}
Question: how well does our synthesizer discover noise (in programs from prior work)?
[Plot: accuracy as a function of λ; the middle region is the best area to be in, and λ is picked there empirically.]
-
So far… handling noise
● Problem statement and regularization
● Synthesis procedure using SMT
● Presented one synthesizer
Handling noise enables us to solve new classes of problems beyond normal synthesis.
-
Contributions: next, handling large datasets.
-
Fundamental problem
Large number of examples: pbest = arg min_{p ∈ P} cost(D, p)
D: millions of input/output examples.
Computing cost(D, p) is O(|D|), so synthesis becomes practically intractable.
Key idea: iterative synthesis on a fraction of the examples.
-
Our solution: two components
● Program generator: given a dataset d, finds the best program pbest = arg min_{p ∈ P} cost(d, p). A synthesizer for a small number of examples.
● Dataset sampler: picks a dataset d ⊆ D. We introduce a representative dataset sampler, generalizing a user providing input/output examples.
-
In a loop
Start with a small random sample d ⊆ D, then iteratively generate programs and samples:
program generator: p_1
representative dataset sampler: d
program generator: p_1, p_2
representative dataset sampler: ...
until the result converges to pbest.
The algorithm generalizes synthesis-by-example techniques (a code sketch follows the next slide).
-
Representative dataset sampler
Idea: pick a small dataset d on which the already generated programs p_1, ..., p_n behave as they do on the full dataset:
d = arg min_{d ⊆ D} max_{i ∈ 1..n} | cost(d, p_i) - cost(D, p_i) |
That is, the costs of the p_i on the small dataset d should mirror their costs on the full dataset D.
Theorem: this sampler shrinks the candidate program search space. In the evaluation it gives a significant speedup of synthesis.
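A minimal sketch of the sampler and the sampling/synthesis loop, under the assumptions that cost(d, p) is the mean 0/1 loss of p on d and that a random-subset search stands in for the paper's actual sampling procedure; all function names here are illustrative.

import random

def cost(dataset, prog):
    # Fraction of examples the program gets wrong (0/1 loss).
    return sum(prog(x) != y for x, y in dataset) / len(dataset)

def representative_sample(D, progs, k, trials=200):
    # Approximate arg min over d of max_i |cost(d, p_i) - cost(D, p_i)|
    # by scoring randomly drawn size-k subsets.
    full = [cost(D, p) for p in progs]
    best_d, best_gap = None, float('inf')
    for _ in range(trials):
        d = random.sample(D, k)
        gap = max(abs(cost(d, p) - f) for p, f in zip(progs, full))
        if gap < best_gap:
            best_d, best_gap = d, gap
    return best_d

def synthesize_on_large_dataset(D, synthesize, k, iters=10):
    # The loop from the previous slide: start from a random sample, then
    # alternate the program generator and the representative dataset sampler.
    d, progs = random.sample(D, k), []
    for _ in range(iters):
        progs.append(synthesize(d))                 # program generator
        d = representative_sample(D, progs, k)      # dataset sampler
    return min(progs, key=lambda p: cost(D, p))     # pbest among candidates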
-
So far... handling large datasets
● Iterative combination of synthesis and sampling
● New way to perform approximate empirical risk minimization
● Guarantees (in the paper)
-
Contributions: next, new probabilistic models.
-
Statistical programming tools
A new breed of tools: learn from large existing codebases (e.g., "Big Code") to make predictions about programs.
1. Train a machine learning model.
2. Make predictions with the model.
Problem: the model is hard-coded and has low precision.
-
Existing machine learning models
They essentially remember a mapping from a context in the training data to a prediction (with probabilities).
● Hindle et al. [ICSE'12]: learns the mapping "+ name ." → slice; the model predicts slice whenever it sees "+ name .". This model comes from NLP.
● Raychev et al. [PLDI'14]: learns the mapping "charAt" → slice; the model predicts slice whenever it sees "charAt". Relies on static analysis.
-
Problem of existing systems
Precision: they rarely predict the next statement.
● Hindle et al. [ICSE'12]: very low precision.
● Raychev et al. [PLDI'14]: low precision for JavaScript.
Core problem: existing machine learning models are limited and not expressive enough.
-
Key idea: second-order learning
Learn a program that parametrizes a probabilistic model that makes predictions:
1. Synthesize a program describing a model, i.e., learn the mapping.
2. Train the model.
3. Make predictions with this model.
Prior models are described by simple hard-coded programs. Our approach: learn a better program.
-
Training and evaluation
Training example: program p computes a context from the input (e.g., toUpperCase), and we learn the mapping from that context to the output (toUpperCase → slice).
Evaluation example: compute the context with program p and predict the completion (slice ✔).
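A minimal sketch of this train/predict scheme, assuming a conditioning program p is a list of primitive instructions that extract a context from the tokens before the completion point, and the model is a count-based map from context to the most frequent completion; the primitives and names are illustrative, not the paper's DSL.

from collections import Counter, defaultdict

# Illustrative primitive instructions for a conditioning program p.
last_token = lambda toks: toks[-1] if toks else None
prev_token = lambda toks: toks[-2] if len(toks) > 1 else None

def context(p, tokens):
    # The context is whatever the instructions of p extract from the tokens.
    return tuple(instr(tokens) for instr in p)

def train(train_data, p):
    # train_data: (tokens_before_completion, completion) pairs
    model = defaultdict(Counter)
    for tokens, completion in train_data:
        model[context(p, tokens)][completion] += 1
    return model

def predict(model, p, tokens):
    ctx = model.get(context(p, tokens))
    return ctx.most_common(1)[0][0] if ctx else None

def errors(eval_data, model, p):
    # The errors(D, p) term minimized when synthesizing p (next slide).
    return sum(predict(model, p, toks) != c for toks, c in eval_data)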
-
Observation
Synthesis of the probabilistic model can be done with the same optimization problem as before:
pbest = arg min_{p ∈ P} errors(D, p) + λ·r(p)
Here D is the evaluation data of input/output examples, errors(D, p) generalizes to cost(D, p), r is the regularizer penalizing long programs, and λ is the regularization constant.
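Continuing the sketch above, the same regularized objective selects among candidate conditioning programs, with r(p) taken to be the number of instructions in p; λ and the candidate list are illustrative assumptions.

def best_program(candidates, train_data, eval_data, lam=0.5):
    # pbest = arg min over p of cost(D, p) + lambda * r(p), with r(p) = len(p)
    return min(candidates, key=lambda p:
               errors(eval_data, train(train_data, p), p) + lam * len(p))

# e.g. best_program([[last_token], [last_token, prev_token]], train_d, eval_d)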
-
So far...
● Handling noise
● Synthesizing a model
● Representative dataset sampler
These techniques are generally applicable to program synthesis.
Next: an application for "Big Code" called DeepSyn.
-
DeepSyn: Training
Trained on 100,000 JavaScript files from GitHub.
The program synthesizer (representative dataset sampler + program generator) runs over the dataset D; the model is then trained on the full data and the best program p.
-
DeepSyn: Evaluation
50,000 evaluation files (not used in training or synthesis); API completion task.
Conditioning program p / Accuracy:
Last two tokens, Hindle et al. [ICSE'12]: 22.2%
Last two APIs per object, Raychev et al. [PLDI'14]: 30.4%
Program synthesis with noise (this work): 46.3%
Program synthesis with noise + dataset sampler (this work): 50.4%
We can explain the best program: it looks at the APIs preceding the completion position and at the tokens prior to these APIs.
-
Q&A
Summary of contributions:
● Handling noise: extending synthesizers to handle noise.
● Handling large datasets: scalability through the representative dataset sampler and the program generator.
● Second-order learning: synthesis of probabilistic models; 1. synthesize p, 2. use a probabilistic model parametrized with p.
Bridges the gap between ML and PL; advances both areas.
-
What did we synthesize?
The best conditioning program, as a sequence of DSL instructions:
Left PrevActor WriteAction WriteValue PrevActor WriteAction PrevLeaf WriteValue PrevLeaf WriteValue