
Post on 18-Jan-2016

Predicting a Correct Program in PBE

Rishabh Singh, Microsoft Research
Sumit Gulwani, Microsoft Research

Programming By Examples

Intuitive
Natural
Accessible

Ambiguity!

Excel Forums

300_w1_aniSh_c1_b w1

=MID("300_w1_aniSh_c1_b",5,2)

300_w30_aniSh_c1_b w30

=MID($B:$B,FIND("_",$B:$B)+1,FIND("_",REPLACE($B:$B,1,FIND("_",$B:$B),""))-1)
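The two forum formulas can be compared side by side. The following Python sketch mirrors them; the helper names `mid`, `fixed_position`, and `between_underscores` are illustrative and not part of Excel or FlashFill. Both agree on the first row, but only the delimiter-based version produces the intended output on the second.

```python
def mid(text: str, start: int, length: int) -> str:
    """Excel-style MID: 1-based start position, fixed length."""
    return text[start - 1 : start - 1 + length]


def fixed_position(text: str) -> str:
    # Mirrors =MID(text, 5, 2): always two characters starting at position 5.
    return mid(text, 5, 2)


def between_underscores(text: str) -> str:
    # Mirrors the FIND-based formula: the text between the first
    # and the second underscore.
    first = text.index("_")              # 0-based index of the first "_"
    second = text.index("_", first + 1)  # 0-based index of the second "_"
    return text[first + 1 : second]


for row in ["300_w1_aniSh_c1_b", "300_w30_aniSh_c1_b"]:
    print(row, fixed_position(row), between_underscores(row))
```

A single example (`w1`) cannot distinguish the two programs, which is the ambiguity the following slides address.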

FlashFill [Gulwani, POPL 2011] [Gulwani, Harris, Singh, CACM 2012]

[Diagram: DSL, VSA, Program, Heuristics, Benchmarks]

[Diagram: DSL, VSA, Program, Ranking, Benchmarks]

Handling Ambiguity

Input            Output
Rick Rashid      Mr. Rick
Satya Nadella

Prefer non-constants

Input            Output
Rick Rashid      Mr. Rick
Satya Nadella    Ms. Satya
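One reading of the table above is that a "prefer non-constants" heuristic derives the "r" in "Mr." from the input instead of treating it as a constant, and so predicts "Ms. Satya". The sketch below is a hypothetical illustration of that reading, not FlashFill's actual heuristic; the candidate set and the `build_output` template are assumptions.

```python
import re
from typing import Callable, Dict


def upper_letters(s: str) -> list:
    return re.findall(r"[A-Z]", s)


# Three ways to produce the "r" in "Mr. Rick" from the input "Rick Rashid".
R_CANDIDATES: Dict[str, Callable[[str], str]] = {
    'constant "r"': lambda s: "r",
    "lower 1st uppercase letter": lambda s: upper_letters(s)[0].lower(),
    "lower 2nd uppercase letter": lambda s: upper_letters(s)[1].lower(),
}
IS_CONSTANT = {'constant "r"'}


def build_output(r_program: Callable[[str], str], s: str) -> str:
    # Assumed output template: "M" + <r> + ". " + first word of the input.
    return "M" + r_program(s) + ". " + s.split()[0]


def prefer_non_constants(example_in: str, example_out: str) -> str:
    consistent = [
        name
        for name, prog in R_CANDIDATES.items()
        if build_output(prog, example_in) == example_out
    ]
    # Non-constant programs sort first (False < True); the sort is stable,
    # so ties keep the dictionary order.
    return sorted(consistent, key=lambda name: name in IS_CONSTANT)[0]


chosen = prefer_non_constants("Rick Rashid", "Mr. Rick")
print(chosen, "->", build_output(R_CANDIDATES[chosen], "Satya Nadella"))
```

All three candidates are consistent with the single example, so the heuristic alone decides, and here it decides wrongly.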

Prefer smaller substrings as constants

Prefer smaller constants

Input            Output
Satya Nadella    S. Nadella
Bill Gates

2nd word, last word, 2nd capital followed by 2nd lowercase string, …
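The candidates listed above all produce "Nadella" from "Satya Nadella", so one example cannot tell them apart. A minimal sketch of two of them follows; the three-word input "William Henry Gates" is a hypothetical test case added for illustration and does not appear in the slides.

```python
def second_word(s: str) -> str:
    return s.split()[1]


def last_word(s: str) -> str:
    return s.split()[-1]


def abbreviate(surname_program, s: str) -> str:
    # First initial, a period and a space, then the extracted word.
    return s[0] + ". " + surname_program(s)


for name in ["Satya Nadella", "Bill Gates", "William Henry Gates"]:
    print(name, "|", abbreviate(second_word, name), "|", abbreviate(last_word, name))
```

The two programs agree on every two-word name and diverge only once an input has a third word.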

Machine Learning for Ranking

“With great power comes great responsibility.”

Three Challenges

Labelled Training Data

Machine Learning Algorithm

Efficient Ranking Algorithm

Training Data Generation

Input            Output
Rick Rashid      Mr. Rashid
Satya Nadella    Mr. Nadella
Peter Lee        Mr. Lee

Structuring Hypothesis Space with Sharing in Version-space

Associative Expressions: f(e1, f(e2, f(e3, e4))), DAG-based sharing

Fixed-arity Expressions: f(e1, e2, e3, e4), set-based sharing
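Both kinds of sharing let a version space represent many programs compactly. The sketch below counts the represented programs without enumerating them, assuming that set-based sharing stores an independent set of choices per argument and that DAG-based sharing stores sets of sub-expressions on DAG edges. The function names, node numbering, and labels are hypothetical.

```python
from math import prod
from typing import Dict, List, Tuple


def count_set_based(arg_choices: List[List[str]]) -> int:
    # f(e1, ..., ek) with an independent set of choices per argument
    # represents the cross product of those sets.
    return prod(len(choices) for choices in arg_choices)


def count_dag_based(num_nodes: int, edges: Dict[Tuple[int, int], List[str]]) -> int:
    # Each edge (i, j) with i < j carries a set of expressions. Every path
    # from node 0 to the last node, with one expression picked per edge,
    # is one concatenation program.
    paths = [0] * num_nodes
    paths[0] = 1
    for j in range(1, num_nodes):
        paths[j] = sum(
            paths[i] * len(labels) for (i, k), labels in edges.items() if k == j
        )
    return paths[-1]


set_total = count_set_based([["p1", "p2"], ["q1", "q2", "q3"]])
dag_total = count_dag_based(3, {(0, 1): ["a", "b"], (1, 2): ["c"], (0, 2): ["d", "e", "f"]})
print(set_total, dag_total)
```

The counts grow multiplicatively while the stored structure grows only additively, which is why ranking has to work on the shared representation directly.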

Ranking Function f(p)

Assume a linear function: f(p) = w1*f1 + w2*f2 + … + wk*fk
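A linear ranking function is a weighted sum of program features. A minimal sketch follows; the two features used here (number of constant characters, number of substring expressions) and the weights are hypothetical values chosen for illustration.

```python
from typing import Sequence


def rank_score(weights: Sequence[float], features: Sequence[float]) -> float:
    """f(p) = w1*f1 + w2*f2 + ... + wk*fk."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * f for w, f in zip(weights, features))


# Hypothetical features: [constant characters, substring expressions].
weights = [-1.0, 2.0]
all_constant = [8.0, 0.0]        # e.g. the constant string "Mr. Rick"
constant_plus_word = [4.0, 1.0]  # e.g. "Mr. " followed by a substring

print(rank_score(weights, all_constant), rank_score(weights, constant_plus_word))
```

With these weights the program that uses a substring expression scores higher than the all-constant program.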

Learning To Rank

Logistic Regression: didn’t work well

Listwise Approach: too strong a constraint (all relevant pages over irrelevant)

Training Phase

Input            Output
Rick Rashid      Mr. Rick
Satya Nadella    Mr. Satya
Peter Lee        Mr. Lee

Lower 1st uppercase letter
Constant “r”
Lower 2nd uppercase letter
…

Goal: Find ranking function f(p) over program features that ranks positive programs higher than negative programs

Learn DAGs

[Figure: DAG with nodes 0 to 8 and edges labelled 𝛾1 to 𝛾13 and 𝛾50]

Rick Rashid      Mr. Rashid
Satya Nadella    Mr. Satya

[Figure: DAG with nodes 0 to 8 and edges labelled 𝛾1 to 𝛾8, 𝛾11 to 𝛾13, and 𝛾50]

Intersect DAGs

Rick Rashid      Mr. Rick
Satya Nadella    Mr. Satya

Assign Positive Labels

Rick Rashid      Mr. Rick
Satya Nadella    Mr. Satya

Assign Negative Labels

Rick Rashid      Mr. Rick
Satya Nadella    Mr. Satya

Learn ranking function f(p) that ranks positive programs higher than negative programs.
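The slides do not spell out the labelling rule. A plausible sketch, assuming positive programs are those consistent with every example (the intersection) and negative programs are those consistent with the first example only, is shown below. The three candidates for the "r" in "Mr." and the `build_output` template are illustrative assumptions.

```python
import re
from typing import Callable, Dict, List, Tuple

EXAMPLES: List[Tuple[str, str]] = [
    ("Rick Rashid", "Mr. Rick"),
    ("Satya Nadella", "Mr. Satya"),
]


def upper_letters(s: str) -> list:
    return re.findall(r"[A-Z]", s)


# Candidate sub-programs for the "r" in "Mr.".
CANDIDATES: Dict[str, Callable[[str], str]] = {
    'constant "r"': lambda s: "r",
    "lower 1st uppercase letter": lambda s: upper_letters(s)[0].lower(),
    "lower 2nd uppercase letter": lambda s: upper_letters(s)[1].lower(),
}


def build_output(r_program: Callable[[str], str], s: str) -> str:
    # Assumed output template: "M" + <r> + ". " + first word of the input.
    return "M" + r_program(s) + ". " + s.split()[0]


def is_consistent(name: str, example: Tuple[str, str]) -> bool:
    inp, out = example
    return build_output(CANDIDATES[name], inp) == out


# Positive: consistent with all examples.
positive = [n for n in CANDIDATES if all(is_consistent(n, e) for e in EXAMPLES)]
# Negative: consistent with the first example but ruled out by a later one.
negative = [
    n for n in CANDIDATES if is_consistent(n, EXAMPLES[0]) and n not in positive
]
print(positive, negative)
```

Under this assumption the second example acts as the label source: it keeps the constant and rejects both input-derived alternatives.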

Training Phase

Positive Programs

Negative Programs

Rank any positive program over all negative programs
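The objective "rank any positive program over all negative programs" can be read as: at least one positive program should outscore every negative program. The sketch below counts violations of that reading under a linear scoring function. It is not the loss function from the talk, and the feature vectors and weights are hypothetical.

```python
from typing import List, Sequence


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    return sum(w * f for w, f in zip(weights, features))


def rank_violations(
    weights: Sequence[float],
    positives: List[Sequence[float]],
    negatives: List[Sequence[float]],
) -> int:
    # Only the best-scoring positive program matters: count the negative
    # programs that score at least as high as it does.
    best_positive = max(score(weights, f) for f in positives)
    return sum(1 for f in negatives if score(weights, f) >= best_positive)


positives = [[4.0, 1.0], [6.0, 1.0]]
negatives = [[8.0, 0.0], [2.0, 2.0]]

print(rank_violations([-1.0, 2.0], positives, negatives))
print(rank_violations([-1.0, -3.0], positives, negatives))
```

This is weaker than requiring every positive program to outrank every negative one, which matches the earlier remark that the listwise constraint was too strong.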

Hierarchical Ranking

Atomic Expression: frequency of tokens, context, neighborhood, …

Substring Expression: length of substring, input, output, constant, …

Concat Expression: number of arguments, sum, max, min, prod
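One way to read the Concat-level features is as aggregates over the scores already assigned to the argument sub-expressions, so that ranking proceeds bottom-up through the hierarchy. A minimal sketch under that assumption follows; the function name and the sample scores are hypothetical.

```python
from math import prod
from typing import Dict, Sequence


def concat_features(arg_scores: Sequence[float]) -> Dict[str, float]:
    # Aggregate the scores of the arguments of a Concat expression into
    # a fixed-size feature vector, whatever the number of arguments.
    if not arg_scores:
        raise ValueError("a Concat expression needs at least one argument")
    return {
        "num_args": float(len(arg_scores)),
        "sum": sum(arg_scores),
        "max": max(arg_scores),
        "min": min(arg_scores),
        "prod": prod(arg_scores),
    }


features = concat_features([0.5, 0.25, 1.0])
print(features)
```

Fixed-size aggregates keep the top-level feature vector the same length for every program, which a linear ranking function requires.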

Evaluation

175 benchmarks
30-70 train-test partition
Baseline (Occam’s razor): Smallest & Simplest programs

Ranking Evaluation

LearnRank learns from 1 example for 79% of the benchmarks

[Figure: Number of Examples for Learning the Test Task, Baseline vs. LearnRank. x-axis: Benchmarks; y-axis: Number of Examples]

Efficiency of Ranking

Ranking for PBE

Machine Learning + Synthesis

VSA Sharing Formalization

Efficient Features & Algorithms

General Loss Function for PBE

Thanks! risin@microsoft.com