Open Vocabulary Learning on Source Code with a Graph-Structured Cache
Milan Cvitkovic (Caltech, Amazon Web Services)
Badal Singh (Amazon Web Services)
Anima Anandkumar (Caltech)
ICML, 2019-6-12
Open Vocabulary Learning
Standard, closed vocabulary model: 1 of 400k word embeddings → 1 of 400k words
Open vocabulary: Any words → Any words
Goal: Models that can reason over flexible sets of inputs and outputs
Open Vocabulary Learning
Motivation: Tasks on source code
Example: Variable naming
Needs an open vocabulary: in our data, 28% of variable names contain an out-of-vocabulary word
Input:
    int <NAME-ME> = assertArraysAreSameLength(expected, actuals, header);
    for (int i = 0; i < <NAME-ME>; i++) {
        Object expected = Array.get(expected, i);
Output: 'expected_length'
Strategy: Represent distinct words and usages with graph structure, process with GNN
Graph-Structured Cache
Original input:
    def get_jupyter_addr():
        jupyter_addr = 'localhost' if is_serving() else None
        return jupyter_addr
Same input, represented using a Graph-Structured Cache
[Figure: graph with cache nodes "get", "jupyter", "addr", and "serving" and <word> nodes; edge types: Edge Indicating Word Use, Edge Indicating Next Word]
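The cache construction on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' preprocessor: the function names, the edge labels `WORD_USE` and `NEXT_WORD`, the subword-splitting rule, and the choice to link consecutive identifier tokens with next-word edges are all hypothetical.

```python
import io
import keyword
import re
import tokenize

# The slide's example input.
SOURCE = (
    "def get_jupyter_addr():\n"
    "    jupyter_addr = 'localhost' if is_serving() else None\n"
    "    return jupyter_addr\n"
)


def split_subwords(identifier):
    """Split a snake_case or camelCase identifier into lowercase words (assumed rule)."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [p.lower() for p in parts]


def build_cache_graph(source):
    """Return (nodes, edges, word_ids) for the identifiers in a Python snippet."""
    # Keep identifier tokens only; keywords and string literals are skipped.
    names = [
        tok.string
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string)
    ]
    nodes = [("token", name) for name in names]
    edges = []
    # Assumption: next-word edges link consecutive identifier tokens.
    for i in range(len(names) - 1):
        edges.append((i, i + 1, "NEXT_WORD"))
    # One cache node per distinct word, shared by every token that uses it.
    word_ids = {}
    for i, name in enumerate(names):
        for word in split_subwords(name):
            if word not in word_ids:
                word_ids[word] = len(nodes)
                nodes.append(("word", word))
            edges.append((word_ids[word], i, "WORD_USE"))
    return nodes, edges, word_ids


nodes, edges, word_ids = build_cache_graph(SOURCE)
print(sorted(word_ids))
```

Because each distinct word gets exactly one node, a word never seen in training still appears in the graph and is tied to every place it is used, which is what lets a GNN reason about it.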
Full Model for Tasks on Source Code
Input (SomeFile.java):
    public void addFoo(Foo foo){ this.myBaz.add(foo); }
Parse code into AST
[Figure: AST of the input, with node types Method Declaration, Parameter, Code Block, Method Call, Field Access, and Name Expr]
Augment AST with semantic information
[Figure: the same AST with added Last Use, Field Reference, and Next Node edges]
Strategy from recent work [1]
[1] Allamanis et al. “Learning to Represent Programs with Graphs.” ICLR 2018
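As a rough illustration of the parse-then-augment idea, the sketch below uses Python's standard `ast` module on a Python analogue of the Java input. It is not the pipeline from the talk or from [1]: the function name `build_augmented_ast`, the edge labels, and the simple name-based rule for Last Use edges are assumptions, and Field Reference and Next Node edges are left out.

```python
import ast

# Hypothetical Python analogue of the slide's Java method.
SOURCE = (
    "class Holder:\n"
    "    def add_foo(self, foo):\n"
    "        self.my_baz.add(foo)\n"
)


def build_augmented_ast(source):
    """Parse source into an AST, then add LAST_USE edges between variable occurrences."""
    tree = ast.parse(source)
    # Drop Load/Store context markers; they carry no structure of their own.
    nodes = [n for n in ast.walk(tree) if not isinstance(n, ast.expr_context)]
    index = {id(n): i for i, n in enumerate(nodes)}

    edges = []
    # Syntactic edges: parent -> child, straight from the parser.
    for parent in nodes:
        for child in ast.iter_child_nodes(parent):
            if id(child) in index:
                edges.append((index[id(parent)], index[id(child)], "AST_CHILD"))

    # Semantic edges: link each occurrence of a name to its previous occurrence.
    occurrences = []
    for n in nodes:
        if isinstance(n, ast.Name):
            occurrences.append((n.lineno, n.col_offset, n.id, index[id(n)]))
        elif isinstance(n, ast.arg):
            occurrences.append((n.lineno, n.col_offset, n.arg, index[id(n)]))
    last_seen = {}
    for _, _, name, idx in sorted(occurrences):
        if name in last_seen:
            edges.append((idx, last_seen[name], "LAST_USE"))
        last_seen[name] = idx
    return nodes, edges


nodes, edges = build_augmented_ast(SOURCE)
last_use = [(type(nodes[s]).__name__, type(nodes[d]).__name__)
            for s, d, kind in edges if kind == "LAST_USE"]
print(last_use)
```

Here each use of `self` and `foo` in the method body is linked back to its parameter declaration, which is the kind of information a purely syntactic tree does not expose directly.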
Add Graph-Structured Cache
[Figure: the augmented AST with added cache nodes for the words "foo", "add", "my", and "baz", connected to the AST by Word Use edges]
Our main contribution over prior work
Convert all nodes to vectors, process with GNN
Output (depends on task)
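To make "process with GNN" concrete, here is a deliberately simplified single round of message passing in plain Python. It is an untrained mean-aggregation step standing in for whatever GNN is actually used, and the function name and the toy graph are hypothetical.

```python
def message_passing_round(node_vectors, edges):
    """One round: each node's new vector is the mean of its own vector
    and the vectors of all nodes that send it a message."""
    dim = len(node_vectors[0])
    # Every node starts with its own vector in its inbox.
    inbox = [[vec] for vec in node_vectors]
    for src, dst in edges:
        inbox[dst].append(node_vectors[src])
    return [
        [sum(vec[k] for vec in received) / len(received) for k in range(dim)]
        for received in inbox
    ]


# Toy graph: two word nodes (0 and 1) both point at one token node (2).
vectors = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
edges = [(0, 2), (1, 2)]
updated = message_passing_round(vectors, edges)
print(updated[2])
```

After one round the token node's vector mixes in information from both word nodes; a real GNN would use learned, edge-type-specific transformations and several rounds before a task-specific output layer.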
Experiment: Variable Naming Task
● Metrics: full-name reproduction accuracy and top-5 accuracy
[Table: results not recoverable from the slide image]
For other tasks and experiments, see our poster or paper
Takeaways
Graph-Structured Caches are an appealing strategy for open vocabulary learning
○ Whatever your current embedding strategy, GSC + GNN can augment it
○ No free lunch! About 30% training slowdown.
○ But helps in all cases we tried, sometimes significantly
Acknowledgments
● Badal Singh, Anima Anandkumar
● Miltos Allamanis
● Hyokun Yun
● Haibin Lin
Our code, for use on your code
https://github.com/mwcvitkovic/Open-Vocabulary-Learning-on-Source-Code-with-a-Graph-Structured-Cache--Code-Preprocessor
https://github.com/mwcvitkovic/Open-Vocabulary-Learning-on-Source-Code-with-a-Graph-Structured-Cache