Analyzing CS1 Student Code Using Code Embeddings

Robert Bazzocchi (1), Micah Flemming (2), Lisa Zhang (2)
(1) University of Toronto, (2) University of Toronto Mississauga




Research Questions:

Can we learn good vector representations (vector embeddings) of student code?

Can we use these representations to reason about student code?

  

Background:

Many automated tools have been developed to understand and provide feedback to computer programming students [4].

Neural networks have been used to learn embeddings of words (e.g. word2vec [1] and GloVe [2]) and images (e.g. image autoencoders [3]).

Can we do the same with student code?

  

Data:

CS1 student code submissions to a Python programming problem about modifying lists: a total of 8,476 syntactically correct submissions, split into 5,086 for training, 1,695 for validation, and 1,695 for testing.

We automatically generated 100 unit tests; each submission is labelled with the pass/fail outcome of every test.
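To make the prediction target concrete, here is a minimal sketch of how each submission could be labelled with a 100-dimensional pass/fail vector by running it against the generated tests. The harness and the (stdin, expected stdout) test format are assumptions; the poster does not describe the actual test runner.

import subprocess
import sys

def unit_test_vector(submission_path, tests, timeout=2):
    """Run one submission against every unit test; 1 = pass, 0 = fail.

    `tests` is a hypothetical list of (stdin, expected_stdout) pairs;
    the actual harness and test format used for the poster are not
    specified.
    """
    results = []
    for test_stdin, expected in tests:
        try:
            proc = subprocess.run(
                [sys.executable, submission_path],
                input=test_stdin, capture_output=True,
                text=True, timeout=timeout,
            )
            passed = proc.returncode == 0 and proc.stdout.strip() == expected
        except subprocess.TimeoutExpired:
            passed = False  # a timeout counts as a failed test
        results.append(1 if passed else 0)
    return results  # a 100-dimensional 0/1 vector for the 100 tests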

MODEL: 


[Model diagram: each character of a student code submission (d, e, f, i, ..., v) is one-hot encoded and fed to an RNN encoder whose hidden states h_1 ... h_N produce an embedding e_1 ... e_n; a neural network decoder maps the embedding to unit-test predictions y_1 ... y_100.]

We treat each syntactically correct student code submission as a sequence of characters.
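As a minimal sketch, the character-level one-hot encoding could look like the following (the vocabulary construction from a hypothetical `train_submissions` list is an assumption; the poster does not list the character set):

import numpy as np

# Hypothetical character vocabulary collected from the training split.
vocab = sorted({ch for code in train_submissions for ch in code})
char_to_idx = {ch: i for i, ch in enumerate(vocab)}

def one_hot_encode(code):
    """Map a code string to a (sequence_length, vocab_size) 0/1 matrix."""
    x = np.zeros((len(code), len(vocab)), dtype=np.float32)
    for t, ch in enumerate(code):
        x[t, char_to_idx[ch]] = 1.0
    return x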

Encoder: a character-level RNN that uses bi-directional Gated Recurrent Units (GRUs). It takes the sequence of characters in a code submission as input and outputs its vector embedding.

Decoder: a multi-layer perceptron (MLP) with two fully connected layers. It takes the vector embedding as input and predicts the pass/fail of each of the 100 unit tests.
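A minimal PyTorch sketch of this encoder-decoder pair, with illustrative layer sizes (the poster does not report hidden dimensions):

import torch
import torch.nn as nn

class CodeEmbeddingModel(nn.Module):
    def __init__(self, vocab_size, hidden_size=128, num_tests=100):
        super().__init__()
        # Encoder: character-level bi-directional GRU.
        self.gru = nn.GRU(vocab_size, hidden_size,
                          bidirectional=True, batch_first=True)
        # Decoder: MLP with two fully connected layers, one logit per test.
        self.decoder = nn.Sequential(
            nn.Linear(2 * hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, num_tests),
        )

    def embed(self, x):
        # x: (batch, seq_len, vocab_size) one-hot characters.
        _, h_n = self.gru(x)  # h_n: (2, batch, hidden_size)
        # The embedding concatenates the final forward and backward states.
        return torch.cat([h_n[0], h_n[1]], dim=1)

    def forward(self, x):
        return self.decoder(self.embed(x))  # logits for the 100 unit tests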

  

Training:

The entire model (encoder + decoder) is trained together to predict the unit-test output (pass/fail) of the code submissions in our training set.

Although the encoder is the only part of the model that we use in our analysis, the decoder helps the encoder learn good embeddings that capture the unit-test pass/fail information.

Final Training Accuracy: 90%
Final Validation Accuracy: 84%
Final Test Accuracy: 85%
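Because each of the 100 unit tests is an independent pass/fail label, training is a multi-label classification problem. A sketch of one training epoch, assuming PyTorch, the `CodeEmbeddingModel` sketched above, and hypothetical `train_loader`, `x_val`, and `y_val` data objects:

import torch

model = CodeEmbeddingModel(vocab_size=len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# One sigmoid cross-entropy term per unit test (multi-label targets).
criterion = torch.nn.BCEWithLogitsLoss()

for x_batch, y_batch in train_loader:  # y_batch: (batch, 100) in {0, 1}
    optimizer.zero_grad()
    loss = criterion(model(x_batch), y_batch.float())
    loss.backward()
    optimizer.step()

# One way to score the model: the fraction of the 100 pass/fail
# entries predicted correctly, averaged over submissions.
with torch.no_grad():
    preds = (torch.sigmoid(model(x_val)) > 0.5).float()
    accuracy = (preds == y_val).float().mean()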


ANALYSIS:

Code Clustering:  

Distances in the embedding space appear to be meaningful. We find seven tight clusters of correct solutions (dimensionality reduction via t-SNE):

[t-SNE visualization of the correct-solution embeddings.] The green and black clusters at the bottom left contain solutions that iterated through the list forwards. All five clusters in the upper right contain solutions that iterated through the list backwards. The turquoise and red clusters differ only in whitespace.
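A sketch of this analysis, assuming the encoder outputs for the correct solutions are stacked in a hypothetical `embeddings` array and using scikit-learn's t-SNE:

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# embeddings: hypothetical (num_correct_solutions, embedding_dim) array
# of encoder outputs for the correct solutions.
points_2d = TSNE(n_components=2, random_state=0).fit_transform(embeddings)

plt.scatter(points_2d[:, 0], points_2d[:, 1], s=5)
plt.title("t-SNE of correct-solution embeddings")
plt.show()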

  

Analogies:

Embeddings of words [1] often have structures with analogies like "king - man + woman = queen". Is there evidence of similar analogies in our code embedding?

In this example, we: 

1. Embed the first three pieces of code 

2. Compute the addition and subtraction of the vectors 

3. Find the code submission closest to the resulting vector (see the sketch below)
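A sketch of the three steps, where `embed`, `code_a`/`code_b`/`code_c`, and the `all_embeddings`/`submissions` arrays are hypothetical names; cosine similarity is an assumption, and nearest-neighbour lookup by Euclidean distance would work similarly:

import numpy as np

def nearest_submission(query, all_embeddings, submissions):
    """Return the submission whose embedding has the highest cosine
    similarity to the query vector."""
    e = all_embeddings / np.linalg.norm(all_embeddings, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    return submissions[int(np.argmax(e @ q))]

# Steps 1-2: embed three submissions and form the analogy vector.
query = embed(code_a) - embed(code_b) + embed(code_c)
# Step 3: look up the closest existing submission.
print(nearest_submission(query, all_embeddings, submissions))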

This result suggests that direction vectors in the embedding space are meaningful, and can represent complex programming concepts.

Auxiliary (New) Tasks:

Can we use the embeddings to make predictions about other features of the code?

Task 1, Error Identification: identify whether a code submission throws a Python exception.

Task 2, Error Classification: identify what type of error a code submission throws. We grouped the errors into four classes: No error, IndexError, RecursionError or Timeout, and Other.

Test accuracy:

Input       Task 1   Task 2
Unit test   79%      62%
Code        81%      75%
Embedding   91%      87%
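A sketch of how the comparison could be run, assuming hypothetical fixed-length feature matrices for each input representation and a scikit-learn classifier (the poster does not specify the classifier used on each input):

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hypothetical feature matrices for the same submissions:
#   X_code_*:  a fixed-length encoding of the raw code,
#   X_unit_*:  the 100-dimensional unit-test pass/fail vectors,
#   X_embed_*: the encoder embeddings.
inputs = {"Code": (X_code_tr, X_code_te),
          "Unit test": (X_unit_tr, X_unit_te),
          "Embedding": (X_embed_tr, X_embed_te)}

# y_tr / y_te: labels for Task 1 (throws an exception or not)
# or Task 2 (one of the four error classes).
for name, (X_tr, X_te) in inputs.items():
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, clf.predict(X_te)))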

Future Work:

Can we use code embeddings to predict student understanding of concepts?

Can we use code embeddings to help give students feedback?

Key References:
[1] Mikolov et al. (2013). Distributed Representations of Words and Phrases and their Compositionality.
[2] Pennington et al. (2014). GloVe: Global Vectors for Word Representation.
[3] Cheng et al. (2018). Deep Convolutional AutoEncoder-based Lossy Image Compression.
[4] Keuning et al. (2019). A Systematic Literature Review of Automated Feedback Generation for Programming Exercises.

Acknowledgements: We thank Prof. Andrew Petersen for the data and the helpful feedback. We thank Haotian Yang for some auxiliary model results.



