Transcript
Page 1: Analyzing CS1 Student Code Using Code Embeddings · 2020-05-13

Can we learn good vector representations (vector embeddings) of student code?

Can we use these representations to reason about student code?

  

Many automated tools have been developed to understand and provide feedback to computer programming students [4].

Neural networks have been used to learn embeddings of words (e.g. word2vec [1] and GloVe [2]) and images (e.g. image autoencoders [3]).

Can we do the same with student code?

  

CS1 student code submissions to a Python programming problem about modifying lists. A total of 8,476 syntactically correct submissions, split into 5,086 for training, 1,695 for validation, and 1,695 for testing.

Automatically generated 100 unit tests.
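As a rough illustration of how a pass/fail label vector could be built from unit tests, the sketch below runs a submission against a list of test cases and records one 0/1 entry per test. The `TestCase` shape, the `run_unit_tests` helper, and the "double every element" problem are all hypothetical; the poster does not describe the actual problem or test harness.

```python
from typing import Callable, List, Tuple

# Hypothetical test case: (input list, expected list after modification).
TestCase = Tuple[list, list]


def run_unit_tests(func: Callable[[list], object], tests: List[TestCase]) -> List[int]:
    """Return one entry per test: 1 if the test passes, 0 otherwise."""
    results = []
    for given, expected in tests:
        data = list(given)  # copy so one test cannot affect the next
        try:
            func(data)
            results.append(1 if data == expected else 0)
        except Exception:
            # Any exception raised by the submission counts as a failed test.
            results.append(0)
    return results


# Illustrative problem (an assumption, not the poster's): double every element in place.
def correct_submission(lst):
    for i in range(len(lst)):
        lst[i] *= 2


def buggy_submission(lst):
    for i in range(len(lst) - 1):  # bug: skips the last element
        lst[i] *= 2


tests = [([1, 2], [2, 4]), ([], []), ([0], [0])]
print(run_unit_tests(correct_submission, tests))
print(run_unit_tests(buggy_submission, tests))
```

With 100 tests instead of three, each submission would get a 100-entry binary vector of the kind the model is trained to predict.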

MODEL: 


[Figure: model architecture. A student code submission is one-hot encoded character by character (d, e, f, i, ..., v) and fed to a Recurrent Neural Network (RNN) Encoder with hidden states h1, h2, ..., hN. The encoder outputs an embedding (e1, e2, ..., en), which a Neural Network Decoder maps to unit-test predictions y1, y2, ..., y100.]

We treat each syntactically correct student code submission as a sequence of characters.
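A minimal sketch of character-level one-hot encoding, assuming the vocabulary is simply the set of characters seen in the submissions. The helper names `build_vocab` and `one_hot_encode` and the sample code string are illustrative only.

```python
from typing import Dict, List


def build_vocab(submissions: List[str]) -> Dict[str, int]:
    """Map each distinct character to a column index (sorted for determinism)."""
    chars = sorted(set("".join(submissions)))
    return {ch: i for i, ch in enumerate(chars)}


def one_hot_encode(code: str, vocab: Dict[str, int]) -> List[List[int]]:
    """Return one row per character; each row has a single 1 at that character's index."""
    rows = []
    for ch in code:
        row = [0] * len(vocab)
        row[vocab[ch]] = 1  # raises KeyError for characters outside the vocabulary
        rows.append(row)
    return rows


code = "def f(v):\n    v.append(0)\n"
vocab = build_vocab([code])
encoded = one_hot_encode(code, vocab)
print(len(encoded), "characters x", len(vocab), "vocabulary entries")
```

The resulting matrix has one row per character, which is the sequence a recurrent encoder would consume step by step.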

Encoder: a character-level RNN that uses bidirectional gated recurrent units (GRUs). It takes the sequence of characters in a code submission as input and outputs its vector embedding.

Decoder: a multi-layer perceptron (MLP) with two fully connected layers. It takes the vector embedding as input and predicts the pass/fail outcome of each of the 100 unit tests.
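The decoder's forward pass can be sketched in plain Python as two fully connected layers that map an embedding to 100 pass probabilities. The layer sizes, the ReLU hidden activation, the sigmoid output, and the random weights are assumptions made for illustration; only the two-layer structure and the 100 outputs come from the poster.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, bias)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Create a fully connected layer with small random weights and zero bias."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x: List[float], layer: Layer) -> List[float]:
    """Compute W x + b for one input vector."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def decode(embedding: List[float], layer1: Layer, layer2: Layer) -> List[float]:
    """Two fully connected layers: ReLU hidden layer, sigmoid output per unit test."""
    hidden = [max(0.0, v) for v in linear(embedding, layer1)]
    return [1.0 / (1.0 + math.exp(-v)) for v in linear(hidden, layer2)]


rng = random.Random(0)
EMBED_DIM, HIDDEN_DIM, NUM_TESTS = 16, 32, 100  # only NUM_TESTS is from the poster
layer1 = init_layer(EMBED_DIM, HIDDEN_DIM, rng)
layer2 = init_layer(HIDDEN_DIM, NUM_TESTS, rng)

embedding = [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]  # stand-in for encoder output
probs = decode(embedding, layer1, layer2)
predictions = [1 if p >= 0.5 else 0 for p in probs]
print(len(predictions), "pass/fail predictions")
```

In a trained model the embedding would come from the encoder rather than a random generator, and the weights would be learned jointly with it.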

  

The entire model (encoder + decoder) is trained together to predict the unit test output (pass/fail) of the code submissions in our training set.

Although the encoder is the only part of the model that we use in our analysis, the decoder helps the encoder learn good embeddings that capture the unit test pass/fail information.
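The poster does not name the training loss or say how accuracy is computed. One plausible setup, shown here purely as an assumption, is a binary cross-entropy averaged over unit tests, with accuracy measured as the fraction of individual pass/fail predictions that match after thresholding at 0.5.

```python
import math
from typing import List


def binary_cross_entropy(probs: List[float], targets: List[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over unit tests (assumed loss, not confirmed by the poster)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def unit_test_accuracy(probs: List[float], targets: List[int], threshold: float = 0.5) -> float:
    """Fraction of unit tests whose thresholded prediction matches the true outcome."""
    correct = sum(1 for p, t in zip(probs, targets) if (p >= threshold) == (t == 1))
    return correct / len(targets)


# Made-up predictions and outcomes for four unit tests.
probs = [0.9, 0.2, 0.6, 0.4]
targets = [1, 0, 0, 1]
loss = binary_cross_entropy(probs, targets)
acc = unit_test_accuracy(probs, targets)
print(f"loss={loss:.4f} accuracy={acc:.2f}")
```

Because the loss depends on the embedding through the decoder, minimizing it pushes the encoder to place submissions with similar pass/fail patterns near each other.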

 

Final Training Accuracy: 90%
Final Validation Accuracy: 84%
Final Test Accuracy: 85%


ANALYSIS:

Code Clustering:  

Distances in the embedding space appear to be meaningful. We find seven tight clusters of correct solutions (dimensionality reduction via t-SNE):

  

Embeddings of words [1] often have structures with analogies like "king - man + woman = queen". Is there evidence of similar analogies in our code embedding?

In this example, we: 

1. Embed the first three pieces of code 

2. Compute the addition and subtraction of the vectors 

3. Find the code submission closest to the resulting vector

This result suggests that direction vectors in the embedding space are meaningful, and can represent complex programming concepts.
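The three analogy steps can be sketched with toy vectors. The 2-D embeddings, the submission names, the use of Euclidean distance, and the choice to exclude the three input submissions from the search are all assumptions for illustration; the poster does not specify the distance metric.

```python
import math
from typing import Dict, Iterable, List


def analogy_vector(a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Compute a - b + c element-wise."""
    return [x - y + z for x, y, z in zip(a, b, c)]


def euclidean(u: List[float], v: List[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def nearest_submission(query: List[float],
                       embeddings: Dict[str, List[float]],
                       exclude: Iterable[str] = ()) -> str:
    """Return the name of the submission whose embedding is closest to the query."""
    skipped = set(exclude)
    candidates = {k: v for k, v in embeddings.items() if k not in skipped}
    return min(candidates, key=lambda name: euclidean(query, candidates[name]))


# Step 1: embeddings of the submissions (made-up 2-D values).
embeddings = {
    "forward_for": [1.0, 0.0],
    "forward_while": [1.0, 1.0],
    "backward_for": [-1.0, 0.0],
    "backward_while": [-1.0, 1.0],
    "unrelated": [0.0, -2.0],
}

# Step 2: forward_while - forward_for + backward_for.
query = analogy_vector(embeddings["forward_while"],
                       embeddings["forward_for"],
                       embeddings["backward_for"])

# Step 3: closest submission to the resulting vector.
result = nearest_submission(query, embeddings,
                            exclude=("forward_while", "forward_for", "backward_for"))
print(result)
```

In this toy layout one axis plays the role of iteration direction and the other of loop style, so the arithmetic lands on the backward `while` solution.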

Key References:
[1] Mikolov et al. (2013) Distributed Representations of Words and Phrases and their Compositionality
[2] Pennington et al. (2014) GloVe: Global Vectors for Word Representation
[3] Cheng et al. (2018) Deep Convolutional AutoEncoder-based Lossy Image Compression
[4] Keuning et al. (2019) A Systematic Literature Review of Automated Feedback Generation for Programming Exercises

Acknowledgements: We thank Prof. Andrew Petersen for the data and the helpful feedback. We thank Haotian Yang for some auxiliary model results.

Can we use code embeddings to predict student understanding of concepts?

Can we use code embeddings to help give students feedback?

Green and black clusters at the bottom left contain solutions that iterated through the list forwards.

All five clusters in the upper right contain solutions that iterated through the list backwards.

Turquoise and red clusters differ only in whitespace.

Robert Bazzocchi (1), Micah Flemming (2), Lisa Zhang (2)
(1) University of Toronto, (2) University of Toronto Mississauga

Can we use the embeddings to make predictions about other features of the code?

Task 1 Error Identification: Identify whether a code submission throws a Python exception.

Task 2 Error Classification: Identify what type of error a code submission throws. We grouped the errors into four classes: No error, IndexError, RecursionError or Timeout, and Other.
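The four-class grouping for Task 2 could be expressed as a small labeling function. This is a sketch under assumptions: the helper names are hypothetical, and Python's built-in `TimeoutError` stands in for whatever timeout mechanism the actual test harness used, which the poster does not describe. The Task 1 label follows directly from the Task 2 label.

```python
from typing import Callable, Optional


def classify_error(exc: Optional[BaseException]) -> str:
    """Map an exception (or None) to one of the four Task 2 classes."""
    if exc is None:
        return "No error"
    if isinstance(exc, IndexError):
        return "IndexError"
    # TimeoutError is a stand-in; a real harness would detect timeouts itself.
    if isinstance(exc, (RecursionError, TimeoutError)):
        return "RecursionError or Timeout"
    return "Other"


def run_and_classify(func: Callable[[list], object], arg: list) -> str:
    """Run a submission on one input and return its Task 2 class."""
    try:
        func(arg)
    except Exception as exc:
        return classify_error(exc)
    return classify_error(None)


def throws_error(label: str) -> bool:
    """Task 1 label derived from the Task 2 class."""
    return label != "No error"


def recurse_forever(lst):
    return recurse_forever(lst)


labels = [
    run_and_classify(lambda lst: lst.append(1), []),  # runs cleanly
    run_and_classify(lambda lst: lst[5], []),         # index out of range
    run_and_classify(recurse_forever, []),            # exceeds recursion limit
    run_and_classify(lambda lst: lst + 1, []),        # TypeError
]
print(labels)
```

Labels produced this way would serve as targets for a classifier that takes the unit-test vector, the raw code, or the embedding as input.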

Auxiliary (New) Tasks: Test Accuracy

Input        Task 1   Task 2
Unit test    79%      62%
Code         81%      75%
Embedding    91%      87%

[Poster section headings: Research Questions, Background, Data, Future Work, Training, Code Clustering, Analogies]
