Generative Grading

CS398

Four Prototypical Trajectories

Check out this assignment on code.org

The Code.org Dataset

● Students learning nested loops

● 50k students with 1.5 million submissions to a curriculum of 8 exercises.

● 800 human labels across 2 of the exercises.

P4

https://studio.code.org/s/course4/stage/10/puzzle/4

Can you write an algorithm to give auto-feedback for this assignment?

[Figure: probability mass (log scale, 1E-6 to 1E+0) vs. rank order (log scale, 1E+0 to 1E+5)]

Edit distance is meaningless, Code is Zipfian

[Figure: Code Zipf Plot. Probability mass (10^0 down to 10^-6) vs. rank order (10^0 up to 10^5), both on log scales.]

$f(k) = \dfrac{1/k^{s}}{\sum_{n=1}^{N} (1/n^{s})}$

Exponential combination of decisions. Super fat tailed. Everything looks unique
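As a rough illustration of how fat this tail is, the sketch below evaluates the Zipf form, f(k) proportional to 1/k^s and normalized over N ranks, then compares the mass on the most common programs with the mass far down the ranking. The values N = 100,000 and s = 1 are assumptions chosen for illustration; they are not fitted to the Code.org data.

```python
def zipf_pmf(num_ranks: int, s: float) -> list[float]:
    """Return [f(1), ..., f(N)] with f(k) = (1 / k**s) / sum_{n=1..N} (1 / n**s)."""
    weights = [1.0 / (k ** s) for k in range(1, num_ranks + 1)]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]


# Illustrative values only: 100,000 distinct programs, exponent s = 1.
pmf = zipf_pmf(num_ranks=100_000, s=1.0)

head_mass = sum(pmf[:10])     # mass on the 10 most common programs
tail_mass = sum(pmf[1000:])   # mass on everything past rank 1,000

print(f"top 10 programs:   {head_mass:.3f}")
print(f"beyond rank 1,000: {tail_mass:.3f}")
```

Under these assumed values the programs past rank 1,000 together carry more mass than the ten most common ones, which is one way to read "everything looks unique".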

Hard Problem

1 million unique solutions to programming Linear Regression

Brute force solution?

WWW 2014

[Figure: Code Zipf Plot for four datasets (rank axis from 1 to 10000, log scale). Panels: Code.org A, Stanford A, Coursera B, Stanford C.]

Evaluation Task

Stanford TAs label 800 submissions

    import code.org.*;

    public class MySoln {
      public void run() {
        move(50);
        for(int i=0; i<4; i++){
          if(frontIsClear()) {
            turnLeft(90);
          }
          for(int j=0; j<i; i++){
            move(i * 20);
            turnRight(120);
            move(10);
          }
        }
      }
    }

Traditional Deep Learning Doesn’t Work

[Figure: feedback F1 score (0.0 to 1.0) for each exercise, from the first problem (P1) to the last problem (P8). Task: label student code. Series: Old Guard, Humans.]
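As a reminder of what a feedback F1 score measures, here is a minimal sketch that compares predicted feedback labels with human labels for each submission and pools the counts (micro-averaged F1). The label names and the choice of micro-averaging are assumptions for illustration; the slides do not say how F1 was aggregated.

```python
def micro_f1(predicted: list[set[str]], human: list[set[str]]) -> float:
    """Micro-averaged F1 over per-submission sets of feedback labels."""
    true_pos = sum(len(p & h) for p, h in zip(predicted, human))
    false_pos = sum(len(p - h) for p, h in zip(predicted, human))
    false_neg = sum(len(h - p) for p, h in zip(predicted, human))
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for three submissions.
human = [{"loop-bounds"}, {"loop-bounds", "turn-angle"}, set()]
predicted = [{"loop-bounds"}, {"turn-angle"}, {"turn-angle"}]

score = micro_f1(predicted, human)
print(f"micro F1: {score:.3f}")
```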

[Figure: feedback F1 score (0.0 to 1.0) for each exercise, from the first problem (P1) to the last problem (P8). Task: label student code. Series: Old Guard, Humans, Deep Learning.]

Inaccurate, Uninterpretable, and Data Hungry

[Figure: program embedding model. Labels: program tree (run, while, cond, body, putBeeper, move); raw precondition; raw postcondition; encoder; multiply by program matrix; decoder; prediction loss; autoencoding loss.]

Piech et al., ICML 2014

We need one-shot learning

We need verifiability

We need a strategy! How are we going to solve this problem???

[Suspense]

Taste of the future of AI

Humans Don’t Need Much Data

Single training example:

Test set:

Bayesian Program Learning

Lake et al., "Human-level concept learning through probabilistic program induction"

Imagine Students

• Struggles with double for loops

• Confuses logic for deleting bricks

Infer ability and choices from code

A student's "ability"

A student's "choices"

The resulting code

This is easy and exponential

This is hard and linear
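The forward direction of this picture (ability, then choices, then code) is short to write down, while recovering choices from code is the inverse problem. The sketch below sidesteps the inverse with a plain dictionary lookup over simulated students, which is a stand-in technique and not the method from the slides. The ability scale, choice names, probabilities, and code templates are all hypothetical.

```python
import random


def sample_student(rng: random.Random) -> tuple[str, dict[str, str]]:
    """Forward model: ability -> choices -> code. Returns (code, choices)."""
    ability = rng.random()  # hypothetical scalar ability in [0, 1)

    # Assumption: stronger students are more likely to pick the correct option.
    loop = "nested" if rng.random() < ability else "single"
    angle = "90" if rng.random() < 0.5 + 0.5 * ability else "120"

    if loop == "nested":
        code = f"For(15, 300, 15) {{ Repeat(4) {{ Move(Counter) TurnLeft({angle}) }} }}"
    else:
        code = f"Repeat(4) {{ Move(15) TurnLeft({angle}) }}"
    return code, {"loop": loop, "angle": angle}


def build_lookup(num_samples: int, seed: int = 0) -> dict[str, dict[str, str]]:
    """Simulate students and index their choices by the code they produced."""
    rng = random.Random(seed)
    lookup = {}
    for _ in range(num_samples):
        code, choices = sample_student(rng)
        lookup[code] = choices
    return lookup


lookup = build_lookup(num_samples=1000)
submission = "Repeat(4) { Move(15) TurnLeft(120) }"
print(lookup.get(submission))  # choices if this program was generated, else None
```

With only two decisions the lookup covers every program; the number of programs grows exponentially with the number of decisions, which is why a learned model would normally replace the dictionary.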

Generative Understanding

[Figure: feedback F1 score (0.0 to 1.0) for each exercise, from the first problem (P1) to the last problem (P8). Task: label student code. Series: Old Guard, Humans, Deep Learning, Generative Understanding. Annotation: Zero-Shot Learning.]

• Struggles with double for loops

• Confuses logic for deleting bricks

In simple terms: generate a ton of our own labelled examples. It’s an unreasonably good way to start to hit high performance.

How many decisions with 3 options to get to 10K labelled programs?
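One way to work this out, assuming the decisions are independent and every combination of options yields a distinct program: n decisions with 3 options each give 3^n programs, so we need the smallest n with 3^n at least 10,000. Since 3^8 = 6,561 falls short and 3^9 = 19,683 clears it, nine decisions suffice. The helper name below is hypothetical.

```python
def decisions_needed(target_programs: int, options_per_decision: int) -> int:
    """Smallest n with options_per_decision ** n >= target_programs."""
    n = 0
    total = 1
    while total < target_programs:
        total *= options_per_decision
        n += 1
    return n


n = decisions_needed(target_programs=10_000, options_per_decision=3)
print(n, 3 ** n)
```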

How do I write said grammars??

ideaToText

Teachers Articulate N misconceptions

This is code for a single decision point

Give a name to the choice that the student is making

How do those choices translate into grades?

What does the code look like? Often evokes other decision points
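A single decision point along these lines might look like the sketch below. It is self-contained and deliberately does not use or reproduce the ideaToText API; the class name, method names, option names, weights, and rubric labels are all hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SquareLoopDecision:
    """One decision point: how does the student draw the square?"""

    # Name the choice and give each option a prior weight (made-up values).
    options: dict[str, float] = field(
        default_factory=lambda: {"repeat": 0.6, "unrolled": 0.3, "wrong_count": 0.1}
    )

    # Map each option to the feedback label it implies (None means no issue).
    rubric: dict[str, Optional[str]] = field(
        default_factory=lambda: {
            "repeat": None,
            "unrolled": "does-not-use-repeat",
            "wrong_count": "wrong-repeat-count",
        }
    )

    def choose(self, rng: random.Random) -> str:
        """Sample one option name according to the prior weights."""
        names = list(self.options)
        weights = list(self.options.values())
        return rng.choices(names, weights=weights, k=1)[0]

    def render(self, option: str) -> str:
        """Produce the code for an option. A fuller grammar would call other
        decision points here instead of hard-coding the body."""
        body = "Move(Counter) TurnLeft(90)"
        if option == "repeat":
            return f"Repeat(4) {{ {body} }}"
        if option == "unrolled":
            return " ".join([body] * 4)
        return f"Repeat(3) {{ {body} }}"


rng = random.Random(0)
decision = SquareLoopDecision()
option = decision.choose(rng)
print(decision.render(option), "->", decision.rubric[option])
```

Keeping the option names, the rubric, and the rendering in one small class mirrors the three questions above: what is the choice called, what does it mean for grading, and what does the code look like.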

Starter Code in Blocks

Solution in Blocks

Starter Code in Pseudocode

    For(???, ???, ???) {
    }

Solution in Pseudocode

    For(15, 300, 15) {
      Repeat(4) {
        Move(Counter)
        TurnLeft(90)
      }
    }
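To see what the solution draws, the following sketch traces it with integer headings. It assumes that For(15, 300, 15) counts from 15 to 300 inclusive in steps of 15, and that the artist starts at the origin facing east; neither detail is stated on the slide.

```python
def trace_solution() -> list[list[tuple[int, int]]]:
    """Return the corner points visited while drawing each square."""
    # Unit headings in counter-clockwise order: east, north, west, south.
    headings = [(1, 0), (0, 1), (-1, 0), (0, -1)]
    x, y, facing = 0, 0, 0
    squares = []
    for counter in range(15, 300 + 1, 15):              # For(15, 300, 15)
        corners = [(x, y)]
        for _ in range(4):                              # Repeat(4)
            dx, dy = headings[facing]
            x, y = x + dx * counter, y + dy * counter   # Move(Counter)
            facing = (facing + 1) % 4                   # TurnLeft(90)
            corners.append((x, y))
        squares.append(corners)
    return squares


squares = trace_solution()
print(len(squares), squares[0])
```

Each pass of the inner loop closes a square and returns to the starting corner, so the outer loop draws squares of growing side length that share one corner.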

First challenge: no solution or yes solution.

Second challenge: draw square using repeat or unrolled
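These two challenges can be read as nested decision points: the second only arises when the first comes out as "yes solution". A minimal sketch that enumerates every outcome together with its choices, using hypothetical choice names and code templates:

```python
def enumerate_programs() -> list[tuple[str, dict[str, str]]]:
    """List every (code, choices) pair reachable from the two decisions."""
    programs = []
    side = "Move(Counter) TurnLeft(90)"

    # First decision: did the student produce any solution at all?
    for has_solution in ("no", "yes"):
        if has_solution == "no":
            # Assumed rendering: the untouched starter code.
            programs.append(("For(???, ???, ???) {}", {"solution": "no"}))
            continue

        # Second decision, reached only when there is a solution:
        # draw the square with Repeat or with four unrolled copies.
        for square in ("repeat", "unrolled"):
            if square == "repeat":
                body = f"Repeat(4) {{ {side} }}"
            else:
                body = " ".join([side] * 4)
            code = f"For(15, 300, 15) {{ {body} }}"
            programs.append((code, {"solution": "yes", "square": square}))
    return programs


programs = enumerate_programs()
for code, choices in programs:
    print(choices, "=>", code)
```

Because the second decision is conditional on the first, the grammar yields three labelled programs and not four.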

Pro Tips

Keep your grammar clean!

One idea per class

Have a knowledge prior

Decisions are not independent

Be clever in your use of outcomes

Template Choices
