detecting the learning value of items in a randomized problem set

1

Detecting the Learning Value of Items In a Randomized Problem Set

Zachary A. Pardos, Neil T. Heffernan

Worcester Polytechnic Institute

Department of Computer Science

2

The Problem

• 100s of items of learning content

• 1000s of students’ responses

What is the learning value of content in an ITS (intelligent tutoring system)? Does it promote learning? Ways to find out:

• Run an RCE

• Data mine responses (using this method)

3

Dataset

[Figure: a tutor question showing the main problem and its hint]

• Student main problem responses (correct/incorrect) to 25 problem sets of 2, 3, and 4 questions

• Questions within a problem set relate to the same skill

• Between 160 and 800 students completed each problem set (data from the 2006-2007 school year)

• 2,400 students total with 54,000 responses (14-16 year olds)

• Questions in the problem sets were presented in a randomized order (required for this analysis)

4

Confound

[Figure: a tutor question showing the main problem and its hint]

• Since only main question responses are being analyzed, the learning from the main question is confounded with the learning from the scaffolding and hints of the problem.

• Learning could be attributed to:
  – The immediate feedback to the main problem of question 1
  – The scaffolding of question 1
  – Applying concepts from question 1 to the next main problem

5

Model

Modeling or measuring learning requires modeling knowledge

• Knowledge Tracing is used to model learning

[Figure: Knowledge Tracing network for a three-question sequence]

• Observables (question answers): incorrect, correct, correct

• Latent (skill knowledge, dichotomous): S → S → S

• Parameters (probability of learning): P(Skill: 0 → 1) between each pair of opportunities

• Parameters (guess/slip): P(correct | Skill = 0), P(incorrect | Skill = 1)

Parameters can be learned with the EM algorithm! .. ?
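The Knowledge Tracing update described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `kt_sequence_likelihood` and the parameter values in the usage note are assumptions chosen for the example, and EM fitting itself is not shown.

```python
def kt_sequence_likelihood(responses, prior, learn, guess, slip):
    """Likelihood of a response sequence under standard Knowledge Tracing.

    responses: list of 1 (correct) or 0 (incorrect), in the order seen.
    prior: P(Skill = 1) before the first question.
    learn: P(Skill: 0 -> 1) between opportunities.
    guess: P(correct | Skill = 0); slip: P(incorrect | Skill = 1).
    """
    p_known = prior
    likelihood = 1.0
    for r in responses:
        # Probability of a correct answer given current belief about the skill
        p_correct = p_known * (1 - slip) + (1 - p_known) * guess
        p_obs = p_correct if r == 1 else 1 - p_correct
        if p_obs == 0:
            return 0.0  # the observed response is impossible under these parameters
        likelihood *= p_obs
        # Bayes update of the skill belief given the observed response
        if r == 1:
            posterior = p_known * (1 - slip) / p_obs
        else:
            posterior = p_known * slip / p_obs
        # Learning transition before the next opportunity
        p_known = posterior + (1 - posterior) * learn
    return likelihood
```

For example, `kt_sequence_likelihood([0, 1, 1], prior=0.6, learn=0.12, guess=0.2, slip=0.1)` scores the incorrect, correct, correct sequence from the slide under illustrative parameter values. EM would search for the parameter values that maximize the product of such likelihoods over all students.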

6

Model

• Knowledge tracing assumes that the learning rate is the same between each opportunity

• Our model associates the learning rate with the particular problem that was encountered

Knowledge Tracing

[Figure: learning rates between opportunities: 0.12, 0.12]

• The learning rate between opportunities is the same regardless of which problem the student saw

Our Item Effect Model

[Figure: learning rates between opportunities: 0.11, 0.15]

• Learning rates are an attribute of specific problems

• Each learning rate must be associated with its problem in all permutations of the question order
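To make the difference from standard Knowledge Tracing concrete, here is a minimal sketch in which the learning rate applied after each opportunity depends on which item the student just saw. The function name `item_effect_likelihood` and the choice to share a single guess and slip value across items are assumptions made for brevity, not details taken from the slides.

```python
def item_effect_likelihood(order, responses, prior, learn_by_item, guess, slip):
    """Likelihood of a response sequence when each item has its own learning rate.

    order: item ids in the order the student saw them, e.g. (2, 1, 3).
    responses: 1/0 answers aligned with `order`.
    learn_by_item: dict mapping item id -> P(Skill: 0 -> 1) after that item.
    guess and slip are shared across items here (a simplifying assumption).
    """
    p_known = prior
    likelihood = 1.0
    for item, r in zip(order, responses):
        p_correct = p_known * (1 - slip) + (1 - p_known) * guess
        p_obs = p_correct if r == 1 else 1 - p_correct
        if p_obs == 0:
            return 0.0  # the observed response is impossible under these parameters
        likelihood *= p_obs
        # Posterior skill belief after observing the response
        posterior = (p_known * (1 - slip) if r == 1 else p_known * slip) / p_obs
        # The learning rate is looked up by the item just seen, not by position
        p_known = posterior + (1 - posterior) * learn_by_item[item]
    return likelihood
```

Because the rate is keyed by item, the same responses can have different likelihoods under different presentation orders, which is what lets randomized order separate one item's learning value from another's.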

7

Model

Three-question sequence permutations are modeled with shared Bayesian parameters

Also known as equivalence classes of CPTs (conditional probability tables)
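The parameter sharing can be pictured as a lookup from every learning transition in every permutation to one parameter per item. The sketch below only illustrates that bookkeeping; the function name and the `learn_rate_item_N` labels are hypothetical, and no particular Bayesian network toolkit is assumed.

```python
from itertools import permutations

def build_shared_learning_classes(items):
    """Map each learning transition in each permutation to a shared parameter class.

    Returns a dict keyed by (sequence, position), where position indexes the
    item after which learning may occur. The last position has no transition.
    """
    classes = {}
    for seq in permutations(items):
        for pos in range(len(seq) - 1):
            # All transitions that follow the same item share one parameter
            classes[(seq, pos)] = f"learn_rate_item_{seq[pos]}"
    return classes

mapping = build_shared_learning_classes([1, 2, 3])
```

With three questions there are 6 permutations and 12 learning transitions, but only 3 distinct learning-rate parameters, so every sequence contributes evidence to the same small set of parameters.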

8

Reliability measure

• Data for a problem set was randomly split into 20 equal-size bins by student

• Each bin was evaluated separately by the model

• A binomial test was used to estimate the probability of the null hypothesis, that each item is equally likely to have the highest learning rate

• i.e.: binopdf(best_choice_mode,20,0.25)

Item learning rates

Split     Item 1  Item 2  Item 3  Item 4
Split 1   0.0732  0.0267  0.0837  0.0701
...       ...     ...     ...     ...
Split 20  0.0849  0.0512  0.0550  0.0710
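The binopdf call above is the binomial probability mass function. A rough Python equivalent is sketched below, assuming the learned rates are available as one list per split; the helper name `best_item_reliability` is made up for this example.

```python
from collections import Counter
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n trials), the quantity binopdf(k, n, p) computes."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def best_item_reliability(rates_per_split):
    """Find the item that most often has the highest learning rate across splits.

    rates_per_split: list of lists, one list of item learning rates per split.
    Returns (best item number starting at 1, number of splits it won, p value).
    """
    n_splits = len(rates_per_split)
    n_items = len(rates_per_split[0])
    # Index of the item with the highest learning rate in each split
    winners = [max(range(n_items), key=lambda i: rates[i]) for rates in rates_per_split]
    best_item, wins = Counter(winners).most_common(1)[0]
    # Null hypothesis: every item is equally likely to win a given split
    return best_item + 1, wins, binomial_pmf(wins, n_splits, 1 / n_items)
```

With 20 splits and four items, an item that wins many more than the 5 splits expected under the null hypothesis yields a small p value.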

9

Method Application

Compute the learning rates of the three questions in the problem set

Which one is BEST?

Definition of BEST in this analysis: The question in a problem set that has the highest probability of learning.

10

Results

• Problem sets with four questions were analyzed and the parameters of prior, guess/slip and learning rates were learned using the described method

• The question with the highest probability of learning was identified

Problem set  Number of users  Best question  p value  prior   q1 rate  q2 rate  q3 rate  q4 rate
16           800              2              0.0652   0.6738  0.1100   0.1115   0.1017   0.1011
11           560              4              0.0170   0.5909  0.0958   0.0916   0.0930   0.1039
14           480              3              0.0170   0.6499  0.1365   0.0977   0.1169   0.1063
25           440              1              0.0652   0.7821  0.1392   0.0848   0.1157   0.1242
282          220              1              0.0039   0.7365  0.1574   0.0999   0.0991   0.1004
33           200              4              0.4394   0.7205  0.1124   0.1028   0.1237   0.1225
39           160              3              0.0652   0.6180  0.0853   0.1192   0.1015   0.0819

• The method needs validation. Another method may report different results and reliability.

• Ground truth of the parameters is necessary to validate the method and results

11

Simulation Validation

• Since the ground truth of learning rates in the real world is impossible to know, a simulation study was run

• The simulation set a variety of values for the parameters of prior, guess/slip and learning rates and then simulated user responses

• These responses could then be analyzed by the method using the same technique as was used on real data

• An error analysis could be done since the underlying simulation parameters of the data were known (did the method pick the right best question?)

• Opportunity to learn what the method can & can’t do
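A simulation of this kind can be sketched as follows. This is an illustrative generator under stated assumptions, not the study's code: the function names are hypothetical, guess and slip are shared across items, and each simulated student sees the items in a random order.

```python
import random

def simulate_student(order, prior, learn_by_item, guess, slip, rng):
    """Generate 1/0 responses for one student seeing items in the given order."""
    known = rng.random() < prior  # does the student start out knowing the skill?
    responses = []
    for item in order:
        p_correct = (1 - slip) if known else guess
        responses.append(1 if rng.random() < p_correct else 0)
        # An unknowing student may learn the skill from the item just seen
        if not known and rng.random() < learn_by_item[item]:
            known = True
    return responses

def simulate_dataset(n_students, items, prior, learn_by_item, guess, slip, seed=0):
    """Return a list of (order, responses) pairs with randomized item order."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_students):
        order = list(items)
        rng.shuffle(order)  # randomized presentation order, as the method requires
        responses = simulate_student(order, prior, learn_by_item, guess, slip, rng)
        data.append((tuple(order), responses))
    return data
```

Because `learn_by_item` is chosen by the experimenter, the item with the truly highest learning rate is known, and the method's pick can be scored as right or wrong.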

12

Simulation Results

[Figure: Frequency of detecting reliable learning rate differences. x-axis: number of users (100, 200, 500, 1000, 2000, 4000, 10000, 20000); y-axis: % of correct and reliable results (0 to 1); one series per learning rate difference range: 0.0715-0.2340, 0.0380-0.0715, 0.0165-0.0380, 0.0010-0.0165]

• More students increase the chance of a reliable result

• A larger learning difference between questions also increases the chance of a reliable result

• Of the 160 experiments evaluated, 89 were reported as reliable (56%)

• Of the 89 reported reliable results (using p < 0.05), seven were incorrect (7.8% FP)

13

Limitations

• Only problem sets of five questions or fewer can be reasonably evaluated
  – Larger problem sets become intractable to compute due to the exponential increase in nodes and permutations as question count increases
    • for a four-question set: (4+4) * 24 = 192 nodes
    • for a five-question set: (5+5) * 120 = 1,200 nodes
  – A possible optimization is to model only the sequences for which there is data

• Randomization of question order must be present to control for factors including problem difficulty and allow for detecting learning rates of all item pairs in the problem set
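The node counts quoted above follow a simple pattern: each permutation contributes one skill node and one question node per question, and there are n! permutations. A short sketch of that arithmetic (the function name is illustrative):

```python
from math import factorial

def network_node_count(n_questions):
    """Nodes needed to model every ordering of an n-question problem set.

    Each of the n! permutations has n latent skill nodes and n question nodes.
    """
    return (n_questions + n_questions) * factorial(n_questions)

for n in range(2, 7):
    print(n, network_node_count(n))
```

Running the loop shows how quickly the network grows once the question count passes five, which is the practical limit the slide describes.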

14

Contribution

• No previous methods
  – Estimate learning rates per problem

• Allows for the best (and worst) content to be identified without RCEs
  – Extends knowledge tracing to support randomization of problem order
  – Uses permutations of sequences to estimate stable Bayesian parameters with EM

15

Conclusions & Future Work

• We think that this method, and ones built off of it, will facilitate better tutoring systems

• Randomization gives the data many of the properties of an RCE. This method can perform a similar function, but through data mining, to find what content works best

• Method could be applied to aid in improving accuracy of question skill tagging
