multiple choice test item analysis facilitator: sophia scott

Multiple Choice Test Item Analysis

Facilitator: Sophia Scott

Workshop Format

1. What is Multiple Choice Test Item Analysis?

2. Background information

3. Fundamentals

4. Guided Practice

5. Individual Practice

What is Multiple Choice Test Item Analysis?

Statistically analyzing your multiple choice test items so that you can ensure that your items are effectively evaluating student learning.

Background information

• What does a test score mean?
• Reliability and Validity
• Norm-referenced or Criterion-referenced

What does a Test Score Mean?

• A test score reflects what you really knew (the true score) plus error (things like atmosphere, nerves, etc. that modify your true score).

• The purpose of a systematic approach to test design is to reduce error in test taking.

Reliability and Validity

• Reliability – the test scores are consistent
– Test-retest reliability (an individual's score is consistent over time)
– Inter-rater reliability (consistency of individual judges’ ratings of a performance)

• Validity – the test measured what it was supposed to measure.

You want your test to be both reliable and valid

Norm-referenced or Criterion-referenced

• Norm-referenced – defines the performance of test-takers in relation to one another. Uses the frequency distribution and can rank students. Often used to predict success, as with the GRE or GMAT.

• Criterion-referenced – defines the performance of each test taker without regard to the performance of others. The success is being able to perform a specific task or set of competencies. Uses a mastery curve.

Item analysis

How you interpret the results of a test and use individual item statistics to improve the quality of a test

Terms used

– Standard deviation (SD) – the spread of scores above and below the average score; the more the scores are spread out, the higher the SD

– Mean – average score

– N – number of items on the test

– Raw scores – actual scores

– Variance = standard deviation squared
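As a rough illustration of these terms, the sketch below computes the mean, variance, and standard deviation of the ten raw scores from the practice table used later in this workshop. It assumes the population form of the variance (dividing by the number of scores); the slides do not say whether the population or sample form is intended.

```python
import statistics

# Raw scores for students A-J, taken from the practice table in this workshop.
raw_scores = [8, 6, 6, 4, 2, 8, 10, 6, 8, 4]

mean = statistics.fmean(raw_scores)
# Assumption: population variance (divide by the number of scores).
variance = statistics.pvariance(raw_scores)
sd = statistics.pstdev(raw_scores)

print(f"mean = {mean:.2f}, variance = {variance:.2f}, SD = {sd:.2f}")
# mean = 6.20, variance = 5.16, SD = 2.27
```

Squaring the SD gives back the variance, which matches the last term in the list above.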

Fundamentals of Item Analysis

1. Were any of the items too difficult or easy?

2. Do the items discriminate between those students who really knew the material from those that did not?

3. What is the reliability of the exam?

1. Were any of the items too difficult or too easy?

• Use the Difficulty Factor of a question

– Proportion of respondents selecting the right answer to that item

D = c / n

D = difficulty factor

c = number of correct answers

n = number of respondents

• Range 0-1
• The HIGHER the difficulty factor, the easier the question is, so a value of 1 would mean all the students got the question correct and it may be too easy

Difficulty Factor

• Optimal level is .5
• To be able to discriminate between different levels of achievement, the difficulty factor should be between .3 and .7

• If you want the students to master the topic area, high difficulty values should be expected.

D = c / n
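The formula and the rules of thumb above can be sketched in Python. The function names `difficulty_factor` and `describe_difficulty` are illustrative only, and the cut-offs simply restate the .3 to .7 guideline from the slide.

```python
def difficulty_factor(correct: int, respondents: int) -> float:
    """D = c / n: proportion of respondents who answered the item correctly."""
    if respondents <= 0:
        raise ValueError("respondents must be positive")
    return correct / respondents


def describe_difficulty(d: float) -> str:
    """Label a difficulty factor using the .3 to .7 guideline from the slide."""
    if d > 0.7:
        return "may be too easy"
    if d < 0.3:
        return "may be too difficult"
    return "good"


d = difficulty_factor(correct=8, respondents=10)
print(d, describe_difficulty(d))  # 0.8 may be too easy
```

For a mastery test, where high difficulty values are expected, the "may be too easy" label would not necessarily signal a problem.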

Guided Practice

What is the D for Items 1-3?

Student  Raw score  Item 1  Item 2  Item 3  Item 4  Item 5
A        8          a       b       a       d       e
B        6          c       b       e       c       e
C        6          a       c       e       c       b
D        4          a       b       e       a       c
E        2          c       a       b       d       c
F        8          a       b       c       c       e
G        10         a       b       a       c       e
H        6          a       b       c       d       e
I        8          a       c       a       c       e
J        4          a       c       a       d       b


• Item # 1 = .8
• Item # 2 = .6
• Item # 3 = .4

What does it mean?

• Item # 1 = .8 may be too easy
• Item # 2 = .6 good
• Item # 3 = .4 good
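One way to check these values is to count correct responses directly from the table. The slides do not give the answer key, so the key for Items 1-3 used below (a, b, a) is an assumption: it is the most common response to each item, and it reproduces the D values stated above.

```python
# Responses to Items 1-3 for students A-J, copied from the practice table.
responses = {
    "A": "aba", "B": "cbe", "C": "ace", "D": "abe", "E": "cab",
    "F": "abc", "G": "aba", "H": "abc", "I": "aca", "J": "aca",
}

# Assumed answer key (not given in the slides; inferred from the stated D values).
assumed_key = "aba"

n = len(responses)  # number of respondents
difficulty = []
for item, correct_option in enumerate(assumed_key):
    # c = number of respondents who chose the keyed option for this item
    c = sum(1 for answers in responses.values() if answers[item] == correct_option)
    difficulty.append(c / n)

print(difficulty)  # [0.8, 0.6, 0.4]
```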

Individual Practice

What is the D for Items 4-5?

Student  Raw score  Item 1  Item 2  Item 3  Item 4  Item 5
A        8          a       b       a       d       e
B        6          c       b       e       c       e
C        6          a       c       e       c       b
D        4          a       b       e       a       c
E        2          c       a       b       d       c
F        8          a       b       c       c       e
G        10         a       b       a       c       e
H        6          a       b       c       d       e
I        8          a       c       a       c       e
J        4          a       c       a       d       b


• Item # 4 = .5
• Item # 5 = .6

What does it mean?

• Item # 4 = .5 optimal
• Item # 5 = .6 good

Overall, you can say that only item #1 may be too easy

2. Do the items discriminate between those students who really knew the material from those that did not?

• The Discrimination Index
– DI = (a - b) / n
– a = response frequency (correct responses) of the High group
– b = response frequency (correct responses) of the Low group
– n = number of respondents

• Point-Biserial Correlation

2. Do the items discriminate between those students who really knew the material from those that did not?

• The point-biserial correlation correlates a test-taker's performance on a single test item with their total score.

• Range +1.00 to -1.00
• Items which discriminate well are those which have difficulties between .3 and .7


• A positive coefficient means that test-takers who got the item right generally did well on the test as a whole, while those who got the item wrong generally did poorly on the test.

• A negative coefficient means that test-takers who did well on the test tended to miss the item, while those who did poorly tended to get the item right.

• A zero coefficient means that performance on the item is unrelated to the total score; an item that every test-taker got correct (or incorrect) likewise cannot discriminate.
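A minimal sketch of the point-biserial calculation follows, using the raw scores from the practice table. Two assumptions are made: option a is treated as the correct answer to Item 1 (the slides do not give the key), and the formula is the common textbook form r = (M1 - M0) / s × sqrt(p × q), with s as the population standard deviation of the total scores. The slides do not state which variant was used.

```python
import math
import statistics


def point_biserial(item_correct: list[int], total_scores: list[float]) -> float:
    """Correlate a 0/1 item result with total test scores.

    r = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean total
    scores of those who got the item right and wrong, s is the population
    standard deviation of all total scores, p is the proportion correct,
    and q = 1 - p.
    """
    right = [t for c, t in zip(item_correct, total_scores) if c == 1]
    wrong = [t for c, t in zip(item_correct, total_scores) if c == 0]
    if not right or not wrong:
        raise ValueError("cannot be computed when everyone got the item right or wrong")
    s = statistics.pstdev(total_scores)
    if s == 0:
        raise ValueError("cannot be computed when all total scores are equal")
    p = len(right) / len(item_correct)
    mean_gap = statistics.fmean(right) - statistics.fmean(wrong)
    return mean_gap / s * math.sqrt(p * (1 - p))


raw_scores = [8, 6, 6, 4, 2, 8, 10, 6, 8, 4]    # students A-J
# Item 1 responses were a c a a c a a a a a; assumed key: a
item1_correct = [1, 0, 1, 1, 0, 1, 1, 1, 1, 1]

print(round(point_biserial(item1_correct, raw_scores), 2))  # 0.48
```

Under these assumptions the result for Item 1 is positive: the eight students who chose a averaged a higher raw score than the two who did not.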


The Discrimination Index Steps

1. Rank test scores from highest to lowest, so the highest is at the top of the list

2. Define high group (top 27%)

3. Define low group (bottom 27%)

4. Calculate DI = (a - b) / n
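The steps above can be sketched as follows, using Item 1 from the practice table. Two assumptions are made: option a is treated as the correct answer (the slides do not give the key), and the 27% group size is rounded to the nearest whole student. Following the slide's definition, n is the total number of respondents; some references divide by the size of one group instead, which gives a larger value.

```python
def discrimination_index(scores, item_correct, fraction=0.27):
    """DI = (a - b) / n, following the four steps on the slide.

    scores: total score per student.
    item_correct: 1 (right) or 0 (wrong) per student for one item.
    """
    n = len(scores)
    # Assumption: round the 27% cut to the nearest whole student.
    group_size = max(1, round(fraction * n))
    # Step 1: rank from highest to lowest score (ties keep their list order).
    ranked = sorted(zip(scores, item_correct), key=lambda pair: pair[0], reverse=True)
    # Steps 2 and 3: define the high and low groups.
    high, low = ranked[:group_size], ranked[-group_size:]
    a = sum(correct for _, correct in high)  # correct responses in the high group
    b = sum(correct for _, correct in low)   # correct responses in the low group
    # Step 4: n is the total number of respondents, as defined on the slide.
    return (a - b) / n


raw_scores = [8, 6, 6, 4, 2, 8, 10, 6, 8, 4]    # students A-J
item1_correct = [1, 0, 1, 1, 0, 1, 1, 1, 1, 1]  # assumed key for Item 1: a

print(discrimination_index(raw_scores, item1_correct))  # 0.1
```

With ten students the groups contain three students each. Students A, F, and I tie at a raw score of 8, so which of them lands in the high group depends on how ties are broken; for Item 1 this does not change the result because all three answered a.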

What does it mean?

Point Biserial

• Item # 1 = .48
• Item # 2 = .43
• Item # 3 = .47
• Item # 4 = .62
• Item # 5 = .83

Item 5 is close to not discriminating

Overall the test does discriminate

3. What is the reliability of the exam?

1. Kuder- Richardson 20

2. Kuder-Richardson 21

3. Cronbach alpha

3. What is the reliability of the exam?

• Range 0-1
• A higher value indicates a stronger relationship between the items and the test
• A lower value indicates a weaker relationship between the items and the test

r = [n / (n - 1)] × [(s² - Σ p_i q_i) / s²]

n = number of items on the test

s = standard deviation of the total test scores (s² is their variance)

p_i = proportion of correct responses to item i

q_i = 1 - p_i
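A minimal sketch of the Kuder-Richardson 20 calculation follows, assuming the usual form of the formula in which the sum of p × q is subtracted from the score variance. The 0/1 matrix is built from the practice table using an assumed answer key (a, b, a, c, e), which the slides do not state. Total scores are counted in items correct and the population variance is used; both choices are assumptions, so the result need not match values reported elsewhere in the workshop.

```python
import statistics


def kr20(score_matrix):
    """KR-20 for a list of student rows, each a list of 0/1 item scores."""
    n_items = len(score_matrix[0])
    n_students = len(score_matrix)
    totals = [sum(row) for row in score_matrix]
    variance = statistics.pvariance(totals)  # assumption: population variance
    if n_items < 2 or variance == 0:
        raise ValueError("needs at least two items and some spread in total scores")
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in score_matrix) / n_students  # proportion correct
        sum_pq += p * (1 - p)                                  # p_i * q_i
    return (n_items / (n_items - 1)) * (variance - sum_pq) / variance


# Rows are students A-J, columns are Items 1-5 (1 = matches the assumed key).
scored = [
    [1, 1, 1, 0, 1],  # A
    [0, 1, 0, 1, 1],  # B
    [1, 0, 0, 1, 0],  # C
    [1, 1, 0, 0, 0],  # D
    [0, 0, 0, 0, 0],  # E
    [1, 1, 0, 1, 1],  # F
    [1, 1, 1, 1, 1],  # G
    [1, 1, 0, 0, 1],  # H
    [1, 0, 1, 1, 1],  # I
    [1, 0, 1, 0, 0],  # J
]

print(round(kr20(scored), 2))  # 0.5
```

KR-20 is a single value for the whole test, computed from all items together.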

What does it mean?

Kuder 20

• Item # 1 = .88
• Item # 2 = .63
• Item # 3 = .40
• Item # 4 = .76
• Item # 5 = .89

Item 3 may not relate as well

Overall the test is reliable

Review

Purpose - statistically analyze multiple choice test items to ensure items are effectively evaluating student learning.

1. Were any of the items too difficult or easy? (Difficulty index)

2. Do the items discriminate between those students who really knew the material from those that did not? (Discrimination index or Point Biserial)

3. What is the reliability of the exam? (Kuder 20)

More Practice…

Item  Difficulty  Discrimination  Reliability
# 1   .28         .40             .80
# 2   .30         .68             .76
# 3   .80         .78             .70
# 4   .10         -1.00           .20

Thank you for your Time

Any Questions or Comments?
