psychometrics 101: know what your assessment data is telling you
TRANSCRIPT
PSYCHOMETRICS 101: KNOW WHAT YOUR ASSESSMENT DATA IS TELLING YOU
Executive Director of Sales, ExamSoft Worldwide, Inc.
Formerly Program Manager for Evaluation and Assessment at The Ohio State University College of Medicine
If you have a question
Please pose questions to the presenter through the “Questions” field of the GoToWebinar tool on the right side of your screen.
All questions will be addressed at the conclusion of the presentation.
THE OVERVIEW
AGENDA
• Overview
• Types of stats
• Interpreting the item analysis report
• Examples
• General statistical guidelines
Item analysis is not a foolproof answer to these questions.
But… YOU HAVE TO START SOMEWHERE.
“Where do I start?”
“Is this a good or bad question? Can statistics even tell me that?”
THE OVERVIEW
“How can I reconcile what I know about my assessment’s past with what the data is telling me?”
Item Difficulty/p Value: a decimal representation of difficulty based on the percentage of students who answered the item correctly. The lower the decimal, the higher the difficulty.
Upper 27%: of only the top 27% of scorers, the percentage of those students who answered the item correctly.
Lower 27%: of only the bottom 27% of scorers, the percentage of those students who answered the item correctly.
TYPES OF STATS
Common Stats:
Discrimination Index: calculated by subtracting the percentage of the bottom 27% group that answered the item correctly from the percentage of the top 27% group that answered it correctly. The discrimination index measures whether the item discriminates between the highest and lowest performers.
Point-Biserial: a discrimination statistic that indicates whether doing well on that specific item correlates with doing well on the exam overall; in other words, whether the item was a good or bad predictor of overall performance on the exam.
TYPES OF STATS
Common Stats:
Item Difficulty: range 1.0 to 0.0
Discrimination Index: range 1.0 to -1.0
Point Biserial: range 1.0 to -1.0
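All four statistics above can be computed directly from a 0/1 scoring matrix. The sketch below is illustrative (function and variable names are my own, not ExamSoft’s); the 27% split and the point-biserial formula rpb = (M_correct − M_total) / SD_total × √(p/q) follow the standard definitions:

```python
# Illustrative item-analysis sketch: scores[s][i] = 1 if student s answered item i correctly.
def item_stats(scores, item):
    n = len(scores)
    totals = [sum(row) for row in scores]               # each student's total score
    p = sum(row[item] for row in scores) / n            # item difficulty (p value)

    # Rank students by total score; take the bottom and top 27%.
    order = sorted(range(n), key=lambda s: totals[s])
    k = max(1, round(0.27 * n))
    lower = sum(scores[s][item] for s in order[:k]) / k   # Lower 27%
    upper = sum(scores[s][item] for s in order[-k:]) / k  # Upper 27%
    disc = upper - lower                                  # discrimination index

    # Point-biserial: correlation between the 0/1 item score and the total score.
    mean_t = sum(totals) / n
    sd_t = (sum((t - mean_t) ** 2 for t in totals) / n) ** 0.5
    n_correct = sum(row[item] for row in scores)
    mean_correct = sum(totals[s] for s in range(n) if scores[s][item]) / max(1, n_correct)
    rpb = (mean_correct - mean_t) / sd_t * (p / (1 - p)) ** 0.5 if sd_t and 0 < p < 1 else 0.0
    return p, upper, lower, disc, rpb
```

For example, with four students and three items, `item_stats` returns the p value, upper/lower 27% proportions, discrimination index, and point-biserial for any single item.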
ITEM ANALYSIS REPORT
But with any statistic it is important to remember: context matters!
6 factors to always consider when evaluating item performance:
1. Cheating
2. Return on investment
3. Conflicting content/faculty
4. “Six degrees of Kevin Bacon”
5. Author intent
6. Content delivery method
“Stats alone cannot tell the whole story.”
EXTRANEOUS FACTORS
Item #1   Correct Answer: E   Correct Responses: 178
Diff (p) 0.98   Upper 27% 100.00%   Lower 27% 96.15%   Disc. Index 0.04   Point Biserial (rpb) 0.10

Response frequencies (* indicates correct answer):
Option   Responses   % Selected   Point Biserial   Disc. Index   Upper 27%   Lower 27%
A        0           0.00          0.00             0.00         0.00        0.00
B        1           0.55          0.02             0.00         0.00        0.00
C        1           0.55          0.00             0.00         0.00        0.00
D        1           0.55         -0.10            -0.02         0.00        0.02
E *      178         98.34         0.10             0.02         1.00        0.98
ITEM ANALYSIS EXAMPLES
Item #7   Correct Answer: D   Correct Responses: 120
Diff (p) 0.66   Upper 27% 82.00%   Lower 27% 46.15%   Disc. Index 0.36   Point Biserial (rpb) 0.28

Response frequencies (* indicates correct answer):
Option   Responses   % Selected   Point Biserial   Disc. Index   Upper 27%   Lower 27%
A        7           3.87         -0.11            -0.04         0.00        0.04
B        17          9.39         -0.19            -0.19         0.00        0.19
C        28          15.47        -0.12            -0.09         0.12        0.21
D *      120         66.30         0.28             0.36         0.82        0.46
E        9           4.97         -0.07            -0.04         0.06        0.10
ITEM ANALYSIS EXAMPLES
Item #22   Correct Answer: D   Correct Responses: 66
Diff (p) 0.36   Upper 27% 52.00%   Lower 27% 26.92%   Disc. Index 0.25   Point Biserial (rpb) 0.22

Response frequencies (* indicates correct answer):
Option   Responses   % Selected   Point Biserial   Disc. Index   Upper 27%   Lower 27%
A        35          19.34        -0.09            -0.15         0.10        0.25
B        34          18.78         0.04             0.07         0.24        0.17
C        21          11.60        -0.20            -0.15         0.04        0.19
D *      66          36.46         0.22             0.25         0.52        0.27
E        25          13.81        -0.06            -0.02         0.10        0.12
ITEM ANALYSIS EXAMPLES
Item #24   Correct Answer: C   Correct Responses: 94
Diff (p) 0.52   Upper 27% 64.00%   Lower 27% 42.31%   Disc. Index 0.22   Point Biserial (rpb) 0.18

Response frequencies (* indicates correct answer):
Option   Responses   % Selected   Point Biserial   Disc. Index   Upper 27%   Lower 27%
A        61          33.70        -0.10            -0.12         0.26        0.38
B        21          11.60        -0.19            -0.13         0.04        0.17
C *      94          51.93         0.18             0.22         0.64        0.42
D        5           2.76          0.12             0.04         0.06        0.02
E        0           0.00          0.00             0.00         0.00        0.00
ITEM ANALYSIS EXAMPLES
Item #34   Correct Answer: B   Correct Responses: 129
Diff (p) 0.71   Upper 27% 90.00%   Lower 27% 55.77%   Disc. Index 0.34   Point Biserial (rpb) 0.31

Response frequencies (* indicates correct answer):
Option   Responses   % Selected   Point Biserial   Disc. Index   Upper 27%   Lower 27%
A        0           0.00          0.00             0.00         0.00        0.00
B *      129         71.27         0.31             0.34         0.90        0.56
C        1           0.55         -0.16            -0.02         0.00        0.02
D        30          16.57        -0.25            -0.23         0.06        0.29
E        21          11.60        -0.11            -0.09         0.04        0.13
ITEM ANALYSIS EXAMPLES
Desired statistical ranges: opinions differ, but the most commonly used are:
• Item Difficulty/p Value: acceptable item difficulty is not a set number but rather depends on the question’s intention. If you intended the item to be a mastery item, you want the difficulty as close to 1.00 as possible. If you intended a discriminating question, significantly lower levels are acceptable.
• Upper 27%: if less than 60% of your top performers answer a question correctly, further analysis is needed to see whether there are issues with the question. Also, if fewer of your upper 27% answer a question correctly than your lower 27%, there could be an issue.
• Lower 27%: generally you never want it to be higher than the upper 27%. Values as low as 0% can be acceptable, and values as high as 100% can be acceptable if it is a mastery question.
GENERAL GUIDELINES
Desired statistical ranges: opinions differ, but the most commonly used are:
• Discrimination Index: some set specific cutoffs for acceptable and unacceptable values; I would argue the more accurate guide is that the lower the p value, the higher the discrimination index needs to be. Generally, at 0.2 the item is considered to have discriminated; less than that is considered no discrimination; 0.3 or greater is considered highly discriminating.
• Point-Biserial: as with the discrimination index, some set specific cutoffs for acceptable and unacceptable values. Generally, 0.2 and above is considered to discriminate and to have a positive association with overall performance on the assessment; lower levels are acceptable for mastery questions, and 0.3+ is desired for discriminating questions.
GENERAL GUIDELINES
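The rules of thumb above can be collected into a simple screening sketch. The cutoffs (0.2/0.3 for discrimination, 60% for the upper group, mastery exceptions) come straight from the guidelines; the function and flag wording are illustrative, not part of any ExamSoft report:

```python
# Flag items for review using the guideline cutoffs above (names illustrative).
def screen_item(p, upper, lower, disc_index, point_biserial, mastery=False):
    flags = []
    if upper < 0.60:
        flags.append("fewer than 60% of top performers correct")
    if lower > upper:
        flags.append("lower 27% outscored upper 27%")
    if not mastery:                      # low discrimination is acceptable on mastery items
        if disc_index < 0.20:
            flags.append("no discrimination (index < 0.2)")
        if point_biserial < 0.20:
            flags.append("weak item-total correlation (rpb < 0.2)")
    return flags
```

Run against the examples above, Item #1 passes when treated as a mastery item, while Item #22 is flagged only because fewer than 60% of top performers answered it correctly.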
KR-20
• Used as an overall measure of reliability for the assessment.
• Measured on a scale from 0.0 to 1.0, with 0.0 being very poor and 1.0 being excellent.
• Quick notes:
1. Heavily influenced by the number of questions in the assessment
2. Heavily influenced by the number of students taking the assessment
3. The combination can FREQUENTLY lead to false positive and false negative KR-20 values.
GENERAL GUIDELINES
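KR-20 can be computed from the same 0/1 scoring matrix using the standard formula KR-20 = k/(k−1) × (1 − Σ p_i·q_i / σ²), where k is the number of items, p_i·q_i is each item’s variance, and σ² is the variance of total scores. A minimal sketch (names illustrative; population variance assumed):

```python
# KR-20 reliability sketch: scores[s][i] = 1 if student s answered item i correctly.
def kr20(scores):
    k = len(scores[0])                              # number of items
    n = len(scores)                                 # number of students
    totals = [sum(row) for row in scores]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n  # variance of total scores
    # Sum of item variances: p * q for each dichotomous item.
    pq = sum((p := sum(row[i] for row in scores) / n) * (1 - p) for i in range(k))
    return (k / (k - 1)) * (1 - pq / var)
```

Because k and n both enter the calculation, short exams and small cohorts move the result sharply, which is exactly why the quick notes above warn about false positives and negatives.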
Ways to increase the accuracy/usefulness of your stats:
• Item review process
– Format
– Level of difficulty
– Alternative correct options
• Historical item analysis
– Across assessments
– Across versions
• Reuse/Recycle
BEST PRACTICES
For More Information:
Call: 1.866.429.8889
Email: [email protected]
Visit: learn.examsoft.com