psychometrics 101: know what your assessment data is telling you
TRANSCRIPT
PSYCHOMETRICS 101: KNOW WHAT YOUR ASSESSMENT DATA IS TELLING YOU
Executive Director of Sales, ExamSoft Worldwide, Inc.
Formerly Program Manager for Evaluation and Assessment at The Ohio State University College of Medicine
If you have a question
Please pose questions to the presenter through the “Questions” field of the GoToWebinar tool on the right side of your screen.
All questions will be addressed at the conclusion of the presentation.
THE OVERVIEW
AGENDA
• Overview
• Types of stats
• Interpreting the item analysis report
• Examples
• General statistical guidelines
Item analysis is not a foolproof answer to these questions.
But… YOU HAVE TO START SOMEWHERE.
“Where do I start?”
“Is this a good or bad question? Can statistics even tell me that?”
THE OVERVIEW
“How can I reconcile what I know about my assessment’s past with what the data is telling me?”
Item Difficulty/p Value: a decimal representation of difficulty based on the percentage of students who answered the item correctly. The lower the decimal, the higher the difficulty.
Upper 27%: of only the top 27% of scorers, the percentage of those students who answered the item correctly.
Lower 27%: of only the bottom 27% of scorers, the percentage of those students who answered the item correctly.
TYPES OF STATS
Common Stats:
Discrimination Index: calculated by subtracting the percentage of the bottom 27% group that answered the item correctly from the percentage of the top 27% group that answered it correctly. The discrimination index measures whether the item discriminates between the highest and lowest performers.
Point-Biserial: a discrimination statistic that indicates whether doing well on that specific item correlates with doing well on the exam overall; in other words, whether the item was a good or bad predictor of overall performance on the exam.
TYPES OF STATS
Common Stats:
Item Difficulty: range 1.0 to 0.0
Discrimination Index: range 1.0 to -1.0
Point Biserial: range 1.0 to -1.0
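All four statistics above can be computed directly from a 0/1 scoring matrix. The sketch below is illustrative (function and variable names are my own, not ExamSoft’s); the 27% split and the point-biserial formula rpb = (M_correct − M_total) / SD_total × √(p/q) follow the standard definitions:

```python
# Illustrative item-analysis sketch: scores[s][i] = 1 if student s answered item i correctly.
def item_stats(scores, item):
    n = len(scores)
    totals = [sum(row) for row in scores]               # each student's total score
    p = sum(row[item] for row in scores) / n            # item difficulty (p value)

    # Rank students by total score; take the bottom and top 27%.
    order = sorted(range(n), key=lambda s: totals[s])
    k = max(1, round(0.27 * n))
    lower = sum(scores[s][item] for s in order[:k]) / k   # Lower 27%
    upper = sum(scores[s][item] for s in order[-k:]) / k  # Upper 27%
    disc = upper - lower                                  # discrimination index

    # Point-biserial: correlation between the 0/1 item score and the total score.
    mean_t = sum(totals) / n
    sd_t = (sum((t - mean_t) ** 2 for t in totals) / n) ** 0.5
    n_correct = sum(row[item] for row in scores)
    mean_correct = sum(totals[s] for s in range(n) if scores[s][item]) / max(1, n_correct)
    rpb = (mean_correct - mean_t) / sd_t * (p / (1 - p)) ** 0.5 if sd_t and 0 < p < 1 else 0.0
    return p, upper, lower, disc, rpb
```

For example, with four students and three items, `item_stats` returns the p value, upper/lower 27% proportions, discrimination index, and point-biserial for any single item.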
ITEM ANALYSIS REPORT
But with any statistic it is important to remember: context matters!
6 factors to always consider when evaluating item performance:
1. Cheating
2. Return on investment
3. Conflicting content/faculty
4. “Six degrees of Kevin Bacon”
5. Author intent
6. Content delivery method
“Stats alone cannot tell the whole story.”
EXTRANEOUS FACTORS
Item #1   Correct Answer: E   Correct Responses: 178
Diff (p) 0.98   Upper 27% 100.00%   Lower 27% 96.15%   Disc. Index 0.04   Point Biserial (rpb) 0.10

Response frequencies (* indicates correct answer):
Option   Responses   % Selected   Point Biserial   Disc. Index   Upper 27%   Lower 27%
A        0           0.00          0.00             0.00         0.00        0.00
B        1           0.55          0.02             0.00         0.00        0.00
C        1           0.55          0.00             0.00         0.00        0.00
D        1           0.55         -0.10            -0.02         0.00        0.02
E *      178         98.34         0.10             0.02         1.00        0.98
ITEM ANALYSIS EXAMPLES
Item #7   Correct Answer: D   Correct Responses: 120
Diff (p) 0.66   Upper 27% 82.00%   Lower 27% 46.15%   Disc. Index 0.36   Point Biserial (rpb) 0.28

Response frequencies (* indicates correct answer):
Option   Responses   % Selected   Point Biserial   Disc. Index   Upper 27%   Lower 27%
A        7           3.87         -0.11            -0.04         0.00        0.04
B        17          9.39         -0.19            -0.19         0.00        0.19
C        28          15.47        -0.12            -0.09         0.12        0.21
D *      120         66.30         0.28             0.36         0.82        0.46
E        9           4.97         -0.07            -0.04         0.06        0.10
ITEM ANALYSIS EXAMPLES
Item #22   Correct Answer: D   Correct Responses: 66
Diff (p) 0.36   Upper 27% 52.00%   Lower 27% 26.92%   Disc. Index 0.25   Point Biserial (rpb) 0.22

Response frequencies (* indicates correct answer):
Option   Responses   % Selected   Point Biserial   Disc. Index   Upper 27%   Lower 27%
A        35          19.34        -0.09            -0.15         0.10        0.25
B        34          18.78         0.04             0.07         0.24        0.17
C        21          11.60        -0.20            -0.15         0.04        0.19
D *      66          36.46         0.22             0.25         0.52        0.27
E        25          13.81        -0.06            -0.02         0.10        0.12
ITEM ANALYSIS EXAMPLES
Item #24   Correct Answer: C   Correct Responses: 94
Diff (p) 0.52   Upper 27% 64.00%   Lower 27% 42.31%   Disc. Index 0.22   Point Biserial (rpb) 0.18

Response frequencies (* indicates correct answer):
Option   Responses   % Selected   Point Biserial   Disc. Index   Upper 27%   Lower 27%
A        61          33.70        -0.10            -0.12         0.26        0.38
B        21          11.60        -0.19            -0.13         0.04        0.17
C *      94          51.93         0.18             0.22         0.64        0.42
D        5           2.76          0.12             0.04         0.06        0.02
E        0           0.00          0.00             0.00         0.00        0.00
ITEM ANALYSIS EXAMPLES
Item #34   Correct Answer: B   Correct Responses: 129
Diff (p) 0.71   Upper 27% 90.00%   Lower 27% 55.77%   Disc. Index 0.34   Point Biserial (rpb) 0.31

Response frequencies (* indicates correct answer):
Option   Responses   % Selected   Point Biserial   Disc. Index   Upper 27%   Lower 27%
A        0           0.00          0.00             0.00         0.00        0.00
B *      129         71.27         0.31             0.34         0.90        0.56
C        1           0.55         -0.16            -0.02         0.00        0.02
D        30          16.57        -0.25            -0.23         0.06        0.29
E        21          11.60        -0.11            -0.09         0.04        0.13
ITEM ANALYSIS EXAMPLES
Desired statistical ranges: opinions differ, but the most commonly used are:
• Item Difficulty/p Value: acceptable item difficulty is not a set number but rather depends on the question’s intention. If you intended the item to be a mastery item, you want the difficulty as close to 1.00 as possible. If you intended a discriminating question, significantly lower levels are acceptable.
• Upper 27%: if less than 60% of your top performers answer a question correctly, further analysis is needed to see whether there are issues with the question. Also, if fewer of your upper 27% answer a question correctly than your lower 27%, there could be an issue.
• Lower 27%: generally you never want it to be higher than the upper 27%. Values as low as 0% can be acceptable, and values as high as 100% can be acceptable if it is a mastery question.
GENERAL GUIDELINES
Desired statistical ranges: opinions differ, but the most commonly used are:
• Discrimination Index: some set specific cutoffs for acceptable and unacceptable values; I would argue the more accurate guide is that the lower the p value, the higher the discrimination index needs to be. Generally, at 0.2 the item is considered to have discriminated; less than that is considered no discrimination; 0.3 or greater is considered highly discriminating.
• Point-Biserial: as with the discrimination index, some set specific cutoffs for acceptable and unacceptable values. Generally, 0.2 and above is considered to discriminate and to have a positive association with overall performance on the assessment; lower levels are acceptable for mastery questions, and 0.3+ is desired for discriminating questions.
GENERAL GUIDELINES
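The rules of thumb above can be collected into a simple screening sketch. The cutoffs (0.2/0.3 for discrimination, 60% for the upper group, mastery exceptions) come straight from the guidelines; the function and flag wording are illustrative, not part of any ExamSoft report:

```python
# Flag items for review using the guideline cutoffs above (names illustrative).
def screen_item(p, upper, lower, disc_index, point_biserial, mastery=False):
    flags = []
    if upper < 0.60:
        flags.append("fewer than 60% of top performers correct")
    if lower > upper:
        flags.append("lower 27% outscored upper 27%")
    if not mastery:                      # low discrimination is acceptable on mastery items
        if disc_index < 0.20:
            flags.append("no discrimination (index < 0.2)")
        if point_biserial < 0.20:
            flags.append("weak item-total correlation (rpb < 0.2)")
    return flags
```

Run against the examples above, Item #1 passes when treated as a mastery item, while Item #22 is flagged only because fewer than 60% of top performers answered it correctly.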
KR-20
• Used as an overall measure of reliability for the assessment.
• Measured on a scale from 0.0 to 1.0, with 0.0 being very poor and 1.0 being excellent.
• Quick notes:
1. Heavily influenced by the number of questions in the assessment
2. Heavily influenced by the number of students taking the assessment
3. The combination can FREQUENTLY lead to false positive and false negative KR-20 values.
GENERAL GUIDELINES
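KR-20 can be computed from the same 0/1 scoring matrix using the standard formula KR-20 = k/(k−1) × (1 − Σ p_i·q_i / σ²), where k is the number of items, p_i·q_i is each item’s variance, and σ² is the variance of total scores. A minimal sketch (names illustrative; population variance assumed):

```python
# KR-20 reliability sketch: scores[s][i] = 1 if student s answered item i correctly.
def kr20(scores):
    k = len(scores[0])                              # number of items
    n = len(scores)                                 # number of students
    totals = [sum(row) for row in scores]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n  # variance of total scores
    # Sum of item variances: p * q for each dichotomous item.
    pq = sum((p := sum(row[i] for row in scores) / n) * (1 - p) for i in range(k))
    return (k / (k - 1)) * (1 - pq / var)
```

Because k and n both enter the calculation, short exams and small cohorts move the result sharply, which is exactly why the quick notes above warn about false positives and negatives.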
Ways to increase the accuracy/usefulness of your stats:
• Item review process
– Format
– Level of difficulty
– Alternative correct options
• Historical item analysis
– Across assessments
– Across versions
• Reuse/Recycle
BEST PRACTICES
For More Information:
Call: 1.866.429.8889
Email: [email protected]
Visit: learn.examsoft.com