Assessing the Assessment
Reliability. Am I measuring something?
Validity. Am I measuring what I think I am measuring?
Test-retest
Interobserver agreement
Parallel forms
Split-half (internal consistency)
Content
Criterion
Construct
Reliability is a necessary prerequisite for validity.
Reliability
Reliability refers to the consistency of a measure across:
– Time
– Versions
– Raters
– And so on
A reliable test has little measurement error.
Observed Score = True Score + Error
[Figure: Reliability; True Score Variability vs. Error Variability]
Reliability
True score: true or perfectly accurate
– E.g., the time
– Often a fictional mark in psychology
– Based on multiple measurements
Aggregation = averaging a number of imprecise measurements to increase reliability
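The gain from aggregation can be quantified with the Spearman-Brown prophecy formula, a standard classical-test-theory result that the slides do not name: if one measurement has reliability r, the average of k parallel measurements has reliability kr / (1 + (k − 1)r). The sketch assumes the measurements are parallel (same true score, equal and independent errors); `spearman_brown` is a hypothetical helper name.

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Predicted reliability of the average of k parallel measurements,
    given the reliability r_single of a single measurement."""
    return k * r_single / (1 + (k - 1) * r_single)

# Averaging more imprecise measurements (r = 0.5 each) raises reliability.
for k in (1, 2, 4, 8):
    print(k, round(spearman_brown(0.5, k), 3))
# 1 0.5
# 2 0.667
# 4 0.8
# 8 0.889
```

Returns diminish: each doubling of k adds less reliability than the one before.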
Reliability
Test-retest: administer the same measure at two points in time
Interobserver agreement: multiple observers/judges/raters/scorers rate the same target
Parallel forms: compare alternate forms of the same test
Split-half reliability: split the test into two halves and compare scores across halves
Coefficient alpha: average of all possible split-half reliabilities
Validity
Is the test measuring what I think it is?
There are three types of validity
This requires empirical demonstration
Content Validity
Criterion Validity
Construct Validity
Validity
Content Validity
A test has content validity if it adequately covers the area of content it is supposed to cover.
Difficult to examine statistically
Content validity typically must be built in at the beginning
Course exams are the best examples
Validity
Criterion Validity
For criterion validity, tests are evaluated against some criterion
Often called predictive validity
Most at issue for tests employed to make decisions:
– Selection of students
– Parole decisions
– Jobs
Criterion Validity - Concurrent
Concurrent validity: does my measure correlate highly with an established measure?
Can my measurement instrument predict a criterion that occurs at the same point in time?
Can my measure (i.e., my operationalization) distinguish between two groups that it should be able to distinguish between?
Criterion Validity - Predictive
Can my measure predict future behavior?
– If yes, it has predictive validity (a type of criterion validity)
Predictive Validity of the GRE
Graduate Record Examination
Kuncel, N.R., Hezlett, S.A., & Ones, D.S. (2001). A comprehensive meta-analysis of the predictive validity of the Graduate Record Examinations: Implications for graduate school student selection and performance. Psychological Bulletin, 127, 162-181.
Originally designed to measure “basic developed abilities relevant to performance in graduate studies”
Used often and heavily in decisions about admissions
Verbal measure: analogy, antonym, sentence completion, reading comprehension
Quantitative measure: quantitative, quantitative comparison, data interpretation
Analytic measure: analytical and logical reasoning
Subject test: acquired knowledge in a particular area
Predictive validity of the GRE
Want to establish the predictive validity of the GRE
What will my criterion of graduate school performance be?
Use several indicators of “performance” (these are the criteria):
– Graduate GPA
– 1st-year graduate GPA
– Comprehensive exam scores
– Publication citation counts
– Faculty ratings
Summary
All areas of the GRE were found to be valid predictors of graduate GPA (GGPA), 1st-year GGPA, faculty ratings, and comprehensive exam scores.
GRE subject tests were consistently better predictors of these criteria than the quantitative or verbal tests; they were also better predictors than undergraduate GPA (UGPA).
Construct Validity
Most important type of validity
“If this were a measure of …, what would it look like?”
Depends heavily on theory:
– How is this construct related to other constructs?
– Requires broad thinking
– In validating my construct, I am validating my theory
Steps to establish construct validity
1. Need to establish convergent correlations: measures of constructs that theoretically should be related to each other are, in fact, observed to be related to each other (that is, you should be able to show a correspondence or convergence between similar constructs)
2. Need to establish divergent correlations: measures of constructs that theoretically should not be related to each other are, in fact, observed to not be related to each other (that is, you should be able to discriminate between dissimilar constructs)
3. Build a nomological net
Convergent validity
Measures that should be related are related
These 4 items are converging on the same thing (we don't know for sure that it is “self-esteem” yet)
Divergent Validity
Self-esteem measures do not correlate with locus of control measures
These measures seem to be tapping different things
Establishing convergent and divergent validity
Nomological Network
Must develop a “lawful network” for your measure in order to establish construct validity.
Includes:
– Theoretical framework
– Empirical framework
– Observables
Childhood Psychopathy Scale
Lynam, D.R. (1997). Pursuing the psychopath: Capturing the fledgling psychopath in a nomological net. Journal of Abnormal Psychology, 106, 425-438.
“The construct of psychopathy and attendant personality information might profitably be used at the childhood level to identify a more homogeneous group of antisocial children.”
Psychopathy
The [psychopath] is unfamiliar with the primary facts or data of what might be called personal values and is altogether incapable of understanding such matters.
It is impossible for him to take even a slight interest in the tragedy or joy or the striving of humanity as presented in serious literature or art. He is also indifferent to all these matters in life itself. Beauty and ugliness, except in a very superficial sense, goodness, evil, love, horror, and humour have no actual meaning, no power to move him.
He is, furthermore, lacking in the ability to see that others are moved. It is as though he were colour-blind, despite his sharp intelligence, to this aspect of human existence. It cannot be explained to him because there is nothing in his orbit of awareness that can bridge the gap with comparison. He can repeat the words and say glibly that he understands, and there is no way for him to realize that he does not understand (Cleckley, 1941, p. 90, quoted in Hare, 1993, pp. 27-28).
• Developed the Childhood Psychopathy Scale (CPS)
• Used principles of rational scale construction
• Working from the Psychopathy Checklist-Revised (PCL-R), identified mother-reported items that assessed PCL-R constructs
• Operationalized 13 of the 20 PCL-R constructs as 3- to 4-item scales: glibness, untruthfulness, manipulation, lack of guilt, poverty of affect, callousness, parasitic lifestyle, behavioral dyscontrol, lack of planning, impulsiveness, unreliability, failure to accept responsibility, criminal versatility
Items on the CPS
Construct Validity of the CPS
If the CPS is truly assessing psychopathy, scores on the CPS should be positively related to serious delinquency
Construct Validity of the CPS
If the CPS is truly assessing psychopathy, scores on the CPS should be positively related to stable delinquency
Construct Validity of the CPS
If the CPS is truly assessing psychopathy, scores on the CPS should be positively related to impulsivity
Construct Validity of the CPS
If the CPS is assessing psychopathy, scores on the CPS should be positively related to externalizing problems and negatively related to internalizing problems
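Each of these slides states a directional prediction from the nomological net, and checking them amounts to comparing predicted signs with observed correlations. A minimal sketch: the predicted directions below come from the slides, but the function name, the criterion keys, and the observed correlations are made up for illustration and are not Lynam's (1997) results. A sign check ignores sampling error; a real analysis would also test whether each correlation differs reliably from zero.

```python
def check_predictions(predicted_signs, observed_r):
    """Compare each predicted direction (+1 or -1) with the sign of the observed r.

    Returns {criterion: True/False}; an r of exactly 0 counts as a miss.
    observed_r must contain every key in predicted_signs.
    """
    return {
        name: observed_r[name] * sign > 0
        for name, sign in predicted_signs.items()
    }

# Predicted directions, as stated on the slides.
predicted = {
    "serious_delinquency": +1,
    "stable_delinquency": +1,
    "impulsivity": +1,
    "externalizing": +1,
    "internalizing": -1,
}
# Made-up correlations for illustration only (not Lynam's results).
observed = {"serious_delinquency": 0.30, "stable_delinquency": 0.25,
            "impulsivity": 0.35, "externalizing": 0.40, "internalizing": 0.05}
print(check_predictions(predicted, observed))
```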
Construct Validity of the CPS
If the CPS is assessing psychopathy, scores on the CPS should predict delinquency above and beyond other well-known predictors