reliability and validity - the basics
Slides to accompany an introductory lecture in reliability and validity.
Reliability and Validity
Measurement
Scales (Levels) of measurement
Scales clarify the characteristics of measurement processes
Scales indicate which statistical procedures are appropriate
Nominal
• Categories without order
• Colors, gender, political party, nationality
Ordinal
• Categories with order
• Size (S,M,L), Social class, Agreement (strong, some, low, none)
Interval
• Distance is meaningful between categories
• Temperature, ACT scores, shoe size, IQ
Ratio
• Scale of categories has absolute zero
• Age, income, all rates and percents, vacation time
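To illustrate how the level of measurement points to an appropriate statistical procedure, here is a minimal sketch. It assumes the conventional pairing of mode, median, and mean with nominal, ordinal, and interval/ratio data. The lookup table, the `summarize` helper, and the sample data are hypothetical and not part of the lecture.

```python
from statistics import mean, median, mode

# Hypothetical lookup: a summary of central tendency that is meaningful
# at each level of measurement (conventional pairing, assumed here).
CENTRAL_TENDENCY = {
    "nominal": mode,     # categories without order: most frequent category
    "ordinal": median,   # categories with order: middle value
    "interval": mean,    # meaningful distances: arithmetic mean
    "ratio": mean,       # absolute zero: mean (ratios are meaningful too)
}

def summarize(values, level):
    """Return the summary statistic suited to the given level of measurement."""
    return CENTRAL_TENDENCY[level](values)

# Made-up example data, one list per level.
party = ["A", "B", "A", "C"]          # nominal
size_rank = [1, 2, 2, 3, 3]           # ordinal: S=1, M=2, L=3 (order only)
temps_f = [68.0, 70.0, 75.0]          # interval
ages = [20, 30, 40]                   # ratio

print(summarize(party, "nominal"))      # A
print(summarize(size_rank, "ordinal"))  # 2
print(summarize(temps_f, "interval"))   # 71.0
print(summarize(ages, "ratio"))         # 30
```

Statistics allowed at a lower level remain usable at higher levels (a median of ages is fine), but not the reverse.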
Levels of Measurement: Learn them by playing the game!
Does my measurement procedure give the same accurate measurement each time it is used?
Reliability
What is reliability?
Reliability is consistency in measurement
Stern Tone Variator, from The Archives of the History of American Psychology
What is measurement validity?
Lavery Psychograph, from The Archives of the History of American Psychology
Validity is “truth” in measurement
Reliable, but not valid
Reliable
Not valid
Valid, but not reliable
Valid
Not reliable
Neither reliable nor valid
Not reliable
Not valid
Both reliable and valid
Reliable
Valid
Bullseye!! by modenadude at http://www.flickr.com/photos/modenadude/3280286776
Why do reliability and validity matter?
All of our research uses data gathered through measurement procedures.
Evaluate reliability and validity of measurement before applying results.
Evaluation uses
• Logic
• Statistical techniques
Summary: Reliability and Validity
Reliability
Does the value observed and recorded accurately reflect the “true” value of the object?
Test by measuring the object multiple times or ways.
Every researcher must either use a known instrument, or test and demonstrate the reliability of a new tool.
The literature search is a huge labor-saving device.
Using a known instrument improves research quality.
Validity
Does the value observed and recorded reflect the concept and dimension of interest?
Test by comparing with other data or similar processes.
Every researcher must either use a known instrument, or test and demonstrate the validity of a new tool.
The literature search is a huge labor-saving device.
Using a known instrument improves research quality.
Existing Measurement Tools
Blackboard: Buros
Blackboard: Social Psych Tools
Reliability: Observed Score = True Score + Error
Unreliable tools introduce excess error
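One way to see why excess error lowers reliability: in classical test theory an observed score is modeled as a true score plus error, and reliability is the share of observed-score variance that comes from true scores. The sketch below uses made-up numbers and assumes the error is uncorrelated with the true scores. The names `true_scores`, `small_error`, and `large_error` are illustrative and not from the lecture.

```python
from statistics import pvariance

def reliability(true_scores, errors):
    """Share of observed-score variance due to true scores (true var / observed var)."""
    observed = [t + e for t, e in zip(true_scores, errors)]
    return pvariance(true_scores) / pvariance(observed)

# Made-up data: four people, errors chosen to be uncorrelated with true scores.
true_scores = [10, 10, 20, 20]
small_error = [-1, 1, -1, 1]   # a precise tool
large_error = [-5, 5, -5, 5]   # an unreliable tool adds more error

print(round(reliability(true_scores, small_error), 3))  # 0.962
print(round(reliability(true_scores, large_error), 3))  # 0.5
```

In practice true scores are never observed directly, which is why reliability has to be estimated with methods such as test-retest correlation.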
Development of new tools requires
• Time, often years
• Staffing, often several professionals
• Trial subjects, often in the hundreds
• Lots of money to pay for the above
Methods to Establish Reliability
• Test-Retest
• Parallel Forms
• Inter-Rater Reliability
Compute correlation of two or more forms, taken under same circumstances.
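The correlation step can be sketched as follows for a test-retest design. This is a minimal illustration using the Pearson correlation coefficient. The scores in `time1` and `time2` are made up, and the `pearson_r` helper is written out by hand here and is not taken from the lecture.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up scores for the same five people on two occasions.
time1 = [10, 12, 15, 18, 20]
time2 = [11, 12, 14, 19, 21]

r = pearson_r(time1, time2)
print(round(r, 2))  # 0.98
```

A coefficient near 1 indicates that people keep roughly the same relative standing across administrations. The same computation applies to two parallel forms.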
Reliability – Internal Consistency
Multiple items (questions)
Cronbach’s alpha (α)
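As a rough sketch of how Cronbach's alpha is computed from multiple items: alpha compares the sum of the individual item variances with the variance of the total score. The item responses below are made up, and the `cronbach_alpha` helper is an illustrative implementation of the standard formula, not code from the lecture.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha.

    `items` is a list of per-item score lists; each inner list holds one
    score per respondent, in the same respondent order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    sum_item_vars = sum(variance(scores) for scores in items)
    totals = [sum(person) for person in zip(*items)]  # total score per respondent
    return (k / (k - 1)) * (1 - sum_item_vars / variance(totals))

# Made-up responses: three items answered by five respondents.
item1 = [1, 2, 3, 4, 5]
item2 = [2, 2, 3, 5, 5]
item3 = [1, 3, 3, 4, 4]

alpha = cronbach_alpha([item1, item2, item3])
print(round(alpha, 2))  # 0.95
```

Alpha rises when items vary together, which is what internal consistency means.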
Sources of Reduced Reliability
Unclear questions
Untrained raters
Unclear instructions
Outside events
Problems of Effective Range (Scale Attenuation)
• Ceiling effect
• Threshold effect
Measurement in research
Real world is messy.
Reliability and Validity issues are intertwined
math problems for girls by woodleywonderworks at http://www.flickr.com/photos/wwworks/3597217248/
W. Andrew Harrell describes his study
Two controversies are connected to Dr. Harrell’s study:
• Do physical traits such as beauty have an evolutionary impact (i.e., people are more likely to have children to pass along their genes)?
• Isn’t beauty a subjective judgment, not a trait that can be objectively measured in a research study?
Dr. Harrell addresses both of those questions in this 4-minute audio clip.
University Of Alberta (2005, April 13). Researchers Show Parents Give Unattractive Children Less Attention. ScienceDaily. Retrieved July 25, 2009, from http://www.sciencedaily.com/releases/2005/04/050412213412.htm
Measurement in Harrell’s study
Direct observation of seat belt safety
Previously validated measures of beauty
Two trained researchers evaluate attractiveness (inter-rater reliability)
Two different trained researchers observe safety (inter-rater reliability and avoiding bias)
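Agreement between two raters can be quantified in several ways. The slides do not say which statistic Harrell's study used, so the sketch below is only an illustration: it computes simple percent agreement and Cohen's kappa (agreement corrected for chance) on made-up codes. The rater data and function names are hypothetical.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which the two raters give the same code."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    # Chance agreement from each rater's marginal proportions.
    p_chance = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    # Note: undefined (division by zero) if chance agreement equals 1.
    return (p_observed - p_chance) / (1 - p_chance)

# Made-up codes from two observers for eight cases.
rater_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
rater_b = ["yes", "yes", "no", "yes", "yes", "no", "no", "no"]

print(percent_agreement(rater_a, rater_b))       # 0.75
print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.5
```

Kappa is lower than raw agreement because two raters would agree on some cases by chance alone.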
How to improve reliability
Use identical instructions.
Use a larger number of items (questions)
Eliminate questions that evoke inconsistent responses
Cover the entire range of the dimension
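The effect of adding items can be estimated with the Spearman-Brown prophecy formula, which the slides do not name but which is the usual tool for this question. The sketch below assumes the added items are comparable to the existing ones. The starting reliability of 0.60 is a made-up value.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by `length_factor`.

    Assumes the added items are comparable to the existing ones
    (Spearman-Brown prophecy formula).
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Made-up starting point: a short scale with reliability 0.60.
print(round(spearman_brown(0.60, 2), 2))  # doubled length: 0.75
print(round(spearman_brown(0.60, 3), 2))  # tripled length: 0.82
```

The gains shrink as the test gets longer, so adding items helps most when the scale is short.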
Reliability and validity must be tested & established in each new situation where the procedure is used.
Does my measurement procedure measure the construct or concept that I intend? Or is it measuring something else?
Validity
Two meanings of “validity”
Validity is an over-arching concern of research
Measurement – are the observations directly and truly linked to the dimension or concept claimed?
Research design – how well does the experiment or study control the situation so that we are confident that the relationships or results observed were due to the impact of the independent variable?
Operationalization – The entire procedure used to produce the measurement. This includes any instrument, but also instructions for its use.
Operationalization: How the Research is Actually Conducted
Blackboard: Obedience to Authority #2
Methods of Determining Validity
Content validity
• “Face” validity
• Range of content
Construct / Internal structure
• One or multiple dimensions?
Criterion validity
• Predictive (GRE)
• Concurrent (audition)
Convergent
Discriminant validity
• Divergence
Known groups validity
Manuscripts and checklists by Muffett at http://www.flickr.com/photos/calliope/173797447/
Milgram’s obedience experiment
Variety of Measurement Situations
Blackboard: The “Strange” Situation
Sources of invalidity
Lavery Psychograph, from The Archives of the History of American Psychology
Incorrect theory
Measurement doesn’t match intention.
Many interpretations of meaning of the measurement.
Relationship of reliability and validity
Both measured on a continuum (low to high)
A measurement with low reliability cannot be valid.
Maximum validity is the square root of reliability.
Reliability and validity analysis comes before any statistical analysis.
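The square-root ceiling can be made concrete with a short sketch. Under classical test theory, the correlation between a measure and a criterion cannot exceed the square root of the product of their reliabilities; with a perfectly reliable criterion this reduces to the square root of the measure's reliability. The `max_validity` helper and the example values are illustrative assumptions.

```python
from math import sqrt

def max_validity(reliability, criterion_reliability=1.0):
    """Upper bound on the validity coefficient under classical test theory."""
    for value in (reliability, criterion_reliability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reliability coefficients must lie between 0 and 1")
    return sqrt(reliability * criterion_reliability)

print(round(max_validity(0.81), 2))  # 0.9
print(round(max_validity(0.49), 2))  # 0.7
```

This is only a ceiling: a highly reliable measure can still have low validity if it measures the wrong thing.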