reliability and validity - the basics

Reliability and Validity

Measurement

Scales (Levels) of measurement

Scales clarify the characteristics of measurement processes.

Scales indicate which statistical procedures are appropriate.

Nominal

• Categories without order

• Colors, gender, political party, nationality

Ordinal

• Categories with order

• Size (S,M,L), Social class, Agreement (strong, some, low, none)

Interval

• Distance is meaningful between categories

• Temperature, ACT scores, shoe size, IQ

Ratio

• Scale of categories has absolute zero

• Age, income, all rates and percents, vacation time
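One way to see how the scale constrains the statistics is a small sketch. It assumes the common textbook convention (mode for nominal, median from ordinal up, mean from interval up, ratios only for ratio data); the `summarize` helper and all data values are hypothetical.

```python
from statistics import mean, median, mode

# Hypothetical samples, one per level of measurement.
party = ["A", "B", "A", "C", "A"]          # nominal: categories without order
shirt_rank = [1, 2, 2, 3, 1, 2]            # ordinal: S=1 < M=2 < L=3
temp_c = [18.0, 21.0, 21.0, 24.0]          # interval: distances are meaningful
age_years = [20, 40, 30, 50, 40]           # ratio: has an absolute zero


def summarize(values, scale):
    """Return the summary statistics usually considered appropriate for a scale."""
    summary = {"mode": mode(values)}                 # valid at every level
    if scale in ("ordinal", "interval", "ratio"):
        summary["median"] = median(values)           # needs order
    if scale in ("interval", "ratio"):
        summary["mean"] = mean(values)               # needs meaningful distances
    if scale == "ratio":
        # "Twice as old" only makes sense with an absolute zero.
        summary["max_to_min_ratio"] = max(values) / min(values)
    return summary


for name, data, scale in [
    ("party", party, "nominal"),
    ("shirt size", shirt_rank, "ordinal"),
    ("temperature", temp_c, "interval"),
    ("age", age_years, "ratio"),
]:
    print(name, scale, summarize(data, scale))
```

Each higher level keeps the statistics of the levels below it and adds one more.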

Levels of Measurement: Learn them by playing the game!

Does my measurement procedure give the same accurate measurement each time it is used?

Reliability

What is reliability?

Reliability is consistency in measurement

Stern Tone Variator, from The Archives of the History of American Psychology

What is measurement validity?

Lavery Psychograph, from The Archives of the History of American Psychology

Validity is “truth” in measurement

Reliable, but not valid

Valid, but not reliable

Neither reliable nor valid

Both reliable and valid

Bullseye!! by modenadude at http://www.flickr.com/photos/modenadude/3280286776

Why do reliability and validity matter?

All of our research uses data gathered through measurement procedures.

Evaluate reliability and validity of measurement before applying results.

Evaluation uses

• Logic

• Statistical techniques

Summary: Reliability and Validity

Reliability

Does the value observed and recorded accurately reflect the “true” value of the object?

Test by measuring the object multiple times or ways.

Every researcher must either use a known instrument, or test and demonstrate the reliability of a new tool.

• The literature search is a huge labor-saving device.

• Using a known instrument improves research quality.

Validity

Does the value observed and recorded reflect the concept and dimension of interest?

Test by comparing with other data or similar processes.

Every researcher must either use a known instrument, or test and demonstrate the validity of a new tool.

• The literature search is a huge labor-saving device.

• Using a known instrument improves research quality.

Existing Measurement Tools

Blackboard: Buros

Blackboard: Social Psych Tools

Reliability: Observed Score = True Score + Error

Unreliable tools introduce excess error.
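A minimal simulation sketch of the true-score idea, assuming the classical test theory model in which an observed score is a true score plus random error, and reliability is the share of observed variance that comes from true scores. The function name, score distribution, and error sizes are illustrative assumptions.

```python
import random
from statistics import pvariance


def simulate_reliability(error_sd, n=5000, true_sd=10.0, seed=0):
    """Simulate observed = true + error and return var(true) / var(observed)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    true_scores = [rng.gauss(100.0, true_sd) for _ in range(n)]
    observed = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return pvariance(true_scores) / pvariance(observed)


# Theoretical values: 10^2 / (10^2 + 5^2) = 0.80 and 10^2 / (10^2 + 20^2) = 0.20.
precise_tool = simulate_reliability(error_sd=5.0)
noisy_tool = simulate_reliability(error_sd=20.0)
print(f"precise tool: {precise_tool:.2f}, noisy tool: {noisy_tool:.2f}")
```

The noisier tool adds more error variance, so a smaller share of what it records reflects the true score.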

Development of new tools requires

• Time, often years

• Staffing, often several professionals

• Trial subjects, often in the hundreds

• Lots of money to pay for the above

Methods to Establish Reliability

• Test-Retest

• Parallel Forms

• Inter-Rater Reliability

Compute the correlation of two or more forms, taken under the same circumstances.
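The correlation step can be sketched with a plain Pearson coefficient. This assumes two administrations of the same test to the same people; the scores and the `pearson_r` helper are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length, at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for six people tested twice under the same conditions.
first_administration = [12, 15, 9, 20, 17, 11]
second_administration = [13, 14, 10, 19, 18, 10]

test_retest_r = pearson_r(first_administration, second_administration)
print(f"test-retest reliability estimate: {test_retest_r:.2f}")
```

The same function applies to parallel forms (form A versus form B) or to two raters scoring the same cases on a numeric scale.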

Reliability – Internal Consistency

Multiple items (questions)

Cronbach’s alpha (α)
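A short sketch of Cronbach's alpha using the standard formula, α = k/(k−1) · (1 − Σ item variances / variance of total scores), where k is the number of items. The response data and the function name are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("every respondent needs the same k >= 2 item scores")
    items = list(zip(*responses))                       # one tuple per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical answers from five respondents to four 1-5 agreement items.
survey = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
alpha = cronbach_alpha(survey)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Alpha rises when the items move together across respondents, which is what internal consistency means.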

Sources of Reduced Reliability

Unclear questions

Untrained raters

Unclear instructions

Outside events

Problems of Effective Range (Scale Attenuation)

• Ceiling effect

• Threshold effect

Measurement in research

Real world is messy.

Reliability and Validity issues are intertwined

math problems for girls by woodleywonderworks at http://www.flickr.com/photos/wwworks/3597217248/

http://www.flickr.com/photos/wwworks/

W. Andrew Harrell describes his study

Two controversies are connected to Dr. Harrell’s study:

• Do physical traits such as beauty have an evolutionary impact (i.e., people are more likely to have children to pass along their genes)?

• Isn’t beauty a subjective judgment, not a trait that can be objectively measured in a research study?

Dr. Harrell addresses both of those questions in this 4-minute audio clip.

Click to listen

University of Alberta (2005, April 13). Researchers Show Parents Give Unattractive Children Less Attention. ScienceDaily. Retrieved July 25, 2009, from http://www.sciencedaily.com/releases/2005/04/050412213412.htm

http://www.npr.org/templates/story/story.php?storyId=4678922

Measurement in Harrell’s study

Direct observation of seat belt safety

Previously validated measures of beauty

Two trained researchers evaluate attractiveness (inter-rater reliability)

Two different trained researchers observe safety (inter-rater reliability and avoiding bias)
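When two raters assign categories, agreement is often summarized with Cohen's kappa, which corrects raw agreement for agreement expected by chance. The slides do not say which statistic Harrell's team used, so the sketch below is only an illustration; the ratings and the function name are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each categorized the same cases."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must rate the same non-empty set of cases")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: both raters pick the same category independently.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical observations: was the child buckled into the cart seat?
observer_1 = ["yes", "yes", "no", "no"]
observer_2 = ["yes", "no", "no", "no"]
kappa = cohens_kappa(observer_1, observer_2)
print(f"raw agreement 0.75, kappa {kappa:.2f}")
```

Here the observers agree on 3 of 4 cases, but half of that agreement is expected by chance, so kappa is 0.5.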

How to improve reliability

Use identical instructions.

Use a larger number of items (questions)

Eliminate questions that evoke inconsistent responses

Cover the entire range of the dimension
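The effect of adding items can be estimated with the Spearman-Brown prophecy formula, a standard result that the slides do not name. It assumes the added items are comparable in quality to the existing ones; the function name and example values are illustrative.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after multiplying test length by length_factor."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# A 10-item scale with reliability 0.50, lengthened to 20 and then 40 items.
for factor in (1, 2, 4):
    print(f"{10 * factor} items: predicted reliability "
          f"{spearman_brown(0.50, factor):.2f}")
```

Doubling the scale raises the predicted reliability from 0.50 to about 0.67, and each further doubling gains less.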

Reliability and validity must be tested & established in each new situation where the procedure is used.

Does my measurement procedure measure the construct or concept that I intend? Or is it measuring something else?

Validity

Two meanings of “validity”

Validity is an over-arching concern of research

Measurement – are the observations directly and truly linked to the dimension or concept claimed?

Research design – how well does the experiment or study control the situation so that we are confident that the relationships or results observed were due to the impact of the independent variable?

Operationalization – The entire procedure used to produce the measurement. This includes any instrument, but also instructions for its use.

Operationalization: How the Research is Actually Conducted

Blackboard: Obedience to Authority #2

Methods of Determining Validity

Content validity

• “Face” validity

• Range of content

Construct / Internal structure

• One or multiple dimensions?

Criterion validity

• Predictive (GRE)

• Concurrent (audition)

Convergent

Discriminant validity

• Divergence

Known groups validity

Manuscripts and checklists by Muffett at http://www.flickr.com/photos/calliope/173797447/

Milgram’s obedience experiment

Variety of Measurement Situations

Blackboard: The “Strange” Situation

Sources of invalidity

Lavery Psychograph, from The Archives of the History of American Psychology

Incorrect theory

Measurement doesn’t match intention.

Many interpretations of meaning of the measurement.

Relationship of reliability and validity

Both measured on a continuum (low to high)

A measurement with low reliability cannot be valid.

Maximum validity is the square root of reliability.
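The square-root ceiling can be checked with a few numbers. The sketch assumes the classical test theory bound, in which a validity coefficient cannot exceed the square root of the product of the two reliabilities; the slide's rule is the case where the criterion is measured perfectly. The function name and values are illustrative.

```python
from math import sqrt


def max_validity(reliability, criterion_reliability=1.0):
    """Upper bound on the correlation between a measure and a criterion."""
    for value in (reliability, criterion_reliability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reliabilities must be between 0 and 1")
    return sqrt(reliability * criterion_reliability)


for r in (0.25, 0.49, 0.81):
    print(f"reliability {r:.2f} -> validity can be at most {max_validity(r):.2f}")
```

A tool with reliability 0.25 can correlate at most 0.50 with any criterion, which is why low reliability rules out high validity.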

Reliability and validity analysis comes before any statistical analysis.
