Measurement in Psychology I: Reliability

Lawrence R. Gordon

Do you support the civil union legislation?

What are some of the ways in which you can ask this question?

How do you measure the response (operational definitions)?

Levels of Measurement

Nominal scales
• giving names to data, putting into categories
• Examples: sex, race labels; baseball uniform numbers

Ordinal scales
• numbers give order but not distance
• Examples: mailbox numbers; class rankings

Levels of Measurement (cont.)

Interval scales
• numbers indicate order and distance (they are separated by equal distances or intervals)
• Example: Fahrenheit temperature

Ratio scales
• numbers indicate order, distance, AND have a true zero point (zero = there isn’t any)
• Examples: height; weight; miles per hour; time

Levels of Measurement Example

Auto race which started at 2 pm

Driver    Make       Finish Order   Finish Time   Elapsed Time (hours)
Mary      Corvette   1              3:00          1.00
Joe       Mustang    2              3:15          1.25
Tom       BMW        3              3:30          1.50
Ann       Ferrari    4              4:00          2.00

Scale:
Nominal   Nominal    Ordinal        Interval      Ratio
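The race table can be checked with a few lines of Python. This is only a sketch: the variable and function names are made up for illustration, and the data are the four rows from the slide. It derives elapsed time (ratio scale) from finish time (interval scale) and shows which comparisons make sense at each level.

```python
from datetime import datetime

# Race start from the slide: 2 pm. Clock times are parsed on the same 12-hour face.
RACE_START = datetime.strptime("2:00", "%H:%M")

# (driver, make, finish order, finish time) copied from the slide's table.
results = [
    ("Mary", "Corvette", 1, "3:00"),
    ("Joe", "Mustang", 2, "3:15"),
    ("Tom", "BMW", 3, "3:30"),
    ("Ann", "Ferrari", 4, "4:00"),
]

def elapsed_hours(finish):
    """Convert a finish clock time into hours since the race start."""
    finish_time = datetime.strptime(finish, "%H:%M")
    return (finish_time - RACE_START).total_seconds() / 3600

elapsed = {driver: elapsed_hours(finish) for driver, _, _, finish in results}

# Ratio scale: elapsed time has a true zero, so ratios are meaningful.
ratio_ann_to_mary = elapsed["Ann"] / elapsed["Mary"]

# Ordinal scale: adjacent finishing places are always "one apart",
# but the underlying time gaps are not equal.
gap_first_to_second = elapsed["Joe"] - elapsed["Mary"]
gap_third_to_fourth = elapsed["Ann"] - elapsed["Tom"]

print(elapsed)
print(ratio_ann_to_mary, gap_first_to_second, gap_third_to_fourth)
```

Ann took twice as long as Mary, a statement that only works for elapsed time. The gap between 1st and 2nd place is a quarter of an hour, while the gap between 3rd and 4th is half an hour, which is what "order but not distance" means for finish order.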

Closed vs. Open Responses

Closed responses (a.k.a. forced choice)
Examples (rate civil union support on a scale 1 to 9)
Advantages
• you know what the responses will be (or what they should be!) because of restrictions on choice
• easy to empirically evaluate (relatively)
• gives data that give a straightforward answer to how you ask your question
• coding not necessary, usually

Closed vs. Open Responses

Closed responses (a.k.a. forced choice)
Disadvantages
• may not be sensitive enough to get some interesting information
• will not give you as clear an indication of what participants think/feel/report

“Do you agree that same-sex couples should have the right to marry/civil union?”

1 2 3 4 5 6 7 8 9
(1 = Disagree Completely, 9 = Agree Completely)
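Because closed responses need no coding, they can be tallied directly. The sketch below uses made-up responses (hypothetical, not the class data) on the 1 to 9 agreement scale and builds a frequency table plus a median. All names are illustrative.

```python
from collections import Counter
from statistics import median

# Hypothetical responses on the 1-9 agreement scale (NOT the class data).
responses = [9, 9, 7, 1, 5, 9, 8, 3, 9, 5]

# Count how often each rating was chosen, including ratings nobody chose.
counts = Counter(responses)
frequency_table = {rating: counts.get(rating, 0) for rating in range(1, 10)}

# A crude text histogram: one '#' per response.
for rating, n in frequency_table.items():
    print(rating, "#" * n)

# The median needs only the order of the ratings, not equal spacing between them.
median_rating = median(responses)
print("median:", median_rating)
```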

Support Civil Union (histogram)

[Figure: histograms titled "Attitudes toward Civil Union", labeled Psyc 109, Fall 2001 and Psyc 109, Fall 2002. x-axis: Agreement, 1.0 to 9.0 (9 = 'Agree Completely'); y-axis: Frequency (of 195), 0 to 140.]

Support Civil Union (area graph)

[Figure: area graphs titled "Attitudes toward Civil Union", labeled Psyc 109, Fall 2001 and Psyc 109, Fall 2002. x-axis: Agreement, from Disagree Completely through Midpoint to Agree Completely; y-axis: Frequency (of 195), 0 to 140.]

Compare the Graphs: Same Info

[Figure: the histogram and the area graph labeled Psyc 109, Fall 2002, shown together. Both plot Agreement (Disagree Completely to Agree Completely) against Frequency (of 195), 0 to 140.]


Open responses (a.k.a. free response)
Examples
• Do you support the civil union legislation? Why?
• Example from the survey used the first day: “Please describe yourself in 12 words or less”
• more on this in a bit...
Advantages
• gives any answer participant wants
• not restricted by choices


Open responses (cont.)
Disadvantages
• have to code to empirically evaluate (time intensive, need to find people who will do it)
• reliability issues!

Reliability

• Consistency (stays the same)
• Repeatable (get the same results again and again)
• Measures need to be reliable to be good measures

Now, some nitty-gritty...

Reliability (cont.)

Measuring closed responses
• you don’t need to put things into categories
• reliable over time (do you get the same answers again and again?)
• if the answers vary greatly from one time of measurement to the next, the measurement is not reliable

Reliability (cont.)

Measuring closed responses (cont.)
• scales (sets of questions designed to measure something) need to be given multiple times, or in multiple forms, and the answers must remain similar for the scale to be reliable
• Example (personality scale?)

Types of reliability
• Stability (“test-retest reliability”)
• Equivalence (“parallel forms reliability”)
• Consistency (“split-half reliability”)
• Homogeneity (“internal consistency reliability”)
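Internal consistency is commonly summarized with coefficient (Cronbach's) alpha, which needs item-level scores. The sketch below assumes a tiny made-up data set (5 examinees, 4 right/wrong items); the function name and the data are hypothetical and only illustrate the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of total scores).

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Coefficient alpha for rows of item scores (one row per examinee)."""
    k = len(item_scores[0])                    # number of items
    items = list(zip(*item_scores))            # transpose: one tuple per item
    sum_item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)

# Hypothetical item data: 1 = correct, 0 = incorrect (NOT from the slides).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

The more the items rise and fall together across examinees, the larger the total-score variance is relative to the item variances, and the closer alpha gets to 1.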

Reliability Quick Example

Any test, scale, inventory with items. E.g., a 50-item test, scored 0-50:

                   Form A             9/4           9/4, Form A
   Examinee     9/4    9/25     Form A   Form B     Odd    Even
1  George        27     35        27       33        15     12
2  Alice         49     46        49       40        30     19
3  Mary          30     35        30       27        13     17
4  Larry         16     10        16       19         7      9
5  Linda         27     24        27       20        10     17
6  Doug          40     42        40       48        22     18
7  Chuck         21     18        21       35        10     11
8  Judy          42     39        42       35        19     23

• Test-retest: Form A, 9/4 vs 9/25 (r = .92): Stability
• Parallel forms: Form A vs Form B, 9/4 (r = .69): Equivalence
• Cross form: Form A 9/25 vs Form B 9/4 (r = .72): Stability & Equivalence
• Split-half: Odd vs Even, Form A 9/4 (r = .79): Consistency
• Alpha reliability: no example here (needs data from all 50 items): Internal consistency
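The coefficients quoted above can be reproduced from the example scores. The sketch below is one way to do it with a plain Pearson correlation; the variable and function names are illustrative. One inference that the slides do not state: the raw odd-even correlation comes out near .65, and the quoted .79 matches that value after the Spearman-Brown correction for a full-length test, so the split-half figure appears to be the corrected one.

```python
from math import sqrt

# Scores copied from the example table, in examinee order (George ... Judy).
form_a_first = [27, 49, 30, 16, 27, 40, 21, 42]    # Form A, 9/4
form_a_second = [35, 46, 35, 10, 24, 42, 18, 39]   # Form A, 9/25
form_b = [33, 40, 27, 19, 20, 48, 35, 35]          # Form B
odd_half = [15, 30, 13, 7, 10, 22, 10, 19]         # odd items, Form A 9/4
even_half = [12, 19, 17, 9, 17, 18, 11, 23]        # even items, Form A 9/4

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)

test_retest = pearson_r(form_a_first, form_a_second)   # stability
parallel_forms = pearson_r(form_a_first, form_b)       # equivalence
cross_form = pearson_r(form_a_second, form_b)          # stability & equivalence
half_correlation = pearson_r(odd_half, even_half)      # raw odd-even correlation
split_half = spearman_brown(half_correlation)          # corrected to full length

for name, value in [
    ("test-retest", test_retest),
    ("parallel forms", parallel_forms),
    ("cross form", cross_form),
    ("split-half (corrected)", split_half),
]:
    print(f"{name}: r = {value:.2f}")
```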

Reliability (cont.)

Measuring open responses
• Will often code into categories (Examples)
• How do you assess reliability?

Reliability (cont.)

Measuring open responses (cont.)
• Does everyone put the response into the same category? If yes, you have good inter-coder reliability
• more specific operational definitions will increase this reliability
• Coding personality responses into categories
• Using positive, negative, and neutral descriptors

Reliability (cont.)

Measuring behavioral responses through observation
• special cases of open response, can’t really control what participants do
• coding and/or rating what you observe
• reliability of ratings (interrater reliability? If all raters agree on the rating, then yes.)
• need to be very clear on operational definitions
• Baggage claim study (Scherer & Ceschi, 2000)

Assessing Reliability

Steps
• decide on operational definitions of your variables and scale(s) of measurement
• train your coders/raters, answer questions, and alleviate confusion
• do the coding and rating
• compare responses
• were the measurements reliable?

Reliability Exercise

Measuring your personality
Looking for “big” traits
• defining big traits and training coders

The Big Five Personality Factors
1. Open to Experience (O) vs. Closed to Experience (NO)
2. Conscientious (C) vs. Nonconscientious (NC)
3. Extraverted (E) vs. Introverted (NE)
4. Agreeable (A) vs. Unagreeable (NA)
5. Neurotic (N) vs. Nonneurotic (NN)

Which one best fits the description?
Do the coding!

Reliability Exercise

Measuring your personality
Looking for “big” traits
• compare responses to other coders
• intercoder reliability
  List number on which you agreed
  List number on which you disagreed
  Calculate the percentages
• were the measurements reliable?
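The agree/disagree tally in this exercise amounts to a percent-agreement calculation. Here is a minimal sketch, assuming two coders who each assigned one of the slide's Big Five labels to the same ten descriptions; the codes themselves and the function name are hypothetical.

```python
def percent_agreement(codes_a, codes_b):
    """Return (number agreed, number disagreed, percent agreed) for two coders."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both coders must code the same set of responses.")
    agreed = sum(a == b for a, b in zip(codes_a, codes_b))
    disagreed = len(codes_a) - agreed
    return agreed, disagreed, 100 * agreed / len(codes_a)

# Hypothetical codes for ten self-descriptions, using the slide's labels.
coder_1 = ["E", "A", "NN", "O", "C", "NE", "A", "N", "E", "C"]
coder_2 = ["E", "A", "N", "O", "C", "NE", "NA", "N", "E", "C"]

agreed, disagreed, percent = percent_agreement(coder_1, coder_2)
print(f"agreed on {agreed}, disagreed on {disagreed}: {percent:.0f}% agreement")
```

With these made-up codes the two coders agree on 8 of 10 descriptions. Tightening the operational definitions of each trait is the usual way to push that percentage up.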

And for next time…is reliability enough?

If your measurement is reliable, does that mean that it is good?

Does being reliable make your measurement valid?
