TRANSCRIPT
Agenda
Levels of measurement
Measurement reliability
Measurement validity
Some examples
Need for Cognition
Horn-honking
Levels of measurement
Nominal
Ordinal
Interval
Ratio
Linking concepts to data
Conceptual definition:
Theoretical variables
Units of analysis
Operational definition:
Procedures for measuring variables
Subject units
[Diagram: operationalization]
Concepts: X = TV exposure, Z = Reading time, Y = Reading skills
Measures: X = Self-reported TV watching, Z = Self-reported reading, Y = Scores on reading test
Three links in the diagram are marked "?"
Diagram label: Theory of Measurement
Two key qualities
Measurement Reliability
The extent to which repeated
measurements produce same results
Inversely related to the amount of
random error
Measurement Validity
The extent to which a measure “does
what it is intended to do”
Random error and reliability
Measures have at least two components:
Measure = True Value + Random Error
Variation comes from both sources:
Total Variation = True Variation + Random Variation
The reliability of a measure is:
True Variation /Total Variation
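A minimal simulation sketch of the ratio above. The population values (true-value SD of 2, random-error SD of 1, 20,000 cases) are assumptions chosen for illustration, not figures from the lecture.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: true values with SD 2, random error with SD 1.
n = 20000
true_values = [rng.gauss(50, 2) for _ in range(n)]
errors = [rng.gauss(0, 1) for _ in range(n)]

# Measure = True Value + Random Error
measures = [t + e for t, e in zip(true_values, errors)]

true_var = statistics.pvariance(true_values)
error_var = statistics.pvariance(errors)
total_var = statistics.pvariance(measures)

# Reliability = True Variation / Total Variation
reliability = true_var / total_var

print("true + random variation:", round(true_var + error_var, 2))
print("total variation:        ", round(total_var, 2))
print("reliability:            ", round(reliability, 2))
```

Under these assumptions the expected reliability is 4 / (4 + 1) = 0.8, and the simulated value should land close to that. In real data the true values are unobserved, so this ratio cannot be computed directly.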
Estimating reliability
Need at least two measures of same concept
Each measure has random error
Variation shared is not due to random error
[Diagram: True Value → X1 and X2; Error1 → X1; Error2 → X2]
Correlation reflects reliability
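The logic above can be sketched with simulated data: two measures share one true value but carry independent random errors, so their correlation reflects only the shared part. The variances and the `pearson` helper are assumptions made for this illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
n = 20000

# Hypothetical true values (SD 2); each measure adds its own error (SD 1).
true_values = [rng.gauss(0, 2) for _ in range(n)]
x1 = [t + rng.gauss(0, 1) for t in true_values]
x2 = [t + rng.gauss(0, 1) for t in true_values]

# The errors are independent, so only the true value is shared by X1 and X2.
r12 = pearson(x1, x2)
print("correlation between X1 and X2:", round(r12, 2))
```

With equal error variances, the expected correlation equals the reliability of either measure, here 4 / (4 + 1) = 0.8.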
Reliability coefficients
Some coefficients estimate reliability of individual measures (items)
Test/Retest correlation
Same item repeated on (unchanging) true value
Inter-item correlation
Different items measure same true value
Inter-coder correlation (agreement)
Different coders measure same true value
Increasing reliability
How to counteract noisy measurements?
Careful conceptualization
Employ precise quantitative measures
Combine multiple measures of the same
theoretical concept
Multi-item scales
Example
10 vocabulary test items
Each is subject to some random error
Combining (e.g., adding) items will compound what is common to the measures
the “true” vocabulary scores
Combining items will not compound what is unique
the random errors
So combining increases proportion of “true” variation to total variation
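A small worked sketch of why combining helps, assuming every item equals the same true score plus an independent error of equal variance (a simplification, not a claim about any real test). When k such items are added, the shared true part grows with k squared while the independent errors grow only with k. The function name `sum_scale_reliability` is hypothetical.

```python
def sum_scale_reliability(k, true_var, error_var):
    """Reliability of a scale formed by adding k items.

    Assumes each item = same true score + independent random error,
    with identical variances across items.
    """
    true_part = (k ** 2) * true_var   # shared variation compounds
    error_part = k * error_var        # independent errors only accumulate
    return true_part / (true_part + error_part)


# Hypothetical single item with reliability 1 / (1 + 3) = 0.25.
for k in (1, 2, 5, 10):
    print(k, round(sum_scale_reliability(k, true_var=1.0, error_var=3.0), 3))
```

Under these assumptions the proportion of true variation rises from 0.25 for a single item to about 0.77 for the ten-item sum.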
Reliability coefficients
Some coefficients estimate reliability of multiple-item scales
Split-half method
Total set of items randomly divided in half
Each half summed to form a scale
Scores on the two halves correlated
Example: Spearman-Brown reliability coefficient
Internal consistency method
Calculate all inter-item correlations
Average them, and adjust for the number of items
Example: Cronbach’s alpha reliability coefficient
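Both methods can be sketched on simulated data. The data-generating values (10 items, 5,000 respondents, error SD 1.5) and the `pearson` helper are assumptions for illustration. The split-half correlation is stepped up to full length with the Spearman-Brown formula 2r / (1 + r); the internal consistency estimate uses the standardized form of alpha, which is built from the average inter-item correlation as described above.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2)
n_people, n_items = 5000, 10

# Hypothetical data: every item = person's true score + independent error.
true_scores = [rng.gauss(0, 1) for _ in range(n_people)]
items = [[t + rng.gauss(0, 1.5) for t in true_scores] for _ in range(n_items)]

# Split-half method: random halves, sum each half, correlate the two sums.
order = list(range(n_items))
rng.shuffle(order)
half_a, half_b = order[: n_items // 2], order[n_items // 2:]
score_a = [sum(items[i][p] for i in half_a) for p in range(n_people)]
score_b = [sum(items[i][p] for i in half_b) for p in range(n_people)]
r_halves = pearson(score_a, score_b)
spearman_brown = 2 * r_halves / (1 + r_halves)  # step up to full length

# Internal consistency: average all inter-item correlations,
# then adjust for the number of items (standardized alpha).
pairs = [(i, j) for i in range(n_items) for j in range(i + 1, n_items)]
mean_r = sum(pearson(items[i], items[j]) for i, j in pairs) / len(pairs)
alpha_std = n_items * mean_r / (1 + (n_items - 1) * mean_r)

print("Spearman-Brown (split-half):", round(spearman_brown, 2))
print("standardized alpha:         ", round(alpha_std, 2))
```

Because the simulated items are interchangeable, the two coefficients should come out close to each other, at roughly 0.8 under these assumptions. With real items they can differ, and the split-half value depends on which random split is drawn.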
Examining scales
Which items produce the most reliable scale?
Item-total correlations
Correlate each item with the total (of other items)
Weak correlations suggest item doesn’t share much variance with the overall scale
Comparative scale reliability
Calculate scale reliability (e.g., Spearman-Brown or Cronbach’s alpha) with and without particular item
If a scale’s reliability doesn’t increase when an item is added, we suspect the item is weak
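A sketch of both checks on simulated data: five hypothetical items that share a true score, plus one item that is pure noise. The item names, variances, and helper functions are assumptions for illustration; alpha here is the usual variance-based form of Cronbach's alpha.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def cronbach_alpha(columns):
    """Cronbach's alpha from a list of item columns (one list per item)."""
    k = len(columns)
    totals = [sum(vals) for vals in zip(*columns)]
    item_var = sum(statistics.pvariance(col) for col in columns)
    return k / (k - 1) * (1 - item_var / statistics.pvariance(totals))


rng = random.Random(3)
n_people = 5000
true_scores = [rng.gauss(0, 1) for _ in range(n_people)]

# Five hypothetical items share the true score; "noise" shares nothing.
items = {f"good{i}": [t + rng.gauss(0, 1) for t in true_scores]
         for i in range(1, 6)}
items["noise"] = [rng.gauss(0, math.sqrt(2)) for _ in range(n_people)]

alpha_all = cronbach_alpha(list(items.values()))
print("alpha, all items:", round(alpha_all, 2))

report = {}
for name, col in items.items():
    others = [c for other, c in items.items() if other != name]
    rest_total = [sum(vals) for vals in zip(*others)]  # total of other items
    report[name] = (pearson(col, rest_total), cronbach_alpha(others))
    print(name, "item-total r:", round(report[name][0], 2),
          "alpha without item:", round(report[name][1], 2))
```

In this setup the noise item should show an item-total correlation near zero, and alpha should be higher without it than with it, while dropping any of the other items should lower alpha.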
What reliability ensures
High proportion of variance is systematic, not random
However …
Systematic variance may stem from shared bias
Acquiescence response bias, social desirability
Systematic variance may stem from the wrong concept
Confusing intelligence with socialized learning
Valid measures must be reliable, but reliability does not guarantee validity
Measurement validity
“One validates, not a test, but an interpretation of data arising from a test” (Lee Cronbach)
How should a measure be interpreted?
What empirical data can help ensure that a given interpretation is valid?
Face validity
Simple examination of measure
Does it manifestly address the right concept?
Weak form of validation
Largely matter of interpretation
Content validity
Focuses on extent to which a measure reflects a specific domain of conceptual content
Addresses “coverage” of a measure
Largely matter of interpretation
Requires conceptual definition of domain
Criterion-related validity
Involves correlating a measure with some external phenomenon
Concurrent: distinguishing some co-existing difference
Predictive: forecasting future difference
Depends upon validity of criterion
May not always be applicable
Construct validity
Extent to which a measure relates to other measures consistent with theoretically derived hypotheses
Sometimes termed nomological validity
E.g., age → abstract reasoning ability
Focuses on pattern of relationships among
various concepts and measures
Construct validity
Convergent validity
Similar data result from measurements of similar concepts using different operational techniques
Discriminant validity
Dissimilar data result from measurements of different concepts (particularly those which might be easily confused operationally)
Multi-Trait/Multi-Method Matrix
Traits: Leadership, Cooperation
Methods: Questionnaire, Observer ratings
Convergent: same trait/different methods should agree
Discriminant: different traits/same methods should not agree
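A minimal simulated multi-trait/multi-method matrix, offered as a sketch only. It assumes two unrelated traits and a modest method-specific bias shared by all measures taken with the same method; those assumptions, the weights, and every variable name are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(4)
n = 5000

# Hypothetical traits (assumed unrelated) and method-specific biases.
leadership = [rng.gauss(0, 1) for _ in range(n)]
cooperation = [rng.gauss(0, 1) for _ in range(n)]
questionnaire_bias = [rng.gauss(0, 1) for _ in range(n)]
observer_bias = [rng.gauss(0, 1) for _ in range(n)]


def measure(trait, method):
    """Observed score = trait + part of the method bias + random error."""
    return [t + 0.5 * m + rng.gauss(0, 0.5) for t, m in zip(trait, method)]


lead_q = measure(leadership, questionnaire_bias)
lead_o = measure(leadership, observer_bias)
coop_q = measure(cooperation, questionnaire_bias)
coop_o = measure(cooperation, observer_bias)

# Same trait, different methods: should agree (convergent).
convergent = [pearson(lead_q, lead_o), pearson(coop_q, coop_o)]
# Different traits, same method: should not agree (discriminant).
discriminant = [pearson(lead_q, coop_q), pearson(lead_o, coop_o)]

print("convergent:  ", [round(r, 2) for r in convergent])
print("discriminant:", [round(r, 2) for r in discriminant])
```

Under these assumptions the same-trait correlations should be clearly larger than the same-method correlations. If the same-method correlations were the larger ones, the measures would be reflecting the method more than the trait.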
An example
Cacioppo & Petty (1982) “The Need for Cognition”
What is the concept?
How measured?
What evidence of reliability?
Item-total coefficients (Study 1)
Spearman-Brown coefficient (Study 1)
Factor analysis confirms single underlying factor (Study 1 & 2)
[Diagram: factor model]
Factors: Need for Cognition, Other Concept
Items: x1 to x5 (NFC items), each with its own random error
Factor loading (correlation)
Example, cont.
What evidence of validity?
Concurrent validity
Distinguishes faculty from assembly-line workers (Study 1)
Discriminant validity
Only small relationship with cognitive style (Study 2)
No relationship with test anxiety (Study 2)
Significant and modest relationship with ACT scores (Study 3)
Weak relationship with social desirability (Study 3 & 4)
Weak negative correlation with dogmatism (Study 3 & 4)
Predictive validity
Predicts enjoyment of a cognitive task (Study 4)
Another example
Doob & Gross (1968) “Status of Frustrator as an Inhibitor of Horn-Honking Responses”
What is the theory?
How are the concepts measured?
What evidence of reliability?
What evidence of validity?
Construct validity: Status and gender differences consistent with prior research
Convergent validity? Authors cast doubt on the validity of questionnaire measures
Reliability and validity
Conceptualization and measurement are primary concerns in all research
Always look for evidence of measurement reliability and validity
Of the two, validity is probably more important
Unreliable measures increase the odds that we won’t find anything
Invalid measures increase the odds that we’ll find the wrong thing
For Thursday
Question-asking
Sudman & Bradburn, Ch. 1-5
Begin working on fourth individual assignment
(scaling exercise)