Agenda

Levels of measurement

Measurement reliability

Measurement validity

Some examples

Need for Cognition

Horn-honking

Levels of measurement

Nominal

Ordinal

Interval

Ratio

Linking concepts to data

Conceptual definition:

Theoretical variables

Units of analysis

Operational definition:

Procedures for measuring variables

Subject units

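Diagram: operationalization of theoretical variables

X: TV exposure, measured as self-reported TV watching

Z: Reading time, measured as self-reported reading

Y: Reading skills, measured as scores on a reading test

(Three links in the diagram are marked with question marks.)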

Theory of Measurement

http://www.asc.upenn.edu/courses/comm522/

Two key qualities

Measurement Reliability

The extent to which repeated measurements produce the same results

Inversely related to the amount of random error

Measurement Validity

The extent to which a measure “does what it is intended to do”

Random error and reliability

Measures have at least two components:

Measure = True Value + Random Error

Variation comes from both sources:

Total Variation = True Variation + Random Variation

The reliability of a measure is:

True Variation / Total Variation
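The ratio above can be checked with a small simulation. This is only a sketch: the sample size, the mean of 50, and the standard deviations (2.0 for true values, 1.0 for random error) are assumed values chosen for illustration, not figures from the lecture.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

n = 20000
true_sd, error_sd = 2.0, 1.0  # assumed values, for illustration only

# Measure = True Value + Random Error
true_values = [rng.gauss(50, true_sd) for _ in range(n)]
errors = [rng.gauss(0, error_sd) for _ in range(n)]
measures = [t + e for t, e in zip(true_values, errors)]

# Reliability = True Variation / Total Variation
true_var = statistics.pvariance(true_values)
total_var = statistics.pvariance(measures)
reliability = true_var / total_var

# With these assumed SDs the theoretical ratio is 4 / (4 + 1) = 0.8
theoretical = true_sd**2 / (true_sd**2 + error_sd**2)
print(f"simulated: {reliability:.3f}  theoretical: {theoretical:.3f}")
```

In real data the true values are never observed directly, which is why reliability has to be estimated from repeated or multiple measures.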

Estimating reliability

Need at least two measures of same concept

Each measure has random error

Variation shared is not due to random error

Diagram: one True Value underlies two measures, X1 and X2, each with its own random error (Error1, Error2)

Correlation reflects reliability
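A minimal simulation of this idea, assuming two measures with the same true value and independent errors of equal size. The variances are hypothetical; under these assumptions the correlation between the two measures should approximate true variation over total variation.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)
n = 20000
true_sd, error_sd = 2.0, 1.0  # assumed values, for illustration only

true_values = [rng.gauss(0, true_sd) for _ in range(n)]
# Two measures of the same true value, each with its own random error
x1 = [t + rng.gauss(0, error_sd) for t in true_values]
x2 = [t + rng.gauss(0, error_sd) for t in true_values]

r12 = pearson(x1, x2)
theoretical = true_sd**2 / (true_sd**2 + error_sd**2)  # 0.8
print(f"correlation of X1 and X2: {r12:.3f}  theoretical reliability: {theoretical:.3f}")
```

Only the shared true value makes X1 and X2 move together, so the correlation recovers the reliability without the true values ever being observed.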

Reliability coefficients

Some coefficients estimate reliability of individual measures (items)

Test/Retest correlation

Same item repeated on (unchanging) true value

Inter-item correlation

Different items measure same true value

Inter-coder correlation (agreement)

Different coders measure same true value


Increasing reliability

How to counteract noisy measurements?

Careful conceptualization

Employ precise quantitative measures

Combine multiple measures of the same theoretical concept


Multi-item scales

Example

10 vocabulary test items

Each is subject to some random error

Combining (e.g., adding) items will compound what is common to the measures

the “true” vocabulary scores

Combining items will not compound what is unique

the random errors

So combining increases proportion of “true” variation to total variation
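One standard way to quantify this gain is the Spearman-Brown prophecy formula, which gives the reliability of a k-item scale from the reliability of a single item. The sketch below assumes parallel items (equal true variance, independent errors); the single-item reliability of 0.30 is a hypothetical value, not one from the lecture.

```python
def spearman_brown(r_single: float, k: int) -> float:
    """Reliability of a scale built from k parallel items,
    each with single-item reliability r_single."""
    return k * r_single / (1 + (k - 1) * r_single)


# Hypothetical single-item reliability of 0.30
for k in (1, 2, 5, 10):
    print(k, round(spearman_brown(0.30, k), 3))
# 1 0.3
# 2 0.462
# 5 0.682
# 10 0.811
```

Under these assumptions, ten weak items (0.30 each) combine into a scale with reliability of about 0.81, because true variation accumulates faster than random variation.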


Reliability coefficients

Some coefficients estimate reliability of multiple-item scales

Split-half method

Total set of items randomly divided in half

Each half summed to form a scale

Scores on the two halves correlated

Example: Spearman-Brown reliability coefficient

Internal consistency method

Calculate all inter-item correlations

Average them, and adjust for the number of items

Example: Cronbach’s alpha reliability coefficient
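Both methods can be sketched on simulated data. Everything here is an assumption made for illustration: 2,000 simulated respondents, 10 items, and the noise level. The alpha computed is the standardized form (from the average inter-item correlation), which matches the description above; the more commonly reported raw alpha is computed from item variances instead.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def split_half_reliability(rows, rng):
    """Randomly halve the items, sum each half, correlate the halves,
    then apply the Spearman-Brown correction for full scale length."""
    k = len(rows[0])
    order = list(range(k))
    rng.shuffle(order)
    half_a, half_b = order[: k // 2], order[k // 2:]
    score_a = [sum(row[i] for i in half_a) for row in rows]
    score_b = [sum(row[i] for i in half_b) for row in rows]
    r_halves = pearson(score_a, score_b)
    return 2 * r_halves / (1 + r_halves)


def standardized_alpha(rows):
    """Average all inter-item correlations, then adjust for item count."""
    k = len(rows[0])
    columns = list(zip(*rows))
    rs = [pearson(columns[i], columns[j]) for i, j in combinations(range(k), 2)]
    mean_r = sum(rs) / len(rs)
    return k * mean_r / (1 + (k - 1) * mean_r)


# Simulated data: each item = respondent's true score + independent noise
rng = random.Random(2)
rows = []
for _ in range(2000):
    true_score = rng.gauss(0, 1)
    rows.append([true_score + rng.gauss(0, 1.5) for _ in range(10)])

split_half = split_half_reliability(rows, rng)
alpha = standardized_alpha(rows)
print(f"split-half (Spearman-Brown): {split_half:.3f}")
print(f"standardized alpha:          {alpha:.3f}")
```

With these assumed variances the summed 10-item scale has a theoretical reliability of 100 / 122.5, roughly 0.82, so both estimates should land near that value. A different random split will give a slightly different split-half estimate, which is one reason internal consistency coefficients are often preferred.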


Examining scales

Which items produce the most reliable scale?

Item-total correlations

Correlate each item with the total (of other items)

Weak correlations suggest item doesn’t share much variance with the overall scale

Comparative scale reliability

Calculate scale reliability (e.g., Spearman-Brown or Cronbach’s alpha) with and without particular item

If a scale’s reliability doesn’t increase when an item is added, we suspect the item is weak
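The two diagnostics above can be sketched as follows. The response matrix is hypothetical (6 respondents, 5 items, with the fifth item deliberately unrelated to the rest), and the raw, variance-based form of Cronbach's alpha is assumed as the scale reliability coefficient.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows):
    """Raw alpha: k/(k-1) * (1 - sum of item variances / variance of total)."""
    k = len(rows[0])
    item_vars = [statistics.variance(col) for col in zip(*rows)]
    total_var = statistics.variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def item_diagnostics(rows):
    """For each item: (correlation with the total of the OTHER items,
    alpha of the scale with that item removed)."""
    results = []
    for i in range(len(rows[0])):
        item = [row[i] for row in rows]
        rest_total = [sum(row) - row[i] for row in rows]
        reduced = [row[:i] + row[i + 1:] for row in rows]
        results.append((pearson(item, rest_total), cronbach_alpha(reduced)))
    return results


# Hypothetical responses: rows are respondents, columns are items 1-5
rows = [
    [4, 5, 4, 3, 3],
    [2, 2, 3, 2, 4],
    [5, 4, 5, 5, 2],
    [1, 2, 1, 2, 4],
    [3, 3, 4, 3, 1],
    [4, 4, 3, 4, 5],
]

full_alpha = cronbach_alpha(rows)
diagnostics = item_diagnostics(rows)
print(f"alpha with all items: {full_alpha:.3f}")
for number, (r_item_total, alpha_without) in enumerate(diagnostics, start=1):
    print(f"item {number}: item-total r = {r_item_total:.3f}, "
          f"alpha without item = {alpha_without:.3f}")
```

In this made-up data the fifth item correlates negatively with the total of the other items, and dropping it raises alpha, which is the pattern that flags a weak item.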


What reliability ensures

High proportion of variance is systematic, not random

However …

Systematic variance may stem from shared bias

Acquiescence response bias, social desirability

Systematic variance may stem from the wrong concept

Confusing intelligence with socialized learning

Valid measures must be reliable, but reliability does not guarantee validity
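The gap between reliability and validity can be illustrated with a simulation in which two measures share a bias (for instance, acquiescence) in addition to the intended concept. All variances are assumed for illustration: concept 1.0, shared bias 1.0, random error 0.5.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(3)
n = 20000
error_sd = 0.5 ** 0.5  # random error variance of 0.5 (assumed)

concept = [rng.gauss(0, 1) for _ in range(n)]  # intended concept, variance 1
bias = [rng.gauss(0, 1) for _ in range(n)]     # shared response bias, variance 1

# Both measures pick up the concept AND the same bias, plus their own error
x1 = [c + b + rng.gauss(0, error_sd) for c, b in zip(concept, bias)]
x2 = [c + b + rng.gauss(0, error_sd) for c, b in zip(concept, bias)]

reliability_estimate = pearson(x1, x2)   # theory: (1 + 1) / 2.5 = 0.8
valid_share = pearson(x1, concept) ** 2  # theory: 1 / 2.5 = 0.4
print(f"correlation between measures:          {reliability_estimate:.3f}")
print(f"share of variance due to the concept:  {valid_share:.3f}")
```

Under these assumptions the two measures agree strongly (about 0.8), yet only about 40% of each measure's variance reflects the intended concept. The shared bias is systematic, so it counts toward reliability while contributing nothing to validity.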


Measurement validity

“One validates, not a test, but an interpretation of data arising from a test” (Lee Cronbach)

How should a measure be interpreted?

What empirical data can help ensure that a given interpretation is valid?


Face validity

Simple examination of measure

Does it manifestly address the right concept?

Weak form of validation

Largely matter of interpretation


Content validity

Focuses on extent to which a measure reflects a specific domain of conceptual content

Addresses “coverage” of a measure

Largely matter of interpretation

Requires conceptual definition of domain


Criterion-related validity

Involves correlating a measure with some external phenomenon

Concurrent: distinguishing some co-existing difference

Predictive: forecasting future difference

Depends upon validity of criterion

May not always be applicable


Construct validity

Extent to which a measure relates to other measures consistent with theoretically derived hypotheses

Sometimes termed nomological validity

E.g., a hypothesized relationship between age and abstract reasoning ability

Focuses on pattern of relationships among various concepts and measures

Construct validity

Convergent validity

Similar data result from measurements of similar concepts using different operational techniques

Discriminant validity

Dissimilar data result from measurements of different concepts (particularly those which might be easily confused operationally)

Multi-Trait/Multi-Method Matrix

Traits: Leadership, Cooperation

Methods: Questionnaire, Observer ratings

Convergent: same trait / different methods should agree

Discriminant: different traits / same methods should not agree

An example

Cacioppo & Petty (1982) “The Need for Cognition”

What is the concept?

How measured?

What evidence of reliability?

Item-total coefficients (Study 1)

Spearman-Brown coefficient (Study 1)

Factor analysis confirms single underlying factor (Study 1 & 2)

Diagram: factor model of the NFC items

Factors: Need for Cognition, Other Concept

Items: x1 to x5 (NFC items), each with its own random error

Diagram labels: factor loading, correlation

Example, cont.

What evidence of validity?

Concurrent validity

Distinguishes faculty from assembly-line workers (Study 1)

Discriminant validity

Only small relationship with cognitive style (Study 2)

No relationship with test anxiety (Study 2)

Significant and modest relationship with ACT scores (Study 3)

Weak relationship with social desirability (Study 3 & 4)

Weak negative correlation with dogmatism (Study 3 & 4)

Predictive validity

Predicts enjoyment of a cognitive task (Study 4)

Another example

Doob & Gross (1968) “Status of Frustrator as an Inhibitor of Horn-Honking Responses”

What is the theory?

How are the concepts measured?

What evidence of reliability?

What evidence of validity?

Construct validity: Status and gender differences consistent with prior research

Convergent validity? Authors cast doubt on the validity of questionnaire measures


Reliability and validity

Conceptualization and measurement are primary concerns in all research

Always look for evidence of measurement reliability and validity

Of the two, validity is probably more important

Unreliable measures increase the odds that we won’t find anything

Invalid measures increase the odds that we’ll find the wrong thing


For Thursday

Question-asking

Sudman & Bradburn, Ch. 1-5

Begin working on fourth individual assignment (scaling exercise)

