1 epsy 546: lecture 3 generalizability theory and validity george karabatsos

1

EPSY 546: LECTURE 3

GENERALIZABILITY THEORY AND

VALIDITY

George Karabatsos

2

GENERALIZABILITY THEORY

3

TRUE SCORE MODEL

• Recall the true score model:

$X_n = T_n + e_n$

$X_n$: Observed test score of person n

$T_n$: True test score (unknown)

$e_n$: Random error (unknown)

4

TRUE SCORE MODEL

• Recall the true score model:

$X_n = T_n + e_n$

• One may argue that the true score model defines error narrowly.

– One-variable, simple ANOVA:

Between (true score) variance + Within (random error) variance.
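The variance split above can be illustrated with a small simulation. This is a minimal sketch, not part of the lecture: the population means and standard deviations are hypothetical values chosen only for illustration, and errors are assumed to be independent of true scores.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

n_persons = 5000

# Hypothetical population values (assumptions, not from the lecture)
true_scores = [random.gauss(50, 10) for _ in range(n_persons)]  # T_n
errors = [random.gauss(0, 5) for _ in range(n_persons)]         # e_n

# True score model: X_n = T_n + e_n
observed = [t + e for t, e in zip(true_scores, errors)]

var_true = statistics.pvariance(true_scores)
var_error = statistics.pvariance(errors)
var_observed = statistics.pvariance(observed)

# Share of observed variance attributable to true scores
reliability = var_true / var_observed

print(f"var(T) + var(e) = {var_true + var_error:.1f}")
print(f"var(X)          = {var_observed:.1f}")
print(f"var(T) / var(X) = {reliability:.2f}")
```

Because the simulated errors are independent of the true scores, the observed variance comes out close to the sum of the two parts. The model has only one error term, which is the narrowness that the slide points to.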

5

GENERALIZABILITY THEORY

• Generalizability Theory extends the true score model by acknowledging that multiple factors affect the measurement variance.

– Multivariable ANOVA:

The observed test response is a function of 2 or more variables, their interactions, and random measurement error.

6

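G-THEORY MODEL (example)

$X_{njt} = \mu$ (Grand mean)

$+ (\mu_n - \mu)$ (Person n's effect)

$+ (\mu_j - \mu)$ (Item j's effect)

$+ (\mu_t - \mu)$ (Time t's effect)

$+ (\mu_{nt} - \mu_n - \mu_t + \mu)$ (Person × Time effect)

$+ (\mu_{nj} - \mu_n - \mu_j + \mu)$ (Person × Item effect)

$+ (\mu_{tj} - \mu_t - \mu_j + \mu)$ (Time × Item effect)

$+ \text{residual}$ (Three-way interaction, and error)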
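As a rough illustration of this decomposition, the sketch below splits each score in a tiny persons × items × occasions table into the eight additive terms of the model. The data and all names (`X`, `decompose`, and so on) are hypothetical; sample means stand in for the population means, and the residual is simply whatever is left after the main effects and two-way interactions are removed.

```python
from itertools import product
from statistics import mean

# Hypothetical scores X[n][j][t]: 2 persons x 2 items x 2 occasions
X = [
    [[4, 5], [2, 3]],
    [[1, 3], [1, 1]],
]
P, I, T = range(2), range(2), range(2)

# Grand mean, marginal means, and two-way cell means
mu = mean(X[n][j][t] for n, j, t in product(P, I, T))
mu_n = [mean(X[n][j][t] for j, t in product(I, T)) for n in P]
mu_j = [mean(X[n][j][t] for n, t in product(P, T)) for j in I]
mu_t = [mean(X[n][j][t] for n, j in product(P, I)) for t in T]
mu_nj = [[mean(X[n][j][t] for t in T) for j in I] for n in P]
mu_nt = [[mean(X[n][j][t] for j in I) for t in T] for n in P]
mu_jt = [[mean(X[n][j][t] for n in P) for t in T] for j in I]


def decompose(n, j, t):
    """Return the eight additive terms for the score X[n][j][t]."""
    terms = {
        "grand mean": mu,
        "person": mu_n[n] - mu,
        "item": mu_j[j] - mu,
        "time": mu_t[t] - mu,
        "person x time": mu_nt[n][t] - mu_n[n] - mu_t[t] + mu,
        "person x item": mu_nj[n][j] - mu_n[n] - mu_j[j] + mu,
        "time x item": mu_jt[j][t] - mu_t[t] - mu_j[j] + mu,
    }
    # Residual: three-way interaction confounded with random error
    terms["residual"] = X[n][j][t] - sum(terms.values())
    return terms


for name, value in decompose(0, 0, 0).items():
    print(f"{name:>14}: {value:+.3f}")
```

With one observation per cell, the three-way interaction cannot be separated from random error, which is why the model lumps them into a single residual term.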

7

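G-THEORY VARIANCE PARTITION

Systematic variance:

Persons: $\sigma^2_P$

Measurement error (facet contributions):

Items: $\sigma^2_I$

Time: $\sigma^2_T$

Person × Time: $\sigma^2_{PT}$

Person × Item: $\sigma^2_{PI}$

Time × Item: $\sigma^2_{TI}$

Three-way interaction + error: $\sigma^2_{PIT,error}$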

8

G-THEORY OF DECISIONS

• Relative decisions: Decisions based on the rank ordering of persons (e.g., college admission, pass-fail testing).

• Variance contributing to measurement error for relative decisions:

$\sigma^2_{Relat} = \sigma^2_{PI} + \sigma^2_{PT} + \sigma^2_{PIT,error}$

(all variance components associated with the interaction of persons)

9

G-THEORY OF DECISIONS

• Absolute decisions: Decisions based on the level of the observed score, without regard to the performance of others (e.g., a driver's license test).

• Variance contributing to measurement error for absolute decisions:

$\sigma^2_{Abs} = \sigma^2_T + \sigma^2_I + \sigma^2_{PI} + \sigma^2_{PT} + \sigma^2_{TI} + \sigma^2_{PIT,error}$

(all variance components associated with the facets, which introduce “constant” effects to absolute decisions)
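The two error definitions (relative and absolute) can be compared side by side. The following is a minimal sketch with made-up variance component estimates; the numbers and the dictionary keys are assumptions for illustration, not results from any study.

```python
# Hypothetical variance component estimates (illustrative only)
components = {
    "P": 0.50,          # persons (systematic variance)
    "I": 0.10,          # items
    "T": 0.05,          # time
    "PI": 0.20,         # person x item
    "PT": 0.10,         # person x time
    "TI": 0.02,         # time x item
    "PIT,error": 0.30,  # three-way interaction + error
}

# Relative error: only the components in which persons interact with a facet
relative_error = components["PI"] + components["PT"] + components["PIT,error"]

# Absolute error: every component except the person variance
absolute_error = sum(v for k, v in components.items() if k != "P")

print(f"relative error variance = {relative_error:.2f}")
print(f"absolute error variance = {absolute_error:.2f}")
```

Absolute error can never be smaller than relative error, because it adds the facet main effects and the facet-by-facet interaction to the same person-interaction terms.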

10

GENERALIZABILITY COEFFICIENT

• Indicates how accurately the observed test scores allow us to generalize about persons’ behavior in a defined universe of situations (Cronbach, 1972).

$E\rho^2 = \dfrac{\sigma^2_P}{\sigma^2_P + \sigma^2_{Decision}}$,

with: $\sigma^2_{Decision} = \sigma^2_{Relat}$ or $\sigma^2_{Abs}$
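A short sketch of the coefficient follows, computed once for each type of decision. The function name `g_coefficient` and the variance values are hypothetical, chosen only to show how the choice of error term changes the result.

```python
def g_coefficient(var_person, var_error):
    """Generalizability coefficient: person variance over person-plus-error variance."""
    return var_person / (var_person + var_error)


# Hypothetical values (illustrative only)
var_person = 0.50    # sigma^2_P
var_relative = 0.60  # sigma^2_Relat
var_absolute = 0.77  # sigma^2_Abs

g_relative = g_coefficient(var_person, var_relative)
g_absolute = g_coefficient(var_person, var_absolute)

print(f"coefficient for relative decisions = {g_relative:.3f}")
print(f"coefficient for absolute decisions = {g_absolute:.3f}")
```

Since the absolute error variance includes every term of the relative error variance plus more, the coefficient for absolute decisions is never larger than the one for relative decisions.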

11

STUDIES

• G-Study (Generalizability Study):

Aims to estimate the variance components underlying a measurement process by defining the universe of admissible observations as broadly as possible.

12

STUDIES

• D-Study (Design Study):

Uses G-study results to address “what if” questions about variations in the measurement design (Thompson & Melancon, 1987).

This helps pinpoint sources of error to specify protocol modifications to obtain the desired level of generalizability.
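A "what if" question of this kind can be sketched in a few lines. The example assumes the usual D-study convention, which the slides do not spell out: when scores are averaged over several items and occasions, each error component is divided by the number of conditions it is averaged over. The variance components and the function name `d_study_coefficient` are hypothetical.

```python
# Hypothetical G-study estimates (illustrative only)
components = {"P": 0.50, "PI": 0.20, "PT": 0.10, "PIT,error": 0.30}


def d_study_coefficient(components, n_items, n_times):
    """Coefficient for relative decisions when scores are averaged
    over n_items items and n_times occasions (assumed convention)."""
    relative_error = (
        components["PI"] / n_items
        + components["PT"] / n_times
        + components["PIT,error"] / (n_items * n_times)
    )
    return components["P"] / (components["P"] + relative_error)


# What if the test had more items, or were given on more occasions?
results = {}
for n_items in (1, 5, 10, 20):
    for n_times in (1, 2):
        results[(n_items, n_times)] = d_study_coefficient(components, n_items, n_times)
        print(f"items={n_items:>2}, occasions={n_times}: "
              f"{results[(n_items, n_times)]:.3f}")
```

Reading down the output shows which facet is worth extending: adding conditions to the facet with the largest error component raises the coefficient fastest.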

13

EXAMPLES OF G-THEORY

• Nice illustrations are offered in:

Webb, Rowley, & Shavelson (1988)

and

Crowley, Thompson, & Worchel (1994)

14

VALIDITY

15

TEST VALIDITY

• VALIDITY: A test is valid if it measures what it claims to measure.

• Types: Face, Content, Concurrent, Predictive, Construct.

16

TEST VALIDITY

• Face validity: When the test items appear to measure what the test claims to measure.

• Content validity: When the content of the test items, according to domain experts, adequately represents the latent trait that the test intends to measure.

17

TEST VALIDITY

• Concurrent validity: When the test, which intends to measure a particular latent trait, correlates highly with another test that measures that trait.

• Predictive validity: When the scores of the test predict some meaningful criterion.

18

TEST VALIDITY

• Construct validity: A test has construct validity when the results of using the test fit hypotheses concerning the theoretical nature of the latent trait. The better the fit, the higher the construct validity.

19

MESSICK’S UNIFIED CONSTRUCT VALIDITY

– Content: Item content relevance, representativeness, and technical quality (includes face validity).

– Substantive: Theoretical rationales for the observed consistencies in the test responses.

– Structural: Fidelity of the scoring structure to the structure of the content domain.

– Generalizability: The extent to which the score properties and interpretations generalize over population groups, settings, and tasks.

– External: Concurrent/convergent, discriminant, and predictive evidence.

– Consequential: The potential and actual consequences of test use.
