Causal Rasch Models and Individual Growth Trajectories. National Center for the Improvement of Educational Assessment, January 18, 2011. Jackson Stenner, Chairman & CEO, MetaMetrics. PowerPoint presentation transcript.
1
Causal Rasch Models and Individual Growth Trajectories
National Center for the Improvement of Educational Assessment, January 18, 2011
A. Jackson Stenner
Chairman & CEO, MetaMetrics
2
“Although adopting a probabilistic model for describing responses to an intelligence test, we have taken no sides in a possible argument about responses being ultimately explainable in causal terms.”
(Rasch, 1960, p.90)
3
Three well-researched constructs
Reader ability
Text Complexity
Comprehension
4
Reader Ability
Temperature
5
Reading is a process in which information from the text and the knowledge possessed by the reader act together to produce meaning.
Anderson, R.C., Hiebert, E.H., Scott, J.A., & Wilkinson, I.A.G. (1985). Becoming a nation of readers: The report of the Commission on Reading. Urbana, IL: University of Illinois.
6
An Equation
Conceptual: Comprehension = Reader Ability - Text Complexity
Statistical:
$$\text{Raw Score} = \sum_{i} \frac{e^{(RA - TC_i)}}{1 + e^{(RA - TC_i)}}$$
RA = Reading Ability
TC_i = Text Calibration of item i
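The statistical form can be evaluated directly, and inverted to recover reader ability from a raw score. This is a minimal sketch, assuming RA and TC_i are expressed in logits (the Lexile scale applies its own linear transformation, which is not shown here); the function names and example calibrations are hypothetical. Bisection is used only because the expected raw score increases monotonically with RA.

```python
import math

def p_correct(ra: float, tc: float) -> float:
    """Probability of success on one item: e^(RA - TC_i) / (1 + e^(RA - TC_i))."""
    return math.exp(ra - tc) / (1.0 + math.exp(ra - tc))

def expected_raw_score(ra: float, calibrations: list[float]) -> float:
    """The 'Statistical' form: sum of success probabilities over the items."""
    return sum(p_correct(ra, tc) for tc in calibrations)

def ability_for_raw_score(raw: float, calibrations: list[float],
                          lo: float = -20.0, hi: float = 20.0, tol: float = 1e-10) -> float:
    """Invert the equation: find the RA whose expected raw score equals `raw`.

    The expected raw score rises monotonically with RA, so bisection converges
    whenever the answer lies between `lo` and `hi`. Zero and perfect scores
    have no finite solution and are rejected.
    """
    if not 0 < raw < len(calibrations):
        raise ValueError("raw score must lie strictly between 0 and the number of items")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_raw_score(mid, calibrations) < raw:
            lo = mid  # expected score too low: ability must be higher
        else:
            hi = mid
    return (lo + hi) / 2

calibrations = [-1.0, 0.0, 1.0]                  # hypothetical item calibrations (logits)
print(expected_raw_score(0.0, calibrations))     # about 1.5, by symmetry
print(ability_for_raw_score(2.0, calibrations))  # RA at which 2 of 3 items are expected correct
```

No data on other persons enters the inversion; only the item calibrations and the person's own raw score are used.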
7
Each of these thermometers is engineered to use the same correspondence table
Each of these reading tests is engineered to use the same correspondence table
8
Correspondence Table: °C and Lexile
Raw Score  °C    Lexile | Raw Score  °C    Lexile | Raw Score  °C    Lexile | Raw Score  °C    Lexile
1          35.6  378L   | 12         36.8  905L   | 23         38.0  1116L  | 34         39.2  1331L
2          35.7  509L   | 13         36.9  926L   | 24         38.1  1134L  | 35         39.3  1355L
3          35.8  589L   | 14         37.0  947L   | 25         38.2  1151L  | 36         39.4  1381L
4          35.9  647L   | 15         37.1  968L   | 26         38.3  1170L  | 37         39.6  1409L
5          36.0  695L   | 16         37.2  987L   | 27         38.4  1188L  | 38         39.7  1440L
6          36.1  736L   | 17         37.3  1007L  | 28         38.6  1207L  | 39         39.8  1474L
7          36.2  770L   | 18         37.4  1025L  | 29         38.7  1226L  | 40         39.9  1513L
8          36.3  801L   | 19         37.6  1044L  | 30         38.8  1245L  | 41         40.0  1560L
9          36.4  830L   | 20         37.7  1062L  | 31         38.9  1265L  | 42         40.1  1616L
10         36.6  857L   | 21         37.8  1080L  | 32         39.0  1286L  | 43         40.2  1697L
11         36.7  881L   | 22         37.9  1098L  | 33         39.1  1308L  | 44         40.3  1829L
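Because both instruments are engineered to a shared correspondence table, turning a measurement outcome into a measure is a lookup that uses no data from any other person. A minimal Python sketch using the values transcribed from the correspondence table; the names CELSIUS, LEXILE, and to_measures are hypothetical.

```python
# Correspondence table values, indexed by raw score 1..44
CELSIUS = [
    35.6, 35.7, 35.8, 35.9, 36.0, 36.1, 36.2, 36.3, 36.4, 36.6, 36.7,
    36.8, 36.9, 37.0, 37.1, 37.2, 37.3, 37.4, 37.6, 37.7, 37.8, 37.9,
    38.0, 38.1, 38.2, 38.3, 38.4, 38.6, 38.7, 38.8, 38.9, 39.0, 39.1,
    39.2, 39.3, 39.4, 39.6, 39.7, 39.8, 39.9, 40.0, 40.1, 40.2, 40.3,
]
LEXILE = [
    378, 509, 589, 647, 695, 736, 770, 801, 830, 857, 881,
    905, 926, 947, 968, 987, 1007, 1025, 1044, 1062, 1080, 1098,
    1116, 1134, 1151, 1170, 1188, 1207, 1226, 1245, 1265, 1286, 1308,
    1331, 1355, 1381, 1409, 1440, 1474, 1513, 1560, 1616, 1697, 1829,
]

def to_measures(raw_score: int) -> tuple[float, int]:
    """Convert a raw score (cavities or items) into (degrees C, Lexile) via the shared table."""
    if not 1 <= raw_score <= len(LEXILE):
        raise ValueError("raw score must be between 1 and 44")
    return CELSIUS[raw_score - 1], LEXILE[raw_score - 1]

print(to_measures(30))  # (38.8, 1245)
```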
9
Anatomy of Two Measurement Procedures
Aspect/Construct: Temperature | Reader Ability
Object of measurement: Person | Person
Instrument: Thermometer | Reading test
Measurement outcome: Number of theory-calibrated cavities (0-45) that fail to reflect green light | Count correct on a collection of 45 theory-calibrated test items
Substantive theory: Thermodynamic theory | Lexile Theory
Unit of measurement: Degree Fahrenheit (°F) | Lexile (L)
Correspondence table/calibration equation: Exploits a chemical reaction and light absorption to table temperature as a function (Guttman model) of a sufficient statistic | Exploits semantic and syntactic features of test items to table reader ability as a function (Rasch model) of a sufficient statistic
Measure/Quantity: Measurement outcome converted into a quantity via the substantive theory | Measurement outcome converted into a quantity via the substantive theory
Readable technology: NexTemp Thermometer™ | Oasis™
General objectivity: Point estimates of temperature are independent of the thermometer | Point estimates of reader ability are independent of the reading test
10
Ten Features of Causal Response Models – whether Guttman or Rasch
1. Both measurement procedures depend on within-person causal interpretations of how these two instruments work. NexTemp uses a causal Guttman model; the Lexile Framework for Reading uses a causal Rasch model.
2. In both cases the measurement mechanism is well specified and can be manipulated to produce predictable changes in measurement outcomes (e.g. percent correct or percent of cavities turning black).
3. Item parameters are supplied by substantive theory and, thus, person parameter estimates are generated without reference to or use of any data on other persons or populations. Therefore, effects of the examinee population have been completely eliminated from consideration in the estimation of person parameters for reader ability and temperature.
11
Ten Features of Causal Response Models – whether Guttman or Rasch (cont’d)
4. In both cases the quantitivity hypothesis can be experimentally tested by evaluating the trade-off property. A change in the person parameter can be offset, or traded off, by a compensating change in the measurement mechanism, so that the measurement outcome is held constant.
5. When uncertainty in item difficulties is too large to ignore, individual item difficulties may be a poor choice to use as calibration parameters in causal models. As an alternative we recommend, when feasible, averaging over individual item difficulties to produce “ensemble” means. These means can be excellent dependent variables for testing causal theories.
6. Index models are not causal because manipulation of neither the indicators nor the person parameter produces a predictable change in the measurement outcome.
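The trade-off property in item 4 can be checked numerically under the Rasch form of the equation: each term depends only on RA - TC_i, so raising reader ability and every item calibration by the same amount leaves the expected raw score unchanged. A minimal sketch; the calibrations and the size of the shift are hypothetical values in logits, not Lexile calibrations.

```python
import math

def expected_raw_score(ra: float, calibrations: list[float]) -> float:
    """Sum over items of e^(RA - TC_i) / (1 + e^(RA - TC_i)), written in logistic form."""
    return sum(1.0 / (1.0 + math.exp(-(ra - tc))) for tc in calibrations)

calibrations = [-1.5, -0.5, 0.0, 0.5, 1.5]  # hypothetical item calibrations (logits)
shift = 1.0                                  # hypothetical change, in logits

baseline = expected_raw_score(0.0, calibrations)

# Intervene on the person only: ability rises, so the expected outcome rises.
more_able = expected_raw_score(0.0 + shift, calibrations)

# Compensating intervention on the instrument: every item made harder by the same amount.
harder = [tc + shift for tc in calibrations]
traded_off = expected_raw_score(0.0 + shift, harder)

print(f"baseline={baseline:.4f}  more_able={more_able:.4f}  traded_off={traded_off:.4f}")
```

In an experiment the "harder" instrument would be produced by manipulating text complexity rather than by editing numbers, but the prediction being tested is the same: the outcome returns to its baseline value.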
12
Ten Features of Causal Response Models – whether Guttman or Rasch (cont’d)
7. Causal Rasch models are individual centered and are explanatory at both within-subject and between-subject levels. The attribute on which I differ from myself a decade ago is the same attribute on which I differ from my brother today.
8. When data fit a Rasch model, differences between person measures are objective. When data fit a causal Rasch model, absolute person measures are objective (i.e., independent of instrument).
9. The case against an individual causal account, although popular, has been poorly made. Investigators need only experiment to isolate the causal mechanism in their instruments, test for the trade-off property and confirm invariance over individuals. This has been accomplished for a construct, reader ability, that has been described by scholars as the most complex cognitive activity that humans regularly engage in. Given the success with reading, we think it likely that other behavioral constructs can be similarly measured.
10. Causal Rasch models make possible the construction of generally objective growth trajectories. Each trajectory can be completely separated from the instruments used in its construction and from the performance of any other persons whatsoever.
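Items 8 and 10 can be illustrated with two instruments of different difficulty: the same reader earns different raw scores, yet each instrument, using only its own theory-supplied calibrations, converts its outcome to the same measure. A minimal sketch under idealized assumptions: the two 30-item tests and the reader's ability are hypothetical values in logits, and expected rather than observed raw scores are used, so measurement error is ignored.

```python
import math

def expected_raw_score(ra: float, calibrations: list[float]) -> float:
    """Expected count correct under the Rasch form of the equation."""
    return sum(1.0 / (1.0 + math.exp(-(ra - tc))) for tc in calibrations)

def ability_for_raw_score(raw: float, calibrations: list[float],
                          lo: float = -20.0, hi: float = 20.0, tol: float = 1e-10) -> float:
    """Bisection on the monotone expected-score curve; uses only this test's calibrations."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_raw_score(mid, calibrations) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Two hypothetical 30-item instruments with theory-supplied calibrations (logits)
easy_test = [-2.0 + 0.1 * i for i in range(30)]  # about -2.0 to 0.9
hard_test = [-0.5 + 0.1 * i for i in range(30)]  # about -0.5 to 2.4
true_ra = 0.8                                     # hypothetical reader, in logits

# Idealized outcomes: expected (not observed) raw scores
score_easy = expected_raw_score(true_ra, easy_test)
score_hard = expected_raw_score(true_ra, hard_test)

# Each instrument converts its own outcome using only its own calibrations
measure_easy = ability_for_raw_score(score_easy, easy_test)
measure_hard = ability_for_raw_score(score_hard, hard_test)
print(f"raw scores {score_easy:.2f} vs {score_hard:.2f}; "
      f"measures {measure_easy:.4f} vs {measure_hard:.4f}")
```

Measures obtained this way on different occasions, from different instruments, can be placed on one growth trajectory without reference to any other examinee.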
13
To causally explain a phenomenon [a measurement outcome] is to provide information about the factors [person processes and instrument mechanisms] on which it depends and to exhibit how it depends on those factors. This is exactly what the provision of counterfactual information…accomplishes: we see what factors some explanandum M [measurement outcome, raw score] depends on (and how it depends on those factors) when we have identified one or more variables such that changes in these (when produced by interventions) are associated with changes in M (Woodward, 2003, p.204).
14
How Many Ways Can We Say X Causes Y?
X “elicited a greater” Y
X “impacts” Y
X “accounts for” Y
X “has been linked to” Y
Y “is the result of” X
X “didn’t diminish” Y
Y “because of” X
Y “depends on” X
X “has led to” Y
X “largely motivates” Y
Y “stemmed from” X
X “proved critical to” Y
X “fosters” Y
X “changes” Y
X “triggers” Y
X “affects” Y
15
Psychometrics vs. Metrology
Aspect: Psychometrics | Metrology
Interpretation of probability: Group centered; interpretation involves 100 people with the same ability answering a single item | Individual centered; interpretation involves administering 100 items with the same calibration to a single person
Person measures: A person’s response record is embedded in different samples, and each group-specific Rasch analysis produces a different measure | A person’s response record is evaluated against theory-referenced calibrations
Measurement error: Traditional test theory uses a sample standard deviation and a sample correlation to compute an SEM, which is intended to characterize the individual | ISEM is the within-person standard deviation over replications of the measurement procedure
Data fit to the model: Varies with the locally constructed frame of reference; sample dependent | Fit is to a theory, thus sample independent
Validity: Correlational, thus sample dependent | Causal within person, thus sample independent
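The metrological definition of ISEM, the within-person standard deviation over replications of the measurement procedure, can be simulated directly: administer the same test to one person many times and take the standard deviation of that person's measures. A minimal sketch; it assumes the Rasch form of the equation, a hypothetical 45-item test with evenly spaced calibrations in logits, and maximum-likelihood scoring (for the Rasch model, solving expected raw score = observed raw score). The comparison with 1/sqrt(test information) is a standard Rasch cross-check added here, not something the slide states.

```python
import math
import random
import statistics

def p_correct(ra: float, tc: float) -> float:
    return 1.0 / (1.0 + math.exp(-(ra - tc)))

def expected_raw_score(ra: float, calibrations: list[float]) -> float:
    return sum(p_correct(ra, tc) for tc in calibrations)

def ability_for_raw_score(raw: int, calibrations: list[float],
                          lo: float = -20.0, hi: float = 20.0, tol: float = 1e-9) -> float:
    # Bisection on the monotone expected-score curve (raw strictly between 0 and n)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_raw_score(mid, calibrations) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def simulate_isem(true_ra: float, calibrations: list[float],
                  replications: int = 2000, seed: int = 1) -> float:
    """Within-person SD of ability estimates over replications of the same procedure."""
    rng = random.Random(seed)
    n = len(calibrations)
    cache: dict[int, float] = {}  # raw score -> estimate (at most n - 1 distinct values)
    estimates = []
    for _ in range(replications):
        raw = sum(rng.random() < p_correct(true_ra, tc) for tc in calibrations)
        if raw in (0, n):
            continue  # zero and perfect scores have no finite estimate; dropped here
        if raw not in cache:
            cache[raw] = ability_for_raw_score(raw, calibrations)
        estimates.append(cache[raw])
    return statistics.stdev(estimates)

calibrations = [-2.0 + 4.0 * i / 44 for i in range(45)]  # hypothetical 45-item test (logits)
isem = simulate_isem(0.5, calibrations)
information = sum(p_correct(0.5, tc) * (1 - p_correct(0.5, tc)) for tc in calibrations)
print(f"simulated ISEM {isem:.3f}; 1/sqrt(information) {1 / math.sqrt(information):.3f}")
```

No other examinees appear anywhere in the simulation: the error estimate belongs to this person and this procedure.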
16
17
18
Figure 1: Plot of Theoretical Text Complexity versus Empirical Text Complexity for 475 articles
“Pizza Problems”
r = 0.952, r″ = 0.960, R²″ = 0.921, RMSE″ = 99.8L
19
What could account for the 8% unexplained variance?
Missing Variables
Improved Proxies/Operationalizations
Expanded Error Model
Rounding Error
Interaction between Individual and Text
Psychometric Uncertainty Principle
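The 8% figure traces back to the fit statistics reported with Figure 1. As a worked check:

$$1 - R^{2\prime\prime} = 1 - 0.921 = 0.079 \approx 8\%$$

$${r''}^{2} = 0.960^{2} = 0.9216$$

so the reported R²″ agrees with r″ up to rounding.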
20
21
Figure: individual growth trajectory on the Lexile scale for Student 1528 (6th grade, male, Hispanic, paid lunch), May 2007 – Dec. 2009: 284 encounters, 117,484 words, 2,894 items, 848 minutes. Reference point: text demands for college and career, May 2016 (12th grade).
22
Item-Based vs. Ensemble-Based Psychometrics
23
Reading Task-Complexity Plane for Dichotomous Items
Figure labels: Native Lexile, Added Hardness, Added Easiness, Production Cloze, Auto-Generated Cloze; axis: Unit Size Adjustment Applied to Logits (0.7 to 1.3).
24
Comparing Item-Based vs. Ensemble-Based Psychometrics
Item-Based
– Item statistics
– Item characteristic curves
– DIF for items
Ensemble-Based
– Ensemble statistics
– Ensemble characteristic curves
– DIF for ensembles
25
The Ensemble Objective: Correspondence Table
– Raw score to Lexile measure
What we think we know
– Mean and spread of item distributions for a passage
What is assumed to be unknown
– Individual item difficulties
1300L (132L)
26
The Process – Iteration 1
STEP 1: Sample 45 item difficulties from the ensemble
STEP 2: Compute Lexile measures for each raw score (1 to 44)
STEP 3: Table results
Lexile measures by raw score:
Raw Score   Sample 1
1           362L
2           514L
3           584L
...         ...
44          1811L
27
The Process – Iteration 2
STEP 1: Sample 45 item difficulties from the ensemble
STEP 2: Compute Lexile measures for each raw score (1 to 44)
STEP 3: Table results
Lexile measures by raw score:
Raw Score   Sample 1   Sample 2
1           362L       354L
2           514L       506L
3           584L       575L
...         ...        ...
44          1811L      1797L
28
The Process – Iteration 1,000
STEP 1: Sample 45 item difficulties from the ensemble
STEP 2: Compute Lexile measures for each raw score (1 to 44)
STEP 3: Table results
Lexile measures by raw score:
Raw Score   Sample 1   ...   Sample 1,000   Mean of 1,000
1           362L       ...   354L           378L
2           514L       ...   506L           509L
3           584L       ...   575L           589L
...         ...        ...   ...            ...
44          1811L      ...   1797L          1829L
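The three-step process on slides 26 to 28 can be sketched as a Monte Carlo loop. Every specific here is an assumption: the ensemble is modeled as a normal distribution known only by its mean and spread (the slide's "What we think we know"), with hypothetical values in logits rather than Lexiles because the Lexile scaling is not given; each raw score is converted with a damped Newton-Raphson solve of the Rasch equation, a common numerical choice and not necessarily the one MetaMetrics used. Function names are illustrative.

```python
import math
import random
import statistics

def ability_for_raw_score(raw: int, difficulties: list[float],
                          tol: float = 1e-8, max_iter: int = 100) -> float:
    """Rasch ability (logits) whose expected raw score equals `raw`, by damped Newton-Raphson."""
    n = len(difficulties)
    theta = statistics.fmean(difficulties) + math.log(raw / (n - raw))  # rough starting value
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-(theta - d))) for d in difficulties]
        info = sum(p * (1.0 - p) for p in probs)  # slope of expected score w.r.t. theta
        step = (raw - sum(probs)) / info
        step = max(-1.0, min(1.0, step))          # limit step size to avoid overshooting
        theta += step
        if abs(step) < tol:
            break
    return theta

def ensemble_table(mean: float, sd: float, n_items: int = 45,
                   iterations: int = 1000, seed: int = 0) -> list[float]:
    """Average correspondence table (raw scores 1..n_items-1) over sampled item sets."""
    rng = random.Random(seed)
    columns = []
    for _ in range(iterations):
        # STEP 1: sample item difficulties from the ensemble (assumed normal here)
        difficulties = [rng.gauss(mean, sd) for _ in range(n_items)]
        # STEP 2: compute a measure for each raw score 1 .. n_items - 1
        columns.append([ability_for_raw_score(r, difficulties) for r in range(1, n_items)])
    # STEP 3: tabulate, then average each row across iterations
    return [statistics.fmean(col[i] for col in columns) for i in range(n_items - 1)]

# Hypothetical ensemble: mean 0 logits, spread 0.75 logits; 100 iterations keeps the demo fast
table = ensemble_table(mean=0.0, sd=0.75, iterations=100)
for raw in (1, 2, 3, 44):
    print(raw, round(table[raw - 1], 2))
```

The slides describe 1,000 iterations (the function's default); the demo uses fewer only for speed. Averaging over sampled item sets is what lets the table be built while individual item difficulties remain unknown.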
29
Closing
No matter how it is sliced and diced, analyses of joint and conditional probability distributions yield no more than patterns of association. Nothing in the response data or in Rasch analyses of these data exposes the processes (features of the object of measurement) or mechanisms (features of the instrument) that are hypothesized to be conjointly causal on the measurement outcomes.