TRANSCRIPT
Yukie Toyama, UC Berkeley BEAR Center
What Makes Reading Difficult?
An Investigation of the Contribution of Passage, Task, and Reader Characteristics on Difficulty of Reading Comprehension Items Using Explanatory Item Response Models
Outline
• Introduction
  – Reading comprehension is not a unitary construct
  – RAND heuristic of RC
  – Processing model for RC multiple-choice assessment
• Research Questions
• Materials & Methods
• Findings
  – Main effects
  – Interaction effects
• Conclusions
Reading comprehension (RC) is not viewed as a unitary construct
Simple view of reading (Hoover & Gough, 1990)
https://www.sedl.org/reading/framework/overview.html
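Under the simple view, RC is the product of two components, so a deficit in either one depresses comprehension:

\[ \text{RC} = \text{Decoding} \times \text{Linguistic Comprehension} \]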
RC is not viewed as a unitary construct (cont.)
Quantitative research looks into relations among components and their contributions to RC
Tunmer & Chapman (2012)
Alternative view of RC incorporating reader, text, and activity factors:
• Reader brings component skills, world knowledge, interest, etc.
• Text has numerous linguistic and discourse features (e.g., vocabulary, sentence length)
• Activity (or task) includes processes and purposes (e.g., knowledge gain vs. leisure)
RAND heuristic of RC emphasizes interrelatedness of the three factors
RAND Reading Study Group (2002): RC is "the process of simultaneously extracting and constructing meaning through interaction and involvement with written language"
This study examines the effects of the three factors and their interactions on RC using explanatory item response models.
[Diagram: an RC item response arises from task features, text features, and reader characteristics within an assessment context.]
Processing model for RC multiple-choice assessment
(Embretson & Wenzel, 1987)
The model posits two phases, each with hypothesized difficulty predictors:
1. Text Representation (TR): encoding & coherence
   – text-feature predictors: propositional density, sentence length, vocab demand of text, ...
2. Response Decision (RD): encoding & coherence of the item, text mapping, evaluating the truth status of alternatives
   – task-feature predictors: vocab demand of item & answer choices, passage-item relation (item type), abstractness of info asked, falsifiability of distractors, ...
Reader characteristic: general vocab knowledge
Text analyzers: Lexile, Coh-Metrix, TextEvaluator
Research questions
1. Which set of text and task features best explains variability in the difficulty of RC items, after controlling for students' general vocabulary knowledge?
2. Are any of the text-feature effects moderated by students' vocabulary knowledge and/or by task characteristics?
Materials & Methods
The Assessment
• Operational computer-adaptive placement test for an online intervention
• Places students in grades 1-12+ into a reading level
• A typical testlet: a passage followed by 5 MC items; all passages are expository
• The passage is not available when answering a question
• The analytic sample included 48 testlets (240 items) covering 12 reading levels
Each student was given five testlets in a testing session (testlet administration order):
• Testlet 1: chosen based on the student's vocab level; considered as "practice" and not used
• Testlets 2-4: given adaptively based on performance on the previous testlet; some responses used for linking purposes
• Testlet 5: randomly chosen from the test bank; this segment of item responses was used the most
Anchor / linking design: basics
[Diagram: test forms linked through common items and common people.]
Anchor (linking) design: basics (cont.)
Anchor (linking) design with testlets
The 7 anchor testlets for the study (shaded in the original figure) were spread across the 12 levels.
• 35 anchor items (about 15% of the total items examined)
Randomly assigned:
• Responses to anchor items from the 2nd, 3rd, 4th, & 5th testlets
• Responses to non-anchor items from the 5th testlet only
Anchor testlets were chosen to minimize the effect of the adaptive logic.
The 240 items as administered in the 2nd-4th testlets (item set 1) and the same 240 items as administered in the 5th testlet (item set 2) were calibrated together, as if they were different items, with Rasch-MML. The 7 anchor testlets showed the least discrepancy in item difficulty between the two sets. A sketch of this screening step follows.
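As a rough illustration of the screening (all numbers simulated and hypothetical, not the study's data), one can compute the per-testlet discrepancy between the two calibrations and keep the most stable testlets:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical Rasch difficulty estimates for 48 testlets x 5 items, calibrated
# once as they appeared in testlets 2-4 (set 1) and once as they appeared in
# testlet 5 (set 2); the values are simulated, not from the study.
set1 = rng.normal(size=(48, 5))
set2 = set1 + rng.normal(scale=0.15, size=set1.shape)

# mean absolute item-difficulty discrepancy per testlet
discrepancy = np.abs(set1 - set2).mean(axis=1)

# keep the 7 testlets whose two calibrations agree best -> anchor testlets
anchors = np.argsort(discrepancy)[:7]
print("anchor testlet indices:", anchors)
```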
Student samples (n = 10,547)
• About 25% of students retained (original N = 41,555)
• 2 equivalent samples: one for model building, one for cross-validation
• Each sample had over 5,200 students in grades 1-12+ from the U.S. and Canada
Text-Feature Predictors (n = 48)
→ hypothesized to affect the text representation (TR) phase
[Table of text features, with cohesion examples, not recoverable from the transcript. A toy computation of two such features follows.]
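To make two of these predictors concrete, here is a toy sketch of how mean sentence length and mean log word frequency can be computed. The actual study used analyzers such as Lexile, Coh-Metrix, and TextEvaluator; the frequency table below is hypothetical.

```python
import math
import re

# hypothetical relative-frequency table; a real one would come from a corpus
FREQ = {"a": 0.02, "sand": 1e-4, "dollar": 5e-5, "is": 0.01, "sea": 2e-4,
        "animal": 1e-4, "it": 0.01, "round": 5e-5, "and": 0.03, "flat": 3e-5}

def text_features(passage: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", passage) if s.strip()]
    words = re.findall(r"[a-z']+", passage.lower())
    mean_sent_len = len(words) / len(sentences)          # mean sentence length
    mean_log_freq = sum(math.log(FREQ.get(w, 1e-6))      # unseen words get a floor
                        for w in words) / len(words)
    return {"mean_sentence_length": mean_sent_len,
            "mean_log_word_frequency": mean_log_freq}

print(text_features("A sand dollar is a sea animal. It is round and flat."))
```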
Task-Feature Predictors
(→ hypothesized to affect the response decision (RD) phase)
Q. A singing bowl is really a type of:
a. upside-down bell.
b. food from Japan.
c. cooking dish.
d. short sound.
→ This info appears almost verbatim within a sentence.

Q. What is the main idea of this passage?
a. A singing bowl is an important part of most weddings.
b. A singing bowl is used to make a special kind of sound.
c. ...
→ Information cuts across multiple paragraphs.
Task-Feature Predictors (cont.)
Abstractness of the information asked (→ hypothesized to affect the response decision (RD) phase):
• Highly concrete: people, animals, things, concrete actions
• Somewhat concrete: amounts, attributes, times
• Somewhat abstract: goal, purpose, attempt, cause, effect, reason
• Highly abstract: theme, difference, equivalence
Doubly Explanatory Item Response Model (Latent Regression LLTM)
Indices: p = person; i = item; k = item/text feature; j = person covariate
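A standard latent regression LLTM consistent with these indices (cf. De Boeck & Wilson, 2004) can be written as:

\[
\operatorname{logit} P(Y_{pi} = 1) = \theta_p - \beta_i,
\qquad
\theta_p = \sum_j \zeta_j Z_{pj} + \varepsilon_p,\;\; \varepsilon_p \sim N(0, \sigma^2),
\qquad
\beta_i = \sum_k \eta_k X_{ik}
\]

where the \(Z_{pj}\) are person covariates (here, general vocabulary knowledge) and the \(X_{ik}\) are item/text features. The model is "doubly explanatory" because both the person side and the item side are regressed on covariates.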
Analytical Process: 4 groups of models
0. Initial calibration with the Rasch model
1. TR models with text-feature predictors & vocab knowledge
2. RD models with task-feature predictors
3. TR + RD
4. Interaction models
4a. Text × Reader
4b. Text × Task
4c. Text × Task × Reader
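To make the model class concrete, here is a minimal simulation sketch of marginal maximum likelihood (MML) estimation for a latent regression LLTM, integrating out the person residual with Gauss-Hermite quadrature. All sizes, names, and parameter values are hypothetical; this illustrates the model, not the study's actual estimation software.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from numpy.polynomial.hermite_e import hermegauss

rng = np.random.default_rng(0)

# --- simulate a small data set (all sizes and values hypothetical) ---
P, I, K, J = 500, 40, 3, 1               # persons, items, item features, person covariates
X = rng.normal(size=(I, K))              # item/text feature matrix
Z = rng.normal(size=(P, J))              # person covariates (e.g., vocab knowledge)
eta_true = np.array([0.5, -0.3, 0.2])    # item-feature weights
zeta_true = np.array([0.8])              # latent-regression weight
beta_true = X @ eta_true                 # item difficulties implied by the features
theta_true = Z @ zeta_true + rng.normal(scale=0.7, size=P)
Y = (rng.random((P, I)) < expit(theta_true[:, None] - beta_true[None, :])).astype(float)

# Gauss-Hermite quadrature nodes for integrating out the person residual
nodes, weights = hermegauss(15)
weights = weights / weights.sum()        # discrete approximation to N(0, 1)

def neg_marginal_loglik(par):
    eta, zeta, log_sigma = par[:K], par[K:K + J], par[-1]
    sigma = np.exp(log_sigma)
    beta = X @ eta                                   # (I,)  item difficulties
    mu = Z @ zeta                                    # (P,)  person means
    theta = mu[:, None] + sigma * nodes[None, :]     # (P, M) ability grid per person
    logit = theta[:, :, None] - beta[None, None, :]  # (P, M, I)
    # Bernoulli log-likelihood at each quadrature node, summed over items
    ll = (Y[:, None, :] * logit - np.logaddexp(0.0, logit)).sum(axis=2)
    # marginalize over the quadrature nodes, then sum over persons
    return -np.logaddexp.reduce(ll + np.log(weights), axis=1).sum()

fit = minimize(neg_marginal_loglik, np.zeros(K + J + 1), method="BFGS")
print("eta :", fit.x[:K].round(2), "| zeta:", fit.x[K:K + J].round(2),
      "| sigma:", np.exp(fit.x[-1]).round(2))
```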
Results 1: Main effects
Initial Rasch Calibration (no predictors)
Item difficulty: M = .22 (SD = .11); Min: −1.65, Max: 2.17, Range: 3.52
Student ability: M = 0 (SD = .68)
Rank-order correlation with testlet level: .76
[Figure: item difficulties by testlet; colors = testlet levels designed into the assessment.]
Results: Text Representation Models
Pseudo-R² / Fit Index (Embretson, 1983)
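Assuming the definition commonly used in the LLTM literature, Embretson's (1983) fit index compares the explanatory model against null and saturated (Rasch) baselines:

\[
\Delta = \frac{\ln L_{\text{null}} - \ln L_{\text{model}}}{\ln L_{\text{null}} - \ln L_{\text{Rasch}}}
\]

so \(\Delta = 1\) when the feature-based model recovers the Rasch item difficulties perfectly.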
Text Representation Models
Effect size in terms of SD of Student Ability (0.68)
Max: Syntactic simplicity -.58 (β = -.40, M3)
Min: Temporality -.13 (β = -.09, M5)
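These effect sizes are consistent with dividing each regression weight by the SD of student ability, e.g.:

\[
\text{ES} = \frac{\hat\beta}{SD(\hat\theta)} = \frac{-0.40}{0.68} \approx -0.58
\]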
Results: Response Decision Models & Combined Model
Response Decision Models
Effect size in terms of SD of Student Ability (0.68)
Max: Knowledge-based items .35 (β = .24, M8)
Min: Vocab demand of distractors .03 (β = .02, M11)
Text Representation + Response Decision Combined Model
Effect size in terms of SD of Student Ability (0.68):
Max: Mean sentence length and Somewhat concrete, each .32 (β = .22)
Results 2: Interaction Models
Recall the best-fitting TR & RD model:
[Figure: final model coefficients; blue = text features, green = task features, circled in red = consistent main effects.]
Interaction Models to Examine Modification of Text Effects
→ Is the text's word-frequency effect modified by the student's vocab level?
→ Is the text's word-frequency effect modified by item type?
Latent Regression LLTM: Interaction Models (cont.)
→ Is the word frequency × item type interaction modified by the student's vocab level?
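In model terms (my notation; for readability, item type is reduced to a single code \(T_i\), whereas the actual models would use dummy codes for the four item types), these questions correspond to adding product terms to the linear predictor, e.g. for word frequency \(W_i\) and vocab knowledge \(Z_p\):

\[
\operatorname{logit} P(Y_{pi}=1) = \theta_p - \beta_i
+ \gamma_1 W_i Z_p
+ \gamma_2 W_i T_i
+ \gamma_3 W_i T_i Z_p
\]

with \(\gamma_1\) the text × reader, \(\gamma_2\) the text × task, and \(\gamma_3\) the three-way interaction.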
Text-Reader Interactions: all but one of the text main effects were modified by the reader's vocabulary level (exception: temporal cohesion).
[Figure: probability of a correct response (y-axis, .3-.7) vs. vocab knowledge (x-axis, −2 to 2), four panels:
(a) Mean Sentence Length: short- vs. long-sentence passages
(b) Mean Log Word Frequency: low vs. high word-frequency passages
(c) Syntactic Simplicity: syntactically complex vs. simple passages
(d) Temporal Cohesion: low vs. high temporal-cohesion passages]
Text-Task Interactions: all but one of the text main effects were modified by item type (exception: syntactic simplicity).
[Figure: probability of a correct response (y-axis, .2-.7) vs. each text feature (z-scores, −2 to 2), with separate curves for the four item types (text-based, reconstruct, integrate, knowledge-based); panels:
(a) Mean Sentence Length
(b) Mean Log Word Frequency
(c) Syntactic Simplicity
(d) Temporality]
Reader-Text-Task Interaction: across the four item types, sentence length helps more as students' vocab knowledge goes up.
Temporality yielded a significant three-way interaction, indicating a "reverse effect": students with lower vocab knowledge benefit more from high-temporality passages, while their high-vocab peers benefit more from low-temporality passages.
Sample Low Temporality Passage (temporality = −3.38)
Since the 1990s, four special vehicles have left Earth on rockets and made
successful landings on the surface of Mars. Known as rovers, they have become
very important research tools for scientists.
Rovers were not the first objects built by humans to reach Mars. Landers have been
sending data from the surface of Mars for about 40 years. While useful, their ability
to do research is somewhat limited because they are stationary. Rovers can move
around on the planet's surface.
The most recent rover sent to Mars is called Curiosity. It weighs more than four tons
and is the size of a car. It left Earth in November 2011 and landed on Mars about
eight months later. It carries highly sensitive equipment to study the weather and
soil of Mars, and to send data back to Earth.
Curiosity was built to withstand the planet's wide range of temperatures and harsh
conditions. . . .
Sample High Temporality Passage (temporality = 1.46)
A sand dollar is a sea animal. It is round and flat. It is three inches wide. Some
people think it looks like a coin. This kind of animal makes its home in the sand on
the bottom of the sea. It lives in water that is not too deep.
This sea animal has a hard shell. The shell keeps it safe. Most other animals are not
able to eat a sand dollar. The shell has many small spines all over it. These are like
little feet. They help the animal move around in the sand. They also help the sand
dollar dig for food.
Hundreds of little hairs cover each spine. They do an important job.
Conclusions
• Text-representation models with text-feature predictors had greater explanatory power than response-decision models.
  – The final combined model (without interactions) explained about 60% of the variance in item difficulty.
• Doubly explanatory models revealed interesting interactions among text, reader, & task features.
• Shorter sentences & familiar words help students with higher general vocab knowledge.
Conclusions (cont.)
• The reverse effect of temporality for high-vocab readers is noteworthy.
  – Similar findings with background knowledge exist in the literature (e.g., McNamara, Kintsch, Songer, & Kintsch, 1996).
Limitations
• Random item effects were not included in the model.
• The nesting structure (items nested within a passage) was not accounted for.
• Passages are relatively short (168-282 words), which may explain the lack of significant effects for the other cohesion measures.
What's next
• Inform the assessment developer about the findings
• Think more about how EIRM can be better used to inform instruction
• Critique current measurements of text complexity
Thank you!