TRANSCRIPT
Yukie Toyama, UC Berkeley BEAR Center
What Makes Reading Difficult?
An Investigation of the Contribution of Passage, Task, and Reader Characteristics on Difficulty of Reading Comprehension Items Using Explanatory Item Response Models
Outline
• Introduction
  – Reading comprehension is not a unitary construct
  – RAND heuristic of RC
  – Processing model for RC multiple-choice assessment
• Research Questions
• Materials & Methods
• Findings
  – Main effects
  – Interaction effects
• Conclusions
Reading comprehension (RC) is not viewed as a unitary construct
Simple view of reading (Hoover & Gough, 1990)
https://www.sedl.org/reading/framework/overview.html
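Under the simple view, RC is the product of two components, so a deficit in either one depresses comprehension:

\[ \text{RC} = \text{Decoding} \times \text{Linguistic Comprehension} \]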
RC is not viewed as a unitary construct (cont.)
Quantitative research looks into relations among components and their contributions to RC
Tunmer & Chapman (2012)
Alternative view of RC incorporating reader, text, and activity factors:
• Reader brings component skills, world knowledge, interest, etc.
• Text has numerous linguistic and discourse features (e.g., vocabulary, sentence length)
• Activity (or task) includes processes and purposes (e.g., knowledge gain vs. leisure)
RAND heuristic of RC emphasizes interrelatedness of the three factors
RAND Reading Study Group (2002): RC is "the process of simultaneously extracting and constructing meaning through interaction and involvement with written language"
This study examines the effects of the three factors and their interactions on RC using explanatory item response models.
[Diagram: an RC item response arises from task features, text features, and reader characteristics within an assessment context.]
Processing model for RC multiple-choice assessment
(Embretson & Wenzel, 1987)
The model posits two phases, each with hypothesized difficulty predictors:
1. Text Representation (TR): encoding & coherence
   – text-feature predictors: propositional density, sentence length, vocab demand of text, ...
2. Response Decision (RD): encoding & coherence of the item, text mapping, evaluating the truth status of alternatives
   – task-feature predictors: vocab demand of item & answer choices, passage-item relation (item type), abstractness of info asked, falsifiability of distractors, ...
Reader characteristic: general vocab knowledge
Text analyzers: Lexile, Coh-Metrix, TextEvaluator
Research questions
1. Which set of text and task features best explains variability in the difficulty of RC items, after controlling for students' general vocabulary knowledge?
2. Are any of the text-feature effects moderated by students' vocabulary knowledge and/or by task characteristics?
Materials & Methods
The Assessment
• Operational computer-adaptive placement test for an online intervention
• Places students in grades 1-12+ into a reading level
• A typical testlet: a passage followed by 5 MC items; all passages are expository
• The passage is not available when answering a question
• The analytic sample included 48 testlets (240 items) covering 12 reading levels
Each student was given five testlets in a testing session (testlet administration order):
• Testlet 1: chosen based on the student's vocab level; considered as "practice" and not used
• Testlets 2-4: given adaptively based on performance on the previous testlet; some responses used for linking purposes
• Testlet 5: randomly chosen from the test bank; this segment of item responses was used the most
Anchor / linking design: basics
[Diagram: test forms linked through common items and common people.]
Anchor (linking) design: basics (cont.)
Anchor (linking) design with testlets
The 7 anchor testlets for the study (shaded in the original figure) were spread across the 12 levels.
• 35 anchor items (about 15% of the total items examined)
Randomly assigned:
• Responses to anchor items from the 2nd, 3rd, 4th, & 5th testlets
• Responses to non-anchor items from the 5th testlet only
Anchor testlets were chosen to minimize the effect of the adaptive logic.
The 240 items as administered in the 2nd-4th testlets (item set 1) and the same 240 items as administered in the 5th testlet (item set 2) were calibrated together, as if they were different items, with Rasch-MML. The 7 anchor testlets showed the least discrepancy in item difficulty between the two sets. A sketch of this screening step follows.
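As a rough illustration of the screening (all numbers simulated and hypothetical, not the study's data), one can compute the per-testlet discrepancy between the two calibrations and keep the most stable testlets:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical Rasch difficulty estimates for 48 testlets x 5 items, calibrated
# once as they appeared in testlets 2-4 (set 1) and once as they appeared in
# testlet 5 (set 2); the values are simulated, not from the study.
set1 = rng.normal(size=(48, 5))
set2 = set1 + rng.normal(scale=0.15, size=set1.shape)

# mean absolute item-difficulty discrepancy per testlet
discrepancy = np.abs(set1 - set2).mean(axis=1)

# keep the 7 testlets whose two calibrations agree best -> anchor testlets
anchors = np.argsort(discrepancy)[:7]
print("anchor testlet indices:", anchors)
```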
Student samples (n = 10,547)
• About 25% of students retained (original N = 41,555)
• 2 equivalent samples: one for model building, one for cross-validation
• Each sample had over 5,200 students in grades 1-12+ from the U.S. and Canada
Text-Feature Predictors (n = 48)
→ hypothesized to affect the text representation (TR) phase
[Table of text features, with cohesion examples, not recoverable from the transcript. A toy computation of two such features follows.]
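To make two of these predictors concrete, here is a toy sketch of how mean sentence length and mean log word frequency can be computed. The actual study used analyzers such as Lexile, Coh-Metrix, and TextEvaluator; the frequency table below is hypothetical.

```python
import math
import re

# hypothetical relative-frequency table; a real one would come from a corpus
FREQ = {"a": 0.02, "sand": 1e-4, "dollar": 5e-5, "is": 0.01, "sea": 2e-4,
        "animal": 1e-4, "it": 0.01, "round": 5e-5, "and": 0.03, "flat": 3e-5}

def text_features(passage: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", passage) if s.strip()]
    words = re.findall(r"[a-z']+", passage.lower())
    mean_sent_len = len(words) / len(sentences)          # mean sentence length
    mean_log_freq = sum(math.log(FREQ.get(w, 1e-6))      # unseen words get a floor
                        for w in words) / len(words)
    return {"mean_sentence_length": mean_sent_len,
            "mean_log_word_frequency": mean_log_freq}

print(text_features("A sand dollar is a sea animal. It is round and flat."))
```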
Task-Feature Predictors
(→ hypothesized to affect the response decision (RD) phase)
Q. A singing bowl is really a type of:
a. upside-down bell.
b. food from Japan.
c. cooking dish.
d. short sound.
→ This info appears almost verbatim within a sentence.

Q. What is the main idea of this passage?
a. A singing bowl is an important part of most weddings.
b. A singing bowl is used to make a special kind of sound.
c. ...
→ Information cuts across multiple paragraphs.
Task-Feature Predictors (cont.)
Abstractness of the information asked (→ hypothesized to affect the response decision (RD) phase):
• Highly concrete: people, animals, things, concrete actions
• Somewhat concrete: amounts, attributes, times
• Somewhat abstract: goal, purpose, attempt, cause, effect, reason
• Highly abstract: theme, difference, equivalence
Doubly Explanatory Item Response Model (Latent Regression LLTM)
Indices: p = person; i = item; k = item/text feature; j = person covariate
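A standard latent regression LLTM consistent with these indices (cf. De Boeck & Wilson, 2004) can be written as:

\[
\operatorname{logit} P(Y_{pi} = 1) = \theta_p - \beta_i,
\qquad
\theta_p = \sum_j \zeta_j Z_{pj} + \varepsilon_p,\;\; \varepsilon_p \sim N(0, \sigma^2),
\qquad
\beta_i = \sum_k \eta_k X_{ik}
\]

where the \(Z_{pj}\) are person covariates (here, general vocabulary knowledge) and the \(X_{ik}\) are item/text features. The model is "doubly explanatory" because both the person side and the item side are regressed on covariates.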
Analytical Process: 4 groups of models
0. Initial calibration with the Rasch model
1. TR models with text-feature predictors & vocab knowledge
2. RD models with task-feature predictors
3. TR + RD
4. Interaction models
4a. Text × Reader
4b. Text × Task
4c. Text × Task × Reader
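To make the model class concrete, here is a minimal simulation sketch of marginal maximum likelihood (MML) estimation for a latent regression LLTM, integrating out the person residual with Gauss-Hermite quadrature. All sizes, names, and parameter values are hypothetical; this illustrates the model, not the study's actual estimation software.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from numpy.polynomial.hermite_e import hermegauss

rng = np.random.default_rng(0)

# --- simulate a small data set (all sizes and values hypothetical) ---
P, I, K, J = 500, 40, 3, 1               # persons, items, item features, person covariates
X = rng.normal(size=(I, K))              # item/text feature matrix
Z = rng.normal(size=(P, J))              # person covariates (e.g., vocab knowledge)
eta_true = np.array([0.5, -0.3, 0.2])    # item-feature weights
zeta_true = np.array([0.8])              # latent-regression weight
beta_true = X @ eta_true                 # item difficulties implied by the features
theta_true = Z @ zeta_true + rng.normal(scale=0.7, size=P)
Y = (rng.random((P, I)) < expit(theta_true[:, None] - beta_true[None, :])).astype(float)

# Gauss-Hermite quadrature nodes for integrating out the person residual
nodes, weights = hermegauss(15)
weights = weights / weights.sum()        # discrete approximation to N(0, 1)

def neg_marginal_loglik(par):
    eta, zeta, log_sigma = par[:K], par[K:K + J], par[-1]
    sigma = np.exp(log_sigma)
    beta = X @ eta                                   # (I,)  item difficulties
    mu = Z @ zeta                                    # (P,)  person means
    theta = mu[:, None] + sigma * nodes[None, :]     # (P, M) ability grid per person
    logit = theta[:, :, None] - beta[None, None, :]  # (P, M, I)
    # Bernoulli log-likelihood at each quadrature node, summed over items
    ll = (Y[:, None, :] * logit - np.logaddexp(0.0, logit)).sum(axis=2)
    # marginalize over the quadrature nodes, then sum over persons
    return -np.logaddexp.reduce(ll + np.log(weights), axis=1).sum()

fit = minimize(neg_marginal_loglik, np.zeros(K + J + 1), method="BFGS")
print("eta :", fit.x[:K].round(2), "| zeta:", fit.x[K:K + J].round(2),
      "| sigma:", np.exp(fit.x[-1]).round(2))
```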
Results 1: Main effects
Initial Rasch Calibration (no predictors)
Item difficulty: M = .22 (SD = .11); Min: −1.65, Max: 2.17, Range: 3.52
Student ability: M = 0 (SD = .68)
Rank-order correlation with testlet level: .76
[Figure: item difficulties by testlet; colors = testlet levels designed into the assessment.]
Results: Text Representation Models
Pseudo-R² / Fit Index (Embretson, 1983)
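Assuming the definition commonly used in the LLTM literature, Embretson's (1983) fit index compares the explanatory model against null and saturated (Rasch) baselines:

\[
\Delta = \frac{\ln L_{\text{null}} - \ln L_{\text{model}}}{\ln L_{\text{null}} - \ln L_{\text{Rasch}}}
\]

so \(\Delta = 1\) when the feature-based model recovers the Rasch item difficulties perfectly.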
Text Representation Models
Effect size in terms of SD of Student Ability (0.68)
Max: Syntactic simplicity -.58 (β = -.40, M3)
Min: Temporality -.13 (β = -.09, M5)
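These effect sizes are consistent with dividing each regression weight by the SD of student ability, e.g.:

\[
\text{ES} = \frac{\hat\beta}{SD(\hat\theta)} = \frac{-0.40}{0.68} \approx -0.58
\]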
Results: Response Decision Models & Combined Model
Response Decision Models
Effect size in terms of SD of Student Ability (0.68)
Max: Knowledge-based items .35 (β = .24, M8)
Min: Vocab demand of distractors .03 (β = .02, M11)
Text Representation + Response Decision Combined Model
Effect size in terms of SD of Student Ability (0.68):
Max: Mean sentence length and Somewhat concrete, each .32 (β = .22)
Results 2: Interaction Models
Recall the best-fitting TR & RD model:
[Figure: final model coefficients; blue = text features, green = task features, circled in red = consistent main effects.]
Interaction Models to Examine Modification of Text Effects
→ Is the text's word-frequency effect modified by the student's vocab level?
→ Is the text's word-frequency effect modified by item type?
Latent Regression LLTM: Interaction Models (cont.)
→ Is the word frequency × item type interaction modified by the student's vocab level?
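In model terms (my notation; for readability, item type is reduced to a single code \(T_i\), whereas the actual models would use dummy codes for the four item types), these questions correspond to adding product terms to the linear predictor, e.g. for word frequency \(W_i\) and vocab knowledge \(Z_p\):

\[
\operatorname{logit} P(Y_{pi}=1) = \theta_p - \beta_i
+ \gamma_1 W_i Z_p
+ \gamma_2 W_i T_i
+ \gamma_3 W_i T_i Z_p
\]

with \(\gamma_1\) the text × reader, \(\gamma_2\) the text × task, and \(\gamma_3\) the three-way interaction.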
Text-Reader Interactions: all but one of the text main effects were modified by the reader's vocabulary level (exception: temporal cohesion).
[Figure: probability of a correct response (y-axis, .3-.7) vs. vocab knowledge (x-axis, −2 to 2), four panels:
(a) Mean Sentence Length: short- vs. long-sentence passages
(b) Mean Log Word Frequency: low vs. high word-frequency passages
(c) Syntactic Simplicity: syntactically complex vs. simple passages
(d) Temporal Cohesion: low vs. high temporal-cohesion passages]
Text-Task Interactions: all but one of the text main effects were modified by item type (exception: syntactic simplicity).
[Figure: probability of a correct response (y-axis, .2-.7) vs. each text feature (z-scores, −2 to 2), with separate curves for the four item types (text-based, reconstruct, integrate, knowledge-based); panels:
(a) Mean Sentence Length
(b) Mean Log Word Frequency
(c) Syntactic Simplicity
(d) Temporality]
Reader-Text-Task Interaction: across the four item types, sentence length helps more as students' vocab knowledge goes up.
Temporality yielded a significant three-way interaction, indicating a "reverse effect": students with lower vocab knowledge benefit more from high-temporality passages, while their high-vocab peers benefit more from low-temporality passages.
Sample Low Temporality Passage (temporality = −3.38)
Since the 1990s, four special vehicles have left Earth on rockets and made
successful landings on the surface of Mars. Known as rovers, they have become
very important research tools for scientists.
Rovers were not the first objects built by humans to reach Mars. Landers have been
sending data from the surface of Mars for about 40 years. While useful, their ability
to do research is somewhat limited because they are stationary. Rovers can move
around on the planet's surface.
The most recent rover sent to Mars is called Curiosity. It weighs more than four tons
and is the size of a car. It left Earth in November 2011 and landed on Mars about
eight months later. It carries highly sensitive equipment to study the weather and
soil of Mars, and to send data back to Earth.
Curiosity was built to withstand the planet's wide range of temperatures and harsh
conditions. . . .
Sample High Temporality Passage (temporality = 1.46)
A sand dollar is a sea animal. It is round and flat. It is three inches wide. Some
people think it looks like a coin. This kind of animal makes its home in the sand on
the bottom of the sea. It lives in water that is not too deep.
This sea animal has a hard shell. The shell keeps it safe. Most other animals are not
able to eat a sand dollar. The shell has many small spines all over it. These are like
little feet. They help the animal move around in the sand. They also help the sand
dollar dig for food.
Hundreds of little hairs cover each spine. They do an important job.
Conclusions
• Text-representation models with text-feature predictors had greater explanatory power than response-decision models.
  – The final combined model (without interactions) explained about 60% of the variance in item difficulty.
• Doubly explanatory models revealed interesting interactions among text, reader, & task features.
• Shorter sentences & familiar words help students with higher general vocab knowledge.
Conclusions (cont.)
• The reverse effect of temporality for high-vocab readers is noteworthy.
  – Similar findings with background knowledge exist in the literature (e.g., McNamara, Kintsch, Songer, & Kintsch, 1996).
Limitations
• Random item effects were not included in the model.
• The nesting structure (items nested within a passage) was not accounted for.
• Passages are relatively short (168-282 words), which may explain the lack of significant effects for the other cohesion measures.
What's next
• Inform the assessment developer about the findings
• Think more about how EIRM can be better used to inform instruction
• Critique current measurements of text complexity
Thank you!