crtiec summit 2012 v3crtiec/rti_summit/documents/mcconnell... · test: ! oral language/...

9/20/12


IGDIs and Beyond: Measurement and Decision-Making for Language and Literacy RTI Efforts in Early Childhood Education

Scott McConnell, Alisha Wackerle-Hollman and Tracy Bradfield

Disclosure

• Scott McConnell and colleagues developed Individual Growth and Development Indicators; intellectual property from this research has been licensed to Early Learning Labs, Inc., for commercial development and sale. Scott and the University of Minnesota have royalty and equity interest in Early Learning Labs, Inc. These relationships have been reviewed and managed by the University of Minnesota in accordance with its conflict of interest policies.

Today’s Session

• “Measuring Up” – the logic of General Outcome Measurement and contemporary models
• IGDIs 2.0 – Building Items and Scales
• IGDIs 2.0 and RTI – A Decision-Making Framework


Logic of General Outcome Measurement and Contemporary Models of RTI, Measurement

Measuring Up

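(Figure: performance plotted against time.) Characteristics of a General Outcome Measure:
• Standardized and sound
• Sensitive to time and treatment
• Repeatable
• Fast and easy
• General outcome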

GOM Metrics

• Status
  - What is the level of performance for a child (or group) today?
  - How does this child (or group) compare to an a priori standard?
• Change
  - Has performance changed since the last assessment?
• Growth
  - What is the rate of change?
  - Is the child “on track” toward a long-term desired goal?


Application to Early Literacy

• Four CRTIEC domains
  - Oral Language
  - Phonological Awareness
  - Alphabet Knowledge and Concepts about Print
  - Comprehension
• Assumption of concurrent or heterotypic development

(Figure: language and early literacy units, namely sound units, print units, contextual units (narrative), semantic units (concepts), and language units, grouped as outside-in and inside-out skills.)

GOM of Early Literacy

• Assessment of interrelated domains
• Assessment of domains over time
• Assessment for different functions
  - Description of language and early literacy development (“status”)
  - Evaluation of timeliness of development (“growth”)
  - Identification of need for additional intervention
  - Monitoring effects of intervention

Item Response Theory

• Assumes an “ability” whose characteristics are invariant across individuals and time
• Assumes that items and individuals can be located on this ability
  - Thus, items and individuals are arrayed along the ability continuum, an implicit “absolute” scale
• Assumes that item and test statistics are invariant across samples
• Provides a precise way to build and use tests
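Later slides report Rasch output. In the Rasch (one-parameter logistic) model, the "ability" continuum is a logit scale on which children and items share the same units. The sketch below illustrates that model. The function names are illustrative and do not describe the IGDI scoring software.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability that a child with ability theta answers an item of
    difficulty b correctly under the Rasch (1PL) model; theta and b
    are on the same logit scale."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_score(theta: float, difficulties: list[float]) -> float:
    """Expected number correct on a set of items for a child at ability theta."""
    return sum(rasch_probability(theta, b) for b in difficulties)


def log_odds(theta: float, b: float) -> float:
    """Log-odds of a correct response; under the model this equals theta - b."""
    p = rasch_probability(theta, b)
    return math.log(p / (1.0 - p))
```

The log-odds reduce to theta minus b, so the difference between two children is the same whichever item is used. This is one way to read the invariance assumptions listed above.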


IRT and GOMs

• Core, underlying metaphors are similar, and key assumptions are compatible
  - IRT: a trajectory that is invariant across individuals and populations, with items and individuals located on it
  - GOM: growth toward a long-term, common outcome, with variation at the individual (and nested) level in status and rate of growth
  - Both: locate an individual at a single point in time, and locate repeated measures on the trajectory (to estimate rate of growth)

Advantages/Assets from IRT

• Increased precision in item and scale construction
  - More analytic tools, and more analytic colleagues
  - Item-level analyses for reliability; item information function
  - Greater facility for adding and evaluating items and constructing scales
• Expanding item pools
• Increasing knowledge of methodological and logistical requirements for design, testing, refinement, and implementation

Can We Go from an IRT-Based Scale to General Outcome Measurement?


(Figure: progress on an IRT scale, shown as movement from low to high ability; progress on a GOM scale, shown as ability, from low to high, plotted against time.)

What’s needed to “make the move?”

• Construct validation – relation to long-term outcome(s) of interest
• Item pool
• Growth scaling – change as a function of time
• Evaluation
  - Judgments
  - Norms
  - Benchmarks
  - Empirical analyses


Constructing Measures for Specific Purposes

IGDI 2.0 Items and Scales

Measurement Framework

• Wilson (2005)

Early Literacy Construct

Defining the Construct of Early Literacy

Phonological Awareness Construct Definition: The ability to detect and manipulate the sound structure of words independent of their meanings (Phillips, Clancy-Menchetti, & Lonigan, 2008), which develops along a continuum of complexity from identification to synthesis to analysis.

Measures of identification level: Rhyming and First Sounds

Oral Language Construct Definition: The ability to use words to communicate ideas and thoughts and to use language as a tool to communicate with others (Dunst, Trivette, Masiello, Roper, & Robyak, 2008; Morgan & Meier, 2008).
• Expressive language: the use of words to express meaning.
• Receptive language: the ability to listen to, process, and understand the meaning of spoken words.

Measure of expressive language: Picture Naming



Alphabet Knowledge Construct Definition: Knowledge about the names and sounds of the 26 letters of the alphabet (McBride-Chang, 1999).

Measure of aural identification: Sound Identification

Comprehension Construct Definition:
• Text comprehension: the ability to understand and interpret text as a whole (Storch & Whitehurst, 2002); it includes the “recognition of pictures and symbols in books and the ability to interpret and infer meaning from what is seen” (Dunst, Trivette, Masiello, Roper, & Robyak, 2006, p. 4).
• Listening comprehension: the ability to understand and interpret spoken phonemes, words, phrases, sentences, narratives, and stories (Dickinson & Smith, 1994; Skarakis-Doyle, Dempsey, & Lee, 2008).

Measures of Comprehension: Which One Doesn’t Belong

Item-Level Revisions
• Cleaning items
• Item-level functions
  - Rasch output values
    - How is each item contributing to the test?
    - Item/total correlations
    - Infit statistics
    - Standard error of the item
  - Construct-irrelevant features (CIF)
    - What characteristics of each item provide information to the student?
    - What information distracts the student from the intended content?
    - What features of the items are malleable?

Example: Poorly Functioning Item – Def Vocab: Construct-Irrelevant Features

(Figure callouts: “Not a real elephant”; “Elephants are big but this one is actually small.”)


Example: Revision of Item

Example: Poorly Functioning Rhyming Item

Construct-irrelevant features:
• Some are real images; others are not.
• Some are enclosed; others are not.
• Word content, color content, and image clarity all might contribute to a response.

Example: Revised Item

“Bat, Cat” “Bat, Doll”


New Item Development

• Expert contributions
• Online database for semantic set size (Nelson, McEvoy, & Schreiber, 1998)
• Age-of-acquisition word lists used in previous studies (Carroll & White, 1973; Garlock, 1997; Snodgrass & Yuditsky, 1996)
• Phonotactic probability online calculator (Storkel & Hoover, 2010)
• Concreteness, familiarity, and imageability ratings online database (Wilson, 1987)

Current item pools

• 5 measures, each with more than 150 items.
• Item difficulties appropriately match the distribution of student abilities.
• All items have been tested with more than 100 students, have item/total correlations between .2 and .8, and have infit statistics less than 2 in absolute value.
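The acceptance criteria above can be written as a filter over per-item statistics. The sketch makes several assumptions: the statistics have already been estimated by Rasch software, the infit criterion refers to a standardized infit value, and the bounds .2 and .8 are inclusive. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ItemStats:
    item_id: str
    n_tested: int        # number of students who took the item
    item_total_r: float  # item/total correlation
    infit_z: float       # standardized infit statistic (assumed precomputed)


def passes_screen(s: ItemStats) -> bool:
    """Apply the stated criteria: tested with more than 100 students,
    item/total r between .2 and .8, and |infit| below 2."""
    return (
        s.n_tested > 100
        and 0.2 <= s.item_total_r <= 0.8
        and abs(s.infit_z) < 2
    )


def screen(items: list[ItemStats]) -> list[str]:
    """Return the IDs of items that meet every criterion."""
    return [s.item_id for s in items if passes_screen(s)]
```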

Assessment Purposes

• Screening/Identification
  - To identify, with increased certainty, children requiring Tier 2 or Tier 3 services in one or more domains.
• Progress Monitoring
  - To assess whether individual children are growing in the targeted skill area, specifically and generally.
  - To determine whether individual children continue to require high-intensity intervention, and when it is appropriate to transition children to different levels of intensity (tiers).


Identification

• Tri-annual (seasonal) assessments
• Criterion-referenced assessment, with the cut score between Tier 1 and Tier 2/3 located using a contrasting-groups design
  - The Decision Making Framework will add predictive power so that Tier 2 and Tier 3 can be differentiated.
• Criterion performance is based on a national sample of more than 2,000 children.
• Performance is reported as pass/fail (go/no-go).

(Figures: Identification measures for First Sounds, Rhyming, Picture Naming, Which One Doesn’t Belong, and Sound Identification.)

Progress Monitoring

• Performance-based assessment that examines each child’s ability level based on a Rasch profile.
• Assessments are delivered every 3 weeks to examine changes in ability score as a result of intervention or instruction.
• Growth is examined in the context of previous performance, and also in reference to the criterion standard for tier-level performance.

Designing Progress Monitoring IGDIs
• Sensitive to change
• Opportunity for growth within tiers
• Reliable and valid
• Tailored to each child’s unique needs

(Figures: Progress monitoring measures for First Sounds, Rhyming, Picture Naming, Which One Doesn’t Belong, and Sound Identification.)

A Multiple Gating Model of Decision Making

IGDI 2.0 Decision Making Framework

CRTIEC’s Decision Making Framework

• Basic principles
• Rationale for multiple gating
• Current framework
• Evidence to date
• Coming research


DMF: Principles

• Principle #1: According to the Standards for Educational and Psychological Testing (AERA, APA, NCME, 1999), instructional decisions should never be made with only one source of data.
• It is important to have multiple sources of data to support instructional decision making.

DMF: Principles & Purpose

• A child’s raw score on each IGDI measure is interpretable in terms of its relation to an identified cut score (range) that distinguishes between Tier 1 candidates and Tier 2/3 candidates.

Establishing Cut Scores

• Cut scores (ranges) were established through a standard-setting process.
  - These standards consisted of operational definitions of child performance that would be typical of students with needs at each of the respective tier levels, for each domain.
• Teachers were given these tier-level descriptors and ranked students as good candidates for Tier 1, Tier 2, or Tier 3.


Setting the cut scores and ranges

• A combination of Rasch output, ROC analysis, regression analysis, and contrasting-groups design methods was used to identify the Rasch value that best distinguished between Tier 1 and Tier 2/3 ability.
• Cut scores maximized fit between IGDI scores and teacher judgment about ability level.
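The ROC part of this process can be sketched as a search over candidate cut scores against teacher judgment. A score below the cut flags a child as a Tier 2/3 candidate, and each cut is then evaluated by its sensitivity and specificity. The source does not say which ROC criterion was used. Youden's J (sensitivity + specificity − 1) is one common choice and is used here as an assumption. The function names are hypothetical.

```python
def sens_spec(scores, at_risk, cut):
    """Sensitivity and specificity when a score below `cut` flags Tier 2/3.

    at_risk -- criterion per child (e.g., teacher judgment), True = Tier 2/3.
    Both criterion groups must be non-empty.
    """
    pairs = list(zip(scores, at_risk))
    tp = sum(1 for s, r in pairs if s < cut and r)
    fn = sum(1 for s, r in pairs if s >= cut and r)
    tn = sum(1 for s, r in pairs if s >= cut and not r)
    fp = sum(1 for s, r in pairs if s < cut and not r)
    return tp / (tp + fn), tn / (tn + fp)


def best_cut(scores, at_risk):
    """Return (cut, J) for the observed score that maximizes Youden's J."""
    best = None
    for cut in sorted(set(scores)):
        sens, spec = sens_spec(scores, at_risk, cut)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cut, j)
    return best
```

Sensitivity and specificity, as reported for each measure in the table that follows, are the two quantities `sens_spec` returns for a fixed cut.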

(Figure: Picture Naming.)

How do IGDIs alone function in identification of tier candidacy?

• Sensitivity and specificity of IGDIs alone with provisional cut scores, using teacher judgment as the criterion

Measure                      Sensitivity   Specificity
Sound Identification         .75           .87
Alliteration                 .85           .77
Picture Naming               .76           .67
Rhyming                      .71           .70
Which One Doesn’t Belong?    .70           .46


Cut Scores and need for DMF

• Currently, we have not identified IGDI cut scores/ranges that distinguish between Tier 2 and Tier 3 candidates.
• A Decision Making Framework (DMF) is needed to support interpretation of IGDI scores and increase their use in instructional decision making.

Multiple Gating Model: Rationale

• CRTIEC has adopted a multiple gating model of decision making.
  - Successive “narrowing of the playing field”
  - Maximizes efficient use of resources
• The model uses teacher judgments gathered with a questionnaire at Gates B and C.
  - Recent studies have found that teacher ratings act as significant predictors of at-risk status (Speece & Ritchey, 2005; Speece et al., 2010).

Multiple Gating Model

Gate A: Fall IGDI Identification Set administration
• Result: score above cut on IGDI. Action: Tier 1 instruction.
• Result: score within cut range on IGDI. Action: move to Gate B.
• Result: score below cut range on IGDI. Action: move to Gate C.

Gate B: Teacher fills out Gate B “Disconfirming T1” questionnaire
• Result: no problems indicated. Action: Tier 1 instruction.
• Result: problems indicated. Action: move to Gate C.

Gate C: Teacher fills out Gate C “T2 vs T3” questionnaire
• Result: no problems indicated. Action: Tier 2 instruction.
• Result: problems indicated. Action: Tier 3 instruction.

Gate D: Teacher fills out Gate D questionnaire (regarding behavioral concerns)
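Gates A through C can be sketched as a function that places one child. The sketch rests on these assumptions:

- The cut range is a numeric interval [cut_low, cut_high], and a score equal to either bound counts as within the range.
- Each questionnaire outcome is reduced to a "problems indicated" boolean.
- Problems indicated at Gate C lead to Tier 3.

Gate D (behavioral concerns) is left out because the model does not say how it changes placement. All names are hypothetical.

```python
from enum import Enum


class Placement(Enum):
    TIER_1 = "Tier 1 instruction"
    TIER_2 = "Tier 2 instruction"
    TIER_3 = "Tier 3 instruction"


def gate_decision(igdi_score, cut_low, cut_high,
                  gate_b_problems=None, gate_c_problems=None):
    """Walk one child through Gates A-C of the multiple gating model.

    gate_b_problems -- "Disconfirming T1" questionnaire; True = problems indicated
    gate_c_problems -- "T2 vs T3" questionnaire; True = problems indicated
    Raises ValueError when a needed questionnaire result is missing.
    """
    # Gate A: compare the IGDI score with the cut range.
    if igdi_score > cut_high:
        return Placement.TIER_1
    if igdi_score >= cut_low:
        # Within the cut range: Gate B.
        if gate_b_problems is None:
            raise ValueError("Gate B questionnaire required")
        if not gate_b_problems:
            return Placement.TIER_1
    # Below the cut range, or problems indicated at Gate B: Gate C.
    if gate_c_problems is None:
        raise ValueError("Gate C questionnaire required")
    return Placement.TIER_3 if gate_c_problems else Placement.TIER_2
```

Raising an error for a missing questionnaire means the teacher questionnaire is requested only for children who actually reach Gate B or C. This matches the "narrowing of the playing field" rationale.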


(Figure: Oral Language/Comprehension Teacher Questionnaire.)

Evidence to date:

• Last year, a study was conducted in 30 classrooms across KS, OH, and MN in which Identification IGDIs were administered and corresponding teacher questionnaire data were collected to support use of the DMF (n = 303).

What proportion of children are identified for T2/T3…
• …with IGDIs alone?
• …with teacher ratings added into the DMF?

Measure (percent of children)   Tier 1   Move to Gate B (IGDI in cut range)   Move to Gate C (IGDI < cut range)
Picture Naming                  9        43.8                                 47.3
Rhyming                         41.4     53.9                                 4.6
Alliteration                    32.2     44.8                                 23.1

Percent of total sample identified as:

Tier   Percent
1      52%
2      24%
3      24%


Promising Evidence

• Moderate correlations between scores on the teacher questionnaires and standardized tests:
  - Oral Language/Comprehension Questionnaire and PPVT-IV, r = .418
  - Phonological Awareness/Early Literacy Questionnaire and TOPEL-PA, r = .333
• Significant mean difference on the PPVT (t = 3.75**) when comparing the performance of DMF-identified Tier 2 and Tier 3 candidates.

Revision of Teacher Questionnaire

• Using results from last year’s study, we have revised both teacher questionnaires:
  - To more closely align scales with the decisions needed at each gate of the DMF
    - Used the CRTIEC panel of experts to support this process
  - To increase the number of items (to increase scale/score reliability)

OL/ Comprehension Pilot Study

• We piloted the revised Oral Language/Comprehension Teacher Questionnaire with 40 teachers in the metro Twin Cities area (n = 83).
  - Purpose: item and scale analysis
• After examining coefficient alpha, inter-item correlations, and item-total correlations for each scale, we made modifications at the item level, resulting in the following internal consistency estimates:
  - Gate B = .977
  - Gate C = .961
  - Gate D = .963
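Coefficient alpha, the internal consistency estimate reported for each gate's scale, compares the sum of the item variances with the variance of the total score. A minimal sketch follows, assuming a complete matrix of questionnaire ratings (rows are rated children, columns are items). The function name is illustrative.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Coefficient alpha for a respondents x items score matrix:
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total)).
    Population variances are used throughout; the ratio is the same
    with sample variances. Requires at least two items and a non-constant total."""
    k = len(rows[0])
    item_vars = [pvariance([r[j] for r in rows]) for j in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Perfectly parallel items give an alpha of 1.0. Adding items that correlate with the rest tends to raise alpha, which is consistent with the goal of increasing the number of items.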


Pilot Study

• Results of the OL/Comprehension study also supported identification of cut scores for each scale, to support tier classification.
• The PA/Early Literacy questionnaire is being subjected to the same pilot test.
  - Item-level data entry has just been finished; analysis is to be completed soon.

Current Efforts

• Decision Making Validation Study
  - 5 school districts in the metro Twin Cities, KS, OR, and OH
  - OL/Comprehension measures in fall
  - PA/Early Literacy measures in winter
• Identification IGDIs administered, teacher questionnaires completed, tier assignments given.
  - Standardized criterion test given to all children identified for Tier 2 or Tier 3, plus a random sample of children identified for Tier 1.

Current Efforts: RQs

• For each domain (Oral Language/Comprehension or Phonological Awareness/Early Literacy), what is the relation between score on the teacher questionnaire and score on the standardized criterion measure?
• For each domain, what is the classification accuracy of the DMF when the standardized measure is used as the criterion of need?
• For each domain, does the mean standardized criterion test score differ significantly across tier assignment groups (Tier 1, Tier 2, Tier 3)?
• For each domain, which variables or combination of variables capture the most variance in predicting language and literacy status?
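The classification-accuracy question can be approached by cross-tabulating DMF tier assignments against tiers derived from the standardized criterion. The sketch assumes those criterion tiers have already been formed with some hypothetical cut scores, which the slides do not specify. It also does not correct for the design: only a random sample of Tier 1 children takes the criterion test, so estimates from that subset may need weighting. The function names are hypothetical.

```python
from collections import Counter


def confusion(dmf_tiers, criterion_tiers):
    """Count (DMF tier, criterion tier) pairs; missing pairs count as 0."""
    return Counter(zip(dmf_tiers, criterion_tiers))


def overall_accuracy(dmf_tiers, criterion_tiers):
    """Proportion of children whose DMF tier matches the criterion-based tier."""
    agree = sum(1 for d, c in zip(dmf_tiers, criterion_tiers) if d == c)
    return agree / len(dmf_tiers)


def tier_hit_rate(dmf_tiers, criterion_tiers, tier):
    """Among children the criterion places in `tier`, the share the DMF
    also places there (requires at least one such child)."""
    pairs = [(d, c) for d, c in zip(dmf_tiers, criterion_tiers) if c == tier]
    return sum(1 for d, c in pairs if d == c) / len(pairs)
```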
