CRTIEC Summit 2012
TRANSCRIPT
9/20/12
1
IGDIs and Beyond: Measurement and Decision-Making for Language and Literacy RTI Efforts in Early Childhood Education
Scott McConnell, Alisha Wackerle-Hollman and Tracy Bradfield
Disclosure
• Scott McConnell and colleagues developed Individual Growth and Development Indicators; intellectual property from this research has been licensed to Early Learning Labs, Inc., for commercial development and sale. Scott and the University of Minnesota have royalty and equity interest in Early Learning Labs, Inc. These relationships have been reviewed and managed by the University of Minnesota in accordance with its conflict of interest policies.
Today’s Session
• “Measuring Up” – the logic of General Outcome Measurement and contemporary models
• IGDIs 2.0 – Building Items and Scales
• IGDIs 2.0 and RTI – A Decision-Making Framework
Logic of General Outcome Measurement and Contemporary Models of RTI, Measurement
Measuring Up
(Figure: measurement toward a general outcome over time, with the qualities: standardized and sound; sensitive to time and treatment; repeatable; fast and easy.)
GOM Metrics
• Status
  – What is the level of performance for a child (or group) today?
  – How does this child (or group) compare to an a priori standard?
• Change
  – Has performance changed since the last assessment?
• Growth
  – What is the rate of change?
  – Is the child “on track” toward a long-term desired goal?
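The three GOM metrics can be made concrete with a little arithmetic. The following is a minimal Python sketch, not the authors' scoring procedure: it assumes all scores sit on one common scale, treats status as the distance from a hypothetical a priori standard, change as the difference between the last two assessments, and growth as an ordinary least-squares slope over time. The function names and the numbers in the example are illustrative only.

```python
from statistics import mean
from typing import Sequence


def status(score: float, standard: float) -> float:
    """Status: today's score relative to an a priori standard (positive = above it)."""
    return score - standard


def change(scores: Sequence[float]) -> float:
    """Change: difference between the two most recent assessments."""
    return scores[-1] - scores[-2]


def growth_rate(times: Sequence[float], scores: Sequence[float]) -> float:
    """Growth: ordinary least-squares slope of score on time (score units per time unit)."""
    t_bar, y_bar = mean(times), mean(scores)
    num = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, scores))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


def on_track(times: Sequence[float], scores: Sequence[float],
             goal: float, goal_time: float) -> bool:
    """'On track': does the fitted growth line reach the goal by goal_time?"""
    slope = growth_rate(times, scores)
    intercept = mean(scores) - slope * mean(times)
    return intercept + slope * goal_time >= goal


# Hypothetical child assessed at weeks 0, 3, 6, and 9.
weeks = [0, 3, 6, 9]
scores = [10, 12, 15, 17]
print(status(scores[-1], standard=20))                 # -3 (below the standard today)
print(change(scores))                                  # 2
print(growth_rate(weeks, scores))                      # 0.8 (points per week)
print(on_track(weeks, scores, goal=30, goal_time=30))  # True
```

A least-squares slope uses every data point rather than only the first and last, which makes the growth estimate less sensitive to a single unusual session.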
Application to Early Literacy
• Four CRTIEC domains
  – Oral Language
  – Phonological Awareness
  – Alphabet Knowledge and Concepts about Print
  – Comprehension
• Assumption of concurrent or heterotypic development
(Figure: Sound Units, Print Units, Contextual Units (Narrative), Semantic Units (Concepts), and Language Units, arranged as Outside-In Skills and Inside-Out Skills.)
GOM of Early Literacy
• Assessment of interrelated domains
• Assessment of domains over time
• Assessment for different functions
  – Description of language and early literacy development (“status”)
  – Evaluation of timeliness of development (“growth”)
  – Identification of need for additional intervention
  – Monitoring effects of intervention
Item Response Theory
• Assumes an “ability” whose characteristics are invariant across individuals and time
• Assumes that items and individuals can be located on this ability
  – Thus, items and individuals vary along the ability: an implicit “absolute” scale
• Assumes that item and test statistics are invariant across samples
• Provides a precise way to build and use tests
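The slides rely on Rasch output throughout (item fit, item standard errors, a child's “Rasch profile”). To show concretely how items and children sit on one ability scale, here is a minimal Python sketch of the standard dichotomous Rasch model, in which the probability of a correct response depends only on the difference between a child's ability θ and an item's difficulty b. The ability estimator is a plain Newton-Raphson maximum-likelihood routine chosen for illustration; it is not necessarily what the IGDI calibration software uses, and the item difficulties are hypothetical.

```python
import math
from typing import Sequence


def rasch_probability(theta: float, b: float) -> float:
    """P(correct) = exp(theta - b) / (1 + exp(theta - b)) under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta: float, b: float) -> float:
    """Fisher information of one Rasch item at ability theta: p * (1 - p), largest when theta == b."""
    p = rasch_probability(theta, b)
    return p * (1.0 - p)


def estimate_ability(responses: Sequence[int], difficulties: Sequence[float],
                     iterations: int = 50) -> float:
    """Maximum-likelihood ability estimate by Newton-Raphson.

    responses: 1 = correct, 0 = incorrect, aligned with difficulties.
    The MLE does not exist for all-correct or all-incorrect patterns.
    """
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        raise ValueError("MLE undefined for zero or perfect raw scores")
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_probability(theta, b) for b in difficulties]
        gradient = raw - sum(probs)                    # observed minus expected score
        information = sum(p * (1 - p) for p in probs)  # test information at theta
        theta += gradient / information
    return theta


# Hypothetical items at -1, 0, and +1 logits; the child answers the two easier ones.
theta_hat = estimate_ability([1, 1, 0], [-1.0, 0.0, 1.0])
print(round(theta_hat, 2))
```

Because the child and the items share one logit scale, the estimate can be read directly against item difficulties: this child is located above the middle item and below the hardest one.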
IRT and GOMs
• Core, underlying metaphors are similar, and key assumptions are compatible
  – IRT: a trajectory that is invariant across individuals and populations, with items and individuals located on it
  – GOM: growth toward a long-term, common outcome, with variation at the individual (and nested) level in status and rate of growth
  – Both efforts locate an individual at a single point in time and, through repeated measures, on a trajectory (and estimate rate of growth)
Advantages/Assets from IRT
• Increased precision in item and scale construction
  – More analytic tools, and more analytic colleagues
  – Item-level analyses for reliability and the item information function
  – Greater facility for adding and evaluating items and constructing scales
• Expanding item pools
• Increasing knowledge of methodological and logistical requirements for design, testing, refinement, and implementation
Can We Go from an IRT-Based Scale to General Outcome Measurement?
Progress on an IRT Scale
(Figure: a single ability axis from low to high.)
Progress on a GOM Scale
(Figure: ability, from low to high, plotted against time.)
What’s needed to “make the move”?
• Construct validation – relation to long-term outcome(s) of interest
• Item pool
• Growth scaling – change as a function of time
• Evaluation
  – Judgments
  – Norms
  – Benchmarks
  – Empirical analyses
Constructing Measures for Specific Purposes
IGDI 2.0 Items and Scales
Measurement Framework
• Wilson, 2005
Early Literacy Construct
Defining the Construct of Early Literacy
Phonological Awareness Construct Definition: The ability to detect and manipulate the sound structure of words independent of their meanings (Phillips, Clancy-Menchetti, & Lonigan, 2008), which develops along a continuum of complexity from identification to synthesis to analysis.
Measures at the identification level: Rhyming and First Sounds
Oral Language Construct Definition: The ability to use words to communicate ideas and thoughts and to use language as a tool to communicate with others (Dunst, Trivette, Masiello, Roper, & Robyak, 2008; Morgan & Meier, 2008).
• Expressive language: the use of words to express meaning.
• Receptive language: the ability to listen to, process, and understand the meaning of spoken words.
Measure of expressive language: Picture Naming
Alphabet Knowledge Construct Definition: Knowledge about the names and sounds of the 26 letters of the alphabet (McBride-Chang, 1999).
Measure of aural identification: Sound Identification
Comprehension Construct Definition:
• Text comprehension: the ability to understand and interpret text as a whole (Storch & Whitehurst, 2002); it includes the “recognition of pictures and symbols in books and the ability to interpret and infer meaning from what is seen” (Dunst, Trivette, Masiello, Roper, & Robyak, 2006, p. 4).
• Listening comprehension: the ability to understand and interpret spoken phonemes, words, phrases, sentences, narratives, and stories (Dickinson & Smith, 1994; Skarakis-Doyle, Dempsey, & Lee, 2008).
Measure of comprehension: Which One Doesn’t Belong
Item-Level Revisions
• Cleaning items
• Item-level functions
  – Rasch output values
    · How is each item contributing to the test?
    · Item/total correlations
    · Infit statistics
    · Standard error of the item
  – Construct-irrelevant features (CIF)
    · What characteristics of each item provide information to the student?
    · What information distracts the student from the intended content?
    · What features of the items are malleable?
Example: Poorly Functioning Item – Def Vocab
Construct-irrelevant features:
• Not a real elephant
• Elephants are big, but this one is actually small.
Example: Revision of Item
Example: Poorly Functioning Rhyming Item
Construct-irrelevant features:
• Some are real images; others are not.
• Some are enclosed; others are not.
• Word content, color content, and image clarity all might contribute to a response.
Example: Revised Item
“Bat, Cat” “Bat, Doll”
New Item Development
• Expert contributions
• Online database for semantic set size (Nelson, McEvoy, & Schreiber, 1998)
• Age-of-acquisition word lists used in previous studies (Carroll & White, 1973; Garlock, 1997; Snodgrass & Yuditsky, 1996)
• Phonotactic probability online calculator (Storkel & Hoover, 2010)
• Concreteness, familiarity, and imageability ratings online database (Wilson, 1987)
Current item pools
• 5 measures with over 150 items per measure.
• Item difficulties appropriately match the distribution of students.
• All items have been tested with over 100 students and have item/total correlations between .2 and .8 and infit statistics less than 2 in absolute value.
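Once item statistics are available, the item-pool criteria above can be applied mechanically. Here is a minimal Python sketch under these assumptions: responses form a 0/1 matrix (rows = children, columns = items), the “item/total correlation” is taken to be the corrected item-total correlation (item removed from the total), and infit values are standardized (t-like) statistics supplied by whatever Rasch software is in use. `keep_item` and the other function names are hypothetical; the thresholds restate the slide's criteria.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN when either variable has zero variance."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return float("nan")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def item_total_correlations(responses: Sequence[Sequence[int]]) -> List[float]:
    """Corrected item-total correlation for each item (the item is excluded from the total)."""
    n_items = len(responses[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        result.append(pearson(item, rest))
    return result


def keep_item(item_total_r: float, infit_t: float, n_tested: int) -> bool:
    """Slide criteria: tested with over 100 students, .2 <= r <= .8, |infit| < 2.

    A NaN correlation fails the range check, so items everyone passes
    (or everyone fails) are dropped.
    """
    return n_tested > 100 and 0.2 <= item_total_r <= 0.8 and abs(infit_t) < 2


# Hypothetical screening of one item with r = .45, standardized infit = 1.2, n = 150.
print(keep_item(0.45, 1.2, 150))  # True
```

Removing the item from the total avoids inflating the correlation with the item's own variance, which matters most for short scales.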
Assessment Purposes
• Screening/Identification
  – To identify, with increased certainty, children requiring Tier 2 or Tier 3 services in one or more domains.
• Progress Monitoring
  – To assess whether individual children are growing in the targeted skill area, specifically and generally.
  – To determine whether individual children continue to require high-intensity intervention, and when it is appropriate to transition them to different levels of intensity (tiers).
Identification
• Triannual/seasonal assessments
• Criterion-referenced assessment, based on a contrasting-groups design, that locates the cut score between Tier 1 and Tier 2/3
  – The Decision Making Framework will add predictive power so that Tier 2 and Tier 3 can be differentiated.
• Criterion performance is based on a national sample of over 2,000 children.
• Performance is reported as pass/fail (go/no-go).
(Figures, Identification: First Sounds, Rhyming, Picture Naming, Which One Doesn’t Belong, Sound Identification.)
Progress Monitoring
• Performance-based assessment that examines each child’s ability level based on a Rasch profile.
• Assessments are delivered every 3 weeks to examine changes in ability score as a result of intervention or instruction.
• Growth is examined relative to the child’s previous performance and also relative to the criterion standard for Tier-level performance.
Designing Progress Monitoring IGDIs
• Sensitive to change
• Opportunity for growth within tiers
• Reliable and valid
• Tailored to each child’s unique needs
(Figures, Progress Monitoring: First Sounds, Rhyming, Picture Naming, Which One Doesn’t Belong, Sound Identification.)
A Multiple Gating Model of Decision Making
A Multiple Gating Model of Decision Making
IGDI 2.0 Decision Making Framework
CRTIEC’s Decision Making Framework
• Basic principles
• Rationale for multiple gating
• Current framework
• Evidence to date
• Coming research
DMF: Principles
• Principle #1: According to the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999), instructional decisions should never be made on the basis of a single source of data.
• Multiple sources of data are needed to support instructional decision making.
DMF: Principles & Purpose
• A child’s raw score on each IGDI measure is interpretable in terms of its relation to an identified cut score (range) that distinguishes Tier 1 candidates from Tier 2/3 candidates.
Establishing Cut Scores
• Cut scores (ranges) were established through a standard-setting process.
  – These standards consisted of operational definitions of child performance that would be typical of students with needs at each tier level, for each domain.
• Teachers were given these tier-level descriptors and ranked students as good candidates for Tier 1, Tier 2, or Tier 3.
Setting the Cut Scores and Ranges
• A combination of Rasch output, ROC analysis, regression analysis, and contrasting-groups design methods was used to identify the Rasch value that best distinguished between Tier 1 and Tier 2/3 ability.
• Cut scores maximized the fit between IGDI scores and teacher judgment about ability level.
(Figure: Picture Naming.)
How do IGDIs alone function in identification of tier candidacy?
• Sensitivity and specificity of IGDIs alone with provisional cut scores, using teacher judgment as the criterion

Measure                       Sensitivity   Specificity
Sound Identification          .75           .87
Alliteration                  .85           .77
Picture Naming                .76           .67
Rhyming                       .71           .70
Which One Doesn’t Belong?     .70           .46
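With teacher judgment as the criterion, sensitivity is the share of teacher-identified Tier 2/3 candidates whom the IGDI flags, and specificity is the share of teacher-identified Tier 1 candidates whom it does not flag. The Python sketch below computes both and runs a simple ROC-style sweep that picks the cut maximizing Youden's J (sensitivity + specificity - 1). It assumes a child is flagged when the score falls below the cut. Youden's J is one common ROC criterion and is used here only for illustration, since the cuts described above combined Rasch output, ROC, regression, and contrasting-groups methods. Function names, scores, and judgments are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(scores: Sequence[float], teacher_t2_t3: Sequence[bool],
                            cut: float) -> Tuple[float, float]:
    """Compare 'score below cut' flags with teacher judgment (True = Tier 2/3 candidate).

    Assumes the sample contains both teacher-identified groups (else division by zero).
    """
    pairs = list(zip(scores, teacher_t2_t3))
    tp = sum(1 for s, t in pairs if s < cut and t)        # flagged, teacher agrees
    fn = sum(1 for s, t in pairs if s >= cut and t)       # missed
    tn = sum(1 for s, t in pairs if s >= cut and not t)   # correctly not flagged
    fp = sum(1 for s, t in pairs if s < cut and not t)    # flagged, teacher disagrees
    return tp / (tp + fn), tn / (tn + fp)


def best_cut(scores: Sequence[float], teacher_t2_t3: Sequence[bool]) -> float:
    """Try each observed score as a cut; keep the first with the largest Youden's J."""
    candidates = sorted(set(scores))
    return max(candidates,
               key=lambda c: sum(sensitivity_specificity(scores, teacher_t2_t3, c)) - 1)


# Hypothetical IGDI scores with teacher judgments (True = Tier 2/3 candidate).
scores = [2, 3, 4, 5, 8, 9, 10, 11]
teacher = [True, True, True, False, False, False, False, False]
cut = best_cut(scores, teacher)
print(cut, sensitivity_specificity(scores, teacher, cut))  # 5 (1.0, 1.0)
```

Real data rarely separate this cleanly; raising the cut trades specificity for sensitivity, which is the trade-off each row of the table reflects.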
Cut Scores and need for DMF
• We have not yet identified IGDI cut scores/ranges that distinguish Tier 2 from Tier 3 candidates.
• A Decision Making Framework (DMF) is needed to support interpretation of IGDI scores and to increase their use in instructional decision making.
Multiple Gating Model: Rationale
• CRTIEC has adopted a multiple gating model of decision making.
  – Successive “narrowing of the playing field”
  – Maximizes efficient use of resources
• The model uses teacher judgments, gathered with a questionnaire, at Gates B and C.
  – Recent studies have found that teacher ratings are significant predictors of at-risk status (Speece & Ritchey, 2005; Speece et al., 2010).
Multiple Gating Model
• Gate A: Fall IGDI Identification Set administration
  – Score above cut on IGDI: Tier 1 instruction
  – Score within cut range on IGDI: move to Gate B
  – Score below cut range on IGDI: move to Gate C
• Gate B: Teacher fills out Gate B “Disconfirming T1” questionnaire
  – No problems indicated: Tier 1 instruction
  – Problems indicated: move to Gate C
• Gate C: Teacher fills out Gate C “T2 vs T3” questionnaire
  – Result determines Tier 2 or Tier 3 instruction
• Gate D: Teacher fills out Gate D questionnaire (regarding behavioral concerns)
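The gate sequence amounts to a short decision procedure. Here is a minimal Python sketch with several assumptions stated up front: the cut range is represented by hypothetical `cut_low`/`cut_high` values (boundary handling is a guess), each questionnaire is reduced to a single “problems indicated” yes/no, a Gate C “problems indicated” result is read as Tier 3 and “no problems” as Tier 2 (the flattened flowchart does not make this pairing unambiguous), and Gate D (behavioral concerns) is not modeled. `assign_tier` is an illustrative name, not part of any IGDI software.

```python
from typing import Callable


def assign_tier(igdi_score: float, cut_low: float, cut_high: float,
                gate_b_problems: Callable[[], bool],
                gate_c_problems: Callable[[], bool]) -> int:
    """Return 1, 2, or 3 following Gates A-C of the multiple gating model (sketch).

    Questionnaires are passed as callables so a teacher's rating is only
    requested for children who actually reach that gate.
    """
    # Gate A: fall IGDI Identification score against the (hypothetical) cut range.
    if igdi_score > cut_high:
        return 1                      # above cut: Tier 1 instruction
    if igdi_score >= cut_low:         # within cut range: go to Gate B
        # Gate B: "Disconfirming T1" questionnaire.
        if not gate_b_problems():
            return 1                  # no problems indicated: Tier 1 instruction
    # Below the cut range, or Gate B indicated problems: go to Gate C.
    # Gate C: "T2 vs T3" questionnaire (ASSUMED mapping: problems -> Tier 3).
    return 3 if gate_c_problems() else 2


# Hypothetical cut range 30-40: a child scoring 35 whose teacher reports problems
# at Gate B but not at Gate C is assigned Tier 2.
print(assign_tier(35, 30, 40, lambda: True, lambda: False))  # 2
```

Passing the questionnaires as functions mirrors the model's “narrowing of the playing field”: children above the cut never trigger a questionnaire, and children below the cut range skip Gate B entirely.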
(Figure: Oral Language/Comprehension Teacher Questionnaire.)
Evidence to date:
• Last year, a study was conducted in 30 classrooms across KS, OH, and MN in which Identification IGDIs were administered and corresponding teacher questionnaire data were collected to support use of the DMF (n = 303).
What proportion of children are identified for T2/T3…
• …with IGDIs alone?
• …with teacher ratings added into the DMF?

Measure          Tier One (%)   Move to Gate 2, IGDI in cut range (%)   Move to Gate 3, IGDI < cut range (%)
Picture Naming   9              43.8                                     47.3
Rhyming          41.4           53.9                                     4.6
Alliteration     32.2           44.8                                     23.1

Percent of Total Sample Identified As
Tier   Percent
1      52%
2      24%
3      24%
Promising Evidence
• Moderate correlations between Teacher Questionnaire scores and standardized tests:
  – Oral Language/Comprehension Questionnaire and PPVT-IV, r = .418
  – Phonological Awareness/Early Literacy Questionnaire and TOPEL-PA, r = .333
• Significant mean difference on the PPVT (t = 3.75**) when comparing DMF-identified Tier 2 and Tier 3 candidates.
Revision of Teacher Questionnaire
• Using results from last year’s study, we have revised both teacher questionnaires:
  – To align scales more closely with the decisions needed at each gate of the DMF
    · A CRTIEC panel of experts supported this process.
  – To increase the number of items (to increase scale/score reliability)
OL/ Comprehension Pilot Study
• We piloted the revised Oral Language/Comprehension Teacher Questionnaire with 40 teachers in the Twin Cities metro area (n = 83).
  – Purpose: item and scale analysis
• After examining coefficient alpha, inter-item correlations, and item-total correlations for each scale, we made item-level modifications, resulting in the following internal consistency estimates:
  – Gate B = .977
  – Gate C = .961
  – Gate D = .963
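Coefficient alpha, the internal consistency estimate reported for each gate's scale, compares the sum of the item variances with the variance of the total score: alpha = k / (k - 1) * (1 - sum of item variances / total-score variance), for k items. Here is a minimal Python sketch, assuming a complete rating matrix (rows = children, columns = items on one scale) with no missing responses; the function name and example ratings are hypothetical.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(ratings: Sequence[Sequence[float]]) -> float:
    """alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)).

    Population variances are used throughout; using sample variances
    everywhere gives the same alpha because the n / (n - 1) factors cancel.
    """
    k = len(ratings[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in ratings]) for j in range(k)]
    total_var = pvariance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 4-point ratings from five children on a three-item scale.
ratings = [[4, 4, 3], [3, 3, 3], [2, 2, 1], [1, 2, 1], [4, 3, 4]]
print(round(cronbach_alpha(ratings), 3))
```

Alpha rises when items covary strongly relative to their individual variances, which is why adding well-aligned items (one goal of the questionnaire revision) tends to raise it.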
Pilot Study
• Results of the OL/Comprehension study also supported identification of cut scores for each scale, to support tier classification.
• The PA/Early Literacy questionnaire is undergoing the same pilot test.
  – Item-level data entry has just been completed; analysis will be completed soon.
Current Efforts
• Decision Making Validation Study
  – 5 school districts in the Twin Cities metro area, KS, OR, and OH
  – OL/Comp measures in fall
  – PA/Early Literacy measures in winter
• Identification IGDIs administered, teacher questionnaire completed, tier assignments given
  – Standardized criterion test given to all Tier 2 and Tier 3 identified children plus a random sample of Tier 1 identified children
Current Efforts: RQs
• For each domain (Oral Language/Comprehension or Phonological Awareness/Early Literacy), what is the relation between scores on the teacher questionnaire and scores on the standardized criterion measure?
• For each domain, what is the classification accuracy of the DMF when the standardized measure is used as the criterion of need?
• For each domain, does the mean standardized criterion test score differ significantly across tier assignment groups (Tier 1, Tier 2, Tier 3)?
• For each domain, which variables or combinations of variables capture the most variance in predicting language and literacy status?