Statistical Considerations for Educational Screening & Diagnostic Assessments
A discussion of methodological applications which have existed in the literature for a long time and are used in other disciplines but are emerging more now in education. Yaacov Petscher, Ph.D.
![Page 1: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/1.jpg)
YAACOV PETSCHER, PH.D.
FLORIDA CENTER FOR READING RESEARCH
FLORIDA STATE UNIVERSITY
Statistical Considerations for Educational Screening & Diagnostic Assessments
A discussion of methodological applications which have existed in the literature for a long time and are used in other disciplines but are emerging more now in education
![Page 2: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/2.jpg)
Discussion Points
Assessment Assumptions
Contexts of Assessments
Statistical Considerations: Reliability, Validity, Benchmarking
“Disclaimer”
Focusing on breadth, not depth
Based on applied contract and grant research
One slide of equations
![Page 3: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/3.jpg)
Assumptions of Assessment - Researchers
Constructs exist but we can’t see them
Constructs can be measured
Although we can measure constructs, our measurement is not perfect
There are different ways to measure any given construct
All assessment procedures have strengths and limitations
![Page 4: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/4.jpg)
Assumptions of Assessment - Practitioner
Multiple sources of information should be part of the assessment process
Performance on tests can be generalized to non-test behaviors
Assessment can provide information that helps educators make better educational decisions
Assessment can be conducted in a fair manner
Testing and assessment can benefit our educational institutions and society as a whole
![Page 5: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/5.jpg)
Contexts of Assessments
Instructional: Formative, Interim, Summative
Research: Individual Differences, Group Differences (RCT), Growth
Legislative Initiatives: NCLB, Reading First, Race to the Top, Common Core
![Page 6: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/6.jpg)
Common Core Adoption
![Page 7: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/7.jpg)
PARCC
![Page 8: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/8.jpg)
Smarter Balanced
![Page 9: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/9.jpg)
Within Common Core
USDOE: PARCC Assessments, Smarter Balanced Assessments, Reading for Understanding Assessments, I3 Assessments
Private Sector
![Page 10: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/10.jpg)
Underlying “Code” of Assumptions
Researcher
Constructs exist but we can’t see them
Constructs can be measured
Although we can measure constructs, our measurement is not perfect
There are different ways to measure any given construct
All assessment procedures have strengths and limitations
Practitioner
Multiple sources of information should be part of the assessment process
Performance on tests can be generalized to non-test behaviors.
Assessment can provide information that helps educators make better educational decisions
Assessment can be conducted in a fair manner.
Testing and assessment can benefit our educational institutions and society as a whole.
![Page 11: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/11.jpg)
Statistical Considerations - Reliability
Stability, accuracy, or consistency of test scores
Many types: internal consistency, retest, parallel-form, split-half
Should not be viewed as interchangeable
One could have very high stability but very poor internal consistency (e.g., Date of Birth/Height/SSN)
![Page 12: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/12.jpg)
Statistical Considerations - Reliability
Most frequently used framework is classical test theory
What does this assume?
X = T + e (observed score X, true score T, error e)
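Classical test theory treats an observed score X as a true score T plus error e, with the error assumed to average zero and to be uncorrelated with the true score. Under those assumptions reliability is the share of observed-score variance that is true-score variance. The sketch below simulates that decomposition; the sample size, means, and standard deviations are hypothetical values chosen only for illustration and do not come from the presentation.

```python
import random
import statistics


def simulate_ctt(n_examinees=20000, true_sd=2.0, error_sd=1.0, seed=0):
    """Simulate X = T + e and return the ratio var(T) / var(X)."""
    rng = random.Random(seed)
    # True scores and errors are drawn independently (a CTT assumption).
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n_examinees)]
    observed = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return statistics.pvariance(true_scores) / statistics.pvariance(observed)


reliability = simulate_ctt()
# Theoretical value for these hypothetical settings: 4 / (4 + 1) = 0.80
print(round(reliability, 3))
```

With these settings the estimate should land close to 0.80; larger error variance pushes it down.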
![Page 13: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/13.jpg)
Benefits of IRT
Puts persons and items on the same scale
  CTT looks at total score by p-value (difficulty)
Can result in shorter tests
  CTT reliability increases with more items
Can estimate the precision of scores at the individual level
  CTT assumes error is the same for all examinees
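The point that CTT reliability increases with more items is usually expressed with the Spearman-Brown prophecy formula. The presentation does not show this formula, so the following is only an illustrative sketch; the starting reliability of .70 is a hypothetical value.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by length_factor
    with comparable items (Spearman-Brown prophecy formula)."""
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


# Hypothetical test with reliability .70, doubled and tripled in length.
doubled = spearman_brown(0.70, 2)
tripled = spearman_brown(0.70, 3)
print(round(doubled, 3), round(tripled, 3))
```

The gain shrinks with each added block of items, which is one reason a shorter, well-targeted IRT-based test can be attractive.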
![Page 14: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/14.jpg)
Item Difficulty by Total Score Decile Groups
![Page 15: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/15.jpg)
Item Difficulty by Ability
![Page 16: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/16.jpg)
Items Don’t Always Do What We Want
![Page 17: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/17.jpg)
Item Information
![Page 18: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/18.jpg)
Test Information – Standard Error
![Page 19: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/19.jpg)
Precision/Reliability
![Page 20: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/20.jpg)
Statistical Considerations - Reliability
While precision improves on the idea of reliability, can precision be improved?
Account for context effects (Wainer et al., 2000): Petscher & Foorman, 2011
Account for time (Verhelst, Verstralen, & Jansen, 1997): Prindle, Petscher, & Mitchell, 2013
![Page 21: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/21.jpg)
Statistical Considerations - Reliability
Context effects
Any influence or interpretation that an item may acquire as a result of its relationship to other items
Greater problem in CAT due to unique testing
Emerges as an item- and passage-level problem
![Page 22: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/22.jpg)
Statistical Considerations - Reliability
Common stimulus
![Page 23: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/23.jpg)
Statistical Considerations - Reliability
“If several questions within a test are experimentally linked so that the reaction to one question influences the reaction to another, the entire group of questions should be treated preferably as an ‘item’ when the data arising from application of split-half or appropriate analysis-of-variance methods are reported in the test manual”
APA Standards of Educational and Psychological Testing (1966)
![Page 24: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/24.jpg)
Expressed in IRT
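Testlet (context-effect) model:

$$p(x_{ij} = 1 \mid \theta_j) = c_i + (1 - c_i)\,\frac{\exp[a_i(\theta_j - b_i - \gamma_{d(i)j})]}{1 + \exp[a_i(\theta_j - b_i - \gamma_{d(i)j})]}$$

Standard 3PL model:

$$p(x_{ij} = 1 \mid \theta_j) = c_i + (1 - c_i)\,\frac{\exp[a_i(\theta_j - b_i)]}{1 + \exp[a_i(\theta_j - b_i)]}$$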
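As a rough illustration of how a context effect enters the response function, the sketch below evaluates a three-parameter logistic item with an optional person-by-testlet term, following the common testlet formulation in which that term is subtracted alongside item difficulty. The parameter names (`theta`, `a`, `b`, `c`, `gamma`) and all numeric values are assumptions made for this example, not estimates from the studies described here.

```python
import math


def p_correct(theta, a, b, c, gamma=0.0):
    """Probability of a correct response under a 3PL model.

    gamma is a person-specific testlet (context) effect; gamma = 0
    gives the standard 3PL model.
    """
    z = a * (theta - b - gamma)
    # exp(z) / (1 + exp(z)) written in the equivalent form 1 / (1 + exp(-z))
    return c + (1.0 - c) / (1.0 + math.exp(-z))


# Hypothetical item: discrimination 1.2, difficulty 0.0, guessing 0.2
standard = p_correct(theta=0.0, a=1.2, b=0.0, c=0.2)
with_testlet = p_correct(theta=0.0, a=1.2, b=0.0, c=0.2, gamma=0.5)
print(round(standard, 3), round(with_testlet, 3))
```

A positive testlet effect behaves like extra difficulty for that person on that passage, so the same ability yields a lower success probability.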
![Page 25: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/25.jpg)
Study 1: Reading Comprehension in Florida
![Page 26: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/26.jpg)
Precision – After 3 passages
![Page 27: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/27.jpg)
FAIR Technical Manual
![Page 28: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/28.jpg)
Simulations are all well and good…
How does accounting for item dependency improve testing in the real world?
![Page 29: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/29.jpg)
RCT
N ≈ 800, randomly assigned to testing condition
Control was current 2pl scoring
Experimental was unrestricted bi-factor
Evaluate: precision, # of passages, prediction to state achievement
![Page 30: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/30.jpg)
![Page 31: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/31.jpg)
![Page 32: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/32.jpg)
What this suggests
“Newer” models help us to more appropriately model the data
Precision/reliability are improved just by modeling the context effect
Improve the efficiency and precision of a computer-adaptive test by modeling the item-dependency
![Page 33: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/33.jpg)
Study 2: Morphology CAT
![Page 34: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/34.jpg)
Accounting for Time
Somewhat similar to the item dependency model
IRT models are concerned with accuracy
What about fluency?
CBM (DIBELS, AIMSweb, easyCBM)
Brief assessments (TOWRE, TOSREC, etc.)
Prindle, Petscher, Mitchell (2013)
N = 200
Word knowledge test
Limited to 60 sec
Compared a 1pl model with a 1pl response-time (1pl-rt) model
![Page 35: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/35.jpg)
Results
1pl marginal α = .80
1pl-rt marginal α = .87
![Page 36: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/36.jpg)
What this suggests
Accounting for response time of items can improve precision for most participants
Limitations
More difficult to do with younger children
Requires computer delivery to record accuracy and time
Cannot do with connected text
![Page 37: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/37.jpg)
Validity
![Page 38: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/38.jpg)
Statistical Considerations – Factor Validity
Assessments are measures of hypothetical constructs
Assessments are measured with error
Use a latent variable to leverage the common variance
How is this modeled? Unidimensional, Multidimensional
Three illustrations
Petscher & Foorman, 2012 (Syntactic Awareness)
Kieffer & Petscher, 2013 (Morphology/Vocabulary)
Justice, Petscher, & Pentimonti, 2013 (Early Literacy)
![Page 39: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/39.jpg)
![Page 40: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/40.jpg)
Study 1: Syntactic Awareness
![Page 41: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/41.jpg)
![Page 42: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/42.jpg)
![Page 43: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/43.jpg)
![Page 44: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/44.jpg)
![Page 45: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/45.jpg)
![Page 46: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/46.jpg)
Distribution of Ability
![Page 47: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/47.jpg)
Precision (reliability) of Ability Scores
![Page 48: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/48.jpg)
Predictive Validity of Factor Scores
![Page 49: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/49.jpg)
Study 2: Morphological Awareness/Vocabulary
![Page 50: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/50.jpg)
Morphological Awareness (MA) predicts Reading Comprehension (RC)
For a while, we have known that MA is correlated with reading comprehension (e.g., Carlisle, 2000; Freyd & Baron, 1982; Tyler & Nagy, 1990)
MA RC
![Page 51: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/51.jpg)
MA predicts RC, above & beyond Vocabulary (V)
Unique contributions of MA to RC, controlling for vocabulary (e.g., Carlisle, 2000; Kieffer, Biancarosa, & Mancilla-Martinez, in press; Kieffer & Lesaux, 2008, 2012; Kieffer & Box, 2013; Nagy, Berninger, & Abbott, 2006)
MA RC
V
![Page 52: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/52.jpg)
But wait…
Are we actually measuring MA and vocabulary as separate dimensions of lexical knowledge?
Observed correlations between MA and vocabulary are attenuated by measurement error
Reliability of researcher-created MA measures has been moderate: in the .70-.80 range & occasionally lower
So, “unique” contributions of MA beyond V could be an artifact of measurement error
MA V
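The attenuation argument can be made concrete with Spearman's correction for attenuation, which divides an observed correlation by the square root of the product of the two reliabilities. This formula is a standard result rather than something shown on the slide, and the observed correlation of .60 and the vocabulary reliability of .90 below are hypothetical; only the .70 figure echoes the reliability range mentioned for MA measures.

```python
import math


def disattenuate(r_observed, reliability_x, reliability_y):
    """Estimate the correlation between true scores from an observed
    correlation and the reliabilities of the two measures."""
    return r_observed / math.sqrt(reliability_x * reliability_y)


# Hypothetical observed MA-vocabulary correlation of .60
r_corrected = disattenuate(0.60, reliability_x=0.70, reliability_y=0.90)
print(round(r_corrected, 3))
```

The corrected value is noticeably higher than the observed one, which is why two constructs can look more separable in observed scores than they are at the latent level.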
![Page 53: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/53.jpg)
But wait…
Using Confirmatory Factor Analysis (CFA), Muse (2005) found that MA could not be distinguished from vocabulary in fourth grade; instead, the two formed a unidimensional construct (see also Wagner, Muse, & Tannenbaum, 2007).
Spencer (2012) replicated this finding with eighth graders.
MA/V
![Page 54: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/54.jpg)
On the other hand…
Using CFA, Kieffer & Lesaux (2012) found that MA was measurably separable from two other dimensions of vocabulary, though strongly related for both native English & language minority learners in Grade 6
Neugebauer, Kieffer, & Howard (under review) replicated this finding for Spanish speaking language minority learners in Grades 6-8
MA V
![Page 55: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/55.jpg)
But
Is it possible that a multidimensional structure exists but is best captured by a general factor of lexical knowledge and specific factors of morphological awareness and vocabulary?
If the common variance is captured by a general factor as well as specific factors, does each predict distal outcomes?
![Page 56: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/56.jpg)
Modeling Dimensionality of Lexical Knowledge: Unidimensional
Fit poorly
Rejected across parametric & nonparametric EFA & CFA models
![Page 57: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/57.jpg)
Modeling Dimensionality of Lexical Knowledge: Two Dimensional
![Page 58: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/58.jpg)
Modeling Dimensionality of Lexical Knowledge: Bi-factor Model
CFI = .98; TLI = .98; RMSEA = .015
Bi-factor > 1D: Δχ² = 66.71, Δdf = 34, p < .001
Bi-factor > 2D: Δχ² = 48.94, Δdf = 33, p < .05
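The p-values attached to the two model comparisons can be checked by referring each Δχ² to a chi-square distribution with Δdf degrees of freedom. The sketch below does this with a small standard-library implementation of the chi-square upper tail for integer degrees of freedom. It treats the reported differences as ordinary chi-square differences, which is an assumption: if a robust or scaled estimator was used, a corrected difference test would be needed instead.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df,
    using the finite-sum forms for odd and even degrees of freedom."""
    if x <= 0:
        return 1.0
    chi = math.sqrt(x)
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)  # normal pdf at chi
    if df % 2 == 1:
        tail = math.erfc(chi / math.sqrt(2.0))  # two-sided normal tail
        term, total = chi, 0.0
        for r in range(1, (df - 1) // 2 + 1):
            if r > 1:
                term *= x / (2 * r - 1)
            total += term
        return tail + 2.0 * density * total
    term, total = 1.0, 1.0
    for r in range(1, (df - 2) // 2 + 1):
        term *= x / (2 * r)
        total += term
    return math.sqrt(2.0 * math.pi) * density * total


p_vs_1d = chi2_sf(66.71, 34)  # bi-factor vs. unidimensional
p_vs_2d = chi2_sf(48.94, 33)  # bi-factor vs. two-dimensional
print(p_vs_1d < 0.001, p_vs_2d < 0.05)
```

Both comparisons agree with the thresholds reported on the slide under this simple reading.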
![Page 59: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/59.jpg)
Statistical Considerations – Factor Validity
![Page 60: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/60.jpg)
Statistical Considerations – Factor Validity
![Page 61: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/61.jpg)
Statistical Considerations – Factor Validity
![Page 62: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/62.jpg)
Statistical Considerations - SEM
![Page 63: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/63.jpg)
Study 3: Early Literacy Skills
![Page 64: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/64.jpg)
SPOT
Measure developed by Jackie Van Lankveld
Embedded assessment when students read a story
Used primarily with students identified with LI
Measures alphabet knowledge, phonological awareness, print knowledge
Present study: N ≈ 300
In this LI sample, how are the item responses best represented?
![Page 65: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/65.jpg)
Statistical Considerations – Factor Validity
![Page 66: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/66.jpg)
Statistical Considerations – Factor Validity
| Model | χ² | df | CFI | TLI | RMSEA |
|---|---|---|---|---|---|
| Unidimensional | 223.42 | 104 | 0.88 | 0.86 | .064 (.052, .075) |
| Multidimensional - 3 factor | 146.77 | 101 | 0.96 | 0.95 | .040 (.024, .053) |
| Multidimensional - Bi-factor | 107.65 | 89 | 0.98 | 0.98 | .027 (.000, .044) |
![Page 67: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/67.jpg)
![Page 68: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/68.jpg)
Statistical Considerations – Factor Validity
[Figure: path diagram with nodes SS, Core, EV, WS, EL, Phon., Alphabet. Coefficients shown: -.30***, -.35***, -.46***, -.53***, .01, -.11, -.08, -.15, .71***, .69***, .62***, .82***, .07, .14, -.07, -.03]
![Page 69: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/69.jpg)
Implications
Research: Multidimensional General
Good for individual differences
Limited in applicability, for now (Piasta, Petscher, Anthony, in preparation)
Practice: Multidimensional Correlated Traits
Good for ease of use in the classroom
Limited in specificity

| Model | χ² | df | CFI | TLI | RMSEA |
|---|---|---|---|---|---|
| Unidimensional | 223.42 | 104 | 0.88 | 0.86 | .064 (.052, .075) |
| Multidimensional - 3 factor | 146.77 | 101 | 0.96 | 0.95 | .040 (.024, .053) |
| Multidimensional - Bi-factor | 107.65 | 89 | 0.98 | 0.98 | .027 (.000, .044) |
![Page 70: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/70.jpg)
Benchmarking
![Page 71: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/71.jpg)
Statistical Considerations – Benchmarks
Students with poor reading skills have difficulty in closing achievement gaps
Accurate identification is necessary to remediate difficulties
Many assessments include guidelines for cut-points
![Page 72: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/72.jpg)
Sample Risk Levels Chart
![Page 73: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/73.jpg)
How to Validate – Current Theory
Variety of Methods
Best Guess: +/- 1 SD, Percentile Ranks
Simple Stat: Bivariate Correlations, Interrater Reliability
More Advanced: Logistic Regression, Discriminant Function Analysis, Achievement-IQ Discrepancies
![Page 74: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/74.jpg)
Typical “Diagnostic/Screening” Q’s
What is the relationship (WITR) between blood characteristics and being HIV positive?
What is the relationship between electromagnetic signals and correctly distinguishing them from noise?
What is the relationship between students’ scores on the Scholastic Reading Inventory and future risk on the SAT-10?
![Page 75: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/75.jpg)
What is our question?
Correlational? Bivariate Correlation, Interrater Reliability
Discrimination? Logistic Regression, Discriminant Function, Receiver Operating Characteristic (ROC) Curves
![Page 76: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/76.jpg)
ROC
Graphical representation of operating points
Multiple indices of efficiency
Moving cut-points
Outperforms other techniques in diagnostic efficiency (Hintze, 2005)
![Page 77: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/77.jpg)
Advantages of using ROC
It defines the quality of a test or prediction using a measurement without specifying a cut off value for decision making
Greater flexibility in diagnostic accuracy and predictive power
Assuming Normal distribution:
The mean and Standard Error can be estimated
The 95% CI can be estimated
Statistical significance can be determined
Whether one test is better than another can be determined
![Page 78: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/78.jpg)
Old School Discrimination
Form Two Groups
Given the Test
4 Outcomes
People who have the attribute were detected (true positive)
People who have the attribute were not detected (false negative)
People who don’t have the attribute were detected (false positive)
People who don’t have the attribute were not detected (true negative)
Using the Results
![Page 79: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/79.jpg)
What is a ROC Curve?
[Figure: ROC curve, Sensitivity (y-axis) vs. 1-Specificity (x-axis), both axes from 0 to 1]
![Page 80: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/80.jpg)
What is a ROC Curve?
[Figure: ROC curve, Sensitivity (y-axis) vs. 1-Specificity (x-axis), both axes from 0 to 1]
Based on Cumulative Frequency %
![Page 81: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/81.jpg)
Data Scheme
| SRI Lexile Score | SAT-10 (<40th %ile), Y-axis | SAT-10 (>=40th %ile), X-axis |
|---|---|---|
| 505 | 35 (.35) | 5 (.05) |
| 520 | 30 (.65) | 10 (.15) |
| 550 | 20 (.85) | 20 (.35) |
| 600 | 10 (.95) | 30 (.65) |
| 700 | 5 (1.00) | 35 (1.00) |
| TOTALS | N=100 | N=100 |
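The values in parentheses are cumulative proportions within each SAT-10 group, and each pair gives one operating point on the ROC curve (x = 1-Specificity, y = Sensitivity). The sketch below recomputes them from the counts in the data scheme. The trapezoidal area under the curve at the end is an addition for illustration and is not reported in the presentation; the variable names are likewise only illustrative.

```python
cut_scores = [505, 520, 550, 600, 700]
below_40th = [35, 30, 20, 10, 5]        # SAT-10 < 40th percentile, per score band
at_or_above_40th = [5, 10, 20, 30, 35]  # SAT-10 >= 40th percentile, per score band


def cumulative_proportions(counts):
    """Running total of counts divided by the group size."""
    total = sum(counts)
    running, proportions = 0, []
    for count in counts:
        running += count
        proportions.append(running / total)
    return proportions


sensitivity = cumulative_proportions(below_40th)               # y-axis
false_positive_rate = cumulative_proportions(at_or_above_40th)  # x-axis

# Operating points, starting from the origin of the ROC plot.
points = [(0.0, 0.0)] + list(zip(false_positive_rate, sensitivity))

# Trapezoidal area under the piecewise-linear curve (illustrative addition).
auc = sum((x1 - x0) * (y0 + y1) / 2.0
          for (x0, y0), (x1, y1) in zip(points, points[1:]))

for cut, (x, y) in zip(cut_scores, points[1:]):
    print(cut, round(y, 2), round(x, 2))
```

Moving the cut score up the Lexile scale trades more detected at-risk students for more false alarms, which is the movement along the curve that the following slides trace.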
![Page 82: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/82.jpg)
What is a ROC Curve?
[Figure: ROC curve, Sensitivity (y-axis) vs. 1-Specificity (x-axis), with the cut-score data labeled on the curve]
505 35 (.35) 5 (.05)
520 30 (.65) 10 (.15)
550 20 (.85) 20 (.35)
600 10 (.95) 30 (.65)
700 5 (1.00) 35 (1.00)
![Page 83: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/83.jpg)
Confusion Matrix
| SRI Score | SAT-10 <40th %ile | SAT-10 >=40th %ile |
|---|---|---|
| At-Risk | A | B |
| Not At-Risk | C | D |
![Page 84: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/84.jpg)
Confusion Matrix
$$SE = \frac{A}{A + C} = \frac{TP}{TP + FN}$$

| SRI Score | SAT-10 <40th %ile | SAT-10 >=40th %ile |
|---|---|---|
| At-Risk | A | B |
| Not At-Risk | C | D |
![Page 85: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/85.jpg)
Confusion Matrix
$$SP = \frac{D}{B + D} = \frac{TN}{TN + FP}$$

| SRI Score | SAT-10 <40th %ile | SAT-10 >=40th %ile |
|---|---|---|
| At-Risk | A | B |
| Not At-Risk | C | D |
![Page 86: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/86.jpg)
Confusion Matrix
$$PPP = \frac{A}{A + B} = \frac{TP}{TP + FP}$$

| SRI Score | SAT-10 <40th %ile | SAT-10 >=40th %ile |
|---|---|---|
| At-Risk | A | B |
| Not At-Risk | C | D |
![Page 87: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/87.jpg)
Confusion Matrix
$$NPP = \frac{D}{C + D} = \frac{TN}{TN + FN}$$

| SRI Score | SAT-10 <40th %ile | SAT-10 >=40th %ile |
|---|---|---|
| At-Risk | A | B |
| Not At-Risk | C | D |
![Page 88: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/88.jpg)
Confusion Matrix
$$OCC = \frac{A + D}{A + B + C + D} = \frac{TP + TN}{TP + FP + FN + TN}$$

| SRI Score | SAT-10 <40th %ile | SAT-10 >=40th %ile |
|---|---|---|
| At-Risk | A | B |
| Not At-Risk | C | D |
![Page 89: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/89.jpg)
Confusion Matrix
$$BR = \frac{A + C}{A + B + C + D} = \frac{TP + FN}{TP + FP + FN + TN}$$

| SRI Score | SAT-10 <40th %ile | SAT-10 >=40th %ile |
|---|---|---|
| At-Risk | A | B |
| Not At-Risk | C | D |
![Page 90: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/90.jpg)
Confusion Matrix
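$$SE = \frac{A}{A + C} \qquad SP = \frac{D}{B + D} \qquad PPP = \frac{A}{A + B} \qquad NPP = \frac{D}{C + D} \qquad OCC = \frac{A + D}{A + B + C + D}$$

| SRI Score | SAT-10 <40th %ile | SAT-10 >=40th %ile |
|---|---|---|
| At-Risk | A | B |
| Not At-Risk | C | D |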
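The indices defined on the confusion-matrix slides can be collected in one small function. In the sketch below, A is the count flagged at-risk by the screener and below the 40th percentile on the outcome, B is flagged but at or above the 40th percentile, C is not flagged but below it, and D is not flagged and at or above it. The function name and dictionary keys are illustrative; the example counts use the 505 cut score from the data scheme (35 and 5 flagged out of 100 in each group).

```python
def classification_indices(a, b, c, d):
    """Screening accuracy indices from the four cells of a 2x2 table."""
    n = a + b + c + d
    return {
        "SE": a / (a + c),    # sensitivity
        "SP": d / (b + d),    # specificity
        "PPP": a / (a + b),   # positive predictive power
        "NPP": d / (c + d),   # negative predictive power
        "OCC": (a + d) / n,   # overall correct classification
        "BR": (a + c) / n,    # base rate of the at-risk outcome
    }


indices = classification_indices(a=35, b=5, c=65, d=95)
for name, value in indices.items():
    print(name, round(value, 2))
```

Sensitivity and specificity depend only on the column totals, whereas the predictive powers also shift with the base rate, so the same cut score can look quite different in a sample with fewer at-risk students.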
![Page 91: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/91.jpg)
Data Scheme

| SRI Lexile Score | SAT-10 (<40th %ile), Y-axis | SAT-10 (>=40th %ile), X-axis |
|---|---|---|
| 505 | 35 (.35) | 5 (.05) |
| 520 | 30 (.65) | 10 (.15) |
| 550 | 20 (.85) | 20 (.35) |
| 600 | 10 (.95) | 30 (.65) |
| 700 | 5 (1.00) | 35 (1.00) |
| TOTALS | N=100 | N=100 |
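The values in parentheses are cumulative proportions, and they serve as the ROC coordinates for each cut score. A minimal sketch of that calculation follows, assuming that each row counts the students in one Lexile band and that students scoring at or below the cut are flagged as at risk; the variable and function names are illustrative.

```python
from itertools import accumulate

# Counts from the Data Scheme table
cut_scores = [505, 520, 550, 600, 700]
below_40th = [35, 30, 20, 10, 5]          # SAT-10 < 40th %ile (has the problem)
at_or_above_40th = [5, 10, 20, 30, 35]    # SAT-10 >= 40th %ile (no problem)


def cumulative_proportions(counts):
    """Running share of the group scoring at or below each cut score."""
    total = sum(counts)
    return [running / total for running in accumulate(counts)]


# Y-axis: share of students with the problem who are flagged (sensitivity)
sensitivity = cumulative_proportions(below_40th)
# X-axis: share of students without the problem who are flagged (1 - specificity)
false_positive_rate = cumulative_proportions(at_or_above_40th)

roc_points = list(zip(cut_scores, false_positive_rate, sensitivity))
for cut, fpr, se in roc_points:
    print(f"Lexile {cut}: 1-Specificity = {fpr:.2f}, Sensitivity = {se:.2f}")
```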
![Page 92: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/92.jpg)
[Figure: ROC plot of Sensitivity (y-axis) against 1-Specificity (x-axis), both scaled 0 to 1, with one point per Lexile cut score: 505 (.05, .35), 520 (.15, .65), 550 (.35, .85), 600 (.65, .95), 700 (1.00, 1.00)]
![Page 93: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/93.jpg)
[Figure: ROC plot of Sensitivity (y-axis) against 1-Specificity (x-axis), highlighting the Lexile = 505 cut score: 35 (.35), 5 (.05)]
![Page 94: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/94.jpg)
Classification – Example 1
Evaluation of Cut Scores
Lexile = 505: 35 (.35), 5 (.05)

| SRI | SAT-10 At-Risk | SAT-10 Not At-Risk | Total |
|---|---|---|---|
| At-Risk | 35 | 5 | 40 |
| Not At-Risk | 65 | 95 | 160 |
| Total | 100 | 100 | 200 |

SE = .35, SP = .95, FN = .65, FP = .05
PPP = .88, NPP = .59, OCC = .65
![Page 95: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/95.jpg)
[Figure: ROC plot of Sensitivity (y-axis) against 1-Specificity (x-axis), highlighting the Lexile = 520 cut score: 30 (.65), 10 (.15)]
![Page 96: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/96.jpg)
Classification – Example 2
Evaluation of Cut Scores
Lexile = 520: 30 (.65), 10 (.15)

| SRI | SAT-10 At-Risk | SAT-10 Not At-Risk | Total |
|---|---|---|---|
| At-Risk | 65 | 15 | 80 |
| Not At-Risk | 35 | 85 | 120 |
| Total | 100 | 100 | 200 |

SE = .65, SP = .85, FN = .35, FP = .15
PPP = .81, NPP = .71, OCC = .75
![Page 97: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/97.jpg)
Cut-Point Selection

Lexile = 520

| SRI | SAT-10 At-Risk | SAT-10 Not At-Risk | Total |
|---|---|---|---|
| At-Risk | 65 | 15 | 80 |
| Not At-Risk | 35 | 85 | 120 |
| Total | 100 | 100 | 200 |

SE = .65, SP = .85, FN = .35, FP = .15
PPP = .81, NPP = .71, OCC = .75

Lexile = 505

| SRI | SAT-10 At-Risk | SAT-10 Not At-Risk | Total |
|---|---|---|---|
| At-Risk | 35 | 5 | 40 |
| Not At-Risk | 65 | 95 | 160 |
| Total | 100 | 100 | 200 |

SE = .35, SP = .95, FN = .65, FP = .05
PPP = .88, NPP = .59, OCC = .65

Choose Lexile = 520, right?
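One way to probe that question: overall correct classification is a base-rate-weighted average of sensitivity and specificity, OCC = BR × SE + (1 − BR) × SP, which follows from the cell definitions. The sketch below applies that identity to the two cut scores. The 50% base rate matches this sample (100 of 200 students); the 15% base rate is a hypothetical alternative used only to show how the comparison can change, and the function name is illustrative.

```python
def overall_correct(se, sp, base_rate):
    """OCC as a base-rate-weighted average of sensitivity and specificity."""
    return base_rate * se + (1 - base_rate) * sp


# (SE, SP) for each Lexile cut score, taken from the slide
cuts = {520: (0.65, 0.85), 505: (0.35, 0.95)}

# 0.50 is the base rate in this sample; 0.15 is a hypothetical alternative
for base_rate in (0.50, 0.15):
    for cut, (se, sp) in cuts.items():
        occ = overall_correct(se, sp, base_rate)
        print(f"Base rate {base_rate:.2f}, Lexile {cut}: OCC = {occ:.2f}")
```

Under these assumptions, Lexile = 520 has the higher OCC at a 50% base rate, while Lexile = 505 has the higher OCC at a 15% base rate, so the preferred cut depends on which index is being maximized and in which population.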
![Page 98: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/98.jpg)
What are you Maximizing?
Properties of the Test (Population): Sensitivity, Specificity
Properties of the Sample: Positive Predictive Power, Negative Predictive Power
It’s All About the Base Rate!
![Page 99: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/99.jpg)
Base Rates are Variables Too!!
Screening Test: RSN test of Fan Fanaticism (SE = .95, SP = .90)
Administered to Two Samples:
Sample 1 – 2,000 people in Boston, where 50% have the problem
Sample 2 – 2,000 people in New York, where 15% have the problem
![Page 100: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/100.jpg)
Boston: Base Rate of 50%
| RSN | Criterion: Jail | Criterion: No Jail | Total |
|---|---|---|---|
| At-Risk | 950 | 100 | 1,050 |
| Not At-Risk | 50 | 900 | 950 |
| Total | 1,000 | 1,000 | 2,000 |

Sensitivity = 950/1,000 = .95
Specificity = 900/1,000 = .90
PPP = 950/1,050 = .90
NPP = 900/950 = .95
Overall Correct Classification = (950 + 900)/2,000 = .925
![Page 101: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/101.jpg)
NYC: Base Rate of 15%
| RSN | Criterion: Jail | Criterion: No Jail | Total |
|---|---|---|---|
| At-Risk | 285 | 170 | 455 |
| Not At-Risk | 15 | 1,530 | 1,545 |
| Total | 300 | 1,700 | 2,000 |

Sensitivity = 285/300 = .95
Specificity = 1,530/1,700 = .90
PPP = 285/455 = .63
NPP = 1,530/1,545 = .99
Overall Correct Classification = (285 + 1,530)/2,000 = .91
![Page 102: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/102.jpg)
Comparison

| Indices | Boston | NYC |
|---|---|---|
| SE | 0.95 | 0.95 |
| SP | 0.90 | 0.90 |
| PPP | 0.90 | 0.63 |
| NPP | 0.95 | 0.99 |
| OCC | 0.93 | 0.91 |
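The dependence of predictive power on the base rate can be sketched with Bayes' rule, holding the test's sensitivity and specificity fixed. This is an illustrative calculation rather than the presenter's own code; the function name `predictive_power` is hypothetical, and the inputs (SE = .95, SP = .90, base rates of 50% and 15%) come from the Boston and New York samples.

```python
def predictive_power(se, sp, base_rate):
    """PPP and NPP implied by sensitivity, specificity, and base rate."""
    # Joint proportions of the four confusion-matrix cells
    true_pos = se * base_rate
    false_neg = (1 - se) * base_rate
    false_pos = (1 - sp) * (1 - base_rate)
    true_neg = sp * (1 - base_rate)

    ppp = true_pos / (true_pos + false_pos)
    npp = true_neg / (true_neg + false_neg)
    return ppp, npp


for city, base_rate in [("Boston", 0.50), ("NYC", 0.15)]:
    ppp, npp = predictive_power(se=0.95, sp=0.90, base_rate=base_rate)
    print(f"{city}: PPP = {ppp:.3f}, NPP = {npp:.3f}")
```

For Boston this gives PPP = 950/1,050 ≈ .905 and NPP ≈ .947; for New York, PPP ≈ .626 and NPP ≈ .990. The same test yields very different positive predictive power in the two samples.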
![Page 104: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/104.jpg)
Applied to All 1st Graders in Florida: Base Rate 15%

| Screen | Criterion: Problem | Criterion: No Problem | Total |
|---|---|---|---|
| At-Risk | 26,754 | 15,958 | 42,712 |
| Not At-Risk | 1,408 | 143,626 | 145,034 |
| Total | 28,162 | 159,584 | 187,746 |

Sensitivity = 26,754/28,162 = .95
Specificity = 143,626/159,584 = .90
PPP = 26,754/42,712 = .63
NPP = 143,626/145,034 = .99
Overall Correct Classification = (26,754 + 143,626)/187,746 = .91
![Page 105: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/105.jpg)
Statewide Screening
If we had employed a test with those measurement properties statewide to detect children who were at risk for reading problems, we would have mislabeled around 16,000 kids as at-risk who weren’t (37% of those flagged).
However, we would have missed only about 1,400 students who needed services (1% of those classified as not at-risk).
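The two percentages follow from the statewide table as the complements of PPP and NPP, using the cell counts reported there:

$$\frac{FP}{TP + FP} = \frac{15{,}958}{42{,}712} \approx .37 = 1 - PPP, \qquad \frac{FN}{FN + TN} = \frac{1{,}408}{145{,}034} \approx .01 = 1 - NPP$$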
![Page 106: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/106.jpg)
Concluding Thoughts - Reliability
Researchers:
Evaluating other methods of reliability: precision, generalizability
Practitioners:
What is being reported? Internal consistency, test-retest, etc.
How reliable is it? Nunnally/Bernstein: >.80 for research, >.90 for clinical use
![Page 107: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/107.jpg)
Concluding Thoughts – Factor Validity
Researchers:
Testing additional specifications outside of the traditional 1/multi framework: bi-factor, causal indicator, etc.
Practitioners:
What type of factor analysis was done? EFA/CFA
Rules of thumb? Too many 200?
![Page 108: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/108.jpg)
Concluding Thoughts - Benchmarking
Researchers:
Improve the rigor of our methods: ROC, diagnostic measurement, cost curves
Practitioners:
Identify what “at-risk” means
Establish the goal of the screening process
Study how the screen was developed
Determine the base rate
Attend to the +/- predictive power
Collect local data
![Page 109: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/109.jpg)
Implications of these Considerations
We must be careful in how we choose assessments: AYP, value-added modeling, promotion/retention
Moving toward a new phase in assessments: computer-delivered, computer-adaptive (Smarter Balanced, FCRR, RFU)
Be more aware of what other disciplines are doing
Be more aware of what’s in older literature
Technology!
![Page 110: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/110.jpg)
Great Resources
IRT: The Theory and Practice of Item Response Theory (De Ayala); Fundamentals of Item Response Theory (Hambleton et al.)
Factor Analysis: CFA for Applied Research (Brown)
SEM: Beginner’s Guide to SEM (Schumacker & Lomax)
ROC analysis: Analyzing ROC Curves with SAS (Gonen)
![Page 111: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/111.jpg)
Resources
Shameless Plug
IRT: R.J. De Ayala
Factor Analysis: Rex Kline
SEM: Richard Lomax
Benchmarking: Chris Schatschneider
![Page 112: Statistical Considerations for Educational Screening & Diagnostic Assessments](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813e78550346895da89be4/html5/thumbnails/112.jpg)
End