Investigating the Statistical and Cognitive Dimensions in Large-Scale Science Assessments
CESC-SSHRC Symposium 2005
Jacqueline P. Leighton
Acknowledgments
Canadian Education Statistics Council (CESC)
Social Sciences and Humanities Research Council (SSHRC)
Ms. Rebecca J. Gokiert, Ms. Ying Cui
CRAME colleagues
Overview
Rationale
Materials: SAIP Science 99
Methods & Results: Phase 1
Methods & Results: Phase 2
Implications for Policy
Rationale
To identify the dimensional structure of the School Achievement Indicators Program (SAIP) Science Assessment
To find support (or not) for the view that science performance is associated with multiple and distinct thinking skills
Materials—SAIP Science 99
A dichotomously scored two-stage test
Administered to students in both Grade 8 and Grade 11 (13- and 16-year-olds)
6 content domains
5 ability levels
Materials—SAIP Science 99
[Diagram: all students take Routing Test A and are then directed to either Test B or Test C, giving the combined forms Test AB and Test AC]
Method—Phase 1: Exploratory
Dimensionality test or DIMTEST (Stout et al., 2001) is a nonparametric procedure used to test the null hypothesis that a set of test data is unidimensional
Methods—Phase 1: Exploratory
Exploratory factor analysis (EFA) of the tetrachoric correlations was conducted, using five recommended decision rules
The factors retained were rotated using orthogonal rotation procedures (i.e., quartimax, varimax) and an oblique transformation procedure (i.e., direct oblimin)
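As a rough illustration of the input to the EFA, the sketch below approximates a tetrachoric correlation between two dichotomously scored items. It uses the classical cos-pi approximation rather than a maximum-likelihood estimate, and it is not taken from the presentation: the software and estimator actually used are not stated, and the function names `cross_tab` and `tetrachoric_cos_pi` are hypothetical.

```python
import math


def cross_tab(x, y):
    """Count the four cells of the 2x2 table for two 0/1 response vectors.

    Returns (a, b, c, d): a = both correct, b = only the first item correct,
    c = only the second item correct, d = both incorrect.
    """
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi == 1 and yi == 1:
            a += 1
        elif xi == 1 and yi == 0:
            b += 1
        elif xi == 0 and yi == 1:
            c += 1
        else:
            d += 1
    return a, b, c, d


def tetrachoric_cos_pi(a, b, c, d):
    """Cos-pi approximation: r = cos(pi / (1 + sqrt(a*d / (b*c)))).

    This is an approximation only (an assumption of this sketch), not the
    maximum-likelihood tetrachoric estimate.
    """
    if b * c == 0:
        if a * d == 0:
            raise ValueError("degenerate table: correlation is undefined")
        return 1.0   # no discordant pairs in at least one direction
    if a * d == 0:
        return -1.0  # no concordant pairs in at least one direction
    odds_ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1.0 + math.sqrt(odds_ratio)))
```

A correlation matrix for the EFA would be assembled by applying such a function to every pair of items. In practice a dedicated estimator would be preferable to this approximation.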
Results—Phase 1: DIMTEST
Test section: Section AB
Group                   Sample size for FAC   Sample size for T   T        P
13-year-old, group 1    1000                  2054                4.8891   0.0000
13-year-old, group 2    1000                  2054                5.4745   0.0000
16-year-old, group 1    800                   1200                3.5009   0.0002
16-year-old, group 2    800                   1200                3.5425   0.0002

Test section: Section AC
Group                   Sample size for FAC   Sample size for T   T        P
13-year-old, group 1    800                   2103                4.1602   0.0000
13-year-old, group 2    800                   2103                4.3294   0.0000
16-year-old, group 1    1000                  3010                4.2448   0.0000
16-year-old, group 2    1000                  3010                5.9118   0.0000
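The P values in the DIMTEST results can be checked against the T statistics with a short sketch. It assumes that P is the one-sided upper-tail probability of T under a standard normal distribution. The slides do not state this, but it is consistent with the reported values (T of about 3.5 paired with P = 0.0002). The function name `dimtest_p_value` is hypothetical.

```python
import math


def dimtest_p_value(t_stat):
    """Upper-tail probability of a standard normal variate at t_stat.

    Assumption of this sketch: the null hypothesis of unidimensionality
    is rejected for large positive T.
    """
    return 0.5 * math.erfc(t_stat / math.sqrt(2.0))


# T statistics copied from the Section AB results.
section_ab = {
    "13-year-old, group 1": 4.8891,
    "13-year-old, group 2": 5.4745,
    "16-year-old, group 1": 3.5009,
    "16-year-old, group 2": 3.5425,
}

for group, t_stat in section_ab.items():
    print(f"{group}: T = {t_stat:.4f}, P = {dimtest_p_value(t_stat):.4f}")
```

Under that assumption every Section AB and Section AC sample rejects unidimensionality at conventional significance levels.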
Results—Phase 1: EFA
Decision rules indicated two factors
Oblique results were interpreted because the factors shared low to moderate correlations (range of .014 to .384)
Method—Phase 2: Confirmatory
A common shortcoming of EFA is the sparse description of the factors found to underlie the data (Haig, 2005)
For each item with a loading equal to or greater than 0.3, the following information was recorded: the first five to ten words of the test question, the specific factor on which the item loaded, the content standard or objective, and the ability level of the item
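The recording step can be sketched as a small record type plus a filter on the 0.3 cutoff. The class `ItemRecord`, the function `salient_items`, and all of the example values are hypothetical and purely illustrative. None of them come from the SAIP data.

```python
from dataclasses import dataclass


@dataclass
class ItemRecord:
    stem_start: str      # first five to ten words of the test question
    factor: int          # factor on which the item loaded
    content_domain: str  # content standard or objective
    ability_level: int   # ability level of the item
    loading: float       # factor loading from the EFA


def salient_items(records, cutoff=0.3):
    """Keep only items whose loading is equal to or greater than the cutoff."""
    return [r for r in records if r.loading >= cutoff]


# Made-up records, used only to show the filter at work.
example_records = [
    ItemRecord("Why does the balloon expand when", 1, "domain A", 2, 0.45),
    ItemRecord("Which of the following is a", 2, "domain B", 1, 0.30),
    ItemRecord("What is the name of the", 2, "domain C", 3, 0.12),
]

for record in salient_items(example_records):
    print(record.factor, record.loading, record.stem_start)
```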
Methods—Phase 2: Confirmatory
Preliminary analyses of the AB and AC tests suggested that the two factors tapped student reasoning about causes and effects and student reasoning about category membership
Methods—Phase 2: Confirmatory
Recently published review article (2004) by Deanna Kuhn and David Dean Jr.
In the ongoing process of managing and reducing the complexity of information from the external environment, individuals typically make use of two forms of inference: causal and non-causal
Methods—Phase 2: Confirmatory
SAIP items were reviewed and coded according to whether they contained primarily causal or categorical-type key words
We used key introductory words such as “why,” “how,” “cause/effect,” “what,” “which,” or “identify” to code items as either primarily causal or primarily categorical
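A keyword-based coding rule of this kind could be sketched as follows. Three things here are assumptions rather than details from the slides: the mapping of "why," "how," and "cause/effect" to causal and of "what," "which," and "identify" to categorical; the rule that the first cue word found decides the code; and the ten-word window. The example stems are invented.

```python
import re

# Assumed mapping of cue words to codes (not stated explicitly in the slides).
CAUSAL_CUES = {"why", "how", "cause", "effect"}
CATEGORICAL_CUES = {"what", "which", "identify"}


def code_item(stem, n_words=10):
    """Code an item stem by the first cue word among its opening words.

    Only exact word matches count, so "causes" or "effects" would not
    match in this simple sketch.
    """
    words = re.findall(r"[a-z]+", stem.lower())[:n_words]
    for word in words:
        if word in CAUSAL_CUES:
            return "causal"
        if word in CATEGORICAL_CUES:
            return "categorical"
    return "uncoded"


# Invented stems for illustration only.
print(code_item("Why does ice float on water?"))         # causal
print(code_item("Which of the following is a mammal?"))  # categorical
print(code_item("Calculate the mass of the sample."))    # uncoded
```

Stems that mix cue types (for example, one opening with "what" but asking about a cause) would need human judgment. A rule like this can only support a manual review, not replace it.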
Methods—Phase 2: Confirmatory
Influence of item format on students’ interpretation of the item as requiring causal versus categorical reasoning
SAIP items were also coded according to item format
Format might function as a proxy for invoking either causal or categorical reasoning
Methods—Phase 2: Confirmatory
Linear factor analysis with LISREL to estimate the parameters for the 2-dimensional models associated with the Causal-Categorical Model (CCM) and the Item Format Model (IFM)
Linear factor analysis to estimate the parameters for a 6-dimensional model using item coding associated with the Test Specifications Model (TSM)
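In a confirmatory model the item coding determines which loadings are estimated and which are fixed to zero. The sketch below builds such a loading pattern from a list of item codes. It assumes each item loads on exactly one factor, which the slides do not state. It is not LISREL syntax, and the function name `pattern_matrix` and the example codes are hypothetical.

```python
def pattern_matrix(item_codes, factors):
    """Return one row per item: 1 = loading estimated, 0 = loading fixed to zero.

    Assumption of this sketch: every item is coded to exactly one factor.
    """
    column = {name: j for j, name in enumerate(factors)}
    matrix = []
    for code in item_codes:
        row = [0] * len(factors)
        row[column[code]] = 1
        matrix.append(row)
    return matrix


# Invented codes for four items under a two-factor model such as the CCM.
ccm_factors = ["causal", "categorical"]
ccm_codes = ["causal", "categorical", "categorical", "causal"]

for row in pattern_matrix(ccm_codes, ccm_factors):
    print(row)
```

The same function would cover a six-factor specification such as the TSM by passing six factor names and the corresponding content-domain codes.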
Results—Phase 2: Confirmatory
Using recommended fit indices (Gierl & Rogers, 1996), none of the models fit the AB test data adequately
For the AC data, the IFM provided a consistently better fit than the CCM and TSM
Policy Implications
Multidimensional latent structure of the SAIP Science Assessment
Distinct forms of thinking in science
Sub-scores might be a better form of score reporting for SAIP and similar large-scale assessments
Policy Implications
Superiority of the Item Format Model in confirmatory factor analysis
Item format may function to elicit distinct forms of reasoning in science: causal and categorical
Policy Implications
Use of SAIP sub-scores to measure and gauge improvements in specific forms of reasoning in students
Test design and feedback that is focused on cognitive skills as well as content