![Page 1: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/1.jpg)
The Art and Science of Test Development—Part F
Psychometric/technical statistical analysis: Internal
The basic structure and content of this presentation are grounded extensively in the test development procedures developed by Dr. Richard Woodcock
Kevin S. McGrew, PhD.
Educational Psychologist
Research Director, Woodcock-Muñoz Foundation
![Page 2: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/2.jpg)
Part A: Planning, development frameworks & domain/test specification blueprints
Part B: Test and Item Development
Part C: Use of Rasch Technology
Part D: Develop norm (standardization) plan
Part E: Calculate norms and derived scores
Part F: Psychometric/technical and statistical analysis: Internal
Part G: Psychometric/technical and statistical analysis: External
The Art and Science of Test Development
The above titled topic is presented in a series of sequential PowerPoint modules. It is strongly recommended that the modules (A-G) be viewed in sequence.
The current module is designated by red bold font lettering
![Page 3: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/3.jpg)
“In God we trust... all others must show data” (unknown source)
Test authors and publishers have a standards-based responsibility to provide supporting psychometric/technical information regarding tests and batteries
Typically in the form of a series of technical chapters in the manual or a separate technical manual
![Page 4: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/4.jpg)
Calculate psychometric/measurement statistics for technical manual/chapters
Use Joint Test Standards as a guide
![Page 5: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/5.jpg)
[Figure: theoretical domain (CHC: g; Gf, Gv, Glr, Gs, Gc, Gsm, Ga) mapped to the measurement or empirical domain]
Internal evidence is focused on relations between and among variables (measures or latent constructs) within the designed battery
![Page 6: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/6.jpg)
Calculate summary statistics (n, means, SDs, SEM) and reliabilities for all tests and clusters by technical age groups
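The summary table described above can be sketched in a few lines. This is a minimal illustration, not the procedure used for any published battery: the records, the age groups, and the reliability values are hypothetical, and the SEM is computed with the classical formula SEM = SD × sqrt(1 − reliability).

```python
import math
import statistics
from collections import defaultdict


def summarize_by_age(records, reliabilities):
    """Return n, mean, SD, reliability, and SEM for each age group.

    records: iterable of (age_group, score) pairs (hypothetical layout).
    reliabilities: dict mapping age_group to a reliability coefficient
    that was estimated elsewhere (assumed to be given).
    """
    by_group = defaultdict(list)
    for age_group, score in records:
        by_group[age_group].append(score)

    summary = {}
    for age_group, scores in sorted(by_group.items()):
        sd = statistics.stdev(scores)  # sample SD (n - 1 denominator)
        r_xx = reliabilities[age_group]
        summary[age_group] = {
            "n": len(scores),
            "mean": statistics.fmean(scores),
            "sd": sd,
            "reliability": r_xx,
            # Classical standard error of measurement
            "sem": sd * math.sqrt(1.0 - r_xx),
        }
    return summary


# Hypothetical scores for two technical age groups
records = [(6, 92), (6, 101), (6, 108), (6, 99),
           (7, 95), (7, 104), (7, 110), (7, 91)]
reliabilities = {6: 0.88, 7: 0.91}

for age, row in summarize_by_age(records, reliabilities).items():
    print(age, row)
```

In practice the same loop would run over every test and cluster, with one row per technical age group.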
![Page 7: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/7.jpg)
Special reliability analyses required for speeded tests
Traditional test-retest reliability analysis
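A traditional test-retest coefficient is the Pearson correlation between scores from two administrations of the same test. The sketch below assumes paired scores for the same examinees; the score values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical speeded-test scores at time 1 and time 2 (same examinees)
time1 = [34, 41, 29, 50, 45, 38]
time2 = [36, 40, 31, 52, 43, 41]
print(round(pearson_r(time1, time2), 3))
```

For speeded tests this retest coefficient stands in for internal-consistency estimates, which are not appropriate when most items are answered correctly and scores depend mainly on speed.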
![Page 8: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/8.jpg)
Special reliability analyses for all tests
More complex repeated measures reliability analysis (McArdle and Woodcock, 1989; see WJ-R Technical Manual)
![Page 9: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/9.jpg)
Provide evidence based on internal structure (internal validity)
![Page 10: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/10.jpg)
Structural (Internal) Stage of Test Development
Purpose: Examine the internal relations among the measures used to operationalize the theoretical construct domain (i.e., intelligence or cognitive abilities)
Questions asked: Do the observed measures “behave” in a manner consistent with the theoretical domain definition of intelligence?
Method and concepts: Internal domain studies; item/subscale intercorrelations; exploratory/confirmatory factor analysis
Characteristics of strong test validity program:
• Measures co-vary in a manner consistent with the intended theoretical structure
• Factors reflect trait rather than method variance
• Items/measures are representative of the empirical domain
![Page 11: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/11.jpg)
Structural/internal validity evidence: Test and cluster inter-correlation matrices by technical age groups
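An inter-correlation matrix for one technical age group can be built by correlating every pair of tests. The sketch below is illustrative only: the test names and scores are hypothetical, and each list is assumed to hold scores for the same examinees in the same order.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(scores_by_test):
    """Return a nested dict: matrix[test_a][test_b] = Pearson r."""
    names = list(scores_by_test)
    return {
        a: {b: pearson_r(scores_by_test[a], scores_by_test[b]) for b in names}
        for a in names
    }


# Hypothetical scores for three tests within one age group
scores = {
    "Test 1": [12, 15, 9, 20, 17, 14],
    "Test 2": [30, 33, 25, 41, 35, 31],
    "Test 3": [7, 6, 8, 5, 9, 6],
}
matrix = correlation_matrix(scores)
for name, row in matrix.items():
    print(name, [round(r, 2) for r in row.values()])
```

Repeating the call once per age group yields the set of matrices that the factor analyses take as input.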
![Page 12: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/12.jpg)
Structural/internal validity: Confirmatory factor analysis by major age groups (exploratory factor analysis if the test blueprint is not theory-driven)
![Page 13: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/13.jpg)
Structural/internal validity: Confirmatory factor analysis by major age groups (exploratory factor analysis if the test blueprint is not theory-driven)
[Figure: CFA path diagram; visible values: .67, .53, .40, .42, .43]
![Page 14: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/14.jpg)
![Page 15: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/15.jpg)
Structural (Internal) Stage of Test Development
Purpose: Examine the internal relations among the measures used to operationalize the theoretical construct domain (i.e., intelligence or cognitive abilities)
Questions asked: Do the observed measures “behave” in a manner consistent with the theoretical domain definition of intelligence?
Method and concepts: Exploratory/confirmatory factor analysis
Characteristics of strong test validity program:
• The theoretical/empirical model is deemed plausible (especially when compared against other competing models) based on substantive and statistical criteria
![Page 16: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/16.jpg)
Structural/internal validity: Confirmatory factor analysis model comparisons by major age groups

Fit Statistics

| Models | Chi-square | df | AIC | RMSEA |
|---|---|---|---|---|
| WJ III CHC 7-factor | 13189.16 | 536 | 13377.16 | 0.056 (0.055-0.057) |
| Gc/Gsm/Gs/Gv+Gf (WAIS 4-factor) | 15113.99 | 537 | 15301.00 | 0.060 (0.059-0.061) |
| Gc/Gsm/Gq/Gv+Gf (SB IV 4-factor) | 20379.58 | 537 | 20565.58 | 0.070 (0.069-0.071) |
| Gf-Gc Dichotomous (KAIT) | 23145.12 | 549 | 23307.12 | 0.074 (0.073-0.075) |
| PASS 4-factor * | 25198.46 | 542 | 25374.46 | 0.077 (0.078-0.079) |
| g single factor | 65314.78 | 1170 | 65524.78 | 0.086 (0.085-0.086) |
| Null model | 215827.54 | 1219 | 215939.54 | 0.153 (0.153-0.154) |

The WJ III factor structure model provided the best fit to the data when compared to six alternative models
The conclusion was the same across 5 age-differentiated samples
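The model comparison can be made explicit by ranking the competing models on AIC, where lower values indicate a better trade-off between fit and complexity. The sketch below only re-uses the chi-square, df, and AIC values reported in the fit table; the variable and function names are illustrative.

```python
# (model, chi-square, df, AIC) as reported in the fit table
FIT_TABLE = [
    ("WJ III CHC 7-factor", 13189.16, 536, 13377.16),
    ("Gc/Gsm/Gs/Gv+Gf (WAIS 4-factor)", 15113.99, 537, 15301.00),
    ("Gc/Gsm/Gq/Gv+Gf (SB IV 4-factor)", 20379.58, 537, 20565.58),
    ("Gf-Gc Dichotomous (KAIT)", 23145.12, 549, 23307.12),
    ("PASS 4-factor", 25198.46, 542, 25374.46),
    ("g single factor", 65314.78, 1170, 65524.78),
    ("Null model", 215827.54, 1219, 215939.54),
]


def rank_by_aic(fit_table):
    """Sort models by AIC and attach the AIC difference from the best model."""
    ordered = sorted(fit_table, key=lambda row: row[3])
    best_aic = ordered[0][3]
    return [(name, aic, aic - best_aic) for name, _chi2, _df, aic in ordered]


ranked = rank_by_aic(FIT_TABLE)
for name, aic, delta in ranked:
    print(f"{name:35s} AIC={aic:10.2f}  delta={delta:10.2f}")
```

AIC is only one criterion; the slides pair it with RMSEA and with substantive judgment about each model's plausibility.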
![Page 17: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/17.jpg)
WJ III General Intellectual Ability (GIA) as a differentially weighted measure of g (general intelligence)
Therefore need to provide internal validity evidence for test g-weights
Tests at this end are weighted (“counted”) more in the GIA score
[Figure: test weights in the GIA (g) score]
![Page 18: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/18.jpg)
Internal validity evidence example: g-loadings for differentially weighted General Intellectual Ability cluster
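One common way to estimate g-loadings is to take each test's loading on the first unrotated principal component of the test inter-correlation matrix. The sketch below assumes that approach for illustration; it is not taken from the slides, and the correlation matrix is hypothetical. It uses power iteration so that only the standard library is needed.

```python
import math


def first_component_loadings(corr, iterations=500):
    """Loadings on the first principal component of a correlation matrix.

    Uses power iteration to find the dominant eigenvector, then scales it
    by the square root of its eigenvalue to obtain loadings.
    """
    n = len(corr)
    vec = [1.0 / math.sqrt(n)] * n  # unit-length starting vector
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(corr[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(p * p for p in product))
        vec = [p / norm for p in product]
        eigenvalue = norm  # converges to the largest eigenvalue
    return [v * math.sqrt(eigenvalue) for v in vec]


def relative_weights(loadings):
    """Rescale loadings so they sum to 1 (illustrative weighting only)."""
    total = sum(loadings)
    return [value / total for value in loadings]


# Hypothetical correlations among three tests
corr = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]
loadings = first_component_loadings(corr)
weights = relative_weights(loadings)
print([round(x, 3) for x in loadings])
print([round(x, 3) for x in weights])
```

Tests with higher loadings receive larger weights, which is the pattern the g-loading table is meant to document for each age group.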
![Page 19: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/19.jpg)
Provide evidence based on internal structure: Developmental evidence?
![Page 20: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/20.jpg)
Developmental evidence in the form of differential growth curves of measures
![Page 21: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/21.jpg)
Provide Test Fairness Evidence
![Page 22: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/22.jpg)
Structural/internal validity: Evaluating structural invariance with Multiple Group CFA
[Figure: factor structure compared across groups: White = Non-White]
![Page 23: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/23.jpg)
Structural/internal validity: Evaluating structural invariance with Multiple Group CFA
[Figure: factor structure compared across groups: Male = Female]
![Page 24: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/24.jpg)
Structural/internal validity: Evaluating structural invariance with Multiple Group CFA
[Figure: factor structure compared across groups: Hispanic = Non-Hispanic]
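In multiple-group CFA, invariance is often evaluated by comparing nested models, for example one with loadings free in each group against one with loadings constrained equal, using a chi-square difference test. The sketch below shows only that final comparison step; the fit values are hypothetical, and fitting the models themselves would require SEM software.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    with a = df / 2 and z = x / 2, starting from df = 1 or df = 2.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)             # df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))  # df = 1
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi2_difference_test(chi2_free, df_free, chi2_constrained, df_constrained):
    """Return (delta chi-square, delta df, p-value) for two nested models."""
    delta_chi2 = chi2_constrained - chi2_free
    delta_df = df_constrained - df_free
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Hypothetical fit values: loadings free vs. loadings constrained equal
delta_chi2, delta_df, p_value = chi2_difference_test(1250.4, 640, 1262.4, 648)
print(round(delta_chi2, 1), delta_df, round(p_value, 3))
```

A non-significant difference suggests the equality constraints do not worsen fit. With very large norming samples the test is sensitive to trivial differences, so changes in descriptive fit indices are usually examined as well.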
![Page 25: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/25.jpg)
Test fairness evidence: Item Level Analyses: Differential Item Functioning (DIF)
• Male/Female
• White/Non-White
• Hispanic/Non-Hispanic
![Page 26: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/26.jpg)
Test fairness evidence: Item Level Analyses: Differential Item Functioning (DIF)
• Male/Female
• White/Non-White
• Hispanic/Non-Hispanic
Results combined with results from Bias Sensitivity Review Panels
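The slides do not say which DIF procedure was used. As one widely used option, the sketch below computes the Mantel-Haenszel common odds ratio for a single item, with examinees matched on total score, and converts it to the ETS delta metric (MH D-DIF = -2.35 × ln(alpha)). The counts and the function name are hypothetical.

```python
import math


def mantel_haenszel_dif(strata):
    """Mantel-Haenszel DIF statistics for one item.

    strata: list of tuples, one per matched ability level:
        (ref_correct, ref_incorrect, focal_correct, focal_incorrect)
    Returns (alpha_mh, mh_d_dif). alpha_mh > 1 (negative mh_d_dif) means
    the item favors the reference group at matched ability levels.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        total = ref_right + ref_wrong + focal_right + focal_wrong
        if total == 0:
            continue  # skip empty score levels
        numerator += ref_right * focal_wrong / total
        denominator += ref_wrong * focal_right / total
    alpha_mh = numerator / denominator
    return alpha_mh, -2.35 * math.log(alpha_mh)


# Hypothetical counts at three total-score levels (low, middle, high)
strata = [
    (20, 30, 15, 35),
    (35, 15, 30, 20),
    (45, 5, 42, 8),
]
alpha_mh, mh_d_dif = mantel_haenszel_dif(strata)
print(round(alpha_mh, 3), round(mh_d_dif, 3))
```

Items flagged statistically would then be reviewed alongside the judgments of the bias and sensitivity review panels.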
![Page 27: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/27.jpg)
Lack of rigor and quality control in all prior/earlier stages will “rattle through the data” and rear its ugly head when performing the final statistical analysis
Shortcuts in prior stages will “bite you in the ____” as you attempt to perform final statistical analysis
Data screening, data screening, data screening!!!! Do it prior to performing final statistical analysis
• Compute extensive descriptive statistical analysis for all variables (e.g., histograms, scatterplots, box-whisker plots, etc.)
• More than means and SDs. Also calculate median, skew, kurtosis, n-tiles, etc.
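A screening pass along these lines might look like the sketch below. It is a minimal illustration with hypothetical scores: skew and kurtosis are the simple moment-based versions, and plots such as histograms and box-whisker charts would be produced separately.

```python
import statistics


def describe(scores):
    """Descriptive statistics for one variable, beyond mean and SD."""
    n = len(scores)
    mean = statistics.fmean(scores)
    # Central moments (population form) for skew and kurtosis
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    return {
        "n": n,
        "mean": mean,
        "sd": statistics.stdev(scores),
        "median": statistics.median(scores),
        "min": min(scores),
        "max": max(scores),
        "skew": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
        "quartiles": statistics.quantiles(scores, n=4),
    }


# Hypothetical raw scores with one suspicious value (a likely keying error)
raw_scores = [12, 14, 15, 15, 16, 17, 18, 19, 21, 99]
for key, value in describe(raw_scores).items():
    print(key, value)
```

A large gap between mean and median, or extreme skew and kurtosis, is a prompt to inspect the raw records before any final analysis is run.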
Deliberately planned and sophisticated “front end” data collection short-cuts (e.g., matrix sampling) introduce an extreme level of “back end” complexity to routine statistical/psychometric analysis
Know your limits, level of expertise, and skills. Even those with extensive test development experience often need access to trusted measurement/statistical consultants (cont. next slide)
![Page 28: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/28.jpg)
Don’t be seduced by, or become completely reliant on, factor analysis as the primary internal/structural validity tool
• An example: Inability of CFA to differentiate closely related latent constructs (e.g., Gc and Reading/Writing—Grw) doesn’t prove they are the same. Need to examine other evidence (e.g., very different developmental growth curves for Gc and Grw)
Published statistics/psychometric information needs to be based on final publication length tests
• Often need to use test-length correction formulas (e.g., KR-21) for test reliabilities
• Correlations between short and/or long norming versions of a test that differ in test length (number of items) from the publication-length test may need special adjustments/corrections.
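The slide names KR-21, which is an internal-consistency estimate. The standard formula for projecting reliability to a different test length is the Spearman-Brown prophecy formula, and that is what the sketch below uses. The item counts and reliability value are hypothetical, and the formula assumes added or removed items are parallel to the existing ones.

```python
def spearman_brown(reliability, length_ratio):
    """Project reliability to a test whose length is changed by length_ratio.

    length_ratio = (number of items in target form) / (items in observed form)
    """
    if length_ratio <= 0:
        raise ValueError("length_ratio must be positive")
    return (length_ratio * reliability) / (1.0 + (length_ratio - 1.0) * reliability)


# Hypothetical case: a 60-item norming form with reliability .90
# is shortened to a 45-item publication form.
norming_items = 60
publication_items = 45
projected = spearman_brown(0.90, publication_items / norming_items)
print(round(projected, 3))
```

The projected value is only an estimate; where possible, reliabilities should be recomputed directly from responses to the publication-length item set.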
Back up, back up, back up!!!!!!!!!! Don’t let a dead hard drive or computer destroy your work and progress. Do it constantly. Build redundancy into your files and people skill sets
Sad fact: Majority of test users do NOT pay attention to the fancy and special psychometric/statistical analysis you report in technical chapters or manuals. Be prepared for post-publication education via other methods.
Post-manual publication technical reports of special/sophisticated analyses are good when publication time-line pressures dictate making difficult decisions.
![Page 29: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/29.jpg)
Exploratory-driven confirmatory factor analysis is often used by test developers to explore unexpected characteristics of tests (often called “model generation modeling” in SEM/CFA literature)
Different approaches to DIF (differential item functioning)
Multiple group CFA to test invariance (by age, by gender, by……..)
• Different degrees of measurement invariance can be tested
Traditional definition of psychometric bias and appropriate/inappropriate statistical methods
Equating (e.g., Form A/B) methods and evidence
Methods for calculating prediction models that account for regression to the mean and that are sensitive to developmental (age) X content interactions
Complex repeated measures reliability analyses to tease out test stability, internal consistency, and trait stability sources of score variance (see WJ-R Technical Manual)
![Page 30: Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal](https://reader033.vdocuments.site/reader033/viewer/2022052821/554935f5b4c905054d8b46f9/html5/thumbnails/30.jpg)
End of Part F
Additional steps in test development process will be presented in subsequent modules as they are developed