How to evaluate the unusualness (base rate) of WJ IV cluster or test score differences: it is a...


Upload: kevin-mcgrew

Post on 09-Jan-2017


It's a pleasure when you use the correct measure

How to evaluate the unusualness (base rate) of WJ IV cluster or test standard score differences

Kevin McGrew, PhD.
Educational/School Psychologist
Director, Institute for Applied Psychometrics (IAP)

Institute for Applied Psychometrics; Kevin McGrew 11-23-15

The content of this presentation represents the work and opinions of Dr. Kevin McGrew and does not necessarily reflect the opinions of all the WJ IV authors or the publisher of the WJ IV (HMH).

Also note that in the examples provided, interpretation uses the standard score (SS) metric. The preferable metric for understanding performance on the WJ IV measures is the Relative Performance Index (RPI). However, the question addressed here is "How different must two test/cluster scores be from each other before I consider the difference meaningful and unusual?" Answering it requires the SS metric, as this is not possible when using the RPI metric.

Three primary models for evaluating score differences (Payne & Jones, 1957)

www.iapsych.com/articles/payne1957.pdf


Evaluating a prediction (Payne & Jones, 1957). If the difference implies a predictive relationship, then regression to the mean needs to be accounted for and the proper statistic is the SE(est).

Evaluating the reliability of a difference score (Payne & Jones, 1957). If the difference is a simple difference score, and the tests measure rather different traits (e.g., not within the same broad CHC domain; low correlation/cohesion), then one can use the reliability of difference scores: the SE(diff).

Evaluating abnormality (base rate) of a difference score (Payne & Jones, 1957). If the difference is a simple difference score, and the explicit emphasis is on the cohesiveness (correlation) of tests within a composite/CHC domain, then the SD(diff) is a better statistic.

Three primary models for evaluating score differences


Reliability (Is there a difference?) vs. Abnormality (How unusual is the difference?)

Model | Simple Difference (X - Y) | Prediction Error (Y - Ŷ)
Reliability | Are these 2 scores different? | Is this outcome different from expectations?
Abnormality (base rate) | How unusual is it for these 2 scores to differ by this much? | How unusual is it for this outcome to differ from expectations by this much?

(Distinction and table courtesy of Dr. Joel Schneider)

Evaluating a prediction (Payne & Jones, 1957). If the difference implies a predictive relationship, then regression to the mean needs to be accounted for and the proper statistic is the SE(est).

[Diagram: a predictor yields a predicted score; Obtained score - (minus) Predicted score = Difference score]

The WJ IV Variation and Comparison procedures use a prediction model.

SE(est)

WJ IV Comparison Options

Five ability/achievement difference score procedures to help compare ability to current levels of achievement:
- GIA/Achievement
- Scholastic Aptitude/Achievement
- Gf-Gc/Achievement/other cog.-ling. abilities
- Broad Oral Language/Achievement
- Academic Knowledge/Achievement

[Procedures account for regression to the mean (and how it varies by age)]

WJ IV Variation Options

Four variation procedures to help document an individual's pattern of strengths and weaknesses. Based on core tests in each battery:
- Intra-cognitive: based on COG Tests 1-7
- Intra-achievement: based on ACH Tests 1-6
- Based on Academic Skills, Academic Fluency, and Academic Applications clusters
- Intra-oral language: based on OL Tests 1-4

Evaluating the reliability of a difference score (Payne & Jones, 1957). If the difference is a simple difference score, and the tests measure rather different traits (e.g., not within the same broad CHC domain; low correlation/cohesion), then one can use the reliability of difference scores: the SE(diff).

[Diagram: Score A - (minus) Score B = Difference score]

Correlation (cohesion) ignored. Reliabilities of scores are used.

SE(diff)
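As a rough illustration of the reliability-based model, the sketch below uses the standard psychometric formula SE(diff) = SD * SQRT(2 - rxx - ryy), which is the same as combining the two standard errors of measurement. The formula is not printed on the slide, the function name `se_diff` is made up for this example, and the reliability values are hypothetical, chosen only to show the arithmetic.

```python
import math


def se_diff(sd: float, rel_a: float, rel_b: float) -> float:
    """Standard error of the difference between two scores on the same scale.

    Only the reliabilities of the two scores are used; the correlation
    between the scores is ignored, as in the reliability model.
    """
    sem_a = sd * math.sqrt(1.0 - rel_a)  # standard error of measurement, score A
    sem_b = sd * math.sqrt(1.0 - rel_b)  # standard error of measurement, score B
    return math.sqrt(sem_a ** 2 + sem_b ** 2)


# Hypothetical reliabilities (NOT taken from the WJ IV Technical Manual).
example = se_diff(sd=15.0, rel_a=0.90, rel_b=0.85)
print(round(example, 1))
```

Note that nothing in this calculation depends on how strongly the two measures correlate, which is the key contrast with the SD(diff) discussed later.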

WJ IV Standard Score/Percentile Rank Profiles
- Show the range of scores that contains the examinee's true score at a 68% level of confidence (+/- 1 SEM)
- Evaluate the significance of the difference between any 2 tests or clusters (statistical probability statements)

Woodcock's three rules of thumb for evaluating difference scores based on SE(diff):
1. If confidence bands overlap, assume no significant difference exists.
2. If separation between bands is less than the width of the wider band, assume a possible significant difference exists.
3. If separation between bands is greater than the width of the wider band, assume a significant difference exists.
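The three band rules can be expressed as a small decision function. This is only a sketch: the function name and the example bands are hypothetical, and two edge cases the rules do not address (bands that just touch, and a separation exactly equal to the wider band) are resolved here by assumption.

```python
def classify_bands(band_a, band_b):
    """Apply the three confidence-band rules of thumb to two (low, high) bands."""
    # Order the bands so that the first one starts lower on the score scale.
    (low_a, high_a), (low_b, high_b) = sorted([band_a, band_b])
    gap = low_b - high_a  # separation between bands; zero or negative means overlap
    wider = max(high_a - low_a, high_b - low_b)
    if gap <= 0:  # assumption: bands that just touch count as overlapping
        return "no significant difference"
    if gap < wider:
        return "possible significant difference"
    # assumption: a gap exactly equal to the wider band is treated as significant
    return "significant difference"


# Hypothetical 68% confidence bands in standard score units.
print(classify_bands((95, 105), (100, 110)))
print(classify_bands((90, 100), (104, 112)))
print(classify_bands((80, 88), (100, 110)))
```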

The focus of the next slides is the abnormality (base rate) of a simple difference: how unusual is it for these 2 scores to differ by this much? (Reliability vs. abnormality distinction courtesy of Dr. Joel Schneider.)

Evaluating abnormality (base rate) of a difference score (Payne & Jones, 1957). If the difference is a simple difference score, and the explicit emphasis is on the cohesiveness (correlation) of tests within a composite/CHC domain, then the SD(diff) is a better statistic (sometimes still called the SE(diff); it uses the measures' correlation and not their reliabilities).

[Diagram: Score A - (minus) Score B = Difference score]

Correlation (cohesion) accounted for.

SD(diff)

Institute for Applied Psychometrics (IAP); Dr. Kevin McGrew 11-20-15
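The SD(diff) follows from the variance of a difference between two correlated scores. As a brief sketch, assume both scores are on the same scale with standard deviation σ (15 for standard scores, 3 for scaled scores) and correlate r; the symbols σ and r are notation chosen for this illustration:

```latex
\begin{aligned}
\operatorname{Var}(A - B) &= \sigma^2 + \sigma^2 - 2 r \sigma^2 = \sigma^2 (2 - 2r) \\
SD_{\text{diff}} &= \sigma \sqrt{2 - 2r}
\end{aligned}
```

The higher the correlation (the tighter the domain), the smaller the SD(diff), so a smaller score difference is already unusual.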

Ability Domain Cohesion (McGrew, 2002, 2008, 2011, 2012)

CHC factors and test composites are a constellation or combination of elements that are related (correlated) and are combined together in a functional fashion. This implies a form of centrally, inward-directed force that pulls elements together, much like magnetism (high inter-correlations of tests).

www.iapsych.com/articles/mcgrew2012.pdf

Cohesion appears to be the most appropriate term for this form of multiple-element bonding. Cohesion is defined, as per the Shorter Oxford English Dictionary (Brown, 2002), as "the action or condition of sticking together or cohering; a tendency to remain united" (p. 444).

Element bonding and stickiness are also conveyed in the APA Dictionary of Psychology (VandenBos, 2007) definition of cohesion as "the unity or solidarity of a group, as indicated by the strength of the bonds that link group members to the group as a whole" (p. 192).

Cohesion definitions (McGrew, 2012)
www.iapsych.com/articles/mcgrew2012.pdf

The WJ IV provides comparison and variation procedures based on the predictive score comparison model (SE(est)), as well as the ability to compare individual test or cluster scores using the simple difference model based on reliabilities (SE(diff); the confidence band overlap rules of thumb).

However, the authors did not provide a means to evaluate the abnormality (base rate) of differences between two cluster/test standard scores based on the ability cohesion model.

What can I do?

First, conceptually understand the issue and the appropriate score difference model.

WISC-IV within domain/composite scaled score (M = 10; SD = 3) comparison

Comparison | Average correlation between tests (Table 5.1, tech. manual) | 1 SD(diff) | 1.5 SD(diff)
VCI (Gc) - Sim/Vocab | .74 | 2.2 | 3.3
VCI (Gc) - Vocab/Comp | .68 | 2.4 | 3.6
VCI (Gc) - Sim/Comp | .62 | 2.6 | 3.9
PRI (Gv/Gf) - BD/MR | .55 | 2.8 | 4.2
WMI (Gsm) - DS/LNS | .49 | 3.0 | 4.5
PRI (Gv/Gf) - BD/PicCn | .41 | 3.2 | 4.8
PRI (Gv/Gf) - PicCn/MR | .42 | 3.2 | 4.8

The commonly used rule of 1 SD (3) or 1.5 SD (5) scaled score points on WISC-IV tests is not accurate for all potential test score difference comparisons (when using an ability cohesion score difference model).

Gc domain/composite is tight/cohesive (highly inter-correlated)

Note: The equation includes the correlation of the tests, which addresses the cohesion, inter-correlation, or unitary/non-unitary characteristics of the composite/ability.

(Note: The WISC-IV statistical significance tables and software-generated values are correct and reflect the simple score SE(diff) difference model. The above is a recommended alternative difference score method within CHC domains: the ability cohesion model.)

WJ III within domain/composite standard score (M = 100; SD = 15) comparison

Comparison | Average correlation between tests (computed by K. McGrew in norm data) | 1 SD(diff) | 1.5 SD(diff)
Gc - Verb Comp/Gen Info | .78 | 9.9 | 14.8
Gf - Anl Syn/Conc Form | .55 | 14.2 | 21.3
Gs - Vis Match/Dec Speed | .54 | 14.4 | 21.4
Gsm - Num Rev/Mem Wrds | .40 | 16.4 | 24.6
Ga - Snd Blend/Aud Attn | .36 | 16.0 | 24.0
Glr - VAL/Ret Fluency | .27 | 18.1 | 27.2
Gv - Spat Rels/Pic Recog | .21 | 18.8 | 28.2

The commonly used rule of 1 SD (15) or 1.5 SD (23) standard score points on WJ III (or WJ IV) tests is not accurate for all potential test score difference comparisons.

The Gc domain/composite is tight/cohesive (highly inter-correlated).
The Glr and Gv domains/composites are loose or broad (weakly inter-correlated).

If the difference/discrepancy is a simple difference score, and the explicit emphasis is on the cohesiveness of tests within a composite/CHC domain, then the SD(diff) is the better statistic (McGrew, 2011, 2012).

Ability domain cohesion: the degree of inter-correlation of abilities/tests within an ability domain/composite.

Remember:

If the domain is loose, SD=15 SS (SD=3 ss) will cook your goose

If the domain is tight, SD=15 SS (SD=3 ss) will not be right

Vive la différence: long live the SD(diff)

Latest XBA approach has adopted ability domain cohesion concept and related statistical score comparison methods

Second, either become good friends with Appendices E/F in the WJ IV Technical Manual, or use the following simplified tools and guides provided by Dr. Kevin McGrew.

Relations between WJ IV GIA and Scholastic Aptitude clusters

Since the SD(diff) is based on the correlation between measures, find the respective WJ IV measure correlations in the WJ IV Technical Manual (TM). Or, since the correlations from ages 6 to 19 (school age) do not differ much developmentally, use the average correlation for this age range:
- Either compute the average (median) correlation across ages 6-8, 9-13, and 14-19 (see TM), or
- Use the average value computed across ages 6 to 19 in the WJ IV norm data (provided by Kevin McGrew in these slides).

E.g., GIA/Scholastic Aptitude cluster relations, ages 6 to 19: correlations range from .82 to .89 (very similar); Mdn = .87; 75.7% shared variance.

Using the median GIA/Scholastic Aptitude correlation for ages 6 to 19 (Mdn = .87):

SD(diff) = 15 * SQRT(2 - 2 * .87) = 7.6

How does this equation-based value correspond to a value calculated in the actual WJ IV norm data?

WJ IV GIA - SAPT distributions and sum. stats (ages 6 to 19)

Statistic | GIA_RDGAPADIFF | GIA_RDGAPBDIFF | GIA_MTHAPADIFF | GIA_MTHAPBDIFF | GIA_WRTAPADIFF | GIA_WRTAPBDIFF
N of Cases | 4,212 | 4,212 | 4,206 | 4,203 | 4,212 | 4,212
Minimum | -34.82 | -25.33 | -32.50 | -33.33 | -26.50 | -25.24
Maximum | 28.27 | 30.52 | 26.50 | 37.94 | 31.52 | 32.20
Arithmetic Mean | -0.09 | -0.34 | -0.35 | -0.47 | -0.34 | -0.17
Standard Dev. | 8.04 | 7.42 | 7.85 | 9.33 | 7.88 | 7.52

Mdn SD (SD(diff)): approx. 7.7

Equation value: approx. 7.6

Let's check another example

WJ IV GIA - Gf+Gc cluster distributions and sum. stats (ages 6 to 19)

r = .86

SD(diff) = 15 * SQRT(2 - 2 * .86) = 7.9

Statistic | GIAGFGCDIFF
N of Cases | 4,211
Minimum | -27.16
Maximum | 30.73
Median | -0.28
Arith Mean | -0.34
Standard Dev | 8.05

WJ IV GIA - Gf+Gc cluster difference score distributions

[Figure: median SD values by age]

Conclusion: There is no systematic developmental (age) variation in the SD values calculated in the WJ IV norm data. Therefore, a single approximate value (8) is useful for clinical evaluation of GIA - Gf+Gc cluster score differences.

WJ IV COG Gc tests (OV-GI) and Gf tests (NS-CF) difference score significance values (ages 6-19)

Statistic | OV_GI_DIFF (r = .71) | NS_CF_DIFF (r = .47)
N of Cases | 4,211 | 4,212
Minimum | -51.00 | -68.47
Maximum | 46.43 | 59.88
Arithmetic Mean | 0.40 | 0.24
Standard Deviation | 11.95 | 16.22
Equation value | 11.4 | 15.4

Conclusion: Equation-based and WJ IV norm data-based SD(diff) values are approximately similar.
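The comparison between equation-based and norm-based values can be reproduced with a few lines of code. This is a minimal sketch: the function name `sd_diff` is chosen for this example, and the correlations and norm-data SDs are the ones reported in the slides (the GIA/Scholastic Aptitude entry uses the approximate median SD of 7.7).

```python
import math


def sd_diff(sd: float, r: float) -> float:
    """Equation-based SD of a simple difference between two correlated scores."""
    return sd * math.sqrt(2.0 - 2.0 * r)


# (comparison, correlation, SD observed in WJ IV norm data), as given in the slides
comparisons = [
    ("GIA - Scholastic Aptitude (median)", 0.87, 7.7),
    ("GIA - Gf+Gc", 0.86, 8.05),
    ("Oral Vocabulary - General Information", 0.71, 11.95),
    ("Number Series - Concept Formation", 0.47, 16.22),
]

results = []
for label, r, norm_sd in comparisons:
    equation_sd = sd_diff(15.0, r)
    results.append((label, equation_sd, norm_sd))
    print(f"{label}: equation {equation_sd:.1f}, norm data {norm_sd:.2f}")
```

In each case the equation value falls within about one standard score point of the value observed in the norm data.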

Select WJ IV COG cluster/test score significance values (ages 6-19)

Key to numbers in next clinical aid slide figure, e.g., .71/18/20:
- .71 = correlation between clusters or tests
- 18 = SD(diff) x 1.50 (approx. 13% base rate)
- 20 = SD(diff) x 1.65 (approx. 10% base rate)

Select WJ IV COG cluster/test score rule-of-thumb significance values (ages 6-19)

[Figure: clinical aid diagram. Each value is given as correlation / SD(diff) x 1.50 (approx. 13% base rate) / SD(diff) x 1.65 (approx. 10% base rate). Values are rounded and were calculated in WJ IV norm data (ages 6 to 19).]

Clusters shown: GIA (7 tests), SAPTs (4 tests), Gf+Gc (4 tests), BIA (2 tests), Gc, Gc-Ext, Gf, Gf-Ext, Gwm, Gwm-Ext, Glr, Gv, Ga, Gs

Tests shown: Oral Vocabulary, General Information, Number Series, Concept Formation, Verbal Attention, Numbers Reversed, Story Recall, Vis-Auditory Learning, Visualization, Picture Recognition, Let-Pattern Matching, Pair Cancellation, Phonological Processing, Nonword Repetition

Values shown (in the order extracted from the figure): .87/12/13, .86/12/13, .71/18/20, .97/5/6, .47/24/27, .94/8/9, .47/24/27, .94/8/9, .34/27/30, .43/25/28, .37/27/29, .60/21/24, .94/8/9

How to interpret the base rate rule-of-thumb figure on prior slide: GIA/Gf+Gc example

How big of a SS difference is needed between a person's GIA and Gf+Gc cluster scores before I can consider the difference rare and meaningful?

If 1.5 SD(diff) (approx. 13% base rate) is your rule, then the GIA/Gf+Gc difference must be approximately ±12 points or more.

If 1.65 SD(diff) (approx. 10% base rate) is your rule, then the GIA/Gf+Gc difference must be approximately ±13 points or more.

How to interpret the base rate rule-of-thumb figure on prior slide: Gf cluster example

How big of a SS difference is needed between a person's Number Series and Concept Formation scores (Gf cluster) before I can consider the difference rare and meaningful?

If 1.5 SD(diff) (approx. 13% base rate) is your rule, then the Number Series/Concept Formation difference must be approximately ±24 points or more.

If 1.65 SD(diff) (approx. 10% base rate) is your rule, then the Number Series/Concept Formation difference must be approximately ±27 points or more.

The required magnitude of SS differences varies by the degree of correlation (cohesion) between the two measures.

Note that the critical base rate values for a cluster with highly correlated tests (Gc; r = .71; 18/20) are much smaller than for a cluster with tests that are more weakly correlated (Gf; r = .47; 24/27).
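The rule-of-thumb cutoffs above can be reproduced from the norm-data SDs reported earlier in the slides. The sketch below is illustrative only: the function names and the example scores are hypothetical, and rounding to the nearest whole point is an assumption, since the slides say only that the values are rounded.

```python
def critical_difference(sd_of_difference: float, multiplier: float) -> int:
    """Score difference (whole points) treated as unusual under a multiplier rule."""
    # Assumption: round to the nearest whole standard score point.
    return round(sd_of_difference * multiplier)


def is_unusual(score_a: float, score_b: float,
               sd_of_difference: float, multiplier: float = 1.5) -> bool:
    """True if the absolute difference meets or exceeds the critical value."""
    return abs(score_a - score_b) >= critical_difference(sd_of_difference, multiplier)


# SDs of difference scores in WJ IV norm data (ages 6 to 19), from the slides.
norm_sds = {"GIA/Gf+Gc": 8.05, "Gc (OV/GI)": 11.95, "Gf (NS/CF)": 16.22}

cutoffs = {
    name: (critical_difference(sd, 1.50), critical_difference(sd, 1.65))
    for name, sd in norm_sds.items()
}
print(cutoffs)

# Hypothetical examinee: Number Series = 85, Concept Formation = 105.
print(is_unusual(85, 105, norm_sds["Gf (NS/CF)"]))
```

With these inputs the cutoffs come out as 12/13, 18/20, and 24/27, and the hypothetical 20-point Number Series/Concept Formation difference does not reach the 24-point cutoff.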

What to do for comparisons not listed on prior slide?

Look up correlation in WJ IV Technical Manual (e.g. r = .71)

Use the following nomograph

SD(diff) by measure correlation nomograph (courtesy of Dr. Joel Schneider)

www.iapsych.com/articles/sddiffgraph.pdf

[Figure: nomograph plotting SD(diff) against the correlation between measures. Lines plotted: 1.00 SD(diff) (approx. 32% base rate), 1.50 SD(diff) (approx. 13% base rate), 1.65 SD(diff) (approx. 10% base rate). Values marked on the figure: 17, 19, 11.8, 12.]
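When the nomograph is not at hand, the same lookup can be approximated in code. This is a sketch under stated assumptions: the function names are invented for this example, the equation-based SD(diff) is used rather than norm-data values, and the base rates assume normally distributed difference scores with differences in either direction counted (two-tailed).

```python
import math


def sd_diff(sd: float, r: float) -> float:
    """Equation-based SD of a simple difference between two correlated scores."""
    return sd * math.sqrt(2.0 - 2.0 * r)


def two_tailed_base_rate(multiplier: float) -> float:
    """Proportion of a normal distribution beyond +/- multiplier SDs."""
    return math.erfc(multiplier / math.sqrt(2.0))


def nomograph_lookup(r: float, sd: float = 15.0,
                     multipliers=(1.00, 1.50, 1.65)) -> dict:
    """Map each multiplier to (critical difference, base rate in percent)."""
    base = sd_diff(sd, r)
    return {
        m: (round(base * m, 1), round(100 * two_tailed_base_rate(m)))
        for m in multipliers
    }


# Example: a correlation of .71 looked up in the WJ IV Technical Manual.
lookup = nomograph_lookup(0.71)
for multiplier, (points, percent) in lookup.items():
    print(f"{multiplier:.2f} SD(diff): {points} points, about {percent}% base rate")
```

Because these values come from the equation, they can differ by a point or so from the rounded norm-data values shown in the clinical aid figure.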