
Running head: ABILITY TESTS AND LINGUISTICALLY DIVERSE GROUPS

Assessing the Cognitive Abilities of Culturally and Linguistically Diverse Students:

Predictive Validity of Verbal, Quantitative, and Nonverbal Tests

Joni M. Lakin

Auburn University

Draft of August 9, 2011

This manuscript was published in Psychology in the Schools.

Author Note

Joni M. Lakin, Department of Educational Foundations, Leadership, and Technology, Auburn University.

The data analyzed in this paper were collected as part of the Project Bright Horizon project, which was sponsored by a Jacob K. Javits Gifted and Talented Education grant to the Project Bright Horizon Research Team: Peter Laing, Project Director/Co–Principal Investigator,Washington Elementary School District, Phoenix, AZ; Dr. Jaime Castellano, Project Consultant; and Dr. Ray Buss, Arizona State University at the West Campus, Principal Investigator. The views and opinions expressed in this article are those of the author and should not be ascribed to any members of the Project Bright Horizon staff or its consulting partners. The author gratefully acknowledges the helpful suggestions of David Lohman, John Young, Brent Bridgeman, Don Powers, and Dan Eignor on earlier drafts of this article.

Correspondence concerning this article should be addressed to Joni Lakin, Department of Educational Foundations, Leadership, and Technology, Auburn University, Auburn, AL 36831. Email: [email protected]


Abstract

Verbal and quantitative reasoning tests provide valuable information about cognitive abilities

that are important to academic success. Information about these abilities may be particularly

valuable to teachers of students who are English-language learners (ELL), because leveraging

reasoning skills to support comprehension is a critical aptitude for their academic success.

However, due to concerns about cultural bias, many researchers advise exclusive use of

nonverbal tests with ELL students despite a lack of evidence that nonverbal tests provide greater

validity for these students. In this study, a culturally and linguistically diverse sample of students was administered a test measuring verbal, quantitative, and nonverbal reasoning. The two-year

predictive relationship between ability and achievement scores revealed that nonverbal scores

had weaker correlations with future achievement than did quantitative and verbal reasoning scores for both ELL and non-ELL students. Results do not indicate differential prediction and do not support

the exclusive use of nonverbal tests for ELL students.

Keywords: Cognitive ability testing, English-language learners, Hispanic students, Validity


Assessing the Cognitive Abilities of Culturally and Linguistically Diverse Students:

Predictive Validity of Verbal, Quantitative, and Nonverbal Tests

Cognitive ability tests that measure verbal, quantitative, and nonverbal reasoning skills

are widely used by schools to provide valuable information to teachers hoping to differentiate

instruction to their students’ cognitive strengths (Lohman, 2009; Lohman & Hagen, 2001b).

Verbal and quantitative reasoning skills are particularly critical for academic success because of

the heavy reliance on these skills in traditional academic domains. This may be even more true

for English-language learner (ELL) students, for whom leveraging verbal reasoning skills to

support comprehension is a critical aptitude for school success and language acquisition. For

example, knowing an ELL student has relatively weak verbal reasoning skills, a teacher might

provide that student with more linguistic support than other ELL students need. However,

despite the potential utility of such information, many researchers advise the exclusive use of

nonverbal tests with ELL students, suggesting that the linguistic and cultural demands of the

items create measurement bias (Lewis, 2001; McCallum, Bracken, & Wasserman, 2001;

Naglieri & Ronning, 2000). While this argument is persuasive to many, there is little direct

evidence that nonverbal tests provide more useful information about ELL students’ academic

aptitude than verbal or quantitative tests.

This study evaluated the validity and fairness of the Cognitive Abilities Test (CogAT,

Form 6; Lohman & Hagen, 2001a) in predicting reading and math achievement in a sample of

Hispanic ELL and Hispanic and White non-ELL students. The CogAT consists of verbal,

quantitative, and nonverbal batteries. These batteries have been found to provide strong

predictive validity for achievement in non-ELL populations (Lohman & Hagen, 2002). The


purpose of the study was to explore whether the batteries provide similar predictive validity for

Hispanic ELL and non-ELL students.

Considerations in the use of multi-battery ability tests

Multi-battery ability tests assess cognitive ability by sampling multiple content domains.

Such tests are useful for teachers because they provide information about the range of a student’s

talents. Both individually and group-administered tests can provide multiple test scores for

students that contrast their performance in various domains (Sattler, 2008). Teachers can use this

information to target both student weaknesses—for extra practice and instructional support—and

student strengths—for enrichment opportunities that make school more enjoyable and

challenging.

A common misconception is that ability tests should enable users to measure innate

ability that is uninfluenced by educational opportunity. In fact, ability tests measure developed

and well-practiced reasoning skills (Anastasi, 1980). Rather than providing qualitatively distinct

information from achievement tests, ability and achievement tests differ in the degree to which

they tap into recent and specific learning accomplishments versus general and long-term

acquisitions (Anastasi, 1980; Lohman, 2001). Thus, ability tests offer a different, broader perspective on developed knowledge and skills that can be contrasted with more narrowly focused achievement test performance. This broader perspective can be useful to teachers who want to adapt the pace and content of their instruction to students who differ widely in the speed and readiness with which they learn (Lohman & Hagen, 2001b).

The misconception about ability tests measuring innate capabilities leads many to

conclude that mean differences on verbal and quantitative ability tests by definition reflect bias

in the assessments. Large mean differences have been documented between ELL and non-ELL


students on a range of ability tests (Lakin & Lohman, in press; Palmer, Olivarez, Willson, &

Fordyce, 1989; Patterson, Mattern, & Kobrin, 2007). Thus, a number of researchers and

educators have called for the exclusive use of nonverbal tests in assessing the cognitive strengths

of culturally and linguistically diverse students (Lewis, 2001; Naglieri & Ford, 2003).

However, the existence of mean differences is not in itself evidence of bias (Jensen,

1980; Reynolds, 1982). When test users interpret scores appropriately by controlling for

opportunity to learn, mean differences do not negate the utility of the tests for differentiating

instruction. Furthermore, the exclusive use of nonverbal tests to predict achievement and make

academic placement decisions has been widely criticized because such

tests clearly under-represent the domain of interest and lack the obvious links that verbal and

quantitative reasoning have to learning and school success (Braden, 2000; Figueroa, 1989; Lakin

& Lohman, in press; Ortiz & Dynda, 2005). They also do not provide teachers with a clear path

for differentiating instruction.

Use of nonverbal ability tests with ELL students

To support the contention that nonverbal tests are more valid and useful for

differentiating instruction for ELL students, researchers must first show conclusive evidence of

bias for tests measuring verbal and quantitative reasoning, and, second, that nonverbal tests

provide an effective alternative. The evidence of bias for verbal and quantitative tests is not

conclusive because most proponents of nonverbal assessments rely solely on mean differences as

evidence of bias. Evidence of differential prediction for tests measuring verbal and quantitative

ability would provide more conclusive evidence, but such evidence has not emerged in previous research.

For example, despite finding large mean differences, Palmer et al. (1989) found no differences in

the regression slopes between language proficiency groups when predicting achievement scores


from the Kaufman Assessment Battery for Children (K-ABC; Kaufman & Kaufman, 1983) using

ability scores from the Wechsler Intelligence Scale for Children-Revised (WISC-R; Wechsler,

1974).

In contrast, there is strong evidence that nonverbal tests do not provide an effective

alternative to verbal and quantitative tests because these tests often yield low validities for

predicting reading and math achievement (both critical domains of academic development). For

a sample of ELL students, Borghese (2009) reported correlations between the Universal

Nonverbal Intelligence Test (UNIT; Bracken & McCallum, 1998) and reading achievement of r = .28; prediction of math achievement was stronger (r = .51). Jones (2006) found

correlations below .10 between UNIT scores in first grade and reading achievement on the Texas

Assessment of Knowledge and Skills (TAKS) in third grade for both ELL and non-ELL students.

Even in non-ELL samples, the correlations between nonverbal tests and achievement usually

range between .3 and .6 (e.g., Balboni, Naglieri, & Cubelli, 2010; Naglieri & Ronning, 2000).

These values are far below what is typically observed for CogAT verbal and quantitative

batteries with non-ELL samples, which predict their relevant domain of achievement (reading

and mathematics, respectively) with correlations of .75-.80 (Lakin & Lohman, in press). Lakin

and Lohman (in press) showed that differences in correlations of this magnitude (.5 vs. .8) have

practical importance in the identification of academically talented students.

The purpose of this study was to provide additional data on the predictive validity of

verbal, quantitative, and nonverbal test batteries for culturally and linguistically diverse students.

The research questions that guided this study were:


1. Are there substantial mean differences in verbal, quantitative, and nonverbal ability

scores between students who differ in racial/ethnic background and/or language

proficiency?

2. Are the same achievement and ability measures useful as predictors of future

achievement for ELL and non-ELL students?

3. Does the nonverbal battery play a more important role in predicting later achievement for

ELL students?

Methods

Two schools in Arizona participated in the Project Bright Horizon study, developed by a

team of researchers and school administrators (see Lohman, Korb, & Lakin, 2008). The data

used in this study came from students in the sample who were in 3rd to 5th grade in the first year

of the study and reported either White or Hispanic ethnicity. The sample consisted of 124

Hispanic ELL students, 161 Hispanic non-ELL students, and 72 White non-ELL students.

Ethnicity was based on district data, which relies on U.S. Census classifications. Other ethnic

groups of non-ELL students (Asian, American Indian, and African American) included fewer

than 30 students each and were omitted from the analyses. Table 1 provides additional

demographic information on the sample.

[Table 1]

ELL status in this study relied on district classifications reported by the schools.

These classifications were based partially on student scores on the Stanford English

Language Proficiency Test (SELP; Harcourt Educational Measurement, 2003). For this study,

students were classified based on their ELL status in year 1. The range of English proficiency

varied considerably within the ELL group: 13% were first-year ELL students (i.e., low


proficiency) while another 33% were reclassified by the second year of the study (likely high

proficiency).

In the first year of the study, students completed both ability and achievement tests in the

late spring. In the second year, only achievement tests were administered. The achievement tests

were administered as part of the schools’ annual accountability testing. Only students with

complete test records were used. The variables with the greatest proportion of missing data were the year-two achievement scores. Unlike many studies, in this case, White students had missing scores more often than ELL and Hispanic students, perhaps due to differences in school mobility.

Measures

Cognitive Abilities Test (Form 6)

The CogAT consists of three separate batteries measuring verbal, quantitative, and

nonverbal reasoning (Lohman & Hagen, 2001a). In this study, students received the level of the CogAT appropriate for their grade (levels A, B, and C for grades 3, 4, and 5, respectively). The verbal (65 items),

quantitative (60 items), and nonverbal batteries (65 items) each consist of three subtests that use

different item formats. Universal scale scores on a vertical scale spanning grades K through 12

were used in this study. A previous study of the same dataset indicated that the factor

structure of this test was consistent for ELL and non-ELL students, though the variance of the

verbal factor was attenuated for ELL students (Author, 2010). Another study found that the

reliability of the verbal battery was adequate (Φ = .82) for ELL students, though lower than that

of non-ELL students (Φ = .96). See Lakin and Lai (in press) for a detailed exploration of the

reliability of the CogAT for ELL students.


All tests on the CogAT begin with directions that are read aloud by the teacher. In this

study, teachers read directions in Spanish as well as English when appropriate. All three subtests

of the verbal battery and one subtest of the quantitative battery require the examinee to complete

some reading in English. On the verbal battery, students must read either individual words

(verbal classification and verbal analogies) or short sentences (sentence completion). On the

quantitative relations subtest, students read individual words (e.g., foot, gallon). The other

quantitative tests and all of the nonverbal battery do not require reading.

Achievement test

The Arizona Instrument to Measure Standards Dual Purpose Assessment (AIMS DPA)

was designed to yield normative and criterion-referenced information about student achievement.

Thirty to fifty percent of items on the AIMS DPA come from the TerraNova achievement tests

(CTB/McGraw-Hill, 2002). The remaining items were developed by educators specifically for

the AIMS DPA to better align the test with state educational goals (Arizona Department of

Education, 2006). Reading/language arts and mathematics subtests of the AIMS DPA each

contained approximately 80 items. Separate scale scores are reported for mathematics and

reading.

Procedure

In separate models with year-two reading and math achievement as the criterion, regression analyses explored the incremental prediction of the ability tests when year-one achievement scores were available. The order of entry for predictor variables was based on prior research,

which indicates that the best predictor of future achievement is prior achievement followed by

the ability to reason in the domain and then by general reasoning skills (Lohman, 2009). Thus,

year-one achievement scores entered first, followed by domain-relevant, year-one ability test scores (verbal or quantitative), and finally nonverbal ability scores. Variables for ethnicity (1 =

Hispanic, 0 = non-Hispanic) and ELL status (1 = ELL; 0 = non-ELL) were then entered as a

block. Finally, interaction terms of the ability scores with ELL status and Hispanic background

were entered as a block. To explore the utility of nonverbal tests for ELL students, a separate

series of regressions compared variance accounted for with different combinations of predictors.
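To make this order of entry concrete, a minimal sketch of the blockwise regression follows. It is written in Python under stated assumptions: the column names (y2_math, y1_math, quant, nonverbal, ell, hispanic) and the data file are hypothetical, and the incremental F tests that accompany each R² change in Tables 4 and 5 are omitted for brevity.

```python
# Minimal sketch of the blockwise (hierarchical) regression described above.
# Assumptions: a CSV of student records with hypothetical columns
# y2_math, y1_math, quant, nonverbal, ell (1 = ELL), hispanic (1 = Hispanic).
import pandas as pd
import statsmodels.formula.api as smf

BLOCKS = [
    "y1_math",                                       # 1: prior achievement
    "y1_math + quant",                               # 2: + domain-relevant ability
    "y1_math + quant + nonverbal",                   # 3: + nonverbal ability
    "y1_math + quant + nonverbal + ell + hispanic",  # 4: + group indicators
    "y1_math + quant + nonverbal + ell + hispanic"
    " + ell:quant + ell:nonverbal"
    " + hispanic:quant + hispanic:nonverbal",        # 5: + interactions
]

def blockwise_r2(df: pd.DataFrame, criterion: str = "y2_math") -> None:
    """Print R-squared and its increment for each block of predictors."""
    previous = 0.0
    for step, rhs in enumerate(BLOCKS, start=1):
        r2 = smf.ols(f"{criterion} ~ {rhs}", data=df).fit().rsquared
        print(f"Model {step}: R2 = {r2:.3f} (delta = {r2 - previous:.3f})")
        previous = r2

if __name__ == "__main__":
    blockwise_r2(pd.read_csv("bright_horizon_scores.csv"))  # hypothetical file
```

The same scaffold, with the entry order of the ability scores rearranged, underlies the entry-order comparisons reported later in Table 7.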

Design

Interaction terms and regression residuals form the basis for analyzing differences in the

magnitude of the relationships between the predictor tests and the achievement criterion tests. In

the predictive bias framework originally outlined by Cleary (1968; see also Cleary, Humphreys,

Kendrick, & Wesman, 1975), two levels of differential prediction were defined. One type of

differential prediction was defined by an interaction of group membership with predictors in the

regression analysis and reflected bias in the slope of the regression lines. Differences in

regression slopes indicate that the predictors being used are less relevant to the criterion for one

group versus another. In this study, an interaction of the ability test scores with ethnicity or ELL

status might indicate that the tests are less predictive of achievement for those students.

Cleary (1968) defined another type of differential prediction as persistent under- or over-

prediction for one group. This form of differential prediction is detected by analyzing regression

residuals for evidence that one group’s observed criterion scores are significantly higher or lower

than the model predicts (Reynolds, 1982). In the absence of an interaction of group membership

with predictor variables, differences in residuals indicate that the regression slopes for two

groups are nearly parallel, but do not coincide. For this type of differential prediction with

parallel regression lines, Cleary et al. (1975) explained, “the test can be used within each group

with the same accuracy of prediction” (p. 27; see also Reynolds, 1982).
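The two types of differential prediction can be separated in a single model. A minimal sketch follows, again with hypothetical variable names: the coefficient on the group-by-predictor interaction carries the slope test, whereas a significant group main effect alongside a null interaction would instead suggest parallel but non-coincident regression lines.

```python
# Minimal sketch of Cleary's slope test for one predictor, assuming the
# same hypothetical columns as above (y2_reading, y1_reading, verbal, ell).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("bright_horizon_scores.csv")  # hypothetical file

# ell is coded 1 = ELL, 0 = non-ELL, as in the study. A significant
# ell:verbal coefficient would indicate non-parallel slopes (slope bias);
# a significant ell main effect with a null interaction would indicate
# parallel but non-coincident lines (the second type described above).
fit = smf.ols("y2_reading ~ y1_reading + verbal + ell + ell:verbal",
              data=df).fit()
print(fit.summary().tables[1])  # coefficients, t tests, and 95% CIs
```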


Results

Descriptive statistics are reported in Table 2. Mean differences were large between the

ELL and non-ELL groups (-1.0 to -2.1 SD). The differences were largest for verbal reasoning and

math achievement and somewhat smaller for quantitative reasoning and year-one reading

achievement. Nonverbal reasoning scores and reading achievement in year two showed the

smallest differences, though they were still substantial. Mean differences between the two non-

ELL groups were much smaller. Only verbal reasoning and year-one reading showed moderate

effect sizes (-0.6 SD and -0.4 SD, respectively).

[Table 2]

Variance is another important characteristic of score distributions because restricted

range can attenuate correlations with other variables. In Table 2, the ratios of variance are

reported for each test. As an example, on the quantitative battery, the variance ratio of 1.7 for

ELL and non-ELL Hispanic groups indicated that the variance of non-ELL Hispanic students

was 70% greater than the variance for ELL Hispanic students. Across the board, non-ELL

students were much more variable than ELL students, and White students were more variable than Hispanic students. Despite this finding, there was no apparent floor effect in the histograms of test scores. The data for all three groups also satisfied Bracken's (2007) heuristic for floor effects in that the range of scores extended above and below the mean by 2 SDs.
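As a concrete check on these descriptive comparisons, the sketch below recomputes the quantitative-battery contrast between the two Hispanic groups from the Table 2 summaries. The pooled-SD form of Cohen's d is an assumption, since the paper does not state which pooling formula was used.

```python
# Recomputing the Table 2 quantitative-battery contrast for Hispanic ELL
# (M = 160.0, SD = 14.0, N = 124) vs. Hispanic non-ELL (M = 182.0,
# SD = 18.3, N = 161). The pooled-SD form of Cohen's d is assumed here.
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

d = cohens_d(160.0, 14.0, 124, 182.0, 18.3, 161)
ratio = 18.3**2 / 14.0**2  # non-ELL variance relative to ELL variance

print(f"Cohen's d = {d:.2f}")           # about -1.3, in line with Table 2's -1.4
print(f"variance ratio = {ratio:.2f}")  # about 1.71, the 1.7 reported above
```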

Patterns of Correlations

Hispanic ELL students had substantially lower correlations between tests (see Table 3), which may be related to their restricted variability in scores. Despite this, the pattern of correlations between achievement and ability tests was consistent with previous research. For all three groups of students, math achievement correlated most strongly with quantitative reasoning, and reading achievement correlated most strongly with verbal reasoning. Even for ELL students, nonverbal ability scores had significantly lower correlations¹ with year-one achievement than verbal had with reading and quantitative had with math. Furthermore, the relationship between the ability scores and achievement remained strong through year two.

¹ Using a Fisher r-to-z transformation (p < .05). See Hays (1994).

[Table 3]
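The Fisher r-to-z comparison described in the footnote can be sketched as follows; treating two correlations from the same students as independent samples is a simplification implied by that method. The example uses the ELL group's verbal-reading (r = .60) and nonverbal-reading (r = .39) correlations from Table 3.

```python
# Fisher r-to-z test for the difference between two correlations, as in
# the footnote. Example: the ELL group's verbal-reading (r = .60) vs.
# nonverbal-reading (r = .39) correlations from Table 3, N = 128 each.
import math

def fisher_r_to_z_test(r1, n1, r2, n2):
    """Two-tailed z test comparing two correlation coefficients."""
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))         # two-tailed p value
    return z, p

z, p = fisher_r_to_z_test(0.60, 128, 0.39, 128)
print(f"z = {z:.2f}, p = {p:.3f}")  # about z = 2.22, p = .026
```

Consistent with the note to Table 3, a difference of .21 between correlations of this size is significant at the .05 level.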

Multiple Regressions Including Year-One Achievement

The strong correlations between ability scores and year-two achievement highlight their

relevance to future academic success. However, ability tests can also provide incremental

prediction beyond the data that schools already have—namely, previous achievement test scores.

Thus, a series of regression models tested the incremental prediction of year-two achievement

from both ability scores and year-one achievement.

Math Achievement

Year-one math achievement accounted for 64% of the variance in year-two math

achievement. See Table 4. Quantitative reasoning added an additional 6% to the variance

accounted for, and nonverbal added an additional 1%. When ELL status and ethnicity (White vs.

Hispanic) entered the model, they did not account for an appreciable amount of variance.

Interaction variables between ethnicity and the ability scores also did not contribute to

prediction, indicating that the regression slope was the same for all three groups.

[Table 4]

Reading Achievement

Year-one reading achievement accounted for 70% of the variance in year-two achievement. See Table 5.

Verbal reasoning added an additional 1% to the variance explained, but nonverbal reasoning


failed to improve prediction any further. ELL status and ethnicity accounted for a significant but

negligible amount of variance (less than 1%). In the final model, neither coefficient was statistically significant. Inspection of the coefficients before the interactions were added indicated that ELL status had a slight negative effect on achievement (b = -.10).

Interactions between ELL status and test scores failed to add significantly to the prediction of

year-two reading achievement.

[Table 5]

Residuals for overall regression

To test for differential prediction in the absence of group interactions, regression residuals across groups were analyzed with a one-way ANOVA to detect consistent under- or over-prediction for one group (Reynolds, 1982). The same regression analyses for reading and mathematics achievement were repeated, and residuals were recorded from models that excluded the ELL and ethnicity effects. Means and SDs are reported in Table 6. For math achievement, there

was no main effect for residuals, indicating that the three groups of students did not vary

significantly in the fit of the regression model. For reading achievement, however, there was a

significant effect, F(2, 373) = 3.80, p < .025. Follow-up tests using Tukey’s comparisons

indicated that there was significant, though slight, under-prediction of reading achievement for

Hispanic non-ELL students in year two of around 6 points on the reading achievement scale (an

effect size of about .17). On average, both White non-ELL and Hispanic ELL students showed

over-prediction for reading achievement.

[Table 6]
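A sketch of this residual analysis follows, again under assumed column names: fit the common model without group terms, then test the residuals by group with a one-way ANOVA and Tukey follow-ups.

```python
# Sketch of the residual analysis, assuming hypothetical columns
# (y2_reading, y1_reading, verbal, nonverbal) and a 'group' label with
# values such as 'Hispanic ELL', 'Hispanic non-ELL', 'White non-ELL'.
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

df = pd.read_csv("bright_horizon_scores.csv")  # hypothetical file

# Common regression without ELL/ethnicity terms, as described above.
fit = smf.ols("y2_reading ~ y1_reading + verbal + nonverbal", data=df).fit()
df["resid"] = fit.resid

# Omnibus test for group differences in mean residuals...
groups = [g["resid"].to_numpy() for _, g in df.groupby("group")]
print(f_oneway(*groups))

# ...followed by Tukey HSD pairwise comparisons.
print(pairwise_tukeyhsd(df["resid"], df["group"]))
```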


Alternative Predictive Models for ELL Students

Given the lower correlations between ability and achievement for ELL students, and the

arguments by some researchers that nonverbal tests provide more valid information about the

abilities of ELL students, combinations of ability scores were explored to see whether they

improved predictive validity for ELL students. See Table 7. For math achievement, quantitative

reasoning added the most predictive variance: 42% when entered first and 14% incrementally when nonverbal was entered first. For reading achievement, verbal reasoning added the most predictive variance: 28% when entered first and 13% incrementally when nonverbal was entered first. When entered first, nonverbal ability accounted for just 31% of the variance in math achievement and 17% of the variance in reading achievement. When entered second, nonverbal ability added just 3% (math) or 2% (reading).

[Table 7]

Discussion

The research questions addressed (1) the presence of mean differences, (2) the pattern of

correlations between ability and achievement tests across groups, and (3) the interaction of

nonverbal tests with ELL and Hispanic group membership. Large mean differences were found

between the observed test scores for ELL and non-ELL students, while small-to-negligible

differences were found between Hispanic and White non-ELL students. For math achievement,

these differences translated into a small, but significant, positive main effect for Hispanic

students in the regression analysis, indicating that their year-two achievement scores were higher than those for White and ELL students with similar achievement and ability scores in year one. For

reading achievement, the tests indicated a small negative main effect of ELL status, indicating that ELL students’ scores were lower than those of the other two groups when controlling for prior

achievement and ability. Neither main effect appeared practically significant.

An interaction between ELL or Hispanic variables and ability test scores would indicate

differential prediction between the three groups. However, none of the interaction terms entered

in the final step of the regression analyses were statistically significant. This finding indicates

that the same test variables are similarly important to the prediction of later achievement for all

three groups of students. This conclusion is further supported by the table of observed test

correlations, which showed that the same ability tests were most important for predicting

achievement in all three groups. One contradictory finding came from the analysis of residuals,

which revealed that Hispanic non-ELL students’ reading achievement was somewhat under-

predicted by the common regression line.

Separate analyses explored whether nonverbal ability scores were particularly important

in predicting achievement for ELL students. For math achievement, nonverbal tests were clearly

inferior to quantitative tests in predicting year-two achievement. For reading achievement, verbal

ability was clearly the best predictor for ELL students. In contrast to the recommended use of nonverbal tests for ELL students, nonverbal ability scores did not provide predictive validity comparable to that of quantitative or verbal ability and did not add much

incremental prediction beyond those scores even for ELL students. Thus, although nonverbal

ability tests can play an important role as part of an assessment battery, their relationship to

current and future achievement is not as strong as for verbal and quantitative ability tests.

Therefore, for teachers seeking guidance on how best to adapt instruction to the cognitive

strengths of their ELL students, this study provides evidence that, overall, nonverbal tests do not

provide superior information about the cognitive strengths and academic promise of ELL


students. As with non-ELL students, the most relevant information comes from verbal reasoning

for reading domains and quantitative reasoning for mathematics domains.

It would be reasonable to expect these results to generalize to other nonverbal ability tests

that are primarily unidimensional. The CogAT nonverbal battery consists of three item formats:

figure analogies, figure classification, and paper folding. The figure analogies format is related to

the item formats used by the Naglieri Nonverbal Ability Test and Raven’s Progressive Matrices

and shows strong convergent validity with those tests (Lohman, Korb, & Lakin, 2008). On this

basis, it is reasonable to assume that these findings would generalize to those tests.

Implications and Directions for Future Research

The consistency of the regression slope between ELL and non-ELL students indicated

that the tests provide similar information about the future achievement of all three groups of

students. For educators seeking to differentiate instruction, verbal and quantitative reasoning

tests show equally strong predictive accuracy for reading and mathematics achievement,

respectively. In this study, nonverbal measures did not provide an effective alternative and were

less useful for making decisions about which students are most likely to succeed in traditional

academic domains relative to other students with similar linguistic and cultural backgrounds.

The main effects of ELL status for reading achievement and Hispanic background for

math achievement, in addition to the slight under-prediction of reading achievement for non-ELL Hispanic students, indicate that the use of those scores requires careful interpretation. As Cleary

et al. (1975) explained, despite the presence of main effects in the regression (or mean

differences in observed scores), “when the [regression] lines are parallel, the test can be used

within each group with the same accuracy of prediction” (p. 27; see also Reynolds, 1982).

Recently, there have been innovations in making appropriate inferences about ability when using


tests that are affected by opportunity to learn or access to the curriculum. This is discussed in the

next section.

Normative inferences based on opportunity to learn

The common misconception that ability tests measure innate intelligence often leads to

the (mistaken) conclusion that mean differences must either be interpreted as immutable group

differences in intelligence or test bias (Jensen, 1980; Lohman, 2006a). In fact, ability tests

measure developed capabilities that are impacted by educational experience and opportunity to

learn (Anastasi, 1980; Martinez, 2000). This does not negate their utility for making inferences

about students’ intellectual capacity as long as opportunity to learn is taken into account. In fact,

comparing the performance of ELL students to appropriate norm groups (i.e., those with similar

opportunities to learn) is critical for making valid inferences about the cognitive abilities of ELL

students (Author, 2010). Comparing ELL students to national norms based on predominantly

non-ELL students will not provide appropriate inferences about the skills of ELL students.

Two strategies have recently been suggested to account for group differences that likely

reflect different degrees of opportunity to learn. Lohman (2006b, 2009) proposed the use of local

subgroup norms to provide a rudimentary adjustment for opportunity to learn when identifying

students for gifted programs and talent development. Weiss, Saklofske, Prifitera, and Holdnack

(2006) used national subgroup norms based on proxies for acculturation, including years in U.S.

schools, to provide multiple perspectives on student scores for the WISC-IV. Contextualizing

student scores with multiple norm comparisons can identify students from minority cultural or

linguistic backgrounds who excel relative to their educational opportunities even when they may

not compare favorably to the national norms (Callahan, 2009; Gándara, 2005; Weiss et al.,

2006). Lohman (2006b, 2009) provides practical guidance as to how local norms can be


developed and used by teachers for the identification of students for gifted and talented

programs. Additional research is needed to expand the use of local norms to instructional

differentiation as well as to explore the practicality and political feasibility of these solutions.
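A minimal sketch of the local-norms idea follows, assuming hypothetical column names and a school-by-ELL-status grouping as the opportunity-to-learn cell; real implementations would also need minimum cell sizes and the multiple-perspective reporting described above.

```python
# Minimal sketch of local subgroup norms: each student's score re-expressed
# as a percentile rank within a hypothetical opportunity-to-learn cell
# (here, school by ELL status). Column names are assumptions.
import pandas as pd

df = pd.read_csv("district_scores.csv")  # hypothetical file

# Rank against everyone in the file (a stand-in for national norms)...
df["overall_pr"] = df["verbal"].rank(pct=True) * 100

# ...and against students with similar opportunity to learn.
df["local_pr"] = (df.groupby(["school", "ell_status"])["verbal"]
                    .rank(pct=True) * 100)

# A student who looks unremarkable overall may stand out locally.
print(df[["verbal", "overall_pr", "local_pr"]].head())
```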

Instructional differentiation

Although this study found that a multi-battery test provides useful information

about the cognitive abilities of ELL and non-ELL students, it does not follow that all students

identified with, for example, verbal strengths require the same instructional interventions

(Callahan, 2009). Recommendations for non-ELL students are already available (Lohman &

Hagen, 2001b). Therefore, additional research is needed to explore appropriate instructional

differentiation for ELL students.² In fact, a wide range of research is needed to guide teachers’ use of assessment data to make appropriate educational decisions for ELL students (Young, 2009).

² It should be noted that efforts to capitalize on the apparent nonverbal strengths of ELL students (sometimes misconstrued as spatial strengths) neglect the impact of opportunity to learn on ability scores. Many ELL students may in fact have relative strengths in verbal and quantitative reasoning that are obscured by use of national norms.

Limitations

Although no evidence of significant differential prediction was found in this study, there

may be other undetected sources of bias. For instance, if there is bias in both the predictor and

criterion, it will not affect the correlations (Cronbach, 1970). Given the central role that

achievement tests play in the modern educational system, bias in the criteria of this study

(reading and math achievement) deserves critical analysis that is beyond the scope of this paper.

Another important limitation is the unusual ethnic makeup of this study. Less than one-

third of the sample was White, which makes their relative weight in determining the shape of the

common regression line smaller than it would be in schools with a majority of White students.


However, the regression lines for all three groups were nearly identical. To the extent that the

White students in this sample are similar to the population of White students in the U.S. as a

whole, there is no reason to expect that the common regression line would be much different if

White students made up a larger proportion of the sample.

Finally, it should be noted that the choice of assessment should depend on the type of

instructional differentiation being considered (Callahan, 2009). This study focused on traditional

academic domains and thus the CogAT was appropriate for predicting success. However, if a

talent development program were targeting skills beyond general reasoning and verbal and

quantitative domains, other tests might be more appropriate. Multiple indicators of student

aptitude are always critical to making decisions about gifted and talented program placement.

Conclusion

This study confirmed that within ELL groups and Hispanic and White ethnic groups,

multi-battery ability tests provide useful and valid information about the future performance of

students. The exclusive use of nonverbal tests does not appear warranted when assessing ELL

students with some level of English proficiency and when interpreting scores using appropriate

normative comparisons. In fact, assessing the verbal reasoning skills of ELL students may be

particularly helpful for teachers. Verbal reasoning skills, which include the ability to make sense

of incomplete verbal information, are critical for the academic success of ELL students, who must

constantly leverage these skills to make sense of teachers, other students, and reading materials.

Knowledge about which students struggle to make connections within verbal information may

help teachers target those students for additional linguistic support. Although verbal reasoning

scores for ELL students have limitations in their psychometric qualities relative to scores for

Running head: ABILITY TESTS AND LINGUISTICALLY DIVERSE GROUPS 20

non-ELL students, they are still very useful in this regard. Efforts to improve these measures and

promote appropriate uses should continue.


References

Abedi, J., & Lord, C. (2001). The language factor in mathematics tests. Applied Measurement in

Education, 14, 219-234.

Anastasi, A. (1980). Abilities and the measurement of achievement. New Directions for Testing

and Measurement, 5, 1-10.

Arizona Department of Education. (2006, November). AIMS student guides. Retrieved

September 4, 2007, from http://www.ade.state.az.us/standards/AIMS/AIMSSTGuides/

Author (2010). Multidimensional ability tests and culturally and linguistically diverse students:

Evidence of measurement invariance. Manuscript submitted for publication.

Balboni, G., Naglieri, J.A., & Cubelli, R. (2010). Concurrent and predictive validity of the Raven

Progressive Matrices and the Naglieri Nonverbal Ability Test. Journal of

Psychoeducational Assessment, 28, 222–235. doi: 10.1177/0734282909343763

Borghese, P. (2009). An analysis of predictive, convergent, and discriminant validity of the

Universal Nonverbal Intelligence Test with limited English proficient Mexican-American

elementary students (Doctoral dissertation). Retrieved from ProQuest. (AAT 3351828)

Bracken, B. A. (2007). Creating the optimal preschool testing situation. In B. A. Bracken, & R. J.

Nagle (Eds.), Psychoeducational assessment of preschool children (4th ed., pp. 137-154).

Mahwah, NJ: Lawrence Erlbaum Associates.

Bracken, B.A., & McCallum, R.S. (1998). Universal Nonverbal Intelligence Test examiner’s

manual. Itasca, IL: Riverside.

Braden, J.P. (2000). Editor’s introduction: Perspectives on the nonverbal assessment of

intelligence. Journal of Psychoeducational Assessment, 18, 204-210.


Callahan, C.M. (2009). Myth 3: A family of identification myths: Your sample must be the same

as the population. There is a "silver bullet" in identification. There must be "winners" and

"losers" in identification and programming. Gifted Child Quarterly, 53, 239-241.

Cleary, T.A. (1968). Test bias: Prediction of grades of Negro and White students in integrated

colleges. Journal of Educational Measurement, 5, 115-124.

Cleary, T.A., Humphreys, L.G., Kendrick, S.A., & Wesman, A. (1975). Educational uses of tests

with disadvantaged students. American Psychologist, 30, 15-41.

Cronbach, L. J. (1970). Essentials of psychological testing (3rd ed.). New York: Harper & Row.

CTB/McGraw-Hill. (2002). TerraNova, the Second Edition. Monterey, CA: Author.

Figueroa, R.A. (1989). Psychological testing of linguistic-minority students: Knowledge gaps

and regulations. Exceptional Children, 56, 145-152.

Gándara, P. (2005). Fragile futures: Risk and vulnerability among Latino high achievers.

Princeton, NJ: Educational Testing Service.
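
Hays, W. L. (1994). Statistics (5th ed.). Fort Worth, TX: Harcourt Brace College Publishers.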

Harcourt Educational Measurement. (2003). Stanford English Language Proficiency Test. San

Antonio, TX: Author.

Jensen, A. R. (1980). Bias in mental testing. New York, NY: The Free Press.

Jones, C.K. (2006). The relationship of language proficiency, general intelligence, and reading

achievement with a sample of low performing, limited English proficient students

(Doctoral dissertation). Retrieved from ProQuest. (AAT 3296415)

Kaufman, A.S., & Kaufman, N.L. (1983). Kaufman Assessment Battery for Children (K-ABC). Circle Pines, MN: American Guidance Service.


Lakin, J.M., & Lai, E.R. (in press). Multi-group generalizability analysis of verbal, quantitative,

and nonverbal ability tests for culturally and linguistically diverse students. Educational

and Psychological Measurement.

Lakin, J.M., & Lohman, D.F. (in press). The predictive accuracy of verbal, quantitative, and

nonverbal reasoning tests: Consequences for talent identification and program diversity.

Journal for the Education of the Gifted.

Lewis, J. D. (2001). Language isn't needed: Nonverbal assessments and gifted learners. Paper presented at Growing Partnerships for Rural Special Education, San Diego, CA.

Lohman, D. F. (2001, November). Aptitude for college: The importance of reasoning tests for

minority admissions. Talk given at Rethinking the SAT: The future of standardized testing

in university admissions. University of California at Santa Barbara. Retrieved from

http://faculty.education.uiowa.edu/dlohman/

Lohman, D. F. (2006a). Beliefs about differences between ability and accomplishment: From

folk theories to cognitive science. Roeper Review, 29, 32-40.

Lohman, D. F. (2006b). Practical advice on using the Cognitive Abilities Test as part of a talent

identification system. Retrieved from http://faculty.education.uiowa.edu/dlohman/

Lohman, D.F. (2009). Identifying academically talented students: Some general principles, two

specific procedures. In L. Shavinina (Ed.), International handbook on giftedness, (pp.

971-997). New York, NY: Springer.

Lohman, D. F., & Hagen, E. P. (2001a). Cognitive Abilities Test (Form 6). Itasca, IL: Riverside.

Lohman, D. F. & Hagen, E. P. (2001b). Cognitive Abilities Test (Form 6): Interpretive guide for

teachers and counselors. Itasca, IL: Riverside.


Lohman, D. F., & Hagen, E. P. (2002). Cognitive Abilities Test (Form 6): Research handbook.

Itasca, IL: Riverside.

Lohman, D. F., Korb, K. A., & Lakin, J. M. (2008). Identifying academically gifted English-

language learners using nonverbal tests: A comparison of the Raven, NNAT, and CogAT.

Gifted Child Quarterly, 52(4), 275-296.

Martinez, M.E. (2000). Education as the cultivation of intelligence. Mahwah, NJ: Lawrence

Erlbaum Associates.

McCallum, R. S., Bracken, B. A., & Wasserman, J. D. (2001). Essentials of nonverbal

assessment. Hoboken, NJ: Wiley.

Naglieri, J. A. (1996). Naglieri Nonverbal Ability Test (NNAT). San Antonio, TX: Harcourt

Brace Educational Measurement.

Naglieri, J. A., & Ford, D. Y. (2003). Addressing underrepresentation of gifted minority children

using the Naglieri Nonverbal Ability Test (NNAT). Gifted Child Quarterly, 47, 155-160.

Naglieri, J. A., & Ronning, M. E. (2000). The relationship between general ability using the

Naglieri Nonverbal Ability Test (NNAT) and Stanford Achievement Test (SAT) reading

achievement. Journal of Psychoeducational Assessment, 18, 230–239.

Ortiz, S. O., & Dynda, A.M. (2005). Use of intelligence tests with culturally and linguistically

diverse populations. In D. P. Flanagan, & P. L. Harrison (Eds.), Contemporary

Intellectual Assessment: Theories, Tests, and Issues (2nd ed., pp. 545-556). New York:

Guilford Press.

Palmer, D.J., Olivarez, A., Willson, L.V., & Fordyce, T. (1989). Ethnicity and language

dominance-influence on the prediction of achievement based on intelligence test scores in

nonreferred and referred samples. Learning Disability Quarterly, 12, 261-274.


Patterson, B.F., Mattern, K.D., & Kobrin, J.L. (2007). Validity of the SAT for predicting FYGPA:

2007 SAT validity sample [Statistical Report]. New York, NY: College Board.

Raven, J. C., Court, J. H., & Raven, J. (1996). Manual for Raven’s Progressive Matrices and

Vocabulary Scales: Section 3. Standard Progressive Matrices. Oxford, UK: Oxford

Psychologists Press.

Reynolds, C.R. (1982). Methods for detecting construct and predictive bias. In R.A. Berk,

Handbook of methods for detecting test bias (pp. 199-227). Baltimore, MD: Johns

Hopkins University Press.

Sattler, J.M. (2008). Assessment of children: Cognitive foundations (5th edition). La Mesa, CA:

Author.

Wechsler, D. (1974). Wechsler Intelligence Scale for Children-Revised (WISC-R). New York:

Psychological Corporation

Weiss, L. G., Saklofske, D.H., Prifitera, A., & Holdnack, J. A. (2006). WISC-IV Advanced

Clinical Interpretation. Burlington, MA: Elsevier.

Young, J.W. (2009). A framework for test validity research on content assessments taken by English language learners. Educational Assessment, 14, 122-138.


Table 1

Breakdown of Sample by Demographic Category

                                            Percent
Ethnicity            Total N   Female   FRL   Home lang.   Grade 3   Grade 4
Hispanic ELL           128       45      98      100          45        35
Hispanic non-ELL       161       55      94       16          20        38
White non-ELL           72       44      44        4          19        36

Note. FRL = eligible for free or reduced-price lunch. Home lang. = primary home language other than English.


Table 2

Descriptive Statistics for ELL and Ethnic Groups

Means (SDs)
                            ELL        CogAT                          AIMS DPA
                   Grade  prog. yrs  Verbal   Quant.  Nonverbal  Y1 Rdg  Y1 Math  Y2 Rdg  Y2 Math
Hispanic ELL        3.8     3.9      150.7    160.0    173.0     419.3   409.6    450.1   431.5
(N = 124)          (0.8)   (1.5)     (11.2)   (14.0)   (17.7)    (32.0)  (31.1)   (39.0)  (31.6)
Hispanic non-ELL    4.2              177.5    182.0    192.7     468.4   470.5    498.1   485.0
(N = 161)          (0.8)             (16.9)   (18.3)   (17.8)    (39.8)  (37.3)   (44.5)  (34.7)
White non-ELL       4.3              190.4    186.0    197.4     487.6   481.2    503.1   489.9
(N = 72)           (0.8)             (24.5)   (21.4)   (21.8)    (50.1)  (49.4)   (59.9)  (41.6)

Cohen's d effect sizes
                                     Verbal   Quant.  Nonverbal  Y1 Rdg  Y1 Math  Y2 Rdg  Y2 Math
Hispanic ELL - Hispanic non-ELL       -1.9     -1.4     -1.1      -1.4    -1.8     -1.1    -1.6
Hispanic ELL - White non-ELL          -2.1     -1.4     -1.2      -1.6    -1.7     -1.0    -1.6
Hispanic non-ELL - White non-ELL      -0.6     -0.2     -0.2      -0.4    -0.2     -0.1    -0.1

Variance ratios
                                     Verbal   Quant.  Nonverbal  Y1 Rdg  Y1 Math  Y2 Rdg  Y2 Math
Hispanic non-ELL / Hispanic ELL       2.26     1.70     1.01      1.55    1.44     1.30    1.20
White non-ELL / Hispanic ELL          4.78     2.34     1.51      2.44    2.52     2.36    1.74
White non-ELL / Hispanic non-ELL      2.12     1.37     1.49      1.58    1.75     1.81    1.44

Note. ELL prog. yrs = years in the ELL program; reported for the ELL group only.


Table 3

Correlations Between Tests in Year 1 and 2 Across ELL and Ethnic Groups

                                 CogAT                     AIMS DPA
                      Verbal   Quant   Nonverb.   Y1 Mth   Y1 Rdg   Y2 Mth
Hispanic ELL
(N = 128)
  Quant                0.60
  Nonverbal            0.58    0.67
  Y1 Mth               0.53    0.70     0.58
  Y1 Rdg               0.60    0.50     0.39       0.63
  Y2 Mth               0.54    0.64     0.55       0.65     0.51
  Y2 Rdg               0.54    0.47     0.41       0.52     0.67     0.68
Hispanic non-ELL
(N = 161)
  Quant                0.62
  Nonverbal            0.55    0.65
  Y1 Mth               0.69    0.76     0.68
  Y1 Rdg               0.78    0.59     0.55       0.78
  Y2 Mth               0.60    0.78     0.64       0.78     0.65
  Y2 Rdg               0.67    0.54     0.47       0.66     0.77     0.69
White non-ELL
(N = 72)
  Quant                0.83
  Nonverbal            0.80    0.79
  Y1 Mth               0.85    0.90     0.76
  Y1 Rdg               0.83    0.75     0.69       0.84
  Y2 Mth               0.74    0.80     0.71       0.79     0.68
  Y2 Rdg               0.74    0.66     0.62       0.69     0.75     0.77

Note. At these sample sizes, for correlations in the .40-.80 range, differences of .15 or greater are significant at the .05 level.


Table 4

Multiple Regression for Math Achievement in Year 2

Comparison of models

Model   R       R2      ΔR2     ΔF          Δdf
1 a     0.800   0.640   0.640   662.64**    1, 373
2 b     0.836   0.698   0.058    72.09**    1, 372
3 c     0.840   0.705   0.007     8.32**    1, 371
4 d     0.842   0.709   0.004     2.74      2, 369
5 e     0.843   0.710   0.001     0.26      4, 365

Coefficients for final model f

                  Beta    95% CI lower   95% CI upper
Y1 Math            0.39        0.29           0.54
Quant              0.41        0.47           1.56
Nonverbal          0.16       -0.08           0.86
ELL                0.13      -61.37          89.24
Hispanic           0.26      -42.31         109.01
Interactions
  ELL x Q         -0.22       -0.70           0.41
  ELL x N          0.09       -0.43           0.54
  Eth x Q         -0.03       -0.61           0.57
  Eth x N         -0.16       -0.68           0.46

Notes. a Model with Y1 AIMS Math. b Add CogAT Quantitative. c Add CogAT Nonverbal. d Add Hispanic and ELL variables. e Add interaction terms. f The final model included Y1 math achievement, CogAT Quantitative and Nonverbal scores, Hispanic and ELL variables, and interaction terms between the CogAT scores and Hispanic/ELL categories. * p < .05. ** p < .01.


Table 5

Multiple Regression for Reading Achievement in Year 2

Comparison of models

Model   R       R2      ΔR2     ΔF          Δdf
1 a     0.839   0.704   0.704   889.24**    1, 374
2 b     0.847   0.718   0.014    18.89**    1, 373
3 c     0.849   0.721   0.002     3.24      1, 372
4 d     0.853   0.728   0.007     4.68**    2, 370
5 e     0.854   0.729   0.002     0.53      4, 366

Coefficients for final model f

                  Beta    95% CI lower   95% CI upper
Y1 Reading         0.59        0.43           0.63
Verbal             0.15       -0.09           0.68
Nonverbal          0.05       -0.28           0.47
ELL               -0.32      -99.40          38.98
Hispanic          -0.16      -79.67          44.74
Interactions
  ELL x V         -0.03       -0.53           0.49
  ELL x N          0.26       -0.23           0.51
  Eth x V          0.22       -0.29           0.57
  Eth x N         -0.02       -0.46           0.44

Notes. a Model with Y1 AIMS Reading. b Add CogAT Verbal. c Add CogAT Nonverbal. d Add Hispanic and ELL variables. e Add interaction terms. f The final model included Y1 reading achievement, CogAT Verbal and Nonverbal scores, Hispanic and ELL variables, and interaction terms between the CogAT scores and Hispanic/ELL categories. * p < .05. ** p < .01.


Table 6

Descriptives for Mean Residuals

                      Mathematics           Reading
                      M        SD         M        SD
Hispanic ELL         0.97     27.83     -3.08     22.43
Hispanic non-ELL     2.16     25.33      3.71     22.40
White non-ELL       -6.31     33.23     -2.56     26.50


Table 7

Multiple Regression of Achievement for ELL Students

Predicting Math Achievement
Order of Entry     R      ΔR2        Order of Entry     R      ΔR2
Quantitative      0.65    0.42       Nonverbal         0.56    0.31
Nonverbal         0.67    0.03       Quantitative      0.67    0.14
Verbal            0.68    0.02       Verbal            0.68    0.02

Predicting Reading Achievement
Order of Entry     R      ΔR2        Order of Entry     R      ΔR2
Verbal            0.53    0.28       Nonverbal         0.42    0.17
Nonverbal         0.55    0.02       Verbal            0.55    0.13
Quantitative      0.57    0.02       Quantitative      0.57    0.02