
Do Language Proficiency Test Scores Differ by Gender?
Author(s): Cindy L. James
Source: TESOL Quarterly, Vol. 44, No. 2 (June 2010), pp. 387-398
Published by: Teachers of English to Speakers of Other Languages, Inc. (TESOL)
Stable URL: http://www.jstor.org/stable/27896732


Do Language Proficiency Test Scores Differ by Gender?

CINDY L. JAMES
Thompson Rivers University
Kamloops, British Columbia, Canada

doi: 10.5054/tq.2010.222215

Most postsecondary educational institutions employ some type of language proficiency assessment for international applicants to assess their language skills (Alderson, Krahnke, & Stansfield, 1987; Chalhoub-Deville & Turner, 2000; Kahn, Butler, Weigle, & Sato, 1994; Paltridge, 1992; Person, 2002; Rees, 1999; Roemer, 2002; Seaman & Hayward, 2000). The performance of these applicants is of interest to administrators, faculty, staff, and researchers alike, with gender variations being one issue often studied. These types of studies tend to compare the performance of females with males in terms of mean test scores by subtest and/or total test score, and in some cases by specific test questions or types of questions. The score differences are often reported as raw score differences, but to enhance comparability between tests, the differences also can be expressed in a standardized form such as a standard mean difference or a percent difference.

The standard mean difference, denoted by D, is considered by Willingham and Cole (1997) in their meta-analysis of gender and assessment as one of the most common measurements. It is calculated by subtracting the male mean score from the female mean score and dividing the difference by the average standard deviation (SD):

$$D = \frac{\text{female mean} - \text{male mean}}{\text{average SD}}$$

Referring to the classifications by Cohen (1988), values of D from 0.20 to 0.49 indicate small differences, from 0.50 to 0.79 are considered medium differences, and 0.80 or higher equate to large differences.
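The calculation of D and the Cohen (1988) bands quoted above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, "average SD" is assumed to mean the arithmetic mean of the two group standard deviations, the label for values under 0.20 is an assumption (the text does not name that band), and the input statistics are invented rather than taken from any cited study.

```python
def standard_mean_difference(female_mean, male_mean, female_sd, male_sd):
    """D = (female mean - male mean) / average SD."""
    # Assumption: "average SD" is the simple mean of the two group SDs.
    average_sd = (female_sd + male_sd) / 2
    return (female_mean - male_mean) / average_sd


def cohen_label(d):
    """Classify the size of D using the Cohen (1988) bands quoted in the text."""
    size = abs(d)  # the sign only shows which group scored higher
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"  # assumed label; the text gives no name for this band


# Hypothetical group statistics, not taken from any cited study.
d = standard_mean_difference(female_mean=75.0, male_mean=70.0,
                             female_sd=18.0, male_sd=22.0)
print(round(d, 2), cohen_label(d))  # 0.25 small
```

A positive D indicates a higher female mean and a negative D a higher male mean, which matches the sign convention of the formula.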

When standard deviations are not available, a percent difference can be employed by calculating the difference between the male and female mean and dividing this by the total score of the test and converting the answer to a percentage:

$$\%\ \text{Difference} = \frac{\text{female mean} - \text{male mean}}{\text{total test score}} \times 100\%$$
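As a small worked illustration of the percent difference, the following Python sketch applies the formula to invented numbers. The function name and the input values are hypothetical and are not drawn from any of the reports discussed here.

```python
def percent_difference(female_mean, male_mean, total_test_score):
    """Percent difference = (female mean - male mean) / total test score * 100."""
    return (female_mean - male_mean) / total_test_score * 100


# Hypothetical means on a test scored out of 120.
value = percent_difference(female_mean=75.0, male_mean=70.0, total_test_score=120)
print(f"{value:.2f}%")  # 4.17%
```

Because the denominator is the maximum possible score rather than a measure of spread, this figure can be computed from published means alone, which is why it is used when standard deviations are missing.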

BRIEF REPORTS AND SUMMARIES




To date, many of the gender studies of language proficiency tests have revealed stronger performances by females compared with males, although these differences in general tend to be quite small. For instance, Zeidner (1987) explored the impact of gender and other factors on test scores for an English language aptitude test used for selection and placement at Israeli educational institutions. His research revealed that the mean test scores were significantly different, with females, in general, scoring higher than males. Although standard differences were not provided, it was possible to calculate them post hoc, with the standard mean difference being 0.17 and the percent difference being 1.36%.

Females also scored slightly higher on the academic examination of the International English Language Testing System (IELTS) based on data from 2004 (University of Cambridge, 2006). This testing system includes listening, reading, writing, and speaking sections, and in each case the mean band scores for females were greater than those for males, as was the overall test score. Once again, the standard differences were not included in this IELTS report, but it was possible to calculate post hoc the percent difference with the data provided. For the reading test the percent difference was 1.44%, for the listening and speaking tests it was 1.56%, for the writing test the difference was 1.78%, and for the total test score it was 1.67%.

Similarly, a data report by the Educational Testing Service (2007) for the Test of English as a Foreign Language Internet-Based Test (TOEFL iBT) revealed that, overall, females scored marginally higher than males on tests conducted between September 2005 and December 2006. At the individual test level, the mean test scores for females were higher than those for males on three of the four sections (the listening, speaking, and writing tests), but the reverse was true for the reading test. This TOEFL iBT score report provided means and standard deviations, so it was possible to calculate standard differences post hoc. The standard mean and percentage differences between females and males on the subtests were as follows: 0.04 and 1.0% for listening, 0.21 and 3.7% for speaking, 0.09 and 1.7% for writing, and -0.05 and -1.3% for reading. This equated to an overall standard mean difference of 0.04 and a percent difference of 0.8%.

Similar results were reported for the Michigan English Language Assessment Battery (MELAB), with the mean final MELAB score again being higher for females than males, based on scores collected in 2007 (Johnson & Song, 2008). The final MELAB score is calculated by averaging the scores on the three compulsory sections of the MELAB: listening, reading, and writing. Scores by gender for these individual sections were not provided in this report, so post hoc analysis was only possible on the final mean scores, generating a standard mean difference of 0.12 and a percentage difference of 1.2%.

A comparison of female and male mean test scores on the Canadian Academic English Language (CAEL) assessment tests also showed females scoring slightly higher on all four of the language subtests (writing, listening, reading, and speaking), based on candidate scores from 2002 to 2008 (Carleton University, 2009). Post hoc analysis of the mean differences produced values of 0.18 for writing, 0.17 for listening, 0.20 for reading, and 0.23 for speaking. This equated to percentage differences of 3.0% for writing, 2.8% for listening, 3.2% for reading, and 3.8% for speaking.

In terms of the studies that have analyzed gender differences for specific questions or types of questions, results vary. For example, a study conducted by Pae (2004) examining the effect of gender on an English reading comprehension test for Korean learners revealed that items with content relating to mood, impression, and tone tended to be easier for females, whereas passages with logical inferences were easier for males. A study by Takala and Kaftandjieva (2000) involving an English vocabulary test, one of the subtests of the Finnish Foreign Language Certificate Examination, also revealed that some of the test items tended to favour females, whereas others favoured males; however, they concluded that the test as a whole was considered to be gender neutral. Meanwhile, a review of writing assessments used for postsecondary admission by Breland, Bridgeman, and Fowles (1999) found that males tended to do better than females on the multiple-choice subtests of the TOEFL but the females performed better on the essay portion of the TOEFL. Another TOEFL study by Wainer and Lukhele (1997) that investigated gender differences on reading comprehension testlets (sections with several questions based on one reading passage) found that there were essentially no gender differences (greatest difference was



ranged in age from 16 years to 54 years, with an average age of 22.0 years (SD = 3.80). The gender distribution was fairly even, with 57% males and 43% females. The ethnic background of these students was diverse, with representation from 47 different countries.

Testing Tool and Procedures

The Accuplacer ESL testing system is a Web-based program marketed by the College Board that assesses the English skills of students who have learned English as a second or alternate language. It consists of four multiple-choice tests: Reading Skills, which evaluates the student's comprehension of short passages; Language Usage, which measures grammar and usage; Sentence Meaning, which assesses the understanding of word meanings in one- or two-sentence contexts; and Listening, which measures the ability to listen to and understand one or more speakers (College Examination Board, 2007). All of these tests are offered in a computer-adaptive test format where the pool of test items is calibrated for difficulty and content. When the candidate starts each test, she or he is presented with a question of average difficulty randomly selected from several starter questions of the same level of difficulty. If the examinee answers the question correctly, the next question to be administered is chosen from a group of somewhat more difficult questions, whereas an incorrect answer will cause the next question to be somewhat easier. Each of the four tests consists of 20 questions offered in this adaptive format and scored on a scale of 0 to 120.
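The step-up and step-down rule described above can be illustrated with a short Python sketch. It is a deliberately simplified stand-in and not the Accuplacer algorithm, whose item selection and 0 to 120 scoring are not described in this report. The function name, the integer difficulty levels, and the item pool are all hypothetical, and items are drawn with replacement for brevity.

```python
import random


def run_adaptive_test(item_pool, answer_fn, num_questions=20, rng=None):
    """Administer questions with a simple step-up/step-down difficulty rule.

    item_pool: dict mapping a difficulty level (int) to a list of item ids.
    answer_fn: callable taking an item id, returning True if answered correctly.
    Returns a list of (difficulty, item, correct) tuples in administration order.
    """
    if rng is None:
        rng = random.Random()
    levels = sorted(item_pool)
    index = len(levels) // 2  # start at the middle ("average") difficulty level
    administered = []
    for _ in range(num_questions):
        # Simplification: items are drawn with replacement, so repeats can occur.
        item = rng.choice(item_pool[levels[index]])
        correct = answer_fn(item)
        administered.append((levels[index], item, correct))
        if correct:
            index = min(index + 1, len(levels) - 1)  # somewhat harder next
        else:
            index = max(index - 1, 0)  # somewhat easier next
    return administered


# Hypothetical five-level pool with three items per level.
pool = {level: [f"L{level}-Q{i}" for i in range(1, 4)] for level in range(1, 6)}
history = run_adaptive_test(pool, answer_fn=lambda item: True,
                            num_questions=5, rng=random.Random(0))
print([difficulty for difficulty, _, _ in history])  # [3, 4, 5, 5, 5]
```

In this toy run an examinee who answers every question correctly starts at the middle level and climbs to the hardest level, where the sketch simply keeps drawing items. The sketch does not attempt to convert the response pattern into a scaled score.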

The Accuplacer ESL testing system also includes the WritePlacer ESL test, which requires the examinee to complete a writing sample based on a randomly assigned prompt. This test is scored by IntelliMetric, an electronic scoring system that utilizes a holistic scoring process designed to emulate human scorers, rating each essay on a scale of 0 to 6 based on its overall effectiveness (Elliot, 2003). Detailed descriptions of this testing system and each test can be obtained from the Accuplacer Web site (College Examination Board, 2008).

The Accuplacer ESL testing system was adopted by TRU for several reasons, including its ability to assess a wide range of skills because of its computer-adaptive format and its ability to test all relevant skill areas. Other reasons related to the efficiency and cost of the system: namely, it was fairly straightforward to administer, the results were available immediately, and it was relatively inexpensive.

The Accuplacer ESL tests were administered in computer labs by the TRU Assessment Centre staff to international students during their first week of orientation at TRU, before classes began. For the multiple-choice tests there was no time limit; however, for the writing sample a 1-hour time limit was imposed. In most cases, students took approximately 2 hours to complete all five tests; however, some took only 90 minutes, and others took a full 3 hours.

Data Collection and Analysis

The raw test scores for each of the five Accuplacer ESL tests were collected from students who wrote the assessment during the international intakes (fall, winter, and summer) from May 2006 to May 2008, along with basic demographic information. To determine whether there were any differences in mean scores by gender for the five Accuplacer ESL tests, an analysis of variance (ANOVA) was employed. The standard mean difference, percentage differences, and intercorrelations also were calculated for comparison purposes. A line graph of the mean test scores by gender was generated to provide a pictorial representation of the mean score distributions. The WritePlacer ESL test scores, which are reported on a scale of 0 to 6, were multiplied by a factor of 20 so that they too would be out of a total of 120 and hence could be plotted on the same line graph with the other tests.

RESULTS

The descriptive and mean difference statistics for the test scores by gender are provided in Table 1. The distributions of the test scores were normal or near normal with slight skewness. For every test, the females scored higher than the males; however, based on the percentage difference and the standard mean difference, these differences were quite small, as displayed in Table 1.

TABLE 1
Descriptive and Mean Difference Statistics for Accuplacer ESL Tests by Gender

Gender            Statistic   Listening   Language usage   Reading skills   Sentence meaning   WritePlacer ESL
Female (n = 211)  Mean        67.7*       87.3*            82.5*            80.8               2.87*
                  SD          16.23       20.88            21.54            22.42              2.788
Male (n = 283)    Mean        66.1        81.7*            77.6*            75.6               2.60*
                  SD          19.39       24.65            25.63            26.00              1.773
Maximum score                 120         120              120              120
% Difference                  1.42%       4.67%            4.08%            4.33%              4.50%
D                             0.095       0.245            0.207            0.214              0.116

Note. SD = standard deviation; D = standard mean difference.
*Slightly skewed distributions.


FIGURE 1. ACCUPLACER ESL mean scores by gender.

A pictorial representation of the mean test scores for the Accuplacer ESL tests by gender is provided in Figure 1, and the ANOVA analysis is provided in Table 2. As shown in Figure 1, both females and males scored highest on the Language Usage test and lowest on the WritePlacer ESL test. The ANOVA analysis revealed that the differences between mean test scores were not significant for the Listening or WritePlacer ESL tests, but were significant for the Language Usage, Reading Skills, and Sentence Meaning tests (Table 2).

The Pearson correlations between the five tests by gender produced large (Cohen, 1988), significant correlations for both females and males and all combinations of tests at the 0.01 level (2-tailed; Table 3). In all cases, the male subgroup produced the larger correlations, ranging from 0.648 to 0.839. For the females, the correlations ranged from 0.539 to 0.811. For both genders, the smallest correlations were between the Listening and WritePlacer ESL tests, and the largest were between the Reading Skills and Sentence Meaning tests (Table 3).

TABLE 2
ANOVA Analysis of Accuplacer ESL Mean Test Scores by Gender

                            Mean test score
ACCUPLACER ESL test   Female    Male     Total    F        Significance
Listening             67.75     66.08    66.79    1.025    0.312
Language Usage        87.30     81.69    84.08    7.060    0.008
Reading Skills        82.51     77.59    79.69    5.088    0.025
Sentence Meaning      80.82     75.60    77.83    5.478    0.020
WritePlacer ESL       2.87      2.60     2.71     3.053    0.081
n                     211       283      494
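For readers who wish to see how an F value of this kind arises, the Python sketch below computes a one-way ANOVA F statistic from summary statistics alone. It is an illustration under stated assumptions and not the procedure used in the study, which analyzed the raw scores. The function name is hypothetical, the reported SDs are assumed to be sample standard deviations (n - 1 denominator), and the inputs are the Listening group sizes, means, and SDs as reported in Tables 1 and 2.

```python
def one_way_f(groups):
    """One-way ANOVA F statistic from (n, mean, sd) summaries of each group.

    Assumes each sd is a sample standard deviation (n - 1 denominator).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-group sum of squares, recovered from each group's variance.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Listening test: (n, mean, SD) for females and males as reported in the tables.
listening = [(211, 67.75, 16.23), (283, 66.08, 19.39)]
print(round(one_way_f(listening), 2))
```

With these rounded inputs the result is approximately 1.03, close to the reported value of 1.025; a small gap of this size is what rounding in the published means and SDs would be expected to produce.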


TABLE 3
Pearson Correlations Between Test Scores by Gender

Accuplacer ESL tests        Listening   Language usage   Reading skills   Sentence meaning   WritePlacer ESL
Listening             (F)   1
                      (M)
Language Usage        (F)   0.660*      1
                      (M)   0.728*
Reading Skills        (F)   0.727*      0.744*           1
                      (M)   0.758*      0.798*
Sentence Meaning      (F)   0.764*      0.739*           0.811*           1
                      (M)   0.813*      0.824*           0.839*
WritePlacer ESL       (F)   0.539*      0.601*           0.643*           0.616*             1
                      (M)   0.648*      0.715*           0.733*           0.699*

Note. F = female; M = male.
*Correlation is significant at the 0.01 level (2-tailed).
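The Pearson correlation reported in Table 3 can be sketched as follows. The function name is hypothetical and the paired scores are invented purely for illustration; the study's raw scores are not available in this report.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations, and sums of squared deviations.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical paired scores for five examinees (not data from the study).
listening_scores = [50, 60, 70, 80, 90]
writing_scores = [2, 4, 5, 4, 5]
print(round(pearson_r(listening_scores, writing_scores), 3))  # 0.775
```

Computing the coefficient separately for the female and male subgroups, as in Table 3, would simply mean calling the function once per subgroup for each pair of tests.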

DISCUSSION

In this study, on average, females scored higher than males on the Accuplacer ESL tests, with significant differences in mean scores for three of the five tests: Language Usage, Reading Skills, and Sentence Meaning. Although these differences were statistically significant, based on the standard mean differences and percent differences, they were quite small. These results are akin to those reported in most of the aforementioned studies, with females scoring slightly higher overall on other language tests such as the IELTS (University of Cambridge, 2006), TOEFL iBT (Educational Testing Service, 2007), and MELAB (Johnson & Song, 2008). In terms of skill areas, this study found that the subtests measuring reading and vocabulary skills produced the greatest difference in mean scores (0.207
