Do Language Proficiency Test Scores Differ by Gender?
Author(s): CINDY L. JAMES
Source: TESOL Quarterly, Vol. 44, No. 2 (June 2010), pp. 387-398
Published by: Teachers of English to Speakers of Other Languages, Inc. (TESOL)
Stable URL: http://www.jstor.org/stable/27896732
This content downloaded from 82.33.77.134 on Sun, 7 Dec 2014 08:06:10 AMAll use subject to JSTOR Terms and Conditions
Do Language Proficiency Test Scores Differ by Gender?
CINDY L. JAMES
Thompson Rivers University
Kamloops, British Columbia, Canada
doi: 10.5054/tq.2010.222215
Most postsecondary educational institutions employ some type of language proficiency assessment for international applicants to assess their language skills (Alderson, Krahnke, & Stansfield, 1987; Chalhoub-Deville & Turner, 2000; Kahn, Butler, Weigle, & Sato, 1994; Paltridge, 1992; Person, 2002; Rees, 1999; Roemer, 2002; Seaman & Hayward, 2000). The performance of these applicants is of interest to administrators, faculty, staff, and researchers alike, with gender variations being one issue often studied. These types of studies tend to compare the performance of females with males in terms of mean test scores by subtest and/or total test score, and in some cases by specific test questions or types of questions. The score differences are often reported as raw score differences, but to enhance comparability between tests, the differences also can be expressed in a standardized form such as a standard mean difference or a percent difference.
The standard mean difference, denoted by D, is considered by Willingham and Cole (1997) in their meta-analysis of gender and assessment as one of the most common measurements. It is calculated by subtracting the male mean score from the female mean score and dividing the difference by the average standard deviation (SD):

$$D = \frac{\text{female mean} - \text{male mean}}{\text{average SD}}$$
Referring to the classifications by Cohen (1988), values of D from 0.20 to 0.49 indicate small differences, from 0.50 to 0.79 are considered medium differences, and 0.80 or higher equate to large differences.
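The calculation of D and the Cohen (1988) thresholds can be illustrated with a minimal Python sketch. The function names are hypothetical, the "average SD" is assumed here to be the simple arithmetic mean of the female and male standard deviations, and the label for values under 0.20 is an assumption, because the text does not name that range. The input values are the Language Usage means and standard deviations reported in Table 1 of this article.

```python
def standard_mean_difference(female_mean, male_mean, female_sd, male_sd):
    """D = (female mean - male mean) / average SD."""
    # Assumption: "average SD" is the arithmetic mean of the two group SDs.
    average_sd = (female_sd + male_sd) / 2
    return (female_mean - male_mean) / average_sd


def cohen_label(d):
    """Label the size of D using the Cohen (1988) cut-offs quoted in the text."""
    size = abs(d)  # the sign only shows which group scored higher
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"  # assumption: the text gives no label under 0.20


# Language Usage means and SDs as reported in Table 1 of this article.
d = standard_mean_difference(87.3, 81.7, 20.88, 24.65)
print(round(d, 2), cohen_label(d))
```

With these rounded inputs the result falls in the small range, close to the value of 0.245 reported in Table 1; minor discrepancies are to be expected because the published means and standard deviations are rounded.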
When standard deviations are not available, a percent difference can be employed by calculating the difference between the male and female mean and dividing this by the total score of the test and converting the answer to a percentage:

$$\%\ \text{Difference} = \frac{\text{female mean} - \text{male mean}}{\text{total test score}} \times 100\%$$
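The percent difference can be sketched in the same way. The function name is hypothetical; the inputs are the Language Usage means reported in Table 2 of this article and the maximum score of 120 for that test.

```python
def percent_difference(female_mean, male_mean, total_test_score):
    """(female mean - male mean) / total test score, expressed as a percentage."""
    return (female_mean - male_mean) / total_test_score * 100


# Language Usage means from Table 2 of this article; the test is scored out of 120.
pct = percent_difference(87.30, 81.69, 120)
print(f"{pct:.2f}%")
```

Because the result is scaled by the maximum possible score rather than by the spread of scores, it can be computed from a published score report that lists only means.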
To date, many of the gender studies of language proficiency tests have revealed stronger performances by females compared with males, although these differences in general tend to be quite small. For instance, Zeidner (1987) explored the impact of gender and other factors on test scores for an English language aptitude test used for selection and placement at Israeli educational institutions. His research revealed that the mean test scores were significantly different, with females, in general, scoring higher than males. Although standard differences were not provided, it was possible to calculate them post hoc, with the standard mean difference being 0.17 and the percent difference being 1.36%.
Females also scored slightly higher on the academic examination of the International English Language Testing System (IELTS) based on data from 2004 (University of Cambridge, 2006). This testing system includes listening, reading, writing, and speaking sections, and in each case the mean band scores for females were greater than those for males, as was the overall test score. Once again, the standard differences were not included in this IELTS report, but it was possible to calculate post hoc the percent difference with the data provided. For the reading test the percent difference was 1.44%, for the listening and speaking tests it was 1.56%, for the writing test the difference was 1.78%, and for the total test score it was 1.67%.
Similarly, a data report by the Educational Testing Service (2007) for the Test of English as a Foreign Language Internet-Based Test (TOEFL iBT) revealed that, overall, females scored marginally higher than males on tests conducted between September 2005 and December 2006. At the individual test level, the mean test scores for females were higher than those for males on three of the four sections (listening, speaking, and writing), but the reverse was true for the reading test. This TOEFL iBT score report provided means and standard deviations, so it was possible to calculate standard differences post hoc. The standard mean and percentage differences between females and males on the subtests were as follows: 0.04 and 1.0% for listening, 0.21 and 3.7% for speaking, 0.09 and 1.7% for writing, and -0.05 and -1.3% for reading. This equated to an overall standard mean difference of 0.04 and a percent difference of 0.8%.
Similar results were reported for the Michigan English Language Assessment Battery (MELAB), with the mean final MELAB score again being higher for females than males, based on scores collected in 2007 (Johnson & Song, 2008). The final MELAB score is calculated by averaging the scores on the three compulsory sections of the MELAB: listening, reading, and writing. Scores by gender for these individual sections were not provided in this report, so post hoc analysis was only possible on the final mean scores, generating a standard mean difference of 0.12 and a percentage difference of 1.2%.
A comparison of female and male mean test scores on the Canadian Academic English Language (CAEL) assessment tests also showed females scoring slightly higher on all four of the language subtests (writing, listening, reading, and speaking) based on candidate scores from 2002 to 2008 (Carleton University, 2009). Post hoc analysis of the mean differences produced values of 0.18 for writing, 0.17 for listening, 0.20 for reading, and 0.23 for speaking. This equated to percentage differences of 3.0% for writing, 2.8% for listening, 3.2% for reading, and 3.8% for speaking.
In terms of the studies that have analyzed gender differences for specific questions or types of questions, results vary. For example, a study conducted by Pae (2004) examining the effect of gender on an English reading comprehension test for Korean learners revealed that items with content relating to mood, impression, and tone tended to be easier for females, whereas passages with logical inferences were easier for males. A study by Takala and Kaftandjieva (2000) involving an English vocabulary test, one of the subtests of the Finnish Foreign Language Certificate Examination, also revealed that some of the test items tended to favour females, whereas others favoured males; however, they concluded that the test as a whole was considered to be gender neutral. Meanwhile, a review of writing assessments used for postsecondary admission by Breland, Bridgeman, and Fowles (1999) found that males tended to do better than females on the multiple-choice subtests of the TOEFL but the females performed better on the essay portion of the TOEFL. Another TOEFL study by Wainer and Lukhele (1997) that investigated gender differences on reading comprehension testlets (sections with several questions based on one reading passage) found that there were essentially no gender differences (greatest difference was
ranged in age from 16 years to 54 years, with an average age of 22.0 years (SD = 3.80). The gender distribution was fairly even, with 57% males and 43% females. The ethnic background of these students was diverse, with representation from 47 different countries.
Testing Tool and Procedures
The Accuplacer ESL testing system is a Web-based program marketed by the College Board that assesses the English skills of students who have learned English as a second or alternate language. It consists of four multiple-choice tests: Reading Skills, which evaluates the student's comprehension of short passages; Language Usage, which measures grammar and usage; Sentence Meaning, which assesses the understanding of word meanings in one- or two-sentence contexts; and Listening, which measures the ability to listen to and understand one or more speakers (College Examination Board, 2007).
All of these tests are offered in a computer-adaptive test format where the pool of test items is calibrated for difficulty and content. When the candidate starts each test, she or he is presented with a question of average difficulty randomly selected from several starter questions of the same level of difficulty. If the examinee answers the question correctly, the next question to be administered is chosen from a group of somewhat more difficult questions, whereas an incorrect answer will cause the next question to be somewhat easier. Each of the four tests consists of 20 questions offered in this adaptive format and scored on a scale of 0 to 120.
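The adaptive logic described above can be illustrated with a simplified Python sketch. It is not the College Board's item-selection or scoring algorithm, which is not specified in this article: the nine difficulty levels, the one-level step size, and the function names are all assumptions made for illustration.

```python
def next_level(level, answered_correctly, lowest=1, highest=9):
    """Move one difficulty level up after a correct answer, down otherwise."""
    # Assumption: difficulty is an integer from 1 (easiest) to 9 (hardest).
    step = 1 if answered_correctly else -1
    return max(lowest, min(highest, level + step))


def administer(answers, start_level=5):
    """Return the difficulty level presented for each question in turn."""
    levels = []
    level = start_level  # the first question is of average difficulty
    for answered_correctly in answers:
        levels.append(level)
        level = next_level(level, answered_correctly)
    return levels


# A hypothetical examinee: two correct answers, one miss, then one correct.
print(administer([True, True, False, True]))  # → [5, 6, 7, 6]
```

In an operational test, each level would correspond to a group of calibrated items from which the next question is drawn, and the final scaled score would be derived from the calibrated difficulties rather than from the level sequence alone.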
The Accuplacer ESL testing system also includes the WritePlacer ESL test, which requires the examinee to complete a writing sample based on a randomly assigned prompt. This test is scored by IntelliMetric, an electronic scoring system that utilizes a holistic scoring process designed to emulate human scorers, rating each essay on a scale of 0 to 6 based on its overall effectiveness (Elliot, 2003). Detailed descriptions of this testing system and each test can be obtained from the Accuplacer Web site (College Examination Board, 2008).
The Accuplacer ESL testing system was adopted by TRU for several reasons, including its ability to assess a wide range of skills because of its computer-adaptive format and its ability to test all relevant skill areas. Other reasons related to the efficiency and cost of the system: namely, it was fairly straightforward to administer, the results were available immediately, and it was relatively inexpensive.
The Accuplacer ESL tests were administered in computer labs by the TRU Assessment Centre staff to international students during their first week of orientation at TRU, before classes began. For the multiple-choice tests there was no time limit; however, for the writing sample a 1-hour time limit was imposed. In most cases, students took approximately 2 hours to complete all five tests; however, some took only 90 minutes, and others took a full 3 hours.
Data Collection and Analysis
The raw test scores for each of the five Accuplacer ESL tests were collected from students who wrote the assessment during the international intakes (fall, winter, and summer) from May 2006 to May 2008, along with basic demographic information. To determine whether there were any differences in mean scores by gender for the five Accuplacer ESL tests, an analysis of variance (ANOVA) was employed. The standard mean difference, percentage differences, and intercorrelations also were calculated for comparison purposes. A line graph of the mean test scores by gender was generated to provide a pictorial representation of the mean score distributions. The WritePlacer ESL test scores were adjusted by a factor of 20 so that they too would be out of a total of 120 and hence could be plotted on the same line graph with the other tests.
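The rescaling of the WritePlacer ESL scores amounts to multiplying each score on the 0 to 6 scale by 120/6 = 20. A minimal sketch follows, with hypothetical constant and function names; the two input values are the WritePlacer ESL means reported in Table 1.

```python
WRITEPLACER_MAX = 6   # WritePlacer ESL essays are rated from 0 to 6
COMMON_MAX = 120      # the four multiple-choice tests are scored from 0 to 120


def rescale_writeplacer(score):
    """Put a WritePlacer ESL score on the same 0-120 scale as the other tests."""
    factor = COMMON_MAX / WRITEPLACER_MAX  # 120 / 6 = 20
    return score * factor


# Female and male WritePlacer ESL means as reported in Table 1.
for mean_score in (2.87, 2.60):
    print(round(rescale_writeplacer(mean_score), 1))
```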
RESULTS
The descriptive and mean difference statistics for the test scores by gender are provided in Table 1. The distributions of the test scores were normal or near normal with slight skewness. For every test, the females scored higher than the males; however, based on the percentage difference and the standard mean difference, these differences were quite small, as displayed in Table 1.
TABLE 1
Descriptive and Mean Difference Statistics for Accuplacer ESL Tests by Gender

| Gender | Statistic | Listening | Language usage | Reading skills | Sentence meaning | WritePlacer ESL |
|---|---|---|---|---|---|---|
| Female (n = 211) | Mean | 67.7* | 87.3* | 82.5* | 80.8 | 2.87* |
| | SD | 16.23 | 20.88 | 21.54 | 22.42 | 2.788 |
| Male (n = 283) | Mean | 66.1 | 81.7* | 77.6* | 75.6 | 2.60* |
| | SD | 19.39 | 24.65 | 25.63 | 26.00 | 1.773 |
| Maximum score | | 120 | 120 | 120 | 120 | |
| % Difference | | 1.42% | 4.67% | 4.08% | 4.33% | 4.50% |
| D | | 0.095 | 0.245 | 0.207 | 0.214 | 0.116 |

Note. SD = standard deviation; D = standard mean difference.
*Slightly skewed distributions.
FIGURE 1. Accuplacer ESL mean scores by gender.
A pictorial representation of the mean test scores for the Accuplacer ESL tests by gender is provided in Figure 1, and the ANOVA analysis is provided in Table 2. As shown in Figure 1, both females and males scored highest on the Language Usage test and lowest on the WritePlacer ESL test. The ANOVA analysis revealed that the differences between mean test scores were not significant for the Listening or WritePlacer ESL tests, but were significant for the Language Usage, Reading Skills, and Sentence Meaning tests (Table 2).
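For readers who wish to check a one-way ANOVA F value against published summary statistics, the following Python sketch computes F from group sizes, means, and standard deviations. It is an illustration rather than the analysis procedure used in this study, which would have been run on the individual scores. The function name is hypothetical, and the standard deviations are assumed to be sample standard deviations (n - 1 denominator). The inputs are the Listening values from Tables 1 and 2.

```python
def one_way_anova_f(groups):
    """F statistic from (n, mean, sd) summaries, one tuple per group."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-group sum of squares, assuming each sd is a sample SD (n - 1).
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Listening test: (n, mean, SD) for females and males, from Tables 1 and 2.
listening = [(211, 67.75, 16.23), (283, 66.08, 19.39)]
f_stat = one_way_anova_f(listening)
print(round(f_stat, 2))
```

The result is close to the F of 1.025 reported for Listening in Table 2; an exact match is not expected because the published means and standard deviations are rounded. The sketch does not compute the significance level, which requires the F distribution with 1 and 492 degrees of freedom.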
The Pearson correlations between the five tests by gender produced large (Cohen, 1988), significant correlations for both females and males and all combinations of tests at the 0.01 level (2-tailed; Table 3). In all cases, the male subgroup produced the larger correlations, ranging from 0.648 to 0.839. For the females, the correlations ranged from 0.539 to 0.811. For both genders, the smallest correlations were between the Listening and WritePlacer ESL tests, and the largest were between the Reading Skills and Sentence Meaning tests (Table 3).
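The Pearson correlation between two sets of test scores can be computed as in the following Python sketch. The function name is hypothetical, and the two score lists are invented purely for illustration; they are not data from this study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five examinees on two tests (not study data).
reading = [62, 75, 81, 90, 104]
sentence = [58, 70, 85, 88, 99]
print(round(pearson_r(reading, sentence), 3))
```

Applying the function separately to the female and male subgroups, for each pair of tests, would yield a lower-triangular matrix of the kind shown in Table 3.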
TABLE 2
ANOVA Analysis of Accuplacer ESL Mean Test Scores by Gender

| Accuplacer ESL test | Mean test score: Female | Mean test score: Male | Mean test score: Total | F | Significance |
|---|---|---|---|---|---|
| Listening | 67.75 | 66.08 | 66.79 | 1.025 | 0.312 |
| Language Usage | 87.30 | 81.69 | 84.08 | 7.060 | 0.008 |
| Reading Skills | 82.51 | 77.59 | 79.69 | 5.088 | 0.025 |
| Sentence Meaning | 80.82 | 75.60 | 77.83 | 5.478 | 0.020 |
| WritePlacer ESL | 2.87 | 2.60 | 2.71 | 3.053 | 0.081 |
| n | 211 | 283 | 494 | | |
TABLE 3
Pearson Correlations Between Test Scores by Gender

| Accuplacer ESL tests | Listening | Language usage | Reading skills | Sentence meaning | WritePlacer ESL |
|---|---|---|---|---|---|
| Listening | 1 | | | | |
| Language Usage | 0.660* (F) | 1 | | | |
| | 0.728* (M) | | | | |
| Reading Skills | 0.727* (F) | 0.744* (F) | 1 | | |
| | 0.758* (M) | 0.798* (M) | | | |
| Sentence Meaning | 0.764* (F) | 0.739* (F) | 0.811* (F) | 1 | |
| | 0.813* (M) | 0.824* (M) | 0.839* (M) | | |
| WritePlacer ESL | 0.539* (F) | 0.601* (F) | 0.643* (F) | 0.616* (F) | 1 |
| | 0.648* (M) | 0.715* (M) | 0.733* (M) | 0.699* (M) | |

Note. F = female; M = male.
*Correlation is significant at the 0.01 level (2-tailed).
DISCUSSION
In this study, on average, females scored higher than males on the Accuplacer ESL tests, with significant differences in mean scores for three of the five tests: Language Usage, Reading Skills, and Sentence Meaning. Although these differences were statistically significant, based on the standard mean differences and percent differences, they were quite small. These results are akin to those reported in most of the aforementioned studies, with females scoring slightly higher overall on other language tests such as the IELTS (University of Cambridge, 2006), TOEFL iBT (Educational Testing Service, 2007), and MELAB (Johnson & Song, 2008).
In terms of skill areas, this study found that the subtests measuring reading and vocabulary skills produced the greatest difference in mean scores (0.207