www.jstor.org stable pdfplus 27896732


  • 8/9/2019 Www.jstor.org Stable Pdfplus 27896732

    1/13

    Teachers of English to Speakers of Other Languages, Inc. (TESOL)

    Do Language Proficiency Test Scores Differ by Gender?
    Author(s): CINDY L. JAMES
    Source: TESOL Quarterly, Vol. 44, No. 2 (June 2010), pp. 387-398
    Published by: Teachers of English to Speakers of Other Languages, Inc. (TESOL)
    Stable URL: http://www.jstor.org/stable/27896732

    Accessed: 07/12/2014 08:06

    Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp

    JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected].

    Teachers of English to Speakers of Other Languages, Inc. (TESOL) is collaborating with JSTOR to digitize, preserve and extend access to TESOL Quarterly.

    http://www.jstor.org

    This content downloaded from 82.33.77.134 on Sun, 7 Dec 2014 08:06:10 AM. All use subject to JSTOR Terms and Conditions.


    Do Language Proficiency Test Scores Differ by Gender?

    CINDY L. JAMES
    Thompson Rivers University
    Kamloops, British Columbia, Canada

    doi: 10.5054/tq.2010.222215

    Most postsecondary educational institutions employ some type of language proficiency assessment for international applicants to assess their language skills (Alderson, Krahnke, & Stansfield, 1987; Chalhoub-Deville & Turner, 2000; Kahn, Butler, Weigle, & Sato, 1994; Paltridge, 1992; Person, 2002; Rees, 1999; Roemer, 2002; Seaman & Hayward, 2000). The performance of these applicants is of interest to administrators, faculty, staff, and researchers alike, with gender variations being one issue often studied. These types of studies tend to compare the performance of females with males in terms of mean test scores by subtest and/or total test score, and in some cases by specific test questions or types of questions. The score differences are often reported as raw score differences, but to enhance comparability between tests, the differences also can be expressed in a standardized form such as a standard mean difference or a percent difference.

    The standard mean difference, denoted by D, is considered by Willingham and Cole (1997) in their meta-analysis of gender and assessment as one of the most common measurements. It is calculated by subtracting the male mean score from the female mean score and dividing the difference by the average standard deviation (SD):

    $$D = \frac{\text{female mean} - \text{male mean}}{\text{average SD}}$$

    Referring to the classifications by Cohen (1988), values of D from 0.20 to 0.49 indicate small differences, from 0.50 to 0.79 are considered medium differences, and 0.80 or higher equate to large differences.
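The calculation of D and the Cohen (1988) bands quoted above can be sketched in a few lines of Python. This is only an illustration, not the author's procedure: the function names are hypothetical, "average SD" is assumed to mean the unweighted mean of the two group SDs, the label used for values under 0.20 is an assumption (the text does not name that band), and the example inputs are the Language Usage means and SDs reported in Table 1 of this study.

```python
def standard_mean_difference(female_mean, male_mean, female_sd, male_sd):
    """D = (female mean - male mean) / average SD."""
    # Assumption: "average SD" is the unweighted mean of the two group SDs.
    average_sd = (female_sd + male_sd) / 2
    return (female_mean - male_mean) / average_sd


def cohen_label(d):
    """Classify |D| using the bands quoted from Cohen (1988)."""
    size = abs(d)  # the sign only shows which group scored higher
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"  # label assumed; the text does not name this band


# Language Usage: means and SDs as reported in Table 1 (rounded values).
d_language_usage = standard_mean_difference(87.3, 81.7, 20.88, 24.65)
print(round(d_language_usage, 3), cohen_label(d_language_usage))
```

With these rounded inputs the result lands close to the 0.245 reported in Table 1 and falls in the "small" band; exact agreement is not expected because the published means and SDs are themselves rounded.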

    When standard deviations are not available, a percent difference can be employed by calculating the difference between the male and female mean and dividing this by the total score of the test and converting the answer to a percentage:

    $$\%\ \text{Difference} = \frac{\text{female mean} - \text{male mean}}{\text{total test score}} \times 100\%$$
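As a worked illustration of the percent difference, the short Python sketch below applies the formula to the Language Usage means reported in Table 2 of this study, with a maximum score of 120. The function name is hypothetical, and the code is a sketch under those assumptions rather than the author's own calculation.

```python
def percent_difference(female_mean, male_mean, total_test_score):
    """Gap between group means as a percentage of the maximum test score."""
    return (female_mean - male_mean) / total_test_score * 100


# Language Usage means from Table 2; each multiple-choice test is out of 120.
pct_language_usage = percent_difference(87.30, 81.69, 120)
print(f"{pct_language_usage:.2f}%")
```

A positive value means the female mean is higher; a negative value, as in the TOEFL iBT reading result discussed later, means the male mean is higher.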


    To date, many of the gender studies of language proficiency tests have revealed stronger performances by females compared with males, although these differences in general tend to be quite small. For instance, Zeidner (1987) explored the impact of gender and other factors on test scores for an English language aptitude test used for selection and placement at Israeli educational institutions. His research revealed that the mean test scores were significantly different, with females, in general, scoring higher than males. Although standard differences were not provided, it was possible to calculate them post hoc: the standard mean difference was 0.17 and the percent difference was 1.36%.

    Females also scored slightly higher on the academic examination of the International English Language Testing System (IELTS) based on data from 2004 (University of Cambridge, 2006). This testing system includes listening, reading, writing, and speaking sections, and in each case the mean band scores for females were greater than those for males, as was the overall test score. Once again, the standard differences were not included in this IELTS report, but it was possible to calculate post hoc the percent difference with the data provided. For the reading test the percent difference was 1.44%, for the listening and speaking tests it was 1.56%, for the writing test the difference was 1.78%, and for the total test score it was 1.67%.

    Similarly, a data report by the Educational Testing Service (2007) for the Test of English as a Foreign Language Internet-Based Test (TOEFL iBT) revealed that, overall, females scored marginally higher than males on tests conducted between September 2005 and December 2006. At the individual test level, the mean test scores for females were higher than those for males on three of the four sections (listening, speaking, and writing), but the reverse was true for the reading test. This TOEFL iBT score report provided means and standard deviations, so it was possible to calculate standard differences post hoc. The standard mean and percentage differences between females and males on the subtests were as follows: 0.04 and 1.0% for listening, 0.21 and 3.7% for speaking, 0.09 and 1.7% for writing, and -0.05 and -1.3% for reading. This equated to an overall standard mean difference of 0.04 and a percent difference of 0.8%.

    Similar results were reported for the Michigan English Language Assessment Battery (MELAB), with the mean final MELAB score again being higher for females than males, based on scores collected in 2007 (Johnson & Song, 2008). The final MELAB score is calculated by averaging the scores on the three compulsory sections of the MELAB: listening, reading, and writing. Scores by gender for these individual sections were not provided in this report, so post hoc analysis was only possible on the final mean scores, generating a standard mean difference of 0.12 and a percentage difference of 1.2%.

    A comparison of female and male mean test scores on the Canadian Academic English Language (CAEL) assessment tests also showed females scoring slightly higher on all four of the language subtests (writing, listening, reading, and speaking) based on candidate scores from 2002 to 2008 (Carleton University, 2009). Post hoc analysis of the mean differences produced values of 0.18 for writing, 0.17 for listening, 0.20 for reading, and 0.23 for speaking. This equated to percentage differences of 3.0% for writing, 2.8% for listening, 3.2% for reading, and 3.8% for speaking.

    In terms of the studies that have analyzed gender differences for specific questions or types of questions, results vary. For example, a study conducted by Pae (2004) examining the effect of gender on an English reading comprehension test for Korean learners revealed that items with content relating to mood, impression, and tone tended to be easier for females, whereas passages with logical inferences were easier for males. A study by Takala and Kaftandjieva (2000) involving an English vocabulary test, one of the subtests of the Finnish Foreign Language Certificate Examination, also revealed that some of the test items tended to favour females, whereas others favoured males; however, they concluded that the test as a whole was considered to be gender neutral. Meanwhile, a review of writing assessments used for postsecondary admission by Breland, Bridgeman, and Fowles (1999) found that males tended to do better than females on the multiple-choice subtests of the TOEFL but the females performed better on the essay portion of the TOEFL. Another TOEFL study by Wainer and Lukhele (1997) that investigated gender differences on reading comprehension testlets (sections with several questions based on one reading passage) found that there were essentially no gender differences (greatest difference was


    ranged in age from 16 years to 54 years, with an average age of 22.0 years (SD = 3.80). The gender distribution was fairly even, with 57% males and 43% females. The ethnic background of these students was diverse, with representation from 47 different countries.

    Testing Tool and Procedures

    The Accuplacer ESL testing system is a Web-based program marketed by the College Board that assesses the English skills of students who have learned English as a second or alternate language. It consists of four multiple-choice tests: Reading Skills, which evaluates the student's comprehension of short passages; Language Usage, which measures grammar and usage; Sentence Meaning, which assesses the understanding of word meanings in one- or two-sentence contexts; and Listening, which measures the ability to listen to and understand one or more speakers (College Examination Board, 2007).

    All of these tests are offered in a computer-adaptive test format where the pool of test items is calibrated for difficulty and content. When the candidate starts each test, she or he is presented with a question of average difficulty randomly selected from several starter questions of the same level of difficulty. If the examinee answers the question correctly, the next question to be administered is chosen from a group of somewhat more difficult questions, whereas an incorrect answer will cause the next question to be somewhat easier. Each of the four tests consists of 20 questions offered in this adaptive format and scored on a scale of 0 to 120.
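The adaptive selection rule described above can be illustrated with a small simulation. This is a simplified sketch, not the Accuplacer algorithm: the item pool layout, the number of difficulty levels, the one-level step up or down, and the simulated examinee are all assumptions made for illustration, and the 0 to 120 scoring is not modelled because the text does not describe how it is computed.

```python
import random


def run_adaptive_test(pool, answers_correctly, n_questions=20, rng=None):
    """Administer n_questions, moving difficulty up or down after each answer.

    pool: dict mapping difficulty level (int) -> list of item ids (hypothetical).
    answers_correctly: callable(item_id, level) -> bool, stands in for the examinee.
    """
    rng = rng or random.Random()
    levels = sorted(pool)
    idx = len(levels) // 2  # start at the middle (average) difficulty level
    used = set()
    history = []
    for _ in range(n_questions):
        level = levels[idx]
        # Pick randomly among unused items at this level (reuse only if exhausted).
        candidates = [i for i in pool[level] if i not in used] or pool[level]
        item = rng.choice(candidates)
        used.add(item)
        correct = answers_correctly(item, level)
        history.append((level, item, correct))
        # Correct answer -> somewhat harder next item; incorrect -> somewhat easier.
        step = 1 if correct else -1
        idx = min(max(idx + step, 0), len(levels) - 1)
    return history


# Hypothetical pool: 7 difficulty levels with 25 items each.
pool = {lvl: [f"L{lvl}-Q{k}" for k in range(25)] for lvl in range(1, 8)}
# Simulated examinee who answers correctly up to difficulty level 5.
history = run_adaptive_test(pool, lambda item, lvl: lvl <= 5, rng=random.Random(0))
print([lvl for lvl, _, _ in history])
```

In this toy model the administered difficulty climbs from the starting level and then oscillates around the level where the simulated examinee begins to fail, which is the behaviour that lets an adaptive test cover a wide range of ability with only 20 items.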

    The Accuplacer ESL testing system also includes the WritePlacer ESL test, which requires the examinee to complete a writing sample based on a randomly assigned prompt. This test is scored by IntelliMetric, an electronic scoring system that utilizes a holistic scoring process designed to emulate human scorers, rating each essay on a scale of 0 to 6 based on its overall effectiveness (Elliot, 2003). Detailed descriptions of this testing system and each test can be obtained from the Accuplacer Web site (College Examination Board, 2008).

    The Accuplacer ESL testing system was adopted by TRU for several reasons, including its ability to assess a wide range of skills because of its computer-adaptive format and its ability to test all relevant skill areas. Other reasons related to the efficiency and cost of the system: namely, it was fairly straightforward to administer, the results were available immediately, and it was relatively inexpensive.

    The Accuplacer ESL tests were administered in computer labs by the TRU Assessment Centre staff to international students during their first week of orientation at TRU, before classes began. For the multiple-choice tests there was no time limit; however, for the writing sample a 1-hour time limit was imposed. In most cases, students took approximately 2 hours to complete all five tests; however, some took only 90 minutes, and others took a full 3 hours.

    Data Collection and Analysis

    The raw test scores for each of the five Accuplacer ESL tests were collected from students who wrote the assessment during the international intakes (fall, winter, and summer) from May 2006 to May 2008, along with basic demographic information. To determine whether there were any differences in mean scores by gender for the five Accuplacer ESL tests, an analysis of variance (ANOVA) was employed. The standard mean difference, percentage differences, and intercorrelations also were calculated for comparison purposes. A line graph of the mean test scores by gender was generated to provide a pictorial representation of the mean score distributions. The WritePlacer ESL test scores were adjusted by a factor of 20 so that they too would be out of a total of 120 and hence could be plotted on the same line graph with the other tests.
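The WritePlacer ESL adjustment is simple arithmetic: a score on the 0 to 6 essay scale multiplied by 20 sits on the same 0 to 120 scale as the multiple-choice tests. The sketch below shows this with the mean WritePlacer ESL scores by gender reported in Table 1; the constant and function names are hypothetical.

```python
WRITEPLACER_MAX = 6        # essay scale of 0 to 6, as described in the text
MULTIPLE_CHOICE_MAX = 120  # scale used by the four multiple-choice tests
SCALE_FACTOR = MULTIPLE_CHOICE_MAX // WRITEPLACER_MAX  # the factor of 20


def rescale_writeplacer(score):
    """Put a 0-6 WritePlacer ESL score on the 0-120 scale for plotting."""
    return score * SCALE_FACTOR


# Mean WritePlacer ESL scores by gender as reported in Table 1.
for group, mean in [("female", 2.87), ("male", 2.60)]:
    print(group, round(rescale_writeplacer(mean), 1))
```

The rescaling changes only the plotting scale; the percent difference is unaffected, because both the gap between the means and the maximum score are multiplied by the same factor.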

    RESULTS

    The descriptive and mean difference statistics for the test scores by gender are provided in Table 1. The distributions of the test scores were normal or near normal with slight skewness. For every test, the females scored higher than the males; however, based on the percentage difference and the standard mean difference, these differences were quite small, as displayed in Table 1.

    TABLE 1
    Descriptive and Mean Difference Statistics for Accuplacer ESL Tests by Gender

    Gender             Statistic   Listening   Language usage   Reading skills   Sentence meaning   WritePlacer ESL
    Female (n = 211)   Mean        67.7*       87.3*            82.5*            80.8               2.87*
                       SD          16.23       20.88            21.54            22.42              2.788
    Male (n = 283)     Mean        66.1        81.7*            77.6*            75.6               2.60*
                       SD          19.39       24.65            25.63            26.00              1.773
    Maximum score                  120         120              120              120                6
    % Difference                   1.42%       4.67%            4.08%            4.33%              4.50%
    D                              0.095       0.245            0.207            0.214              0.116

    Note. SD = standard deviation; D = standard mean difference.
    *Slightly skewed distributions.


    FIGURE 1. ACCUPLACER ESL mean scores by gender. [Line graph not reproduced.]

    A pictorial representation of the mean test scores for the Accuplacer ESL tests by gender is provided in Figure 1, and the ANOVA analysis is provided in Table 2. As shown in Figure 1, both females and males scored highest on the Language Usage test and lowest on the WritePlacer ESL test. The ANOVA analysis revealed that the differences between mean test scores were not significant for the Listening or WritePlacer ESL tests, but were significant for the Language Usage, Reading Skills, and Sentence Meaning tests (Table 2).

    The Pearson correlations between the five tests by gender produced large (Cohen, 1988), significant correlations for both females and males and all combinations of tests at the 0.01 level (2-tailed; Table 3). In all cases, the male subgroup produced the larger correlations, ranging from 0.648 to 0.839. For the females, the correlations ranged from 0.539 to 0.811. For both genders, the smallest correlations were between the Listening and WritePlacer ESL tests, and the largest were between the Reading Skills and Sentence Meaning tests (Table 3).

    TABLE 2
    ANOVA Analysis of Accuplacer ESL Mean Test Scores by Gender

                                  Mean test score
    ACCUPLACER ESL test    Female    Male     Total    F        Significance
    Listening              67.75     66.08    66.79    1.025    0.312
    Language Usage         87.30     81.69    84.08    7.060    0.008
    Reading Skills         82.51     77.59    79.69    5.088    0.025
    Sentence Meaning       80.82     75.60    77.83    5.478    0.020
    WritePlacer ESL        2.87      2.60     2.71     3.053    0.081
    n                      211       283      494
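For readers who want to see where an F value of this kind comes from, the following Python sketch rebuilds a two-group one-way ANOVA F statistic from summary statistics alone. It is an illustration under stated assumptions, not the author's analysis: the function name is hypothetical, the SDs in Table 1 are assumed to be sample SDs (n - 1 denominator), and the inputs are the rounded Listening values from Tables 1 and 2.

```python
def one_way_f_two_groups(n1, mean1, sd1, n2, mean2, sd2):
    """One-way ANOVA F statistic for two groups, from summary statistics."""
    grand_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    # Between-groups sum of squares (1 degree of freedom for two groups).
    ss_between = n1 * (mean1 - grand_mean) ** 2 + n2 * (mean2 - grand_mean) ** 2
    # Within-groups sum of squares, rebuilt from the sample SDs.
    ss_within = (n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
    df_between = 1
    df_within = n1 + n2 - 2
    return (ss_between / df_between) / (ss_within / df_within)


# Listening test: means from Table 2, SDs and group sizes from Table 1.
f_listening = one_way_f_two_groups(211, 67.75, 16.23, 283, 66.08, 19.39)
print(round(f_listening, 3))
```

With these rounded inputs the statistic comes out near the 1.025 reported for Listening; exact agreement is not expected, since the published means and SDs are rounded. Turning F into a significance value additionally requires the F distribution with (1, 492) degrees of freedom, which this sketch leaves out.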


    TABLE 3
    Pearson Correlations Between Test Scores by Gender

    Accuplacer ESL tests   Listening    Language usage   Reading skills   Sentence meaning   WritePlacer ESL
    Listening              1
    Language Usage         0.660* (F)   1
                           0.728* (M)
    Reading Skills         0.727* (F)   0.744* (F)       1
                           0.758* (M)   0.798* (M)
    Sentence Meaning       0.764* (F)   0.739* (F)       0.811* (F)       1
                           0.813* (M)   0.824* (M)       0.839* (M)
    WritePlacer ESL        0.539* (F)   0.601* (F)       0.643* (F)       0.616* (F)         1
                           0.648* (M)   0.715* (M)       0.733* (M)       0.699* (M)

    Note. F = female; M = male.
    *Correlation is significant at the 0.01 level (2-tailed).
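Each cell in Table 3 is a Pearson correlation between two sets of test scores computed within one gender subgroup. As a minimal sketch of that calculation, the Python below computes Pearson's r for two short score lists. The function name is hypothetical and the scores are made up for illustration; they are not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up scores for six examinees (NOT data from the study).
reading_skills = [62, 75, 81, 90, 97, 104]
sentence_meaning = [58, 79, 77, 88, 101, 99]
print(round(pearson_r(reading_skills, sentence_meaning), 3))
```

Running the same function separately on the female and male subgroups, for every pair of tests, would fill a table shaped like Table 3.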

    DISCUSSION

    In this study, on average, females scored higher than males on the Accuplacer ESL tests, with significant differences in mean scores for three of the five tests: Language Usage, Reading Skills, and Sentence Meaning. Although these differences were statistically significant, based on the standard mean differences and percent differences, they were quite small. These results are akin to those reported in most of the aforementioned studies, with females scoring slightly higher overall on other language tests such as the IELTS (University of Cambridge, 2006), TOEFL iBT (Educational Testing Service, 2007), and MELAB (Johnson & Song, 2008). In terms of skill areas, this study found that the subtests measuring reading and vocabulary skills produced the greatest difference in mean scores (0.207