
Journal CAJLE, Vol. 6 (2004)

Influences of Rater Variables on College Japanese L2 Writing Assessment

SHIBATA Setsue
California State University, Fullerton

sshibata@fullerton.edu


Abstract: Although Japanese L2 writing assessment is critical in evaluating a student's Japanese language skills, there has been little research into the effect of raters' backgrounds on Japanese L2 writing assessment. The present study examines the effects of four rater background variables (academic specialty, teaching experience, attitudes toward composition in general, and attitudes toward the composition he/she is scoring) on the assessment of college students' Japanese L2 writing. Twenty-one college Japanese instructors assessed 15 compositions on both holistic and analytic scales. On the basis of their academic specialty, they were divided into three groups: Linguistics, Literature/Asian Studies, and Education. The results show that academic specialty, teaching experience, and attitudes toward composition in general are not major factors affecting raters' leniency, but raters' personal preferences for the writing they are scoring are one of the major factors that decrease the inter-rater reliability of writing assessment. The study reconfirms the importance of multiple ratings to maintain fairness and accuracy in writing assessment.

Introduction

Writing assessment, which is a critical part of language instruction, serves at least three purposes: program placement, monitoring students' progress, and accountability. Whatever the purpose, accuracy and reliability of rating are the key factors in an accurate assessment, and evidence of an accurate and reliable assessment is consistency of scores among raters (inter-rater reliability). Since a rater's judgment has always played an important role in the assessment of writing, adequate training and better specification of scoring criteria are crucial to minimize raters' bias (Lumley, 2002). However, the substantial variation in rater harshness (or leniency) that exists cannot be easily eliminated due to the nature of human beings (Cason & Cason, 1984; Lumley and McNamara, 1995; Kondo-Brown, 2002).

Although many previous studies have investigated the factors that influence inter-rater differences in L2 writing assessment (e.g., Cumming, Kantor, and Powers, 2002; Lumley, 2002; Song and Caruso, 1996), most of these studies are in the field of ESL/EFL. The purpose of this study is to investigate how rater variables such as educational background, attitude toward composition in general, and preference toward a student's writing that he/she is scoring influence his/her analytic and holistic scoring in the field of Japanese as a foreign or second language (L2).

Previous Studies

The assessment of writing proficiency is an essential part of L2 instruction, but it is far more complex, challenging, and time-consuming than it is for native speakers of the target language. Three types of rating scales are usually used in scoring writing: analytic, primary trait, and holistic. Each has a different purpose and focus in instruction and provides different types of information to teachers and students (Cohen, 1994). Analytic scoring is considered the most appropriate when diagnostic and specific feedback is required, while holistic scoring is used to assess a student's overall performance, particularly in cases where only limited time is available for assessment. Holistic scoring is often used for screening, placement, and accountability, e.g., to see whether students have attained an expected level of proficiency. Holistic scoring is considered less reliable than analytic scoring, since it produces a single score in which the total quality of the writing is not the sum of its components but is viewed as a whole, and it tends to be more influenced by individual raters' characteristics than analytic scoring is (Hamp-Lyons, 1995).

Whatever the purpose of assessment and the type of scoring used, the assessment needs to be performed accurately, consistently among raters, and efficiently within the limited time available. Rater bias can be minimized by rater training or experience, but it is not likely to be eliminated completely due to each individual's unique characteristics (Kondo-Brown, 2002; Lumley, 2002).

Previous studies focusing on raters' characteristics in L2 writing assessment have been conducted, particularly in the field of ESL/EFL. Song and Caruso (1996) compared ESL faculty and English faculty with regard to the results of holistic and analytic evaluations of college essays written by non-native and native English speakers. They found no significant difference between the two groups of faculty in analytic rating, but found that the English faculty were more lenient in holistic rating than the ESL faculty. The study also found that raters with more experience in writing assessment were more lenient in their holistic evaluation. Lumley (2002) investigated the process by which raters make their scoring decisions and found that rating was heavily influenced by the rater's individual intuitive impression of the text on first reading it. Cumming, Kantor, and Powers (2002) found that two groups of raters, ESL/EFL teachers and English teachers of native English speakers, used similar decision-making behaviors while assessing TOEFL essays. They also found that the ESL/EFL teachers focused on language rather than on rhetoric and ideas, while the English teachers were more likely to focus on rhetoric and ideas in their overall assessment.

There are only a limited number of studies on rater variables in the writing assessment of Japanese as L2. Among them, Kondo-Brown (2002) investigated how raters' judgments were biased, focusing on the interaction of raters and types of writing. She analyzed rating scores of college Japanese L2 writings rated by three raters, using the FACETS program. She found that all raters had their own unique bias patterns despite their relatively similar language and professional backgrounds. Her finding supports the necessity of multiple ratings even with a reliable assessment procedure. Tanaka, Tsubone, and Hajikano (1998) examined the differences between Japanese L2 teachers and native Japanese speakers who are not teachers in how the two groups evaluate L2 Japanese compositions using analytic scoring. The ANOVA results indicated that Japanese L2 teachers weighted content and language use, whereas non-teachers weighted content and accuracy, especially the use of particles. It was also reported that teachers scored more leniently overall than non-teachers did.

As for the writing assessment of Japanese as L1 (Kokugo), Ishida and Mori (1985) found a significant difference between elementary school teachers and college students in how they assessed the compositions of Japanese elementary school children. According to their study, teachers focused on language use, while college students paid more attention to the clarity of the theme. The study concluded that teachers' assessment reflected their own educational point of view, a bias that did not apply to the other group.

Kajii (2001) investigated how raters' psychological factors affect their assessment of writing. He analyzed data obtained from 21 elementary school teachers in Japan and found that a rater's personal preference toward the writing that he/she is rating is closely associated with a higher score. According to the report of Kokuritsu Kokugo Kenkyuujo [National Japanese Language Institute] (1978), elementary school children whose homeroom teachers' specialty is Japanese are more likely to have favorable attitudes toward composition. The study implies that teachers' preferences and attitudes toward writing may influence their children's preferences and attitudes toward writing. Brown (1995) examined the effect of raters' background on the oral assessment of Japanese using the Japanese Language Test for Tour Guides. She compared the results of oral assessments of 51 test candidates rated by three groups of raters formed on the basis of occupational background, i.e., a group with guiding experience only, a group with teaching experience only, and a group with both guiding and teaching experience. She also compared two groups of raters, a group of native speakers and a group of near-native speakers of Japanese. The results showed no significant difference among the three occupational groups regarding the consistency of rating, and no significant difference between the native and near-native rater groups regarding the harshness/leniency of rating.

As mentioned earlier, there is only a limited number of studies focusing on rater variables in the assessment of Japanese L2 writing. This study further examines the rater variables that may influence the assessment of Japanese L2 writing.

The Study

Research Questions

In this study, the following questions are addressed:

1. Are raters of a particular academic background harsher or more lenient in assessing Japanese L2 writings?

2. Are raters with more teaching experience harsher or more lenient in assessing Japanese L2 writings?

3. Are raters' personal preferences/attitudes toward composition in general associated with the harshness/leniency of their ratings?

4. Are raters' preferences toward a composition that they are scoring associated with the harshness/leniency of their ratings?
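To make "harshness/leniency" concrete, the sketch below computes one simple index that is assumed here for illustration and is not taken from the study: each rater's mean score minus the grand mean of all scores. The function name, rater IDs, and scores are hypothetical.

```python
from statistics import mean


def leniency_by_rater(scores):
    """Return each rater's mean score minus the grand mean of all scores.

    `scores` maps a (hypothetical) rater ID to that rater's list of scores.
    Positive values suggest relative leniency, negative values relative harshness.
    """
    all_scores = [s for rater_scores in scores.values() for s in rater_scores]
    grand_mean = mean(all_scores)
    return {rater: mean(rater_scores) - grand_mean
            for rater, rater_scores in scores.items()}


# Hypothetical holistic scores (1-10) from three raters on the same four compositions.
ratings = {
    "R1": [6, 7, 5, 6],
    "R2": [7, 8, 6, 7],
    "R3": [5, 6, 4, 5],
}
print(leniency_by_rater(ratings))
```

When raters score different numbers of compositions, this grand mean is weighted toward the raters who scored more; group comparisons such as those in the research questions would then compare these indices (or the raw means) across rater groups.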

Participants (raters of Japanese L2 writings)

Participants were 21 native Japanese language instructors who teach Japanese as L2 at the post-secondary level. All participants had at least a Master's degree, and their academic backgrounds were as follows: seven of the 21 were in Linguistics, eight were in Education (Foreign Language Education, TESOL, and Second Language Acquisition), six were in Literature, and four were in Asian Studies. Their ages ranged from 26 to 58, and their average length of experience teaching Japanese as L2 was 10.1 years. They were asked to answer a variety of survey questions, including length of teaching experience, educational background, and personal preference for writing (on a 1-5 scale where 5 is the highest). They were also asked to assess 15 compositions by both holistic and analytic methods.


Compositions

The compositions were written by 15 college students who were studying Japanese as L2 in second-year or higher Japanese language classes. The students who agreed to participate in the study wrote a composition in the classroom and were given 30 minutes to do so. The compositions were descriptive in nature, and the students chose one of the following topics: 1. The most favorable/unfavorable experience in your life; 2. Introducing your hometown; and 3. Introducing your favorite book or movie.

Data

Independent variables:

1. Raters' academic background: These data were obtained from the survey questionnaire. Participants were divided into three groups based on their academic backgrounds: Linguistics, Education, or Literature/Asian Studies. The four teachers with an Asian Studies background were grouped with the teachers with a Literature background, since they reported having taken more Literature courses than courses in other areas.

2. Teaching experience: The number of years of teaching experience was obtained from the survey questionnaire.

3. The raters' attitudes toward composition in general: These data were obtained from the survey questionnaire. Participants rated their attitude toward composition on a 5-point scale where 5 was 'the most favorite.'

4. The raters' preferences/attitudes toward a composition that they are scoring: Participants were asked to rate their level of preference for each of the Japanese L2 compositions that they were scoring, on a 5-point scale where 5 was 'the most favorite.'

Dependent variables:

Each participant assessed 15 Japanese L2 writings, scoring each composition with two types of scoring, i.e., holistic and analytic rating.

1. Holistic rating score: The ACTFL writing scale (Breiner-Sanders, Swender, and Terry, 2002) was used for the holistic scale. Each proficiency level was converted to a number, from 1 (Novice-Low) to 10 (Superior).

2. Analytic rating score: A modified version of the scale developed by Sasaki and Hirose (1999) was used for the analytic scale. Since the original scale was designed primarily for Japanese L1 writing, rubrics that were considered either inappropriate or not clearly stated for evaluating L2 writing were eliminated. There are five assessment areas, i.e., Content, Organization, Language use and vocabulary, Mechanics/Accuracy, and Appeal to the readers. Each area has two to five specific rubrics, and each rubric is rated on a 5-point scale where 5 means 'strongly agree.' The total score is the sum of the scores of all rubrics, so the highest possible score is 95 (5 x 19 rubrics).
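The two scoring procedures above can be sketched as follows. The ordering of the ten ACTFL sublevels is the standard one from the 2001 writing guidelines; the function names, the hyphenated level labels, and the assumption that each rubric runs from 1 to 5 are illustrative choices rather than details given in the study.

```python
# ACTFL writing sublevels (2001 guidelines), lowest to highest.
# Position in the list + 1 gives the 1 (Novice-Low) to 10 (Superior) holistic score.
ACTFL_LEVELS = [
    "Novice-Low", "Novice-Mid", "Novice-High",
    "Intermediate-Low", "Intermediate-Mid", "Intermediate-High",
    "Advanced-Low", "Advanced-Mid", "Advanced-High",
    "Superior",
]


def holistic_score(level):
    """Convert an ACTFL level label to the 1-10 numeric scale."""
    return ACTFL_LEVELS.index(level) + 1


def analytic_score(rubric_scores, n_rubrics=19):
    """Sum 5-point rubric ratings; 19 rubrics rated 5 give the maximum of 95.

    Assumes (not stated in the study) that each rubric runs from 1 to 5.
    """
    if len(rubric_scores) != n_rubrics:
        raise ValueError(f"expected {n_rubrics} rubric scores, got {len(rubric_scores)}")
    if any(not 1 <= s <= 5 for s in rubric_scores):
        raise ValueError("each rubric must be rated from 1 to 5")
    return sum(rubric_scores)


print(holistic_score("Intermediate-Mid"))  # 5
print(analytic_score([5] * 19))            # 95
```

Validating the rubric count guards against silently comparing totals computed from different versions of the modified scale.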

Results

Table 1

Table of Means and Standard Deviations of Ratings by the Raters' Specialty

                     Linguistics      Education        Literature/Humanities
Group                Mean     SD      Mean     SD      Mean     SD
Analytic (n = 23)    60.85    6.18    59.00    4.20    60.87    4.98
Holistic (n = 23)    6.86     1.01    6.61     1.03    6.10     0.93

The means and standard deviations of the holistic and analytic rating scores given by raters in the three specialty groups are shown in Table 1. As illustrated in the table, the group of raters with an Education background gave the students lower scores (i.e., rated them more harshly) than the other two groups in both types of ratings. A one-way ANOVA was conducted separately for the holistic and analytic ratings to examine the differences among the groups' rating characteristics.
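A one-way ANOVA compares the variation between group means with the variation within groups. The following is a minimal pure-Python sketch of the F statistic, assuming each group is a list of rating scores; the example scores and the function name are hypothetical.

```python
from statistics import mean


def one_way_anova(*groups):
    """Compute one-way ANOVA table entries for two or more groups of scores."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Between-groups variation: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups variation: how far each score lies from its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "df_between": df_between, "ms_between": ms_between,
        "ss_within": ss_within, "df_within": df_within, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Hypothetical analytic scores for three specialty groups.
linguistics = [58, 62, 61, 64, 59]
education = [57, 60, 58, 61, 59]
literature = [60, 63, 59, 62, 61]
print(one_way_anova(linguistics, education, literature))
```

The p-value comes from the F distribution with (df_between, df_within) degrees of freedom; if SciPy is available, `scipy.stats.f_oneway(linguistics, education, literature)` returns the same F statistic together with its p-value.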



Table 2

Analysis of Variance on Raters' Specialty for Analytic and Holistic Ratings

            Sum of Squares   df   Mean Square   F      Sig.
Analytic    33.15            2    16.88         0.66   0.5
Holistic    0.28             2    0.14          0.15   0.87

Table 3

Summary of Means and Analysis of Variance for Analytic Scoring Rubrics

Rubric                                                      Ling.   Edu.    Lit./Human.   F
                                                            (n=7)   (n=8)   (n=10)
1. Content                                                  8.86    8.50    9.30          0.31
  1-1. Is the theme clear?                                  3.00    2.83    3.01          0.07
  1-2. Is the theme supported by sufficient factual
       information?                                         3.14    2.84    3.10          0.35
  1-3. Is the content consistent with the title?            3.51            3.80          1.11
2. Organization                                             7.14    6.50    7.50          0.92
  2-1. Are paragraphs appropriately formed?                 3.57    3.33    3.80          0.66
  2-2. Are all paragraphs logically connected?              3.56    3.11    3.70          0.86
3. Language Use and Vocabulary                              12.00   11.83   12.20         0.06
  3-1. Is word usage correct?                               4.00    3.83    4.11          0.27
  3-2. Are sentences sufficiently short?                    4.14    3.82    4.30          0.69
  3-3. Are sentences adequately connected with
       appropriate use of conjunctions and
       demonstrative words?                                 4.42    4.00    4.40          0.93
4. Mechanics/Accuracy                                       17.86   17.17   16.60         2.32
  4-1. Are the particles used correctly?                    4.42    4.00    4.20          1.81
  4-2. Is the verb/adjective conjugation used correctly?    3.85    3.83    3.60          0.51
  4-3. Is the tense appropriately used?                     4.86    4.61    4.10          1.35
  4-4. Is the grammar correctly used (other than
       4-1 to 4-3)?                                         4.51    4.50    4.50          0.04
  4-5. Are kana/kanji characters written correctly?         4.11    4.98    4.80          0.91
5. Appeal to the Readers                                    9.91    9.66    9.90          1.68
  5-1. Is the handwriting neat?                             4.98    4.83    4.96          0.25
  5-2. Is there a sufficient number of kanji
       characters at this level?                            4.86    4.98    4.90          0.39

As shown in Table 2, there was no significant difference among the three groups of raters in either the holistic or the analytic rating scores with regard to harshness/leniency. An ANOVA was also conducted for each rubric in the five areas of the analytic scoring to examine the differences among the groups. The results show no significant differences among the groups in harshness/leniency on any rubric of the analytic scoring (see Table 3).

Table 4

Correlation Coefficients Matrix Between Variables

Variables         1        2        3        4        5
1. Holistic
2. Analytic       0.53**
3. Experience     -0.10    -0.04
4. Attitude       0.18     -0.11    0.55**
5. Preference     0.63**   0.53**   -0.01    -0.07

Note. **p < .01.


Table 4 shows Pearson correlation coefficients between the variables. As shown in the table, four significant relationships were found: between holistic and analytic ratings, between holistic rating and preference for the writing the rater scored, between analytic rating and preference for the writing the rater scored, and between teaching experience and attitude toward writing in general.
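Pearson's r is the covariance of two variables divided by the product of their standard deviations. A minimal sketch, assuming the paired observations are already aligned (for example, one rater's preference rating and holistic score for the same composition); the data and function name are hypothetical. Python 3.10 and later also provide `statistics.correlation` for the same quantity.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical: one rater's preference (1-5) and holistic score (1-10) for six compositions.
preference = [2, 4, 3, 5, 1, 4]
holistic = [5, 7, 6, 8, 5, 6]
print(round(pearson_r(preference, holistic), 2))
```

How observations are pooled (per rating, per rater, or per composition) changes what a coefficient means, so the pairing should match the unit of analysis used for the matrix.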

Table 5

Summary of Simultaneous Regression Analysis for Analytic Rating

Variable      B        SE B     β
Experience    -0.13    0.14     -0.20
Attitude      0.15     1.02     0.03
Preference             0.85     0.05**

Note. **p < 0.01.

Table 6

Summary of Simultaneous Regression Analysis for Holistic Rating

Variable      B            SE B     β
Experience    -1.64E-02    0.03     -0.14
Attitude      5.97E-02     0.18     0.07
Preference    0.53         0.15     0.63**

Note. **p < 0.01.

A simultaneous regression analysis was conducted to see whether the three independent variables, i.e., raters' teaching experience, attitude toward writing in general, and preference for the writing they scored, influenced the holistic and analytic ratings. As shown in Table 5 and Table 6, the results indicate that a rater's personal preference toward the writing that he/she scored was positively associated with both types of scoring, while the rater's teaching experience and attitude toward writing in general did not affect the scores on either type of rating. It was also found that there is a positive correlation between holistic ratings using the ACTFL scale and analytic ratings using the modified Sasaki and Hirose scale.
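In a simultaneous regression, all predictors are entered into the model at once, so each coefficient reflects a predictor's contribution while the others are held constant. The sketch below solves the ordinary least squares normal equations in pure Python and converts each unstandardized coefficient B into a standardized β (B times the predictor's standard deviation divided by the outcome's standard deviation), the two kinds of coefficients typically reported in regression tables. The function names are illustrative, and the data are hypothetical and noise-free so that the recovered coefficients can be checked by eye.

```python
from statistics import stdev


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def simultaneous_regression(predictors, y):
    """Enter all predictors at once; return (intercept, B coefficients, betas)."""
    k, n = len(predictors), len(y)
    rows = [[1.0] + [predictors[j][i] for j in range(k)] for i in range(n)]  # design matrix
    p = k + 1
    # Normal equations: (X'X) coef = X'y
    xtx = [[sum(row[r] * row[c] for row in rows) for c in range(p)] for r in range(p)]
    xty = [sum(row[r] * yi for row, yi in zip(rows, y)) for r in range(p)]
    coef = solve_linear_system(xtx, xty)
    intercept, b_coefs = coef[0], coef[1:]
    sd_y = stdev(y)
    betas = [b_coefs[j] * stdev(predictors[j]) / sd_y for j in range(k)]
    return intercept, b_coefs, betas


# Hypothetical, noise-free data: y = 1 + 2*x1 - 3*x2 + 0.5*x3.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
x3 = [1, 1, 2, 3, 5, 8]
y = [1 + 2 * u - 3 * v + 0.5 * w for u, v, w in zip(x1, x2, x3)]
intercept, b_coefs, betas = simultaneous_regression([x1, x2, x3], y)
print(round(intercept, 6), [round(v, 6) for v in b_coefs], [round(v, 3) for v in betas])
```

Solving the normal equations directly is adequate for a handful of well-conditioned predictors like these; statistical packages typically use QR or similar decompositions for better numerical stability, and they also report the standard errors and significance tests omitted here.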

Discussion

The results of the current study revealed that the raters' area of academic specialty and number of years of teaching experience did not directly influence their harshness/leniency when rating the college Japanese L2 writings. Brown (1995) also did not find significant differences in oral assessment among the three occupational background groups.

It has been reported that differences in assessing writing were found between ESL/EFL teachers and English teachers (Song and Caruso, 1996), between Japanese L2 teachers and native Japanese speakers with no teaching experience (Tanaka, Tsubone, and Hajikano, 1998), and between Japanese elementary school teachers and Japanese college students (Ishida and Mori, 1985). In these studies, the raters' teaching background, e.g., L1 or L2 teaching, was the focus rather than their specialty. The participants in the current study come from three different academic specialties, but all have L2 teaching experience. Taken together with those studies, the results suggest that it is not the raters' academic specialty but their teaching background, i.e., whether or not they are trained as L2 teachers, that influences their rating characteristics.

According to Song and Caruso (1996), ESL teachers with longer experience assessing writing skills seem to be more lenient in their holistic evaluation. In this study, teaching experience did not have a significant positive relationship with the raters' harshness/leniency. It is possible that longer teaching experience does not automatically mean longer assessment experience. However, since few studies have examined teachers' specialty and writing assessment experience as rater variables in Japanese L2 writing assessment, further studies are needed to verify the results of this study.

It was also found that the raters' attitudes toward writing did not influence their ratings on either the holistic or the analytic scale. As Kokuritsu Kokugo Kenkyuujo (1978) suggested, instructors' positive attitudes toward writing are usually a factor that promotes students' motivation and desire to learn Japanese, but the current study indicates that these attitudes did not directly affect the instructors' assessment. Whether or not raters rated their attitude toward writing as 'the most favorite' was not a significant factor in their harshness/leniency in assessing Japanese L2 writings.

It was also revealed in this study that raters' preferences toward the writing they scored affected their ratings, i.e., raters tended to give higher scores to the writings they felt most positive about. In other words, personal preference toward the writing being scored makes the assessment rather subjective and, as a result, can be a source of rater bias. The study supports Lumley (2002), who concluded that rating is heavily influenced by the complex intuitive impression of the text obtained when raters first read it. What gives a rater positive feelings/attitudes toward a writing differs from individual to individual: some weigh the rhetoric of the text, some the theme and content, and others the handwriting and neatness. Such possible interactions between a composition and a rater are beyond the scope of the current study but need to be explored in future research.

Although the causal relationship is unknown, teachers with longer teaching experience tended to have more positive attitudes toward writing in this study. It may be that teachers who like to "write" tend to stay longer in language teaching positions, or that they develop a more positive attitude toward writing as their teaching careers become longer. This should also be verified in future research with a much larger sample.

The purpose of this study was not to examine the reliability of the assessment tools, but intra-rater reliability was found between the two types of rating scales, the ACTFL rating scale and the modified Sasaki and Hirose analytic rating scale.

Conclusion and Implications

Assessment is a critical part of Japanese language instruction, and it requires accuracy and consistency. Adequate rater training and clear rating criteria can reduce biases among raters, but it is still not possible to obtain perfect accuracy and consistency because assessment is conducted by human beings. All raters have their own unique bias patterns, and these patterns, along with the factors that produce individual differences in assessment, are very complicated and variable.

In this study, it was found that raters' academic background, number of years of teaching experience, and attitudes toward writing were not factors influencing their rating characteristics, i.e., harshness/leniency in rating Japanese L2 writing. It was also found that raters' attitudes toward the writing they are scoring make the assessment rather subjective, i.e., raters tend to give higher scores when they have positive feelings toward the writing they are scoring.

It can be concluded that the factors that cause rater bias are individualized and complex in nature, associated more with the individual rater and his/her personality than with the writing being scored. The mechanism of such interactions between raters and writing texts should be explored in future research. Further exploration of similarities and differences among raters in assessment will likely yield a more complete and accurate understanding of the nature of assessment in Japanese L2 writing.

Acknowledgments

The author wishes to acknowledge the Faculty Development Center of California State University, Fullerton, which provided funding for this study.

References

Breiner-Sanders, K. E., Swender, E., & Terry, R. M. (2002). Preliminary proficiency guidelines: Writing revised 2001. Foreign Language Annals, 35, 9-15.

Brown, A. (1995). The effect of rater variables in the development of an occupation-specific language performance test. Language Testing, 12, 1-15.

Cason, G. J., & Cason, C. L. (1984). A deterministic theory of clinical performance rating. Evaluation and the Health Professions, 7, 221-241.


Cohen, A. D. (1994). Assessing language ability in the classroom. Boston, MA: Heinle & Heinle.

Cumming, A., Kantor, R., & Powers, D. E. (2002). Decision making while rating ESL/EFL writing tasks: A descriptive framework. The Modern Language Journal, 86, 67-96.

Hamp-Lyons, L. (1995). Rating nonnative writing: The trouble with holistic scoring. TESOL Quarterly, 29, 759-762.

Ishida, J., & Mori, T. (1985). Shougakusei no bunshou-hyougen-ryoku no hattatsu-teki henka [Changes in the development of writing skills among elementary school children]. Hiroshima University, Department of Education Bulletin, 125-131.

Kajii, Y. (2001). Jidou no sakubun wa dono you ni hyouka sareru noka? [How do teachers evaluate elementary school children's compositions?]. Shinrigaku Kenkyuu [Journal of Educational Psychology], 49, 480-490.

Kokuritsu Kokugo Kenkyuujo [National Japanese Language Institute]. (1978). Jidou no hyougen-ryoku to sakubun [Children's self-expression and composition skills]. Tokyo: Tokyo Shoseki Publishing.

Kondo-Brown, K. (2002). A FACETS analysis of rater bias in measuring Japanese second language writing performance. Language Testing, 19, 3-29.

Lumley, T. (2002). Assessment criteria in a large-scale writing test: What do they really mean to the raters? Language Testing, 19, 246-276.

Lumley, T., & McNamara, T. F. (1995). Rater characteristics and rater bias: Implications for training. Language Testing, 12, 54-71.

Sasaki, M., & Hirose, K. (1999). Development of an analytic rating scale for Japanese L1 writing. Language Testing, 16, 457-478.

Song, B., & Caruso, I. (1996). Do English and ESL faculty differ in evaluating the essays of native English-speaking and ESL students? Journal of Second Language Writing, 5, 163-182.

Tanaka, M., Tsubone, Y., & Hajikano, A. (1998). Daini-gengo toshite no nihongo ni okeru sakubun hyouka kijun: Nihongo kyoushi to ippan nihon-jin no hikaku [Evaluation criteria for Japanese L2 writing: A comparison between Japanese language teachers and the general public]. Nihongo Kyouiku [Journal of Japanese Language Education], 96, 1-12.
