Journal CAJLE, Vol. 6 (2004)
Influences of Rater Variables on College Japanese L2 Writing Assessment
SHIBATA Setsue
California State University, Fullerton
sshibata@fullerton.edu
Abstract: Although Japanese L2 writing assessment is critical in evaluating a
student's Japanese language skills, there has been little research into the
effect of raters' backgrounds on Japanese L2 writing assessment. The present
study examines the effects of four variables of the rater's background:
academic specialty, teaching experience, attitudes toward composition in
general, and attitudes toward the composition he/she is scoring as part of
college students' Japanese L2 writing assessment. Twenty-one college Japanese
instructors assessed 15 compositions on both holistic and analytic scales. On
the basis of their academic specialty, they were divided into 3 groups:
Linguistics, Literature/Asian Studies, and Education. The results show that
academic specialty, teaching experience, and attitudes toward composition in
general are not major factors affecting the raters' leniency, but their
personal preferences for the writing they are scoring are one of the major
factors that decrease inter-rater reliability of writing assessment. The study
reconfirms the importance of multiple ratings to maintain fairness and
accuracy in writing assessment.
Introduction
Writing assessment, which is a critical part of language instruction,
meets at least three purposes: program placement, monitoring students' progress,
and accountability. Whatever the purpose, accuracy and reliability of rating are
the key factors in an accurate assessment, and proof of an accurate and reliable
assessment is consistency of scores among raters (inter-rater reliability). Since a
rater's judgment has always played an important role in the assessment of
writing, adequate training and better specification of scoring criteria are crucial
to minimize the raters' bias (Lumley, 2002). However, the substantial variation in
rater harshness (or leniency) that exists cannot be easily eliminated due to the
nature of human beings (Cason & Cason, 1984; Lumley and McNamara, 1995;
Kondo-Brown, 2002).
Although many previous studies have investigated the factors that influence
inter-rater differences in L2 writing assessment (e.g., Cumming, Kantor, and
Powers, 2002; Lumley, 2002; Song and Caruso, 1996), most of these studies are
in the field of ESL/EFL. The purpose of this study is to investigate how rater
variables such as educational background, attitude towards composition in
general, and preference towards a student's writing that he/she is scoring
influence his/her analytic and holistic scoring in the field of Japanese as a foreign
or second language (L2).
Previous Studies
The assessment of writing proficiency is an essential part of L2 instruction,
but is far more complex, challenging, and time consuming than with native
speakers of the target language. Three types of rating scales are usually used in
scoring writing: analytic, primary trait, and holistic. Each has a
different purpose and focus in instruction and will provide different types of
information to teachers and students (Cohen, 1994). Analytic scoring is
considered the most appropriate when diagnostic and specific feedback is
required, while holistic scoring is used to assess a student's overall performance,
particularly in cases where only a limited time is available for assessment.
Holistic scoring is often used in cases of screening, placement, and accountability,
e.g., to see if students have attained a relative expected level of proficiency.
Holistic scoring is considered less reliable than analytic scoring, since it
produces a single score in which the total quality of writing is not the sum of its
components, but is viewed as a whole and tends to be more influenced by
individual raters' characteristics than analytic scoring (Hamp-Lyons, 1995).
Whatever the purpose of assessment and type of scoring used, the assessment
needs to be performed accurately, consistently among raters, and effectively
within the limited time available. Rater bias can be minimized by rater training
or experience, but is not likely to be eliminated completely due to an individual's
unique characteristics (Kondo-Brown, 2002; Lumley, 2002).
Previous studies have focused on the effects of raters'
characteristics on L2 writing assessment, particularly in the field of ESL/EFL.
Song and Caruso (1996) compared ESL faculty and English faculty regarding the
results of holistic and analytic evaluations of college students' essays written by
non-native and native English speakers. They found no significant difference
between the two groups of faculty on analytic rating, but found that the English
faculty was more lenient on holistic rating than the ESL faculty. The study also
found that with more experience in writing assessment, raters became more
lenient in their holistic evaluation. Lumley (2002) investigated the process by
which raters make their scoring decisions and found that the rating was heavily
influenced by the individual intuitive impression of the text obtained when a
rater first read it. Cumming, Kantor, and Powers (2002) found that the two
groups of ESL/EFL teachers and English teachers for native English speakers
used similar decision-making behaviors while assessing the TOEFL essays. They
also found that the ESL/EFL teachers focused on language rather than on rhetoric
and ideas, while the English teachers were more likely to focus on rhetoric and
ideas in their overall assessment.
There are a number of studies on rater variables in the field of writing
assessment of Japanese as L2. Among them, Kondo-Brown (2002) investigated
how judgments of raters were biased, focusing on the interaction of raters and
types of writing. She analyzed the rating scores of college Japanese L2
writings rated by three raters using the FACETS program. She found that all raters
had their own unique bias patterns regardless of their relatively similar language
and professional backgrounds. Her finding supports the necessity of multiple
ratings even with a reliable assessment procedure. Tanaka, Tsubone, and
Hajikano (1998) examined the differences between Japanese L2 teachers and
non-teachers who were native Japanese speakers regarding how the groups evaluate L2
Japanese compositions using analytic scoring. The ANOVA results indicated that
Japanese L2 teachers placed weight on content and language use, and
non-teachers placed weight on content and accuracy, especially the accuracy of
particles. It was also reported that teachers scored more leniently overall than
non-teachers did.
As for the writing assessment of Japanese as L1 (Kokugo), Ishida and
Mori (1985) found that there was a significant difference between elementary
school teachers and college students regarding how they assessed the
compositions of Japanese elementary school children. According to their study,
teachers focused on language use while college students paid more attention to
clearness of the theme. The study concluded that teachers' assessment reflected
their own educational point of view, a bias that did not apply to the other group.
Kajii (2001) investigated how the raters' psychological factors affect their
assessment of writing. He analyzed the data obtained on 21 elementary school
teachers in Japan, and found that the rater's personal preference towards a
writing that he/she is rating is closely associated with a higher score. According
to the report of Kokuritsu Kokugo Kenkyuujo [National Japanese Language
Institute] (1978), elementary school children whose homeroom teachers'
specialty is Japanese are more likely to have favorable attitudes towards
composition. The study implies that the teachers' preferences and attitudes
towards writing may influence their children's preferences and attitudes toward
writing. Brown (1995) examined the effect of the raters' background on the oral
assessment of Japanese using the Japanese Language Test for Tour Guides. She
compared the results of oral assessment taken by 51 test candidates rated by
three groups of raters, based on their occupational background, i.e., a group that
has guiding experience only, a group that has teaching experience only, and a
group that has both guiding and teaching experience. She also compared the
differences between two groups of raters, a group of native speakers and a group
of near-native speakers of Japanese. The results showed no significant difference
among the three groups of occupational background regarding the consistency of
rating, and showed no significant difference between native and near-native
raters' groups regarding harshness/leniency of rating.
As mentioned earlier, there is only a limited number of studies focusing
on rater variables in the assessment of Japanese L2 writing. This study further
examines the rater variables that may influence assessment of Japanese L2
writing.
The Study
Research Questions
In this study, the following questions are addressed.
1. Are raters of a particular academic background more harsh/lenient in
assessing Japanese L2 writings?
2. Are raters with more teaching experience more harsh/lenient in assessing
Japanese L2 writings?
3. Are raters' personal preferences/attitudes toward composition in general
associated with harshness/leniency of their ratings?
4. Are raters' preferences toward a composition that they are scoring
associated with harshness/leniency of their ratings?
Participants (raters of Japanese L2 writings)
Participants were 21 native Japanese language instructors who teach
Japanese as L2 at the post-secondary level. All participants had at least a
Master's degree, and their academic backgrounds were the following: seven of
21 were Linguistics, eight were Education (Foreign Language Education,
TESOL, and Second Language Acquisition), six were Literature, and four were
Asian Studies. Their ages ranged from 26 to 58, and the average length of
teaching Japanese as L2 was 10.1 years. They were asked to answer a variety of
survey questions including length of teaching experience, educational
background, and personal preference for writing (1-5 scale where 5 is the highest).
They were also asked to assess 15 compositions by both holistic and analytic
methods.
Compositions
The compositions were written by 15 college students who studied
Japanese as L2 and were in second-year or higher Japanese language classes.
The students who agreed to participate in the study were asked to write a
composition in the classroom. They were given 30 minutes to write a
composition. Compositions were descriptive in nature, and the students chose
one topic from the following: 1. The most favorite/unfavorable experience in
your life; 2. Introducing your hometown; and 3. Introducing your favorite book
or movie.
Data
Independent variables:
1. Raters' academic background: This data was obtained from the survey
questionnaire. Participants were divided into three groups based on their
academic backgrounds of either Linguistics, Education, or Literature/Asian
Studies. Four teachers with an Asian Studies background were placed in the
same group as teachers with a Literature background, since they reported that
they took more Literature courses than courses in other areas.
2. Teaching experience: Number of years of teaching experience was obtained
from the survey questionnaire.
3. The raters' attitudes toward composition in general: The data was obtained
from the survey questionnaire. Participants registered their
attitudes toward composition on a 5-point scale where 5 was 'the most
favorite.'
4. The raters' preference/attitudes toward a composition that they are scoring:
Participants were asked their level of preference for each of the Japanese L2
compositions that they were scoring. Rating was based on a 5-point scale
where 5 was 'the most favorite.'
Dependent variables:
Each participant assessed 15 Japanese L2 writings. He/she assessed each
composition by two types of scoring, i.e., holistic and analytic rating.
1. Holistic rating score: The ACTFL writing scale (Breiner-Sanders, Swender, and
Terry, 2002) was used for the holistic scale. Level of proficiency is
converted to numerical values, from 1 (Novice-low) to 10 (Superior).
2. Analytic rating score: A modified scale developed by Sasaki and Hirose
(1999) was used for the analytic scale. Since the scale is primarily for
Japanese L1 writings, rubrics that were considered either not appropriate
or not clearly stated for evaluating an L2 writing were eliminated. There are
five assessment areas, i.e., Content, Organization, Language use and
vocabulary, Mechanics/Accuracy, and Appeal to the readers. Each area has
two to five specific rubrics, and the scale of each rubric is a 5-point scale,
where 5 is 'strongly agree.' The total score is the sum of the scores of all
rubrics. The highest score is 95 (5 x 19 rubrics).
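The analytic total described in item 2 can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the 19 rubrics and the 5-point scale come from the text, while the lower bound of 1, the function name `analytic_total`, and the example ratings are assumptions made for the sketch and are not data from the study.

```python
# Sketch of the analytic total: 19 rubrics, each rated on a 5-point scale.
N_RUBRICS = 19                 # from the text: "5 x 19 rubrics"
SCALE_MIN, SCALE_MAX = 1, 5    # assumption: the 5-point scale runs from 1 to 5


def analytic_total(rubric_scores):
    """Return the analytic rating score as the plain sum of all rubric scores."""
    if len(rubric_scores) != N_RUBRICS:
        raise ValueError(f"expected {N_RUBRICS} rubric scores, got {len(rubric_scores)}")
    for score in rubric_scores:
        if not SCALE_MIN <= score <= SCALE_MAX:
            raise ValueError(f"rubric score {score} is outside the 5-point scale")
    return sum(rubric_scores)


# Hypothetical ratings from one rater for one composition (not study data).
example_scores = [3, 3, 4, 4, 3, 4, 4, 4, 4, 4, 5, 5, 4, 5, 5, 3, 3, 4, 4]
example_total = analytic_total(example_scores)

# The ceiling is reached when every rubric receives the top rating.
max_total = analytic_total([SCALE_MAX] * N_RUBRICS)
print(example_total, max_total)
```

Because the total is an unweighted sum, areas with more rubrics contribute more to the analytic score than areas with fewer rubrics.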
Results
Table 1
Table of Means and Standard Deviations of Ratings by the Raters' Specialty
Group              Linguistics       Education         Literature/Humanities
                   Mean     SD       Mean     SD       Mean     SD
Analytic (n=23)    60.85    6.18     59.00    4.20     60.87    4.98
Holistic (n=23)    6.86     1.01     6.61     1.03     6.10     0.93
The means and standard deviations of the holistic and analytic rating scores
given by raters of the three specialty groups are shown in
Table 1. As illustrated in the table, the group of raters with an Education
background gave the students lower scores (i.e., rated them more harshly) than the
other two groups in both types of ratings. One-way ANOVA was conducted for
holistic rating and analytic rating separately to examine the differences among
the groups' rating characteristics.
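For readers unfamiliar with the procedure, the following is a minimal sketch of how a one-way ANOVA F statistic is computed for three rater groups. The function name `one_way_anova` and the scores are hypothetical and chosen only to keep the arithmetic easy to follow; the study's raw ratings are not reproduced here. Each mean square is a sum of squares divided by its degrees of freedom, and F is the between-groups mean square divided by the within-groups mean square.

```python
def one_way_anova(groups):
    """Return (ss_between, df_between, ss_within, df_within, f_stat)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        # Between-groups: distance of each group mean from the grand mean.
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        # Within-groups: spread of the scores around their own group mean.
        ss_within += sum((v - group_mean) ** 2 for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, df_between, ss_within, df_within, f_stat


# Hypothetical mean holistic scores per rater, grouped by specialty.
linguistics = [6.0, 7.0, 8.0]
education = [5.0, 6.0, 7.0]
literature = [6.0, 7.0, 8.0]

ss_b, df_b, ss_w, df_w, f_stat = one_way_anova([linguistics, education, literature])
print(ss_b, df_b, ss_w, df_w, f_stat)
```

The significance level additionally requires the F distribution with (df_between, df_within) degrees of freedom; a statistics package such as SciPy (`scipy.stats.f_oneway`) reports it directly.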
Table 2
Analysis of Variance on Raters' Specialty for Analytic and Holistic Ratings
            Sum of Squares    df    Mean Square    F       Sig.
Analytic    33.15             2     16.88          0.66    0.5
Holistic    0.28              2     0.14           0.15    0.87
Table 3
Summary of Means and Analysis of Variance for Analytic Scoring Rubric
Rubric                                              Ling.        Edu.         Lit./Human.  F
                                                    (n=7)        (n=8)        (n=10)
1. Content                                          8.86         8.50         9.30         0.31
  1-1. Is the theme clear?                          3.00         2.83         3.01         0.07
  1-2. Is the theme supported by sufficient
       factual information?                         3.14         2.84         3.10         0.35
  1-3. Is the content consistent with the title?    3.51         [illegible]  3.80         1.11
2. Organization                                     7.14         6.50         7.50         0.92
  2-1. Are paragraphs appropriately formed?         3.57         3.33         3.80         0.66
  2-2. Are all paragraphs logically connected?      3.56         3.11         3.70         0.86
3. Language Use and Vocabulary                      12.00        11.83        12.20        0.06
  3-1. Is word usage correct?                       4.00         3.83         4.11         0.27
  3-2. Are sentences sufficiently short?            4.14         3.82         4.30         0.69
  3-3. Are sentences adequately connected with
       appropriate use of conjunction and
       demonstrative words?                         4.42         4.00         4.40         0.93
4. Mechanics/Accuracy                               17.86        17.17        16.60        2.32
  4-1. Are the particles used correctly?            4.42         4.00         4.20         1.81
  4-2. Is the verb/adjective conjugation used
       correctly?                                   3.85         3.83         3.60         0.51
  4-3. Is the tense appropriately used?             4.86         4.61         4.10         1.35
  4-4. Is the grammar correctly used (other than
       4-1 to 4-3)?                                 4.51         4.50         4.50         0.04
  4-5. Is kana/kanji character written correctly?   [illegible]  4.98         4.80         0.91
5. Appeal to the Readers                            9.91         9.66         9.90         1.68
  5-1. Is the handwriting neat?                     4.98         4.83         4.96         0.25
  5-2. Is there a sufficient amount of kanji
       characters at this level?                    4.86         4.98         4.90         0.39
As shown in Table 2, there was no significant difference among the three
groups of raters in either the holistic rating scores or the analytic rating scores
with regard to harshness/leniency. ANOVA was also conducted for each rubric
of the five areas of the analytic scoring to examine the differences among the groups.
The result shows that there were no significant differences among groups in
their harshness/leniency in any rubric of analytic scoring (see Table 3).
Table 4
Correlation Coefficients Matrix Between Variables
Variables        1       2         3        4         5
1. Holistic      -       0.53**    -0.10    -0.04     0.63**
2. Analytic              -         0.18     -0.11     0.55**
3. Experience                      -        0.53**    -0.01
4. Attitude                                 -         -0.07
5. Preference                                         -
Note. **p<.01.
Table 4 shows Pearson's correlation coefficients between variables. As
shown in the table, four significant relationships were found: between holistic and
analytic ratings, between holistic rating and preference for the writing that he/she
scored, between analytic rating and preference for the writing that he/she scored, and
between teaching experience and attitude towards writing in general.
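As a reminder of what each cell in a correlation matrix represents, here is a minimal sketch of Pearson's r for one pair of variables. The function name `pearson_r` and the five paired values are hypothetical and serve only to illustrate the formula; they are not the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical data: a rater's preference (1-5) for five compositions and the
# holistic score (1-10) the same rater gave to each of them.
preference = [1, 2, 3, 4, 5]
holistic = [4, 5, 6, 8, 7]

r = pearson_r(preference, holistic)
print(round(r, 2))
```

A value near +1 means that compositions the rater liked more also tended to receive higher scores; a value near 0 means no linear relationship.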
Table 5
Summary of Simultaneous Regression Analysis for Analytic Rating
Variable      B              SE B    β
Experience    -0.13          0.14    -0.20
Attitude      0.15           1.02    0.03
Preference    [illegible]    0.85    0.05**
Note. **p<0.01.
Table 6
Summary of Simultaneous Regression Analysis for Holistic Rating
Variable      B            SE B    β
Experience    -1.64E-02    0.03    -0.14
Attitude      5.97E-02     0.18    0.07
Preference    0.53         0.15    0.63**
Note. **p<0.01.
Simultaneous regression analysis was conducted to see if the three
independent variables, i.e., raters' teaching experience, attitude towards writing
in general, and raters' preference for the writing that they scored, influenced the
holistic rating and analytic rating. As shown in Table 5 and Table 6, the results
indicate that a rater's personal preference towards the writing that he/she scored
was positively associated with both types of scoring, while the rater's teaching
experience and his/her attitude toward writing in general did not affect their rating
scores on either type of rating. It was also found that there is a positive
correlation between the holistic ratings rated by the ACTFL scale and the
analytic ratings using the modified Sasaki's analytic scale.
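In a simultaneous regression, all predictors are entered into the model in a single step, so each coefficient reflects a predictor's contribution with the others held constant. The sketch below fits such a model by ordinary least squares through the normal equations. The function names and all data values are hypothetical: the outcome is constructed as an exact linear function of the predictors so that the recovered coefficients are easy to check, and the code returns unstandardized B values only, not the standardized β or significance tests reported in the tables.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; system is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def simultaneous_regression(predictors, outcome):
    """Fit outcome = b0 + b1*x1 + ... + bk*xk with all predictors entered at once.

    predictors: one row of predictor values per observation.
    Returns [b0, b1, ..., bk] (intercept first, unstandardized).
    """
    rows = [[1.0] + [float(v) for v in p] for p in predictors]  # intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical observations: (years of experience, attitude 1-5, preference 1-5).
observations = [(2, 3, 1), (5, 4, 2), (10, 5, 3), (15, 2, 4), (20, 4, 5), (8, 1, 2)]
# Constructed outcome: 2.0 + 0.1 * experience + 0.0 * attitude + 1.0 * preference.
scores = [2.0 + 0.1 * e + 0.0 * a + 1.0 * p for e, a, p in observations]

coefficients = simultaneous_regression(observations, scores)
print([round(c, 3) for c in coefficients])
```

With real ratings the fit would not be exact, and a statistics package would normally be used to obtain standard errors and significance levels.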
Discussion
The results of the current study revealed that the raters' area of academic
specialty and number of years of teaching experience did not directly influence
their harshness/leniency when rating the college Japanese L2 writings. Brown's
study (1995) also did not find significant differences in oral assessment among
the three groups of occupational backgrounds.
It is reported that differences in assessing writing were found between
ESL/EFL teachers and English teachers (Song and Caruso, 1996), between
Japanese L2 teachers and Japanese with no teaching experience (Tanaka,
Tsubone, and Hajikano, 1998), and between Japanese elementary school teachers and
Japanese college students (Ishida and Mori, 1985). In these studies, raters'
teaching background, i.e., either L1 or L2, was the focus, rather than their
specialty. The participants in the current study are from three different
academic specialties but all have L2 teaching experience. Taken together with
those studies, the results suggest that it is not the raters' academic specialty
but the raters' teaching background, i.e., whether or not they were trained as
L2 teachers, that influences their rating characteristics.
According to Song and Caruso (1996), ESL teachers with longer experience
assessing writing skills seem to be more lenient in their holistic evaluation. In
this study, teaching experience did not have a significant positive relationship
with their harshness/leniency in their rating. It is possible that longer teaching
experience does not automatically mean longer assessment experience. However,
since few studies have been conducted on teachers' specialty and writing
assessment experience as a rater variable in Japanese L2 writing assessment,
further studies are needed to verify the results of this study.
It was also found that the raters' attitudes toward writing did not influence
their ratings on either the holistic or the analytic scale. As Kokuritsu Kokugo
Kenkyuujo suggested (1978), the instructors' positive attitudes toward writing
are usually a factor that promotes the students' motivation and desire to learn
Japanese, but the current study indicates it did not affect their assessment directly.
Whether attitudes toward writing are registered as 'the most favorite' or not for
the raters was not a significant factor in their harshness/leniency in the
assessment of Japanese L2 writings.
It was also revealed in this study that raters' preferences toward a writing
they scored affected their ratings, i.e., raters tended to give a higher score to the
writing that they felt the most positive about. In other words, personal preference
toward a writing they scored makes the assessment rather subjective, and as a
result can be a factor in rater bias. The study supports Lumley (2002), who
concluded that rating is heavily influenced by the complex intuitive impression
of the text obtained when the raters first read it. What makes the raters have
positive feelings/attitudes toward a writing differs from individual to
individual; some place weight on the rhetoric of the text, some on the theme and content,
and others on handwriting and neatness. Such possible interactions between a
composition and a rater are beyond the scope of the current study, but need
to be explored in future research.
Although a causal relationship is unknown, it was found that teachers
with longer teaching experience tended to have positive attitudes toward writing
in this study. Teachers who like to "write" may tend to stay longer in language
teaching positions, or they may develop a more positive attitude toward writing as
their teaching careers become longer. This should also be verified in future
research with a much larger sample.
The purpose of this study was not to examine the reliability of the
assessment tools, but intra-rater reliability was found between two types of rating
scales, the ACTFL rating scale and the modified Sasaki's analytic rating scale.
Conclusion and Implications
Assessment is a critical part of Japanese language instruction, and it
requires accuracy and consistency. Adequate rater training and clear criteria for
the rating scale can eliminate biases among raters, but it is still not possible to obtain
perfect accuracy and consistency because the assessment is conducted by a
human being. All raters have their own unique bias patterns, and such patterns
and factors that make individual differences in assessment are very complicated
and variable.
In this study, it was found that raters' academic background, number of
years of teaching experience, and their attitudes toward writing were not the
factors influencing their rating characteristics, i.e., harshness/leniency on
Japanese L2 writing. It was also found that the raters' attitudes toward a writing
that they are scoring make the assessment rather subjective, i.e., raters tend to
score higher when they have a positive feeling toward the writing they are
scoring.
It can be concluded that the factors that cause rater bias are more
individualized and complex in nature, associated with the individual and their
personality rather than the writing that they are scoring. Such mechanisms of
interaction between raters and written texts should be explored in future
research. Further exploration of similarities and differences among raters in
assessment will likely yield a better understanding of the nature of assessment,
and a more complete and accurate understanding of the phenomenon of assessment
in the Japanese L2 writing process.
Acknowledgments
The author wishes to acknowledge the Faculty Development Center of California State
University, Fullerton, which provided funding for this study.
References
Breiner-Sanders, K. E., Swender, E., & Terry, R. M. (2002). Preliminary
proficiency guidelines: Writing revised 2001. Foreign Language Annals,
35, 9-15.
Brown, A. (1995). The effect of rater variables in the development of
an occupation-specific language performance test. Language Testing, 12,
1-15.
Cason, G. J. & Cason, C. L. (1984). A deterministic theory of clinical
performance rating. Evaluation and the Health Professions, 7, 221-241.
Cohen, A. D. (1994). Assessing language ability in the classroom. Boston, MA:
Heinle & Heinle.
Cumming, A., Kantor, R., & Powers, D. E. (2002). Decision making while rating
ESL/EFL writing tasks: A descriptive framework. The Modern Language
Journal, 86, 67-96.
Hamp-Lyons, L. (1995). Rating nonnative writing: The trouble with holistic
scoring. TESOL Quarterly, 29, 759-762.
Ishida, J. & Mori, T. (1985). Shougakusei no bunshou-hyougen-ryoku
no hattatsu-teki henka [Changes in the development of writing skills
among elementary school children]. Hiroshima University, Department of
Education Bulletin, [volume illegible], 125-131.
Kajii, Y. (2001). Jidou no sakubun wa dono you ni hyouka sareru noka? [How do
teachers evaluate elementary school children's composition?]. Shinrigaku
Kenkyuu [Journal of Educational Psychology], 49, 480-490.
Kokuritsu Kokugo Kenkyuujo [National Japanese Language Institute]. (1978).
Jidou no hyougen-ryoku to sakubun [Children's self-expression and
composition skills]. Tokyo: Tokyo Shoseki Publishing.
Kondo-Brown, K. (2002). A FACETS analysis of rater bias in measuring
Japanese second language writing performance. Language Testing, 19,
3-29.
Lumley, T. (2002). Assessment criteria in a large-scale writing test: What do they
really mean to the raters? Language Testing, 19, 246-276.
Lumley, T. & McNamara, T. F. (1995). Rater characteristics and rater bias:
Implications for training. Language Testing, 12, 54-71.
Sasaki, M. & Hirose, K. (1999). Development of an analytic rating scale for
Japanese L1 writing. Language Testing, 16, 457-478.
Song, B. & Caruso, I. (1996). Do English and ESL faculty differ in evaluating
the essays of native English-speaking and ESL students? Journal of
Second Language Writing, 5, 163-182.
Tanaka, M., Tsubone, Y., & Hajikano, A. (1998). Daini-gengo toshite no nihongo
ni okeru sakubun hyouka kijun: Nihongo kyoushi to ippan nihon-jin no
hikaku [Evaluation criteria for Japanese L2 writing: A comparison
between Japanese language teachers and the general public]. Nihongo
Kyouiku [Journal of Japanese Language Education], 96, 1-12.