PBA Implications



By Sheila Scott

Sheila Scott is associate professor of music at Brandon University in Brandon, Manitoba, Canada.

Most classroom teachers do not think much about the reliability of the assessments they use. Performance-based assessments, in particular, often seem very subjective. Teachers may wonder what reliability really is and how they can improve the reliability of the assessments they use.

This article traces a process of discovering how the reliability of assessment scores can be improved, told from the perspective of a general music teacher. While this article is written from her point of view, her experiences are a combination of the author's experiences, the experiences of other teachers, and events that may be typical of elementary music classrooms.

My name is Anita. I'm a general music teacher in an elementary school. I use performance-based assessments to document a student's attainment of musical skills and knowledge. I use the National Standards for Arts Education (Consortium of National Arts Education Associations 1994) as a basis for developing my curriculum and draw on publications from MENC to develop performance-based assessments that provide information about what students are able to do as a result of instruction. I recently attended a state-sponsored workshop on the benefits of performance-based assessment. The presentation covered the technical features of performance-based assessments, including an in-depth examination of reliability. I was, however, left with two unanswered questions: (1) What should I, as a general music teacher, know about reliability in performance-based assessment to help me in my everyday work in the classroom? (2) How can this information help me obtain meaningful information about the music skills and knowledge my students acquire as a result of instruction?

To understand this issue, I examined two aspects of reliability: (1) What is reliability in performance-based assessments? (2) How can music teachers enhance the reliability of the scores obtained from the performance-based assessments used to evaluate their students' work?

I examined selected literature in the area of student assessment and applied recommendations from these sources to my classroom practice. Here is what I learned.

Reliability and Performance-Based Assessments

The reliability of performance-based assessments refers to the consistency of scores obtained by using an observational measure such as a checklist or rubric to obtain information about what a student is able to do as a result of instruction. As noted by Popham (1999):

Reliability is a central concept in measurement . . . if an assessment procedure fails to yield consistent results, it is almost impossible to make any accurate inferences about what an examinee's score signifies. Inconsistent measurement is, indeed, unreliable. (p. 32)

In terms of my own practice, this means that when evaluating the consistency of assessments, I need to have confidence that the score assigned is based on the student's performance on the intended task and not on conditions irrelevant to the performance of this task.


For example, was Lynn's critique of the symphony performance graded lower than expected because of her faulty use of grammar? Did Andrew perform the ostinato incorrectly because the assessment was administered on a Friday afternoon? Were the scores Lucy earned on the singing rubric lower than those from last year because her former music teacher was a lenient rater? Did Kevin do better than expected on the composition assignment because he was working with a group of musically gifted students?

I also learned that there are many different types of reliability. For example, if I use a 5-point scale to assess my students' performances of the song "Over My Head" in September and repeat this assessment one month later, the results would be reliable across this period of time (Nitko 1996). If another music teacher rates these performances using the same scale, a comparison of our scores would provide a measure of reliability across different raters. Using a third example, if I assess my students' performances of the song "Over My Head" in September and one month later use the same rubric to assess their ability to sing the song "Old Brass Wagon," the scores would be "reliable with respect to equivalent versions of the same task" (p. 63). As these examples illustrate, reliability in assessment represents different forms of consistency: consistency over time, consistency with different raters, and consistency across similar tasks.

How Can Music Teachers Enhance the Reliability of Assessments?

My concern for the reliability of performance-based assessments stems from my need for confidence in the accuracy of the scores obtained by using these measures (Nitko 1996). What does a score of 5 on a 5-point scale really mean? If more than one teacher assesses the performance, what does each score signify if the judges disagree?

If performances scored at the beginning of a marking session are rated differently from performances scored at the end of this session, what does the score assigned to each student really mean? (Herman, Aschbacher, and Winters 1992). As I thought about these questions, I wondered whether it was possible for assessment results based on subjective judgment to be reliable and fair. To gain insight into this problem, I studied two aspects of reliability for improving the consistency of scores generated from performance-based assessments: the specificity of the scoring guide or rubric, and the consistency of teachers' judgments.

Specificity of the Scoring Guide

In studying the literature, I found that the consistency of scores largely depends on the specificity of the scoring guide. As Freeman and Lewis (1998) explain:

. . . good criteria are explicit, understood, and agreed by all assessors and also by students. A small number of simple criteria usually lead to greater reliability than do complicated marking schemes, because they are more manageable and . . . the nature of the task, thereby helping your students focus on exactly what is required. (pp. 25-26)

In applying these principles in practice, I have many choices about how to document my students' proficiency in a variety of elementary-level music tasks, for example, the proficiency with which my first-grade students maintain a beat. Using a checklist, I can document whether the students can step to the beat by recording my judgment in the corresponding column (figure 1). I can also use rating scales to assess each performance at various levels of proficiency. In this case, the evaluative criteria are described in general terms, for example, along a continuum ranging from "unable to perform" to "musical performance" (figure 2). I can also use a rubric in which the levels of proficiency reflect the performances I expect my students to demonstrate (figure 3). The details provided in this rubric afford the most confidence in the ratings.
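For teachers who keep track of these judgments electronically, the three kinds of scoring guides can be stored as simple records. The sketch below is only a hypothetical illustration in Python, not part of the MENC materials or of my district's practice; the yes/no entries and the helper function are invented for the example, and the labels simply mirror figures 1 through 3.

```python
# Hypothetical sketch: storing the three scoring guides for the learning
# target "given songs in 2/4 and 4/4 meter, student steps the beat."

# Figure 1: checklist -- a yes/no judgment recorded for each student
# (the yes/no values here are invented for illustration).
checklist = {
    "J. B. Lock": "Yes",
    "H. E. Woodward": "Yes",
    "F. Faraci": "Yes",
    "M. Lucerne": "No",
    "J. M. Fortier": "No",
}

# Figure 2: generic rating scale -- evaluative criteria stated in general terms.
RATING_SCALE = {
    1: "Unable to perform",
    2: "Experiences some difficulty",
    3: "Inaccurate performance",
    4: "Accurate performance",
    5: "Musical performance",
}

# Figure 3: skill-specific rubric -- each level describes the expected performance.
RUBRIC_LEVELS = {
    1: "Unsteady beat",
    2: "Feels accented beat; most remaining beats incorrect",
    3: "Feels accented beat; performance errors; drifts away from beat and returns",
    4: "No errors; stiff movements",
    5: "No errors; free movements",
}

def describe_rating(score: int) -> str:
    """Pair a 1-5 score with its rubric descriptor (illustrative helper)."""
    return f"{score} - {RUBRIC_LEVELS[score]}"

print(checklist["F. Faraci"])   # Yes
print(describe_rating(4))       # 4 - No errors; stiff movements
```

The point of the rubric structure is that each numeric score carries a description of the performance it represents, which is what gives me confidence in the ratings.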

The time needed to develop rubrics seems daunting. Fortunately, I do not have to develop all my scoring guides from scratch. The MENC publication Performance Standards for Music (1996) has rubrics for one assessment strategy for each achievement standard appearing under the nine national content standards for music. These scoring guides provide skill-specific feedback at three levels of proficiency: basic, proficient, and advanced. When used in the classroom, these rubrics supply diagnostic information about each student's performance. With this feedback, I can plan educational experiences and give feedback that my students can use to improve future performances.

Realizing that my colleagues were facing a similar dilemma, I organized a districtwide workshop that brought teachers together to examine the use of rubrics in the elementary music classroom. Group discussions, using the MENC publication as a foundation, helped clarify the expected learning outcomes assessed by these rubrics, thereby increasing the reliability of these measures when used in our classrooms.

Teachers' Judgments

Consistency of judgment is a key to reliable assessment. My judgments are reliable if I make the same scoring decision whenever I view a similar performance. So, if I assign a certain performance an A, all other performances where students demonstrate a similar level of ability should receive an A. This task is often difficult. However, there are strategies that can help increase the reliability of scores obtained with performance-based measures, such as monitoring the discrepancies between the scores I assign and those assigned by two or more other teachers, or monitoring discrepancies when I score the same performances more than once.

To examine consistency with different raters, I asked for assistance from a music teacher in a neighboring school district. Beginning with the rubric, the colleague and I decided how the given criteria would be applied to the student performances (Herman, Aschbacher, and Winters 1992). From there, we both scored videotaped examples of student work.


We then discussed any disagreements in the interpretation of the rubric, thereby facilitating a clearer delineation of the scoring guide.

Although I found it helpful to work with a colleague, this process was so time-consuming that I would only use it when multiple aspects of performance are assessed concurrently (for example, assessing music-based improvisations). I needed to find strategies that I could use on my own to improve the reliability of performance-based assessments. I tested my consistency over time by assessing videotaped performances twice, with two weeks separating the first and second ratings (Nitko 1996). I then examined both sets of scores to see whether I gave the same ratings each time.

I also examined consistency across similar tasks by using the same rubric to assess performances obtained with two similar tasks (for example, stepping the beat while singing two different songs, one of them "Bow, Wow, Wow"). I found these checks useful for identifying inconsistencies in the scores, and I decided to continue using this strategy with subjectively scored assessments.
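Whether the second set of ratings comes from a colleague or from my own rescoring two weeks later, the comparison itself is simple arithmetic. The sketch below is my own hypothetical illustration (the function name and the example ratings are invented, and nothing here comes from Nitko or from Herman, Aschbacher, and Winters); it lines up two passes over the same performances and reports how often the ratings agree.

```python
# Hypothetical sketch: comparing two sets of 1-5 ratings given to the same
# performances (two raters, or the same rater scoring twice).

def compare_ratings(first: list[int], second: list[int]) -> dict:
    """Summarize agreement between two scoring passes over the same students."""
    if len(first) != len(second):
        raise ValueError("Both passes must rate the same performances.")
    diffs = [abs(a - b) for a, b in zip(first, second)]
    return {
        "exact_agreement": sum(d == 0 for d in diffs) / len(diffs),
        "within_one_point": sum(d <= 1 for d in diffs) / len(diffs),
        "largest_discrepancy": max(diffs),
    }

# Example: my ratings of eight videotaped performances versus a colleague's.
mine = [5, 4, 3, 4, 2, 5, 3, 4]
colleague = [5, 3, 3, 4, 3, 4, 3, 4]
print(compare_ratings(mine, colleague))
# {'exact_agreement': 0.625, 'within_one_point': 1.0, 'largest_discrepancy': 1}
```

Low exact agreement, or discrepancies of two points or more on a 5-point scale, tells me the criteria in the scoring guide need to be spelled out more clearly before I trust the scores.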

My scoring decisions changed as I continued through the assessment of my students' performances. In some situations, I found that scores awarded to the first ten students differed from those awarded to the last ten students. I learned to monitor my consistency by rescoring samples of student work regularly (Herman, Aschbacher, and Winters 1992). While this procedure helped me check the consistency of the scores I assigned, it did not help me identify personal biases. I addressed this problem by examining how rating errors may negatively influence the scores I assign to my students' work. I did this by applying Nitko's (1996) summary of rating errors:

Leniency error occurs when a teacher tends to make almost all ratings toward the high end of the scale, avoiding the low end of the scale. Severity error is the opposite of leniency error: A teacher tends to make almost all ratings toward the low end of the scale. Central tendency error occurs when a teacher hesitates to use extremes and uses the middle part of the scale only. . . . A halo effect occurs when a teacher lets her general impression of the student affect how she rates the student on specific dimensions. (p. 277)
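One informal way to watch for the first three of these patterns is to tally how often each point on the scale is used across a marking session. The sketch below is a hypothetical illustration of my own (the ratings shown are invented), not a procedure taken from Nitko.

```python
# Hypothetical sketch: tallying how often each point of a 1-5 scale is used,
# to flag possible leniency, severity, or central tendency in my ratings.
from collections import Counter

def rating_profile(scores, low=1, high=5):
    """Return the share of ratings falling at each scale point."""
    counts = Counter(scores)
    total = len(scores)
    return {point: counts.get(point, 0) / total for point in range(low, high + 1)}

# Example: ratings from one marking session.
session_scores = [4, 5, 4, 5, 5, 4, 3, 5, 4, 5]
print(rating_profile(session_scores))
# {1: 0.0, 2: 0.0, 3: 0.1, 4: 0.4, 5: 0.5} -- ratings piled at the top of the
# scale can signal leniency, just as a pile-up at the bottom can signal severity
# and a pile-up in the middle can signal central tendency.
```

A halo effect does not show up in a tally like this; spotting it still depends on asking whether a score reflects the attributes of the work or my general impression of the student.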

I noticed leniency errors in my scoring patterns when I reviewed the marks I had assigned. I tended to award higher marks than those warranted by the performances because students had put effort into their work. On the other hand, I noticed a pattern of severity errors when scoring performances early in a unit of instruction, because I wanted subsequent assessments to show growth or improvement over time. Central tendency errors occurred when I assessed creative aspects of musical proficiency such as improvisation and composition. In hesitating to award marks at the low or high ends of the scale, I tended to score all performances as "average" (Nitko 1996).

I observed the halo effect when I scored performances of students who were struggling in music class but who were trying to succeed. In these cases, I tended to award marks based on effort rather than on attributes of the completed work. Conversely, I sometimes awarded low scores to the performances of students who demonstrated a haphazard approach to the music class. I also found the halo effect at work when I made decisions about performances on the border between two scores (Nitko 1996); for example, should a recorder performance be awarded a 2 or a 3 out of 5? In these cases, the recorder performances of the hard-working students were given the higher score, while the performances of students who failed to demonstrate a serious approach to this work (often those students who challenged my skills in classroom management) were awarded the lower score.

For scores to be reliable, they should be based on the merits of the performances, not on the predilections of the evaluator. The classifications outlined by Nitko (1996) serve as a guide for identifying sources of bias in my own work. I also use certain strategies to monitor my consistency over time, consistency across similar tasks, and consistency with different raters in an effort to improve the reliability of the scores I assign using performance-based assessments. In the future, I will engage my students in the process of assessment. Together, we can review the rubric that will be used to score performance of a particular task. They can assist me in determining the properties of these performances at the basic, proficient, and advanced levels. I will be interested in learning how my views correspond to or differ from those of my students. The insight gained from this experience should increase the reliability of the scores generated from the performance-based assessments I use in my classroom.

Conclusion

"Anita" is a composite of hard-working professionals who strive to improve their educational practices. Her story shows that teachers can develop and use performance-based assessments to provide information about what their students are able to do as a result of instruction. Using these assessments is not enough. Teachers need to be confident about the consistency or reliability of the scores generated with this method of assessment.

Music teachers do not need to study the statistical properties of reliability to apply this concept to their educational practice. Rather, they can apply a practical knowledge of reliability to improve the consistency of subjectively scored measurements used to document what students can do as a result of formal music instruction.


Using the strategies outlined here, music teachers can improve the consistency with which they award scores on performance-based assessments. Based on these standards, assessment results founded on subjective judgment can be reliable and fair.

Notes

. . . as a result of instruction. Students may also gain understandings and competencies in formal and informal educational settings outside the classroom.

Adapted from Herman, Aschbacher, and Winters (1992).

This summary of performance-based assessment is based on Scott (2001).

References

Consortium of National Arts Education Associations. 1994. National standards for arts education. Reston, VA: MENC.

Freeman, R., and R. Lewis. 1998. Planning and implementing assessment. London: Kogan Page.

Herman, J. L., P. R. Aschbacher, and L. Winters. 1992. A practical guide to alternative assessment. Alexandria, VA: Association for Supervision and Curriculum Development.

MENC: The National Association for Music Education. 1996. Performance standards for music. Reston, VA: MENC.

Nitko, A. J. 1996. Educational assessment of students. 2d ed. Englewood Cliffs, NJ: Prentice Hall.

Popham, W. J. 1999. Classroom assessment: What teachers need to know. 2d ed. Boston, MA: Allyn and Bacon.

Scott, S. J. 2001. Using checklists and rating scales (rubrics) to assess student learning in music: Heather's story. Manitoba Music Educator 41(3): 7-9.

Figure 1. Checklist
Specific learning target: Given songs in 2/4 and 4/4 meter, student steps the beat.
Directions: Observe the student as he/she steps the beat.

Name             Yes   No
J. B. Lock        X
H. E. Woodward    X
F. Faraci         X
M. Lucerne              X
J. M. Fortier           X

Figure 2. Generic Rating Scale (Rubric)
Specific learning target: Given songs in 2/4 and 4/4 meter, student steps the beat.

Scale:
1. Unable to perform
2. Experiences some difficulty
3. Inaccurate performance
4. Accurate performance
5. Musical performance


Figure 3. Skill-Specific Rating Scale (Rubric)
Specific learning target: Given songs in 2/4 and 4/4 meter, student steps the beat.

Unable to perform: Unsteady beat.
Experiences difficulty: Able to feel accented beat; most remaining beats incorrect.
Inaccurate: Able to feel accented beat; performance errors; drifts away from beat and then returns.
Accurate: No errors; stiff movements.
Musical: No errors; free movements.
