peer-review/assessment aid to learning & assessment

Peer-Review/Assessment

Aid to Learning & Assessment

Phil Davies

Division of Computing & Mathematical Sciences

Department of Computing

FAT

University of Glamorgan

Defining Peer-Assessment

• In describing the teacher ..

A tall b******, so he was. A tall thin, mean b******, with a baldy head like a light bulb. He’d make us mark each other’s work, then for every wrong mark we got, we’d get a thump. That way – he paused – ‘we were implicated in each other’s pain’

McCarthy’s Bar (Pete McCarthy, 2000, page 68)

AUTOMATICALLY CREATE A MARK THAT REFLECTS THE QUALITY OF AN ESSAY/PRODUCT VIA PEER MARKING,

AND ALSO

A MARK THAT REFLECTS THE QUALITY OF THE PEER MARKING PROCESS i.e. A FAIR/REFLECTIVE MARK FOR MARKING AND COMMENTING

Below are comments given to students.

Place in Top FOUR Order of Importance to YOU

1. I think you’ve missed out a big area of the research
2. You’ve included a ‘big chunk’ that you haven’t cited
3. There aren’t any examples given to help me understand
4. Grammatically it is not what it should be like
5. Your spelling is atroceious
6. You haven’t explained your acronyms to me
7. You’ve directly copied my notes as your answer to the question
8. 50% of what you’ve said isn’t about the question
9. Your answer is not aimed at the correct level of audience
10. All the points you make in the essay lack any references for support

Order of Answers

• Were the results all in the ‘CORRECT’ order – probably not?

• Why not!

• Subject specific?

• Level specific – school, FE, HE

• Teacher/Lecturer specific?

• Peer-Assessment is no different – Objectivity through Subjectivity

Typical Assignment Process

• Students register to use system - CAP

• Create an essay in an area associated with the module

• Provide RTF template of headings

• Submit via Bboard Digital Drop-Box

• Anonymous code given to essay automatically by system

• Create comments database / categories

Each Student is using a different set of weighted comments

Comments databases sent to tutor

First Stage => Self Assess own Work

Second Stage (button on server) => Peer Assess 6 Essays

Self/Peer Assessment

• Often Self-Assessment stage used
– Set Personal Criteria
– Opportunity to identify errors
– Get used to system

• Normally peer-mark about 5 or 6 essays

• Raw peer MEDIAN mark produced

• Need for student to receive Comments + Marks

• Need for communication element?

AUTOMATICALLY EMAIL THE MARKER ..

ANONYMOUS

The communications element

• Requires the owner of the file to ‘ask’ questions of the marker

• Emphasis ‘should’ be on the marker

• Marker does NOT see comments of other markers who’ve marked the essays that they have marked

• Marker does not really get to reflect on their own marking – get a reflective 2nd chance

• I’ve avoided this in past -> get it right first time

Feedback Index

• Produce an index that reflects the quality of commenting

• Produce a Weighted Feedback Index

• Compare how a marker has performed against these averages

• Judge quality of marking and commenting i.e. provide a mark for marking AUTOMATICALLY

Compensation: High and Low Markers

• Need to take this into account

• Each essay has a ‘raw’ peer generated mark - MEDIAN

• Look at each student’s marking and ascertain if ‘on average’ they are an under or over marker

• Offset mark given by this value

• Create a COMPENSATED PEER MARK
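The compensation steps above can be sketched in Python. This is only an illustration under stated assumptions, not the CAP system's actual code: the function names and the `marks` layout are hypothetical, a marker's offset is assumed to be the mean of (mark given minus raw median) over the essays they marked, and the compensated mark is assumed to be the median of the offset-corrected marks.

```python
from statistics import mean, median

def raw_peer_marks(marks):
    """Raw peer mark per essay: the MEDIAN of the marks it received.

    marks maps (marker, essay) -> mark given, in percent (hypothetical layout).
    """
    by_essay = {}
    for (_marker, essay), mark in marks.items():
        by_essay.setdefault(essay, []).append(mark)
    return {essay: median(values) for essay, values in by_essay.items()}

def marker_offsets(marks, raw):
    """Average amount by which each marker over-marks (+) or under-marks (-).

    Assumption: the offset is the mean difference from the raw median.
    """
    by_marker = {}
    for (marker, essay), mark in marks.items():
        by_marker.setdefault(marker, []).append(mark - raw[essay])
    return {marker: mean(diffs) for marker, diffs in by_marker.items()}

def compensated_peer_marks(marks):
    """Offset every mark by its marker's average bias, then take the median again."""
    raw = raw_peer_marks(marks)
    offsets = marker_offsets(marks, raw)
    adjusted = {}
    for (marker, essay), mark in marks.items():
        adjusted.setdefault(essay, []).append(mark - offsets[marker])
    return {essay: median(values) for essay, values in adjusted.items()}

# Illustrative data only: marker B marks 10 points higher than A and C.
marks = {
    ("A", "e1"): 60, ("A", "e2"): 50,
    ("B", "e1"): 70, ("B", "e2"): 60,
    ("C", "e1"): 60, ("C", "e2"): 50,
}
raw = raw_peer_marks(marks)
offsets = marker_offsets(marks, raw)
compensated = compensated_peer_marks(marks)
```

In this made-up data set marker B is identified as a consistent over-marker, and B's marks are pulled back into line before the final median is taken.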

How to work out Mark (& Comment) Consistency

• Marker on average OVER marks by 10%

• Essay worth 60%

• Marker gave it 75%

• Marker is 15% over

• Actual consistency index (Difference) = 5

• This is done for all marks and comments

• Creates a consistency factor for marking and commenting
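The worked example above can be written as a small Python sketch. The function names are hypothetical, and two details are assumptions rather than something the slides state: the difference is taken as an absolute value, and the overall consistency factor is taken as the mean of the individual differences.

```python
def consistency_difference(mark_given, essay_mark, marker_average_offset):
    """How far one marking deviates from the marker's own usual bias.

    mark_given            -- mark this marker awarded (percent)
    essay_mark            -- what the essay is worth (percent)
    marker_average_offset -- marker's average over-marking (+) or under-marking (-)
    """
    over_by = mark_given - essay_mark              # 75 - 60 = 15 over
    return abs(over_by - marker_average_offset)    # |15 - 10| = 5

def consistency_factor(differences):
    """Assumed aggregate: mean of a marker's individual differences."""
    return sum(differences) / len(differences)

# The slide's example: marker usually over-marks by 10, essay worth 60, gave 75.
example = consistency_difference(mark_given=75, essay_mark=60, marker_average_offset=10)
```

A lower value means the marker is behaving more consistently with their own average tendency, whether that tendency is to mark high or low.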

Marks to Comments Correlation

• Jennifer Robinson – a third of comments not useful

• Liu – Holistic comments not specific

• Davies – Really good correlation between marks and comments received

-5 -4 -3 -2 -1 -0 +0 1 2 3 4 5 6 7 8 9

29 44 41 49 46 53 64 49 53 60 62 69 68 69 82

38 48 47 51 45 54 58 53 62 62 64 65 73

49 51 50 60 57 57 67 66

51 58 53 50 59

57 63

59 65

64

0 4.2 5.0 1.4 3.5 4.0 6.8 4.8 3.6 3.9 4.7 2.5 3.1 2.8 0

29 41 45 48 49 49 56 52 56 58 59 67 64 71 82

Range | Frequency | Mark Difference | Mark Consistency | Mark Consistency Ranges | Feedback Difference | Feedback Consistency | Feedback Consistency Ranges | Weighted Feedback Difference | Weighted Feedback Consistency | Weighted Feedback Ranges
80> | 1 | 0.57 | 8.32 | 8.32 | 0.58 | 3.62 | 3.62 | -3.1 | 11.93 | 11.93
70> | 1 | -4.8 | 10.48 | 10.48 | -1.12 | 4.18 | 4.18 | -4.8 | 8.0 | 8.0
65-69 | 8 | -0.93 | 7.22 | 11.41 to 4.77 | -0.03 | 3.05 | 5.06 to 1.98 | -0.34 | 11.51 | 15.60 to 6.32
60-64 | 10 | -2.00 | 5.90 | 10.20 to 1.63 | -0.36 | 2.89 | 4.36 to 0.57 | -1.90 | 12.57 | 15.89 to 9.13
55-59 | 8 | 0.72 | 5.48 | 9.74 to 2.97 | 0.08 | 2.88 | 4.44 to 1.83 | -0.04 | 16.12 | 27.75 to 7.63
50-54 | 10 | -2.69 | 7.20 | 10.64 to 1.4 | -1.2 | 4.55 | 9.54 to 2.04 | -3.1 | 11.37 | 16.24 to 6.0
45-49 | 8 | 2.14 | 5.61 | 6.79 to 3.49 | 1.42 | 2.43 | 3.37 to 1.26 | 5.13 | 15.56 | 24.2 to 8.41
40-44 | 2 | 4.17 | 4.56 | 5.42 to 3.68 | 0.99 | 1.81 | 2.38 to 1.25 | 5.01 | 16.35 | 17.69 to 15.01
35-39 | 1 | -4.67 | 3.73 | 3.73 | -1.84 | 2.7 | 2.7 | -1.24 | 7.27 | 7.27
25-29 | 1 | 4.6 | 5.78 | 5.78 | 0.27 | 2.12 | 2.12 | -0.87 | 15.6 | 15.6

Automatically Generate Mark for Marking

• Linear scale 0 -100 mapped directly to consistency … the way in HE?

• Map to Essay Grade Scale achieved (better reflecting ability of group)?

• Expectation of Normalised Results within a particular cohort / subject / institution?

Current ‘Simple’ Method

• Average Marks
– Essay Mark = 57%
– Marking Consistency = 5.37

• Ranges
– Essay 79% <-> 31%
– Marking Consistency 2.12 <-> 10.77

• Range Above Avge 22% <-> 3.25 (6.76=1)

• Range Below Avge 26% <-> 5.40 (4.81=1)
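One way to read the figures above is as a piecewise linear mapping anchored at the averages: the best consistency (2.12) maps to the top essay mark (79%), the average consistency (5.37) to the average mark (57%), and the worst consistency (10.77) to the lowest mark (31%). The Python sketch below follows that reading; the function name and parameter names are hypothetical, and the mapping itself is an interpretation of the slide, not a confirmed description of the implemented method.

```python
def mark_for_marking(consistency,
                     avg_mark=57.0, avg_consistency=5.37,
                     best_mark=79.0, best_consistency=2.12,
                     worst_mark=31.0, worst_consistency=10.77):
    """Map a marking-consistency value to a percentage mark for marking.

    A lower consistency value is better, so it maps to a higher mark.
    """
    if consistency <= avg_consistency:
        # Above-average marker: 22 marks spread over 3.25 consistency units,
        # i.e. one unit of consistency is worth about 6.77 marks.
        marks_per_unit = (best_mark - avg_mark) / (avg_consistency - best_consistency)
    else:
        # Below-average marker: 26 marks spread over 5.40 consistency units,
        # i.e. one unit of consistency is worth about 4.81 marks.
        marks_per_unit = (avg_mark - worst_mark) / (worst_consistency - avg_consistency)
    return avg_mark + (avg_consistency - consistency) * marks_per_unit
```

Using two separate slopes keeps the average marker on the average essay mark while still stretching the scale to the cohort's actual top and bottom marks.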

Innovation Grant Proposal

• Put the emphasis on the marker to get it right

• Get the opportunity to ‘reflect’ on COMMENTS before go back to essay owner

• 2nd chance – not sure if I want the results to have a major effect – hope they get it right the 1st time – consistency

• Is there a Need to have discussion between markers at this stage? – NO as it is dynamic

• Will review stage remove need for compensation?

Used on Final Year Degree + MSc: DEGREE DCS

• 36 students on module

• 192 markings

• 25 ‘replaced’ markings out of 192 (13%)

• Average time per peer marking = 37 minutes

• Range of time taken to do markings 6-116 minutes

• Average number of menu comments/marking = 9.8

• Raw average mark for essays = 61%

• Out of the 25 Markings ‘replaced’ (1 student replaced a marking twice) only 6 marks changed 6/192 (3%)

• Number of students who did replacements = 11 (out of 36)

• 1 student ‘Replaced’ ALL his/her markings

• 6 markings actually changed mark +7, -4, -9, +3, -6, +6 (Avge = -0.5)
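The percentages and the average change quoted above can be reproduced from the raw counts. The short Python check below uses only the figures on this slide; the variable names are hypothetical.

```python
total_markings = 192
replaced_markings = 25
mark_changes = [7, -4, -9, 3, -6, 6]   # the six markings whose mark changed

# Share of markings that were replaced, and share whose mark actually changed.
replaced_pct = round(100 * replaced_markings / total_markings)
changed_pct = round(100 * len(mark_changes) / total_markings)

# Average size of a change: positive and negative changes largely cancel out.
average_change = sum(mark_changes) / len(mark_changes)
```

The near-zero average reflects that the review stage moved marks up about as often as it moved them down.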

Used on Final Year Degree + MSc: MSc EL&A

• 13 students

• 76 markings

• 41 replaced markings (54%)

• Average time per marking = 42 minutes

• Range of time taken to do markings 3-72 minutes

• Average number of menu comments/marking = 15.7

• Raw average mark = 61%

• Out of 41 Markings ‘replaced’ –> 26 changed mark 26/76 (34%)

• Number of students who did replacements = 8 (out of 13)

• 2 students ‘Replaced’ ALL their markings

• 26 markings actually changed mark: -1, +9, -2, -2, +1, -8, -3, -5, +2, +8, -2, +6, +18 (71-89), -1, -4, -6, -5, -7, +7, -6, -3, +6, -7, -7, -2, -5 (Avge -0.2)

Current Conclusions

• The results of the mapping of the compensated peer-marks to the average feedback indexes are very positive. Although the weighted development of the average feedback index only produces a slight improvement to an already very positive correlation, it addresses a concern that the comments derived from the menu-driven system were not totally subjective.

• The main concern of this method of automatically developing a mark for marking & commenting is the mapping of the consistency factors to an absolute grade. It should be kept in mind how difficult it currently is to explain to a student why they have been awarded 69% and their colleague has 71% within a traditional assessment.

• Review Stage -> Tangible or Non-Tangible -> MARKS OR REFLECTION

Some Points Outstanding or Outstanding Points

• What should students do if they identify plagiarism?

• What about accessibility?

• Is a computerised solution valid for all?

• At what age / level can we trust the use of peer assessment?

• How do we assess the time required to perform the marking task?

• What split of the marks between creation & marking?

Contact Information

[email protected]

Phil Davies

J316

X2247

University of Glamorgan
