2. 'in praise of educational research' = formative assessment
TRANSCRIPT
7/27/2019 2. 'in Praise of Educational Research' = Formative Assessment
http://slidepdf.com/reader/full/2-in-praise-of-educational-research-formative-assessment 1/16
BERA
'In Praise of Educational Research': Formative Assessment
Author(s): Paul Black and Dylan Wiliam
Source: British Educational Research Journal, Vol. 29, No. 5, 'In Praise of Educational Research' (Oct., 2003), pp. 623-637
Published by: Taylor & Francis, Ltd. on behalf of BERA
Stable URL: http://www.jstor.org/stable/1502114
Accessed: 19/10/2013 05:12
Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp

JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected].
Taylor & Francis, Ltd. and BERA are collaborating with JSTOR to digitize, preserve and extend access to
British Educational Research Journal.
http://www.jstor.org
This content downloaded from 202.43.95.117 on Sat, 19 Oct 2013 05:12:24 AM. All use subject to JSTOR Terms and Conditions.
British Educational Research Journal, Vol. 29, No. 5, October 2003
Carfax Publishing, Taylor & Francis Group
'In Praise of Educational Research':
formative assessment
PAUL BLACK & DYLAN WILIAM, King's College London
ABSTRACT The authors trace the development of the King's Formative Assessment Programme from its origins in diagnostic testing in the 1970s, through the graded assessment movement in the 1980s, to the present day. In doing so, they discuss the practical issues involved in reviewing research and outline the strategies that were used to try to communicate the findings to as wide an audience as possible (including policy-makers and practitioners as well as academics). They describe how they worked with teachers to develop formative practice in classrooms, and discuss the impact that this work has had on practice and policy. Finally, they speculate about some of the reasons for this impact, and make suggestions for how the impact of educational research on policy and practice might be improved.
Introduction
It has long been recognised that assessment can support learning as well as measure it. Whilst it appears that Michael Scriven first used the term 'formative evaluation' in connection with the curriculum and teaching (Scriven, 1967), it was Bloom et al. (1971) who first used the term in its generally accepted current meaning. They defined summative evaluation tests as those tests given at the end of episodes of teaching (units, courses, etc.) for the purpose of grading or certifying students, or for evaluating the effectiveness of a curriculum (p. 117). They contrasted these with 'another type of evaluation which all who are involved - student, teacher, curriculum maker - would welcome because they find it so useful in helping them improve what they wish to do' (p. 117), which they termed 'formative evaluation'.

From their earliest use it was clear that the terms 'formative' and 'summative' applied not to the assessments themselves, but to the functions they served. However, the methods and questions of traditional summative tests might not be very useful for the purpose of the day-to-day guidance of learning. So the development of formative assessment depended on the development of new tools. To make optimum use of these, teachers would also have to change their classroom practices. There would also be a need to align formative and summative work in new overall systems, so that teachers'
ISSN 0141-1926 (print)/ISSN 1469-3518 (online)/03/050623-15 © 2003 British Educational Research Association
DOI: 10.1080/0141192032000133721
formative work would not be undermined by summative pressures, and indeed, so that summative requirements might be better served by taking full advantage of improvements in teachers' assessment work.
The Rise and Fall of Formative Assessment
In the 1970s and 1980s, the development of new tools was advanced by a series of research projects at Chelsea College (which merged with King's College in 1985) which explored ways in which assessments might support learning. The Concepts in Secondary Mathematics and Science (CSMS) project investigated mathematical and scientific reasoning in students through the use of tests that were intended to illuminate aspects of students' thinking, rather than just measure achievement.
In this venture, the science 'wing' of the project followed a broadly Piagetian approach, seeking to understand students' scientific reasoning in terms of stages of development (Shayer & Adey, 1981). This approach did not lead as directly to results applicable in normal teaching as did the more empirical approach of the mathematics team, which focused on the diagnosis of errors in the concepts formed by secondary school students, and looked for ways to address them (Hart, 1981). Their subsequent projects sought to understand better the relationship between what was taught and what was learned (Johnson, 1989).

Such interest in the use of assessment to support learning was given added impetus by the recommendation of the Committee of Inquiry into the Teaching of Mathematics in Schools (1982) that a system of 'graded tests' be developed for pupils in secondary schools whose level of achievement was below that certificated in the current school-leaving examinations. Similar systems had been used to improve motivation and achievement in modern foreign languages for many years (Harrison, 1982). The group at Chelsea College chose rather to aim at a system for all pupils, and with support from both the Nuffield Foundation and the Inner London Education Authority (ILEA), their Graded Assessment in Mathematics (GAIM) Project was established in 1983. It was one of five graded assessment schemes supported by the ILEA.
This development was more ambitious in attempting to establish a new system. In mathematics, English, and craft, design and technology, the schemes set out to integrate the summative function with the formative. Information collected for formative purposes by teachers as part of the day-to-day classroom work would be aggregated at specific points in order to recognise the achievement of students formally. On leaving a school, the achievements to date would be 'cashed in' to secure a school-leaving certificate. This was allowed before 1988, but then the new criteria for the General Certificate of Secondary Education (GCSE) specified that, in mathematics, the assessments must include a written end-of-course examination which had to count for at least 50% of the available marks (Department of Education and Science & Welsh Office, 1985). The original developments in other subjects made more use of frequent formal tests, but were similarly constrained by the GCSE rules.
These rules were just one of three factors which undermined the graded assessment developments. The second was the introduction of a national curriculum for all schools in England and Wales in 1988. The National Curriculum Task Group on Assessment and Testing (TGAT) (1988a, 1988b) adopted the model of age-independent levels of achievement that had been used by the graded assessment schemes, but required a system of 10 levels to cover the age range 5-16, arranged so that the average student could be expected to achieve one level every two years. This was too coarse-grained a
system to be directly useful in classroom teaching: the graded assessment schemes needed 20 levels for the same age range. To add to the problem of recalibrating the levels to the new National Curriculum levels, a third problem became increasingly serious. Assuring comparability of awards, both between schools and with other traditionally based awards, required costly administration. The combined effect of these three factors led, by 1995, to the abandonment of all the schemes.

Here began the decline of the development in formative assessment achieved in the 1970s and 1980s. The graded assessment schemes had achieved remarkable success. In mathematics in particular they had given a clear indication of how formative and summative functions might be integrated in a large-scale assessment system. They had also provided models of how reporting structures for the end-of-key-stage assessments might foster and support progression in learning, and also some ideas about how moderation of teachers' awards might operate in practice. Ironically, whilst these achievements had a strong influence on the TGAT's formulation of its recommendations, the new system that developed after their report served to undermine them.
The original National Curriculum proposals had envisaged that much of the assessment at 7, 11 and 14 would:

be done by teachers as an integral part of normal classroom work. But at the heart of the assessment process there will be nationally prescribed tests done by all pupils to supplement the individual teachers' assessments. Teachers will mark and administer these, but their marking - and their assessments overall - will be externally moderated. (Department of Education and Science & Welsh Office, 1987, p. 11)
Many of these suggestions were taken up in the TGAT's recommendations. In particular, the Task Group's first report pointed out that while fine-scale assessments which served a formative function could be aggregated to serve a summative function, the reverse was not true: assessments designed to serve a summative function could not be disaggregated to identify learning needs. The Task Group therefore recommended that the formative function should be paramount for Key Stages 1, 2 and 3. Many in the teaching profession welcomed the TGAT report, but the initial reception of the research community was quite hostile, as was that of the Prime Minister (Thatcher, 1993, pp. 594-595).

Whilst the government accepted the recommendations of the Task Group (in a written parliamentary answer dated 7 June 1988), over the next 10 years the key elements were removed, one by one, so that all that now remains in place is the idea of age-independent levels of achievement (Black, 1997; Wiliam, 2001). In particular, the idea that National Curriculum assessment should support the formative purpose has been all but ignored. Daugherty (1995, p. 78) points out that the words 'teacher assessment' appeared on the agenda of the School Examinations and Assessment Council (the body responsible for the implementation of National Curriculum assessment) only twice between 1988 and 1993.
Throughout the 1990s, the debate continued about how evidence from the external tests and teachers' judgements could be combined. In the end, it was resolved by requiring schools to publish data derived from teachers' judgements and those derived from the external tests, side by side. Whilst the government could claim that it was giving equal priority to the two components, in practice schools were not required to determine the final teacher judgements until after the test results had been received by the school. This created an incentive for teachers to match their own assessments to the
results of the tests, rather than face criticism for having standards that were either too low or too strict.
So by 1995 nothing was left of the advances made in the previous decades. Government was lukewarm or uninterested in formative assessment: the systems to integrate it with the summative had gone, and the further development of tools was only weakly supported. There were some minor attempts in the 1990s by the government and its agencies to support formative assessment within the National Curriculum assessment system, including a project to promote the formative use of data from summative tests, but these efforts were little more than window dressing.
The debate over the relative weights to be applied to the results of external tests and teachers' judgements obscured the fact that both of these were summative assessments. While different teachers developed different methods of arriving at judgements about the levels achieved by their students (Gipps et al., 1995), it is clear that most of the record-keeping occasioned by the introduction of the National Curriculum placed far more emphasis on supporting the summative function of assessment. At the same time, the work of Tunstall and Gipps (1996) showed that while some teachers did use assessment in support of learning, in many classrooms it was clear that much classroom assessment did not support learning, and was often used more to socialise children than to improve achievement (Torrance & Pryor, 1998). Such studies were beginning to direct attention to the classroom processes, i.e. to the fine detail of the ways in which the day-to-day actions of teachers put formative principles into practices focused on learning.

Thus, one sign of change was that within the educational research community, there
was increasing concern that the potential of assessment to support learning was being ignored and there were continuing calls for the formative function of assessment to receive greater emphasis. The lead here was taken by the British Educational Research Association's Policy Task Group on Assessment. Their article (Harlen et al., 1992) and that of Torrance (1993) reiterated the importance of the formative function of assessment, although there was disagreement about whether the formative and summative functions should be separated or combined (Black, 1993a).

In 1997, as part of this effort to reassert the importance of formative assessment, the British Educational Research Association Policy Task Group on Assessment (with the support of the Nuffield Foundation) commissioned us to undertake a review of the research on formative assessment. They thereby initiated a new stage in the development of formative assessments.
Defining the Field and Reviewing the Relevant Research
Two substantial review articles, one by Natriello (1987) and the other by Crooks (1988), had addressed the field in which we were interested, and so our review of the research literature concentrated on articles published after 1987, which we assumed had been the cut-off point for these earlier reviews. Natriello's review covered the full range of assessment purposes (which he classified as certification, selection, direction and motivation) while Crooks's review covered only formative assessment (which he termed 'classroom evaluation'). An indication of the difficulty of defining the field, and of searching the literature, is given by the fact that while the reviews by Natriello and Crooks cited 91 and 241 items respectively, only nine references were cited in both.

Natriello's review used a model of the assessment cycle, beginning with purposes, and moving on to the setting of tasks, criteria and standards, evaluating performance and
providing feedback, and then discussed the impact of these processes on students. He stressed that the majority of the research he cited was largely irrelevant because of weak theorisation, which resulted in key distinctions (e.g. the quality and quantity of feedback) being conflated.

Crooks's article had a narrower focus - the impact of evaluation practices on students. He concluded that the summative function of assessment has been too dominant and that more emphasis should be given to the potential of classroom assessments to assist learning. Most importantly, assessments must emphasise the skills, knowledge and attitudes regarded as most important, not just those that are easy to assess.
Our review also built on four other key reviews of research published since those by Natriello and Crooks - reviews by Bangert-Drowns et al. and Kulik et al. into the effects of classroom testing (Kulik et al., 1990; Bangert-Drowns et al., 1991a, 1991b) and a review by one of us of research on summative and formative assessment in science education (Black, 1993b).
To identify relevant literature we first searched the ERIC database. However, the lack of consensus on the appropriate keywords in this area meant that this process consistently failed to yield studies that we knew from our own reading to be relevant. Citation index searches on the existing reviews added additional studies, but still failed to identify others. Study of articles cited in the studies we had identified yielded some further studies. However, it was clear overall that we needed a different approach if we were to identify all of the many relevant studies. So we resorted to a manual search through every issue, from 1987 to 1998, of 76 of the most likely journals. As we read through these journals manually, rather than relying on keywords, our notion of what was
relevant was continually expanding.

This process generated 681 publications that appeared, at first sight, to be relevant. By reading abstracts, and in some cases full articles, this number was reduced to 250 publications that were read in full, and coded with our own keywords. Keywords were then grouped to form sections of the review. In synthesising the studies allocated to each section, we rejected the use of meta-analysis. In the first place, many of the studies did not provide sufficient details to calculate standardised effect sizes, which is the standard procedure for combining results from different studies in meta-analysis (Glass et al., 1981). Second, the use of standardised effect sizes in published studies overestimates the size of actual effects. This is because the low statistical power of most experiments in the social sciences (Cohen, 1988) means that many experiments generate results that fail to reach the threshold for statistical significance, and go unpublished, so that those that are published are not a representative sample of all results actually found (Harlow et al., 1997). Third, and most importantly in our view, given the relatively weak theorisation of the field, we felt it was not appropriate simply to specify inclusion criteria and then include only studies that satisfied them. Instead, we assembled the review through a process of 'best evidence synthesis' (Slavin, 1990). Such an approach is inevitably somewhat subjective, and is in no sense replicable - other authors, even given the same 250 focal studies, would have produced a different review. Instead, our aim was to produce an account of the field that was authentic, faithful and convincing. However, it is worth noting that none of the six short commentaries by experts in this field, which were published in the same journal issue as our review, disagreed with our findings.

We believe that this story of the development of our review makes several important
points about educational research. In the first place, reviewing research is much more
difficult in any social science than it is in (say) the physical sciences because the
complexity of the field precludes any simple universal system of keywords. Any
automated research process is bound to result in a systematic underrepresentation of the body of research because it cannot hope to identify all the relevant studies.
Second, synthesising research cannot be an objective process. While review protocols such as those being used by the EPPI-Centre for the evaluation of individual research studies are helpful in identifying strengths and weaknesses in those studies, the significance attached to each study, and the way that general conclusions are drawn by weighing sometimes conflicting findings, will inevitably remain subjective.
Third, it seems to us important to point out that reviewing research is not merely a derivative form of scholarship. Reviews such as those by Natriello and Crooks can serve to reconceptualise, to organise, and to focus research. Because our definition of 'relevance' expanded as we went along, we too had to find ways of organising a widening field of research, and were forced to make new conceptual links just in order to be able to relate the various research findings into as coherent a picture as possible. This was one reason why our review generated a momentum for work in this field that would be difficult to create in any other way.

One feature of our review was that most of it was concerned with such issues as students' perceptions, peer- and self-assessment, and the role of feedback in a pedagogy focused on learning. Thus, it helped to take the emphasis in formative assessment studies away from systems, with its emphasis on the formative-summative interface, and relocate it on classroom processes.
Disseminating the Findings
Experience in producing a review of mathematics education research for teachers (Askew & Wiliam, 1995) had taught one of us that it is impossible to satisfy, in the same document, the demands of the academic community for rigour and the demands for accessibility by practitioners. Discussions with primary teachers showed that even when that draft review had been written in the most straightforward language we could manage, they still found it unsuitable. Extensive redrafting eventually made it accessible, but this work took as long as the original collection and writing, and it then turned out that the academic community found it of little interest.

Such experience convinced us that we had to produce different outputs for different
audiences. For other academics, we produced a 30,000-word journal article (Black & Wiliam, 1998a), which, together with short responses from invited commentators from around the world, formed the whole of a special issue of the journal Assessment in Education. As well as detailing our findings, we tried to lay out as clearly as possible how we had constructed the review so that, while we would not necessarily expect different authors to reach identical conclusions, we hoped that the process which we followed was verifiable and could be repeated.
To make the findings accessible to practitioners and policy-makers, we produced a 21-page booklet in A5 format entitled Inside the Black Box (Black & Wiliam, 1998b). We also produced a slightly revised version for the international audience (Black & Wiliam, 1998c).

In Inside the Black Box, we very briefly outlined the approach we had taken in our review and summarised its conclusions. However, we also felt it essential to explore implications for policy and practice. In so doing we inevitably, at some points, went beyond the evidence, relying on our experience of many years' work in the field. If we had restricted ourselves to only those policy implications that followed logically and inevitably from the research evidence, we would have been able to say very little. Whilst
the policy implications we drew were not determined by the research base, they were, we felt, the most reasonable interpretation of the findings (for more on reasonableness as a criterion, see later). The title itself was significant, for it pointed to our main policy plea - that teachers' work in the classroom was the key to raising standards and that systems of external testing should be restructured to ensure that this work was supported rather than undermined by them. The process now claimed priority in the system.
In order to secure the widest possible impact of our work, we planned the booklet's launch with members of the King's College London External Relations Department. We held a launch conference on 5 February 1998, hosted by the Nuffield Foundation, who had supported the writing of the review, and on the previous day had held a series of briefings for journalists of the national daily newspapers and the specialist educational press. The result was that almost every single national daily newspaper carried some reference to the work. While some of the reports were either inaccurate or highly politicised, much of the coverage was broadly accurate, if somewhat selective.

In the five years since its launch, Inside the Black Box has sold over 30,000 copies, and data from the Authors' Licensing and Collecting Society (ALCS) suggests that at least another 25,000 copies have been made in schools.
Inside the Black Box did not try to lay out what formative assessment would look like in practice. Indeed, it was clear that this was neither advisable nor possible:

Thus the improvement of formative assessment cannot be a simple matter. There is no 'quick fix' that can be added to existing practice with promise of rapid reward. On the contrary, if the substantial rewards of which the evidence holds out promise are to be secured, this will only come about if each teacher finds his or her own ways of incorporating the lessons and ideas that are set out above into her or his own patterns of classroom work. This can only happen relatively slowly, and through sustained programmes of professional development and support. (p. 15, emphasis in original)
Putting it into Practice

The three research reviews offered strong evidence that improving the quality of formative assessment would raise standards of achievement in each country in which it had been studied. Furthermore, the consistency of the effects across ages, subjects and countries meant that even though most of the studies reviewed had been conducted abroad, these findings could be expected to generalise to the UK. For us, the question was not, therefore, 'Does it work?' but 'How do we get it to happen?'
In mid-1998 the Nuffield Foundation agreed to support a two-year project to involve 24 teachers in six schools in two local education authorities (Oxfordshire and Medway) in exploring how formative assessment might be put into practice. So began the King's-Medway-Oxfordshire Formative Assessment Project (KMOFAP).

One of the key assumptions of the project was that if the promise of formative assessment was to be realised, traditional research designs - in which teachers are 'told' what to do by researchers - would not be appropriate. We argued that a process of supported development was an essential next step:
In such a process, the teachers in their classrooms will be working out the answers to many of the practical questions that the evidence presented here cannot answer, and reformulating the issues, perhaps in relation to fundamental
insights, and certainly in terms that can make sense to their peers in ordinary classrooms. (Black & Wiliam, 1998b, p. 16)
We had chosen to work with Oxfordshire and Medway because we knew that their officers were interested in formative assessment and would be able to support our work locally. We asked each authority to select three secondary schools and, after discussion with members of the research team, the six so chosen agreed to be involved. At this stage we were not concerned to find 'typical' schools, but schools that could provide 'existence proofs' of good practice, and would produce the 'living examples' alluded to earlier for use in further dissemination.

We decided to start with mathematics and science because these were subjects where we felt there were clear messages from the research and also where we had expertise in the subject-specific details that we thought essential in practical development. The choice of teachers, two in mathematics and two in science in each school, was left to the schools: a variety of methods were used, so that there was a considerable range of expertise and experience amongst the 24 teachers selected. In the following year we augmented the project with one additional mathematics and one additional science teacher, and also began working with two English teachers from each school.
Our 'intervention' with these teachers had two main components:
* a series of nine one-day in-service education and training (INSET) sessions over a period of 18 months, during which teachers were introduced to our view of the principles underlying formative assessment, and were given the opportunity to develop their own plans;
* visits to the schools, during which the teachers would be observed teaching by project staff, and have an opportunity to discuss their ideas and their practice; feedback from the visits helped us to attune the INSET sessions to the developing thinking and practice of the teachers.
The key feature of the INSET sessions was the development of action plans. Since we were aware from other studies that effective implementation of formative assessment requires teachers to renegotiate the 'learning contract' that they had evolved with their students (Brousseau, 1984; Perrenoud, 1991), we decided that implementing formative assessment would best be done at the beginning of a new school year. For the first six months of the project (January 1999 to July 1999), therefore, we encouraged the teachers to experiment with some of the strategies and techniques suggested by the research, such as rich questioning, comment-only marking, sharing criteria with learners, and student peer- and self-assessment. Each teacher was then asked to draw up an action plan of the practices they wished to develop and to identify a single focal class with whom these strategies would be introduced in September 1999. Details of these plans can be found in Black et al. (2003).

Our intervention did not impose a model of 'good formative assessment' on teachers, but, rather, supported them in developing their own professional practice. Since each teacher was free to decide which class to experiment with, we could not impose a standard experimental design - we could not standardise the outcome measures, nor could we rely on having the same 'input' measures for each class. In order to secure quantitative evidence, we therefore used an approach to the analysis that we have termed 'local design', making use of whatever data were available within the school in the normal course of events. In most cases, these were the results on the National Curriculum tests or GCSE, but in some cases we also made use of scores from school
assessments. Each teacher consulted with us to identify a focal variable (i.e. dependent variable or 'output') and in most cases we also had reference variables (i.e. independent variables or 'inputs'). We then set up, for each experimental class, the best possible control class in the school. In some cases, this was a parallel class taught by the same teacher (either in the same or previous years); in others, it was a parallel class taught by a different teacher. Failing that, we used a non-parallel class taught by the same or a different teacher. We also made use of national norms where these were available. In most cases, we were able to condition the focal variable on measures of prior achievement or general ability. By dividing the differences between the mean scores of the control group and experimental group by the pooled standard deviation, we were able to derive a standardised effect size (Glass et al., 1981) for each class. The median effect size was 0.27 standard deviations, and a jackknife procedure (Mosteller & Tukey, 1977) yielded a point estimate of the mean effect size of 0.32, with a 95% confidence interval of [0.16, 0.48]. Of course, we cannot be sure that it was the increased emphasis on formative assessment that was responsible for this improvement in students' scores, but this does seem the most reasonable interpretation (see discussion of 'reasonableness' below). For further details of the experimental results, see Wiliam et al. (2004).
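The two calculations described above can be sketched in code. This is an illustrative sketch only: the `effect_size` and `jackknife_ci` helpers and the sample numbers are invented for exposition, not the project's data or analysis scripts, and the interval multiplier is a rough normal-approximation stand-in for whatever critical value the original analysis used.

```python
import math

def effect_size(experimental, control):
    """Standardised effect size (Glass et al., 1981): the difference in
    group means divided by the pooled standard deviation."""
    def mean(xs):
        return sum(xs) / len(xs)
    def var(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    n_e, n_c = len(experimental), len(control)
    pooled_sd = math.sqrt(
        ((n_e - 1) * var(experimental) + (n_c - 1) * var(control))
        / (n_e + n_c - 2)
    )
    return (mean(experimental) - mean(control)) / pooled_sd

def jackknife_ci(values, crit=2.0):
    """Jackknife estimate of the mean of per-class effect sizes
    (Mosteller & Tukey, 1977), with a rough confidence interval.

    Leave-one-out pseudo-values are computed explicitly; for the mean
    they coincide with the raw values, so the point estimate equals the
    sample mean and the interval is estimate +/- crit * standard error.
    `crit` = 2.0 is a coarse stand-in for the appropriate critical value."""
    n = len(values)
    total = sum(values)
    full_mean = total / n
    pseudo = [n * full_mean - (n - 1) * ((total - v) / (n - 1)) for v in values]
    est = sum(pseudo) / n
    se = math.sqrt(sum((p - est) ** 2 for p in pseudo) / (n * (n - 1)))
    return est, (est - crit * se, est + crit * se)
```

For example, `effect_size([2, 4, 6], [1, 3, 5])` gives 0.5: both groups have pooled standard deviation 2 and the means differ by 1. The jackknife step would then be applied to the whole set of per-class effect sizes.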
Outcomes
The quantitative evidence that formative assessment does raise standards of achievement on National Curriculum tests and GCSE examinations is important in showing that innovations that worked in research studies in other countries could also be effective in typical UK classrooms. Part of the reason that formative assessment works appears to be an increase in students' 'mindfulness' (Bangert-Drowns et al., 1991a), and while this has been shown to increase long-term retention (Nuthall & Alton-Lee, 1995), it also depends to a certain extent on the kind of knowledge that is being assessed. More will be gained from formative feedback where a test calls for the mindfulness that it helps to develop. Thus, it is significant here that almost all the high-stakes assessments in this country require constructed responses (as opposed to multiple choice) and often assess higher-order skills.

However, other outcomes from the project are at least as important. Through our work with teachers, we have come to understand more clearly how the task of applying research into practice is much more than a simple process of 'translating' the findings of researchers into the classroom. The teachers in our project were engaged in a process of knowledge creation, albeit of a distinct kind, and possibly relevant only in the settings in which they work (see Hargreaves, 1999).

As the teachers explored the relevance of formative assessment for their own practice, they transformed ideas from other teachers into new ideas, strategies and techniques, and these were in turn communicated to other teachers, creating a 'snowball' effect. Also, as we have introduced more and more teachers to these ideas, we have become better at communicating the key ideas. A case in point is that we have each been asked several times by teachers, 'What makes for good feedback?', a question to which, at first, we had no good answer. Over the course of two or three years, we have evolved a simple answer: good feedback causes thinking.
Ever since the publication of Inside the Black Box we have been producing a series of articles in journals aimed at practitioners that showed how the ideas of formative assessment could be used in practice. We are also publishing research papers arising from the project. These have included accounts of our collaborative work with teachers,
of the processes of teacher change, of the quantitative evidence of learning gains, and of a theoretical framework for classroom formative assessment (see the project's website at http://www.kcl.ac.uk/depsta/education/KAL/ASSESSMENT.html for details of the programme's publications).

In addition, since the launch of Inside the Black Box in February 1998, members of the research team at King's have spoken to groups of teachers and policy-makers on over 400 occasions, suggesting that we have addressed well over 20,000 people directly about our work.
Following the success of Inside the Black Box, we decided to use the same format for communicating the results of the KMOFAP project to teachers. The resulting booklet, Working Inside the Black Box, sold 15,000 copies in the six months after its launch in July 2003.
Of course, working as intensively as we did with 24 teachers could not possibly influence more than a small fraction of teachers, so we have also given considerable thought to how the work could be 'scaled up'. We are working with other local education authorities (notably Hampshire) to develop local expertise, in both formative assessment and in strategies for dissemination. In Scotland, formative assessment has become an important component of the Scottish Executive's strategy for schools, and members of the project team, including the project teachers, have provided support and advice.
This work has also led to further research work. We are partners, with the Universities of Cambridge, Reading and the Open University, in the Learning How to Learn: in classrooms, schools and networks project, which is part of the Economic and Social Research Council's Teaching and Learning Research Programme. This project aims to promote and understand change in classrooms, but is also investigating how these changes are supported or inhibited by factors at the level of the whole school, and of networks of schools. We are also working with Stanford University in California on a similar project, which again combines detailed work with small numbers of teachers with a focus on how small-scale changes can be scaled up. Assessment for learning has also become one of the two key foci (along with thinking skills) of the government's Key Stage 3 Strategy for the foundation subjects.
Reflections
Critiques of educational research typically claim that it is poorly focused, fails to generate reliable and generalisable findings, is inaccessible to a non-academic audience and lacks interpretation for policy-makers and practitioners (Hillage et al., 1998). While some of these criticisms are undoubtedly applicable to some studies, we believe that this characterisation of the field as a whole is unfair.

We do not believe that all educational research should be useful, for two reasons. The first is that, just as most research in the humanities is not conducted because it is useful, we believe that there should be scope for some research in education to be absolutely uninterested in considerations of use. The second reason is that it is impossible to state, with any certainty, which research will be useful in the future.

Having said this, we believe strongly that the majority of research in education should be undertaken with a view to improving educational provision: research in what Stokes (1997) calls 'Pasteur's quadrant'. And although we do not yet know everything about 'what works' in teaching, we believe that there is a substantial consensus on the kinds
of classrooms that promote the best learning. What we know much less about is how to get this to happen.

Policy-makers appear to want large-scale research conducted to the highest standards of analytic rationality, whose findings are also relevant to policy. However, it appears that these two goals are, in fact, incompatible. Researching how teachers take on research, adapt it, and make it their own is much more difficult than researching the effects of different curricula, of class sizes, or of the contribution of classroom assistants. While we do not know as much as we would like to know about effective professional development, if we adopt 'the balance of probabilities' rather than 'beyond reasonable doubt' as our burden of proof, then educational research has much to say. When policy without evidence meets development with some evidence, development should prevail.
In terms of our own work, perhaps the most puzzling issue arising from this story is why our work has had the impact that it has. Our review did add to the weight of evidence in support of the utility of formative assessment, but did not substantially alter the conclusions reached by Crooks and Natriello 10 years earlier. It could be, of course, that the current interest in formative assessment, and its policy impact, is nothing to do with our work, but that it takes 10 years or so for such findings to filter through to policy, or that the additional studies identified by us in some way tipped the balance to make the findings more credible. All of this seems unlikely. It therefore appears that something that we did, not just in undertaking the review, but also in the way it was disseminated, has had a profound impact on how this research has fed into policy and practice. Of course, identifying which elements have been most important in this impact is impossible, but it does seem appropriate to speculate on the factors that have contributed to it.
The first factor is that although most of the studies cited in our review emanated from overseas, the review was originated and published in this country. This provided a degree of local 'ownership' that is likely to have attracted attention to the research.
The second is that although we tried to adhere closely to the traditional standards of scholarship in the social sciences when conducting and writing our review, we did not do so when exploring the policy implications in Inside the Black Box. While the standards of evidence we adopted in conducting the review might be characterised as those of 'academic rationality', the standard within Inside the Black Box is much closer to that of 'reasonableness' advocated by Stephen Toulmin for social enquiry (Toulmin, 2001). In some respects, Inside the Black Box represents our opinions and prejudices as much as anything else, although we would like to think that these are supported by evidence, and are consistent with the 50 years of experience of working in this field that we have between us. It is also important to note in this regard that the success of Inside the Black Box has been as much due to its rhetorical force as to the evidence that underpins it. This will certainly make many academics uneasy, for it appears to blur the line between fact and value, but as Flyvbjerg (2001) argues, social enquiry has failed precisely because it has focused on analytic rationality rather than value-rationality (see also Wiliam, 2003).
The third factor we believe has been significant is the steps we have taken to publicise our work. As noted earlier, this has included writing different kinds of articles for different audiences, and has also involved working with media-relations experts to secure maximum press coverage. Again, these are not activities that are traditionally associated with academic research, but they are, we believe, crucial if research is to impact on policy.

A fourth factor that appears to have been important is the credibility that we brought as researchers to the process. In their project diaries, several of the teachers commented
that it was our espousal of these ideas, as much as the ideas themselves, that persuaded them to engage with the project: where educational research is concerned, the facts do not necessarily speak for themselves.
A fifth factor is that the ideas had an intrinsic acceptability to the teachers. We were talking about improving learning in the classroom, which was central to the professional identities of the teachers, as opposed to bureaucratic measures such as target-setting.
Linked to this factor is our choice to concentrate on the classroom processes and to live with the external constraints operating at the formative-summative interface: the failed attempts to change the system in the 1980s and 1990s were set aside. Whilst it might have been merely prudent not to try again to tilt at windmills, the more fundamental strength was that it was at the level chosen, that of the core of learning, that formative work stakes its claim for attention. Furthermore, given that any change has to work out in teachers' practical action, this is where reform should always have started. The evidence of learning gains, from the literature review and from our project, restates and reinforces the claim for priority of formative work that the TGAT tried in vain to establish. The debate about how policy should secure optimum synergy between teachers' formative, teachers' summative, and external assessments is unresolved, but the terms and the balance of the arguments have been shifted.
The final, and perhaps most important, fact in all this is that in our development model we attended to both the content and the process of teacher development (Reeves et al., 2001). We attended to the process of professional development through an acknowledgement that teachers need time, freedom, and support from colleagues in order to reflect critically upon and to develop their practice (Lee, 2000), whilst offering also practical strategies and techniques about how to begin the process. By themselves, however, these are not enough. Teachers also need concrete ideas about the directions in which they can productively take their practice, and thus there is a need for work on the professional development of teachers to pay specific attention to subject-specific dimensions of teacher learning (Wilson & Berne, 1999).

One might ask of our work whether it drew, or could have drawn, strength from the
psychology of learning. For the practical demand for good tools, e.g. a good question to explore ideas about the concept of momentum, the help could only come from studies in mathematics and science education. We have shown earlier how the historical origins of our work contributed to such studies. The fact that the King's team had strong subject backgrounds and could draw on earlier work on such tools was essential. However, it remains the case that for most school subjects rich sources for the appropriate tools are lacking.

At a more general level, however, it could be seen that the practical activities developed did implement principles of learning that are prominent in the psychology literature. Examples are the constructivist principle that learning action must start from the learner's existing knowledge, the need for active and responsible involvement of the learner, the need to establish in the classroom a community of subject discourse, and the value of developing meta-cognition (see Black et al., 2003). Our orientation in developing the activities in line with such principles was in part intuitive, but became explicit when the teachers asked us to give a seminar on learning theory at one of the group's INSET sessions. We judge now that to have started with such principles when we had little idea of how to implement them in practice might have done little to motivate teachers.
One feature that emerges from considering the history of this work is that its success seems far from inevitable, but is, rather, the result of a series of contingencies. What
would have happened if our review article had not been commissioned, or if we had finished work when the review was complete (which is all we had been commissioned to do)? What if we had failed to secure funding for our project, or had failed to find teachers willing to work with us, or if we worked in a department forced to make cuts in staffing as a result of a research assessment exercise? The continuity of staffing seems to us to be particularly important. We do not think that it is a coincidence that the work here builds on a 25-year tradition of work on diagnostic and formative assessment within a single institution, and one in which the turnover of staff has been extremely low. While some of the expertise gathered over this time can be made explicit, much of it cannot, and we believe that our collective implicit knowledge has been at least as important to our work as published research findings.

This history paints a picture of educational research not as a steady building up of
knowledge towards some objective truth about teaching and learning, but, rather, as a trajectory buffeted by combinations of factors. Calling this 'the mangle of practice', Pickering (1995) shows that neither the view of scientific knowledge as a series of truths waiting to be discovered, nor the view that scientific knowledge is whatever scientists do, is an adequate description. Instead, he suggests, scientists build ideas about how the world is, which are then 'mangled' in their contact with the objective physical world. When questions become too difficult, scientists change the questions to ones that are tractable. In social sciences, the mangle is even more complex, in that what it is possible to research, and what it is good to research, keeps changing.
From this perspective, all our activities (the development of theoretical resources, work with teachers, rhetorical campaigns to convince policy-makers and teachers of the utility of formative assessment, even co-opting the print and broadcast media to our purpose) make a kind of sense.
Educational research can and does make a difference, but it will succeed only if we recognise its messy, contingent, fragile nature. Some policy-makers believe that supporting educational research is crazy, but surely the real madness is to carry on what we have been doing, and yet to expect different outcomes.
Acknowledgements
We would like to acknowledge the initiative of the Assessment Policy Task Group of the British Educational Research Association (now known as the Assessment Reform Group), who gave the initial impetus and support for our research review. We are grateful to the Nuffield Foundation, who funded the original review and the first phase of our project. We are also grateful to Professor Myron Atkin and his colleagues at Stanford University, who secured funding from the US National Science Foundation (NSF Grant REC-9909370) for the last phase. Finally, we are indebted to the Medway and Oxfordshire local education authorities, their six schools and, above all, to their 36 teachers who took on the central and risky task of turning our ideas into practical working knowledge.
Correspondence: Paul Black, King's College London, School of Social Science and Public Policy, Department of Education and Professional Studies, Franklin-Wilkins Building, Waterloo Road, London SE1 9NN, UK.
REFERENCES
ASKEW, M. & WILIAM, D. (1995) Recent Research in Mathematics Education 5-16 (London, HMSO).
BANGERT-DROWNS, R. L., KULIK, C.-L. C., KULIK, J. A. & MORGAN, M. T. (1991a) The instructional effect of feedback in test-like events, Review of Educational Research, 61, pp. 213-238.
BANGERT-DROWNS, R. L., KULIK, J. A. & KULIK, C.-L. C. (1991b) Effects of frequent classroom testing, Journal of Educational Research, 85, pp. 89-99.
BLACK, P. (1997) Whatever happened to TGAT?, in: C. CULLINGFORD (Ed.) Assessment vs. Evaluation, pp. 24-50 (London, Cassell).
BLACK, P. J. (1993a) Assessment policy and public confidence: comments on the BERA Policy Task Group's article 'Assessment and the improvement of education', The Curriculum Journal, 4, pp. 421-427.
BLACK, P. J. (1993b) Formative and summative assessment by teachers, Studies in Science Education, 21, pp. 49-97.
BLACK, P. J. & WILIAM, D. (1998a) Assessment and classroom learning, Assessment in Education, 5, pp. 7-73.
BLACK, P. J. & WILIAM, D. (1998b) Inside the Black Box: raising standards through classroom assessment (London, King's College London School of Education).
BLACK, P. J. & WILIAM, D. (1998c) Inside the black box: raising standards through classroom assessment, Phi Delta Kappan, 80, pp. 139-148.
BLACK, P., HARRISON, C., LEE, C., MARSHALL, B. & WILIAM, D. (2003) Assessment for Learning: putting it into practice (Buckingham, Open University Press).
BLOOM, B. S., HASTINGS, J. T. & MADAUS, G. F. (Eds) (1971) Handbook on the Formative and Summative Evaluation of Student Learning (New York, McGraw-Hill).
BROUSSEAU, G. (1984) The crucial role of the didactical contract in the analysis and construction of situations in teaching and learning mathematics, in: H.-G. STEINER (Ed.) Theory of Mathematics Education: ICME 5 topic area and miniconference, pp. 110-119 (Bielefeld, Institut für Didaktik der Mathematik der Universität Bielefeld).
COHEN, J. (1988) Statistical Power Analysis for the Behavioural Sciences (Hillsdale, NJ, Lawrence Erlbaum Associates).
COMMITTEE OF INQUIRY INTO THE TEACHING OF MATHEMATICS IN SCHOOLS (1982) Report: mathematics counts (London, HMSO).
CROOKS, T. J. (1988) The impact of classroom evaluation practices on students, Review of Educational Research, 58, pp. 438-481.
DAUGHERTY, R. (1995) National Curriculum Assessment: a review of policy 1987-1994 (London, Falmer Press).
DEPARTMENT OF EDUCATION AND SCIENCE & WELSH OFFICE (1985) General Certificate of Secondary Education: the national criteria (London, HMSO).
DEPARTMENT OF EDUCATION AND SCIENCE & WELSH OFFICE (1987) The National Curriculum 5-16: a consultation document (London, Department of Education and Science).
FLYVBJERG, B. (2001) Making Social Science Matter: why social inquiry fails and how it can succeed again (Cambridge, Cambridge University Press).
GIPPS, C. V., BROWN, M. L., MCCALLUM, B. & MCALISTER, S. (1995) Intuition or Evidence? Teachers and the national assessment of seven year olds (Buckingham, Open University Press).
GLASS, G. V., MCGAW, B. & SMITH, M. L. (1981) Meta-analysis in Social Research (Beverly Hills, CA, Sage).
HARGREAVES, D. H. (1999) The knowledge creating school, British Journal of Educational Studies, 47, pp. 122-144.
HARLEN, W., GIPPS, C., BROADFOOT, P. & NUTTALL, D. (1992) Assessment and the improvement of education, The Curriculum Journal, 3, pp. 215-230.
HARLOW, L. L., MULAIK, S. A. & STEIGER, J. H. (Eds) (1997) What If There Were No Significance Tests? (Mahwah, NJ, Lawrence Erlbaum Associates).
HARRISON, A. (1982) Review of Graded Tests (London, Methuen).
HART, K. M. (Ed.) (1981) Children's Understanding of Mathematics: 11-16 (London, John Murray).
HILLAGE, J., PEARSON, R., ANDERSON, A. & TAMKIN, P. (1998) Excellence in Research on Schools (London, Department for Education and Employment).
JOHNSON, D. C. (Ed.) (1989) Children's Mathematical Frameworks 8-13: a study of classroom teaching (Windsor, NFER-Nelson).
KULIK, C.-L. C., KULIK, J. A. & BANGERT-DROWNS, R. L. (1990) Effectiveness of mastery learning programs: a meta-analysis, Review of Educational Research, 60, pp. 265-299.
LEE, C. (2000) The King's Medway Oxfordshire Formative Assessment Project: studying changes in the practice of two teachers. Paper presented at symposium entitled 'Getting inside the black box: formative assessment in practice' at the British Educational Research Association 26th Annual Conference, Cardiff University.
MOSTELLER, F. W. & TUKEY, J. W. (1977) Data Analysis and Regression: a second course in statistics (Reading, MA, Addison-Wesley).
NATIONAL CURRICULUM TASK GROUP ON ASSESSMENT AND TESTING (1988a) A Report (London, Department of Education and Science).
NATIONAL CURRICULUM TASK GROUP ON ASSESSMENT AND TESTING (1988b) Three Supplementary Reports (London, Department of Education and Science).
NATRIELLO, G. (1987) The impact of evaluation processes on students, Educational Psychologist, 22, pp. 155-175.
NUTHALL, G. & ALTON-LEE, A. (1995) Assessing classroom learning: how students use their knowledge and experience to answer classroom achievement test questions in science and social studies, American Educational Research Journal, 32, pp. 185-223.
PERRENOUD, P. (1991) Towards a pragmatic approach to formative evaluation, in: P. WESTON (Ed.) Assessment of Pupil Achievement, pp. 79-101 (Amsterdam, Swets & Zeitlinger).
PICKERING, A. (1995) The Mangle of Practice: time, agency, and science (Chicago, IL, University of Chicago Press).
REEVES, J., MCCALL, J. & MACGILCHRIST, B. (2001) Change leadership: planning, conceptualization and perception, in: J. MACBEATH & P. MORTIMORE (Eds) Improving School Effectiveness, pp. 122-137 (Buckingham, Open University Press).
SCRIVEN, M. (1967) The Methodology of Evaluation (Washington, DC, American Educational Research Association).
SHAYER, M. & ADEY, P. S. (1981) Towards a Science of Science Teaching: cognitive development and curriculum demand (London, Heinemann).
SLAVIN, R. E. (1990) Achievement effects of ability grouping in secondary schools: a best evidence synthesis, Review of Educational Research, 60, pp. 471-499.
STOKES, D. E. (1997) Pasteur's Quadrant: basic science and technological innovation (Washington, DC, Brookings Institution Press).
THATCHER, M. (1993) The Downing Street Years (London, Harper Collins).
TORRANCE, H. (1993) Formative assessment: some theoretical problems and empirical questions, Cambridge Journal of Education, 23, pp. 333-343.
TORRANCE, H. & PRYOR, J. (1998) Investigating Formative Assessment (Buckingham, Open University Press).
TOULMIN, S. (2001) Return to Reason (Cambridge, MA, Harvard University Press).
TUNSTALL, P. & GIPPS, C. (1996) Teacher feedback to young children in formative assessment: a typology, British Educational Research Journal, 22, pp. 389-404.
WILIAM, D. (2001) Level Best? Levels of attainment in national curriculum assessment (London, Association of Teachers and Lecturers).
WILIAM, D. (2003) The impact of educational research on mathematics education, in: A. BISHOP, M. A. CLEMENTS, C. KEITEL, J. KILPATRICK & F. K. S. LEUNG (Eds) Second International Handbook of Mathematics Education, pp. 469-488 (Dordrecht, Kluwer).
WILIAM, D., LEE, C., HARRISON, C. & BLACK, P. (2004) Teachers developing assessment for learning: impact on student achievement. To appear in Assessment in Education: Principles, Policy and Practice, 11(2).
WILSON, S. M. & BERNE, J. (1999) Teacher learning and the acquisition of professional knowledge: an examination of research on contemporary professional development, in: A. IRAN-NEJAD & P. D. PEARSON (Eds) Review of Research in Education, pp. 173-209 (Washington, DC, American Educational Research Association).