When enough is enough: a conceptual basis for fair and defensible practice performance assessment
L W T Schuwirth, L Southgate, G G Page, N S Paget, J M J Lescop, S R Lew, W B Wade & M Baron-Maldonado
Introduction An essential element of practice performance assessment involves combining the results of various procedures in order to see the whole picture. This picture must be derived from both objective and subjective assessment, and from a combination of quantitative and qualitative assessment procedures. Because of the severe consequences an assessment of practice performance may have, it is essential that the procedure is both defensible to the stakeholders and fair, in that it distinguishes well between good performers and underperformers.
Lessons from competence assessment Large samples of behaviour are always necessary because of the domain specificity of competence and performance. The test content is considerably more important in determining which competency is being measured than the test format, and it is important to recognise that the problem-solving process is more idiosyncratic than its outcome. It is advisable to add some structure to the assessment but to refrain from over-structuring, as this tends to trivialise the measurement.
Implications for practice performance assessment A practice performance assessment should use multiple instruments. The reproducibility of subjective parts should be increased not by over-structuring, but by sampling across sources of bias. As many sources of bias may exist, sampling across all of them may not prove feasible. Therefore, a more project-orientated approach using a range of instruments is suggested. At various timepoints during any assessment with a particular instrument, questions should be raised as to whether the sampling is sufficient with respect to the quantity and quality of the observations, and whether the totality of assessments across instruments is sufficient to see 'the whole picture'. This policy is embedded within a larger organisational and health care context.
Keywords clinical competence/*standards; physicians, family/*standards; education, medical/*standards; quality of health care/standards.
Medical Education 2002;36:925–930
Introduction
The area of practice performance assessment is relatively new in the field of medical assessment. However, it has been very high on the agenda in the last decade, because of the possibilities that assessment offers for improving the quality of patient care and because of the demonstrated limitations of competence assessment procedures. Moreover, it has received additional attention as a result of societal concerns about the quality of practising doctors. Many major medical boards have defined standards for good medical practice and are now seeking useful instruments to assess whether practising doctors meet these standards.1–4 While these instruments may be used as screening tools to detect areas of strength and weakness in order to guide remediation, in some cases the purpose is to decide whether the assessee is still fit for practice. Clearly the consequences in the latter case may have enormous impact both on society and on the assessee; therefore rigour in the assessment procedures and in the determination of outcomes is essential.
Not only should the procedures used in practice performance assessment be defensible, but they must also inspire confidence and be congruent with best evidence-based medical educational standards.

Although existing assessment methods, both traditional and new, are being suggested and trialled for performance assessment, the value of many of them for this purpose has not been demonstrated. The findings from competence assessment make it highly unlikely that one superior assessment instrument will be developed for practice performance assessment, given that the goal in such assessments is to see the whole picture of a practitioner's performance. Instead, it is more likely that a varied palette of methods will be necessary to achieve this goal.5,6 In this paper we suggest an approach to selecting, using and combining methods to set up a practice performance assessment that will illuminate the entirety of a practitioner's performance, presenting a picture which is accurate and defensible.
Before doing so, however, it would be useful to distinguish between competence and performance. Many different definitions of competence and performance have been proposed, but they mainly converge on the notion that competence indicates what people will do under optimal conditions, knowing that they are challenged to demonstrate that they have the particular knowledge, skills and attitudes required for a task. Performance indicates how people will behave when unobserved, in real life, on a day-to-day basis.7 Further agreement seems to exist that competence is a necessary but not sufficient requirement for performance.8,9 In other words, performance can be seen as the result of competence combined with the conditions which both enable and impose boundaries on the practitioner. Because competence and performance are strongly related, some of the lessons learnt from competence assessment can serve as initial guides to assessing performance.
Lessons from competence assessment
Domain specificity requires large samples.
The domain specificity of many competencies constitutes a large threat to the reproducibility of assessment results.10 Although it is often intuitively assumed that competencies are stable, generic traits that, once mastered, can be used in any given situation, the opposite has been demonstrated.10,11 This has been studied particularly in the field of medical problem solving, where the result obtained on one case proved to be a poor predictor of the result on any other given case. For this reason, large samples of cases are needed to achieve sufficient reliability of the assessment.12
The test content, rather than the format, decides which competency is being measured, and no single format can assess all aspects of medical competence.
In terms of validity, or what the assessment really measures, the format appears to be relatively unimportant.13–16 If the same content is assessed, it is not how things are asked but what things are asked that decides which competency is being measured. Certain formats are naturally better suited to certain content, but there is certainly no single method that can do it all. A complete assessment package for competence assessment must therefore consist of a variety of methods, each chosen on the basis of its effectiveness in assessing a particular aspect of competence.17,18
The problem-solving process is more idiosyncratic than the outcome.
When presented with the same problem, different experts will suggest different strategies to solve it, although they may come to the same solution.19,20 Each individual strategy is determined by the expert's individual experiences and the organisation of his or her knowledge, and idiosyncrasy tends to increase with increasing expertise. Therefore, an outcome-based approach will be more useful for high-stakes assessments of experts than a process-based approach. In other words, the quality of a doctor's clinical decisions is often a better measure of his or her medical competence than the reasoning process leading to those decisions.
Some structure in assessment adds a lot, but too much structure loses ground.
Structuring an assessment lightly can bring enormous improvements in reproducibility.21 Adding too much structure, however, trivialises the measurement. Sometimes relevant facets of an assessment, such as rapport with the patient, cannot be structured or made objective, and assessment designers may resolve the issue by ignoring them. It is important to recognise that some elements of clinical behaviour are more subjective than others and cannot be assessed objectively. This is not an argument against precision, but a challenge to assessment developers not to apply objectivity as the sole route to reproducibility.

Key learning points
• A combination of assessment instruments (both objective measurements and subjective judgements) is necessary to achieve fair and defensible practice performance assessment.
• In order to view the whole picture of a candidate's performance, adequate sampling across possible bias and error sources is more effective than the sole use of objective instruments.
• A project management approach can contribute towards making broad sampling feasible and resource-effective.
• Careful planning and the production of a written project plan are essential to fair and defensible practice performance assessment.
Performance assessment as a method of judging the whole picture
Given that the purpose of practice performance assessment is to see the whole picture, pointillist painting serves as a useful metaphor. In such painting, the quality of each of the dots, as well as their quantity, must be sufficient for their purpose. Moreover, the relationship between the dots is essential to facilitating our view of the whole picture. We suggest a conceptual approach based on this metaphor for making practice performance assessment procedures both fair and defensible. Fairness is used here to mean the accuracy with which decisions about candidates (e.g. 'good performance', 'in need of remediation', 'poor performance') can be made. This implies an approach in which assessment is an integrated view of the quality of the dots (analogous to the assessment methods), the quantity of the dots (analogous to the number of items/observations per method) and the relationship between the dots (analogous to the combination of the different assessment methods).
Implications for the concept of performance assessment
In setting up a fair and defensible performance assessment procedure, especially one used for high-stakes decisions, it is advisable to incorporate these lessons. Procedures for performance assessment should consist of:

1 obtaining sufficiently large samples of practice;
2 with a sufficiently large variety of methods;
3 with a main focus on outcomes, and
4 with a judicious blend of structure/objectivity and subjective methods.
A popular misconception about subjectivity exists in assessment. It is often thought that subjectivity is synonymous with unreliability, and that objectivity is synonymous with reliability. As a consequence, we might surmise that the only way to improve reliability is to add structure to the measurement and to make the assessment more objective. However, this risks trivialising the assessment rather than improving it. Often it is more effective to stick to subjective judgements, but to sample across error sources. If, for example, the judge's bias negatively influences reproducibility, it is better to collect independent judgements from many different judges than to produce overly detailed checklists.
Returning to the metaphor of pointillist painting: the artistic quality of such a painting is determined not only by the number of dots and the quality of each individual dot (all of which may differ from one another) but, more importantly, by the combination of the dots. In judging the artistic value of a painting, applying rating scales to the quality and quantity of the dots would not be sufficient; a judgement of the whole picture is necessary. A more rational approach to assessing the artistry of a painter would be to have 10 different experts look independently at 10 paintings by the artist. This would lead to a matrix of 100 judgements, each of which might be subjective or qualitative. The average judgement, however, would be generalisable, even though no detailed yes-no criteria were developed for artistry. The same argument applies to judgements of practice performance. If judgements and assessments are collected which sample across all possible sources of error, the end result may be highly reproducible despite the subjectivity of some of the individual observations. As noted earlier, some structure, in the form of general criteria, and the grounding of judgements in concrete observations are important in this respect.
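To make this generalisability argument concrete, consider the following minimal simulation sketch. It is not from the original paper: the rating model (true quality plus judge bias plus occasion-specific noise) and all variance figures are illustrative assumptions. It compares how well a single subjective judgement and the mean of a 10 judge by 10 observation matrix reproduce candidates' true quality.

```python
import numpy as np

rng = np.random.default_rng(seed=2002)

n_candidates = 500  # hypothetical assessees (the "artists")
n_judges = 10       # independent judges per candidate
n_occasions = 10    # observations (paintings, consultations) per judge

# One judgement = true quality + judge bias (harshness/leniency)
# + occasion-specific noise; all variances are illustrative.
true_quality = rng.normal(0.0, 1.0, size=n_candidates)
judge_bias = rng.normal(0.0, 1.0, size=(n_candidates, n_judges, 1))
noise = rng.normal(0.0, 1.5, size=(n_candidates, n_judges, n_occasions))
ratings = true_quality[:, None, None] + judge_bias + noise

one_judgement = ratings[:, 0, 0]         # a single subjective rating
matrix_mean = ratings.mean(axis=(1, 2))  # mean of the 10 x 10 matrix

# How well does each score reproduce the candidates' true ranking?
print("single judgement:", np.corrcoef(true_quality, one_judgement)[0, 1])
print("mean of 100:     ", np.corrcoef(true_quality, matrix_mean)[0, 1])
```

Under these assumptions the single rating correlates only moderately with true quality, while the mean of the 100 judgements correlates far more strongly: averaging across judges and occasions, rather than detailed checklists, is what buys reproducibility.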
Implications for the practice of performance assessment
Several sources of bias may exist in judgements and assessments of practice performance. The most obvious is the personal bias of the judge. The judge may be harsh or lenient, or may even have a personal liking or dislike for the assessee. But there are other sources of error. The specific time frame of the assessment, the assessment methods used, the specific selection of patients, the specific selection of domains or elements of performance, the selection of tasks, the specific occasions: these are all examples of context. Ideally, a performance assessment procedure would sample across all of these sources of error or bias. However conceptually appealing this may be, it is not feasible. In order to see the whole picture, a seven-dimensional blueprint would be needed, defining the sample content and sample size for each of the biases or error sources mentioned above. This is not possible in practice, as each individual assessment would simply be too extensive.
A more efficient approach would involve including decision points in the practice performance assessment procedure and determining at each point whether sufficient information has been collected to see the whole picture. At these decision points, four questions should be considered.
Is the quality of the individual sample sufficient?
For each element of performance, certain assessment methods can be used. It is important to make a rational decision in choosing the method for the specific element. If decision-making is the performance element of interest, a chart review may be more useful than undercover simulated patients. If patient education is the aim of the sampling, assessing the information retained by the patient may be more informative than videotaping the patient education process in the consultation room.
Is the quantity of the individual sample sufficient?
There are two important aspects to this question. Firstly, the quantity of the sample should be large enough to judge the purported element of performance (not to provide a general judgement about the assessee's performance). Secondly, the decision to collect more evidence should be based on the results of the previous judgements. If, for example, the judgements about communication behaviour in the previous 10 observations are excellent, it is highly unlikely that the next will be far below standard. In determining whether or not to go on collecting observations or asking items, a binomial approach is more helpful than a standard generalisability approach: after each observation or item, the chance that the next item will give contradictory information can be calculated. If, on the other hand, the 10 judgements are inconclusive, more evidence is needed. Therefore, the size of the sample can vary from situation to situation.
Figure 1 A decision schematic for efficient sampling in practice performance assessment.
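A minimal sketch of such a binomial stopping rule follows. This is our own illustration rather than the authors' procedure: observations are reduced to satisfactory/unsatisfactory, a uniform Beta(1,1) prior gives Laplace's rule of succession for the posterior predictive chance that the next observation contradicts the evidence so far, and the 0.1 stopping threshold is an arbitrary assumption.

```python
def chance_next_contradicts(n_pass: int, n_fail: int) -> float:
    """Posterior predictive chance (uniform Beta(1,1) prior) that the
    next dichotomous observation contradicts the majority so far."""
    p_pass = (n_pass + 1) / (n_pass + n_fail + 2)  # rule of succession
    return min(p_pass, 1.0 - p_pass)

def keep_sampling(judgements: list[bool], threshold: float = 0.1) -> bool:
    """True while more observations are needed for this element.
    judgements: True = satisfactory, False = below standard."""
    n_pass = sum(judgements)
    return chance_next_contradicts(n_pass, len(judgements) - n_pass) > threshold

print(keep_sampling([True] * 10))        # False: 10 excellent ratings suffice
print(keep_sampling([True, False] * 5))  # True: inconclusive, sample further
```

After 10 consistently excellent judgements, the estimated chance that the next one contradicts them falls to about 0.08, so collecting more evidence for this element adds little; an inconclusive mix keeps the chance high and the sampling open.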
Is the quality of the samples sufficient?
The instruments used to collect evidence of good or bad performance should be sufficiently diverse. Methods should be selected on their merits. Avoiding overlapping methods avoids redundancy; instead, a combination of methods that covers a good range of the elements of the whole picture should be selected.
Is the quantity of the samples sufficient?
Good coverage of the whole picture can only be obtained when a sufficient number of methods is used. The use of previous results to determine whether or not more methods should be used is also relevant to this decision.
Figure 1 shows a schematic for the procedure of a
practice performance assessment.
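The decision flow of Figure 1 might be paraphrased in code roughly as follows. This sketch is entirely hypothetical: it reuses keep_sampling() from the previous sketch, and the element-to-instrument mapping and the simulated observe() function are placeholders standing in for real assessment data.

```python
import random

random.seed(1)

# Hypothetical, rationally chosen instrument per element
# (question 1: quality of the individual sample).
INSTRUMENT_FOR = {
    "decision-making": "chart review",
    "patient education": "patient recall interview",
}

def observe(element: str) -> bool:
    """Stand-in for one real observation; True = satisfactory."""
    return random.random() < 0.85

evidence = {}
for element, instrument in INSTRUMENT_FOR.items():
    judgements: list[bool] = []
    while keep_sampling(judgements):  # question 2: quantity of the sample
        judgements.append(observe(element))
    evidence[element] = (instrument, judgements)

# Questions 3 and 4: quality and quantity across samples; are the
# instruments sufficiently diverse, and is every element covered?
instruments_used = {instr for instr, _ in evidence.values()}
whole_picture = len(instruments_used) >= 2 and all(
    judgements for _, judgements in evidence.values()
)
for element, (instr, judgements) in evidence.items():
    print(f"{element}: {instr}, n={len(judgements)}, "
          f"satisfactory={sum(judgements)}")
print("whole picture covered:", whole_picture)
```

The point of the sketch is structural: the sample size per element is not fixed in advance but emerges from the evidence, and the coverage check across instruments is a separate, later decision point.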
When is enough enough?
The approach described above demonstrates the need for a detailed and structured plan for the set-up of an assessment procedure. We would support incorporating the approach into a larger plan of attack. This plan should be described in a formal document, in much the same way as in project planning. This document would then serve as the basis for defending the approach chosen for practice performance assessment. The main considerations to describe in this document would be:

1 the purposes of the assessment procedure, and how the process is tailored to meet those purposes as closely as possible;
2 the regulatory structure of the assessment, dealing with the consequences of decisions and possibilities for appeal;
3 quality control measures for methods, domains, judges, tasks and time frame;
4 cost-effectiveness or, better still, investment-benefit analysis;
5 the rationale and/or scientific underpinning of the choices made;
6 the relationships between evidence (defined as the collection of judgements and outcomes), criteria (defined as what ideal judgements and outcomes should be) and standards (defined as the qualities the assessee should have).
In a larger framework, practice performance assessment would entail a process that starts with the writing of a plan containing the above-mentioned elements of defensibility and honesty. The implementation of the plan should entail a careful evaluation of the outcomes, which in turn may have consequences for the assessee or for the assessment plan itself. Figure 2 presents this proposal schematically.
Conclusion
In this paper we have suggested and defended a systematic approach towards the set-up of a performance assessment process. We suggest that the defensibility of such a programme would be better served by careful selection of a variety of quantitative and qualitative instruments and careful monitoring of their values, rather than by trying to design individual instruments that are made as objective as possible. In performance assessment, it is more important to view the whole picture than to examine individual 'dots'. It is essential that some of the lessons learned from competence assessment and best evidence medical education be applied in this area.
Acknowledgements
Grateful acknowledgement is made to the sponsors of
the 10th Cambridge Conference: the Medical Council
of Canada, the Smith & Nephew Foundation, the
American Board of Internal Medicine, the National
Board of Medical Examiners and the Royal College of
Physicians.
Figure 2 Organisational schematic for the implementation of a practice performance assessment (plan: purposes, elements of defensibility, quality control measures, pathways; then implementation, judgement and consequences for the assessee).
References
1 Jolly B, McAvoy P, Southgate L. GMC’s proposals for
revalidation. Effective revalidation system looks at how
doctors practise and quality of patients’ experience. BMJ
2001;322:358–9.
2 Southgate L, Dauphinee D. Maintaining standards in British and Canadian medicine: the developing role of the regulatory body. BMJ 1998;316:697–700.
3 Southgate L, Pringle M. Revalidation in the United Kingdom:
general principles based on experience in general practice.
BMJ 1999;319:1180–3.
4 Southgate L, Hays R, Norcini J, Mulholland H, Ayers B,
Woolliscroft J et al. Setting performance standards for medical
practice: a theoretical framework. Med Educ 2001;35:474–81.
5 Southgate L, Cox J, David T, Hatch D, Howes A, Johnson N
et al. The assessment of poorly performing doctors: the
development of the assessment programmes for the General
Medical Council’s Performance Procedures. Med Educ
2001;35:2–8.
6 Southgate L, Cox J, David T, Hatch D, Howes A, Johnson N
et al. The General Medical Council’s performance procedures:
peer review of performance in the workplace. Med Educ
2001;35:9–19.
7 Rethans J, Sturmans F, Drop M, Van der Vleuten C. Assessment of performance in actual practice of general practitioners by use of standardized patients. Br J Gen Pract 1991;41:97–9.
8 Miller GE. The assessment of clinical skills/competence/performance. Acad Med 1990;65:S63–7.
9 Southgate L, Campbell M, Cox J, Foulkes J, Jolly B,
McCrorie P et al. The General Medical Council’s perform-
ance procedures: the development and implementation of
tests of competence with examples from general practice. Med
Educ 2001;35:20–8.
10 Elstein AS, Shulmann LS, Sprafka SA. Medical Problem-
Solving: an Analysis of Clinical Reasoning. Cambridge,
Massachusetts: Harvard University Press; 1978.
11 Chi MTH, Glaser R, Rees E. Expertise in problem solving.
In: Sternberg RJ, ed. Advances in the Psychology of Human
Intelligence. Hillsdale, New Jersey: Lawrence Erlbaum;
1982:7–76.
12 Swanson DB. A measurement framework for performance-
based tests. In: Hart I, Harden R, eds. Further Developments in
Assessing Clinical Competence. Montreal: Can-Heal Publica-
tions; 1987:13–45.
13 An evaluation of the construct validity of four alternative theories of
clinical competence. Proceedings of the 25th Annual RIME Con-
ference. Chicago: AAMC; 1986.
14 Norman G, Tugwell P, Feightner J, Muzzin L, Jacoby L.
Knowledge and clinical problem-solving. Med Educ
1985;19:344–56.
15 Norman GR, Smith EKM, Powles AC, Rooney PJ, Henry
NL, Dodd PE. Factors underlying performance on written
tests of knowledge. Med Educ 1987;21:297–304.
16 Norman GR. Reliability and construct validity of some cog-
nitive measures of clinical reasoning. Teaching Learning Med
1989;1:194–9.
17 Ram P. Comprehensive Assessment of General Practitioners.
Maastricht: University of Maastricht; 1998.
18 Van der Vleuten CPM. The assessment of professional com-
petence: developments, research and practical implications.
Adv Health Sci Education 1996;1:41–67.
19 Polsen P, Jeffries R. Expertise in problem solving. In: Stern-
berg RJ, ed. Advances in the Psychology of Human Intelli-
gence. Hillsdale, New Jersey: Lawrence Erlbaum; 1982:367–
411.
20 Swanson DB, Norcini JJ, Grosso LJ. Assessment of clinical
competence: written and computer-based simulations.
Assessment Evaluation Higher Education 1987;12:220–46.
21 Frijns P. Scoringsmodellen Voor Open-Vraag Vormen [Scoring
Models for Open-Ended Question Formats]. Maastricht: Uni-
versity of Maastricht; 1992.
Received 21 March 2002; editorial comments to authors 13 June 2002;
accepted for publication 17 June 2002