Peering Through the Looking Glass: Towards a Programmatic View of the Qualifying Examination
TRANSCRIPT
|||
André De Champlain, PhD
Director, Psychometrics and Assessment Services
Presented at the MCC Annual General Meeting | 27 September 2015
Peering Through the Looking Glass: Towards a Programmatic View of the Qualifying Examination
||
A Brave New Journey… Programmatic Assessment
MCC Annual Meeting – September 2015
||
Guiding Vision: The Assessment Review Task Force
6 ARTF Recommendations
Recommendation 1
• LMCC becomes ultimate credential (legislation issue)
Recommendation 2
• Validate and update blueprint for MCC examinations
Recommendation 3
• More frequent scheduling of the exams and associated automation
Recommendation 4
• IMG assessment enhancement and national standardization (NAC & Practice Ready Assessment)
Recommendation 5
• Physician practice improvement assessments
Recommendation 6
• Implementation oversight, committee priorities, and budgets
|||
Validate and Update Blueprint for MCC Examinations
ARTF Recommendation #2
||
Blueprint Project
That the content of MCC examinations be expanded by:
• Defining knowledge and behaviours in all CanMEDS Roles that demonstrate competency of the physician about to enter independent practice
• Reviewing adequacy of content and skill coverage on blueprints for all MCC examinations
• Revising examination blueprints and reporting systems with the aim of demonstrating that appropriate assessment of all core competencies is covered and fulfills the purpose of each examination
• Determining whether any general core competencies considered essential cannot be tested employing the current MCC examinations, and exploring the development of new tools to assess these specific competencies when current examinations cannot
||
Addressing Micro-Gaps
A number of efforts are underway to assess how the current MCCQE Parts I & II might evolve towards better fulfilling the MCCQE blueprint
• New OSCE stations focusing on more generic skills and complex presentations expected of all physicians, irrespective of specialty, have been piloted with great success in the fall 2014 and spring 2015 MCCQE Part II administrations
• Potential inclusion of innovative item types, including situational judgment test challenges, is also under careful consideration and review
• From a micro-gap perspective, the MCCQE is aligning more closely with the two dimensions outlined in our new blueprint
||
Tacit Sub-Recommendation: Macro-Analysis
A challenge intimated in the ARTF report pertains to the need to conduct a macro-analysis and review of the MCCQE
• Applying a systemic (macroscopic) lens to the MCCQE as an integrated examination system and not simply as a restricted number of episodic “hurdles” (MCCQE Parts I & II)
• How are the components of the MCCQE interconnected, and how do they inform key markers along a physician’s educational and professional continuum?
• How can the MCCQE progress towards embodying an integrated, logically planned and sequenced system of assessments that mirrors the Canadian physician’s journey?
||
Key Recommendation: MEAAC
• Medical Education Assessment Advisory Committee (MEAAC)
• MEAAC was a key contributor to our practice analysis (blueprinting) efforts through their report, Current Issues in Health Professional and Health Professional Trainee Assessment
• Key recommendations
◦ Explore the implementation of an integrated & continuous model of assessment (linked assessments)
◦ Continue to incorporate “authentic” assessments in the MCCQE
– OSCE stations that mimic real practice
– Direct observation based assessment to supplement MCCQE Parts I & II
|||
Framework for a Systemic Analysis
of the MCCQE
||
Programmatic Assessment (van der Vleuten et al., 2012)
• Calls for a “deliberate”, arranged set of longitudinal assessment activities
• Joint attestation of all data points for decision and remediation purposes
• Input of expert professional judgment is a cornerstone of this model
• (Purposeful) link between assessment and learning/remediation
• Dynamic, recursive relationships between assessment and learning points
||
Programmatic Assessment (van der Vleuten et al., 2012)
• Application of a program evaluation framework to assessment
• Systematic collection of data to answer specific questions about a program
• Gaining in popularity within several medical education settings
◦ Competency-based workplace learning
◦ Medical schools (e.g., Dalhousie University, University of Toronto, etc.)
◦ Etc.
||
Programmatic Assessment Refocuses the Debate
Reductionism
• A system reduces to its most basic elements (e.g., corresponds to the sum of its parts)
• Decision point I = MCCQE Part I
• Decision point II = MCCQE Part II
VS.
Emergentism
• A system is more than the sum of its parts & also depends on complex interdependencies amongst its component parts
• Decision point I: Purposeful integration of MCCQE Part I scores with other data elements
• Decision point II: Purposeful integration of MCCQE Part II scores with other data elements
||
Can a Programmatic Assessment Framework Be
Applied to the MCCQE?
• Programmatic assessment is primarily restricted to local contexts (medical school, postgraduate training program, etc.)
• What about applicability to high-stakes registration/licensing exam programs?
• “The model is limited to programmatic assessment in the educational context, and consequently licensing assessment programmes are not considered” (van der Vleuten et al., 2012, p. 206)
||
Can a Programmatic Assessment Framework Be
Applied to the MCCQE?
• Probably not as conceived, due to differences in:
◦ Settings (medical school vs. qualifying exam)
◦ Stakes (graduation vs. licence to practise as a physician)
◦ Outcomes
◦ Interpretation of data sources
◦ Nature of the program and its constituent elements
||
Can the Philosophy Underpinning Programmatic
Assessment Be Applied to the MCCQE?
||
How can the MCCQE Evolve?
• At its philosophical core, from (1) an episodic system of two point-in-time exams to (2) an integrated program of assessment, continually supported by best practice and evidence, which includes:
◦ The identification of data elements aimed at informing key decisions and activities along the continuum of a physician’s medical education
◦ Clearly laid-out relationships exemplified by the interactions between those elements, predicated on a clearly defined program (the MCCQE)
◦ Defensible feedback interwoven at key points in the program
• The $64,000 question: What does the MCCQE program of assessment look like (actually, the $529,153.80 question)?
||
Assessment Continuum for Canadian Trainees
[Figure: timeline spanning Undergraduate Education, Postgraduate Training, In Practice and Continuing Professional Development, with decision points D1 and D2 and full licensure]
UGME assessments potentially leading to MCC BP Decision Point 1 in clerkship:
• SR-items (17/17)
• CR-items (16/17)
• OSCE (17/17)
• Direct observation reports (14/17)
• In-training evaluation (13/17)
• Simulation (10/17)
• MSF/360 (6/17)
• Others (11/17)
CFPC:
• Direct observation
• CR-items (SAMPs)
• Structured orals
Royal College (32 entry specialties):
• ITEs
• Direct observation
• SR-items
• CR-items
• OSCE/orals
• Simulations
• Chart audits
PPI (FMRAC):
• Assessment of practice
• Audits
||
Major Step Towards a Programmatic View of the MCCQE
• What constitutes the learning/assessment continuum for physicians from “cradle to grave” (UGME to PPI)?
• At a pan-Canadian level:
◦ A temporal timeline is a necessary, but insufficient, condition for better understanding the life cycle of a physician
◦ What competencies do physicians develop throughout this life cycle?
◦ What behavioural indicators (elements) best describe “competency” at various points in the life cycle?
◦ How are these competencies related (both within and across points in the life cycle)?
◦ How do these competencies evolve?
◦ Etc.
• All of these questions are critical in better informing the development of a programmatic model for the LMCC
||
First Step Towards a Programmatic View of the MCCQE
November Group on Assessment (NGA)
• Purpose
◦ To define the “life of a physician” from the beginning of medical school to retirement in terms of assessments
◦ To propose a common national framework of assessment using a programmatic approach
• Composition
◦ Includes representation from the AFMC, CFPC, CMQ, FMRAC, MCC, MRAs and Royal College
• First step
◦ Physician pathway
||
November Group on Assessment: Next Step
• Summit to define a program of assessment
◦ Planned for the first quarter of 2016
• Starting points to develop a program of assessment
◦ Various ongoing North American EPA projects
◦ Milestone projects (Royal College, ACGME)
◦ CanMEDS 2015
◦ MCCQE Blueprint!
◦ … and many others
• Critical to develop an overarching framework (program) prior to specifying the elements and relationships of this program
|||
Validating a Program of Assessment
||
Appeal
• Emphasis is on a composite of data elements (quantitative and qualitative) to better inform key educational decisions as well as learning
• Including both micro-level (element) and macro-level (complex system of interrelated elements) indicators adds an extra layer of complexity to the MCCQE validation process
• The systemic nature of programmatic assessment requires validating not only the constituent elements (various data points) but also the program in and of itself
• How do we proceed?
||
Standards for Educational and Psychological Testing (2014)
Key Objectives
• Provide criteria for the development and evaluation of tests and testing practices, and guidelines for assessing the validity of interpretations of test scores for the intended test uses
• Although such evaluations should depend heavily on professional judgment, the Standards provides a frame of reference to ensure that relevant issues are addressed
||
Assessing the Foundational Properties of a Program of Assessment
1. Reliability
2. Validity
||
Reliability
• “Test” score: a reminder
◦ Any assessment, by virtue of practical constraints (e.g., available testing time), is composed of a very restricted sample of the items, stations and tasks that make up the domain of interest
◦ Example: my WBA program includes 12 completed mini-CEX forms
◦ But as a test score user, are you really interested in the performance of candidates in those 12 very specific encounters? No!
||
Reliability
• You’re interested in generalizing from the performance in those 12 very specific encounters to the broader domains of interest
• Reliability provides an indication of the degree of consistency (or precision) with which test scores and/or decisions are produced by a given examination (sample of OSCE stations, sample of MCQs, sample of workplace-based assessments, etc.)
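The force of this point can be made concrete with the Spearman-Brown prophecy formula, which projects how score reliability changes as the number of sampled encounters grows. A minimal sketch; the single-encounter reliability of 0.30 is an invented illustration, not an MCC figure:

```python
def spearman_brown(rho_1: float, k: int) -> float:
    """Project the reliability of a score based on k parallel units
    (encounters, stations, items), given single-unit reliability rho_1."""
    return k * rho_1 / (1 + (k - 1) * rho_1)

# Illustrative only: a single mini-CEX encounter with reliability 0.30
# projects to roughly 0.84 when 12 encounters are sampled.
print(round(spearman_brown(0.30, 12), 2))  # 0.84
```

Read in reverse, the same formula answers “how many encounters do I need?” for a target reliability, which is the practical question behind sampling 12 mini-CEX forms rather than one.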
||
Reliability
• Measurement error arises from multiple sources (it is multifaceted)
• For a WBA, measurement error could be attributable to:
◦ Selection of a particular set of patient encounters
◦ Patient effects
◦ Occasion effects
◦ Rater effects
◦ Setting effects (if given at multiple locations)
• These sources need to be clearly identified and addressed a priori
• The impact of each of these sources needs to be estimated
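For a fully crossed persons × raters design, these sources can be disentangled with a generalizability (G) study. Below is a minimal sketch of the textbook variance-component estimates from mean squares; the 3 × 2 score matrix is invented for illustration, and a real G-study would use dedicated software and richer designs:

```python
def g_study(scores):
    """Variance components for a fully crossed persons x raters design
    (one score per cell) and the relative G coefficient for the mean
    score over the observed number of raters."""
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    p_means = [sum(row) / n_r for row in scores]
    r_means = [sum(scores[p][r] for p in range(n_p)) / n_p for r in range(n_r)]

    ms_p = n_r * sum((m - grand) ** 2 for m in p_means) / (n_p - 1)
    ms_r = n_p * sum((m - grand) ** 2 for m in r_means) / (n_r - 1)
    ms_res = sum(
        (scores[p][r] - p_means[p] - r_means[r] + grand) ** 2
        for p in range(n_p) for r in range(n_r)
    ) / ((n_p - 1) * (n_r - 1))

    var_res = ms_res                         # residual (incl. person x rater interaction)
    var_p = max(0.0, (ms_p - ms_res) / n_r)  # candidate (universe score) variance
    var_r = max(0.0, (ms_r - ms_res) / n_p)  # rater leniency/severity variance
    g_rel = var_p / (var_p + var_res / n_r)  # relative G coefficient
    return var_p, var_r, var_res, g_rel

# Invented ratings: 3 candidates, each scored by the same 2 raters
var_p, var_r, var_res, g = g_study([[4, 5], [2, 3], [6, 7]])
```

In this toy matrix the second rater is uniformly one point more lenient, so that difference is absorbed entirely by the rater component and the relative G coefficient (which ignores constant rater effects) is perfect; real data would also show a non-zero residual.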
||
Reliability of a Program of Assessment?
• Programmatic assessment is predicated on the notion that many purposefully selected and arranged data elements contribute to the evaluation of candidates
• In addition to assessing the reliability of each element in the system, the reliability of scores/decisions based on this composite of measures therefore needs to be assessed
• Models:
◦ Multivariate generalizability theory (Moonen-van Loon et al., 2013)
◦ Structural equation modeling
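One simpler alternative to a full multivariate G-theory analysis is the classical composite-reliability result: treat the program decision as a weighted sum of element scores and assume measurement errors are uncorrelated across elements. A sketch under those assumptions, with invented numbers:

```python
from itertools import product

def composite_reliability(weights, sds, rels, corr):
    """Reliability of a weighted composite of element scores, assuming
    errors are uncorrelated across elements. corr[i][j] is the
    observed-score correlation between elements i and j."""
    n = len(weights)
    # Observed variance of the composite: sum of all weighted covariances
    var_comp = sum(
        weights[i] * weights[j] * sds[i] * sds[j] * corr[i][j]
        for i, j in product(range(n), repeat=2)
    )
    # Error variance contributed by each element: w^2 * sd^2 * (1 - rel)
    var_err = sum(
        w * w * sd * sd * (1 - r) for w, sd, r in zip(weights, sds, rels)
    )
    return 1 - var_err / var_comp

# Two invented elements (e.g., an MCQ score and a WBA composite),
# equally weighted, correlated at 0.5, each with reliability 0.80
rel = composite_reliability(
    weights=[1, 1], sds=[1, 1], rels=[0.8, 0.8], corr=[[1, 0.5], [0.5, 1]]
)
```

With these numbers the composite comes out at about 0.87, higher than either element alone: positively correlated elements reinforce the true-score signal while their independent errors partially cancel.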
||
Validity: What It Is
Validity is an overall evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of interpretations and actions based on test scores or other modes of assessment. (Messick, 1989)
||
Validity: What It’s Not
• There is no such thing as a valid or invalid exam or assessment
• Statements such as “my mini-CEX shows construct validity” are completely devoid of meaning
• Validity refers to the appropriateness of inferences or judgments based on test scores, given supporting empirical evidence
||
Validity: Kane’s Framework (1992)
1. State the interpretive argument as clearly as possible
2. Assemble evidence relevant to the interpretive argument
3. Evaluate the weakest part(s) of the interpretive argument
4. Restate the interpretive argument and repeat
• Five key arguments
||
Validity: Kane’s Five Key Arguments
1. Evaluation Argument
• The scoring rule is appropriate
• The scoring rule is applied accurately and consistently
• Evidence
◦ Clearly documented training, scoring rules and processes for the elements included in the program, as well as for complex interactions among these components
2. Generalization Argument
• The sample of items/cases in the exam is representative of the domain (universe of items/cases)
• Evidence
◦ Practice analysis/blueprinting effort
||
Validity: Kane’s Five Key Arguments
3. Extrapolation Argument
• Does the program of assessment lead to the intended outcomes?
• Evidence
◦ Do the outcomes of the program (LMCC or not) relate to clear practice-based indicators as anticipated?
4. Explanation Argument
• Is the program of assessment measuring what was intended?
• Evidence
◦ Structural equation modeling, mapping of expert judgments, etc.
||
Validity: Kane’s Five Key Arguments
5. Decision-Making Argument
• Is the program of assessment appropriately passing and failing the “right” candidates?
• How do we set a passing standard for a program of assessment?
• Evidence
◦ Internal validity
– Documentation of the process followed
– Inter-judge reliability, generalizability analyses, etc.
◦ External validity
– Relationship of performance on the exam to other criteria
||
A Practical Framework (Dijkstra et al., 2012)
Collecting information:
• Identify the components of the assessment program (What?)
• Identify how each component contributes to the goal of the assessment program for stakeholders (Why?)
• Outline the balance between components that best achieves the goal of the assessment program for stakeholders (How?)
||
A Practical Framework (Dijkstra et al., 2012)
Obtaining support for the program:
• A significant amount of faculty development is required to ensure a level of expertise in performing critical tasks (e.g., rating)
• The higher the stakes, the more robust the procedures need to be
• Acceptability
◦ Involve and seek buy-in from key stakeholders
||
A Practical Framework (Dijkstra et al., 2012)
Domain mapping:
• Gather evidence to support that each assessment component targets the intended element in the program (micro-level)
• Gather evidence to support that the combination of components measures the overarching framework (macro-level)
||
A Practical Framework (Dijkstra et al., 2012)
Justifying the program:
• All new initiatives need to be supported by scientific (psychometric) evidence
• A cost-benefit analysis should be undertaken in light of the purpose(s) of the assessment program
|||
Some Additional Challenges
||
Narrative Data in the MCCQE
• Narrative (qualitative) data poses unique opportunities and challenges for inclusion in a high-stakes program of assessment (MCCQE)
• Opportunity
◦ Enhance the quality and usefulness of feedback provided at key points in the program
• Challenge
◦ How best to integrate qualitative data in a sound, defensible, reliable and valid fashion, in a program of assessment that fully meets legal and psychometric best practice
• How can we better systematize feedback?
||
Narrative Data in the MCCQE
• Automated Essay Scoring (AES)
◦ AES can build scoring models based on previously human-scored responses
◦ AES relies on:
– Natural language processing (NLP) to extract linguistic features of each written answer
– Machine-learning algorithms (MLA) to construct a mathematical model linking the linguistic features to the human scores
◦ The same scoring model can then be applied to new sets of answers
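As a toy illustration of that NLP-plus-machine-learning pipeline (not the LightSide software itself, and far simpler than any production AES system), the sketch below extracts bag-of-words features from human-scored answers, averages them into one profile per score level, and scores new answers by similarity; all the sample answers are invented:

```python
import math
from collections import Counter

def features(answer: str) -> Counter:
    """NLP step (deliberately minimal): bag-of-words token counts."""
    return Counter(answer.lower().split())

def train(scored_answers):
    """ML step: average feature vector (centroid) per human score level."""
    sums, counts = {}, Counter()
    for text, human_score in scored_answers:
        sums.setdefault(human_score, Counter()).update(features(text))
        counts[human_score] += 1
    return {s: {w: c / counts[s] for w, c in vec.items()} for s, vec in sums.items()}

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def score(model, answer: str):
    """Apply the trained model to a new write-in answer."""
    vec = features(answer)
    return max(model, key=lambda s: cosine(vec, model[s]))

# Invented training data: write-ins scored 1 (acceptable) or 0 (not)
model = train([
    ("order ecg and troponin for chest pain", 1),
    ("ecg troponin chest pain aspirin", 1),
    ("give antibiotics", 0),
    ("prescribe antibiotics and rest", 0),
])
```

Production systems use much richer linguistic features and stronger learners, but the three stages (feature extraction, model fitting, application to new answers) mirror the bullets above.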
||
Narrative Data in the MCCQE
• AES of MCCQE Part I CDM write-in responses
◦ AES (LightSide) was used to parallel-score 73 spring 2015 CDM write-ins
◦ Overall human-machine concordance rate >0.90 (higher for dichotomous items; lower for polytomous items)
◦ Overall pass/fail concordance near 0.99, whether CDMs are scored by residents or by computer
• AES holds a great deal of promise as a means to systematize qualitative data in the MCCQE program
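Concordance figures like these reduce to simple agreement statistics. The sketch below computes the raw human-machine agreement rate and Cohen's kappa (which corrects for agreement expected by chance) on invented pass/fail decisions:

```python
from collections import Counter

def agreement(human, machine):
    """Raw concordance: proportion of cases where the two scores match."""
    return sum(h == m for h, m in zip(human, machine)) / len(human)

def cohens_kappa(human, machine):
    """Chance-corrected agreement between two scorers on nominal labels."""
    n = len(human)
    p_o = agreement(human, machine)
    h_freq, m_freq = Counter(human), Counter(machine)
    # Chance agreement: product of each scorer's marginal label rates
    p_e = sum(
        (h_freq[label] / n) * (m_freq[label] / n)
        for label in set(human) | set(machine)
    )
    return (p_o - p_e) / (1 - p_e)

# Invented pass (1) / fail (0) decisions for five candidates
human = [1, 1, 0, 1, 0]
machine = [1, 1, 0, 0, 0]
print(agreement(human, machine))  # 0.8
```

Kappa is the more conservative number to report: with a highly imbalanced pass rate, a raw concordance near 0.99 can be achieved by always predicting a pass, and kappa would expose that.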
||
Argument for Accreditation of Observation-Based Data
• There is insufficient evidence to support incorporating “local” (e.g., medical school) scores (ratings) obtained from direct observation into the MCCQE without addressing a number of issues
◦ Examiner training, patient problem variability, etc.
◦ These issues may never be fully resolved to high-stakes assessment standards
• However, accrediting (attesting) observational data sources based on strict criteria and guidelines might be a viable compromise
||
Argument for Accreditation of Observation-Based Data
• Accrediting (with partners) all facets of observation-based data sources will require meeting a number of agreed-upon standards:
◦ Selection of specific rating tool(s) (e.g., mini-CEX)
◦ Adherence to a strict examiner training protocol
– Attestation that examiners have successfully met training targets (online video training module)
◦ A sampling strategy (patient mix) based on an agreed-upon list of common problems (and the MCCQE blueprint)
◦ Adherence to common scoring models
||
Putting the Pieces Together
• At a programmatic level, how can we aggregate this combination of low- and high-stakes data to arrive at a defensible decision, both for entry into supervised practice and for entry into independent practice?
• A standard-setting process offers a defensible model that would allow expert judgment to be applied towards the development of a policy that could factor in all sources of data
• Empirical (substantively based) analyses would then be carried out to support and better inform (or even refute) that policy
◦ Structural equation modeling, multivariate generalizability analysis, etc.
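A standard-setting policy for combining such data sources usually lands in one of two families of rules, which can be stated precisely: compensatory (a weighted composite against a single cut score, so strength in one element can offset weakness in another) or conjunctive (a minimum standard on every element). A minimal sketch; the element names, weights and cut scores are all invented:

```python
def compensatory_pass(scores, weights, cut):
    """Pass if the weighted composite of all data elements meets one cut score."""
    return sum(w * s for w, s in zip(weights, scores)) >= cut

def conjunctive_pass(scores, minima):
    """Pass only if every data element meets its own minimum standard."""
    return all(s >= m for s, m in zip(scores, minima))

# Invented scores for one candidate: [MCCQE Part I, MCCQE Part II, WBA ratings]
scores = [72, 61, 80]
print(compensatory_pass(scores, weights=[0.4, 0.4, 0.2], cut=65))  # True
print(conjunctive_pass(scores, minima=[65, 65, 65]))               # False
```

The two rules embody different validity claims: compensatory rules assume the elements measure a common composite, while conjunctive rules assert that each element guards a distinct, non-negotiable competency, which is why the choice is a policy decision for expert judgment rather than a purely statistical one.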
||
Next Steps
• Begin to lay the foundation for an MCCQE program of assessment
◦ Define both the micro- and macro-elements that define a program of assessment leading up to each MCCQE decision point
◦ Initial efforts led by the November Group on Assessment
• Agree on all supporting standards that need to be uniformly adopted by all stakeholders:
◦ Accreditation criteria, where applicable
◦ Core tools and a pool of cases to be adopted by all schools
◦ Training standards and clear outcomes for examiners
◦ Scoring and standard-setting frameworks
||
Next Steps
• A collaborative pilot project framework with key partners and stakeholders
◦ Formulate key targeted research questions needed to support the implementation of a programmatic framework for the MCCQE
◦ Identify collaborators (e.g., UGME programs, postgraduate training programs, MRAs, etc.) to answer specific questions through these investigations
◦ Aggregate information to better inform and support a programmatic model of assessment for the MCCQE
||
“Would you tell me, please, which way I ought to go from here?”
“That depends a good deal on where you want to get to!”
- Alice in Wonderland