UCR School of Medicine Writing & Interpreting the Results of High Quality Exam Multiple Choice Questions (MCQ) A Faculty Development Workshop Thursday – August 10, 2017 Lawrence Loo, MD, MACP Clinical Professor Medicine, UCR School of Medicine CME - No Relevant Financial Conflicts of Interest www.cartoonstock.com/.../mba/lowres/mban347l.jpg


Page 1: UCR School of Medicine Writing & Interpreting the …medschoolfacdev.ucr.edu/workshop_archives/pdf/hq_exam_loo.pdfUCR School of Medicine Writing & Interpreting the Results of High

UCR School of Medicine

Writing & Interpreting the Results

of High Quality Exam

Multiple Choice Questions (MCQ) A Faculty Development Workshop

Thursday – August 10, 2017

Lawrence Loo, MD, MACP Clinical Professor Medicine, UCR School of Medicine

CME - No Relevant Financial Conflicts of Interest

www.cartoonstock.com/.../mba/lowres/mban347l.jpg


“The Puppy Problem”

- The poodle has 9 puppies.

- The collie had 5 puppies.

- How many more puppies does the poodle have?

Students’ common response . . . “None”

Why? “It said she had 9 puppies, but it didn’t say she had any more, so it’s none.”

REVISED ITEM

- The poodle has 9 puppies.

- The collie had 5 puppies.

- How many more puppies does the poodle have than the collie?


Learning Objectives

At the end of this session, attendees will be able to:

(1) Write high quality multiple choice questions (MCQ) according to the recommendations of the NBME (National Board of Medical Examiners).

(2) Construct well written MCQs that test “higher order” cognitive skills rather than simple recall of facts.

(3) Interpret the test item analysis provided to assist instructors in improving their MCQs and ultimately to facilitate student learning.


Resources

National Board of Medical Examiners (NBME): Constructing written test questions for the basic and clinical sciences. 3rd & 4th Editions, 2002, 2016.

Swanson DB, Case SM: Assessment in basic science instruction: directions for practice and research. Adv Health Sci Educ Theory Pract. 1997;2:71-84.

Haladyna TM: Developing and validating multiple-choice test items. Lawrence Erlbaum Assoc, 1994.

Considine J, Botti M, Thomas S: Design, format, validity and reliability of multiple choice questions for use in nursing research and education. Collegian. 2005;12:19-24.

Cohen RJ, Swerdlik ME: Psychological testing and assessment. 6th Edition. McGraw-Hill Publications, 2005.

Lissitz RW, Samuelsen K: Dialogue on validity: a suggested change in terminology and emphasis regarding validity and education. Educational Researcher. 2007;36:437-448.


Teaching & Learning includes three basic

actions:

– Set clear expectations

– Provide specific experiences

– Evaluate outcomes



Teaching without evidence of

learning is “entertainment,”

not learner engagement.

The fundamental purpose of

an educational program is

NOT demonstration of

faculty teaching

but evidence of student

learning.


Workshop Agenda

(1) Why do we test?

(2) What should we test?

(3) How should we test?

(4) What do the test results mean?

– To me, the instructor: How can I improve my test?

– To the course: How can I improve my course?

– To the curriculum: How can I improve student learning?


Why

do we test?


Purposes of Testing

Measuring attainment of educational objectives.

– For the individual instructor or course

– For the curriculum as a whole

Motivating students to learn.

Helping students identify what’s important.

Rewarding students for their efforts.

Providing a learning experience.

Preparing students for national certifying exams.

Determining final grades and/or making promotional decisions.

Stratifying performance of students.


Purposes of Testing

Competency | Individual (Formative & Summative Feedback) | Instructors (Individual & Block Coordinators) | School of Medicine (Promotion & Curriculum Management)

Medical Knowledge (MK) | Learning | Teaching & Learning | Promotion; Curriculum Management

Practice-based Learning & Improvement (PBLI) | Assess personal strengths/limitations; Seeks constructive feedback | Relationship between MK & clinical practice | Constructive Feedback; Critical Reasoning; Lifelong Learning

Systems-based Practice (SBP) | How do health care systems promote learning | Methods of Teaching | Mission


What should we test?


What Should be Tested?

Exam content should align with course objectives.

Important topics should be weighted more heavily

than less important topics.

The testing time should reflect the relative

importance of the topic.

The sample items should be representative of the

instructional goals.


Bloom’s Taxonomy of Cognitive (Thinking) Skills


“Lower-order” Cognitive Skills

Knowledge is the memory of previously learned information

– Knowledge is measured by having students recall it.

Sample Knowledge Question:

– Which of the following types of cells lack functioning mitochondria?

A. Erythrocyte

B. Hepatocyte

C. Myocardiocyte

D. Astrocyte


“Low/High-order” Cognitive Skills

Comprehension is the ability to grasp the meaning of information.

– Comprehension is measured by having students explain or interpret the significance of information.

Sample Comprehension Question:

– The absence of which of the following is the most likely

explanation for why a mature red blood cell (RBC) is

unable to carry out beta-oxidation of fatty acids?

A. Mitochondria

B. Endoplasmic reticulum

C. Golgi apparatus

D. Intracellular oxygen


“Low/High-order” Cognitive Skills

Comprehension is the ability to grasp the meaning of information.

– Comprehension is measured by having students explain or interpret the significance of information. Tables, graphs & figures are often used.

Sample Comprehension Question: Which of the following pulmonary function test results is most consistent with a patient with idiopathic pulmonary fibrosis (IPF)? (Values are percent predicted except for FEV1/FVC, which is the actual numeric value.)

A. FEV1 = 58%, FVC = 62%, FEV1/FVC = >70%, TLC = 68%, DLCO = 64%

B. FEV1 = 52%, FVC = 80%, FEV1/FVC = <70%, TLC = 110%, DLCO = 65%

C. FEV1 = 55%, FVC = 87%, FEV1/FVC = <70%, TLC = 100%, DLCO = 88%

D. FEV1 = 57%, FVC = 82%, FEV1/FVC = <70%, TLC = 70%, DLCO = 68%

E. FEV1 = 66%, FVC = 72%, FEV1/FVC = >70%, TLC = 75%, DLCO = 66%


“Higher-order” Cognitive Skills

Application is the ability to apply familiar information

to new situations.

– Application is measured by having students solve problems

not previously encountered.

Sample Application Question:

– A knock-out mouse is bred to be deficient in the carrier that

normally transports fatty acids from the cytoplasm into the

mitochondria. Compared to wild type mice, the knock-out

mouse would most likely be unable to generate which of the

following metabolites?

A. Glucose

B. Ketoacid

C. Lactic acid

D. Inositol triphosphate


“Higher-order” Cognitive Skills

Application is the ability to apply familiar

information to new situations.

– Application is measured by having students solve problems

not previously encountered.

Sample Application Question: What history is most

consistent with the following flow volume loop?

A. 69 year old male with 50 pack-year smoking history.

B. 29 year old South-African female with iodine deficiency.

C. 20 year old female 5 months after prolonged intubation.

D. 67 year old male with idiopathic pulmonary fibrosis.

E. 40 year old pulmonologist after giving a 3 hour lecture

to medical students.


“Higher-order” Cognitive Skills

Analysis is the ability to discern relationships

between the component parts of what is known.

– Analysis is measured by having students make

inferences based on similarities and differences.

Sample Analysis Question:

– Patient 1 has chronic bronchitis and chronic respiratory

acidosis. Patient 2 has chronic liver failure and chronic

respiratory alkalosis. Compared to patient 1, patient 2 is

more likely to have a lower value for which of the

following?

A. Plasma potassium concentration

B. Plasma ammonia concentration

C. Urine osmolality

D. Urine pH


“Higher-order” Cognitive Skills

Synthesis is the ability to put different pieces of

information together to form a new whole.

– Synthesis is measured by having students formulate

or predict what will happen.

Sample Synthesis Question:

– A 29 year-old woman presents with fatigue and ankle edema. Her PE shows that her JVP is 15 cm and she has 2+ pitting edema in both lower legs. Her lung exam is normal. Her cardiac exam shows that her 2nd heart sound is louder than her 1st heart sound at the cardiac apex. A 4th heart sound is audible at the 4th lower left sternal border. An increase of which of the following pressures is most likely to be found during cardiac catheterization?

A. Pulmonary capillary wedge pressure

B. Mean left atrial pressure

C. Right ventricular end-diastolic pressure

D. Systemic blood pressure


What level of cognitive skills is being tested (according to Bloom’s taxonomy)?

(Small Group Exercise #1)


Basic Rules for Single-best Answer Questions

Recall versus Comprehension, Application, Analysis, Synthesis



Small Groups Working


Large Group Discussion


Blueprinting Exams

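Columns: Educational Objectives; Knowledge; Comprehension; Application; Analysis; Synthesis; Total #

Objective #1: 1, 1, 1 (Total 3)

Objective #2: 2, 1 (Total 3)

Objective #3: 2, 2, 1 (Total 5)

Objective #4: 1 (Total 1)

Objective #5: 1, 1 (Total 2)

Total # by level: Knowledge 6, Comprehension 5, Application 2, Analysis 1, Synthesis 1

“Lower Order” Cognitive Skills

“Higher Order” Cognitive Skills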
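
A blueprint of this kind can also be tallied in code. The sketch below is only an illustration: the `blueprint_summary` helper and the five-item exam are hypothetical, and it assumes that Knowledge is the only "lower order" level, following the recall-versus-everything-else split used in this workshop.

```python
from collections import Counter

# Assumption: every level other than Knowledge counts as "higher order".
HIGHER_ORDER = {"Comprehension", "Application", "Analysis", "Synthesis"}


def blueprint_summary(items):
    """items: list of (objective, level) pairs, one pair per exam question."""
    per_level = Counter(level for _, level in items)
    per_objective = Counter(objective for objective, _ in items)
    higher = sum(per_level[level] for level in HIGHER_ORDER)
    share = higher / len(items) if items else 0.0
    return per_level, per_objective, share


# Hypothetical five-item exam (not the blueprint shown on the slide).
items = [
    ("Objective #1", "Knowledge"),
    ("Objective #1", "Application"),
    ("Objective #2", "Knowledge"),
    ("Objective #2", "Comprehension"),
    ("Objective #3", "Analysis"),
]
per_level, per_objective, share = blueprint_summary(items)
print(dict(per_level), dict(per_objective), f"higher-order share = {share:.0%}")
```

Comparing `share` against a target such as 0.40 gives a quick check on whether an exam draft leans too heavily on recall.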


Characteristics of Well-Written Exams

Assess “higher-order” cognitive skills

– A minimum of 40% of items should assess “higher” order skills:

comprehension, application, analysis, synthesis

Emphasize practical / clinical relevance of what’s being

tested.

– Test items are developed around clinical vignettes

Want “good” students to get the correct answer more

frequently than the “bad” students.

Do not contain controversial subject matter where the

keyed answer can be subject to dispute.

Avoid “hinging” where students must know the answer to one question in order to answer another.

Do not contain technically-flawed questions that benefit

test-wise examinees.


How

should we

test?


Advantages of Single Best Answer

Multiple Choice Questions

(1) Only format used by the NBME/USMLE

(2) Readily evaluates 5 cognitive skills – Knowledge, Comprehension, Application, Analysis and Synthesis

(3) Easy to score

(4) Influenced less by guessing / “gaming” the

system than other formats

– No True/False. No “K,” “C” type questions.

(5) Answers are subject to less dispute than

other formats.


The Graveyard of NBME Test Formats

If used, make sure the item is 100% correct without any exceptions.



Disadvantages of Single Best Answer

Multiple Choice Questions

(1) Depends on the recognition of the correct

answer and de-emphasizes the process

by which the answer was obtained.

(2) Performance may be affected more by

reading speed and comprehension than

on true-false and matching formats

(4) Unable to measure certain skills or behaviors and competencies

– Cannot measure physical exam skills, written communication skills, professional attitudes and values, etc.


Component Parts of the Multiple Choice Exam Question

Item: the question (stem + lead-in) and all possible options.

Stem: the part of the item that precedes the options.

Lead-in: the part of the stem that asks the examinee to select something.

Options: the answer set from which a selection is to be made.

Keyed answer: the correct answer. All other options are distractors, the set of wrong answers.
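
For faculty who keep their question bank in code or a spreadsheet export, the parts of an item map naturally onto a small record. The following is a minimal sketch, not an NBME format: the `MCQItem` class and its field names are hypothetical, and the example reuses the sample knowledge question from earlier in this deck.

```python
from dataclasses import dataclass


@dataclass
class MCQItem:
    """Hypothetical container mirroring the parts of an item named above."""
    vignette: str   # background information in the stem (may be empty)
    lead_in: str    # the part of the stem that asks the examinee to select something
    options: dict   # option label -> option text (the answer set)
    key: str        # label of the keyed (correct) answer

    def __post_init__(self):
        if self.key not in self.options:
            raise ValueError("the keyed answer must be one of the options")

    @property
    def stem(self):
        # The stem is everything that precedes the options.
        return f"{self.vignette} {self.lead_in}".strip()

    @property
    def distractors(self):
        # Every option other than the keyed answer.
        return {k: v for k, v in self.options.items() if k != self.key}


item = MCQItem(
    vignette="",
    lead_in="Which of the following types of cells lack functioning mitochondria?",
    options={"A": "Erythrocyte", "B": "Hepatocyte", "C": "Myocardiocyte", "D": "Astrocyte"},
    key="A",  # key supplied here for illustration
)
print(item.stem)
print(sorted(item.distractors))
```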


Refining Your Test Writing Skills:

Format & Structure

of Test Questions:

Looking for Technical Flaws


Looking for Technical Flaws (from National Board of Medical Examiners: Constructing Written Questions for the Basic and Clinical Sciences. 2002, 3rd Edition Revised; and ACP’s Prep for Boards 2: An Enhancement to MKSAP, 2005)

(1) Grammatical cues: one or more of the distractors do not follow grammatically from the stem.

A 35 year-old woman has a discrete palpable mass in the upper outer quadrant of the right breast. A mammogram is ordered, and no abnormalities are noted in either breast. Ultrasound does reveal a complex cyst in the area of the palpable mass. The next most appropriate step is an

(a) aspiration biopsy

(b) excisional biopsy

(c) repeat breast examination

(d) repeat mammogram

(e) repeat ultrasonography

Test-wise individuals would eliminate options C, D, and E as incorrect because they do not follow grammatically from the “an” in the stem. Using this cue, the odds of guessing the correct answer increase from 20% to 50%.
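
The arithmetic behind "20% to 50%" can be sketched in a few lines. This is an illustration under two assumptions: the examinee guesses uniformly among the options not yet eliminated, and the keyed answer is never among the eliminated ones. `guess_odds` is a hypothetical helper name.

```python
from fractions import Fraction


def guess_odds(n_options, n_eliminated=0):
    """Chance of a blind guess being correct among the remaining options."""
    remaining = n_options - n_eliminated
    if remaining < 1:
        raise ValueError("at least one option must remain")
    return Fraction(1, remaining)


before = guess_odds(5)     # five options, no cue used
after = guess_odds(5, 3)   # three options eliminated by the grammatical cue
print(f"{float(before):.0%} -> {float(after):.0%}")
```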


(2) Logic cues: a subset of the options covers all possible answers.

As compared with young individuals, the prevalence of venous stasis ulceration in the elderly is:

(a) more frequent

(b) as frequent

(c) less frequent

(d) related to socioeconomic level

(e) related to social isolation

Test-wise individuals would notice that options A, B, and C include all the

logical possibilities, and one of them must be correct. Note that options D

and E do not measure the same dimension (frequency) as do options A

through C. Options D & E appear to be an “afterthought” added when the

options were written and can be eliminated from consideration. Using this

cue, the odds of guessing the correct answer increase from 20% to 33%.


(3) Absolute Terms: incorrect options are identified by their association with modifiers such as “always” and “never.” Often occurs when verbs are included in the options rather than in the lead-in.

Patients with diabetes mellitus 2 and early onset diabetic nephropathy and heart failure should:

(a) take supplemental potassium chloride

(b) never be treated with spironolactone

(c) always be placed on low-protein diets

(d) never be treated with enalapril

(e) avoid non-steroidal anti-inflammatory drugs

Test-wise individuals would eliminate options B, C, and D because they

are less likely to be true than something that is stated less absolutely.

Options A and E, being less dogmatic, are the most attractive options.

Using this cue, the odds of guessing the correct answer increase from 20%

to 50%.


(4) Indefinite Modifiers: Terms such as “sometimes,” “often,” “frequently,” and “usually” often cue the correct option.

The cardiac impulse in most patients having isolated mitral stenosis is:

(a) sustained in character

(b) located in the anterior axillary line

(c) associated with a palpable third heart sound

(d) impossible to detect

(e) usually normal in all respects

Test-wise individuals would note option E allows for the broadest

possibility and is the only one cued with a modifier. Using this cue, the

odds of guessing the correct answer increase from 20% to ~100%!



(5) Long Correct Answer: the correct answer is the longest, most complete, and most specific of all the options.

In hospitalized, elderly patients, the mechanisms most likely responsible for the development of a decubitus pressure ulcer are:

(a) age and malnutrition

(b) dementia and obesity

(c) diabetes and peripheral vascular disease

(d) local heat and moisture

(e) prolonged pressure on the skin due to immobility and shear forces when moving the patient on the bed sheets

Because test writers often pay more attention to the correct option than to the

distractors, the correct answer tends to be longer and more complete because of the

additional instructional material or explanatory “teaching points” that increase the

length and specificity. If only one choice is a compound answer (with “and”), this

is often the correct one. Test-wise individuals would eliminate options A, B, C, and

D. Using this cue, the odds of guessing the correct answer increase from 20% to

100%!


(6) Repeated Word or Phrase: a word is found in both the stem and the correct option (“clang clues”).

A 58-year old man with a history of heavy alcohol use and previous psychiatric hospitalization is confused and agitated. He speaks of experiencing the world as unreal. This symptom is called

(a) depersonalization

(b) derailment

(c) derealization

(d) focal memory deficit

(e) signal anxiety

Test-wise individuals would note the word “unreal” in the stem, and “derealization” is the most likely correct answer. Sometimes, a word is repeated only in a metaphorical sense, e.g. a stem mentioning bone pain, with the correct answer beginning with the prefix “osteo-”. Using this cue, the odds of guessing the correct answer increase from 20% to ~100%!



(7) Convergence: the correct option contains the most elements in common with the remaining options.

In the management of congestive heart failure, what combination of medications has been shown to prolong survival?

(a) digoxin and spironolactone

(b) digoxin and enalapril

(c) enalapril and spironolactone

(d) aspirin and enalapril

(e) aspirin and spironolactone

Option C is most likely to be the correct answer because it contains 2

components (enalapril and spironolactone) that were each mentioned a total

of three times in the five option sets. Digoxin and aspirin were mentioned

only twice and were not combined in a single option. Exam writers will

often write the correct combination first (enalapril and spironolactone) and

then create incorrect options by combining other distractors with parts of

the correct option (drug X and enalapril, drug Y and spironolactone).
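
The counting a test-wise examinee does here can be written out explicitly. The sketch below is one way to express the convergence strategy, assuming each option has already been split into its component elements; `convergence_guess` is a hypothetical helper, and the option set is the one from the example above.

```python
from collections import Counter


def convergence_guess(options):
    """options: label -> list of elements. Returns (best_label, scores)."""
    # Count how often each element appears across all options.
    counts = Counter(e for elements in options.values() for e in elements)
    # Score each option by the total frequency of its elements.
    scores = {label: sum(counts[e] for e in elements)
              for label, elements in options.items()}
    best = max(scores, key=scores.get)
    return best, scores


options = {
    "a": ["digoxin", "spironolactone"],
    "b": ["digoxin", "enalapril"],
    "c": ["enalapril", "spironolactone"],
    "d": ["aspirin", "enalapril"],
    "e": ["aspirin", "spironolactone"],
}
best, scores = convergence_guess(options)
print(best, scores)
```

An item writer can run the same count on a draft: if one option scores clearly higher than the rest and it is also the keyed answer, the distractors probably need rebalancing.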


Issues Related to “Testwiseness”

Grammatical cues – one or more distractors don’t follow grammatically from the stem.

Logic cues – a subset of the options is collectively exhaustive.

Absolute modifiers - terms such as “always” or “never” are in some options, often cueing incorrect choices.

Indefinite modifiers – terms such as “usually,” “often,” “frequently,” “may,” and “sometimes” are in some options, often cueing the correct choice.

Long correct answer - correct answer is often longer, more specific, or more complete than other options.

Word repeats - a word or phrase is included in the stem and in the correct answer.

Convergence strategy - the correct answer includes the most elements in common with the other options.


Basic Rules

for

Single Best Answer Questions


Basic Rules for Single-best Answer Questions

Each item should focus on an important concept, typically a common or potentially catastrophic clinical problem.

– Develop items around a course objective.

– Avoid test questions assessing knowledge of trivial facts. Focus on problems that would be encountered in real life. Avoid trivial, “tricky,” or overly complex questions.

Whenever possible, items should assess application of knowledge, not recall of an isolated fact.

– At least 40% of the items should measure “higher” order cognitive skills (i.e. comprehension, application, analysis, synthesis).

The stem of an item must pose a clear question, and it should be possible to arrive at an answer with the options covered.

– Cover up the options and see if the question is clear and if the examinee can pose an answer based only on the stem. Rewrite the stem and/or options if the examinee cannot.

All answer options should be homogeneous.

Avoid technical flaws that trigger “testwiseness.”


Focus on the

Stem


Stem Goals

For Basic Science Items:

– Evaluate biomedical knowledge in the context of scientific and clinical relevance.

– Consider using laboratory data and clinical vignettes to assess comprehension and higher-order cognitive skills.

For Clinical Science Items:

– Focus on common or potentially catastrophic problems encountered in general medical practice.

– Use patient vignettes to test application of knowledge by requiring examinees to make a diagnosis or a medical decision.

– Require clinical decision-making tasks commensurate with an examinee’s level of training.


Basic Rules for Single-best Answer Questions:

The Stem

An unambiguous statement that contains a verb and only

essential information.

– Avoid pseudovignettes where extraneous information is supplied and the question can be answered just by the lead-in.

Requires the examinees to prioritize the options.

– Which of the following is most likely? Which of the following should be administered?

Should not have technical jargon that is subject to

misinterpretation by examinees.

Avoid negative phraseology since this turns an item into

a multiple true-false question (which may be impossible to

answer when all options are covered).

– Avoid: Which of the following is incorrect? Which of the following is false? Which of the following is least likely?
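
Negative phrasing is easy to screen for in a draft lead-in. This is a minimal sketch: `negative_lead_in` is a hypothetical helper, and the phrase list contains only the three wordings named above, so other negative constructions would need to be added by hand.

```python
import re

# Phrases taken from the "Avoid" examples above.
NEGATIVE_PHRASES = ("incorrect", "false", "least likely")


def negative_lead_in(lead_in):
    """Return the negative phrases found in a lead-in (empty list if none)."""
    return [p for p in NEGATIVE_PHRASES
            if re.search(rf"\b{re.escape(p)}\b", lead_in, re.IGNORECASE)]


print(negative_lead_in("Which of the following is least likely?"))
print(negative_lead_in("Which of the following is most likely?"))
```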


A 33 year old man has used cocaine so frequently that he drained his finances, lost two jobs and his wife. He continues to use cocaine because he craves the “high” the drug produces. Which CNS tracts are most involved in the desire to repeatedly use cocaine?

A. Nigrostriatal dopamine tracts

B. Basal forebrain acetycholine tracts

C. Mesolimbic dopamine tracts

D. Basal ganglia acetycholine tracts

E. Hypothalamic neurotensin tracts

Avoid Pseudovignettes

Pseuodvignettes use clinical scenarios that can

be completely ignored to answer the lead-in

question.

Page 53

The Stem for Basic Science Items Should Contain:

Only relevant information that supports or refutes the keyed answer.

Every word has a purpose
– No superfluous information to distract examinees from the keyed answer

The purpose of the information is to funnel the examinee to the keyed answer.

Page 54

Avoid Pseudovignettes

A 33-year-old man has used a drug so frequently that he has drained his finances, lost two jobs and his wife. He continues to use the drug because he craves the “high” it produces. A drug affecting which of the following tracts would most likely produce this behavior?

A. Nigrostriatal dopamine tracts
B. Basal forebrain acetylcholine tracts
C. Mesolimbic dopamine tracts
D. Basal ganglia acetylcholine tracts
E. Hypothalamic neurotensin tracts

True Vignette: Note the question now tests a “higher” order cognitive skill that requires comprehension and application.

Page 55

The Stem for Clinical Science Items Should Contain:

Patient age and gender (e.g. a 45-year-old man)
Site of care or visit (e.g. comes to the emergency room or clinic, or is hospitalized)
Presenting complaint (e.g. because of headache)
Duration (e.g. that has lasted 2 days)
Relevant past history (? + Family history)
Relevant physical exam findings (e.g. photophobia)
Relevant diagnostic studies (e.g. negative head CT scan)
Relevant treatment outcomes (e.g. unresponsive to acetaminophen)

Page 56

Basic Rules for Single-best Answer Questions

Recall versus Comprehension, Application, Analysis, Synthesis

At least 40% of all items should measure “higher” order cognitive skills, i.e. comprehension, application, analysis, or synthesis.

Page 57

Focus on the Answer Options

Page 58

The Answer Option Set Should:

Have five or more (at least four) equally plausible and unique options for each stem

Be homogeneous
– All are diagnoses, treatments, outcomes, etiologies, mechanisms, etc.

Be short and not contain a verb

Not contain indefinite modifiers
– Avoid vague and ambiguous terms such as “frequently,” “commonly,” “usually,” “probably,” “often,” “may,” etc.

Avoid “all of the above” and “none of the above”
– “All of the above” is the keyed answer > 50 - 70% of the time
– “None of the above” is the keyed answer < 10% of the time

Be placed in a logical sequence
– If numeric, ascending or descending order
– If non-numeric, consider alphabetical order
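Several of the option-set rules above are mechanical enough to check in code. The sketch below is an illustration only: `check_option_set` and its warning strings are hypothetical names, and it covers just the countable rules (number of options, uniqueness, “all/none of the above,” indefinite modifiers, ordering). Homogeneity, plausibility, and the no-verb rule still need a human reviewer.

```python
# Word lists taken from the slide; the function and warning text are
# illustrative assumptions, not part of the NBME guidance.
INDEFINITE_MODIFIERS = {"frequently", "commonly", "usually", "probably", "often", "may"}
BANNED_OPTIONS = {"all of the above", "none of the above"}


def check_option_set(options: list[str]) -> list[str]:
    """Return a list of warnings for an answer option set (empty if none)."""
    warnings = []
    cleaned = [opt.strip() for opt in options]
    lowered = [opt.lower() for opt in cleaned]

    if len(cleaned) < 4:
        warnings.append("fewer than four options")
    if len(set(lowered)) != len(lowered):
        warnings.append("duplicate options")

    for opt, low in zip(cleaned, lowered):
        if low in BANNED_OPTIONS:
            warnings.append(f"avoid '{opt}'")
        # Strip common punctuation so "often," still matches "often".
        words = {w.strip(".,;:()\"'") for w in low.split()}
        vague = sorted(words & INDEFINITE_MODIFIERS)
        if vague:
            warnings.append(f"indefinite modifier in '{opt}': {', '.join(vague)}")

    # Ordering: numeric sets should ascend or descend; others alphabetical.
    try:
        values = [float(opt) for opt in cleaned]
    except ValueError:
        values = None
    if values is not None:
        if values != sorted(values) and values != sorted(values, reverse=True):
            warnings.append("numeric options not in ascending or descending order")
    elif lowered != sorted(lowered):
        warnings.append("non-numeric options not in alphabetical order")

    return warnings
```

Because the slide only says to "consider" alphabetical order for non-numeric options, the last warning is best read as a suggestion rather than an error.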

Page 59

Basic Rules for Single-best Answer Questions

Answers: Non-Homogeneous Option Set

Stem:
Which of the following statements is true regarding cystic fibrosis?

Option Set:
A. It is diagnosed with a sweat chloride test (diagnosis)
B. It is a type of restrictive lung disease (etiology)
C. It is adequately treated with antibiotics (treatment)
D. It is due to a defective potassium channel (mechanism)
E. It occurs in 1 in 10,000 people (incidence)

Page 60

Numeric Data are Not Stated Consistently

Page 61

Focus on the Lead-In Question

Page 62

See Handout - for Short Summary

Page 63

Lead-In Question

NOT: Which of the following is most likely?

Best Practice: Which of the following __________ is most likely?
– “mechanisms,” “diagnoses,” “treatments,” “enzymes,” “next best steps,” etc.
– Helps avoid non-homogeneous answer options

Page 64

Looking for Technically Flawed Items (Small Group Exercise #2)

Page 65

Small Groups Working

Page 66

Large Group Discussion

Page 67

What do the test results mean?

Page 68

Where should I go from here?

Page 69

Interpreting Test Results and Item Analyses

Am I “Just” Teaching or Are the Students Learning?

Page 70

Is my test too easy (or too hard)?

What should be the average percent correct?

Page 71

Is my test too hard (or too easy)?

Answer: The NBME designs its test scores so that an examinee needs 70% correct to reach the 50th percentile.

Most test averages should be between 60 – 80%.

Page 72

Is my individual test item / question too hard (or too easy)?

Optimal “P Value” = 0.5 + 0.5 (1/a)
– Where a = number of alternatives
– “P Value” = proportion of examinees answering correctly

True / False (2 options) = 0.75
3 options / choices = 0.66
4 options / choices = 0.62
5 options / choices = 0.60

Consider revising most individual test questions where the percent correct is < 30% or > 90%.

Questions with a percent correct between 30% and 70% contribute most towards test reliability.
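The optimal “P Value” formula on this slide is simple arithmetic, and a short sketch makes the listed values easy to reproduce. The function names below are hypothetical; the formula itself is the slide's. Computed exactly, the formula gives 0.75, 0.667, 0.625 and 0.60 for 2, 3, 4 and 5 options, so the slide's 0.66 and 0.62 appear to be truncated rather than rounded.

```python
def optimal_p_value(num_options: int) -> float:
    """Optimal proportion correct for an item with `num_options` alternatives.

    Implements the slide's formula: 0.5 + 0.5 * (1 / a).
    """
    if num_options < 2:
        raise ValueError("an item needs at least two alternatives")
    return 0.5 + 0.5 * (1 / num_options)


def item_p_value(responses: list[str], key: str) -> float:
    """Observed P value: proportion of responses that match the keyed answer."""
    if not responses:
        raise ValueError("no responses")
    return sum(r == key for r in responses) / len(responses)
```

Comparing `item_p_value(...)` for a real item against `optimal_p_value(...)` for its number of options gives a quick sense of whether the item sits near the target difficulty.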

Page 74

KR20: General Index of Reliability (or a test’s precision of measurement or reproducibility)

Page 75

KR20: General Index of Reliability (or a test’s precision of measurement or reproducibility)

Reliability Score    Interpretation
0.90 and above       Excellent reliability: at the level of the best standardized tests.
0.80 – 0.89          Very good for a classroom test.
0.70 – 0.79          Good for a classroom test; probably a few items could be improved.
0.60 – 0.69          Somewhat low. Probably some items could be improved. Test should be supplemented by other measures to determine grade.
0.50 – 0.59          Suggests need for revision of test, unless it is short (fewer than 10 items). Test definitely needs to be supplemented by other measures.
Less than 0.50       Questionable reliability; needs revision. Should not contribute much to course grade.
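The slides report KR20 but do not show how it is computed. As a hedged sketch, the standard Kuder-Richardson Formula 20 is (k / (k - 1)) × (1 - Σ p·q / variance of total scores), where k is the number of items, p is the proportion correct on an item, and q = 1 - p. The code below assumes the population variance (dividing by the number of students); exam software may use the sample variance instead and give slightly different values. The function names, and the choice to treat each band's lower bound as inclusive, are assumptions.

```python
def kr20(scores: list[list[int]]) -> float:
    """KR20 for a matrix of 0/1 item scores.

    scores[s][i] is 1 if student s answered item i correctly, else 0.
    Uses the population variance of total scores (divide by n).
    """
    n_students = len(scores)
    n_items = len(scores[0]) if scores else 0
    if n_students < 2 or n_items < 2:
        raise ValueError("need at least two students and two items")

    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n_students
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
    if var_total == 0:
        raise ValueError("total scores have zero variance")

    # Sum of p * q over items, where p is the proportion correct on the item.
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in scores) / n_students
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


def interpret_kr20(value: float) -> str:
    """Map a KR20 value to the interpretation bands listed on the slide."""
    if value >= 0.90:
        return "Excellent reliability"
    if value >= 0.80:
        return "Very good for a classroom test"
    if value >= 0.70:
        return "Good for a classroom test"
    if value >= 0.60:
        return "Somewhat low"
    if value >= 0.50:
        return "Suggests need for revision"
    return "Questionable reliability"
```

For a toy matrix of four students and three items, `[[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]`, this gives a KR20 of 0.75, which falls in the "Good for a classroom test" band.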

Page 76

Point Biserial

Page 77

Point Biserial

Point Biserial    Interpretation
0.30 and above    Very good item.
0.20 – 0.29       Reasonably good item.
0.09 – 0.19       Marginal item. Needs improvement.
Below 0.09        Poor item. Reject or improve.

Offers potentially the greatest contribution to test reliability.

An item is a discriminator if high-scoring students answer it correctly and low-scoring students answer it incorrectly.

Interpretation: Should be a positive number. A negative number means more low scorers than high scorers answered correctly.
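For readers who want to see where the point biserial comes from, here is a minimal sketch using the standard formula (M1 - M0) / s × √(p·q): M1 and M0 are the mean total scores of students who got the item right and wrong, s is the population standard deviation of total scores, and p is the proportion answering correctly. The function names are hypothetical. The total scores here include the item itself; some item-analysis software reports a corrected value that excludes it, so numbers may differ slightly from a vendor report.

```python
from math import sqrt


def point_biserial(item_scores: list[int], total_scores: list[float]) -> float:
    """Point-biserial correlation between a 0/1 item and total test scores."""
    n = len(item_scores)
    if n != len(total_scores) or n < 2:
        raise ValueError("need matching lists with at least two students")

    correct = [t for x, t in zip(item_scores, total_scores) if x == 1]
    incorrect = [t for x, t in zip(item_scores, total_scores) if x == 0]
    if not correct or not incorrect:
        raise ValueError("everyone answered the item the same way")

    mean_all = sum(total_scores) / n
    sd = sqrt(sum((t - mean_all) ** 2 for t in total_scores) / n)
    if sd == 0:
        raise ValueError("total scores have zero variance")

    p = len(correct) / n
    mean_correct = sum(correct) / len(correct)
    mean_incorrect = sum(incorrect) / len(incorrect)
    return (mean_correct - mean_incorrect) / sd * sqrt(p * (1 - p))


def interpret_point_biserial(value: float) -> str:
    """Map a point biserial to the bands listed on the slide."""
    if value >= 0.30:
        return "Very good item"
    if value >= 0.20:
        return "Reasonably good item"
    if value >= 0.09:
        return "Marginal item"
    return "Poor item"
```

An item answered correctly only by the two top scorers out of four students with totals 4, 3, 2, 1 yields a strongly positive value; reversing the item scores flips the sign, which is the "more low scorers answered correctly" case described above.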

Page 78

Options and Distractors

NBME: Most exam questions have one right answer and one close distractor that is not exactly correct.

If a distractor choice is never chosen, consider revising or deleting it, especially for “too easy” questions.
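Finding distractors that were never chosen is a simple tally of responses per option. The sketch below assumes responses are recorded as option letters; the function names are hypothetical.

```python
from collections import Counter


def option_counts(responses: list[str], options: list[str]) -> dict[str, int]:
    """Count how many examinees chose each option (zero if never chosen)."""
    counts = Counter(responses)
    return {opt: counts.get(opt, 0) for opt in options}


def unused_distractors(responses: list[str], options: list[str], key: str) -> list[str]:
    """Return the distractors (non-keyed options) that no examinee selected."""
    counts = option_counts(responses, options)
    return [opt for opt in options if opt != key and counts[opt] == 0]
```

For example, with options A through E, key A, and responses A, A, C, A, B, A, the unused distractors are D and E, which would be candidates for revision or deletion.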

Page 79

Larry’s Exam Plan

At least 40% of the questions should test “higher” order cognitive functions
– Comprehension, Application, Analysis, Synthesis

At least 10 – 15% of the questions should evaluate “core” concepts and teaching effectiveness
– To measure the effectiveness of teaching, the percent correct on these questions should be > 90%. If it is not, reassess your own teaching. Ignore the point biserial for these questions.

Look carefully at individual test questions and consider revising:
– When the percent correct is < 30% or > 90% (with the exception noted above) or
– When the point biserial is a strongly negative number (more negative than -0.15)

Make the exam average somewhere between 60 – 80%.
– Too high (> 80%), test too easy. Too low (< 60%), test too hard.
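The review thresholds in this plan can be written down as a small rule set. This is a sketch under stated assumptions: the `ItemStats` class, its field names, and the `is_core` flag are hypothetical, and the point-biserial cutoff is read as "more negative than -0.15."

```python
from dataclasses import dataclass


@dataclass
class ItemStats:
    label: str
    percent_correct: float  # 0 to 100
    point_biserial: float
    is_core: bool = False   # hypothetical flag marking a "core" concept item


def review_item(item: ItemStats) -> list[str]:
    """Return the reasons (if any) to revisit an item under the plan's thresholds."""
    reasons = []
    if item.is_core:
        # Core items are expected to exceed 90% correct; point biserial is ignored.
        if item.percent_correct <= 90:
            reasons.append("core item not above 90% correct: reassess teaching")
        return reasons
    if item.percent_correct < 30:
        reasons.append("percent correct below 30%")
    elif item.percent_correct > 90:
        reasons.append("percent correct above 90%")
    if item.point_biserial < -0.15:
        reasons.append("strongly negative point biserial")
    return reasons


def review_exam_average(mean_percent: float) -> str:
    """Classify the exam average against the 60-80% target range."""
    if mean_percent > 80:
        return "test too easy"
    if mean_percent < 60:
        return "test too hard"
    return "within target range"
```

A non-core item with 95% correct is flagged as too easy, while a core item with the same statistics passes without comment, which mirrors the exception noted in the plan.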

Page 80

Examining the Exam (Large Group Interactive Session #3)

Page 81

Page 82

Teaching without evidence of learning is “entertainment,” not learner engagement.

The fundamental purpose of an educational program is NOT demonstration of faculty teaching but evidence of student learning.

Page 83

UCR School of Medicine

Writing & Interpreting the Results of High Quality Exam Multiple Choice Questions (MCQ)

A Faculty Development Workshop – Thursday August 10, 2017

END