David DiBattista, Ph.D.
Brock University
Department of Psychology
Creating Effective Multiple-choice Questions
July, 2012
©D. DiBattista 2012
Overview
• Some essential terminology
• The why and the how of testing
• Two challenges in MC testing
• Addressing the challenges
You are reading an article in which the world’s major cities are ranked with respect to the quality of life for their residents. This is an example of what type of measurement scale?
A. ordinal scale
B. nominal scale
C. ratio scale
D. interval scale
A well-constructed four-option multiple-choice question
The scenario and the question it poses form the STEM.
The four lettered choices are the OPTIONS.
The one correct (or best) option is the KEYED OPTION.
The incorrect options are called DISTRACTORS.
The Why and How of Testing
A primary goal of testing
To measure the extent to which test-takers have learned the facts, concepts, procedures, and skills that have been taught in the course.
An effective test
Test-takers who have learned more will obtain higher test scores, and those who have learned less will obtain lower scores. To be effective, a test must consist of effective items.
What is an effective test item?
For an individual MC test item to be effective, test-takers with higher test scores must be more likely to answer it correctly than those with lower scores.
Percent answering correctly
                 Top 25% of     Bottom 25% of
                 test-takers    test-takers     Top-Bottom
A good item      90             50              +40
A poor item      71             69              +2
An awful item    5              25              -20
This chart is on Page 2 of the handout.
This poor item is simply not pulling enough weight.
Note that the good item and the poor item are equally difficult (i.e., 70% chose the keyed option in each item).
(Top-Bottom) = Discrimination Index
The Discrimination Index is often expressed as a proportion: +40 becomes +0.40.
We always want the Discrimination Index to be positive, and the bigger the better.
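The Top-Bottom calculation can be sketched in a few lines of Python. This is only an illustration, not a procedure taken from the slides or the handout: the function name `discrimination_index`, its parameters, and the choice to round the group size down are all assumptions.

```python
def discrimination_index(scores, correct, fraction=0.25):
    """Top-minus-bottom Discrimination Index for one item, as a proportion.

    scores   -- total test score of each test-taker
    correct  -- True/False per test-taker: was the keyed option chosen?
    fraction -- share of test-takers in each group (0.25 = top/bottom 25%)
    """
    if len(scores) != len(correct):
        raise ValueError("scores and correct must have the same length")
    # Group size; rounding down is an assumption made for this sketch.
    n = max(1, int(len(scores) * fraction))
    # Rank test-takers from highest to lowest total score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top, bottom = order[:n], order[-n:]
    p_top = sum(correct[i] for i in top) / n
    p_bottom = sum(correct[i] for i in bottom) / n
    return p_top - p_bottom
```

For an item answered correctly by 90% of the top group and 50% of the bottom group, this returns a value of about +0.40, matching the good item in the chart.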
Interpreting the Discrimination Index
Value of DI        Interpretation
≥+0.50             Outstanding
+0.40 to +0.49     Very good
+0.30 to +0.39     Good
+0.20 to +0.29     Acceptable (but could be better!)
0 to +0.19         Unsatisfactory (despite being ≥0)
<0                 Harmful!
This chart is on Page 3 of the handout.
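As a rough sketch, the interpretation chart can be turned into a lookup function. The name `interpret_di` is hypothetical, and assigning values that fall between the printed bands (for example +0.495) to the lower band is an assumption, since the chart lists values to two decimals only.

```python
def interpret_di(di):
    """Return the chart's label for a Discrimination Index given as a proportion."""
    # Thresholds follow the chart; in-between values drop to the lower band (assumption).
    if di >= 0.50:
        return "Outstanding"
    if di >= 0.40:
        return "Very good"
    if di >= 0.30:
        return "Good"
    if di >= 0.20:
        return "Acceptable"
    if di >= 0:
        return "Unsatisfactory"
    return "Harmful"
```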
A key point
The Discrimination Index tends to suffer when items are either very easy or very hard.
• Very easy: 85% or more answer correctly
• Very hard: 35% or less answer correctly
• “Just right”: 40 to 80% answer correctly
This information is on Page 3 of the handout.
Final exam #1: Discrimination Index vs. Difficulty
[Scatter plot. x-axis: Percent answering MC item correctly, 0 to 100 (harder to easier). y-axis: Discrimination Index, -0.20 to 0.80.]
100 MC items
Class mean = 62.6%
Mean Discrimination Index = +0.42
10% of items are weak discriminators.
This chart is on Page 4 of the handout.
Difficulty Index = 0.71, Discrimination Index = 0.30
Final exam #2: Discrimination Index vs. Difficulty
[Scatter plot. x-axis: Percent answering MC item correctly, 0 to 100 (harder to easier). y-axis: Discrimination Index, -0.2 to 0.8.]
211 MC items
Class mean = 66.0%
Mean Discrimination Index = +0.23
51% of items are poor discriminators, and 6.6% are negative.
Key points
In general, the Discrimination Index of MC items will be greatest when:
• they are in the mid-range of difficulty,
• they conform to widely-accepted item-writing guidelines,
• their content is consistent with the course learning objectives, and
• the instruction provided has allowed motivated students to learn the material.
This information is on Page 3 of the handout.
A good MCQ is difficult to write. Many will contain item writing flaws and most will do no more than test factual recall. Our study has shown that this does not necessarily have to be the case, but it cannot be assumed that (just) anyone can write a quality MCQ unaided and without peer review.
Palmer & Devitt, 2007
Some good news: Writing high-quality MC items is a learnable skill!
Two Challenges in MC Testing
Challenge #1
“Many will contain item-writing flaws”
• Flawed items less effectively discriminate among students who differ in achievement.
Challenge #2
“Most will do no more than test factual recall”
• An emphasis on memory-based items over higher-level items may threaten the content validity of the test.
Addressing the challenges
• Constructing high-quality items
• Assessing higher-level thinking
Tips for MC Item Construction
• When writing the stem, use question format rather than sentence-completion format.
The complete list of tips is on Page 5 of the handout.
You are reading an article in which the world’s major cities are ranked with respect to the quality of life for their residents. This is an example of what type of measurement scale?
A. ordinal scale
B. nominal scale
C. ratio scale
D. interval scale
Here the stem is in question format.
You are reading an article in which the world’s major cities are ranked with respect to the quality of life for their residents. This is an example of a(n)
A. ordinal measurement scale.
B. nominal measurement scale.
C. ratio measurement scale.
D. interval measurement scale.
Here the stem ends with an incomplete sentence.
Question format works better.
Shuttling, ESL
Tips for MC Item Construction
• The stem should present the issue under consideration CLEARLY and contain as much information as possible.
• Do not include irrelevant information in the stem unless it plays a role in the assessment procedure.
• Avoid using long, complex sentences.
Poor
South America
A. imports coffee from Australia.
B. is where the Gobi Desert is located.
C. was heavily colonized by people from Spain.
D. has a larger population than the United States of America.
The stem is not informative at all, and there really is no question here.
Having lengthy options increases the amount of reading that students must do.
Even worse
South America, which has an area of more than 17 million square kilometres,
A. imports coffee from Australia.
B. is where the Gobi Desert is located.
C. was heavily colonized by people from Spain.
D. has a larger population than the United States of America.
Avoid window dressing.
Poor
Which of the following statements about South America is true?
A. South America imports coffee from Australia.
B. The Gobi Desert is located in South America.
C. South America was heavily colonized by people from Spain.
D. South America has a larger population than the United States of America.
This is really a multiple true-false question.
The stem contains little information and does not pose a question related to the topic.
Better
People from which of these countries colonized a large part of South America?
A. Spain
B. France
C. Holland
D. England
This is a clear, straightforward question, and it is focused on a single topic.
Even poorly constructed items can sometimes provide the inspiration for a useful item!
Which classical theorist’s insights were tested by Zurcher in the real-life social laboratory provided by the Kansas tornado about change and social solidarity?
A. Martineau
B. Marx
C. Durkheim
D. Weber
Which classical theorist’s insights about change and social solidarity did Zurcher study in the context of the Kansas tornado?
A. Martineau
B. Marx
C. Durkheim
D. Weber
The importance of good writing!
If we suppose that John’s score on a recent Canadian History test is 80, and the distribution of test scores, which has a mean of 70 and a standard deviation of 10, contains 100 scores and is positively skewed, then what is John’s standard score?
A. -1.0
B. +1.0
C. -10.0
D. +10.0
This question has 45 words, all in one sentence!
“Simplify, simplify.”
Henry David Thoreau (1817-1862)
So let’s simplify this 45-word question…
A set of 100 test scores is positively skewed, with a mean of 70 and a standard deviation of 10. John’s test score is 80. What is his standard score?
A. -1.0
B. +1.0
C. -10.0
D. +10.0
Easy reading: The stem now has 30 words in three sentences, including a straightforward question.
Aim to make sentences shorter and simpler, rather than longer and more complex.
Note that some information in the stem is not needed to answer the question, but that’s okay here.
For math-based problems:
• Focus on the principles.
• Keep the numbers simple.
• Put options in reader-friendly order.
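The arithmetic behind this item is short enough to check directly. The sketch below uses the usual definition of a standard score, (score minus mean) divided by standard deviation; the function name `standard_score` is hypothetical and is not taken from the slides.

```python
def standard_score(score, mean, sd):
    """Standard (z) score: distance from the mean in standard-deviation units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (score - mean) / sd


# John's score of 80, with a mean of 70 and a standard deviation of 10.
johns_z = standard_score(80, 70, 10)
```

Here `johns_z` is 1.0, which corresponds to option B. The number of scores (100) and the skew of the distribution play no part in the calculation.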
Watch for ambiguity!
After a bad day at work, George comes home and yells at his young son, who starts to cry. What type of behaviour is he demonstrating?
A. projection
B. displacement
C. sublimation
D. reaction formation
(Here “he” could refer to George or to his son.)
After a bad day at work, George comes home and yells at his young son, who starts to cry. What type of behaviour is George demonstrating?
A. projection
B. displacement
C. sublimation
D. reaction formation
Watch for idioms and uncommon words.
Mary loves being in the limelight. On which Big Five factor would you expect her to have a very high score?
A. conscientiousness
B. extroversion
C. agreeableness
D. neuroticism
Mary enjoys talking and spending time with others, and many of her friends consider her a natural leader. On which Big Five factor would you expect Mary to have a very high score?
A. conscientiousness
B. extroversion
C. agreeableness
D. neuroticism
Of course, discipline-related technical terms are perfectly appropriate.
Tips for MC Item Construction
• Whenever possible, avoid negative wording in the stem, and be sure to emphasize it when it does occur.
Poor
Which of the following terms is not usually associated with Sigmund Freud?
A. superego
B. extinction
C. repression
D. latent content
Better
Which of the following terms is NOT usually associated with Sigmund Freud?
A. superego
B. extinction
C. repression
D. latent content
Negation adds an extra cognitive burden, so use it only when really necessary.
Also better
Which of the following terms is usually associated with behaviourism?
A. synesthesia
B. extinction
C. repression
D. closure
The keyed option is still the same, but the question is now positively framed.
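A simple check for unemphasized negation could look like the sketch below. It is an illustration only: the name `unemphasized_negatives` and the word list are assumptions, and it treats capitalization (as in “NOT”) as the only form of emphasis, which a real test bank might handle differently (bold or underlining, for example).

```python
import re

# Negative words to look for; this short list is an assumption for the sketch.
NEGATIVE_WORDS = ("not",)

# Case-sensitive on purpose: lowercase "not" matches, emphasized "NOT" does not.
_PATTERN = re.compile(r"\b(?:" + "|".join(NEGATIVE_WORDS) + r")\b")


def unemphasized_negatives(stem):
    """Return the lowercase negative words found in a stem."""
    return _PATTERN.findall(stem)
```

A stem that returns a non-empty list is a candidate either for emphasis or, better, for positive rewording.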
Tips for MC Item Construction
• Check carefully for spelling and grammatical errors, giving special attention to distractors.
What do stamp collectors use stamp hinges for?
A. to pick up stamps
B. to fold learge stamps in half
C. to mount stamps in albums
D. to joining stamps together
Errors like these are more likely to crop up in the distractors than in the keyed option.
Such errors can give clues to testwise students!
Corrected
What do stamp collectors use stamp hinges for?
A. to pick up stamps
B. to fold large stamps in half
C. to mount stamps in albums
D. to join stamps together
Tips for MC Item Construction
• All distractors should be plausible.
• Four options will usually be quite adequate, but the number used is best determined by the number of PLAUSIBLE distractors you can supply.
Which river flows through the city of Edmonton?
A. North Saskatchewan River
B. Peace River
C. Milk River
D. Athabasca River
These four rivers are all in the same “domain.”
E. Mississippi River
F. Seine River
Which river flows through the city of Edmonton?
A. North Saskatchewan River
B. Nile River
C. Amazon River
D. Rhine River
Distractor plausibility is a key to success!
Tips for MC Item Construction
To generate plausible distractors
• Use students’ most common errors on constructed-response tests.
• Use distractors that are similar to the correct answer in content, length, and complexity.
• Use words that sound important or have associations to the stem.
• Use distractors that are true, but do not correctly answer the question.
Name the river that flows through the city of Edmonton.
Athabasca River
What do stamp collectors use stamp hinges for?
To join stamps together
And listen carefully to questions students ask, and watch for their misconceptions.
To generate plausible distractors
• Use distractors that are similar to the correct answer in content, length, and complexity.
In severe cases of obesity, there may be a substantial increase in the number of adipocytes. Which of the following terms is used to refer to this increase?
A. hyperbole
B. hyperplasia
C. hypertrophy
D. hypertonicity
Knowing that the answer is “hyper-something” is not enough to get this item correct.
Who developed the behavioural therapy known as systematic desensitization?
A. Barack Obama
B. Muhammad Ali
C. Martin Luther
D. Joseph Wolpe
These four people have little in common; that is, they are not in the same domain.
Who developed the behavioural therapy known as systematic desensitization?
A. Anna Freud
B. Jean Piaget
C. Wilhelm Wundt
D. Joseph Wolpe
More challenging: All four of these people are well known within the domain of psychology.
Who developed the behavioural therapy known as systematic desensitization?
A. Ivan Pavlov
B. Albert Ellis
C. B. F. Skinner
D. Joseph Wolpe
Even more challenging: All four of these people have a connection to the domain of behavioural psychology.
To generate plausible distractors
• Use words that sound important or have associations to the stem.
In responding to a lengthy survey, a man answers “yes” to every yes-no question asked. It is reasonable to suspect that his responses may be influenced by which of the following?
A. response acquiescence
B. opportunistic characterization
C. the partial reinforcement effect
D. the conspicuous agreement predisposition
To generate plausible distractors
• Use distractors that are true, but do not correctly answer the question.
Which of the following events caused the Prime Minister of Canada to proclaim the War Measures Act?
A. Quebec was invaded by Germany in 1940.
B. The October Crisis occurred in 1970.
C. The first Quebec Referendum was held in 1980.
D. The Meech Lake Accord was defeated in 1990.
Option A can be ruled out simply because it is a FALSE statement.
Because Options C and D are TRUE, they must be considered as possible answers to the question posed in the stem.
Tips for MC Item Construction
• Avoid patterns in the length and location of correct answers that could provide clues that are unrelated to content.
• Balance the answer key so that the correct response appears in each position about the same number of times.
What characteristic of hallucinations would make their occurrence sufficient for a diagnosis of schizophrenia?
A. a satanic or religious theme
B. bizarre content
C. derailment and neologisms
D. voices providing a running commentary on the person’s behaviour, or two or more voices conversing with one another
The keyed response is too often the longest, and testwise students know this!
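One way to screen a set of items for this length clue is sketched below. The function name `key_is_longest` and the dictionary layout (option label mapped to option text) are assumptions made for illustration, and character count is used as a crude stand-in for length.

```python
def key_is_longest(options, key):
    """Return True if the keyed option is strictly longer than every distractor.

    options -- dict mapping option labels ("A", "B", ...) to option text
    key     -- label of the keyed option
    """
    key_length = len(options[key])
    return all(
        key_length > len(text)
        for label, text in options.items()
        if label != key
    )
```

Running this over a whole test and counting how often it returns True gives a quick sense of whether the longest option is keyed more often than chance would suggest.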
In a four-option multiple-choice test, about how often should the correct answer appear in each of the four locations?
A. 10% of the time
B. 25% of the time
C. 40% of the time
D. 60% of the time
Balance the answer key!
“Edge avoidance”
When the four options appear, make your best guess as quickly as you can!
Who invented the binaural recording system commonly known as “stereo”?
A. Xxxxxxxxxxxxxx
B. Xxxxxxxxxxxxxx
C. Xxxxxxxxxxxxxx
D. Xxxxxxxxxxxxxx
Edge avoidance can be a major problem for the creators of MC tests!
In one test I came across, 74% of the keyed options were either B or C. Think about those testwise students!
Who invented the binaural recording system commonly known as “stereo”?
A. Alan Dower Blumlein
B. Alan Dower Blumlein
C. Alan Dower Blumlein
D. Alan Dower Blumlein
Thanks, Alan!
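Checking an answer key for balance and for edge avoidance is easy to automate. The sketch below is one possible approach, not a method described in the slides; the function names and the representation of the key as a sequence of letters are assumptions.

```python
from collections import Counter


def key_position_shares(answer_key, labels="ABCD"):
    """Share of items keyed to each option position."""
    if not answer_key:
        raise ValueError("answer_key must not be empty")
    counts = Counter(answer_key)
    return {label: counts[label] / len(answer_key) for label in labels}


def middle_share(answer_key, labels="ABCD"):
    """Share of items keyed to a non-edge position (B or C with four options)."""
    shares = key_position_shares(answer_key, labels)
    return sum(shares[label] for label in labels[1:-1])
```

With four options, a balanced key gives each position a share near 0.25 and a middle share near 0.50; a middle share of 0.74, as in the test mentioned above, points to edge avoidance.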
Tips for MC Item Construction
• For numerical options, let the correct answer appear in each of the positions about the same number of times.
How many chromosomes are found in an ovum of a healthy adult woman?
A. 18
B. 23
C. 37
D. 46
Options are in reader-friendly order.
Item-writers tend NOT to let the key be either the smallest or largest value in the option list.
Knowing this, testwise students discount the smallest and largest values.
Tips for MC Item Construction
• Avoid having the options include a single pair of opposites, one of which is the keyed option.
A psychologist administers an aptitude test to 200 people, and then one month later she has the same people take the test again. The correlation between the two sets of scores is +0.91. What should she conclude about the test?
A. 91% of the items are effective.
B. It has poor test-retest reliability.
C. It has good test-retest reliability.
D. It has poor criterion-related validity.
A problem: When the options include a single pair of opposites, one member of the pair is the keyed option 75-80% of the time.
A psychologist administers an aptitude test to 200 people, and then one month later she has the same people take the test again. The correlation between the two sets of scores is +0.91. What should she conclude about the test?
A. It has poor test-retest reliability.
B. It has good test-retest reliability.
C. It has poor criterion-related validity.
D. It has good criterion-related validity.
Using two pairs of opposites generally solves the problem.
Tips for MC Item Construction
• Do not use “none of the above.”
“None of the above” as the key
Which of these 19th century authors wrote Middlemarch?
A. Jane Austen
B. Anne Bronte
C. Wilkie Collins
D. none of the above
I’m sure Dickens wrote Middlemarch, so I’ll go with “none of the above.”
Dickens didn’t write Middlemarch. George Eliot wrote it!
But here is the problem:
When “none of the above” (NOTA) is the keyed option, misinformed students often earn full marks.
NOTA is often used as “the distractor of last resort.”
So let’s fix this NOTA item…
Which of these 19th century authors wrote Middlemarch?
A. Jane Austen
B. Anne Bronte
C. Wilkie Collins
D. George Eliot
Tips for MC Item Construction
• Do not use “all of the above.”
Which of these terms is associated with Sigmund Freud?
A. superego
B. repression
C. latent content
D. all of the above
I never heard of latent content, but superego and repression are both definitely Freudian terms, so it must be “all of the above.”
When “all of the above” (AOTA) is the keyed option, students with partial knowledge can still earn full marks.
Moreover, AOTA usually serves as the keyed option, and testwise students know this!
Better
Which of these terms is associated with Sigmund Freud?
A. latent content
B. fixed-interval schedule
C. cognitive dissonance
D. bulimia nervosa
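Items that use “none of the above” or “all of the above” can be found with a simple text check before a test goes out. The sketch below is illustrative only; the function name `find_flagged_options`, the phrase list, and the dictionary layout (option label mapped to option text) are assumptions.

```python
# Phrases the two tips above advise against; matching is case-insensitive.
FLAGGED_PHRASES = ("none of the above", "all of the above")


def find_flagged_options(options):
    """Return the labels of options whose text is NOTA or AOTA."""
    return [
        label
        for label, text in options.items()
        if text.strip().lower().rstrip(".") in FLAGGED_PHRASES
    ]
```

An empty list means the item passes this particular check; it says nothing about the other item-writing guidelines.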
Addressing the challenges
• Constructing high-quality items
• Assessing higher-level thinking
Two Challenges in MC Testing
Challenge #2
“Most (MCQs) do no more than test factual recall”
• An emphasis on memory-based items over higher-level items may threaten the content validity of the test.
The Original Bloom’s Taxonomy (highest level first)
Evaluation
Synthesis
Analysis
Application
Comprehension
Knowledge
The Revised Bloom’s Taxonomy (Anderson and Krathwohl, 2001)
Knowledge Dimension: Factual; Conceptual; Procedural; Metacognitive
Cognitive Process Dimension: Remember; Understand; Apply; Analyze; Evaluate; Create
See Pages 6-7 of the handout!
These are all ACTION verbs—i.e., things students can DO with their knowledge.
Good tests allow us to determine what our students are capable of.
MC questions are not useful for assessing creativity…
…but they can be used effectively to assess all of the other cognitive processes.
Can you remember this?
Can you understand this?
Can you apply this?
Can you analyze this?
Can you evaluate this?
Can you create this?
Some thoughts about REMEMBER…
ALL assessment tasks involve using memory to at least some degree.
BUT
“If assessment tasks are to tap higher-order cognitive processes, they must require that students cannot answer them correctly by relying on memory ALONE.”
—Anderson and Krathwohl, 2001, page 71
A simple, unfortunate fact: Creating MC items that rely on memory alone is far easier than creating higher-level items.
Let’s take a closer look at how the cognitive processes in the Revised Bloom’s Taxonomy relate to multiple-choice questions.
COGNITIVE PROCESS DIMENSION
REMEMBER: Retrieve relevant knowledge from long-term memory
• Observable behaviours: Recognize; Recall
UNDERSTAND: Determine the meaning of instructional messages, including oral, written, and graphic communications
• Observable behaviours: Interpret; Exemplify; Classify; Summarize; Infer; Compare; Explain
See Pages 8-9 of handout for further details.
Because MC is a selected-response technique, remember-level items always involve recognition rather than recall.
What city is the capital of the state of California?
A. Sacramento
B. Los Angeles
C. San Francisco
D. Fresno
Remember-level items are very easy to create, which is probably why there are so many of them on classroom tests!
If the options were not included in this item, it would involve recall rather than recognition.
What city is the capital of the state of California?
“If assessment tasks are to tap higher-order cognitive processes, they must require that students cannot answer them correctly by relying on memory ALONE.”
—Anderson and Krathwohl, 2001, page 71
• Interpret. In the graph shown below, which group has the most variability in its scores?
[Figure: chart of Score (Mean +/- SD), y-axis from 0 to 30, for Treatment groups 1 to 4]
A. Group 1
B. Group 2
C. Group 3
D. Group 4
Note the important role of NOVELTY.
If the exact same chart is shown in the textbook, then this will actually be a remember-level item!
• Classify. You are reading an article in which the world’s major cities are ranked with respect to the quality of life for their residents. This is an example of what type of measurement scale?
• Exemplify. Which of the following is an example of negative feedback?
• Summarize. Which of the following statements best summarizes Carol Gilligan’s response to Lawrence Kohlberg’s theory of moral development?
• Infer. Which of the words listed below best completes the following analogy? Retina is to Cranial Nerve II as hair cells are to ______.
• Compare. In what way are a neuron and a battery similar to each other?
• Explain. Why is the z-test for independent samples so rarely used?
APPLY: Carry out or use a procedure in a given situation
• Observable behaviours: Execute; Implement
ANALYZE: Break material into its constituent parts and detect how the parts relate to one another and to an overall structure or purpose
• Observable behaviours: Differentiate; Organize; Attribute
Execute
Working with an ordinal data scale, Jeff obtained the following five scores: 0, 0, 2, 5, 18. What is the value of the median for this set of scores?
A. 0
B. 2
C. 3
D. 5
Execution involves being told what procedure to apply and then carrying it out.
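To see what "carrying out a named procedure" amounts to for the Execute item, here is a minimal Python sketch. It is an illustration rather than part of the original slides, and the variable names are hypothetical.

```python
from statistics import median

# Jeff's five scores, taken from the Execute item
scores = [0, 0, 2, 5, 18]

# The stem names the procedure (the median), so the student only has to
# carry it out: sort the scores and take the middle value.
result = median(scores)
print(result)
```

The stem has already made the decision about which procedure to use; the only work left is the computation.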
Implement
Working with an ordinal data scale, Jeff obtained the following five scores: 0, 0, 2, 5, 18. What is the value of the most appropriate measure of central tendency for this set of scores?
A. 0
B. 2
C. 3
D. 5
Implementation involves deciding what procedure to apply and then carrying it out.
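The extra decision step in the Implement item can be sketched as follows. The mapping from measurement scale to measure (mode for nominal, median for ordinal, mean for interval or ratio) is the common textbook convention and is assumed here; the slides do not spell it out, and the function name is hypothetical.

```python
from statistics import mean, median, mode

def central_tendency(scores, scale):
    """Choose a measure of central tendency from the scale, then compute it.

    Assumed convention (not stated in the slides):
    nominal -> mode, ordinal -> median, interval/ratio -> mean.
    """
    if scale == "nominal":
        return "mode", mode(scores)
    if scale == "ordinal":
        return "median", median(scores)
    if scale in ("interval", "ratio"):
        return "mean", mean(scores)
    raise ValueError(f"unknown measurement scale: {scale}")

scores = [0, 0, 2, 5, 18]  # Jeff's scores from the Implement item
measure, value = central_tendency(scores, "ordinal")
print(measure, value)
```

With the same five scores, the mode is 0 and the mean is 5, which match options A and D, so a student who picks the wrong procedure still finds a matching option.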
Differentiate
Keri’s history test grade was 70. A total of 200 students took the test, and the lowest score was 30. The class mean was 60, and the variance was 100. Which of these values must you use to obtain Keri’s standard score?
A. 30, 70, 100
B. 30, 70, 200
C. 60, 70, 100
D. 60, 70, 200
Differentiation involves distinguishing the parts of a whole with respect to their relevance or importance.
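As a worked illustration of which values in the Differentiate item are relevant, the sketch below computes a standard score with the usual formula z = (X - mean) / SD, where SD is the square root of the variance. The function name is hypothetical, and the code is an illustration rather than part of the original slides.

```python
from math import sqrt

def standard_score(x, mean, variance):
    """Standard (z) score: distance from the mean in standard-deviation units."""
    return (x - mean) / sqrt(variance)

# Only three of the five numbers in the stem are needed. The class size (200)
# and the lowest score (30) play no part in the calculation.
z = standard_score(x=70, mean=60, variance=100)
print(z)
```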
Organize
Suppose you are reviewing the research literature on a particular topic. Which of the following patterns would be most likely to describe the methodological progress of the research over time?
A. case studies first, then experimental studies, then correlational studies
B. case studies first, then correlational studies, then experimental studies
C. experimental studies first, then case studies, then correlational studies
D. experimental studies first, then correlational studies, then case studies
Organization involves identifying the elements of a situation and recognizing how they fit together into a coherent structure.
Attribute
Which of the following would a Rogerian therapist be MOST likely to say when working with a client?
A. You seem to be feeling a bit down today.
B. Your dream about going to the zoo: what do you think it might signify?
C. You should talk to your sister and find out if she agrees with you.
D. There are some things I want you to work on before we meet again next week.
Attribution involves determining the point of view, bias, values, or intent associated with a written work or an action.
EVALUATE: Make judgments based on criteria and standards
• Observable behaviours: Check; Critique
CREATE: Put elements together to form a novel, coherent whole or make an original product
• Observable behaviours: Generate; Plan; Produce
Check
Alyssa has carried out a one-way ANOVA for independent groups and rejected the null hypothesis. Which of the following would indicate to you that Alyssa has made an error in her work?
A. She says that df-total is 197.
B. She says that F-critical is 0.47.
C. She says that the F-statistic is 6.84.
D. She says that eta-squared is 0.22.
Checking involves looking for internal contradictions, determining whether a conclusion is appropriate, and assessing whether evidence supports or disconfirms a hypothesis.
Check
Which of these research findings would suggest that differences in Trait X are influenced by genetic factors?
A. Sisters reared apart have more similar scores on X than sisters reared together.
B. Sisters reared together have more similar scores on X than sisters reared apart.
C. Identical twins reared together have more similar scores on X than fraternal twins reared together.
D. Fraternal twins reared together have more similar scores on X than identical twins reared together.
Critique
Bill wants to compare the effectiveness of two training methods for teaching people to juggle. He obtains a group of non-jugglers and randomly assigns each person to one of the two training methods. He sets alpha at 0.05, two-tailed, and he determines that beta is equal to 0.60. Which of the following is a valid criticism of this research study?
A. The power of the statistical test is too low.
B. The probability of a Type I error is too high.
C. He should use a one-tailed test.
D. People should select their own training method.
Critiquing involves assessing the positive and negative aspects of a product, idea or action and making a judgment based on external criteria.
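The arithmetic behind the Critique item can be sketched in a few lines. Power is defined as 1 - beta; the 0.80 benchmark used below is a widely cited convention and is an assumption here, not something stated in the slides. All names are hypothetical.

```python
def power_from_beta(beta):
    """Statistical power: the probability of rejecting a false null hypothesis."""
    return 1.0 - beta

alpha = 0.05   # Type I error rate set by Bill (a conventional value)
beta = 0.60    # Type II error rate given in the item

power = power_from_beta(beta)

# Assumed external criterion: a conventional minimum power of 0.80
CONVENTIONAL_MIN_POWER = 0.80
power_is_adequate = power >= CONVENTIONAL_MIN_POWER

print(round(power, 2), power_is_adequate)
```

Judging the study against a benchmark of this kind is what makes the item a Critique rather than a simple calculation: the student has to bring an external criterion to the numbers in the stem.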
Because multiple choice is a selected-response technique, it is NOT useful for assessing the ability to create.
Other testing techniques are needed to do this.
Overview
Some essential terminology
The why and the how of testing
Two challenges in MC testing
Addressing the challenges
• Constructing high-quality items
• Assessing higher-level thinking