
Not All Errors Are Created Equal: Metacognition and Changing Answers on Multiple-Choice Tests

Abstract

Two experiments investigated the role of metacognition in changing answers to multiple-choice, general-knowledge questions. Both experiments revealed qualitatively different errors produced by speeded responding versus confusability amongst the alternatives; revision completely corrected the former, but had no effect on the latter.

Experiment 2 also demonstrated that a pretest, designed to make participants' actual experience with answer changing either positive or negative, affected the tendency to correct errors. However, this effect was not apparent in the proportion of correct responses; it was only discovered when the metacognitive component to answer changing was isolated with a Type 2 signal-detection measure of discrimination. Overall, the results suggest that future research on answer changing should more closely consider the metacognitive factors underlying answer changing, using Type 2 signal-detection theory to isolate these aspects of performance.

Many academics follow the dictum "publish or perish," and spend their time tweaking parameters in well-established paradigms to gather data for the next paper. Lee, instead, follows the dictum "publish only if you are convinced that you have discovered something truly new and interesting about the world." Quite rightly, many of us should be nervous applying this philosophy to our own careers; such an approach will only work for true intellects, scientists with unique philosophy and vision. But for the special few with such vision, the result is science that is long lasting, and careers that are distinguished. From his block letter "F" research (Brooks, 1967), to the beginnings of instance theory (Brooks, 1978), to his more recent work in medical diagnosis (e.g., Brooks, LeBlanc, & Norman, 2000), it is clear that Lee is one of those special few.

Those of us who were lucky enough to receive Lee's mentorship learned many lessons. One such lesson was to stay open-minded, reading more than the most immediate literature on a research topic, and trying to think beyond the current paradigm used to investigate a phenomenon. Such a strategy can sometimes lead to new research as cross talk between previously independent literatures is developed. Developing such cross talk between the recognition memory domain and the implicit learning domain motivated the research reported in Higham and Brooks (1997).

The current research focuses on two other literatures where cross talk could be fruitful: psychological testing and the newer field of metacognition. Although substantial research exists on metacognition in educational settings, this research has been predominantly focused on how various subjective ratings (e.g., judgments of confidence, comprehension, or learning) relate to objective test performance (e.g., Maki & McGuire, 2002; Pressley, Ghatala, Woloshyn, & Pirie, 1990). Very little to no research has considered the metacognitive issues involved in actual performance on the objective tests themselves.

In this paper, we will report two experiments that examine an old issue in the testing literature: the effect of changing answers on multiple-choice tests (e.g., see Benjamin, Cavell, & Shallenberger, 1984; Crawford, 1928; Crocker & Benson, 1980; Foote & Belinky, 1972; Geiger, 1996; Harvil & Davis, 1997; McMorris, DeMers, & Schwartz, 1987; Mueller & Wasser, 1977; Vispoel, 1998, 2000). Overall, this research has revealed a number of consistent findings: 1) a small proportion of answers on a given test are typically changed, 2) the changes usually increase the overall test score, indicating that more answers are changed from wrong to right than from right to wrong, and 3) most examinees change a few answers. This last finding is particularly interesting because surveys have indicated that students tend toward the belief that it is preferable to stay with first hunches, and not to make changes to their answers. Indeed, Mathews (1929) reported that 86% of his sample of students believed that changing answers lowered the overall test score. This belief may partially stem from the fact that educators often pass this information on to their students (e.g., see Foote & Belinky, 1972).

Philip A. Higham and Catherine Gerrard
University of Southampton

    Canadian Journal of Experimental Psychology, 2005, 59-1, 28-34


In our view, however, there is no straightforward solution to the problem of whether or not to change answers because the interplay of at least three variables is involved: 1) the number of initial answers that are correct, 2) the number of initial answers that are changed, and 3) students' ability to discriminate between initial answers that are correct and those that are not. Unfortunately, however, the literature on answer changing has never taken all three of these factors into account when considering students' best revision strategy, and to our knowledge, none of it directly addresses the metacognitive or discrimination processes that may be involved.

In the experiments that we report here, we introduce a methodology that directly addresses metacognition in answer changing. In each experiment, participants were presented with a series of four-alternative, multiple-choice, general-knowledge questions one at a time at a computer workstation (Phase 1). After completing the test, each question was presented again, along with the provided answer, and participants were given the opportunity to change their initial answer (Phase 2). The two experiments differed in that a paper-and-pencil pretest was used in Experiment 2, but not in Experiment 1. There were two versions of the pretest, intended to vary participants' actual experience with changing answers before starting the main test. One version contained deceptive questions, whereas the other contained non-deceptive questions. Participants were required to change a large portion (33%) of their initial pretest answers to both question sets. We hypothesized that being required to change one-third of their answers to non-deceptive questions would result in a decrease to participants' scores, because one-third was more than the optimal proportion (i.e., several correct-to-incorrect changes would occur). Conversely, being required to change one-third of their answers to deceptive questions should increase participants' scores because several first hunches may have been wrong.

What particularly marks our research as unique, however, is the application of Type 2 signal-detection theory (SDT; e.g., Clarke, Birdsall, & Tanner, 1959; Galvin, Podd, Drga, & Whitmore, 2003; Higham, 2002; Higham & Tam, in press; Vokey & Higham, 2005), which allowed for the derivation of separate quantitative estimates of the three components of answer changing listed above. Type 2 SDT differs from standard Type 1 SDT in that participants are required to discriminate between their own correct and incorrect answers, rather than between signal-plus-noise and noise-only trials. Consequently, the discrimination index in Type 2 SDT is a measure of metacognitive monitoring.

Table 1 shows how we applied Type 2 SDT in a paradigm involving answer changing. The left and right columns represent the frequency of initially correct and incorrect answers, respectively, whereas the top and bottom rows represent the number of unchanged and changed answers, respectively. The combination of these variables yields four cell frequencies: A = unchanged, initially-correct items; B = unchanged, initially-incorrect items; C = changed, initially-correct items; and D = changed, initially-incorrect items. Because changing an initially-incorrect answer can result in either a correct or incorrect second response on any test involving more than two options, e is also included in Table 1. This value, which appears in Table 1 in lowercase to reflect the fact that it is a subset of D, represents the number of initially-incorrect items that are correct after a change. These five values (i.e., A to e) allow several measures of performance to be determined. The first two measures are standard in the literature: 1) proportion correct before change = (A + C) / (A + B + C + D), and 2) proportion correct after change = (A + e) / (A + B + C + D). However, it is also possible to derive Type 2 SDT hit (A / [A + C]) and false alarm (B / [B + D]) rates, which represent the proportion of initially-correct responses that were unchanged, and the proportion of initially-incorrect responses that were unchanged, respectively. Using these rates, measures of discrimination (A'; Grier, 1971) and change bias (B''D; Donaldson, 1992) were calculated. These latter two measures are unique in this context, and reflect participants' ability to monitor the accuracy of their initial answers, and their tendency to change answers, respectively.

TABLE 1
The 2 x 2 Contingency Table and Formulae Used to Derive the Various Measures Discussed in the Text

                          Initial (Phase 1) Answer
    Response              Correct          Incorrect
    Unchanged             A                B
    Changed               C                D (e)

Note. A = correct Phase 1 answers left unchanged in Phase 2; B = incorrect Phase 1 answers left unchanged in Phase 2; C = correct Phase 1 answers changed in Phase 2 (to incorrect); D = incorrect Phase 1 answers changed in Phase 2 (to either correct or incorrect answers); e = incorrect Phase 1 answers that were changed in Phase 2 to correct answers; Phase 1 proportion correct = (A + C) / (A + B + C + D); Phase 2 proportion correct = (A + e) / (A + B + C + D); hit rate (h) = A / (A + C); false alarm rate (fa) = B / (B + D); monitoring = A' = .5 + [(h - fa)(1 + h - fa)] / [4h(1 - fa)]; change bias = B''D = [(1 - h)(1 - fa) - h x fa] / [(1 - h)(1 - fa) + h x fa].
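
As an illustration of how the Table 1 measures can be computed, the following Python sketch implements the formulae given above. The function name, return structure, and example counts are ours and purely illustrative; they are not part of the original materials.

    def answer_change_measures(A, B, C, D, e):
        """Type 2 SDT measures for answer changing (Table 1).

        A = correct Phase 1 answers left unchanged
        B = incorrect Phase 1 answers left unchanged
        C = correct Phase 1 answers changed (to incorrect)
        D = incorrect Phase 1 answers changed (to correct or incorrect)
        e = subset of D that became correct after the change
        """
        n = A + B + C + D
        p1 = (A + C) / n                    # Phase 1 proportion correct
        p2 = (A + e) / n                    # Phase 2 proportion correct
        h = A / (A + C)                     # Type 2 hit rate
        fa = B / (B + D)                    # Type 2 false alarm rate
        # Monitoring (Grier's A') and change bias (Donaldson's B''D)
        a_prime = 0.5 + ((h - fa) * (1 + h - fa)) / (4 * h * (1 - fa))
        b_double_d = ((1 - h) * (1 - fa) - h * fa) / ((1 - h) * (1 - fa) + h * fa)
        return {"phase1": p1, "phase2": p2, "hit": h, "fa": fa,
                "monitoring": a_prime, "change_bias": b_double_d}

    # Hypothetical counts for a 100-question test: 65 correct answers kept,
    # 20 incorrect kept, 5 correct changed away, 10 incorrect changed (6 now correct).
    print(answer_change_measures(A=65, B=20, C=5, D=10, e=6))

Note that the A' formula, as given in Table 1, assumes the hit rate is at least as large as the false alarm rate.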


Method

Participants

There were 24 participants in Experiment 1, and 48 in Experiment 2. In Experiment 2, 25 participants were assigned to the negative pretest group, and 23 to the positive pretest group.

    Design and Materials

The main test in both Experiment 1 and Experiment 2 consisted of 110 four-alternative, multiple-choice, general-knowledge questions adapted from the Nelson and Narens (1980) norms. Ten of these questions were for practise and the remaining 100 were critical questions used in the analyses. For both experiments, two within-subject independent variables were manipulated, each with two levels: item difficulty and answer deadline. Item difficulty was manipulated by varying the alternatives provided with each question. For easy questions, the correct answer was included with three incorrect alternatives not easily confused with the correct answer (e.g., Of which country is Buenos Aires the capital? [a] Canada [b] Mexico [c] Argentina [d] Cuba), whereas for difficult questions, the confusability amongst the alternatives was greater (e.g., [a] Colombia [b] Brazil [c] Argentina [d] Chile). Assignment of correct answers to the "a," "b," "c," and "d" alternatives was balanced across questions. Answer deadline was manipulated by providing participants with either 5 s or 10 s to provide an answer to each question. Assignment of questions to each of these four experimental conditions was counterbalanced across participants using four test formats, with 25 critical questions assigned to each of the four cells of the design (see the sketch at the end of this section). A different random order of presentation of the 100 critical questions was used for each participant, but the same random order was used (within subjects) for the first attempt at the questions (Phase 1) and the second, revision phase (Phase 2). The 10 practise questions were presented at the beginning of Phase 1, but not Phase 2, with 5 questions each at the 5 s and 10 s deadlines.

In addition to the two within-subject variables, pretest type (positive vs. negative) was manipulated between subjects in Experiment 2. This pretest consisted of 30 four-alternative, general-knowledge, multiple-choice questions administered with paper and pencil. For the positive pretest group, the questions were designed to be deceptive, such that the most obvious answer was not actually correct. For example, for the question, "What is the capital of Australia? [a] Sydney [b] Canberra [c] Adelaide [d] Melbourne," the correct answer is Canberra, although many people would choose Sydney. For the negative pretest group, the questions were of medium-level difficulty and non-deceptive.
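
The counterbalancing scheme is only described in outline above, so the following Python sketch should be read as one plausible implementation rather than the authors' actual procedure; the function names, the rotation rule, and the use of a per-participant seed are our assumptions.

    import random

    CELLS = [("easy", 5), ("easy", 10), ("difficult", 5), ("difficult", 10)]

    def assign_conditions(question_ids, test_format):
        """Assign 100 critical questions to the 2 x 2 (difficulty x deadline) cells,
        25 per cell, rotating the assignment across the four test formats so that
        each question serves in every cell across participants (hypothetical rule)."""
        assert len(question_ids) == 100 and test_format in range(4)
        assignment = {}
        for i, q in enumerate(question_ids):
            difficulty, deadline = CELLS[(i // 25 + test_format) % 4]
            assignment[q] = (difficulty, deadline)
        return assignment

    def presentation_order(question_ids, participant_seed):
        """One random order per participant, reused for Phase 1 and Phase 2."""
        order = list(question_ids)
        random.Random(participant_seed).shuffle(order)
        return order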

    Procedure

Participants in Experiment 1 started with the main computer-administered test. They were informed that they would be presented with general-knowledge questions, each with four alternatives, one at a time, on the computer. They were to choose one of the four alternative answers by pressing one of the keys labelled "a," "b," "c," or "d" (which actually corresponded to the "a," "s," "d," and "f" keys, respectively, on the computer keyboard). They were told that they were to provide answers within a time limit, and that a clock which counted down in seconds from either five or ten would appear on the screen along with the question. Failure to respond in the time allotted, or pressing a key not assigned to one of the four response alternatives, resulted in feedback that an error had been made, and data from these trials were not analyzed. Participants were also informed of the following point system, which was implemented to encourage question answering: +1 = correct; 0 = incorrect; -4 = no answer. No feedback on points was provided during the test.

Participants began with the 10 practise questions, and then answered the 100 critical questions. Participants were not informed that the initial trials were practise, and nothing demarcated the practise and critical testing phases. After completing all 110 questions, participants were given further instructions, and then began Phase 2. During Phase 2, participants were shown the same 100 critical questions again, one at a time, with their chosen answer indicated. They were required to type in a response to each question again, changing their original answer if desired. Unlike Phase 1, no deadline was imposed, and there was no mention of a point system. Participants were also required to rate their confidence (1 to 6) after answering each question in both Phase 1 and 2. However, these data were not analyzed.

Participants in Experiment 2 started with the paper-and-pencil pretest, and then proceeded to the main test, which was exactly the same as in Experiment 1. After completing the pretest answer sheet in blue ink, participants were required to engage in a distracter task (word puzzles) while the experimenter marked their test. The answer sheets were then returned to participants without feedback, and they were instructed to choose 10 of their 30 answers (33%) and change their answers to them, using a red pen, so that changes could be easily identified. After the changes were complete, the experimenter marked the pretest again, while participants continued with the distracter task. Participants were then told their total correct score before and after changing answers.


Results

An alpha level of .05 was adopted for all comparisons. The proportion of correct responses on the main test in Experiments 1 and 2, as a function of question difficulty, deadline, and test phase, is shown in Figure 1. The data from Experiment 1 were submitted to a 2 (Difficult/Easy) x 2 (10 s/5 s) x 2 (Phase 1/Phase 2) within-subjects analysis of variance (ANOVA), which revealed significant main effects of question difficulty, F(1,23) = 64.19, MSE = .017 (difficult: .68; easy: .83), test phase, F(1,23) = 43.13, MSE = .009 (Phase 1: .71; Phase 2: .80), and a deadline by phase interaction, F(1,23) = 10.18, MSE = .005. The interaction was caused by the fact that the opportunity to revise answers had a larger effect on items initially answered at the 5-s deadline than at the 10-s deadline. The data from Experiment 2 were analyzed with a similar ANOVA, only with the between-subject factor pretest type (Positive/Negative) added. This ANOVA, too, revealed main effects of question difficulty, F(1,46) = 152.94, MSE = .015 (difficult: .65; easy: .81), test phase, F(1,46) = 45.95, MSE = .010 (Phase 1: .70; Phase 2: .77), and a deadline by phase interaction, F(1,46) = 8.53, MSE = .006. The interaction reflected the same data pattern as in Experiment 1. The main effect of deadline was also significant, F(1,46) = 4.29, MSE = .028 (10 s: .75; 5 s: .71). Importantly, the analysis revealed no effects of pretest, which is why the data for Experiment 2 are presented in Figure 1 collapsed across this factor.

A 2 (Positive/Negative) x 2 (Before Change/After Change) mixed ANOVA on the number of correct responses on the pretest revealed a significant interaction, F(1,46) = 7.47, MSE = 3.09. As predicted, in the positive pretest group, answer changing increased the mean number of correct answers on the pretest from 16.30 (54%) to 16.82 (56%), whereas it decreased it in the negative pretest group from 18.16 (61%) to 16.72 (56%).

Before calculating the Type 2 SDT measures of monitoring (A') and change bias (B''D) on the main test, the hit and false alarm rates were adjusted to avoid undefined values (i.e., hit rate = [A + 0.5] / [A + C + 1]; false alarm rate = [B + 0.5] / [B + D + 1]; Snodgrass & Corwin, 1988). These data are shown in Table 2.

TABLE 2
Mean Monitoring (A') and Change Bias (B''D) for the Main Test in Experiments 1 and 2 as a Function of Response Deadline (5 s/10 s) and Question Difficulty (Difficult/Easy)

                               Experimental Condition
    SDT Measure        5 s - Easy    5 s - Diff    10 s - Easy    10 s - Diff

    Experiment 1
    A'                    .89           .82            .85            .78
    B''D                 -.58          -.73           -.83           -.81

    Experiment 2
    Positive Pretest
    A'                    .89           .79            .86            .76
    B''D                 -.60          -.68           -.69           -.66
    Negative Pretest
    A'                    .82           .81            .83            .77
    B''D                 -.75          -.69           -.76           -.80

Note. A' = monitoring; B''D = change bias; diff = difficult; SDT = signal-detection theory.

Figure 1. Proportion of correct responses on the main test in Experiments 1 and 2 as a function of experimental phase (Phase 1/Phase 2), question difficulty (difficult/easy), and response deadline (5 s/10 s).


A 2 (Difficult/Easy) x 2 (10 s/5 s) within-subjects ANOVA on monitoring in Experiment 1 revealed a main effect of deadline, F(1,23) = 7.17, MSE = .004 (10 s: .82; 5 s: .85), and a main effect of question difficulty, F(1,23) = 21.92, MSE = .005 (easy: .87; difficult: .80). Additional analyses determined the effect, if any, of pretest type on monitoring in Experiment 2. Although the analysis of proportion-correct data revealed no effect of pretest, a 2 (Difficult/Easy) x 2 (10 s/5 s) x 2 (Positive/Negative) mixed ANOVA on monitoring revealed a main effect of question difficulty, F(1,46) = 24.41, MSE = .009 (easy: .85; difficult: .78), and a pretest by difficulty interaction, F(1,46) = 5.09, MSE = .010. As can be seen in Table 2, a negative pretest lowered monitoring relative to a positive one, but only on easy questions. Analogous ANOVAs on change bias revealed a main effect of deadline in Experiment 1, F(1,23) = 13.18, MSE = .010 (10 s: -.82; 5 s: -.65), indicating that participants changed answers to more questions initially answered with the 5-s deadline than the 10-s deadline. No effects on change bias were significant in Experiment 2, although the largest F value, F(1,46) = 2.04, MSE = .055, p > .15, was also associated with the main effect of deadline.
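
For completeness, a minimal sketch of the rate adjustment described above, assuming it simply replaces the raw hit and false alarm rates before A' and B''D are computed (as the Snodgrass and Corwin, 1988, citation suggests); the function name and example counts are ours.

    def corrected_rates(A, B, C, D):
        """Adjusted Type 2 hit and false alarm rates, used to avoid values of
        0 or 1 (and hence undefined A' or B''D) when a cell frequency is zero."""
        h = (A + 0.5) / (A + C + 1)
        fa = (B + 0.5) / (B + D + 1)
        return h, fa

    # Example: no initially-correct answer was changed (C = 0), so the raw hit
    # rate would be exactly 1; the adjusted rate stays just below 1.
    print(corrected_rates(A=70, B=25, C=0, D=5))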

Discussion

The current experiments were conducted to determine the role of metacognition in changing answers on multiple-choice tests. Consistent with previous research, providing participants with the opportunity to revise their answers increased their overall test score. However, the manipulations of question difficulty and answer deadline showed that not all errors were correctable; revision in Phase 2 allowed participants to completely eliminate any disadvantage caused by the short versus long answer deadline in Phase 1. Conversely, questions with confusable alternatives remained harder than questions with less confusable alternatives even after participants were given the opportunity to change their answers (Figure 1). This finding may help to explain the paradox that participants simultaneously hold the opinion that correcting answers is unbeneficial, but generally do change a few answers when given the opportunity to do so, and generally improve their score in the process. Errors like those produced by an answering deadline (e.g., typographical errors, question misinterpretation or misreading) are likely to be monitored well, are likely to be corrected, and their correction should function to improve the test score. However, they are not likely to receive much attention. In contrast, students are likely to sweat over questions with confusable alternatives and check them for accuracy when receiving feedback on their test performance. But because there is little improvement to changing answers to these questions, the possible result is that students form the erroneous belief that answer changing never produces benefits.

Examining only the proportion of correct answers in Experiment 2 did not reveal any effect of pretest. Indeed, had performance only been analyzed in terms of whether or not participants produced correct responses, which is typical in the testing literature, we would have concluded that pretest had no effect. However, proportion correct can be a confounded measure of performance. It may be influenced not just by how much participants know, but also by their monitoring ability and change bias. By separating out the monitoring component, using a Type 2 index of discrimination, the effect of pretest was revealed; after receiving negative pretest feedback, participants did not monitor the errors that they made to easy questions as well as participants given positive pretest feedback (Table 2). The precise explanation for this result will have to wait for follow-up research, but it is worth speculating that the experience of negative feedback caused participants to recruit negative prior instances of answer changing, or instances of being told by instructors to stay with first hunches. Recruitment of such negative instances may have led participants in the negative pretest group to leave more incorrect answers unchanged, compared to the positive feedback group. Regardless of the exact reason for the monitoring decrease, however, it is clear that traditional methods are not suitable for investigating metacognition in answer changing, and that something like Type 2 SDT will be necessary in the long run.

Philip A. Higham and Catherine Gerrard, School of Psychology, University of Southampton. Portions of this research were presented at the 13th annual meeting of the Canadian Society for Brain, Behaviour and Cognitive Science, June, 2003, at McMaster University, Hamilton, Ontario, Canada. Thanks to John Vokey, Scott Allen, and Sam Hannah for comments on an earlier draft of this paper. PH was Lee Brooks's graduate student from 1987 to 1992. He was awarded his PhD in 1993.

Correspondence concerning this article should be addressed to Philip A. Higham, School of Psychology, University of Southampton, Highfield, Southampton, UK SO17 1BJ (E-mail: [email protected]).

References

Benjamin, L. T., Cavell, T. A., & Shallenberger, W. R. (1984). Staying with initial answers on objective tests: Is it a myth? Teaching of Psychology, 11, 133-141.

Brooks, L. R. (1967). Spatial and verbal components of the act of recall. Canadian Journal of Psychology, 22, 349-366.

Brooks, L. R. (1978). Nonanalytic concept formation and memory for instances. In E. Rosch & B. Lloyd (Eds.), Cognition and categorization (pp. 169-211). Hillsdale, NJ: Erlbaum.

Brooks, L. R., LeBlanc, V. R., & Norman, G. R. (2000). On the difficulty of noticing obvious features in patient appearance. Psychological Science, 11, 112-117.

Clarke, F. R., Birdsall, T. G., & Tanner, W. P. (1959). Two types of ROC curves and definitions of parameters. Journal of the Acoustical Society of America, 31, 629-630.

Crawford, C. (1928). The technique of study. Boston, MA: Houghton Mifflin.

Crocker, L., & Benson, J. (1980). Does answer changing affect test quality? Measurement and Evaluation in Guidance, 12, 223-239.

Donaldson, W. (1992). Measuring recognition memory. Journal of Experimental Psychology: General, 121, 275-277.

Foote, R., & Belinky, C. (1972). It pays to switch? Consequences of changing answers on multiple choice examinations. Psychological Reports, 31, 667-673.

Galvin, S. J., Podd, J. V., Drga, V., & Whitmore, J. (2003). Type 2 tasks in the theory of signal detectability. Psychonomic Bulletin & Review, 10, 843-876.

Geiger, M. A. (1996). On the benefits of changing multiple-choice answers: Student perception and performance. Education, 117, 108-119.

Grier, J. B. (1971). Nonparametric indexes for sensitivity and bias: Computing formulas. Psychological Bulletin, 75, 424-429.

Harvil, L. M., & Davis, G. (1997). Medical students' reasons for changing answers on multiple-choice tests. Academic Medicine, 72, S97-S99.

Higham, P. A. (2002). Strong cues are not necessarily weak: Thomson and Tulving (1970) and the encoding specificity principle revisited. Memory & Cognition, 30, 67-80.

Higham, P. A., & Brooks, L. R. (1997). Learning the experimenter's design: Tacit sensitivity to the structure of memory lists. Quarterly Journal of Experimental Psychology, 50A, 199-215.

Higham, P. A., & Tam, H. (in press). Generation failure: Estimating metacognition in cued recall. Journal of Memory and Language.

Maki, R. H., & McGuire, M. J. (2002). Metacognition for text: Findings and implications for education. In T. J. Perfect & B. L. Schwartz (Eds.), Applied metacognition (pp. 39-68). Cambridge, UK: Cambridge University Press.

Mathews, C. O. (1929). Erroneous first impressions on objective tests. Journal of Educational Psychology, 20, 280-286.

McMorris, R. F., DeMers, L. P., & Schwartz, S. P. (1987). Attitudes, behaviours, and reasons for changing responses following answer-changing instruction. Journal of Educational Measurement, 24, 131-143.

Mueller, D. J., & Wasser, V. (1977). Implications of changing answers on objective test items. Journal of Educational Measurement, 14, 9-13.

Nelson, T. O., & Narens, L. (1980). Norms of 300 general-knowledge questions: Accuracy of recall, latency of recall, and feeling-of-knowing ratings. Journal of Verbal Learning and Verbal Behavior, 19, 338-368.

Pressley, M., Ghatala, E. S., Woloshyn, V., & Pirie, J. (1990). Sometimes adults miss the main ideas and do not realize it: Confidence in responses to short-answer and multiple-choice comprehension questions. Reading Research Quarterly, 25, 233-249.

Snodgrass, J. G., & Corwin, J. (1988). Pragmatics of measuring recognition memory: Applications to dementia and amnesia. Journal of Experimental Psychology: General, 117, 34-50.

Vispoel, W. (1998). Reviewing and changing answers on computer-adaptive and self-adaptive vocabulary tests. Journal of Educational Measurement, 35, 329-346.

Vispoel, W. (2000). Reviewing and changing answers on computerized fixed-item vocabulary tests. Educational and Psychological Measurement, 60, 371-384.

Vokey, J. R., & Higham, P. A. (2005). Components of recall: The semantic specificity effect and the monitoring of cued recall. Manuscript submitted for publication.

Sommaire

Two experiments investigated the role of metacognition in changing answers to multiple-choice, general-knowledge questions. In the first experiment, participants initially answered 100 four-alternative questions. Two independent variables were manipulated within subjects: question difficulty and response deadline. Question difficulty was set by including incorrect alternatives that either were or were not easily confused with the correct answer. For the response deadline, participants had either five or ten seconds to answer each question. After answering all of the questions in the first phase, participants had the opportunity, in a second phase, to change their answer to each question, with unlimited time to do so. To this end, each question and its alternatives were presented again, together with the participant's initial answer. Participants either left their first answer unchanged (same answer) or chose a new one (changed answer). The results showed that errors caused by the response deadline were entirely correctable: the advantage of a ten-second over a five-second deadline in the first phase was completely eliminated after answers were revised in the second. In contrast, errors caused by question difficulty were not correctable, so the advantage of easy over difficult questions in the first phase remained intact after revision in the second. These results indicate that time pressure and confusability amongst the alternatives produced qualitatively different errors. In the second experiment, participants completed a pretest before proceeding to the same test (with the same opportunity for revision) used in the first experiment. The purpose of the pretest was to give participants an actual positive or negative experience with answer changing. To this end, the pretest contained either deceptive or non-deceptive questions, and participants were required to change a large proportion of their answers. By design, deceptive questions elicited incorrect initial intuitive answers, so that changing many answers tended to raise the score. For non-deceptive questions, changing many answers tended to lower the score, because many correct answers were replaced with incorrect ones. Analysis of the proportion of correct answers on the main test suggested that the pretest experience had no effect: the proportion correct on the main test was virtually identical to that obtained in the first experiment, which included no pretest. However, in addition to proportion correct, a Type 2 signal-detection measure of discrimination was used to isolate the metacognitive monitoring component of answer changing. This component was a measure of participants' tendency to change incorrect answers while leaving correct answers untouched. These analyses revealed that, in contrast to the positive pretest, the negative pretest lowered monitoring on easy questions, perhaps because participants tended to leave unchanged the incorrect answers that had come to them spontaneously by intuition. Overall, the results indicate that future research on answer changing should take greater account of the metacognitive factors underlying answer changing, using Type 2 signal-detection theory to isolate this aspect of performance.