
Staying With Initial Answers on Objective Tests: Is It a Myth?

Ludy T. Benjamin, Jr., Texas A&M University
Timothy A. Cavell, Louisiana State University
and William R. Shallenberger, III, Texas A&M University

The common advice not to change answers appears to be a mistake, but before that is certain, additional information is needed.

Since 1928, at least 33 studies have been published concerning a number of issues surrounding answer-changing behavior on objective tests. Although results in these studies have sometimes been at variance, the one consistent finding is that there is nothing inherently wrong with changing initial answers on objective tests. In fact, the evidence uniformly indicates that: (a) the majority of answer changes are from incorrect to correct and (b) most students who change their answers improve their test scores. None of the 33 studies contradicts either of those conclusions.

Most of the research in this area has been aimed at testing the accuracy of "first impressions" in test-taking. This bit of academic folk wisdom is typically stated as the belief that one should not change answers on objective tests because initial reactions to test questions are intuitively more accurate than subsequent responses.

Prevalence of the Belief and Potential Sources. That this belief is widespread among test-takers is supported by a number of studies. Surveys of students' attitudes toward the results of answer changing are fairly consistent in their outcomes, revealing that most students (between 68% and 100%) do not expect changed answers to improve their score. Indeed, approximately three out of every four of these students felt answer changes would lower their score. The percentages of students reporting an expected improvement have ranged from 0% to 32% (Mathews, 1929; Foote & Belinky, 1972; Lynch & Smith, 1975; Mueller & Shwedel, 1975; Ballance, 1977; Smith, White & Coop, 1979).

Because a majority of test-takers believe that changing answers will not improve their scores, the obvious question to ask is where they acquired such a belief. One potential source would be manuals or articles on "how to take tests." Actually, very few books on test-taking strategies have been identified either as perpetrators of the belief in "first impressions" or as proponents of answer changing. Interestingly, which side of the fence these strategists were on was often not clear. For example, Huff's (1961) position was both pro and con, depending on the article in which he was cited (Jacobs, 1972; Pascale, 1974; Davis, 1975; Lynch & Smith, 1975; Stoffer, Davis, & Brown, 1977). For the record, Huff's position is a qualified "yes" to answer changing, except in "a case that is very close to sheer guess" (p. 36). However, there have been some blatant examples cited of advice contrary to all the extant research. In defense of these strategies, though, it should be said that they rarely recommended complete abstinence from answer changing; rather, they strongly qualified those occasions when it seemed beneficial. Millman, Bishop, and Ebel (1965) have written that a tendency to make judicious changes in one's answers is a characteristic of "test-wiseness." More recently, Mehrens and Lehmann (1978) have urged teachers to disabuse students of the misconception, but Crocker and Benson (1980) advise classroom teachers "specifically not to discourage response changing" (p. 239).

Because a half-dozen studies have shown the prevalence of the belief among college students, it is possible that it is reinforced by their instructors. No investigator to date has looked at faculty opinions. That oversight prompted our survey of faculty at Texas A&M University in the Colleges of Education, Liberal Arts, and Science, using a brief questionnaire that inquired about belief in the outcome of answer changing on objective tests. Those faculty who indicated that they used objective tests were also asked about any instructions they gave to students about answer changing. Like students, most faculty (55.2%) believe that changing the initial answer will lower scores. Only 15.5% of those responding said they thought answer changing would improve a student's score. Interestingly, the majority of the faculty in that group (77.7%) were members of the College of Education (see Table 1). Apparently they are more familiar with the many studies in this area, which is not surprising because much of this research has been published in the educational literature.

Do faculty members give instructions to their students about changing answers? Of those responding, about a third (32.7%) indicated that they did, and of that group, nearly two-thirds (63.1%) warned their students not to change their answers because the likely outcome would be a greater number of wrong answers. The remainder essentially cautioned their students to be judicious in changing their answers as they reassessed their first responses. No instructor, even those who indicated their belief that answer changing would improve scores, advised students to change answers on that basis.

Table 1. Faculty Responses to Question about the Outcome of Changing Initial Answers on Objective Tests

College        N    Improve the   Hurt the     No       Don't
                    Test Score    Test Score   Change   Know
Education      23   30.4%         52.2%        13.0%     4.3%
Liberal Arts   19    5.3%         52.6%         5.3%    36.8%
Science        16    6.2%         62.5%        12.5%    18.8%
All Faculty    58   15.5%         55.2%        10.3%    19.0%
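As a rough check, and not a computation the authors report, the 77.7% figure can be recovered from the head-counts implied by Table 1; the following reconstruction is ours:

    % Head-counts implied by Table 1 (our reconstruction, rounded to whole faculty):
    \[
    0.155 \times 58 \approx 9 \text{ "improve" respondents overall,} \qquad
    0.304 \times 23 \approx 7 \text{ of them in Education,}
    \]
    \[
    \frac{7}{9} \approx 77.8\%,
    \]
    % which agrees with the reported 77.7% up to rounding.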


College faculty may contribute to the initiation and/or maintenance of the belief, but they are not the sole source, as indicated by studies noting the prevalence of the belief in high-school students and even some non-student populations. Although no investigator has asked the question, it is likely that many test-takers acquire the admonition on answer changing from their peers.

The Attitude-Behavior Question. Given the discrepancy between existing data that belie the validity of the belief regarding changing initial answers and the evidence for its persistence, one may rightfully question the influence of a belief which does not seem to affect answer-changing behavior. At least four authors have dealt with this question, although each in a different manner. Mathews (1929) encouraged his students to make changes, for doing so was "more apt to raise than lower their final scores." The effect of this encouragement upon total changes and/or benefits from changing answers is indiscernible, however, due to an imposed penalty for guessing (which probably inhibited some answer-changing behavior) and the absence of a comparison group. After having first surveyed students' attitudes toward answer-changing results, Foote and Belinky (1972) provided feedback to the same individuals in the form of the percentages of items on two multiple-choice tests changed from wrong to right (W/R), right to wrong (R/W), and wrong to wrong (W/W). Despite the fact that for the two tests combined, the percentages (55% for W/R changes, 21% for R/W changes, and 23% for W/W changes) disagreed with students' expectations (only 32% expected to gain), answer-changing behavior on two subsequent tests was not noticeably altered, in terms of both the number of answers changed and the results of those changes.

Likewise, Jacobs (1972) found no relationship between perceived outcomes from changing answers and net gains made. One study (Ballance, 1977) was designed expressly to look at the effect of students' expectations on their answer-changing behavior. Having grouped subjects according to their stated belief, Ballance concluded there was no "evidence to support the conjecture that beliefs asserted by a student have an effective relationship with the number of answer changes . . . [or] fluctuation in test score resulting from answer-changing" (p. 165).

Assuming that reported expectancies are a valid measure of actual expectancies, one might then argue that answer-changing behavior was poorly correlated with students' expectations. Implicit in this argument is the added question of the utility of debunking a "myth" which has no impact on test-taking behavior. However, in response it should be noted that on the average about 16% of all test-takers do not change any of their answers, and of those who do change answers, the average number of responses changed is around 3%. In addition, given the amount of selective perception that seems to exist among students with respect to their own answer-changing results, expectancies may still be exercising some inhibitory effects upon answer-changing behavior. Bath (1967) has pointed out that students seldom remember those items that they changed and got correct.

Would positive expectations concerning answer-changing results alter answer-changing behavior? In the one study which grouped students by their expectancies, Ballance (1977) found no differences among the groups in terms of their answer-changing behavior. However, considering the small number of subjects (N = 12), conclusions here must be rather tentative. Thus, it may be the case that changing one's belief about the consequences of one's behavior is a difficult task, but it does not necessarily follow that answer-changing behavior is independent of the beliefs surrounding the consequences of this behavior.

Returning to the notion of selective perception among students, one could view that as a much more plausible explanation for the maintenance of this widely reported belief in "first impressions" than misleading test-taking manuals or occasional misinformed teachers' instructions (Lynch & Smith, 1975; Stoffer, et al., 1977; Smith, et al., 1979). Moreover, there would have to be at least equally widespread ignorance of the data which refute this myth. In fact, this apparent ignorance has extended into the circle of those who could be expected to know better, that is, the researchers in this area. For example, Foote and Belinky (1972), in an article published after no less than ten other studies in this area, reported finding "no published research on this question" (p. 667).

Empirical Studies of Answer Changing. Given the persistence of this belief among faculty and students, it is hoped that the literature review provided here may serve to make faculty aware of the accumulated evidence on answer changing and cause them to evaluate their test-taking instructions. Such awareness may eventually alter student expectations on this question. However, the principal purpose of this review is to provide a groundwork for the future directions in research discussed at the close of this article. Although research on this topic has spanned more than 50 years, the collective knowledge gained from this research is disappointingly small. This state of affairs is due to problems of experimental design, narrowness of the questions researched, and duplication of previous studies, presumably due to ignorance of their existence. As an aid in this review, Table 2 provides a summary of test format, test content, types of subjects, and results obtained in 33 studies published between 1928 and 1983.

In this review we will analyze the research in terms of techniques for assessing answer changing, the extent of answer changing, and the consequences of those changes, as well as the various subject and item variables that have been studied.

Measurement Strategies to Determine Answer Changes. Researchers have employed varied strategies in an attempt to measure accurately both the number and results of changed answers on objective tests. The vast majority, however, have simply relied upon their own ability to detect changes as indicated by observed erasures and crossouts. Several authors reported using this basic procedure (Mallinson & Miller, 1956; Bath, 1967; Johnston, 1975; Crocker & Benson, 1980). Some studies reported using interobserver agreement to check the degree of accuracy associated with the measurement of changes (Archer & Pippert, 1962; Foote & Belinky, 1972; Stoffer, et al., 1977; Beck, 1978; Smith, et al., 1979). Typically these studies reported high interobserver reliability (e.g., 99.5%), and several chose to include only those items for which there was 100% agreement among observers. Several studies (Foote & Belinky, 1972; Davis, 1975) sought to enhance accuracy of detection of changes by back-lighting answer sheets. In one study (Archer & Pippert, 1962) this approach also included checks by another independent judge. The high interobserver agreement provides support for a number of authors who have commented on the ease and unambiguousness of detecting answer changes (Hill, 1937; Jarrett, 1948; Archer & Pippert, 1962; Foote & Belinky, 1972).
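As an illustration of what a percent-agreement check of this kind computes, consider the following minimal sketch; the two observers' item codings are invented for illustration and do not come from any of the studies cited:

    # Percent agreement between two observers who each coded every test item
    # as changed ("C") or unchanged ("U") on the basis of erasures and crossouts.
    # The codings below are invented for illustration.

    def percent_agreement(obs_a, obs_b):
        """Percentage of items on which both observers gave the same code."""
        assert len(obs_a) == len(obs_b)
        matches = sum(a == b for a, b in zip(obs_a, obs_b))
        return 100.0 * matches / len(obs_a)

    observer_1 = ["C", "U", "U", "C", "U", "U", "U", "C"]
    observer_2 = ["C", "U", "U", "C", "U", "C", "U", "C"]

    print(f"{percent_agreement(observer_1, observer_2):.1f}% agreement")
    # A study that kept only items with 100% agreement would drop the sixth
    # item here, where the observers disagree.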


Table 2. Summary of Conditions and Results in Studies on Answer Changing

Columns: First Author (date); Test (Ach = Achievement, Apt = Aptitude, T = Timed); item format (TF = true-false, MC = multiple-choice); Test Content; Type of Ss; Percent of Changes (Items, W-R, R-W, W-W); Ratio of W-R changes to R-W changes; and percent of Changers, Gainers, Losers, and Samers. "Gainers" are those with net increases; "Losers" are those with net losses; "Samers" are those answer changers whose net score does not change. Some entries were not clearly specified in the studies; some figures combine data from true-false and multiple-choice tests; for some values, special testing instructions and procedures appear responsible.

[Table rows for the 33 studies, Lehman (1928) through Skinner (1983), are not legibly recoverable from this transcript.]


In other studies some type of obtrusive measure was used to get at the "true" number and direction of answer changes. For some, students were given special instructions (e.g., "Do not erase," "Draw a line through your first judgment that I may know what it is."). As a result of these instructions, answer-changing behavior may have been inhibited. Thus Lehman (1928) reported a relatively low rate of changers (roughly six percentage points below the median reported figure of 84%), and Lamson (1935) reported the lowest rate of items changed (2.2%) among all studies. One study (Smith, et al., 1979) which required students to use a pen resulted in 6.1% of the items being changed, one of the highest percentages reported, excluding two highly deviant observations produced apparently by special answering formats. That this figure supports an argument that many erasures go undetected is plausible but certainly not evidence enough.

Special answer sheets were used in three studies. Typically these sheets included spaces for marking subsequent, as well as initial, answers. Edwards and Marshall (1977) also used tests with carbon backing (for detecting erasures) and imposed a penalty for answer changing. In their 1929 study, Lowe and Crawford required that students read over the entire test before marking any answers in the "second answer" space. Jacobs (1972) used a slide projector to present test items. Items were exposed for only 30 to 45 seconds, depending on their word count. After all slides had been viewed and attempted, mimeographed copies of the test were passed out with red pencils for making answer changes. Considering that the median reported percentage of all items which are changed is 3.3%, one would suspect each of these last two strategies to be a factor in their substantially larger percentages (10.2% and 32.7%, respectively).

In short, the use of obtrusive strategies for detecting answer changes cannot be justified in light of the evidence. Reports of high interobserver reliability in unobtrusive procedures, the possibility of altered change rates using obtrusive procedures, and the apparent ease of detecting answer changes all argue against the use of obtrusive measures.

Extent of Answer-Changing Behavior. Most people change at least one or more answers on exams. The data on the percentage of changers are somewhat clouded by the fact that some studies have reported results per exam, but other studies averaged figures across exams. This problem notwithstanding, data from 15 studies indicate that the percentage of persons changing one or more answers ranged from 57% to 96% with an estimated median of 84%. This estimate of the median of the combined studies is simply the median of the reported medians. For the combination of other research, the same procedure is used.
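The aggregation described here, an "estimated median" taken as the median of the medians reported by the individual studies, is simple to state in code; in this sketch the per-study values are placeholders rather than the published figures:

    from statistics import median

    # Hypothetical per-study median percentages of students changing one or
    # more answers (placeholder values, not the figures from the 15 studies).
    study_medians = [57.0, 78.0, 84.0, 86.2, 88.0, 93.5, 95.6, 96.1]

    # "Estimated median of the combined studies" = median of reported medians.
    print(f"Estimated median: {median(study_medians):.1f}%")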

In terms of the proportion of items which were changed, the amount was consistently very small except, as indicated earlier, in those instances where an obtrusive experimental measurement strategy was employed. Excluding those two studies which reported such highly inflated percentages, the data from the 28 studies reporting this figure ranged from a low of 2.2% to a high of approximately 9.0%. The estimated median percentage of items changed was only 3.3%. These percentages are based on a large number of response observations, ranging from a low of 1,800 to a high of 144,370, with the median number of responses being approximately 12,000. These percentages on the extent of answer-changing behavior are comparable to those reported in an earlier article by Mueller and Wasser (1977), even though the current review contains data from an additional eighteen studies. In short, the data on this question are very consistent in terms of the proportion of persons changing answers and the proportion of answers changed.

Consequences of Answer-Changing Behavior. That most individuals improved their scores and that most item response changes are from wrong to right (W/R) are two observations found consistently throughout the literature on answer-changing behavior. For purposes of discussion of these data we will define "gainers" as individuals who improve their score through changes, "losers" as those whose score is lowered after making changes, and "samers" as those whose score is neither raised nor lowered as a consequence of changing answers. Data on the proportions of these outcome categories should only be compared with respect to the item formats used on exams. True-false items, unless ultimately left blank, involve only W/R and R/W changes, whereas multiple-choice items also allow for W/W changes. Moreover, the nature of the item formats may entail response tendencies which differ in their propensity to be changed in a particular direction (W/R, R/W, W/W). Thus, not only is it the case that one who changes an initially wrong answer on a true-false test marks a correct answer, whereas such is not necessarily true for multiple-choice tests, but it is also the case that multiple-choice and true-false items may involve recall tasks of different natures with correspondingly different results. This distinction is most critical when discussing answer change results by item responses, since there is no W/W category for true-false item change results.

Two studies have reported response changes by individual subjects taking tests involving only true-false items, and one of these (Lowe & Crawford, 1929) listed only the percentage of students who made only W/R changes (49%) and only R/W changes (16%). Consequently, those individuals who made both kinds of changes (35% of all changers) but were also possible gainers, losers, or samers were not included in the data. In the other study, Lehman (1928) found 51% of all changers were gainers, 36% losers, and 13% samers. Two studies which used individuals as the unit of analysis involved tests with combined multiple-choice and true-false formats (Berrien, 1939; Mueller & Shwedel, 1975). The respective percentages from the two studies were as follows: 67% and 65% improved their score, 23% and 17% lost points, and 10% and 18% did neither.

A general discussion of the consequences of answer-changing behavior by students taking multiple-choice tests (see Table 3) is hindered by the fact that some studies, rather than report the proportion of changers who gained, etc., chose to report the proportion of total test-takers who experienced the consequences of answer changes. Using this format, Johnston (1978) and Best (1979) reported virtually identical results, with 75% of all subjects gaining, 10% losing, and 15% either staying the same or changing no answers.

Table 3. Range and Estimated Median Values for Six Studies on Answer-Changing Outcome in Multiple-Choice Tests

Measure            Students who   Students who   Students who
                   Gained         Lost           Stayed the Same
Highest            76.9%          30.3%          18.5%
Lowest             47.0%           2.4%           2.9%
Estimated Median   67.5%          15.0%          14.0%

The percentages of changers (based on six studies) who gained points on multiple-choice tests ranged from 47% to 77% with an estimated median of 67%. One can see that had Johnston (1978) and Best (1979) excluded those students who had made no changes, their figures would have been even higher, and thus some of the largest percentages reported. The percentages of losers ranged from 2.4% to 30% with an estimated median percentage of 15%. The estimated median percentage of students who neither improved nor hurt their grade, but did change answers, was 14%, with a range of samers from 3% to 18%.

Recognizing the sparse data on true-false items, there nevertheless seems to be an increase in the proportion of individuals who gain from changing item responses as the test formats change from true-false to multiple-choice. This fact is illustrated by the median ratios of numbers of gainers to losers for true-false (1.43 to 1), combination (3.9 to 1), and multiple-choice formats (5.3 to 1). Regardless of test format, all studies concluded that the majority of students improve their score by changing initial answers.

Answer changing results as reported by item responses are typically stated in terms of the percentages of total changes which were W/R, R/W, and W/W changes (see Table 4A). For true-false tests, W/R changes ranged from 59% to 71% with an estimated median of 66% from the five studies reporting. The one study (Lynch & Smith, 1975) that used both types of items in one format and which looked at these proportions found 57% of all item response changes were in the W/R category. Twenty studies used tests having only multiple-choice items. Percentages ranged from 44% to 69% with an estimated median of 58% for W/R changes. For R/W changes, the true-false item studies had percentages ranging from 29% to 41% with an estimated median of 33%; for the one combined format, 27% of the changes were R/W, whereas multiple-choice formats produced R/W change percentages ranging from 7% to 26% with an estimated median of 20%. Of course, no data exist for W/W changes on true-false tests, but the one combination format produced 17% W/W changes. For multiple-choice item W/W changes, percentages ranged from 12% to 39% with an estimated median of 23%.

Table 4. Range and Estimated Medians of Answer Changes by (A) Outcome and (B) Ratio of W/R to R/W Changes

A. By Percent of Outcome of Change
Category                           Highest   Lowest   Estimated Median
Multiple-Choice Studies (N = 20)
  Wrong to Right                   69.3      44.5     57.8
  Right to Wrong                   26.7       6.7     20.2
  Wrong to Wrong                   39.0      12.1     22.8
True-False Studies (N = 5)
  Wrong to Right                   71.3      58.9     65.6
  Right to Wrong                   41.1      28.7     33.5

B. By Ratio of W/R to R/W Changes
Multiple-Choice Studies (N = 20)   10.00 to 1   1.80 to 1   2.77 to 1
True-False Studies (N = 5)          2.49 to 1   1.43 to 1   1.98 to 1

Figures on the ratios of W/R to R/W changes offer a good summary of item response change results from the different test formats. Studies using true-false items had ratios ranging from 1.43 to 1 to 2.49 to 1 with an estimated median of 1.98 to 1. Mueller and Shwedel (1975) used a combination format but did not report the breakdown of percentages for each change category. However, they did find a ratio of W/R to R/W changes of 5.3 to 1, compared to a ratio of 2.1 to 1 in a similar study reported by Lynch and Smith (1975). Combining the data from these two studies produces an estimated median ratio of 3.7 to 1. Multiple-choice item response changes produced W/R:R/W ratios (see Table 4B) ranging from 1.80 to 1 to 10 to 1 (the next highest being 4.18 to 1) with an estimated median of 2.77 to 1. Ignoring the figures from the combination formats (based on only two studies using unstated proportions of true-false to multiple-choice items), and focusing on the median ratios of true-false and multiple-choice item response change results, allows the following conclusions: Approximately every three changes on the true-false tests examined resulted in two correct and one incorrect answer, whereas every four changes on multiple-choice tests have produced three correct and one incorrect answer. This second ratio does not include W/W answer changes because those changes do not hurt the students' performance.
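The proportion arithmetic behind the "every three changes" and "every four changes" phrasing can be written out explicitly; the following simply restates the authors' median ratios, with a W/R:R/W ratio of r to 1 implying that a fraction r/(r+1) of the scored (non-W/W) changes are gains:

    % A W/R : R/W ratio of r to 1 implies that, among the scored (non-W/W)
    % changes, a fraction r/(r+1) are gains and 1/(r+1) are losses.
    \[
    \text{true-false: } \frac{\text{W/R}}{\text{R/W}} \approx \frac{1.98}{1} \approx \frac{2}{1}
    \;\Rightarrow\; \text{of every 3 scored changes, 2 gain and 1 loses;}
    \]
    \[
    \text{multiple-choice: } \frac{\text{W/R}}{\text{R/W}} \approx \frac{2.77}{1} \approx \frac{3}{1}
    \;\Rightarrow\; \text{of every 4 scored changes, 3 gain and 1 loses.}
    \]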

Subject Variables. Many studies have attempted to differentiate between changers and non-changers and among gainers, losers, and samers based on some external criterion. Typical variables investigated have been academic ability, sex, and several personality variables.

Academic Ability. The variable most often used to indicate a student's ability has been the student's test scores: sometimes scores for the entire course and sometimes scores for the particular exam under study. Of course, there exists a confounding of any relationships proposed to exist between test scores and the numbers and types (W/R, R/W) of revisions made, due to the fact that the former is partly a product of the latter. Lehman (1928) was the first to note this relationship, and others have followed suit (e.g., Mueller & Shwedel, 1975; Best, 1979), attempting either to circumvent or minimize the importance of this problem. Lehman re-examined tests using initial answers only and found only "two or three" students whose position changed with respect to the median. Mueller and Shwedel chose to look at the relationship between test scores and W/W changes in order to assess independently which students made the greatest gains in answer changing. They reported a significant negative correlation (-.22) between total scores and W/W changes. Thus higher-scoring students made fewer W/W changes. Best emphasized the unimportance of such confounding after observing the small proportion of items changed per test compared to the number of points necessary to alter one's grade. Best's observations, while they may be valid in the context of his study, are not necessarily generalizable to all studies of this sort.

The problems of using test scores as a subject/classification variable notwithstanding, several investigators have looked at the possible relationship between a student's test score and his or her answer-changing behavior. If one were to view extensive answer changing as a sign of test-taking uncertainty, then it is not surprising to find that most authors report an inverse relationship between test scores and the numbers of revisions made by students: Six studies reported a statistically significant negative relationship (Lynch & Smith, 1975; Mueller & Shwedel, 1975; Stoffer et al., 1977; Johnston, 1978; Best, 1979; Sitton, Adams & Anderson, 1980), but five others reported nonsignificant results in the same direction (Reile & Briggs, 1952; Reiling & Taylor, 1972; Pascale, 1974; Smith & Moore, 1976; Crocker & Benson, 1980). Only in one study is there a positive relationship (Beck, 1978), and it was nonsignificant; two other studies showed no trend in either direction (Bath, 1967; Copeland, 1972). Although it is true that most subjects benefit from answer changing regardless of test score, and that most item response changes are from wrong to right, it is obvious that one cannot assume a positive linear relationship between test scores and the number of answer changes.

The statement that increased answer changing will improve one's score is tempered not only by the data on test scores, but also by the conflicting data on net gains enjoyed versus the numbers of revisions made: Six studies have shown that profit from answer changing increased significantly with the number of revisions (Reiling & Taylor, 1972; Pascale, 1974; Mueller & Shwedel, 1975; Stoffer et al., 1977; Sitton et al., 1980; Range, Anderson & Wesley, 1982), three others found negative but nonsignificant trends (Lehman, 1928; Mathews, 1929; Archer & Pippert, 1962), and one study showed no relationship in either direction (Berrien, 1939). Obviously a point must be reached where answer changes are no longer beneficial to the test-taking student. Assuming revisions are not made randomly, an individual changing an exorbitant number of answers soon leaves behind the rationale for answer-changing behavior that produces the gains typically reported.

Given that approximately 70% of all subjects gain from changing answers, can it be assumed that those who enjoy greater profits actually have better test scores? The answer is basically "yes": in eleven studies a positive relationship was found between test score and net profit (Lehman, 1928; Reile & Briggs, 1952; Mallinson & Miller, 1956; Archer & Pippert, 1962; Copeland, 1972; Reiling & Taylor, 1972; Lynch & Smith, 1975; Mueller & Shwedel, 1975; Smith & Moore, 1976; Stoffer et al., 1977; Beck, 1978), with six of those showing significant results.

Other subject variables pertaining to achievement or ability have provided few clues in determining why some people change answers and, of those who change, why some profit and others do not. Achievement variables include course grade and grade point average (G.P.A.), whereas scores on Borgatta and Corsini's (1964) Quick Word Test (QWT), the Scholastic Aptitude Test (SAT), and the Airman Qualifying Exam (AQE) have been used as ability variables. Individuals who were rated low on these variables typically were found to change more answers (Mathews, 1929; Reiling & Taylor, 1972; Pascale, 1974; Stoffer, et al., 1977); however, in only the last of those studies was there found a significant relationship between ability (as measured by the SAT) and frequency of answer changing. In none of these studies was there any relationship between any of the achievement/ability variables and the degree of profit from answer changing.

Sex. Sex has also been investigated as a subject variable in predicting answer-changing frequency and/or gains in a number of studies (Reile & Briggs, 1952; Mallinson & Miller, 1956; Bath, 1967; Copeland, 1972; Reiling & Taylor, 1972; Pascale, 1974; Mueller & Shwedel, 1975; Stoffer, et al., 1977; Beck, 1978; Sitton, et al., 1980; Skinner, 1983). Three of these studies showed a slightly higher frequency of answer changing by females, and one study (Skinner, 1983) reported the only statistically significant sex difference, finding that females changed twice as many answers as males. With regard to gains made from answer changing, Reile and Briggs (1952) reported that females profited less, but Bath (1967) found that females made larger gains than males. Both studies have problems: Reile and Briggs suggested that their findings might be confounded by some pre-existing performance differences which restricted the potential for gains by females, and an analysis of Bath's data on sex differences simply does not support his conclusion. In short, if sex differences do exist, they seem to be in the frequency of answer changing and not in the gains made from those changes.

Personality. Several writers have suggested that personality variables may contribute to the separation between changers and nonchangers and between gainers and losers. To date, only three investigators have reported research in this area. In an a posteriori analysis of subjects' scores on Rotter's I-E scale, Stoffer, et al. (1977) found no relationship between the I-E measure and the number of revisions a student made. Sitton, et al. (1980) and Range, et al. (1982) have correlated a host of personality variables with frequencies and outcomes of answer changing. They report that anxious students change more answers than non-anxious ones, and that nondepressed students are more successful in their answer changes. One variable that may be related to answer changing is impulsivity (Mueller & Shwedel, 1975); however, it has not been investigated to date.

Item Variables. Researchers have only occasionally looked at the characteristics of test items in their efforts to detect some distinguishable pattern of answer-changing behavior. Item difficulty and item position are the variables that have received the most attention.

Item difficulty has been defined in varying ways by researchers. Most often, p-values (i.e., the proportion of students who answered an item correctly) were used, either derived from the sample data or from previously studied samples. Jacobs (1972) and Beck (1978) used previously obtained p-values to establish item groups based on level of difficulty. Lynch and Smith (1975) and Jackson (1978) apparently based their correlational analysis of item difficulty on p-values also. Best (1979) formed his easy-difficult item grouping based on whether items fell above or below the median percentage correct for all test items. Berrien (1939) and Pascale (1974) chose to rate whole tests as either "difficult" or "easy," the former doing so intuitively and the latter doing so empirically based on the mean p-values of the two tests employed. As with test scores, measures of item difficulty derived from the sample data may be open to confounding if linked to measures such as net profit due to answer changing. Still, it seems unlikely that item difficulty values would be substantially affected by response changes (Best, 1979).
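To make the p-value definition and the median-split grouping concrete, here is a minimal sketch; the response matrix is invented, and the tie-handling rule (ties assigned to the difficult group) is our assumption rather than a detail reported by Best (1979):

    from statistics import median

    # Rows = students, columns = items; True = item answered correctly.
    # The response matrix is invented for illustration.
    responses = [
        [True,  True,  False, True],
        [True,  False, False, True],
        [True,  True,  True,  True],
        [False, False, False, True],
    ]

    n_students = len(responses)
    # p-value of an item = proportion of students who answered it correctly.
    p_values = [sum(row[j] for row in responses) / n_students
                for j in range(len(responses[0]))]

    # Median split into "easy" and "difficult" items, in the spirit of
    # Best's (1979) grouping; assigning ties to "difficult" is our assumption.
    cut = median(p_values)
    easy = [j for j, p in enumerate(p_values) if p > cut]
    difficult = [j for j, p in enumerate(p_values) if p <= cut]
    print(p_values, easy, difficult)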

Generally it can be said that difficult items are more likely to be changed than easier ones. Most of the studies indicate trends in that direction; however, only four (Lynch & Smith, 1975; Beck, 1978; Jackson, 1978; Vidler & Hansen, 1980) have reported results that reached statistical significance.


Of course, for poorer students, that is, those who are ill-prepared for the test, many more items will be seen as difficult (Lynch & Smith, 1975). That observation can be used to explain the fact that those students typically change more of their answers.

Data on the relationship between item difficulty and profit are conflicting, and the reasons for these conflicts are not at all apparent. Jacobs (1972) found most net gains involved changes on items which were either moderately or highly difficult. Similarly, Pascale (1974) found a significant positive correlation between a test's difficulty and the number of additional correct answers due to revisions. Beck (1978), however, reported that changes on easy items were significantly more likely to be correct than changes on hard items. Controlling for test performance, Best's (1979) data provide further evidence on the opposing side, based on a comparison of W/R:R/W ratios for difficult (2.47) versus easy items (4.28).

It seems reasonable to conclude that item position might be an important variable. Most tests have a time limit, so review of the answers may not proceed beyond the earlier test items. In addition, seeing items later in the test may provide a context in which answers for earlier items might be reviewed and changed. Thus, on an intuitive basis, one could argue that items at the beginning of a test would be changed more frequently than items at the end. Only two studies have examined the effect of item position on answer-changing behavior, and both found that fewer changes were made near the end of the test (Reile & Briggs, 1952; Jackson, 1978). Jackson also examined alternatives within an item subgroup using partial correlation. Results showed item position within a test subgroup of "some specific stimulus material" produced significant zero-order correlations with the number of answers changed. The trend was for earlier items to have more changes, an effect described by Jackson as due to novelty. No one has looked at the relationship between item position and net gains from changing.

In addition to level of difficulty and position, two other item variables have been researched. Reiling and Taylor (1972) attempted to relate items which involved "analytical reasoning" to answer-changing behavior but found no relationship to exist with either the amount of answer changing or the net gain enjoyed. Jackson (1978) did find some evidence (significant results for one of the three samples he tested) that items of low discriminating power tend to be changed more than answers to other items.

Although the research on the influence of item variables on answer-changing behavior is fairly sparse, Jackson's view that changed items are often misinterpreted items is given some credence by the results. Item difficulty, novelty, position, and discriminability would all seem to affect, to varying degrees, the propensity of an item being changed.

Conclusions and Future Directions. After more than a half-century of research on this topic, what do we know about answer changing? Based on the 33 studies reviewed in this article we know that: (a) only a small percentage of items are actually changed, (b) most of these changes are from wrong to right answers, (c) most test-takers are answer changers, and (d) most answer changers are point gainers. These findings would appear to refute the widespread belief about the consequences of answer changing, as well as to question what effect, if any, that belief may have on actual behavior.

Having established that most test-takers change answers on objective tests in spite of their purported beliefs, what are their reasons for making those changes? Amazingly, no researcher to date has attempted seriously to answer that question. Obviously there are different reasons for changing an answer, for example, marking the wrong space on the answer sheet or simply misinterpreting the item (e.g., not seeing the negative in the stem of the item). In these cases it is most unlikely that the test-taker will believe the assertion about the correctness of initial answers, and thus will readily make those changes. Another reason for changing an answer is that the test-taker is unsure of the answer; indeed, it may have been a guess. In this situation, answer changing may be pursued with greater reluctance. In fact, it is possible that in this latter case, the myth may be no myth at all.

In short, the research to date has treated all answer changes as though they were the same. That procedure does not permit a real answer to the questions; indeed, it ignores the crucial question on this topic. Future research on answer changing must begin with a way to segregate answer changes with respect to the reasons for those changes, and should focus on those answer-change situations in which the answer is altered due to some cognitive evaluation. Indeed, studies of the decisional process itself could shed more light on answer changing. In the research to date we are limited to studying the items that were actually changed, and are unable to know how many other items were considered but left unchanged. Both of these issues, that is, the reasons for change and the consideration of items that were not changed, could be studied by an inquiry procedure immediately following the test. Such a procedure would be awkward under most testing conditions, which is to say that researchers may need to construct their own testing situations specific to the problem, rather than relying exclusively on extant samples and tests that are part of college courses.

A related variable to be considered in this research is the confidence the student has in a particular answer. What confidence level has to exist before a student would be willing to change an answer? Only Skinner (1983) has looked at that question. He reported that students felt a different answer should have a high probability (74.6%) of being correct before a change was made. Further, how is confidence level, as a variable affecting answer changing, related to the student's level of belief in the caveat about answer changing? These questions are the most basic in this area, yet they have been largely ignored in the extant research.

Almost all of the studies to date have used regular course exams to evaluate answer changing. These are achievement tests which are typically administered under a time limit, although they are not speed tests in the strictest interpretation of that label. Students may have time to answer all of the questions, but it is unlikely that they have the opportunity to review their answers thoroughly, thus limiting the potential for answer changes. Although it was impossible to discern in all cases, we could not identify a single study among the 33 reviewed here which used a testing procedure that allowed test-takers as much time as they wanted to complete the test. Time constraints are always a critical variable in test taking, and in research assessing answer changing one ought to be concerned that students have the opportunity to change answers. It seems quite plausible to assume that rates and, perhaps, consequences of answer changing could be affected by timed versus power testing conditions.

Other test variables are also potentially important issues in this research. Do test-takers perform differently in terms of answer changing on aptitude versus achievement tests? To date only two studies (Beck, 1978; Vidler & Hansen, 1980) have used an aptitude test. Similarly, these studies have used tests that are largely norm-referenced; few have been domain-referenced. Answer changing may or may not differ between those types of tests. It is impossible to answer that question based on the research to date, but we believe it is a question worth pursuing. Researchers in this area need to state the purpose of the tests they are using. Although testing purpose is often implied, a more definitive statement would add to the understanding of this literature. Further, penalties for guessing should be studied as they affect the frequency of answer changing and the outcome of those changes.

Another area for future research concerns the apparent discrepancy between what students say about answer changing and what they actually do, an issue which we referred to earlier as the "attitude-behavior question." So much evidence in psychology has indicated a strong correspondence between belief and action that this discrepancy calls for investigation. Yet that question cannot be answered unless we know why students change answers, and under what conditions. That information is mandatory if we are to understand the way in which the belief operates.

Further, we would encourage additional research focusing on the interrelationships of item variables (such as difficulty and the nature of item distractors) and the cognitive strategies of test takers. Research to date has emphasized academic ability, using mostly confounded measures, while ignoring better measures of both cognitive ability and style. Among the subject variables, we believe that cognitive variables are likely to be far more meaningful than others studied thus far, such as sex or certain personality traits.

In conclusion, this review has noted a remarkable consistency of results across 33 separate investigations. That kind of consistency is rare in psychology, and the authors are duly impressed with the harmony of those data. That consistency of results has caused a number of researchers in this area to argue that the statement about the accuracy of first impressions is a myth. That conclusion may be accurate, but we do not believe that it is justified on the evidence provided to date. Until researchers are able to segregate answer-changing responses, thus separating those that are the result of "second impressions" from other kinds of changes, the conclusions of this research must be considered tentative.

The question of whether answers should be changed or not is an important one, not only for test-taking strategies, but for an understanding of the psychological processes involved in that behavior. Our hope is that future research will seek to answer systematically the questions raised in this review.

    References

Archer, N. S., & Pippert, R. Don't change the answer! An exposé of the perennial myth that the first choices are always the correct ones. Clearing House, 1962, 37, 39-41.
Ballance, C. T. Students' expectations and their answer-changing behavior. Psychological Reports, 1977, 41, 163-166.
Bath, J. A. Answer-changing behavior on objective examinations. Journal of Educational Research, 1967, 61, 105-107.
Beck, M. D. The effect of item response changes on scores on an elementary reading achievement test. Journal of Educational Research, 1978, 71, 153-156.
Berrien, F. K. Are first impressions best on objective tests? School and Society, 1939, 50, 319-320.
Best, J. B. Item difficulty and answer changing. Teaching of Psychology, 1979, 6, 228-230.
Borgatta, E. F., & Corsini, R. J. Quick word test. New York: Harcourt, Brace & World, 1964.
Clark, C. A. Should students change answers on objective tests? Chicago Schools Journal, 1962, 43, 382-385.
Copeland, D. A. Should chemistry students change answers on multiple-choice tests? Journal of Chemical Education, 1972, 49, 258.
Crocker, L., & Benson, J. Does answer-changing affect test quality? Measurement and Evaluation in Guidance, 1980, 12, 233-239.
Davis, R. E. Changing examination answers: An educational myth? Journal of Medical Education, 1975, 50, 685-687.
Edwards, K. A., & Marshall, C. First impressions on tests: Some new findings. Teaching of Psychology, 1977, 4, 193-195.
Foote, R., & Belinky, C. It pays to switch? Consequences of changing answers on multiple-choice examinations. Psychological Reports, 1972, 31, 667-673.
Hill, G. E. The effect of changed responses in true-false tests. Journal of Educational Psychology, 1937, 28, 308-310.
Huff, D. Score: The strategy of taking tests. New York: Ballantine Books, 1961.
Jackson, P. F. Answer changing on objective tests. Journal of Educational Research, 1978, 71, 313-315.
Jacobs, S. S. Answer changing on objective tests: Some implications for test validity. Educational and Psychological Measurement, 1972, 32, 1039-1044.
Jarrett, R. F. The extra-chance nature of changes in students' responses to objective test-items. Journal of General Psychology, 1948, 38, 243-250.
Johnston, J. J. Sticking with first responses on multiple-choice exams: For better or worse? Teaching of Psychology, 1975, 2, 178-179.
Johnston, J. J. Answer-changing behavior and grades. Teaching of Psychology, 1978, 5, 44-45.
Lamson, E. E. What happens when the second judgment is recorded in a true-false test? Journal of Educational Psychology, 1935, 26, 223-227.
Lehman, H. C. Does it pay to change initial decisions in a true-false test? School and Society, 1928, 28, 456-458.
Lowe, M. L., & Crawford, C. C. First impression versus second thought in true-false tests. Journal of Educational Psychology, 1929, 20, 192-195.
Lynch, D. O., & Smith, B. C. Item response changes: Effects on test scores. Measurement and Evaluation in Guidance, 1975, 7, 220-224.
Mallinson, G. G., & Miller, D. J. The effect of second guessing on achievement scores of college tests. In Yearbook of the National Council on Measurement in Education, Volume 13, 1956, pp. 24-26.
Mathews, C. O. Erroneous first impressions on objective tests. Journal of Educational Psychology, 1929, 20, 280-286.
Mehrens, W. A., & Lehmann, I. J. Measurement and evaluation in education and psychology. New York: Holt, Rinehart and Winston, 1978.
Millman, J., Bishop, C. H., & Ebel, R. An analysis of test-wiseness. Educational and Psychological Measurement, 1965, 25, 706-726.
Mueller, D. J., & Shwedel, A. Some correlates of net gain resultant from answer changing on objective achievement test items. Journal of Educational Measurement, 1975, 12, 251-254.
Mueller, D. J., & Wasser, V. Implications of changing answers on objective test items. Journal of Educational Measurement, 1977, 14, 9-13.
Pascale, P. J. Changing initial answers on multiple-choice achievement tests. Measurement and Evaluation in Guidance, 1974, 6, 236-238.
Range, L. M., Anderson, H. N., & Wesley, A. L. Personality correlates of multiple choice answer changing patterns. Psychological Reports, 1982, 51, 523-527.
Reile, P. J., & Briggs, L. J. Should students change their initial answers on objective-type tests?: More evidence regarding an old problem. Journal of Educational Psychology, 1952, 43, 110-115.
Reiling, E., & Taylor, R. A new approach to the problem of changing initial responses to multiple choice questions. Journal of Educational Measurement, 1972, 9, 67-70.
Sitton, L. R., Adams, I. G., & Anderson, H. N. Personality correlates of students' patterns of changing answers on multiple-choice tests. Psychological Reports, 1980, 47, 655-660.
Skinner, N. F. Switching answers on multiple-choice questions: Shrewdness or shibboleth? Teaching of Psychology, 1983, 10, 220-222.
Smith, A., & Moore, J. C. The effects of changing answers on scores of non-test-sophisticated examinees. Measurement and Evaluation in Guidance, 1976, 8, 252-254.
Smith, M., White, K. P., & Coop, R. H. The effect of item type on the consequences of changing answers on multiple-choice tests. Journal of Educational Measurement, 1979, 16, 203-208.
Stoffer, G. R., Davis, K. E., & Brown, J. B. The consequences of changing initial answers on objective tests: A stable effect and a stable misconception. Journal of Educational Research, 1977, 70, 272-277.
Vidler, D., & Hansen, R. Answer changing on multiple-choice tests. Journal of Experimental Education, 1980, 49, 18-20.

Note

Requests for reprints should be addressed to Ludy T. Benjamin, Jr., Department of Psychology, Texas A&M University, College Station, TX 77843.

Undergraduate Psychology Research Conferences: Goals, Policies, and Procedures

Alan L. Carsrud, University of Texas at Austin
Joseph J. Palladino, Indiana State University Evansville
Elizabeth Decker Tanke, University of Santa Clara
Lyn Aubrecht and R. John Huber, Meredith College

Conferences such as described here "can make a vital contribution to the educational process" of AB and graduate bound majors.

The purposes of this paper are to: (a) examine the assumptions underlying undergraduate research in psychology, (b) present survey data that describe the current state of undergraduate psychology research conferences, (c) explore the roles undergraduate research conferences play in science education, and (d) provide suggestions for faculty members who would like to organize undergraduate conferences in such a way as to maximize the contributions to the educational process.

Assumptions About Science Education. Psychology educators have traditionally assumed that a knowledge of research methodology and concepts is essential. "Past conferences on undergraduate education in psychology have affirmed the importance of instruction in statistics and methodology in the major program. Consideration of the role that methodology plays in a frontier science indicates that the emphasis has been appropriate" (Kulik, Brown, Vestewig, & Wright, 1973, p. 55). Cole and Van Krevelen (1977) surveyed psychology departments in small liberal arts colleges and found that "Courses in statistics, research design and methodology were by far the most commonly identified as appropriately required of all psychology majors" (p. 165). Furthermore, their respondents clearly believed that knowledge of methodology is "an important contribution . . . to a student's liberal arts education" (p. 167). In a survey of psychology baccalaureates, Davis (Note 2) found that courses in statistics and experimental design were most frequently cited as useful to the respondents in their present occupations, even though the occupations were not directly related to psychology.

An examination of the "typical" undergraduate psychology curriculum indicates that the suggested emphasis on research methods is not always realized in practice. Many psychology departments require only one or two methodology courses for completion of a major. The typical content course in psychology provides little direct experience with the majority of techniques and methods used to acquire the facts studied. Commonalities in design and analyses exist regardless of the content area; for example, almost all studies use control groups and certain statistical tests. A scientist-teacher of invertebrate physiology or inorganic chemistry would probably not teach the content of an area without giving students direct experience with the research techniques that generated those facts. The opposite is the case for many psychologists, who teach content courses with little direct contact with the methodology involved. As Campbell and Fiske (1959) have noted, the majority of variance accounted for in most studies is related to the methods used rather than the concepts being studied, yet we often do not teach this to undergraduates.

In addition, the particular techniques taught in research methods courses are frequently limited by the interests and/or training of the instructor. Thus, instructors may ignore.
