The benefits of knowing what you know (and what you don’t): How calibration affects credibility
Post on 21-Oct-2016
were justied in believing what they believed. 2008 Elsevier Inc. All rights reserved.
nt seethan p995), hwol &nowle
1997).All these factors are surely relevant, but there may be a more
straightforward explanation. People can rarely independently ver-ify what others tell them, and without that verication they cannotfully assess their sources reliability. In the absence of any evidencefor incompetence or duplicity, it seems functional to assume that
it despite available cues that the condenceaccuracy relation isweak (Keren & Teigen, 2001; Price & Stone, 2004; Thomas & McFa-dyen, 1995).
The importance of calibration
Notwithstanding the many benets of condence, the notionthat more condence is always better does not hold up under scru-tiny. Adjectives like cocky, bull-headed, know-it-all,and
* Corresponding author.
Journal of Experimental Social Psychology 44 (2008) 13681375
Contents lists availab
.eE-mail address: email@example.com (E.R. Tenney).2004). Why does having condence about ones own knowledgegarner so much credibility?
There are some subtle psychological explanations for why peo-ple rely more heavily on those who exude condence. People aremotivated to reduce uncertainty (e.g., Loewenstein, 1994), and of-ten outcomes predicted with extreme condence do more to re-duce uncertainty than outcomes predicted with moderatecondence (Keren & Teigen, 2001). In addition, people might prefercondent informants for social reasons: it is harder to justify toothers a preference for ambiguity (Curley, Yates, & Abrams,1986), and sometimes it eases interpersonal and group interactionsto agree with someone who is sure of herself (Zarnoth & Sniezek,
Of course, people do not rely solely on condence to judge cred-ibility. When possible, they also assess accuracy, which is why law-yers try hard during cross-examination to reveal errors incourtroom testimony and why meteorologists who are oftenwrong get red. Some previous research has suggested that the ef-fects of condence and accuracy are additive: there is a main effectof condence (more condence/more credibility; Sporer, Penrod,Read, & Cutler, 1995) and a main effect of errors (more errors/lesscredibility; e.g., Berman & Cutler, 1996) but no interaction (e.g.,Brewer & Burke, 2002). Other research has suggested that peopleinitially equate condence with accuracy, but then fail to modifyEyewitness testimonyMetacognition
People who are extremely condein life. They are believed more oftencondence (e.g., Penrod & Cutler, 1groups or when giving advice (Van S& Sniezek, 1997), and appear more k0022-1031/$ - see front matter 2008 Elsevier Inc. Adoi:10.1016/j.jesp.2008.04.006m to have an advantageeople who express lowave more inuence inSniezek, 2005; Zarnothdgeable (Price & Stone,
sources have some insight into the quality of their own knowledge,so that a high-condence assertion indicates better, more reliableinformation than a low-condence assertion (cf., Bradeld &Wells,2000; Price & Stone, 2004; Thomas & McFadyen, 1995; Yates, Price,Lee, & Ramirez, 1996; Zarnoth & Sniezek, 1997).
The importance of condence and of accuracyCredibilityCondence
racy alone explains judgments of credibility; rather, whether a person is seen as credible ultimatelydepends on whether the person demonstrates good calibration. Credibility depends on whether sourcesReports
The benets of knowing what you knowHow calibration affects credibility
Elizabeth R. Tenney a,*, Barbara A. Spellman a, RobertaDepartment of Psychology, University of Virginia, 102 Gilmer Hall, P.O. Box 400400, ChbGoldman School of Public Policy and UC Berkeley School of Law, University of Californi
a r t i c l e i n f o
Article history:Received 29 December 2007Revised 31 March 2008Available online 3 May 2008
a b s t r a c t
People tend to believe, andmore than a mere condabsence of other informatirect. However, when peoplrelationship) they overridepants choose between two
Journal of Experimen
journal homepage: wwwll rights reserved.nd what you dont):
ttesville, VA 22904-4400, USAerkeley, USA
e advice from, informants who are highly condent. However, people usece heuristic. We believe that condence is inuential becausein thepeople assume it is a valid cue to an informants likelihood of being cor-et evidence about an informants calibration (i.e., her condenceaccuracyiance on condence or accuracy alone. Two experiments in which partici-posing witnesses to a car accident show that neither condence nor accu-
le at ScienceDirect
al Social Psychology
lsevier .com/locate / jesp
_indeed, overcondentprovide clues that folk psychology has amore nuanced, contingent view of condence. In this paper, wepropose a hypothesis about how condence and accuracy inuencecredibility that extends previous research and captures thatnuance (called for convenience the presumption of calibrationhypothesis):
1. People initially presume, in the absence of relevant evidencethat informants are well calibrated (i.e., that informants are
travelers receive directions from strangers, and when journalistsinterview sources.
E.R. Tenney et al. / Journal of Experime tal Sgood judges of their own knowledge). Thus, high-condentinformants should be most persuasive (and it looks as if a sim-ple condence heuristic is at work).
2. People will override that initial presumption when evidencethat enables the assessment of the informants calibrationbecomes available. Then high-condence statements fromwell-calibrated informants should be most persuasive, regard-less of the informants overall condence.
In other words, the presumption of calibrationwhich is as-sumed, not inferred from any evidencegets dropped when actualevidence regarding calibration becomes available. People rarelyhave the opportunity to accurately assess the overall calibrationof othersthat usually takes many examples of condenceaccu-racy pairings (Lichtenstein, Fischhoff, & Phillips, 1982). Instead,we claim that people will generalize from any useful evidence thatcalls into question the informants calibration.1 People do not trustothers simply because they act condent (cf. Price & Stone, 2004;Thomas & McFadyen, 1995); rather, they trust that condent othershave a basis for their condence, unless proven otherwise.
Support for the presumption of calibration hypothesis
Our previous research supports the idea that people will useavailable calibration information to override an initial presumptionthat a more condent witness would be more accurate (Tenney,MacCoun, Spellman, & Hastie, 2007). Participants read depositionsof two opposing witnesses to a car accident. The condent wit-ness claimed to remember everything clearly, whereas theuncondent witness claimed to remember some details aboutthe day better than others. The witnesses were both condentabout their central claims of who was at fault in the accident. Par-ticipants initially were more likely to believe the condent witnessand rated that witness as more credible. Later, however, theylearned that each witness had made an error. As a consequence,the condent witness showed poor calibration (condent regard-less of right or wrong, therefore overcondent), and the uncon-dent witness showed good calibration (condent when right,uncondent when wrong). With this additional information, therewas a switch: more participants believed the central claims of theuncondent witness and participants rated the uncondent one asmore credible than the condent one. One interpretation of theseresults is that participants used the calibration information andgave more credence to the condent statements from the witnesswho was a good judge of her own knowledge rather than the wit-
1 We use calibration as the general term to refer to the condenceaccuracyrelation. Technically, calibration, refers to an absolute measure of that relation. Forexample, for a perfectly calibrated person to be 80% condent that something hadhappened means that for the set of items stated with 80% condence, 80% of thosewould have happened. Calibration allows for the assessment of over- and undercon-dence. Another way to assess the condenceaccuracy relation is by measures ofrelative accuracy (or resolution), like the Goodman-Kruskal gamma correlation(gamma; see Nelson, 1984), which ranges from 1 to 1. Used here, the gammacorrelation provides a measure of peoples ability to detect which items are more
likely to be accurately remembered. We believe that people might use either theabsolute or relative indicia to assess an informants calibration, depending on whainformation is available.,
ntParticipants read competing depositions of condent and cau-tious witnesses. The witnesses make claims of varying condenceabout specic events that are later proven to be true or false.Experiment 1 shows a three-way interaction: the relationship be-tween accuracy and credibility is moderated by condence and cal-ibration. Experiment 2 shows that not all errors hurt a witnessscredibility; rather what matters is whether the witness is justiedin holding the mistaken belief. The presumption of calibrationhypothesis predicts that calibration can be more important tojudgments of credibility than either condence or accuracy alone.(See Table 1 for the designs of experiments.)
As described above, Tenney et al. (2007) showed that whereasbefore an error was revealed participants preferred a high-con-dence witness to a low-condence witness, after an error they pre-ferred a well-calibrated low-condence witness to a poorlycalibrated high-condence one. Thus, the error eliminates the ben-et of high condence. Although the explanation that people as-sessed and used calibration is plausible, there is an alternative:perhaps in the absence of errors people prefer high-condenceinformants but in the presence of errors people prefer informantswho are more modest, or cautious, in their claims overall. When er-rors are made, participants might be willing to rely on any witnesswho was not so clearly overcondent. Experiment 1 was designedto unconfound calibration from cautiousness explanations.
All participants read conicting testimony from one high-con-dence (condent) witness and one lower-condence (in the pres-ent studies called cautious) witness. The condent witnessmakes two high-condence assertions and is shown to be correctabout one and wrong about the other (and, thus, is overcondent).The cautious witness makes one high-condence and one low-con-dence assertion and is correct about one and wrong about theother. The conditions vary in how condence and accuracy for spe-cic pieces of testimony are related for the cautious witness. In theness who was condent indiscriminately. However, an alternativeexplanation, discussed below, is that in the presence of errors, peo-ple prefer informants who are more cautious in their testimony.
Participants changed their minds about which witness they pre-ferred after errors were revealed, demonstrating an interaction be-tween condence and accuracy that did not show up in earlierstudies (e.g., Brewer & Burke, 2002). Why? In earlier studies, thewitnesses statements of condence were about the overall testi-mony in general, not about specic pieces of information thatcould be shown to be correct or not. To be able to assess calibra-tion, people must be able to evaluate whether the two variables(condence and accuracy) vary together. Note that when evidenceof calibration can be used, there may still be main effects of con-dence and accuracy on perceived credibility; but calibration, whichis a function of the condenceaccuracy interaction, provides abetter overall explanation of many effects.
The present studies demonstrate the explanatory power of thepresumption of calibration hypothesis. We use a courtroom con-text because the condenceaccuracy link has important applica-tions there. But the phenomena we examine presumably occurroutinely when executives receive advice from consultants, when
ocial Psychology 44 (2008) 13681375 1369well-calibrated condition, the cautious witness is correct aboutthe high-condence assertion and wrong about the low-condenceassertion (as in Tenney et al., 2007; gamma correlation of 1); in the
nt rnt rnt rdennt rden
nt rnt rnt rden
tal Spoorly-calibrated condition, the cautious witness is correct aboutthe low-condence assertion and wrong about the high-condenceassertion (gamma correlation of 1). In each condition, partici-pants rate the credibility of and choose between the condent wit-ness and either a well- or poorly-calibrated cautious witness.
If cautiousness matters, the two conditions should yield thesame resultsthat is, participants should switch from believingthe condent to the cautious witness because only overall con-dence in the face of errors is important. If calibration based on spe-cic pieces of testimony matters, the two conditions should yielddifferent resultsthat is, participants should switch when the cau-tious witness is well calibrated but not when she is notdemon-strating that the effect of making an error on judgments ofwitness credibility is moderated not by how condent the witnesshad been overall, but by how condent the witness had been in theerror.
ParticipantsSixty-six participants responded to advertisements for a psy-
chology and law study that were posted around a university cam-pus and in a city newspaper (median age = 22, 47 women).Participants were paid $15 and completed an hour of other exper-iments after nishing this one.
Materials and designParticipants were instructed to act as jurors. First, all partici-
pants read the same conicting depositions about a car accident(based on materials by Borckardt, Sprohge, & Nash, 2003; Tenneyet al., 2007). One eyewitness (the condent witness) was highly
Table 1Experimental conditions
Description of witness Time 1 (
Experiment 1 Condent CondeConde
Cautious well-calibrated CondeUncon
Cautious poorly-calibrated CondeUncon
Experiment 2 Condent CondeConde
Cautious well-calibrated CondeUncon
Note. A, B, C, and D refer to non-critical (i.e., collateral) statements made by the wit
1370 E.R. Tenney et al. / Journal of Experimencondent about everythingher memory of two prior activitiesshe had done on the day of the accident (e.g., she was positive thatshe went to the post ofce and took her dog to the veterinarian)and how the accident occurred. The other eyewitness (the cau-tious witness) was highly condent about one prior activity(e.g., she knew for a fact that she went to a meeting at work aboutremodeling) and how the accident occurred, but uncondent abouta second prior activity (e.g., she was not sure whether she hadlunch with a neighbor that day). The witnesses concluded that dif-ferent vehicles were at fault in the accident. Participants then(Time 1) rated each witnesss credibility on a scale from (1) notcredible to (6) credible and made a binary decision in response tothe question, Assume that only one is correct. Which witnesssdeposition do you believe? The order of witnesses and the activi-ties and descriptions of the car accident were counterbalancedacross participants. Participants received no information aboutthe witnesses sex or race.
Next, participants read that lawyers did further research, andrecords from reliable sources either corroborated or disconrmedthe collateral (i.e., peripheral to the case) aspects of the witnessesdepositions. In all conditions, participants learned that both wit-nesses (a) had been right about one activity they had claimed oc-curred on the day of the accident but (b) had been proven wrongabout the second activity they had claimed to have done thatdayrecords revealed that those activities had actually been doneon different days.
Importantly, for some participants (n = 31), the cautious wit-ness showed good calibration: she had been uncondent whenwrong and condent when right. For other participants (n = 35),this witness showed poor calibration: she had been uncondentwhen right and condent when wrong. (In both versions, the con-dent witness also showed poor calibrationcondent regardlessof whether right or wrong.) With this new information, partici-pants re-answered (Time 2) the same questions.
Therefore, we had a 2 (between-subject: cautious witnessshows either good or poor calibration) 2 (within-subject: con-dent versus cautious witness) 2 (within-subject: Time 1 [beforeerror] versus Time 2 [after error]) mixed factorial design.
Results and discussion
Consistent with the presumption of calibration hypothesis, par-ticipants used available calibration information to evaluate wit-nesses; they did not simply trust the witness who was either themost condent or the most cautious. Chi-squares were used toanalyze the participants choice of which witness to believe, andone-way repeated measures analysis of variance, with calibrationas a between subjects variable, was used to compare mean credi-bility ratings. Alpha was set at .05. The pattern of results was notcontingent on the sex of participant, the order the witnesses testi-
re error) Time 2 (after error) Time 3 (after justication)
e: A Correct e: B Incorrect e: C Correct t re: D Incorrect e: C Incorrect t re: D Correct
e: A Correct e: B Incorrect Justiede: C Correct t re: D Incorrect Justied
es. The statements were counterbalanced.
ocial Psychology 44 (2008) 13681375ed, or either of two descriptions of the events surrounding the caraccident.
BelievabilityAt Time 1, when the two conditions (well-calibrated and
poorly-calibrated) were identical, more participants believed thecondent than the cautious witness. (See Fig. 1.)
However, at Time 2, after the errors were revealed, most sidedwith the cautious witness when that witness was well calibrated,but not when that witness was poorly calibrated. The Time 1Time2 increase for the cautious witness was statistically signicantwhen the witness was well calibrated, X2(1, N = 31) = 16.96,p < .001, 2 = .54, but not when the witness was poorly calibrated,X2(1, N = 35) = .54, ns, 2 = .015.
CredibilityThe pattern of credibility judgments was the same as believabil-
ity choices and differed across calibration conditions (three-wayinteraction: F(1,64) = 8.65, p = .005. See Fig. 2).
tal Social Psychology 44 (2008) 13681375 1371100
E.R. Tenney et al. / Journal of ExperimenAt Time 1, consistent with previous literature, the condentwitness was more credible than the cautious witness,F(1,64) = 6.05, p = .02, d = .43. There was no difference betweenconditions (i.e., no interaction, F(1,64) = .73, ns).
However, after learning about the error at Time 2, the patterndiffered across conditions. When the cautious witness was wellcalibrated, she was now rated as signicantly more credible thanthe condent witness. The top panel of Fig. 2 shows the signicantTime1Time2 interaction, F(1,30) = 19.76, p < .001, d = .99. On theother hand, when the cautious witness was poorly calibrated (bot-tom panel), the credibility of both witnesses dropped by about thesame amount (i.e., the Time1Time2 interaction was not signi-cant, F(1,34) = 2.34, ns, d = .36).
SummaryThe three-way interaction cannot be explained by condence or
accuracy alone, or even by a simple interaction that people prefercondent informants when no errors are made but cautious infor-mants when errors are made. Participants cannot have been rely-ing on average levels of condence for the two cautiouswitnesses. Rather, participants must have been assessing calibra-tionhow condent informants are in the specic pieces of (accu-rate or erroneous) testimonyin order to generate these results.
Time 1Before Error
Time 2After Error
Fig. 1. Participants choice of which witness to believe at Time 1 (before collateralerrors) and Time 2 (after collateral errors) in Experiment 1. Top panel shows cautiouswell-calibrated witness; bottom panel shows cautious poorly-calibrated witness.2
Experiment 1 showed that a high-condence error was detri-mental to overall credibility (whereas a low-condence error wasnot). But what if there is a good reason for a high-condence error?A high-condence error that is not evidence of poor calibrationmight not affect perceived credibility.
The Experiment 2 vignettes begin like the well-calibrated con-dition of Experiment 1: a condent witness (Time 1) is shown tobe in error (Time 2) about a high-condence assertion (the identi-cation of a passenger), whereas a cautious witness is uncondentabout the same error. Then, at Time 3, a justication for the error isgiven: the passenger had an identical twin.2 The presumption of
Time 1Before Error
Time 2After Error
Fig. 2. Mean credibility ratings (16) with standard error bars at Time 1 (beforecollateral errors) and Time 2 (after collateral errors) in Experiment 1. Top paneshows cautious well-calibrated witness; bottom panel shows cautious poorly-calibrated witness.
2 We recognize that the mistaken identication of an identical twin sounds likesomething from a television show; however, the goal of the manipulation was theorytesting, not simulating an actual trial (Mook, 1983). We sought an experimentamanipulation that would create a justiable error without also changing the meaningor perceived severity of the offense itself, as might occur with more common realisticexamples (e.g., a witness was misinformed, tricked, or lied to).l
ed passenger admitted that he had an identical twin who wasfriends with one of the drivers and did not have an alibi. Again, par-ticipants re-rated the credibility and character of each witness.
Finally, as a manipulation check, participants rated how con-dent they thought each witness was in general from (1) not con-dent to (6) condent.
Results and discussion
Results provide further evidence that people rely on calibrationinformation when available rather than merely on statements ofcondence or the presence of errors. Analyses were similar toExperiment 1 with one-way repeated measures analysis of vari-ance used to compare mean ratings of honesty and liking (in addi-tion to credibility). Three participants did not choose whichwitness they believed at Time 1 and/or Time 3, but their otherjudgments were retained in analyses. One of these participantsmade no Time 3 judgments.
Condence manipulation checkThe condence manipulation worked. Participants rated the
cautious witness (M = 3.6, SD = 1.1) as signicantly less condentthan the condent witness (M = 5.6, SD = .73), F(1,103) = 210.1,p < .001, d = 2.2.
tal Social Psychology 44 (2008) 13681375calibration hypothesis predicts that, as in Experiment 1, at Time 1,when there is no information to assess calibration, the majority ofparticipants will initially side with the condent witness whereasat Time 2, after they learn about the errors, they will switch to sidewith the cautious witness. But what should happen at Time 3 whenthe error is justied?
An error that is justied is still technically an error in the abso-lute sense. If participants care about absolute errors, then theyshould continue to side with the cautious witness who had beenuncondent in the error. However, if errors are important (at leastin part) because of what they reveal about calibration, then learn-ing that a high-condence error is justiedand therefore notindicative of poor calibrationshould restore the previously lostcredibility of the condent witness. Thus, at Time 3, participantsshould switch back to side with the condent witness.
ParticipantsOne hundred ve undergraduates at the University of Virginia
(median age = 18; 81 women; 1 not reported) completed this 20-min experiment for course credit.
Materials and designParticipants were instructed to act as jurors, and they judged
the credibility and character of two opposing eyewitnesses to acar accident three times. Time 1 and Time 2 were similar to Exper-iment 1, but with slightly different facts and additional dependentvariables. Participants read that two witnesses had been asked todescribe a car accident and identify a passenger in one of the vehi-cles. (The passenger was said to be suspected of shoplifting, but theshoplifting case would be tried at a later time.) As in Experiment 1,the structure of the two witnesses depositions was similar, butthey differed in how condent they were in collateral aspects oftheir testimony. The condent witness was highly condent inhis memory of all aspects of his testimonythe accident itself,the weather on the day of the accident, and what the passengerlooked like. He condently identied a man sitting in the thirdrow of the courtroom as the passenger. The cautious witness wascondent about how the accident occurred and the weather, butnot about his identication of the passenger. He acknowledgedthat he was not sure that he had gotten a good look at the passen-ger. He identied the same man in the third row, but stated that hewas not condent in his identication. Unlike in Experiment 1, thesex of the witnesses was specied as male. At Time 1, participantsrated the credibility, likeability, and honesty of each witness onthree different scales from (1) not credible/not likeable/not honestto (6) credible/likeable/honest. Likeability and honesty ratings wereincluded to make sure that our two witnesses started out aboutequal on those traits despite the difference in condence (e.g., thatpeople did not nd the condent witness too arrogant or the cau-tious witness too meek). Participants also made a binary decisionas to which witness they believed.
At Time 2, participants learned that the weather on the day ofthe accident was as both witnesses described, but that both wit-nesses had mis-identied the passenger (who had been away onbusiness). Thus, as in Experiment 1, both witnesses were rightabout one fact and wrong about another. The condent witnesswas wrong about something with high condence (thus poorly cal-ibrated through overcondence), and the cautious witness wasright about something with high condence and wrong aboutsomething with low condence (thus well calibrated with a gam-ma correlation of 1). With this new information, participants re-
1372 E.R. Tenney et al. / Journal of Experimenrated each witness as at Time 1.At Time 3, participants read that there was a good reason for the
mistaken identication. After further questioning, the mis-identi-BelievabilityParticipants decisions about which witness they believed chan-
ged over time according to the predicted pattern, X2(2,N = 105) = 82.54, p < .001. (See Fig. 3.) At Time 1, more participantssided with the condent witness than the cautious witness. How-ever, at Time 2, after the error was revealed, most participantssided with the cautious witness. This change replicates the well-calibrated condition in Experiment 1. At Time 3, after the errorwas justied, most participants sided with the condent witnessonce again.
CredibilityThe pattern of results for credibility ratings was the same as for
believability. Fig. 4, top panel, shows the signicant interaction,
Time 1Before Error
Time 2After Error
Time 3Justified Error
CautiousFig. 3. Participants choice of which witness to believe at Time 1 (before collateralerrors), Time 2 (after collateral errors), and Time 3 (after collateral errors arejustied) in Experiment 2.
As shown in Fig. 4, honesty and likeability ratings show an over-all similar pattern to each other and to credibility ratings. One dif-
Fig. 4. Mean credibility, honesty, and likeability ratings (16) with standard errorbars at Time 1 (before collateral errors), Time 2 (after collateral errors), and Time 3
tal Social Psychology 44 (2008) 13681375 1373ference is that at Time 1, credibility ratings are higher for thecondent witness whereas honesty ratings are slightly higher forthe cautious witness (F(1,104) = 5.8, p = .02, d = .24). Perhaps par-ticipants believe that someone who discloses the imperfectnessof her memory is exhibiting honesty.
At Time 2, similar to credibility ratings, honesty and likeabilityratings for the condent witness plummet whereas those for thecautious witness remain relatively unchanged (Time 1Time 2interaction for honesty, F(1,104) = 102.52, p < .001, d = .1.4; forlikeability: F(1,104) = 48.25, p < .001, d = .65).
At Time 3, similar to credibility ratings, honesty and likeabilityratings for the condent witness return to approximately baselinelevels (although now the difference between the condent andcautious witnesses is not signicant for honesty, F(1,103) = .87,ns, d = .12, but is for likeability, F(1,103) = 4.89, p = .03, d = .22).
Honesty and likeability ratings correlated with credibility rat-ings (mean r across times = .58 for honesty and .55 for likeability,ps 6 .01.).
SummaryThe measures all show that a witness who makes an error can
be forgiven for it if the error did not indicate that the witnesswas poorly calibrated. Even though at Time 2 most participantsturned against the condent witness who made an error anddeemed that witness less credible, honest, and likeable, when theerror was explained at Time 3, most participants reconsidered,and the damaging effect of making an error almost disappeared.Thus, the results of Experiment 2 converge with those of Experi-ment 1 to demonstrate that an error might not be damaging toan informants perceived credibility if it does not reveal that theperson is poorly calibrated.
Usually expressing condence enhances a persons credibilityand making a mistake damages itbut not always. Results fromthe current experiments help explain when and why these effectsoccur: condence and errors may help or hinder an informantscredibility in their own right, but their effects will be qualiedwhen, together, they affect perceptions of an informantscalibration.
Experiment 1 showed that the damaging effect of being provedwrong was moderated not by whether a witness had been con-dent overall, but by whether a witness had been condent in thespecic error (Experiment 1). Experiment 2 showed that the dam-aging effect of being proved wrong can disappear if a witnesss cal-indicating that the effect of making and justifying an error on rat-ings of witness credibility were moderated by witness condenceF(2,102) = 94.4, p < .001. At Time 1, the condent witness wasrated as more credible, F(1,104) = 3.81, p = .054, d = .21, whereasat Time 2, the cautious witness was rated as more credible,F(1,104) = 110.0, p < .001, d = 1.4. Then, at Time 3, the condentwitness was once again rated as more credible, F(1,103) = 9.46,p = .003, d = .35. This pattern is predicted by the presumption ofcalibration hypothesis: a justied error is not evidence of poor cal-ibration; thus, at Time 3 participants will assign greater credibilityto the condent witnessbecause they have no information tocontradict the presumption that he is well calibrated.
Honesty and likeability
E.R. Tenney et al. / Journal of Experimenibration is not called into question by that error. If it turns out thata witness erred about something she cannot be expected to know,her credibility might remain intact. Thus, calibration appeared to
(after collateral errors are justied) in Experiment 2.
matter more than overall condence or technical errors: con-dence was only benecial when seen as appropriately linked tothe likelihood of event outcomes, and errors were only damagingwhen they indicated that someone was an unreliable judge ofher own knowledge.
1374 E.R. Tenney et al. / Journal of Experimental SOne potential criticism of these experiments is that participantsmight be switching which witness they sided with from Time 1 toTime 2 because of demand characteristics. Perhaps participants feltcompelled to change their responses because they thought thatsince the transcript presented new information, they were ex-pected to change their initial inclinations. The signicant crossoverinteraction found in Experiment 1 when the cautious witness waswell calibrated but not when poorly calibrated supports our inter-pretation that whether a witness was calibrated, and not demandcharacteristics, is driving the effects.3
A second potential criticism is that all of the action is on thecondent witness; perhaps jurors become dissatised with a (con-dent) witness who is poorly calibrated and side with the cautiouswitness not as a preference for good calibration, but because it isthe only other option. As demonstrated in Experiment 1, however,after the errors were revealed the majority of participants sidedwith the cautious witness only when that witness was well cali-bratedsuggesting that participants were not simply sidingagainst the condent, poorly-calibrated person indiscriminately.Additionally, although credibility judgments for cautious witnessesremained relatively steady in Experiment 2 and in the well-cali-brated condition of Experiment 1, ratings of credibility droppedoff for the cautious witness just as much as for the condent wit-ness in the poorly-calibrated condition.
The results provide support for the presumption of calibrationhypothesis, which posits that when there is a dearth of informa-tion, people presume that others are well calibrated. This presump-tion could account for why condence appears to be important:often actual calibration information is not available and condentstatements are given the benet of the doubt. However, when peo-ple get new information about an informants actual calibration,they will update their perceptions of the informants credibility.People will then prefer the condent statements from a well-cali-brated informant regardless of that informants overall level ofcondence or accuracy. Thus, informants might be inuential notdepending on condence alone or even on whether they havemade a mistake, but ultimately depending on how well calibratedthey appear and whether others believe that their statements ofcondence are warranted.
An important question that we have so far nessed is howstrong the condenceaccuracy relation really is. Many researchersin the psychology and law domain conclude that the condenceaccuracy relationship is weak overall and worry about over reli-ance on it; however, there are known factors that affect the rela-tionship. For example, when a lineup administrator knows whichperson is the suspect, when post-identication feedback is givenbefore the witness states her condence, or when the witness isrepeatedly questioned, then condence is articially inatedthereby decreasing the condenceaccuracy relationship (seeSporer et al., 1995 and Wells, Memon, & Penrod, 2006 for reviews).Outside of eyewitness-related studies, researchers have deter-mined that individual differences (e.g., tending to be over- or un-der-condent), the type of task (e.g., whether information about
3 In another test of demand characteristics, 38 additional participants wereassigned to complete Experiment 2 materials but make no Time 1 judgments; thus,the demand for a given response in later judgments was reduced. As in Experiment 2,after the error was revealed, most participants sided with the cautious witness, but
after the error was justied, most participants sided with the condent witness; thisswitch was statistically signicant, X2 (1, N = 36) = 39.46, p < .001, 2 = 1.0. Thereplication suggests that demand characteristics were not driving the effects.accuracy is obtainable), and the domain of expertise in question(e.g., people are more accurate judging whether they know sciencethan judging whether they know business/law) determine whethercondence is diagnostic (Ackerman, Beier, & Bowen, 2002; Fischer& Budescu, 2005; Soll, 1996; Zarnoth & Sniezek, 1997). Regardlessof whether people should use condence as a proxy for accuracy, itis important to know if and when they do use it (see Yates et al.,1996, on the importance of studying the habits of consumers ofprobability judgments).
We have demonstrated that people who show that they aregood judges of their own knowledge are rewarded, and thosewho are overcondent in what they know lose face. Althoughwe did not test the effects of calibration of personality attri-butes, the presumption of calibration hypothesis could explainwhy self-enhancers, who outwardly exhibit condence in them-selves and their abilities over and above what is warranted, ini-tially make great rst impressions but later become lesspopular (Paulhus, 1998; Robins & Beer, 2001). People may ini-tially be taken with someone who is highly self-condent ifpeople assume high condence has some basis in reality. How-ever, when people receive concrete evidence that someoneelses self-condence is no indication of that persons abilityor performance, that person may appear less credible, likeable,and honest than someone who is better calibrated. Perhaps peo-ple nd out self-enhancers are poorly calibrated and alter theirimpressions accordingly. The benets from exhibiting good cal-ibration, and the damages from poor calibration, might be farreaching.
We thank Natasha Olinger, Daniel Kuckuck, Steve Yang, andAlexander West for their assistance in data collection. We thankReid Hastie for thoughtful discussions, and Vikram Jaswal and Tim-othy Wilson for helpful comments on an earlier draft of the man-uscript. This research was part of a masters thesis submitted bythe rst author under the supervision of the second author to theDepartment of Psychology at the University of Virginia. It was sup-ported in part by the National Science Foundation under Grant0437238 to the second author.
Ackerman, P. L., Beier, M. E., & Bowen, K. R. (2002). What we really know about ourabilities and our knowledge. Personality and Individual Differences, 33, 587605.
Berman, G. L., & Cutler, B. L. (1996). Effects of inconsistencies in eyewitnesstestimony on mock-juror decision making. Journal of Applied Psychology, 81,170177.
Borckardt, J. J., Sprohge, E., & Nash, M. (2003). Effects of the inclusion and refutationof peripheral details on eyewitness credibility. Journal of Applied SocialPsychology, 33, 21872197.
Bradeld, A. L., & Wells, G. L. (2000). The perceived validity of eyewitnessidentication testimony: A test of the ve Biggers criteria. Law and HumanBehavior, 24, 581594.
Brewer, N., & Burke, A. (2002). Effects of testimonial inconsistencies and eyewitnesscondence on mock-juror judgments. Law and Human Behavior, 26, 353364.
Curley, S. P., Yates, J. F., & Abrams, R. A. (1986). Psychological sources of ambiguityavoidance. Organizational Behavior and Human Decision Processes, 38, 230256.
Fischer, I., & Budescu, D. V. (2005). When do those who know more also know moreabout how much they know? The development of condence and performancein categorical decision tasks. Organizational Behavior and Human DecisionProcesses, 98, 3953.
Keren, G., & Teigen, K. H. (2001). Why is p = 90 better than p = .70? Preference fordenitive predictions by lay consumers of probability judgments. PsychonomicBulletin and Review, 8, 191202.
Lichtenstein, S., Fischhoff, B., & Phillips, L. D. (1982). Calibration of probabilities: Thestate of the art to 1980. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgmentunder uncertainty heuristics and biases (pp. 306334). New York, NY: CambridgeUniversity Press.
Loewenstein, G. (1994). The psychology of curiosity: A review and reinterpretation.
ocial Psychology 44 (2008) 13681375Psychological Bulletin, 116, 7598.Mook, D. G. (1983). In defense of external invalidity. American Psychologist, 38,
Nelson, T. O. (1984). A comparison of current measures of feeling-of-knowingaccuracy. Psychological Bulletin, 95, 109133.
Paulhus, D. L. (1998). Interpersonal and intrapsychic adaptiveness of trait self-enhancement: A mixed blessing? Journal of Personality and Social Psychology, 74,11971208.
Penrod, S. D., & Cutler, B. L. (1995). Witness condence and witness accuracy:Assessing their forensic relation. Psychology, Public Policy, and Law, 1, 817845.
Price, P. C., & Stone, E. R. (2004). Intuitive evaluation of likelihood judgmentproducers: Evidence for a condence heuristic. Journal of Behavioral DecisionMaking, 17, 3957.
Robins, R. W., & Beer, J. S. (2001). Positive illusions about the self: Short-termbenets and long-term costs. Journal of Personality and Social Psychology, 80,340352.
Soll, J. B. (1996). Determinants of overcondence and miscalibration: The roles ofrandom error and ecological structure. Organizational Behavior and HumanDecision Processes, 65, 117137.
Sporer, S. L., Penrod, S., Read, D., & Cutler, B. (1995). Choosing, condence, andaccuracy: A meta-analysis of the condenceaccuracy relation in eyewitnessidentication studies. Psychological Bulletin, 118, 315327.
Tenney, E. R., MacCoun, R. J., Spellman, B. A., & Hastie, R. (2007). Calibrationtrumps condence as a basis for witness credibility. Psychological Science, 18,4650.
Thomas, J. P., & McFadyen, R. G. (1995). The condence heuristic: A game theoreticanalysis. Journal of Economic Psychology, 16, 97113.
Van Swol, L. M., & Sniezek, J. A. (2005). Factors affecting the acceptance of expertadvice. British Journal of Social Psychology, 44, 443461.
Wells, G. L., Memon, A., & Penrod, S. D. (2006). Eyewitness evidence: Improving itsprobative value. Psychological Science in the Public Interest, 7, 4575.
Yates, J. F., Price, P. C., Lee, J., & Ramirez, J. (1996). Good probabilistic forecasters:The consumers perspective. International Journal of Forecasting, 12, 4156.
Zarnoth, P., & Sniezek, J. A. (1997). The social inuence of condence in groupdecision making. Journal of Experimental Social Psychology, 33, 345366.
E.R. Tenney et al. / Journal of Experimental Social Psychology 44 (2008) 13681375 1375
The benefits of knowing what you know (and what you don " t): How calibration affects credibilityIntroductionThe importance of confidence and of accuracyThe importance of calibrationSupport for the presumption of calibration hypothesisPresent studies
Experiment 1MethodParticipantsMaterials and design
Results and discussionBelievabilityCredibilitySummary
Experiment 2MethodParticipantsMaterials and design
Results and discussionConfidence manipulation checkBelievabilityCredibilityHonesty and likeabilitySummary