
Behavioral Consequences of Probabilistic Precision: Experimental Evidence from National Security Professionals

Supporting Material

September 2017

Overview

This supplement contains the following material:

§A. Respondent demographics

Supplementary analysis of Survey Experiment 1: How decision makers interpret probabilities

§B. Response measure wordings

§C. Variation in probability assessments across scenario versions

§D. T-tests for Elite Sample B

§E. Multivariate analysis of scenario data

Supplementary analysis of Survey Experiment 2: How analysts estimate probabilities

§F. Question list for Survey Experiment 2

§G. Distributions of assessments across question types

§H. Additional rescoring of probability assessments

§I. Full results for Figure 4

§J. Interactions between quantitative assessments and respondent attributes

§K. Replication of Survey Experiment 2 results using logarithmic scoring

For more information, please contact [email protected]. This experiment was preregistered at Evidence in Governance and Politics and approved by Committees for the Protection of Human Subjects at Dartmouth College, Harvard University, and participating military institutions.


§A. Respondent demographics

The paper presents survey experiment results from three samples of respondents. National Security Officials took both Survey Experiment 1 and Survey Experiment 2 (in random order). Amazon Mechanical Turk (AMT) respondents were randomly assigned to take only one of the two surveys. We report demographics for both AMT populations separately. Elite Sample B responded to an abridged version of Survey Experiment 1, containing only the neutral hostage rescue scenario. Due to concerns for protecting respondent anonymity, we were able to ask fewer demographic questions of elite sample respondents than of AMT respondents.

The Berlin adaptive numeracy test (Cokely et al. 2012) divides respondents into four categories based on their responses to two or three word problems (the choice of problems depends on a respondent’s answers). Previous research with nationally representative samples generally divides responses into four equally sized categories, in which higher scores indicate higher numeracy. We report the percentage of respondents falling into each of these categories below.

Table S1. Respondent demographics

| | National Security Officials (N=208) | AMT Respondents, Survey 1 (N=1,458) | AMT Respondents, Survey 2 (N=1,561) | Elite Sample B (N=199) |
| --- | --- | --- | --- | --- |
| % Female | 15% | 52% | 52% | - |
| % White | 82% | 80% | 81% | - |
| % College degree | 100% | 61% | 61% | 100% |
| % U.S. citizen | 87% | 99% | 99% | 86% |
| % English as first language | 85% | 98% | 98% | 86% |
| % Active-duty military | 75% | 0.8% | 0.4% | 78% |
| Age | - | 34.7 (11.1) | 34.6 (11.3) | - |
| % Numeracy-1 | 29% | 40% | 42% | - |
| % Numeracy-2 | 29% | 21% | 19% | - |
| % Numeracy-3 | 11% | 17% | 18% | - |
| % Numeracy-4 | 31% | 22% | 21% | - |


§B. Response measure wordings

Following each vignette, respondents were presented with the following questions.

On the following scale of 1-7, please rate your level of support for [approving the hostage rescue mission immediately / approving the proposed drone strike immediately / backing Ghamay Jan]

On the following scale of 1-7, please rate your level of support for [delaying decision 1-2 weeks to gather additional information about the compound / waiting another 1-2 days to gather more information about who might be meeting in this location / waiting another 6 months to gather more information about whether backing Ghamay Jan is a better option than backing local officials]

On the following scale of 1-7, please rate your level of confidence in making these assessments


§C. Variation in probability assessments across scenario versions

As described in the main text, we presented each scenario in one of three versions. These versions – which we label “pessimistic,” “neutral,” and “optimistic,” below – involved different probability assessments intended to shape respondents’ views about supporting risky actions. As shown in the main text (Table 1), these variations systematically influenced respondents’ views. Table S2 shows how we varied probability assessments across scenario versions.

Table S2. Variations in Probability Assessments Across Scenario Versions

| Assessment | Pessimistic | Neutral | Optimistic |
| --- | --- | --- | --- |
| Hostage Rescue Scenario | | | |
| Hostages at compound | Even Chance (50 percent) | Likely (65 percent) | Very Likely (80 percent) |
| Special forces can retrieve hostages | Likely (65 percent) | | Almost Certain (95 percent) |
| Soldiers wounded on mission | Likely (65 percent) | Even Chance (50 percent) | Unlikely (35 percent) |
| Collateral damage | Even Chance (50 percent) | | Very Unlikely (20 percent) |
| Hostages killed if mission fails | Almost Certain (95 percent) | | |
| Drone Strike Scenario | | | |
| House contains Qaeda leaders | Even Chance (50 percent) | Likely (65 percent) | |
| Drone strike kills occupants | Very Likely (80 percent) | Very Likely (80 percent) | |
| House contains women/children | Likely (65 percent) | | Remote Chance (5 percent) |
| Strike compromises surveillance | Almost Certain (95 percent) | | |
| Local Security Forces Scenario | | | |
| Jan can mobilize forces | Even Chance (50 percent) | Likely (65 percent) | |
| Jan’s forces can secure border | Unlikely (35 percent) | Likely (65 percent) | |
| Jan previously assisted Taliban | Very Likely (80 percent) | | Very Unlikely (20 percent) |
| Jan will secure illegal smuggling | Almost Certain (95 percent) | | Likely (65 percent) |
| U.S. can retain local leaders’ support | Very Unlikely (20 percent) | | |


§D. T-tests for Elite Sample B

Table S3 presents two-way t-tests analyzing responses to the neutral hostage scenario for Elite Sample B. These results support the analyses presented in Section 3: quantifying probability assessments reduced support for risky action, increased support for gathering additional information, and had no significant impact on respondents’ confidence levels.

Table S3. Survey results from Elite Sample B

| | Support for hostage rescue (1-7 scale) | Support for delaying decision (1-7 scale) | Confidence in assessment (1-7 scale) |
| --- | --- | --- | --- |
| Qualitative assessments | 5.33 (1.56) | 3.14 (1.97) | 5.18 (1.19) |
| Quantitative assessments | 4.53 (1.86) | 4.11 (2.07) | 5.13 (1.26) |
| | p=0.001 | p=0.001 | p=0.793 |

§E. Multivariate analysis of scenario data

The paper’s main text analyzed responses to Survey Experiment 1 using bivariate regressions. Tables S4a and S4b replicate those results in multivariate form. All models are ordinary least squares with fixed effects for respondent and standard errors clustered by respondent. Conducting these regressions using ordered probit/logit returns similar results.


Table S4a. Responses to Scenarios – National Security Officials

| | Model 1: Predicting support for risky action | Model 2: Predicting support for risky action, with interaction terms | Model 3: Predicting support for delaying action |
| --- | --- | --- | --- |
| Quantitative assessment | -0.142 (.14) | -0.346 (.25) | 0.316 (.15)* |
| Optimistic scenario | 0.783 (.17)*** | 0.677 (.23)** | -0.656 (.18)*** |
| Pessimistic scenario | -0.982 (.16)*** | -1.146 (.22)*** | 0.433 (.16)** |
| Hostage scenario | 1.148 (.16)*** | 1.152 (.16)*** | -0.227 (.18) |
| Drone scenario | -0.617 (.14)*** | -0.619 (.14)*** | 1.120 (.16)*** |
| Numeracy | -0.157 (.06)** | -0.155 (.06)** | 0.102 (.06) |
| Female | -0.257 (.21) | -0.261 (.21) | 0.129 (.22) |
| Military officer | 0.185 (.17) | 0.181 (.17) | -0.187 (.16) |
| U.S. citizen | 0.096 (.35) | 0.078 (.34) | -0.132 (.36) |
| English as native lang. | 0.143 (.34) | 0.145 (.34) | -0.102 (.37) |
| Optimistic scenario x Quantitative assessment | | 0.232 (.33) | |
| Pessimistic scenario x Quantitative assessment | | 0.364 (.31) | |
| Constant | 3.739 (.29)*** | 3.841 (.31)*** | 4.575 (.30)*** |
| R2 | 0.298 | 0.300 | 0.165 |

Ordinary least squares regressions predicting 7-point response measures with fixed effects for respondent and standard errors clustered by respondent. * p<0.05 ** p<0.01 *** p<0.001. All models have 624 observations.


Table S4b. Responses to Scenarios – AMT Respondents

| | Model 1: Predicting support for risky action | Model 2: Predicting support for risky action, with interaction terms | Model 3: Predicting support for delaying action |
| --- | --- | --- | --- |
| Quantitative assessment | -0.429 (.05)*** | -0.759 (.09)*** | 0.378 (.06)*** |
| Optimistic scenario | 0.791 (.06)*** | 0.479 (.08)*** | -0.518 (.07)*** |
| Pessimistic scenario | -0.813 (.06)*** | -0.972 (.09)*** | 0.222 (.07)*** |
| Hostage scenario | 1.501 (.06)*** | 1.505 (.06)*** | -0.327 (.06)*** |
| Drone scenario | 0.329 (.06)*** | 0.327 (.06)*** | 0.450 (.06)*** |
| Numeracy | -0.117 (.02)*** | -0.116 (.02)*** | 0.056 (.03) |
| Female | 0.002 (.05) | -0.006 (.05) | 0.291 (.06)*** |
| Military officer | 0.373 (.25) | 0.343 (.24) | -0.566 (.41) |
| U.S. citizen | 0.076 (.27) | 0.067 (.27) | -0.325 (.26) |
| English as native lang. | -0.250 (.24) | -0.242 (.24) | -0.358 (.25) |
| Education | 0.069 (.04) | 0.070 (.04) | 0.007 (.05) |
| Optimistic scenario x Quantitative assessment | | 0.650 (.12)*** | |
| Pessimistic scenario x Quantitative assessment | | 0.325 (.12)** | |
| Constant | 3.650 (.35)*** | 3.812 (.35)*** | 5.256 (.30)*** |
| R2 | 0.260 | 0.265 | 0.077 |

Ordinary least squares regressions predicting 7-point response measures with fixed effects for respondent and standard errors clustered by respondent. * p<0.05 ** p<0.01 *** p<0.001. All models have 4,386 observations.
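The interaction terms in Model 2 of Tables S4a and S4b can be read as adjustments to the effect of quantitative assessment within each scenario version. The sketch below adds the published coefficients to show that arithmetic. It assumes the neutral version is the omitted reference category (inferred from the presence of only optimistic and pessimistic indicators), and the function and dictionary names are illustrative. Standard errors for the combined effects cannot be recovered from the tables, so only point estimates are shown.

```python
# Model 2 point estimates copied from Tables S4a and S4b.
MODEL_2 = {
    "National Security Officials": {
        "quant": -0.346,
        "quant_x_optimistic": 0.232,
        "quant_x_pessimistic": 0.364,
    },
    "AMT Respondents": {
        "quant": -0.759,
        "quant_x_optimistic": 0.650,
        "quant_x_pessimistic": 0.325,
    },
}


def quantitative_effect(sample, version):
    """Implied effect of quantitative assessment on support for risky action.

    version is "neutral", "optimistic", or "pessimistic". The neutral
    version is assumed to be the reference category.
    """
    coefs = MODEL_2[sample]
    effect = coefs["quant"]
    if version != "neutral":
        effect += coefs["quant_x_" + version]
    return round(effect, 3)


for sample in MODEL_2:
    for version in ("pessimistic", "neutral", "optimistic"):
        print(sample, version, quantitative_effect(sample, version))
```

Under these assumptions, the implied point estimate is most negative in the neutral version for both samples, and it is close to zero in the pessimistic version for National Security Officials.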


§F. Question list for Survey Experiment 2

Table S5 presents the question list for Survey Experiment 2. Thirty of those questions had known answers as of the dates these surveys were administered (August 5-7, 2015). Five questions, labeled “unknown,” had answers that were unknowable at the time. Five questions, labeled “forecast,” involved predictions, which we evaluated on February 6, 2016.

As described in the paper, we varied question types in this way so that we could assess whether or not any biases we identified in our analysis were confined to a particular kind of estimate. As shown in §G of this supplement, below, we found that respondents who provided numeric estimates employed noticeably greater certitude than respondents who used words of estimative probability across all three question types.

Table S5. Question List for Survey Experiment 2

All assessments recorded between August 5-7, 2015; outcomes are coded relative to that date.

Question Text Outcome

Q1 In your opinion, what are the chances that Afghanistan's literacy rate is currently above 50 percent?

0

Q2 In your opinion, what are the chances that Saudi Arabia currently exports more oil than all other countries in the world combined?

0

Q3 In your opinion, what are the chances that the United States currently has a longer life expectancy than Jamaica?

1

Q4 In your opinion, what are the chances that the United States currently operates a military base in Ethiopia?

0

Q5 In your opinion, what are the chances that the United States has an active territorial claim in Antarctica?

0

Q6 In your opinion, what are the chances that France currently has more soldiers stationed in Afghanistan than any NATO member besides the United States?

0

Q7 In your opinion, what are the chances that more than 20 countries currently operate nuclear power plants?

1

Q8 In your opinion, what are the chances that Japan is currently a member of the International Whaling Commission?

1

Q9 In your opinion, what are the chances that Russia is a member of the Nuclear Nonproliferation Treaty?

1

Q10 In your opinion, what are the chances that the United States currently has free trade agreements in place with fewer than 30 countries?

1

Q11 In your opinion, what are the chances that fewer than 80 countries currently recognize Taiwan's independence from China?

1

Q12 In your opinion, what are the chances that ISIS draws more foreign fighters from Egypt than from any other country outside of Iraq and Syria?

0

Q13 In your opinion, what are the chances that Russia's economy grew in 2014?

0

Q14 In your opinion, what are the chances that Haiti has the lowest per capita income of any Latin American country?

1

Q15 In your opinion, what are the chances that there are currently more Muslims in the world than there are Roman Catholics?

1

Q16 In your opinion, what are the chances that Sweden is a member of NATO?

0

Q17 In your opinion, what are the chances that Tokyo's stock exchange is the second largest stock exchange in the world?

1

Q18 In your opinion, what are the chances that the U.S. State Department currently lists Iran as a state sponsor of terrorism?

1

Q19 In your opinion, what are the chances that the Arabic media organization al-Jazeera currently operates bureaus in more countries than does CNN?

1

Q20 In your opinion, what are the chances that the United States currently possesses more than 2,000 nuclear warheads?

1

Q21 In your opinion, what are the chances that the economy of North Korea is larger than the economy of New Hampshire?

0

Q22 In your opinion, what are the chances that German President Angela Merkel is currently the longest-serving head of government in Western Europe?

1

Q23 In your opinion, what are the chances that there are currently more refugees living in Lebanon than in any other country in the world?

0

Q24 In your opinion, what are the chances that the United States currently conducts more trade with Mexico than with the European Union?

1

Q25 In your opinion, what are the chances that the largest U.S. Embassy is currently located in Beijing?

0

Q26 In your opinion, what are the chances that the U.S. defense budget is more than five times as large as China's defense budget?

0

Q27 In your opinion, what are the chances that the United States currently operates more aircraft carriers than all other countries in the world combined?

0

Q28 In your opinion, what are the chances that Israel receives more foreign aid than any other country in the world?

0

Q29 In your opinion, what are the chances that more than 3 million people live within the borders of the Palestinian territories of the West Bank and Gaza?

1

Q30 In your opinion, what are the chances that more than 5,000 people died as a result of the Ebola outbreak in West Africa in 2014?

1

Q31 In your opinion, what are the chances that Ukrainian rebels knew that Malaysia Airlines Flight 17 was a civilian aircraft before they shot it down?

Unknown

Q32 In your opinion, what are the chances that Iran's Supreme Leader, Ayatollah Khamenei, currently intends to develop a nuclear weapon?

Unknown

Q33 In your opinion, what are the chances that the United States would have invaded Iraq if the September 11 terrorist attacks had not occurred?

Unknown

Q34 In your opinion, what are the chances that without waterboarding captured terrorists, there would have been at least one more major terrorist attack (>1,000 casualties) on U.S. soil since 2001?

Unknown

Q35 In your opinion, what are the chances that high-ranking members of Pakistan's intelligence services knew that Osama bin Laden was hiding in Abbottabad?

Unknown

Q36 In your opinion, what are the chances that within the next six months, Syrian President Bashar al-Assad will be killed or no longer living in Syria?

Forecast: 0

Q37 In your opinion, what are the chances that within the next six months, the Iraqi Security Forces will reclaim control of either Ramadi or Mosul (or both) from ISIS?

Forecast: 1

Q38 In your opinion, what are the chances that there will be a new Pope within the next six months?

Forecast: 0

Q39 In your opinion, what are the chances that more than 50,000 U.S. citizens will travel to Cuba within the next six months?

Forecast: 1

Q40 In your opinion, what are the chances that more than 10 U.S. soldiers will be killed fighting ISIS within the next six months?

Forecast: 0


§G. Distributions of assessments across question types

Table 2 in the paper’s main text demonstrated that respondents made probability assessments with noticeably greater certitude when using quantitative as opposed to qualitative expressions. Those data only included questions with knowable answers (i.e., the questions for which we scored respondents’ performance).

Tables S6a and S6b show how the same pattern holds when we examine probability assessments that respondents made in response to forecasts or to questions with answers that were unknowable at the time. Statistical significance is estimated via two-way t-tests.

Table S6a. Proportion of estimates registered in different segments of the number line (National Security Officials)

| | Remote (0.00-0.14) | Very Unlikely (0.15-0.28) | Unlikely (0.29-0.42) | Even Chance (0.43-0.56) | Likely (0.57-0.71) | Very Likely (0.72-0.85) | Almost Certain (0.86-1.00) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Questions with knowable answers | | | | | | | |
| Qualitative assessments | 4.51 | 7.10 | 16.67 | 12.87 | 25.28 | 20.34 | 13.24 |
| Quantitative assessments | 15.73 | 8.73 | 8.60 | 15.60 | 8.87 | 16.33 | 26.13 |
| | p=0.00 | p=0.02 | p=0.00 | p=0.00 | p=0.00 | p=0.00 | p=0.00 |
| Questions about unknown states of the world | | | | | | | |
| Qualitative assessments | 7.59 | 11.67 | 19.26 | 14.44 | 19.63 | 17.96 | 9.44 |
| Quantitative assessments | 20.20 | 11.40 | 8.00 | 13.80 | 9.00 | 14.20 | 23.40 |
| | p=0.00 | p=0.89 | p=0.00 | p=0.77 | p=0.10 | p=0.00 | p=0.00 |
| Forecasts | | | | | | | |
| Qualitative assessments | 12.04 | 17.41 | 26.67 | 15.56 | 16.30 | 8.89 | 3.15 |
| Quantitative assessments | 32.40 | 16.00 | 11.60 | 15.60 | 6.20 | 9.40 | 8.80 |
| | p=0.00 | p=0.54 | p=0.00 | p=0.98 | p=0.78 | p=0.00 | p=0.00 |


Table S6b. Proportion of estimates registered in different segments of the number line (AMT Respondents)

| | Remote (0.00-0.14) | Very Unlikely (0.15-0.28) | Unlikely (0.29-0.42) | Even Chance (0.43-0.56) | Likely (0.57-0.71) | Very Likely (0.72-0.85) | Almost Certain (0.86-1.00) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Questions with knowable answers | | | | | | | |
| Qualitative assessments | 1.65 | 4.94 | 15.66 | 19.75 | 29.13 | 19.30 | 9.57 |
| Quantitative assessments | 10.90 | 9.98 | 10.64 | 18.12 | 13.66 | 16.12 | 20.58 |
| | p=0.00 | p=0.00 | p=0.00 | p=0.00 | p=0.00 | p=0.00 | p=0.00 |
| Questions about unknown states of the world | | | | | | | |
| Qualitative assessments | 4.61 | 8.82 | 15.34 | 18.73 | 24.89 | 18.70 | 8.92 |
| Quantitative assessments | 18.49 | 10.96 | 9.49 | 14.78 | 11.35 | 14.21 | 20.71 |
| | p=0.00 | p=0.00 | p=0.00 | p=0.00 | p=0.00 | p=0.00 | p=0.00 |
| Forecasts | | | | | | | |
| Qualitative assessments | 6.73 | 13.81 | 24.84 | 21.48 | 16.97 | 10.19 | 5.99 |
| Quantitative assessments | 29.24 | 14.78 | 11.45 | 16.35 | 8.30 | 8.25 | 11.63 |
| | p=0.00 | p=0.22 | p=0.00 | p=0.00 | p=0.00 | p=0.00 | p=0.00 |


§H. Additional rescoring of probability assessments

In the paper’s main text, we develop a method for scoring the accuracy of qualitative and quantitative probability assessments in equivalent terms. First, we assign each quantitative estimate to the segment of the number line corresponding to a “word of estimative probability.” Then, for every question in the data set, we calculate the mean of the numeric estimates falling within each of these segments. For the purposes of this analysis, we call those means interpolated probabilities. We then replace all numeric estimates with their corresponding interpolated probabilities as well. Otherwise, numeric estimates would demonstrate greater possible variance, and we would not be scoring qualitative and quantitative assessments on a level playing field.
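The interpolation steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the replication code: the segment bounds are taken from the column headers of Tables S6a and S6b, the function names and the sample data are hypothetical, and estimates are assumed to be whole percentages.

```python
from collections import defaultdict

# Segment bounds in whole percentages, copied from the column headers of
# Tables S6a and S6b.
SEGMENTS = [
    ("Remote", 0, 14),
    ("Very Unlikely", 15, 28),
    ("Unlikely", 29, 42),
    ("Even Chance", 43, 56),
    ("Likely", 57, 71),
    ("Very Likely", 72, 85),
    ("Almost Certain", 86, 100),
]


def segment_of(p):
    """Return the segment label for a numeric estimate p in [0, 1].

    Assumption: estimates are whole percentages, so rounding to the
    nearest percentage point does not move an estimate across segments.
    """
    pct = round(p * 100)
    for label, low, high in SEGMENTS:
        if low <= pct <= high:
            return label
    raise ValueError(f"estimate out of range: {p}")


def interpolated_probabilities(numeric_estimates):
    """Mean numeric estimate per (question, segment) pair.

    numeric_estimates is a list of (question_id, probability) tuples.
    """
    buckets = defaultdict(list)
    for question, p in numeric_estimates:
        buckets[(question, segment_of(p))].append(p)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


def rescore(numeric_estimates, table):
    """Replace each numeric estimate with its interpolated probability."""
    return [(q, table[(q, segment_of(p))]) for q, p in numeric_estimates]


# Hypothetical estimates, for illustration only.
estimates = [("Q1", 0.60), ("Q1", 0.70), ("Q1", 0.90), ("Q2", 0.10)]
table = interpolated_probabilities(estimates)
rescored = rescore(estimates, table)
```

In this toy data, the two Q1 estimates in the “Likely” segment are both replaced by their shared mean, so they receive identical scores. A qualitative response of “Likely” to Q1 would be scored with the same value, looked up as `table[("Q1", "Likely")]`. How to handle a segment that contains no numeric estimates for a given question is not covered by this sketch.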

In the main text of the paper, we present results of scoring interpolated probabilities using Brier Scores. But this is only one method of scoring probability estimates. In Table S7, we show how our results are robust to the logarithmic scoring rule, and also to different methods of interpolation. The columns of Table S7 represent the results of scoring estimates according to:

Interpolated probabilities, Brier scoring

Interpolated probabilities, Logarithmic scoring

Words/numbers rounded to the midpoint of each segment, Brier scoring

Qualitative assessments interpreted according to “words of estimative probability” definitions from Mosteller and Youtz (1990); quantitative estimates rounded to the nearest such definition

Probabilities interpolated according to the segments on the 2015 Director of National Intelligence definitions of “words of estimative probability” (see Figure 1 in the main text) instead of the lexicon we used based on National Intelligence Estimates (see footnote 1)

For each approach, and for both our National Security Officials and AMT samples, we present mean respondent scores for quantitative assessments, mean respondent scores for qualitative assessments, the proportional difference between these means, and the statistical significance of this difference according to a two-way t-test. Thus we restrict our sample size to one observation per respondent, based on respondents’ mean scores rather than scoring each assessment as an independent observation. Note that lower scores indicate better assessments under Brier scoring, whereas higher scores indicate better assessments under logarithmic scoring.
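The two scoring rules and the per-respondent averaging described above can be sketched as follows. The functional forms are assumptions: a single-term Brier score, the squared difference between the probability and the 0/1 outcome, and a logarithmic score equal to the natural log of the probability assigned to the realized outcome. These forms are consistent with the magnitudes reported in this supplement, since an estimate of 0.5 scores 0.25 and about -0.693 respectively, but the supplement does not spell them out. All names and data are illustrative.

```python
import math
from collections import defaultdict


def brier_score(p, outcome):
    """Squared error between probability p and the 0/1 outcome (lower is better)."""
    return (p - outcome) ** 2


def log_score(p, outcome):
    """Natural log of the probability given to the realized outcome (higher is better)."""
    p_true = p if outcome == 1 else 1.0 - p
    if p_true == 0:
        return float("-inf")  # full certainty placed on the wrong answer
    return math.log(p_true)


def mean_score_by_respondent(assessments, rule):
    """Collapse scored assessments to one mean score per respondent.

    assessments is a list of (respondent_id, probability, outcome) tuples.
    """
    scores = defaultdict(list)
    for respondent, p, outcome in assessments:
        scores[respondent].append(rule(p, outcome))
    return {r: sum(vals) / len(vals) for r, vals in scores.items()}


# Hypothetical assessments, for illustration only.
assessments = [
    ("r1", 0.80, 1),
    ("r1", 0.50, 0),
    ("r2", 0.95, 0),
    ("r2", 0.65, 1),
]
brier_means = mean_score_by_respondent(assessments, brier_score)
log_means = mean_score_by_respondent(assessments, log_score)
```

The respondent-level means, one per respondent, are the observations that would enter a difference-in-means test. Respondent r2 in the toy data shows why certitude is costly under both rules: a single confident miss at 0.95 dominates that respondent's mean.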

This analysis shows that, regardless of interpolation method, scoring rule, and sample, quantitative assessments are less accurate than qualitative assessments throughout our data.

1 One drawback of the DNI’s recommendation is that it includes round numbers (e.g., 5 percent, 20 percent) as the endpoints of adjacent bins. For the purposes of this analysis, we round these numerical estimates towards 50 percent in all cases – this reduces respondent certitude and thus reduces the chances of confirming our finding that quantitative assessment exacerbates overconfidence.
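The tie-breaking rule in this footnote can be illustrated with a short sketch. The bin edges below are an assumption: they follow the likelihood table in the 2015 Intelligence Community Directive 203 as we understand it and should be checked against Figure 1 in the main text. Clamping estimates of 0 and 100 percent into the outermost bins is also an assumption, and the names are hypothetical.

```python
# Assumed bin edges in whole percentages. Adjacent bins share an endpoint,
# which is the ambiguity the footnote describes.
DNI_BINS = [
    ("Almost no chance", 1, 5),
    ("Very unlikely", 5, 20),
    ("Unlikely", 20, 45),
    ("Roughly even chance", 45, 55),
    ("Likely", 55, 80),
    ("Very likely", 80, 95),
    ("Almost certain", 95, 99),
]


def dni_bin(pct):
    """Assign a whole-percentage estimate to a bin.

    An estimate on a shared endpoint goes to the bin closer to 50 percent.
    """
    pct = min(max(pct, 1), 99)  # assumption: 0 and 100 join the outermost bins
    candidates = [b for b in DNI_BINS if b[1] <= pct <= b[2]]
    # When two bins match, keep the one whose midpoint is nearer to 50.
    label, _, _ = min(candidates, key=lambda b: abs((b[1] + b[2]) / 2 - 50))
    return label


for boundary in (5, 20, 45, 55, 80, 95):
    print(boundary, dni_bin(boundary))
```

Each boundary value lands in the less extreme of its two candidate bins, which is the conservative direction the footnote describes.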


Table S7. Evaluating qualitative/quantitative assessments using different operationalizations

| | Interp. Probabilities, Brier Scoring (main result) | Interp. Probabilities, Logarithmic Scoring | Probs. Rounded to WEP midpoints, Brier Scoring | Probs. Rounded to nearest WEP definition via Mosteller/Youtz | Interp. Probs. Using 2015 DNI Guidelines, Brier Scoring |
| --- | --- | --- | --- | --- | --- |
| National Security Officials (208 respondents) | | | | | |
| Qualitative assessments | 0.230 (.03) | -0.679 (.10) | 0.231 (.03) | 0.243 (.04) | 0.243 (.04) |
| Quantitative assessments | 0.265 (.04) | -0.820 (.15) | 0.261 (.04) | 0.259 (.04) | 0.264 (.04) |
| Proportion diff. | 13% | 17% | 11% | 6% | 8% |
| | p<0.001 | p<0.001 | p<0.001 | p=0.006 | p<0.001 |
| AMT Respondents (1,561 respondents) | | | | | |
| Qualitative assessments | 0.276 (.04) | -0.790 (.11) | 0.273 (.04) | 0.296 (.05) | 0.293 (.05) |
| Quantitative assessments | 0.310 (.06) | -0.934 (.19) | 0.305 (.05) | 0.307 (.05) | 0.309 (.06) |
| Proportion diff. | 11% | 15% | 10% | 4% | 5% |
| | p<0.001 | p<0.001 | p<0.001 | p<0.001 | p<0.001 |

Table S7 presents mean scores (with standard deviations) across different treatment conditions, evaluated according to different scoring rules. Statistical significance reflects two-way t-test for difference in means.
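The proportional differences in Table S7 can be checked against the reported means. The sketch below assumes the difference is taken relative to the quantitative mean, that is, (quantitative - qualitative) / quantitative. This denominator is inferred from the table values rather than stated in the text, and the published figures were presumably computed from unrounded means.

```python
# (column label, qualitative mean, quantitative mean, reported percentage),
# copied from Table S7.
TABLE_S7 = {
    "National Security Officials": [
        ("Interp., Brier", 0.230, 0.265, 13),
        ("Interp., Logarithmic", -0.679, -0.820, 17),
        ("WEP midpoints, Brier", 0.231, 0.261, 11),
        ("Mosteller/Youtz", 0.243, 0.259, 6),
        ("2015 DNI, Brier", 0.243, 0.264, 8),
    ],
    "AMT Respondents": [
        ("Interp., Brier", 0.276, 0.310, 11),
        ("Interp., Logarithmic", -0.790, -0.934, 15),
        ("WEP midpoints, Brier", 0.273, 0.305, 10),
        ("Mosteller/Youtz", 0.296, 0.307, 4),
        ("2015 DNI, Brier", 0.293, 0.309, 5),
    ],
}


def proportion_diff(qualitative, quantitative):
    """Difference between means as a share of the quantitative mean, in percent."""
    return round(100 * (quantitative - qualitative) / quantitative)


for sample, rows in TABLE_S7.items():
    for label, qual, quant, reported in rows:
        print(sample, label, proportion_diff(qual, quant), reported)
```

With this denominator, the rounded means reproduce all ten reported percentages, including the logarithmic column, where both means are negative.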


§I. Replication of Survey Experiment 2 results using logarithmic scoring

Table 2 in the paper predicts variation in probability assessment scores across respondents. Table S8 replicates that analysis, evaluating respondent assessments using logarithmic scoring instead of Brier scoring. Note that higher logarithmic scores indicate better assessment, which is the reverse of how one should interpret Brier Scores. Note also that, in Model 4, the direct relationship between quantitative assessment and judgmental accuracy is positive when we include the interaction term between numeric estimation and certitude.

Table S8. Relationship between quantitative assessment, respondent certitude, and judgmental accuracy

| | National Security Officials: Model 1 | National Security Officials: Model 2 | AMT Respondents: Model 3 | AMT Respondents: Model 4 |
| --- | --- | --- | --- | --- |
| Quantitative Assessment | -0.102 (.02)*** | 0.020 (.02) | -0.051 (.01)*** | 0.025 (.01)*** |
| Certitude | 0.472 (.07)*** | -0.187 (.09)* | -1.317 (.03)*** | -1.104 (.04)*** |
| Quantitative Assessment x Certitude | | -0.489 (.12)*** | | -0.355 (.05)*** |
| Forecast | -0.451 (.07)*** | -0.445 (.07)*** | -0.803 (.03)*** | -0.805 (.03)*** |
| Numeracy | -0.008 (.01) | 0.008 (.01) | 0.012 (2.9e-4)*** | 0.012 (2.9e-3)** |
| Military Service | -0.007 (.02) | 0.006 (.02) | -0.046 (.06) | -0.036 (.06) |
| Female | -1.8e-4 (.02) | 0.001 (.02) | -0.022 (6.7e-3)*** | -0.020 (.01)** |
| U.S. Citizen | -0.045 (.04) | -0.057 (.04) | -0.007 (.03) | -0.004 (.03) |
| English Native Lang. | -0.066 (.04) | 0.076 (.04)* | -0.010 (.03) | -0.010 (.03) |
| Education Level | | | 0.032 (4.9e-3)*** | 0.032 (4.9e-3)*** |
| Constant | -0.212 (.05)*** | -0.274 (.05)*** | -0.536 (.04)*** | -0.580 (.04)*** |
| N | 7,271 | 7,271 | 54,621 | 54,621 |
| R2 | 0.215 | 0.217 | 0.244 | 0.245 |

Table 2 presents ordinary least squares regression predicting Brier scores for individual probability assessments. Standard errors clustered by respondent. Fixed effects for respondents and questions not shown. Note that higher Brier Scores indicate less accurate assessments. *p<.05, **p<.01, ***p<.001.
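To illustrate how the interaction term in Table S8 changes the reading of the quantitative-assessment coefficient, the sketch below computes the implied effect of quantitative assessment at a given level of certitude, using the point estimates from Models 2 and 4. The scale on which certitude is measured is defined in the main text and is not restated here, so the certitude values passed in are hypothetical. The function names are illustrative.

```python
# Point estimates copied from Models 2 and 4 of Table S8.
MODELS = {
    "Model 2 (National Security Officials)": {
        "quant": 0.020,
        "quant_x_certitude": -0.489,
    },
    "Model 4 (AMT Respondents)": {
        "quant": 0.025,
        "quant_x_certitude": -0.355,
    },
}


def quantitative_effect(model, certitude):
    """Implied effect of quantitative assessment on the score at a certitude level."""
    coefs = MODELS[model]
    return coefs["quant"] + coefs["quant_x_certitude"] * certitude


def break_even_certitude(model):
    """Certitude level at which the implied effect changes sign."""
    coefs = MODELS[model]
    return -coefs["quant"] / coefs["quant_x_certitude"]


for name in MODELS:
    print(name, round(break_even_certitude(name), 3))
```

The positive direct coefficient applies only where certitude equals zero. Under these point estimates, the implied effect turns negative once certitude exceeds roughly 0.04 (Model 2) or 0.07 (Model 4) on whatever scale certitude is measured.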
