
8/14/2019 Roc-statistics-primer Better Graphics Etc Rewritten for 2007IEEEconf-Submission Working

DETECTION STATISTICS PRIMER 2007

Bruce L. Rosenberg

    Abstract

Although the terrorist attacks of September 11th, 2001 have initiated dramatic action on a number of fronts, there is a need for continuing measured, patient research and development and scientific test and evaluation in the field of aviation security. Central to aviation security is the use of statistics to evaluate the performance of detection devices. This document briefly explains detection statistics. It follows a logical progression from hypothesis testing in 2x2 tables, through marginal counts in testing of screening devices, to the signal-to-noise probabilities used in determining Receiver Operating Characteristic (ROC) curves. Using two figures, it explains the manner of production of detection and false alarm probability curves and their combination into an ROC curve. Human-behavior names given to the four cells in a 2x2 decision table should aid students' comprehension. A thorough understanding of the statistical concepts herein is essential for the evaluation of the performance of screening devices.

    Introduction

The collection of readings on ROC curves and related issues in human signal detection edited by Swets [1] is the basis for much of the material herein. The purpose of this paper is to explain the underlying rationale for statistical methods used in the analysis of baggage screening device test data. This document provides a consistent terminology for communicating the common concepts underlying different statistical approaches to the same material. A thorough understanding of the material herein will provide the tools to approach real-world analysis problems that often deviate from those in traditional statistics texts.

    The Null Hypothesis

Table I shows the null hypothesis testing approach. In each of the four possible outcome cells, the equivalent behavior in a human decision-maker is given a name (SKEPTIC, etc.). Alpha (α) is the probability of making a Type I error, of saying there is a significant effect when none exists (a BELIEVER's error). A typical value for α in analysis of experimental data is 0.05, or one chance in twenty of saying an effect exists when in fact there is none. Beta (β) is the probability of a Type II error, of saying there is no effect when in reality one exists (a SKEPTIC's error). A typical value for β in analysis of experimental data is 0.10, or one chance in ten of saying there is no effect when in fact one exists. The choice of names in Table I for the four human decision-making behaviors is unique to this paper. It clearly defines the outcomes and should aid students' comprehension.

Frequently, a source of confusion is the logic of the null hypothesis, or H(0). The null hypothesis states that there is no difference between the control and treatment conditions in an experiment. It states that their measured values are the same and a null, or zero, difference exists. If you can reject the null hypothesis (at a given level of alpha significance), then you accept the alternate hypothesis that the treatment condition differs from the control. Negation of a negative becomes a positive.

When determining error probabilities, Leach [2], p. 39, teaches us that the null distribution should be used for determination of the alpha error, whereas the alternate distribution should be used for determination of the beta error. In [3], Navarro states that the hypothesis testing approach is preferred to the confidence band approach because it allows estimating both the alpha and the beta errors.

The term (1-β) gives the statistical power of the test, which is the probability of detecting an effect if one is truly present. It will be shown that (1-β) and α are the probabilities, the z scores of which determine d' for the ROC curve.
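Under the equal-variance Gaussian model used later in this paper, the relationship between the two error probabilities and d' can be sketched in a few lines of Python; this is a minimal illustration using the typical α and β values from the text, not part of the original analysis:

```python
from statistics import NormalDist

norm = NormalDist()  # standard normal distribution, mean 0, sd 1

alpha = 0.05   # Type I error probability (BELIEVER's error)
beta = 0.10    # Type II error probability (SKEPTIC's error)
power = 1 - beta

# z scores of the detection and false-alarm probabilities
z_hit = norm.inv_cdf(power)   # z score of P(d) = 1 - beta
z_fa = norm.inv_cdf(alpha)    # z score of P(fa) = alpha

# d' is the separation of the two distribution means in SD units
d_prime = z_hit - z_fa
print(f"d' = {d_prime:.3f}")  # about 2.926 for these typical values
```

`NormalDist` is in the Python standard library (3.8+), so no statistical package is needed for this small calculation.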

    2x2 Contingency Table With Event Counts

In [1], page 652, Elliott presents notation and a 2x2 table for signal detection analysis. Based on the counts of events falling into the four cells, she defines the probability that the receiver reports a signal when a signal is present (Psn(A), a true positive) and the probability that the receiver reports a signal when only noise is present (Pn(A), a false positive). TABLE II shows an alarm-by-threat contingency table that is consistent with the arrangement in the hypothesis-testing table (TABLE I). The counts in TABLE II are represented as: C(alarm|threat), C(no alarm|threat), C(alarm|no threat), and C(no alarm|no threat). C(alarm|threat) is to be read as "the count of events in which the system alarmed, given a threat was present."

Following [1], op. cit., the probability of a true positive (1-β, TABLE I) is defined as:

Psn(A) = C(alarm|threat)/(C(alarm|threat) + C(no alarm|threat)). (1)

The "sn" in Psn(A) above indicates the presence of signal plus noise.

Further, the probability of a false positive (α in TABLE I) is defined as:

Pn(A) = C(alarm|no threat)/(C(alarm|no threat) + C(no alarm|no threat)). (2)

The "n" in Pn(A) above indicates the presence of noise alone. The two denominators of the above probability definitions are the column totals shown in TABLE II of C(threat) and C(no threat), respectively.
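Equations (1) and (2) can be applied directly to a table of event counts. The sketch below uses made-up illustrative counts, not data from any real screening test:

```python
# Hypothetical counts for the 2x2 alarm-by-threat table (TABLE II);
# the numbers are illustrative only.
c_alarm_threat = 90       # C(alarm|threat), true positives
c_noalarm_threat = 10     # C(no alarm|threat), false negatives
c_alarm_nothreat = 30     # C(alarm|no threat), false positives
c_noalarm_nothreat = 70   # C(no alarm|no threat), true negatives

# Eq. (1): true positive probability; denominator is the column total C(threat)
p_sn = c_alarm_threat / (c_alarm_threat + c_noalarm_threat)

# Eq. (2): false positive probability; denominator is C(no threat)
p_n = c_alarm_nothreat / (c_alarm_nothreat + c_noalarm_nothreat)

print(f"Psn(A) = {p_sn:.2f}, Pn(A) = {p_n:.2f}")  # 0.90 and 0.30
```

Note that each probability is conditioned on its own column total, not on the grand total C(total); that distinction is the point of the marginal-count arrangement in TABLE II.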

    Contingency Table Showing Probabilities

Elliott [1] then writes: "By means of the computed values of Psn(A) and Pn(A), the appropriate value of d' may be read from the table." TABLE III shows probabilities using Elliott's notation. In evaluating the performance of a detection device, all necessary information is contained in the top two cells, Psn(A) and Pn(A), which denote the probabilities of true detections and false alarms, respectively. A detection device's cumulative probability distributions for Psn(A) and Pn(A) are used to compute its ROC curve.

    Receiver Operating Characteristic Curves

ROC theory assumes that the distributions for the null case (alarm with no threat present) and the alternate case (alarm with threat present) are both Gaussian with equal standard deviations. The threat being present (signal plus noise) shifts the mean of the alternate distribution but does not change its spread (variance). The difference between the two distributions is d' (spoken as "d prime") standard deviations. To be consistent with current FAA practice, in the rest of this document, P(d) is used instead of Psn(A) and P(fa) is used instead of Pn(A).

In [4], ROC information was determined by table look-up. To provide more accuracy and flexibility than table look-up, a spreadsheet was programmed to generate idealized probability of detection (P(d) = 1-β), probability of false alarm (P(fa) = α), and d' (the difference between the two). Fig. 1 shows the alternate, P(d), and the null, P(fa), cumulative probability distributions for a d' of 1.0. The curve to the left represents the probability of detection, P(d). The parallel curve to the right represents the probability of false alarm, P(fa). The X-axis shows the normalized Z score in standard deviation units.

Moving to the right along the X-axis is equivalent to increasing the gain of a receiver. As the gain is increased, the probability of detection increases along with the probability of false alarm. The Y-axis shows the detection or false alarm probability. The value of d' for Fig. 1 is 1.0 standard deviation. Three horizontal lines of length corresponding to a d' value of 1.0 standard deviation are drawn between the detection and the false alarm curves to illustrate the fact that d' is a constant for these two curves. Because the curves have identical cumulative Gaussian distributions and equal standard deviations, they differ only as to their means; i.e., they are exactly the same shape and only shifted along the X-axis by the difference between their means.
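The pair of cumulative curves in Fig. 1 can be generated without a spreadsheet. The following is a minimal Python sketch under the same equal-variance Gaussian assumption; the Z grid of -2.5 to 2.5 in 0.5-unit steps mirrors the figure's axis and is otherwise arbitrary:

```python
from statistics import NormalDist

norm = NormalDist()
D_PRIME = 1.0

# Z grid from -2.5 to 2.5 in steps of 0.5, matching Fig. 1's X-axis
zs = [i * 0.5 - 2.5 for i in range(11)]

# P(fa): cumulative of the noise-alone (null) distribution
p_fa = [norm.cdf(z) for z in zs]

# P(d): identical shape, shifted along the Z axis by d'
p_d = [norm.cdf(z + D_PRIME) for z in zs]

# The horizontal gap between the curves is d' everywhere: P(d) reaches
# 0.5 at z = -1.0, while P(fa) reaches 0.5 at z = 0.0.
```

Because both lists come from the same cumulative Gaussian, only shifted, the constant horizontal separation illustrated by the three lines in Fig. 1 falls out automatically.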

Fig. 2 shows ten spreadsheet-generated ROC curves, one for each of ten values of d'. Each ROC curve was derived by replotting the data from pairs of cumulative curves like the pair shown in Fig. 1. Instead of Fig. 1's probability-versus-Z-score axes, Fig. 2 plots detection probability versus false alarm probability. Here is how the replotting works. Fig. 1 shows two vertical arrows on the Z-score = -0.5 grid line pointing to circles on the two curves. The upper arrow points to a Y value of 0.7 on the detection probability curve. The lower arrow points to a Y value of 0.3 on the false alarm probability curve. These two points correspond to a single point on the d' = 1.0 curve of Fig. 2 (shown in bold). This point, (0.3, 0.7), is shown enclosed by a circle. Thus, an ROC curve is the locus of all points defined by corresponding points on the pair of cumulative probability curves. Higher values of d' mean better screening device performance.
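The replotting step above can be sketched as follows, again assuming the equal-variance Gaussian model. The function name `roc_point` is ours, introduced for illustration:

```python
from statistics import NormalDist

norm = NormalDist()

def roc_point(z, d_prime):
    """Return the (P(fa), P(d)) pair for operating point z, pairing the
    two cumulative-curve values at the same Z score as in Fig. 1."""
    return norm.cdf(z), norm.cdf(z + d_prime)

# The worked example from the text: z = -0.5 on the d' = 1.0 curve
p_fa, p_d = roc_point(-0.5, 1.0)
print(f"({p_fa:.1f}, {p_d:.1f})")  # the circled point (0.3, 0.7)

# Sweeping z over a fine grid traces the full d' = 1.0 ROC curve
curve = [roc_point(-2.5 + 0.1 * i, 1.0) for i in range(51)]
```

Each sweep of z produces one locus of (P(fa), P(d)) points; repeating the sweep for each value of d' yields a family of curves like Fig. 2.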

    Conclusion

The collection of real-world event counts is the starting point in the evaluation of screening device performance. This primer explains the logical progression from hypothesis testing, to event counts, to computation of probabilities, to production of ROC curves. Throughout this process, one must take care to avoid errors and inappropriate statistical techniques. Rarely do real-world detection and false alarm data have Gaussian distributions and equal variances. There are ways to perform ROC curve analyses in the presence of deviations from the ideal. Many statistical techniques are available to the investigator. The universal use of personal computers and the availability of a variety of statistical packages reduce the drudgery of computation but require increasing sophistication on the part of the analyst. It is hoped that this primer on the statistics involved in analyzing the performance of screening devices will help further the cause of a secure aviation system.


TABLE I. Hypothesis Testing 2 x 2 Table

                            In reality, an effect       In reality, no effect
                            exists (reject H(0));       exists (accept H(0));
                            for probabilities, use      for probabilities, use
                            the alternate hypothesis    the null hypothesis
                            distribution                distribution

The judgment is that        PROVER correctly            BELIEVER wrongly
an effect exists            affirms:                    affirms:
                            1-β (power)                 α error (typical α = 0.05)

The judgment is that        SKEPTIC wrongly             DISPROVER correctly
no effect exists            denies:                     denies:
                            β error (typical β = 0.10)  1-α

TABLE II. Alarm-By-Threat Event Counts

                       In reality, a           In reality, no          Row Totals
                       threat is present       threat is present

The system             True Positive           False Positive          C(alarm)
alarms                 C(alarm|threat)         C(alarm|no threat)

The system             False Negative          True Negative           C(no alarm)
does not alarm         C(no alarm|threat)      C(no alarm|no threat)

Column Totals          C(threat)               C(no threat)            C(total)


TABLE III. Alarm-By-Threat Probabilities

                       In reality, a           In reality, no
                       threat is present       threat is present

The system             True Positive           False Positive
alarms                 Psn(A) = 1-β            Pn(A) = α

The system             False Negative          True Negative
does not alarm         Psn(CA) = β             Pn(CA) = 1-α


Fig. 1. Cumulative Gaussian Distributions, d' = 1.0. [Figure: X-axis, Z score (standard deviation units), from -2.5 to 2.5; Y-axis, probability of detection and false alarm, from 0 to 1; two parallel curves, P(d) and P(fa), separated horizontally by d' = 1.0.]

Fig. 2. Idealized ROC Curves for d' = 0.1 to 3.0. [Figure: X-axis, probability of false alarm, P(fa); Y-axis, probability of detection, P(d); ten curves, one for each d' value: 0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.5, 2.0, 2.5, 3.0.]


    References

[1] P. B. Elliott, "Appendix 1 - Tables of d' (d prime)," in Signal Detection and Recognition by Human Observers: Contemporary Readings, J. Swets, Ed., John Wiley & Sons, NY, 1964, pp. 651-658.

[2] C. Leach, Introduction to Statistics: A Nonparametric Approach for the Social Sciences, John Wiley & Sons, NY, 1979.

[3] J. Navarro, D. Becker, B. Kenna, and C. Kossack, "A general protocol for operational testing and evaluation of bulk explosive systems," in Proceedings, 1st International Symp. on Explosive Detection Technology, November 13-15, 1991, DOT/FAA/CT-92/11, May 1992, pp. 347-367.

[4] T. McGhee and J. Connelly, "Developmental test and evaluation of three commercial x-ray explosives detection devices," Final Report DOT/FAA/AR-97/12,I, June 1997.

    About The Author

Bruce L. Rosenberg has a Master's degree in experimental psychology and a Bachelor's degree in psychology with minors in statistics and mathematics, and electronics training in the US Air Force. He is a Life Member of the Institute of Electrical and Electronics Engineers (IEEE). He has a strong technical background in the testing of advanced electronic systems, statistics, programming, and electronic circuit design. He held a Senior Test Engineer position supporting Federal Aviation Administration (FAA) Aviation Security Laboratory projects (now under the Homeland Security Administration). From 1969 to 1995, he served as Senior Engineering Research Psychologist at the FAA W. J. Hughes Technical Center, Atlantic City Airport, Pomona, NJ. He performed over 40 T&E and R&D studies, designed over 20 test and evaluation protocols (including questionnaires, surveys, and debriefings), coded over 10 major software applications for system testing and data analysis, authored over 100 technical reports, taught 14 college-level courses, and patented 4 inventions.
