
8/14/2019 Roc-statistics-primer Better Graphics Etc Rewritten for 2007IEEEconf-Submission Working

DETECTION STATISTICS PRIMER 2007

Bruce L. Rosenberg

    Abstract

Although the terrorist attacks of September 11th, 2001 have initiated dramatic action on a number of fronts, there is a need for continuing measured, patient research and development and scientific test and evaluation in the field of aviation security. Central to aviation security is the use of statistics to evaluate the performance of detection devices. This document briefly explains detection statistics. It follows a logical progression from hypothesis testing in 2x2 tables, through marginal counts in testing of screening devices, to the signal-to-noise probabilities used in determining Receiver Operating Characteristic (ROC) curves. Using two figures, it explains the manner of production of detection and false alarm probability curves and their combination into an ROC curve. Human-behavior names given to the four cells in a 2x2 decision table should aid students' comprehension. A thorough understanding of the statistical concepts herein is essential for the evaluation of the performance of screening devices.

    Introduction

The collection of readings on ROC curves and related issues in human signal detection edited by Swets [1] is the basis for much of the material herein. The purpose of this paper is to explain the underlying rationale for statistical methods used in the analysis of baggage screening device test data. This document provides a consistent terminology for communicating the common concepts underlying different statistical approaches to the same material. A thorough understanding of the material herein will provide the tools to approach real-world analysis problems that often deviate from those in traditional statistics texts.

    The Null Hypothesis

Table I shows the null hypothesis testing approach. In each of the four possible outcome cells, the equivalent behavior in a human decision-maker is given a name (SKEPTIC, etc.). Alpha (α) is the probability of making a Type I error, of saying there is a significant effect when none exists (a BELIEVER's error). A typical value for α in analysis of experimental data is 0.05, or one chance in twenty of saying an effect exists when in fact there is none. Beta (β) is the probability of a Type II error, of saying there is no effect when in reality one exists (a SKEPTIC's error). A typical value for β in analysis of experimental data is 0.10, or one chance in ten of saying there is no effect when in fact one exists. The choice of names in Table I for the four human decision-making behaviors is unique to this paper. It clearly defines the outcomes and should aid students' comprehension.

Frequently, a source of confusion is the logic of the null hypothesis, or H(0). The null hypothesis states that there is no difference between the control and treatment conditions in an experiment. It states that their measured values are the same and a null, or zero, difference exists. If you can reject the null hypothesis (at a given level of alpha significance), then you accept the alternate hypothesis that the treatment condition differs from the control. Negation of a negative becomes a positive.

When determining error probabilities, Leach [2], p. 39, teaches us that the null distribution should be used for determination of the alpha error, whereas the alternate distribution should be used for determination of the beta error. In [3], Navarro states that the hypothesis testing approach is preferred to the confidence band approach because it allows estimating both the alpha and the beta errors.

The term (1-β) gives the statistical power of the test, which is the probability of detecting an effect if one is truly present. It will be shown that (1-β) and α are the probabilities, the z scores of which determine d' for the ROC curve.
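Under the equal-variance Gaussian model used later in this paper, the relationship between the two error probabilities and d' can be sketched in a few lines of Python; this is a minimal illustration using the typical α and β values from the text, not part of the original analysis:

```python
from statistics import NormalDist

norm = NormalDist()  # standard normal distribution, mean 0, sd 1

alpha = 0.05   # Type I error probability (BELIEVER's error)
beta = 0.10    # Type II error probability (SKEPTIC's error)
power = 1 - beta

# z scores of the detection and false-alarm probabilities
z_hit = norm.inv_cdf(power)   # z score of P(d) = 1 - beta
z_fa = norm.inv_cdf(alpha)    # z score of P(fa) = alpha

# d' is the separation of the two distribution means in SD units
d_prime = z_hit - z_fa
print(f"d' = {d_prime:.3f}")  # about 2.926 for these typical values
```

`NormalDist` is in the Python standard library (3.8+), so no statistical package is needed for this small calculation.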

    2x2 Contingency Table With Event Counts

In [1], page 652, Elliott presents notation and a 2x2 table for signal detection analysis. Based on the counts of events falling into the four cells, she defines the probability that the receiver reports a signal when a signal is present (Psn(A), a true positive) and the probability that the receiver reports a signal when only noise is present (Pn(A), a false positive). TABLE II shows an alarm-by-threat contingency table that is consistent with the arrangement in the hypothesis-testing table (TABLE I). The counts in TABLE II are represented as: C(alarm|threat), C(no alarm|threat), C(alarm|no threat), and C(no alarm|no threat). C(alarm|threat) is to be read as "the count of events in which the system alarmed, given a threat was present."

Following [1], op. cit., the probability of a true positive (1-β, TABLE I) is defined as:

Psn(A) = C(alarm|threat)/(C(alarm|threat) + C(no alarm|threat)). (1)

The "sn" in Psn(A) above indicates the presence of signal plus noise.

Further, the probability of a false positive (α in TABLE I) is defined as:

Pn(A) = C(alarm|no threat)/(C(alarm|no threat) + C(no alarm|no threat)). (2)

The "n" in Pn(A) above indicates the presence of noise alone. The two denominators of the above probability definitions are the column totals shown in TABLE II of C(threat) and C(no threat), respectively.
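Equations (1) and (2) can be applied directly to a table of event counts. The sketch below uses made-up illustrative counts, not data from any real screening test:

```python
# Hypothetical counts for the 2x2 alarm-by-threat table (TABLE II);
# the numbers are illustrative only.
c_alarm_threat = 90       # C(alarm|threat), true positives
c_noalarm_threat = 10     # C(no alarm|threat), false negatives
c_alarm_nothreat = 30     # C(alarm|no threat), false positives
c_noalarm_nothreat = 70   # C(no alarm|no threat), true negatives

# Eq. (1): true positive probability; denominator is the column total C(threat)
p_sn = c_alarm_threat / (c_alarm_threat + c_noalarm_threat)

# Eq. (2): false positive probability; denominator is C(no threat)
p_n = c_alarm_nothreat / (c_alarm_nothreat + c_noalarm_nothreat)

print(f"Psn(A) = {p_sn:.2f}, Pn(A) = {p_n:.2f}")  # 0.90 and 0.30
```

Note that each probability is conditioned on its own column total, not on the grand total C(total); that distinction is the point of the marginal-count arrangement in TABLE II.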

    Contingency Table Showing Probabilities

Elliott [1] then writes: "By means of the computed values of Psn(A) and Pn(A), the appropriate value of d' may be read from the table." TABLE III shows probabilities using Elliott's notation. In evaluating the performance of a detection device, all necessary information is contained in the top two cells, Psn(A) and Pn(A), which denote the probabilities of true detections and false alarms, respectively. A detection device's cumulative probability distributions for Psn(A) and Pn(A) are used to compute its ROC curve.

    Receiver Operating Characteristic Curves

ROC theory assumes that the distributions for the null case (alarm with no threat present) and the alternate case (alarm with threat present) are both Gaussian with equal standard deviations. The threat being present (signal plus noise) shifts the mean of the alternate distribution but does not change its spread (variance). The difference between the two distributions is d' (spoken as "d prime") standard deviations. To be consistent with current FAA practice, in the rest of this document, P(d) is used instead of Psn(A) and P(fa) is used instead of Pn(A).

In [4], ROC information was determined by table look-up. To provide more accuracy and flexibility than table look-up, a spreadsheet was programmed to generate idealized probability of detection (P(d) = 1-β), probability of false alarm (P(fa) = α), and d' (the difference between the two). Fig. 1 shows the alternate, P(d), and the null, P(fa), cumulative probability distributions for a d' of 1.0. The curve to the left represents the probability of detection, P(d). The parallel curve to the right represents the probability of false alarm, P(fa). The X-axis shows the normalized Z score in standard deviation units.

Moving to the right along the X-axis is equivalent to increasing the gain of a receiver. As the gain is increased, the probability of detection increases along with the probability of false alarm. The Y-axis shows the detection or false alarm probability. The value of d' for Fig. 1 is 1.0 standard deviation. Three horizontal lines of length corresponding to a d' value of 1.0 standard deviation are drawn between the detection and the false alarm curves to illustrate the fact that d' is a constant for these two curves. Because the curves have identical cumulative Gaussian distributions and equal standard deviations, they differ only as to their means; i.e., they are exactly the same shape and only shifted along the X-axis by the difference between their means.
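The pair of cumulative curves in Fig. 1 can be generated without a spreadsheet. The following is a minimal Python sketch under the same equal-variance Gaussian assumption; the Z grid of -2.5 to 2.5 in 0.5-unit steps mirrors the figure's axis and is otherwise arbitrary:

```python
from statistics import NormalDist

norm = NormalDist()
D_PRIME = 1.0

# Z grid from -2.5 to 2.5 in steps of 0.5, matching Fig. 1's X-axis
zs = [i * 0.5 - 2.5 for i in range(11)]

# P(fa): cumulative of the noise-alone (null) distribution
p_fa = [norm.cdf(z) for z in zs]

# P(d): identical shape, shifted along the Z axis by d'
p_d = [norm.cdf(z + D_PRIME) for z in zs]

# The horizontal gap between the curves is d' everywhere: P(d) reaches
# 0.5 at z = -1.0, while P(fa) reaches 0.5 at z = 0.0.
```

Because both lists come from the same cumulative Gaussian, only shifted, the constant horizontal separation illustrated by the three lines in Fig. 1 falls out automatically.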

Fig. 2 shows ten spreadsheet-generated ROC curves, one for each of ten values of d'. Each ROC curve was derived by replotting the data from pairs of cumulative curves like the pair shown in Fig. 1. Instead of Fig. 1's probability-versus-Z-score axes, Fig. 2 plots detection probability versus false alarm probability. Here is how the replotting works. Fig. 1 shows two vertical arrows on the Z-score = -0.5 grid line pointing to circles on the two curves. The upper arrow points to a Y value of 0.7 on the detection probability curve. The lower arrow points to a Y value of 0.3 on the false alarm probability curve. These two points correspond to a single point on the d' = 1.0 curve of Fig. 2 (shown in bold). This point, (0.3, 0.7), is shown enclosed by a circle. Thus, an ROC curve is the locus of all points defined by corresponding points on the pair of cumulative probability curves. Higher values of d' mean better screening device performance.
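The replotting step above can be sketched as follows, again assuming the equal-variance Gaussian model. The function name `roc_point` is ours, introduced for illustration:

```python
from statistics import NormalDist

norm = NormalDist()

def roc_point(z, d_prime):
    """Return the (P(fa), P(d)) pair for operating point z, pairing the
    two cumulative-curve values at the same Z score as in Fig. 1."""
    return norm.cdf(z), norm.cdf(z + d_prime)

# The worked example from the text: z = -0.5 on the d' = 1.0 curve
p_fa, p_d = roc_point(-0.5, 1.0)
print(f"({p_fa:.1f}, {p_d:.1f})")  # the circled point (0.3, 0.7)

# Sweeping z over a fine grid traces the full d' = 1.0 ROC curve
curve = [roc_point(-2.5 + 0.1 * i, 1.0) for i in range(51)]
```

Each sweep of z produces one locus of (P(fa), P(d)) points; repeating the sweep for each value of d' yields a family of curves like Fig. 2.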

    Conclusion

The collection of real-world event counts is the starting point in the evaluation of screening device performance. This primer explains the logical progression from hypothesis testing, to event counts, to computation of probabilities, to production of ROC curves. Throughout this process, one must take care to avoid errors and inappropriate statistical techniques. Rarely do real-world detection and false alarm data have Gaussian distributions and equal variances. There are ways to perform ROC curve analyses in the presence of deviations from the ideal. Many statistical techniques are available to the investigator. The universal use of personal computers and the availability of a variety of statistical packages reduce the drudgery of computation but require increasing sophistication on the part of the analyst. It is hoped that this primer on the statistics involved in analyzing the performance of screening devices will help further the cause of a secure aviation system.


TABLE I. Hypothesis Testing 2 x 2 Table

                            In reality, an effect       In reality, no effect
                            exists (reject H(0));       exists (accept H(0));
                            for probabilities, use      for probabilities, use
                            the alternate hypothesis    the null hypothesis
                            distribution                distribution

The judgment is that        PROVER correctly            BELIEVER wrongly
an effect exists            affirms:                    affirms:
                            1-β (power)                 α error (typical α = 0.05)

The judgment is that        SKEPTIC wrongly             DISPROVER correctly
no effect exists            denies:                     denies:
                            β error (typical β = 0.10)  1-α

TABLE II. Alarm-By-Threat Event Counts

                       In reality, a           In reality, no          Row Totals
                       threat is present       threat is present

The system             True Positive           False Positive          C(alarm)
alarms                 C(alarm|threat)         C(alarm|no threat)

The system             False Negative          True Negative           C(no alarm)
does not alarm         C(no alarm|threat)      C(no alarm|no threat)

Column Totals          C(threat)               C(no threat)            C(total)


TABLE III. Alarm-By-Threat Probabilities

                       In reality, a           In reality, no
                       threat is present       threat is present

The system             True Positive           False Positive
alarms                 Psn(A) = 1-β            Pn(A) = α

The system             False Negative          True Negative
does not alarm         Psn(CA) = β             Pn(CA) = 1-α


Fig. 1. Cumulative Gaussian Distributions, d' = 1.0. [Figure: X-axis, Z score (standard deviation units), from -2.5 to 2.5; Y-axis, probability of detection and false alarm, from 0 to 1; two parallel curves, P(d) and P(fa), separated horizontally by d' = 1.0.]

Fig. 2. Idealized ROC Curves for d' = 0.1 to 3.0. [Figure: X-axis, probability of false alarm, P(fa); Y-axis, probability of detection, P(d); ten curves, one for each d' value: 0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.5, 2.0, 2.5, 3.0.]


    References

[1] P. B. Elliott, "Appendix 1 - Tables of d' (d prime)," in Signal Detection and Recognition by Human Observers: Contemporary Readings, J. Swets, Ed., John Wiley & Sons, NY, 1964, pp. 651-658.

[2] C. Leach, Introduction to Statistics: A Nonparametric Approach for the Social Sciences, John Wiley & Sons, NY, 1979.

[3] J. Navarro, D. Becker, B. Kenna, and C. Kossack, "A general protocol for operational testing and evaluation of bulk explosive systems," in Proceedings, 1st International Symp. on Explosive Detection Technology, November 13-15, 1991, DOT/FAA/CT-92/11, May 1992, pp. 347-367.

[4] T. McGhee and J. Connelly, "Developmental test and evaluation of three commercial x-ray explosives detection devices," Final Report DOT/FAA/AR-97/12,I, June 1997.

    About The Author

Bruce L. Rosenberg has a Master's degree in experimental psychology and a Bachelor's degree in psychology with minors in statistics and mathematics, and electronics training in the US Air Force. He is a Life Member of the Institute of Electrical and Electronics Engineers (IEEE). He has a strong technical background in the testing of advanced electronic systems, statistics, programming, and electronic circuit design. He held a Senior Test Engineer position supporting Federal Aviation Administration (FAA) Aviation Security Laboratory projects (now under the Homeland Security Administration). From 1969 to 1995, he served as Senior Engineering Research Psychologist at the FAA W. J. Hughes Technical Center, Atlantic City Airport, Pomona, NJ. He performed over 40 T&E and R&D studies, designed over 20 test and evaluation protocols (including questionnaires, surveys, and debriefings), coded over 10 major software applications for system testing and data analysis, authored over 100 technical reports, taught 14 college-level courses, and patented 4 inventions.
