chapter 2. binary and m-ary hypothesis testing 2.1 ...gencheng/signaldetectionestimation/chap... ·...

Signal Detection and Estimation

Chapter 2. Binary and M-ary Hypothesis Testing

2.1 Introduction (Levy 2.1)

Detection problems can usually be cast as binary or M-ary hypothesis testing problems.

Applications:

This chapter: simple hypothesis testing problems, where the probability distribution of the observations under each hypothesis is assumed to be known exactly.

Example:

Composite hypothesis testing: problems involving unknown parameters (Chapter 4).

Example:



Objectives:
1. Design testing rules that are optimal in some appropriate sense based on the amount of information available.
2. Analyze the performance of the test.

Structure of this chapter:
2.2 Binary Hypothesis Testing Problem Formulation
    * Problem modeling, notation, performance measure, etc.
2.3 Bayesian Test
    * A-priori probabilities known
    * Cost structure known
2.4 Minimax Test
    * A-priori probabilities unknown
    * Cost structure known
2.5 Neyman-Pearson Test
    * A-priori probabilities unknown
    * Cost structure unknown
    * Receiver operating characteristic (ROC)
2.6 Gaussian Detection
2.7 M-ary Hypothesis Testing



2.2 Binary Hypothesis Testing Problem Formulation (Levy 2.2 and 2.4)

Binary hypothesis testing is the problem of deciding between two hypotheses based on a random observation.

Model contains:

1. Hypothesis and a-priori probability

2. Observation

3. Connection between hypotheses and observation

4. Decision function

5. Performance measure



1. Hypotheses and a-priori probabilities
Hypotheses: $H_0$ and $H_1$, with $\pi_0 = P(H_0)$ and $\pi_1 = P(H_1) = 1 - \pi_0$.

2. Observation
Random vector $Y$ with sample space $\mathcal{Y}$. An observation is a sample vector $y$ of $Y$.

3. Connection between hypotheses and observation: the distributions of $Y$ under $H_0$ and $H_1$.

For continuous $Y$, the PDF under each hypothesis:
$$H_0: Y \sim f_Y(y|H_0), \qquad H_1: Y \sim f_Y(y|H_1).$$

For discrete $Y$, the PMF under each hypothesis:
$$H_0: P(Y = y|H_0) = p(y|H_0), \qquad H_1: P(Y = y|H_1) = p(y|H_1).$$

These distributions are assumed to be known in this chapter.



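4. Decision function: decide whether $H_0$ or $H_1$ is true given an observation.

A map from $\mathcal{Y}$ to $\{0, 1\}$:
$$\delta(y) = \begin{cases} 1 & \text{if decide on } H_1 \\ 0 & \text{if decide on } H_0. \end{cases}$$

Decision regions $\mathcal{Y}_0$ and $\mathcal{Y}_1$:
$$\mathcal{Y}_0 \triangleq \{y \mid \delta(y) = 0\}, \qquad \mathcal{Y}_1 \triangleq \{y \mid \delta(y) = 1\}.$$

We have $\mathcal{Y}_0 \cap \mathcal{Y}_1 = \emptyset$ and $\mathcal{Y}_0 \cup \mathcal{Y}_1 = \mathcal{Y}$. A decision function is therefore equivalent to a partition of the sample space of $Y$.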

Examples on decision rules:



Goal: obtain the decision rule which is "optimal" (in some sense).

5. An optimality/performance measure.

Bayesian formulation: All uncertainties are quantifiable. The costs and benefits of outcomes can be measured.

Cost function: Cij for i = 0, 1 and j = 0, 1 is the cost of deciding on Hi when Hj holds. The value of Cij depends on the application and the nature of the problem.

Examples on cost function:

Assumption: Making a correct decision is always less costly than making a mistake, i.e., C00 < C10 and C11 < C01.



False alarm: $H_0$ is true but $H_1$ is decided (Type I error).
Detection: $H_1$ is true and $H_1$ is decided.
Miss: $H_1$ is true but $H_0$ is decided (Type II error).

Probability of detection: $P_D(\delta) = P(\mathcal{Y}_1|H_1)$.
Probability of false alarm: $P_F(\delta) = P(\mathcal{Y}_1|H_0)$.
Probability of miss: $P_M(\delta) = P(\mathcal{Y}_0|H_1) = 1 - P_D(\delta)$.

Bayes risk (the expected cost), expressed in terms of $P_F$ and $P_D$:
$$R(\delta) = \pi_0 C_{00} + \pi_0 (C_{10} - C_{00}) P_F(\delta) + \pi_1 C_{01} + \pi_1 (C_{11} - C_{01}) P_D(\delta).$$

Ideally, we want $P_D(\delta) \to 1$ and $P_F(\delta) \to 0$.

Receiver operating characteristic (ROC): the upper boundary between the achievable and unachievable regions in the $(P_F, P_D)$ square.
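As a hedged numerical illustration of these definitions, the sketch below evaluates PF, PD and the Bayes risk of a simple threshold rule. The scalar Gaussian model (H0: Y ~ N(0, 1), H1: Y ~ N(m, 1)), the threshold `eta`, the uniform cost matrix, and all function names are assumptions chosen for this example and are not taken from the notes.

```python
from statistics import NormalDist

def error_probabilities(eta, m=1.0):
    """Return (P_F, P_D) for the rule 'decide H1 if y >= eta'.

    Assumed model: H0: Y ~ N(0, 1), H1: Y ~ N(m, 1).
    """
    p_f = 1.0 - NormalDist(0.0, 1.0).cdf(eta)  # P(Y in Y1 | H0)
    p_d = 1.0 - NormalDist(m, 1.0).cdf(eta)    # P(Y in Y1 | H1)
    return p_f, p_d

def bayes_risk(p_f, p_d, pi0, cost):
    """Bayes risk; cost[i][j] is the cost of deciding H_i when H_j holds."""
    pi1 = 1.0 - pi0
    return (pi0 * cost[0][0] + pi0 * (cost[1][0] - cost[0][0]) * p_f
            + pi1 * cost[0][1] + pi1 * (cost[1][1] - cost[0][1]) * p_d)

# Assumed costs: 0 for a correct decision, 1 for either kind of error.
uniform_cost = [[0.0, 1.0], [1.0, 0.0]]
p_f, p_d = error_probabilities(eta=0.5, m=1.0)
risk = bayes_risk(p_f, p_d, pi0=0.5, cost=uniform_cost)
p_error = 0.5 * p_f + 0.5 * (1.0 - p_d)  # pi0 * P_F + pi1 * P_M
print(p_f, p_d, risk)
```

With these costs the risk reduces to the probability of error, which the last lines check by computing it directly from PF and PM.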



2.3 Bayesian Testing (Levy 2.2)

Assume:
1. A-priori probabilities (π0, π1) are known.
2. Cost structure (Cij) is known.

Find the optimal decision δ that minimizes Bayes risk, R(δ).

Note: Distribution of Y under each hypothesis is also known.



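Define the likelihood ratio as
$$L(y) \triangleq \frac{f(y|H_1)}{f(y|H_0)}.$$

The optimal decision rule is the likelihood ratio test (LRT):
$$L(y) \underset{H_0}{\overset{H_1}{\gtrless}} \tau \triangleq \frac{(C_{10} - C_{00})\pi_0}{(C_{01} - C_{11})\pi_1} \iff \delta_B(y) = \begin{cases} 1 & \text{if } L(y) \ge \tau \\ 0 & \text{if } L(y) < \tau. \end{cases}$$

For discrete $Y$, similarly, the optimal decision rule is
$$L(y) \triangleq \frac{P(y|H_1)}{P(y|H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \tau,$$
which is again an LRT.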
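The Bayesian rule can be sketched in a few lines of code. This is a minimal illustration under an assumed scalar Gaussian model (H0: Y ~ N(0, 1), H1: Y ~ N(m, 1)), for which L(y) = exp(m y - m^2/2); the model and the function names `bayes_threshold`, `likelihood_ratio` and `bayes_decision` are hypothetical and not part of the notes.

```python
import math

def bayes_threshold(pi0, cost):
    """tau = (C10 - C00) * pi0 / ((C01 - C11) * pi1); cost[i][j] = C_ij."""
    pi1 = 1.0 - pi0
    return (cost[1][0] - cost[0][0]) * pi0 / ((cost[0][1] - cost[1][1]) * pi1)

def likelihood_ratio(y, m=1.0):
    """L(y) = f(y|H1) / f(y|H0) for the assumed N(m, 1) versus N(0, 1) model."""
    return math.exp(m * y - 0.5 * m * m)

def bayes_decision(y, pi0, cost, m=1.0):
    """Return 1 (decide H1) if L(y) >= tau, else 0 (decide H0)."""
    return 1 if likelihood_ratio(y, m) >= bayes_threshold(pi0, cost) else 0

uniform_cost = [[0.0, 1.0], [1.0, 0.0]]  # assumed costs for the example
print(bayes_decision(0.6, pi0=0.5, cost=uniform_cost))
print(bayes_decision(1.5, pi0=0.8, cost=uniform_cost))
```

Raising π0 raises τ, so the same observation needs stronger evidence before H1 is declared.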



Maximum A-Posteriori (MAP) Rule:

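Consider the following cost structure: $C_{00} = C_{11} = 0$ and $C_{01} = C_{10} = 1$, i.e., $C_{ij} = 1 - \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta function
$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j. \end{cases}$$

Every error incurs a unit cost and a correct decision costs nothing, so minimizing the Bayes risk becomes minimizing the probability of error.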

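The LRT becomes:
$$L(y) = \frac{f(y|H_1)}{f(y|H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{(C_{10} - C_{00})\pi_0}{(C_{01} - C_{11})\pi_1} = \frac{\pi_0}{\pi_1}$$
$$\iff \pi_1 f(y|H_1) \underset{H_0}{\overset{H_1}{\gtrless}} \pi_0 f(y|H_0)$$
$$\iff P(H_1|Y = y) \underset{H_0}{\overset{H_1}{\gtrless}} P(H_0|Y = y),$$
where the last step divides both sides by $f(y)$ and applies Bayes' rule.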

Choose the hypothesis with the larger a-posteriori probability.



Maximum Likelihood Rule:

If, furthermore, the hypotheses are equally probable, $\pi_0 = \pi_1 = 1/2$, the LRT becomes
$$f(y|H_1) \underset{H_0}{\overset{H_1}{\gtrless}} f(y|H_0).$$

Choose the hypothesis with the larger likelihood function value.
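The MAP and ML rules can be illustrated by comparing weighted likelihoods directly. The sketch below assumes a scalar Gaussian model (H0: Y ~ N(0, 1), H1: Y ~ N(m, 1)); the model and the names `gaussian_pdf`, `posterior_h1`, `map_decision` and `ml_decision` are assumptions made for this example only.

```python
import math

def gaussian_pdf(y, mean):
    """Density of N(mean, 1) at y."""
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)

def posterior_h1(y, pi0, m=1.0):
    """P(H1 | Y = y) by Bayes' rule under the assumed model."""
    num = (1.0 - pi0) * gaussian_pdf(y, m)
    return num / (num + pi0 * gaussian_pdf(y, 0.0))

def map_decision(y, pi0, m=1.0):
    """Decide H1 when pi1 * f(y|H1) >= pi0 * f(y|H0)."""
    return 1 if (1.0 - pi0) * gaussian_pdf(y, m) >= pi0 * gaussian_pdf(y, 0.0) else 0

def ml_decision(y, m=1.0):
    """ML rule: the MAP rule with equal priors."""
    return map_decision(y, 0.5, m)

y = 0.6
print(ml_decision(y), map_decision(y, pi0=0.8), posterior_h1(y, pi0=0.8))
```

For the same observation, the ML rule and the MAP rule can disagree when the priors are unequal: the prior shifts the posterior even though the likelihoods are unchanged.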

2.3.2 Examples



2.3.3 Asymptotic Performance of LRT (Levy 3.2)

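For a binary hypothesis testing problem, let $Y_1, Y_2, \ldots, Y_N$ be a sequence of i.i.d. random observations with $Y_k \in \mathbb{R}^n$.

Assume that $Y$ is continuous and let
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{bmatrix}.$$

LRT:
$$L(y) = \frac{f(y|H_1)}{f(y|H_0)} = \prod_{k=1}^{N} \frac{f(y_k|H_1)}{f(y_k|H_0)} = \prod_{k=1}^{N} L(y_k) \underset{H_0}{\overset{H_1}{\gtrless}} \tau(N)$$
$$\iff \frac{1}{N} \sum_{k=1}^{N} \ln L(y_k) \underset{H_0}{\overset{H_1}{\gtrless}} \frac{1}{N} \ln \tau(N) \triangleq \gamma(N).$$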



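Let $Z_k \triangleq \ln L(Y_k) = \ln \frac{f(Y_k|H_1)}{f(Y_k|H_0)}$ and $S_N \triangleq \frac{1}{N} \sum_{k=1}^{N} Z_k$.

The LRT becomes:
$$S_N \underset{H_0}{\overset{H_1}{\gtrless}} \gamma(N).$$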

Notice that Zk’s are i.i.d. and SN is the sample mean of Zk’s.
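The equivalence between the product form of the LRT and the sample-mean form can be checked numerically. The following sketch assumes scalar Gaussian observations (H0: N(0, 1), H1: N(m, 1)), for which ln L(y) = m y - m^2/2; the model, the sample values and the function names are illustrative assumptions.

```python
import math

def log_likelihood_ratio(y, m=1.0):
    """Z = ln L(y) for the assumed model H0: N(0, 1) versus H1: N(m, 1)."""
    return m * y - 0.5 * m * m

def s_n(ys, m=1.0):
    """Sample mean S_N of the log-likelihood ratios Z_k."""
    return sum(log_likelihood_ratio(y, m) for y in ys) / len(ys)

def lrt_sample_mean(ys, tau, m=1.0):
    """Compare S_N with gamma(N) = ln(tau) / N."""
    gamma = math.log(tau) / len(ys)
    return 1 if s_n(ys, m) >= gamma else 0

def lrt_product(ys, tau, m=1.0):
    """Compare the product of per-sample likelihood ratios with tau."""
    product = math.prod(math.exp(log_likelihood_ratio(y, m)) for y in ys)
    return 1 if product >= tau else 0

ys = [0.2, 1.1, 0.9, 1.4]  # made-up observations
for tau in (0.5, 1.0, 10.0):
    print(tau, lrt_sample_mean(ys, tau), lrt_product(ys, tau))
```

Working with the sum of logarithms rather than the product also avoids numerical underflow or overflow when N is large.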

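When $N \to \infty$, by the strong law of large numbers,
$$H_1: S_N \xrightarrow{\text{a.s.}} E[Z_k|H_1] = \int \ln \frac{f(y|H_1)}{f(y|H_0)} \, f(y|H_1) \, dy,$$
$$H_0: S_N \xrightarrow{\text{a.s.}} E[Z_k|H_0] = \int \ln \frac{f(y|H_1)}{f(y|H_0)} \, f(y|H_0) \, dy.$$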

Def. For two PDFs $f$ and $g$, the Kullback-Leibler (KL) divergence is
$$D(f|g) = \int f(x) \ln \frac{f(x)}{g(x)} \, dx.$$

It is a natural notion of distance between probability distributions, but it is not a true distance metric.



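Properties:
1. $D(f|g) \ge 0$, with equality if and only if $f = g$.
2. Non-symmetric: in general, $D(f|g) \ne D(g|f)$.
3. It does not satisfy the triangle inequality.

Let $f_0(y) \triangleq f(y|H_0)$ and $f_1(y) \triangleq f(y|H_1)$, with $f_0 \ne f_1$. When $N \to \infty$,
$$H_1: S_N \xrightarrow{\text{a.s.}} \int \ln \frac{f_1(y)}{f_0(y)} \, f_1(y) \, dy = D(f_1|f_0) > 0,$$
$$H_0: S_N \xrightarrow{\text{a.s.}} \int \ln \frac{f_1(y)}{f_0(y)} \, f_0(y) \, dy = -D(f_0|f_1) < 0.$$

For a fixed threshold $\tau$, $\gamma(N) = \frac{1}{N} \ln \tau \to 0$, which lies strictly between the two limits. Thus $P_D(N) \to 1$ and $P_F(N) \to 0$.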

* As long as we are willing to collect an arbitrarily large number of independent observations, we can separate H0 and H1 perfectly, regardless of π0 and Cij.

How fast do PD(N) → 1 and PF(N) → 0? Exponentially in N.
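The two limits of SN can be checked numerically. The sketch below assumes unit-variance scalar Gaussians (f0 = N(0, 1), f1 = N(m, 1)), computes the KL divergences by a midpoint rule, and compares them with a seeded Monte Carlo estimate of SN; the model, integration range, sample size and function names are all assumptions made for this illustration.

```python
import math
import random

def log_pdf(y, mean):
    """Log of the N(mean, 1) density at y."""
    return -0.5 * (y - mean) ** 2 - 0.5 * math.log(2.0 * math.pi)

def kl_divergence(mean_f, mean_g, lo=-12.0, hi=12.0, steps=24000):
    """D(f|g) for f = N(mean_f, 1) and g = N(mean_g, 1), by the midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * h
        log_f = log_pdf(y, mean_f)
        total += math.exp(log_f) * (log_f - log_pdf(y, mean_g)) * h
    return total

def simulated_s_n(n, true_mean, m=1.0, seed=0):
    """Sample mean of Z_k = ln L(Y_k) when Y_k ~ N(true_mean, 1)."""
    rng = random.Random(seed)
    return sum(m * rng.gauss(true_mean, 1.0) - 0.5 * m * m for _ in range(n)) / n

m = 1.0
d10 = kl_divergence(m, 0.0)                  # D(f1|f0)
d01 = kl_divergence(0.0, m)                  # D(f0|f1)
s_h1 = simulated_s_n(20000, true_mean=m)     # data generated under H1
s_h0 = simulated_s_n(20000, true_mean=0.0)   # data generated under H0
print(d10, s_h1, -d01, s_h0)
```

For this particular pair of Gaussians with equal variance the two divergences coincide (both equal m^2/2); this is a special case and does not contradict the general non-symmetry of the KL divergence.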



2.4 Mini-max Hypothesis Testing (Levy 2.5)

Assume:
1. A-priori probabilities (π0, π1) are unknown.
2. Cost structure (Cij) is known.

Possible solutions:
1. Guess. May lead to bad performance.
2. Design the test conservatively by assuming the least-favorable choice of a-priori probabilities and selecting the test that minimizes the Bayes risk for this choice. This minimizes the maximum risk and guarantees a minimum level of performance independent of the a-priori probabilities.

Problem statement: Find the test $\delta_M$ and a-priori value $\pi_{0M}$ that solve the mini-max problem
$$(\delta_M, \pi_{0M}) = \arg \min_{\delta} \max_{\pi_0 \in [0,1]} R(\delta, \pi_0).$$



Approach: Saddle point method.

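Def. A saddle point is a point in the domain of a function which is a stationary point but not a local extremum.

If a point $(\delta_M, \pi_{0M})$ satisfies
$$R(\delta_M, \pi_0) \le R(\delta_M, \pi_{0M}) \le R(\delta, \pi_{0M}) \quad \text{for any } \delta, \pi_0, \tag{1}$$
it is a saddle point of the function $R$, and it is the solution of the mini-max problem.

Proof:
Step 1: A saddle point of the form (1) exists.
Step 2: The saddle point is the solution (saddle point property).
Step 3: Construct the saddle point.

Since $R(\delta_M, \pi_0)$ is linear in $\pi_0$, the left inequality in (1) with $0 < \pi_{0M} < 1$ requires its slope in $\pi_0$ to vanish, which gives the mini-max equation:
$$(C_{01} - C_{00}) + (C_{11} - C_{01}) P_D(\delta_M) - (C_{10} - C_{00}) P_F(\delta_M) = 0.$$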



Comments:
1. If $C_{00} = C_{11}$, the mini-max equation becomes
$$P_D = 1 - \frac{C_{10} - C_{00}}{C_{01} - C_{11}} P_F,$$
a line through $(0, 1)$ in the $(P_F, P_D)$ square. If $C_{ij} = 1 - \delta_{ij}$, the mini-max equation becomes $P_D = 1 - P_F$.

2. The mini-max test corresponds to the intersection of the ROC and the line defined by the mini-max equation.

3. The LRT threshold $\tau_M$ of the mini-max test, corresponding to the intersection, equals the slope of the ROC at this point. The corresponding a-priori probability can be calculated by
$$\pi_{0M} = \left[ 1 + \frac{C_{10} - C_{00}}{(C_{01} - C_{11}) \tau_M} \right]^{-1}.$$

4. Another way of finding $\pi_{0M}$: $\pi_{0M} = \arg \max_{\pi_0} \min_{\delta} R(\delta, \pi_0)$. Define $V(\pi_0) \triangleq \min_{\delta} R(\delta, \pi_0)$, which is the minimum Bayes risk with a-priori probability $\pi_0$, achieved by the LRT. Then
$$\pi_{0M} = \arg \max_{\pi_0} V(\pi_0).$$
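Comment 4 suggests a direct numerical recipe: evaluate V(π0) on a grid and take the maximizer. The sketch below does this under assumed conditions that are not in the notes: a scalar Gaussian model (H0: N(0, 1), H1: N(m, 1)), costs Cij = 1 - δij, and a grid with step 0.001. It then checks the result against the mini-max line PD = 1 - PF and the formula of comment 3. All function names are hypothetical.

```python
import math
from statistics import NormalDist

def lrt_operating_point(pi0, m=1.0):
    """(P_F, P_D) of the Bayes LRT designed for prior pi0, uniform costs."""
    tau = pi0 / (1.0 - pi0)              # Bayes threshold when C_ij = 1 - delta_ij
    eta = math.log(tau) / m + m / 2.0    # L(y) >= tau  <=>  y >= eta
    p_f = 1.0 - NormalDist(0.0, 1.0).cdf(eta)
    p_d = 1.0 - NormalDist(m, 1.0).cdf(eta)
    return p_f, p_d

def min_bayes_risk(pi0, m=1.0):
    """V(pi0): probability of error of the Bayes LRT designed for pi0."""
    p_f, p_d = lrt_operating_point(pi0, m)
    return pi0 * p_f + (1.0 - pi0) * (1.0 - p_d)

# Grid search for the least-favorable prior (the grid resolution is an assumption).
grid = [k / 1000.0 for k in range(1, 1000)]
pi0_M = max(grid, key=min_bayes_risk)
p_f_M, p_d_M = lrt_operating_point(pi0_M)

tau_M = pi0_M / (1.0 - pi0_M)
pi0_from_tau = 1.0 / (1.0 + 1.0 / tau_M)  # comment 3 with uniform costs
print(pi0_M, p_f_M, p_d_M, pi0_from_tau)
```

In this symmetric example the least-favorable prior is π0 = 1/2; for asymmetric models or costs, the same search would generally land elsewhere, and a finer grid or a one-dimensional optimizer may be preferable.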



Examples:



2.5 Neyman-Pearson (NP) Testing (Levy 2.4.1)

Assume:
1. A-priori probabilities (π0, π1) are unknown.
2. Cost structure (Cij) is unknown.

NP-testing problem:
Select the test $\delta$ that maximizes $P_D(\delta)$ while ensuring that the probability of false alarm $P_F(\delta)$ is no more than $\alpha$:
$$D_\alpha \triangleq \{\delta \mid P_F(\delta) \le \alpha\}, \qquad \delta_{NP} = \arg \max_{\delta \in D_\alpha} P_D(\delta).$$

Lagrangian method for constrained optimization.



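$$\delta_{NP} = \arg \max_{\delta} P_D(\delta) \quad \text{subject to } P_F(\delta) \le \alpha.$$

Consider the Lagrangian:
$$\mathcal{L}(\delta, \lambda) \triangleq -P_D(\delta) + \lambda (P_F(\delta) - \alpha).$$

A test $\delta$ is optimal if it minimizes $\mathcal{L}(\delta, \lambda)$ (maximizes $-\mathcal{L}(\delta, \lambda)$), $\lambda \ge 0$, $P_F(\delta) \le \alpha$, and $\lambda (\alpha - P_F(\delta)) = 0$.

$$-\mathcal{L}(\delta, \lambda) = \int_{\mathcal{Y}_1} f(y|H_1) \, dy + \lambda \alpha - \lambda \int_{\mathcal{Y}_1} f(y|H_0) \, dy = \int_{\mathcal{Y}_1} \left[ f(y|H_1) - \lambda f(y|H_0) \right] dy + \lambda \alpha.$$

$-\mathcal{L}(\delta, \lambda)$ is maximized when
$$\delta(y) = \begin{cases} 1 & \text{if } f(y|H_1) > \lambda f(y|H_0) \\ 0 & \text{if } f(y|H_1) < \lambda f(y|H_0) \\ 0 \text{ or } 1 & \text{if } f(y|H_1) = \lambda f(y|H_0) \end{cases} = \begin{cases} 1 & \text{if } L(y) > \lambda \\ 0 & \text{if } L(y) < \lambda \\ 0 \text{ or } 1 & \text{if } L(y) = \lambda. \end{cases}$$

Thus, $\delta$ has to be an LRT, and $\lambda$ must satisfy the KKT conditions.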



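Let $F_L(l|H_0) \triangleq P(L \le l|H_0)$ be the CDF of the likelihood ratio $L = L(Y)$ under $H_0$. Let $f_0 \triangleq F_L(0|H_0) = P(L = 0|H_0)$.
Define two tests:
$$\delta_{L,\lambda}(y) = \begin{cases} 1 & \text{if } L(y) > \lambda \\ 0 & \text{if } L(y) \le \lambda, \end{cases} \qquad \delta_{U,\lambda}(y) = \begin{cases} 1 & \text{if } L(y) \ge \lambda \\ 0 & \text{if } L(y) < \lambda. \end{cases}$$

Case 1: If $1 - \alpha < f_0$, let $\lambda = 0$ and $\delta_{NP} = \delta_{L,0}$.

Case 2: If $1 - \alpha \ge f_0$ and there exists a $\lambda$ such that $F_L(\lambda|H_0) = 1 - \alpha$, i.e., $1 - \alpha$ is in the range of $F_L(l|H_0)$, choose this $\lambda$ as the LRT threshold and let $\delta_{NP} = \delta_{L,\lambda}$.

Case 3: If $1 - \alpha \ge f_0$ and $1 - \alpha$ is not in the range of $F_L(l|H_0)$, i.e., there is a discontinuity point $\lambda > 0$ of $F_L(l|H_0)$ such that
$$F_L(\lambda^-|H_0) < 1 - \alpha < F_L(\lambda|H_0),$$
choose this $\lambda$ as the LRT threshold. The NP test is then the randomized test: choose $\delta_{U,\lambda}$ with probability $p$ and $\delta_{L,\lambda}$ with probability $1 - p$, or equivalently,
$$\delta_{NP}(y) = \begin{cases} 1 & \text{if } L(y) > \lambda \\ 0 & \text{if } L(y) < \lambda \\ 1 \text{ w.p. } p, \; 0 \text{ w.p. } 1 - p & \text{if } L(y) = \lambda, \end{cases}$$
where $p$ is chosen so that $P_F = \alpha$, i.e.,
$$p = \frac{F_L(\lambda|H_0) - (1 - \alpha)}{F_L(\lambda|H_0) - F_L(\lambda^-|H_0)}.$$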

Comments:
1. When Y is discrete, FL(l|H0) is discontinuous; thus, a randomized test is usually needed.
2. Similarly, we could consider the minimization of PF under the constraint PM(δ) ≤ β. A similar solution can be obtained. This problem is called an NP test of Type II. The previously discussed one is called an NP test of Type I.
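The randomized NP test for a discrete observation can be sketched as follows. The three-point alphabet, the two PMFs and the level α = 1/5 are made-up values for illustration, and the function names are hypothetical. The randomization probability is chosen so that PF = α exactly, i.e., p = (α - P(L > λ|H0)) / P(L = λ|H0), which is the standard choice. Exact fractions are used so that the equality test L(y) = λ is reliable.

```python
from fractions import Fraction

def np_randomized_test(p0, p1, alpha):
    """Return (lam, p) for the NP test of size alpha on a finite alphabet.

    Decide H1 if L(y) > lam; if L(y) == lam, decide H1 with probability p.
    Assumes p0[y] > 0 for every y and 0 < alpha <= 1.
    """
    lr = {y: p1[y] / p0[y] for y in p0}
    for lam in sorted(set(lr.values()), reverse=True):
        above = sum(p0[y] for y in p0 if lr[y] > lam)   # P(L > lam | H0)
        at = sum(p0[y] for y in p0 if lr[y] == lam)     # P(L = lam | H0)
        if above + at >= alpha:
            return lam, (alpha - above) / at
    raise ValueError("alpha must lie in (0, 1]")

def np_performance(p0, p1, lam, p):
    """Return (P_F, P_D) of the randomized LRT with threshold lam."""
    lr = {y: p1[y] / p0[y] for y in p0}

    def prob_decide_h1(y):
        return 1 if lr[y] > lam else (p if lr[y] == lam else 0)

    p_f = sum(p0[y] * prob_decide_h1(y) for y in p0)
    p_d = sum(p1[y] * prob_decide_h1(y) for y in p0)
    return p_f, p_d

# Made-up PMFs on the alphabet {0, 1, 2}.
p0 = {0: Fraction(6, 10), 1: Fraction(3, 10), 2: Fraction(1, 10)}
p1 = {0: Fraction(1, 10), 1: Fraction(3, 10), 2: Fraction(6, 10)}
lam, p = np_randomized_test(p0, p1, alpha=Fraction(1, 5))
p_f, p_d = np_performance(p0, p1, lam, p)
print(lam, p, p_f, p_d)
```

Here no deterministic LRT attains PF = 1/5 (the available values are 0, 1/10, 2/5 and 1), so randomizing at L(y) = λ is what lets the test use the full false-alarm budget.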

Example:



2.5.1. ROC Properties

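Finding the ROC is naturally an NP testing problem, whose solution must be an LRT:
$$L(y) \underset{H_0}{\overset{H_1}{\gtrless}} \tau.$$

With $f_L(l|H_j)$ denoting the PDF of the likelihood ratio $L$ under $H_j$,
$$P_D(\tau) = \int_{\tau}^{\infty} f_L(l|H_1) \, dl, \qquad P_F(\tau) = \int_{\tau}^{\infty} f_L(l|H_0) \, dl. \tag{2}$$

As $\tau$ varies from 0 to $\infty$, $(P_F(\tau), P_D(\tau))$ moves continuously along the ROC curve.

1. Let $\tau = 0^-$. Then $\delta(y) = 1$ always and $P_D = P_F = 1$, so $(1, 1)$ belongs to the ROC.

2. Let $\tau = \infty$. Then $\delta(y) = 0$ always and $P_D = P_F = 0$, so $(0, 0)$ belongs to the ROC.

3. The slope of the ROC at the point $(P_F(\tau), P_D(\tau))$ equals $\tau$.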



4. The ROC curve is concave, i.e., the domain of the achievable pairs (PF, PD) is convex.

5. All points on the ROC curve satisfy PD ≥ PF.

6. The region of feasible tests is symmetric about the point (1/2, 1/2), i.e., if (PF, PD) is feasible, so is (1 − PF, 1 − PD).

Example:
