ECE531 Lecture 3: Minimax Hypothesis Testing
D. Richard Brown III
Worcester Polytechnic Institute
05-February-2009
Simple Binary Bayesian Risks Under Different Priors
r(δ, π0) = π0R0(δ) + (1 − π0)R1(δ)
[Figure: Bayes risk r(δ, π0) versus the prior π0 ∈ [0, 1]. The straight line r(δBπ′0, π0) is the risk of the fixed Bayes rule δBπ′0, with endpoints R1(δBπ′0) at π0 = 0 and R0(δBπ′0) at π0 = 1; the endpoints R1(δ) and R0(δ) of a generic rule δ are also marked, along with the minimum Bayes risk curve r(δBπ0, π0).]
Least Favorable Prior State Distribution
[Figure: Bayes risk versus the prior π0, showing the minimum Bayes risk curve r(δBπ0, π0), the risk line r(δBπlf, π0) of the Bayes rule designed for the least favorable prior πlf (endpoints R1(δBπlf) and R0(δBπlf)), and the risk line r(δBπ′0, π0) of the Bayes rule designed for another prior π′0 (endpoints R1(δBπ′0) and R0(δBπ′0)).]
Minimax Hypothesis Testing
Definition:

ρmm := arg min_ρ max_j Rj(ρ)
Remarks:
◮ No single decision rule minimizes the weighted average (e.g., Bayes) risk for every possible prior state distribution.
◮ A conservative approach is to minimize the worst-case risk over all possible prior state distributions.
◮ Intuitively, there should be a least favorable prior. Does it always exist? Is it unique?
◮ Intuitively, the minimax decision rule should be the Bayesian decision rule with constant Bayesian risk over the priors. Is this always true?
Minimum Bayesian Risk as a Function of the Prior
Let V(π) := r(δBπ, π) be the minimum Bayesian risk for the prior π.

Theorem
The minimum Bayesian risk V(π) is concave and continuous over the space of priors satisfying πj ≥ 0, j = 0, 1, . . . , N − 1, and Σ_j πj = 1. Hence, there exists a unique least favorable prior

πlf = arg max_π V(π).
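For the simple binary case (N = 2), a concrete restatement using the Bayes risk from the earlier slide:

V(π0) = min_δ r(δ, π0) = min_δ [π0R0(δ) + (1 − π0)R1(δ)]

Each decision rule contributes a straight line in π0, so when the observation space is finite (finitely many deterministic rules) V is the lower envelope of finitely many lines: concave, continuous, and piecewise linear, which is why it need not be differentiable everywhere.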
Concavity of the Minimum Bayesian Risk
A function f is concave if, for any x, y in the domain of f and any α ∈ [0, 1], f(αx + (1 − α)y) ≥ αf(x) + (1 − α)f(y).

Denote a pair of priors as π and π′ and a third prior π′′ = απ + (1 − α)π′. Writing R(δ) for the vector of conditional risks [R0(δ), . . . , RN−1(δ)]⊤, we can write

V(π′′) = π′′⊤ R(δBπ′′)
       = α π⊤ R(δBπ′′) + (1 − α) π′⊤ R(δBπ′′)
       ≥ α V(π) + (1 − α) V(π′)

where the inequality holds because δBπ′′ need not be the Bayes rule for π or π′, so π⊤R(δBπ′′) ≥ min_δ π⊤R(δ) = V(π), and likewise for π′. Hence V(π) is concave.
[Figure: sketch of the concave curve V over π0 ∈ [0, 1], showing that the chord between V(π0) and V(π′0) lies below the curve value V(π′′0) at the intermediate prior π′′0.]
Continuity of the Minimum Bayesian Risk
Theorem (“A First Course in Optimization Theory” by R. K. Sundaram)
Let f : D → R be a concave function. Then, if D is open, f is continuous on D. If D is not open, f is continuous on the interior of D.

Note that continuity does not imply differentiability.
The Four Possibilities
[Figure: four sketches of V(π0) versus π0 illustrating the possibilities: (1) a differentiable interior maximum, (2) a non-differentiable interior maximum, and (3)-(4) a maximum at a boundary, π0lf = 1 or π0lf = 0. Each sketch marks the endpoint risks R0(δBπlf) and R1(δBπlf).]
Case 1: Differentiable Interior Maximum Risk
Theorem
If there exists a prior π′ such that the conditional risks satisfy R0(δBπ′) = R1(δBπ′), then π′ is a least favorable prior and the minimax decision rule is ρmm = δBπ′.

Proof.
Given a π′ satisfying R0(δBπ′) = R1(δBπ′). For any δ,

max{R0(δ), R1(δ)} ≥ max_{π0 ∈ [0,1]} [π0R0(δ) + (1 − π0)R1(δ)]
                  ≥ π′R0(δ) + (1 − π′)R1(δ)
                  ≥ π′R0(δBπ′) + (1 − π′)R1(δBπ′)
                  = R0(δBπ′) = R1(δBπ′)
                  = max{R0(δBπ′), R1(δBπ′)},

so δBπ′ minimizes the worst-case conditional risk, i.e., ρmm = δBπ′. Moreover, for any π0 ∈ [0, 1],

V(π′) = π′R0(δBπ′) + (1 − π′)R1(δBπ′) = π0R0(δBπ′) + (1 − π0)R1(δBπ′) ≥ V(π0),

so π′ maximizes V(π0) and is therefore a least favorable prior.
A Procedure for Finding the Minimax Decision Rule
1. Find a Bayesian decision rule δBπ as a function of the prior π.

2. See if Case 1 holds by solving for the unique least favorable prior πlf using the equalizer rule:

   R0(δBπlf) = R1(δBπlf)

3. If the solution exists, then set ρmm = δBπlf.

4. If there is no solution to the equalizer rule, then see if Case 3 or 4 holds by computing the risk at the endpoints π0lf = 0 and π0lf = 1.

5. If neither endpoint is least favorable, then we must be in Case 2. In this case we must create a randomized minimax decision rule as a convex combination of two deterministic Bayes decision rules (a small numerical sketch of the whole procedure follows below).
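The following is a minimal numerical sketch of this procedure for a simple binary problem with a finite observation space, where each deterministic rule is summarized by its conditional risk pair (R0, R1). The rule list below reuses the coin-flipping risks that appear later in the lecture; the grid resolution and variable names are illustrative choices, not part of the slides.

import numpy as np

# Conditional risk pairs (R0, R1) of the deterministic decision rules.
# These particular numbers are the coin-flipping example with the 0/100
# cost matrix used later in the lecture; substitute your own rules as needed.
risks = np.array([[100.0, 0.0],    # rule 1
                  [0.0, 100.0],    # rule 2
                  [50.0, 0.0],     # rule 3
                  [50.0, 100.0]])  # rule 4

pi0 = np.linspace(0.0, 1.0, 2001)                      # grid of priors
bayes = np.outer(pi0, risks[:, 0]) + np.outer(1 - pi0, risks[:, 1])
V = bayes.min(axis=1)                                  # minimum Bayes risk curve
i_lf = int(V.argmax())
pi_lf = pi0[i_lf]                                      # approximate least favorable prior

best = int(bayes[i_lf].argmin())                       # Bayes rule at pi_lf
R0, R1 = risks[best]
if np.isclose(R0, R1):
    print(f"Case 1: equalizer rule holds, minimax risk = {R0:.2f}")
elif i_lf in (0, len(pi0) - 1):
    print(f"Case 3/4: boundary maximum at pi0 = {pi_lf:.2f}, risk = {V[i_lf]:.2f}")
else:
    print(f"Case 2: randomize around pi_lf = {pi_lf:.3f}, worst-case risk = {V[i_lf]:.2f}")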
Example: Coherent Detection of BPSK
Our Bayes decision rule for coherent BPSK with prior π0, π1 = 1 − π0 is

δBπ(y) = 1 if y > γ,  0/1 if y = γ,  0 if y < γ,

where γ := (a0 + a1)/2 + (σ²/(a1 − a0)) ln(π0/π1).

The conditional risks are

R0(δBπ) = Q((γ − a0)/σ) and R1(δBπ) = Q((a1 − γ)/σ),

where Q(x) := ∫_x^∞ (1/√(2π)) e^(−t²/2) dt.

Let's try the equalizer rule. What value of γ gives us R0(δBπ) = R1(δBπ)?
Example: Coherent Detection of BPSK
Answer: R0 = R1 when γ = (a0 + a1)/2. Hence

ρmm(y) = 1 if y > (a0 + a1)/2,  0/1 if y = (a0 + a1)/2,  0 if y < (a0 + a1)/2.

[Figure: decision regions Y0 and Y1 on the observation axis, separated at the threshold γ = (a0 + a1)/2 between a0 and a1.]

What does this imply about the least favorable prior? Answer: π0 = π1 = 1/2.
Given a0, a1, and σ, the minimax rule allows you to guarantee a worst-case risk over all priors.
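As a quick numerical check of the equalizer threshold, here is a minimal sketch; the values of a0, a1, σ and the prior 0.8 are illustrative, and scipy's norm.sf plays the role of the Q-function.

import numpy as np
from scipy.stats import norm

a0, a1, sigma = -1.0, 1.0, 0.5   # illustrative signal levels and noise standard deviation

def conditional_risks(gamma):
    """Conditional risks of the threshold rule, as on the slide."""
    R0 = norm.sf((gamma - a0) / sigma)   # R0 = Q((gamma - a0)/sigma)
    R1 = norm.sf((a1 - gamma) / sigma)   # R1 = Q((a1 - gamma)/sigma)
    return R0, R1

gamma_eq = (a0 + a1) / 2                 # equalizer threshold (pi0 = pi1 = 1/2)
print(conditional_risks(gamma_eq))       # equal: both are Q((a1 - a0)/(2*sigma))

# For any other prior the Bayes threshold shifts and the risks become unequal.
gamma_other = gamma_eq + sigma**2 / (a1 - a0) * np.log(0.8 / 0.2)  # pi0 = 0.8
print(conditional_risks(gamma_other))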
Example: Coherent Detection of BPSK
[Figure: Bayes risks versus the prior π0 for the BPSK example. The equalizer line r(δBπlf, π0) is constant with R0(δBπlf) = R1(δBπlf); the line r(δBπ′0, π0) for another prior π′0 has unequal endpoints R0(δBπ′0) and R1(δBπ′0); the minimum Bayes risk curve r(δBπ0, π0) touches the equalizer line at πlf.]
Cases 3-4: Maximum Risk Occurs at Boundary
Let's return to our coin flipping problem from Lecture 1 (H0 ↔ x0 = HT and H1 ↔ x1 = HH) with a modified cost matrix

C = [  0   100
      100   60 ]
[Figure: Bayes risk versus the prior π0 for decision rules 1-4 under the modified cost matrix, together with the minimum (lower envelope) over the rules.]
Cases 3-4: Maximum Risk Occurs at Boundary
Remarks:
◮ Rules 2 and 3, depending on the prior, minimize the Bayes risk.
◮ Note that the equalizer rule would give no solution to this problem:
  R0(D1) = 100 and R1(D1) = 60
  R0(D2) = 0 and R1(D2) = 100
  R0(D3) = 50 and R1(D3) = 60
  R0(D4) = 50 and R1(D4) = 100
  No decision rule gives R0 = R1.
◮ In this example, the least favorable prior (maximizing the minimum risk) is π0 = 0, i.e., the coin is always HH. This should make sense.
◮ The minimax decision rule is Rule 3: observe T, decide the coin is fair; observe H, decide the coin is unfair.
◮ You can guarantee a worst-case risk of $60 by using Rule 3 (a numerical check follows below).
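As a check, here is a minimal sketch that recomputes these conditional risks. It assumes the Lecture 1 coin model (P(H) = 1/2 under H0, P(H) = 1 under H1), which is not restated on this slide, and uses C[i, j] for the cost of deciding i when j is true.

import numpy as np

P = np.array([[0.5, 0.5],    # [P(T|H0), P(H|H0)] for the HT coin
              [0.0, 1.0]])   # [P(T|H1), P(H|H1)] for the HH coin
C = np.array([[0.0, 100.0],  # modified cost matrix: C[i, j] = cost of deciding i when j true
              [100.0, 60.0]])

# The four deterministic rules, written as the decision made for observations (T, H).
rules = {1: (1, 1), 2: (0, 0), 3: (0, 1), 4: (1, 0)}

for name, d in rules.items():
    R = [sum(P[j, y] * C[d[y], j] for y in range(2)) for j in range(2)]
    print(f"Rule {name}: R0 = {R[0]:5.1f}, R1 = {R[1]:5.1f}, max = {max(R):5.1f}")
# Rule 3 has the smallest maximum conditional risk (60), matching the minimax conclusion.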
Case 2: Non-Differentiable Interior Maximum
Back to our original coin flipping problem with cost matrix

C = [  0   100
      100    0 ]
[Figure: Bayes risk versus the prior π0 for decision rules 1-4 under the original cost matrix, together with the minimum (lower envelope) over the rules.]
Case 2: Non-Differentiable Interior Maximum
Remarks:
◮ Rules 2 and 3, depending on the prior, minimize the Bayes risk.
◮ Again, the equalizer rule gives no deterministic solution since
  R0(D1) = 100 and R1(D1) = 0
  R0(D2) = 0 and R1(D2) = 100
  R0(D3) = 50 and R1(D3) = 0
  R0(D4) = 50 and R1(D4) = 100
◮ In this example, the least favorable prior (maximizing the minimum risk) is π0 = 2/3, i.e., the coin is HT with probability 2/3.
◮ The minimax decision rule is neither Rule 2 nor Rule 3.
◮ You can guarantee a worst-case risk of $100/3 by using a randomized decision rule that is a combination of Rules 2 and 3.
Case 2: Non-Differentiable Interior Maximum
Problem: Find a randomized decision rule that satisfies the equalizer rule
[Figure: Bayes risk versus the prior π0 for rules 1-4 and their minimum. At the non-differentiable maximum πlf, the two Bayes rules on either side of the kink, δBπ−lf and δBπ+lf, have unequal endpoint risks R0(·) and R1(·); the horizontal line r(ρmm) at the level V(πlf) is the risk of the randomized minimax rule.]
Case 2: Non-Differentiable Interior Maximum
Our randomized minimax decision rule is then

ρmm = α δBπ−lf + (1 − α) δBπ+lf

We can calculate the randomization α ∈ [0, 1] by applying the equalizer rule:

α R0(δBπ−lf) + (1 − α) R0(δBπ+lf) = α R1(δBπ−lf) + (1 − α) R1(δBπ+lf)

which gives the solution

α = [R1(δBπ+lf) − R0(δBπ+lf)] / [(R0(δBπ−lf) − R1(δBπ−lf)) − (R0(δBπ+lf) − R1(δBπ+lf))]
  = V′(π+lf) / (V′(π+lf) − V′(π−lf))
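Plugging in the conditional risks from the coin example (with δBπ−lf = Rule 3 and δBπ+lf = Rule 2, as identified on the next slide), a minimal sketch of this computation:

def equalizer_alpha(R0_minus, R1_minus, R0_plus, R1_plus):
    """Randomization weight on delta_minus from the equalizer condition."""
    return (R1_plus - R0_plus) / ((R0_minus - R1_minus) - (R0_plus - R1_plus))

alpha = equalizer_alpha(R0_minus=50.0, R1_minus=0.0, R0_plus=0.0, R1_plus=100.0)
worst_case = alpha * 50.0 + (1 - alpha) * 0.0   # the equalized conditional risk
print(alpha, worst_case)                        # 2/3 and 100/3 (about 33.33)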
Case 2: Non-Differentiable Interior Maximum
What is the minimax decision rule in our example?

V′(π+lf) = −100
V′(π−lf) = 50

hence α = 2/3. If H0 is the hypothesis that the coin is HT, H1 is the hypothesis that the coin is HH, and the observations are y0 = T, y1 = H, then our deterministic decision rules 2 and 3 can be written as

D3 = δBπ−lf = [ 1 0
                0 1 ]   and   D2 = δBπ+lf = [ 1 1
                                              0 0 ]

The minimax decision rule is then given by

ρmm(y = T) = (2/3)[1, 0]⊤ + (1/3)[1, 0]⊤ = [1, 0]⊤       (T → always decide HT)

ρmm(y = H) = (2/3)[0, 1]⊤ + (1/3)[1, 0]⊤ = [1/3, 2/3]⊤   (H → randomize: Pr(decide HT) = 1/3, Pr(decide HH) = 2/3)
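The same convex combination can be verified numerically; a minimal sketch using the matrices above, with columns ordered (T, H) and rows ordered (decide HT, decide HH):

import numpy as np

D3 = np.array([[1.0, 0.0],
               [0.0, 1.0]])   # Bayes rule just below the kink (delta at pi-_lf)
D2 = np.array([[1.0, 1.0],
               [0.0, 0.0]])   # Bayes rule just above the kink (delta at pi+_lf)
alpha = 2.0 / 3.0

rho_mm = alpha * D3 + (1 - alpha) * D2
print(rho_mm[:, 0])   # observation T: [1, 0]      -> always decide HT
print(rho_mm[:, 1])   # observation H: [1/3, 2/3]  -> decide HT w.p. 1/3, HH w.p. 2/3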
Final Remarks on Minimax Hypothesis Testing
1. The objective of minimax hypothesis testing is to minimize your worst-case (maximum) risk over all possible prior state probabilities.
2. Conservative approach, but useful in scenarios when:
   ◮ the prior is unknown, and/or
   ◮ you need to provide a maximum risk guarantee.
3. Try the equalizer rule first!
4. Minimax risk at the endpoints only occurs in weird cases.
5. A finite observation space Y implies that the minimum Bayes risk curve V is not going to be differentiable everywhere. Randomization is often necessary to obtain the minimax decision rule in these cases.
6. Composite hypotheses:
   ◮ The equalizer rule is still valid.
   ◮ Checking “endpoints” is still valid (but there are more points to check).
   ◮ In the case of a non-differentiable interior maximum, finding the randomization can be difficult.