
Detection theory

Detection theory involves making a decision based on a set of measurements. Given a set of observations, a decision has to be made regarding the source of the observations.

Hypothesis - a statement about the possible source of the observations.

The simplest case, binary hypothesis testing, chooses one of two hypotheses, namely

H_0: the null hypothesis, which is the usually true statement.

H_1: the alternative hypothesis.

In a radar application, these two hypotheses denote

H_0: target is absent

H_1: target is present

M-ary hypothesis testing chooses one of M alternatives: H_0, H_1, ..., H_{M-1}.

A set of observations is denoted by x = [x_1 x_2 ... x_n]'.

We can associate an a priori probability with each hypothesis: P(H_0), P(H_1), ..., P(H_{M-1}).

Given hypothesis H_i, the observations are determined by the conditional PDF f_{X/H_i}(x). The hypothesis may be about some parameter θ that determines f_{X/H_i}(x); θ is chosen from a parameter space Θ.

Simple and composite hypotheses:

For a simple hypothesis, the parameter is a distinct point in Θ, while for a composite hypothesis the parameter is specified in a region of Θ. For example, H_0: θ = 0 is a simple hypothesis while H_1: θ > 0 is a composite one.

Bayesian Decision theory for simple binary hypothesis testing

The decision rule D(X) partitions the observation space Z ⊆ R^n into regions Z_0 and Z_1 such that D(x) = H_0 if x ∈ Z_0 and D(x) = H_1 otherwise.

A cost C_ij = C(H_j, D(x) = H_i) is assigned to each (H_j, D(x) = H_i) pair. Thus C_00 = C(H_0, D(x) = H_0), and so on up to C_10 = C(H_0, D(x) = H_1). The objective is to minimize the average risk

R(D) = E[C(H, D(X))]
     = E_X E[C(H, D(X)) | X = x]
     = ∫_Z E[C(H, D(x)) | X = x] f_X(x) dx

The Bayesian decision minimizes R(D) over D(x) = H_j, j = 0, 1. Equivalently, for each x it minimizes E[C(H, D(x)) | X = x] over D(x) = H_j, j = 0, 1.

We can assign C_00 = C_11 = 0 and C_10 = C_01 = 1, but the cost function need not be symmetric.

Likelihood Ratio Test

Suppose D(x) = H_0. Then the conditional average cost is

E[C(H, D(X)) | X = x, D(x) = H_0] = C_00 P(H_0 | X = x) + C_01 P(H_1 | X = x)

Similarly, if D(x) = H_1,

E[C(H, D(X)) | X = x, D(x) = H_1] = C_10 P(H_0 | X = x) + C_11 P(H_1 | X = x)

The decision rule will be: decide H_1 if

C_10 P(H_0 | X = x) + C_11 P(H_1 | X = x) < C_00 P(H_0 | X = x) + C_01 P(H_1 | X = x)

that is,

(C_01 − C_11) P(H_1 | X = x) ≷ (C_10 − C_00) P(H_0 | X = x)

where ≷ means: decide H_1 if the left side is greater, H_0 otherwise. Note that

P(H_j | X = x) = P(H_j) f_{X/H_j}(x) / f_X(x),  j = 0, 1.

Since C_01 > C_11 and C_10 > C_00, we can simplify to

(C_01 − C_11) P(H_1) f_{X/H_1}(x) ≷ (C_10 − C_00) P(H_0) f_{X/H_0}(x)

Defining the likelihood ratio L(x) = f_{X/H_1}(x) / f_{X/H_0}(x), we can write the rule as

L(x) ≷ η,  η = (C_10 − C_00) P(H_0) / ((C_01 − C_11) P(H_1))

This decision rule is known as the likelihood ratio test (LRT).
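As a concrete illustration, the LRT can be sketched in a few lines of Python. The densities below (unit-variance Gaussians with means 0 and 1) and the default 0-1 costs are assumptions chosen for the example, not part of the derivation above.

```python
import math

# Hypothetical example densities: f0 = N(0,1) under H0, f1 = N(1,1) under H1.
def f0(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def f1(x):
    return math.exp(-(x - 1) ** 2 / 2) / math.sqrt(2 * math.pi)

def lrt(x, P0, P1, C00=0.0, C11=0.0, C10=1.0, C01=1.0):
    """Return 1 (decide H1) iff L(x) = f1(x)/f0(x) exceeds the Bayes threshold."""
    eta = (C10 - C00) * P0 / ((C01 - C11) * P1)
    return 1 if f1(x) / f0(x) > eta else 0
```

With equal priors and 0-1 costs, η = 1 and, since L(x) = e^{x − 1/2}, the rule reduces to deciding H_1 when x > 1/2.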

Errors in decision

The decision rule partitions the observation space into two regions: in Z_0, H_0 is decided to be true, and in Z_1, H_1 is decided to be true. If the decisions are wrong, two errors are committed:

Type I error = probability of false alarm, given by

P_FA = P(D(X) = H_1 | H_0) = ∫_{Z_1} f_{X/H_0}(x) dx

Type II error = probability of miss, given by

P_M = P(D(X) = H_0 | H_1) = ∫_{Z_0} f_{X/H_1}(x) dx

Error Analysis: Two types of errors

Type I error probability (false alarm probability):

P_FA = P(D(x) = H_1 | H_0) = ∫_{Z_1} f_{X/H_0}(x) dx

Type II error probability (miss detection probability):

P_M = P(D(x) = H_0 | H_1) = ∫_{Z_0} f_{X/H_1}(x) dx

The detection probability is P_D = 1 − P_M, and the probability of error is

P_e = P(H_0) P_FA + P(H_1) P_M

We observe that P_FA can also be expressed in terms of the PDF of the likelihood ratio L(X):

P_FA = ∫_{Z_1} f_{X/H_0}(x) dx = ∫_η^∞ f_{L/H_0}(l) dl

Similarly,

P_M = ∫_{Z_0} f_{X/H_1}(x) dx = ∫_0^η f_{L/H_1}(l) dl

Now the Bayes risk is given by

R(D) = E[C(H, D(x))]
 = C_00 P(H_0) P(D(x) = H_0 | H_0) + C_10 P(H_0) P(D(x) = H_1 | H_0)
   + C_01 P(H_1) P(D(x) = H_0 | H_1) + C_11 P(H_1) P(D(x) = H_1 | H_1)
 = C_00 P(H_0) (1 − P_F) + C_10 P(H_0) P_F + C_01 P(H_1) P_M + C_11 P(H_1) (1 − P_M)

If we substitute P(H_0) = 1 − P(H_1), R(D) can be written as

R(D) = C_00 (1 − P_F) + C_10 P_F + P(H_1) [ (C_11 − C_00) + (C_01 − C_11) P_M − (C_10 − C_00) P_F ]

which is a function of P(H_1) and the threshold η.

Minimum probability of error criterion:

If C_00 = C_11 = 0 and C_10 = C_01 = 1, then R(D) is given by

R(D) = P(H_0) P_F + P(H_1) P_M = P_e

and the threshold of the LRT is given by

η = (C_10 − C_00) P(H_0) / ((C_01 − C_11) P(H_1)) = P(H_0) / P(H_1)

Thus the LRT with threshold P(H_0)/P(H_1) minimizes the probability of error.

MinMax Decision Rule

Recall that the Bayesian risk function is given by

R(D) = C_00 (1 − P_F) + C_10 P_F + P(H_1) [ (C_11 − C_00) + (C_01 − C_11) P_M − (C_10 − C_00) P_F ]

which is a function of P(H_1). For a given P(H_1) we can determine the other parameters in the above expression using the minimum Bayes risk criterion.

(Figure: the Bayes risk R(D) plotted as a function of P(H_1).)

Suppose the parameters are designed using the Bayes minimum risk at P(H_1) = p. If P(H_1) is now varied, the modified risk curve will be a straight line tangential to the Bayes risk curve at (p, R(D, p)). The decision will no longer be optimal. To overcome this difficulty, the Bayes minimax criterion is used. According to this criterion, we decide by

min_D max_{P(H_1)} R(D, P(H_1))

Under mild conditions, we can write

min_D max_{P(H_1)} R(D) = max_{P(H_1)} min_D R(D)

Assuming differentiability, we get

d R(D) / d P(H_1) = (C_11 − C_00) + (C_01 − C_11) P_M − (C_10 − C_00) P_F = 0

The above equation is known as the minimax equation and can be solved to find the threshold.

Example: Suppose

H_0: X ~ exp(1)
H_1: X ~ exp(2)
C_00 = C_11 = 0
C_01 = 2, C_10 = 1

Then

f_{X/H_0}(x) = e^{−x} u(x)
f_{X/H_1}(x) = 2 e^{−2x} u(x)
L(x) = 2 e^{−x}

Since L(x) is decreasing in x, the LRT L(x) ≷ η is equivalent to deciding H_1 when x is below a threshold τ. We have to solve the minimax equation

(C_11 − C_00) + (C_01 − C_11) P_M − (C_10 − C_00) P_FA = 0  ⇒  2 P_M − P_FA = 0

Now

P_FA = ∫_0^τ e^{−x} dx = 1 − e^{−τ}

P_M = ∫_τ^∞ 2 e^{−2x} dx = e^{−2τ}

Substituting P_FA and P_M in the minimax equation, we can find τ: with y = e^{−τ}, the equation 2y² + y − 1 = 0 gives y = 1/2, i.e. τ = ln 2 ≈ 0.693.
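The root of the minimax equation can be checked numerically. The sketch below assumes the threshold form derived above (decide H_1 when x < τ) and finds τ by bisection.

```python
import math

# Minimax equation for the exponential example (C01 = 2, C10 = 1, C00 = C11 = 0):
# 2*P_M - P_FA = 0, with P_FA = 1 - exp(-tau) and P_M = exp(-2*tau),
# where tau is the threshold on x (decide H1 when x < tau).
def minimax_residual(tau):
    p_fa = 1.0 - math.exp(-tau)
    p_m = math.exp(-2.0 * tau)
    return 2.0 * p_m - p_fa

# Bisection: the residual is strictly decreasing, positive at 0, negative for large tau.
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = (lo + hi) / 2
    if minimax_residual(mid) > 0:
        lo = mid
    else:
        hi = mid
tau = (lo + hi) / 2   # converges to ln 2
```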

Receiver Operating Characteristics

The performance of a test is analyzed in terms of a graph showing P_D vs P_FA. Note that

P_D = ∫_{Z_1(η)} f_{X/H_1}(x) dx  and  P_FA = ∫_{Z_1(η)} f_{X/H_0}(x) dx

where Z_1(η) is the region corresponding to the decision of H_1 at the likelihood ratio threshold η.

(Figure: typical ROC curves of P_D vs P_FA, together with the chance line P_D = P_FA.)

In general, we would like to select a P_FA that results in a P_D near the knee of the ROC. If we increase P_FA beyond that value, there is only a very small increase in P_D. For a continuous P_D, the ROC has the following properties.

1. The ROC is a non-decreasing function of P_FA. This is because to increase P_FA we have to expand Z_1, and hence P_D will increase.

2. The ROC is on or above the line P_D = P_FA.

3. For the likelihood ratio test, the slope of the ROC gives the threshold η.

Recall that

P_D = ∫_η^∞ f_{L/H_1}(l) dl  ⇒  dP_D/dη = −f_{L/H_1}(η)

P_FA = ∫_η^∞ f_{L/H_0}(l) dl  ⇒  dP_FA/dη = −f_{L/H_0}(η)

Also

P_D = ∫_{Z_1(η)} L(x) f_{X/H_0}(x) dx = ∫_η^∞ l f_{L/H_0}(l) dl

so that f_{L/H_1}(l) = l f_{L/H_0}(l), and hence

∴ dP_D/dη = −η f_{L/H_0}(η)

∴ dP_D/dP_FA = (dP_D/dη) / (dP_FA/dη) = η

Thus the slope of the ROC at an operating point equals the likelihood ratio threshold η that produces that point.
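Property 3 can be verified numerically for a specific pair of densities. The sketch below assumes H_0: N(0,1) vs H_1: N(1,1), for which the region Z_1 = {x > t} has P_FA = Q(t) and P_D = Q(t − 1), and the likelihood-ratio threshold corresponding to t is η = e^{t − 1/2}.

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

# Operating point: decide H1 when x > t.
t = 1.2
eta = math.exp(t - 0.5)   # likelihood-ratio threshold at this point

# Numerical slope dP_D/dP_FA of the ROC via symmetric finite differences.
h = 1e-6
slope = (Q(t - h - 1) - Q(t + h - 1)) / (Q(t - h) - Q(t + h))
# slope is close to eta, illustrating property 3
```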

Neyman-Pearson (NP) Hypothesis Testing

The Bayesian approach requires knowledge of the a priori probabilities P(H_0) and P(H_1). Finding these is a problem in many cases. In such cases, NP hypothesis testing can be applied.

The NP method maximizes the detection probability while keeping the probability of false alarm within a limit. The problem can be mathematically written as

maximize P_D over D(x) = H_j, j = 0, 1
subject to P_FA ≤ α

In statistical parlance, α is called the size of the test.

We observe that the ROC curve is non-decreasing: decreasing P_FA will decrease P_D also. Therefore, for optimal performance, P_FA should be kept fixed at α. Hence the modified optimization problem is

maximize P_D over D(x) = H_j, j = 0, 1
subject to P_FA = α

We can solve the problem by the Lagrange multiplier method.

Maximize J = P_D − λ (P_FA − α) over D(x) = H_j, j = 0, 1.

Now

J = ∫_{Z_1} f_{X/H_1}(x) dx − λ ( ∫_{Z_1} f_{X/H_0}(x) dx − α )
 = λα + ∫_{Z_1} ( f_{X/H_1}(x) − λ f_{X/H_0}(x) ) dx

To maximize J we should select Z_1 such that

f_{X/H_1}(x) − λ f_{X/H_0}(x) > 0  for x ∈ Z_1

that is,

f_{X/H_1}(x) / f_{X/H_0}(x) ≷ λ

This will give the threshold in terms of λ. λ can be found out from

∫_{Z_1} f_{X/H_0}(x) dx = α

Example:

H_0: X ~ N(0, 1)
H_1: X ~ N(1, 1)
P_FA = 0.25

The likelihood ratio is

L(x) = f_{X/H_1}(x) / f_{X/H_0}(x) = e^{x − 1/2} ≷ λ

Taking the logarithm,

x − 1/2 ≷ ln λ
⇒ x ≷ (1 + 2 ln λ)/2 = η

η is found from

∫_η^∞ (1/√(2π)) e^{−x²/2} dx = 0.25
⇒ η = 0.675
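The threshold η = Q^{-1}(0.25) and the resulting detection probability can be computed with a small sketch. Q is the Gaussian tail probability; inverting it by bisection is an implementation choice of the sketch, not the only way.

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def Q_inv(alpha, lo=-10.0, hi=10.0):
    """Invert the (decreasing) Gaussian tail probability by bisection."""
    for _ in range(80):
        mid = (lo + hi) / 2
        if Q(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

threshold = Q_inv(0.25)   # about 0.675, as in the example
p_d = Q(threshold - 1.0)  # detection probability of the NP test, about 0.63
```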

Composite Hypothesis Testing

Here there is uncertainty about the parameter θ under each hypothesis. Suppose

H_0: X ~ f_{X/H_0}(x; θ_0), θ_0 ∈ Θ_0
H_1: X ~ f_{X/H_1}(x; θ_1), θ_1 ∈ Θ_1

If Θ_0 or Θ_1 contains a single element, the corresponding hypothesis is called a simple hypothesis; otherwise it is called a composite hypothesis.

Example: Suppose

H_0: X ~ N(0, 1)
H_1: X ~ N(θ, 1), θ > 0

These two hypotheses may represent the absence and presence of a dc signal in zero-mean Gaussian noise of known variance 1. Clearly H_0 is a simple hypothesis and H_1 is a composite one. We will now consider how to deal with such decision problems.

Uniformly Most Powerful (UMP) Test

Consider the example

H_0: x_i, i = 0, 1, ..., N−1, X_i iid N(0, 1)
H_1: x_i, i = 0, 1, ..., N−1, X_i iid N(θ, 1), θ > 0

The likelihood ratio is

L(x) = ∏_{i=0}^{N−1} e^{−(x_i − θ)²/2} / ∏_{i=0}^{N−1} e^{−x_i²/2} = e^{θ Σ_i x_i − Nθ²/2}

so that

ln L(x) = θ Σ_i x_i − Nθ²/2 ≷ ln η

If θ > 0, then we can divide through by θ and absorb the constants into the threshold, giving the test statistic

T(x) = Σ_{i=0}^{N−1} x_i ≷ ν

Now we have the modified hypotheses as follows:

H_0: T ~ N(0, N)
H_1: T ~ N(Nθ, N)

The false alarm probability is

P_FA = ∫_ν^∞ (1/√(2πN)) e^{−t²/2N} dt = Q(ν/√N)

Therefore ν = √N Q^{−1}(P_FA), which does not depend on θ. The same test maximizes P_D for every θ > 0, and is therefore uniformly most powerful for this problem.
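A quick Monte Carlo sketch supports this: the threshold ν = √N Q^{-1}(α) is computed without using θ, and the simulated false-alarm rate under H_0 stays near α. The values N = 16 and α = 0.1 are arbitrary choices for the illustration.

```python
import math
import random

def Q(x):
    return 0.5 * math.erfc(x / math.sqrt(2))

def Q_inv(alpha, lo=-10.0, hi=10.0):
    for _ in range(80):
        mid = (lo + hi) / 2
        if Q(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

random.seed(0)
N, alpha = 16, 0.1
nu = math.sqrt(N) * Q_inv(alpha)   # threshold on T = sum of the samples

# Under H0 the empirical false-alarm rate should be close to alpha;
# note the test itself never uses theta.
trials = 20000
false_alarms = sum(
    sum(random.gauss(0.0, 1.0) for _ in range(N)) > nu
    for _ in range(trials)
)
p_fa_hat = false_alarms / trials
```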

Example: Suppose

H_0: X ~ N(0, σ²)
H_1: X ~ N(1, σ²)

Then

ln L(x) = (2x − 1)/(2σ²) ≷ ln η

If we take ln η = 0, then the test reduces to

x ≷ 1/2

With this threshold we can determine the probability of detection P_D and the probability of false alarm P_FA. However, the detection is not optimal in any sense. The UMP test may not generally exist. The following result is particularly useful for the UMP test.

Karlin-Rubin Theorem

Suppose H_0: θ ≤ θ_0 and H_1: θ > θ_0. Let T = T(x) be a test statistic. If the likelihood ratio

L(t) = f_{T/H_1}(t) / f_{T/H_0}(t)

is a non-decreasing function of t, then the test

T ≷ ν

maximizes the detection probability P_D for a given P_FA. Thus the test is UMP for a fixed P_FA.

Example:

H_0: X ~ Poi(λ_0)
H_1: X ~ Poi(λ_1), λ_1 > λ_0

Then

L(x) = (λ_1/λ_0)^x e^{−(λ_1 − λ_0)}

is a non-decreasing function of x. Therefore the threshold test on x for a given P_FA is UMP.
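A minimal numeric check of the monotone likelihood ratio (λ_0 = 1 and λ_1 = 3 are arbitrary values with λ_1 > λ_0):

```python
import math

# Likelihood ratio of Poisson(lam1) to Poisson(lam0) at count x;
# the x! terms cancel, leaving (lam1/lam0)**x * exp(-(lam1 - lam0)).
def poisson_lr(x, lam0=1.0, lam1=3.0):
    return (lam1 / lam0) ** x * math.exp(-(lam1 - lam0))

# With lam1 > lam0 the ratio is non-decreasing in x, so by the
# Karlin-Rubin theorem a threshold test on x is UMP.
ratios = [poisson_lr(x) for x in range(10)]
monotone = all(a <= b for a, b in zip(ratios, ratios[1:]))
```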

Generalized Likelihood Ratio Test (GLRT):

In this approach to composite hypothesis testing, the non-random parameters are replaced by their MLEs in the decision rule. Suppose

H_0: X ~ f_{X/H_0}(x; θ_0)
H_1: X ~ f_{X/H_1}(x; θ_1)

Then, according to the GLRT, the best-fitting models under the two hypotheses are compared. Thus the decision rule is

max_{θ_1} f_{X/H_1}(x; θ_1) / max_{θ_0} f_{X/H_0}(x; θ_0) ≷ η

This approach also provides the values of the unknown parameters (their MLEs). It is not optimal, but it works well in practical situations.

Example:

H_0: θ = 0
H_1: θ ≠ 0
X_i ~ iid N(θ, σ²), i = 1, 2, ..., n, with σ² known.

The likelihood functions are

f_{X/H_1}(x; θ) = (1/(2πσ²)^{n/2}) e^{−Σ_{i=1}^n (x_i − θ)²/2σ²}

f_{X/H_0}(x) = (1/(2πσ²)^{n/2}) e^{−Σ_{i=1}^n x_i²/2σ²}

The MLE of θ is given by

θ̂ = x̄ = (1/n) Σ_{i=1}^n x_i

Under the GLRT, the likelihood ratio becomes

L(x) = e^{−Σ_i (x_i − x̄)²/2σ²} / e^{−Σ_i x_i²/2σ²} = e^{n x̄²/2σ²}

since Σ_i (x_i − x̄)² = Σ_i x_i² − n x̄². Hence

ln L(x) = n x̄²/2σ²

Using the GLRT,

2 ln L(x) = n x̄²/σ² ≷ γ

Particularly, under H_0, x̄ ~ N(0, σ²/n), so √n x̄/σ ~ N(0, 1) and

2 ln L(x) = (√n x̄/σ)² ~ χ²_1

where χ²_1 is a chi-square random variable with 1 degree of freedom. From this distribution we can find the P_FA for any threshold γ.
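The chi-square null distribution of 2 ln L(x) can be checked by simulation. The sketch below assumes σ² = 1, n = 25, and the threshold 3.84 (close to the 95% point of χ²_1); all are illustrative choices.

```python
import math
import random

# GLRT statistic for H0: theta = 0 vs H1: theta != 0, X_i iid N(theta, sigma2):
# 2 ln L(x) = n * xbar**2 / sigma2, chi-square with 1 degree of freedom under H0.
def glrt_stat(samples, sigma2=1.0):
    n = len(samples)
    xbar = sum(samples) / n
    return n * xbar * xbar / sigma2

# P_FA for threshold g: P(chi2_1 > g) = P(|N(0,1)| > sqrt(g)) = erfc(sqrt(g/2)).
def p_fa(g):
    return math.erfc(math.sqrt(g / 2))

random.seed(1)
n, g = 25, 3.84            # g = 3.84 gives P_FA of about 0.05
trials = 20000
hits = sum(
    glrt_stat([random.gauss(0.0, 1.0) for _ in range(n)]) > g
    for _ in range(trials)
)
p_fa_hat = hits / trials   # close to p_fa(3.84)
```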

Multiple Hypothesis Testing:

Decide one of H_0, H_1, H_2, ..., H_{M−1} on the basis of the observed data x. The priors P(H_i), i = 0, 1, 2, ..., M−1, are assumed to be known. Associate a cost C_ij with the decision H_i, defined by

C_ij = C(H_j, D(X) = H_i)

The average cost is then given by

C̄ = Σ_{i=0}^{M−1} Σ_{j=0}^{M−1} C_ij P(D = H_i | H_j) P(H_j)

The decision process will partition the observation space Z ⊆ R^n (or a subset of it) into M subsets Z_0, Z_1, ..., Z_{M−1}.

(Figure: partition of the observation space into Z_0, Z_1, ..., Z_{M−1}.)

Since the subsets Z_i are exhaustive,

P(D = H_i | H_i) = ∫_{Z_i} f_{X/H_i}(x) dx = 1 − Σ_{j≠i} ∫_{Z_j} f_{X/H_i}(x) dx

We can write

C̄ = Σ_{i=0}^{M−1} C_ii P(H_i) + Σ_{i=0}^{M−1} ∫_{Z_i} Σ_{j≠i} P(H_j) (C_ij − C_jj) f_{X/H_j}(x) dx

Minimization is achieved by placing x in the region for which the integrand is minimum, i.e. we choose the region for x corresponding to the minimum value of

C_i(x) = Σ_{j≠i} P(H_j) (C_ij − C_jj) f_{X/H_j}(x)

Thus the decision rule is based on minimizing C_i(x) over H_i. Dividing by f_{X/H_0}(x), this is equivalent to minimizing

J_i(x) = Σ_{j≠i} P(H_j) (C_ij − C_jj) L_j(x),  i = 0, 1, 2, ..., M−1

where L_j(x) = f_{X/H_j}(x)/f_{X/H_0}(x). If C_ij = 1 for i ≠ j and C_jj = 0, then

C_i(x) = Σ_{j≠i} P(H_j) f_{X/H_j}(x)

and the above minimization corresponds to the minimization of the probability of error.

Rewriting C_i(x) using P(H_j) f_{X/H_j}(x) = P(H_j | x) f_X(x), we get

C_i(x) = Σ_{j≠i} P(H_j | x) f_X(x) = (1 − P(H_i | x)) f_X(x)

so minimizing C_i(x) is equivalent to maximizing the posterior probability P(H_i | x), and we arrive at the MAP criterion.

If the hypotheses are equally likely, i.e. P(H_0) = P(H_1) = ... = P(H_{M−1}) = P, then we can write

C_i(x) = P Σ_{j≠i} f_{X/H_j}(x) = P ( Σ_{j=0}^{M−1} f_{X/H_j}(x) − f_{X/H_i}(x) )

Therefore the minimization is equivalent to the maximization of the likelihood f_{X/H_i}(x).

Example:

x_i, i = 1, ..., n, iid Gaussian with unit variance and mean θ, with the equally likely hypotheses

H_0: θ = 0
H_1: θ = 1
H_2: θ = 2

Here

f_{X/H_i}(x) = (1/(2π)^{n/2}) e^{−Σ_j (x_j − θ_i)²/2}

We have to decide on the basis of T(x) = Σ_{j=1}^n x_j. Maximizing f_{X/H_i}(x) is equivalent to maximizing 2 θ_i T(x) − n θ_i², which gives

H_0: 0
H_1: 2 T(x) − n
H_2: 4 T(x) − 4n

Comparing these pairwise, the decision rule becomes

decide H_0 if T(x) < n/2
decide H_1 if n/2 ≤ T(x) < 3n/2
decide H_2 if T(x) ≥ 3n/2

Sequential Detection and Wald’s test

In many applications of decision making, observations are sequential in nature and the decision can be made sequentially on the basis of the available data.

We discuss the simple case of sequential binary hypothesis testing obtained by modifying the NP test. This test is called the sequential probability ratio test (SPRT) or Wald's test.

In the NP test, we had only one threshold η for the likelihood ratio L(x), given by

L(x) ≷ η

The threshold is determined from the given level of significance.

In the SPRT, L(x) is computed recursively and two thresholds η_0 and η_1 are used. The simple decision rule is:

If L(x) ≥ η_1, decide H_1.
If L(x) ≤ η_0, decide H_0.
If η_0 < L(x) < η_1, wait for the next sample to decide.

The algorithm stops when we get L(x) ≥ η_1 or L(x) ≤ η_0.

Consider X_i to be iid. Then, for the samples x_1, x_2, ..., x_n,

L_n = f_{X/H_1}(x_1, x_2, ..., x_n) / f_{X/H_0}(x_1, x_2, ..., x_n)
 = ∏_{i=1}^n f_{X/H_1}(x_i) / ∏_{i=1}^n f_{X/H_0}(x_i)
 = L_{n−1} ( f_{X/H_1}(x_n) / f_{X/H_0}(x_n) )
 = L_{n−1} L(x_n)

In terms of logarithms,

ln L_n = ln L_{n−1} + ln L(x_n)

Suppose the test requirement is

P_D ≥ 1 − β (equivalently P_M ≤ β) and P_FA ≤ α

We have to fix η_0 and η_1 on the basis of α and β.

Relation between (η_0, η_1) and (α, β):

We have

P_D = ∫_{Z_1} f_{X/H_1}(x) dx
 = ∫_{Z_1} L(x) f_{X/H_0}(x) dx
 ≥ η_1 ∫_{Z_1} f_{X/H_0}(x) dx = η_1 P_FA

so that 1 − β ≥ η_1 α, i.e. η_1 ≤ (1 − β)/α.

Similarly,

P_M = ∫_{Z_0} f_{X/H_1}(x) dx
 = ∫_{Z_0} L(x) f_{X/H_0}(x) dx
 ≤ η_0 ∫_{Z_0} f_{X/H_0}(x) dx = η_0 (1 − P_FA)

so that β ≤ η_0 (1 − α), i.e. η_0 ≥ β/(1 − α).

The average stopping time of the SPRT is optimal in the sense that, for given error levels, no test can perform better than the SPRT with an average number of samples less than that required by it. We may take the conservative values of η_0 and η_1 as

η_1 = (1 − β)/α and η_0 = β/(1 − α)

Example: Suppose

H_0: X_i ~ N(0, σ²)
H_1: X_i ~ N(μ, σ²)

Then

L(x_i) = e^{−(x_i − μ)²/2σ²} / e^{−x_i²/2σ²} = e^{(2μ x_i − μ²)/2σ²}

We can compute η_1 = (1 − β)/α and η_0 = β/(1 − α) and design the SPRT.
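A sketch of the SPRT for this Gaussian pair, with illustrative values μ = 1, σ² = 1 and α = β = 0.05 (these numbers are assumptions of the example, not given in the notes):

```python
import math
import random

# SPRT for H0: X ~ N(0,1) vs H1: X ~ N(mu,1).
# Log-likelihood ratio of one sample: ln L(x) = mu*x - mu**2/2.
def sprt(sample_stream, mu=1.0, alpha=0.05, beta=0.05):
    ln_eta1 = math.log((1 - beta) / alpha)   # upper (decide-H1) threshold
    ln_eta0 = math.log(beta / (1 - alpha))   # lower (decide-H0) threshold
    ln_L = 0.0
    n = 0
    for x in sample_stream:
        n += 1
        ln_L += mu * x - mu * mu / 2         # recursive update of ln L_n
        if ln_L >= ln_eta1:
            return "H1", n
        if ln_L <= ln_eta0:
            return "H0", n
    return None, n                           # undecided within the stream

random.seed(2)
# Data generated under H1 (mu = 1): the test should usually stop at H1.
wins = sum(
    sprt(random.gauss(1.0, 1.0) for _ in range(1000))[0] == "H1"
    for _ in range(200)
)
```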