Sufficient statistics. The Poisson and the exponential can be summarized by $(n, \bar y)$


Upload: bluma

Post on 11-Jan-2016


Page 1: Sufficient statistics. The Poisson and the exponential can be summarized by $(n, \bar y)$

More Chapter 4

Sufficient statistics.

The Poisson and the exponential can be summarized by $(n, \bar y)$.

So too can the normal with known variance.

Consider a statistic $S(Y)$.

Suppose that the conditional distribution of $Y$ given $S$ does not depend on $\theta$; then $S$ is a sufficient statistic for $\theta$ based on $Y$.

This occurs iff the density of $Y$ factors into a function of $s(y)$ and $\theta$ and a function of $y$ that doesn't depend on $\theta$:

$$f(y;\theta) = g\{s(y);\theta\}\,h(y)$$

Page 2

Example. Exponential

$Y \sim \mathrm{IExp}(\theta)$ (independent exponentials)

$E(Y) = \theta$, $\mathrm{Var}(Y) = \theta^2$

Data $y_1,\dots,y_n$

$L(\theta) = \prod_j \theta^{-1}\exp(-y_j/\theta)$

$l(\theta) = -n\log\theta - \sum_j y_j/\theta$

$\bar y = \sum_j y_j/n$ is sufficient

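Page 3

Likelihood equation: $U(\hat\theta) = 0$, where

$$U(\theta) = \frac{\partial l}{\partial\theta} = -\frac{n}{\theta} + \frac{n\bar y}{\theta^2}$$

m.l.e. $\hat\theta = \bar y$

$$\frac{\partial^2 l}{\partial\theta^2} = \frac{n}{\theta^2} - \frac{2n\bar y}{\theta^3}, \qquad \frac{\partial^2 l(\hat\theta)}{\partial\theta^2} = -\frac{n}{\bar y^2} < 0$$

so $\hat\theta$ is a maximum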

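Page 4

Observed information

$$J(\theta) = -\frac{\partial^2 l}{\partial\theta^2} = -\frac{n}{\theta^2} + \frac{2n\bar y}{\theta^3}$$

Fisher information

$$I(\theta_0) = E\{J(\theta_0)\} = -\frac{n}{\theta_0^2} + \frac{2n\theta_0}{\theta_0^3} = \frac{n}{\theta_0^2}$$

$$E\{U(\theta_0)\} = E\left\{-\frac{n}{\theta_0} + \frac{n\bar Y}{\theta_0^2}\right\} = 0$$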

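Page 5

$$\mathrm{Var}\{U(\theta_0)\} = I(\theta_0) = n/\theta_0^2$$

$$E\{J(\theta_0)\} = n/\theta_0^2$$

$$I(\theta_0)^{-1}J(\theta_0) = \frac{\theta_0^2}{n}\left(-\frac{n}{\theta_0^2} + \frac{2n\bar y}{\theta_0^3}\right) = -1 + \frac{2\bar y}{\theta_0} \to 1 \text{ in probability}$$

$$I(\theta_0)^{-1/2}U(\theta_0) = \frac{\theta_0}{\sqrt n}\left(-\frac{n}{\theta_0} + \frac{n\bar y}{\theta_0^2}\right) = \sqrt n\,(\bar y - \theta_0)/\theta_0 \to Z \text{ in distribution}$$

$$\hat\theta \;\dot\sim\; N(\theta_0, I(\theta_0)^{-1}) = N(\theta_0, \theta_0^2/n)$$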

Page 6

Approximate $100(1-2\alpha)\%$ CI for $\theta_0$

$$\hat\theta \pm z_\alpha J(\hat\theta)^{-1/2}$$

Could insert $I(\hat\theta)$

Here $I(\hat\theta) = J(\hat\theta) = n/\bar y^2$

Example. spring data

$\hat\theta = \bar y = 168.3$ kcycles

$I(\hat\theta) = J(\hat\theta) = n/\bar y^2 = 10/168.3^2 = .000353$

$168.3 \pm 1.96\,(.0188)^{-1}$, approximately $(64, 273)$ kcycles
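
The interval for the spring data can be checked numerically. The following is a minimal sketch, assuming only the summary values quoted on the slide ($\bar y = 168.3$ kcycles, $n = 10$) and the 1.96 normal quantile; the function name `wald_interval` is made up for illustration.

```python
import math

def wald_interval(ybar, n, z=1.96):
    """Approximate CI  theta_hat +/- z * J(theta_hat)^(-1/2)  for an exponential mean."""
    theta_hat = ybar                       # m.l.e. of the exponential mean
    obs_info = n / ybar ** 2               # J(theta_hat) = n / ybar^2
    half_width = z / math.sqrt(obs_info)   # z * J^(-1/2)
    return theta_hat - half_width, theta_hat + half_width, obs_info

# Summary values from the spring data slide
lower, upper, info = wald_interval(ybar=168.3, n=10)
print(f"J = {info:.6f}, CI = ({lower:.1f}, {upper:.1f}) kcycles")
```

The lower limit is far closer to zero than the upper limit is to any natural bound, which is one reason the likelihood ratio interval later in the slides is asymmetric.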

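Page 7

Weibull.

$$f(y;\theta,\alpha) = \frac{\alpha}{\theta}\left(\frac{y}{\theta}\right)^{\alpha-1}\exp\{-(y/\theta)^\alpha\}, \quad y > 0,\ \theta,\alpha > 0$$

Exponential if $\alpha = 1$

$$L(\theta,\alpha) = \prod_{j=1}^n f(y_j;\theta,\alpha) = \prod_{j=1}^n \frac{\alpha}{\theta}\left(\frac{y_j}{\theta}\right)^{\alpha-1}\exp\{-(y_j/\theta)^\alpha\}$$

$$l(\theta,\alpha) = n\log\alpha - n\alpha\log\theta + (\alpha-1)\sum_j\log y_j - \sum_j (y_j/\theta)^\alpha$$

$$U(\theta,\alpha) = \begin{pmatrix} -n\alpha/\theta + (\alpha/\theta)\sum_j (y_j/\theta)^\alpha \\ n/\alpha + \sum_j\log(y_j/\theta) - \sum_j (y_j/\theta)^\alpha\log(y_j/\theta) \end{pmatrix}$$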

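Page 8

Note.

$$\max_{\theta,\alpha} l(\theta,\alpha) = \max_\alpha l(\hat\theta_\alpha,\alpha) = \max_\alpha l_{\text{profile}}(\alpha), \qquad \hat\theta_\alpha = \Big(n^{-1}\sum_j y_j^\alpha\Big)^{1/\alpha}$$

1-D max problem

Expected information

$$I(\theta,\alpha) = n\begin{pmatrix} (\alpha/\theta)^2 & -\psi(2)/\theta \\ -\psi(2)/\theta & \{1+\psi'(2)+\psi(2)^2\}/\alpha^2 \end{pmatrix}, \qquad \psi(z) = d\log\Gamma(z)/dz$$

Want $I$ large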
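
The reduction to a one-dimensional problem can be sketched in code. This is an illustration under stated assumptions, not the method used for the slides: the data are simulated (the spring failure times are not listed here), the search is a plain golden-section maximization of the profile log likelihood, and all function names are hypothetical.

```python
import math
import random

def theta_hat_given_alpha(alpha, ys):
    """Closed-form maximizer over theta for fixed shape alpha."""
    return (sum(y ** alpha for y in ys) / len(ys)) ** (1.0 / alpha)

def profile_loglik(alpha, ys):
    """Weibull log likelihood with theta replaced by theta_hat_alpha."""
    n = len(ys)
    theta = theta_hat_given_alpha(alpha, ys)
    # At theta_hat_alpha the term sum((y/theta)^alpha) equals n.
    return (n * math.log(alpha) - n * alpha * math.log(theta)
            + (alpha - 1.0) * sum(math.log(y) for y in ys) - n)

def fit_weibull(ys, lo=0.05, hi=50.0, tol=1e-7):
    """Golden-section search for the alpha maximizing the profile log likelihood."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - g * (b - a)
        d = a + g * (b - a)
        if profile_loglik(c, ys) > profile_loglik(d, ys):
            b = d   # maximum lies in [a, d]
        else:
            a = c   # maximum lies in [c, b]
    alpha_hat = 0.5 * (a + b)
    return theta_hat_given_alpha(alpha_hat, ys), alpha_hat

# Simulated sample: scale 181 and shape 6 are arbitrary illustrative values.
random.seed(1)
sample = [random.weibullvariate(181.0, 6.0) for _ in range(2000)]
theta_hat, alpha_hat = fit_weibull(sample)
print(f"theta_hat = {theta_hat:.1f}, alpha_hat = {alpha_hat:.2f}")
```

Golden-section search assumes the profile log likelihood has a single peak on the search interval; a grid plot of `profile_loglik` is a cheap way to confirm that for a given data set.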

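Page 9

Gamma.

$$f(y;\lambda,\kappa) = \frac{\lambda^\kappa y^{\kappa-1}}{\Gamma(\kappa)}\exp(-\lambda y), \quad y > 0,\ \lambda,\kappa > 0$$

$$L(\lambda,\kappa) = \prod_{j=1}^n \frac{\lambda^\kappa y_j^{\kappa-1}}{\Gamma(\kappa)}\exp(-\lambda y_j)$$

$$l(\lambda,\kappa) = n\kappa\log\lambda - n\log\Gamma(\kappa) + (\kappa-1)\sum_j\log y_j - \lambda\sum_j y_j$$

$S = (\sum_j\log y_j, \sum_j y_j)$ is sufficient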

Page 10

Example. Bernoulli

$\Pr\{Y = 1\} = 1 - \Pr\{Y = 0\} = \pi$, $0 \le \pi \le 1$

$L(\pi) = \prod_j \pi^{y_j}(1-\pi)^{1-y_j} = \pi^r(1-\pi)^{n-r}$

$l(\pi) = r\log\pi + (n-r)\log(1-\pi)$

$r = \sum_j y_j$

$R = \sum_j Y_j$ is sufficient for $\pi$, as is $R/n$

$L(\pi)$ factors into a function of $r$ and a constant

Page 11

Score vector

$U(\pi) = \sum_j y_j/\pi - (n - \sum_j y_j)/(1-\pi)$

Observed information

$J(\pi) = \sum_j y_j/\pi^2 + (n - \sum_j y_j)/(1-\pi)^2$

M.l.e.

$\hat\pi = \sum_j y_j/n$

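Page 12

Cauchy.

$Y \sim \mathrm{ICau}(\theta)$

$f(y;\theta) = 1/[\pi\{1+(y-\theta)^2\}]$

$E|Y| = \mathrm{Var}(Y) = \infty$

$L(\theta) = \prod_j 1/[\pi\{1+(y_j-\theta)^2\}]$

Many local maxima

$l(\theta) = -\sum_j \log\{1+(y_j-\theta)^2\}$, dropping the constant $-n\log\pi$

$J(\theta) = 2\sum_j \{1-(y_j-\theta)^2\}/\{1+(y_j-\theta)^2\}^2$, $I(\theta) = n/2$

$y_{(1)},\dots,y_{(n)}$ is sufficient

$Z_J = J(\hat\theta)^{1/2}(\hat\theta - \theta_0)$, $Z_I = I(\hat\theta)^{1/2}(\hat\theta - \theta_0)$

$Z_J$ is closer to $N(0,1)$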
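
The remark about many local maxima is easy to see on a toy sample. The sketch below is illustrative only: the three observations are invented and deliberately far apart, and the grid scan is a crude device for counting peaks rather than a recommended fitting method.

```python
import math

def cauchy_loglik(theta, ys):
    """Cauchy location log likelihood, constant -n*log(pi) dropped."""
    return -sum(math.log(1.0 + (y - theta) ** 2) for y in ys)

def local_maxima_on_grid(ys, lo, hi, step=0.01):
    """Grid points whose log likelihood exceeds both neighbours."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    values = [cauchy_loglik(t, ys) for t in grid]
    return [grid[i] for i in range(1, len(grid) - 1)
            if values[i] > values[i - 1] and values[i] > values[i + 1]]

# Hypothetical, widely separated observations
toy_sample = [-20.0, 0.0, 20.0]
peaks = local_maxima_on_grid(toy_sample, -30.0, 30.0)
print([round(p, 2) for p in peaks])
```

Each observation produces its own peak here, so a Newton iteration on the likelihood equation can converge to different roots depending on the starting value.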

Page 13
Page 14

Uniform.

$f(u;\theta) = 1/\theta$ for $0 < u < \theta$

$= 0$ otherwise

$L(\theta) = 1/\theta^n$ for $0 < y_1,\dots,y_n < \theta$

$= 0$ otherwise

$\hat\theta = \max(y_j) = y_{(n)}$, the largest of the order statistics $y_{(1)},\dots,y_{(n)}$

$dl(\hat\theta)/d\theta = -n/\hat\theta \ne 0$

$d^2l(\hat\theta)/d\theta^2 = n/\hat\theta^2 > 0$

Page 15

$l(\theta)$ becomes increasingly spikey

$E\,u(\theta) = -\theta^{-1}$, $i(\theta) = -\theta^{-2}$

$$\Pr\{\hat\theta \le a\} = (a/\theta_0)^n, \quad 0 < a \le \theta_0$$

$$n(\theta_0 - \hat\theta)/\theta_0 \to \text{Exponential in distribution}$$
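
A short simulation makes the non-normal limit concrete. This is a sketch under assumptions chosen for illustration ($n = 50$, $\theta_0 = 2$, 4000 replications, fixed seed); it checks that $n(\theta_0 - \hat\theta)/\theta_0$ behaves like a unit-mean exponential variable.

```python
import math
import random

def scaled_gap(n, theta0, rng):
    """n * (theta0 - theta_hat) / theta0, with theta_hat the sample maximum."""
    theta_hat = max(rng.uniform(0.0, theta0) for _ in range(n))
    return n * (theta0 - theta_hat) / theta0

rng = random.Random(0)
gaps = [scaled_gap(n=50, theta0=2.0, rng=rng) for _ in range(4000)]

mean_gap = sum(gaps) / len(gaps)
frac_above_one = sum(g > 1.0 for g in gaps) / len(gaps)
print(f"mean = {mean_gap:.3f} (limit 1), "
      f"Pr(gap > 1) = {frac_above_one:.3f} (limit {math.exp(-1):.3f})")
```

The error in $\hat\theta$ shrinks like $1/n$ rather than $1/\sqrt n$, which is why the usual information-based normal approximation does not apply.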

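Page 16

Logistic regression. Challenger data

Independent binomials $R_j$, with denominators $m_j$ and probabilities $\pi_j$

$$\pi_j = \frac{\exp\{\beta_0 + \beta_1 x_{1j}\}}{1 + \exp\{\beta_0 + \beta_1 x_{1j}\}}$$

$$L(\beta_0,\beta_1) = \prod_j \Pr(R_j = r_j;\beta_0,\beta_1) = \prod_j \frac{m_j!}{r_j!\,(m_j - r_j)!}\,\frac{\exp\{r_j(\beta_0 + \beta_1 x_{1j})\}}{\{1+\exp(\beta_0+\beta_1 x_{1j})\}^{m_j}}$$

Sufficient statistic

$$S = \Big(\sum_j R_j, \sum_j R_j x_{1j}\Big)$$

Confidence region

$$\{\beta: (\hat\beta - \beta)^T J(\hat\beta_0,\hat\beta_1)(\hat\beta - \beta) \le c_2(1-2\alpha)\}$$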

Page 17
Page 18

Likelihood ratio.

Model includes $\theta$, $\dim(\theta) = p$

true (unknown) value $\theta_0$

Likelihood ratio statistic

$$W(\theta_0) = 2\{l(\hat\theta) - l(\theta_0)\}$$

$$W(\theta_0) \to \chi^2_p \text{ in distribution as } I(\theta_0) \to \infty$$

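Page 19

Justification.

Multinormal result

If $Y \sim N_p(\mu,\Sigma)$ then $(Y-\mu)^T\Sigma^{-1}(Y-\mu) \sim \chi^2_p$

$$W(\theta_0) = 2\{l(\hat\theta) - l(\theta_0)\}$$

$$\approx 2\Big\{l(\hat\theta) - l(\hat\theta) - (\theta_0-\hat\theta)^T\frac{\partial l(\hat\theta)}{\partial\theta} - \tfrac12(\theta_0-\hat\theta)^T\frac{\partial^2 l(\hat\theta)}{\partial\theta\,\partial\theta^T}(\theta_0-\hat\theta)\Big\}$$

$$= (\hat\theta-\theta_0)^T J(\hat\theta)(\hat\theta-\theta_0)$$

$$\approx (\hat\theta-\theta_0)^T I(\theta_0)(\hat\theta-\theta_0)$$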

Page 20

Uses.

$\Pr[W(\theta_0) \le c_p(1-2\alpha)] \approx 1-2\alpha$

Approx $100(1-2\alpha)\%$ confidence region

$$\{\theta: W(\theta) \le c_p(1-2\alpha)\} = \{\theta: l(\theta) \ge l(\hat\theta) - \tfrac12 c_p(1-2\alpha)\}$$

Page 21

Example. exponential

$l(\theta) = -n\log\theta - n\bar y/\theta$

$l(\hat\theta) = -n\log\bar y - n$

$W(\theta) = 2\{l(\hat\theta) - l(\theta)\} = 2n\{\bar y/\theta - 1 - \log(\bar y/\theta)\}$

$p = 1$, $c_1(0.95) = 3.84$

$\{\theta: 2n\{\bar y/\theta - 1 - \log(\bar y/\theta)\} \le 3.84\}$

Spring data: $96 < \theta < 335$

vs. asymp normal approx $64 < \theta < 273$ kcycles
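
The likelihood ratio region has no closed form, but its endpoints are the two roots of $W(\theta) = 3.84$ on either side of $\bar y$. A minimal sketch, assuming the spring data summaries $\bar y = 168.3$ and $n = 10$ and using plain bisection (the helper names are hypothetical):

```python
import math

def lr_stat(theta, ybar, n):
    """W(theta) = 2n{ybar/theta - 1 - log(ybar/theta)} for the exponential model."""
    r = ybar / theta
    return 2.0 * n * (r - 1.0 - math.log(r))

def bisect_root(f, a, b, tol=1e-9):
    """Root of f in [a, b], assuming f(a) and f(b) have opposite signs."""
    fa = f(a)
    while b - a > tol:
        mid = 0.5 * (a + b)
        fm = f(mid)
        if (fa > 0) == (fm > 0):
            a, fa = mid, fm
        else:
            b = mid
    return 0.5 * (a + b)

def lr_interval(ybar, n, crit=3.84):
    """Endpoints of {theta: W(theta) <= crit}; W is 0 at theta = ybar."""
    g = lambda theta: lr_stat(theta, ybar, n) - crit
    lower = bisect_root(g, ybar / 100.0, ybar)
    upper = bisect_root(g, ybar, ybar * 100.0)
    return lower, upper

lower, upper = lr_interval(ybar=168.3, n=10)
print(f"{lower:.1f} < theta < {upper:.1f} kcycles")
```

The bracketing intervals `[ybar/100, ybar]` and `[ybar, 100*ybar]` are an arbitrary but safe choice here, since $W$ decreases to 0 at $\bar y$ and increases on either side.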

Page 22

Prob-value/P-value. See (7.28)

Choose $T$ whose large values cast doubt on $H_0$

$\Pr_0(T \ge t_{obs})$

Example. Spring data

Exponential $E(Y) = \theta$

$H_0: \theta = 100$?

$\hat\theta = \bar y = 168.3$, $n = 10$

$I(\hat\theta) = J(\hat\theta) = n/\bar y^2$

$W(\theta) = 2n\{\bar y/\theta - 1 - \log(\bar y/\theta)\}$

$W(100) = 3.248$

P-value $= \Pr_0(\chi^2_1 \ge 3.248) = \Pr(|Z| \ge 1.802) = 2 \times .03576 = .0715$
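
The test of $\theta = 100$ can be reproduced in a few lines. This sketch assumes the same summaries as the slide and uses the identity $\Pr(\chi^2_1 > w) = \Pr(|Z| > \sqrt w) = \mathrm{erfc}(\sqrt{w/2})$; the function names are illustrative.

```python
import math

def lr_stat(theta, ybar, n):
    """Exponential-model likelihood ratio statistic W(theta)."""
    r = ybar / theta
    return 2.0 * n * (r - 1.0 - math.log(r))

def chi2_1_tail(w):
    """Pr(chi-squared with 1 d.f. > w), via the complementary error function."""
    return math.erfc(math.sqrt(w / 2.0))

w100 = lr_stat(100.0, ybar=168.3, n=10)
p_value = chi2_1_tail(w100)
print(f"W(100) = {w100:.3f}, |z| = {math.sqrt(w100):.3f}, P-value = {p_value:.4f}")
```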

Page 23

Nesting

$\psi$: p by 1 parameter of interest

$\lambda$: q by 1 nuisance parameter

Model with params $(\psi_0, \lambda)$ nested within $(\psi, \lambda)$

Second model reduces to first when $\psi = \psi_0$

Note. $l(\hat\psi,\hat\lambda) \ge l(\psi_0,\hat\lambda_0)$, where $\hat\lambda_0$ maximizes $l(\psi_0,\lambda)$

Page 24

Example. Weibull

params $(\theta,\alpha)$

exponential when $\alpha = 1$

How to examine $H_0: \alpha = 1$?

$$W_p(\psi_0) = 2[l(\hat\psi,\hat\lambda) - l(\psi_0,\hat\lambda_0)] \to \chi^2_p \text{ in distribution}, \quad p = 1$$

Page 25

Spring failure times. Weibull

$(\hat\theta,\hat\alpha) = (181, 6)$

$L(181,6) = 6.227\mathrm{E}\ldots$, $l(181,6) = -48.75$

$\max_\theta L(\theta,1) = L(168,1) = 2.2749\mathrm{E}\ldots$, $l(168,1) = -61.26$

$W = 2[l(181,6) - l(168,1)] = 25.02$

P-value $= \Pr(\chi^2_1 > 25.02) = \Pr(|Z| > 5.00) = 2(2.867\text{E-07}) = 5.73\text{E-07}$

Page 26

Challenger data. Logistic regression

temperature $x_1$, pressure $x_2$

$\pi(\beta_0,\beta_1,\beta_2) = \exp\{\eta\}/(1+\exp\{\eta\})$

$\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, the linear predictor

loglike $l(\beta_0,\beta_1,\beta_2) = \beta_0\sum_j r_j + \beta_1\sum_j r_j x_{1j} + \beta_2\sum_j r_j x_{2j} - m\sum_j\log(1+\exp\{\eta_j\})$

Does pressure matter?

$\max_{\beta_0,\beta_1} l(\beta_0,\beta_1,0) = -15.82$

$\max_{\beta_0,\beta_1,\beta_2} l(\beta_0,\beta_1,\beta_2) = -15.05$

$W = 2 \times .77 = 1.54$

P-value: $\Pr(\chi^2_1 > 1.54) = \Pr(|Z| > 1.24) = 2(.107) = .214$

Page 27

Model fit.

Are labor times Weibull?

Nest its model in a more general one

Generalized gamma.

$$f(y;\lambda,\alpha,\kappa) = \frac{\alpha\lambda^{\alpha\kappa}y^{\alpha\kappa-1}}{\Gamma(\kappa)}\exp\{-(\lambda y)^\alpha\}, \quad y > 0,\ \lambda,\alpha,\kappa > 0$$

Gamma for $\alpha = 1$

Weibull for $\kappa = 1$

Exponential for $\alpha = \kappa = 1$

Page 28

Likelihood results.

max log likelihood:

generalized gamma -250.65

gamma -251.12

Weibull -251.17

gamma vs. generalized gamma

2 log like diff:

$2(-250.65 + 251.12) = .94$

P-value $\Pr_0(\chi^2_1 > .94)$

$= \Pr(|Z| > .969)$

$= 2(.166) = .332$
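
The three one-parameter nested comparisons worked on the slides (spring data, Challenger data, labor times) follow the same recipe: double the difference in maximized log likelihoods and refer it to $\chi^2_1$. A minimal sketch, assuming the maximized log likelihoods as read from the slides (the signs of the first two pairs are taken to be negative) and a hypothetical helper `lr_test_1df`:

```python
import math

def lr_test_1df(loglik_full, loglik_reduced):
    """Return (W, P-value) for one extra parameter, W referred to chi-squared(1)."""
    w = 2.0 * (loglik_full - loglik_reduced)
    return w, math.erfc(math.sqrt(w / 2.0))

# (maximized log likelihood of larger model, of nested model)
comparisons = {
    "spring data: Weibull vs exponential": (-48.75, -61.26),
    "Challenger: with vs without pressure": (-15.05, -15.82),
    "labor times: generalized gamma vs gamma": (-250.65, -251.12),
}

results = {name: lr_test_1df(full, reduced)
           for name, (full, reduced) in comparisons.items()}
for name, (w, p) in results.items():
    print(f"{name}: W = {w:.2f}, P-value = {p:.3g}")
```

Only the first comparison gives strong evidence against the simpler model; small differences from the slide values come from rounding $|z|$ before looking up the tail area.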

Page 29

Chi-squared statistics. Pearson's chi-squared

categories $1,\dots,k$

count of cases in category $i$: $Y_i$

$\Pr(\text{case in } i) = \pi_i$, $0 < \pi_i < 1$, $\sum_1^k \pi_i = 1$

$E(Y_i) = n\pi_i$

$\mathrm{var}(Y_i) = n\pi_i(1-\pi_i)$

$\mathrm{cov}(Y_i,Y_j) = -n\pi_i\pi_j$, $i \ne j$

E.g. $k=2$ case: $\mathrm{cov}(Y, n-Y) = -\mathrm{var}(Y) = -n\pi_1\pi_2$

Parameter space $\{(\pi_1,\dots,\pi_k): \sum_1^k\pi_i = 1,\ 0 < \pi_1,\dots,\pi_k < 1\}$

dimension $k-1$

Page 30

Reduced dimension possible?

model $\pi_i(\theta)$, $\dim(\theta) = p$

log like general model:

$\sum_1^{k-1} y_i\log\pi_i + y_k\log[1-\pi_1-\dots-\pi_{k-1}]$, $\sum_1^k y_i = n$

$\hat\pi_i = Y_i/n$

log like restricted model:

$l(\theta) = \sum_1^{k-1} y_i\log\pi_i(\theta) + y_k\log[1-\pi_1(\theta)-\dots-\pi_{k-1}(\theta)]$

Page 31

likelihood ratio statistic:

$$W = 2\sum_{i=1}^k y_i\log\{\hat\pi_i/\pi_i(\hat\theta)\} \;\dot\sim\; \chi^2_{k-1-p}$$

if restricted model true

The statistic is sometimes written

$W = 2\sum_i O_i\log(O_i/E_i) \approx \sum_i (O_i - E_i)^2/E_i$

where $O_i = y_i$, $E_i = n\pi_i(\hat\theta)$

Page 32

Pearson's chi-squared.

$$P = \sum_{i=1}^k \frac{\{y_i - n\pi_i(\hat\theta)\}^2}{n\pi_i(\hat\theta)} \;\dot\sim\; \chi^2_{k-1-p}$$

recommendation: $n\pi_i(\hat\theta)$ at least about 5

Page 33

Example. Birth data. Poisson?

Daily arrivals, $n = 92$, $\hat\theta = 12.9$

Split into k=13 categories

[0,7.5), [7.5,8.5),...[18.5,24] hours

O(bserved) 6 3 3 8 ...

E(xpected) 5.23 4.37 6.26 8.08 ...

P = 4.39

P-value $\Pr(\chi^2_{11} > 4.39) = .96$
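
The first few expected counts can be recomputed from the fitted Poisson mean. This sketch assumes the category boundaries are on the scale of the daily arrival counts (so the first category is counts 0 to 7, the next is 8, and so on), an interpretation that reproduces the listed values; only the four expected counts shown on the slide are checked.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of count k, computed on the log scale."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

n_days, mean_hat = 92, 12.9

# Category [0, 7.5) pools counts 0..7; the next three categories are 8, 9, 10.
expected = [n_days * sum(poisson_pmf(k, mean_hat) for k in range(0, 8))]
expected += [n_days * poisson_pmf(k, mean_hat) for k in (8, 9, 10)]
print([round(e, 2) for e in expected])
```

With $k = 13$ categories and one fitted parameter, the reference distribution has $13 - 1 - 1 = 11$ degrees of freedom, as on the slide.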

Page 34

Two way contingency table.

r rows and c columns

n individuals

Blood groups A, B, AB, O

A, B antigens - substance causing body to produce antibodies

group   count   model I                  model II
A       179     $\alpha(1-\beta)$        $\lambda_A^2 + 2\lambda_A\lambda_O$
B       35      $(1-\alpha)\beta$        $\lambda_B^2 + 2\lambda_B\lambda_O$
AB      6       $\alpha\beta$            $2\lambda_A\lambda_B$
O       202     $(1-\alpha)(1-\beta)$    $\lambda_O^2$

$\lambda_O = 1 - \lambda_A - \lambda_B$

Page 35
Page 36

Question. Rows and columns independent?

$W = 2\sum_{i,j} y_{ij}\log\{n y_{ij}/(y_{i\cdot}\,y_{\cdot j})\}$

with $y_{i\cdot} = \sum_j y_{ij}$, $y_{\cdot j} = \sum_i y_{ij}$

$W \;\dot\sim\; \chi^2_{k-1-p} = \chi^2_{(r-1)(c-1)}$

with $k = rc$, $p = (r-1)+(c-1)$

$P = \sum_{i,j} (y_{ij} - y_{i\cdot}\,y_{\cdot j}/n)^2/(y_{i\cdot}\,y_{\cdot j}/n)$

$P \;\dot\sim\; \chi^2_{(r-1)(c-1)}$

Page 37

Model 1

W = 17.66

$\Pr(\chi^2_1 > 17.66) = \Pr(|Z| > 4.202) = 2.646\text{E-05}$

P = 15.73

$\Pr(\chi^2_1 > 15.73) = \Pr(|Z| > 3.966) = 7.309\text{E-05}$

k-1-p = 4-1-2 = 1

Model 2

W = 3.17

Pr(|Z| > 1.780) = .075

P = 2.82

Pr(|Z|>1.679) = .093
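
The Model 1 statistics can be recomputed from the four blood group counts. The sketch below rests on a reading of Model 1 as independent occurrence of the A and B antigens, with probabilities called `a` and `b` here (symbol names are an assumption); under that reading the m.l.e.s are the marginal proportions, and the results agree with the W and P quoted for Model 1.

```python
import math

counts = {"A": 179, "B": 35, "AB": 6, "O": 202}
n = sum(counts.values())

# Assumed Model 1: antigen A present with prob. a, antigen B with prob. b,
# independently. The m.l.e.s are then the marginal proportions.
a_hat = (counts["A"] + counts["AB"]) / n
b_hat = (counts["B"] + counts["AB"]) / n

expected = {
    "A": n * a_hat * (1 - b_hat),
    "B": n * (1 - a_hat) * b_hat,
    "AB": n * a_hat * b_hat,
    "O": n * (1 - a_hat) * (1 - b_hat),
}

# Likelihood ratio statistic 2*sum(O*log(O/E)) and Pearson's sum((O-E)^2/E)
W = 2 * sum(o * math.log(o / expected[g]) for g, o in counts.items())
P = sum((o - expected[g]) ** 2 / expected[g] for g, o in counts.items())
print(f"W = {W:.2f}, P = {P:.2f}, d.f. = {len(counts) - 1 - 2}")
```

Model 2 needs a numerical maximization over $(\lambda_A, \lambda_B)$ and is not attempted in this sketch.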

Page 38

Incorrect model.

True model $g(y)$, fit $f(y;\theta)$

$\hat\theta$ maximizes $l(\theta) = \sum_j \log f(y_j;\theta)$

$l(\hat\theta)/n \to \int \log f(y;\theta_g)\,g(y)\,dy$ in probability

$\theta_g$ minimizes $D(f_\theta, g) = \int \log\{g(y)/f(y;\theta)\}\,g(y)\,dy$

Kullback-Leibler discrepancy

$D(f_\theta, g) \ge 0$, with $D = 0$ only when $f_\theta = g$

$\theta_g$: "least bad" value

Page 39

Example 1. Quadratic, fit linear

Page 40

Example 2. True lognormal, but fit exponential

Lognormal $Y = \exp\{\sigma Z\}$, $Z \sim N(0,1)$

Exponential loglike: $-\log\theta - y/\theta$

Minimizing $\int \log\{g(y)/f(y;\theta)\}\,g(y)\,dy$ gives $\theta_g = \exp\{\sigma^2/2\}$

$\hat\theta = \bar Y$, $E(\bar Y) = \exp\{\sigma^2/2\}$, $\mathrm{var}(\bar Y) = \theta_g^2(\theta_g^2 - 1)/n$
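
A simulation illustrates the "least bad" value when an exponential model is fitted to lognormal data. This is a sketch under assumptions: $\sigma = 1$, a sample of 100000, and a fixed seed are arbitrary choices, and the lognormal is taken as $\exp\{\sigma Z\}$ with $Z$ standard normal.

```python
import math
import random

def exponential_mle(ys):
    """M.l.e. of the exponential mean: the sample average."""
    return sum(ys) / len(ys)

rng = random.Random(42)
sigma = 1.0
sample = [rng.lognormvariate(0.0, sigma) for _ in range(100_000)]

theta_hat = exponential_mle(sample)
theta_g = math.exp(sigma ** 2 / 2)   # minimizer of the Kullback-Leibler discrepancy
print(f"theta_hat = {theta_hat:.3f}, theta_g = {theta_g:.3f}")
```

The estimate settles at the lognormal mean $\exp\{\sigma^2/2\}$ even though no exponential distribution generated the data, which is the sense in which $\theta_g$ is the target of the fitted model.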

Page 41

Large sample distribution.

$$\hat\theta \;\dot\sim\; N_p\big(\theta_g;\ I(\theta_g)^{-1}K(\theta_g)I(\theta_g)^{-1}\big)$$

the usual mle result if $g(y) = f(y;\theta)$

Page 42

Model selection.

Various models:

non-nested

Ockham's razor.

Prefer the simplest model

Page 43

Formal criteria.

$$\mathrm{AIC} = 2\{-l(\hat\theta) + p\}$$

$$\mathrm{BIC} = 2\{-l(\hat\theta) + \tfrac12 p\log n\}$$

Look for minimum
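
The two criteria differ only in the penalty per parameter. A minimal sketch, using the standard definitions $\mathrm{AIC} = -2l(\hat\theta) + 2p$ and $\mathrm{BIC} = -2l(\hat\theta) + p\log n$; the maximized log likelihood, parameter count, and sample size in the example are hypothetical values chosen for illustration.

```python
import math

def aic(max_loglik, p):
    """AIC = 2{-l(theta_hat) + p}."""
    return 2.0 * (-max_loglik + p)

def bic(max_loglik, p, n):
    """BIC = 2{-l(theta_hat) + (p/2) log n}."""
    return 2.0 * (-max_loglik + 0.5 * p * math.log(n))

# Hypothetical fit: 12 parameters, 60 observations, maximized log likelihood -360.4
example = dict(max_loglik=-360.4, p=12, n=60)
aic_value = aic(example["max_loglik"], example["p"])
bic_value = bic(**example)
print(f"AIC = {aic_value:.1f}, BIC = {bic_value:.1f}")
```

Because $\log n > 2$ once $n \ge 8$, BIC penalizes each extra parameter more heavily than AIC for all but very small samples and so tends to select smaller models.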

Page 44

Example. Spring failure

Model   p    AIC      BIC
M1      12   744.8*   769.9*
M2      7    771.8    786.5
M3      2    827.8    831.2
M4      2    925.1    929.3

6 stress levels

M1: Weibull - unconnected $\alpha$, $\theta$ at each stress level