TRANSCRIPT
Information Geometry of Statistical Inference
Statistical model: $M = \{p(x, \theta)\}$
Observed data: $D = \{x_1, x_2, \dots, x_N\}$, iid
Estimation
Hypothesis testing
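Exponential family:
$p(x, \theta) = \exp\{\theta \cdot x - \psi(\theta)\}$
$p(D, \theta) = \exp\{N(\theta \cdot \bar{x} - \psi(\theta))\}$
$\bar{x} = \frac{1}{N} \sum_i x_i$: 1 observed point, $\bar{x} = \hat{\eta}$
$\eta = E[x]$, $\varphi(\eta)$: negentropy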
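A minimal sketch of the exponential-family picture, using the Bernoulli distribution as an assumed example (the slides do not name a specific family). The data enter only through the sample mean, which is the single observed point in expectation coordinates. The function names and the sample `xs` are hypothetical.

```python
import math

def log_partition(theta):
    # psi(theta) for the Bernoulli family p(x, theta) = exp{theta*x - psi(theta)}
    return math.log1p(math.exp(theta))

def expectation_parameter(theta):
    # eta = E[x] = d psi / d theta
    return 1.0 / (1.0 + math.exp(-theta))

def negentropy(eta):
    # Legendre dual of psi, the negative Shannon entropy of Bernoulli(eta)
    return eta * math.log(eta) + (1.0 - eta) * math.log(1.0 - eta)

def mle(xs):
    # The MLE depends on the data only through the sample mean (the observed point)
    eta_hat = sum(xs) / len(xs)
    theta_hat = math.log(eta_hat / (1.0 - eta_hat))
    return theta_hat, eta_hat

xs = [1, 0, 1, 1, 0, 1, 0, 1]  # illustrative data, not from the slides
theta_hat, eta_hat = mle(xs)

# Legendre duality: psi(theta) + negentropy(eta) - theta*eta vanishes at matching points
legendre_gap = log_partition(theta_hat) + negentropy(eta_hat) - theta_hat * eta_hat
```

The two coordinate systems are tied together by the Legendre relation, which is why `legendre_gap` is zero up to rounding at the MLE.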
MLE (maximum likelihood estimator)
Estimation error
Cramér-Rao bound
Curved exponential family:
$p(x, u) = p(x, \theta(u))$, observations $x_1, x_2, \dots, x_n$
$p(D, u) = \exp\{N(\theta(u) \cdot \bar{x} - \psi(\theta(u)))\}$
$\hat{u}$: estimator, $\bar{x} \mapsto \hat{u} = \hat{u}(x_1, \dots, x_n)$
Ancillary family
Estimator: $\hat{u} = f(\bar{x})$
Ancillary family: $A(u)$
Mathematical Analysis of Error
(u, v)-coordinates along A(u)
[Figure: (u, v)-coordinate system attached to the ancillary family A(u)]
Consistent
Efficient
Higher‐order efficient
The MLE is consistent and efficient
Efficient estimator: orthogonal projection
High-Order Asymptotics
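$p(x, \theta(u))$: $x_1, \dots, x_n$
$\hat{u} = \hat{u}(x_1, \dots, x_n)$, $\hat{u}: \bar{x} \mapsto \hat{u}$
$e = E[(\hat{u} - u)(\hat{u} - u)^T]$
$e = \frac{1}{n} G_1 + \frac{1}{n^2} G_2$
$G_1 = G^{-1}$: Cramér-Rao, linear theory
$G_2 = (H^{e}_{M})^2 + (H^{m}_{A})^2 + (\Gamma^{m})^2$: quadratic approximation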
Hypothesis Testing
Neyman‐Scott Problem
Estimation with nuisance parameter: $M = \{p(x, u, v)\}$
Efficient score
Neyman‐Scott problem
$M = \{p(x, u, v)\}$
$x_1 \sim p(x, u, v_1)$, $x_2 \sim p(x, u, v_2)$, …, $x_N \sim p(x, u, v_N)$
u: parameter of interest
v: nuisance parameter
Semiparametric Statistical Model
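$M = \{p(x; \theta, Z)\}$, $Z = Z(\xi)$
Linear relation: $y = \theta x$
$x_i = \xi_i + \varepsilon_i$
$y_i = \theta \xi_i + \varepsilon'_i$
mle, least squares, total least squares
$p(x, y; \theta, Z) = \int p(x, y; \theta, \xi) Z(\xi) \, d\xi$
[Figure: observed points $(x_i, y_i)$ scattered around the line $y = \theta x$ in the $(x, y)$ plane]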
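Statistical Model
$p(x, y; \theta, \xi) = c \exp\{-\frac{1}{2}(x - \xi)^2 - \frac{1}{2}(y - \theta \xi)^2\}$
$p(x_i, y_i; \theta, \xi_i)$: parameters $\theta, \xi_1, \dots, \xi_n$
semiparametric: $p(x, y; \theta, Z) = \int p(x, y; \theta, \xi) Z(\xi) \, d\xi$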
Least squares?
$L = \frac{1}{2} \sum_i (y_i - \theta x_i)^2 \to \min$: $\hat{\theta} = \frac{\sum_i x_i y_i}{\sum_i x_i^2}$
mle, TLS (Neyman-Scott): $\sum_i (y_i - \theta x_i)(\theta y_i + x_i) = 0$
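A minimal sketch comparing ordinary least squares with total least squares for the line through the origin y = θx when both coordinates carry unit-variance Gaussian noise. The TLS slope is computed as a root of the estimating equation Σ(y_i − θx_i)(θy_i + x_i) = 0, which is a quadratic in θ. The uniform distribution of the hidden ξ_i, the sample size, the seed, and all function names are assumptions made for illustration.

```python
import math
import random

def least_squares_slope(xs, ys):
    # Minimizes sum (y_i - theta*x_i)^2, ignoring the noise in x
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def tls_slope(xs, ys):
    # Expanding sum (y - theta*x)(theta*y + x) = 0 gives
    #   sxy*theta^2 - (syy - sxx)*theta - sxy = 0;
    # the root with the same sign as sxy minimizes the orthogonal distance.
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    d = syy - sxx
    return (d + math.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)

def simulate(theta, n, seed=0):
    # x_i = xi_i + noise, y_i = theta*xi_i + noise, with hidden xi_i (assumed uniform)
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        xi = rng.uniform(-3.0, 3.0)
        xs.append(xi + rng.gauss(0.0, 1.0))
        ys.append(theta * xi + rng.gauss(0.0, 1.0))
    return xs, ys

xs, ys = simulate(theta=2.0, n=20000)
theta_ls = least_squares_slope(xs, ys)
theta_tls = tls_slope(xs, ys)
tls_residual = sum((y - theta_tls * x) * (theta_tls * y + x) for x, y in zip(xs, ys)) / len(xs)
```

Under these assumptions the least-squares slope is biased toward zero because the noise in x inflates Σx_i², while the TLS root stays close to the true slope.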
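Semiparametric statistical model: $x_1, x_2, \dots \sim p(x; \theta, Z)$
Estimating function $f(x, \theta)$:
$E_{\theta, Z}[f(x, \theta)] = 0$ for every $Z$: unbiased
$E_{\theta', Z}[f(x, \theta)] \neq 0$ for $\theta' \neq \theta$
Estimating equation: $\sum_{i=1}^{n} f(x_i, \hat{\theta}) = 0$, $\hat{\theta} = \theta + e$
$E[(\hat{\theta} - \theta)^2] \approx \frac{1}{n} \frac{E[f^2]}{(E[\partial_\theta f])^2}$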
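A minimal sketch of solving an estimating equation Σ f(x_i, θ) = 0 by bisection and of the plug-in version of the asymptotic variance E[f²] / (n (E[∂f/∂θ])²). The estimating function `mean_ef` (f(x, θ) = x − θ), the bracket, and the data are hypothetical and chosen only because the answer is easy to check by hand.

```python
def solve_estimating_equation(f, data, lo, hi, tol=1e-12):
    # Bisection on g(theta) = sum_i f(x_i, theta); needs a sign change on [lo, hi]
    def g(theta):
        return sum(f(x, theta) for x in data)
    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("g(lo) and g(hi) must have opposite signs")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_mid == 0.0:
            return mid
        if g_lo * g_mid < 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)

def plugin_variance(f, data, theta, h=1e-5):
    # Sample version of (1/n) * E[f^2] / (E[df/dtheta])^2,
    # with the derivative taken by central differences
    n = len(data)
    mean_f2 = sum(f(x, theta) ** 2 for x in data) / n
    mean_df = sum((f(x, theta + h) - f(x, theta - h)) / (2.0 * h) for x in data) / n
    return mean_f2 / (n * mean_df ** 2)

def mean_ef(x, theta):
    # Hypothetical estimating function whose root is the sample mean
    return x - theta

data = [1.0, 2.0, 3.0, 4.0]
theta_hat = solve_estimating_equation(mean_ef, data, lo=0.0, hi=5.0)
var_hat = plugin_variance(mean_ef, data, theta_hat)
```

For this choice of f the formula reduces to the familiar sample variance divided by n.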
Fiber Bundle
$u(x, y; \theta, Z) = \partial_\theta \log p(x, y; \theta, Z)$
$v(x, y; \theta, Z) = \partial_Z \log p(x, y; \theta, Z)$
$\{p(x, y; \theta, Z)\}$
Z
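Parallel Transport
$T_{\theta, Z} = \{r(x) : E_{\theta, Z}[r(x)] = 0\}$
$\langle r_1, r_2 \rangle = E[r_1(x) r_2(x)]$
e-parallel transport: $\Pi^{e}_{z \to z'} r(x) = r(x) - E_{z'}[r(x)]$
m-parallel transport: $\Pi^{m}_{z \to z'} r(x) = \frac{p(x; \theta, z)}{p(x; \theta, z')} r(x)$
$\langle r_1, r_2 \rangle_z = \langle \Pi^{e}_{z \to z'} r_1, \Pi^{m}_{z \to z'} r_2 \rangle_{z'}$
[Figure: tangent space $T_{\theta, Z}$ at the point $p(x; \theta, Z)$]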
Estimating Function $f(x, \theta)$
$T_{\theta, Z} = T^I \oplus T^N \oplus T^A$
$u^I(x, \theta, z)$: optimal estimating function
e-invariance: $E_{\theta, z}[f(x, \theta)] = 0$ for all $z$
m-orthogonality: $\langle v, f \rangle_z = 0$ for all $z$
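Example of estimating functions
$f(x, y; \theta) = k(x + \theta y)(y - \theta x)$: $s = x + \theta y$
$\iint p(x, y; \theta, Z) f(x, y; \theta) \, dx \, dy = 0$ for every $Z$
$\sum_i k(x_i + \theta y_i)(y_i - \theta x_i) = 0$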
$u(x, y; \theta, Z) = \partial_\theta \log p$
$v(x, y; \theta, Z) = \partial_Z \log p$
$u^I(x, y) = u - E[u \mid s] = k(x + \theta y)(y - \theta x)$, $k(s) \propto E[\xi \mid s]$
$Z \sim N(\mu, \sigma^2)$: $u^I(x, y; \theta, Z) \propto (x + \theta y + c)(y - \theta x)$, $c = \mu / \sigma^2$
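$f(x, y; \theta, c) = (x + \theta y + c)(y - \theta x)$
Estimating equation: $\sum_i f(x_i, y_i; \theta, c) = 0$
[Table residue: asymptotic variance $V$ of $\hat{\theta}$ for several choices of $c$; the entries are not recoverable from this transcript]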
Poisson process
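Poisson process: the instantaneous firing rate ξ is constant over time.
In every small time window dt, a spike is generated with probability ξ dt.
[Figure: inter-spike interval (T) histograms for a cortical neuron and for a Poisson process]
A Poisson process cannot explain the observed inter-spike interval distributions (Softky & Koch, 1993).
Poisson inter-spike interval density: $p(T) = \xi e^{-\xi T}$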
Gamma distribution
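Gamma distribution: every κ-th spike of the Poisson process is kept.
Two parameters: ξ: firing rate, κ: irregularity
$q(T; \xi, \kappa) = \frac{(\xi \kappa)^{\kappa}}{\Gamma(\kappa)} T^{\kappa - 1} e^{-\xi \kappa T}$
[Figure: density of the inter-spike interval T for κ = 1 (Poisson) and κ = 3]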
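A minimal sketch of the "keep every κ-th spike" construction of gamma-distributed inter-spike intervals. It assumes the underlying Poisson process runs at rate ξκ, so that the mean interval stays 1/ξ as in a gamma density with shape κ and scale 1/(ξκ). The function names, parameter values, and seeds are illustrative.

```python
import math
import random

def thinned_poisson_isis(xi, kappa, n, seed=0):
    # Keep every kappa-th spike of a Poisson process with rate xi*kappa (assumed rate):
    # each kept interval is a sum of kappa exponential intervals.
    rng = random.Random(seed)
    rate = xi * kappa
    return [sum(rng.expovariate(rate) for _ in range(kappa)) for _ in range(n)]

def gamma_isis(xi, kappa, n, seed=1):
    # Direct draws from the gamma density with shape kappa and scale 1/(xi*kappa)
    rng = random.Random(seed)
    return [rng.gammavariate(kappa, 1.0 / (xi * kappa)) for _ in range(n)]

def mean_and_cv(intervals):
    # Coefficient of variation = standard deviation / mean
    n = len(intervals)
    mean = sum(intervals) / n
    var = sum((t - mean) ** 2 for t in intervals) / n
    return mean, math.sqrt(var) / mean

mean_thin, cv_thin = mean_and_cv(thinned_poisson_isis(xi=10.0, kappa=3, n=20000))
mean_gam, cv_gam = mean_and_cv(gamma_isis(xi=10.0, kappa=3, n=20000))
```

Both samples should show a mean interval near 1/ξ and a coefficient of variation near 1/√κ, which is why larger κ means more regular firing.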
Gamma distribution
$f(T) \propto T^{\kappa - 1} \exp\{-\xi \kappa T\}$
Integrate-and-fire
Markov model
κ = 1: Poisson
large κ: regular
Irregularity κ is unique to individual neurons.
[Figure: spike trains over time t, regular (large κ) vs. irregular (small κ)]
Irregularity varies among neurons (Baker & Lemon, 2000; Shinomoto et al., 2003).
We assume that κ is independent of time.
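Estimating function
• Estimating function $f(T, \kappa)$: $E[f(T, \kappa)] = 0$, $\sum_{l=1}^{N} f(T_l; \kappa) = 0$
Score functions:
$u(T; \kappa, k) = \frac{d}{d\kappa} \log p(T; \kappa, k)$
$v(T; \kappa, k) = \frac{\delta}{\delta k(\xi)} \log p(T; \kappa, k)$
How to obtain an estimating function y: $f = u - \frac{\langle u, v \rangle}{\langle v, v \rangle} v$
• Maximum likelihood method: $\frac{d}{d\kappa} \sum_{l=1}^{N} \log p(T_l) = \sum_{l=1}^{N} u(T_l) = 0$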
Estimation of κ by estimating functions
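1. No estimating function exists if the neighboring firing rates are different.
2. m (≥ 2) consecutive observations must have the same firing rate.
Model: $p(T_1, T_2; \kappa, k) = \int_0^\infty q(T_1; \xi, \kappa) \, q(T_2; \xi, \kappa) \, k(\xi) \, d\xi$
Example: m = 2, l-th set $(T_1^l, T_2^l)$ with common firing rate $\xi_l$
Estimating function ($E[f] = 0$):
$f(T, \kappa) = \log \frac{T_1 T_2}{(T_1 + T_2)^2} + 2\phi(2\kappa) - 2\phi(\kappa)$, where $\phi$ is the digamma function
$y = \sum_{l=1}^{N} \left[ \log \frac{T_1^l T_2^l}{(T_1^l + T_2^l)^2} + 2\phi(2\kappa) - 2\phi(\kappa) \right] = 0$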
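A minimal sketch of estimating κ from pairs of consecutive inter-spike intervals (m = 2) when each pair has its own unknown firing rate. It relies on the fact that, for two gamma intervals with common rate, T_1/(T_1+T_2) follows a Beta(κ, κ) law whatever the rate is, so log(T_1T_2/(T_1+T_2)²) has mean 2φ(κ) − 2φ(2κ) with φ the digamma function. The rate distribution, sample size, seed, root bracket, and the hand-written `digamma` helper are assumptions for illustration.

```python
import math
import random

def digamma(x):
    # Recurrence to push x above 6, then the standard asymptotic expansion
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    result += math.log(x) - 0.5 * inv - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def estimate_kappa(pairs, lo=1e-3, hi=1e3):
    # Solve mean_l log(T1*T2/(T1+T2)^2) + 2*digamma(2k) - 2*digamma(k) = 0 for k.
    # The left side decreases in k, so bisection (on a log scale) is enough.
    mean_log = sum(math.log(t1 * t2 / (t1 + t2) ** 2) for t1, t2 in pairs) / len(pairs)
    def g(kappa):
        return mean_log + 2.0 * digamma(2.0 * kappa) - 2.0 * digamma(kappa)
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def simulate_pairs(kappa, n, seed=0):
    # Each pair shares one firing rate xi_l; rates differ across pairs (assumed uniform)
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        xi = rng.uniform(5.0, 50.0)
        scale = 1.0 / (xi * kappa)
        pairs.append((rng.gammavariate(kappa, scale), rng.gammavariate(kappa, scale)))
    return pairs

pairs = simulate_pairs(kappa=3.0, n=20000)
kappa_hat = estimate_kappa(pairs)
```

The estimate never uses the firing rates, which is the point of the estimating-function approach: the nuisance distribution k(ξ) drops out.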
em‐algorithm EM‐algorithm
EM algorithm
hidden variables
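Model: $M = \{p(x, y; u)\}$
Observed data: $D = \{x_1, \dots, x_N\}$
Data manifold: $D = \{\hat{p}(x, y) : \hat{p}(x) = p_D(x)\}$
m-projection to $M$: $\min_{p \in M} KL[\hat{p}(x, y) : p(x, y; u)]$
e-projection to $D$: $\min_{\hat{p} \in D} KL[\hat{p}(x, y) : p(x, y; u)]$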
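A minimal sketch of the standard EM iteration for a toy hidden-variable model: an equal-weight mixture of two unit-variance Gaussians with unknown means, where the hidden variable is the component label. The E-step fills in the hidden label with its conditional distribution under the current model and the M-step refits the model to the completed data. Reading these as the projection onto the data manifold and the projection onto the model, respectively, is an interpretive assumption here, and the model, data, and starting values are hypothetical.

```python
import math

def em_two_gaussians(xs, mu0=-1.0, mu1=1.0, iters=30):
    # Returns the fitted means and the observed-data log-likelihood before each update
    log_liks = []
    for _ in range(iters):
        # E-step: conditional probability that each x came from component 1
        resp = []
        log_lik = 0.0
        for x in xs:
            a = 0.5 * math.exp(-0.5 * (x - mu0) ** 2)
            b = 0.5 * math.exp(-0.5 * (x - mu1) ** 2)
            resp.append(b / (a + b))
            log_lik += math.log((a + b) / math.sqrt(2.0 * math.pi))
        log_liks.append(log_lik)
        # M-step: responsibility-weighted means maximize the expected complete log-likelihood
        w1 = sum(resp)
        w0 = len(xs) - w1
        mu0 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / w0
        mu1 = sum(r * x for r, x in zip(resp, xs)) / w1
    return mu0, mu1, log_liks

xs = [-2.1, -1.9, -2.3, -1.7, 2.0, 2.2, 1.8, 2.4]  # illustrative data
mu0_hat, mu1_hat, log_liks = em_two_gaussians(xs)
```

The recorded log-likelihood should never decrease from one iteration to the next, which mirrors the alternating minimization of the KL divergence between the data manifold and the model.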