information geometry...outline information geometry historical remarks treasure example information...
TRANSCRIPT
![Page 1: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/1.jpg)
Information Geometry
Shinto Eguchi
The Institute of Statistical Mathematics, Japan Email: [email protected], [email protected] Url: http//www.ism.ac.jp/~eguchi/
2013年統計数理研究所夏期大学院統計数理研究所 セミナー室 1 9月26日 14時40分~17時50分
ISM Summer School 2013
![Page 2: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/2.jpg)
Outline
Information geometry
historical remarks
treasure example
Information divergence class
robust statistical methods
robust ICA
robust kernel PCA
2
![Page 3: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/3.jpg)
geometry
learning
statistics
quantum
1982 paperS. Amari
1975 paperB. Efron
P. Dawidcomment
Historical comment
1984Workshop
RSS150
C.R.Rao1945
R.A.Fisher1922
3
![Page 4: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/4.jpg)
Dual Riemannian Geometry
Dual Riemannian geometry gives reformulation for Fisher’s foundation
Cf. Einstein field equations (1916)
Information Geometery aims to geometrizeA dualistic structure between modeling and estimation
Estimation is projection of data onto model.
“Estimation is an action by projection and model is an object to be projected”
The interplay of action and object is elucidated by a geometric structure.
Cf. Erlangen program (Klein, 1872)http://en.wikipedia.org/wiki/Erlangen_program
4
![Page 5: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/5.jpg)
2 x 2 table
The space of all 2×2 tables associates with a regular tetrahedron
We know the ruled surface is exponential geodesic, butdoes anyone know the minimality of the ruled surface?
The space of all independent 2×2 tables associates with a ruled surfaceIn the regular tetrahedron
5
![Page 6: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/6.jpg)
regular tetrahedron
P
Q
S
R
0 10 0
0 00 1 0 0
1 0
1 00 0
1/4 1/4
1/4 1/4
0 p0 1p
1p 0 p 0
6
![Page 7: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/7.jpg)
regular tetrahedron
P
Q
S
R
0 1/30 2/3
1/3 0 2/3 0
1/3 2/3 0 0
0 01/3 2/3
+
=
32
32
31
32
32
31
31
31
31
32
1/3 0 2/3 0
0 1/30 2/3
+31
32=1/3 2/3
0 0
0 01/3 2/3
7
![Page 8: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/8.jpg)
P
Q
S
R
P
Q
S
R
Ruled surface
8
![Page 9: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/9.jpg)
.
1 00 0
p q p (1-q)(1-p)q (1-p) (1-q)
0 00 1
0 10 0
hgfeyzxwn
22 )(
0)1()1(
})1)(1()1)(1({ 2
qqppqpqpqppqn
0 01 0
A B
C x y e
D z w f
g h n wzyxnwyhzxgwzfyxe
,,,
A B
C pq p(1q) P
D (1p)q (1p)(1q) 1p
q 1q 1
![Page 10: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/10.jpg)
e-geodesical
-10
0
10 -5
0
5
-5
0
5
-10
0
10
-5
0
5
-10
0
10-10
-5
0
5
10
-5
0
5
-10
0
10
}){( 10,10,1
log,1
log,)1)(1(
log
qpq
qp
pqp
qp }{ RI,RI),,,( yxyxyx
2112112222
21
22
12
22
11211211 1wherelog,log,log),,( )(
3S 3RI
10
![Page 11: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/11.jpg)
Two parametrization
-10
0
10 -5
0
5
-5
0
5
-10
0
10
-5
0
5
)( )1(),1(, qpqpqp
)( ,qp
)(1
log,1
log,)1)(1(
logq
qp
pqp
qp
11
![Page 12: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/12.jpg)
Kullback-Leibler Divergence
p
q
r
Let p(x) and q(x) be probability density functions.
Then Kullback-Leibler Divergence is defined by
12
![Page 13: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/13.jpg)
Two one-parameter familiesLet be the space of all pdfs on a data space.
ここで
p
q
r
m-geodesic
e-geodesic
13
![Page 14: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/14.jpg)
Pythagoras theorem Amari-Nagaoka (1982)
p
q
r
14
![Page 15: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/15.jpg)
Proof
15
![Page 16: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/16.jpg)
ABC in differential geometry
Linear connection defines parallelism along a vector field.
Geodesic = a minimal arc between x and y
Riemannian metric defines an inner product on any tangent space of a manifold.
RI)()(:),( MMggM XX
1
0))(),((argmin
10})1(,)0(:)({
dttxtxgtyxxxtxc
)()()(: MMM XXX
))(,),(()()()2(
,)1(
MYXMfYfXYfYf
YfY
XX
XXf
XF
d
kk
kjijX XX
i1
ComponetwiseCf. slide 41
16
![Page 17: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/17.jpg)
Geodesic
A one-parameter family (curve) is called geodesic }:)({ ttC θ
),...,1(0)()())(()(1 1
pitttt kkp
k
p
j
ikj
i
θ
with respect to a linear connection },,1:)({ pkjiikj θ
Remark 2:
If , any geodesic is a line.),,1(0)( pkjiikj θ
The property is not invariant with parametrization.
The gedesic C is expressed by a transform rule from parameter to
),...,1(0)()())((~)(1 1
patttt cbp
c
p
b
abc
a
ω
,)()(~11 1 1
p
ac
iba
i
p
k
p
j
p
i
ikj
ai
jb
kc
abc
BBBBB
θω )(for, θaaa
iiai
aai tBB
speed of acceleration
Newton's law of inertia
17
![Page 18: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/18.jpg)
Change rule on connections
Let us consider a transform from parameter to . Then a geodesic C is
written by . Hence we have the following:
),...,1(0)()())((~)(1 1
patttt cbp
c
p
b
abc
a
ω
,)()(~11 1 1
p
ac
iba
i
p
k
p
j
p
i
ikj
ai
jb
kc
abc
BBBBB
θω )(for)( 1θaa
a
iiaB
}:))(()({ ttt θω
),...,1()())(()(1
pattBtp
i
iai
a
θ
),...,1()()())(()())(()(1 11
patttBttBtp
j
p
i
jij
ai
p
i
iai
a
θθ
i
aaiB
18
![Page 19: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/19.jpg)
What are reference books?
Foundations of Differential Geometry (Wiley Classics Library) Shoshichi Kobayashi, Katsumi Nomizu
Methods of Information Geometry Shun-Ichi Amari , Hiroshi NagaokaAmer Mathematical Society (2001)
Differential Geometrical Methods in StatisticsShun-Ichi Amari 1985 年 Springer
19
![Page 20: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/20.jpg)
Statistical model and information
Statistical model }:),()({ θθxxθ ppM
Score vector ),(log),( θxθxsθ
p
Space of score vector }RI:),({ T psT αθαθ
Fisher information metric ),(}{),( θθθ TvuuvEvug
Fisher information matrix }),(),({ Tθxθxθθ ssEI
)RI( p
20
![Page 21: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/21.jpg)
Score vector space
Note 1:
0),(),(
),(),(),(),()(T
T
xθxθxsα
xθxθxsαxθxθxθθ
dp
dpdpuuETu
βαβxθxθxsθxsα
xθxθxθx
θ
θθ
Idp
dpvuvugTvuTTT }),(),(),({
),(),(),(),(,
})),((,0)),((:),({ θxθxθx θθθ tVtEtS
The space of all random variables with mean 0 and finite variance
Note 2: is an infinite dimensional vector space that includesθS θT
),(),(For θxsθx Tu
,....2,1},{ ),(...
),(...
},),({),(11
kuEuuEuiki
k
iki
kkk θxθxθxθx θθ
belong to θS Cf. Tiatis (2006) 21
![Page 22: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/22.jpg)
Linear connections
),,1()( }{ '1'
'e
pkjissEg ijk
p
i
iiikj
θθ
),,1()( }]{}{[ ''1'
'm
pkjisssEssEg ijkijk
p
i
iiikj
θθθ
e-connection
m-connection
m-geodesic
e-geodesic
Score vector piis 1)),((),( θxθxs
22
![Page 23: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/23.jpg)
Semiparamtric thoery
Semiparametric Theory and Missing Data (Springer Series in Statistics) Anastasios Tsiatis
1 Introduction to Semiparametric Models
2 Hilbert Space for Random Vectors
3 The Geometry of Influence Functions
4 Semiparametric Models
5 Other Examples of Semiparametric Models
6 Models and Methods for Missing Data
7 Missing and Coarsening at Random for Semiparametric Models
8 The Nuisance Tangent Space and Its Orthogonal Complement
9 …14.
23
![Page 24: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/24.jpg)
Geodesical models
Definition
))1,0(()()()(),( 1 tMqpcMqp ttt xxxx
(i) Statistical model M is said to be e-geodesical if
(ii) Statistical model M is said to be m-geodesic if
))1,0(()()()1()(),( tMqtptMqp xxxx
Note : Let P be a space of all probability density functions.
xxx dqpc ttt )()(/1 1where
By definition P is e-geodesical and m-geodesical.
However the theoretical framework for P is not perfectly complete.
Cf. Pistone and Sempi (1995, AS)24
![Page 25: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/25.jpg)
Two type of modeling
})};()(exp{)(),({ 0)e( θθxtθxθx TppM
})}({)}({:)({0
)m( xtxtx pp EEpM
Exponential model
Matched mean model
})(:RI{ θθ p
Let p0(x) be a pdf and t (x) a p-dimensional statistic.
})};()()(logexp{)(),({
1 00
)e(
θθxxxθx
I
i
ii p
pppM
}:)()()1(),({1
01
)m(
I
iii
I
ii pppM xxθxMixture model
})(:RI{ θθ p
Let be a set of pdfs.},...,1,0:)({ Iipi x
)},...,1(0,10:),...,({1
1 Iii
I
iiI
θ
Exponential model
25
![Page 26: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/26.jpg)
Statistical functional
A functional f (p) is called a statistical functional if p(x) is a pdf. (Hampel, 1974)
A statistical functional f (p) is said to be Fisher-consistent for a model
,)()1( pf
)()()2( θθf θp
if f (p) satisfies that}:)({ θxθpM
For a normal model }RIRI),(},2
)(exp{2
1)({ 22
2
2
θxθ
xpM
Tdxxxpdxxdxxxpp )))((,)(()( 22 f
is Fisher-consistent for = (, 2 ).
Example.
26
![Page 27: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/27.jpg)
Transversality
Let f (p) be Fisher consistent functional.
which is called a leaf transverse to M.
,})(:{)( θffθ ppL
}{)()1( θθ f pML
)()2( fθL
is a local neighborhood.
)( fθL
)(* fθL
θp
*θp
M
27
![Page 28: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/28.jpg)
Foliation structure
)( fP θL
))((),(),( fP θθθθLTvMTuTt ppp
),(),(),(such that θxθxθx vut
))(()()( fP θθθθLTMTT ppp
)( fθL
M
θp
),( θxt ),( θxu
),( θxv
Foliation
Decomposition of tanget spaces
Statistical model }:),()({ θθxxθ ppM )RI( p
28
![Page 29: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/29.jpg)
Transversality for MLE
})};()(exp{)(),({ 0)e( θθxtθxθx TppMExponential model
Hence the foliation associated with fML is a matched mean model.
})(:RI{ θθ p
For the exponential model the MLE functional)e(M
)},({logmaxarg:)(ML θxfθ
pEp p
}{ )}({)}({arg)( ),(ML xtxtf θθ
pp EEp
is written by
})}({)}({:{)( ),(ML xtxtf θθ pp EEpL
Let p0(x) be a pdf and t (x) a p-dimensional statistic.
29
![Page 30: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/30.jpg)
Maximum likelihood foliation
)( MLfθL
θp
)e(M
1p 2p
3p
2r
1r
30
![Page 31: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/31.jpg)
Estimating function
),( θxu
Statistical model }:),()({ θθxxθ ppM )RI( p
p-variable function is unbiased
)(0)}),({(det,}),({ ),(),(
θ
θθxuθxu θθ pp EE 0
def
The statistical functional }),({solvearg)( 0
θxufθ
pEp
is Fisher-consitent, and the leaf transverse to M
}),(:{)( 0 θxufθ pEpL
is m-geodesical.)( fθL
31
![Page 32: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/32.jpg)
Minimum divergence geometry
D : M×M → R is an information divergence on a statistical model M
(i) D(p,q) ≧ 0 with equality if and only if p = q
(ii) D is a differentialble on M×M
Then we get a Riemannian metric and dual connections on M (Eguchi, 1983,1992)
Let
32
![Page 33: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/33.jpg)
is a Riemann metric
Remarks
Cf. Slide 21 33
![Page 34: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/34.jpg)
U cross-entropy
U is a convex function,
U entropy
U divergence
U divergence
34
![Page 35: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/35.jpg)
35
Example of U divergence
KL divergence
Beta (power) divergence
Note
35
![Page 36: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/36.jpg)
36
Geometric formula with DU
36
![Page 37: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/37.jpg)
37
Triangle with DU
p
q
r
mixture geodesic
U geodesic
37
![Page 38: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/38.jpg)
Information divergence class
and
robust statistical methods
38
![Page 39: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/39.jpg)
Light and shadow of MLE
1.Invariance under data-transformations
2.Asymptotic efficiency
1. Non-robust
Sufficiency and efficiency
log exp
u
2. Over-fitting
Log-likelihood on exponential family
Likelihood method
U-method
39
![Page 40: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/40.jpg)
Max U-entropy distribution
Equal mean space
Let us fix a statistics t (x).
Euler-Lagrange’s 1st variation
U-model
40
![Page 41: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/41.jpg)
U-estimate
U-loss function
U-estimate
U-estimating function
U-empirical loss function
Let p(x) be a data density function with statistical model
41
![Page 42: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/42.jpg)
-minimax
Equal mean space
Let us fix a statistics t (x).
42
![Page 43: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/43.jpg)
U-estimator for U-model
U-model
mean parameterU-estimator of
We observe that
which implies
Furthermore,
43
![Page 44: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/44.jpg)
Influence function
Statistical functional
})),((()()),(({minarg)(
dxxpUxdGxpGTU
0
)(),(IF
GTxT
xFG )1(
)}],(),({),(),()[(),(IF 1 xSxwExSxwJxT UU
Influence function
||),(IF||sup)(GES xTT Ux
U 44
![Page 45: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/45.jpg)
Efficiency
Asymptotic variance
))()()(,0()ˆ( 11 UUUD
U JHJNn
)),(),(()(
),),(),(),(()(
XSXwVarH
XSXSXwEJ
U
TU
111 )()()()( UUU JHJI
Information inequality
where
( equality holds iff U(x) = exp(x) )
45
![Page 46: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/46.jpg)
Normal mean
-6 -4 -2 2 4 6
-6
-4
-2
2
4
6
0.125
0.1
0.075
0.05
0.01
0
-6 -4 -2 2 4 6
-6
-4
-2
2
4
6
2.5
0.8
0.4
0.15
0.015
0
β-power estimates η-sigmoid estimates
Influence functionβ= 0
= 0.015= 0.15= 0.4= 0.8= 2.5
η = 0= 0.01= 0.05= 0.075= 0.1= 0.125
tttU )exp()( 46
![Page 47: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/47.jpg)
Gross Error Sensitivity
β-power estimates η-sigmoid estimates
β 効率 GES η 効率 GES0 1 ∞ 0 1 ∞
0.01 0.97 6.16 0.015 0.972 1.90.05 0.861 2.92 0.15 0.873 1.040.075 0.799 2.47 0.4 0.802 0.6780.1 0.742 2.21 0.8 0.753 0.455
0.125 0.689 2.04 2.5 0.694 0.197
47
![Page 48: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/48.jpg)
Multivariate normal
pdf is
2
)()(exp)det)2((),,(1
21 yyyf
Tp
equation Estimating
),(
)(}))(({),(1
0)(),(1
μθ
μxμxθx
μxθx
UT
ii
ii
cwn
wn
48
![Page 49: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/49.jpg)
111 , , kkkkkk μθμθ
),(
),(1
ki
jkik xw
xxw
,
det ),(
)()(),(1
kUki
Tkikiki
k
cxw
xxxw
繰り返し重み付け平均と分散
Under a mild condition
),...1 ( )(L )(L 1 kkUkU
U-algorithm
49
![Page 50: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/50.jpg)
Simulation
ε-conmination
)ˆ( 0KL θ,θD
, N , N-εG
, -
N .
. , N-εG
ε
ε
9999
00
1001
00
)1(
1004
55
250501
00
)1(
(2)
(1)
(2)
(1)
under 6516702under2439033
05.00
G . . G . .
εε
KL error
with MLE θ̂
50
![Page 51: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/51.jpg)
β-power estimates v.s. η-sigmoid estimates
0 39.240.01 35.300.05 20.930.10 8.910.20 12.400.30 31.64
0 39.240.0001 23.04
0.00025 6.700.0005 4.46
0.00075 4.640.001 6.04
βηKL error KL error
0 16.50.01 13.910.05 6.650.10 5.140.20 12.200.30 29.68
0 16.50.0005 3.36
0.00075 3.190.001 3.90.002 3.040.003 3.07
(1)G
)2(G
51
![Page 52: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/52.jpg)
Plot of MLEs (no outliers)
1112
22
100 replications with100 size of sampleunder Normal dis.
η- Est (η=0.0025)β- Est (β=0.1)MLEtrue (1,0,1)
2212
1211
52
![Page 53: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/53.jpg)
Plot of MLEs with ouliers
η- Est (η=0.0025)β- Est (β=0.1)MLEtrue (1,0,1)
)2(G
100 replications with100 size of sampleunder
11.5
2
2.5
0
0.5
1
1.5
0.5
1
1.5
2
11.5
2
2.5
0
0.5
1
1.5
53
![Page 54: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/54.jpg)
Selection for tuning parameter
Squared loss function
dyygyf 221 )}()ˆ,({)ˆ(Loss
dyyfxfn
ii
n
i
2)(
1
)ˆ,(21)ˆ,(1)(CV
)(CVminargˆ
)ˆ,(IF1
1ˆˆ )( i
i xn
Approximate
54
![Page 55: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/55.jpg)
.26.1-
.1-.26 varianceand
00
mean with Normal
0.0250.050.0750.10.1250.150.175
2.82
2.84
2.86
2.88
2.9
2.92
2.94
.261.126-.126-.228
0.081-
0.054
signalMLE
.383.263-.263-1.059
0.184-
0.204
ContamiMLE
.286.134-.134-.293
0.132-
0.086
contamiEst-
β
)(C V
07.0ˆ
55
![Page 56: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/56.jpg)
What is ICA?
Cocktail party effect
Blind source separation
s1 s2 ……… sm
x1 x2 ……… xm
56
![Page 57: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/57.jpg)
ICA model
sWxW mmm 1 s.t., RR
(Independent signals)
(Linear mixture of signals)
0)(,,0)()()()(),...,(
1
111
m
mmm
SESEspspspsss
~
),,( 1 nxx
))(())((|)det(|),,( 111 mmmppWpWf μxwμxwx
Aim is to learn W from a dataset
in which unknown. is)()()( 11 mm spspp s
57
![Page 58: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/58.jpg)
ICA likelihood
Log-likelihood function
|)det(|log))((log)(11
WpW jijj
m
j
n
i
μxw
TTm WWxWxhI
WpWxpWxF
))()((),,(),,(
)( )(log,,)(log)(1
11
m
mm
ssp
sspsh
Estimating equation
Natural gradient algorithm, Amari et al. (1996)58
![Page 59: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/59.jpg)
Beta-ICA
power equation ),(),,(),,(11
WBWxFWxf
n
n
iii
decomposability:
0]))}({E[
)]()}({E[
])}({E[,
XwXwp
XwhXwp
Xwp
tttt
ssssss
tqsqqq
ts
59
![Page 60: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/60.jpg)
Likelihood ICA
0.5 1 1.5 2 2.5
0.2
0.4
0.6
0.8
1
1.2
1.4
-1.5 -1 -0.5 0.5 1 1.5
-1.5
-1
-0.5
0.5
1
1.5
5.01211W
150 signals U(0,1)× U(0,1)
Mixture matrix
60
![Page 61: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/61.jpg)
Non-robustness
-2 -1 1 2 3
-2
-1
1
2
-2 -1 1
-2
-1
1
2
50 Gaussian noise N(0, 1)× N(0, 1)
61
![Page 62: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/62.jpg)
power-ICA (β=0.2)
-2 -1 1 2 3
-2
-1
1
2
-10 -7.5 -5 -2.5 2.5 5 7.5
-6
-4
-2
2
4
Minimum -power
62
![Page 63: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/63.jpg)
Tsallisべきエントロピーに射影不変性を課したエントロピ-、ダイバージェンスを考察した。
平均と分散を一定にする分布族で射影べきエントロピー最大分布モデルが正規分布,
t-分布,ウエグナー分布を含む形で導出された.
そのモデル上での平均と分散の素直な推定法を提案し,推定量が標本平均と標本分散
になることが示された.
63
![Page 64: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/64.jpg)
-cross entropy
-diagonal entropy
Cf. Tsallis (1988), Basu et al (1998), Fujisawa-Eguchi (2008), Eguchi-Kato (2010)
-divergence
Remark= 0 reduces to Boltzmann-Shannon entropy and Kullback-Leibler divergence
Projective -power divergence
64
![Page 65: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/65.jpg)
Max -entropy
-entropy
Equal moment space
Max -entropy
-model
65
![Page 66: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/66.jpg)
-estimation
Parametric model
-Loss function
-estimator
Gaussian
-estimator
Remark Cf. Fujisawa-Eguchi (2008)66
![Page 67: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/67.jpg)
-estimation on -model
-loss function
Theorem.
67
![Page 68: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/68.jpg)
model
loss function
-loss
normal-model -model
(-estimator ’-model)
0-loss
( log-likelihood) MLE
Moment estimatorRobust -estimator
Robust model
MLE
68
![Page 69: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/69.jpg)
Mixture of independent signals
Whitening
Preprocessed form
Statistical model
-ICA
Gamma-ICA
69
![Page 70: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/70.jpg)
70
![Page 71: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/71.jpg)
71
![Page 72: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/72.jpg)
Boosting leaning algorithms
for supervised/unsupervized learning
U-loss functions
Outline
72
![Page 73: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/73.jpg)
Which chameleon wins?
73
![Page 74: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/74.jpg)
Pattern recognition…
74
![Page 75: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/75.jpg)
What is pattern recognition?
□ There are a lot of examples for pattern recognition.
□ In principle pattern recognition is a prediction problem for class label
which would classify a human interestingness and importance.
□ Originally human brain wants to label phenomena a few words , for
example (good, bad), (yes, no), (dead, alive), (success, failure), (effective,
no effect)….
□ Brain intrinsically predicts the class label from empirical evidence.
75
![Page 76: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/76.jpg)
Practice
● Character recognitionvoice recognition
Image recognition
☆ Credit scoring
☆ Medical screening
☆ Default prediction
☆ Weather forcast
●
●
● face recognition
finger print recognition●
speaker recognition●
☆ Treatment effect
☆ Failure prediction ☆ Infectious desease
☆ Drug response
76
![Page 77: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/77.jpg)
Binary classification
classifier
Feature vector Class label
score
In a binary class y = 1, 1 (G=2)
77
![Page 78: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/78.jpg)
Probability distribution
).,( vector random of pdf a be),(Let yyp xx
B Cy
ypCBp }d),({),( xx
Xxx d),()( ypypMarginal density
Conditional density
},...,1{
),()(Gy
ypp xx
)(),()|(
xxx
pypyp
)(),()|(
ypypyp xx
)()|()()|(),( xxxx pypypypyp
)|1()|1(
)1,()1,(
xx
xx
ypyp
ypyp
78
![Page 79: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/79.jpg)
Error rate
Classifier
Feature vector ,class label
has error rate
))(Pr()(Err yhh FF x
G
iF
jiFF iyihjyihh
1
),)(Pr(1),)(Pr()(Err xx
Training error
Test error
79
![Page 80: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/80.jpg)
False negative/positive
FNR
True Negative
True Positive False Positive
False Negative
FPR
80
![Page 81: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/81.jpg)
Bayes rule
Let p(y|x) be a conditional probability of y given x.
The classifier leads to Bayes rule. ))(sgn()( 0Bayes xx Fh
For any classifier h
)(Err)(Err Bayes hh
Note: The optimal classifier is equivalent to the likelihood ratio.However, in practice p(y|x) is unknown, so we have tolearn hBayes(x) based on the training data set.
Theorem 1
81
![Page 82: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/82.jpg)
Discriminant space associated with Bayes classifier is given by
Error rate for Bayes rule
In general, when a classifier h associates with spaces
)(Err)(Err Bayes hh Cf. Mclachlan, 2002
82
![Page 83: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/83.jpg)
Boost learning
Boost by filter (Schapire, 1990)
Bagging, Arching (bootstrap)(Breiman, Friedman, Hasite)
AdaBoost (Schapire, Freund, Batrlett, Lee)
Weak learners can be combined into a strong leanrner?
Weak learner (classifier) = error rate is slightly less than .5
Strong learner (classifier) = error rate is slightly less than that of Bayes classifier
83
![Page 84: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/84.jpg)
Set of weak learners
Decision stumps
Linear classifiers
Neural net SVM k-nearest neighbur
Note: not strong but a variety of characters
84
![Page 85: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/85.jpg)
Exponential loss function
Empirical exponential loss function for a score function F(x)
)}(exp{1)(1
exp ii
n
i
D Fyn
FL x
where q(y|x) is the conditional distribution given x, q(x) is the pdf of x.
Let be a training (example) set.
xxxxX
d)()}|()}(exp{{)(}1,1{
expqyqyFFL
y
E
Expected exponential loss function for a score function F(x)
85
![Page 86: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/86.jpg)
0)(),1()(:Intial.1 011
xFniiw n
,)'(
)())((I)(
iwiwfyf
t
tiit x
)(
)(121
)(
)(log)b(tt
ttt
f
f
T
tttTT fFF
1)( )()( where,)(sign.3 )( xxx
Tt ,,1For .2
))(exp()()()c( )(1 iitttt yfiwiw x
)(min)()a( )( ff tftt F
Adaboost
86
![Page 87: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/87.jpg)
Learning algorithm
)},( ),...,,{( 11 nn yy xx
)(,),1( 11 nww
)(,),1( 22 nww
1
2
1
2
T
)()1( xf
T
1)( )(
ttt f x
)(,),1( nww TT
)()2( xf
)()( xTf
Final learner
T
tttT fF
1)( )()( xx
87
![Page 88: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/88.jpg)
Update weight
21)( )(1 tt f The worst case
)()( 1 iwiw tt
t
t
eyf
eyf
iit
iit
Multiply )(
Multiply )(
)(
)(
x
x
update
Weighted error rate )()()( )1(1)(1)( tttttt fff
88
![Page 89: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/89.jpg)
21)( )(1 tt f
)'()())(()(
1
1
1)(1 iw
iwyfIft
tn
iiittt x
n
ititit
n
itititiit
iwfy
iwfyyfI
1)(
1)()(
)()}(exp{
)()}(exp{))((
x
xx
n
itiitt
n
itiitt
n
itiitt
iwyfIiwyfI
iwyfI
1)(
1)(
1)(
)())((}exp{)())((}exp{
)())((}exp{
xx
x
21
)}(1{)(1
)()(
)()(1
)()(
)(1
)()(
)()(
)(
)(
)()(
)(
tttt
tttt
tt
tt
tttt
tt
ff
ff
ff
ff
f
89
![Page 90: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/90.jpg)
Update in exponential loss
)}(exp{1)(1
exp ii
n
i
Fyn
FL x
)()()( xxx fFF
}))(1()(){(exp fefeFL
Consider
n
iiiiiii yfeyfeFy
n 1
))(I())(I()}(exp{1 xxx
n
iiiii fyFy
nfFL
1exp )}(exp{)}(exp{1)( xx
)(
)}(exp{))(()(
exp
1
FL
FyyfIf
n
iiiij
xxwhere
90
![Page 91: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/91.jpg)
Sequential optimization
}))(1()(){()( expexp efefFLfFL
)()(1log
21
opt ff
)}(1){(2 ff
)}(1){(2)()(12
ffefe
f
Equality holds if and only if
efef ))(1()(
91
![Page 92: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/92.jpg)
AdaBooost = Seq-min exponential loss
)(minarg)a( )( ff tFf
t
)}(exp{)()((c) 1 itittt xfyiwiw
)}(1){()()(min )()(1exp)(1exp ttttt ffFLfFL
R
)()(1
log21
)(
)(opt
t
t
ff
)(minarg)b( )(1exp ttt fFL
R
92
![Page 93: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/93.jpg)
Simulation (complete separable)
-1 -0.5 0.5 1
-1
-0.5
0.5
1
[-1,1]×[-1,1]
Feature space
Decision boundary
93
![Page 94: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/94.jpg)
-1 -0.5 0.5 1
-1
-0.5
0.5
1
Set of linear classifiers
Linear classification machines
Random generation
94
![Page 95: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/95.jpg)
Learning process
•
-1 -0.5 0 0.5 1
-1
-0.5
0
0.5
1
-1 -0.5 0 0.5 1
-1
-0.5
0
0.5
1
-1 -0.5 0 0.5 1
-1
-0.5
0
0.5
1
-1 -0.5 0 0.5 1
-1
-0.5
0
0.5
1
-1 -0.5 0 0.5 1
-1
-0.5
0
0.5
1
-1 -0.5 0 0.5 1
-1
-0.5
0
0.5
1
Iter = 1, train err = 0.21 Iter = 13, train err = 0.18 Iter = 17, train err = 0.10
Iter = 23, train err = 0.10 Iter = 31, train err = 0.095 Iter = 47, train err = 0.0895
![Page 96: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/96.jpg)
Final decision boundary
-1 -0.5 0 0.5 1-1
-0.5
0
0.5
1
-1 -0.5 0 0.5 1
-1
-0.5
0
0.5
1
Contour of F(x) Sign(F(x)) 96
![Page 97: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/97.jpg)
Learning curve
50 100 150 200 250
0.05
0.1
0.15
0.2
Iteration number
Training curve
97
![Page 98: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/98.jpg)
KL divergence
For conditinal distribution p(y|x), q(y|x) given x with commonmarginal density p(x) we can write
Nonnegative function
Feature space
Note
Label set
Then,
98
![Page 99: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/99.jpg)
Twin of KL loss functionFor data distribution q(x, y) = q(x) q(y|x) we model as
G
gyFgFym
11 )},(),(exp{)|( xxx
xxxxxxx
Xd)()}|()|(
)|()|(log)|({),(
}1,1{KL qymyq
ymyqyqmqD
y
Then
G
g
gF
yFym
1
2
)},(exp{
)},(exp{)|(x
xx
Log lossExp loss 99
![Page 100: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/100.jpg)
Bound for exponential loss
Empirical exp loss
Expected exp loss
Theorem.
100
![Page 101: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/101.jpg)
Variational calculas
101
![Page 102: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/102.jpg)
On AdaBoostFDA, or Logistic regression
Parametric approach to Bayes classifier
AdaBoost
The stopping time T can be selected according to the state of learning.
Parametric approach to Bayes classifier, but dimension and basis function are flexible
102
![Page 103: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/103.jpg)
U-Boost
U-empirical loss function
In a context of classification
Unnormalized U-loss
Normalized U-loss
103
![Page 104: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/104.jpg)
U-Boost (binary)
Unnormalized U-loss
Note
Bayes risk consistency
F*(x) is Bayes risk consistent because104
![Page 105: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/105.jpg)
Eta-loss function
regularized
AdaBoost with margin
105
![Page 106: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/106.jpg)
EtaBoost for Mislabels Expected Eta-loss function
Optimal score
The variational argument leads to
Mislabel modeling
106
![Page 107: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/107.jpg)
EtaBoost
0)(),1()(: settings Initial.1 011
xFniiw n
,)())((I)(1
iwfyf m
n
iiim
x
T
tttTT fFF
1)( )()( where,)(sign.3 )( xxx
Tm ,,1For .2
))(exp()()()c( )(*
1 iimmmm yfiwiw x
)(min)()a( )( ff mfmm
(b)
107
![Page 108: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/108.jpg)
A toy example
108
![Page 109: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/109.jpg)
Examples partly mislabeled
109
![Page 110: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/110.jpg)
AdaBoost vs. EtaBoost
AdaBoost EtaBoost110
![Page 111: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/111.jpg)
ROC curve, AUC
cF 閾値:判別関数 ),(: X
)1Y|)(()TPR(
0)Y| )(()FPR(
cFPc
cFPc
X
X
}RI|))(TPR),(FPR{()ROC( cccF
111
0)(Y )(1)(Y )(
陰性
陽性
cFcF
XX
1} {0, ,RI , ),( YY pXX
)'(FPR)'(TPR),AUC( cdcF
TPR
FPR
1
10
c
AUC
偽陽性確率
真の陽性確率
受信者動作特性(ROC) 曲線
下側面積(AUC )
Cf. Bamber (1975), Pepe et al. (2003)
![Page 112: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/112.jpg)
112
![Page 113: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/113.jpg)
113
![Page 114: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/114.jpg)
114
![Page 115: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/115.jpg)
115
![Page 116: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/116.jpg)
116
![Page 117: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/117.jpg)
117
![Page 118: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/118.jpg)
118
λ5-fold CVの結果(AUCBoost)
118
![Page 119: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/119.jpg)
119
![Page 120: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/120.jpg)
120
![Page 121: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/121.jpg)
121
![Page 122: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/122.jpg)
2標本問題とAUCブーストの関係
AUCブーストの学習アルゴリズムは逐次的に各 t-ステップで
得られた判別関数 Ft(x) のウィルコクソン検定統計量
を(t+1)-ステップ
で最大方向に増大するよう設計されている.
このように
となる中で全ての単点解析が統合されている.
この意味で全ての単点解析はブースティングに
組むことは可能である. Eguchi-Komori (2009)
))()((1)(ˆ0
1 11
10
1 0
it
n
k
n
iktt FFI
nnFC xx
)()()( 111 xxx tttt fFF
)(ˆ)(ˆ1tt FCFC
122
![Page 123: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/123.jpg)
123
![Page 124: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/124.jpg)
124
![Page 125: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/125.jpg)
Partial AUC
))( ),X()X((P
)(FPR)(TPRpAUC
010 XFcFF
cdcc
)(xF
健常者 病人
cTPR
FPR
1
10
)(TPR c
)(FPR c )(xF cTPR
FPR
1
10
pAUC
)(FPR c
)(TPR c
健常者 病人
cc
pAUC
125
![Page 126: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/126.jpg)
126
![Page 127: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/127.jpg)
Robust for noninformative genes
127
![Page 128: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/128.jpg)
Robustness for noninformative genes
128
![Page 129: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/129.jpg)
生成関数
真のU 陽性確率
129
![Page 130: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/130.jpg)
Boost learning algorithm
130
![Page 131: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/131.jpg)
Bayes risk consistency
131
![Page 132: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/132.jpg)
ベンチマークデータ
132
![Page 133: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/133.jpg)
T r u e d e n s i t y
- 4
- 2
0
2
4
- 4
- 2
0
2
4
0
0 . 0 0 5
0 . 0 1
0 . 0 1 5
0 . 0 2
- 4
- 2
0
2
4
W dictionary133
![Page 134: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/134.jpg)
U-Boost learning
)( argmin *
* fLf Uf UW
pfUf
nfL
n
iiU
RI1d)))((())((1)( xxx U-loss function
)( ))((co 1* WW U
Dictionary of density functions
},1d)(,0)(:)({ xxxx gggW
Learning space = U-model
)}( ))(({ 1
xg
Goal : find
)())0,1,,0(,(Then.))((),(Let )(
1 )( xxxπx
gfgf
WW * U
134
![Page 135: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/135.jpg)
Inner step in the convex hull
))((co1* WW U}:),({ xW
*UW
W
)(xf
1g
2g3g
4g
5g
6g
7g
)(xf
)( argmin *
* fLf Uf UW
Goal : )))(ˆ(ˆ))(ˆ(ˆ()( ˆˆ111* xxx kk fff
135
![Page 136: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/136.jpg)
Convergence of Algorithm
)()()(
)()()1()(
*11
111
fgg
gff
KK
KKKKK
*UW
-mixture
)()()()()( *111 fffff kkk
),()()( 11 kkUkUkU ffDfLfL
k
jjjUkUU ffDfLfL
1111 ),()()(
136
![Page 137: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/137.jpg)
Simplified boosting
initial with the
))),()()1(((minarg
such that))()()1((
,1,....,0For
1
11
1111
1
ckc
gfLg
gfff
Kk
k
kUk
kkkkkk
g
W
K
kkkK gf
1
1 )( )(ˆ
)(minarg1 gLf Ug W
137
![Page 138: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/138.jpg)
Then we have
Non-asymptotic bound
UbU 2
))(co(),,()}()(){(sup)A(
WWW
),(inf),(FA fgDg Uf
UU=W
=W
|}| )(EI))((supEI2),(EE1
1{ ffg pi
n
inf
g
xW
W
constant)length-step:(1
),(IE22
ccK
bcK U
W
(Functional approximation)
(Estimation error)
(Iteration effect)
),(EEand),(FAbetween Trade Remark. WW pp *U
where
),,IE( ),EE(),FA()ˆ,(EI * WWW KggfgD UKUg
Theorem. Assume that a data distribution has a density g(x) and that
138
![Page 139: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/139.jpg)
Lemmas
Lemma 1 Then).))(((andestimator density abe~Let 11* N
j jjpff x
).,,~()}()~(){1()())))(())(~()1((( ***1
1 fffLfLfLfLp UUUUNj jUj
xx
Lemma 2 Then.))(co(in be~Let Wf
])1,0[(),,( 22* UU bff
Lemma 3 (A)assumptionUnder
))((where1
)(inf)ˆ( 122
jjU
UKU fcK
bcfLfL
Lemma 4 then,)(inf)ˆ( If
fLfL UKU
||)(EI))((1||supEI2),(inf)ˆ,(EI
ffn
fgDfgD gif
gUKUg xW
139
![Page 140: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/140.jpg)
KDERSDE
-Boost -Boost
C (skewed-unimodal) L (quadrimodal) H (bimodal)
140
![Page 141: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/141.jpg)
geometry
learning
statistics
quantum
Future of information geometry
Deepening
Optimal transform
Free probability
Semiparametrics
Compressed sensing
Poincare conjecture
Dynamical systems
Mathematical evolutionary theory
Dempster-Shafer theory141
![Page 142: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/142.jpg)
Poincaré conjecture 1904
Every simply connected, closed 3-manifold is
homeomorphic to the 3-sphere
Ricci flow by Richard Hamilton in 1981 Perelmann 2002, 2003
Optimal control theory due to Pontryagin and Bellman
Optimal transport
Wasserstein space }~,~:||||E{min),( 2W gYfXYXgfd
142
![Page 143: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/143.jpg)
Optimal transport
Theorem (Brenier, 1991)
Talagrand inequality
Log Sovolev inequality
Optimal transport theory is extending on a general manifold
Geometers consider not a space, but a distribution family on the space.
C. Villani (2010) Optimal transport theory as Fields medal
143
![Page 144: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry](https://reader033.vdocuments.site/reader033/viewer/2022041707/5e458c2ab2c94709e042b9a2/html5/thumbnails/144.jpg)
Thank you
144