TRANSCRIPT
Innovations, Riccati and Kullback-Leibler Divergence
Dept. of Electrical Engineering, KAIST, Daejeon, Korea
Youngchul Sung
Presented at Niigata University, Japan, Oct 22, 2010
This visit and talk are supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (2010-0021269).
© Youngchul Sung. Oct. 22, 2010
Research Background
Signal Processing
InformationTheory
Communications
Networking
Statistical Signal Processing(Detection & Estimation)
Areas: SPTM, SAM, Adaptive, Machine Learning, Audio, Video, Multirate, Biomedical, ...
Signal Processing for Communications
Outline
Introduction
Neyman-Pearson detection of hidden Gauss-Markov signals
Optimal sampling for detection
Application to wireless sensor networks
Conclusion
Mathematical Engineering
[Figure: academic genealogy chart tracing statistical signal processing to its roots.]
German root: Friedrich Leibniz, Thomasius (1643), Mencke, Wichmannshausen, Hausen, Kästner, Pfaff, Gauss; Gudermann, Gerling, Bessel; Weierstrass, Plücker; Klein (1868), Lindemann, Sommerfeld; Hilbert; Guillemin (degree 1926, München → MIT); also Jacob Andreae, Abraham Kein (1632), König, Heyne, Bach, Kustner, Ernesti, Gesner, Budeus, Walthier, Strauch, G. Riemann, the Bernoullis
French root: Euler, d'Alembert; Lagrange, Laplace; Fourier, Poisson; Chasles, Dirichlet; Lipschitz (a German studying in France); Gaston Darboux; Émile Borel, Stieltjes, Picard; Hadamard, Henri Lebesgue; Paul Lévy; Michel Loève (École Polytechnique → UC Berkeley); Emanuel Parzen
Russian branch: Nicola Bugaev, Nikolai Luzin, Andrei Kolmogorov, Albert Shiryaev
To the U.S.: Norbert Wiener (Harvard 1913 → MIT), Royce (JHU 1878 → Harvard), Morris (1878, U. Berlin → JHU), Trendelenburg (1868, U. Berlin), Bose, Tufts (Harvard), Vannevar Bush (MIT), Arthur Kennelly (MIT), Frank Hitchcock (MIT), Claude Shannon, Caldwell, Huffman, Ragazzini (Columbia), John Russell (MIT), Zadeh, Kalman (MIT, Columbia → Stanford), Fano (MIT), Wozencraft (MIT), T. Kailath (MIT → Stanford), Linvill (MIT), Tuttle, Arthurs, I. Jacobs, L. Kleinrock, B. Widrow, Truxal (Polytech), Braun (Polytechnic), Mendel (USC); later German line: Bieberbach, Schröder, Reissig, Bunke, Nussbaum
Modern Probability and Statistics
Lebesgue: Measure theory
Kolmogorov: Measure-theoretic probability
Wiener: Probabilistic approach to system theory
Shannon: Probabilistic approach to communication theory
The Key Problem in Statistics
[Diagram: X → black box → Y]
The Key Problem in Statistics
The Wiener Model
[Diagram: X → P(Y|X) → Y]
Areas in Statistical Signal Processing
Detection theory: radar setup; digital communication setup (Shannon's formulation and the 60,000 Ph.D.'s that followed). Applications: radar, sonar, distributed (team) detection, digital communications
Point estimation theory. Applications: frequency/spectral estimation, DOA estimation, beamforming, (blind) parametric system identification
Signal tracking: optimal, adaptive, particle filtering. Applications: radar, sonar, control, digital communications, equalization, negative feedback recovery
Time series analysis. Applications: finance, prediction theory
Machine learning. Applications: pattern recognition, voice recognition, face recognition, financial modeling
Statistical Inference: Detection & Estimation
Detection criterion:  Pr{ X != X_guess(Y) }
Estimation criteria:  || X - X_guess(Y) ||^2  (deterministic least squares)
                      E{ || X - X_guess(Y) ||^2 }  (stochastic least squares, or minimum mean square error)
Detection: Neyman & Pearson, H. Chernoff, Kullback & Leibler (I-divergence), Claude Shannon, Lucien Le Cam
Estimation: Carl Friedrich Gauß (1777 ~ 1855), Thomas Bayes (1702 ~ 1762), Norbert Wiener, Rudolf Kalman, Bernard Widrow, R. Fisher (FIM, 1925), C. R. Rao, H. Cramér
Statistical Inference: Detection & Estimation
State-space model
Introduction
Neyman-Pearson detection of hidden Gauss-Markov signals
Optimal sampling for detection
Application to wireless sensor networks
Conclusion
Related Work
S. Kullback and R. A. Leibler, "On information and sufficiency," Ann. Math. Stat., 1951
R. I. Price, "Optimum detection of stochastic signals in noise with application to scatter-multipath channels," IRE Trans. Inf. Theory, 1957
F. C. Schweppe, "Evaluation of likelihood functions for Gaussian signals," IEEE Trans. Inf. Theory, 1965
S. Saito and F. Itakura, "The theoretical consideration of statistically optimum methods for speech spectral density," Electrical Communication Laboratory, NTT, Tokyo, Rep. 3107, Dec. 1966
T. Kailath, "A general likelihood-ratio formula for random signals in Gaussian noise," IEEE Trans. Inf. Theory, 1966
T. Kailath, "Likelihood ratios for Gaussian processes," IEEE Trans. Inf. Theory, 1970
M. Donsker and S. Varadhan, "Large deviations for stationary Gaussian processes," Comm. Math. Phys., 1985
I. Vajda, "Distance and discrimination rates of stochastic processes," Stoch. Proc. Appl., 1993
Y. Sung, L. Tong, and H. V. Poor, "Neyman-Pearson detection of Gauss-Markov signals in noise: Closed-form error exponent and properties," IEEE Trans. Inf. Theory, 2006
Y. Sung, X. Zhang, L. Tong, and H. V. Poor, "Sensor configuration and activation for field detection in large sensor arrays," IEEE Trans. Signal Processing, 2008
Y. Sung, H. V. Poor, and H. Yu, "How much information can one get from a wireless ad hoc sensor network over a correlated random field?" IEEE Trans. Inf. Theory, 2009
Detection of correlated random fields
[Figure: a sensor field under hypothesis H0 or H1; data are collected and a decision between H1 and H0 is made]
Detection of correlated random fields
Key factors: spatial correlation, measurement noise, sensor deployment / signal sampling
[Figure: the signal s(x) is sampled at sensor positions x_1, x_2, ..., x_n, yielding samples s_1, s_2, ..., s_n; the collected data are used to decide between H1 and H0]
Correlation and Noise
[Figure: sample realizations for different sample correlations (rows) and different SNRs (columns, SNR = 10 dB and SNR = -3 dB)]
Design parameters: number of samples and spacing
Problem Formulation: The Gauss-Gauss Problem
Underlying diffusion phenomenon (Ornstein-Uhlenbeck process):
  ds(x) = -A s(x) dx + B du(x)
Sampled signal (s_i = s(x_i)):
  s_{i+1} = a s_i + u_i,   a = e^{-A·Δ} (Δ = sample spacing),   0 < a < 1
  SNR = E{s_i^2} / E{w_i^2}
Hypotheses:
  H0: y_i = w_i
  vs.
  H1: s_{i+1} = a s_i + u_i,   y_i = s_i + w_i
[Figure: the signal s(x) sampled at positions x_1, x_2, ..., x_n, giving samples s_1, s_2, ..., s_n]
Hidden Gauss-Markov Model: The State-Space Model
Hidden Gauss-Markov model:
  s_{i+1} = a s_i + u_i
  y_i = s_i + w_i
Burg's Theorem
The maximum-entropy-rate stochastic process {S_i} satisfying the constraints
  E{ S_i S_{i+k} } = a_k,   k = 0, 1, ..., p
is the p-th order Gauss-Markov process.
Kalman filtering
Optimal estimation of state using observations is based on the state-space model.
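The model above can be sketched in a few lines of code. This is a minimal simulation, with all parameter values (a = 0.8, unit noise variances, the sample size) being illustrative choices rather than numbers from the talk; it checks Burg's AR(1) characterization empirically through the lag-1 correlation of the state.

```python
import numpy as np

# Minimal simulation of the hidden Gauss-Markov model on this slide:
#   s_{i+1} = a*s_i + u_i   (first-order Gauss-Markov state)
#   y_i     = s_i + w_i     (noisy observation)
# All parameter values below are illustrative, not from the talk.
rng = np.random.default_rng(0)
a, sigma_u, sigma_w, n = 0.8, 1.0, 1.0, 200_000

sigma_s2 = sigma_u**2 / (1 - a**2)            # stationary state variance
s = np.empty(n)
s[0] = rng.normal(0.0, np.sqrt(sigma_s2))     # start in steady state
u = rng.normal(0.0, sigma_u, n - 1)
for i in range(n - 1):
    s[i + 1] = a * s[i] + u[i]
y = s + rng.normal(0.0, sigma_w, n)

# Burg's AR(1) characterization implies E[s_i s_{i+1}] / E[s_i^2] = a.
emp = np.mean(s[:-1] * s[1:]) / np.mean(s * s)
print(f"empirical lag-1 correlation: {emp:.3f} (model a = {a})")
```

The observations y then serve as the input to the detection problem on the following slides.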
Optimal Detection: Neyman & Pearson
Likelihood ratio detector:
  log [ p_1(y_1, ..., y_n) / p_0(y_1, ..., y_n) ]  compared against a threshold
False alarm probability:  P_F = Pr{ Decision = H1 | H0 }
Miss probability:         P_M = Pr{ Decision = H0 | H1 }
Threshold design (Neyman-Pearson formulation): minimize the miss probability subject to a false alarm probability level
[Photo: Jerzy Neyman and Egon Pearson]
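The formulation can be sketched in code for the simplest i.i.d. Gaussian case p0 = N(0, s0^2) vs. p1 = N(0, s1^2), where the log-likelihood ratio reduces to an energy statistic. All numeric values (s0, s1, n, alpha, Monte Carlo sizes) are illustrative choices, not from the talk.

```python
import numpy as np

# Neyman-Pearson test sketch for i.i.d. p0 = N(0, s0^2) vs. p1 = N(0, s1^2):
# the LLR is monotone in sum(y^2), and the threshold is set empirically to
# meet a false-alarm level alpha. Illustrative values throughout.
rng = np.random.default_rng(1)
s0, s1, n, alpha = 1.0, 2.0, 50, 0.05

def llr(Y):
    # log [ p1(y)/p0(y) ] for each row of Y (each row = one length-n observation)
    return 0.5 * np.sum(Y**2, axis=1) * (1/s0**2 - 1/s1**2) + n * np.log(s0/s1)

# Threshold from the empirical (1-alpha) quantile of the LLR under H0
t0 = llr(rng.normal(0.0, s0, (20_000, n)))
tau = np.quantile(t0, 1 - alpha)

# Miss probability: deciding H0 although H1 is true
t1 = llr(rng.normal(0.0, s1, (20_000, n)))
P_M = np.mean(t1 <= tau)
print(f"P_F target {alpha}, estimated P_M = {P_M:.4f}")
```

With n = 50 samples the miss probability is already far below the false-alarm level, previewing the exponential decay analyzed next.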
Structure for Optimal Detection
R. I. Price, "Optimum detection of stochastic signals in noise with application to scatter-multipath channels," IRE Trans. Inf. Theory, 1957 (extensions by Middleton, Shepp)
F. C. Schweppe, "Evaluation of likelihood functions for Gaussian signals," IEEE Trans. Inf. Theory, 1965
T. Kailath, "A general likelihood-ratio formula for random signals in Gaussian noise," IEEE Trans. Inf. Theory, 1966
T. Kailath, "Likelihood ratios for Gaussian processes," IEEE Trans. Inf. Theory, 1970
Schweppe's Recursion
(Schweppe, 1965)
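Schweppe's idea can be sketched as follows: a Kalman filter produces the innovations e_i and their variances R_{e,i}, from which the exact Gaussian log-likelihood follows term by term. The scalar model and parameter values below are illustrative; the brute-force multivariate Gaussian log-pdf is included only as a check.

```python
import numpy as np

# Schweppe-style likelihood evaluation for the scalar model
#   s_{i+1} = a s_i + u_i,  y_i = s_i + w_i  (q = Var(u), r = Var(w)):
# log p(y_1..n) = -0.5 * sum_i [ log(2*pi*R_{e,i}) + e_i^2 / R_{e,i} ],
# with e_i, R_{e,i} from the Kalman filter. Parameter values illustrative.
a, q, r = 0.9, 1.0, 1.0
rng = np.random.default_rng(2)

def loglik_schweppe(y):
    s_pred, P = 0.0, q / (1 - a**2)     # steady-state prior on s_1
    ll = 0.0
    for yi in y:
        Re = P + r                      # innovation variance
        e = yi - s_pred                 # innovation
        ll += -0.5 * (np.log(2*np.pi*Re) + e*e/Re)
        K = P / Re                      # Kalman gain
        s_filt = s_pred + K*e
        P_filt = (1 - K) * P
        s_pred, P = a*s_filt, a*a*P_filt + q    # time update
    return ll

def loglik_direct(y):
    # Brute-force check: y ~ N(0, Sigma), Sigma = Cov(s) + r*I with
    # Cov(s)_{ij} = (q/(1-a^2)) * a^{|i-j|} for the stationary AR(1) state.
    m = len(y)
    i, j = np.indices((m, m))
    Sigma = (q/(1 - a**2)) * a**np.abs(i - j) + r*np.eye(m)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5*(m*np.log(2*np.pi) + logdet + y @ np.linalg.solve(Sigma, y))

y = rng.normal(0.0, 1.0, 30)
print(loglik_schweppe(y), loglik_direct(y))
```

The recursion needs O(n) work versus O(n^3) for the direct Gaussian density, which is the point of Schweppe's formulation.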
Performance Analysis: Known Results - I.i.d. Case
Energy detector:
  T_n = sum_{i=1}^n y_i^2, compared against a threshold
Performance (for fixed P_F):
  log P_M ≈ -K n,   K = (1/2) [ log(1 + SNR) - SNR/(1 + SNR) ]
[Figure: P_M (log scale) vs. number of samples; a straight line with slope -K]
General Case
Challenge: the exact error probability is not available!
  P_M ≈ e^{-nK},   log P_M ≈ -nK
[Figure: P_M vs. number of samples, on linear and log scales]
Error Exponent
  K = -lim_{n→∞} (1/n) log P_M,   i.e.,   log P_M ≈ -Kn
Performance metric for large numbers of samples!
[Figure: P_M (log scale) vs. number of samples; slope = -K]
Previous Results on Error Exponents
I.i.d. observations: Stein's lemma
  K = D(p_0 || p_1) = E_{p_0} [ log ( p_0(Y) / p_1(Y) ) ] = ∫ p_0(y) log [ p_0(y) / p_1(y) ] dy
(the Kullback-Leibler divergence)
For zero-mean Gaussians:
  D( N(0, σ_0^2) || N(0, σ_1^2) ) = (1/2) log( σ_1^2 / σ_0^2 ) + (1/2) ( σ_0^2 / σ_1^2 - 1 )
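The closed-form Gaussian divergence above can be checked directly against numerical integration of p0 log(p0/p1); the variances chosen below are illustrative.

```python
import numpy as np

# Closed form: D(N(0,v0) || N(0,v1)) = 0.5*log(v1/v0) + 0.5*(v0/v1 - 1),
# checked by integrating p0*log(p0/p1) on a fine grid. Values illustrative.
def kl_gauss(v0, v1):
    return 0.5*np.log(v1/v0) + 0.5*(v0/v1 - 1.0)

def kl_numeric(v0, v1, lim=12.0, num=400_001):
    x = np.linspace(-lim, lim, num)
    p0 = np.exp(-x*x/(2*v0)) / np.sqrt(2*np.pi*v0)
    p1 = np.exp(-x*x/(2*v1)) / np.sqrt(2*np.pi*v1)
    f = p0 * np.log(p0 / p1)
    # trapezoidal rule (tails beyond |x| = 12 are negligible for these variances)
    return (np.sum(f) - 0.5*(f[0] + f[-1])) * (x[1] - x[0])

v0, v1 = 1.0, 4.0
print(kl_gauss(v0, v1), kl_numeric(v0, v1))
```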
Interpretation via Sanov’s Theorem
  Pr( Y^n : P_{Y^n} ∈ E ) ≈ exp( -n min_{P ∈ E} D(P || Q) ) = exp( -n D(P* || Q) )
[Figure: probability simplex showing the true distribution Q, the set E, and P*, the closest point of E to Q]
  P_M = Pr{ Decision = H0 | H1 }
Previous Results on Error Exponents
Asymptotic Kullback-Leibler rate (Bahadur, Vajda; 1980, 1989, 1990):
  K = lim_{n→∞} (1/n) log [ p_0(y_1, y_2, ..., y_n) / p_1(y_1, y_2, ..., y_n) ]   under H0
Previous Results on Error Exponents
Between two Markov or Gauss-Markov observations: Koopmans (1960), Hoeffding (1965), Boza (1971), Natarajan (1985), Donsker & Varadhan (1985), Benitz & Bucklew (1995), Bryc & Dembo (1997), etc.
Luschgy (1994): Neyman-Pearson detection between two Gauss-Markov signals
These rely on the Markov factorization:
  p(y_1, ..., y_n) = p(y_1) p(y_2|y_1) p(y_3|y_2) ... p(y_n|y_{n-1})
Previous Results on Error Exponents
For the state-space (hidden Markov) model, however, the Markov factorization p(y_1, ..., y_n) = p(y_1) p(y_2|y_1) ... p(y_n|y_{n-1}) does not apply: the observation process is not Markov!
[Figure: signal process s1 → s2 → s3 → s4 (Markov), each state emitting an observation y1, y2, y3, y4]
Spectral Domain Approach
Theorem (Itakura-Saito, 1970s):
  K(a, SNR) = (1/2π) ∫_{-π}^{π} D( N(0, S_y^0(ω)) || N(0, S_y^1(ω)) ) dω
where, per frequency,
  D( N(0, S_y^0(ω)) || N(0, S_y^1(ω)) ) = (1/2) log( S_y^1(ω) / S_y^0(ω) ) + (1/2) ( S_y^0(ω) / S_y^1(ω) - 1 )
[Photos: F. Itakura and S. Saito]
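The spectral formula can be evaluated numerically for the AR(1)-signal-in-noise model from the earlier slides, with S_y^0(ω) the flat noise spectrum and S_y^1(ω) the signal-plus-noise spectrum; the a → 0 sanity check recovers the single-letter KL divergence of Stein's lemma. All parameter values are illustrative.

```python
import numpy as np

# Numerical sketch of the spectral formula for AR(1) signal in white noise:
#   S_y^0(w) = r                          (noise only, H0)
#   S_y^1(w) = q/|1 - a e^{-jw}|^2 + r    (signal plus noise, H1)
#   K = (1/2pi) * integral of the per-frequency Gaussian KL over [-pi, pi].
# Values of a, q, r are illustrative.
def spectral_K(a, q, r, num=200_001):
    w = np.linspace(-np.pi, np.pi, num)
    S1 = q / (1.0 - 2.0*a*np.cos(w) + a*a) + r
    D = 0.5*np.log(S1/r) + 0.5*(r/S1 - 1.0)
    # trapezoidal rule over one full period
    return (np.sum(D) - 0.5*(D[0] + D[-1])) * (w[1] - w[0]) / (2.0*np.pi)

# Sanity check: as a -> 0 the model is i.i.d. and K reduces to D(N(0,r) || N(0,q+r)).
K0 = spectral_K(0.0, 1.0, 1.0)
stein = 0.5*np.log(2.0) - 0.25
print(K0, stein)
```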
Innovations Process
Gram-Schmidt orthogonalization of the observations:
  span{ y_1, y_2, y_3, ... } = span{ e_1, e_2, e_3, ... },   e_i ⊥ e_j for i ≠ j
  e_i = y_i - ŷ_{i | i-1, ..., 1}   (the innovation: the new information in y_i)
  R_{e,i} = E{ e_i^2 },   e_i ~ N(0, R_{e,i})
[Figure: geometric picture; y_1 = e_1, the projection of y_2 onto y_1 gives ŷ_{2|1} with e_2 = y_2 - ŷ_{2|1}, and e_3 = y_3 - ŷ_{3|{1,2}}]
Innovations Approach to Asymptotic KL Rate for the Hidden Gauss-Markov Model
Whitening filter
Innovations Approach to Asymptotic KL Rate for the Hidden Gauss-Markov Model
Error Exponent: Innovations Approach
Theorem (Sung et al. 2004):
  K(a, SNR) = -lim_{n→∞} (1/n) log P_M = (1/2) log( R_e / σ_w^2 ) + R̃_e / (2 R_e) - 1/2
where σ_w^2 = E{w_i^2} and
  R_e = lim_{n→∞} E{ e_n^2 | H1 }   (steady-state innovation variance under H1)
  R̃_e = lim_{n→∞} E{ e_n^2 | H0 }   (steady-state variance of the same whitening filter's output under H0)
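The theorem's Kalman-filter quantities can be computed for the scalar model, as sketched below under my reading of the slide (parameter values illustrative): R_e comes from the steady-state Riccati recursion under H1, and R̃_e from a scalar Lyapunov equation for the H1 whitening filter driven by white H0 data. The spectral-domain formula from the previous slides serves as a cross-check.

```python
import numpy as np

# Kalman-filter quantities for s_{i+1} = a*s_i + u_i, y_i = s_i + w_i,
# with q = E{u^2}, r = E{w^2} (illustrative values):
#   R_e  = lim E{e_i^2 | H1}  (Riccati),  R~_e = lim E{e_i^2 | H0}  (Lyapunov)
#   K    = 0.5*log(R_e/r) + R~_e/(2*R_e) - 0.5
def exponent_K(a, q, r):
    P = q                                  # steady-state prediction Riccati iteration
    for _ in range(5000):
        P = a*a * P*r / (P + r) + q
    Re = P + r                             # innovation variance under H1
    Kg = P / Re                            # steady-state Kalman gain
    c = a * (1.0 - Kg)
    # Under H0 (white y), the prediction obeys shat_{i+1} = c*shat_i + a*Kg*y_i,
    # so its variance V solves the scalar Lyapunov equation V = c^2*V + (a*Kg)^2*r,
    # and the innovation e_i = y_i - shat_i has variance R~_e = r + V.
    V = (a*Kg)**2 * r / (1.0 - c*c)
    Re_t = r + V
    return 0.5*np.log(Re/r) + Re_t/(2.0*Re) - 0.5

# Cross-check against the spectral-domain formula on the earlier slides.
def spectral_K(a, q, r, num=200_001):
    w = np.linspace(-np.pi, np.pi, num)
    S1 = q / (1.0 - 2.0*a*np.cos(w) + a*a) + r
    D = 0.5*np.log(S1/r) + 0.5*(r/S1 - 1.0)
    return (np.sum(D) - 0.5*(D[0] + D[-1])) * (w[1] - w[0]) / (2.0*np.pi)

print(exponent_K(0.8, 1.0, 1.0), spectral_K(0.8, 1.0, 1.0))
```

For a = 0 the Riccati fixed point is P = q, giving R_e = q + r and R̃_e = r, so K collapses to the Stein's-lemma divergence of the corollary on the next slide.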
Extreme Correlations
Corollary
I.i.d. case (a → 0): R̃_e = σ_0^2 and R_e = σ_1^2, so K = D(p_0 || p_1) with p_0 = N(0, σ_0^2), p_1 = N(0, σ_1^2), recovering Stein's lemma.
Perfectly correlated case (a → 1): R̃_e = R_e, so K = 0; P_M decays only sub-exponentially, on the order of n^{-1/2}.
[Figure: P_M (log scale) vs. number of samples for the i.i.d. case (a = 0) and the perfectly correlated case (a = 1)]
K vs. Correlation Strength
Theorem (Sung et al.)
For SNR ≥ 1, K decreases monotonically as a → 1: maximum spacing between samples is optimal.
For SNR < 1, there exists a non-zero optimal correlation a*, i.e., an optimal finite spacing Δ* = -(log a*)/A  (from a = e^{-A·Δ}).
[Figure: error exponent K vs. correlation coefficient a in the two SNR regimes, marking the optimal spacing]
Optimal Correlation vs. SNR
The transition is very sharp!
[Figure: optimal correlation a* vs. signal-to-noise ratio in dB, with an abrupt transition at SNR = 1]
Intuition: Coherency vs. Diversity
[Figure: error exponent K vs. correlation coefficient a at SNR = 10 dB and SNR = -3 dB; at low SNR a "signal notch" appears, reflecting the trade-off between coherent gain and diversity]
K vs. Signal-to-Noise Ratio
Theorem 5:
K is monotone increasing as SNR increases.
K is proportional to (1/2) log(1 + SNR) at high SNR.
[Figure: error exponent K vs. signal-to-noise ratio in dB, for a = exp(-1)]
Simulation Results
[Figure: simulated error probability Pe vs. number of samples, at SNR = 10 dB and SNR = -3 dB]
How Many Samples?
From P_M ≈ e^{-nK}, the number of samples required to reach a target miss probability P̂_M at false alarm level P_F is approximately
  n ≈ (1/K) log( 1 / P̂_M )
[Figure: required number of sensors vs. sensor spacing for target P_M = 10^-3 and 10^-4, at SNR = -3, 0, 3 dB, with diffusion rate A = 1; the optimal spacing and a range of reasonable spacings are marked]
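The rule of thumb above is a one-liner in code; the exponent K = 0.1 used here is an assumed, illustrative value, not a number from the talk.

```python
import math

# From P_M ≈ exp(-n*K), the sample count for a target miss probability P_hat
# is roughly n ≈ log(1/P_hat)/K. K = 0.1 is an assumed, illustrative exponent.
def required_samples(K, P_hat):
    return math.ceil(math.log(1.0 / P_hat) / K)

for target in (1e-3, 1e-4):
    print(f"target P_M = {target:g}: n = {required_samples(0.1, target)}")
```

Since n scales like 1/K, the spacing that maximizes K (previous slides) directly minimizes the number of sensors needed.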
Vector State-Space Model
Y. Sung, X. Zhang, L. Tong and H. V. Poor, “Sensor configuration and activation for field detection in large sensor arrays,” IEEE Trans. Signal Processing, 2008
Vector State-Space Model
Theorem (Sung et al.)
(Riccati)
(Lyapunov)
Y. Sung, X. Zhang, L. Tong and H. V. Poor, “Sensor configuration and activation for field detection in large sensor arrays,” IEEE Trans. Signal Processing, 2008
Known in Kalman filtering theory
Newly defined quantities
Numerical Results
Figures from Y. Sung, X. Zhang, L. Tong and H. V. Poor, “Sensor configuration and activation for field detection in large sensor arrays,” IEEE Trans. Signal Processing, 2008
Extension to d-Dimensional Gauss-Markov Random Fields
Y. Sung, H. V. Poor and H. Yu, “How much information can one get from a wireless ad hoc sensor network over a correlated random field?” IEEE Trans. Infor.Theory, 2009
Extension to d-Dimensional Gauss-Markov Random Fields
Itakura-Saito (1970s), Hannan (1973)
Sung et al. (2009)
Y. Sung, H. V. Poor and H. Yu, “How much information can one get from a wireless ad hoc sensor network over a correlated random field?” IEEE Trans. Infor.Theory, 2009
Conclusion
The performance of Neyman-Pearson detection of hidden Gauss-Markov signals was analyzed via the error exponent
An innovations approach to the asymptotic Kullback-Leibler rate was developed
A sharp transition of the error exponent as a function of correlation, depending on SNR, was proved
Kalman filtering quantities were connected to the asymptotic KL rate
Extension to the vector state-space model
Extension to the multi-dimensional case: a sufficient condition for strong convergence was established