Mathematical Topics Embraced by Signal Processing
Carrson C. Fung, Dept. of Electronics Engineering, National Chiao Tung University
Examples of Mathematical Models
Linear signal models for discrete and continuous time, including transfer function and state space representations. Applications of these models to SP problems such as prediction, spectrum estimation, and so on
Adaptive filtering models and applications to prediction, system identification, and so forth
The Gaussian random variable, and other probability density functions, including the important idea of conditioning upon an observation
Hidden Markov models: model the dynamics of systems probabilistically
IEE 5335: Mathematical Methods and Algorithms for Signal Processing 2
Why is modeling important?
Our world is complicated. To describe it mathematically requires complicated mathematics, e.g. high-order differential equations.
E.g. suppose you are asked to design a filter h[n] satisfying some design specifications such as transition bandwidth, passband frequency, stopband frequency, filter order, … Hence, design is usually done in frequency using $H(e^{j\omega})$.
How many points in $H(e^{j\omega})$ do you need to design? This is an impossible problem to solve, as there are uncountably many points in $[0, \pi]$.
Problem Specifications and Variable Parametrization
Suppose the desired response is
$$D(\omega) = \begin{cases} 1, & 0 \le \omega \le \omega_p \\ 0, & \omega_s \le \omega \le \pi \\ \text{don't care}, & \omega_p < \omega < \omega_s \end{cases}$$
Change the variable from $H(e^{j\omega})$ to the amplitude response $H_0(e^{j\omega})$:
$$H(e^{j\omega}) = \sum_{n=0}^{N-1} h[n]\, e^{-j\omega n} = e^{-j\omega M} H_0(e^{j\omega}), \qquad H_0(e^{j\omega}) = \sum_{n=0}^{M} b[n] \cos(\omega n), \qquad M = \frac{N-1}{2},$$
assuming Type-I linear phase, and
$$b[n] = \begin{cases} 2h\!\left[\frac{N-1}{2} - n\right], & n \ne 0 \\ h\!\left[\frac{N-1}{2}\right], & n = 0 \end{cases}$$
Problem Formulation
Then the filter design problem can be formulated as an LS problem
$$\min_{b[n]} \int_{R} \left[ D(\omega) - H_0(e^{j\omega}) \right]^2 \frac{d\omega}{\pi},$$
where $R$: $0 \le \omega \le \pi$, but excluding the transition band. Integration can be approximated by summation.
Now the problem only needs to solve for a finite number of variables.
A filter designed using LS has a much better magnitude frequency response than one designed using the Kaiser windowing method.
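The LS formulation above can be sketched numerically as follows. This is a minimal illustration, not the course's reference design: the function name `ls_type1_fir`, the uniform frequency grid, and the band edges in the usage line are assumptions made for the example, and the integral is replaced by a sum over grid points outside the transition band.

```python
import numpy as np

def ls_type1_fir(N, wp, ws, grid=512):
    """Least-squares Type-I (odd length N, symmetric) lowpass design sketch."""
    M = (N - 1) // 2
    w = np.linspace(0.0, np.pi, grid)
    w = w[(w <= wp) | (w >= ws)]          # exclude the transition band
    D = (w <= wp).astype(float)           # desired response: 1 in passband, 0 in stopband
    C = np.cos(np.outer(w, np.arange(M + 1)))   # H0 = sum_n b[n] cos(w n)
    b, *_ = np.linalg.lstsq(C, D, rcond=None)   # finite number of unknowns b[0..M]
    h = np.zeros(N)
    h[M] = b[0]                           # b[0] = h[M]
    for n in range(1, M + 1):
        h[M - n] = h[M + n] = b[n] / 2.0  # b[n] = 2 h[M - n] for n != 0
    return h

h = ls_type1_fir(31, 0.3 * np.pi, 0.5 * np.pi)
```

The only unknowns are the M + 1 cosine coefficients, which is the point of the reparametrization: the uncountable set of frequency points is replaced by a finite LS problem.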
Other Motivations for Using Mathematics
Given a sequence of output data from a system, how can the parameters of the system be determined if the input signal is known? What if the input signal is not known? What if the system is nonlinear?
[Figure: an unknown linear time-invariant system driven by x(n) = δ(n) produces hd(n) or y(n); a model is matched to it for 0 ≤ n ≤ q and n > q, with matching error e(n).]
Other Motivations for Using Mathematics
Determine a “minimal” representation of a system
Given a signal from a system, determine a predictor for the signal
  Forward and/or backward
Determine an optimal and/or efficient smoothing method
  E.g. image smoothing
Determine a means of efficiently coding (representing) a signal modeled as the output of an LTI system
Develop computationally efficient algorithms
Develop adaptive techniques to obtain a desirable system output
[Figure: two plots of f[n] versus n. Prediction: how do we predict forward and/or backward? Smoothing: samples s[0], s[1], s[2].]
Complex-Valued Linear Discrete-Time Models: ARMA and MA
Autoregressive moving average (ARMA) model
$$y[n] = -a_1^* y[n-1] - a_2^* y[n-2] - \cdots - a_p^* y[n-p] + b_0^* f[n] + b_1^* f[n-1] + \cdots + b_q^* f[n-q]$$
$$\Leftrightarrow \sum_{k=0}^{p} a_k^* y[n-k] = \sum_{k=0}^{q} b_k^* f[n-k], \qquad a_0 = 1$$
Moving average (MA) model
$$y[n] = b_0^* f[n] + b_1^* f[n-1] + \cdots + b_q^* f[n-q] \;\Leftrightarrow\; y[n] = \sum_{k=0}^{q} b_k^* f[n-k]$$
Vector notation
Define
$$\mathbf{b} = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_q \end{bmatrix} \quad \text{and} \quad \mathbf{f}[n] = \begin{bmatrix} f[n] \\ f[n-1] \\ \vdots \\ f[n-q] \end{bmatrix} \;\Rightarrow\; y[n] = \mathbf{b}^H \mathbf{f}[n]$$
Complex-Valued Linear Discrete-Time Models: AR
Autoregressive (AR) model
$$y[n] = -a_1^* y[n-1] - a_2^* y[n-2] - \cdots - a_p^* y[n-p] + b_0^* f[n] \;\Leftrightarrow\; y[n] = b_0^* f[n] - \sum_{k=1}^{p} a_k^* y[n-k]$$
Define
$$\mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} \quad \text{and} \quad \mathbf{y}[n-1] = \begin{bmatrix} y[n-1] \\ y[n-2] \\ \vdots \\ y[n-p] \end{bmatrix} \;\Leftrightarrow\; y[n] = b_0^* f[n] - \mathbf{a}^H \mathbf{y}[n-1]$$
System Function and Impulse Response
Assuming initial conditions are zero
$$\sum_{k=0}^{p} a_k^* z^{-k}\, Y(z) = \sum_{k=0}^{q} b_k^* z^{-k}\, F(z) \;\Leftrightarrow\; Y(z) A(z) = F(z) B(z)$$
ARMA system function
$$H(z) = \frac{Y(z)}{F(z)} = \frac{B(z)}{A(z)} = \frac{\sum_{k=0}^{q} b_k^* z^{-k}}{\sum_{k=0}^{p} a_k^* z^{-k}} = \frac{\sum_{k=0}^{q} b_k^* z^{-k}}{1 + \sum_{k=1}^{p} a_k^* z^{-k}}$$
(usually assume the system is normalized so that $a_0 = 1$)
All-pole system function (IIR system)
$$H(z) = \frac{Y(z)}{F(z)} = \frac{b_0^*}{A(z)} = \frac{b_0^*}{1 + \sum_{k=1}^{p} a_k^* z^{-k}}$$
All-zero system function (FIR system)
$$H(z) = \frac{Y(z)}{F(z)} = \sum_{k=0}^{q} b_k^* z^{-k} = B(z)$$
System Function and Impulse Response
The impulse response $h[n]$ is the inverse z-transform of $H(z)$, and the output is the convolution
$$y[n] = \sum_{k} f[k]\, h[n-k]$$
Factoring $H(z)$ into monomial factors using the roots of the numerator and denominator:
$$H(z) = \frac{B(z)}{A(z)} = b_0^* \, \frac{\prod_{i=1}^{q} \left(1 - z_i z^{-1}\right)}{\prod_{i=1}^{p} \left(1 - p_i z^{-1}\right)}$$
Stochastic MA and AR Models
$f[n]$: assumed to be a white discrete-time random process, usually zero mean
$b_0$: set to 1, with input power determined by the variance of the signal
$$E\left(f[n]\right) = 0 \;\; \forall n, \qquad E\left(f[m] f^*[n]\right) = \begin{cases} \sigma_f^2, & m = n \\ 0, & \text{otherwise} \end{cases}$$
SP often involves comparing two signals; one way of comparing is by correlation. When a signal is compared with itself, the correlation is called the autocorrelation function. For a zero-mean WSS signal $y[n]$,
$$r_{yy}[k] = E\left(y[n]\, y^*[n-k]\right) \quad \text{or} \quad r_{yy}[-k] = E\left(y[n-k]\, y^*[n]\right)$$
(Note the convention: first argument minus second)
Autocorrelation Function
Note: $r_{yy}[k] = r_{yy}^*[-k]$ (more details later)
For a real-valued random process, $r_{yy}[k] = r_{yy}[-k]$ (even function)
For MA process
$$y[n] = f[n] + b_1^* f[n-1] + \cdots + b_q^* f[n-q]$$
$$\Rightarrow r_{yy}[k] = E\left(y[n]\, y^*[n-k]\right) = E\Big( \left(f[n] + b_1^* f[n-1] + \cdots + b_q^* f[n-q]\right) \left(f^*[n-k] + b_1 f^*[n-1-k] + \cdots + b_q f^*[n-q-k]\right) \Big)$$
Because $f[n]$ is white, only products with equal time indices survive. For $k = 0$ (with $b_0 = 1$):
$$r_{yy}[0] = r_{ff}[0] + |b_1|^2 r_{ff}[0] + \cdots + |b_q|^2 r_{ff}[0] = \sigma_f^2 \sum_{k=0}^{q} |b_k|^2$$
and in general $r_{yy}[k] = \sigma_f^2 \sum_{j=0}^{q-k} b_{j+k}^* b_j$ for $0 \le k \le q$, with $r_{yy}[k] = 0$ for $k > q$.
For AR process
$$y[n] + a_1^* y[n-1] + \cdots + a_p^* y[n-p] = f[n]$$
Autocorrelation Function
For AR process $y[n] + a_1^* y[n-1] + \cdots + a_p^* y[n-p] = f[n]$
[Figure: white f[n] drives the all-pole filter 1/A(z) to produce y[n].]
Multiply by $y^*[n-\ell]$ on both sides and take expectation (with $a_0 = 1$):
$$E\left( \sum_{k=0}^{p} a_k^* y[n-k]\, y^*[n-\ell] \right) = \sum_{k=0}^{p} a_k^* r_{yy}[\ell-k] = E\left( f[n]\, y^*[n-\ell] \right)$$
$$r_{fy}[\ell] = \begin{cases} \sigma_f^2, & \text{for } \ell = 0 \\ 0, & \text{for } \ell > 0 \end{cases} \qquad (0 \text{ for } \ell > 0 \text{ because } f[n] \text{ is a white-noise process})$$
For $\ell = 0$:
$$r_{fy}[0] = E\Big( \left(y[n] + a_1^* y[n-1] + \cdots + a_p^* y[n-p]\right) y^*[n] \Big) = r_{yy}[0] + a_1^* r_{yy}[-1] + \cdots + a_p^* r_{yy}[-p] = \sigma_f^2$$
For $\ell > 0$:
$$\Rightarrow r_{yy}[\ell] = -a_1^* r_{yy}[\ell-1] - \cdots - a_p^* r_{yy}[\ell-p]$$
Yule-Walker Equations: Solving System ID Problem
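$$r_{yy}[\ell] = -a_1^* r_{yy}[\ell-1] - \cdots - a_p^* r_{yy}[\ell-p]$$
Stacking the $\ell = 1, 2, \ldots, p$ equations, we have
$$\begin{bmatrix} r_{yy}[0] & r_{yy}[-1] & \cdots & r_{yy}[-(p-1)] \\ r_{yy}[1] & r_{yy}[0] & \cdots & r_{yy}[-(p-2)] \\ \vdots & \vdots & \ddots & \vdots \\ r_{yy}[p-1] & r_{yy}[p-2] & \cdots & r_{yy}[0] \end{bmatrix} \begin{bmatrix} a_1^* \\ a_2^* \\ \vdots \\ a_p^* \end{bmatrix} = \begin{bmatrix} -r_{yy}[1] \\ -r_{yy}[2] \\ \vdots \\ -r_{yy}[p] \end{bmatrix}$$
Conjugating both sides:
$$\begin{bmatrix} r_{yy}^*[0] & r_{yy}^*[-1] & \cdots & r_{yy}^*[-(p-1)] \\ r_{yy}^*[1] & r_{yy}^*[0] & \cdots & r_{yy}^*[-(p-2)] \\ \vdots & \vdots & \ddots & \vdots \\ r_{yy}^*[p-1] & r_{yy}^*[p-2] & \cdots & r_{yy}^*[0] \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} -r_{yy}^*[1] \\ -r_{yy}^*[2] \\ \vdots \\ -r_{yy}^*[p] \end{bmatrix} \;\Leftrightarrow\; \mathbf{R}\mathbf{w} = \mathbf{r},$$
$$\mathbf{w} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix}, \qquad \mathbf{r} = \begin{bmatrix} -r_{yy}^*[1] \\ -r_{yy}^*[2] \\ \vdots \\ -r_{yy}^*[p] \end{bmatrix}$$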
Observations about YW Equations
$\mathbf{R} = \mathbf{R}^H$ (Hermitian)
  Eigenvalues are real, and eigenvectors corresponding to distinct eigenvalues are orthogonal (and can be normalized to be orthonormal). If $\mathbf{R}$ is real, then $\mathbf{R}$ is symmetric, i.e. $\mathbf{R}^T = \mathbf{R}$
$\mathbf{R}$ is a Toeplitz matrix, i.e. $r_{ij} = r_{i-j}$
  Values of $\mathbf{R}$ depend only on the difference between the index values
  Efficient algorithms exist for solving the system
  Power efficient in hardware implementation
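As an illustration of the Yule-Walker equations, the following sketch estimates the coefficients of a real-valued AR(2) process from sample autocorrelations. The coefficient values, the sample size, and the helper names `simulate_ar` and `yule_walker` are assumptions chosen for the example; for real signals the conjugates drop out and $\mathbf{R}$ is symmetric Toeplitz. A general-purpose solver is used here rather than a Toeplitz-specific one.

```python
import numpy as np

def simulate_ar(a, n_samples, rng):
    """Generate y[n] + a_1 y[n-1] + ... + a_p y[n-p] = f[n] with unit-variance white f."""
    p = len(a)
    f = rng.standard_normal(n_samples)
    y = np.zeros(n_samples)
    for n in range(n_samples):
        acc = f[n]
        for k in range(1, p + 1):
            if n - k >= 0:
                acc -= a[k - 1] * y[n - k]
        y[n] = acc
    return y

def yule_walker(y, p):
    """Solve R w = r with sample autocorrelations r_yy[k] ~ mean(y[n] y[n-k])."""
    n = len(y)
    r = np.array([np.dot(y[k:], y[:n - k]) / n for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])  # Toeplitz
    return np.linalg.solve(R, -r[1:])

rng = np.random.default_rng(0)
a_true = np.array([-0.5, 0.25])      # hypothetical stable AR(2) coefficients
y = simulate_ar(a_true, 50_000, rng)
a_hat = yule_walker(y, p=2)
```

With enough samples, `a_hat` should land close to `a_true`; the residual gap reflects the finite-sample error in the autocorrelation estimates.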
Realization
A controller canonical form (from control) can be written by realizing that the transfer function can be written as
$$H(z) = \frac{Y(z)}{W(z)} \cdot \frac{W(z)}{F(z)} = \left( \sum_{k=0}^{q} b_k^* z^{-k} \right) \frac{1}{1 + \sum_{k=1}^{p} a_k^* z^{-k}} = B(z)\, \frac{1}{A(z)} = H_2(z) H_1(z).$$
Since $H_1(z) = \dfrac{W(z)}{F(z)} = \dfrac{1}{1 + \sum_{k=1}^{p} a_k^* z^{-k}}$,
$$W(z) \left( 1 + \sum_{k=1}^{p} a_k^* z^{-k} \right) = F(z) \;\Leftrightarrow\; w[n] + a_1^* w[n-1] + \cdots + a_p^* w[n-p] = f[n]$$
or
$$w[n] = f[n] - a_1^* w[n-1] - \cdots - a_p^* w[n-p].$$
Since $H_2(z) = \dfrac{Y(z)}{W(z)} = B(z) \Rightarrow Y(z) = W(z) B(z)$,
$$\Leftrightarrow y[n] = w[n] * b[n] = b_0^* w[n] + b_1^* w[n-1] + \cdots + b_q^* w[n-q]$$
Realization: AR part of Transfer Function
[Figure: realization of the AR part. f[n] enters a chain of $z^{-1}$ delays holding w[n−1], w[n−2], …, w[n−p]; these are fed back through gains $-a_1^*, -a_2^*, \ldots, -a_{p-1}^*, -a_p^*$.]
$$w[n] + a_1^* w[n-1] + \cdots + a_p^* w[n-p] = f[n]$$
Realization of Complete Transfer Function
• Signal processing practitioners usually attempt to analyze characteristics of a system by ONLY looking at the relationship between the input and output
  • Transfer function
• Imagine opening your system (a black box), which can now be modeled using a bunch of integrators (delay elements in discrete time), and putting a logic probe on each of the interconnects
• Concatenation of these signals w[n−k], ∀k, makes up the state of the system
[Figure: realization of the complete transfer function. f[n] enters a chain of $z^{-1}$ delays holding w[n−1], w[n−2], …, w[n−p]; feedback gains $-a_1^*, \ldots, -a_p^*$ form w[n], and feedforward gains $b_0^*, b_1^*, \ldots, b_{p-1}^*, b_p^*$ form y[n]. Assumes p = q.]
$$w[n] + a_1^* w[n-1] + \cdots + a_p^* w[n-p] = f[n]$$
$$y[n] = w[n] * b[n] = b_0^* w[n] + b_1^* w[n-1] + \cdots + b_q^* w[n-q]$$
State-Space Form
Consider relabeling the interconnect signals (states) as $x_k[n]$, for k = 1, 2, …, p
[Figure: the same realization with the delay outputs relabeled $x_p[n], x_{p-1}[n], \ldots, x_2[n], x_1[n]$ (from the input side to the end of the delay chain), feedback gains $-a_1^*, \ldots, -a_p^*$ and feedforward gains $b_0^*, \ldots, b_p^*$. Assumes p = q.]
State-Space Representation
The $x_k[n]$'s are known as the state variables. Note that the transfer function can be written as
$$H(z) = \frac{Y(z)}{F(z)} = \frac{Y(z)}{X(z)} \cdot \frac{X(z)}{F(z)} = \left( \sum_{k=0}^{q} b_k^* z^{-k} \right) \frac{1}{1 + \sum_{k=1}^{p} a_k^* z^{-k}} = H_2(z) H_1(z)$$
Assuming $p = q$, note that
$$x_1[n+1] = x_2[n]$$
$$x_2[n+1] = x_3[n]$$
$$\vdots$$
$$x_{p-1}[n+1] = x_p[n]$$
$$x_p[n+1] = f[n] - a_1^* x_p[n] - a_2^* x_{p-1}[n] - \cdots - a_{p-1}^* x_2[n] - a_p^* x_1[n] \qquad \text{(state equation)}$$
$$\Rightarrow y[n] = b_1^* x_p[n] + b_2^* x_{p-1}[n] + \cdots + b_{p-1}^* x_2[n] + b_p^* x_1[n] + b_0^* \left( f[n] - a_1^* x_p[n] - a_2^* x_{p-1}[n] - \cdots - a_p^* x_1[n] \right) \qquad \text{(input-output equation)}$$
State-Space Representation
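Define the state vector $\mathbf{x}[n]$, containing the state variables $x_k[n]$, $\forall k$, and
$$\mathbf{x}[n] = \begin{bmatrix} x_1[n] \\ \vdots \\ x_p[n] \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}, \quad \mathbf{c} = \begin{bmatrix} b_p^* - b_0^* a_p^* \\ b_{p-1}^* - b_0^* a_{p-1}^* \\ \vdots \\ b_1^* - b_0^* a_1^* \end{bmatrix}, \quad d = b_0^*, \quad \mathbf{A} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_p^* & -a_{p-1}^* & -a_{p-2}^* & \cdots & -a_1^* \end{bmatrix}$$
If $b_0 = 0$, then $\mathbf{c} = \begin{bmatrix} b_p^* & b_{p-1}^* & \cdots & b_1^* \end{bmatrix}^T$.
State-space equation
$$\Rightarrow \mathbf{x}[n+1] = \mathbf{A}\mathbf{x}[n] + \mathbf{b} f[n], \qquad y[n] = \mathbf{c}^T \mathbf{x}[n] + d f[n]$$
• Imagine opening your system (a black box), which can now be modeled using a bunch of integrators (delay elements in discrete time), and putting a logic probe on each of the interconnects
• Concatenation of these signals $x_k[n]$, ∀k, makes up the state of the system
$\mathbf{A}$ is called a companion matrix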
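A small numerical check can make the companion-form construction concrete. The sketch below assumes real coefficients (so the conjugates drop out), p = q, and zero initial conditions; the function names and the coefficient values are hypothetical. It builds (A, b, c, d) as defined above and compares the state-space output with the direct ARMA difference equation.

```python
import numpy as np

def companion_state_space(a, b):
    """a = [a_1..a_p], b = [b_0..b_p] (real); returns (A, bvec, c, d)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    p = len(a)
    A = np.zeros((p, p))
    A[:-1, 1:] = np.eye(p - 1)        # x_k[n+1] = x_{k+1}[n]
    A[-1, :] = -a[::-1]               # last row: -a_p, ..., -a_1
    bvec = np.zeros(p)
    bvec[-1] = 1.0
    c = b[:0:-1] - b[0] * a[::-1]     # [b_p - b_0 a_p, ..., b_1 - b_0 a_1]
    return A, bvec, c, b[0]

def run_state_space(A, bvec, c, d, f):
    x = np.zeros(len(bvec))           # zero initial state
    y = np.zeros(len(f))
    for n in range(len(f)):
        y[n] = c @ x + d * f[n]
        x = A @ x + bvec * f[n]
    return y

def run_difference_equation(a, b, f):
    y = np.zeros(len(f))
    for n in range(len(f)):
        acc = sum(b[k] * f[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        y[n] = acc
    return y

a = [-0.5, 0.25]                      # hypothetical a_1, a_2
b = [1.0, 0.4, 0.2]                   # hypothetical b_0, b_1, b_2
f = np.random.default_rng(3).standard_normal(50)
y_ss = run_state_space(*companion_state_space(a, b), f)
y_de = run_difference_equation(a, b, f)
```

Both outputs should agree to numerical precision, since the state vector simply stores the delayed values of w[n].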
Non-uniqueness of State-Space Equation
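Let $\mathbf{x} = \mathbf{T}\mathbf{z}$, with $\mathbf{T}$ a $p \times p$ invertible matrix; then
$$\mathbf{T}\mathbf{z}[n+1] = \mathbf{A}\mathbf{T}\mathbf{z}[n] + \mathbf{b} f[n], \qquad y[n] = \mathbf{c}^T \mathbf{T}\mathbf{z}[n] + d f[n]$$
$$\Rightarrow \mathbf{z}[n+1] = \mathbf{T}^{-1}\mathbf{A}\mathbf{T}\mathbf{z}[n] + \mathbf{T}^{-1}\mathbf{b} f[n], \qquad y[n] = \mathbf{c}^T \mathbf{T}\mathbf{z}[n] + d f[n]$$
Terminology (which will be explained later): $\mathbf{T}^{-1}\mathbf{A}\mathbf{T}$ is a similarity transformation of $\mathbf{A}$; the two matrices share identical eigenvalues.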
Time-varying State-Space Model
When the system is time-varying, the state-space representation becomes
$$\mathbf{x}[n+1] = \mathbf{A}[n]\mathbf{x}[n] + \mathbf{b}[n] f[n], \qquad y[n] = \mathbf{c}^T[n]\mathbf{x}[n] + d[n] f[n],$$
so the dependence of $\left(\mathbf{A}[n], \mathbf{b}[n], \mathbf{c}[n], d[n]\right)$ on the time index $n$ is shown.
Transformed State-Space Model
Taking the z-transform of the time-invariant SS model
$$z\mathbf{x}(z) = \mathbf{A}\mathbf{x}(z) + \mathbf{b} F(z), \qquad Y(z) = \mathbf{c}^T \mathbf{x}(z) + d F(z).$$
Then the state equation becomes
$$\left(z\mathbf{I}_p - \mathbf{A}\right) \mathbf{x}(z) = \mathbf{b} F(z) \;\Rightarrow\; \mathbf{x}(z) = \left(z\mathbf{I}_p - \mathbf{A}\right)^{-1} \mathbf{b} F(z).$$
Substituting, the output equation becomes
$$Y(z) = \mathbf{c}^T \left(z\mathbf{I}_p - \mathbf{A}\right)^{-1} \mathbf{b} F(z) + d F(z) = \left( \mathbf{c}^T \left(z\mathbf{I}_p - \mathbf{A}\right)^{-1} \mathbf{b} + d \right) F(z).$$
Then the transfer function becomes
$$H(z) = \frac{Y(z)}{F(z)} = \mathbf{c}^T \left(z\mathbf{I}_p - \mathbf{A}\right)^{-1} \mathbf{b} + d$$
Solution for State-Space Difference Equation
Recall the state-space difference equation
$$\mathbf{x}[n+1] = \mathbf{A}\mathbf{x}[n] + \mathbf{b} f[n], \qquad y[n] = \mathbf{c}^T \mathbf{x}[n] + d f[n].$$
Also given the initial condition $\mathbf{x}[-1]$, for $n \ge 0$:
$$\mathbf{x}[0] = \mathbf{A}\mathbf{x}[-1] + \mathbf{b} f[-1]$$
$$\mathbf{x}[1] = \mathbf{A}\mathbf{x}[0] + \mathbf{b} f[0] = \mathbf{A}\left( \mathbf{A}\mathbf{x}[-1] + \mathbf{b} f[-1] \right) + \mathbf{b} f[0] = \mathbf{A}^2 \mathbf{x}[-1] + \mathbf{A}\mathbf{b} f[-1] + \mathbf{b} f[0]$$
$$\vdots$$
$$\mathbf{x}[n] = \mathbf{A}^{n+1} \mathbf{x}[-1] + \sum_{k=0}^{n} \mathbf{A}^k \mathbf{b} f[n-1-k]$$
Solution for State-Space Difference Equation
$$y[n] = \mathbf{c}^T \mathbf{A}^{n+1} \mathbf{x}[-1] + \sum_{k=0}^{n} \mathbf{c}^T \mathbf{A}^k \mathbf{b} f[n-1-k] + d f[n]$$
• The quantities $\mathbf{c}^T \mathbf{A}^k \mathbf{b}$ are known as the Markov parameters of the system.
• Note: $\mathbf{x}[n]$ is a linear function of $\mathbf{x}[-1]$ and the inputs $f[\cdot]$, so when these are Gaussian it is also a Gaussian process (more on random processes later)
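The closed-form solution can be checked against direct iteration. The sketch below assumes the recursion $\mathbf{x}[n+1] = \mathbf{A}\mathbf{x}[n] + \mathbf{b} f[n]$ started from $\mathbf{x}[-1]$, which gives $\mathbf{x}[n] = \mathbf{A}^{n+1}\mathbf{x}[-1] + \sum_{k=0}^{n} \mathbf{A}^k \mathbf{b} f[n-1-k]$. The matrices and the storage convention `f_arr[i] = f[i-1]` are arbitrary choices made for the example.

```python
import numpy as np

def state_recursive(A, b, x_m1, f_arr, n):
    """Iterate x[m+1] = A x[m] + b f[m] from m = -1 up to m = n-1."""
    x = x_m1.copy()
    for m in range(-1, n):
        x = A @ x + b * f_arr[m + 1]          # f_arr[m + 1] holds f[m]
    return x

def state_closed_form(A, b, x_m1, f_arr, n):
    """x[n] = A^(n+1) x[-1] + sum_{k=0}^{n} A^k b f[n-1-k]."""
    x = np.linalg.matrix_power(A, n + 1) @ x_m1
    for k in range(n + 1):
        x = x + np.linalg.matrix_power(A, k) @ b * f_arr[n - k]  # holds f[n-1-k]
    return x

rng = np.random.default_rng(4)
A = np.array([[0.0, 1.0], [-0.25, 0.5]])      # hypothetical companion matrix
b = np.array([0.0, 1.0])
x_m1 = np.array([1.0, -2.0])                  # hypothetical initial state x[-1]
f_arr = rng.standard_normal(10)               # f[-1], f[0], ..., f[8]
x_rec = state_recursive(A, b, x_m1, f_arr, n=5)
x_cf = state_closed_form(A, b, x_m1, f_arr, n=5)
```

The two results should coincide; the terms $\mathbf{A}^k\mathbf{b}$ (and, after multiplying by $\mathbf{c}^T$, the Markov parameters) weight how strongly an input applied k + 1 steps earlier still affects the current state.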
State-Space Model: MIMO Extension
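MIMO extension:
$$\mathbf{x}[n+1] = \mathbf{A}\mathbf{x}[n] + \mathbf{B}\mathbf{u}[n], \qquad \mathbf{y}[n] = \mathbf{C}\mathbf{x}[n] + \mathbf{D}\mathbf{u}[n]$$
If there are $p$ state variables, $m$ inputs, and $\ell$ outputs, then $\mathbf{A}$: $p \times p$, $\mathbf{B}$: $p \times m$, $\mathbf{C}$: $\ell \times p$, $\mathbf{D}$: $\ell \times m$.
Simple algebra will show that
$$\mathbf{x}[n] = \mathbf{A}^{n+1} \mathbf{x}[-1] + \sum_{k=0}^{n} \mathbf{A}^k \mathbf{B} \mathbf{u}[n-1-k]$$
$$\mathbf{y}[n] = \mathbf{C}\mathbf{A}^{n+1} \mathbf{x}[-1] + \sum_{k=0}^{n} \mathbf{C}\mathbf{A}^k \mathbf{B} \mathbf{u}[n-1-k] + \mathbf{D}\mathbf{u}[n]$$
The quantities $\mathbf{C}\mathbf{A}^k\mathbf{B}$ are known as the Markov parameters of the system.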
State Equation Example: Two DC Power Supplies
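Assume the outputs are independent of each other; then a reasonable model would be the scalar model for each output
$$x_1[n] = a_1 x_1[n-1] + u_1[n]$$
$$x_2[n] = a_2 x_2[n-1] + u_2[n],$$
where $x_1[-1] \sim \mathcal{N}(\mu_{x_1}, \sigma_{x_1}^2)$, $x_2[-1] \sim \mathcal{N}(\mu_{x_2}, \sigma_{x_2}^2)$, and $u_1[n]$ and $u_2[n]$ are zero-mean WGN with variance $\sigma_{u_1}^2$ and $\sigma_{u_2}^2$, respectively. All RVs are independent of each other. Then
$$\begin{bmatrix} x_1[n] \\ x_2[n] \end{bmatrix} = \begin{bmatrix} a_1 & 0 \\ 0 & a_2 \end{bmatrix} \begin{bmatrix} x_1[n-1] \\ x_2[n-1] \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} u_1[n] \\ u_2[n] \end{bmatrix} \;\Leftrightarrow\; \mathbf{x}[n] = \mathbf{A}\mathbf{x}[n-1] + \mathbf{B}\mathbf{u}[n].$$
Also, since $\mathbf{u}[n]$ is a vector WGN with zero mean and covariance
$$E\left( \mathbf{u}[m]\mathbf{u}^T[n] \right) = \begin{bmatrix} E(u_1[m]u_1[n]) & E(u_1[m]u_2[n]) \\ E(u_2[m]u_1[n]) & E(u_2[m]u_2[n]) \end{bmatrix} = \begin{bmatrix} \sigma_{u_1}^2 & 0 \\ 0 & \sigma_{u_2}^2 \end{bmatrix} \delta[m-n],$$
so
$$\mathbf{Q} = \begin{bmatrix} \sigma_{u_1}^2 & 0 \\ 0 & \sigma_{u_2}^2 \end{bmatrix} \quad \text{and} \quad \mathbf{x}[-1] = \begin{bmatrix} x_1[-1] \\ x_2[-1] \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \mu_{x_1} \\ \mu_{x_2} \end{bmatrix}, \begin{bmatrix} \sigma_{x_1}^2 & 0 \\ 0 & \sigma_{x_2}^2 \end{bmatrix} \right).$$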
State Equation Example: Vehicle Tracking
• Goal: Estimate and track the range and bearing of a vehicle (assuming Cartesian x-y coordinates)
• Assume constant velocity, perturbed only by wind gusts and slight speed corrections
• Model these perturbations as noise inputs, leading to the velocity equations
$$v_x[n] = v_x[n-1] + u_x[n]$$
$$v_y[n] = v_y[n-1] + u_y[n].$$
• Note that without the noise perturbations $u_x[n]$ and $u_y[n]$, the velocities would be constant, and the vehicle would be modeled as traveling in a straight line, as indicated by the dashed line in Fig. 13.21
• The position equations at time $n$ can then be written as
$$r_x[n] = r_x[n-1] + v_x[n-1]\Delta$$
$$r_y[n] = r_y[n-1] + v_y[n-1]\Delta,$$
where $\Delta$ is the sampling period.
• The (discrete-time) velocity equations model the vehicle as traveling at the velocity at $n-1$ and then changing abruptly at $n$. This is an approximation to the true continuous behavior
State Equation Example: Vehicle Tracking
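Define the signal vector as
$$\mathbf{x}[n] = \begin{bmatrix} r_x[n] \\ r_y[n] \\ v_x[n] \\ v_y[n] \end{bmatrix};$$
from the velocity and position equations, we see that
$$\begin{bmatrix} r_x[n] \\ r_y[n] \\ v_x[n] \\ v_y[n] \end{bmatrix} = \begin{bmatrix} 1 & 0 & \Delta & 0 \\ 0 & 1 & 0 & \Delta \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_x[n-1] \\ r_y[n-1] \\ v_x[n-1] \\ v_y[n-1] \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ u_x[n] \\ u_y[n] \end{bmatrix} \;\Leftrightarrow\; \mathbf{x}[n] = \mathbf{A}\mathbf{x}[n-1] + \mathbf{u}[n].$$
The measurements are noisy observations of the range and bearing
$$\hat{R}[n] = R[n] + w_R[n], \qquad \hat{\beta}[n] = \beta[n] + w_\beta[n].$$
This can be written in general form as $\mathbf{y}[n] = \mathbf{h}(\mathbf{x}[n]) + \mathbf{w}[n]$, where
$$\mathbf{h}(\mathbf{x}[n]) = \begin{bmatrix} \sqrt{r_x^2[n] + r_y^2[n]} \\ \arctan \dfrac{r_y[n]}{r_x[n]} \end{bmatrix},$$
so the linear observation term $\mathbf{C}\mathbf{x}[n]$ is replaced by the nonlinear function $\mathbf{h}(\mathbf{x}[n])$.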
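The vehicle model can be simulated in a few lines. This is only a sketch: the function names, the initial state, the sampling period, and the noise level are hypothetical values picked for illustration, and `arctan2` is used as a quadrant-aware variant of the arctan in the observation equation.

```python
import numpy as np

def simulate_vehicle(x_init, delta, n_steps, sigma_u, rng):
    """Iterate x[n] = A x[n-1] + u[n] for the state [r_x, r_y, v_x, v_y]."""
    A = np.array([[1.0, 0.0, delta, 0.0],
                  [0.0, 1.0, 0.0, delta],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    x = np.array(x_init, dtype=float)
    states = []
    for _ in range(n_steps):
        # Noise drives only the velocity components.
        u = np.concatenate(([0.0, 0.0], sigma_u * rng.standard_normal(2)))
        x = A @ x + u
        states.append(x)
    return np.array(states)

def range_bearing(states):
    """Noise-free observation h(x[n]): range and bearing from the positions."""
    rx, ry = states[:, 0], states[:, 1]
    return np.hypot(rx, ry), np.arctan2(ry, rx)

rng = np.random.default_rng(5)
# With sigma_u = 0 the track is a straight line (the constant-velocity case).
straight = simulate_vehicle([0.0, 0.0, 1.0, 2.0], delta=0.5, n_steps=4, sigma_u=0.0, rng=rng)
R, beta = range_bearing(straight)
```

Setting `sigma_u` to a positive value perturbs the velocities at each step, so the track wanders away from the straight line.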
Example: System Estimation: One LS Approach
[Figure: a known input f[n] (δ[n], another known pilot sequence, or white noise) drives an unknown linear time-invariant system with output $h_d[n]$ or $x[n]$. The output may or may not be affected by additive noise $w[n]$, giving $z[n] = x[n] + w[n]$ (modeled as deterministic or random?). An ARMA(p, q) model with output $\hat{h}_d[n]$ or $y[n]$ is matched to it (model matching), with error $e[n]$.]
• Assuming: $f[n]$ is known
• System: ARMA(p, q)
⇒ can set up an equation $\mathbf{A}\mathbf{x} = \mathbf{b}$ to solve for the parameters
Using an ARMA model to model the unknown system $h_d[n]$:
$$\hat{h}_d[n] \text{ or } y[n] = -a_1^* y[n-1] - a_2^* y[n-2] - \cdots - a_p^* y[n-p] + b_0^* f[n] + b_1^* f[n-1] + \cdots + b_q^* f[n-q]$$
$$\Leftrightarrow y[n] = -\sum_{k=1}^{p} a_k^* y[n-k] + \sum_{k=0}^{q} b_k^* f[n-k]$$
Example: System Estimation: One LS Approach
Recall
$$y[n] = -\sum_{k=1}^{p} a_k^* y[n-k] + \sum_{k=0}^{q} b_k^* f[n-k].$$
Writing one row for each time index from $n = p$ to $n = N$ (starting at $n = p$ to ensure we deal with a causal $y[n]$):
$$\mathbf{A} = \begin{bmatrix} -y[p-1] & -y[p-2] & \cdots & -y[0] & f[p] & f[p-1] & \cdots & f[p-q] \\ -y[p] & -y[p-1] & \cdots & -y[1] & f[p+1] & f[p] & \cdots & f[p-q+1] \\ \vdots & & & & & & & \vdots \\ -y[N-1] & -y[N-2] & \cdots & -y[N-p] & f[N] & f[N-1] & \cdots & f[N-q] \end{bmatrix},$$
$$\mathbf{x} = \begin{bmatrix} a_1^* & a_2^* & \cdots & a_p^* & b_0^* & b_1^* & \cdots & b_q^* \end{bmatrix}^T \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} z[p] & z[p+1] & \cdots & z[N] \end{bmatrix}^T$$
If $N$ is large ⇒ over-determined system ⇒ LS solution possible
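The LS set-up above can be tried on synthetic data. The following sketch makes several assumptions that are not in the slides: the system is real-valued and noise-free, the measured output z[n] is used in place of y[n] when filling the regression matrix, rows start at max(p, q) so that every index is valid, and the coefficient values are invented for the test.

```python
import numpy as np

def arma_output(a, b, f):
    """y[n] = -sum_k a_k y[n-k] + sum_k b_k f[n-k], zero initial conditions."""
    p, q = len(a), len(b) - 1
    y = np.zeros(len(f))
    for n in range(len(f)):
        acc = sum(b[k] * f[n - k] for k in range(q + 1) if n - k >= 0)
        acc -= sum(a[k - 1] * y[n - k] for k in range(1, p + 1) if n - k >= 0)
        y[n] = acc
    return y

def ls_arma_fit(f, z, p, q):
    """Stack one equation per time index and solve A x = b in the LS sense."""
    start = max(p, q)
    rows = []
    for n in range(start, len(z)):
        past_out = [-z[n - k] for k in range(1, p + 1)]   # -y[n-1], ..., -y[n-p]
        past_in = [f[n - k] for k in range(q + 1)]        # f[n], ..., f[n-q]
        rows.append(past_out + past_in)
    A = np.array(rows)
    x, *_ = np.linalg.lstsq(A, z[start:], rcond=None)
    return x[:p], x[p:]

rng = np.random.default_rng(6)
a_true, b_true = np.array([-0.5, 0.25]), np.array([1.0, 0.4])  # hypothetical ARMA(2,1)
f = rng.standard_normal(200)           # known input (white pilot sequence)
z = arma_output(a_true, b_true, f)     # noise-free "measured" output
a_hat, b_hat = ls_arma_fit(f, z, p=2, q=1)
```

With noise-free data the estimates match the true coefficients to numerical precision; with additive noise w[n] the fit is only approximate.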
E.g. Linear Prediction (Useful for Speech Coding and Recognition)
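Assume we are told of an AR system of order $p$:
$$H(z) = \frac{1}{1 + \sum_{k=1}^{p} a_k z^{-k}}$$
Speech is often modeled as the output of such a system driven by either a zero-mean uncorrelated signal in the case of unvoiced speech (such as "f", "s", known as fricatives) or by a periodic pulse sequence in the case of voiced speech (vowels), due to the "peaky" nature of the human speech signal (in time).
From $H(z)$:
$$\Rightarrow y[n] = -\sum_{k=1}^{p} a_k y[n-k] + f[n] = -\mathbf{a}^T \mathbf{y}[n-1] + f[n] = -\mathbf{a}_a^T \mathbf{y}_a[n],$$
$$\mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix}, \quad \mathbf{y}[n-1] = \begin{bmatrix} y[n-1] \\ y[n-2] \\ \vdots \\ y[n-p] \end{bmatrix}, \quad \mathbf{a}_a = \begin{bmatrix} -1 \\ a_1 \\ \vdots \\ a_p \end{bmatrix} \quad \text{and} \quad \mathbf{y}_a[n] = \begin{bmatrix} f[n] \\ y[n-1] \\ \vdots \\ y[n-p] \end{bmatrix}$$
Goal is to find $\hat{\mathbf{a}}$ or $\hat{\mathbf{a}}_a$ so that
$$e[n] = z[n] - \hat{y}[n], \qquad \hat{h}_d[n] = \hat{y}[n] = -\hat{\mathbf{a}}_a^T \mathbf{y}_a[n],$$
is minimized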
Application for Speech Recognition (big data example)
Suppose there are several classes of signals to be distinguished (for example, several speech sounds to be recognized).
Each signal will have its own set of prediction coefficients
  Signal 1 has $\mathbf{a}_1$
  Signal 2 has $\mathbf{a}_2$, …
An unknown input signal can be reduced (by estimating the prediction coefficients that represent it) to a vector $\mathbf{a}$
  Then $\mathbf{a}$ can be compared with $\mathbf{a}_1$, $\mathbf{a}_2$, and so on, to determine which signal the unknown input is most similar to
Inverse Problem: Another Perspective of Prediction
[Figure: f[n] drives the assumed system that needs to be identified, $H(z) = \dfrac{1}{1 + \sum_{k=1}^{p} a_k z^{-k}}$, producing y[n]. y[n] then passes through the estimated inverse system, $\dfrac{1}{H(z)} = 1 + \sum_{k=1}^{p} a_k z^{-k}$, producing $\hat{f}[n]$. The error e[n] between f[n] and $\hat{f}[n]$ is driven to some small value in some sense.]
Inverse Problem: Another Perspective of Prediction
If
$$Y(z) = H(z) F(z) \;\Rightarrow\; F(z) = \frac{Y(z)}{H(z)} \;\Rightarrow\; f[n] = y[n] + \mathbf{a}^T \mathbf{y}[n-1].$$
In this case, $y[n]$ is regarded as the input, and $f[n]$ is the output of an inverse system. If we have an estimated system
$$\hat{H}(z) = \frac{1}{1 + \sum_{k=1}^{p} \hat{a}_k z^{-k}} = \frac{Y(z)}{\hat{F}(z)},$$
then choose
$$\hat{f}[n] = y[n] + \hat{\mathbf{a}}^T \mathbf{y}[n-1]$$
so that $\hat{f}[n]$ is close to $f[n]$ in some sense. This is known as an inverse problem.
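The inverse-system view can be illustrated with a short sketch. It assumes a real AR system with known coefficients (the values and helper names are hypothetical) and zero initial conditions, and shows that filtering y[n] with $1 + \sum_k \hat{a}_k z^{-k}$ returns the driving signal when $\hat{\mathbf{a}} = \mathbf{a}$.

```python
import numpy as np

def ar_synthesis(a, f):
    """All-pole system: y[n] = f[n] - sum_k a_k y[n-k]."""
    y = np.zeros(len(f))
    for n in range(len(f)):
        y[n] = f[n] - sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
    return y

def inverse_filter(a_hat, y):
    """FIR inverse system: f_hat[n] = y[n] + sum_k a_hat_k y[n-k]."""
    f_hat = np.array(y, dtype=float)
    for k in range(1, len(a_hat) + 1):
        f_hat[k:] += a_hat[k - 1] * y[:-k]
    return f_hat

rng = np.random.default_rng(7)
a = np.array([-0.5, 0.25])             # hypothetical AR(2) coefficients
f = rng.standard_normal(100)
y = ar_synthesis(a, f)
f_hat = inverse_filter(a, y)           # exact coefficients recover f[n]
f_hat_bad = inverse_filter(np.array([-0.4, 0.2]), y)   # mismatched estimate
```

With a mismatched estimate the residual $f[n] - \hat{f}[n]$ is nonzero, which is the quantity the estimation procedure tries to make small.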
Nonparametric Spectrum Analysis
From DSP, we know we can perform the DFT on “any” signal to get a picture of its spectrum
  Not very accurate
  Exploiting a priori knowledge of the signal is better
"Analysis" equation
$$X[k] = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x[n]\, e^{-j\frac{2\pi}{N}kn}$$
"Synthesis" equation
$$x[n] = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} X[k]\, e^{j\frac{2\pi}{N}kn}$$
Why are these equations written this way?
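One way to explore the question above is to write the analysis equation as a matrix and look at its properties. The sketch assumes the symmetric $1/\sqrt{N}$ normalization in both directions, which makes the DFT matrix unitary; the helper name `unitary_dft_matrix` and the test signal are illustrative only.

```python
import numpy as np

def unitary_dft_matrix(N):
    """W[k, n] = exp(-j 2 pi k n / N) / sqrt(N)."""
    n = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

N = 8
W = unitary_dft_matrix(N)
x = np.random.default_rng(1).standard_normal(N)
X = W @ x                      # "analysis": project x onto the complex exponentials
x_back = W.conj().T @ X        # "synthesis": W^H is the inverse because W is unitary
```

With this scaling the columns of the matrix form an orthonormal basis, so energy is preserved between `x` and `X`; NumPy exposes the same convention through `np.fft.fft(x, norm="ortho")`.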
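Parametric Fourier Analysis
Assume we know the signal
$$s[n] = a \cos(2\pi f_0 n) + b \sin(2\pi f_0 n), \quad \text{for } n = 0, 1, \ldots, N-1,$$
where $f_0 = k/N$, with $k = 1, \ldots, N/2 - 1$. Estimate $\boldsymbol{\theta} = \begin{bmatrix} a & b \end{bmatrix}^T$.
$$\begin{bmatrix} s[0] \\ s[1] \\ \vdots \\ s[N-1] \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & 0 \\ \cos 2\pi f_0 & \sin 2\pi f_0 \\ \vdots & \vdots \\ \cos 2\pi f_0 (N-1) & \sin 2\pi f_0 (N-1) \end{bmatrix}}_{\mathbf{H}} \begin{bmatrix} a \\ b \end{bmatrix}$$
It can be shown that the columns $\mathbf{h}_1$, $\mathbf{h}_2$ of $\mathbf{H}$ are orthogonal, i.e.
$$\mathbf{h}_1^T \mathbf{h}_2 = \sum_{n=0}^{N-1} \cos\left( 2\pi \frac{k}{N} n \right) \sin\left( 2\pi \frac{k}{N} n \right) = 0, \qquad \mathbf{h}_1^T \mathbf{h}_1 = \mathbf{h}_2^T \mathbf{h}_2 = \frac{N}{2} \;\Rightarrow\; \mathbf{H}^T \mathbf{H} = \frac{N}{2} \mathbf{I}$$
$$\hat{\boldsymbol{\theta}} = \begin{bmatrix} \hat{a} \\ \hat{b} \end{bmatrix} = \left( \mathbf{H}^T \mathbf{H} \right)^{-1} \mathbf{H}^T \mathbf{x}$$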
Parametric Fourier Analysis
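$$\hat{\boldsymbol{\theta}} = \begin{bmatrix} \hat{a} \\ \hat{b} \end{bmatrix} = \left( \mathbf{H}^T \mathbf{H} \right)^{-1} \mathbf{H}^T \mathbf{x} = \frac{2}{N} \mathbf{H}^T \mathbf{x} = \begin{bmatrix} \dfrac{2}{N} \displaystyle\sum_{n=0}^{N-1} x[n] \cos\left( 2\pi \frac{k}{N} n \right) \\ \dfrac{2}{N} \displaystyle\sum_{n=0}^{N-1} x[n] \sin\left( 2\pi \frac{k}{N} n \right) \end{bmatrix}$$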
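The closed-form estimator can be verified numerically. The sketch below uses hypothetical values (N = 64, k = 5, a = 1.5, b = −0.7) and a noise-free signal; the function name `estimate_ab` is not from the slides.

```python
import numpy as np

def estimate_ab(x, k):
    """theta_hat = (2/N) H^T x for the frequency f0 = k/N."""
    N = len(x)
    n = np.arange(N)
    H = np.column_stack((np.cos(2 * np.pi * k * n / N),
                         np.sin(2 * np.pi * k * n / N)))
    return (2.0 / N) * (H.T @ x), H

N, k = 64, 5
n = np.arange(N)
x = 1.5 * np.cos(2 * np.pi * k * n / N) - 0.7 * np.sin(2 * np.pi * k * n / N)
theta_hat, H = estimate_ab(x, k)
```

Because the columns of H are orthogonal with equal norms, no matrix inversion is needed: the LS solution reduces to two correlations of the data with a cosine and a sine.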
General Adaptive Filter Configuration
Select parameters to achieve the “best” match between the desired signal d[n] and the filter output, optimizing a performance function such as
  Least-squares error
  Mean-squared error
Characteristics of AF
  Can automatically adjust (or adapt) in the face of changing environments and changing system requirements
  Can be trained to perform specific filtering or decision-making tasks
  Should have some “adaptation algorithm” (learning algorithm) for adjusting the system’s parameters
[Figure: general adaptive filter configuration. The input f[n] passes through the adaptive filter to give y[n]; y[n] is compared with the desired signal d[n] to form the error e[n], which drives the adaptive algorithm that adjusts the filter.]
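The slides do not name a particular adaptation algorithm at this point. As one common choice, the sketch below uses the LMS update (an assumption of this example, as are the step size, tap count, and unknown system) in the system-identification configuration: the filter sees f[n], the desired signal d[n] is the unknown system's output, and the weights are nudged along the instantaneous error gradient.

```python
import numpy as np

def lms_identify(f, d, n_taps, mu):
    """LMS sketch: w <- w + mu * e[n] * u[n], with u[n] = [f[n], ..., f[n-n_taps+1]]."""
    w = np.zeros(n_taps)
    e = np.zeros(len(f))
    for n in range(n_taps - 1, len(f)):
        u = f[n - n_taps + 1:n + 1][::-1]   # most recent sample first
        y = w @ u                           # adaptive filter output y[n]
        e[n] = d[n] - y                     # error against the desired signal
        w = w + mu * e[n] * u               # adaptation step
    return w, e

rng = np.random.default_rng(2)
h_true = np.array([0.8, -0.4, 0.2])         # hypothetical unknown FIR system
f = rng.standard_normal(5000)
d = np.convolve(f, h_true)[:len(f)]         # noise-free desired signal
w_hat, e = lms_identify(f, d, n_taps=3, mu=0.02)
```

In this noise-free setting the weights converge to the unknown impulse response and the error decays toward zero; the step size trades convergence speed against stability.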
Applications of AF: System Identification and Interference Cancellation
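[Figure, system identification: f[n] drives both the unknown system, whose output is d[n], and the adaptive filter; their difference is the error e[n].]
[Figure, interference cancellation: the desired input d[n] is the signal x[n] plus noise; the adaptive filter input is f[n] = η[n], its output y[n] is subtracted from d[n], and the difference is e[n].]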
Applications of AF: Inverse Modeling and Predictors
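[Figure, inverse modeling: f[n] drives the unknown system, whose output y[n] feeds the adaptive filter to produce $\hat{f}[n]$; a delayed copy of f[n] serves as d[n], and the error e[n] is driven to some small value in some sense.]
[Figure, predictor: a delayed copy of the input feeds the adaptive filter to produce $\hat{f}[n]$; the undelayed signal serves as d[n], and the error e[n] is driven to some small value in some sense.]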
Random Variable (RV)
A random variable is a function that assigns a numerical value to each possible outcome in S, i.e. S → ℜ (the field of real numbers)
  More convenient to work with a numerical value than a nonnumerical value
Can be discrete or continuous
Convention
  Capital letters denote RVs
  Lowercase letters denote values the RVs take on
  E.g. $f_X(x)$ is the density function for RV X at value x
CDF and PDF
Functions which relate the probability of an event to a numerical value assigned to the event
Parametric vs. nonparametric
  There are several different parametric PDFs
  Nonparametric
    Estimated directly from data
    Easily adaptable
Probability (Cumulative) Distribution Functions
• A way to probabilistically describe an RV
$$F_X(x) = P(X \le x)$$
Properties of $F_X(x)$
1. $0 \le F_X(x) \le 1$, with $F_X(-\infty) = 0$, $F_X(\infty) = 1$
2. $F_X(x)$ is continuous from the right, that is, $\lim_{x \to x_0^+} F_X(x) = F_X(x_0)$
3. $F_X(x)$ is a nondecreasing function of $x$, i.e. $F_X(x_1) \le F_X(x_2)$ if $x_1 < x_2$
From 2., $F_X(x)$ is continuous from the right, so at a jump located at $x_0$ the jump amount is $P_0 = P(X = x_0)$
Probability Density Functions (PDF)
More convenient to express statistical averages using PDFs
$$f_X(x) = \frac{dF_X(x)}{dx}$$
Properties of $f_X(x)$
1. $F_X(x) = \displaystyle\int_{-\infty}^{x} f_X(\eta)\,d\eta \;\Rightarrow\; f_X(x) = \dfrac{dF_X(x)}{dx} \ge 0$
2. $\displaystyle\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
3. $P(x_1 < X \le x_2) = F_X(x_2) - F_X(x_1) = \displaystyle\int_{x_1}^{x_2} f_X(x)\,dx$
4. $f_X(x)\,dx = P(x - dx < X \le x)$
Example: Discrete PDF and CDF
Two fair coins are tossed. X: number of heads
Outcome | X | P(X = xj)
TT | x1 = 0 | ¼
TH, HT | x2 = 1 | ½
HH | x3 = 2 | ¼
Some texts use a pmf, where the Dirac deltas are represented simply as Kronecker deltas
Example: Cont. PDF and CDF
Consider the pointer-spinning experiment. Assume any one stopping point is not favored over any other and that the RV $\Theta$ is defined as the angle that the pointer makes with the vertical, modulo $2\pi$. Thus $\Theta$ is limited to $[0, 2\pi)$, and for any two angles $\theta_1$ and $\theta_2$ in $[0, 2\pi)$, we have
$$P(\theta_1 - \Delta\theta < \Theta \le \theta_1) = P(\theta_2 - \Delta\theta < \Theta \le \theta_2) \quad \text{(equally likely assumption)}$$
$$\Rightarrow f_\Theta(\theta_1) = f_\Theta(\theta_2), \quad 0 \le \theta_1, \theta_2 < 2\pi$$
$$\Rightarrow f_\Theta(\theta) = \begin{cases} \dfrac{1}{2\pi}, & 0 \le \theta < 2\pi \\ 0, & \text{otherwise} \end{cases}$$
Area under the PDF curve is the probability.
Joint CDFs and PDFs
Characterized by two or more RVs
$$F_{XY}(x, y) = P(X \le x, Y \le y)$$
$$f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y}$$
$$P(x_1 < X \le x_2, y_1 < Y \le y_2) = \int_{y_1}^{y_2} \int_{x_1}^{x_2} f_{XY}(x, y)\,dx\,dy$$
$$\Rightarrow F_{XY}(\infty, \infty) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx\,dy = 1$$
$$\Rightarrow f_{XY}(x, y)\,dx\,dy = P(x - dx < X \le x, y - dy < Y \le y)$$
Marginal CDFs and PDFs
Can obtain the cdf or pdf of one of the RVs from the joint cdf or pdf
$$F_X(x) = P(X \le x, Y \le \infty) = F_{XY}(x, \infty)$$
$$F_Y(y) = P(X \le \infty, Y \le y) = F_{XY}(\infty, y)$$
$$F_X(x) = \int_{y'=-\infty}^{\infty} \int_{x'=-\infty}^{x} f_{XY}(x', y')\,dx'\,dy'$$
$$F_Y(y) = \int_{y'=-\infty}^{y} \int_{x'=-\infty}^{\infty} f_{XY}(x', y')\,dx'\,dy'.$$
Since
$$f_X(x) = \frac{dF_X(x)}{dx} \quad \text{and} \quad f_Y(y) = \frac{dF_Y(y)}{dy}$$
$$\Rightarrow f_X(x) = \int_{y'} f_{XY}(x, y')\,dy' \quad \text{and} \quad f_Y(y) = \int_{x'} f_{XY}(x', y)\,dx'$$
Conditional CDFs and PDFs
Conditional RV:
$$F_{X|Y}(x \mid Y \le y) = \frac{F_{XY}(x, y)}{F_Y(y)}$$
$$f_{X|Y}(x \mid y) = \frac{\partial F_{X|Y}(x \mid Y = y)}{\partial x} = \frac{f_{XY}(x, y)}{f_Y(y)}$$
Bayes Theorem:
$$f_{X|Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)} = \frac{f_{Y|X}(y \mid X = x) f_X(x)}{f_Y(y)} = \frac{f_{Y|X}(y \mid x) f_X(x)}{f_Y(y)},$$
where $f_{Y|X}(y \mid x)\,dy = P(y - dy < Y \le y \text{ given } X = x)$.
Statistical Independence
Two RVs are statistically independent if the values one takes on do not influence the values that the other takes on.
$$\Rightarrow P(X \le x, Y \le y) = P(X \le x) P(Y \le y), \quad \text{or} \quad F_{XY}(x, y) = F_X(x) F_Y(y)$$
$$\Rightarrow f_{XY}(x, y) = f_X(x) f_Y(y)$$
If $X$ and $Y$ are not independent, then using Bayes' rule
$$f_{XY}(x, y) = f_X(x) f_{Y|X}(y \mid x) = f_Y(y) f_{X|Y}(x \mid y).$$
Example: Statistical Independence
Two RVs $X$ and $Y$ have joint pdf
$$f_{XY}(x, y) = \begin{cases} A e^{-(2x+y)}, & x, y \ge 0 \\ 0, & \text{otherwise.} \end{cases}$$
$A$ can be found by noting that
$$F_{XY}(\infty, \infty) = \int_y \int_x f_{XY}(x, y)\,dx\,dy = 1.$$
Since
$$\int_0^\infty \int_0^\infty A e^{-(2x+y)}\,dx\,dy = \frac{A}{2} = 1 \;\Rightarrow\; A = 2,$$
$$f_X(x) = \int_y f_{XY}(x, y)\,dy = \begin{cases} \displaystyle\int_0^\infty 2 e^{-(2x+y)}\,dy, & x \ge 0 \\ 0, & x < 0 \end{cases} = \begin{cases} 2 e^{-2x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
$$f_Y(y) = \int_x f_{XY}(x, y)\,dx = \begin{cases} e^{-y}, & y \ge 0 \\ 0, & y < 0 \end{cases}$$
$$f_{X|Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)} = \begin{cases} 2 e^{-2x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
$$f_{Y|X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)} = \begin{cases} e^{-y}, & y \ge 0 \\ 0, & y < 0 \end{cases}$$
The conditional pdfs are equal to the respective marginals ⇒ $X$ and $Y$ are independent.
Example: Statistical Independence
$$f_{XY}(x, y) = \begin{cases} A e^{-(2x+y)}, & x, y \ge 0 \\ 0, & \text{otherwise.} \end{cases}$$
$$f_X(x) = \int_y f_{XY}(x, y)\,dy = \begin{cases} \displaystyle\int_0^\infty 2 e^{-(2x+y)}\,dy, & x \ge 0 \\ 0, & x < 0 \end{cases} = \begin{cases} 2 e^{-2x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
$$f_Y(y) = \int_x f_{XY}(x, y)\,dx = \begin{cases} e^{-y}, & y \ge 0 \\ 0, & y < 0 \end{cases}$$
Sum of Two Statistically Indep. RVs
The density of the sum of two statistically independent RVs is the convolution of their individual density functions.
Suppose X and Y are two independent RVs and W = X + Y; then
$$f_W(w) = \int_y f_Y(y) f_X(w - y)\,dy,$$
where $f_W(w)$, $f_X(x)$, and $f_Y(y)$ are the pdfs of W, X, and Y, respectively.
[Figure: the x-y plane with the line x + y = w, which crosses the axes at x = w and y = w; the region of integration is x + y ≤ w.]
$$F_W(w) = P(W \le w) = P(X + Y \le w) = \int_{y=-\infty}^{\infty} \int_{x=-\infty}^{w-y} f_{XY}(x, y)\,dx\,dy = \int_{y=-\infty}^{\infty} \int_{x=-\infty}^{w-y} f_Y(y) f_X(x)\,dx\,dy \quad \text{(stat. indep.)}$$
Differentiating with respect to $w$, we get the result.
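The convolution result can be checked numerically on a grid. As an illustration, the sketch reuses the exponential densities $f_X(x) = 2e^{-2x}$ and $f_Y(y) = e^{-y}$ ($x, y \ge 0$); the step size and truncation point are arbitrary choices, and the closed form $f_W(w) = 2(e^{-w} - e^{-2w})$ follows from evaluating the convolution integral by hand for these two densities.

```python
import numpy as np

dx = 0.002
x = np.arange(0.0, 15.0, dx)           # truncated support; the tails are negligible
f_X = 2.0 * np.exp(-2.0 * x)
f_Y = np.exp(-x)

# Riemann-sum approximation of f_W(w) = integral of f_Y(y) f_X(w - y) dy
f_W = np.convolve(f_Y, f_X)[:len(x)] * dx
f_W_exact = 2.0 * (np.exp(-x) - np.exp(-2.0 * x))

max_err = np.max(np.abs(f_W - f_W_exact))
area = np.sum(f_W) * dx
```

The discrete convolution tracks the analytic density up to an error on the order of the grid step, and the area under `f_W` stays close to one, as a density should.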
Statistical Averages
Sometimes a full description of an RV, i.e. knowing its CDF or PDF, is not required
Sometimes only partial information is needed
  One type of partial information about a set of RVs: the statistical average or mean value
Average of Discrete RV
Expectation of an RV $X$ taking values $x_1, \ldots, x_M$ with respective probabilities $P_1, \ldots, P_M$:
$$\mu_x = E[X] = \sum_{j=1}^{M} x_j P_j$$
Justification: let the experiment be performed $N$ times, with $N$ large, and let $n_j$ be the number of times $x_j$ occurs.
Arithmetic mean:
$$\frac{n_1 x_1 + \cdots + n_M x_M}{N} = \sum_{j=1}^{M} x_j \frac{n_j}{N}$$
By the relative frequency interpretation:
$$P_j = \lim_{N \to \infty} \frac{n_j}{N} \;\Rightarrow\; \frac{n_1 x_1 + \cdots + n_M x_M}{N} \to \sum_{j=1}^{M} x_j P_j$$
Average of Cont. RV
Expectation of $X$ taking values from $x_0$ to $x_M$ with pdf $f_X(x)$. Suppose we break up this interval into $M$ subintervals of size $\Delta x$ (assumed small). The probability that $X$ lies between $x_i - \Delta x$ and $x_i$ is
$$P(x_i - \Delta x < X \le x_i) \approx f_X(x_i) \Delta x, \quad \text{for } i = 0, \ldots, M.$$
Hence, $X$ is approximated by a discrete RV that takes on values $x_0$ to $x_M$ with probabilities $f_X(x_0)\Delta x, \ldots, f_X(x_M)\Delta x$.
$$\Rightarrow \mu_x = E[X] \approx \lim_{\Delta x \to 0} \sum_{i=0}^{M} x_i f_X(x_i) \Delta x = \int_x x f_X(x)\,dx$$
Properties of Expectation
E[⋅] is a linear operator
  Sometimes need to perform E(tr(⋅)). tr(⋅) is also a linear operator, so E(tr(⋅)) = tr(E(⋅))
Additive
  E[X+Y] = E[X] + E[Y] for any 2 RVs
Homogeneity
  E[cX] = cE[X], for any constant c
Average of a Function of a RV
Let $Y = g(X)$.
$$E[Y] = \begin{cases} \displaystyle\sum_i y_i P(y_i), & \text{discrete RV} \\ \displaystyle\int_y y f_Y(y)\,dy, & \text{cont. RV} \end{cases}$$
$r$th moment of $X$, for $r = 0, 1, 2, \ldots$. Let $Y = g(X) = X^r$:
$$\xi_r = E[X^r] = \begin{cases} \displaystyle\sum_i x_i^r P(x_i), & \text{discrete RV} \\ \displaystyle\int_x x^r f_X(x)\,dx, & \text{cont. RV} \end{cases}$$
$r$th central moment of $X$, for $r = 0, 1, 2, \ldots$. Let $Y = g(X) = (X - \mu_X)^r$:
$$m_r = E\left[ (X - \mu_X)^r \right]$$
Special case, $r = 2$ (variance):
$$\mathrm{var}[X] = m_2 = E\left[ (X - \mu_X)^2 \right] = E[X^2] - \mu_X^2 = \sigma_X^2$$
Average of a Function of a RV
$(i, j)$th joint moment of $X$ and $Y$, for $i, j = 0, 1, 2, \ldots$
$$\xi_{ij} = E\left[ X^i Y^j \right] = \begin{cases} \displaystyle\sum_{\ell} \sum_{m} x_\ell^i y_m^j P(x_\ell, y_m), & \text{discrete RV} \\ \displaystyle\int_y \int_x x^i y^j f_{XY}(x, y)\,dx\,dy, & \text{cont. RV} \end{cases}$$
Correlation: $\xi_{11} = E[XY]$
Note:
Independent: $E[XY] = E[X]E[Y]$
Uncorrelated: $E\left( (X - \mu_X)(Y - \mu_Y) \right) = 0$
Orthogonal: $E(XY) = 0$
Implications:
• If $X$ and $Y$ are independent and have zero mean, then $X$ and $Y$ are uncorrelated and orthogonal.
• If $X$ and $Y$ are uncorrelated and have zero mean, then they are orthogonal.
• Hence, independence is the strongest of the three properties.
Average of a Function of a RV
$(i,j)$th joint central moment of $X$ and $Y$, for $i, j = 0, 1, 2, \ldots$:
\[ m_{ij} = E[(X - \mu_X)^i (Y - \mu_Y)^j] \]
Covariance:
\[ \mathrm{Cov}[X, Y] = m_{11} = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y \]
Correlation coefficient for $X$ and $Y$:
\[ \rho = \frac{m_{11}}{\sqrt{m_{20}\, m_{02}}} = \frac{\mathrm{Cov}[X, Y]}{\sigma_X \sigma_Y} \]
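The joint central moments and the correlation coefficient can be evaluated directly from a joint pmf. The sketch below assumes a made-up pmf on $\{0,1\}^2$; the function names are illustrative only.

```python
import math
from fractions import Fraction


def joint_central_moment(pmf, i, j):
    """m_ij = E[(X - mu_X)^i (Y - mu_Y)^j] for a joint pmf {(x, y): P(x, y)}."""
    mu_x = sum(x * p for (x, _), p in pmf.items())
    mu_y = sum(y * p for (_, y), p in pmf.items())
    return sum((x - mu_x) ** i * (y - mu_y) ** j * p for (x, y), p in pmf.items())


def correlation_coefficient(pmf):
    """rho = m_11 / sqrt(m_20 * m_02)."""
    m11 = joint_central_moment(pmf, 1, 1)
    m20 = joint_central_moment(pmf, 2, 0)
    m02 = joint_central_moment(pmf, 0, 2)
    return float(m11) / math.sqrt(float(m20) * float(m02))


# Hypothetical joint pmf in which X and Y tend to agree.
pmf = {
    (0, 0): Fraction(2, 5), (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10), (1, 1): Fraction(2, 5),
}
covariance = joint_central_moment(pmf, 1, 1)
rho = correlation_coefficient(pmf)
print(covariance, rho)
```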
Conditional Expectation
Conditional expectation of $X$ given $Y = y$:
\[ E[X \mid Y] = E[X \mid Y = y] = \int_x x\, f_{X|Y}(x \mid Y = y)\,dx \]
Expectation of functions of $X$: $Y = g(X)$
\[ E[Y] = E[g(X)] = \int_x g(x) f_X(x)\,dx \]
Removing Conditional Expectation Via Expectation
Since $E[X \mid Y]$ is a function of $Y$, it is also a RV.
\[ E_Y\big[E_{X|Y}[X \mid Y]\big] = \int_y \left( \int_x x\, f_{X|Y}(x \mid y)\,dx \right) f_Y(y)\,dy \]
\[ = \int_x x \left( \int_y f_{X|Y}(x \mid y) f_Y(y)\,dy \right) dx \]
\[ = \int_x x \left( \int_y f_{XY}(x, y)\,dy \right) dx \]
\[ = \int_x x f_X(x)\,dx = E_X[X] \]
Conditional Expectation
This is an "expectation" version of the total probability theorem. In many cases, we can simplify a problem by conditioning on, or "fixing", one RV and performing an expectation. Then remove the conditioning in a second step by taking the expectation w.r.t. the conditioning RV.
More generally:
\[ E[g(X)] = E_Y\big[E_{X|Y}[g(X) \mid Y]\big] \]
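The two-step procedure (condition, then average over the conditioning RV) can be verified on a small discrete example. The joint pmf below is an assumption made up for illustration.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y).
joint = {
    (1, 0): Fraction(1, 8), (2, 0): Fraction(3, 8),
    (1, 1): Fraction(1, 4), (3, 1): Fraction(1, 4),
}


def marginal_y(joint):
    """P(Y = y), obtained by summing the joint pmf over x."""
    p_y = {}
    for (_, y), p in joint.items():
        p_y[y] = p_y.get(y, Fraction(0)) + p
    return p_y


def conditional_expectation(joint, y):
    """E[X | Y = y] = sum_x x P(x, y) / P(y)."""
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / marginal_y(joint)[y]


def iterated_expectation(joint):
    """E_Y[E[X | Y]]: average the conditional means over the pmf of Y."""
    return sum(conditional_expectation(joint, y) * p for y, p in marginal_y(joint).items())


direct = sum(x * p for (x, _), p in joint.items())  # E[X] computed directly
print(direct, iterated_expectation(joint))
```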
Special Average: Characteristic Function
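Let $g(X) = e^{j\omega X}$.
\[ \Phi_X(\omega) = E[e^{j\omega X}] = \int_x f_X(x) e^{j\omega x}\,dx \]
\[ f_X(x) = \frac{1}{2\pi} \int_\omega \Phi_X(\omega) e^{-j\omega x}\,d\omega \]
• Note: This is the Fourier transform of $f_X(x)$ if we have $e^{-j\omega X}$.
• Sometimes it is more convenient to use the variable $s$ in place of $j\omega$; the result becomes $\int_x f_X(x) e^{sx}\,dx$ ⇒ moment generating function
Obtaining moments of a RV:
\[ \frac{\partial \Phi_X(\omega)}{\partial \omega} = \int_x jx\, f_X(x) e^{j\omega x}\,dx \]
Set $\omega = 0$:
\[ \Rightarrow\ E[X] = -j \left. \frac{\partial \Phi_X(\omega)}{\partial \omega} \right|_{\omega = 0} \]
\[ \Rightarrow\ E[X^n] = (-j)^n \left. \frac{\partial^n \Phi_X(\omega)}{\partial \omega^n} \right|_{\omega = 0} \]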
Chebyshev Inequality and the Law of Large Numbers
Let $X$ be a RV with mean $\mu_X$ and finite variance $\sigma_X^2$. Then for any $\delta > 0$,
\[ P(|X - \mu_X| \ge \delta) \le \frac{\sigma_X^2}{\delta^2} \quad \text{(Chebyshev Inequality)} \]
Let $X_1, X_2, \ldots, X_N$ be i.i.d. (independent and identically distributed) RVs with mean $\mu_X$ and variance $\sigma_X^2$ each. Let the sample mean be
\[ \hat{\mu}_X = \frac{1}{N} \sum_{i=1}^{N} X_i. \]
Then, for any fixed $\delta > 0$,
\[ \lim_{N \to \infty} P(|\hat{\mu}_X - \mu_X| \ge \delta) = 0. \quad \text{(LLN)} \]
Intuitively, this means the estimator, $\hat{\mu}_X$, will converge to $\mu_X$ in probability. If the above limit equals 0, $\hat{\mu}_X$ is called a consistent estimator of $\mu_X$.
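Both statements can be illustrated exactly for Bernoulli trials, where the sample mean has a binomial distribution. The sketch below is an assumption-laden illustration (Bernoulli RVs with $p = 0.5$ and $\delta = 0.07$ are choices made for the example): it compares the exact tail probability of the sample mean with the Chebyshev bound $\sigma_X^2 / (N\delta^2)$.

```python
from math import comb


def sample_mean_tail(n, p, delta):
    """Exact P(|mu_hat - p| >= delta) for the mean of n i.i.d. Bernoulli(p) RVs."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if abs(k / n - p) >= delta
    )


def chebyshev_bound(n, p, delta):
    """Chebyshev bound for the sample mean, whose variance is p(1-p)/n; capped at 1."""
    return min(1.0, p * (1 - p) / (n * delta**2))


P_SUCCESS, DELTA = 0.5, 0.07  # hypothetical parameters
tails = {n: sample_mean_tail(n, P_SUCCESS, DELTA) for n in (10, 50, 250)}
bounds = {n: chebyshev_bound(n, P_SUCCESS, DELTA) for n in (10, 50, 250)}
for n in tails:
    print(n, tails[n], bounds[n])
```

The tail probability shrinks as $N$ grows and always stays below the (often loose) Chebyshev bound.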
Useful PDFs
Discrete RVs
  Binomial distribution
    Related to chance experiments with two mutually exclusive outcomes with probability p and 1-p
    Model number of times event A has occurred in n trials (events are indep.)
  Poisson distribution
    Related to chance experiment in which an event whose probability of occurrence in a very small time interval ∆T is P = α∆T, where α is a constant
    Model the probability of k events occurring in time T
    Commonly used to model arrival time of packets in packet switching networks
Continuous RVs
  Normal (Gaussian) distribution
    Commonly used to model large number of indep. random events when distribution of each event is unknown
    Sum of large number of independent RVs converges to a Gaussian distribution
  Rayleigh distribution
    (see above)
  Rician distribution
    Commonly used to model distribution of power profile of wireless channel when direct line-of-sight (LOS) exists
    $x = \sqrt{x_1^2 + x_2^2}$, where $x_1 \sim N(\mu_1, \sigma^2)$, $x_2 \sim N(\mu_2, \sigma^2)$ are indep. RVs
Useful PDFs
Continuous RVs
  Chi-Squared (central and noncentral)
    Commonly encountered in detector design
    $\chi_\nu^2$ with $\nu$ degrees of freedom:
\[ x = \sum_{i=1}^{\nu} x_i^2, \quad x_i \sim N(0, 1) \text{ or } N(\mu_i, 1) \text{ and indep.} \]
  F-distribution (central and noncentral)
    Commonly encountered in detector design
    $F_{\nu_1, \nu_2}$ PDF: ratio of 2 indep. $\chi^2$ RVs
\[ x = \frac{x_1 / \nu_1}{x_2 / \nu_2}, \quad x_1 \sim \chi_{\nu_1}^2(\lambda),\ x_2 \sim \chi_{\nu_2}^2 \text{ and indep.} \]
    $\lambda = 0$: central $F$ dist.
Gaussian (Normal) Distribution
1-dimensional:
\[ f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right), \quad \text{where } \mu = E[X],\ \sigma^2 = E[(X - \mu)^2] \]
Joint CDFs and PDFs:
\[ F_{XY}(x, y) = P(X \le x, Y \le y) \]
\[ f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y} \]
\[ P(x_1 \le X \le x_2,\ y_1 \le Y \le y_2) = \int_{y_1}^{y_2} \int_{x_1}^{x_2} f_{XY}(x, y)\,dx\,dy \]
Marginal distribution:
\[ F_X(x) = F_{XY}(x, \infty) = P(X \le x, Y \le \infty) \]
\[ F_Y(y) = F_{XY}(\infty, y) = P(X \le \infty, Y \le y) \]
\[ f_X(x) = \int_y f_{XY}(x, y)\,dy \]
2-D (Bivariate) Gaussian Distribution
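\[ f_{XY}(x, y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho^2}} \exp\left\{ -\frac{ \left( \frac{x - \mu_x}{\sigma_x} \right)^2 - 2\rho \left( \frac{x - \mu_x}{\sigma_x} \right) \left( \frac{y - \mu_y}{\sigma_y} \right) + \left( \frac{y - \mu_y}{\sigma_y} \right)^2 }{2(1 - \rho^2)} \right\} \]
where $\mu_x = E[X]$, $\mu_y = E[Y]$, $\sigma_x^2 = \mathrm{var}[X]$, $\sigma_y^2 = \mathrm{var}[Y]$, and
\[ \rho = \frac{E[(X - \mu_x)(Y - \mu_y)]}{\sigma_x \sigma_y} = \frac{\mathrm{Cov}[X, Y]}{\sigma_x \sigma_y} \]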
N-dimensional Gaussian Distribution
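\[ f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{N/2} \det(\mathbf{C})^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \mathbf{C}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right) \]
\[ \boldsymbol{\mu} = E[\mathbf{x}] = \begin{bmatrix} E[x_1] \\ \vdots \\ E[x_N] \end{bmatrix} \quad \text{(applied element-wise)} \]
\[ \mathbf{C} = E\left[ (\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T \right] \]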
Central Limit Theorem
Let $X_1, X_2, \ldots, X_N$ be indep. RVs with zero mean and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2$.
Let $s_N^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_N^2$. If for any fixed $\varepsilon > 0$, there exists a sufficiently large $N$ such that
\[ \sigma_k < \varepsilon s_N, \quad \text{for } k = 1, \ldots, N, \]
then the normalized RV
\[ Z_N = \frac{X_1 + X_2 + \cdots + X_N}{s_N} \]
converges to the standard normal (Gaussian) PDF.
Q-Function
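Gaussian Q-Function:
Normalized Normal distribution of $X \sim N(\mu_x, \sigma_x^2)$.
Consider
\[ P(\mu_x - a \le X \le \mu_x + a) = \int_{\mu_x - a}^{\mu_x + a} \frac{1}{\sqrt{2\pi\sigma_x^2}} \exp\left( -\frac{(x - \mu_x)^2}{2\sigma_x^2} \right) dx \]
\[ \left( \text{let } y = \frac{x - \mu_x}{\sigma_x} \right) \quad = \int_{-a/\sigma_x}^{a/\sigma_x} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{y^2}{2} \right) dy \]
\[ = 2 \int_{0}^{a/\sigma_x} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{y^2}{2} \right) dy \]
\[ (\text{since area under PDF} = 1) \quad = 1 - 2 \int_{a/\sigma_x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{y^2}{2} \right) dy \]
\[ = 1 - 2Q\left( \frac{a}{\sigma_x} \right) \]
where
\[ Q(u) = \int_u^\infty \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{y^2}{2} \right) dy \approx \frac{1}{u\sqrt{2\pi}} \exp\left( -\frac{u^2}{2} \right), \quad \text{for } u \gg 1, \]
has been computed numerically.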
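In practice $Q(u)$ is usually evaluated through the complementary error function, using the identity $Q(u) = \tfrac{1}{2}\,\mathrm{erfc}(u/\sqrt{2})$. The sketch below uses Python's standard `math.erfc`; the helper names are assumptions for illustration.

```python
import math


def q_function(u):
    """Gaussian tail probability Q(u) = 0.5 * erfc(u / sqrt(2))."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))


def q_tail_approx(u):
    """Large-u approximation exp(-u^2 / 2) / (u * sqrt(2 * pi))."""
    return math.exp(-u * u / 2.0) / (u * math.sqrt(2.0 * math.pi))


def prob_within(a, sigma):
    """P(mu - a <= X <= mu + a) = 1 - 2 Q(a / sigma) for X ~ N(mu, sigma^2)."""
    return 1.0 - 2.0 * q_function(a / sigma)


print(prob_within(1.0, 1.0))                 # probability within one standard deviation
print(q_function(4.0), q_tail_approx(4.0))   # exact value vs. large-u approximation
```

The approximation overestimates $Q(u)$ slightly and tightens as $u$ grows.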
Normalized Distribution Function: F(x) and Q(x)
Normalized cumulative distribution function ($\mu_x = 0$, $\sigma_x = 1$):
\[ F(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\xi^2/2}\,d\xi \]
\[ F(-x) = 1 - F(x) \]
A related function:
\[ F(x) = 1 - Q(x) \]
\[ Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} e^{-\xi^2/2}\,d\xi \]
\[ Q(-x) = 1 - Q(x) \]
Figure: normalized cumulative distribution function $F(x)$, where $F(x) = 1 - Q(x)$.
Stochastic Process
Random Processes (Stochastic Processes)
  Informal definition
    The outcomes (events) of a chance experiment are mapped into functions of time (waveforms)
      Cf. random variables: outcomes are mapped into numbers
    Each waveform is called a sample function, or a realization. The totality of all sample functions is called an ensemble
    The chance experiment that gives rise to this ensemble is called a random/stochastic process
  Formal definition
    To every outcome ζ we assign, according to a certain rule, a time function X(t,ζ)
      X(t,ζi) signifies a single time function
      X(tj,ζ) denotes a single RV
      X(tj,ζi) is a number
Voltage at the terminals of a noise generator. 10 ensemble experiments
Statistical Description of Random Process
A random process is statistically specified by its Nth-order joint pdfs, which describe a typical sample function at times $t_N > t_{N-1} > \ldots > t_1$, for any N, where
\[ f_{X_1 X_2 \cdots X_N}(x_1, t_1; x_2, t_2; \ldots; x_N, t_N)\,dx_1\,dx_2 \cdots dx_N = P(x_1 - dx_1 < X_1 \le x_1 \text{ at time } t_1,\ x_2 - dx_2 < X_2 \le x_2 \text{ at time } t_2,\ \ldots,\ x_N - dx_N < X_N \le x_N \text{ at time } t_N) \]
where $X_n \equiv X(t_n, \zeta)$, for $n = 1, \ldots, N$
Random process from realization ζM.
X(tj,ζ) is a random variable
Joint probability (from relative frequency) is the number of sample functions that pass through the slits placed at t=t1 and t=t2 in both barriers, divided by the total number M of sample functions, as M becomes large without bound
\[ f_{X_1 X_2}(x_1, t_1; x_2, t_2)\,dx_1\,dx_2 = P(x_1 - dx_1 < X_1 \le x_1 \text{ at time } t_1,\ x_2 - dx_2 < X_2 \le x_2 \text{ at time } t_2) \]
Stationarity and Wide-Sense Stationarity
Statistical stationarity in the strict sense, or stationarity
  Joint pdfs depend only on the time differences t2-t1, t3-t1, …, tN-t1
    Not dependent on time origin
  Mean and variance independent of time
  Correlation coefficient or covariance depends only on difference, e.g. t2-t1
Wide-sense stationarity (WSS)
  Joint pdfs may depend on the time origin
  Mean and variance independent of time
  Correlation coefficient or covariance depends only on difference, e.g. t2-t1
Stationarity ⇒ WSS
  Converse is not necessarily true
  Exception: Gaussian random process (Why?)
Figure: example sample functions of nonstationary processes and of stationary processes.
Ensemble Average (Expectation)
Mean:
\[ \mu_x(t) = E[x(t)] = \overline{x}(t) = \int_\alpha \alpha f_X(\alpha, t)\,d\alpha \]
Variance:
\[ \sigma_{xx}^2(t) = E\left[ |x(t) - \overline{x}(t)|^2 \right] = E\left[ |x(t)|^2 \right] - |\overline{x}(t)|^2 \]
Covariance:
\[ c_{xx}(t_1, t_2) = E\left[ (x(t_1) - \overline{x}(t_1)) (x(t_2) - \overline{x}(t_2))^* \right] = E[x(t_1) x^*(t_2)] - \overline{x}(t_1)\, \overline{x}^*(t_2) \]
\[ c_{xx}(t_2, t_1) = E\left[ (x(t_2) - \overline{x}(t_2)) (x(t_1) - \overline{x}(t_1))^* \right] = E[x(t_2) x^*(t_1)] - \overline{x}(t_2)\, \overline{x}^*(t_1) \ \Rightarrow\ c_{xx}(t_1, t_2) = c_{xx}^*(t_2, t_1) \]
Autocorrelation:
\[ r_{xx}(t_1, t_2) = E[x(t_1) x^*(t_2)] = \int_{\alpha_1} \int_{\alpha_2} \alpha_1 \alpha_2^* f_{X_1 X_2}(\alpha_1, t_1; \alpha_2, t_2)\,d\alpha_1\,d\alpha_2 \]
Ensemble Average (Vector Random Process)
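Mean:
\[ \boldsymbol{\mu}_x(t) = E[\mathbf{x}(t)] = \overline{\mathbf{x}}(t) \]
Variance:
\[ \sigma_{xx}^2(t) = E\left[ (\mathbf{x}(t) - \overline{\mathbf{x}}(t))^H (\mathbf{x}(t) - \overline{\mathbf{x}}(t)) \right] = E\left[ \|\mathbf{x}(t)\|^2 - 2\,\mathrm{Re}\{ \overline{\mathbf{x}}^H(t)\, \mathbf{x}(t) \} + \|\overline{\mathbf{x}}(t)\|^2 \right] \]
Covariance:
\[ \mathbf{C}_{xx}(t_1, t_2) = E\left[ (\mathbf{x}(t_1) - \overline{\mathbf{x}}(t_1)) (\mathbf{x}(t_2) - \overline{\mathbf{x}}(t_2))^H \right] \]
\[ = E[\mathbf{x}(t_1) \mathbf{x}^H(t_2)] - E[\mathbf{x}(t_1)]\, \overline{\mathbf{x}}^H(t_2) - \overline{\mathbf{x}}(t_1)\, E[\mathbf{x}^H(t_2)] + \overline{\mathbf{x}}(t_1)\, \overline{\mathbf{x}}^H(t_2) \]
Autocorrelation:
\[ \mathbf{R}_{xx}(t_1, t_2) = E[\mathbf{x}(t_1) \mathbf{x}^H(t_2)] \]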
Ensemble Average (Expectation) for WSS Process
WSS:
Mean:
\[ \mu_x(t) = E[x(t)] = \text{constant} \]
Variance:
\[ \sigma_{xx}^2(t) = \text{constant} \]
Covariance:
\[ c_{xx}(\tau) = E\left[ (x(t) - \overline{x}(t)) (x(t - \tau) - \overline{x}(t - \tau))^* \right] = E[x(t) x^*(t - \tau)] - \overline{x}(t)\, \overline{x}^*(t - \tau) \]
Autocorrelation:
\[ r_{xx}(\tau) = E[x(t) x^*(t - \tau)] \]
\[ \Rightarrow\ r_{xx}(-\tau) = E[x(t) x^*(t + \tau)] \ \Rightarrow\ r_{xx}^*(-\tau) = E[x^*(t) x(t + \tau)] = E[x(t + \tau) x^*(t)] = E[x(p) x^*(p - \tau)] = r_{xx}(\tau) \]
Ensemble Average for Vector WSS Process
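WSS:
Mean:
\[ \boldsymbol{\mu}_x(t) = E[\mathbf{x}(t)] = \text{constant} \]
Variance:
\[ \sigma_{xx}^2(t) = E\left[ \|\mathbf{x}(t) - \overline{\mathbf{x}}(t)\|^2 \right] = \text{constant} \]
Covariance:
\[ \mathbf{C}_{xx}(\tau) = E\left[ (\mathbf{x}(t) - \overline{\mathbf{x}}(t)) (\mathbf{x}(t - \tau) - \overline{\mathbf{x}}(t - \tau))^H \right] = E[\mathbf{x}(t) \mathbf{x}^H(t - \tau)] - \overline{\mathbf{x}}(t)\, \overline{\mathbf{x}}^H(t - \tau) \]
Autocorrelation:
\[ \mathbf{R}_{xx}(\tau) = E[\mathbf{x}(t) \mathbf{x}^H(t - \tau)] \]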
Ergodicity
Ergodic processes are processes for which time and ensemble averages are interchangeable. For example, for real-valued WSS processes:
\[ \mu_x = E[x(t)] = \langle x(t) \rangle \]
\[ \sigma_{xx}^2 = E\left[ (x(t) - \overline{x}(t))^2 \right] = \left\langle (x(t) - \langle x(t) \rangle)^2 \right\rangle \]
\[ r_{xx}(\tau) = E[x(t) x(t + \tau)] = \langle x(t) x(t + \tau) \rangle, \]
\[ \text{where } \langle v(t) \rangle = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} v(t)\,dt. \]
• Note: All time and ensemble averages are interchangeable, not just the above.
• Ergodicity ⇒ strict-sense stationarity
Example 1: Ergodicity
Consider a random process with sample function $n(t) = A\cos(2\pi f_0 t + \theta)$, where $f_0$ is a constant and $\Theta$ is a RV with pdf
\[ f_\Theta(\theta) = \begin{cases} \frac{1}{2\pi}, & |\theta| \le \pi \\ 0, & \text{otherwise} \end{cases} \]
Calculate its ensemble and time-average.
Ensemble averages:
\[ E[n(t)] = \int_{-\pi}^{\pi} A\cos(2\pi f_0 t + \theta)\, \frac{1}{2\pi}\,d\theta = 0 \]
\[ \sigma_{nn}^2(t) = E[n^2(t)] = \int_{-\pi}^{\pi} A^2 \cos^2(2\pi f_0 t + \theta)\, \frac{1}{2\pi}\,d\theta = \frac{A^2}{4\pi} \int_{-\pi}^{\pi} \left[ 1 + \cos(4\pi f_0 t + 2\theta) \right] d\theta = \frac{A^2}{2} \]
Time averages:
\[ \langle n(t) \rangle = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} A\cos(2\pi f_0 t + \theta)\,dt = 0 \]
\[ \langle n^2(t) \rangle = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} A^2 \cos^2(2\pi f_0 t + \theta)\,dt = \frac{A^2}{2} \]
$E[n(t)] = \langle n(t) \rangle$ = constant and $\sigma_{nn}^2(t) = \langle n^2(t) \rangle$ = constant. It may be stationary and ergodic.
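The equality of ensemble and time averages for the uniformly distributed phase can be checked numerically. This is a sketch only: the amplitude, frequency, fixed phase, and observation instant are arbitrary assumptions, and the $T \to \infty$ limit is replaced by a window spanning a whole number of periods, which gives the same average for a periodic sample function.

```python
import math

A, F0 = 2.0, 5.0  # hypothetical amplitude and frequency


def sample_function(t, theta):
    """n(t) = A cos(2 pi f0 t + theta)."""
    return A * math.cos(2 * math.pi * F0 * t + theta)


def ensemble_average(g, t, steps=10000):
    """E[g(n(t))] for theta uniform on [-pi, pi], using the midpoint rule."""
    d = 2 * math.pi / steps
    total = sum(g(sample_function(t, -math.pi + (k + 0.5) * d)) for k in range(steps))
    return total * d / (2 * math.pi)


def time_average(g, theta, periods=20, steps=20000):
    """<g(n(t))> over [-T, T] with T = periods / f0, using the midpoint rule."""
    T = periods / F0
    dt = 2 * T / steps
    total = sum(g(sample_function(-T + (k + 0.5) * dt, theta)) for k in range(steps))
    return total * dt / (2 * T)


def square(v):
    return v * v


print(ensemble_average(lambda v: v, t=0.013), time_average(lambda v: v, theta=0.7))
print(ensemble_average(square, t=0.013), time_average(square, theta=0.7))  # both near A^2/2
```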
Example 2: Ergodicity
Suppose
\[ f_\Theta(\theta) = \begin{cases} \frac{2}{\pi}, & |\theta| \le \frac{\pi}{4} \\ 0, & \text{otherwise} \end{cases} \]
Calculate its ensemble and time-average.
\[ E[n(t)] = \int_{-\pi/4}^{\pi/4} A\cos(2\pi f_0 t + \theta)\, \frac{2}{\pi}\,d\theta = \frac{2A}{\pi} \sin(2\pi f_0 t + \theta) \Big|_{-\pi/4}^{\pi/4} = \frac{2\sqrt{2}A}{\pi} \cos(2\pi f_0 t) \]
\[ r_{nn}(0) = E[n^2(t)] = \int_{-\pi/4}^{\pi/4} A^2 \cos^2(2\pi f_0 t + \theta)\, \frac{2}{\pi}\,d\theta = \frac{A^2}{\pi} \int_{-\pi/4}^{\pi/4} \left[ 1 + \cos(4\pi f_0 t + 2\theta) \right] d\theta = \frac{A^2}{2} + \frac{A^2}{\pi} \cos(4\pi f_0 t) \]
The process is not stationary, as the first and second moments depend on $t$; hence they differ for different time origins.
Summary for Ergodic Process
1. Mean: $\mu_x(t) = E[x(t)] = \langle x(t) \rangle$ is the DC component
2. $\overline{x(t)}^2 = \langle x(t) \rangle^2$ is the DC power
3. $r_{xx}(0) = \overline{x^2(t)} = \langle x^2(t) \rangle$ is the total power
4. $\sigma_{xx}^2(t) = \overline{x^2(t)} - \overline{x(t)}^2 = \langle x^2(t) \rangle - \langle x(t) \rangle^2$ is the power in the alternating current (time-varying) component
5. Total power $\overline{x^2(t)} = \sigma_{xx}^2(t) + \overline{x(t)}^2$ is the AC power plus the DC power
Power Spectral-Density Functions (PSD) and Cross-Spectral Density
The PSD of a wide-sense stationary random process is the Fourier transform of the autocorrelation function. For a continuous-time random process
\[ S_{xx}(j\Omega) = \int_\tau r_{xx}(\tau) e^{-j\Omega\tau}\,d\tau. \]
Since $r_{xx}(\tau)$ is symmetric, the PSD is a real-valued function of $\Omega$. Since real-valued power cannot be negative, the PSD must satisfy $S_{xx}(\Omega) \ge 0$, $\forall \Omega$. Then the average power of a random process is
\[ r_{xx}(0) = E[x(t) x^*(t)] = E\left[ |x(t)|^2 \right] = \left. \frac{1}{2\pi} \int_\Omega S_{xx}(j\Omega) e^{j\Omega\tau}\,d\Omega \right|_{\tau = 0} = \frac{1}{2\pi} \int_\Omega S_{xx}(j\Omega)\,d\Omega \]
Cross-Spectral Density:
\[ S_{xy}(j\Omega) = \int_\tau r_{xy}(\tau) e^{-j\Omega\tau}\,d\tau \]
Bilateral Laplace Transform of the Autocorrelation Function
Note: $s = \sigma + j\Omega$ (entire complex plane). Define
\[ S_{xx}(s) = \int_\tau r_{xx}(\tau) e^{-s\tau}\,d\tau \]
\[ S_{xy}(s) = \int_\tau r_{xy}(\tau) e^{-s\tau}\,d\tau \]
For a real-valued random process, since the autocorrelation is real and even, its Laplace transform will be even: $S_{xx}(s) = S_{xx}(-s)$.
If $s = j\Omega$: $S_{xx}(-j\Omega) = S_{xx}^*(j\Omega)$.
Discrete-Time PSD and its Laplace Transform Representation
For discrete-time, PSD:
\[ S_{xx}(e^{j\omega}) = \sum_k r_{xx}[k] e^{-j\omega k} \]
Cross-Spectral Density:
\[ S_{xy}(e^{j\omega}) = \sum_k r_{xy}[k] e^{-j\omega k} \]
Define
\[ S_{xx}(z) = \sum_k r_{xx}[k] z^{-k}, \qquad S_{xy}(z) = \sum_k r_{xy}[k] z^{-k} \]
For a real-valued process
\[ S_{xx}(z) = S_{xx}\left( \frac{1}{z} \right) \quad \text{and} \quad S_{xx}(e^{-j\omega}) = S_{xx}^*(e^{j\omega}) \]
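As a concrete instance of the discrete-time PSD sum, take the assumed autocorrelation $r_{xx}[k] = a^{|k|}$ with $|a| < 1$ (an illustrative choice, not from the notes). Its PSD has the closed form $(1 - a^2)/(1 - 2a\cos\omega + a^2)$, which a truncated sum should reproduce; the result should also be real and nonnegative.

```python
import cmath
import math

a = 0.6  # hypothetical correlation parameter, |a| < 1


def r_xx(k):
    """Assumed autocorrelation sequence r_xx[k] = a^|k| (real and even)."""
    return a ** abs(k)


def psd_from_autocorrelation(r, omega, K):
    """Truncated PSD sum: sum_{k=-K}^{K} r[k] e^{-j omega k}."""
    return sum(r(k) * cmath.exp(-1j * omega * k) for k in range(-K, K + 1))


def closed_form(omega):
    """Closed-form PSD for r_xx[k] = a^|k|."""
    return (1 - a * a) / (1 - 2 * a * math.cos(omega) + a * a)


for omega in (0.0, 0.5, 2.0, math.pi):
    S = psd_from_autocorrelation(r_xx, omega, K=100)
    print(omega, S.real, closed_form(omega))
```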
Uncorrelated, Orthogonal, Independent Random Processes
Given two random processes $X(t)$ and $Y(t)$:
(1) Uncorrelated if $R_{XY}(t_1, t_2) = m_X(t_1)\, m_Y^*(t_2)$, $\forall t_1, t_2$
(2) Orthogonal if $R_{XY}(t_1, t_2) = 0$, $\forall t_1, t_2$
(3) Independence: if
\[ f_{XY}(x_1, y_1, t_1; x_2, y_2, t_2; \ldots; x_n, y_n, t_n) = f_X(x_1, t_1; x_2, t_2; \ldots; x_n, t_n)\, f_Y(y_1, t_1; y_2, t_2; \ldots; y_n, t_n) \]
Remarks:
(1) Independence ⇒ Uncorrelated
(2) Uncorrelated ⇒ $X(t) - m_X(t)$ and $Y(t) - m_Y(t)$ are orthogonal
(3) (Uncorrelated and either $m_X(t) = 0$ or $m_Y(t) = 0$) ⇒ orthogonal
(4) Uncorrelated and Gaussian ⇒ Independent
Linear Systems and Random Processes
Given $h(t)$ is LTI, and $y(t) = h(t) * x(t)$.
Mean of $y(t)$:
\[ \mu_y(t) = E[h(t) * x(t)] = E\left[ \int_u h(u) x(t - u)\,du \right] = \int_u h(u) E[x(t - u)]\,du \]
\[ \mu_y(t) = \mu_x \int_u h(u)\,du = \mu_x H(0) \quad \text{(for constant mean } \mu_x \text{)} \]
Cross-correlation:
\[ r_{xy}(t_1, t_2) = E[x(t_1) y^*(t_2)] = E\left[ x(t_1) \int_u h^*(u) x^*(t_2 - u)\,du \right] \]
\[ = \int_u h^*(u) E[x(t_1) x^*(t_2 - u)]\,du = \int_u h^*(u)\, r_{xx}(t_1, t_2 - u)\,du \]
If $x(t)$ is WSS, let $\tau = t_1 - t_2$, so that $r_{xx}(t_1, t_2 - u) = r_{xx}(t_1 - t_2 + u)$:
\[ r_{xy}(\tau) = \int_u h^*(u)\, r_{xx}(\tau + u)\,du = h^*(-\tau) * r_{xx}(\tau) \]
Linear Systems and Random Processes
Similarly
\[ r_{yx}(t_1, t_2) = E[y(t_1) x^*(t_2)] = E\left[ \int_u h(u) x(t_1 - u)\,du\; x^*(t_2) \right] \]
\[ = \int_u h(u) E[x(t_1 - u) x^*(t_2)]\,du = \int_u h(u)\, r_{xx}(t_1 - u, t_2)\,du \]
If $x(t)$ is WSS, let $\tau = t_1 - t_2$:
\[ r_{yx}(\tau) = \int_u h(u)\, r_{xx}(\tau - u)\,du = h(\tau) * r_{xx}(\tau) \]
\[ r_{yy}(\tau) = E[y(t) y^*(t - \tau)] = E\left[ y(t) \int_u h^*(u) x^*(t - \tau - u)\,du \right] \]
\[ = \int_u h^*(u) E[y(t) x^*(t - \tau - u)]\,du = \int_u h^*(u)\, r_{yx}(\tau + u)\,du \]
\[ = h^*(-\tau) * r_{yx}(\tau) = h^*(-\tau) * h(\tau) * r_{xx}(\tau) \]
Linear Systems and Power Spectral Densities
\[ r_{xy}(\tau) = h^*(-\tau) * r_{xx}(\tau) \ \Leftrightarrow\ S_{xy}(j\Omega) = H^*(j\Omega)\, S_{xx}(j\Omega) \]
\[ r_{yx}(\tau) = h(\tau) * r_{xx}(\tau) = r_{xy}^*(-\tau) \ \Leftrightarrow\ S_{yx}(j\Omega) = H(j\Omega)\, S_{xx}(j\Omega) \]
Since $r_{xx}(-\tau) = r_{xx}^*(\tau)$, $S_{xx}(j\Omega) = S_{xx}^*(j\Omega)$ is real-valued.
\[ r_{yy}(\tau) = h^*(-\tau) * r_{yx}(\tau) \ \Leftrightarrow\ S_{yy}(j\Omega) = H^*(j\Omega)\, S_{yx}(j\Omega) \]
\[ r_{yy}(\tau) = h^*(-\tau) * h(\tau) * r_{xx}(\tau) \ \Leftrightarrow\ S_{yy}(j\Omega) = H^*(j\Omega)\, H(j\Omega)\, S_{xx}(j\Omega) = |H(j\Omega)|^2 S_{xx}(j\Omega) \]
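The input-output PSD relation has a direct discrete-time analog that is easy to verify numerically. The sketch below assumes a real FIR filter driven by white noise with $r_{xx}[k] = \sigma^2 \delta[k]$; this discrete-time setting, the filter taps, and the helper names are assumptions made for illustration. In that case $r_{yy}[k] = \sigma^2 \sum_n h[n] h[n-k]$, and its transform should equal $\sigma^2 |H(e^{j\omega})|^2$.

```python
import cmath

h = [1.0, 0.5, -0.25]  # hypothetical real FIR impulse response
sigma2 = 2.0           # hypothetical white-noise input variance


def output_autocorrelation(k):
    """r_yy[k] = sigma2 * sum_n h[n] h[n-k] for white input and real h."""
    return sigma2 * sum(h[n] * h[n - k] for n in range(len(h)) if 0 <= n - k < len(h))


def s_yy_from_autocorrelation(omega):
    """Transform of r_yy[k], which is nonzero only for |k| < len(h)."""
    L = len(h)
    return sum(output_autocorrelation(k) * cmath.exp(-1j * omega * k) for k in range(-(L - 1), L))


def s_yy_from_transfer_function(omega):
    """|H(e^{j omega})|^2 * S_xx with S_xx = sigma2 for white noise."""
    H = sum(h[n] * cmath.exp(-1j * omega * n) for n in range(len(h)))
    return abs(H) ** 2 * sigma2


for omega in (0.0, 0.7, 2.5):
    print(omega, s_yy_from_autocorrelation(omega).real, s_yy_from_transfer_function(omega))
```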
Markov and Hidden Markov Models (HMM)
HMM is a stochastic model that is used to model time-varying random phenomena
  E.g. speech signal, video sequence
Can be understood in terms of state-space models
Markov Models
Used to model the evolution of random phenomena that can be in discrete states as a function of time
Transition from one state to the next is random
E.g. a system can be in one of S distinct states
  At each step of discrete time it can move to another state at random, with the probability of the transition at time t dependent only upon the state of the system at time t
  i.e. only the previous state is relevant
Figure: state diagram of a three-state Markov model with output distributions p1(y), p2(y), p3(y) and labeled transition probabilities (0.5, 0.3, 0.2, 0.3, 0.7, 0.7, 0.1, 0.2).
Markov Models
From state 1 to state 1 is possible with probability 0.5
Let S[t] denote the state at time t, where it takes on one of the values 1, 2, …, S.
Initial state is selected according to a probability π
  πi = P(S[1] = i), i = 1, 2, …, S
Markov Models
Probability of transition depends ONLY upon the current state
  P(S[t+1] = j | S[t] = i, S[t-1] = k, S[t-2] = l, …) = P(S[t+1] = j | S[t] = i)
This structure of probability is called the Markov property, and the random sequence of state values S[0], S[1], S[2], … is called a Markov sequence or a Markov chain
  Sequence is the output of the Markov model
Can determine the probability of arriving in the next state by adding up all the probabilities of the ways of arriving there, i.e.
\[ P(S[t+1] = j) = P(S[t+1] = j \mid S[t] = 1) P(S[t] = 1) + P(S[t+1] = j \mid S[t] = 2) P(S[t] = 2) + \cdots + P(S[t+1] = j \mid S[t] = S) P(S[t] = S) \]
Note that this is just the law of total probability
Partitions and Total Probability
Figure: Venn diagram of a sample space partitioned into A1, A2, A3, A4, …, An, with an event B overlapping the partition.
Suppose the events $A_1, A_2, \ldots, A_n$ form a partition of a sample space $S$, that is, the events $A_i$ are mutually exclusive and their union is $S$. Suppose $B$ is any other event. Then
\[ B = S \cap B = \left( \bigcup_{i=1}^{n} A_i \right) \cap B = (A_1 \cap B) \cup (A_2 \cap B) \cup \cdots \cup (A_n \cap B), \]
where the $A_i \cap B$ are also mutually exclusive. Then
\[ P(B) = P(A_1 \cap B) + P(A_2 \cap B) + \cdots + P(A_n \cap B). \]
From the multiplication theorem,
\[ P(B) = P(A_1) P(B \mid A_1) + P(A_2) P(B \mid A_2) + \cdots + P(A_n) P(B \mid A_n). \]
This is known as the law of total probability.
Markov Models
Can be written in matrix form. Define
\[ \mathbf{p}[n] = \begin{bmatrix} P(S[n] = 1) \\ P(S[n] = 2) \\ \vdots \\ P(S[n] = S) \end{bmatrix}, \quad \mathbf{A} = \begin{bmatrix} P(1|1) & P(1|2) & \cdots & P(1|S) \\ P(2|1) & P(2|2) & \cdots & P(2|S) \\ \vdots & \vdots & & \vdots \\ P(S|1) & P(S|2) & \cdots & P(S|S) \end{bmatrix}, \]
with $a_{ij} = P(i|j) = P(S[t+1] = i \mid S[t] = j)$.
From the previous example:
\[ \mathbf{A} = \begin{bmatrix} 0.5 & 0.3 & 0.2 \\ 0.2 & 0 & 0.7 \\ 0.3 & 0.7 & 0.1 \end{bmatrix} \]
Markov Models
A steady-state probability assignment is one that does not change from one time step to the next, so the probability must satisfy the equation
  Ap = p
This is an eigenequation, with eigenvalue = 1.
By the law of total probability, each column of A sums to 1
Definition: An m×m matrix P, such that $\sum_{j=1}^{m} p_{ij} = 1$ (each row sums to 1) and each element of P is nonnegative, is called a stochastic matrix. If the rows and columns each sum to 1, then P is doubly stochastic
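A steady-state vector satisfying Ap = p can be approximated by applying A repeatedly to any starting probability vector. The sketch below uses the 3×3 transition matrix from the Markov model example; repeated multiplication (power iteration) is a choice made here for illustration, not a method prescribed by the notes, and the function names are assumptions.

```python
# Transition matrix from the example, with A[i][j] = P(S[t+1] = i+1 | S[t] = j+1).
A = [[0.5, 0.3, 0.2],
     [0.2, 0.0, 0.7],
     [0.3, 0.7, 0.1]]


def mat_vec(matrix, p):
    """Matrix-vector product A p."""
    return [sum(a * x for a, x in zip(row, p)) for row in matrix]


def steady_state(matrix, iterations=200):
    """Approximate the solution of A p = p by repeated multiplication."""
    n = len(matrix)
    p = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        p = mat_vec(matrix, p)
    return p


p_ss = steady_state(A)
print(p_ss)          # approximately [41/124, 39/124, 44/124]
print(sum(p_ss))     # remains 1 because each column of A sums to 1
```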
Markov Models
\[ \mathbf{A} = \begin{bmatrix} P(1|1) & P(1|2) & \cdots & P(1|S) \\ P(2|1) & P(2|2) & \cdots & P(2|S) \\ \vdots & \vdots & & \vdots \\ P(S|1) & P(S|2) & \cdots & P(S|S) \end{bmatrix} \]
is the transpose of a stochastic matrix. The vector $\boldsymbol{\pi} = [\pi_1, \pi_2, \ldots, \pi_S]^T$ contains the initial probabilities. Thus, we can write the probabilistic update equation as
\[ \mathbf{p}[t+1] = \mathbf{A}\mathbf{p}[t], \quad \text{with } \mathbf{p}[0] = \boldsymbol{\pi}. \]
Or,
\[ \mathbf{p}[t+1] = \mathbf{A}\mathbf{p}[t] + \boldsymbol{\pi}\delta_t, \quad \text{with } \mathbf{p}[t] = \mathbf{0} \text{ for } t \le 0. \]
Note that the above is similar to the state equation
\[ \mathbf{x}[n+1] = \mathbf{A}\mathbf{x}[n] + \mathbf{b} f[n]. \]
Note that the "state" represented by $\mathbf{p}[t+1] = \mathbf{A}\mathbf{p}[t] + \boldsymbol{\pi}\delta_t$ is actually the vector of probabilities $\mathbf{p}[t]$, not the state of the Markov sequence $S[t]$.
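The update equation with the impulse term can be stepped forward directly. The sketch below uses the example transition matrix in exact rational arithmetic; the initial distribution (the chain starts in state 1 with certainty) is an assumption chosen for illustration.

```python
from fractions import Fraction as F

A = [[F(1, 2), F(3, 10), F(1, 5)],
     [F(1, 5), F(0), F(7, 10)],
     [F(3, 10), F(7, 10), F(1, 10)]]
pi = [F(1), F(0), F(0)]  # hypothetical initial probabilities


def step(p, t):
    """One update p[t+1] = A p[t] + pi * delta_t, where delta_t = 1 only at t = 0."""
    delta = 1 if t == 0 else 0
    return [sum(a * x for a, x in zip(row, p)) + delta * q for row, q in zip(A, pi)]


def trajectory(num_steps):
    """Return [p[0], p[1], ..., p[num_steps]] starting from p[0] = 0."""
    p = [F(0)] * len(pi)
    history = [p]
    for t in range(num_steps):
        p = step(p, t)
        history.append(p)
    return history


traj = trajectory(3)
for t, p in enumerate(traj):
    print(t, [str(x) for x in p])
```

Here p[1] equals π, and every later p[t] is obtained by one more multiplication by A, so each remains a valid probability vector.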
Relationship to Markov Models and HMM
Pick a ball from 3 urns
Each urn contains 3 types of colored balls: black, green, and red
At each instant of time, an urn is selected by a genie at random according to the state it was in at the previous time instant
  Genie: a magic creature which could do everything
Ball is then drawn at random from the urn at time t
  Observation = ball selected
  Actual state is hidden
State of the system before the ball was chosen the state of the system after
Relationship to Markov Models and HMM: State Diagram
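Figure: state diagram of the three-urn HMM. Urn 1: 3 black, 2 green, 1 red, with output distribution p1(y). Urn 2: 5 black, 7 green, 3 red, with output distribution p2(y). Urn 3: 2 black, 2 green, 2 red, with output distribution p3(y). Transition probabilities on the edges: 0.5, 0.2, 0.3, 0.2, 0.3, 0.7, 0.7, 0.1.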
Relationship to Markov Models and HMM
To further clarify the relationship,
\[ \mathbf{p}[t+1] = \mathbf{A}\mathbf{p}[t] + \boldsymbol{\pi}\delta_t \]
provides for the state update of the Markov system.
However, in most linear systems, the state vector x[t] is not directly observable; instead, it is observed only through the observation matrix C (assuming D = 0), i.e. y[t] = Cx[t]
In an HMM, the state is hidden from direct observation
  Instead, each state has a probability distribution associated with it
Relationship to Markov Models and HMM
In the HMM, we do not observe the "state" p[t]
  Instead, each state has a probability distribution associated with it
When the HMM moves into state s[t] at time t, the observed output y[t] is an outcome of a random variable Y[t] that is selected according to the distribution f(y[t]|S[t] = s), which we will represent using the notation f(y|S[t]=s) = fs(y)
  In the urn example, the output probabilities depend on the contents of the urns
A sequence of outputs from an HMM is y[0], y[1], y[2], …
  The underlying state information is hidden
Distribution in each state can be of any type
  Each state could have its own distribution
  In practice, the distribution of each state is of the same type, but with different parameters
Summary: HMM
Denote the state at time $t$ as $S[t]$.
Initial state is selected according to probability
\[ \pi_i = P(S[1] = i), \quad i = 1, 2, \ldots, S; \qquad \text{assume } P(S[t] = i) = 0, \text{ for } t \le 0. \]
Transition probability depends ONLY on current state:
\[ P(S[t+1] = j \mid S[t] = i, S[t-1] = k, S[t-2] = l, \ldots) = P(S[t+1] = j \mid S[t] = i) \]
Then, the probability of arriving in the next state is
\[ P(S[t+1] = j) = P(S[t+1] = j \mid S[t] = 1) P(S[t] = 1) + P(S[t+1] = j \mid S[t] = 2) P(S[t] = 2) + \cdots + P(S[t+1] = j \mid S[t] = S) P(S[t] = S) \]
Summary: HMM State Transition
Can be written in matrix form. Define
\[ \mathbf{p}[t] = \begin{bmatrix} P(S[t] = 1) \\ P(S[t] = 2) \\ \vdots \\ P(S[t] = S) \end{bmatrix}, \quad \mathbf{A} = \begin{bmatrix} P(1|1) & P(1|2) & \cdots & P(1|S) \\ P(2|1) & P(2|2) & \cdots & P(2|S) \\ \vdots & \vdots & & \vdots \\ P(S|1) & P(S|2) & \cdots & P(S|S) \end{bmatrix}, \]
with $a_{ij} = P(i|j) = P(S[t+1] = i \mid S[t] = j)$.
From the urn example:
\[ \mathbf{A} = \begin{bmatrix} 0.5 & 0.3 & 0.2 \\ 0.2 & 0 & 0.7 \\ 0.3 & 0.7 & 0.1 \end{bmatrix} \]
\[ \Rightarrow\ \mathbf{p}[t+1] = \mathbf{A}\mathbf{p}[t] + \boldsymbol{\pi}\delta_t, \quad \text{with } \mathbf{p}[t] = \mathbf{0} \text{ for } t \le 0 \]
Summary: HMM Input-Output
Let $M$ denote the number of possible outcomes from all states. Let $Y[t]$ be the random variable output at time $t$, with outcome $y[t]$. Then the probability of each possible output is
\[ P(Y[t] = j) = P(Y[t] = j \mid S[t] = 1) P(S[t] = 1) + P(Y[t] = j \mid S[t] = 2) P(S[t] = 2) + \cdots + P(Y[t] = j \mid S[t] = S) P(S[t] = S) \]
Summary: HMM Input-Output
Can be written in matrix form. Define
\[ \mathbf{q}[t] = \begin{bmatrix} P(Y[t] = 1) \\ P(Y[t] = 2) \\ \vdots \\ P(Y[t] = M) \end{bmatrix}, \quad \mathbf{C} = \begin{bmatrix} P(Y[t] = 1 \mid S[t] = 1) & P(Y[t] = 1 \mid S[t] = 2) & \cdots & P(Y[t] = 1 \mid S[t] = S) \\ P(Y[t] = 2 \mid S[t] = 1) & P(Y[t] = 2 \mid S[t] = 2) & \cdots & P(Y[t] = 2 \mid S[t] = S) \\ \vdots & \vdots & & \vdots \\ P(Y[t] = M \mid S[t] = 1) & P(Y[t] = M \mid S[t] = 2) & \cdots & P(Y[t] = M \mid S[t] = S) \end{bmatrix}, \]
with $c_{ij} = P(Y[t] = i \mid S[t] = j)$.
From the urn example (Urn 1: 3 black, 2 green, 1 red; Urn 2: 5 black, 7 green, 3 red; Urn 3: 2 black, 2 green, 2 red), with $Y = 1$ (black), $Y = 2$ (green), $Y = 3$ (red):
\[ \mathbf{C} = \begin{bmatrix} 1/2 & 1/3 & 1/3 \\ 1/3 & 7/15 & 1/3 \\ 1/6 & 1/5 & 1/3 \end{bmatrix} \]
\[ \Rightarrow\ \mathbf{q}[n] = \mathbf{C}\mathbf{p}[n] \]
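The observation matrix for the urn example can be built from the ball counts, and the output probabilities then follow from q = Cp. In the sketch below the urn contents come from the example, while the state probability vector p (all urns equally likely) is an assumption chosen for illustration.

```python
from fractions import Fraction

urns = [
    {"black": 3, "green": 2, "red": 1},  # urn 1
    {"black": 5, "green": 7, "red": 3},  # urn 2
    {"black": 2, "green": 2, "red": 2},  # urn 3
]
colors = ["black", "green", "red"]  # outputs 1, 2, 3


def observation_matrix(urns, colors):
    """C[i][j] = P(Y = color i | S = urn j): the fraction of that color in urn j."""
    return [[Fraction(urn[c], sum(urn.values())) for urn in urns] for c in colors]


def output_probabilities(C, p):
    """q = C p, i.e. the law of total probability applied to each output."""
    return [sum(c * x for c, x in zip(row, p)) for row in C]


C = observation_matrix(urns, colors)
p = [Fraction(1, 3)] * 3  # hypothetical state probabilities
q = output_probabilities(C, p)
print([str(x) for x in q])
```

Each column of C sums to 1, so q is again a probability vector.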
State-Space vs. HMM
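State-space:
\[ \mathbf{x}[n+1] = \mathbf{A}\mathbf{x}[n] + \mathbf{B}\mathbf{u}[n] \]
\[ \mathbf{y}[n] = \mathbf{C}\mathbf{x}[n] + \mathbf{D}\mathbf{u}[n] \]
HMM:
\[ \mathbf{p}[n+1] = \mathbf{A}\mathbf{p}[n] + \boldsymbol{\pi}\delta_n, \quad \text{with } \mathbf{p}[n] = \mathbf{0} \text{ for } n \le 0 \]
\[ \mathbf{q}[n] = \mathbf{C}\mathbf{p}[n] \]
Recall that the solution for the state-space model is
\[ \mathbf{x}[n] = \mathbf{A}^n \mathbf{x}[0] + \sum_{k=0}^{n-1} \mathbf{A}^{n-1-k} \mathbf{B}\mathbf{u}[k] \]
\[ \mathbf{y}[n] = \mathbf{C}\mathbf{A}^n \mathbf{x}[0] + \sum_{k=0}^{n-1} \mathbf{C}\mathbf{A}^{n-1-k} \mathbf{B}\mathbf{u}[k] + \mathbf{D}\mathbf{u}[n] \]
⇒ Hence, $\mathbf{A}$ models the dynamics of the system by treating the system as a Markov process.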
Example: Speech Modeling for Speech Recognition
Patterns in speech signal occurring sequentially in time
Each word or sound (phoneme) to be recognized is represented by an HMM
  Output is some feature vector that is derived from the speech data
  Random variability in the feature vector and the amount of time each feature is produced is modeled by the HMM
    Variability in the duration of the word is modeled by the Markov model
    Variability in the outputs is modeled by the random selection from within each state
Example
Given a small vocabulary system with N words
  There are N HMMs: (Ai, πi, Ci)
    i denotes a particular word
Training phase
  Each HMM is trained to represent the parameters for that word
Testing phase
  Sequence of feature vectors is computed (front end part)
  The likelihood (probability) that this sequence of feature vectors was produced by the HMM (Ai, πi, Ci) is computed for each i
  The HMM that produces the highest probability selects the recognized word
Issues for HMM
Training:
  How can the parameters (A, π, C) be estimated based upon observations of the data?
    In other words, how can we train the parameters of the models in the pattern recognition problem?
Testing:
  How can we determine how well the observed data fits the model that has been trained?
  How can we determine the sequence of states of the underlying Markov model?
    I.e. how do we discover the hidden states?
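One standard way to score how well an observation sequence fits a given model (A, π, C) is the forward recursion, which applies the law of total probability one time step at a time. The notes do not name a specific method here, so the sketch below is an assumed illustration: the uniform π, the second "mostly red" model, and all function names are hypothetical, while A and C are taken from the urn example. Indices are 0-based in the code.

```python
from itertools import product


def forward_likelihood(A, pi, C, observations):
    """P(y[1..T]) with A[i][j] = P(next = i | current = j), C[k][j] = P(Y = k | S = j)."""
    n = len(pi)
    # alpha[j] = P(y[1], S[1] = j)
    alpha = [pi[j] * C[observations[0]][j] for j in range(n)]
    for y in observations[1:]:
        # alpha_new[i] = P(y | S = i) * sum_j P(i | j) * alpha[j]
        alpha = [C[y][i] * sum(A[i][j] * alpha[j] for j in range(n)) for i in range(n)]
    return sum(alpha)


def recognize(models, observations):
    """Index of the model with the highest likelihood for the observations."""
    scores = [forward_likelihood(A, pi, C, observations) for (A, pi, C) in models]
    return max(range(len(models)), key=lambda i: scores[i])


def total_probability(model, length, num_outputs):
    """Sum of the likelihoods of all output sequences of a given length (should be 1)."""
    A, pi, C = model
    return sum(forward_likelihood(A, pi, C, list(seq))
               for seq in product(range(num_outputs), repeat=length))


A_URN = [[0.5, 0.3, 0.2], [0.2, 0.0, 0.7], [0.3, 0.7, 0.1]]
C_URN = [[1 / 2, 1 / 3, 1 / 3], [1 / 3, 7 / 15, 1 / 3], [1 / 6, 1 / 5, 1 / 3]]
PI = [1 / 3, 1 / 3, 1 / 3]                    # hypothetical initial probabilities
C_RED = [[0.1] * 3, [0.1] * 3, [0.8] * 3]     # hypothetical model that mostly emits red
MODELS = [(A_URN, PI, C_URN), (A_URN, PI, C_RED)]

print(recognize(MODELS, [2, 2, 2]))  # red, red, red
print(recognize(MODELS, [0, 0, 0]))  # black, black, black
```

Recovering the hidden state sequence itself, and estimating (A, π, C) from data, require further algorithms that this sketch does not cover.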