ccfung/courses/2017_2018/mmasp_i/notes/ch01… · examples of mathematical models. linear signal...

Mathematical Topics Embraced by Signal Processing

Carrson C. FungDept. of Electronics EngineeringNational Chiao Tung University

Examples of Mathematical Models

Linear signal models for discrete and continuous time, including transfer function and state space representations. Applications of these models to SP problems such as prediction, spectrum estimation, and so on

Adaptive filtering models and applications to prediction, system identification, and so forth

The Gaussian random variable, and other probability density functions, including the important idea of conditioning upon an observation

Hidden Markov models Model the dynamics of systems probabilistically

IEE 5335: Mathematical Methods and Algorithms for Signal Processing 2

Why is modeling important?

Our world is complicated To describe it mathematically requires complicated

mathematics E.g. high-order differential equation

E.g. suppose you are ask to design a filter h[n] satisfying some design specifications such as transition bandwidth, passband frequency, stopband frequency, filter order, … Hence, design is usually done in frequency using H(ejω)

How many points in H(ejω) do you need to design? This is an impossible problem to solve as there are uncountable

number of points in [0,π]


Problem Specifications and Variable Parametrization


( )

Suppose the desired response is

1, 0 0,

don't care,

p

s

p s

Dω ω

ω ω ω πω ω ω

≤ ≤= ≤ ≤ < ≤

( ) ( )

( ) [ ] ( )

[ ] ( )

0

1

00

0

Change the variable from to amplitude response

1 cos ,

2assuming Type-I linear phase

j j

Nj n Mj

n

M

n

H e H e

H e h n e

Nb n n M

ω ω

ωω

ω

−− −

=

=

=

−= =

∑

∑

[ ]

( )

( )

and

12 , 0

2

1, 0

2

Nh n n

b nN

h n

− − ≠

= − =

Problem Formulation


[ ]( ) ( ) 2

0

Then the filter design problem can be formulated as a LS problem

min ,

: 0 , but excluding transition band.Integration can be approximated by summation.

Now pro

j

Rb n

dD H e

R

ω ωωπ

ω π

−

≤ ≤

∫

blem only needs to solve a finite number of variables.

Filter designed using LS has much better magnitude frequency response than one using Kaiser windowing method

Other Motivations for Using Mathematics

Given a sequence of output data from a system, how can the parameters of the system be determined if the input signal is known What if the input signal is not known? What if system is nonlinear?


Unknown linear time-invariant

system

x(n)= δ(n)

Model matching for 0 ≤ n ≤ q and

n > q

hd(n) or y(n)

e(n)

Other Motivations for Using Mathematics

Determine a “minimal” representation of a system

Given a signal from a system, determine a predictor for the signal Forward and/or backward

Determine an optimal and/or efficient smoothing method E.g. Image smoothing

Determine a means of efficiently coding (representing) a signal modeled as the output of an LTI system

Develop computational efficient algorithms

Develop adaptive technique to obtain desirable output of system


n

f[n] How do we predict forward and/or backward

n

f[n]

Prediction

Smoothing

[ ]0s

[ ]1s

[ ]2s

Complex-Valued Linear Discrete-Time Models: ARMA and MA


[ ] [ ] [ ] [ ][ ] [ ] [ ]

[ ] [ ]

[ ] [ ] [ ] [ ]

[ ] [ ]

* * *1 1

* * *0 1

* *

0 0

* * *0 1

*

0

Autoregressive moving average (ARMA) model

1 2

1

Moving average (MA) model

1

p

q

p q

k kk k

q

q

kk

y n a y n a y n a y n p

b f n b f n b f n q

a y n k b f n k

y n b f n b f n b f n q

y n b f n k

= =

=

= − − − − − − −

+ + − + + −

⇔ − = −

= + − + + −

⇔ = −

∑ ∑

∑

[ ]

[ ][ ]

[ ][ ] [ ]

0

1

Vector notation

1Define and

q

H

bf nbf n

n

bf n q

y n n

− = =

− ⇒ =

f b

b f

Complex-Valued Linear Discrete-Time Models: AR


[ ] [ ] [ ] [ ] [ ]

[ ] [ ] [ ]

* * * *1 1 0

* *0

1

Autoregressive (AR) model

1 2 p

p

kk

y n a y n a y n a y n p b f n

y n b f n a y n k=

= − − − − − − − +

⇔ = − −∑

[ ]

[ ][ ]

[ ][ ] [ ] [ ]

1

2

*0

Define

12

and

p

H

ay nay n

n

ay n p

y n b f n n

− − = =

− ⇔ = −

y a

a y

System Function and Impulse Response


( ) ( ) ( ) ( ) ( ) ( )

( ) ( )( )

( )( )

* *

0 0

* *

0 0

* *

0 1

Assuming initial conditions are zero

ARMA System function

1

(usually assume system is normalized so t

p qk k

k kk k

q qk k

k kk k

p pk k

k kk k

a z Y z b z F z Y z A z F z B z

b z b zY z B zH z

F z A za z a z

− −

= =

− −

= =

− −

= =

= ⇔ =

= = = =+

∑ ∑

∑ ∑

∑ ∑0hat 1)a =

( ) ( )( ) ( )

( ) ( )( ) ( )

**

0 0

*

1

*

0

All-pole System function (IIR system)

1

All-zero system function (FIR system)

qk

kk

pk

kk

qk

kk

b zY z bH zF z A za z

Y zH z b z B z

F z

−

=

−

=

−

=

= = =+

= = =

∑

∑

∑

System Function and Impulse Response


( ) [ ] [ ]

( )

( ) ( )( )

* 10

1

1

1

Factoring into monomial factors using rootsof numerator and denominator

1

1

k

q

ik

p

ik

H z f k h n k

H z

b z z B zH z

A zp z

−

=

−

=

= −

−= =

−

∑

∏

∏

Stochastic MA and AR Models


[ ]

[ ]( )0

: assumed to be a white discrete-time random process, usually zero mean: set to 1, with input power determined by the variance of the signal

0,

f nb

E f n =

[ ] [ ]( )2

* ,

0, otherwiseff

n

m nE f m f n

σ

∀

==

SP often involves comparing two signals, one way for comparison isby correlation. When the signal is comparing with itself, the correlationis called autocorrelation function. For zero-mean WSS signa [ ]

[ ] [ ] [ ]( ) [ ] [ ] [ ]( )* *

l ,

or

(Note the convention: first argument minus second)yy yy

y n

r k E y n k y n r k E y n y n k− − − = −

Autocorrelation Function


[ ] [ ][ ] [ ]

[ ] [ ] [ ] [ ][ ] [ ] [ ]( )

[ ] [ ] [ ]( ) [ ]

*

* *1

*

* * *1

Note: (more details later)

For real-valued random process, (even function)

For MA process1

1

yy yy

yy yy

q

yy

q

r k r k

r k r k

y n f n b f n b f n q

r k E y n y n k

E f n b f n b f n q f n k

= −

= −

= + − + + −

⇒ = −

= + − + + − −

[ ] [ ]( )

[ ] [ ] [ ]

[ ] [ ] [ ] [ ]

* *1

22 221

1

* *1

1

For AR process1

q

q

ff ff q ff ff kk

p

b f n k b f n q k

r k b r k b r k b

y n a y n a y n p f n

σ=

+ − − + + − −

= + + + =

+ − + + − =

∑

Autocorrelation Function


[ ]

[ ] [ ] [ ] [ ] [ ]( )

[ ] [ ]

[ ] [ ]

*

* * * *

0 0

Multply by on both sides and take expectation:

, for 0 (0 for 0 because is white-noise process)

0, for 0For 0

0

p p

k k yyk k

fy

fy

y n

E a y n k y n a r k E f n y n

rf n

r E y n a

= =

−

− − = − = −

=

= >>

>

= = +

∑ ∑

[ ] [ ]( ) [ ][ ] [ ] [ ]

[ ] [ ] [ ]

* * *1

* *1

* *1

1

1

1

p

yy yy p yy

yy yy p yy

y n a y n p y n

r a r a r p

r a r a r p

− + + − − = + − + + −

⇒ = − − − − −

[ ] [ ] [ ] [ ]* *1

For AR process1 py n a y n a y n p f n+ − + + − =f[n]

whitey[n]

( )1

A z

Yule-Walker Equations: Solving System ID Problem


[ ] [ ] [ ]* *1 1yy yy p yyr a r a r p= − − − − −

[ ] [ ] ( )[ ] [ ] ( )

[ ] [ ] [ ]

[ ][ ]

[ ]

[ ]

*1*2

*

* *

Stacking 1,2, , equations, we have

0 1 1 121 0 2

1 2 0

Conjugating both sides:

0

yy yy yy yy

yyyy yy yy

yypyy yy yy

yy yy

p

r r r p rarar r r p

r par p r p r

r r

=

− − − − −− − =

− − −

[ ] ( )[ ] [ ] ( )

[ ] [ ] [ ]

[ ][ ]

[ ]

* *1

** * *2

** * *

1 1 121 0 2

1 2 0

yy yy

yyyy yy yy

p yyyy yy yy

r p a ra rr r r p

a r pr p r p r

− − − − −− − = ⇔ =

− − −

Rw r

[ ][ ]

[ ]

*1

*2

*

12

,

yy

yy

p yy

a ra r

a r p

− − −

w r

Observations about YW Equations

R=RH

Eigenvalues are real and eigenvectors corresponding to distinct eigenvalues are orthogonal/orthonormal. If Ris real, then R is symmetric, i.e. RT= R

R is a Toeplitz matrix, i.e. rij = ri-j Values of R depend only on the difference between the

index values Has efficient algorithm to solve for solution

Power efficient in hardware implementation


Realization


( ) ( )( )

( )( ) ( ) ( ) ( )*

2 10 *

1

A controller canonical form (from control) can be written by realizing that the transferfunction can be written as

1 1

qk

k pk k

kk

Y z W zH z b z B z H z H z

W z F za z

−

= −

=

= = = = +

∑∑

( )

( ) ( ) [ ] [ ] [ ] [ ]

[ ] [ ] [ ] [ ]

( ) ( )( ) ( ) ( ) ( )

[ ] [ ] [ ] [ ] [ ] [ ]

* *1

* * *0

2

*

1

* *1

1

.

Since 1 or

1

Since

1

* 1

p

q

pk

kk

p

H z

W z a z F z

w n f n a w n a w n p

Y zB z Y z W z B z

w n a w n a w n p f n

y n w n b n b w n b w n b w n

W z

q

−

=

+ = ⇔

⇒ = −

+ − + + − =

= = + − +

− − − −

= ⇒

+ −

=

⇔

∑

( )B z

Realization: AR part of Transfer Function


z-1 z-1 z-1 z-1f[n]

[ ] [ ] [ ] [ ]* *1 1 pw n a w n a w n p f n+ − + + − =

*1a−

w[n-1] w[n-2]

w[n-p]

*2a−

*1pa −−

*pa−

Realization of Complete Transfer Function


z-1 z-1 z-1 z-1f[n]

• Signal processing practitioners usually attempt to analyze characteristics of a system by ONLY looking at the relationship between the input and output

• Transfer function

• Imagine opening your system (a black box), which can now be modeled using a bunch of integrators (delay elements in discrete time) and putting a logic probe in each of the interconnect

• Concatenation of these signals w[n-k], ∀k makes up the state of the system

y[n]bp*

w[n-1] w[n-2]

w[n-p]

*1a−

*2a−

*1pa −−

*pa−

*0b

*1b

*2pb −

*1pb −

[ ] [ ] [ ] [ ][ ] [ ] [ ] [ ] [ ] [ ]

* *1

* * *0 1

1

* 1p

q

w n a w n a w n p f n

y n w n b n b w n b w n b w n q

+ − + + − =

= = + − + + −

Assumes p = q

State-Space Form


z-1 z-1 z-1 z-1f[n] y[n]bp*

xp[n]

*1a−

*2a−

*1pa −−

*0b

*1b

*2pb −

*1pb −

Assumes p = q

*pa−

xp-1[n]x2[n]

x1[n]

Consider relabeling the interconnect signals (states) as xk[n], for k = 1, 2, …, p

State-Space Representation


( ) ( )( )

( )( )

( )( ) ( ) ( )1 2

00

1

1qk

k pkk

kk

Y z Y z X zH z b z H z H z

F z X z F z a a z

−

−=

=

= = = = +

∑∑

[ ] [ ][ ] [ ]

[ ] [ ][ ] [ ] [ ] [ ] [ ] [ ] ( )[ ] [ ] [ ] [ ] [ ]

1 2

2 3

1

* * * *1 2 1 1 2 1

* * * *1 1 2 2 1 1

Assuming , note that1

1

1

1 state equation input-output equatio

p p

p p p p p

p p p p

p qx n x n

x n x n

x n x n

x n f n a x n a x n a x n a x ny n b x n b x n b x n b x n

−

− −

− −

=

+ =

+ =

+ =

+ = − − − − −

= + + + +⇒

( )[ ] [ ] [ ] [ ]( )* * * *

0 1 2 1 1

n

p p pb f n a x n a x n a x n−

+ − − − −

xk[n]’s are known as the state variables. Note that the transfer function can be written as

State-Space Representation


[ ][ ]

[ ][ ]

1

* * *0

* * **1 0 10

* * *1 0 1

Define the state vector , containing state variables , ,

0 1 0 0 0 00

0 0 1 0 0 0, , , and

00 0 0 0 0 1

1

k

p

p p

p p

x nn x n k

x n

b b ab b a

d b

b b a a

− −

∀

− −

− −

x

b c A

[ ] [ ] [ ][ ] [ ] [ ]

* * * * * *1 2 3 2 1

* * *0 1 1If 0, , then

1

p p p p

Tp p

T

n n f ny n n

a a a a a

b b b b

df n

− − −

−

− − − − −

= = + = +=

⇒ +

x

c

Ax bc x

Imagine opening your system (a black box), which can now be modeled using a bunch of integrators (delay elements in discrete time) and putting a logic probe in each of the interconnect

• Concatenation of these signals xk[n], ∀k makes up the state of the system

State-space equation

A is called a companion matrix

Non-uniqueness of State-Space Equation


[ ] [ ] [ ][ ] [ ] [ ]

[ ] [ ] [ ][ ] [ ] [ ]

1 1

1

Let , : invertible matrix, then 1

1

Terminologies (which will be explained later) is a similarity transfor

TT

p pn n f n

n n f ny n n df n

y n n df n

− −

−

= ×+ = +

+ = + ⇒= +

= +

x Tz Tz T ATz T b

Tz ATz bc Tz

c Tz

T AT mation of , they share identical eigenvaluesA

Time-varying State-Space Model


[ ] [ ] [ ] [ ] [ ][ ] [ ] [ ] [ ] [ ]

[ ] [ ] [ ] [ ]( )

When system is time-varying, the state-space representation becomes 1

so , , , on the time index is shown

T

T

n n n n f ny n n n d n f n

n n n d n n

+ = +

= +

x A x bc x

A b c

Transformed State-Space Model


( ) ( ) ( )( ) ( ) ( )

Taking the -transform of the time-invariant SS model

Then the state equation becomes

T

zz z z F zY z z dF z

= +

= +

x Ax bc x

( ) ( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )

( )

1

1

1

.

Substituting, then the output equation

p

p

Tp

Tp

z z F z

z z F z

Y z z F z dF z

z d F

−

−

−

− =

⇒ = −

= − +

= − +

I A x b

x I A b

c I A b

c I A b ( )

( ) ( )( ) ( ) 1

.

Then the transfer function becomes

Tp

z

Y zH z z d

F z−

= = − +c I A b

Solution for State-Space Difference Equation


[ ] [ ] [ ][ ] [ ] [ ]

[ ][ ] [ ] [ ][ ] [ ] [ ]

[ ] [ ]( ) [ ]

Recall the state-space difference equation 1

Also initial condition 1 , and for 0,

0 1 0

1 0 1

1 0 1

T

n n f ny n n df n

n

f

f

f f

+ = += +

− ≥

= − +

= +

= − + +

=

x Ax bc x

x

x Ax b

x Ax b

A Ax b b

[ ] [ ] [ ]

[ ] [ ] [ ]

2

1

0

1 0 1

1n

n k

k

f f

n f n k+

=

− + +

= − + −∑

A x Ab b

x A x A b

Solution for State-Space Difference Equation


[ ] [ ] [ ] [ ]

[ ] [ ] [ ]

1

01

Quantities of are known as the Markov parameters of the system. Note: is a linear function of 1 and , so it is also a Gaussian process

(more on random pro

nT n T k

kT k

y n f n k df n

n f n k

+

=

= − + − +

• − −

∑c A x c A b

c A bx x

cess later)

State-Space Model: MIMO Extension


[ ] [ ] [ ][ ] [ ] [ ]

MIMO extension:1

If there are state variables and inputs and outputs, then: , : , : , :

n n n

n n np m

p p p m p m

+ = +

= +

× × × ×

x Ax Bu

y Cx Du

A B C D

[ ] [ ] [ ]

[ ] [ ] [ ] [ ]

1

0

1

0

Simple algebra will show that

1

1

Quantities of are known as the Markov parameters of the system.

nn k

kn

n k

kk

n k

n k n

+

=

+

=

= − +

= − + +

∑

∑

x A x A Bu

y CA x CA Bu Du

CA B

State Equation Example: Two DC Power Supplies


[ ] [ ] [ ]1 1 1 1

Assume outputs are independent of each othen, then a reasonable model would be thescalar model for each output 1

x n a x n u n= − +

[ ] [ ] [ ][ ] ( ) [ ] ( ) [ ] [ ]

[ ][ ]

1 1 2 2

1 2

2 2 2 2

2 21 2 1 2

2 2

1 1

2 2

1 ,

where 1 ~ , , 1 ~ , , and are zero-mean

WGN with variance and , respectively. All RVs are independent of each other.

0Then

0

x x x x

u u

x n a x n u n

x x u n u n

x n ax n a

µ σ µ σ

σ σ

= − +

− −

=

[ ][ ]

[ ][ ] [ ] [ ] [ ]

[ ]

[ ] [ ]( ) [ ] [ ]( ) [ ] [ ]( )[ ] [ ]( ) [ ] [ ]( ) [ ]1

2

1

1 1

2 2

21 1 1 2

22 1 2 2

2

1 1 01 .

1 0 1

Also, since is a vector WGN with zero mean and covariance

0 ,

0

0so

0

uT

u

u

u

x n u nn n n

x n u n

n

E u m u n E u m u nE m n m n

E u m u n E u m u n

σδ

σ

σ

σ

− + ⇔ = − + −

= = −

=

x Ax Bu

u

u u

Q [ ] [ ][ ]

1 1

22 2

21

2 22

01 and 1 ~ , .

1 0x x

x x

xx

µ σµ σ

− − = −

x

State Equation Example: Vehicle Tracking


Goal: Estimate and track range and bearing of vehicle (assuming Cartesian coordinates) Assume constant velocity, perturbed by only wind gusts, slight speed corrections Model these perturbation

x y• −••

[ ] [ ] [ ][ ] [ ] [ ]

s as noise inputs, leading to velocity equations

1

1 .

Note that without

the n

x x x

y y y

v n v n u n

v n v u nn

= − +

= − +

• [ ] [ ]oise perturbations and , the velocities would be constant,

and the vehicle would be modeled as traveling in a straight line as indicated by the dashed linein Fig. 13.21 The position equation at

x yu n u n

•

[ ] [ ] [ ][ ] [ ] [ ]

time can then be written as1 1

1 ,

where is the sam

pling period. e

1

Th

x x x

y y y

nr n r n v n

r n r n v n

= − + − ∆

= − + ∆

∆•

−

(discrete-time) velocity equations models the vehicle to be traveling at the velocity at1 and then changing abruptly at . This is an approximation to the true continuous behaviorn n−



[ ]

[ ][ ][ ][ ]

[ ][ ][ ][ ]

from the velocity and position equations, we see that

1 0 00 1 0

Def

ine the signal vect

0 0 1 00 0 0 1

or as

x

y

x

y

x

y

x

y

r nr n

nv nv n

r nr nv nv n

=

∆ ∆ =

x

[ ][ ][ ][ ]

[ ][ ]

[ ] [ ] [ ]

0101

11

1 .The measurments are noisy observations of the range and bearing

x

y

xx

yy

r nr n

u nv nu nv n

n n n

− − + − −

⇔ = − +x Ax u

[ ] [ ] [ ][ ] [ ] [ ]

[ ] [ ]( ) [ ]

ˆ ˆ .

This can be written in general form as , where

RR n R n w n

n n w n

n n nββ β

= +

= +

= +y h x w

[ ] [ ]( )[ ] [ ]

[ ][ ]

2 2

arctan

x y

y

x

r n r nn n r n

r n

+

⇒ =

Cx h x

Example: System Estimation: One LS Approach


Unknown linear time-invariant

systemf[n]

Model matching:

ARMA(p,q)

hd[n] or x[n]

e[n]

w[n] Output may/may not be

affected by additive noise

z[n] z[n]=x[n] + w[n]Modeled as deterministic or

random?

[ ]Assuming: is known System: ARMA( , )

can setup equation to solve for parameters

f np q

•

•⇒ =Ax b

[ ] [ ] [ ] [ ] [ ][ ] [ ] [ ]

[ ] [ ] [ ]

* * *1 1

* * *0 1

* *

1 0

Using ARMA model to model the unknown system :

1 2

ˆ

or

1p

q

p q

k kk k

d y n ah n y n a y n a y n p

b f n b f n b f n q

a y ny n k b f n k= =

= − − − − − − −

+ + − + + −

− +−⇔ −= ∑ ∑

[ ] [ ]ˆ or dh n y n

δ[n], or other known sequence (pilot)

White noise

Example: System Estimation: One LS Approach


[ ] [ ] [ ] [ ] [ ] [ ][ ] [ ] [ ] [ ] [ ] [ ]

[ ] [ ] [ ] [ ] [ ] [ ]

[ ][ ]

[ ]

*1*2

*

*0*1

*

1 2 0 11 1 1 1

1 2 1

1 and p

q

y p y p y f p f p f p qy p y p y f p f p f p q

y N y N y N p f N f N f N q

aa

z pa z pbb z N

b

− − − − − + + − =

− − − − − − − − + = =

A

x b

If large over-determined system LS solution possibleN ⇒ ⇒

[ ] [ ] [ ]* *

1 0Recall

p q

k kk k

y a y n k b fn n k= =

− − + −= ∑ ∑n = p

n = N

To ensure we deal with a causal y[n]

E.g. Linear Prediction (Useful for Speech Coding and Recognition)


( ) ( )

1

1Assume we are told of an AR system 1

Speech is often modeled as output of such system driven by either a zero-meanuncorrelated signal in the case of unvoiced speech (such as "f", "s" kn

pk

kk

p H za z−

=

=+∑

( ) [ ] [ ] [ ] [ ] [ ] [ ]1

own asfricatives) or by a periodic pulse sequence in the case of voiced speech (vowels)due to the "peaky" nature of human speech signal (in time).

From 1p

T Tk a

kH z y n a y n k f n n f n n

=

⇒ = − − + = − − + = −∑ a y a y

a [ ]

[ ][ ][ ]

[ ]

1

2

11

and 2a

p

f na y na n y n

a y n p

− − −

y

[ ] [ ] [ ][ ] [ ] [ ]ˆGoal is to find or so that

ˆ

is minimized

Ta

T

d

a

n n

e

h n y

n z n y n

⇒ = −=

= −

a y

a a

Application for Speech Recognition (big data example) Suppose there are several classes of signals to be

distinguished (for example, several speech sounds to be recognized).

Each signal will have its own set of prediction coefficients Signal 1 has a1 Signal 2 has a2, …

An unknown input signal can be reduced (by estimating the prediction coefficients that represent it) to a vector a Then a can be compared with a1, a2, and so on…to determine

which signal the unknown input is most similar to


Inverse Problem: Another Perspective of Prediction


f[n] y[n]e[n]

Assumed system that needs to be identified

Drive this to some small value in

some sense

( )

1

1 1

pk

kk

H za z−

=

=+∑ ( )

1

1 1p

kk

ka z

H z−

=

= +∑ [ ]f n

Inverse system that is

estimated

Inverse Problem: Another Perspective of Prediction


( ) ( ) ( ) ( ) ( ) ( )[ ] [ ] [ ]

[ ] [ ]

( )

1

1If

1

In this case, is regarded as input, then is output of an inverse system.If we have an estimated system

1 1

T

pk

kk

Y z H z F z F z Y zH z

f n y n n

y n f n

H za z−

=

= ⇒ =

⇒ = + −

= =+∑

a y

( )( )

[ ] [ ] [ ][ ]

then choose 1

so that is close to in some sense. This is known as an inverse problem.

T

Y zF z

f n y n n

f n

= + −a y

Nonparametric Spectrum Analysis

From DSP, we know we can perform DFT on “any” signals to get a picture of the spectrum

Not very accurate Exploiting a priori knowledge of signal is better


[ ] [ ]

[ ] [ ]

21

0

21

0

"Analysis" Equation

1

"Synthesis" Equation

1

N j knN

n

N j knN

k

X k x n eN

x n X k eN

π

π

− −

=

−

=

=

=

∑

∑

Why these equationsare writtenthis way?

Parameter Fourier Analysis


[ ][ ]

[ ][ ]

[ ] ( ) ( )

0 0

0

0 0

0 0

Assume we know the signal cos 2 sin 2 , for 0,1, , 1,

where / , with 1, , / 2 1. Estimate .

1 00cos 2 sin 21

cos 2 1 sin 2 11

T

s n a f n b f n n N

f k N k N a b

sf fs

f N f Ns N

π π

π π

π π

= + = −

= = − =

= − −−

H

θ

( )

1

1 20

1 1 2 2

It can shown that is orthogonal, i.e.

cos 2 sin 2 0 / 2

2

NT

Tnp

T T

ab

k kn nN N N

N

π π−

=

= = ⇒ =

= =

∑

H

h hH H I

h h h h

( ) 1ˆˆˆ

T Ta

b−

= =

θ H H H x

Parametric Fourier Analysis


( )

[ ]

[ ]

1

1

0

1

0

ˆˆˆ

2 cos 22

2 sin 2

T T

N

nTN

n

a

b

kx n nN N

N kx n nN N

π

π

−

−

=

−

=

= =

= =

∑

∑

θ H H H x

H x

General Adaptive Filter Configuration

Select parameters to achieve the “best” match between the desired signal d[n] and filter output – optimizing the performance function such as Least-squares error Mean-squared error

Characteristics of AF Can automatically adjust (or

adapt) in the face of changing environments and changing system requirements

Can be trained to perform specific filtering or decision-making tasks

Should have some “adaptation algorithm” (learning algorithm) for adjusting system’s parameters


AdaptiveFilter

f[n]

Adaptive algorithm

y[n]e[n]

d[n]

Applications of AF: System Identification and Interference Cancellation


Unknown system

f[n]

AdaptiveFilter

d[n]

e[n]

AdaptiveFilter

noise

y[n]

e[n]

Signal = x[n]

d[n]

f[n] = η[n]

Applications of AF: Inverse Modeling and Predictors


f[n] y[n]e[n]Unknown

systemDrive this to some

small value in some sense

[ ]f n

delayd[n]

AdaptiveFilter

f[n] y[n]e[n]

Drive this to some small value in

some sense

[ ]f ndelay

d[n]

AdaptiveFilter

Random Variable (RV)

A random variable is a function that assigns a numerical value each possible outcome in S, i.e. S→ℜ (field of real number) More convenient to work with a

numerical value than nonnumerical value

Can be discrete or continuous (example of discrete RV on top right, continuous RV on bottom right)

Convention Capital letters denote RVs Lowercase letters denote values

the RVs take on E.g. fX(x) distribution function for

RV X with value x


CDF and PDF

Functions which relates the probability of an event to a numerical value assigned to an event

Parameter vs. nonparameteric There are several different parametric PDFs Nonparametric

Estimated directly from data Easily adaptable


Probability (Cumulative) Distribution Functions


( ) ( )( )

( ) ( ) ( )( )

( ) ( )

( )0

0

A way to probabilistically describe an RV

1. 0 1, with 0, 1

2. is continuous from the right, that is,

lim

3. is a nondecre

X

X

X X X

X

X Xx x

X

F x P X x

F x

F x F F

F x

F x F x

F x

+→

•

≤

≤ ≤ −∞ = ∞ =

=

Properties of

( ) ( )1 2 1 2

asing function of , i.e.

if X X

x

F x F x x x≤ <

From 2., FX(x) is continuous from right, so the jump amount = P0

Probability Density Functions (PDF)


( ) ( )

( )

( ) ( ) ( ) ( )

( )

( ) ( ) ( ) ( )

( )

2

11 2 2 1

More convenient to express statistical averages using PDFs

1. 0

2. 1

3.

4.

XX

X

XX X X

xx

X X Xx

X

dF xf x

dxf x

dF xF x f d f x

dx

f x dx

P x X x F x F x f x dx

f x dx P x dx

ηη η

=

= ⇒ = ≥

=

≤ ≤ = − =

= − <

∫

∫∫

Properties of

( )X x≤

Example: Discrete PDF and CDF

2 fair coins are tossed X: # of heads


Outcome X P(X=xj)

TT x1=0 ¼THHTHH x3=3 ¼

Some texts use pmf where the Dirac delta’s are represented simply as Kronecker delta’s

Example: Cont. PDF and CDF


Consider the pointer-spinning experiment. Assume any one stopping point is notfavored over any other and that the RV is defined as the angle that the pointermakes with the vertical, modulo 2 . Thuπ

Θ

[ )[ )

( ) ( )( ) ( )

1 2

1 1 2 2

1 2 1 2

s is limited to 0,2 and for any two

angles and in 0,2 , we have

(equally likely assumption)

, 0 , 2 .

P P

f f

π

θ θ π

θ θ θ θ θ θ

θ θ θ θ πΘ Θ

Θ

−∆ < Θ ≤ = −∆ < Θ ≤

⇒ = ≤ <

( )1 , 0 2 ,

20, otherwise

Area under PDF curve is the probability.

fθ π

θ πΘ

≤ <⇒ =

Joint CDFs and PDFs


( ) ( )

( ) ( )

( ) ( )

( ) ( )

( ) ( )

2 2

1 1

2

1 2 1 2

Characterized by two or more RVs, ,

,,

, ,

, , 1

, ,

XY

XYXY

y x

XYy x

XY XYy x

XY

F x y P X x Y y

F x yf x y

x y

P x X x y Y y f x y dxdy

F f x y dxdy

f x y dxdy P x dx X x y dy Y y

= ≤ ≤

∂=

∂ ∂

< ≤ < ≤ =

⇒ ∞ ∞ = =

⇒ = − < ≤ − < ≤

∫ ∫

∫ ∫

Marginal CDFs and PDFs


( ) ( ) ( )( ) ( ) ( )

( ) ( )

( ) ( )

Can obtain cdf or pdf of one of the RVs from joint RVs , , ,

, , ,

,

, .

Since

X XY

Y XY

x

X XYy

y

Y XYx

F x y P X x Y F x

F x y P X Y y F y

F x f x y dx dy

F y f x y dx dy

f

′ −∞

′−∞

= ≤ ≤ ∞ = ∞

= ≤ ∞ ≤ = ∞

′ ′ ′ ′=

′ ′ ′ ′=

∫ ∫

∫ ∫

( ) ( ) ( ) ( )

( ) ( ) ( ) ( )

and

, and ,

X YX Y

X XY Y XYy x

dF x dF yx f y

dx dy

f x f x y dy f y f x y dx′ ′

= =

′ ′ ′ ′⇒ = =∫ ∫

Conditional CDFs and PDFs


( ) ( ) ( )( )

( ) ( ) ( )( )

( ) ( )( )

( ) ( )( )

( ) ( )( )

( ) ( )

Conditional RV:,

,

Bayes Theorem:

,

where given .

XYX Y X Y

Y

X Y XYX Y

Y

X XY X Y XXYX Y

Y Y Y

Y X

F x yF x Y F x Y y

F y

F x Y y f x yf x y

x f y

f y X x f x f y x f xf x yf x y

f y f y f y

f y x dx P y dy Y y X x

= ≤ =

∂ == =

∂

== = =

= − < ≤ =

Statistical Independence


( ) ( ) ( )( ) ( ) ( )

Two RVs are stat. independent if values one takes on do not influencethe values that the other takes on. , or

,

XY X Y

P X x Y y P X x P Y y

F x y F x F y

⇒ ≤ ≤ = ≤ ≤

=

( ) ( ) ( )

( ) ( ) ( ) ( ) ( )

,

If and are not independent, then using Bayes' rule

, .

XY X Y

XY X YY X X Y

f x y f x f y

X Yf x y f x f y x f y f x y

=

= =

Example: Statistical Independence


( )( )

( ) ( )( )

( ) ( )

2

2

0 0

Two RVs and have joint pdf

, , 0 ,0, otherwise.

can be found by noting that

, , 1

Since 1 2

2,

x y

XY

XY XYy x

x y

X XYy

X Y

Ae x yf x y

A

F f x y dxdy

Ae dxdy A

ef x f x y dy

− +

∞ ∞ − +

−

≥=

∞ ∞ = =

= ⇒ =

= =

∫ ∫

∫ ∫

∫( )

( ) ( )

( ) ( )( )

( ) ( )( )

2 2

0

2

, 0 2 , 00, 00, 0

, 0,

0, 0

, 2 , 00, 0

, , 00, 0

x y x

y

Y XYx

xXY

X YY

yXY

Y XX

dy x e xxx

e yf y f x y dx

y

f x y e xf x y

f y x

f x y e yf y x

f y y

∞ + −

−

−

−

≥ ≥ = < <

≥= =

< ≥

= = <

≥= =

<

∫

∫ Conditional prob’s are equal to respective marginals X and Yare independent.

Example: Statistical Independence


( )( )

( ) ( )( )

( ) ( )

2

2 2

0

, , 0,0, otherwise.

2 , 0 2 , 0,

0, 00, 0

, 0,

0, 0

x y

XY

x y x

X XYy

y

Y XYx

Ae x yf x y

e x e xf x f x y dy

xx

e yf y f x y dx

y

− +

∞ − + −

−

≥=

≥ ≥= = = < <

≥= =

<

∫∫

∫

Sum of Two Statistically Indep. RVs

The density of the sum of two statistically independent RVs is the convolution of their individual density functions.

Suppose X, and Y are three independent RVS where W = X + Y, then

fW(w), fX(x), and fY(y) are pdfs of W, X, and Y, respectively


( ) ( ) ( )W Y Xyf w f y f w y dy= −∫

x

y

x+ y = w

y=w

x=wx+ y ≤ w

( ) ( ) ( )

( )

( ) ( )

, ,

(stat. indep.)

Differentiating we get the result

W

w y

X Yy x

w y

Y Xy x

F w P W w P X Y w

f x y dxdy

f y f x dxdy

−

=−∞

−

=−∞

= ≤ = + ≤

=

=

∫ ∫

∫ ∫

Statistical Averages

Sometimes full description of RVs, i.e. knowing its CDF or PDF are not required

Sometimes only partial information is needed One type of partial information of a set of RVs

statistical average or mean value


Average of Discrete RV


[ ]1 1

1

Expectation of RVs, , , with respective probabilities , ,

Justification:Let experiment be perform number of time,

Arithmeti

M MM

x j jj

M x x P P

E X x

N

P

N

µ=

=∑

with la rge

1 1

1

1 1

1

c mean:

By relative frequency interpretation: lim

Mjm m

jj

jjN

Mm m

j jj

nn x n x xN N

nP

Nn x n x x P

N

=

→∞

=

+ +=

=

+ +⇒ =

∑

∑

Average of Cont. RV


( )0Expectation of to with pdf . Suppose we break up this interval intosubintervals of size (assume small). The probability that lies between

to is

M X

i i

x x f xx X

x x xP x

∆−∆

( ) ( )

( ) ( )

[ ] ( )

( )0

0

0

lim

1

, for 0, , .Hence, approximated by a discrete RV that takes on values to with probabilities , , .

x

i i X i

M

X X M

M

x i X i Xxi

x X x f x x i MX x x

f x x f x x

E X x f x x xf x dxµ∆ →

=

− ∆ < ≤ ≈ ∆ = …

∆ ∆

⇒ ≈ ∆ =∑ ∫

Properties of Expectation

E[⋅] is a linear operator Sometimes need to perform E(tr(⋅)). tr(⋅) is also linear

operator E(tr(⋅)) = tr(E(⋅)) Additive

E[X+Y] = E[X] + E[Y] for any 2 RVs

Homogeneity E[cX] = cE[X], for any constant c


Average of a Function of a RV


( )

[ ]( )

( )

( )

( )

( )

Let .

, discrete RV .

, cont. RV

moment of , for 0,1, 2, . Let

, discrete RV

,

i ii

Y

Yy

th r

ri i

r ir

rXx

Y g X

y P yE Y

yf y dx

r X r Y g X X

x P xE X

x f x dx

µ

ξ

=

=

= = =

=

∑

∫

∑

∫

( ) ( )( )

[ ] ( )2 22

cont. RV

central moment of , for 0,1, 2, . Let

Special case: variance: 2

var

rthX

rr X

X

r X r Y g X X

m E X

r

X m E X E X

µ

µ

µ

= = = −

− =

− =

2 2x Xµ σ −



( )

( )

[ ]

( )

,

,

11

joint moment of and Y, for , 0,1, 2,

, , discrete RV

, , cont. RV

Correlation:

Note:Independent:

th

i jm i m

i j i jij

i jXYx y

XY

r X i j

x y P x yE X Y

x y f x y dxdy

E XY

E XY E

ξ

ξ

=

=

=

∑

∫

( ) ( )( )( )

( )Uncorrelated: 0

Orthogonal: 0

Implications: If and are independent and have zero mean, implies and are uncorrelated and orthogonal. If and are uncorrelated and ha

X Y

XY X Y

X E Y

E X Y

E XY

X Y X YX Y

µ µ− − = =

•• ve zero mean, implies they are orthogonal. Hence, independence is the strongest of the three properties.•



( ) ( )

[ ] ( )( ) [ ]11

joint central moment of and Y, for , 0,1, 2,

Covariance:

,

Correlation coefficient for and :

th

i jij X Y

X Y X Y

r X i j

m E X Y

Cov X Y m E X Y E XY

X Y

µ µ

µ µ µ µ

=

− −

− − = −

[ ]112 2

20 02

,

X Y

Cov X Ymm m

ρσ σ

=

Conditional Expectation


( )

( )

[ ] ( ) ( ) ( )

Conditional expectation of given

Expectation of functions of :

X Yx

Xx

X Y y

E X Y E X Y y xf x Y y dx

X Y g X

E Y E g X g x f x dx

=

= = = =

=

= =

∫

∫

Removing Conditional Expectation Via Expectation


( )( ) ( ) ( )

( ) ( )

( )

( )

Since is a function of , it is also a RV.

X Y

Y YX Y X Yy x

YX Yx y

XYx y

Xx

E X Y Y

E E X Y xf x y dx f y dy

x f x y f y dy dx

x f XY dy dx

xf X dx

=

=

=

=

∫ ∫

∫ ∫

∫ ∫

∫[ ] XE X=

Conditional Expectation


This is an "expectation" version of the total probability theorem.In many cases, we can simplify a problem by conditioning or "fixing"one RV and performing an expectation. Then remove the conditionin

( ) ( )( )

gin a second step by taking the expectation w.r.t. the conditioning RV.

More generally:

Y X YE g X E E g X Y =

Special Average: Characteristic Function


( )

( ) ( )

( ) ( )

( )

Let

1 2

Note: This is Fourier transform of if we have Sometimes it is more convenient to use the variable

j X

j X j xXx

j xX v

j XX

g X e

E e f x e dx

f x e dv

f x es

ω

ω ω

ω

ω

ω

ωπ

−

−

=

Φ =

= Φ

•

•

∫

∫

( ) ( )

[ ] ( ) ( )0

in place of , the result becomes .

Obtaining moments of a RV:

Set 0 :

jvxXx

j

j xf x e dxd

E X jd

E X

ω

ω

ωω

ωω

ω=

∂Φ=

∂Φ= ⇒ = −

⇒

∫

moment generating function

( ) ( )0

nnn

njd

ω

ωω

=

∂ Φ = −

Chebyshev Inequality and the Law of Large Numbers


( )

2

2

2

Let be a RV with mean and finite variance . Then for any 0,

(Chebyshev Inequality)

X X

XX

X

P X

µ σ δ

σµ δδ

>

− ≥ ≤

1 22

1

Let , , , be i.i.d. (independent and identically distributed)

RVs with mean and variance each. Let the sample mean be1ˆ .

Then, for any fixed 0,

N

X XN

X ii

X X X

XN

µ σ

µ

δ=

=

>

∑

( )ˆ lim 0. (LLN)

ˆIntuitively, this means the estimator, , will converge to in probability.ˆIf the above limit equals 0, is called a consistent estimato

X XN

X X

X

P µ µ δ

µ µµ

→∞− ≥ =

r of .Xµ

Useful PDFs

Discrete RVs Binomial distribution

Related to chance experiments with two mutually exclusive outcomes with probability p and 1-p Model number of times event A has occurred in n trials (events are indep)

Poisson distribution Related to chance experiment in which an event whose probability of occurrence in a very small time

interval ∆T is P=α∆T, where α is a constant Model the probability of k events occurring in time T Commonly used to model arrival time of packets in packet switching networks

Continuous RVs Normal (Gaussian) distribution

Commonly used to model large number of indep. random events when distribution of each event is unknown

Sum of large number of independent RVs converges to a Gaussian distribution Rayleigh distribution

(see above) Rician distribution

Commonly used to model distribution of power profile of wireless channel when direct line-of-sight (LOS) exists

x = sqrt(x12+x2

2), where x1~N(µ1,σ2), x2~N(µ2,σ2) are indep. RV


Useful PDFs

Continuous RVs Chi-Squared (central and noncentral)

Commonly encounter in detector design

F-distribution (central and noncentral) Commonly encounter in detector design


( )

2

2

1

with degrees of freedom

, ~ 0 or ,1 and indep.i i ii

x x x N

ν

ν

χ ν

µ=

=∑

( )1 2

2

2 21 11 2

2 2

PDF: ratio of 2 indep. RVs/ , ~ , ~ and indep./

0 : central dist.

Fxx x xx

F

ν

ν ν

χν χ λ χν

λ

=

= −

Gaussian (Normal) Distribution


( ) ( )

[ ] ( )

( ) ( )

( ) ( )

( ) ( )2 2

1 1

2

2 2

22

2

1 2 1 2

1 dimensional:

1 1 exp2 2

where ,

Joint CDFs and PDFs:, ,

,,

, ,

X

XY

XYXY

y x

XYy x

f x x

E X E X

F x y P X x Y y

F x yf x y

x y

P x X x y Y y f x y dxdy

µπσ σ

µ σ µ

−

= − −

−

= ≤ ≤

∂=

∂ ∂

≤ ≤ ≤ ≤ = ∫ ∫

( ) ( ) ( )( ) ( ) ( )

( ) ( )

Marginal distribution:, ,

, ,

,

X XY XY

Y XY XY

X XYx

F x F x F x Y

F y F y F X y

f x f x y dy

= ∞ = ≤ ∞

= ∞ = ≤ ∞

= ∫

2-D (Bivariate) Gaussian Distribution


( )( ) ( ) ( ) ( )

( )

[ ] [ ] [ ] [ ]( ) ( ) [ ]

22

22

2 2

2 2

/ 2 / / /1, exp2 12 1

where , , var , var

,

x x x x y y y yXY

x y

x y x y

x y

x y x y

x x y yf x y

E X E Y X Y

E X E Y Cov X Y

µ σ ρ µ σ µ σ µ σ

ρπσ σ ρ

µ µ σ σ

µ µρ

σ σ σ σ

− − − − + − = − −−

= = = =

− − = =

2-D (Bivariate) Gaussian Distribution


N-dimensional Gaussian Distribution


( )( ) ( )

( ) ( )

[ ]( )

( )

( )( )

1/ 2 1/ 2

1

1 1exp22 det

(applied element-wise)

TN

N

T

f

E xE

E x

E

π− = − − −

=

− −

X x x

x

x x

x x μ C x μC

μ x

C x μ x μ

Central Limit Theorem


2 2 21 2 1 2

2 2 21

2

Let , , , be indep. RVs with zero mean and variance , , , .

Let . If for any fixed 0, there exists a sufficient large such that

N N

N N

k

X X Xs

N

σ σ σ

σ σ ε

σ

+ + >

<

1 2

, for 1, , ,then the normalized RV

converges to the standard normal (Gaussian) PDF.

N

NN

N

s k N

X X XZs

ε =

+ + +

Q-Function


( )

( ) ( )

2

222

2/

/

Gaussian -Function:

Normalized Normal distribution of ,

1 1Consider exp22

1(let ) exp22

x

x

x

x

x x

a

x x xaxx

axa

x

Q

N

P a X a x dx

x yy dy

µ

µ

σ

σ

µ σ

µ µ µσπσ

µσ π

+

−

−

− ≤ ≤ + = − −

−= = −

∫

∫

( )

2/

0

2

/

2

1 2 exp22

1(since area under PDF=1) 1 2 exp22

1 2

1 1where exp ex22 2

x

x

a

a

x

u

y dy

y dy

aQ

yQ u dyu

σ

σ

π

π

σ

π π

∞

∞

= −

= − −

= −

− ≈

∫

∫

∫2

p , for 12

has been computed numerically.

u u −

Normalized Distribution Function: F(x) and Q(x)


( )

( ) ( )( ) ( )

( )

2

2

/ 2

Normalized cumulative distribution function: 0, 11 2

1

A related function: 11 2

x x

xF x e d

F x F x

F x Q x

Q x e

ξ

ξ

µ σ

ξπ

π

−

−∞

−

= =

=

− = −

= −

=

∫

( ) ( )

/ 2

1

xd

Q x Q x

ξ∞

− = −

∫


( )( ) ( )

Normalized cumaltivedistribution function

1

F x

F x Q x= −

Stochastic Process

Random Processes (Stochastic Processes) Informal definition

The outcomes (events) of a chance experiment are mapped into functions of time (waveforms)

Cf. Random variables: outcomes are mapped into numbers Each waveform is called a sample function, or a realization. The

totality of all sample functions is called an ensemble Chance experiment that gives rise to this ensemble is called a

random/stochastic process Formal definition

Every outcome ζ we assign, according to a certain rule, a time function X(t,ζ). X(t,ζi) signifies a single time function

X(tj,ζ) denotes a single RV X(tj,ζi) is a number



Voltage at the terminals of a noise generator. 10 ensemble experiments

Statistical Description of Random Process A random process is statistically specified by its

Nth order joint pdf’s that describes a typical sample function at times tN > tN-1 > … > t1, for any N where

FX1X2…XN(x1,t1;x2,t2; …; xN,tN) = P(x1-dx1 < X1 ≤ x1 at time t1, x2-dx2 < X2 ≤ x2 at time t2, …, xN-dxN < XN

≤ xN at time tN)where Xn≡X(tn,ζ), for n=1,…N



Random process from realization ζM.

X(tj,ζ) is a random variable

Joint probability (from relative frequency) is the number of sample functions that pass through the slits placed at t=t1 and t=t2 in both barriers divided by the total number of M of sample functions as M becomes large w/o bound

FX1X2(x1,t1;x2,t2) = P(x1-dx1 < X1 ≤ x1 at time t1, x2-dx2 < X2 ≤ x2 at time t2)

Stationarity and Wide-Sense Stationarity

Statistical stationarity in the strict sense or stationarity Joint pdfs depend only on the time

differences t2-t1, t3-t1, …, tN-t1 Not dependent on time origin

Mean and variance independent of time Correlation coefficient or covariance

depends only on difference, e.g. t2-t1 Wide-sense stationarity (WSS)

Joint pdfs are dependent on time origin Mean and variance independent of time Correlation coefficient or covariance

depends only on difference, e.g. t2-t1 Stationarity WSS

Converse is not necessarily true Exception: Gaussian random process

(Why?)


Nonstationary processes

Stationary processes

Ensemble Average (Expectation)


( ) ( ) ( ) ( )

( ) ( ) ( ) ( ) ( )

( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )

( )

2 222

*

1 2 1 1 2 2

**1 2 1 2

2 1

Mean: ,

Variance:

Covariance:

,

,

x X

xx

xx

xx

t E x t x t f t d

t E x t x t E x t x t

c t t E x t x t x t x t

E x t x t x t x t

c t t E x t

αµ α α α

σ

= = =

= − = −

= − −

= −

=

∫

( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )

( ) ( )

( ) ( ) ( )( )

1 22 1

*

2 2 1 1

**2 1 2 1

*1 2 2 1

*1 2 1 2

1 2 1 1 2 2 1 2

, ,Autocorrelation:

,

, ; ,

xx xx

xx

X X

x t x t x t

E x t x t x t x t

c t t c t t

r t t E x t x t

f t t d dα α

α α α α α α

− −

= − ⇒ =

=

= ∫ ∫

Ensemble Average (Vector Random Process)


( ) ( ) ( )

( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )

( ) ( ) ( ) ( ) ( ) ( ) ( )

2

22

1 2 1 1 2 2

1 2

Mean:

Variance:

2Re

Covariance:

,

x

H

xx

H

H

xx

H

t E t t

t E t t t t

E x t t t x t

t t E t t t t

E t t

σ

= =

= − −

= − +

= − −

=

μ x x

x x x x

x x

C x x x x

x x ( ) ( ) ( ) ( ) ( ) ( )

( ) ( ) ( )

1 2 1 2 1 2

1 2 1 2

Autocorrelation:

,

H HH

Hxx

E t t E t t t t

t t E t t

− − +

=

x x x x x x

R x x

Ensemble Average (Expectation) for WSS Process


( ) ( )( )

( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )

( ) ( ) ( )

2

*

**

*

WSS:

Mean: constant

Variance: constantCovariance:

Autocorrelation:

x

xx

xx

xx

t E x t

t

c E x t x t x t x t

E x t x t x t x t

r E x t x t

µ

σ

τ τ τ

τ τ

τ τ

= = =

− − − −

= − − −

−

( ) ( ) ( )( ) ( ) ( ) ( ) ( )

( ) ( ) ( )

* *

* * *

*

xx

xx

xx

r E x t x t

r E x t x t E x t x t

E x p x p r

τ τ

τ τ τ

τ τ

⇒ − ⇒ − + = + = − =

Ensemble Average for Vector WSS Process


( ) ( )( ) ( ) ( )

( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )

( )

2

WSS:

Mean: constant

Variance: constant

Covariance:

Autocorrelation:

x

Hxx

H

xx

HH

xx

t E t

t E t t

E t t t t

E t t t t

E

σ

τ τ τ

τ τ

τ

= = = =

− − − −

= − − −

μ x

x x

C x x x x

x x x x

R x

( ) ( )Ht t τ − x

Ergodicity


( ) ( )

Ergodic processes are processes for which time and ensemble averages are interchangeable.For example, for real-valued WSS processes:

x E x t x tµ = =

( ) ( ) ( ) ( )

( ) ( ) ( ) ( ) ( )

( ) ( )

2 22

,

1where lim .2

Note: All time and ensemble averages are interchangeable, not just t

xx

xx

T

TT

E x t x t x t x t

r E x t x t x t x t

v t v t dtT

σ

τ τ τ

−→∞

= − = −

= + = +

•

∫

he above. Ergodicity strict-sense stationarity• ⇒

Example 1: Ergodicity


( ) ( )

( )

0

0

Consider a random process with sample function cos 2 ,where is a constant and is a RV with pdf

1 , .2

0, otherwiseCalculate its

n t A f tf

f

π θ

θ πθ πΘ

= +

Θ

≤=

( ) ( )

( ) ( ) ( )

( )

( )

0

22 20

2 20

2

0

2

ensemble and time-average.

1 cos 2 02

1 cos 22

1 cos 22

1 cos 4 24

2

nn

E n t A f t d

t E n t A f t d

A f t d

A f t d

A

π

π

π

π

π

π

π

π

π θ θπ

σ π θ θπ

π θ θπ

π θ θπ

−

−

−

−

= + =

= = +

= +

= + +

=

∫

∫

∫

∫

( ) ( )

( ) ( )

( ) ( ) ( ) ( )

0

2 2 20

2

2 2

1lim cos 2 02

1lim cos 22

2

=constant and constant.

It may be stationary and ergodic.

T

TT

T

TT

nn

n t A f t dtT

n t A f t dtT

A

E n t n t t n t

π θ

π θ

σ

−→∞

−→∞

= + =

= +

=

= = =

∫

∫

Example 2: Ergodicity


( )

( ) ( )

( ) ( )

( ) ( ) ( )

/ 4

0/ 4

/ 4

0 0/ 4

/ 4 22 20/ 4

2 , Suppose .4

0, otherwiseCalculate its ensemble and time-average.

2 cos 2

2 2 2 sin 2 cos 2

20 cos 2nn

f

E n t A f t d

AA f t f t

r E n t A f t d

π

π

π

π

π

π

πθθ π

π θ θπ

π θ ππ π

π θ θπ

Θ

−

−

−

≤=

= +

= + =

= = +

∫

( )

( )

2 / 4

0/ 4

2 2

0

1 cos 4 2

cos 42

Process is not stationary as first and second moment depends on , henceit is for different time origin.

A f t d

A A f t

t

π

ππ θ θ

π

ππ

−= + +

= +

∫

∫

Summary for Ergodic Process


( ) ( ) ( )

( ) ( )( ) ( ) ( )

( ) ( ) ( ) ( ) ( )

2 2

2 2

2 22 2 2

1. Mean: is the DC component

2. is the DC power

3. 0 is the total power

4. is the power in the

alternating current (time-varying) component

5.

x

xx

xx

t E x t x t

x t x t

r x t x t

t x t x t x t x t

µ

σ

= =

=

= =

= − = −

( ) ( ) ( ) 22 2Total power is the AC power plus the

DC powerxxx t t x tσ= +

Power Spectral-Density Functions (PSD) and Cross-Spectral Density


( ) ( )

The PSD of a wide-sense stationary random process is the Fourier transformof the autocorrelation function. For continuous-time random process

.

Sinc

jxx xxS j r e dτ

ττ τ− ΩΩ = ∫

( )( )

e is symmetric, the PSD is a real-valued function of . Since

real-valued power cannot be negative, the PSD must satisfy 0,. Then average power of a random process is

xx

xx

r

S

τ Ω

Ω ≥

∀Ω

( ) ( ) ( ) ( )

( ) ( )

( ) ( )

2*

0

0

1 1 2 2

Cross-Spectral Density:

xx

jxx xx

jxy xy

r E x t x t E x t

S j e d S j d

S j r e d

τ

τ

τ

τ

π π

τ τ

Ω

Ω Ω=

− Ω

= =

⇔ Ω Ω = Ω Ω

Ω =

∫ ∫

∫

Bilateral Laplace Transform of the Autocorrelation Function


( ) ( )

( ) ( )

Note: (entire complex plane). Define

For real-valued random process, since autoco

sxx xx

sxy xy

s j

S s r e d

S s r e d

τ

τ

τ

τ

σ

τ τ

τ τ

−

−

= + Ω

∫∫

( ) ( )( ) ( )*

variance is real and even, itsLaplace transform will be even .

If .xx xx

xx xx

S s S s

s j S j S j

= −

= Ω − Ω = Ω

Discrete-Time PSD and its Laplace Transform Representation


( ) [ ]

( ) [ ]

( ) [ ]

For discrete-time, PSD:

Cross-Spectral Density:

Define

j j kxx xx

k

j j kxy xy

k

kxx xx

k

S e r k e

S e r k e

S z r k z

ω ω

ω ω

−

−

−

=

=

∑

∑

∑

( ) [ ]

( )

( ) ( )*

For real-valued process1

and

kxy xy

k

xx xx

j jxx xx

S z r k z

S S zz

S e S eω ω

−

−

=

=

∑

Uncorrelated, Orthogonal, Independent Random Processes


( ) ( )

( ) ( ) ( )

( )

( )

*1 2 1 1 1 2

1 2 1 2

1 1 1 2 2 2

Given two random processes and (1) Uncorrelated if , , ,(2) Orthogonal if , 0, ,(3) Independence: if , , ; , , ; ; , ,

XY X Y

XY

XY n n n

X t Y t

R t t m t m t t t

R t t t t

f x y t x y t x y t

= ∀

= ∀

( ) ( )

( ) ( )( ) ( ) ( )( )( ) ( )

1 1 2 2 1 1 2 2 , ; , ; ; , , ; , ; ; ,

Remarks:(1) Independence Uncorrelated

(2) Uncorrelated and are orthogonal

(3) (Uncorrelated and either 0 or 0) orthogonal(4)

X n n Y n n

X Y

X Y

f x t x t x t f y t y t y t

X t m t Y t m t

m t m t

=

⇒

⇒ − −

= = ⇒

Uncorrelated and Gaussian Independent⇒

Linear Systems and Random Processes


( ) ( ) ( ) ( )

( )

( ) ( ) ( ) ( ) ( ) ( ) ( )

( ) ( ) ( ) ( )

( ) ( ) ( ) ( ) ( ) ( )

( ) ( )

* * *1 2 1 2 1 2

* *1

Given is LTI, and *

Mean of :

*

0

Cross-correlation

,

y u u

x xu

xy u

h t y t h t x t

y t

t E h t x t E h u x t u du h u E x t u du

t h u du t H

r t t E x t y t E x t h u x t u du

h u E x t x

µ

µ µ

=

= = − = −

= =

= = −

=

∫ ∫

∫

∫( )

( ) ( )( )

( ) ( ) ( ) ( ) ( )

2

*1 2

1 2

* *

If is WSS, let

*

u

xxu

xy xx xxu

t u du

h u r t t u du

x t t t

r h u r u du h r

τ

τ τ τ τ

−

= − +

= −

= + = −

∫∫

∫

Linear Systems and Random Processes


( ) ( ) ( ) ( ) ( ) ( )

( ) ( ) ( )

( ) ( )( )

( ) ( ) ( ) ( ) ( )

( ) ( ) ( )

* *1 2 1 2 1 2

*1 2

1 2

1 2

*

Similarly

,

If is WSS, let

*

yx u

u

xxu

yx xx xxu

yy

r t t E y t x t E h u x t u du x t

h u E x t u x t du

h u r t t u du

x t t t

r h u r u du h r

r E y t y t

τ

τ τ τ τ

τ τ

= = −

= −

= − −

= −

= − =

= −

∫

∫∫

∫( ) ( ) ( )

( ) ( ) ( )

( ) ( )( ) ( )( ) ( ) ( )

* *

* *

*

*

*

*

* *

u

u

yxu

yx

xx

E y t h u x t u du

h u E y t x t u du

h u r u du

h r

h h r

τ

τ

τ

τ τ

τ τ τ

= − −

= − −

= +

= −

= −

∫

∫∫

Linear Systems and Power Spectral Densities


( ) ( ) ( ) ( ) ( ) ( )

( ) ( ) ( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( )

( ) ( ) ( )

* *

*

* * *

*

*

Since and

xy xx xy xx

yx xx xy yx xx

xx xx xx xx xx

yx xx

r h r S j H j S j

r h r r S j H j S j

r r r S j S j

S j H j S j

τ τ τ

τ τ τ τ

τ τ τ

= − ⇔ Ω = Ω Ω

= = − ⇔ Ω = Ω Ω

− = − = Ω = − Ω

⇔ Ω = Ω Ω

( ) ( )

( ) ( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )

* *

2* *

* *

xx

yy yx yy yx

xx xx xx

H j S j

r h r S j H j S j

h h r H j H j S j H j S j

τ τ τ

τ τ τ

= Ω − Ω

= − ⇔ Ω = Ω Ω

= − ⇔ = Ω Ω Ω = Ω Ω

Markov and Hidden Markov Models (HMM) HMM is a stochastic model that is used to model

time-varying random phenomena E.g. speech signal, video sequence Can be understood in terms of state-space models


Markov Models

Used to model evolution of random phenomena that can be in discrete states as a function to time, Transition from one state to the

next is random E.g. A system can be in one

of the S distinct states At each step of discrete time it

can move to another state at random, with probability of the transition at the time tdependent only upon the state of the system at time t i.e. only the previous state is

relevant


1 ( )1p y 2 ( )2p y

3 ( )3p y

0.5

0.3

0.2

0.3

0.7

0.7

0.1

0.2

Markov Models

From state 1 to state 1 is possible with probability 0.5

Denote S[t] denote the state at time t, where it takes on one of the values 1,2, …, S.

Initial state is selected according to a probability π

πi = P(S[1] = i), i = 1, 2, …, S


1 ( )1p y 2 ( )2p y

3 ( )3p y

0.5

0.3

0.2

0.3

0.7

0.7

0.1

0.2

Markov Models

Probability of transition depends ONLY upon the current state P(S[t+1] = j | S[t] = i, S[t-1] = k, S[t-2] = l, …) = P(S[t+1] = j | S[t] = i)

This structure of probability is called the Markov property, andthe random sequence of state values S[0], S[1], S[2], … is calleda Markov sequence or a Markov chain

Sequence is the output of the Markov model Can determine the probability of arriving in the next state by

adding up all the probabilities of the ways of arriving there, i.e.

Note that this is just the law of total probability


[ ]( ) [ ]( ) ( ) [ ]( ) ( )

[ ]( ) ( )

1 1 [ ] 1 [ ] 1 1 [ ] 2 [ ] 2

1 [ ] [ ]

P S t j P S t j S t P S t P S t j S t P S t

P S t j S t S P S t S

+ = = + = = = + + = = =

+ + + = = =


Partitions and Total Probability

B

A1

A2A3

A4

An

1 2

1

Suppose the events , , , form a partition of a sample space , that is,the events 's are mutually exclusive and their union is . Suppose is anyother event. Then

n

i

ii

A A A SA S B

B S B A=

= ∩ =

( ) ( ) ( )

( ) ( ) ( ) ( )

( ) ( ) ( ) ( ) ( ) ( )

1 2

1 2

1 1 2 2

,where are also mutually exclusive. Then .

From the multiplication theorem,

n

n

i

n

n

B

A B A B A BA BP B P A B P A B P A B

P B P A P B A P A P B A P A P

∩

= ∩ ∪ ∩ ∪ ∪ ∩

∩

= ∩ + ∩ + + ∩

= + + +

( ).This is known as the .

nB A

law of total probability

Markov Models


[ ]

[ ]( )[ ]( )

[ ]( )

( ) ( ) ( )( ) ( ) ( )

( ) ( ) ( )

( ) [ ] [ ]( )

Can be written in matrix form. Define

1 11 1 2 1

2 2 1 2 2 2, , with 1 .

1 2

From the previous example:0.5 0.3 0.20.2 0 0.70.3 0

ij

P S n P P P S

P S n P P P Sn a P i j P S t i S t j

P S P S P S SP S n S

=

= = + = = =

=

p A

A

.7 0.1

1 ( )1p y 2 ( )2p y

3 ( )3p y

0.5

0.3

0.2

0.3

0.7

0.7

0.1

0.2

Markov Models

A steady-state probability assignment is one that does not change from one time step to the next, so the probability must satisfy the equation

Ap=p This is an eigenequation, with eigenvalue = 1. By law of total probability, each column of A sum to 1 Definition: An m×m matrix P, such that ∑𝑗𝑗=1𝑚𝑚 𝑝𝑝𝑖𝑖𝑖𝑖 =

1 (each row sums to 1) and each element of P is nonnegative, is called a stochastic matrix. If the rows and columns each sum to 1, then P is doubly stochastic


Markov Models


( ) ( ) ( )( ) ( ) ( )

( ) ( ) ( )

11 1 2 1

2 1 2 2 2 is the transpose of a stochastic matrix. The vector

1 2

contains the initial probabilities. Thus, we can write the probabilistic update equation is

P P P S

P P P S

P S P S P S S

A

π

[ ] [ ] [ ]

[ ] [ ][ ]

1 , with 0 .Or, 1 ,

with 0 for 0. Note that the above is similar to the state equation

t

t t

t t

t t

δ

+ = =

+ = +

= ≤

p Ap p π

p Ap π

p

[ ] [ ] [ ][ ] [ ]

[ ] [ ]

1 .

Note that the "state" represented by 1 is actually the vector

of probabilities , not the state of the Markov sequence t

n n f n

t t

t S t

δ

+ = +

+ = +

x Ax b

p Ap π

p

Relationship to Markov Models and HMM

Pick a ball from 3 urns Each urn contains 3 types of colored balls: black green,

and red At each instant of time, an urn is selected by genie at

random according to the state it was in at the previous time instant

Genie – magic creature which could do everything Ball is then drawn at random from the urn at time t Observation = ball selected Actual state is hidden

State of the system before the ball was chosen the state of the system after


Relationship to Markov Models and HMM: State Diagram


Urn 13 black2 green1 red

( )1p yUrn 25 black7 green3 red

( )2p y

Urn 32 black2 green2 red

( )3p y

0.5

0.2

0.3

0.2

0.3

0.7

0.7

0.1


To further clarify the relationship,

provides for the state update of the Markov system. However, in most linear system, the state vector x[t]

is not directly observable, instead, it is observed only through the observation matrix C (assuming D = 0), i.e. y[t] = Cx[t]

In an HMM, the state is hidden from direct observation

Instead, each state has a probability distribution associated with it


[ ] [ ]1 tt t δ+ = +p Ap π


In the HMM, we do not observe the “state” p[t] Instead, each state has a probability distribution associated with it

When HMM moves into state s[t] at time t, the observed output y[t] is an outcome of a random variable Y[t] that is selected according to the distribution f(y[t]|S[t] = s), which we will represent using the notation f(y|S[t]=s) = fs(y)

In the urn example, the output probabilities depend on the contents of the urns

A sequence of outputs from an HMM is y[0], y[1], y[2], … The underlying state information is hidden Distribution in each state can be of any type

Each state could have its own distribution In practice, distribution of each state is the same, but with different

parameters


Summary: HMM


[ ][ ]( )

[ ]( )( )

[ ] [ ] [ ]

Denote the state at time as .

Initial state is selected according to probability 1 , 1, 2, ,

assume 0, for 0 .

Transition probability depends ONLY on current state:

1 , 1

i

t S t

P S i i S

P S t i t

P S t j S t i S t k

π = = =

= = ≤

+ = = − =

[ ]( ) [ ] [ ]( )

[ ]( ) [ ] [ ]( ) [ ]( ) [ ] [ ]( ) [ ]( )[ ] [ ]( ) [ ]( )

, 2 , 1

Then, the probability of arriving in the next state is

1 1 1 1 1 2 2

1

S t P S t j S t i

P S t j P S t j S t P S t P S t j S t P S t

P S t j S t S P S t S

− = = + = =

+ = = + = = = + + = = =

+ + + = = =

Summary: HMM State Transition


[ ]

[ ]( )[ ]( )

[ ]( )

( ) ( ) ( )( ) ( ) ( )

( ) ( ) ( )

( ) [ ] [ ]( )


1 11 1 2 1

2 2 1 2 2 2, , with 1 .

1 2

From urn example:0.5 0.3 0.20.2 0 0.70.3 0.7 0.1

ij

P S t P P P S

P S t P P P St a P i j P S t i S t j

P S P S P S SP S t S

=

= = + = = =

=

p A

A

( )1p y ( )2p y

( )3p y

0.50.2

0.3

0.20.3

0.7

0.70.1

[ ] [ ][ ]

1 ,

with 0tt t δ⇒ + = +p Ap π

π p

Summary: HMM Input-Output


[ ] [ ]

[ ]( ) [ ] [ ]( ) [ ]( ) [ ] [ ]( ) [ ]

Let denote the number of possible outcomes from all statesLet be the random variable output at time , with outcome Then probability of each possible output is

1 1 2

MY t t y t

P Y t j P Y t j S t P S t P Y t j S t P S t= = = = = + = = ( )[ ] [ ]( ) [ ]( )

2

P Y t j S t S P S t S

=

+ + = = =

Summary: HMM Input-Output


[ ]

[ ]( )[ ]( )

[ ]( )

[ ] [ ]( ) [ ] [ ]( ) [ ] [ ]( )[ ] [ ]( ) [ ] [ ]( ) [ ] [ ]( )

[ ] [ ]( ) [ ] [ ]( ) [ ] [ ]( )[ ] [ ]


1 1 1 2 11

2 2 1 2 2 2, ,

1 2

with ij

P Y t S t P Y t S t P Y t S t SP Y t

P Y t P Y t S t P Y t S t P Y t S t St

P Y t M P Y t M S t P Y t M S t P Y t M S t S

c P Y t i S t

= = = = = = = = = = = = = = = = = = = = =

= = =

q C

( ).From urn example, with 1 (black), 2 (green), 3 (red):

1/ 2 1/ 3 1/ 31/ 3 7 /15 1/ 31/ 6 1/ 5 1/ 3

j

S = = =

=

C0.5

0.2

0.3

0.20.3

0.7

0.70.1

3 black2 green1 red

5 black7 green3 red

2 black2 green2 red

[ ] [ ]n n⇒ =q Cp

State-Space vs. HMM


[ ] [ ] [ ][ ] [ ] [ ]

State-space:1n n u n

n n n

+ = +

= +

x Ax B

y Cx Du[ ] [ ] [ ][ ] [ ]

HMM:1 , with 0nn n

n n

δ+ = +

=

p Ap π π p

q Cp

[ ] [ ] [ ]

[ ] [ ] [ ] [ ]

1

0

1

0

Hence, models dynamics of system by treating the system as

Recall solution for state-space is

1

1

a Markov proce s.

.

s

nn k

kn

n k

kk

n n k

n n k n

+

=

+

=

= − + −

= − + − +

⇒

∑

∑

x A x A Bu

y CA x CA Bu D

A

u

Example: Speech Modeling for Speech Recognition Patterns in speech signal occurring sequentially in

time Each word or sound (phoneme) to be recognized is

represented by an HMM Output is some feature vector that is derived from the

speech data Random variability in the feature vector and the amount of

time each feature is produced is modeled by the HMM Variability in the duration of the word is modeled by the

Markov model Variability in the outputs is modeled by the random

selection from within each state


Example

Given a small vocabulary system with N words There are N HMMs: (Ai, πi, Ci) i denotes a particular state

Training phase Each is trained to represent the parameters for that word

Testing phase Sequence of feature vectors is computed (front end part) The likelihood (probability) that this sequence of feature

vectors was produced by the HMM (Ai, πi, Ci) is computed for each i

HMM that produces the highest probability selects the recognized word


Issues for HMM

Training: How can the parameters (A, π, C) be estimated based

upon observations of the data? In other words, how can we train the parameters of the models

in the pattern recognition problem?

Testing How can we determine how well the observed data fits

the model that has been trained? How can we determine the sequence of states of the

underlying Markov model? I.e. How do we discover the hidden states?


ccfung/courses/2017_2018/mmasp_i/notes/ch01… · examples of mathematical models. linear signal...

Documents