Communication Theory
I. Frigyes
2009-10/II.
http://docs.mht.bme.hu/~frigyes/hirkelm hirkelm01bEnglish
Topics
• (0. Math. introduction: stochastic processes, complex envelope)
• 1. Basics of decision and estimation theory
• 2. Transmission of digital signals over analog channels: noise effects
• 3. Transmission of digital signals over analog channels: dispersion effects
• 4. Transmission of analog signals – analog modulation methods (?)
• 5. Channel characterization: wireless channels, optical fibers
• 6. Basics of digital signal processing: sampling, quantization, signal representation
• 7. Theoretical limits in information transmission
• 8. Basics of coding theory
• 9. Correcting transmission errors: error-correcting coding; adaptive equalization
• 10. Spectral efficiency – efficient digital transmission methods
(0. Stochastic processes, the complex envelope)
Stochastic processes
• Also called random waveforms.
• 3 different meanings:
• As a function of ξ (the realization): a series of infinitely many random variables ordered in time
• As a function of time t: a member of a family of irregularly varying time functions
• As a function of ξ and t: one member of a family of time functions drawn at random
Stochastic processes
• Example:
[Figure: a family of sample functions f(t,ξ1), f(t,ξ2), f(t,ξ3) versus time t; at fixed times t1, t2, t3 the values f(t1,ξ), f(t2,ξ), f(t3,ξ) are random variables.]
Stochastic processes: how to characterize them?
• According to the third definition
• and with some probability distribution:
• as the number of random variables is infinite, with their joint distribution (or density)
• (not only infinite but of continuum cardinality)
• Taking these into account:
Frigyes: Hírkelm 8
Stochastic processes: how to characterize them?
• (Say: density)
• First prob. density of x(t):
  $p_x(x;\,t)$
• Second: joint density at t1, t2:
  $p_{x_1 x_2}(x_1, x_2;\, t_1, t_2)$
• nth: n-fold joint density:
  $p_{x_1 \dots x_n}(x_1, x_2, \dots, x_n;\, t_1, t_2, \dots, t_n)$
• The stochastic process is completely characterized if there is a rule to compose the density of any order (even for n → ∞).
• (We'll see processes depending on 2 parameters)
Stochastic processes: how to characterize them?
• Comment: although, strictly speaking, the process (a function of t and ξ) and one sample function (a function of t belonging to, say, ξ16) are distinguished, we'll not always make this distinction.
Stochastic processes: how to characterize them?
• Example: semi-random binary signal:
• Values: ±1 (P0 = P1 = 0.5)
• Change: only at t = kT
• First density:
  $p_x(x;t) = \tfrac{1}{2}\,\delta(x-1) + \tfrac{1}{2}\,\delta(x+1)$
• Second: if t1 and t2 are in the same time slot,
  $p_{x_1x_2}(x_1,x_2;\,t_1,t_2) = \tfrac{1}{2}\,\delta(x_1-1)\,\delta(x_2-1) + \tfrac{1}{2}\,\delta(x_1+1)\,\delta(x_2+1)$
• otherwise,
  $p_{x_1x_2}(x_1,x_2;\,t_1,t_2) = \tfrac{1}{4}\,\big[\delta(x_1-1)+\delta(x_1+1)\big]\,\big[\delta(x_2-1)+\delta(x_2+1)\big]$
Continuing the example:
[Figure: the second-order density – in the same time slot all probability mass lies on the 45° diagonal at (±1, ±1); in two distinct time slots it is spread equally over all four points (±1, ±1).]
Stochastic processes: the Gaussian process
• A stoch. proc. is Gaussian if its nth density is that of an n-dimensional vector random variable:
  $p_{\mathbf x}(\mathbf X) = \frac{1}{(2\pi)^{n/2}\sqrt{\det \mathbf K}}\,\exp\!\Big[-\tfrac{1}{2}(\mathbf X-\mathbf m)^{T}\mathbf K^{-1}(\mathbf X-\mathbf m)\Big]$
• m is the expected-value vector, K the covariance matrix.
• The nth density can be produced if
  $m_x(t) = \mathrm{E}\,x(t)$ and $K(t_1,t_2) = \mathrm{E}\big\{[x(t_1)-m(t_1)]\,[x(t_2)-m(t_2)]\big\}$
  are given.
Stochastic processes: the Gaussian process
• An interesting property of Gaussian processes (more precisely: of Gaussian variables):
• These can be realizations of one process at different times; for zero-mean jointly Gaussian x, y, z, w:
  $\mathrm{E}[xyzw] = \mathrm{E}[xy]\,\mathrm{E}[zw] + \mathrm{E}[xz]\,\mathrm{E}[yw] + \mathrm{E}[xw]\,\mathrm{E}[yz]$
Stochastic processes: stationary processes
• A process is stationary if it does not change (much) as time passes
• E.g. the semi-random binary signal is (almost) like that
• Phone: transmitting 300–3400 Hz is sufficient (always, for everybody). (What could we do if this didn't hold?)
• etc.
Frigyes: Hírkelm 15
Stochastic processes: stationary processes
• Precise definition of what is almost unchanged:
• A process is stationary (in the strict sense) if, for the distribution function of any order, at any times and for any time shift τ:
  $F_x(x_1,\dots,x_n;\, t_1,\dots,t_n) = F_x(x_1,\dots,x_n;\, t_1+\tau,\dots,t_n+\tau)$
• It is stationary in order n if the first n distributions are stationary
• E.g.: the example seen is first-order stationary
• In general: if stationary in order n, it is also stationary in any order < n
Stochastic processes: stationary processes
• Comment: to prove strict-sense stationarity is difficult
• But: if a Gaussian process is second-order stationary (i.e., in this case: if K(t1,t2) does not change when time is shifted), it is strict-sense (i.e. any-order) stationary. As: if we know K(t1,t2), the nth density can be computed (for any n)
Stochastic processes: stationarity in wide sense
• Wide-sense stationary: if the correlation function (to be defined) is unchanged when time is shifted
• A few definitions: a process is called a Hilbert process if
  $\mathrm{E}\,x^2(t) < \infty$
• (That means: the instantaneous power is finite.)
Stochastic processes: wide sense stationary processes
• (Auto)correlation function of a Hilbert-process:
• The process is wide sense stationary if
• the expected value is time-invariant and
• R depends only on τ=t2-t1 for any time and any τ.
$R(t_1, t_2) \triangleq \mathrm{E}\big[x(t_1)\,x(t_2)\big]$
Stochastic processes: wide sense – strict sense stationary processes
• If a process is strict-sense stationary then also wide-sense
• If at least second order stationary: then also wide sense.
• I.e.:
  $R(t_1,t_2) = \mathrm{E}[x(t_1)\,x(t_2)] = \int\!\!\int X_1 X_2\, p_{x_1x_2}(X_1,X_2;\,t_1,t_2)\,dX_1\,dX_2$
  $= \int\!\!\int X_1 X_2\, p_{x_1x_2}(X_1,X_2;\,t_1+\tau,\,t_2+\tau)\,dX_1\,dX_2 = R(t_1+\tau,\,t_2+\tau)$
• and with τ = −t1:
  $R(t_1,t_2) = R(0,\,t_2-t_1) \triangleq R(t_2-t_1)$
Stochastic processes: wide sense – strict sense stationary processes
• Further: the converse does not hold – a wide-sense stationary process need not be strict-sense stationary in any order
• Exception: the Gaussian process. This one, if wide-sense stationary, is also strict-sense stationary.
Stochastic processes: once again on binary transmission
• As seen: only first order stationary (Ex=0)
• Correlation:
• if t1 and t2 are in the same time slot:
  $R(t_1,t_2) = \mathrm{E}\big[x(t_1)\,x(t_2)\big] = 1$
• if in different ones:
  $R(t_1,t_2) = \mathrm{E}\,x(t_1)\cdot\mathrm{E}\,x(t_2) = 0$
Stochastic processes: once again on binary transmission
• The semi-random binary transmission can be transformed into a random one by introducing a dummy delay e, distributed uniformly in (0, T):
  $y(t) = x(t-e)$
• like x:
  $\mathrm{E}\,y(t) = 0$
Stochastic processes: once again on binary transmission
• Correlation:
• If |t1 − t2| > T (as e ≤ T):
  $R_y(t_1,t_2) = \mathrm{E}\,y(t_1)\cdot\mathrm{E}\,y(t_2) = 0$
• if |t1 − t2| ≤ T:
  $R_y(t_1,t_2) = 1 - \frac{|t_1-t_2|}{T}$
• so
  $R_y(\tau) = 1 - \frac{|\tau|}{T}\ \ \text{for}\ |\tau| \le T; \qquad 0\ \text{otherwise}$
Stochastic processes: once again on binary transmission
• I.e.:
[Figure: the triangular correlation function R_y(τ), rising from 0 at τ = −T to 1 at τ = 0 and falling back to 0 at τ = T.]
Stochastic processes: other type of stationarity
• Given two processes, x and y, these are jointly stationary if their joint distributions are all invariant to any time shift τ.
• Thus a complex process $z(t) = x(t) + j\,y(t)$ is stationary in the strict sense if x and y are jointly stationary.
• A process is periodic (or cyclostationary) if its distributions are invariant to time shifts kT
Stochastic processes: other type of stationarity
• Cross-correlation:
  $R_{x,y}(t_1,t_2) = \mathrm{E}\big[x(t_1)\,y(t_2)\big]$
• Two processes are jointly stationary in the wide sense if their cross-correlation is invariant to any time shift, i.e. depends only on τ = t2 − t1
Stochastic processes: comment on complex processes
• The appropriate definition of correlation for these:
  $R(t_1,t_2) = \mathrm{E}\big[x(t_1)\,x^{*}(t_2)\big]$
• A complex process is stationary in the wide sense if both its real and imaginary parts are wide-sense stationary and they are jointly wide-sense stationary as well
Stochastic processes: continuity
• There are various definitions
• Mean-square continuity:
  $\mathrm{E}\big\{[x(t_2) - x(t)]^2\big\} \to 0 \quad \text{if}\ t_2 \to t$
• That is valid if the correlation function is continuous
Stochastic processes: stochastic integral
• Let x(t) be a stoch. proc. It may be that the Riemann integral
  $s = \int_a^b x(t)\,dt$
  exists for all realizations: then s is a random variable (RV). But if not, we can define an RV converging (e.g. in mean square) to the integral-approximating sum:
Stochastic processes: stochastic integral
$s = \int_a^b x(t)\,dt \quad \text{if} \quad \lim_{\Delta t_i \to 0}\,\mathrm{E}\Big[\Big(s - \sum_{i=1}^{n} x(t_i)\,\Delta t_i\Big)^{2}\Big] = 0$
• For this RV:
  $\mathrm{E}\,s = \int_a^b \mathrm{E}\,x(t)\,dt$
  $\sigma_s^2 = \int_a^b\!\!\int_a^b \big[R(t_1,t_2) - \mathrm{E}\,x(t_1)\,\mathrm{E}\,x(t_2)\big]\,dt_1\,dt_2$
Stochastic processes: stochastic integral - comment
• In σs² the integrand is the (auto)covariance function:
  $C(t_1,t_2) \triangleq \mathrm{E}\big[x(t_1)\,x(t_2)\big] - \mathrm{E}\,x(t_1)\,\mathrm{E}\,x(t_2)$
• This depends only on t1 − t2 = τ if x is stationary (at least in the wide sense)
Stochastic processes: time average
• The integral is needed – among others – to define the time average
• The time average of a process is its DC component; the time average of its square is the mean power
• Definition:
  $\langle x \rangle \triangleq \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} x(t)\,dt$
Stochastic processes: time average
• In general this is a random variable. It would be nice if it were equal to the statistical average. This is really the case if
  $\mathrm{E}\,\langle x \rangle = \mathrm{E}\,x(t) \quad\text{and}\quad \sigma^2_{\langle x\rangle} = 0$
• Similarly we can define the time-average correlation:
  $\langle R(\tau) \rangle \triangleq \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} x(t)\,x(t+\tau)\,dt$
Stochastic processes: time average
• This is in general also an RV, but equal to the correlation if
  $\langle R(\tau)\rangle = R(\tau)$, with $\sigma^2_{\langle R\rangle} = 0$
• If these equalities hold the process is called ergodic
• The process is mean-square ergodic if
  $\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} R_x(\tau)\,d\tau = \big[\mathrm{E}\,x(t)\big]^2$
Stochastic processes: spectral density
• The spectral density of a process is, by definition, the Fourier transform of the correlation function:
  $S_x(\omega) = \int_{-\infty}^{\infty} R_x(\tau)\,e^{-j\omega\tau}\,d\tau$
Stochastic processes: spectral density
• A property: since R(τ) and S(ω) are Fourier-transform pairs,
  $\mathrm{E}\,x^2(t) = R(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_x(\omega)\,d\omega \ge 0$
• Consequently this integral ≥ 0; (we'll see: S(ω) ≥ 0 at every frequency)
Spectral density and linear transformation
• As known, for time functions the output is the convolution with h(t), the impulse response of the filter:
  $y(t) = x(t) * h(t) = \int_{-\infty}^{\infty} x(u)\,h(t-u)\,du$
[Block diagram: x(t) → FILTER h(t) → y(t)]
Spectral density and linear transformation
• Comment: h(t<0) ≡ 0 (why?); and: $h(t) = \mathcal F^{-1}[H(\omega)]$
• It is plausible that the same holds for stochastic processes
• Based on that it can be shown:
  $S_y(\omega) = |H(\omega)|^{2}\,S_x(\omega)$
• (And also $\mathrm{E}\,y(t) = H(0)\,\mathrm{E}\,x(t)$)
Spectral density and linear transformation
• Further: S(ω) ≥ 0 (at all frequencies)
• For if not, there would be a domain (ω1, ω2) where S(ω) < 0; an ideal bandpass filter H(ω) selecting this band would give an output spectrum S_y(ω) whose integral – the output power – is negative, which is impossible
[Figure: S_x(ω) with a negative region, the selecting filter H(ω) and the resulting S_y(ω) (its integral is negative).]
Spectral density and linear transformation
• S(ω) is the spectral density (in rad/s). As, with a narrow ideal bandpass filter H(ω) of bandwidth B_Hz around ω0:
  $S_y(\omega) = S_x(\omega)\,|H(\omega)|^{2}$
  $\text{Power} = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_y(\omega)\,d\omega \approx 2\,B_{\mathrm{Hz}}\,S_x(\omega_0)$
[Figure: narrow bandpass H(ω) around ω0.]
Modulated signals – the complex envelope
• In previous studies we've seen that in radio or optical transmission one parameter of a sinusoidal carrier is influenced (e.g. made proportional to) the modulating signal.
• A general modulated signal:
  $x(t) = \sqrt{2}\,A\,d(t)\cos\big[\omega_c t + \varphi(t)\big]$
Modulated signals – the complex envelope
• Here d(t) and/or φ(t) carries the information – e.g. they are in linear relationship with the modulating signal
• Another description method (quadrature form):
  $x(t) = A\,a(t)\cos\omega_c t - A\,q(t)\sin\omega_c t$
• d, φ, a and q are real time functions – deterministic or realizations of a stoch. proc.
Modulated signals – the complex envelope
• Their relationship:
  $d(t) = \sqrt{\frac{a^2(t)+q^2(t)}{2}}; \qquad \varphi(t) = \arctan\frac{q(t)}{a(t)}$
  $a(t) = \sqrt{2}\,d(t)\cos\varphi(t); \qquad q(t) = \sqrt{2}\,d(t)\sin\varphi(t)$
• As known, x(t) can also be written as:
  $x(t) = A\,\mathrm{Re}\big\{[a(t) + j\,q(t)]\,e^{j\omega_c t}\big\}$
Modulated signals – the complex envelope
• Here a + jq is the complex envelope. Question: when and how to apply it.
• To begin with: the Fourier transform of a real function is conjugate symmetric:
  $X(\omega) \triangleq \mathcal F[x(t)]; \qquad x(t) \in \mathbb R \;\Rightarrow\; X(-\omega) = X^{*}(\omega)$
• But if so, X(ω>0) describes the signal completely: knowing it we can form the ω<0 part and retransform.
Modulated signals – the complex envelope
• Thus instead of X(ω) we can take:
  $\mathring X(\omega) \triangleq X(\omega) + \mathrm{sign}(\omega)\,X(\omega) = X(\omega)\,\big[1 + \mathrm{sign}\,\omega\big]$
• By the way:
  $\mathring X(\omega) = \begin{cases} 2X(\omega), & \omega > 0\\ 0, & \omega < 0 \end{cases}$
• The relevant time function:
  $\mathring x(t) = \mathcal F^{-1}\big[\mathring X(\omega)\big] = x(t) + j\,\big(\mathcal F^{-1}\big[-j\,\mathrm{sign}\,\omega\big] * x(t)\big)$
  ↓ the „Hilbert" filter $-j\,\mathrm{sign}\,\omega$
Modulated signals – the complex envelope
• We can write:
  $\mathring x(t) = x(t) + j\,x(t) * \mathcal F^{-1}\big[-j\,\mathrm{sign}\,\omega\big]$
• The shown inverse Fourier transform is $1/(\pi t)$. So
  $\mathring x(t) = x(t) + \frac{j}{\pi}\int_{-\infty}^{\infty}\frac{x(\tau)}{t-\tau}\,d\tau = x(t) + j\,\mathcal H\{x(t)\} = x(t) + j\,\hat x(t)$
• The imaginary part is the so-called Hilbert transform of x(t)
Modulated signals – the complex envelope
• The function just introduced is the analytic signal assigned to x(t) (it is an analytic function of the complex variable z = t + ju).
• An analytic signal can be assigned to any (baseband or modulated) function; the relationship between the time function and the analytic signal is
  $x(t) = \mathrm{Re}\,\mathring x(t)$
Modulated signals – the complex envelope
• It is applicable to modulated signals: the analytic signal of cos ωct is $e^{j\omega_c t}$; similarly that of −sin ωct is $j\,e^{j\omega_c t}$. So if the quadrature components a(t), q(t) of the modulated signal are
• band-limited and
• their band-limiting frequency is < ωc/2π (narrow-band signal)
• then
$x(t) = a(t)\cos\omega_c t \;\longrightarrow\; \mathring x(t) = a(t)\,e^{j\omega_c t}$
or
$x(t) = -q(t)\sin\omega_c t \;\longrightarrow\; \mathring x(t) = j\,q(t)\,e^{j\omega_c t}$
NB. Modulation is a linear operation in a, q: a frequency displacement.
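(Sketch added here; the signal parameters are made up.) These relations in code: scipy.signal.hilbert returns the analytic signal x̊(t) = x(t) + j·x̂(t), and multiplying by e^{−jωct} recovers the complex envelope a(t) + j·q(t):

```python
import numpy as np
from scipy.signal import hilbert

fs, fc = 1000.0, 100.0                 # sample rate and carrier (assumed)
t = np.arange(0.0, 1.0, 1.0 / fs)
a = 1.0 + 0.5 * np.cos(2 * np.pi * 3 * t)   # slow in-phase component
q = 0.3 * np.sin(2 * np.pi * 5 * t)         # slow quadrature component

x = a * np.cos(2 * np.pi * fc * t) - q * np.sin(2 * np.pi * fc * t)

x_ring = hilbert(x)                            # analytic signal x + j*Hilbert{x}
env = x_ring * np.exp(-2j * np.pi * fc * t)    # complex envelope a + j*q

print("max |Re env - a| =", np.abs(env.real - a)[50:-50].max())
print("max |Im env - q| =", np.abs(env.imag - q)[50:-50].max())
```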
Modulated signals – the complex envelope
• Thus the complex envelope
  $\tilde x(t) = a(t) + j\,q(t)$
  determines the modulated signal uniquely. In the time domain:
  $x(t) = \mathrm{Re}\big[\tilde x(t)\,e^{j\omega_c t}\big] = a(t)\cos\omega_c t - q(t)\sin\omega_c t; \qquad \mathring x(t) = \tilde x(t)\,e^{j\omega_c t}$
• Comment: in accordance with its name, $\tilde x(t)$ can be complex. (X(ω) is not conjugate symmetric around ωc.)
• Comment 2: if the bandwidth B > fc, $\mathring x(t)$ is not analytic, and its real part does not define the modulated signal.
• Comment 3: a and q can be independent signals (QAM) or can be related (FM or PM).
Modulated signals – the complex envelope
• And in the frequency domain? For the analytic signal we saw:
  $\mathring x(t) = \tilde x(t)\,e^{j\omega_c t}$, and so $\tilde X(\omega) = \mathring X(\omega + \omega_c)$
[Figure: the spectra X(ω), X̊(ω) and X̃(ω), the last one shifted down to baseband.]
Modulated signals – the complex envelope
• A linear transformation – a bandpass filter – acts on $\tilde x(t)$ as a lowpass filter.
• If H(ω) is asymmetric, $\tilde y(t)$ is complex – i.e. there is crosstalk between a(t) and q(t)
• (there was no sin component – now there is)
[Figure: X(ω) = F[m(t)cos ωct] passed through H(ω): the spectra X̊(ω), X̃(ω) = H(ω)M(ω) and the output Y(ω), Y̊(ω), Ỹ(ω).]
Modulated signals – the complex envelope, stochastic processes
• Analytic signal and complex envelope are defined for deterministic signals
• It is possible for stochastic processes as well
• No detailed discussion
• One point:
Modulated signals – the complex envelope; stochastic processes
• With $x(t) = a(t)\cos\omega_c t - q(t)\sin\omega_c t$, the correlation $R_x(t,\tau)$ contains terms in $\cos\omega_c\tau$ and $\sin\omega_c\tau$ as well as terms in $\cos 2\omega_c t$ and $\sin 2\omega_c t$
• x(t) is stationary (R_x independent of t) if and only if
  $\mathrm{E}\,a(t) = \mathrm{E}\,q(t) = 0; \qquad R_a = R_q; \qquad R_{aq} = -R_{qa}$
Narrow band (white) noise
• White noise is of course not narrow band
• Usually it can be made narrow band by a (fictive) bandpass filter:
[Figure: the white-noise spectral density S_n(ω) = N0/2 and the fictive bandpass filter H(ω), producing a narrow-band spectrum X(ω) around ±ωc.]
Narrow band (white) noise properties
$n(t) = n_c(t)\cos\omega_c t - n_s(t)\sin\omega_c t = \mathrm{Re}\big[\tilde n(t)\,e^{j\omega_c t}\big]; \qquad \tilde n(t) = n_c(t) + j\,n_s(t)$
$R_c(\tau) = R_s(\tau); \qquad R_{cs}(\tau) = -R_{sc}(\tau)$
$R_{\tilde n}(\tau) = R_c(\tau) + j\,R_{sc}(\tau)$
$S_n(\omega) = \tfrac{1}{2}\big[S_{\tilde n}(\omega - \omega_c) + S_{\tilde n}(-\omega - \omega_c)\big]$
1. Basics of decision theory and estimation theory
Detection-estimation problems in communications
• 1. Digital communication: one of a set of signals known (by the receiver) is transmitted – in the presence of noise
• E.g. (baseband binary communication):
[Block diagram: DIGITAL SOURCE → transmission channel → SINK]
  $s_1(t) = U + n(t); \qquad s_0(t) = n(t)$
• Decide: which was sent?
Detection-estimation problems in communications
• 2. An (otherwise) known signal has unknown parameter(s) (their statistics are known)
• Same block schematic; example: non-coherent FSK:
$s_1(t) = \sqrt{2}\,A\cos(\omega_1 t + \varphi_1) + n(t); \qquad s_2(t) = \sqrt{2}\,A\cos(\omega_2 t + \varphi_2) + n(t)$
$p_{\varphi_1}(\varphi) = p_{\varphi_2}(\varphi) = \begin{cases}\dfrac{1}{2\pi}, & \varphi \in (-\pi,\pi)\\ 0, & \text{otherwise}\end{cases}$
Detection-estimation problems in communications
• Other example: non-coherent FSK, over non-selective Rayleigh fading channel
$s_1(t) = \sqrt{2}\,A\cos(\omega_1 t + \varphi_1) + n(t); \qquad s_2(t) = \sqrt{2}\,A\cos(\omega_2 t + \varphi_2) + n(t)$
$p_A(A) = \frac{A}{\sigma_A^2}\,e^{-A^2/2\sigma_A^2}; \qquad p_{\varphi_{1,2}}(\varphi) = \frac{1}{2\pi}\ \text{for}\ \varphi\in(-\pi,\pi);\ 0\ \text{otherwise}$
Detection-estimation problems in communications
• 3. The signal shape undergoes random changes
• Example: antipodal signal set in very fast fading:
  $s_1(t) = s(t)\,T(t) + n(t); \qquad s_2(t) = -s(t)\,T(t) + n(t)$
[Block diagram: DIGITAL SOURCE → transmission channel with multiplicative T(t) → SINK]
Detection-estimation problems in communications
• 4. Analog radio communications: one parameter of the carrier is proportional to the time-continuous modulating signal.
• E.g.: analog FM; the estimate: m(t)
  $s(t) = \sqrt{2}\,A\cos\Big[\omega_c t + 2\pi F\!\!\int^{t}\! m(\tau)\,d\tau\Big] + n(t)$
• Or: digital transmission over frequency-selective fading. For the decision, h(t) must be known (i.e. estimated):
  $r(t) = \int h(\tau)\,s_i(t-\tau)\,d\tau + n(t); \qquad i = 1,2,\dots,M$
Basics of decision theory
• Simplest example: simple binary transmission; decision is based on N independent samples
• Model:
[Block diagram: SOURCE (H0 or H1) → CHANNEL (only statistics are known) → OBSERVATION SPACE (OS) → DECIDER (decision rule: H0? H1?) → Ĥ]
• Comment: here the hat ˆ has nothing to do with the Hilbert transform
Basics of decision theory
• Two hypotheses (H0 and H1)
• Observation: N samples → the OS is N-dimensional
• Observation vector: rᵀ = (r1, r2, …, rN)
• Decision: which was sent
• Results: 4 possibilities
• 1. H0 sent & Ĥ = H0 (correct)
• 2. H0 sent & Ĥ = H1 (erroneous)
• 3. H1 sent & Ĥ = H1 (correct)
• 4. H1 sent & Ĥ = H0 (erroneous)
Bayes decision
• Bayes decision:
• a.) the a-priori probabilities of sending H0 or H1 are known:
  $\Pr\{H_0\ \text{sent}\} = P_0; \qquad \Pr\{H_1\ \text{sent}\} = P_1$
• b.) each decision has some cost $C_{ik}$ (we decide in favor of i while k was sent)
• c.) it is sure: a false decision is more expensive than a correct one:
  $C_{01} > C_{11}; \qquad C_{10} > C_{00}$
Bayes decision
• d.) decision rule: the average cost (the so-called risk, K) should be minimal:
  $K = C_{11}P_1\Pr\{\hat H = H_1|H_1\} + C_{01}P_1\Pr\{\hat H = H_0|H_1\} + C_{00}P_0\Pr\{\hat H = H_0|H_0\} + C_{10}P_0\Pr\{\hat H = H_1|H_0\}$
[Figure: the domain of r (the OS), partitioned into decision regions (Z1) for "H1" and (Z0) for "H0"; two pdfs, $p_{r|H_1}(R|H_1)$ and $p_{r|H_0}(R|H_0)$, correspond to each point.]
Bayes decision
• Question: how to partition the OS in order to get minimal K? For that, K in detail:
  $K = C_{11}P_1\!\int_{Z_1}\! p_{\mathbf r|H_1}(\mathbf R|H_1)\,d\mathbf R + C_{01}P_1\!\int_{Z_0}\! p_{\mathbf r|H_1}(\mathbf R|H_1)\,d\mathbf R + C_{00}P_0\!\int_{Z_0}\! p_{\mathbf r|H_0}(\mathbf R|H_0)\,d\mathbf R + C_{10}P_0\!\int_{Z_1}\! p_{\mathbf r|H_0}(\mathbf R|H_0)\,d\mathbf R$
• As some decision is always taken: $Z = Z_0 \cup Z_1$ and $Z_0 \cap Z_1 = \emptyset$, so for each pdf $\int_{Z_0}(\cdot)\,d\mathbf R = 1 - \int_{Z_1}(\cdot)\,d\mathbf R$
Bayes decision
• From that:
  $K = P_1 C_{11} + P_0 C_{10} + \int_{Z_0}\Big[P_1(C_{01}-C_{11})\,p_{\mathbf r|H_1}(\mathbf R|H_1) - P_0(C_{10}-C_{00})\,p_{\mathbf r|H_0}(\mathbf R|H_0)\Big]\,d\mathbf R$
• Terms 1 & 2 are constant
• And both bracketed coefficients are > 0
• Thus K is minimal if Z0 contains exactly the points where the integrand is negative, and Z1 those where it is positive
Bayes decision
• And here we decide in favor of H1 (Ĥ = H1):
  $Z_1:\quad P_1(C_{01}-C_{11})\,p_{\mathbf r|H_1}(\mathbf R|H_1) > P_0(C_{10}-C_{00})\,p_{\mathbf r|H_0}(\mathbf R|H_0)$
• decision H0 (Ĥ = H0):
  $Z_0:\quad P_1(C_{01}-C_{11})\,p_{\mathbf r|H_1}(\mathbf R|H_1) < P_0(C_{10}-C_{00})\,p_{\mathbf r|H_0}(\mathbf R|H_0)$
Bayes decision
• It can also be written: decide for H1 if
  $\Lambda(\mathbf R) \triangleq \frac{p_{\mathbf r|H_1}(\mathbf R|H_1)}{p_{\mathbf r|H_0}(\mathbf R|H_0)} > \frac{P_0\,(C_{10}-C_{00})}{P_1\,(C_{01}-C_{11})} \triangleq \eta$
• Otherwise for H0
• Left-hand side: the likelihood ratio, Λ(R)
• Right-hand side: (from a certain aspect) the threshold, η
• Comment: Λ depends only on the realization of r (on: what did we measure?); η only on the a-priori probabilities and costs
Example of Bayesian decision
• H1: constant voltage m + Gaussian noise
• H0: Gaussian noise only (designation: φ(R; m_R, σ²) for the Gaussian pdf)
• Decision: based on N independent samples of r
• At sample #i:
  $p_{r|H_1}(R_i|H_1) = \varphi(R_i;\,m,\,\sigma^2); \qquad p_{r|H_0}(R_i|H_0) = \varphi(R_i;\,0,\,\sigma^2)$
• This results in
  $p_{\mathbf r|H_1}(\mathbf R|H_1) = \prod_{i=1}^{N}\varphi(R_i;\,m,\,\sigma^2); \qquad p_{\mathbf r|H_0}(\mathbf R|H_0) = \prod_{i=1}^{N}\varphi(R_i;\,0,\,\sigma^2)$
Example of Bayesian decision
• The likelihood ratio:
  $\Lambda(\mathbf R) = \frac{\prod_{i=1}^{N}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\big[-\frac{(R_i-m)^2}{2\sigma^2}\big]}{\prod_{i=1}^{N}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\big[-\frac{R_i^2}{2\sigma^2}\big]}$
• its logarithm:
  $\ln\Lambda(\mathbf R) = \frac{m}{\sigma^2}\sum_{i=1}^{N}R_i - \frac{N m^2}{2\sigma^2}$
• resulting in the decision rule:
  $\hat H = H_1\ \text{if}\ \sum_{i=1}^{N}R_i > \frac{\sigma^2}{m}\ln\eta + \frac{Nm}{2}; \qquad \hat H = H_0\ \text{if}\ \sum_{i=1}^{N}R_i < \frac{\sigma^2}{m}\ln\eta + \frac{Nm}{2}$
  (the right-hand side is the threshold)
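(Monte-Carlo sketch added to the transcript; m, σ, N and the costs below are assumed values.) The decision rule above in code – the sufficient statistic ΣRᵢ is compared with the threshold (σ²/m)ln η + Nm/2:

```python
import numpy as np

rng = np.random.default_rng(2)
m, sigma, N = 1.0, 1.0, 4                    # assumed signal level, noise std, samples
P0 = P1 = 0.5
C00 = C11 = 0.0
C01 = C10 = 1.0                              # error-probability cost assignment
eta = P0 * (C10 - C00) / (P1 * (C01 - C11))
thr = sigma**2 / m * np.log(eta) + N * m / 2 # decision threshold for sum(R_i)

trials = 200_000
h1_sent = rng.random(trials) < P1
r = sigma * rng.standard_normal((trials, N)) + np.where(h1_sent, m, 0.0)[:, None]

h1_decided = r.sum(axis=1) > thr             # LRT via the sufficient statistic
print("measured error rate:", np.mean(h1_decided != h1_sent))
```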
Comments to the example
• 1. The threshold contains known quantities only, independent of the measured values
• 2. The result depends only on the sum of the Rᵢ – we need to know only that; the so-called sufficient statistic:
  $l(\mathbf R) = \sum_{i=1}^{N} R_i$
• 2.a As in this example: whatever the OS dimension, l(R) is always one-dimensional,
• i.e. „1 coordinate" – the others are independent of the hypothesis
• Thus the decision process:
[Block diagram: SOURCE (H0, H1) → CHANNEL (only statistics known) → OBSERVATION SPACE (OS) → l(R) → DECISION SPACE (DS) → DECIDER (decision rule) → Ĥ]
Comments to the example
• 3. Special case: C00 = C11 = 0 and C01 = C10 = 1; then
  $K = P_0\!\int_{Z_1} p_{\mathbf r|H_0}(\mathbf R|H_0)\,d\mathbf R + P_1\!\int_{Z_0} p_{\mathbf r|H_1}(\mathbf R|H_1)\,d\mathbf R$
• (i.e. the probability of erroneous decision)
• If P0 = P1 = 0.5, the threshold is Nm/2
Another example, for home
• Similar, but now the signal is not constant but Gaussian noise with variance σ_S²
• I.e. H1: $\prod_i \varphi(R_i;\,0,\,\sigma_S^2+\sigma^2)$
• H0: $\prod_i \varphi(R_i;\,0,\,\sigma^2)$
• Questions: threshold; sufficient statistics
Third example - discrete
• Given two Poisson sources with different expected values; which was sent?
• Remember the Poisson distribution:
  $\Pr\{n\} = \frac{m^n}{n!}\,e^{-m}$
• Hypotheses:
  $H_1:\ \Pr\{n\} = \frac{m_1^n}{n!}\,e^{-m_1}; \qquad H_0:\ \Pr\{n\} = \frac{m_0^n}{n!}\,e^{-m_0}$
Third example - discrete
• Likelihood ratio:
  $\Lambda(n) = \Big(\frac{m_1}{m_0}\Big)^{n}\,e^{-(m_1-m_0)}$
• Decision rule (m1 > m0):
  $\text{if}\ n\,\ln\frac{m_1}{m_0} > \ln\eta + m_1 - m_0:\ \hat H = H_1; \qquad \text{if}\ <:\ \hat H = H_0$
• For the sake of precision: if n is exactly equal to the threshold, decide H0 or H1 at random with Pr = 0.5 each
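(Tiny added sketch with assumed means m0 = 2, m1 = 6 and η = 1.) The rule reduces to comparing the observed count n with a single number:

```python
import math

m0, m1, eta = 2.0, 6.0, 1.0                        # assumed means and threshold
n_star = (math.log(eta) + m1 - m0) / math.log(m1 / m0)
print(f"decide H1 if n > {n_star:.3f}, H0 if n < {n_star:.3f}")
# here n_star ~ 3.64, so counts n >= 4 go to H1 and n <= 3 to H0
```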
Comment
• A possible situation: the a-priori probabilities are not known
• A possible method then: compute the maximal K (as a function of the Pᵢ) and choose the decision rule which minimizes that (the so-called minimax decision)
• (Note: this is not optimal at any particular Pᵢ)
• But we don't deal with this in detail.
Probability of erroneous decision
• For that: compute the relevant integral
• In example 1 (with N = 1): the Gaussian pdfs must be integrated over the hatched domains beyond the threshold
  $d_0 = \frac{\sigma^2}{m}\ln\eta + \frac{m}{2}$
[Figure: the two conditional pdfs φ(R; 0, σ²) and φ(R; m, σ²), the threshold between them and the distances d0, d1 to the two means.]
Probability of erroneous decision
• Thus:
  $P(1|0) = \int_{d_0}^{\infty}\varphi(R;\,0,\sigma^2)\,dR = \tfrac{1}{2}\,\mathrm{erfc}\Big(\frac{d_0}{\sigma\sqrt 2}\Big)$
  $P(0|1) = \int_{-\infty}^{m-d_1}\varphi(R;\,m,\sigma^2)\,dR = \tfrac{1}{2}\,\mathrm{erfc}\Big(\frac{d_1}{\sigma\sqrt 2}\Big)$
  $P_E = P_0\,P(1|0) + P_1\,P(0|1)$
• If ln η = 0: d0 = d1 = m/2 (the threshold is at the point of intersection)
• Comment: with N samples the threshold is $\frac{\sigma^2}{m}\ln\eta + \frac{Nm}{2}$, and σ is replaced by $\sigma\sqrt N$
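(Added check; the same assumed parameters m = σ = 1, N = 1 and ln η = 0 as in the example.) The closed form against a Monte-Carlo estimate:

```python
import math
import numpy as np

m, sigma = 1.0, 1.0                      # assumed values; N = 1, ln(eta) = 0
d0 = d1 = m / 2
PE = 0.5 * 0.5 * math.erfc(d0 / (sigma * math.sqrt(2))) \
   + 0.5 * 0.5 * math.erfc(d1 / (sigma * math.sqrt(2)))

rng = np.random.default_rng(3)
trials = 1_000_000
h1 = rng.random(trials) < 0.5
r = sigma * rng.standard_normal(trials) + np.where(h1, m, 0.0)
PE_mc = np.mean((r > m / 2) != h1)

print(f"formula P_E = {PE:.5f}, Monte-Carlo = {PE_mc:.5f}")
```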
Decision with more than 2 hypotheses
• M possible outcomes (e.g. non-binary digital communication – we'll see later why)
• Like before: each decision has a cost
• Their average is the risk
• With Bayes-decision: this is minimized
• Like before: Observation Space
• decision rule: partitioning of the OS
Decision with more than 2 hypotheses
• Like before, the risk:
  $K = \sum_{i=0}^{M-1}\sum_{j=0}^{M-1} C_{ij}\,P_j \int_{Z_i} p_{\mathbf r|H_j}(\mathbf R|H_j)\,d\mathbf R$
• From that (with M = 3):
  $K = P_0 C_{00} + P_1 C_{11} + P_2 C_{22}$
  $\; + \int_{Z_0}\big[P_1(C_{01}-C_{11})\,p_{\mathbf r|H_1}(\mathbf R|H_1) + P_2(C_{02}-C_{22})\,p_{\mathbf r|H_2}(\mathbf R|H_2)\big]\,d\mathbf R$
  $\; + \int_{Z_1}\big[P_0(C_{10}-C_{00})\,p_{\mathbf r|H_0}(\mathbf R|H_0) + P_2(C_{12}-C_{22})\,p_{\mathbf r|H_2}(\mathbf R|H_2)\big]\,d\mathbf R$
  $\; + \int_{Z_2}\big[P_0(C_{20}-C_{00})\,p_{\mathbf r|H_0}(\mathbf R|H_0) + P_1(C_{21}-C_{11})\,p_{\mathbf r|H_1}(\mathbf R|H_1)\big]\,d\mathbf R$
Decision with more than two hypotheses
• Likelihood-ratio series:
  $\Lambda_1(\mathbf R) \triangleq \frac{p_{\mathbf r|H_1}(\mathbf R|H_1)}{p_{\mathbf r|H_0}(\mathbf R|H_0)}; \qquad \Lambda_2(\mathbf R) \triangleq \frac{p_{\mathbf r|H_2}(\mathbf R|H_2)}{p_{\mathbf r|H_0}(\mathbf R|H_0)}$
• Decision rule(s): three pairwise comparisons of the form
  $P_1(C_{01}-C_{11})\,\Lambda_1(\mathbf R) \;\gtrless\; P_0(C_{10}-C_{00}) + P_2(C_{12}-C_{02})\,\Lambda_2(\mathbf R)$
  deciding „Ĥ = H1 or …" vs. „Ĥ = H0 or …" according to larger or less, with two analogous inequalities for the other hypothesis pairs
Decision with more than two hypotheses (M = 3)
• This defines 3 straight lines (in the 2D decision space)
[Figure: the (Λ1(R), Λ2(R)) plane divided by three straight lines into regions H0, H1, H2.]
Example: special case – error probability
• With $C_{ii} = 0;\ C_{ij} = 1\ (i \neq j)$ the average error probability is minimized
• Then we get:
[Figure: decision regions in the (Λ1, Λ2) plane with boundaries Λ1 = P0/P1, Λ2 = P0/P2 and the line Λ2 = (P1/P2)Λ1.]
The previous, detailed
• With $C_{ii} = 0,\ C_{ij} = 1$ the three inequalities reduce to:
  $P_1\Lambda_1(\mathbf R) \gtrless P_0; \qquad P_2\Lambda_2(\mathbf R) \gtrless P_0; \qquad P_2\Lambda_2(\mathbf R) \gtrless P_1\Lambda_1(\mathbf R)$
  each deciding between the corresponding hypothesis pair („Ĥ = Hᵢ or …" if larger, „Ĥ = Hⱼ or …" if smaller)
[Figure: the same decision regions H0, H1, H2 in the (Λ1, Λ2) plane, with boundaries P0/P1, P0/P2 and Λ2 = (P1/P2)Λ1.]
Example –special case, error probability: a-posteriori prob.
• Very important: based on the preceding we have
  $H_1\ \text{or}\ H_0:\quad P_1\,p_{\mathbf r|H_1}(\mathbf R|H_1) \gtrless P_0\,p_{\mathbf r|H_0}(\mathbf R|H_0)$
  $H_2\ \text{or}\ H_0:\quad P_2\,p_{\mathbf r|H_2}(\mathbf R|H_2) \gtrless P_0\,p_{\mathbf r|H_0}(\mathbf R|H_0)$
  $H_2\ \text{or}\ H_1:\quad P_2\,p_{\mathbf r|H_2}(\mathbf R|H_2) \gtrless P_1\,p_{\mathbf r|H_1}(\mathbf R|H_1)$
• If we divide each by $p_{\mathbf r}(\mathbf R)$ and apply the Bayes theorem
  $\Pr\{a|b\} = \frac{\Pr\{b|a\}\,\Pr\{a\}}{\Pr\{b\}}$
  we get:
  $\Pr\{H_1|\mathbf R\} \gtrless \Pr\{H_0|\mathbf R\}; \qquad \Pr\{H_2|\mathbf R\} \gtrless \Pr\{H_0|\mathbf R\}; \qquad \Pr\{H_2|\mathbf R\} \gtrless \Pr\{H_1|\mathbf R\}$
  (these are a-posteriori probabilities)
Example –special case, error probability: a-posteriori prob.
• I.e. we have to decide on the max. a-posteriori probabilities.
• Rather plausible: probability of correct decision is the highest if we decide on what is the most probable
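(Added sketch; the three-hypothesis Gaussian setup below is hypothetical.) Deciding on the maximal a-posteriori probability means maximizing Pᵢ·p(R|Hᵢ) over i, conveniently done with log-posteriors:

```python
import numpy as np

def map_decide(R, priors, means, sigma):
    """Decide among Gaussian hypotheses H_i: each sample R_j ~ N(means[i], sigma^2)."""
    R = np.atleast_1d(R)
    # log of P_i * p(R|H_i), dropping terms common to all hypotheses
    scores = [np.log(P) - np.sum((R - mu) ** 2) / (2 * sigma**2)
              for P, mu in zip(priors, means)]
    return int(np.argmax(scores))

# hypothetical three-level example with unequal priors
i_hat = map_decide([0.9, 1.1], priors=[0.5, 0.3, 0.2], means=[0.0, 1.0, 2.0], sigma=1.0)
print("decision: H", i_hat)
```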
Bayes theorem (conditional probabilities)
• For discrete variables:
  $\Pr\{a|b\} = \frac{\Pr\{b|a\}\,\Pr\{a\}}{\Pr\{b\}}$
• Continuous:
  $p(a|b) = \frac{p(b|a)\,p(a)}{p(b)}$
• a discrete, b continuous:
  $\Pr\{a|b\} = \frac{p(b|a)\,\Pr\{a\}}{p(b)}$
Comments
• 1. The Observation Space is N-dimensional (N is the number of observations).
• The Decision Space is (M−1)-dimensional (M is the number of hypotheses).
• 2. Explicitly we dealt only with independent samples; the investigation is much more complicated if these are correlated
• 3. We'll see that in digital transmission the case N > 1 is often not very important
Basics of estimation theory: parameter-estimation
• Task is to estimate unknown parameter(s) of an analog or digital signal
• Examples: voltage measurement in noise: $r = a + n$
• digital signal; phase measurement:
  $s_i(t) = \sqrt{2}\,a\cos\Big(\omega_c t + \varphi + i\,\frac{2\pi}{M}\Big),\quad i = 0,1,\dots,M-1; \qquad r(t) = s_i(t) + n(t)$
Basics of estimation theory: parameter-estimation
• Frequency estimation
• synchronizing an inaccurate oscillator
• Power estimation of interfering signals
• interference cancellation (via the antenna or multiuser detection)
• SNR estimation
• etc.
Basics of estimation theory: parameter-estimation
• The parameter can be: • i. a random variable (pdf is assumed to be known)
or• ii. an unknown deterministic value• Model:
[Block diagram: PARAMETER SPACE with pdf p_a(A) (the domain of the estimated parameter) → mapping to the OBSERVATION SPACE (OS) → estimation rule → ESTIMATION SPACE, â(R)]
• i. means: we have some a-priori knowledge
• ii. means: we have no a-priori knowledge about its magnitude
Example – details (estimation 1)
• We want to measure the voltage a
• We know its a-priori pdf p_a(A)
• And that Gaussian noise is added: φ(n; 0, σ_n²)
• The observable parameter is r = a + n
• Mapping of the parameter to the OS:
  $p_{r|a}(R|A) = \frac{1}{\sqrt{2\pi}\,\sigma_n}\exp\Big[-\frac{(R-A)^2}{2\sigma_n^2}\Big]$
Parameter estimation –parameter is a RV
• Similar principle: each estimate has some cost; its average is the risk; we want to minimize that
• Realization of the parameter: a
• Observation vector: R
• Estimated value: â(R)
• In the general case the cost is a two-variable function: C(a, â)
• The error of the estimation is ε = a − â(R)
• Often the cost depends only on that: C = C(ε)
Parameter estimation –parameter is a RV
• Examples:
  $C(\varepsilon) = \varepsilon^2; \qquad C(\varepsilon) = |\varepsilon|; \qquad C(\varepsilon) = \begin{cases}0, & |\varepsilon| \le \Delta/2\\ 1, & |\varepsilon| > \Delta/2\end{cases}$
• The risk is
  $K = \mathrm{E}\,C = \int\!\!\int C\big(A,\hat a(\mathbf R)\big)\,p_{a,\mathbf r}(A,\mathbf R)\,dA\,d\mathbf R$
• The joint pdf can be written:
  $p_{a,\mathbf r}(A,\mathbf R) = p_{a|\mathbf r}(A|\mathbf R)\,p_{\mathbf r}(\mathbf R)$
Parameter estimation –parameter is a RV
• Applying this to the square cost function (subscript ms: mean square):
  $K_{ms} = \int p_{\mathbf r}(\mathbf R)\,d\mathbf R \int \big[A - \hat a(\mathbf R)\big]^2\, p_{a|\mathbf r}(A|\mathbf R)\,dA$
• K = min (i.e. $dK/d\hat a = 0$) where the inner integral is minimal (as the outer factor is i. positive and ii. does not depend on A)
Parameter estimation –parameter is a RV
$\frac{d}{d\hat a}\int \big[A - \hat a\big]^2 p_{a|\mathbf r}(A|\mathbf R)\,dA = -2\int A\,p_{a|\mathbf r}(A|\mathbf R)\,dA + 2\,\hat a\int p_{a|\mathbf r}(A|\mathbf R)\,dA = 0$
• The second integral = 1, thus
  $\hat a_{ms}(\mathbf R) = \int A\,p_{a|\mathbf r}(A|\mathbf R)\,dA$
• i.e. the a-posteriori expected value of a.
• (According to the previous definition: a-posteriori knowledge is what is gained from the measurement/investigation.)
Comment
• Coming back to the risk:
  $K_{ms} = \int p_{\mathbf r}(\mathbf R)\,d\mathbf R \int \big[A - \hat a_{ms}(\mathbf R)\big]^2\, p_{a|\mathbf r}(A|\mathbf R)\,dA$
• The inner integral is now the conditional variance, σ_a²(R). Thus
  $K_{ms} = \int \sigma_a^2(\mathbf R)\,p_{\mathbf r}(\mathbf R)\,d\mathbf R = \mathrm{E}\,\sigma_a^2(\mathbf R)$
Parameter estimation –parameter is a RV
• Another cost function: 0 in a band Δ, elsewhere 1. The risk is now (un: uniform):
  $K_{un} = \int p_{\mathbf r}(\mathbf R)\,d\mathbf R\,\Big[1 - \int_{\hat a - \Delta/2}^{\hat a + \Delta/2} p_{a|\mathbf r}(A|\mathbf R)\,dA\Big]$
• K = min if the result of the estimation is the maximum of the conditional pdf (if Δ is small): maximum a-posteriori – MAP estimation
Parameter estimation –parameter is a RV
• Then the derivative of the log conditional pdf = 0:
  $\hat a = A:\quad \frac{\partial \ln p_{a|\mathbf r}(A|\mathbf R)}{\partial A} = 0$
• (the so-called MAP equation)
• Applying the Bayes theorem
  $p_{a|\mathbf r}(A|\mathbf R) = \frac{p_{\mathbf r|a}(\mathbf R|A)\,p_a(A)}{p_{\mathbf r}(\mathbf R)}$
  the log conditional pdf can be written:
  $\ln p_{a|\mathbf r}(A|\mathbf R) = \ln p_{\mathbf r|a}(\mathbf R|A) + \ln p_a(A) - \ln p_{\mathbf r}(\mathbf R)$
Parameter estimation –parameter is a RV
• The first term is the statistical relationship between A and R
• The second is the a-priori knowledge
• The last term does not depend on A.
• Thus what has to be maximized is
  $l(A) \triangleq \ln p_{\mathbf r|a}(\mathbf R|A) + \ln p_a(A)$
• And so the MAP equation is:
  $\hat a = A:\quad \frac{\partial}{\partial A}\Big[\ln p_{\mathbf r|a}(\mathbf R|A) + \ln p_a(A)\Big] = 0$
Once again
• The Minimum Mean Square Error (MMSE) estimate is the average of the a-posteriori pdf
• The Maximum A-Posteriori (MAP) estimate is the maximum of the a-posteriori pdf
Example (estimate-2)
• Again we have Gaussian a + n, but now N independent samples
• What we need for (any) estimate is the a-posteriori pdf:
  $p_a(A) = \frac{1}{\sqrt{2\pi}\,\sigma_a}\exp\Big(-\frac{A^2}{2\sigma_a^2}\Big); \qquad p_{\mathbf r|a}(\mathbf R|A) = \prod_{i=1}^{N}\frac{1}{\sqrt{2\pi}\,\sigma_n}\exp\Big(-\frac{(R_i-A)^2}{2\sigma_n^2}\Big)$
  $p_{a|\mathbf r}(A|\mathbf R) = \frac{p_{\mathbf r|a}(\mathbf R|A)\,p_a(A)}{p_{\mathbf r}(\mathbf R)}$
Example (estimate-2)
• Note that p_r(R) is constant from the point of view of the conditional pdf, thus its form is irrelevant. Thus
  $p_{a|\mathbf r}(A|\mathbf R) = k(\mathbf R)\,\exp\left\{-\left[\sum_{i=1}^{N}\frac{(R_i - A)^2}{2\sigma_n^2} + \frac{A^2}{2\sigma_a^2}\right]\right\}$
Example (estimate-2)
• This is a Gaussian distribution and we need its expected value. For that the square must be completed in the exponent, resulting in
  $p_{a|\mathbf r}(A|\mathbf R) = k'(\mathbf R)\,\exp\left[-\frac{1}{2\sigma_p^2}\Big(A - \frac{\sigma_a^2}{\sigma_a^2 + \sigma_n^2/N}\cdot\frac{1}{N}\sum_{i=1}^{N}R_i\Big)^{\!2}\right]$
  with
  $\sigma_p^2 = \frac{\sigma_a^2\,\sigma_n^2/N}{\sigma_a^2 + \sigma_n^2/N}$
Example (estimate-2)
• In a Gaussian pdf average = mode, thus
  $\hat a_{ms}(\mathbf R) = \hat a_{map}(\mathbf R) = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_n^2/N}\cdot\frac{1}{N}\sum_{i=1}^{N} R_i$
Example (estimate-3)
• a is now also Gaussian, φ(A; 0, σ_a²), but only s(A), a nonlinear function of it, can be observed (e.g. the phase of a carrier); noise is added:
  $r_i = s_i(A) + n_i$
• The a-posteriori density function:
  $p_{a|\mathbf r}(A|\mathbf R) = k(\mathbf R)\,\exp\left\{-\left[\sum_{i=1}^{N}\frac{\big(R_i - s_i(A)\big)^2}{2\sigma_n^2} + \frac{A^2}{2\sigma_a^2}\right]\right\}$
Example (estimate-3)
• Remember the MAP equation:
  $\hat a = A:\quad \frac{\partial}{\partial A}\Big[\ln p_{\mathbf r|a}(\mathbf R|A) + \ln p_a(A)\Big] = 0$
• Applying that to the preceding:
  $\hat a_{MAP}(\mathbf R) = A:\quad \frac{\sigma_a^2}{\sigma_n^2}\sum_{i=1}^{N}\big[R_i - s_i(A)\big]\,\frac{\partial s_i(A)}{\partial A} = A$
Parameter estimation – the parameter is a real constant
• In that case it is the measurement result which is an RV. E.g. in the case of the square cost function the risk is:
  $K(A) = \int \big[A - \hat a(\mathbf R)\big]^2\, p_{\mathbf r|a}(\mathbf R|A)\,d\mathbf R$
• This would be minimal if â(R) = A. But this makes no sense: that is just the value we want to estimate.
• In this case – i.e. if we have no a-priori knowledge – the method can be:
• we search for a function of the observations – an estimator – which is "good" (has average close to A and low variance)
Közbevetve: a tételek eddig
• 1. Sztohasztikus folyamatok: főként a fogalmak definiciója
• (sztoh. foly.; val. sűrűségek-eloszlások, erős stacioanaritás; korrelációs fv. gyenge stacionaritás, időátlag, ergodicitás spektrális sűrűség, lin. transzformáció)
• 2. Modulált jelek – komplex burkoló• (mod. jelek leírása, időfüggv., analitikus függv.
(frekv-idő) komplex burkoló, egyikből a másik, szűrő)
Frigyes: Hírkelm 112
Közbevetve: a tételek eddig/2
• 3. A döntéselmélet alapjai• (Milyen feladatok; költség, kockázat, a-priori val,
Bayes-f. döntés; a megfigyelési tér opt. felosztása, küszöb, elégséges statisztika, M hipotézis (nem kell rész-letezni, csak az eredmény), a-post. val.)
• 4. A becsléselmélet alapjai• (Val.-vált paraméter, költségfüggv. min. ms.
max. a-post.; determ. param, likelyhood fv. ML, MVU becslés, torzítatlan; Cramér-Rao; hatékony)
Frigyes: Hírkelm 113
Parameter estimation – the parameter is a real constant – criteria for the estimator
• Average of the – somehow chosen – estimator:
  $\mathrm{E}\,\hat a(\mathbf R) = \int \hat a(\mathbf R)\,p_{\mathbf r|a}(\mathbf R|A)\,d\mathbf R = \begin{cases} A & \\ A + B & \\ A + B(A) & \end{cases}$
• If this is = A: unbiased estimate; the operation of estimation amounts to searching for the average
• If the bias B is constant: it can be subtracted from the average (e.g. in optics: background radiation)
• If B = f(A): biased estimate
Parameter estimation – the parameter is a real constant – criteria for the estimator
• Error variance:
  $\sigma_{\hat a}^2 = \mathrm{E}\big\{\big[\hat a(\mathbf R) - A - B(A)\big]^2\big\}$
• Best would be: unbiased and low variance
• more precisely: unbiased and minimal variance for any A (MVU)
• Such an estimator does or does not exist
• An often applied estimator: maximum likelihood
• Likelihood function: $p_{\mathbf r|a}(\mathbf R|A)$ as a function of A
• The estimator of A is then the maximum of the likelihood function
A few details
[Figure: two sampling pdfs of an estimator Θ̂; the narrower one is certainly better than the other: the variance of the measuring result is lower.]
Max. likelihood (ML) estimation
• Necessary condition for the maximum of the log likelihood:
  $\hat a = A:\quad \frac{\partial \ln p_{\mathbf r|a}(\mathbf R|A)}{\partial A} = 0$
• Remember: in the case of an RV parameter, the MAP estimate:
  $\hat a = A:\quad \frac{\partial}{\partial A}\Big[\ln p_{\mathbf r|a}(\mathbf R|A) + \ln p_a(A)\Big] = 0$
• Thus: ML is the same without a-priori knowledge
Max. likelihood (ML) estimation
• For any unbiased estimate: the variance cannot be lower than the Cramér-Rao lower bound (CRLB):
  $\sigma_{\hat a}^2 \ge \left\{\mathrm{E}\!\left[\left(\frac{\partial \ln p_{\mathbf r|a}(\mathbf R|A)}{\partial A}\right)^{\!2}\right]\right\}^{-1}$
• or in an other form:
  $\sigma_{\hat a}^2 \ge \left\{-\,\mathrm{E}\!\left[\frac{\partial^2 \ln p_{\mathbf r|a}(\mathbf R|A)}{\partial A^2}\right]\right\}^{-1}$
Max. likelihood (ML) estimation
• Proof of the CRLB: via the Schwartz inequality
• If the estimate is equal to the CRLB: efficient estimate.
[Figure: example of an existing MVU but no (or unknown) efficient estimate: var θ̂₁ > CRLB.]
Example 1 (estimation, nonrandom)
• Voltage + noise, but the voltage is a real, nonrandom constant A:
  $r_i = A + n_i,\ i = 1,2,\dots,N; \qquad p_n(n) = \varphi(n;\,0,\,\sigma_n^2)$
• Maximum likelihood estimation:
  $\frac{\partial \ln p_{\mathbf r|A}(\mathbf R|A)}{\partial A} = \frac{1}{\sigma_n^2}\sum_{i=1}^{N}(R_i - A) = 0 \quad\Longrightarrow\quad \hat a_{ML}(\mathbf R) = \frac{1}{N}\sum_{i=1}^{N} R_i$
Example 1 (estimation, nonrandom)
• Is it biased?
  $\mathrm{E}\,\hat a_{ML}(\mathbf R) = \frac{1}{N}\sum_{i=1}^{N}\mathrm{E}\,R_i = \frac{1}{N}\,N A = A$
• The expected value of â is the (a-priori unknown) true value; i.e. it is unbiased
Example 2 (estimation, nonrandom): phase of a sinusoid
• What can be measured is a function of the quantity to be estimated:
  $r_k = \sqrt{2}\,A\cos(\omega t_k + \theta) + n_k$
• (Independent samples are taken at equal time intervals)
• The likelihood function $p_{\mathbf r|\theta}(\mathbf R|\theta)$ is to be maximized:
Example 2 (estimation, nonrandom): phase of a sinusoid
• This is maximal if the function below is minimal:
  $\sum_{k}\big[R_k - \sqrt{2}\,A\cos(\omega t_k + \theta)\big]^2$
• Setting its derivative with respect to θ equal to 0:
  $\sum_{k} R_k \sin(\omega t_k + \theta) = \frac{A}{\sqrt 2}\sum_{k}\sin\big(2(\omega t_k + \theta)\big)$
• The right-hand side tends to 0 for large N
Example 2 (estimation, nonrandom): phase of a sinusoid
• And then finally:
  $\hat\theta_{ML}(\mathbf R) = -\arctan\frac{\sum_k R_k \sin\omega t_k}{\sum_k R_k \cos\omega t_k}$
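(Added sketch; the closed form above is the standard large-N ML phase estimate, and the amplitude, noise level and sampling grid below are assumptions.) A quick numerical trial:

```python
import numpy as np

rng = np.random.default_rng(6)
A, theta, sigma_n, N = 1.0, 0.7, 0.5, 200    # assumed amplitude, phase, noise, N
w = 2 * np.pi * 0.05                          # normalized frequency (assumed)
k = np.arange(N)

R = A * np.cos(w * k + theta) + sigma_n * rng.standard_normal(N)

theta_hat = -np.arctan2(np.sum(R * np.sin(w * k)), np.sum(R * np.cos(w * k)))
print(f"true theta = {theta:.3f}, ML estimate = {theta_hat:.3f}")
```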
2. Transmission of digital signals over analog channels: effect of
noise
Introductory comments
• The theory of digital transmission is (at least partly) an application of decision theory
• Definition of digital signals/transmission:
• a finite number of signal shapes (M)
• each has the same finite duration (T)
• the receiver knows the signal shapes a priori (they are stored)
• So the task of the receiver is hypothesis testing.
Introductory comments – degrading effects
[Block diagram: s(t) → NONLINEAR AMPLIFIER → BANDPASS FILTER (ωc) → FADING CHANNEL + n(t) → BANDPASS FILTER → DECISION MAKER; co-channel interference (CCI) z0(t) at ωc and adjacent-channel interference (ACI) z1(t), z2(t) at ω1, ω2 are added along the way.]
Introductory comments
• Quality parameter: error probability
• (I.e. the costs are: $C_{ii} = 0;\ C_{ik} = 1;\ i,k = 1,2,\dots,M$)
• Erroneous decision may be caused by:
• additive noise
• linear distortion
• nonlinear distortion
• additive interference (CCI, ACI)
• false knowledge of a parameter, e.g. a synchronization error
Introductory comments
• Often it is not one signal whose error probability is of interest but a group of signals – e.g. a frame.
• (A second quality parameter: erroneous recognition of T – the jitter.)
Transmission of single signals in additive Gaussian noise
• Among the many sources of error we now regard only this one
• Model to be investigated:
[Block diagram: SOURCE {mᵢ}, Pᵢ → SIGNAL GENERATOR sᵢ(t) → + n(t) → r(t) = sᵢ(t) + n(t) → DECISION MAKER (with TIMING T) → m̂ → SINK]
Transmission of single signals in additive Gaussian noise
• Specifications:
• the a-priori probabilities Pᵢ are known
• the support of the real time functions sᵢ(t) is (0, T)
• their energy E (the square integral of the time functions) is finite
• the relationship $m_i \leftrightarrow s_i(t)$ is mutual and unique (i.e. there is no error in the transmitter)