

Mathematics and Information Theory for Engineers

Lecture

Sándor Baran

Academic year 2018/19, 2. semester


Literature

Brian Davies: Integral Transforms and Their Applications. Springer, 2002.

John J. D'Azzo, Constantine H. Houpis, Stuart N. Sheldon: Linear Control System Analysis and Design with Matlab. Marcel Dekker, New York, 2003.

Thomas M. Cover and Joy A. Thomas: Elements of Information Theory. Wiley, 2006.

Roberto Togneri and Christopher J. S. de Silva: Fundamentals of Information Theory and Coding Design. Chapman & Hall/CRC, 2006.

Results, information: arato.inf.unideb.hu/baran.sandor/miscen.html


Contents

1 Matrix calculus
2 Differential calculus of multivariable functions
3 Numerical solution of optimization problems
4 Integral calculus of multivariable functions
5 Laplace transform and its applications
6 Fourier transform and its properties
7 Digital signals, z-transform
8 Fundamentals of source coding, uniquely decodable and prefix codes
9 Entropy and its properties. Block codes
10 Universal source coding. Lempel-Ziv algorithms
11 Quantization, sampling
12 Transform coding. DPCM, Jayant quantizer, delta modulation, predictors
13 Audio and speech compression
14 Image and video compression


Matrices

Definition. A rectangular array

A = [a11 a12 ··· a1n; a21 a22 ··· a2n; … ; am1 am2 ··· amn]

of real or complex numbers arranged in m rows and n columns is called an m × n real or complex matrix, respectively. If n = m, then A is an n × n square matrix.

Basic matrix operations:
addition and scalar (element of R or C) multiplication;
matrix multiplication;
transposition.


Special matrices

Definition. A matrix A is symmetric if A⊤ = A.

Definition. An m × n matrix A = (aij) is called diagonal if all entries outside the main diagonal are zero, that is aij = 0 if i ≠ j. Notation for n × n square matrices: A = diag(a11, a22, . . . , ann).
The identity matrix of size n is the n × n diagonal square matrix in which all entries of the main diagonal are equal to 1. Notation: In.

Definition. An m × n real matrix A is orthogonal if A⊤A = In.
Remark. The orthogonality of a matrix A means that its column vectors a1, a2, . . . , an are orthonormal, that is

ai⊤aj = 1 if i = j and ai⊤aj = 0 if i ≠ j,   i, j = 1, 2, . . . , n.

Definition. The powers of an n × n square matrix A are defined as: A^0 := In, A^1 := A, A^n := A^(n−1)A, n ∈ N.


Eigenvalues

Definitions. Let A be an n × n square matrix. A scalar λ and a vector x ≠ 0 satisfying

Ax = λx

are called an eigenvalue and a corresponding eigenvector of A, respectively.

Equivalent formulation: Ax = λx ⇐⇒ (A − λIn)x = 0.

Remark. The system of homogeneous linear equations (A − λIn)x = 0 has a non-trivial solution (x ≠ 0) if and only if

det(A − λIn) = 0.

Definition. The polynomial of degree n defined by

p(λ) := det(A − λIn)

is called the characteristic polynomial of A.


Eigenvectors

An n × n square matrix possesses n (not necessarily distinct) eigenvalues.

The eigenvector x = (x1, x2, . . . , xn)⊤ corresponding to the eigenvalue λ of a matrix A = (aij) is the solution of the homogeneous system of linear equations

[a11 − λ  a12  ···  a1n; a21  a22 − λ  ···  a2n; … ; an1  an2  ···  ann − λ] [x1; x2; … ; xn] = [0; 0; … ; 0].

Remark. If x solves the above system, so does any non-zero multiple cx, that is, the eigenvectors are not unique.

Unit eigenvector: ∥x∥2 := √(x⊤x) = √(|x1|^2 + |x2|^2 + ··· + |xn|^2) = 1.

If x ∈ R^n then ∥x∥2 := √(x⊤x) = √(x1^2 + x2^2 + ··· + xn^2) = 1.


Example

A = [−1 −1 1; −4 2 4; −1 1 5],   A − λI3 = [−1−λ −1 1; −4 2−λ 4; −1 1 5−λ].

Characteristic polynomial:

p(λ) = det(A − λI3) = −λ^3 + 6λ^2 + 4λ − 24.

Eigenvalues (roots of p(λ)): λ1 = 6, λ2 = −2, λ3 = 2.

The eigenvector corresponding to the eigenvalue λ1 = 6 solves (A − 6I3)x = 0.

(A − 6I3)x = [−7 −1 1; −4 −4 4; −1 1 −1][x1; x2; x3] = [0; 0; 0]  ⇐⇒  [7 1 −1; 0 1 −1; 0 0 0][x1; x2; x3] = [0; 0; 0].

The solution is parametric: x1 = 0, x2 = t, x3 = t, with 0 ≠ t ∈ R arbitrary.

The unit eigenvector corresponding to λ1 = 6: x1 = (0, 1/√2, 1/√2)⊤.


MATLAB solution

>> A=[-1 -1 1;-4 2 4;-1 1 5];
>> [V L]=eig(A)

V =
    0.0000    0.7071    0.4082
    0.7071    0.7071   -0.8165
    0.7071    0.0000    0.4082

L =
    6.0000         0         0
         0   -2.0000         0
         0         0    2.0000

Eigenvalues: λ1 = 6, λ2 = −2, λ3 = 2.

Unit eigenvectors, respectively:

x1 = (0, 1/√2, 1/√2)⊤ = (0.0000, 0.7071, 0.7071)⊤,
x2 = (1/√2, 1/√2, 0)⊤ = (0.7071, 0.7071, 0.0000)⊤,
x3 = (1/√6, −2/√6, 1/√6)⊤ = (0.4082, −0.8165, 0.4082)⊤.

Further examples

B = [3 −1 2; 3 −1 6; −2 2 −2],   C = [−1 −1 1; 4 2 4; −1 1 5].


Example, repeated eigenvalues

>> B=[3 -1 2;3 -1 6;-2 2 -2];
>> [V L]=eig(B)

V =
    0.8890   -0.2673    0.0730
    0.3810   -0.8018    0.9062
   -0.2540    0.5345    0.4166

L =
     2     0     0
     0    -4     0
     0     0     2

Eigenvalues: λ1 = λ3 = 2, λ2 = −4.

General form of eigenvectors:

x1 = x3 = (s − 2t, s, t)⊤,   x2 = (u, 3u, −2u)⊤,

where s, t, u ∈ R, u ≠ 0, s^2 + t^2 ≠ 0.

x1 and x3 span a two-dimensional subspace. One can take any basis.

Unit eigenvectors corresponding to the cases s = 0 and t = 0, respectively:

x1 = (1/√5)(−2, 0, 1)⊤,   x2 = (1/√14)(1, 3, −2)⊤,   x3 = (1/√2)(1, 1, 0)⊤.


Example, complex eigenvalues

>> C=[-1 -1 1;4 2 4;-1 1 5];
>> [V L]=eig(C)

V =
   0.2132 - 0.4264i   0.2132 + 0.4264i   0.0000 + 0.0000i
  -0.8528 + 0.0000i  -0.8528 + 0.0000i   0.7071 + 0.0000i
   0.2132 - 0.0000i   0.2132 + 0.0000i   0.7071 + 0.0000i

L =
   0.0000 + 2.0000i   0.0000 + 0.0000i   0.0000 + 0.0000i
   0.0000 + 0.0000i   0.0000 - 2.0000i   0.0000 + 0.0000i
   0.0000 + 0.0000i   0.0000 + 0.0000i   6.0000 + 0.0000i

Eigenvalues: λ1 = 2i, λ2 = −2i, λ3 = 6.

Unit eigenvectors:

x1 = (1/√22)(1 − 2i, −4, 1)⊤,   x2 = (1/√22)(1 + 2i, −4, 1)⊤,   x3 = (1/√2)(0, 1, 1)⊤.


Properties of eigenvalues

Theorem. A symmetric n × n matrix possesses n real eigenvalues.

Definition. The trace of an n × n square matrix A = (aij) is defined as

tr(A) = a11 + a22 + ··· + ann.

Theorem. Let λ1, λ2, . . . , λn be the eigenvalues of an n × n square matrix A. Then

tr(A) = λ1 + λ2 + ··· + λn = Σ_{k=1}^n λk;
det(A) = λ1 · λ2 · ··· · λn = Π_{k=1}^n λk.

Corollary. A square matrix A is singular, that is det(A) = 0, if and only if 0 is an eigenvalue of A.

Theorem. Let λ1, λ2, . . . , λn be the eigenvalues of an n × n square matrix A. Then

the eigenvalues of A^k are λ1^k, λ2^k, . . . , λn^k, k ∈ N;
if A is regular, then the eigenvalues of A^(−1) are 1/λ1, 1/λ2, . . . , 1/λn.
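These identities are easy to verify numerically; a minimal MATLAB sketch using the matrix A of the earlier eigenvalue example (any square matrix would do):

>> A=[-1 -1 1;-4 2 4;-1 1 5];
>> lam=eig(A);              % eigenvalues 6, -2, 2
>> [trace(A) sum(lam)]      % both equal 6
>> [det(A) prod(lam)]       % both equal -24
>> eig(A^2)'                % 36, 4, 4: the squared eigenvalues (in some order)
>> 1./lam'                  % eigenvalues of inv(A), since A is regular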


Positive definite and positive semidefinite matrices

Definition. An n × n real matrix A is positive semidefinite if for any vector x ∈ R^n

x⊤Ax ≥ 0.

A matrix A is positive definite if for any non-zero vector 0 ≠ x ∈ R^n

x⊤Ax > 0.

Theorem. For an n × n symmetric real matrix A the following statements are equivalent:

A is positive definite;
all principal minors of A are positive, that is ∆k := det(Ak) > 0, k = 1, 2, . . . , n, where Ak is the submatrix of A obtained by taking the upper left-hand corner k × k submatrix of A;
the eigenvalues of A are positive.

Remark. The eigenvalues of a symmetric positive semidefinite matrix are non-negative.


Negative definite and negative semidefinite matrices

Definition. An n × n real symmetric matrix A is negative definite or negative semidefinite if all of its eigenvalues are negative or non-positive, respectively.

Theorem. Let A be an n × n symmetric real matrix and denote by ∆k the kth principal minor of A, k = 1, 2, . . . , n.

A is negative definite if and only if (−1)^k ∆k > 0, k = 1, 2, . . . , n.
If (−1)^k ∆k > 0, k = 1, 2, . . . , n − 1, and ∆n = 0, then A is negative semidefinite.
If ∆k > 0, k = 1, 2, . . . , n − 1, and ∆n = 0, then A is positive semidefinite.

Definition. A symmetric real matrix is indefinite if it is neither positive, nor negative semidefinite (so it cannot be positive or negative definite either).

Theorem. For any real matrix A, the matrix A⊤A is positive semidefinite. A⊤A is positive definite if and only if the columns of A are linearly independent.


Matrix polynomials

Definition. Let

p(x) = α0 + α1x + ··· + αk x^k

be a real or complex polynomial and A be an n × n matrix. Then the value of p(x) at A is defined as

p(A) := α0 In + α1 A + ··· + αk A^k.

Cayley-Hamilton theorem. Let A be an n × n matrix and p(λ) be the characteristic polynomial of A. Then

p(A) = 0n,

where 0n is the n × n matrix with zero entries.


Matrix power series

Definition. Let

f(x) = Σ_{k=0}^∞ αk x^k

be a real or complex power series and A be an n × n matrix. Then

f(A) := Σ_{k=0}^∞ αk A^k,

given it converges.

Examples. Let A be an n × n matrix.

1. exp(x) = e^x = Σ_{k=0}^∞ x^k / k!,   that is exp(A) = Σ_{k=0}^∞ A^k / k!.

2. cos(x) = Σ_{k=0}^∞ (−1)^k x^(2k) / (2k)!,   that is cos(A) = Σ_{k=0}^∞ (−1)^k A^(2k) / (2k)!.


Evaluation of matrix power series

A: an n × n matrix with characteristic polynomial p(x).
f(x): an arbitrary polynomial or power series.

Aim: give the value of f(A) in a closed form.

Division with remainder:

f(x) = g(x) · p(x) + r(x),   where deg[r(x)] ≤ n − 1.

Cayley-Hamilton theorem: p(A) = 0n, that is f(A) = r(A).

It suffices to determine the coefficients of

r(x) = β0 + β1x + ··· + βn−1 x^(n−1).


Single eigenvalues

Assume that all roots of p(x) (eigenvalues of A) are single, that is

p(x) = (−1)^n (x − λ1)(x − λ2) ··· (x − λn).

p(λi) = 0, so f(λi) = g(λi)p(λi) + r(λi) = r(λi), i = 1, 2, . . . , n.

To determine the coefficients of

r(x) = β0 + β1x + ··· + βn−1 x^(n−1)

one has to solve the system of linear equations

f(λi) = β0 + β1λi + ··· + βn−1 λi^(n−1),   i = 1, 2, . . . , n.
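In matrix form these equations form a Vandermonde system in the coefficients βj. A minimal MATLAB sketch for f = exp and the 3 × 3 matrix A of the next example (the variable names are only illustrative):

>> A=[-1 -1 1;-4 2 4;-1 1 5];
>> lam=[6; -2; 2];                    % the (distinct) eigenvalues of A
>> V=[ones(3,1) lam lam.^2];          % rows: 1, lambda_i, lambda_i^2
>> beta=V\exp(lam);                   % solve f(lambda_i) = r(lambda_i)
>> fA=beta(1)*eye(3)+beta(2)*A+beta(3)*A^2;
>> norm(fA-expm(A))                   % agrees with expm(A) up to rounding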


Example

Find f(A) := exp(A) for

A = [−1 −1 1; −4 2 4; −1 1 5],   that is A^2 = [4 0 0; −8 12 24; −8 8 28].

Characteristic polynomial: p(x) = −x^3 + 6x^2 + 4x − 24 = −(x − 6)(x + 2)(x − 2).

Eigenvalues: λ1 = 6, λ2 = −2, λ3 = 2.

The form of the remainder polynomial r(x): r(x) := β0 + β1x + β2x^2.

System of equations to be solved:

e^6 = β0 + 6β1 + 36β2;   e^(−2) = β0 − 2β1 + 4β2;   e^2 = β0 + 2β1 + 4β2.

Solution:

β0 = (−e^6 + 6e^2 + 3e^(−2))/8,   β1 = (e^2 − e^(−2))/4,   β2 = (e^6 − 2e^2 + e^(−2))/32.

exp(A) = (1/4) [ e^2+3e^(−2)   −e^2+e^(−2)   e^2−e^(−2);
                 −e^6−2e^2+3e^(−2)   e^6+2e^2+e^(−2)   3e^6−2e^2−e^(−2);
                 −e^6+e^2   e^6−e^2   3e^6+e^2 ].


Repeated eigenvalues

Assume that the roots of p(x) (eigenvalues of A) are:

λ1: ℓ1-fold, λ2: ℓ2-fold, . . . , λk: ℓk-fold,   ℓ1 + ℓ2 + ··· + ℓk = n.

Characteristic polynomial:

p(x) = (−1)^n (x − λ1)^ℓ1 (x − λ2)^ℓ2 ··· (x − λk)^ℓk.

p(λi) = p′(λi) = p′′(λi) = ··· = p^(ℓi−1)(λi) = 0, i = 1, 2, . . . , k, so

f(λi) = g(λi)p(λi) + r(λi) = r(λi);
f′(λi) = g′(λi)p(λi) + g(λi)p′(λi) + r′(λi) = r′(λi);
f′′(λi) = g′′(λi)p(λi) + 2g′(λi)p′(λi) + g(λi)p′′(λi) + r′′(λi) = r′′(λi);
...
f^(ℓi−1)(λi) = ··· = r^(ℓi−1)(λi),   i = 1, 2, . . . , k.

The system of n equations in n variables to be solved:

f^(j)(λi) = r^(j)(λi),   i = 1, 2, . . . , k,   j = 0, 1, . . . , ℓi − 1.


Example

Find f(B) := exp(B) for

B = [3 −1 2; 3 −1 6; −2 2 −2],   that is B^2 = [2 2 −4; −6 10 −12; 4 −4 12].

Characteristic polynomial: p(x) = −x^3 + 12x − 16 = −(x − 2)^2 (x + 4).

Eigenvalues: λ1 = λ2 = 2, λ3 = −4.

The form of the remainder polynomial r(x): r(x) := β0 + β1x + β2x^2;   r′(x) = β1 + 2β2x.

System of equations to be solved:

e^2 = β0 + 2β1 + 4β2;   e^2 = β1 + 4β2;   e^(−4) = β0 − 4β1 + 16β2.

Solution:

β0 = (−4e^2 + e^(−4))/9,   β1 = (4e^2 − e^(−4))/9,   β2 = (5e^2 + e^(−4))/36.

exp(B) = [ (7e^2−e^(−4))/6   (−e^2+e^(−4))/6   (e^2−e^(−4))/3;
           (e^2−e^(−4))/2    (e^2+e^(−4))/2    e^2−e^(−4);
           (−e^2+e^(−4))/3   (e^2−e^(−4))/3    (e^2+2e^(−4))/3 ].


MATLAB solution

A = [−1 −1 1; −4 2 4; −1 1 5],   B = [3 −1 2; 3 −1 6; −2 2 −2].

e^A = (1/4) [ e^2+3e^(−2)   −e^2+e^(−2)   e^2−e^(−2);
              −e^6−2e^2+3e^(−2)   e^6+2e^2+e^(−2)   3e^6−2e^2−e^(−2);
              −e^6+e^2   e^6−e^2   3e^6+e^2 ]
    = [ 1.9488   −1.8134   1.8134;
        −104.4502   104.5856   298.8432;
        −99.0099   99.0099   304.4189 ].

e^B = [ (7e^2−e^(−4))/6   (−e^2+e^(−4))/6   (e^2−e^(−4))/3;
        (e^2−e^(−4))/2    (e^2+e^(−4))/2    e^2−e^(−4);
        (−e^2+e^(−4))/3   (e^2−e^(−4))/3    (e^2+2e^(−4))/3 ]
    = [ 8.6175   −1.2285   2.4569;
        3.6854   3.7037   7.3707;
        −2.4569   2.4569   2.4752 ].

>> A=[-1 -1 1;-4 2 4;-1 1 5];
>> expA=expm(A)

expA =
    1.9488   -1.8134    1.8134
 -104.4502  104.5856  298.8432
  -99.0099   99.0099  304.4189

>> B=[3 -1 2;3 -1 6;-2 2 -2];
>> expB=expm(B)

expB =
    8.6175   -1.2285    2.4569
    3.6854    3.7037    7.3707
   -2.4569    2.4569    2.4752


Singular-value decomposition of matrices

Theorem. All m × n real matrices A can be decomposed as

A = UΣV⊤,

called the singular-value decomposition (SVD), where U and V are m × m and n × n orthogonal matrices, respectively, and Σ is an m × n diagonal matrix with real diagonal elements σ1 ≥ σ2 ≥ ··· ≥ σ_min{m,n} ≥ 0 called the singular values of A.

Remark. The number of positive singular values coincides with the rank r of A. In this case the singular-value decomposition equals

A = Σ_{i=1}^r σi ui vi⊤,

where ui and vi are the ith columns of the matrices U and V, respectively.

Application:
image processing: compression, noise reduction (deblurring);
digital signal processing: noise reduction.
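The compression idea can be tried directly on the MATLAB svd output: keeping only the k largest singular values in the sum above gives the best rank-k approximation. A sketch (the test matrix X and the rank k are arbitrary placeholders):

>> X=magic(6); k=2;
>> [U,S,V]=svd(X);
>> Xk=U(:,1:k)*S(1:k,1:k)*V(:,1:k)';  % rank-k truncation of the SVD
>> norm(X-Xk)                         % 2-norm error = (k+1)-st singular value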


Spectral decomposition of symmetric matrices

Theorem. Let A be an n × n symmetric real matrix with eigenvalues λ1 ≥ λ2 ≥ ··· ≥ λn and corresponding orthogonal unit eigenvectors q1, q2, . . . , qn. The spectral decomposition of A can be written as

A = Σ_{i=1}^n λi qi qi⊤.

The decomposition can be restated in matrix form

A = QΛQ⊤,

where Q is an orthogonal matrix having columns q1, q2, . . . , qn, whereas Λ = diag(λ1, λ2, . . . , λn).

Remark. For symmetric positive definite matrices the spectral decomposition is identical to the singular-value decomposition.


MATLAB function svd

>> A=[1 2 3 4;5 6 7 8];

Full decomposition

>> [U,S,V]=svd(A)

U =
   -0.3762   -0.9266
   -0.9266    0.3762

S =
   14.2274         0         0         0
         0    1.2573         0         0

V =
   -0.3521    0.7590   -0.4001   -0.3741
   -0.4436    0.3212    0.2546    0.7970
   -0.5352   -0.1165    0.6910   -0.4717
   -0.6268   -0.5542   -0.5455    0.0488

Parsimonious decomposition

>> [U,S,V]=svd(A,'econ')

U =
   -0.3762   -0.9266
   -0.9266    0.3762

S =
   14.2274         0
         0    1.2573

V =
   -0.3521    0.7590
   -0.4436    0.3212
   -0.5352   -0.1165
   -0.6268   -0.5542


Moore-Penrose pseudoinverse

Definition. Let A be an m × n real matrix with singular-value decomposition

A = UΣV⊤.

The Moore-Penrose pseudoinverse of A is the n × m matrix

A+ = VΣ+U⊤,

where the matrix Σ+ is obtained by taking the reciprocal of each non-zero element of the diagonal of Σ and then transposing the matrix.

Theorem. Any matrix has a unique pseudoinverse.

Theorem. Properties of the pseudoinverse:

AA+A = A and A+AA+ = A+;
(AA+)⊤ = AA+ and (A+A)⊤ = A+A;
if A is regular then A+ = A^(−1).


Solution of systems of linear equations

Consider a system of linear equations of the form Ax = b, where A is an m × n real matrix, b ∈ R^m, x ∈ R^n.

General solution: x∗ = A+b.
If the system of equations has a unique solution, x∗ is exactly this solution.
If the system of equations has several solutions, x∗ is a solution with the smallest norm.
If the system of equations is contradictory, that is it does not have a solution, then x∗ is a minimum point of ∥Ax − b∥2 with the smallest norm.
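A small sketch of the last case with MATLAB's pinv (the data are made up for illustration):

>> A=[1 1;1 2;1 3]; b=[1;2;2];        % overdetermined, contradictory system
>> x=pinv(A)*b                        % least-squares solution x* = A+ b
>> norm(A*x-b)                        % minimal residual norm
>> A\b                                % backslash returns the same minimizer here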


MATLAB function pinv

>> A=[1 2 3 4;5 6 7 8];
>> A_inv=pinv(A)

A_inv =
   -0.5500    0.2500
   -0.2250    0.1250
    0.1000   -0.0000
    0.4250   -0.1250

>> B=[-1 -1 1;4 2 4;3 1 5];
>> det(B)

ans =
     0

>> B_inv=pinv(B)

B_inv =
   -0.2115    0.1538   -0.0577
   -0.2179    0.1282   -0.0897
    0.2372   -0.0513    0.1859

>> C=[-1 -1 1;4 2 4;-1 1 5];
>> det(C)

ans =
    24

>> C_inv=pinv(C)

C_inv =
    0.2500    0.2500   -0.2500
   -1.0000   -0.1667    0.3333
    0.2500    0.0833    0.0833

>> inv(C)

ans =
    0.2500    0.2500   -0.2500
   -1.0000   -0.1667    0.3333
    0.2500    0.0833    0.0833


Differentiability of univariable functions

Definition. A function f : R → R is said to be differentiable at a point x if the limit

lim_{h→0} (f(x + h) − f(x))/h =: f′(x) = df/dx(x)

exists. f′(x) is called the derivative of f at the point x.

Remark. f is differentiable at x if and only if there exists a number a ∈ R such that

lim_{h→0} (f(x + h) − f(x) − a·h)/h = 0.

In this case a = f′(x).

Remark. Best linear approximation:

f(x + h) ≈ f(x) + f′(x)·h.


Differentiability of multivariable functions

Definition. We say that f : R^n → R is differentiable at a point x ∈ R^n if there exists a vector a ∈ R^n such that

lim_{h→0} (f(x + h) − f(x) − a⊤h)/∥h∥2 = 0.

The vector a is called the gradient of f at x and denoted by ∇f(x).

Definition. We say that a function f : R^n → R is partially differentiable at x = (x1, x2, . . . , xn) ∈ R^n with respect to xi if the limit

∂f/∂xi(x) := lim_{h→0} (f(x + h·ei) − f(x))/h
           = lim_{h→0} (f(x1, . . . , xi−1, xi + h, xi+1, . . . , xn) − f(x1, . . . , xi−1, xi, xi+1, . . . , xn))/h

exists. ∂f/∂xi is the ith partial derivative of f, i = 1, 2, . . . , n.


Second partial derivatives

Theorem. If a function f : R^n → R is differentiable at a point x ∈ R^n then all partial derivatives of f exist and the gradient of f is

∇f(x) = (∂f/∂x1(x), ∂f/∂x2(x), . . . , ∂f/∂xn(x))⊤.

Remark. Taking the partial derivatives of the partial derivatives ∂f/∂xi : R^n → R, i = 1, 2, . . . , n, results in the second partial derivatives

∂²f/∂xi∂xj(x),   i, j = 1, 2, . . . , n.

If the second partial derivatives are continuous at all x ∈ R^n then

∂²f/∂xi∂xj(x) = ∂²f/∂xj∂xi(x),   i, j = 1, 2, . . . , n.


Hessian

Definition. If the second partial derivatives of f : R^n → R exist then the matrix

∇²f(x) := [ ∂²f/∂x1²(x)    ∂²f/∂x1∂x2(x)  ···  ∂²f/∂x1∂xn(x);
            ∂²f/∂x2∂x1(x)  ∂²f/∂x2²(x)    ···  ∂²f/∂x2∂xn(x);
            … ;
            ∂²f/∂xn∂x1(x)  ∂²f/∂xn∂x2(x)  ···  ∂²f/∂xn²(x) ]

is called the Hessian of f.

Remark. If the second partial derivatives are continuous then the Hessian is symmetric.


Example

Find the gradient and the Hessian of

f(x1, x2) := x1^3 + x2^3 − 3x1 − 3x2.

[Figure: surface plot of f(x1, x2) over [−2, 2] × [−2, 2].]

Solution.

Partial derivatives:

∂f/∂x1(x1, x2) = 3x1^2 − 3,   ∂f/∂x2(x1, x2) = 3x2^2 − 3.

Gradient:

∇f(x1, x2) = (3x1^2 − 3, 3x2^2 − 3)⊤.

Second partial derivatives:

∂²f/∂x1²(x1, x2) = 6x1,   ∂²f/∂x1∂x2(x1, x2) = 0,
∂²f/∂x2²(x1, x2) = 6x2,   ∂²f/∂x2∂x1(x1, x2) = 0.

Hessian:

∇²f(x1, x2) = [6x1 0; 0 6x2].
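The symbolic results can be cross-checked with central differences; a rough numerical sketch (the test point and the step size h are arbitrary):

>> f=@(x) x(1)^3+x(2)^3-3*x(1)-3*x(2);
>> x0=[0.5; -1.2]; h=1e-6; g=zeros(2,1);
>> for i=1:2, e=zeros(2,1); e(i)=h; g(i)=(f(x0+e)-f(x0-e))/(2*h); end
>> [g [3*x0(1)^2-3; 3*x0(2)^2-3]]     % numerical vs analytic gradient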


Example

Find the gradient and the Hessian of

f(x1, x2, x3) := x1x2x3 + x3e^(x1) + x2^4 x3^2.

MATLAB solution.

>> syms x1 x2 x3
>> F=x1*x2*x3+x3*exp(x1)+x2^4*x3^2;
>> GradF=jacobian(F)

GradF =
[ x2*x3 + x3*exp(x1), 4*x2^3*x3^2 + x1*x3, 2*x3*x2^4 + x1*x2 + exp(x1)]

>> HesseF=jacobian(GradF)

HesseF =
[   x3*exp(x1),             x3,   x2 + exp(x1)]
[           x3,   12*x2^2*x3^2, 8*x3*x2^3 + x1]
[ x2 + exp(x1), 8*x3*x2^3 + x1,         2*x2^4]


Jacobian

Definition. A vector valued function f = (f1, f2, . . . , fm)⊤ : R^n → R^m is differentiable at a point x = (x1, x2, . . . , xn)⊤ ∈ R^n if the component functions fi, i = 1, 2, . . . , m, are differentiable at x. In this case the m × n matrix

f′(x) := [ ∂f1/∂x1(x)  ∂f1/∂x2(x)  ···  ∂f1/∂xn(x);
           ∂f2/∂x1(x)  ∂f2/∂x2(x)  ···  ∂f2/∂xn(x);
           … ;
           ∂fm/∂x1(x)  ∂fm/∂x2(x)  ···  ∂fm/∂xn(x) ]

is called the Jacobian of f.

Remark. The ith row of the Jacobian equals ∇fi(x)⊤.


Example

Find the Jacobian of the functions

f(x1, x2) := (e^(2x1+x2), x2 − x1, x1^2 + x2)⊤   and   g(y1, y2, y3) := (y1 + 2y2 + y3^2, y1^2 + sin(y2 + y3))⊤.

MATLAB solution.

>> syms x1 x2
>> f=[exp(2*x1+x2);x2-x1;x1^2+x2]

f =
 exp(2*x1 + x2)
        x2 - x1
      x1^2 + x2

>> Jf=jacobian(f)

Jf =
[ 2*exp(2*x1 + x2), exp(2*x1 + x2)]
[               -1,              1]
[             2*x1,              1]

>> syms y1 y2 y3
>> g=[y1+2*y2+y3^2;y1^2+sin(y2+y3)]

g =
    y3^2 + y1 + 2*y2
 y1^2 + sin(y2 + y3)

>> Jg=jacobian(g)

Jg =
[    1,            2,         2*y3]
[ 2*y1, cos(y2 + y3), cos(y2 + y3)]


Taylor series expansion

Taylor's theorem. Let f : R^n → R be a continuously differentiable (differentiable with continuous partial derivatives) function and let p ∈ R^n. Then we have

f(x + p) = f(x) + ∇f(x + tp)⊤p

for some t ∈ (0, 1). Moreover, if f is twice continuously differentiable then

f(x + p) = f(x) + ∇f(x)⊤p + (1/2) p⊤∇²f(x + tp)p

for some t ∈ (0, 1).

First-order Taylor approximation around x:

f(x + p) ≈ f(x) + ∇f(x)⊤p.

Second-order Taylor approximation around x:

f(x + p) ≈ f(x) + ∇f(x)⊤p + (1/2) p⊤∇²f(x)p.


Example

Find the first- and second-order Taylor approximation of the function

f(x1, x2) := x1^x2,   x1 ∈ R+, x2 ∈ R,

around (1, 1).

Solution. Partial derivatives:

∂f/∂x1(x1, x2) = x2·x1^(x2−1),   ∂f/∂x2(x1, x2) = x1^x2 ln(x1);

∂²f/∂x1²(x1, x2) = x2(x2 − 1)x1^(x2−2),   ∂²f/∂x2²(x1, x2) = x1^x2 ln²(x1),

∂²f/∂x1∂x2(x1, x2) = x1^(x2−1)(1 + x2 ln(x1)) = ∂²f/∂x2∂x1(x1, x2).

Gradient: ∇f(1, 1) = (1, 0)⊤;   Hessian: ∇²f(1, 1) = [0 1; 1 0].

First-order approximation:

f(1 + p1, 1 + p2) ≈ f(1, 1) + ∇f(1, 1)⊤(p1, p2)⊤ = 1 + p1.

Second-order approximation:

f(1 + p1, 1 + p2) ≈ f(1, 1) + ∇f(1, 1)⊤(p1, p2)⊤ + (1/2)(p1, p2)∇²f(1, 1)(p1, p2)⊤ = 1 + p1 + p1p2.


Example

Approximate the value of the expression 1.01^1.005 using Taylor expansion.

Solution.

Function: f(x1, x2) := x1^x2.

First-order expansion around (1, 1): f(1 + p1, 1 + p2) ≈ 1 + p1.

Second-order expansion around (1, 1): f(1 + p1, 1 + p2) ≈ 1 + p1 + p1p2.

With p1 = 0.01, p2 = 0.005:
the first-order approximation: 1.01^1.005 ≈ 1 + 0.01 = 1.01;
the second-order approximation: 1.01^1.005 ≈ 1 + 0.01 + 0.01 · 0.005 = 1.01005;
true value (rounded to 15 decimals): 1.01^1.005 = 1.010050250420819.
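The two approximations can also be reproduced with the symbolic toolbox, reusing the gradient and Hessian of the previous example (a sketch; the point (1, 1) and the increments are those of the example):

>> syms x1 x2
>> f=x1^x2; p=[0.01; 0.005];
>> G=subs(jacobian(f,[x1 x2]),[x1 x2],[1 1]);    % gradient at (1,1): (1, 0)
>> H=subs(hessian(f,[x1 x2]),[x1 x2],[1 1]);     % Hessian at (1,1): [0 1; 1 0]
>> double(1+G*p)                                 % first-order:  1.0100
>> double(1+G*p+p.'*H*p/2)                       % second-order: 1.010050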


Extremal points of multivariable functions

Let f : R^n → R be a continuously differentiable function.

x∗ is a global minimizer [maximizer] of the function f if f(x∗) ≤ f(x) [f(x∗) ≥ f(x)] for all x ∈ R^n.

x∗ is a local minimizer [maximizer] of f if there is a neighborhood N ⊂ R^n of x∗ such that f(x∗) ≤ f(x) [f(x∗) ≥ f(x)] for all x ∈ N.

x∗ is a strict local minimizer [maximizer] of f if there is a neighborhood N ⊂ R^n of x∗ such that f(x∗) < f(x) [f(x∗) > f(x)] for all x∗ ≠ x ∈ N.

x∗ is an isolated local minimizer [maximizer] of f if there is a neighborhood N ⊂ R^n of x∗ such that x∗ is the only local minimizer [maximizer] in N.


Stationary points

Theorem. (First-order necessary conditions) If x∗ is a local extremal point (minimizer or maximizer) of f : R^n → R and f is continuously differentiable in an open neighborhood of x∗, then ∇f(x∗) = 0.

Definition. We call x∗ a stationary point of f : R^n → R if ∇f(x∗) = 0.

Definition. If x∗ is a stationary point of f that is neither a maximizer, nor a minimizer, then x∗ is a saddle point.

Example.

f(x1, x2) := x1^2 − x2^2,   ∇f(x1, x2) = (2x1, −2x2)⊤.

(0, 0) is the only stationary point, which is a saddle point.

[Figure: surface plot of the saddle f(x1, x2) = x1^2 − x2^2.]


Second-order conditions of extrema

Theorem. (Second-order necessary conditions) If x∗ is a local minimizer [maximizer] of f : R^n → R and ∇²f exists and is continuous in an open neighborhood of x∗, then

a) ∇f(x∗) = 0,
b) ∇²f(x∗) is positive [negative] semidefinite.

Theorem. (Second-order sufficient conditions) Assume that ∇²f exists and is continuous in an open neighborhood of x∗ and that

a) ∇f(x∗) = 0,
b) ∇²f(x∗) is positive [negative] definite.

Then x∗ is a strict local minimizer [maximizer] of f.


Example

Find the extremal points of the function

f(x1, x2) := x1^3 + x2^3 − 3x1 − 3x2.

[Figure: surface plot of f(x1, x2).]

Solution.

Gradient:

∇f(x1, x2) = (3x1^2 − 3, 3x2^2 − 3)⊤.

Stationary points:

(1, 1), (−1, 1), (1, −1), (−1, −1).

Hessian:

∇²f(x1, x2) = [6x1 0; 0 6x2].

∇²f(1, 1) is positive definite, so (1, 1) is a minimizer.
∇²f(−1, −1) is negative definite, so (−1, −1) is a maximizer.
∇²f(−1, 1) and ∇²f(1, −1) are indefinite, so (−1, 1) and (1, −1) are saddle points.
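The same classification can be automated with the symbolic toolbox, in the spirit of the earlier jacobian examples (a sketch; solve and the sign of the Hessian eigenvalues do the work):

>> syms x1 x2
>> f=x1^3+x2^3-3*x1-3*x2;
>> G=jacobian(f); H=jacobian(G);                % gradient (row vector) and Hessian
>> S=solve(G==0,[x1 x2]); [S.x1 S.x2]           % the four stationary points
>> eig(double(subs(H,[x1 x2],[1 1])))           % 6, 6: positive definite, minimizer
>> eig(double(subs(H,[x1 x2],[-1 1])))          % -6, 6: indefinite, saddle point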


Optimization problem

Given a continuously differentiable function f : R^n → R, find the value of

min_x f(x).

Examples.

f(x1, x2) := x1^3 + x2^3 − 3x1 − 3x2.

[Figure: surface plot of f.]

Rosenbrock function:

g(x1, x2) := 100(x2 − x1^2)^2 + (1 − x1)^2.


Optimization algorithms

General form of an optimization algorithm:

1. specification of a starting point x0;
2. determination of the strategy xk −→ xk+1;
3. stopping criterion.

Line search: chooses a descent direction pk and searches along this direction by solving the one-dimensional optimization problem

min_α f(xk + αpk).

Definition. An optimization algorithm with strategy xk −→ xk+1 converges in order q to the optimal point x∗ if there exists a constant C > 0 such that for some norm ∥·∥ the inequality

∥xk+1 − x∗∥ ≤ C∥xk − x∗∥^q

holds.


Line search algorithms

One should choose a search direction p such that the function f decreases along p.

First-order Taylor approximation for some small α:

f(xk + αp) ≈ f(xk) + αp⊤∇f(xk).

The problem to be solved:

min_p p⊤∇f(xk),   ∥p∥ = 1.

As

p⊤∇f(xk) = ∥p∥ · ∥∇f(xk)∥ cos Θ = ∥∇f(xk)∥ cos Θ,

the optimal direction corresponds to cos Θ = −1:

p = −∇f(xk)/∥∇f(xk)∥.

Steepest descent method (gradient method).


Example

f(x1, x2) := x1^3 + x2^3 − 3x1 − 3x2.

Minimizer: (1, 1).
Maximizer: (−1, −1).
Saddle points: (1, −1) and (−1, 1).

[Figures: surface plot of f; contour lines, stationary points and the negative gradient field.]


Descent directions

Definition. A direction p ∈ R^n is a descent direction at x if

p⊤∇f(x) < 0.

If α is sufficiently small,

f(xk + αp) ≈ f(xk) + αp⊤∇f(xk).

As

p⊤∇f(xk) = ∥p∥ · ∥∇f(xk)∥ cos Θ,

the inequality cos Θ < 0 implies p⊤∇f(xk) < 0.

Special case: steepest descent (gradient method):

p = −∇f(xk),   that is cos Θ = −1.

Problem: the gradient method can be very slow if the optimum point lies in a long, narrow (prolate) valley.


Effect of prolate valleys

[Figure: contour lines and the negative gradient field of f(x1, x2) := x1^3 + x2^3 − 3x1 − 3x2.]

[Figure: contour lines and the negative gradient field of g(x1, x2) := 10x1^3 + x2^3 − 30x1 − 3x2.]


Gradient method - step length

Given a search direction pk, find the minimizer of the univariate function

α → f(xk + αpk),   (α > 0).

Finding the exact minimizer might be too expensive (too many function evaluations, too large computation costs), hence it suffices to find an α which is "good enough".

Two steps:
Determine the maximal step length.
In the given interval find a "good" α such that f(xk + αpk) is sufficiently smaller than f(xk).


Ideal case: quadratic function

Let f : R^n → R be quadratic, that is

f(x) := (1/2) x⊤Qx − b⊤x.

Q: n × n-dimensional, symmetric, positive definite matrix; b ∈ R^n.

In this case ∇f(x) = Qx − b, that is, for a stationary point x∗ we have Qx∗ = b.

The optimal step length at the point xk:

αk = (∇f⊤(xk)∇f(xk)) / (∇f⊤(xk)Q∇f(xk)),

that is

xk+1 = xk − (∇f⊤(xk)∇f(xk)) / (∇f⊤(xk)Q∇f(xk)) · ∇f(xk).
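A compact sketch of this exact line-search iteration for the quadratic f(x1, x2) = x1^2 + 10x2^2 used on the next slide, i.e. Q = diag(2, 20), b = 0 (the starting point, tolerance and iteration cap are arbitrary):

>> Q=diag([2 20]); b=[0;0]; x=[10;1];
>> for k=1:100
       g=Q*x-b; if norm(g)<1e-8, break, end
       alpha=(g'*g)/(g'*Q*g);        % optimal step length for the quadratic
       x=x-alpha*g;
   end
>> x, k                              % converges to the minimizer [0; 0]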


Example: gradient method

[Figure: gradient method for f(x1, x2) := x1^2 + 10x2^2.]


Gradient method with backtracking

Backtracking method for choosing αk:

1. Let c1 ∈ (0, 1), ϱ ∈ (0, 1) be fixed, α := α0;
2. while f(xk + αpk) > f(xk) + αc1∇f(xk)⊤pk
       α := ϱα
   end
3. αk := α.

Algorithm of the gradient method with backtracking:

1. Let x0 be given.
2. Given xk, let pk = −∇f(xk).
3. Choose αk using the backtracking algorithm.
4. Let xk+1 = xk + αkpk.
5. Stopping criterion: ∇f(xk) = 0 (∥∇f(xk)∥ < ε).
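A direct MATLAB transcription of the two algorithms, applied to the Rosenbrock function (a sketch; c1, ϱ, the initial α and the tolerance are tuning choices, not prescribed by the slides):

>> f=@(x) 100*(x(2)-x(1)^2)^2+(1-x(1))^2;
>> df=@(x) [-400*x(1)*(x(2)-x(1)^2)-2*(1-x(1)); 200*(x(2)-x(1)^2)];
>> x=[-1.2;1]; c1=1e-4; rho=0.5; tol=1e-3;
>> while norm(df(x))>=tol
       p=-df(x); alpha=1;                          % steepest descent direction
       while f(x+alpha*p)>f(x)+alpha*c1*df(x)'*p   % backtracking
           alpha=rho*alpha;
       end
       x=x+alpha*p;
   end
>> x                                               % slowly approaches (1, 1)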


Example: gradient method with backtracking

[Figure: gradient method with backtracking for the Rosenbrock function, the last 130 iteration steps.]

x0 = (−1.2, 1), ε = 10^(−3), ϱ = 0.5. Total number of iteration steps: 5231.


Newton's method for nonlinear equations

Nonlinear equation:

f(x) = 0,   where f : R → R.

Newton's iteration with starting point x0 for approximating a root of the nonlinear equation f(x) = 0:

xk+1 = xk − f(xk)/f′(xk),   k = 0, 1, 2, . . . .

Nonlinear system of equations:

F(x) = 0,   where F : R^n → R^n.

Newton's iteration with starting point x0 for approximating a root of the nonlinear system of equations F(x) = 0:

F′(xk)(xk+1 − xk) = −F(xk),   k = 0, 1, 2, . . . .


Newton's method for optimization

The minimizer of f is a solution of the equation ∇f(x) = 0.

∇f : R^n → R^n, so ∇f(x) = 0 is a nonlinear system of equations.

If f is twice continuously differentiable, then Newton's method with starting point x0 for the system of equations ∇f(x) = 0 reads:

∇²f(xk)(xk+1 − xk) = −∇f(xk),   k = 0, 1, 2, . . . .

The algorithm:
let x0 be given;
∇²f(xk)pk = −∇f(xk), that is pk = −(∇²f(xk))^(−1)∇f(xk);
xk+1 = xk + pk.

Remark. The step length of Newton's method equals 1.
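A sketch of the resulting iteration for the Rosenbrock function, with the gradient and Hessian written out by hand (the starting point, tolerance and iteration cap are illustrative):

>> df=@(x) [-400*x(1)*(x(2)-x(1)^2)-2*(1-x(1)); 200*(x(2)-x(1)^2)];
>> d2f=@(x) [1200*x(1)^2-400*x(2)+2, -400*x(1); -400*x(1), 200];
>> x=[-1.2;1];
>> for k=1:50
       g=df(x); if norm(g)<1e-10, break, end
       x=x-d2f(x)\g;                 % Newton step with unit step length
   end
>> x, k                              % typically reaches (1, 1) in a few steps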


Newton directions

Definition. The direction

pk = −(∇²f(xk))^(−1)∇f(xk)

is called the Newton direction.

Remark. If ∇²f(xk) is positive definite, then the Newton direction is a descent direction:

pk⊤∇f(xk) = −∇f⊤(xk)(∇²f(xk))^(−1)∇f(xk) < 0.

If ∇²f(xk) is not positive definite, then the Newton direction might not be defined, or might not be a descent direction.

Advantage: in a neighborhood of the minimizer the rate of convergence of the optimization method is quadratic.
Disadvantage: it requires the knowledge of the Hessian.


Quasi-Newton direction

If the Hessian ∇²f(xk) is not known, or it is too expensive to determine, one can use an approximation Bk ≈ ∇²f(xk). This results in a quasi-Newton direction.

The matrices Bk should
satisfy the equation Bk+1(xk+1 − xk) = ∇f(xk+1) − ∇f(xk);
be symmetric;
Bk and Bk+1 should have a low rank difference.

Most popular: the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. The BFGS formula is

Bk+1 = Bk − (Bk sk sk⊤ Bk)/(sk⊤ Bk sk) + (yk yk⊤)/(yk⊤ sk),

where sk = xk+1 − xk and yk = ∇f(xk+1) − ∇f(xk).
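A bare quasi-Newton sketch combining the BFGS update with backtracking step lengths (B0 = I; in practice a curvature safeguard such as the Wolfe conditions is added, which this minimal version omits):

>> f=@(x) 100*(x(2)-x(1)^2)^2+(1-x(1))^2;
>> df=@(x) [-400*x(1)*(x(2)-x(1)^2)-2*(1-x(1)); 200*(x(2)-x(1)^2)];
>> x=[-1.2;1]; B=eye(2);
>> while norm(df(x))>1e-4
       p=-B\df(x); alpha=1;
       while f(x+alpha*p)>f(x)+1e-4*alpha*df(x)'*p, alpha=alpha/2; end
       s=alpha*p; y=df(x+s)-df(x);
       B=B-(B*(s*s')*B)/(s'*B*s)+(y*y')/(y'*s);    % BFGS formula
       x=x+s;
   end
>> x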


Example: BFGS

[Figure: BFGS iterates for the Rosenbrock function in the (x1, x2)-plane.]

x0 = (−1.2, 1), ε = 10^(−4). Total number of iteration steps: 36.


Partitions

Let D := [a, b] × [c, d] ⊂ R^2 be a rectangular domain and f : D → R be a bounded function.

Consider a partition of the intervals by points

a = x0 < x1 < x2 < ··· < xm−1 < xm = b,
c = y0 < y1 < y2 < ··· < yn−1 < yn = d.

The partition P of D consists of the mn rectangles

Rij := {(x, y) ∈ R^2 | xi−1 ≤ x ≤ xi, yj−1 ≤ y ≤ yj},   i = 1, 2, . . . , m, j = 1, 2, . . . , n.

Area of Rij: ∆Aij := ∆xi∆yj = (xi − xi−1)(yj − yj−1).

Diameter of Rij: diam(Rij) := √((xi − xi−1)^2 + (yj − yj−1)^2).

The norm of the partition P: ∥P∥ := max_{1≤i≤m, 1≤j≤n} diam(Rij).


Riemann sums

Definition. The Riemann sum corresponding to a partition P of the rectangle D and a set of arbitrary points (x∗ij, y∗ij) ∈ Rij is defined as

R(f, P) := Σ_{i=1}^m Σ_{j=1}^n f(x∗ij, y∗ij) ∆Aij.

[Figure: boxes over the subrectangles approximating the volume under the graph of f. Source: Robert Adams, Christopher Essex: Calculus: A Complete Course, 7th Edition. Pearson, Toronto, 2010.]

Assume f ≥ 0.

f(x∗ij, y∗ij)∆Aij: volume of the box with base Rij and height f(x∗ij, y∗ij).

R(f, P): approximation of the volume above D under the graph of the function f.

∫∫_D f(x, y)dxdy: the limit of R(f, P) as ∥P∥ → 0, if the limit exists independently of the choice of (x∗ij, y∗ij).


Double integral over a rectangle

Definition. Let f : D ⊂ R^2 → R be a bounded function. We say that f is integrable over the rectangle D and has double integral

I := ∫∫_D f(x, y)dxdy

if for every refining sequence Pk of partitions of D with lim_{k→∞} ∥Pk∥ = 0 and choices of points of the subrectangles of Pk the corresponding Riemann sums R(f, Pk) converge to I.

Example. Let D = [0, 1]^2 and f(x, y) := 2x^2 + xy.

Consider the partition defined by the lines x = 1/2 and y = 1/2 and choose the centres of the obtained squares, that is (1/4, 1/4), (1/4, 3/4), (3/4, 1/4) and (3/4, 3/4).

∫∫_D (2x^2 + xy)dxdy ≈ (2 · 1/16 + 1/4 · 1/4)·1/4 + (2 · 1/16 + 1/4 · 3/4)·1/4
   + (2 · 9/16 + 3/4 · 1/4)·1/4 + (2 · 9/16 + 3/4 · 3/4)·1/4 = 7/8 = 0.875.

∫∫_D (2x^2 + xy)dxdy = 11/12 = 0.9167.
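Both numbers are easy to reproduce numerically; a MATLAB sketch of the 2 × 2 midpoint Riemann sum and of the exact value via integral2:

>> f=@(x,y) 2*x.^2+x.*y;
>> [X,Y]=meshgrid([0.25 0.75]);      % centres of the four subsquares
>> sum(f(X(:),Y(:)))*0.25            % Riemann sum: 0.8750
>> integral2(f,0,1,0,1)              % 0.9167 = 11/12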


General bounded region

Definition. Let S ⊂ R^2 be a bounded region, f : S → R be a bounded function and fS : R^2 → R be the function defined as

fS(x, y) := f(x, y) if (x, y) ∈ S, and fS(x, y) := 0 if (x, y) ∉ S.

Further, let D ⊂ R^2 be a rectangle such that S ⊆ D. We say that f is integrable over S if fS is integrable over D, and the double integral of f over S is defined as

∫∫_S f(x, y)dxdy := ∫∫_D fS(x, y)dxdy.


Properties of the double integral

Theorem. Let D ⊂ R^2 be a bounded domain, f, g : D → R be bounded functions and denote by λ(D) the area of D.

a) If λ(D) = 0 then ∫∫_D f(x, y)dxdy = 0.

b) ∫∫_D 1 dxdy = λ(D).

c) If f and g are integrable over D then αf + βg is also integrable and

∫∫_D (αf(x, y) + βg(x, y))dxdy = α ∫∫_D f(x, y)dxdy + β ∫∫_D g(x, y)dxdy,   α, β ∈ R.

d) If f and g are integrable over D and f(x, y) ≤ g(x, y) on D then

∫∫_D f(x, y)dxdy ≤ ∫∫_D g(x, y)dxdy.


Properties of the double integral

e) If f is integrable over D then |f| is also integrable and

|∫∫_D f(x, y)dxdy| ≤ ∫∫_D |f(x, y)|dxdy.

f) Let S ⊆ D. If f ≥ 0 is integrable both over S and D then

∫∫_S f(x, y)dxdy ≤ ∫∫_D f(x, y)dxdy.

g) If D1, D2, . . . , Dk are nonoverlapping domains on each of which f is integrable, then f is integrable over the union

D := D1 ∪ D2 ∪ ··· ∪ Dk

and

∫∫_D f(x, y)dxdy = Σ_{i=1}^k ∫∫_{Di} f(x, y)dxdy.


Simple domains

Definition. We say that the domain D ⊂ R^2 is y-simple if it is bounded by two vertical lines x = a and x = b and two continuous graphs y = c(x) and y = d(x) between these lines. Similarly, D is x-simple if it is bounded by horizontal lines y = c and y = d and continuous graphs x = a(y) and x = b(y).

[Figures: a y-simple domain and an x-simple domain.]


Iteration of double integrals

Theorem. If f(x, y) is continuous on the bounded y-simple domain D defined by a ≤ x ≤ b and c(x) ≤ y ≤ d(x), then

∫∫_D f(x, y)dxdy = ∫_a^b [ ∫_{c(x)}^{d(x)} f(x, y)dy ] dx.

Similarly, if f(x, y) is continuous on the bounded x-simple domain D defined by c ≤ y ≤ d and a(y) ≤ x ≤ b(y), then

∫∫_D f(x, y)dxdy = ∫_c^d [ ∫_{a(y)}^{b(y)} f(x, y)dx ] dy.

Remark. Instead of ∫∫_D f(x, y)dxdy one can write ∫∫_D f(x, y)dydx, both expressions stand for the double integral of f over D. The order of dx and dy is important when the double integral is iterated.


Example

Find ∫∫_D (x^2 + y)dxdy, where D := {(x, y) | 0 ≤ x ≤ 1, x^2 ≤ y ≤ √x}.

Solution.

∫∫_D (x^2 + y)dxdy = ∫_0^1 [ ∫_{x^2}^{√x} (x^2 + y)dy ] dx = ∫_0^1 [ x^2 y + y^2/2 ]_{y=x^2}^{y=√x} dx
 = ∫_0^1 ( x^(5/2) + x/2 − (3/2)x^4 ) dx = [ (2/7)x^(7/2) + x^2/4 − (3/2)·x^5/5 ]_0^1 = 33/140.

MATLAB solution.

>> syms x y
>> int(int((x^2+y),y,x^2,sqrt(x)),x,0,1)

ans =
33/140


Improper integrals

We talk about improper double integrals if either the domain D ⊆ R^2 of integration is unbounded, or the integrand f is unbounded near a point of the domain of integration or of its boundary.

For f ≥ 0 or f ≤ 0 the integral either exists (is finite) or is infinite.

Example. Evaluate ∫∫_T (1/x^4) e^(−y/x) dxdy, where T := {(x, y) ∈ R^2 | x ≥ 1, 0 ≤ y ≤ x}.

Solution.

∫∫_T (1/x^4) e^(−y/x) dxdy = ∫_1^∞ ∫_0^x (1/x^4) e^(−y/x) dydx = ∫_1^∞ (1/x^4) [ ∫_0^x e^(−y/x) dy ] dx
 = ∫_1^∞ (1/x^4) [ −x e^(−y/x) ]_{y=0}^{y=x} dx = (1 − 1/e) lim_{ϱ→∞} ∫_1^ϱ dx/x^3
 = (1 − 1/e) lim_{ϱ→∞} [ −1/(2x^2) ]_1^ϱ = (1 − 1/e) lim_{ϱ→∞} ( 1/2 − 1/(2ϱ^2) ) = (1/2)(1 − 1/e).

MATLAB solution.

>> syms x y
>> f=exp(-y/x)/x^4;
>> int(int(f,y,0,x),x,1,Inf)

ans =
1/2 - exp(-1)/2


Change of variables

Let f : D ⊆ R^2 → R be an integrable function and suppose x and y are expressed as functions of two other variables u and v by the equations

x = x(u, v),   y = y(u, v).

These equations define a transformation from points (u, v) of the uv-plane to points (x, y) in the xy-plane.

Definition. The Jacobian determinant of the transformation is defined as

J(u, v) := det [ ∂x/∂u(u, v)  ∂x/∂v(u, v);  ∂y/∂u(u, v)  ∂y/∂v(u, v) ].

Theorem. Let x = x(u, v), y = y(u, v) be a one-to-one transformation from a domain S in the uv-plane onto a domain D in the xy-plane and assume that x and y are continuously differentiable on S. If f(x, y) is integrable over D then g(u, v) := f(x(u, v), y(u, v)) |J(u, v)| is integrable over S and

∫∫_D f(x, y)dxdy = ∫∫_S f(x(u, v), y(u, v)) |J(u, v)| dudv.


Change to polar coordinates

Each point P with Cartesian coordinates (x, y) can be located by its polar coordinates [r, θ].

r: distance from the origin.
θ: angle with the positive direction of the x axis.

x = r cos θ,   r = √(x^2 + y^2),
y = r sin θ,   tan θ = y/x.

[Figures: a point (x, y) with polar coordinates [r, θ]; the area elements in Cartesian and polar coordinates.]

dA = dx dy: area of a small rectangle in Cartesian coordinates.
dA ≈ r dr dθ: corresponding area in polar coordinates.


Integral transformation to polar coordinates

Transformation from polar coordinates [r, θ] to Cartesian coordinates (x, y):

x = r cos θ,   y = r sin θ.

Jacobian determinant:

J(r, θ) = det [ cos θ  −r sin θ;  sin θ  r cos θ ] = r.

Integral transformation:

∫∫_D f(x, y)dxdy = ∫∫_S f(r cos θ, r sin θ) r drdθ.

Examples for the transformation of domains:

D := {(x, y) | x^2 + y^2 ≤ a^2}  ⇐⇒  S := {[r, θ] | 0 ≤ r ≤ a, 0 ≤ θ ≤ 2π};
D := {(x, y) | b^2 ≤ x^2 + y^2 ≤ a^2, x, y ≥ 0}  ⇐⇒  S := {[r, θ] | b ≤ r ≤ a, 0 ≤ θ ≤ π/2}.


Example

Find ∫∫_S xy dxdy, where S is the region in the first quadrant lying inside the disk with radius a and under the line y = √3·x.

Solution.

∫∫_S xy dxdy = ∫_0^(π/3) [ ∫_0^a r cos θ · r sin θ · r dr ] dθ
 = ∫_0^(π/3) cos θ sin θ dθ · ∫_0^a r^3 dr
 = (1/2) ∫_0^(π/3) sin(2θ) dθ · [ r^4/4 ]_0^a
 = (a^4/8) [ −(1/2) cos(2θ) ]_0^(π/3)
 = (a^4/16) (1 − cos(2π/3)) = (3/32) a^4.

[Figure: the sector S bounded by the circle x^2 + y^2 = a^2 and the line y = √3·x, with central angle π/3.]
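A symbolic check in the style of the earlier MATLAB examples, integrating directly in polar coordinates (the integrand already contains the Jacobian factor r; a is declared positive so it can play the role of the radius):

>> syms r theta a positive
>> int(int(r*cos(theta)*r*sin(theta)*r,r,0,a),theta,0,sym(pi)/3)   % gives 3*a^4/32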


Triple integrals

Let B := [a, b] × [c, d] × [p, q] ⊂ R^3 be a rectangular box and f : B → R be a bounded function. The triple integral

∫∫∫_B f(x, y, z)dxdydz

can be defined as the limit of Riemann sums corresponding to partitions of B into subboxes. Its properties are analogous to those of double integrals.

Let D ⊂ R^3 be a bounded domain. Then

∫∫∫_D 1 dxdydz = λ(D),

where λ(D) is the volume of D.


ExampleThree points are chosen randomly from the interval [a, b], a < b. Find the probabilitythat the third point lies between the first two chosen points.

Solution. Geometric probability. Denote by x, y and z the coordinates of the chosenpoints, respectively. These three values represent a single point (x, y, z) of the cubeC := [a, b]3 having volume λ(C) = (b − a)3. Points satisfying the conditions belong tothe set

S :={(x, y, z) ∈ [a, b]3

∣∣ x < z < y or y < z < x}.

The required probability: P = λ(S)/λ(C) = λ(S)/(b − a)3.

By symmetry:

λ(S) =∫∫∫

S1dxdydz = 2

∫ b

a

∫ y

a

∫ z

a1dxdzdy = 2

∫ b

a

∫ y

a[x]x=z

x=a dzdy

= 2∫ b

a

∫ y

a(z − a)dzdy = 2

∫ b

a

[(z − a)2

2

]z=y

z=ady =

∫ b

a(y − a)2dy

=

[(y − a)3

3

]y=b

y=a=

(b − a)33 .

Hence, P = 1/3.Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 79 / 239
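
A quick Monte Carlo sanity check of P = 1/3 in MATLAB (a = 0, b = 1 is an arbitrary choice, since the answer does not depend on the interval):

>> rng(1); N = 1e6;
>> x = rand(N,1); y = rand(N,1); z = rand(N,1);    % three independent uniform points on [0,1]
>> P = mean((x < z & z < y) | (y < z & z < x))     % relative frequency, should be close to 1/3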

Change of variables for triple integrals

Let f : D ⊂ R³ → R be an integrable function and suppose x, y and z are expressed as

x = x(u, v, w),   y = y(u, v, w),   z = z(u, v, w).

These equations define a transformation from points (u, v, w) of the uvw-space to points (x, y, z) in the xyz-space. The Jacobian determinant:

J(u, v, w) := det( ∂x/∂u   ∂x/∂v   ∂x/∂w
                   ∂y/∂u   ∂y/∂v   ∂y/∂w
                   ∂z/∂u   ∂z/∂v   ∂z/∂w ),

where each partial derivative is evaluated at (u, v, w).

If the transformation is one-to-one from the domain S in the uvw-space onto a domain D in the xyz-space, the defining functions are continuously differentiable and f : D ⊆ R³ → R is integrable over D, then

∫∫∫_D f(x, y, z) dx dy dz = ∫∫∫_S g(u, v, w) |J(u, v, w)| du dv dw,

where g(u, v, w) := f(x(u, v, w), y(u, v, w), z(u, v, w)).

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 80 / 239

Cylindrical coordinates

Each point P with Cartesian coordinates (x, y, z) can be located by its cylindrical coordinates [r, θ, z].

r: distance from the origin of the projection in the xy-plane.
θ: angle with the positive direction of the x axis in the xy-plane.

x = r cos θ,   y = r sin θ,   z = z.

Jacobian determinant:

J(r, θ, z) = det( cos θ   −r sin θ   0
                  sin θ    r cos θ   0
                    0         0      1 ) = r.

(Figure: a point (x, y, z) and its cylindrical coordinates [r, θ, z].)

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 81 / 239

Spherical coordinates

Each point P with Cartesian coordinates (x, y, z) can also be located by its spherical coordinates [ρ, ϕ, θ].

ρ: distance from the origin.
ϕ: angle with the positive direction of the z axis.
θ: angle with the positive direction of the x axis in the xy-plane.

x = ρ sin ϕ cos θ,   y = ρ sin ϕ sin θ,   z = ρ cos ϕ.

Jacobian determinant:

J(ρ, ϕ, θ) = det( sin ϕ cos θ   ρ cos ϕ cos θ   −ρ sin ϕ sin θ
                  sin ϕ sin θ   ρ cos ϕ sin θ    ρ sin ϕ cos θ
                     cos ϕ        −ρ sin ϕ             0       ) = ρ² sin ϕ.

(Figure: a point (x, y, z) and its spherical coordinates [ρ, ϕ, θ].)
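
As with the polar case, the spherical Jacobian can be checked symbolically; a short MATLAB sketch (Symbolic Math Toolbox assumed):

>> syms rho phi theta real
>> x = rho*sin(phi)*cos(theta); y = rho*sin(phi)*sin(theta); z = rho*cos(phi);
>> J = simplify(det(jacobian([x; y; z], [rho phi theta])))   % expected result: rho^2*sin(phi)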

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 82 / 239

Example

Find the integral ∫∫∫_S x²z² dx dy dz, where S := {(x, y, z) | x, y > 0, x² + y² + z² ≤ a²}, a > 0.

Solution. In the spherical coordinate system S can be expressed as

D := {[ρ, ϕ, θ] | 0 ≤ ρ ≤ a, 0 ≤ ϕ ≤ π, 0 ≤ θ ≤ π/2}.

Thus

∫∫∫_S x²z² dx dy dz = ∫∫∫_D (ρ sin ϕ cos θ)² (ρ cos ϕ)² (ρ² sin ϕ) dρ dϕ dθ

= ∫_0^{π/2} ∫_0^π ∫_0^a ρ⁶ sin³ϕ cos²θ cos²ϕ dρ dϕ dθ

= ( ∫_0^a ρ⁶ dρ ) ( ∫_0^π sin³ϕ cos²ϕ dϕ ) ( ∫_0^{π/2} cos²θ dθ )

= (a⁷/7) [ (1/15) cos³ϕ (3 cos²ϕ − 5) ]_{ϕ=0}^{ϕ=π} [ θ/2 + (1/4) sin(2θ) ]_{θ=0}^{θ=π/2}

= (a⁷/7) · (4/15) · (π/4) = a⁷π/105.
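
The result a⁷π/105 can be checked numerically with integral3 in the spherical variables; below a = 1 is an illustrative choice:

>> a = 1;
>> f = @(rho,phi,theta) rho.^6 .* sin(phi).^3 .* cos(theta).^2 .* cos(phi).^2;  % integrand already contains the Jacobian rho^2*sin(phi)
>> I = integral3(f, 0, a, 0, pi, 0, pi/2);
>> [I, pi*a^7/105]                   % the two values should agree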

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 83 / 239

The Laplace integral

Definition. Let f : [0, ∞[ → R be an arbitrary function. Then the Laplace integral

F(s) := ∫_0^∞ f(t) e^{−st} dt

is called the Laplace transform of f(t), provided the integral exists.

Remarks.
a) s is a complex number of the form s = σ + iω, where i² = −1.
b) In practical problems t usually represents time, so f(t) is a time-dependent quantity.
c) The usual shorthand notation for the Laplace integral of f is L[f] := F(s).

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 85 / 239

Examples

Unit step function u_{−1}(t) (Heaviside step function)

u_{−1}(t) := { 1, t > 0;  0, t < 0. }

L[u_{−1}(t)] = ∫_0^∞ u_{−1}(t) e^{−st} dt =: U_{−1}(s).

U_{−1}(s) = ∫_0^∞ e^{−st} dt = [ −e^{−st}/s ]_{t=0}^{t=∞} = 1/s   if σ > 0.

Decaying exponential e^{−αt} (α > 0)

L[e^{−αt}] = ∫_0^∞ e^{−αt} e^{−st} dt = ∫_0^∞ e^{−(s+α)t} dt = [ −e^{−(s+α)t}/(s + α) ]_{t=0}^{t=∞} = 1/(s + α)   if σ > −α.

Simple periodic function e^{iωt} (ω ∈ R)

L[e^{iωt}] = ∫_0^∞ e^{iωt} e^{−st} dt = ∫_0^∞ e^{−(s−iω)t} dt = [ −e^{−(s−iω)t}/(s − iω) ]_{t=0}^{t=∞} = 1/(s − iω)   if σ > 0.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 86 / 239

Examples

Sinusoid cos(ωt) (0 < ω ∈ R)

L[cos(ωt)] = ∫_0^∞ cos(ωt) e^{−st} dt,   where cos(ωt) = (e^{iωt} + e^{−iωt})/2.

L[cos(ωt)] = (1/2) ∫_0^∞ (e^{iωt} + e^{−iωt}) e^{−st} dt = (1/2) [ ∫_0^∞ e^{iωt} e^{−st} dt + ∫_0^∞ e^{−iωt} e^{−st} dt ]

= (1/2) [ 1/(s − iω) + 1/(s + iω) ] = s/(s² + ω²)   if σ > 0.

Ramp function u_{−2}(t) := t u_{−1}(t)

L[u_{−2}(t)] = ∫_0^∞ u_{−2}(t) e^{−st} dt = ∫_0^∞ t e^{−st} dt =: U_{−2}(s).

Integration by parts:

U_{−2}(s) = ∫_0^∞ t e^{−st} dt = [ −t e^{−st}/s ]_{t=0}^{t=∞} − ∫_0^∞ ( −e^{−st}/s ) dt

= 0 − [ e^{−st}/s² ]_{t=0}^{t=∞} = 1/s²   if σ > 0.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 87 / 239

Properties of the Laplace transform I

Theorem (Linearity). If α and β are constants or are independent of s and t, and f(t) and g(t) are transformable with Laplace transforms F(s) and G(s), respectively, then

L[αf(t) + βg(t)] = αL[f(t)] + βL[g(t)] = αF(s) + βG(s).

Theorem (Translation in time). If the Laplace transform of f(t) is F(s) and a is a positive real number, then the Laplace transform of the translated function f(t − a) u_{−1}(t − a) is

L[f(t − a) u_{−1}(t − a)] = e^{−as} F(s).

Theorem (Complex differentiation). If the Laplace transform of f(t) is F(s), then

L[t f(t)] = − d/ds F(s).

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 88 / 239

Examples

1. L[cos(ωt)] = s/(s² + ω²),  so  L[t cos(ωt)] = − d/ds ( s/(s² + ω²) ) = (s² − ω²)/(s² + ω²)².

2. L[e^{−αt}] = 1/(s + α),  so  L[t e^{−αt}] = − d/ds ( 1/(s + α) ) = 1/(s + α)².

3. Using L[u_{−1}(t)] = 1/s one has

L[u_{−2}(t)] = L[t u_{−1}(t)] = − d/ds (1/s) = 1/s²;

L[u_{−3}(t)] = L[t u_{−2}(t)] = L[t² u_{−1}(t)] = − d/ds (1/s²) = 2/s³ = 2!/s³;

L[u_{−4}(t)] = L[t u_{−3}(t)] = L[t³ u_{−1}(t)] = − d/ds (2/s³) = 6/s⁴ = 3!/s⁴.

In general: L[u_{−(n+1)}(t)] = L[tⁿ u_{−1}(t)] = n!/s^{n+1}.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 89 / 239

Properties of the Laplace transform II

Theorem (Translation in the s domain). If the Laplace transform of f(t) is F(s) and a is a complex number, then

L[e^{at} f(t)] = F(s − a).

Theorem (Real differentiation). If the Laplace transform of f(t) is F(s) and f′(t) = d/dt f(t) = Df(t) is transformable, then

L[f′(t)] = sF(s) − f(0+),

where f(0+) is the right limit of f at 0, that is f(0+) := lim_{t→0, t>0} f(t).

The transform of the second derivative f′′(t) = d²/dt² f(t) = D²f(t) is

L[f′′(t)] = s²F(s) − sf(0+) − f′(0+).

In general, the transform of the nth derivative f⁽ⁿ⁾(t) = dⁿ/dtⁿ f(t) = Dⁿf(t) is

L[f⁽ⁿ⁾(t)] = sⁿF(s) − s^{n−1}f(0+) − s^{n−2}f′(0+) − · · · − s f⁽ⁿ⁻²⁾(0+) − f⁽ⁿ⁻¹⁾(0+).

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 90 / 239

Properties of the Laplace transform III

Theorem (Real integration). If the Laplace transform of f(t) is F(s), its integral

D⁻¹f(t) := ∫_0^t f(τ) dτ + D⁻¹f(0+),   where D⁻¹f(0+) := lim_{t→0, t>0} ∫_0^t f(τ) dτ,

is transformable and its Laplace transform is

L[D⁻¹f(t)] = F(s)/s + D⁻¹f(0+)/s.

The transform of the second integral is

L[D⁻²f(t)] = F(s)/s² + D⁻¹f(0+)/s² + D⁻²f(0+)/s.

In general,

L[D⁻ⁿf(t)] = F(s)/sⁿ + D⁻¹f(0+)/sⁿ + · · · + D⁻ⁿf(0+)/s.

In what follows, instead of f(0+) simply f(0) will be used.
Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 91 / 239

Examples

1. L[cos(ωt)] = s/(s² + ω²),  and  sin(ωt) = −(1/ω) cos′(ωt).

Thus

L[sin(ωt)] = −(1/ω) L[cos′(ωt)] = −(1/ω) ( sL[cos(ωt)] − cos(0) ) = −(1/ω) ( s²/(s² + ω²) − 1 ) = ω/(s² + ω²).

2. L[e^{−αt} cos(ωt)] = (s + α)/((s + α)² + ω²).

3. Laplace transformation by MATLAB

>> syms A W t
>> f=sin(W*t);
>> F=laplace(f,t)

F =
W/(W^2 + t^2)

>> g=exp(-A*t)*cos(W*t);
>> G=laplace(g,t)

G =
(A + t)/((A + t)^2 + W^2)

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 92 / 239

Properties of the Laplace transform IV

Theorem (Final value). If f(t) and f′(t) are Laplace transformable, the Laplace transform of f(t) is F(s) and the limit of f(t) as t → ∞ exists, then

lim_{s→0} sF(s) = lim_{t→∞} f(t).

Theorem (Initial value). If f(t) and f′(t) are Laplace transformable, the Laplace transform of f(t) is F(s) and the limit of sF(s) as s → ∞ exists, then

lim_{s→∞} sF(s) = lim_{t→0} f(t).

Theorem (Complex integration). If the Laplace transform of f(t) is F(s) and f(t)/t has a limit as t → 0, t > 0, then

L[ f(t)/t ] = ∫_s^∞ F(u) du.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 93 / 239

Example: series resistor-inductor-capacitor circuit

v(t) or u(t): voltage at time t (V: volt);  I(t): current at time t (A: ampere).

Kirchhoff’s laws
1. In traversing any closed loop, the sum of the voltage rises equals the sum of the voltage drops.
2. The sum of currents entering a junction equals the sum of currents leaving it.

Element         Quantity              Voltage drop
Resistor (R)    Resistance (ohm)      vR = RI
Inductor (L)    Inductance (henry)    vL = L dI/dt =: LDI
Capacitor (C)   Capacitance (farad)   vC = (1/C) ∫_0^t I(τ) dτ + Q0/C =: I/(CD)

Q0: initial value of the charge of the capacitor.
u(t): input voltage.

Kirchhoff’s law: vR(t) + vL(t) + vC(t) = u(t).

Corresponding equation: RI + LDI + I/(CD) = u.

Output y(t): voltage drop at the capacitor, vC(t).
Equation for the output:

RCDy(t) + LCD²y(t) + y(t) = u(t)  ⟺  LC y′′(t) + RC y′(t) + y(t) = u(t).
Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 94 / 239

Application of the Laplace transform

Second-order linear differential equation for the output voltage:

LC y′′(t) + RC y′(t) + y(t) = u(t).

Laplace transform of the equation:

L[LC y′′(t) + RC y′(t) + y(t)] = L[u(t)].

Laplace transforms of the terms:

L[y(t)] = Y(s),   L[RC y′(t)] = RC (sY(s) − y(0)),
L[u(t)] = U(s),   L[LC y′′(t)] = LC (s²Y(s) − sy(0) − y′(0)).

Substitute these values into the equation:

(LCs² + RCs + 1) Y(s) − (LC sy(0) + LC y′(0) + RC y(0)) = U(s).

Hence

Y(s) = U(s)/(LCs² + RCs + 1)  +  (LC sy(0) + LC y′(0) + RC y(0))/(LCs² + RCs + 1),

where the first term contains the system transfer function 1/(LCs² + RCs + 1) and the second term is the initial condition component.

Laplace transform of the solution:

Y(s) = ( U(s) + LC sy(0) + LC y′(0) + RC y(0) ) / ( LCs² + RCs + 1 ).

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 95 / 239

Solution of the equation

Original equation: LC y′′(t) + RC y′(t) + y(t) = u(t).
Laplace transform of the solution:

Y(s) = ( U(s) + LC sy(0) + LC y′(0) + RC y(0) ) / ( LCs² + RCs + 1 ).

The solution y(t) can be found by applying the inverse transform L⁻¹:

y(t) = L⁻¹[Y(s)] = L⁻¹[ ( U(s) + LC sy(0) + LC y′(0) + RC y(0) ) / ( LCs² + RCs + 1 ) ].

After inserting numerical values one can use a table of Laplace transform pairs.

Assume that at t = 0 a direct current source of 1 V is turned on, that is y(0) = y′(0) = 0 and u(t) is the unit step function, u(t) = u_{−1}(t). Then

y(t) = L⁻¹[Y(s)] = L⁻¹[ (1/(LC)) / ( s (s² + (R/L)s + 1/(LC)) ) ] = L⁻¹[ ω² / ( s (s² + 2ζωs + ω²) ) ],

where ω := 1/√(LC) and ζ := (R/2)√(C/L). From the table of Laplace transform pairs:

y(t) = 1 − ( e^{−ζωt} / √(1 − ζ²) ) sin( ω√(1 − ζ²) t + arccos ζ ).
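
The damped-sine step response can be reproduced symbolically with ilaplace; a minimal MATLAB sketch with illustrative (assumed) component values giving an underdamped circuit (ζ < 1):

>> syms s t
>> R = 1; L = 0.5; C = 0.2;                       % illustrative values, not from the lecture
>> Y = (1/(L*C)) / (s*(s^2 + (R/L)*s + 1/(L*C)));
>> y = simplify(ilaplace(Y,s,t));                  % step response y(t)
>> w = 1/sqrt(L*C); z = (R/2)*sqrt(C/L);
>> yref = 1 - exp(-z*w*t)/sqrt(1-z^2)*sin(w*sqrt(1-z^2)*t + acos(z));
>> double(subs(y - yref, t, 1))                    % should be ~0 up to rounding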

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 96 / 239

Convolution integrals

Definition. Let f1, f2 : [0, ∞[ → R be arbitrary integrable functions. Then the convolution of f1 and f2 is defined as

f(t) := f1(t) ∗ f2(t) := ∫_0^∞ f1(τ) f2(t − τ) dτ

(for functions vanishing on ]−∞, 0[ this equals ∫_0^t f1(τ) f2(t − τ) dτ).

Theorem. If f1(t) and f2(t) are transformable with Laplace transforms F1(s) and F2(s), respectively, then

L[f1(t) ∗ f2(t)] = F1(s) · F2(s).
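
A quick symbolic sanity check of the convolution theorem in MATLAB, using f1(t) = e^{−t} and f2(t) = t as an illustrative pair:

>> syms t tau s
>> f1 = exp(-t); f2 = t;
>> conv_tf = int(subs(f1,t,tau)*subs(f2,t,t-tau), tau, 0, t);   % (f1 * f2)(t)
>> lhs = simplify(laplace(conv_tf, t, s));
>> rhs = laplace(f1, t, s) * laplace(f2, t, s);
>> simplify(lhs - rhs)                                          % expected: 0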

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 97 / 239

Inverse transform

In many cases the Laplace transform F(s) of f(t) can be expressed as

F(s) = P(s)/Q(s) = ( a_k s^k + a_{k−1} s^{k−1} + · · · + a_1 s + a_0 ) / ( s^n + b_{n−1} s^{n−1} + · · · + b_1 s + b_0 ),   k < n.

Definition. The poles of F(s) are the roots s1, s2, . . . , sn of Q(s), provided P(si) ≠ 0, i = 1, 2, . . . , n (no common roots of P(s) and Q(s)).

One has to use the partial-fraction expansion of F(s). Four cases:

1. F(s) has first-order real poles.
2. F(s) has multiple-order real poles.
3. F(s) has a pair of complex-conjugate poles.
4. F(s) has repeated pairs of complex-conjugate poles.

Required Laplace transform pair:

L⁻¹[ 1/(s − α)ⁿ ] = t^{n−1} e^{αt} / (n − 1)!.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 98 / 239

First-order real poles

F(s) = P(s)/Q(s) = P(s) / ( (s − s1)(s − s2) · · · (s − sn) ) = A1/(s − s1) + A2/(s − s2) + · · · + An/(s − sn).

f(t) = L⁻¹[F(s)] = A1 e^{s1 t} + A2 e^{s2 t} + · · · + An e^{sn t},   t ≥ 0.

Example. Find the inverse Laplace transform of

F(s) = 10 / ( s³ + 7s² + 10s ).

Solution.

F(s) = 10 / ( s(s + 2)(s + 5) ) = 1/s + (2/3) · 1/(s + 5) − (5/3) · 1/(s + 2),
f(t) = 1 + (2/3) e^{−5t} − (5/3) e^{−2t}.

MATLAB solution

>> syms s
>> F=10/(s^3+7*s^2+10*s)

F =
10/(s^3 + 7*s^2 + 10*s)

>> ilaplace(F,s)

ans =
(2*exp(-5*s))/3 - (5*exp(-2*s))/3 + 1

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 99 / 239

Multiple-order real poles

F(s) = P(s) / ( (s − s1)^{q1} (s − s2)^{q2} · · · (s − sr)^{qr} )
     = A11/(s − s1) + · · · + A1q1/(s − s1)^{q1} + A21/(s − s2) + · · · + A2q2/(s − s2)^{q2} + · · · + Ar1/(s − sr) + · · · + Arqr/(s − sr)^{qr}.

f(t) = L⁻¹[F(s)] = A11 e^{s1 t} + A12 t e^{s1 t} + · · · + A1q1 t^{q1−1} e^{s1 t}/(q1 − 1)!
       + · · · + Ar1 e^{sr t} + Ar2 t e^{sr t} + · · · + Arqr t^{qr−1} e^{sr t}/(qr − 1)!,   t ≥ 0.

Example. Find the inverse Laplace transform of

F(s) = ( s² + s + 1 ) / ( s⁴ + 5s³ + 9s² + 7s + 2 ).

Solution.

F(s) = ( s² + s + 1 ) / ( (s + 1)³(s + 2) ) = 3/(s + 1) − 2/(s + 1)² + 1/(s + 1)³ − 3/(s + 2).

f(t) = 3e^{−t} − 2te^{−t} + (t²/2) e^{−t} − 3e^{−2t},   t ≥ 0.
Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 100 / 239
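
The partial-fraction expansion can also be obtained numerically with MATLAB's residue function; a short sketch for the example above:

>> b = [1 1 1];                 % numerator  s^2 + s + 1
>> a = [1 5 9 7 2];             % denominator s^4 + 5s^3 + 9s^2 + 7s + 2
>> [r,p,k] = residue(b,a)       % residues r and poles p; the repeated pole -1 appears three times,
>>                              % its residues corresponding to 1/(s+1), 1/(s+1)^2, 1/(s+1)^3 in turn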

Pairs of complex-conjugate poles

F(s) = P(s) / ( (s² + 2ζωs + ω²)(s − s3) · · · (s − sn) ) = A1/(s − s1) + A2/(s − s2) + · · · + An/(s − sn),

s1 = −ζω + iω√(1 − ζ²),  s2 = −ζω − iω√(1 − ζ²) ∈ C,   s3, . . . , sn ∈ R.

F(s) = A1/( s + ζω − iω√(1 − ζ²) ) + A2/( s + ζω + iω√(1 − ζ²) ) + A3/(s − s3) + · · · + An/(s − sn).

f(t) = A1 e^{(−ζω + iω√(1 − ζ²))t} + A2 e^{(−ζω − iω√(1 − ζ²))t} + A3 e^{s3 t} + · · · + An e^{sn t}.

A1, A2 ∈ C are complex conjugates;  A3, . . . , An ∈ R.

f(t) = 2|A1| e^{−ζωt} sin( ω√(1 − ζ²) t + ϕ ) + A3 e^{s3 t} + · · · + An e^{sn t}
     = 2|A1| e^{σt} sin( ωd t + ϕ ) + A3 e^{s3 t} + · · · + An e^{sn t},

where σ := −ζω, ωd := ω√(1 − ζ²) and ϕ − π/2 is the angle of A1.
Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 101 / 239

Example

Find the inverse Laplace transform of

F(s) = 36s / ( s³ + 6s² + 21s + 26 ).

Solution.

F(s) = 36s / ( (s² + 4s + 13)(s + 2) ) = 36s / ( (s + 2 − 3i)(s + 2 + 3i)(s + 2) )
     = (4 − 6i)/(s + 2 − 3i) + (4 + 6i)/(s + 2 + 3i) − 8/(s + 2).

f(t) = (4 − 6i) e^{(−2+3i)t} + (4 + 6i) e^{(−2−3i)t} − 8e^{−2t}

= 4e^{−2t}(e^{3it} + e^{−3it}) − 6i e^{−2t}(e^{3it} − e^{−3it}) − 8e^{−2t}   [here e^{3it} + e^{−3it} = 2 cos(3t) and e^{3it} − e^{−3it} = 2i sin(3t)]

= 4e^{−2t}( 2 cos(3t) + 3 sin(3t) − 2 )

= 4√13 e^{−2t}( (2/√13) cos(3t) + (3/√13) sin(3t) ) − 8e^{−2t}

= 4√13 e^{−2t}( sin ϕ cos(3t) + cos ϕ sin(3t) ) − 8e^{−2t} = 4√13 e^{−2t} sin(3t + ϕ) − 8e^{−2t},

where tan ϕ = 2/3, so ϕ ≈ 33.69°.
Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 102 / 239

Imaginary poles

F(s) = P(s) / ( (s² + ω²)(s − s3) · · · (s − sn) ) = A1/(s − s1) + A2/(s − s2) + · · · + An/(s − sn)
     = A1/(s − iω) + A2/(s + iω) + A3/(s − s3) + · · · + An/(s − sn),   s3, . . . , sn ∈ R.

f(t) = A1 e^{iωt} + A2 e^{−iωt} + A3 e^{s3 t} + · · · + An e^{sn t}
     = 2|A1| sin(ωt + ϕ) + A3 e^{s3 t} + · · · + An e^{sn t},   t ≥ 0.

Example. Find the inverse Laplace transform of

F(s) = 20 / ( (s² + 16)(s + 2) ).

MATLAB solution

>> syms s
>> F=20/(s^2+16)/(s+2)

F =
20/((s^2 + 16)*(s + 2))

>> ilaplace(F,s)

ans =
exp(-2*s) - cos(4*s) + sin(4*s)/2

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 103 / 239

Repeated pairs of complex-conjugate poles

Repeated pairs of complex-conjugate poles are treated similarly to multiple-order real poles.

Example. Find the inverse Laplace transform of

F(s) = 324s / ( (s² + 4s + 13)²(s + 2) ) = 324s / ( (s + 2 − 3i)²(s + 2 + 3i)²(s + 2) ).

Solution.

F(s) = (4 − 3i)/(s + 2 − 3i) + (4 + 3i)/(s + 2 + 3i) − (9 + 6i)/(s + 2 − 3i)² − (9 − 6i)/(s + 2 + 3i)² − 8/(s + 2).

f(t) = (4 − 3i)e^{(−2+3i)t} + (4 + 3i)e^{(−2−3i)t} − (9 + 6i)te^{(−2+3i)t} − (9 − 6i)te^{(−2−3i)t} − 8e^{−2t}

= e^{−2t}( (4 − 9t)(e^{3it} + e^{−3it}) − (3 + 6t)i(e^{3it} − e^{−3it}) − 8 )

= 2e^{−2t}( (4 − 9t) cos(3t) + (3 + 6t) sin(3t) − 4 )

= 10e^{−2t}( (4/5) cos(3t) + (3/5) sin(3t) ) − 6√13 te^{−2t}( (3/√13) cos(3t) − (2/√13) sin(3t) ) − 8e^{−2t}

= 10e^{−2t} sin(3t + ϕ) + 6√13 te^{−2t} sin(3t − ψ) − 8e^{−2t},   tan ϕ = 4/3,  tan ψ = 3/2.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 104 / 239

The Fourier transform

Definition. Let f : R → R be an arbitrary function. Then the (exponential) Fourier transform of f(t) is defined by the integral

F[f(t)] := f̂(ω) := ∫_{−∞}^{∞} f(t) e^{iωt} dt

for those values of ω where the integral exists.

Remark. Let f be absolutely integrable, that is

∫_{−∞}^{∞} |f(t)| dt < ∞.

Then

| ∫_{−∞}^{∞} f(t) e^{iωt} dt | ≤ ∫_{−∞}^{∞} |f(t) e^{iωt}| dt = ∫_{−∞}^{∞} |f(t)| dt < ∞,

so the Fourier transform of f exists.
Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 106 / 239

Connection to the Laplace transform

Let F±(s) denote the following Laplace transforms:

F±(s) := ∫_0^∞ f(±t) e^{−st} dt.

Then for the Fourier transform f̂(ω) of f(t) we have

f̂(ω) = F+(−iω) + F−(iω).

Example. Let f(t) := e^{−α|t|}, t ∈ R, α > 0.

f̂(ω) = ∫_{−∞}^{∞} e^{−α|t|} e^{iωt} dt = ∫_{−∞}^{0} e^{(α+iω)t} dt + ∫_0^{∞} e^{(−α+iω)t} dt

= [ e^{(α+iω)t}/(α + iω) ]_{t=−∞}^{t=0} + [ e^{(−α+iω)t}/(−α + iω) ]_{t=0}^{t=∞} = 1/(α + iω) − 1/(−α + iω) = 2α/(α² + ω²).

Corresponding Laplace transforms:

F+(s) = L[e^{−αt}] = 1/(s + α),   F−(s) = L[e^{−αt}] = 1/(s + α).

F+(−iω) + F−(iω) = 1/(−iω + α) + 1/(iω + α) = 2α/(α² + ω²) = f̂(ω).

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 107 / 239

Connection to probability theory

Definition. Let f(t) be the probability density function (PDF) of an absolutely continuous random variable X, that is

P(X ≤ x) = ∫_{−∞}^{x} f(t) dt.

Then the Fourier transform f̂(ω) of f is called the characteristic function of X.

Remark. The characteristic function can be considered as the mean E[e^{iωX}] of e^{iωX}, since

f̂(ω) = ∫_{−∞}^{∞} f(t) e^{iωt} dt = E[e^{iωX}].

Example. φ(t): PDF of the standard normal distribution.

φ(t) := (1/√(2π)) e^{−t²/2},   φ̂(ω) = e^{−ω²/2}.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 108 / 239

Inverse Fourier transform

Theorem. Assume that f is absolutely integrable and the same holds for f̂. Then f is continuous and

f(t) = (1/(2π)) ∫_{−∞}^{∞} f̂(ω) e^{−iωt} dω

for all t, that is, the function is uniquely defined by its Fourier transform.

Remark. The Fourier transformation gives a link between a function f(t) in the time domain and its representation f̂(ω) in the frequency domain.

Theorem (Parseval equality). Assume that both f and f̂ are absolutely integrable. Then

∫_{−∞}^{∞} |f(t)|² dt = (1/(2π)) ∫_{−∞}^{∞} |f̂(ω)|² dω.

Further, if f and g are absolutely integrable functions such that the same holds for the corresponding Fourier transforms f̂ and ĝ, then

∫_{−∞}^{∞} f(t) g(t) dt = (1/(2π)) ∫_{−∞}^{∞} f̂(ω) ĝ(ω)* dω,

where * denotes complex conjugation.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 109 / 239

Physical meaning

f(t): signal in the time domain (e.g. voltage across a resistor).

Total energy contained in f(t), summed across all of time t:

E_f := ∫_{−∞}^{∞} |f(t)|² dt.

Total energy of the Fourier transform f̂(ω), summed across all of its frequency components ω:

(1/(2π)) ∫_{−∞}^{∞} |f̂(ω)|² dω = ∫_{−∞}^{∞} |f̂(2πν)|² dν.

|f̂(ω)|²: energy spectral density of f.

Parseval equality: the total energy contained in a signal f(t) summed across all of time t is equal to the total energy of its Fourier transform f̂(ω) summed across all of its frequency components ω.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 110 / 239

Example

Rectangular function (pulse function) of duration 2T:

f(t) := { 1, |t| ≤ T;  0, |t| > T. }

f̂(ω) = ∫_{−T}^{T} e^{iωt} dt = [ e^{iωt}/(iω) ]_{t=−T}^{t=T} = 2 sin(ωT)/ω.

f̂(ω) = 2 sin(ωT)/ω = 2T sinc(ωT/π),   where sinc(x) := sin(πx)/(πx), x ∈ R.

(Figure: pulse function in the time domain and its Fourier transform in the frequency domain for T = 5.)
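
The formula 2 sin(ωT)/ω can be checked numerically for a fixed frequency; T = 5 and ω = 1.3 below are illustrative choices:

>> T = 5; w = 1.3;
>> Fnum = integral(@(t) exp(1i*w*t), -T, T);    % direct evaluation of the defining integral
>> [Fnum, 2*sin(w*T)/w]                         % imaginary part of Fnum should be ~0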

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 111 / 239

Example

Bandpass filter. Let 0 < w < W and

f̂(ω) = { 1, |ω| ∈ [w, W];  0, otherwise. }

f(t) = sin(Wt)/(πt) − sin(wt)/(πt).

(Figure: bandpass filter in the frequency domain and its inverse Fourier transform in the time domain for w = 1, W = 5.)

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 112 / 239

Properties of the Fourier transform, I

Theorem (Linearity). If α and β are constants or are independent of ω and t, and the Fourier transforms F[f(t)] = f̂(ω) and F[g(t)] = ĝ(ω) of f(t) and g(t), respectively, exist, then

F[αf(t) + βg(t)] = αF[f(t)] + βF[g(t)] = αf̂(ω) + βĝ(ω).

Theorem (Derivative). If f(t) is continuous and piecewise differentiable and f′(t) is absolutely integrable, then

F[f′(t)] = −iω f̂(ω).

Theorem (Translation). If τ, a ∈ R, then

F[f(t − τ)] = e^{iωτ} F[f(t)] = e^{iωτ} f̂(ω),
F[e^{iat} f(t)] = f̂(a + ω).

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 113 / 239

Properties of the Fourier transform, II

Theorem (Multiplication by t). Assume that t f(t) is absolutely integrable. Then

F[t f(t)] = −i d/dω f̂(ω).

Theorem (Similarity). Let 0 ≠ a ∈ R. Then

F[f(t/a)] = |a| f̂(aω).

Theorem (Convolution). Let f(t) be the convolution integral of two absolutely integrable functions f1(t) and f2(t), that is

f(t) := f1(t) ∗ f2(t) := ∫_{−∞}^{∞} f1(τ) f2(t − τ) dτ.

Then

f̂(ω) = (f1 ∗ f2)ˆ(ω) = f̂1(ω) · f̂2(ω).

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 114 / 239

Discrete-time Fourier transform

f[n], n ∈ Z: a discrete-time (digital) signal.

f[n] can be obtained e.g. by sampling a continuous function g(t), t ∈ R, with sample time T, that is f[n] = g(nT).

Definition. The discrete-time Fourier transform of a digital signal f : Z → R is defined as

F(ω) := F{f[n]} := ∑_{n=−∞}^{∞} f[n] e^{iωn},   −π ≤ ω ≤ π.

The corresponding discrete-time inverse Fourier transform is

f[n] = (1/(2π)) ∫_{−π}^{π} F(ω) e^{−iωn} dω.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 115 / 239

Discrete Fourier transform

f[n], n = 0, 1, . . . , N − 1: a finite duration discrete-time signal.

Definition. The discrete Fourier transform of a finite duration discrete-time signal f[n], n = 0, 1, . . . , N − 1, is defined as

F[k] := ∑_{n=0}^{N−1} f[n] e^{ikω₀n},   ω₀ := 2π/N,   k = 0, 1, . . . , N − 1.

The corresponding inverse discrete Fourier transform is

f[n] := (1/N) ∑_{k=0}^{N−1} F[k] e^{−ikω₀n},   n = 0, 1, . . . , N − 1.

Remark. ω₀ = 2π/N is the fundamental frequency (one cycle per sequence, 1/N Hz, 2π/N rad/s). We also consider the harmonics 2ω₀, 3ω₀, . . . , (N−1)ω₀, and the DC component 0 = 0 · ω₀.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 116 / 239

Matrix representation

Discrete Fourier transform of f[n], n = 0, 1, . . . , N − 1:

F[k] := ∑_{n=0}^{N−1} f[n] e^{i(2π/N)kn},   k = 0, 1, . . . , N − 1.

Matrix representation:

( F[0]     )   ( 1   1        1        1        · · ·  1        ) ( f[0]     )
( F[1]     )   ( 1   W        W²       W³       · · ·  W^{N−1}  ) ( f[1]     )
( F[2]     ) = ( 1   W²       W⁴       W⁶       · · ·  W^{N−2}  ) ( f[2]     )
( F[3]     )   ( 1   W³       W⁶       W⁹       · · ·  W^{N−3}  ) ( f[3]     )
(  ⋮       )   ( ⋮    ⋮        ⋮        ⋮               ⋮        ) (  ⋮       )
( F[N − 1] )   ( 1   W^{N−1}  W^{N−2}  W^{N−3}  · · ·  W        ) ( f[N − 1] )

where W := e^{i2π/N} is the Nth complex unit root (powers of W can be reduced modulo N, since W^N = 1).

Fast Fourier transform (FFT) algorithms are based on the special structure of the multiplier matrix.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 117 / 239

Example

Consider the continuous signal:

f(t) = 6 (DC) + 2 cos(2πt − π/2) (1 Hz) + 3 cos(4πt) (2 Hz),   t ≥ 0.

Sample f(t) with frequency 4 Hz (i.e. 4 times a second, sampling time T = 1/4) from t = 0 to t = 3/4 (N = 4 values).

Sampled signal at time points t = nT = n/4, n = 0, 1, 2, 3:

f[n] = 6 + 2 cos(πn/2 − π/2) + 3 cos(πn).

f[0] = 9,  f[1] = 5,  f[2] = 9,  f[3] = 1.

Fourier transform:

F[k] = ∑_{n=0}^{3} f[n] e^{i(π/2)kn},   k = 0, 1, 2, 3.

(Figure: plot of f(t) for 0 ≤ t ≤ 2.5.)

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 118 / 239

Example

Continuous signal: f(t) = 6 + 2 cos(2πt − π/2) + 3 cos(4πt),  t ≥ 0.

Sampled signal: f[n] = 6 + 2 cos(πn/2 − π/2) + 3 cos(πn),  n = 0, 1, 2, 3.

Fourier transform: F[k] = ∑_{n=0}^{3} f[n] e^{i(π/2)kn},  k = 0, 1, 2, 3.

W := e^{iπ/2} = i, so W² = −1, W³ = −i, W⁴ = 1.

Matrix representation:

( F[0] )   ( 1  1   1   1  ) ( f[0] )   ( 1   1   1   1 ) ( 9 )   (  24 )
( F[1] ) = ( 1  W   W²  W³ ) ( f[1] ) = ( 1   i  −1  −i ) ( 5 ) = (  4i )
( F[2] )   ( 1  W²  W⁴  W⁶ ) ( f[2] )   ( 1  −1   1  −1 ) ( 9 )   (  12 )
( F[3] )   ( 1  W³  W⁶  W⁹ ) ( f[3] )   ( 1  −i  −1   i ) ( 1 )   ( −4i )

The magnitudes |F[k]| of the coefficients: 24, 4, 12, 4.
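
The same coefficients can be obtained with MATLAB's fft. Note that fft uses the kernel e^{−i2πkn/N}, the complex conjugate of the definition above, so for this real signal the values are conjugated and the magnitudes coincide:

>> f = [9 5 9 1];
>> F = conj(fft(f))        % conjugation matches the e^{+i...} convention of the slides: 24, 4i, 12, -4i
>> abs(F)                  % magnitudes: 24 4 12 4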

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 119 / 239

z-transform of digital signals

Definition. Let f : Z → C be a discrete-time (digital) signal such that f[n] = 0 for n = −1, −2, . . .. The (unilateral) z-transform of f is defined as

F(z) := Z{f[n]} := ∑_{n=0}^{∞} f[n] z^{−n}

for those values of z ∈ C where the series is convergent.

Remark. The z-transformation is a one-to-one correspondence between f[n] and Z{f[n]}.

Example. Let a ∈ C and f[n] := aⁿ, n = 0, 1, 2, . . ., and f[n] = 0, n = −1, −2, . . ..

Z{aⁿ} = ∑_{n=0}^{∞} aⁿ z^{−n} = ∑_{n=0}^{∞} (a/z)ⁿ = z/(z − a)   (for |z| > |a|).

Special case (unit step function): f[n] ≡ 1 = 1ⁿ =: u[n], n = 0, 1, 2, . . ., and f[n] = 0, n = −1, −2, . . ..

Z{u[n]} = z/(z − 1).

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 121 / 239

Motivation: sampling

f(t): continuous signal.

p(t): sampling pulse train with magnitude 1/γ and period T. The area corresponding to each pulse equals 1.

Sampled function:

f*_p(t) = p(t) · f(t).

As γ → 0, the pulse train tends to the Dirac impulse train (ideal sampler)

δ_T(t) := ∑_{n=−∞}^{∞} δ(t − nT),

where δ(t) is the Dirac delta function.

(Figure: the continuous signal f(t), the pulse train p(t) and the sampled signal f*_p(t).)
Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 122 / 239

Dirac delta function

u_{−1}(t): unit step function.
δ(t, γ): pulse signal with magnitude 1/γ and length γ,

δ(t, γ) := ( u_{−1}(t) − u_{−1}(t − γ) ) / γ.

Delta function:

δ(t) := lim_{γ→0, γ>0} δ(t, γ) = lim_{γ→0, γ>0} ( u_{−1}(t) − u_{−1}(t − γ) ) / γ.

Formally:

δ(t) = { ∞, t = 0;  0, t ≠ 0, }   or   δ(t) = ”u′_{−1}(t)”.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 123 / 239

Properties of the Dirac delta function

f(t): continuous signal.

∫_{−∞}^{∞} f(τ) δ(τ − t0) dτ = ∫_{t1}^{t2} f(τ) δ(τ − t0) dτ = f(t0),   t0 ∈ R,

for all intervals [t1, t2] ⊆ R with t0 ∈ [t1, t2]. Special case:

∫_{−∞}^{∞} δ(τ) dτ = ∫_{t1}^{t2} δ(τ) dτ = { 1, 0 ∈ [t1, t2];  0, otherwise. }

Laplace transform:

F(s, γ) := L[δ(t, γ)] = (1/γ) ( L[u_{−1}(t)] − L[u_{−1}(t − γ)] ) = (1 − e^{−γs}) / (γs).

F(s) := L[δ(t)] = lim_{γ→0, γ>0} F(s, γ) = lim_{γ→0, γ>0} (1 − e^{−γs}) / (γs) = 1.

The Laplace transform of the Dirac delta function: L[δ(t)] = 1.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 124 / 239

Laplace transform of the sampled signal

f(t): continuous signal with f(t) = 0 for t < 0.
δ_T(t): ideal sampler.

Sampled signal:

f*_{δT}(t) = f(t) · δ_T(t) = ∑_{n=0}^{∞} f(t) δ(t − nT).

Laplace transform of the sampled signal:

F*_{δT}(s) := L[f*_{δT}(t)] = ∫_0^∞ f(t) δ_T(t) e^{−st} dt = ∑_{n=0}^{∞} ∫_0^∞ f(t) δ(t − nT) e^{−st} dt = ∑_{n=0}^{∞} f(nT) e^{−snT}.

Let z = e^{sT}, so s = (1/T) ln z. Then

F*_{δT}(z) := F*_{δT}( (1/T) ln z ) = ∑_{n=0}^{∞} f(nT) z^{−n} = Z{f[nT]}.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 125 / 239

Properties of the z-transform, I

Theorem. Let f, g : Z → C be digital signals with f[n] = g[n] = 0 for n = −1, −2, . . ., and denote by F(z) and G(z) the corresponding z-transforms.

(Linearity) If α and β are constants or are independent of z and n, then

Z{αf[n] + βg[n]} = αZ{f[n]} + βZ{g[n]} = αF(z) + βG(z).

(Convolution) Let h[k] denote the convolution of f[k] and g[k], that is, for k = 0, 1, 2, . . . we have

h[k] = (f ∗ g)[k] := ∑_{ℓ=−∞}^{∞} f[ℓ] g[k − ℓ] = ∑_{ℓ=0}^{k} f[ℓ] g[k − ℓ].

Then

H(z) := Z{h[k]} = Z{f[k]} · Z{g[k]} = F(z) · G(z).

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 126 / 239

Properties of the z-transform, II

Theorem. Let f : Z → C be a digital signal with f[n] = 0 for n = −1, −2, . . ., and denote by F(z) the z-transform of f.

(Delay property)

Z{f[n − 1]} = z^{−1} Z{f[n]} = z^{−1} F(z).

In general, for k ∈ N we have

Z{f[n − k]} = z^{−k} Z{f[n]} = z^{−k} F(z).

(Advance property)

Z{f[n + 1]} = z Z{f[n]} − z f[0] = z F(z) − z f[0].

In general, for k ∈ N we have

Z{f[n + k]} = z^k Z{f[n]} − ∑_{ℓ=0}^{k−1} f[ℓ] z^{k−ℓ} = z^k F(z) − ∑_{ℓ=0}^{k−1} f[ℓ] z^{k−ℓ}.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 127 / 239

Examples

1. The discrete Dirac function or Kronecker delta

δ0[n] := { 1, n = 0;  0, n ≠ 0. }

The corresponding z-transform:

Z{δ0[n]} = ∑_{n=0}^{∞} δ0[n] z^{−n} = 1.

2. Find the z-transform of f[n] := n aⁿ, n = 0, 1, 2, . . .,  0 ≠ a ∈ C.

Solution.

Z{aⁿ} = ∑_{n=0}^{∞} aⁿ z^{−n} = z/(z − a).

By taking the derivatives of both sides with respect to z one has

∑_{n=1}^{∞} −n aⁿ z^{−n−1} = −a/(z − a)²   ⟺   ∑_{n=1}^{∞} n aⁿ z^{−n} = za/(z − a)².

Thus

Z{n aⁿ} = za/(z − a)².

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 128 / 239

Examples

1. Let 0 ≠ a ∈ C, 0 ≤ k ∈ Z, and f[n] := C(n, k) aⁿ, where C(n, k) denotes the binomial coefficient “n choose k”.

Z{f[n]} = Z{ C(n, k) aⁿ } = z a^k / (z − a)^{k+1}.

Special case: a = 1.

Z{ C(n, k) } = z / (z − 1)^{k+1}.

More special cases (k = 1, 2):

Z{n} = z/(z − 1)²,   Z{ n(n − 1)/2 } = z/(z − 1)³.

Hence

Z{n²} = 2z/(z − 1)³ + z/(z − 1)² = z(z + 1)/(z − 1)³.

2. z-transformation by MATLAB

>> syms n k a z
>> f=nchoosek(n,k)*a^n;
>> F=ztrans(f,n,z)

F =
piecewise([k == 0, z/(a*(z/a - 1))], [0 < k, z/(a*(z/a - 1)^(k + 1))], [k < 0, 0])

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 129 / 239

Example: the Fibonacci sequence

Fibonacci sequence: every number after the first two is the sum of the two preceding ones.

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, . . .

f[n]: the n-th term of the Fibonacci sequence.
The recurrence equation defining the sequence:

f[n + 2] = f[n + 1] + f[n],   f[0] = f[1] = 1.

The solution of the above equation gives the closed form of f[n].

z-transform of the equation:

Z{f[n + 2]} = Z{f[n + 1] + f[n]}   ⟺   Z{f[n + 2]} = Z{f[n + 1]} + Z{f[n]}.

Let F(z) := Z{f[n]}. z-transforms of the other components:

Z{f[n + 1]} = zF(z) − zf[0] = zF(z) − z;
Z{f[n + 2]} = z²F(z) − z²f[0] − zf[1] = z²F(z) − z² − z.

Substitute the expressions into the equation:

z²F(z) − z² − z = zF(z) − z + F(z)   ⟺   (z² − z − 1) F(z) = z².

z-transform of the solution:

F(z) = z² / (z² − z − 1).

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 130 / 239

Example: solution of the Fibonacci recurrence equation

Recurrence equation:

f[n + 2] = f[n + 1] + f[n],   f[0] = f[1] = 1.

z-transform Z{f[n]} =: F(z) of the solution:

F(z) = z² / (z² − z − 1) = z · z / ( (z − (1+√5)/2)(z − (1−√5)/2) ).

Partial fraction decomposition:

F(z) = z ( ((√5 + 1)/(2√5)) · 1/(z − (1+√5)/2) + ((√5 − 1)/(2√5)) · 1/(z − (1−√5)/2) )

     = ((√5 + 1)/(2√5)) · z/(z − (1+√5)/2) + ((√5 − 1)/(2√5)) · z/(z − (1−√5)/2).

As Z{aⁿ} = z/(z − a) for a ∈ C, we have

F(z) = ((√5 + 1)/(2√5)) Z{ ((1 + √5)/2)ⁿ } + ((√5 − 1)/(2√5)) Z{ ((1 − √5)/2)ⁿ }.

The solution is

f[n] = ((√5 + 1)/(2√5)) ((1 + √5)/2)ⁿ + ((√5 − 1)/(2√5)) ((1 − √5)/2)ⁿ.
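
The closed form is easy to check against the first terms of the sequence; a short MATLAB sketch:

>> phi = (1+sqrt(5))/2; psi = (1-sqrt(5))/2;
>> fib = @(n) (sqrt(5)+1)/(2*sqrt(5))*phi.^n + (sqrt(5)-1)/(2*sqrt(5))*psi.^n;
>> round(fib(0:12))       % 1 1 2 3 5 8 13 21 34 55 89 144 233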

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 131 / 239

Linear difference equations with constant coefficients

The Fibonacci recurrence equation

f[n + 2] − f[n + 1] − f[n] = 0

is a special linear difference equation with constant coefficients.

General form of a linear difference equation of order k with constant coefficients:

a_k f[n + k] + a_{k−1} f[n + k − 1] + · · · + a_1 f[n + 1] + a_0 f[n] = g[n],

where a_0, a_1, . . . , a_k ∈ C, a_k ≠ 0, and g[n] is a given digital signal.

Homogeneous equation: g[n] ≡ 0.

Initial conditions:

f[0] = b_0,  f[1] = b_1,  . . . ,  f[k − 1] = b_{k−1},   b_0, b_1, . . . , b_{k−1} ∈ C.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 132 / 239

General solution

Equation: a_k f[n+k] + a_{k−1} f[n+k−1] + · · · + a_1 f[n+1] + a_0 f[n] = g[n].
Initial conditions: f[0] = b_0, f[1] = b_1, . . . , f[k − 1] = b_{k−1}.

z-transform of the equation:

a_k Z{f[n+k]} + a_{k−1} Z{f[n+k−1]} + · · · + a_0 Z{f[n]} = Z{g[n]}.

Let F(z) := Z{f[n]} and G(z) := Z{g[n]}. z-transforms of the components:

Z{f[n + j]} = z^j Z{f[n]} − ∑_{ℓ=0}^{j−1} z^{j−ℓ} f[ℓ] = z^j F(z) − ∑_{ℓ=0}^{j−1} z^{j−ℓ} b_ℓ,   j = 1, 2, . . . , k.

The resulting form of the transformed equation:

F(z) ( ∑_{j=0}^{k} a_j z^j ) − ∑_{j=1}^{k} a_j ∑_{ℓ=0}^{j−1} z^{j−ℓ} b_ℓ = G(z);

F(z) ( ∑_{j=0}^{k} a_j z^j ) − z ∑_{ℓ=0}^{k−1} ( ∑_{j=ℓ+1}^{k} a_j b_{j−ℓ−1} ) z^ℓ = G(z).

z-transform of the solution:

F(z) = ( ∑_{j=0}^{k} a_j z^j )^{−1} ( G(z) + z ∑_{ℓ=0}^{k−1} ( ∑_{j=ℓ+1}^{k} a_j b_{j−ℓ−1} ) z^ℓ ).

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 133 / 239

Example

Solve the following difference equation:

f[n + 2] − 2f[n + 1] − 3f[n] = 6n + 6,   f[0] = 0, f[1] = 1.

Solution. Let F(z) := Z{f[n]}. The z-transform of g[n] = 6n + 6 is

G(z) := Z{g[n]} = 6 ( z/(z − 1)² + z/(z − 1) ) = 6z²/(z − 1)².

Further,

Z{f[n + 1]} = zF(z) − zf[0] = zF(z);
Z{f[n + 2]} = z²F(z) − z²f[0] − zf[1] = z²F(z) − z.

The transformed equation:

z²F(z) − z − 2zF(z) − 3F(z) = 6z²/(z − 1)²   ⟺   (z² − 2z − 3) F(z) = z + 6z²/(z − 1)².

z-transform of the solution:

F(z) = ( z(z − 1)² + 6z² ) / ( (z² − 2z − 3)(z − 1)² ) = z (z² + 4z + 1) / ( (z − 3)(z + 1)(z − 1)² ).

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 134 / 239

Example

The difference equation to be solved:

f[n + 2] − 2f[n + 1] − 3f[n] = 6n + 6,   f[0] = 0, f[1] = 1.

Partial fraction decomposition of the z-transform of the solution:

F(z) = z (z² + 4z + 1) / ( (z − 3)(z + 1)(z − 1)² ) = (11/8) z/(z − 3) + (1/8) z/(z + 1) − (3/2) z/(z − 1) − (3/2) z/(z − 1)².

As

Z{n} = z/(z − 1)²   and   Z{aⁿ} = z/(z − a),  a ∈ C,

the solution is

f[n] = (11/8) 3ⁿ + (1/8) (−1)ⁿ − 3/2 − (3/2) n.
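
The inverse z-transform can also be computed symbolically with iztrans; a minimal MATLAB sketch (Symbolic Math Toolbox assumed):

>> syms z n
>> F = z*(z^2 + 4*z + 1)/((z - 3)*(z + 1)*(z - 1)^2);
>> f = simplify(iztrans(F, z, n))     % expected to match (11/8)*3^n + (-1)^n/8 - 3/2 - (3/2)*n
>> subs(f, n, 0:4)                    % 0, 1, 8, 31, 104, consistent with the recurrence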

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 135 / 239

Fundamental notions of information theory

Source alphabet: a finite set X = {x1, x2, . . . , xn}, n ≥ 2. Elements of the source alphabet are called source symbols. They can be considered as values of a discrete random variable X called the source.

X*: the set of strings of symbols from X. Elements of X* are called source messages.

Code alphabet: a finite set Y = {y1, y2, . . . , ys}, s ≥ 2. Elements of Y are called code symbols.

Y*: the set of strings of symbols from Y. Elements of Y* are called code messages.

Encoding or code: a mapping f : X → Y*. s = 2: binary code.

The range K = f(X) of the mapping f is also referred to as the code. Elements of K are called codewords.

A code f is called a variable-length code if the corresponding codewords are of different lengths.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 137 / 239

Uniquely decodable codes

Definition. A code f : X → Y* is uniquely decodable if for all u, v ∈ X*, where u = u1u2 . . . uk, v = v1v2 . . . vℓ and u ≠ v, one has

f(u1)f(u2) . . . f(uk) ≠ f(v1)f(v2) . . . f(vℓ).

In other words, any encoded string in a uniquely decodable code has only one possible source string producing it.

Examples.
1. X = {a, b, c}, Y = {0, 1} and f(a) = 1, f(b) = 01, f(c) = 10110.
The encoder f is non-singular, that is, every element of X maps into a different string in Y*, but the code is not uniquely decodable. For instance, f(c)f(a) = 101101 = f(a)f(b)f(a)f(b).
2. X = {a, b, c}, Y = {0, 1} and f(a) = 1, f(b) = 10, f(c) = 100.
The code is uniquely decodable as 1 always indicates the first bit of a new codeword.
3. X = {a, b, c}, Y = {0, 1} and f(a) = 1, f(b) = 00, f(c) = 01.
The code is uniquely decodable.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 138 / 239

Prefix codes

Definition. A code f is called a prefix code or an instantaneous code if no codeword is a prefix of any other codeword.

Remarks.
1. Prefix codes are uniquely decodable.
2. A code with codewords of fixed length is uniquely decodable if all codewords are different.

Examples.
1. X = {a, b, c}, Y = {0, 1} and f(a) = 1, f(b) = 00, f(c) = 01.
The code is prefix.
2. X = {a, b, c}, Y = {0, 1} and f(a) = 1, f(b) = 10, f(c) = 100.
The code is not prefix, but it is uniquely decodable.
3. X = {a, b, c, d, e, f, g}, Y = {0, 1, 2} and f(a) = 0, f(b) = 10, f(c) = 11, f(d) = 20, f(e) = 21, f(f) = 220, f(g) = 221.
The code is prefix.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 139 / 239

Code trees

Prefix codes can be represented by s-ary trees where the branches of the tree represent the symbols of the corresponding codewords. Each codeword is then represented by a leaf of the tree, and the path from the root traces out the symbols of the codeword.

For binary codes one deals with binary trees where, e.g., 0 corresponds to a branch going “up”, whereas 1 corresponds to a branch going “down”.

Example. Give the corresponding code trees.
1. Y = {0, 1}, K = {0, 100, 1010, 1011, 110, 111}.
2. Y = {0, 1, 2}, K = {0, 10, 11, 20, 21, 220, 221}.

Codeword lengths

|f(x)|: codeword length of the code f(x) of the source symbol x ∈ X. In what follows, L denotes the set of codeword lengths of a code f.

Codeword lengths cannot be arbitrary. E.g. there is no uniquely decodable binary code of a source alphabet of 4 letters having codeword lengths {1, 2, 2, 2}.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 140 / 239

McMillan and Kraft inequalities

Theorem (McMillan). For any uniquely decodable code f : X → Y* over an alphabet of size s, the inequality

∑_{i=1}^{n} s^{−|f(xi)|} ≤ 1

holds.

Theorem (Kraft). If the positive integers L1, L2, . . . , Ln satisfy

∑_{i=1}^{n} s^{−Li} ≤ 1,

then there exists a prefix code f such that |f(xi)| = Li, i = 1, 2, . . . , n.

Remark. The McMillan and Kraft inequalities imply that for any uniquely decodable code there exists a prefix code having the same codeword lengths. Thus, it suffices to consider only prefix codes.
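
The inequality is easy to check numerically. For instance, the codeword lengths {1, 2, 2, 2} mentioned earlier violate it, while the lengths {1, 2, 3} of the code {1, 10, 100} satisfy it; in MATLAB:

>> s = 2;
>> sum(s.^(-[1 2 2 2]))     % 1.25 > 1: no uniquely decodable binary code with these lengths
>> sum(s.^(-[1 2 3]))       % 0.875 <= 1: a prefix code with these lengths exists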

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 141 / 239

Measure of information I.

Hartley (1928): the identification of a particular element of a finite set X of n elements requires

I = log2 n

amount of information.

Heuristics. If n = 2^k, then the elements of X can be represented by binary sequences of length k = log2 n. If log2 n ∉ Z, then the number of required binary digits is the smallest integer not smaller than log2 n (⌈log2 n⌉). The binary representation of a block of elements of X of length m (the number of such blocks is n^m) requires a length k satisfying 2^{k−1} < n^m ≤ 2^k. Thus, the average length K = k/m of the representation of a single symbol of X satisfies log2 n < K ≤ log2 n + 1/m. In this way, the lower bound log2 n can be approximated arbitrarily closely.

The formula defines the information content as the lower bound of the length of binary representations. The information content is measured in bits. The identification of the symbols of a set of two elements requires 1 bit of information.

Problem: Hartley assumes that all elements of X are equally likely.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 142 / 239

Measure of information II.

Shannon (1948): the amount of information provided by the occurrence of an event A with probability P(A) equals

I(A) = log2 (1/P(A)) = − log2 P(A).

Heuristics. Requirements on the amount of information I(A):

If P(A) ≤ P(B), then I(A) ≥ I(B).
Corollary: I(A) depends only on the probability P(A), that is I(A) = g(P(A)).

In case of the mutual occurrence of two independent events the amounts of information should be added, that is, if P(A · B) = P(A)P(B), then I(A · B) = I(A) + I(B). Hence, g(p · q) = g(p) + g(q), p, q ∈ ]0, 1].

If P(A) = 1/2, then I(A) := 1, that is g(1/2) = 1.

Theorem. If g : [0, 1] → R is a function satisfying
a) g(p) ≥ g(q), if 0 < p ≤ q ≤ 1;
b) g(p · q) = g(p) + g(q), p, q ∈ ]0, 1];
c) g(1/2) = 1,
then

g(p) = log2 (1/p),   p ∈ ]0, 1].

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 143 / 239

Measure of information III.

Connection between the two definitions:
If all elements of X are equally likely (occur with probability 1/n), then each element provides log2 n information.

Remark. In what follows, for a ≥ 0 and b > 0 we use the conventions

0 log2 (0/a) = 0 log2 (a/0) = 0;   b log2 (b/0) = +∞;   b log2 (0/b) = −∞.

X: a discrete random variable with alphabet X.
p(x): the probability of the source symbol x ∈ X, that is

p(x) := P(X = x),   x ∈ X.

The average amount of information provided by a single value of X is

∑_{i=1}^{n} p(xi) I(X = xi) = − ∑_{i=1}^{n} p(xi) log2 p(xi) = E( − log2 p(X) ).

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 144 / 239

Entropy, average codeword length

Definition. The entropy H(X) of a discrete random variable X with range (alphabet) X = {x1, x2, . . . , xn} is defined as

H(X) := E( − log2 p(X) ) = − ∑_{i=1}^{n} p(xi) log2 p(xi).

Remark. The same formula defines the entropy H(X) of a source alphabet X with distribution

P := { p(x1), p(x2), . . . , p(xn) },

that is

H(X) := − ∑_{i=1}^{n} p(xi) log2 p(xi).

Definition. The average codeword length E(f) of a code f : X → Y* is defined as

E(f) := E|f(X)| = ∑_{i=1}^{n} p(xi) |f(xi)|.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 145 / 239

Examples

1. X = {a, b, c}, Y = {0, 1}, and f(a) = 1, f(b) = 00, f(c) = 01.
Distribution: p(a) = 0.6, p(b) = 0.3, p(c) = 0.1.
Short notation: s = 2, K = {1, 00, 01}, L = {1, 2, 2}, P = {0.6, 0.3, 0.1}.

H(X) = −0.6 · log2 0.6 − 0.3 · log2 0.3 − 0.1 · log2 0.1 ≈ 1.295;
E(f) = 0.6 · 1 + 0.3 · 2 + 0.1 · 2 = 1.4.

2. s = 2, L = {1, 3, 3, 3, 4, 4}, P = {1/2, 1/8, 1/8, 1/8, 1/16, 1/16}.

H(X) = −(1/2) · log2 (1/2) − 3 · (1/8) · log2 (1/8) − 2 · (1/16) · log2 (1/16) = 2.125;
E(f) = (1/2) · 1 + 3 · (1/8) · 3 + 2 · (1/16) · 4 = 2.125.

Aim: to determine the lower bound of the average codeword length, as the shorter the average codeword length, the better the code.

Find the code f minimizing the function

E(f) = ∑_{i=1}^{n} p(xi) |f(xi)|   given   ∑_{i=1}^{n} s^{−|f(xi)|} ≤ 1.
Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 146 / 239
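
Both quantities are easy to compute numerically; a short MATLAB sketch for Example 1:

>> P = [0.6 0.3 0.1]; L = [1 2 2];
>> H = -sum(P.*log2(P))      % entropy, approx. 1.2955 bits
>> Ef = sum(P.*L)            % average codeword length, 1.4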

Bounds of the average codeword length

Theorem (Shannon’s noiseless coding theorem). For any uniquely decodable code f : X → Y* we have

E(f) = ∑_{i=1}^{n} p(xi) |f(xi)| ≥ − ∑_{i=1}^{n} p(xi) log_s p(xi) = H(X)/log2 s,

where equality holds if and only if p(xi) = s^{−|f(xi)|}, i = 1, 2, . . . , n.

If p(xi) = s^{−Li}, where Li ∈ N, then there exists a prefix code f such that |f(xi)| = Li, i = 1, 2, . . . , n, and

E(f) = H(X)/log2 s.

For any distribution of the source alphabet X there exists a prefix code f : X → Y* such that

E(f) < H(X)/log2 s + 1.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 147 / 239

Block codes

Source messages are split into blocks of length m and then these blocks are encoded.

Formal definition: a mapping f : X^m → Y*.

Block encoding is the encoding of the source alphabet X^m.

Source block: a random vector X = (X1, X2, . . . , Xm). Distribution of X:

p(x) = p(x1, x2, . . . , xm) = P(X1 = x1, X2 = x2, . . . , Xm = xm).

Entropy:

H(X) = − ∑_{x∈X^m} p(x) log2 p(x).

If X1, X2, . . . , Xm are independent, then H(X) = ∑_{i=1}^{m} H(Xi).
If X1, X2, . . . , Xm are independent and identically distributed (i.i.d.), then H(X) = mH(X1).

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 148 / 239

Average codeword length per symbol

The average codeword length per symbol of a code f : X^m → Y* of an m-dimensional source X is

(1/m) E|f(X)| = (1/m) ∑_{x∈X^m} p(x) |f(x)|.

Shannon’s theorem:

E|f(X)| ≥ H(X)/log2 s.

Corollary. If X1, . . . , Xm are independent random variables distributed as X, then there exists a prefix code f : X^m → Y* such that

(1/m) E|f(X)| < H(X)/log2 s + 1/m.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 149 / 239

Optimal codes

Binary case: s = 2.

Theorem. If a prefix code f : X → {0, 1}* is optimal and the probability masses of the symbols of X are given in descending order, that is p(x1) ≥ p(x2) ≥ · · · ≥ p(xn) > 0, then one may assume that f satisfies the following properties.

a) |f(x1)| ≤ |f(x2)| ≤ · · · ≤ |f(xn)|, that is, codeword lengths are ordered inversely with the probabilities.
b) |f(x_{n−1})| = |f(xn)|, that is, the two longest codewords have the same length.
c) Two of the longest codewords differ only in the last bit (siblings) and correspond to the two least likely symbols.

Heuristics. a) If p(xk) > p(xj) and |f(xk)| > |f(xj)|, then swapping the codewords of xj and xk results in a code with shorter average codeword length. Thus, the original code cannot be optimal.
b) If |f(x_{n−1})| < |f(xn)|, then by deleting the last bit of f(xn) we obtain a code with shorter average codeword length, which remains prefix.
c) If there exists a codeword f(xi) such that f(xi) and f(xn) differ only in the last bit, then |f(xi)| = |f(x_{n−1})| = |f(xn)|. If i ≠ n−1, then swap the codes of xi and x_{n−1}. □

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 150 / 239

Binary Huffman code

Theorem. Assume that the symbols {x1, x2, ..., xn} of the source alphabet X are numbered so that p(x1) ≥ p(x2) ≥ ··· ≥ p(xn) > 0. Combine xn−1 and xn into a new symbol x̃n−1 with probability p(x̃n−1) = p(xn−1) + p(xn) and consider the reduced source alphabet X̃ = {x1, x2, ..., xn−2, x̃n−1}. If g is an optimal prefix code of the reduced source alphabet X̃ with distribution {p(x1), p(x2), ..., p(xn−2), p(xn−1)+p(xn)}, then an optimal prefix code f of the original source X with distribution {p(x1), p(x2), ..., p(xn)} can be obtained by appending 0 and 1 to the codeword g(x̃n−1) and leaving the other codewords unchanged.

Example. Find the binary Huffman codes corresponding to the following distributions. Examine the deviation of the average codeword length from the theoretical lower bound.

1. P1 = {0.68, 0.17, 0.04, 0.04, 0.03, 0.03, 0.01}.
2. P2 = {0.49, 0.14, 0.14, 0.07, 0.07, 0.04, 0.02, 0.02, 0.01}.
3. P3 = {0.15, 0.15, 0.14, 0.14, 0.14, 0.14, 0.14}.
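
The reduction step above can be carried out with a priority queue. The following Python sketch is our own illustration (function and variable names are ours, not part of the lecture); it builds a binary Huffman code for P1 and compares the average codeword length with the entropy lower bound.

import heapq, math

def huffman_code(probs):
    # heap items: (probability, tie-breaker, list of symbol indices)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    code = {i: "" for i in range(len(probs))}
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)        # two least likely "symbols"
        p2, _, s2 = heapq.heappop(heap)
        for i in s1: code[i] = "0" + code[i]   # prepend one more bit
        for i in s2: code[i] = "1" + code[i]
        heapq.heappush(heap, (p1 + p2, min(s1 + s2), s1 + s2))
    return code

P1 = [0.68, 0.17, 0.04, 0.04, 0.03, 0.03, 0.01]
code = huffman_code(P1)
avg = sum(p * len(code[i]) for i, p in enumerate(P1))
H = -sum(p * math.log2(p) for p in P1)
print(code, avg, H)   # average length lies between H(X) and H(X) + 1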

Binary Shannon-Fano code

Assume that the symbols of the source alphabet X = {x1, x2, ..., xn} are numbered so that p(x1) ≥ p(x2) ≥ ··· ≥ p(xn) > 0. Let

Li := ⌈−log_s p(xi)⌉, where ⌈a⌉ := min{n ∈ Z : n ≥ a},

and

w1 := 0,  wi := ∑_{ℓ=1}^{i−1} p(xℓ),  i = 2, 3, ..., n.

Let the codeword f(xi) of the source symbol xi be the binary representation of ⌊2^{Li} wi⌋ on Li bits, where ⌊a⌋ := max{n ∈ Z : n ≤ a}.

Theorem. The binary Shannon-Fano code is prefix and the average codeword length satisfies E(f) ≤ H(X) + 1.

Example. Find the binary Shannon-Fano codes corresponding to the following distributions.

1. P1 = {0.68, 0.17, 0.04, 0.04, 0.03, 0.03, 0.01}.
2. P2 = {0.49, 0.14, 0.14, 0.07, 0.07, 0.04, 0.02, 0.02, 0.01}.
3. P3 = {0.15, 0.15, 0.14, 0.14, 0.14, 0.14, 0.14}.
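
A minimal Python sketch of the construction above, assuming s = 2 and that the probabilities are already sorted in descending order (the helper name is ours): each codeword is the first Li bits of the cumulative probability wi.

import math

def shannon_fano(probs):              # probs sorted in descending order
    codes, w = [], 0.0
    for p in probs:
        L = math.ceil(-math.log2(p))                              # codeword length L_i
        codes.append(format(int(w * 2**L), "0{}b".format(L)))     # first L bits of w_i
        w += p                                                    # next cumulative probability
    return codes

P1 = [0.68, 0.17, 0.04, 0.04, 0.03, 0.03, 0.01]
f = shannon_fano(P1)
print(f)
print(sum(p * len(c) for p, c in zip(P1, f)))   # E(f) <= H(X) + 1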

Shannon entropy

Definition. The entropy of a discrete random variable X with range (alphabet) X = {x1, x2, ..., xn} is defined as

H(X) := −∑_{i=1}^n p(xi) log2 p(xi).

Remarks. Entropy is
the amount of information required to determine the value of X;
the level of uncertainty conveyed in the value of X.

Entropy has the same definition for random vectors X = (X1, X2, ..., Xr)⊤ with range X = {x1, x2, ..., xn}, namely

H(X) := −∑_{i=1}^n p(xi) log2 p(xi).

Properties of entropy

X ∈ X, Y ∈ Y: discrete random variables.

Theorem.
a) If the number of elements in the range of X is n, then

0 ≤ H(X) ≤ log2 n.

Equality holds on the left-hand side if and only if X is constant with probability one, whereas the necessary and sufficient condition for equality on the right-hand side is that X is uniformly distributed, that is p(xi) = 1/n, i = 1, 2, ..., n.

b) For discrete random variables X and Y we have

H(X,Y) ≤ H(X) + H(Y),

and equality holds if and only if X and Y are independent.

c) For any function g(X) of X we have

H(g(X)) ≤ H(X),

with equality if and only if g is one-to-one.

Conditional entropy

p(x) := P(X=x);  p(y) := P(Y=y);  p(x, y) := P(X=x, Y=y);

p(x|y) := P(X=x | Y=y) = p(x, y)/p(y);   p(y|x) := P(Y=y | X=x) = p(y, x)/p(x).

Definition. The conditional entropy of X given Y = y is defined as

H(X|Y = y) := −∑_{x∈X} p(x|y) log2 p(x|y).

The conditional entropy of X given Y is

H(X|Y) := ∑_{y∈Y} p(y) H(X|Y = y) = −∑_{y∈Y} ∑_{x∈X} p(x, y) log2 p(x|y).

Properties of conditional entropy

Theorem. Let X, Y and Z be discrete random variables with finite ranges. Then

a) H(X,Y) = H(Y) + H(X|Y) = H(X) + H(Y|X).

b) 0 ≤ H(X|Y) ≤ H(X).
Equality holds on the left-hand side if and only if X is uniquely determined by Y with probability one, whereas the necessary and sufficient condition for equality on the right-hand side is the independence of X and Y.

c) H(X|Z,Y) ≤ H(X|Z), with equality if and only if p(x|z, y) = p(x|z) for all x, y, z where p(x, y, z) > 0.

Properties of conditional entropy

d) For any function f(Y) of Y we have

H(X|Y) ≤ H(X | f(Y)),

with equality if and only if for all fixed z and all values x and y satisfying f(y) = z and p(y) > 0,

p(x|y) = P(X = x | f(Y) = z).

e) The joint entropy of random variables X1, X2, ..., Xn satisfies

H(X1, X2, ..., Xn) = H(X1) + H(X2|X1) + H(X3|X2, X1) + ··· + H(Xn|Xn−1, ..., X1).

Mutual information

Definition. The mutual information of discrete random variables X and Y is defined as

I(X;Y) := H(X) + H(Y) − H(X,Y).

Remark. Mutual information is symmetric and

I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) = I(Y;X).

Remark.

I(X;Y) = ∑_{x,y} p(x, y) log2 [p(x, y)/(p(x)p(y))] = ∑_{x,y} p(x, y) log2 [p(x|y)/p(x)] = ∑_{x,y} p(x, y) log2 [p(y|x)/p(y)].
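
As a numerical illustration of the definitions above, the following Python sketch (the joint pmf is an arbitrary example of ours, not from the lecture) computes H(X), H(Y), H(X|Y) and I(X;Y) from a joint distribution.

import numpy as np

# joint pmf p(x, y) of two binary variables (illustrative values)
p_xy = np.array([[0.30, 0.20],
                 [0.10, 0.40]])

p_x = p_xy.sum(axis=1)        # marginal of X
p_y = p_xy.sum(axis=0)        # marginal of Y

def H(p):                     # entropy of a pmf given as an array
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_X, H_Y, H_XY = H(p_x), H(p_y), H(p_xy.ravel())
print("H(X|Y) =", H_XY - H_Y)          # chain rule: H(X,Y) = H(Y) + H(X|Y)
print("I(X;Y) =", H_X + H_Y - H_XY)    # definition of mutual information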

Properties of mutual information

Theorem. Let X and Y be discrete random variables.
a) I(X;Y) ≥ 0, and I(X;Y) equals 0 if and only if X and Y are independent.
b) I(X;X) = H(X).
c) I(X;Y) ≤ H(X) and I(X;Y) ≤ H(Y).
d) For any functions f and g of X and Y, respectively, we have I(X;Y) ≥ I(f(X); g(Y)).
e) The following three statements are equivalent:
i) I(X;Y) = H(X);
ii) H(X|Y) = 0;
iii) there exists a function g : R → R such that P(X = g(Y)) = 1.

Noiseless channels

Given an information channel which is able to transmit the symbols of the code alphabet Y = {y1, y2, ..., ys}.

X: input signal at the entrance of the channel.
Y: output signal at the exit of the channel, corresponding to X.

We assume that the channel is memoryless, that is, Y depends only on X.

The output signal Y contains I(X;Y) bits of information about the input value X.

Noiseless channel: X = Y, so we have I(X;Y) = H(X).

The maximal value of H(X) is log2 s, that is, a single code symbol can carry at most that much information, and over a noiseless channel all of it is transmitted. Thus, with a single symbol a noiseless channel can transmit at most C = log2 s bits of information, which is the information capacity of the channel.

Channel capacity

Noisy channel: X ≠ Y, so I(X;Y) < H(X).
The behaviour of the channel can be described with the transition probabilities

p_{i|j} := P(Y = yi | X = yj),  i, j = 1, 2, ..., s.

Distribution of the input signal X: qj := P(X = yj), j = 1, 2, ..., s.

The channel capacity of a memoryless information channel is

C := sup I(X;Y),

where the supremum is taken over all possible distributions of X.

Example. Noiseless channel:

p_{i|j} = 1 if i = j, and p_{i|j} = 0 if i ≠ j.

I(X;Y) = H(X) ≤ log2 s, with equality if and only if X is uniformly distributed, that is qj = 1/s, j = 1, 2, ..., s. Capacity: C = log2 s.

Memoryless binary symmetric channel

[Transition diagram: input X ∈ {0, 1} with P(X = 1) = q, output Y; each bit is transmitted correctly with probability 1 − p and flipped with probability p.]

Y = {0, 1}, that is s = 2. Let p, q ∈ [0, 1].

Distribution of the input signal: P(X = 1) = q, P(X = 0) = 1 − q.

Transition probabilities:
p_{0|0} := P(Y = 0 | X = 0) = 1 − p,  p_{1|0} := P(Y = 1 | X = 0) = p,
p_{1|1} := P(Y = 1 | X = 1) = 1 − p,  p_{0|1} := P(Y = 0 | X = 1) = p.

Capacity of the memoryless binary symmetric channel (BSCp):

C = 1 − H2(p), where H2(p) := −p log2 p − (1 − p) log2(1 − p).

Special cases:
C = 1 ⇐⇒ H2(p) = 0 ⇐⇒ p = 0 or p = 1.
p = 0: noiseless channel; p = 1: binary inversion channel.
C = 0 ⇐⇒ H2(p) = 1 ⇐⇒ p = 1/2: random channel.

Information sources

X: information source, an infinite sequence X1, X2, ... of random variables. At time point i the source emits the value Xi. Each random variable has the same range (alphabet) X = {x1, x2, ..., xn}.

X is called memoryless if the random variables X1, X2, ... are independent.

X is called stationary if the sequence X1, X2, ... is stationary, that is, for any positive integers n and k the joint distribution of X1, X2, ..., Xn coincides with the joint distribution of the shifted random vector Xk+1, Xk+2, ..., Xk+n.

X is called ergodic if for any function f(x1, ..., xk) we have

lim_{n→∞} (1/n) ∑_{i=1}^n f(Xi, ..., Xi+k−1) = E f(X1, ..., Xk) with probability one,

provided the limit exists.

Consider a uniquely decodable code f with alphabet Y = {y1, y2, ..., ys}. Block encoding with block length k ≥ 1.
Aim: minimization of the average codeword length per source symbol L.

Source coding with variable length

Let X be memoryless and stationary, with symbol-by-symbol encoding. For the code of a source message of length k we have

L = (1/k) E(|f(X1)| + ··· + |f(Xk)|) = E|f(X1)|.

Shannon's theorem: E|f(X1)| ≥ H(X1)/log2 s.

There exists a prefix code f such that E|f(X1)| < H(X1)/log2 s + 1.

Block encoding f : X^k → Y*:

L = (1/k) E|f(X1, ..., Xk)| ≥ (1/k) H(X1, ..., Xk)/log2 s = H(X1)/log2 s    (by independence).

For any k there exists a prefix code f : X^k → Y* with average codeword length per symbol L satisfying

L < H(X1)/log2 s + 1/k.

Encoding of general information sources

Definition. The source entropy of the source X = X1, X2, ... is defined as

H(X) = lim_{n→∞} (1/n) H(X1, X2, ..., Xn),

provided the limit exists.

Theorem. If the source X = X1, X2, ... is stationary, then the source entropy exists and

H(X) = lim_{n→∞} H(Xn|X1, X2, ..., Xn−1).

Theorem. The average codeword length per symbol L of a uniquely decodable block code f : X^k → Y* of a stationary source X = X1, X2, ... satisfies the inequality

L ≥ H(X)/log2 s.

For a sufficiently large block length k there exists a uniquely decodable code f with average codeword length per symbol arbitrarily close to the above lower bound.

Universal source coding

Costs of data transmission for the previously studied (block) codes:
Fixed costs: e.g. the relative frequencies of the source symbols.
Variable costs: the codewords corresponding to the source message.

Theoretically, sources are of infinite length, so the proportion of the fixed cost vanishes. In practice, source messages have finite length, and the fixed cost might even exceed the variable cost of the codewords.

Adaptive codes: the actual source symbols are encoded with the help of the preceding symbols.

Examples:
Adaptive Huffman code.
Lempel-Ziv algorithms (LZ77, LZ78, LZW).

LZ77 algorithm (sliding window Lempel-Ziv algorithm)

Abraham Lempel and Jacob Ziv (1977)

LZ77 achieves compression by replacing repeated occurrences of data with references to a single copy of that data appearing earlier in the uncompressed data stream. A sliding window of length ha is moved along the source message.

Parts of the sliding window:
dictionary: contains the last hk previously coded source symbols;
lookahead buffer: contains the next he source symbols to be coded.

Example. Source message

. . . cabracadabrarrarrad . . .

Sliding window: ha := 13, hk := 7, he := 6.

c a b r a c a d a b r a r r a r r a d

LZ77 algorithm

Coding:

1 Using a backward pointer the encoder finds in the dictionary the sym-bols which match the first symbol (after the cursor) of the lookaheadbuffer.

2 Checks the lengths of matching strings of the dictionary and the look-ahead buffer.

3 Finds the longest match.4 Output: a triple ⟨t, h, c⟩.

t: position relative to the cursor of the longest match that starts inthe dictionary. If no match is found, t = 0.h: length of the longest match. If no match is found, h = 0.c: code of the next symbol in the lookahead buffer beyond thelongest match.

5 Advances window by h + 1.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 170 / 239


Example

Sliding window: ha := 13, hk := 7, he := 6.

c a b r a c a d a b r a r r a r r a d

d is not in the dictionary. Output: ⟨0, 0, f(d)⟩.

c a b r a c a d a b r a r r a r r a d

a in the dictionary: t = 2, h = 1, t = 4, h = 1 and t = 7, h = 4.Longest match: t = 7, h = 4. Output: ⟨7, 4, f(r)⟩

c a b r a c a d a b r a r r a r r a d

r in the dictionary: t = 1, h = 1 and t = 3, h = 5.Longest match: t = 3, h = 5. Output: ⟨3, 5, f(d)⟩

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 171 / 239
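
A compact Python sketch of the encoder loop described above (our own illustration; tie-breaking and window-handling details are assumptions). With hk = 7 and he = 6 it reproduces the three triples of the worked example.

def lz77_encode(msg, hk=7, he=6, start=7):
    # 'start' marks the cursor; the first hk symbols are assumed already coded
    out, cur = [], start
    while cur < len(msg):
        best_t, best_h = 0, 0
        for t in range(1, min(hk, cur) + 1):          # backward offset into the dictionary
            h = 0
            while (h < he - 1 and cur + h < len(msg) - 1
                   and msg[cur - t + h] == msg[cur + h]):
                h += 1
            if h > best_h:
                best_t, best_h = t, h
        nxt = msg[cur + best_h]                       # first symbol after the match
        out.append((best_t, best_h, nxt))
        cur += best_h + 1
    return out

print(lz77_encode("cabracadabrarrarrad"))
# expected: [(0, 0, 'd'), (7, 4, 'r'), (3, 5, 'd')]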

Properties

Encoding of ⟨t, h, c⟩ using a fixed length binary code requires

⌈log2 hk⌉ + ⌈log2 he⌉ + ⌈log2 n⌉

bits, where n is the size of the source alphabet.

Asymptotically (hk, he → ∞) the efficiency of the encoding equals that of the optimal algorithm, which would require knowing the distribution of the source. For a stationary and ergodic source the limit as hk, he → ∞ of the average codeword length per symbol equals H(X)/log2 s.

Modifications increasing efficiency, e.g.:
Variable length code for the compression of ⟨t, h, c⟩, e.g. adaptive Huffman code.
Dual format: the output is either ⟨t, h⟩ or ⟨c⟩, and the two formats are distinguished by a flag bit (LZSS – Lempel-Ziv-Storer-Szymanski).
Dictionary and lookahead buffer of variable length.

Applications: pkzip, arj

LZ78 algorithm

Both the compressor and the decompressor build and maintain a dictionary of previously seen strings.

1 Starting from the cursor (actual position) the compressor finds the longest match in the dictionary.
2 Output: ⟨i, c⟩.
i: index of the dictionary entry of the match;
c: code of the first non-matching character.
If no match is found, the output is ⟨0, c⟩.
3 Extends the dictionary with the concatenation of dictionary entry i and the first non-matching character (having code c). There is also an eof symbol.

For a stationary and ergodic source the average codeword length per symbol converges to H(X)/log2 s.

Problem: the dictionary grows quickly and without limit.
Solution: use a fixed dictionary after some time, or remove rarely used or unnecessary entries.

Example

Source message:

dabbacdabbacdabbacdabbacdeecdeecdee

output of compressor   index   entry
⟨0, f(d)⟩              1       d
⟨0, f(a)⟩              2       a
⟨0, f(b)⟩              3       b
⟨3, f(a)⟩              4       ba
⟨0, f(c)⟩              5       c
⟨1, f(a)⟩              6       da
⟨3, f(b)⟩              7       bb
⟨2, f(c)⟩              8       ac
⟨6, f(b)⟩              9       dab
⟨4, f(c)⟩              10      bac
⟨9, f(b)⟩              11      dabb
⟨8, f(d)⟩              12      acd
⟨0, f(e)⟩              13      e
⟨13, f(c)⟩             14      ec
⟨1, f(e)⟩              15      de
⟨14, f(d)⟩             16      ecd
⟨13, f(e)⟩             17      ee
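
The following Python sketch of the LZ78 encoder (our own illustration) reproduces the ⟨index, symbol⟩ pairs of the table above; the dictionary is stored as a Python dict mapping strings to 1-based indices.

def lz78_encode(msg):
    dictionary, out = {}, []              # entry string -> index (1-based)
    i = 0
    while i < len(msg):
        p = ""
        # extend the match while p + next symbol is still a dictionary entry
        while i < len(msg) and p + msg[i] in dictionary:
            p += msg[i]
            i += 1
        if i < len(msg):
            out.append((dictionary.get(p, 0), msg[i]))      # <index of match, next symbol>
            dictionary[p + msg[i]] = len(dictionary) + 1    # new dictionary entry
            i += 1
        else:
            out.append((dictionary.get(p, 0), None))        # message ended inside a match
    return out

print(lz78_encode("dabbacdabbacdabbacdabbacdeecdeecdee"))
# first pairs: (0,'d'), (0,'a'), (0,'b'), (3,'a'), (0,'c'), (1,'a'), ...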

LZW algorithm

An effective variant of LZ78. Terry Welch (1984)

In contrast to the pairs ⟨i, c⟩ of LZ78, the output is just the dictionary index i. The dictionary must initially contain the complete source alphabet.

1 Starting from the cursor the compressor reads source symbols having a match in the dictionary into a buffer p. Let c be the first symbol such that pc is not a dictionary entry.
2 Output: the index of the dictionary entry p.
3 Extends the dictionary with the string pc and continues the algorithm from character c.

Application: compress command of UNIX, GIF format.
Adaptive dictionary length: in the case of compress 512 entries, after filling them 1024, etc. The upper bound can be specified up to 2^16 entries.

Example

Source message:

dabbacdabbacdabbacdabbacdeecdeecdee

index   entry   output
1       a
2       b
3       c
4       d
5       e
6       da      4
7       ab      1
8       bb      2
9       ba      2
10      ac      1
11      cd      3
12      dab     6
13      bba     8
14      acd     10
15      dabb    12
16      bac     9
17      cda     11
18      abb     7
19      bacd    16
20      de      4
21      ee      5
22      ec      5
23      cde     11
24      eec     21
25      cdee    23
(final output: 5)
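
A Python sketch of the LZW encoder described above (our own illustration); started from the initial dictionary {a, b, c, d, e} it reproduces the index stream of the table.

def lzw_encode(msg, alphabet):
    dictionary = {ch: i + 1 for i, ch in enumerate(alphabet)}   # 1-based indices
    out, p = [], ""
    for c in msg:
        if p + c in dictionary:
            p += c                        # keep extending the match
        else:
            out.append(dictionary[p])     # emit index of the longest match
            dictionary[p + c] = len(dictionary) + 1
            p = c                         # restart from the non-matching symbol
    out.append(dictionary[p])             # flush the final match
    return out

print(lzw_encode("dabbacdabbacdabbacdabbacdeecdeecdee", "abcde"))
# expected: [4, 1, 2, 2, 1, 3, 6, 8, 10, 12, 9, 11, 7, 16, 4, 5, 5, 11, 21, 23, 5]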


Example

Compress the following source messages:a) abbabbabbbaababa;b) “bed spreaders spread spreads on beds”, where space and eof are

separate symbols.Use

LZ77 algorithm with parameters hk = 7, he = 6;LZ78 algorithm;LZW algorithm.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 177 / 239

Quantization

X = X1, X2, ...: stationary source, Xi ∈ R absolutely continuous.
Q : R → R: a function with discrete range, the quantizer.
Q(X1), Q(X2), ...: quantized signal of X, a sequence of discrete random variables. Source code with block length k = 1.

Measure of distortion for a block of length n:

D(Q) := (1/n) E( ∑_{i=1}^n (Xi − Q(Xi))^2 ) = E(X − Q(X))^2    (by stationarity).

D(Q): mean-squared distortion of the quantizer Q.
X: a random variable distributed as X1, X2, ....
{x1, x2, ..., xN}: range of the quantizer Q; its elements are the levels of quantization.
Quantization regions: Bi := {x ∈ R : Q(x) = xi}, i = 1, 2, ..., N.

All definitions remain valid for discrete sources X as well.

Optimal quantizer

Given the levels of quantization {x1, x2, ..., xN}, the regions of the quantizer Q with smallest mean-squared distortion are

Bi = {x : |x − xi| ≤ |x − xj|, j = 1, 2, ..., N}.

In case of equality, x is assigned to the region with the smallest index. This is the nearest neighbour condition; we only deal with quantizers satisfying it.

If x1 < x2 < ··· < xN, then the boundaries of the quantization regions are

yi = (xi + xi+1)/2,  i = 1, 2, ..., N − 1,  that is

B1 = ]−∞, y1],  Bi = ]yi−1, yi], i = 2, ..., N − 1,  BN = ]yN−1, ∞[.

f(x): PDF corresponding to the stationary source X.
The optimal level corresponding to a given region Bi is the centroid of Bi:

xi = ∫_{Bi} x f(x) dx / ∫_{Bi} f(x) dx = E(X | X ∈ Bi).

Uniform quantizer

The mean-squared distortion of the quantizer

Q(x) = xi, if x ∈ Bi, i = 1, 2, ..., N,

equals

D(Q) = ∫_{−∞}^{∞} (x − Q(x))^2 f(x) dx = ∑_{i=1}^N ∫_{Bi} (x − xi)^2 f(x) dx.

[−A, A]: range of X, that is f(x) = 0 if x ∉ [−A, A].

The N-level uniform quantizer:

QN(x) = −A + (2i − 1)A/N, if −A + 2(i − 1)A/N < x ≤ −A + 2iA/N,  i = 1, 2, ..., N.

Explanation. The regions of QN are obtained by partitioning the interval [−A, A] into N equal parts; the levels are the midpoints of these intervals.

Non-uniform quantizers

Principal idea: on regions with high probability mass a finer quantization is used, i.e. more levels are assigned to such regions.

Aim: for a given random variable (source) X find the quantization levels x1 < x2 < ··· < xN and the quantizer Q minimizing D(Q).

An optimal quantizer satisfies the following two necessary conditions (referred to as the Lloyd-Max conditions).

1 Nearest neighbour condition: |x − Q(x)| = min_{1≤i≤N} |x − xi| for all x ∈ R.
2 Centroid condition: each level xj equals the mean (conditional expectation) of those sample values Xi which are quantized to this particular level (Q(Xi) = xj).

A quantizer satisfying the above conditions is called a Lloyd-Max quantizer.


Example

Not all Lloyd-Max quantizers are optimal.

Let X be uniformly distributed on {1, 2, 3, 4}. Possible 2-level Lloyd-Maxquantizers:

Q1(1) = 1; Q1(2) = Q1(3) = Q1(4) = 3;

Q2(4) = 4; Q2(1) = Q2(2) = Q2(3) = 2;

Q3(1) = Q3(2) = 1.5; Q3(3) = Q3(4) = 3.5.

Mean-squared distortions:

D(Q1) = D(Q2) = 0.5, D(Q3) = 0.25.

Only quantizer Q3 is optimal.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 183 / 239

Lloyd-Max condition for absolutely continuous sources

X: absolutely continuous random variable (source) with PDF f.

Lloyd-Max conditions:

1 Nearest neighbour condition:

y0 = −∞,  yi = (xi + xi+1)/2, i = 1, 2, ..., N − 1,  yN = ∞,

where yi−1 and yi are the boundaries of the quantization region corresponding to xi.

2 Centroid condition:

xi = ∫_{yi−1}^{yi} x f(x) dx / ∫_{yi−1}^{yi} f(x) dx,  i = 1, 2, ..., N.

Theorem (Fleischer, 1964). Let f(x) be log-concave, that is, log f(x) is concave. Then there exists a unique N-level Lloyd-Max quantizer for f(x), which is therefore the optimal quantizer for f(x).

Lloyd-Max algorithm

Find the optimal quantization levels xi and the corresponding quantization regions Bi = ]yi−1, yi].
Stuart P. Lloyd, Bell Laboratories, 1957 (published: 1982); Joel Max, General Telephone and Electronics Lab., Waltham, 1960.

Algorithm (a numerical sketch follows below)
1 Choose an arbitrary set of starting levels x1 < x2 < ··· < xN.
2 Determine the region boundaries yi according to the nearest neighbour condition, that is yi = (xi + xi+1)/2, i = 1, 2, ..., N − 1.
3 Choosing y0 = −∞ and yN = ∞, optimize the quantizer by finding new levels according to the centroid condition.
4 Determine the change in the mean-squared distortion. If it is below a previously specified level, stop; otherwise repeat steps 2 and 3.
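
A numerical sketch of the iteration above (our own illustration; the grid-based integration, the starting levels and the tolerance are assumptions), run here for the log-concave standard normal density.

import numpy as np

def lloyd_max(pdf, levels, grid, tol=1e-10):
    t = grid                                          # fine grid approximating the support
    f = pdf(t)
    x = np.array(levels, float)
    prev_D = np.inf
    while True:
        y = (x[:-1] + x[1:]) / 2                      # nearest neighbour boundaries
        region = np.searchsorted(y, t)                # region index of each grid point
        for i in range(len(x)):                       # centroid condition
            m = region == i
            x[i] = np.sum(t[m] * f[m]) / np.sum(f[m])
        D = np.sum((t - x[region]) ** 2 * f) / np.sum(f)   # mean-squared distortion
        if prev_D - D < tol:
            return x, D
        prev_D = D

# 4-level Lloyd-Max quantizer for the standard normal density
pdf = lambda t: np.exp(-t * t / 2) / np.sqrt(2 * np.pi)
levels, D = lloyd_max(pdf, [-1.5, -0.5, 0.5, 1.5], np.linspace(-6, 6, 200001))
print(levels, D)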

Companding quantizers

Electronic devices usually implement uniform quantizers. However, for signals with a large dynamic range (e.g. speech or audio signals) uniform quantization does not result in efficient coding.

Using a strictly monotone increasing function called the compressor, the source is transformed into the interval [−1, 1], and then a uniform quantizer is applied. The method results in a non-uniform quantization. The quantized values are decoded, and then with the help of the expander (the inverse of the compressor) the original dynamic range is restored.

Compander: compressor and expander.

Applications:
Digital telephony systems: compressing before input to an analog-to-digital converter, then expanding after a digital-to-analog converter.
Professional wireless microphones, as the dynamic range of the microphone audio signal itself is larger than the dynamic range provided by the radio transmission.

Companders in speech coding

8-bit Pulse Code Modulation (PCM) digital telephony systems.

North America and Japan: µ-law (µ = 255).

Gµ(x) = sign(x) · log(1 + µ|x|)/log(1 + µ),  −1 ≤ x ≤ 1,

Gµ⁻¹(x) = sign(x) · (1/µ) · ((1 + µ)^{|x|} − 1),  −1 ≤ x ≤ 1.

Europe: A-law (A = 87.7 or A = 87.6).

GA(x) = sign(x) · A|x|/(1 + log A),             if 0 ≤ |x| < 1/A;
GA(x) = sign(x) · (1 + log(A|x|))/(1 + log A),  if 1/A ≤ |x| ≤ 1.

GA⁻¹(x) = sign(x) · |x|(1 + log A)/A,              if 0 ≤ |x| < 1/(1 + log A);
GA⁻¹(x) = sign(x) · exp{|x|(1 + log A) − 1}/A,     if 1/(1 + log A) ≤ |x| ≤ 1.
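
A small Python sketch of the µ-law compressor/expander pair above, together with a companded uniform quantizer (the quantizer construction is our own illustration); it checks that the expander inverts the compressor.

import numpy as np

MU = 255.0

def mu_compress(x):                    # G_mu, defined on [-1, 1]
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mu_expand(y):                      # inverse of G_mu
    return np.sign(y) * ((1 + MU) ** np.abs(y) - 1) / MU

def companded_quantizer(x, bits=8):
    # uniform quantization of the compressed value on [-1, 1]
    N = 2 ** bits
    step = 2.0 / N
    y = mu_compress(x)
    yq = (np.floor((y + 1) / step) + 0.5) * step - 1    # midpoint of the uniform cell
    return mu_expand(np.clip(yq, -1, 1))

x = np.array([-0.5, -0.01, 0.001, 0.02, 0.7])
print(np.max(np.abs(mu_expand(mu_compress(x)) - x)))    # ~0: expander inverts compressor
print(companded_quantizer(x))                           # finer resolution near zero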

µ-law and A-law

[Figure: the µ-law and A-law compressor curves plotted for x in [−0.1, 0.1]; the two characteristics are nearly indistinguishable.]

Vector quantization

The output of the source can be considered as a random vector. In this way, for a fixed mean-squared distortion one can reach a better compression rate than by quantizing the coordinates separately, especially when the coordinates are correlated.

Example. The RGB values of a color image are quantized not separately but as an element of the 3D color space. Using 3 scalar quantizers the quantization regions are 3D bricks, whereas under vector quantization regions of arbitrary shape can be used.

X: d-dimensional source vector with PDF f(x).
Quantizer: Q : R^d → {x1, x2, ..., xN}, xi ∈ R^d, i = 1, 2, ..., N.
Quantization regions: B1, B2, ..., BN, a partition of R^d, that is, the sets B1, B2, ..., BN are disjoint and ∪_{i=1}^N Bi = R^d.

Q(x) = xi, if x ∈ Bi, i = 1, 2, ..., N.

Lloyd-Max condition for vector quantizers

Mean-squared distortion:

D(Q) = (1/d) E‖X − Q(X)‖^2 = (1/d) ∑_{i=1}^N ∫_{Bi} ‖x − xi‖^2 f(x) dx.

Lloyd-Max conditions:

1 Nearest neighbour condition: the regions of R^d are Voronoi regions, that is

Bi = {x : ‖x − xi‖ ≤ ‖x − xj‖, ∀ j ≠ i}.

2 Centroid condition:

xi = arg min_y ∫_{Bi} ‖x − y‖^2 f(x) dx,

that is, the output vectors are the centroids of the corresponding regions.

Natural generalization of the Lloyd-Max algorithm: the Linde-Buzo-Gray algorithm.

Sampling

X(t): square integrable signal.
{X(kT), k = 0, ±1, ±2, ...}: sample of X(t) with sampling period T > 0.

Reconstruction of X(t):

X̂(t) := ∑_{k=−∞}^{∞} X(kT) sinc(t/T − k),  t ∈ R,

where

sinc(t) := sin(πt)/(πt),  t ∈ R.

Remark. sinc(0) = 1 and sinc(k) = 0 for k = ±1, ±2, ..., so for t = kT one has X̂(t) = X(t).

Problem: under what conditions can the signal X(t) be fully reconstructed from the sample X(kT) (that is, X̂(t) = X(t) for all t ∈ R)?

Nyquist-Shannon sampling theorem

Theorem. If a square integrable signal X(t) is bandlimited to W > 0, that is, the Fourier transform X̂(ω) of X(t) equals 0 for |ω| > W, then

X(t) is continuous at every point t ∈ R, and
X(t) can be completely restored from a sample with period T, that is, for all t ∈ R we have

X(t) = X̂(t) := ∑_{k=−∞}^{∞} X(kT) sinc(t/T − k),   provided T < π/W.

Remark. If the band frequency is W′ = W/(2π), then the sampling frequency should be at least twice W′ (the Nyquist frequency).

Example. In telephony the signal is bandlimited to 3400 Hz and the sampling frequency is 8000 Hz. For CD quality audio the signal is bandlimited to 20 kHz and the sampling frequency is 44100 Hz.
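
A numerical sketch of the reconstruction formula (our own illustration): a signal bandlimited to W is sampled with T < π/W and rebuilt from finitely many samples, so only a small truncation error remains.

import numpy as np

def reconstruct(samples, T, t):
    # X_hat(t) = sum_k X(kT) * sinc(t/T - k), with k indexing the given samples
    k = np.arange(len(samples))
    return np.array([np.sum(samples * np.sinc(tt / T - k)) for tt in t])

# signal bandlimited to W = 2*pi*50 rad/s (components at 15 Hz and 40 Hz)
x = lambda t: np.sin(2 * np.pi * 15 * t) + 0.5 * np.cos(2 * np.pi * 40 * t)
W = 2 * np.pi * 50
T = 0.8 * np.pi / W                      # sampling period below the limit pi/W

k = np.arange(0, 1000)                   # finitely many samples approximate the infinite sum
samples = x(k * T)
t = np.linspace(2.0, 2.5, 7)             # points well inside the sampled interval
print(np.max(np.abs(reconstruct(samples, T, t) - x(t))))   # small reconstruction error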

Transform coding

1. The source signal is divided into blocks and a reversible transformation is applied to each block, resulting in the corresponding transform coefficients.
2. The transform coefficients are quantized.
3. The quantized transform coefficients are coded using a binary code.

x = (x0, x1, ..., xk−1)⊤: source block to be transformed.
y = (y0, y1, ..., yk−1)⊤: transform coefficients.
A = (ai,j): k × k dimensional orthonormal transform matrix.

Forward and inverse transform:

y = Ax and x = By, where B = A⁻¹ = A⊤.

In the two-dimensional case (image compression) both the source block and the transform coefficients are matrices (X and Y):

Y = AXA⊤ and X = A⊤YA.

Example. In the case of JPEG compression 8 × 8 pixel blocks are used.

Special transformations

1. Discrete Cosine Transform (DCT), Ahmed, Natarajan, Rao (1974). Entries of A:

a_{1,j} = 1/√k,   a_{i,j} = √(2/k) cos((2j−1)(i−1)π/(2k)),  i = 2, ..., k, j = 1, 2, ..., k.

The most popular transformation. Applications: JPEG, MPEG.

2. Discrete Walsh-Hadamard Transform (DWHT, 1923), Jacques Hadamard, Joseph L. Walsh. Entries of A are obtained by recursion:

A_{2^k} = [ A_{2^{k−1}}  A_{2^{k−1}} ;  A_{2^{k−1}}  −A_{2^{k−1}} ],  where A_1 = 1.

A_{2^k} A_{2^k}⊤ = 2^k I_{2^k}, so the transform matrix is (1/√(2^k)) A_{2^k}.

Applications: JPEG XR (JPEG extended range, 2009; Microsoft HD Photo) and MPEG-4 AVC or H.264 (MPEG-4 Part 10 Advanced Video Coding, 2003; e.g. Blu-ray discs).
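
A Python sketch (our own illustration) that builds the k × k DCT matrix from the entries above and checks the orthonormality used in the forward and inverse transforms.

import numpy as np

def dct_matrix(k):
    A = np.zeros((k, k))
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            if i == 1:
                A[i - 1, j - 1] = 1 / np.sqrt(k)
            else:
                A[i - 1, j - 1] = np.sqrt(2 / k) * np.cos((2 * j - 1) * (i - 1) * np.pi / (2 * k))
    return A

A = dct_matrix(8)                               # 8x8 blocks as in JPEG
print(np.allclose(A @ A.T, np.eye(8)))          # orthonormal: A^{-1} = A^T
x = np.arange(8.0)
y = A @ x                                       # forward transform
print(np.allclose(A.T @ y, x))                  # inverse transform recovers x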

Subband Coding

1. The source is passed through a filter bank which divides its spectrum into some number of subbands (analysis filters); e.g. one can take M subbands of equal width.
2. In order to stay synchronized, the filter outputs are then subsampled according to the ratio of input and output bandwidths (decimation or downsampling); e.g. one can keep every Mth sample value.
3. The subsampled signals are separately coded and transmitted (compressed).
4. The encoded samples from each subband are decoded, and the decoded values are then upsampled by inserting an appropriate number of zeros between samples.
5. The upsampled signals are passed through a filter bank (synthesis filters). The outputs of the reconstruction filters are added to give the final output.

Human sensory organs are very sensitive to frequencies. More important frequencies should be reconstructed more precisely, whereas less important ones can tolerate larger distortion.

Example

Input of the filter bank: (x1, x2, ..., xn).
Outputs: (y1, y2, ..., yn) and (z1, z2, ..., zn), where, assuming x0 = 0,

yi = (xi + xi−1)/2;  zi = xi − yi = (xi − xi−1)/2,  i = 1, 2, ..., n.

Both output sequences are smoother (have smaller dynamic range) than the original signal, so they can be compressed with smaller distortion. The amount of data, however, is doubled.

Downsampling: transmit only the signals with even indices, that is y2i and z2i.

Synthesis:

x2i = y2i + z2i,   x2i−1 = y2i − z2i.

[Block diagram: both y1, y2, ... and z1, z2, ... are downsampled by 2, upsampled by inserting zeros, and the two branches are added and subtracted to give y2i + z2i and y2i − z2i.]
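
A Python sketch of the two-band example above (the input block is our own illustrative data): the even-indexed subband samples suffice to reconstruct the signal exactly.

import numpy as np

x = np.array([10.0, 12, 11, 9, 8, 8, 12, 14])      # illustrative input block, x_1 ... x_8
xprev = np.concatenate(([0.0], x[:-1]))            # x_0 = 0 convention of the example

y = (x + xprev) / 2                                 # lowpass subband (moving average)
z = (x - xprev) / 2                                 # highpass subband (difference)

y_down, z_down = y[1::2], z[1::2]                   # keep even-indexed samples y_2, y_4, ...
x_even = y_down + z_down                            # synthesis: x_{2i}   = y_{2i} + z_{2i}
x_odd  = y_down - z_down                            #            x_{2i-1} = y_{2i} - z_{2i}
print(x_even, x_odd)                                # recovers x_2, x_4, ... and x_1, x_3, ...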

Delta coding

A special case of predictive coding. It is advantageous if the differences between subsequent signal values are small, e.g. in the case of digital images, provided we are not close to an edge.

Example. 8 subsequent pixel values of an 8-bit intensity image:
147, 145, 141, 146, 149, 147, 143, 145.

Fixed bit length encoding on 8 bits: 64 bits.

First value and differences: 147, −2, −4, 5, 3, −2, −4, 2.
The largest absolute difference is 5, which can be stored using 3 bits, so each difference can be encoded on 4 bits (3 + 1 for the sign). 8 bits are used to store the length of the binary representation of the differences.

Length of the delta encoding: 8 + 8 + 7 · 4 = 44 bits, a gain of about 31%. Lossless compression.
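
A Python sketch reproducing the bit count of the example above (our own illustration; the packing into an actual bit stream is omitted, only the lengths are computed).

pixels = [147, 145, 141, 146, 149, 147, 143, 145]

diffs = [pixels[i] - pixels[i - 1] for i in range(1, len(pixels))]
bits_per_diff = max(abs(d) for d in diffs).bit_length() + 1     # magnitude bits + sign bit

fixed = 8 * len(pixels)                                         # plain 8-bit coding
delta = 8 + 8 + bits_per_diff * len(diffs)    # first value + field width + coded differences
print(diffs)                  # [-2, -4, 5, 3, -2, -4, 2]
print(fixed, delta)           # 64 vs 44 bits, about 31% saved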

Lossy compression, example

Output of the source:

5.4, 10.1, 7.2, 4.6, 6.9, 12.5, 6.2, 5.3.

Differences:

5.4, 4.7, −2.9, −2.6, 2.3, 5.6, −6.3, −0.9.

Uniform quantizer with 7 levels: −6, −4, −2, 0, 2, 4, 6.Quantized values:

6, 4, −2, −2, 2, 6, −6, 0.

Restored values:6, 10, 8, 6, 8, 14, 8, 8.

Errors:−0.6, 0.1, −0.8, −1.4, −1.1, −1.5, −1.8, −2.7.

Longer sequences may result in even larger errors.

Quantization errors

{xn}, {x̂n}: input and reconstructed signal, respectively.
{dn}: sequence of differences, dn = xn − xn−1.
Sequence of quantized differences: d̂n = Q(dn) = dn + qn.
Reconstruction: x̂0 = x0 and x̂n = x̂n−1 + d̂n.

d1 = x1 − x0;  d̂1 = Q(d1) = d1 + q1;
x̂1 = x̂0 + d̂1 = x0 + d1 + q1 = x1 + q1;
d2 = x2 − x1;  d̂2 = Q(d2) = d2 + q2;
x̂2 = x̂1 + d̂2 = x1 + q1 + d2 + q2 = x2 + q1 + q2;
...
x̂n = xn + ∑_{k=1}^n qk.

The quantization errors accumulate.

Predictive coding

At the nth step the encoder knows the previously restored value x̂n−1.
Modified differences: dn = xn − x̂n−1.

d1 = x1 − x̂0;  d̂1 = Q(d1) = d1 + q1;
x̂1 = x̂0 + d̂1 = x0 + d1 + q1 = x1 + q1;
d2 = x2 − x̂1;  d̂2 = Q(d2) = d2 + q2;
x̂2 = x̂1 + d̂2 = x̂1 + d2 + q2 = x2 + q2;
...
x̂n = xn + qn.

Aim: to keep the differences dn as small as possible.
The value of xn is approximated with a function of the previously reconstructed signal values, pn = f(x̂n−1, x̂n−2, ..., x̂0), called the predictor:

dn = xn − pn = xn − f(x̂n−1, x̂n−2, ..., x̂0).

The method is called differential pulse code modulation (DPCM).
Patent: C. Chapin Cutler, Bell Laboratories, 1950.
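
A Python sketch contrasting the two schemes above (the test signal and the uniform quantizer are our own assumptions): quantizing xn − xn−1 lets the errors accumulate, while the DPCM loop keeps the reconstruction error at a single quantization error.

import numpy as np

def quantize(d, step=2.0):                      # coarse uniform quantizer (our own choice)
    return step * np.round(d / step)

rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(0.0, 1.5, 200))        # a slowly wandering test signal

# open-loop coding: quantize d_n = x_n - x_{n-1}; errors q_1 + ... + q_n accumulate
rec_open = [x[0]]
for n in range(1, len(x)):
    rec_open.append(rec_open[-1] + quantize(x[n] - x[n - 1]))

# DPCM: quantize d_n = x_n - x_hat_{n-1}; reconstruction error is a single q_n
rec_dpcm = [x[0]]
for n in range(1, len(x)):
    rec_dpcm.append(rec_dpcm[-1] + quantize(x[n] - rec_dpcm[-1]))

print(np.max(np.abs(x - rec_open)), np.max(np.abs(x - rec_dpcm)))   # DPCM error <= step/2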

DPCM

[Block diagram. Encoder: the prediction pn is subtracted from xn, the difference dn is quantized to d̂n, and pn + d̂n feeds the predictor. Decoder: d̂n is added to the prediction pn of an identical predictor to give x̂n.]

Input signals might change their character: adaptive DPCM.
1. Adaptation to the input signal xn of the encoder: forward-adaptive method. The decoder does not know the signal xn, so the new decoding parameters have to be transferred.
2. Adaptation to the output signal x̂n: backward-adaptive method. Both the encoder and the decoder know its value.

Quantization can also be adaptive. In the forward-adaptive case the source is divided into blocks and the parameters of the optimal quantizer for each block are transferred.

Jayant quantizer

Backward-adaptive quantizer. Nikil S. Jayant, Bell Laboratories, 1973.

If the current input falls in the inner levels (close to the origin), contract the step size; otherwise expand it. Each quantization interval (level) has a multiplier, which is less than 1 for inner levels and greater than 1 for outer levels. The multipliers are symmetric about the origin.

Mk: multiplier of the kth level. Outer level: Mk > 1; inner level: Mk < 1.
∆n: step size of the quantizer at time n (the input is xn).
If xn−1 falls in region ℓ(n − 1), then the step size adapts as

∆n = Mℓ(n−1) ∆n−1.

Due to finite precision arithmetic, one has to specify ∆min and ∆max.

Example. Multipliers of a quantizer with 8 levels (3-bit quantizer):
M1 = M8 = 1.2, M2 = M7 = 1, M3 = M6 = 0.9, M4 = M5 = 0.8.

Output levels of a 3-bit Jayant quantizer

[Staircase plot of the quantizer characteristic: inputs in (0, ∆], (∆, 2∆], (2∆, 3∆] and (3∆, ∞) map to the output levels ∆/2, 3∆/2, 5∆/2, 7∆/2 (levels 5-8), and symmetrically for negative inputs (levels 4-1).]

Multipliers are symmetric: M1 = M8, M2 = M7, M3 = M6, M4 = M5.

Example

Inner levels: M4 = M5 = 0.8, M3 = M6 = 0.9. Outer levels: M2 = M7 = 1, M1 = M8 = 1.2.
Initial step size: ∆0 = 0.5.
Input: 0.1, −0.2, 0.2, 0.1, −0.3, 0.1, 0.2, 0.5, 0.9, 1.5, 1.0, 0.9.

Quantization process (error = output − input):

n   ∆n       Input   Level   Output    Error     Step size update
0   0.5       0.1     5       0.25      0.15     ∆1 = M5 × ∆0
1   0.4      −0.2     4      −0.2       0.0      ∆2 = M4 × ∆1
2   0.32      0.2     5       0.16     −0.04     ∆3 = M5 × ∆2
3   0.256     0.1     5       0.128     0.028    ∆4 = M5 × ∆3
4   0.2048   −0.3     3      −0.3072   −0.0072   ∆5 = M3 × ∆4
5   0.1843    0.1     5       0.0922   −0.0078   ∆6 = M5 × ∆5
6   0.1475    0.2     6       0.2212    0.0212   ∆7 = M6 × ∆6
7   0.1328    0.5     8       0.4646   −0.0354   ∆8 = M8 × ∆7
8   0.1594    0.9     8       0.5578   −0.3422   ∆9 = M8 × ∆8
9   0.1913    1.5     8       0.6696   −0.8304   ∆10 = M8 × ∆9
10  0.2296    1.0     8       0.8036   −0.1964   ∆11 = M8 × ∆10
11  0.2755    0.9     8       0.9643    0.0643   ∆12 = M8 × ∆11
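
A Python sketch of the 3-bit Jayant adaptation (our own illustration, using the output levels ±∆/2, ..., ±7∆/2 of the preceding figure and the multipliers of the example); it reproduces the rows of the table above.

M = {1: 1.2, 2: 1.0, 3: 0.9, 4: 0.8, 5: 0.8, 6: 0.9, 7: 1.0, 8: 1.2}   # level multipliers

def jayant(inputs, delta0=0.5, dmin=1e-4, dmax=10.0):
    delta, rows = delta0, []
    for x in inputs:
        i = min(int(abs(x) // delta), 3)              # 0..3: inner to outer magnitude
        level = 5 + i if x >= 0 else 4 - i            # levels 5..8 positive, 4..1 negative
        sgn = 1 if x >= 0 else -1
        out = sgn * (2 * i + 1) * delta / 2           # +-delta/2, +-3*delta/2, ...
        rows.append((delta, x, level, out, out - x))
        delta = min(max(M[level] * delta, dmin), dmax)   # step size update
    return rows

inputs = [0.1, -0.2, 0.2, 0.1, -0.3, 0.1, 0.2, 0.5, 0.9, 1.5, 1.0, 0.9]
for r in jayant(inputs):
    print("%.4f  % .2f  %d  % .4f  % .4f" % r)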

Delta modulation

When a continuous source is sampled at a high frequency, the differences between neighbouring sample values are small.

Delta modulation (DM): DPCM with a 2-level (1-bit) quantizer. In order to decrease distortion, the sampling frequency is increased, even up to a hundred times the band frequency.

Output values: ±∆. Fixed step size ∆: linear delta modulation.
Problem: for a flat input the output oscillates (granular noise), whereas a steeply increasing input cannot be followed (overload noise).

Source: John Edward Abate: Linear and adaptive delta modulation. PhD thesis, Newark College of Engineering, 1967.

Adaptive delta modulation

John Edward Abate, AT&T and Bell Laboratories, 1967.

Nearly flat output: small step size ∆; fast changes: large ∆.
sn: DM "step" at the nth time point, sn = ±∆n.

Step size update based on one step:

∆n+1 = M1 ∆n, if sign sn = sign sn−1;
∆n+1 = M2 ∆n, if sign sn ≠ sign sn−1;      where 1 < M1 = 1/M2 < 2.

Source: John Edward Abate: Linear and adaptive delta modulation. PhD thesis, Newark College of Engineering, 1967.

Continuous variable slope delta modulation

Johannes Anton Griefkes, Karel Riemens, Philips, 1970.

Continuous variable slope delta modulation (CVSD):

∆n = β∆n−1 + αn∆0.

β: constant, slightly less than 1;αn ∈ {0, 1}: αn = 1, if J of the previous K outputs of the quantizer haveequal signs, otherwise αn = 0. Typical values: J = K = 3.Encodes at 1 bit per sample, e.g. audio sampled at 16 kHz is encoded at16 kbit/s.

Applications:16 and 32 kbit/s CVSD: military TRI-TAC digital telephones.16 kbit/s: US Army; 32 kbit/s: US Air Force.64 kbit/s CVSD: telephone-related bluetooth (e.g. wireless headsets,communication between mobile phones).

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 208 / 239

Predictors

pn = f(x̂n−1, x̂n−2, ..., x̂0): predictor of a predictive encoder (e.g. DPCM).
Aim: find the optimal function f minimizing the mean squared error

σd² = E(Xn − pn)².

In the general case the problem is very complicated. For a fine enough quantization x̂n ≈ Xn, so one can consider

pn = f(Xn−1, Xn−2, ..., X0).

σd² is minimal if

f(Xn−1, Xn−2, ..., X0) = E(Xn | Xn−1, Xn−2, ..., X0),

however, this requires the knowledge of the corresponding conditional distributions. Conditional distributions are usually not known. In the case of a normally distributed source the conditional expectation is a linear function of the values Xn−1, Xn−2, ..., X0.

Linear prediction

Linear predictor of order N: pn := ∑_{i=1}^N ai x̂n−i.

For a fine enough quantization one has to minimize

σd² = E(Xn − ∑_{i=1}^N ai Xn−i)².

R(k) = E(Xn Xn+k): autocovariance function of a centered weakly stationary source Xk (constant mean, autocovariances depending only on the lag). From the equations ∂σd²/∂aj = 0, j = 1, 2, ..., N, we obtain

∑_{i=1}^N ai R(i − j) = R(j),  j = 1, 2, ..., N.

The solution of the above system of equations gives the coefficients of the predictor.

Problem: the Wiener-Hopf equations to be solved have been derived under a stationarity assumption, which might hold only locally.
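
A Python sketch (our own illustration) that estimates the autocovariances of a synthetic AR(2) signal and solves the normal equations above with numpy; the estimated predictor coefficients come out close to the true ones.

import numpy as np

rng = np.random.default_rng(1)
a_true = [0.75, -0.5]                               # X_n = 0.75 X_{n-1} - 0.5 X_{n-2} + noise
x = np.zeros(50000)
for n in range(2, len(x)):
    x[n] = a_true[0] * x[n - 1] + a_true[1] * x[n - 2] + rng.normal()

def autocov(x, k):                                  # sample R(k) for a centred signal
    return np.mean(x[:len(x) - k] * x[k:])

N = 2
R = np.array([[autocov(x, abs(i - j)) for j in range(N)] for i in range(N)])   # R(i - j)
r = np.array([autocov(x, j + 1) for j in range(N)])                            # R(j), j = 1..N
a = np.linalg.solve(R, r)                           # predictor coefficients a_1, ..., a_N
print(a)                                            # close to [0.75, -0.5]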

Adaptive predictor

Forward adaptive case: the input is divided into blocks.
Speech encoding: blocks of length 16 ms; 8000 Hz sampling results in 128 sample values per block.
Image compression: 8 × 8 pixel blocks.

Sample autocovariance of the ℓth block of length M:

R^(ℓ)(k) = (1/(M − |k|)) ∑_{i=(ℓ−1)M+1}^{ℓM−|k|} Xi Xi+|k|,   R^(ℓ)(−k) = R^(ℓ)(k).

The input has to be buffered, which adds a delay to the system. As the decoder does not know the input signal, it needs some additional information.

Backward adaptive case: using the output signal of the encoder, a recursive formula is applied for the minimization of

dn² = (Xn − ∑_{i=1}^N ai x̂n−i)².


Waveform based speech compressionReference method: pulse code modulation (PCM) codec.Analog speech signal with a bandwidth limited to 300 to 3400 Hz issampled with rate 8000 Hz and quantized using an 8-bit quantizer.Transmission bit rate: 8000 × 8 = 64 kbit/s.ITU-T (International Telecommunication Union – TelecommunicationStandardization Sector) G.711 telecommunication standard (1972): PCMcoding with companded quantizer (A-law or µ-law).Adaptive DPCM (ADPCM): utilizes the correlation between the differentvoice samples (Jayant, Bell Laboratories, 1974)Quantization: 5, 4, 3, 2 bits; bit rate: 40, 32, 24, 16 kbit/s.ITU-T G.726 telecommunication standard (speech codec, 1990): superse-des both G.721 (32 kbit/s, 1984) and G.723 (24 and 40 kbit/s, 1988)standards.Most commonly used mode: 32 kbit/s, standard codec in DECT (digitalenhanced cordless telecommunications) wireless phone systems (e.g. Pana-sonic KX-TG1100).

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 213 / 239

Speech formation

Air from the lungs is pushed through the vocal tract (consisting of the laryngeal cavity, pharynx, oral and nasal cavities) and out of the mouth to produce a sound. The vocal tract modulates the voice produced by the vocal cords. Speech generators modulate a generated signal in the same way.

Waveform for the word "Decision". Source: Sun, L., Mkwawa, I.-H., Jammeh, E., Ifeachor, E. Guide to Voice and Video over IP. Springer, 2013 (p. 21, fig. 2.3).


Voiced and unvoiced soundsVoiced sounds: all vowels and e.g. consonants b, d, g, j, v, z. Vocal cordsvibrate (open and close) at a given frequency (fundamental frequency,pitch frequency) and the speech samples show a quasi-periodic pattern.

Sample of voiced speech. Source: Sun et al. (2013); p. 22., fig. 2.4.

Unvoiced sounds: e.g. f, k, p, s, t, ch. Vocal cords do not vibrate, remainopen during the sound production. The waveform is more like noise.

Sample of unvoiced speech. Source: Sun et al. (2013); p. 23., fig. 2.5.Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 215 / 239


Parametric compression codingSpeech signal is stationary as the shape of the vocal tract is stable in shortperiod of time (around 20 ms).In a stationary segment (frame) the vocal tract can be modeled by a filter.The encoder analyzes the different speech segments:

classifies whether the speech segment is voiced or unvoiced;determines the parameters of the voice generating filter;estimates the gain (energy) of the speech excitation signal and forvoiced segments the pitch frequency (males: ≈ 125 Hz; females:≈ 250 Hz).

The parameters are coded into a binary bit stream and transmitted to thedecoder. The decoder will produce its excitation signal and reconstruct thespeech (carry out speech synthesis) based on the received parameters.

Parametric encoders are more complex than waveform based ones.The quality of parametric based speech codecs is low, with mechanicsound but fair intelligibility.Have very low transmission bit rate: 1.2 − 4.8 kbit/s.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 216 / 239

Linear prediction coding

Linear prediction coding (LPC), Bishnu S. Atal, Bell Labs, 1971. Uses a p-order linear filter.

εn: excitation signal. Voiced case: periodic with a given frequency; unvoiced case: white noise (random, independent, stationary).
G: gain of the signal.
xn: output speech signal.

xn = ∑_{i=1}^p ai xn−i + G εn.

[Block diagram of the LPC speech synthesis model: for voiced segments a periodic pulse train with pitch period T, for unvoiced segments white noise, is scaled by the energy (gain G) and drives the vocal tract model, a time-varying filter with the LPC coefficients ai, producing the speech signal xn.]


LPC-10 algorithmDepartment of Defense, USA. Federal Standard (FS) 1015 (1984). Mainlyused in radio communications with secure voice transmissions.Length of a segment (frame): 22.5 ms. Information to be transmitted: 54bits per frame.

Type of excitation (voiced/unvoiced): 1 bit.Pitch frequency (period): 6 bits (quantizer with logarithmic compan-ding).Filter parameters: 41 bits. Sensitive on the errors of parameter valuesaround 1. Instead of a1 and a2 values gi = (1+ai)/(1−ai), i = 1, 2,are quantized.

▶ Voiced case: 10-order predictive filter. Uniform quantizer,g1, g2, a3, a4: 5 bits; a5, . . . , a8: 4 bits; a9: 3 bits, a10: 2 bits.

▶ Unvoiced case: 4-order predictive filter. Uniform quantizer,g1, g2, a3, a4: 5 bits; error correction: 21 bits.

Energy: 5 bits (quantizer with logarithmic companding).Synchronization: 1 bit.

Bit rate: 54 bits / 22.5 ms = 2.4 kbit/s. The compression ratio is 26.7 compared with 64 kbit/s PCM. Enhanced variants may achieve 800 bit/s.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 218 / 239


Code excited linear predictionAnalysis-by-synthesis (AbS): a synthesizer is included at the encoder side.A closed-loop search is carried out in order to find the best match excita-tion signal, that is the one minimizing the error between the original andthe synthesized speech signal. The parameters of this excitation signal aretransferred to the decoder.Code excited linear prediction (CELP), Manfred R. Schroeder and BishnuS. Atal, 1985.Optimal excitation signal is chosen from a code book with a size of 256 to1024, the index of the chosen signal is sent to the decoder.Fair quality speech transmission on a bit rate of 4.8 kbit/s.Slow search in the code book. Large code book is split into smaller ones.Original algorithm (Schroeder and Atal, 1983) implemented on a Cray-1supercomputer (80 MFLOPS; DE HPC: 254 TFLOPS): coding of 1s spe-ech signal took 150s.Standards: ITU-T G.728 (16 kbit/s), G.729 (8 kbit/s)Application: part of RealAudio and MPEG-4 Audio formats.

Sándor Baran Mathematics and Information Theory 2018/19, 2. sem. 219 / 239


Audio compressionCD quality: bandwidth limited to 20 kHz, 44100 Hz sampling frequency,16-bit uniform quantizer (e.g. WAV: waveform audio file format, 1991).Transmission bit rate: 44100 × 16 × 2 ≈ 1400 kbit/s (×2: stereo).

Factors should be taken into account during audio compression.Frequencies between 2 and 4 kHz are the easiest to perceive. As thefrequencies change towards the ends of the audible bandwidth, thevolume must also be increased to detect them. Hence, in this regioneven a larger distortion is more tolerated.A high intensity dominant sound on a given frequency makes inaudible(masks out) the neighbouring frequencies (simultaneous masking).A high intensity dominant sound on a given frequency masks out theweaker sounds in neighbourig frequencies which are present immedia-tely preceding (≈ 2 ms) or following (≈ 15 ms) it (temporal masking).

The audio signal to be compressed is analyzed in the frequency domain. Not all frequency components are transferred, and quantizers with different distortions are applied to the transferred ones.


MPEG-2 Audio Layer III (MP3) compression, I

Input signal: uncompressed PCM audio (e.g. a WAV file), divided into blocks (frames) of 1152 samples. Two processes start simultaneously.

1a. The samples of the given frame are filtered into 32 equally spaced frequency subbands. For a 44.1 kHz sampling rate each subband will be approximately 22050/32 ≈ 689 Hz wide.

1b. Fast Fourier transform: the input signals are transformed from the time domain to the frequency domain.

2a. Subband signals are windowed to reduce the artifacts caused by the edges of the time-limited signal segment. The MPEG standard uses 4 window types. After windowing, by applying a modified discrete cosine transform (MDCT), each of the 32 subbands is split into 18 finer subbands, resulting in a total of 576 frequency lines (a minimal MDCT sketch follows this list).

2b. Psychoacoustic model: models human sound perception. Provides information about which parts of the audio signal can be omitted due to masking, which window types the MDCT should apply, and how to quantize the different frequency lines.
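A minimal sketch of the MDCT in step 2a, assuming the long-block case where 36 windowed subband samples are mapped to 18 frequency lines (the 50% overlap between consecutive blocks and the four MPEG window types are not modelled):

```python
import numpy as np

def mdct(x):
    """MDCT of a length-2N block (N outputs); a sine window is applied first.
    For MP3 long blocks N = 18, so 36 subband samples give 18 frequency lines."""
    N = len(x) // 2
    n = np.arange(2 * N)
    window = np.sin(np.pi / (2 * N) * (n + 0.5))
    k = np.arange(N)[:, None]
    basis = np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
    return basis @ (x * window)

subband_block = np.random.randn(36)     # 36 consecutive samples of one subband
print(mdct(subband_block).shape)        # (18,); 32 subbands x 18 = 576 lines per frame
```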


MPEG-2 Audio Layer III (MP3) compression, II

The two parallel processes, now both in the frequency domain, are joined.

3. Based on the information provided by the psychoacoustic model, the 576 frequency lines form 22 bands, and the different bands are quantized with different scale factors. Masking also takes place here.

4. Huffman encoding: the quantized values are Huffman coded. Using a constant bit rate (CBR), each block has the same code length, whereas with a variable bit rate (VBR), some blocks have shorter codes and the unused bytes are passed to the next block.

5. Coding of side information: codes all parameters generated by the encoder which should be transmitted to the decoder.

6. Multiplexer: generates the bit stream representing the 1152 encoded PCM samples. The frame header, CRC (cyclic redundancy code) codeword (error detection), side information and Huffman coded frequency lines are put together to form a transferable frame.


MP3 encoding scheme

[Figure: block diagram of the MP3 encoder, combining the steps described above.]
Source: Rassol Raissi, The theory behind mp3. www.mp3-tech.org


Visual perception

Color and brightness perception: cone cells (cones) and rod cells.
Rod cells: role in peripheral vision, a key function in night vision.
Cone cells: three types corresponding to three different light wavelength ranges (s(λ): short; m(λ): medium; ℓ(λ): long).
Tristimulus vector corresponding to light with spectral density L(λ):

(S, M, L)⊤ = ∫ (s(λ), m(λ), ℓ(λ))⊤ L(λ) dλ.

This response determines the perceived brightness and hue.
Metameric colors: colors with different spectral distributions resulting in matching S, M and L values; one cannot distinguish them.
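A numerical illustration of the tristimulus integral; the Gaussian cone sensitivity curves and the flat spectrum used here are made up, only the structure of the computation matters:

```python
import numpy as np

lam = np.linspace(380, 780, 401)                      # wavelengths in nm

def toy_sensitivity(center, width):                   # made-up Gaussian cone response
    return np.exp(-0.5 * ((lam - center) / width) ** 2)

s = toy_sensitivity(445, 25)                          # "short" cones
m = toy_sensitivity(540, 35)                          # "medium" cones
l = toy_sensitivity(565, 40)                          # "long" cones
L_spec = np.ones_like(lam)                            # flat spectral density L(lambda)

S, M, L_resp = (np.trapz(c * L_spec, lam) for c in (s, m, l))
print(S, M, L_resp)                                   # the tristimulus vector (S, M, L)
```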


Color space

Separation of luminance and chromaticity:

(X, Y, Z)⊤ = M (S, M, L)⊤.

Y: luminance (brightness); X and Z: chromaticity (hue and saturation).
M: a linear transform (3 × 3 matrix) always resulting in a vector with non-negative components.
In practice, instead of (X, Y, Z)⊤ one uses (Y, x, y)⊤, where

x = X/(X + Y + Z),    y = Y/(X + Y + Z).

Representation of chromaticity: the (x, y) color space chromaticity diagram.
If (xi, yi) corresponds to spectral density Li(λ), i = 1, 2, then the combination µ1 L1(λ) + µ2 L2(λ), µ1, µ2 > 0, is represented by a point on the segment between (x1, y1) and (x2, y2).
Monochromatic (spectral) colors consist of a single wavelength of light λ0. They form a curve giving the boundary of the horseshoe-shaped chromaticity diagram of (x, y) values.
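A small sketch of the chromaticity coordinates and of the mixing property just stated: the (X, Y, Z) vectors are arbitrary, and the vanishing 2-D cross product confirms that the mixture's (x, y) lies on the segment between the two chromaticities.

```python
import numpy as np

def chromaticity(XYZ):
    X, Y, Z = XYZ
    s = X + Y + Z
    return np.array([X / s, Y / s])          # (x, y) coordinates

# Two arbitrary stimuli and an additive mixture of them.
XYZ1 = np.array([20.0, 30.0, 50.0])
XYZ2 = np.array([60.0, 25.0, 15.0])
mix = 0.3 * XYZ1 + 0.7 * XYZ2

p1, p2, pm = chromaticity(XYZ1), chromaticity(XYZ2), chromaticity(mix)
# pm lies on the segment p1-p2: the 2-D cross product of (p2-p1) and (pm-p1) vanishes.
u, v = p2 - p1, pm - p1
print(u[0] * v[1] - u[1] * v[0])             # ~0 up to rounding
```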


Chromaticity diagram

[Figure: the CIE 1931 (x, y) chromaticity diagram.]
International Commission on Illumination (CIE: Commission internationale de l’éclairage), 1931.


RGB color space

Defined by the three chromaticities of the red (R), green (G) and blue (B) additive primaries.
Can produce any chromaticity within the triangle defined by the primary colors.
(R′, G′, B′): ratios of the primary colors on the integer scale 0–255, obtained after gamma correction.

[Figures: various RGB color spaces (left); color calibration of an LG 42LB731V smart TV (right).]


YCbCr color space

(Y, x, y) separates luminance and chromaticity.
Problem: human visual perception is not uniform in the Y coordinate.

Y′ = 16 + ( 65.738 R′ + 129.057 G′ + 25.064 B′)/256,
Cb = 128 + (−37.945 R′ − 74.494 G′ + 112.439 B′)/256,
Cr = 128 + ( 112.439 R′ − 94.154 G′ − 18.285 B′)/256.

Y′: luma component, a grayscale copy of the image, perceived uniformly.
Cb, Cr: blue-difference and red-difference chroma components.
All coordinates are on the integer scale 0–255.
ITU-R BT.601 SDTV standard (formerly CCIR 601, 1982).
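A sketch of the conversion applied to a single pixel, using the BT.601 coefficients above (the rounding to integers is an implementation choice):

```python
import numpy as np

# BT.601 conversion of an 8-bit R'G'B' triple to Y'CbCr, using the matrix above.
OFFSET = np.array([16.0, 128.0, 128.0])
M = np.array([[ 65.738, 129.057,  25.064],
              [-37.945, -74.494, 112.439],
              [112.439, -94.154, -18.285]]) / 256.0

def rgb_to_ycbcr(rgb):
    return np.rint(OFFSET + M @ np.asarray(rgb, dtype=float)).astype(int)

print(rgb_to_ycbcr([255, 255, 255]))   # white -> [235 128 128]
print(rgb_to_ycbcr([0, 0, 0]))         # black -> [ 16 128 128]
```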


Graphics Interchange Format (GIF)

Colors used in an image have their RGB values defined in a palette table. The image data refer to the colors by their indices in the table.
GIF: at most an 8-bit palette (256 colors) chosen from the 24-bit RGB color space (3 × 8 bits). CompuServe, 1987.
Horizontal scan from the top left, lossless LZW compression.
Main applications: compression of icons and simple graphics. Properties:
Small number of colors.
Lots of large, monochrome areas and repeated patterns, which can be efficiently compressed using the LZW algorithm.
Problem: not applicable to the compression of photographs.


Joint Photographic Experts Group (JPEG), I

Lossy compression (1993). Input image: 24-bit YCbCr color space.
The image is split into 3 channels according to the coordinates, and each channel is compressed separately. As humans can see considerably more fine detail in the brightness of an image than in the hue and color saturation, a reduction of the spatial resolution of Cb and Cr is allowed (downsampling). Ratios: 4 : 4 : 4 (no downsampling), 4 : 2 : 2 (horizontal reduction by a factor of 2), 4 : 2 : 0 (horizontal and vertical reduction by a factor of 2).

1. Each channel is split into 8 × 8 blocks. If the image size is not a multiple of 8, it is extended by repeating the last column/row.

2. A two-dimensional DCT is applied to each 8 × 8 block in order to convert it into the frequency domain. Elements of the transformed block: harmonics corresponding to different frequencies. Upper left corner: low frequencies, where the human eye is more sensitive to differences. (0, 0) entry: DC component (basic hue of the block). Different blocks often have similar DC components.

3. The DC component is compressed using delta coding with respect to the DC component of the preceding block.


Joint Photographic Experts Group (JPEG), II

4. The various harmonics are quantized uniformly, however, using different quantization steps: the more sensitive the human vision, the finer the quantization used.
Quantization matrix: its entries specify the quantization steps. It depends on the compression rate; a higher compression rate requires larger values. E.g. for a 50% compression the proposed quantization matrix is

Q =
 16  11  10  16  24  40  51  61
 12  12  14  19  26  58  60  55
 14  13  16  24  40  57  69  56
 14  17  22  29  51  87  80  62
 18  22  37  56  68 109 103  77
 24  35  55  64  81 104 113  92
 49  64  78  87 103 121 120 101
 72  92  95  98 112 100 103  99
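How the table is scaled for other quality settings is not specified on the slide; a common convention (used, e.g., by the IJG/libjpeg implementation, stated here as an assumption) derives the matrix for quality q from the quality-50 table above:

```python
import numpy as np

Q50 = np.array([[16, 11, 10, 16, 24, 40, 51, 61],
                [12, 12, 14, 19, 26, 58, 60, 55],
                [14, 13, 16, 24, 40, 57, 69, 56],
                [14, 17, 22, 29, 51, 87, 80, 62],
                [18, 22, 37, 56, 68, 109, 103, 77],
                [24, 35, 55, 64, 81, 104, 113, 92],
                [49, 64, 78, 87, 103, 121, 120, 101],
                [72, 92, 95, 98, 112, 100, 103, 99]])

def scaled_table(quality):
    """Quality-scaled quantization matrix (IJG/libjpeg convention, assumed here)."""
    scale = 5000 / quality if quality < 50 else 200 - 2 * quality
    q = np.floor((Q50 * scale + 50) / 100)
    return np.clip(q, 1, 255).astype(int)

print(scaled_table(10)[0])   # coarser steps -> stronger compression
print(scaled_table(90)[0])   # finer steps  -> better quality
```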


Example

Input matrix:
 187 130 113  31  19 125  69 112
 170  52  52 162 207 206 149  51
 188 129  48 160 228  15 185  92
  36  25  26 210 166 217 105 246
  38 149 210 189 198 200  45  30
 114 237  37 222  49 193 168 236
 186 115 251 183 197  22  43  87
  63 112 216 135  47 139 130  22

DCT matrix:
 1021.7   16.6 −104.4   24.3   43.5   21.8    0.2  −57.2
  −40.4  −61.6   74.2  149.4   73.5   −4.0  −18.8   13.9
  −83.2  137.5   94.3    6.4  −85.3   67.3   56.8   82.1
   29.1   47.7   74.0  −32.9  −56.8  −61.2   52.7 −100.2
  −86.7  −40.2  −46.0  −40.5 −114.0  −30.3  121.6  −42.8
  −13.0   −1.0   96.6  −76.6  113.9  −20.8   17.1   33.1
   −2.3  −20.4  157.5  −26.4  −49.9    7.5 −102.6  −72.7
  −25.4  198.6  −71.4  −27.9  −13.1  −16.5  −14.5  168.8

Quantized DCT:
 1024   22 −100   32   48   40    0  −61
  −36  −60   70  152   78    0    0    0
  −84  143   96    0  −80   57   69   56
   28   51   66  −29  −51  −87   80 −124
  −90  −44  −37  −56 −136    0  103  −77
  −24    0  110  −64   81    0    0    0
    0    0  156    0    0    0 −120 −101
    0  184  −95    0    0    0    0  198
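The quantized values can be reproduced from the DCT matrix and Q: each coefficient is divided by the corresponding quantization step, rounded, and multiplied back (i.e. the table shows the reconstruction levels). A check on the first row:

```python
import numpy as np

# First row of the example: quantize (round of DCT/Q), then map back to the
# reconstruction levels (value * Q), which is what the "Quantized DCT" shows.
dct_row = np.array([1021.7, 16.6, -104.4, 24.3, 43.5, 21.8, 0.2, -57.2])
q_row   = np.array([16, 11, 10, 16, 24, 40, 51, 61])
print(np.rint(dct_row / q_row) * q_row)
# -> [1024.   22. -100.   32.   48.   40.    0.  -61.]  (first row above)
```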


Joint Photographic Experts Group (JPEG), III

5. The differences of the DC components and the DCT values are rearranged in a zigzag order.

6. The obtained sequence is split into runs, each consisting of a sequence of zeros followed by a single non-zero element. The run-length code of a run is ({n, s}, ν), where
n: number of zeros before the non-zero element;
s: number of bits required to represent the non-zero element;
ν: (signed) bit representation of the non-zero element.

7. The pairs {n, s} are encoded using either Huffman or arithmetic encoding; their codes are followed by the concatenation of the bit representations ν.

Example.
81, 0, 0, 0, 0, 0, −6, 0, 0, 0, −12, 0, 0, . . .
Run-length codes: ({0, 8}, 01010001); ({5, 4}, 1001); ({3, 5}, 10011).
Concatenated bit representations: 01010001100110011.
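A small sketch reproducing the run-length codes of the example; the signed bit representation is inferred from the example itself (a sign position plus one's complement for negative values), and the end-of-block marker used in real JPEG streams is omitted:

```python
def runlength(seq):
    """Split a coefficient sequence into (zero run, size, bits) triples,
    matching the convention of the example above: s counts the bits of the
    signed representation, negatives are stored in one's complement."""
    codes, zeros = [], 0
    for v in seq:
        if v == 0:
            zeros += 1
            continue
        s = abs(v).bit_length() + 1          # one extra position for the sign
        bits = v if v > 0 else v + (1 << s) - 1
        codes.append((zeros, s, format(bits, f"0{s}b")))
        zeros = 0
    return codes

print(runlength([81, 0, 0, 0, 0, 0, -6, 0, 0, 0, -12, 0, 0]))
# -> [(0, 8, '01010001'), (5, 4, '1001'), (3, 5, '10011')]
```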


Properties

Standards: ISO/IEC 10918, ITU-T T.81, T.83, T.84, T.86.
Efficient if there are no contrasting edges. At high compression rates, quantization results in artifacts caused by noise around contrasting edges.

[Figure illustrating JPEG compression artifacts.]
Source: http://www.gimp.org/tutorials/GIMP_Quickies/


Lossless JPEG

Extension to the JPEG format. Joint Photographic Experts Group, 1993.
Two-dimensional predictive encoding (DPCM).
Horizontal scan from the top left. Prediction X̂i,j of the value Xi,j of pixel (i, j) using the values Xi−1,j, Xi,j−1 and Xi−1,j−1.
Eight different prediction schemes:

0: X̂i,j = 0;
1: X̂i,j = Xi−1,j;
2: X̂i,j = Xi,j−1;
3: X̂i,j = Xi−1,j−1;
4: X̂i,j = Xi−1,j + Xi,j−1 − Xi−1,j−1;
5: X̂i,j = Xi,j−1 + (Xi−1,j − Xi−1,j−1)/2;
6: X̂i,j = Xi−1,j + (Xi,j−1 − Xi−1,j−1)/2;
7: X̂i,j = (Xi−1,j + Xi,j−1)/2.

Any one of the eight predictors can be used, but the same one for the entire image.
Adaptive arithmetic or Huffman encoding.
Compression ratio in the predictive case (all schemes but 0): around 2 : 1.
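The eight schemes written out as code (a sketch: borders are handled crudely by substituting 0 for missing neighbours, and integer division stands in for the standard's integer arithmetic):

```python
# The eight lossless JPEG prediction schemes, as functions of the upper
# (Xi-1,j), left (Xi,j-1) and upper-left (Xi-1,j-1) neighbours.
PREDICTORS = {
    0: lambda up, left, upleft: 0,
    1: lambda up, left, upleft: up,
    2: lambda up, left, upleft: left,
    3: lambda up, left, upleft: upleft,
    4: lambda up, left, upleft: up + left - upleft,
    5: lambda up, left, upleft: left + (up - upleft) // 2,
    6: lambda up, left, upleft: up + (left - upleft) // 2,
    7: lambda up, left, upleft: (up + left) // 2,
}

def residuals(image, scheme):
    """Prediction residuals Xi,j - X̂i,j for one scheme; the residuals are
    what gets entropy coded (missing border neighbours are treated as 0)."""
    h, w = len(image), len(image[0])
    pred = PREDICTORS[scheme]
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            up = image[i - 1][j] if i > 0 else 0
            left = image[i][j - 1] if j > 0 else 0
            upleft = image[i - 1][j - 1] if i > 0 and j > 0 else 0
            out[i][j] = image[i][j] - pred(up, left, upleft)
    return out

print(residuals([[100, 102], [101, 103]], scheme=4))   # [[100, 2], [1, 0]]
```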


Moving Picture Experts Group (MPEG)

Working group generating specifications for the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). Standards: MPEG-1, MPEG-2, MPEG-4, MPEG-7, MPEG-21.
Lossy video compression (1993). MPEG-1 video layers:

video sequence → group of pictures → frames → macroblocks → blocks

Group of pictures (GOP): independently encoded unit.
Three types of frames:
I-frame (intra frame): independent picture, JPEG compression;
P-frame (predictive coded frame): encoded with the help of the previous I- or P-frame;
B-frame (bidirectionally predictive coded frame): encoded with the help of the previous/subsequent I- or P-frame.

Macroblock: a set of 6 blocks covering a resolution of 16 × 16 pixels; block resolution: 8 × 8 pixels; 4 luma (Y′) blocks and a pair of downsampled chroma (Cr, Cb) blocks. P- and B-frames are encoded by macroblocks.


Frame reordering

During encoding, B-frames are moved forward, to be encoded after the neighbouring I- and P-frames. This simplifies buffering.

Source and display order:         0 1 2 3 4 5 6 7 8 9
Frame type:                       I B B P B B P B B I
Position in the coded bit stream: 0 2 3 1 5 6 4 8 9 7

Typical pattern: two P-frames are encoded using a single I-frame, with two B-frames between them.
I-frames are encoded independently; high-speed seeking through an MPEG-1 video is only possible to the nearest I-frame.
Compression of P- and B-frames is based on macroblock motion estimation.
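The reordering can be expressed as a small routine: B-frames are buffered until the next I- or P-frame has been emitted. Applied to the GOP above it yields the display indices in transmission order, consistent with the positions listed in the table.

```python
# Frame reordering for the GOP pattern in the table above: every B-frame is
# moved after the following I- or P-frame it depends on.
def coded_order(frame_types):
    order, pending_b = [], []
    for idx, t in enumerate(frame_types):
        if t == "B":
            pending_b.append(idx)      # wait for the next anchor frame
        else:                          # I or P: emit it, then the buffered B-frames
            order.append(idx)
            order.extend(pending_b)
            pending_b = []
    return order + pending_b

display = list("IBBPBBPBBI")
print(coded_order(display))            # [0, 3, 1, 2, 6, 4, 5, 9, 7, 8]
# Read as a permutation, this places frames 0..9 at coded positions
# 0 2 3 1 5 6 4 8 9 7, as in the table above.
```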


Predictions

P-frames: for each macroblock the encoder finds the best matching macroblock of the previous I- or P-frame (reference macroblock). Only the motion vector (distance and direction) and the difference from the reference block (residual error) are encoded. Motion vector: Huffman code; error: JPEG-like encoding. If there is no reference block: JPEG encoding.
B-frames: matches are searched for in the previous or subsequent I- or P-frames. If the match is bidirectional, the error with respect to the mean of the matching blocks and the two motion vectors are encoded. For a unidirectional match the algorithm is similar to the compression of P-frames.
MPEG-1 compression for a 356 × 260 resolution and 24-bit color space:

Type      Size     Rate
I         18 Kb    7 : 1
P         6 Kb     20 : 1
B         2.5 Kb   50 : 1
Average   4.6 Kb   27 : 1

Video bit rate at 30 frames/s: 1.2 Mbit/s; with audio: 1.45 Mbit/s.