
Page 1: Slides IT2 SS2012

Lecture Course: Information Theory II

Marius Pesavento

Communication Systems Group

Institute of Telecommunications

Technische Universität Darmstadt

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 1 NTS

Page 2: Slides IT2 SS2012

COURSE ORGANIZATION

• Instructor: Dr.-Ing. Marius Pesavento, S3/06/204, [email protected], FG Nachrichtentechnische Systeme (NTS)
• Teaching assistant: Yong Cheng, S3/06/205, e-mail: [email protected]
• Website: http://www.nts.tu-darmstadt.de/
• Lecture notes and slides will be posted in TUCAN
• Office hours: on request (please send an e-mail to the TA or the instructor)
• Written final exam (closed-book)
• Examination date (presumably): Tuesday, July 31, 2012, 12.00 - 14.00

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 2 NTS

Page 3: Slides IT2 SS2012

RECOMMENDED TEXTBOOKS

1. D. Tse, Fundamentals of Wireless Communication, Cambridge University Press, 2005. (main reference)
2. A. El Gamal and Y.-H. Kim, Network Information Theory, Cambridge University Press, 2012.
3. A. Goldsmith, Wireless Communications, Cambridge University Press, 2005.
4. T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, 1991.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 3 NTS

Page 4: Slides IT2 SS2012

COURSE OUTLINE

• Overview of the basics of information theory
• Entropy, mutual information, capacity
• Source coding and channel coding theorems
• Memoryless Gaussian channel
• Multi-antenna channel capacity, water-filling
• Basic theory of network information theory
• Multiple-access channels
• Broadcast channels
• Relay channels
• Cyclic codes
• Convolutional codes
• Turbo codes

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 4 NTS

Page 5: Slides IT2 SS2012

Topics of the earlier basic IT course

• Information, entropy, mutual information, and their derivatives
• Basic theory of source coding: Shannon's source coding theorem, Huffman coding, Lempel-Ziv coding
• Channel capacity: Shannon's channel coding theorem, Gaussian channel, bandlimited channel, Shannon's limit, multiple Gaussian channels, multiple colored-noise channels, water-filling, ergodic and outage capacities, basics of MIMO channels
• Basic theory of channel coding: linear block codes, Reed-Muller codes, Golay code

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 5 NTS

Page 6: Slides IT2 SS2012

REVIEW OF PROBABILITY THEORY: CDF AND PDF

Let X be a continuous random variable with the cumulative distribution function (cdf)

F_X(x) = Probability{X ≤ x} = P(X ≤ x)

Probability density function (pdf):

f_X(x) = ∂F_X(x)/∂x

where

F_X(x_0) = ∫_{−∞}^{x_0} f_X(x) dx

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 6 NTS

Page 7: Slides IT2 SS2012

NORMALIZATION PROPERTY OF CDFs

Since F_X(∞) = 1, we obtain the so-called normalization property

∫_{−∞}^{∞} f_X(x) dx = 1

Simple interpretation:

f_X(x) = lim_{∆→0} P{x − ∆/2 ≤ X ≤ x + ∆/2} / ∆

(Figure: pdf f_X(x); the area under the curve between x_1 and x_2 equals Probability{x_1 < X < x_2}.)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 7 NTS

Page 8: Slides IT2 SS2012

EXAMPLE 1

Let the real-valued random variable X be uniformly distributed in the interval

[0, T ].

(Figure: cdf F_X(x) rising linearly from 0 at x = 0 to 1 at x = T, and pdf f_X(x) equal to 1/T on [0, T] and zero elsewhere.)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 8 NTS

Page 9: Slides IT2 SS2012

EXAMPLE 2

Let the real-valued random variable X have the so-called Gaussian (normal) distribution

f_X(x) = (1/√(2πσ_X²)) e^{−(x−µ_X)²/(2σ_X²)}

where σ_X² = var{X} is the variance and µ_X is the mean. The corresponding distribution function is given by

F_X(x) = (1/√(2πσ_X²)) ∫_{−∞}^{x} e^{−(ξ−µ_X)²/(2σ_X²)} dξ

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 9 NTS

Page 10: Slides IT2 SS2012

CDF AND PDF OF A GAUSSIAN RANDOM VARIABLE

(Figure: cdf F_X(x) increasing from 0 to 1, and pdf f_X(x) centered at x = µ_X with peak value (2πσ_X²)^{−1/2}; the spread is determined by σ_X².)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 10 NTS

Page 11: Slides IT2 SS2012

PROBABILITY MASS FUNCTION

Let X now be a discrete random variable which takes the values x_i (i = 1, ..., I) with the probabilities P(x_i) (i = 1, ..., I), respectively.

For discrete variables, we define the probability mass function

P(x_i) = Probability(X = x_i)

The normalization condition:

∑_{i=1}^{I} P(x_i) = 1

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 11 NTS

Page 12: Slides IT2 SS2012

EXTENSION TO DISCRETE VARIABLES

How to extend the concepts of pdf and cdf to discrete variables?

Define the unit step function as

u(x) = { 0, x < 0;  1, x ≥ 0 }

Define the Dirac delta function as

δ(x) = { ∞, x = 0;  0, x ≠ 0 },   with   ∫_{−∞}^{∞} δ(x) dx = 1

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 12 NTS

Page 13: Slides IT2 SS2012

EXTENSION TO DISCRETE VARIABLES

Relationships between the delta function and the unit step function:

∫_{−∞}^{x} δ(ξ) dξ = u(x),   δ(x) = ∂u(x)/∂x

Sifting property of the delta function:

∫_{−∞}^{∞} g(x) δ(x − y) dx = g(y)

Using the definition of the unit step function, we can express the cdf as

F_X(x) = ∑_{i=1}^{I} P(x_i) u(x − x_i)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 13 NTS

Page 14: Slides IT2 SS2012

EXTENSION TO DISCRETE VARIABLES

Then, the pdf can be expressed as

f_X(x) = ∑_{i=1}^{I} P(x_i) δ(x − x_i)

Using the sifting property of the delta function, we have

∫_{−∞}^{∞} f_X(x) dx = ∫_{−∞}^{∞} ∑_{i=1}^{I} P(x_i) δ(x − x_i) dx
                     = ∑_{i=1}^{I} ∫_{−∞}^{∞} P(x_i) δ(x − x_i) dx = ∑_{i=1}^{I} P(x_i) = 1

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 14 NTS

Page 15: Slides IT2 SS2012

EXAMPLE 1

Let the random variable X be an outcome of the coin tossing experiment.

(Figure: for the coin toss, the pdf f_X(x) consists of two impulses of weight 0.5 at x = 0 and x = 1, and the cdf F_X(x) is a staircase rising from 0 to 0.5 at x = 0 and to 1.0 at x = 1.)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 15 NTS

Page 16: Slides IT2 SS2012

EXAMPLE 2

Let the random variable X be an outcome of the die throwing experiment.

(Figure: for the die throw, the pdf f_X(x) consists of six impulses of weight 1/6 at x = 1, ..., 6, and the cdf F_X(x) is a staircase with steps of height 1/6 rising from 0 to 1.)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 16 NTS

Page 17: Slides IT2 SS2012

STATISTICAL EXPECTATION

Expected value (mean) of a continuous random variable:

µ_X = E{X} = ∫_{−∞}^{∞} x f_X(x) dx

For a discrete random variable:

µ_X = E{X} = ∫_{−∞}^{∞} x f_X(x) dx = ∫_{−∞}^{∞} x ∑_{i=1}^{I} P(x_i) δ(x − x_i) dx
           = ∑_{i=1}^{I} ∫_{−∞}^{∞} x P(x_i) δ(x − x_i) dx = ∑_{i=1}^{I} x_i P(x_i)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 17 NTS

Page 18: Slides IT2 SS2012

STATISTICAL EXPECTATION

We can also compute the expected value of a function of a continuous random variable:

E{g(X)} = ∫_{−∞}^{∞} g(x) f_X(x) dx

For a discrete random variable:

E{g(X)} = ∑_{i=1}^{I} g(x_i) P(x_i)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 18 NTS

Page 19: Slides IT2 SS2012

VARIANCE OF A RANDOM VARIABLE

var{X} = E{(X − E{X})²} = E{X²} − E{X}² = σ_X²

where σ_X is commonly called the standard deviation.

The variance and the standard deviation can be interpreted as measures of the statistical dispersion of a random variable w.r.t. its expected value.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 19 NTS

Page 20: Slides IT2 SS2012

EXAMPLE

Compute the mean and variance of a random variable uniformly distributed in the interval [0, 1].

(Figure: pdf f_X(x) equal to 1 on [0, 1] and zero elsewhere.)

µ_X = ∫_0^1 x dx = x²/2 |_0^1 = 1/2

σ_X² = ∫_0^1 x² dx − µ_X² = x³/3 |_0^1 − 1/4 = 1/3 − 1/4 = 1/12

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 20 NTS
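The result above can be checked numerically. The following is a minimal sketch (not part of the original slides; the sample size and random seed are arbitrary assumptions) using numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1_000_000)   # samples of X ~ U[0, 1]

print(x.mean())   # ~0.5, matching the analytic mean 1/2
print(x.var())    # ~0.0833, matching the analytic variance 1/12
```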

Page 21: Slides IT2 SS2012

JOINT DISTRIBUTION

Let us now consider two random variables X and Y jointly.

Joint distribution function:

F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y)

Joint pdf:

f_{X,Y}(x, y) = ∂²F_{X,Y}(x, y) / (∂x ∂y)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 21 NTS

Page 22: Slides IT2 SS2012

JOINT DISTRIBUTION

The inverse relationship:

F_{X,Y}(x_0, y_0) = ∫_{−∞}^{x_0} ∫_{−∞}^{y_0} f_{X,Y}(x, y) dy dx

Any joint pdf satisfies the following normalization property:

∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) dx dy = 1

Also,

∫_{−∞}^{∞} f_{X,Y}(x, y) dx = f_Y(y),   ∫_{−∞}^{∞} f_{X,Y}(x, y) dy = f_X(x)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 22 NTS

Page 23: Slides IT2 SS2012

CONDITIONAL DISTRIBUTION

In practical problems, we are often interested in the pdf of one random variable X conditioned on the fact that a second random variable Y has some specific value y. It is obvious that

P(X ≤ x, Y ≤ y) = P(X ≤ x | Y ≤ y) P(Y ≤ y)

Then, the conditional cdf is defined as

F_X(x|y) = P(X ≤ x | Y ≤ y) = F_{X,Y}(x, y) / F_Y(y)

By symmetry, it also follows that

F_Y(y|x) = F_{X,Y}(x, y) / F_X(x)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 23 NTS

Page 24: Slides IT2 SS2012

CONDITIONAL DISTRIBUTION

The conditional pdfs are

f_X(x|y) = f_{X,Y}(x, y) / f_Y(y)

f_Y(y|x) = f_{X,Y}(x, y) / f_X(x)

From the last two equations, we obtain the Bayes rule

f_X(x|y) f_Y(y) = f_Y(y|x) f_X(x)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 24 NTS

Page 25: Slides IT2 SS2012

NORMALIZATION CONDITION

∫_{−∞}^{∞} f_X(x|y) dx = ∫_{−∞}^{∞} f_{X,Y}(x, y) / f_Y(y) dx = (1/f_Y(y)) ∫_{−∞}^{∞} f_{X,Y}(x, y) dx = 1

Conditional expectation:

E{g(X)|y} = ∫_{−∞}^{∞} g(x) f_X(x|y) dx

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 25 NTS

Page 26: Slides IT2 SS2012

STATISTICAL INDEPENDENCE

Two random variables X and Y are statistically independent if

f_{X,Y}(x, y) = f_X(x) f_Y(y)

Substituting this equation into the conditional pdf, we obtain that statistical independence implies

f_X(x|y) = f_X(x)

That is, the variable Y does not have any influence on the variable X.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 26 NTS

Page 27: Slides IT2 SS2012

EXAMPLE

Let

f_{X,Y}(x, y) = { 4xy, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1;  0, otherwise }

Are the variables X and Y statistically dependent?

f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy = 4x ∫_0^1 y dy = 4x · y²/2 |_0^1 = { 2x, 0 ≤ x ≤ 1;  0, otherwise }

f_Y(y) = { 2y, 0 ≤ y ≤ 1;  0, otherwise }

Since f_{X,Y}(x, y) = f_X(x) f_Y(y), the variables are independent!

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 27 NTS

Page 28: Slides IT2 SS2012

CORRELATION AND COVARIANCE

Two fundamental characteristics of linear statistical dependence are the correlation

r_XY = E{XY}

and the covariance

cov{X, Y} = E{(X − E{X})(Y − E{Y})} = E{XY} − E{X}E{Y} = E{XY} − µ_X µ_Y

For X = Y, the covariance boils down to the variance:

cov{X, X} = E{X²} − µ_X² = var{X}

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 28 NTS

Page 29: Slides IT2 SS2012

SOME USEFUL PROPERTIES

• var{X + Y} = var{X} + var{Y} + 2 cov{X, Y}.
• If the variables X and Y are statistically independent, then for any functions h and g, E{h(X)g(Y)} = E{h(X)}E{g(Y)}.
• If the variables X and Y are statistically independent, then cov{X, Y} = 0. Therefore, covariance is sometimes used as a measure of statistical dependence. However, the reverse statement is not necessarily true!
• If the variables X and Y are statistically independent, then var{X + Y} = var{X} + var{Y}.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 29 NTS

Page 30: Slides IT2 SS2012

EXTENSION TO MULTIVARIATE DISTRIBUTIONS

We may also consider multiple (more than two) random variables X_1, ..., X_n.

Joint distribution function:

F_{X_1,...,X_n}(x_1, ..., x_n) = P(X_1 ≤ x_1, X_2 ≤ x_2, ..., X_n ≤ x_n)

Joint pdf:

f_{X_1,...,X_n}(x_1, ..., x_n) = ∂^n F_{X_1,...,X_n}(x_1, ..., x_n) / (∂x_1 ∂x_2 · · · ∂x_n)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 30 NTS

Page 31: Slides IT2 SS2012

MULTIVARIATE DISTRIBUTIONS

Introducing the vectors

X = [X_1, X_2, ..., X_n]^T,   x = [x_1, x_2, ..., x_n]^T

we rewrite the previous equations in symbolic (vector) notation as

F_X(x) = P(X ≤ x)

f_X(x) = ∂^n F_X(x) / (∂x_1 ∂x_2 · · · ∂x_n)

Normalization condition:

∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} f_X(x) dx = 1

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 31 NTS

Page 32: Slides IT2 SS2012

MULTIVARIATE DISTRIBUTIONS

Statistical expectation can be defined as

E{g(X)} = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} g(x) f_X(x) dx

where g(X) is some function of the random vector X.

In the particular bivariate case,

E{g(X, Y)} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f_{X,Y}(x, y) dx dy

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 32 NTS

Page 33: Slides IT2 SS2012

MULTIVARIATE GAUSSIAN DISTRIBUTIONS

Jointly Gaussian random variables have the following joint multivariate pdf:

f_X(x) = 1/((√(2π))^n det{R}^{1/2}) · e^{−(1/2)(x−µ_X)^T R^{−1}(x−µ_X)}

where the mean is

µ_X = E{X}

and the covariance matrix is

R = E{(X − E{X})(X − E{X})^T} = E{XX^T} − µ_X µ_X^T

In symbolic notation,

X ∼ N(µ_X, R)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 33 NTS
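As an illustration of this pdf, the following minimal sketch (not from the slides; the mean vector, covariance matrix, and evaluation point are assumed example values) evaluates the formula directly and compares it with scipy's implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -1.0])                 # assumed mean vector mu_X
R = np.array([[2.0, 0.5],
              [0.5, 1.0]])                 # assumed covariance matrix R
x = np.array([0.5, 0.0])                   # assumed evaluation point

n = len(mu)
d = x - mu
pdf_manual = np.exp(-0.5 * d @ np.linalg.inv(R) @ d) / np.sqrt((2 * np.pi) ** n * np.linalg.det(R))
pdf_scipy = multivariate_normal(mean=mu, cov=R).pdf(x)
print(pdf_manual, pdf_scipy)               # both values agree
```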

Page 34: Slides IT2 SS2012

MULTIVARIATE GAUSSIAN DISTRIBUTION

In the case of a single (n = 1) random variable X = X_1, the n-variate Gaussian pdf reduces to

f_X(x) = (1/√(2πσ_X²)) e^{−(x−µ_X)²/(2σ_X²)}

which is the well-known Gaussian pdf.

In the case of two (n = 2) random variables X = X_1 and Y = X_2, we have

R = [ σ_X²        ρ σ_X σ_Y ;
      ρ σ_X σ_Y   σ_Y²       ],   ρ = E{(X − µ_X)(Y − µ_Y)} / (σ_X σ_Y)

Note that ρ = ρ_XY is nothing else than the correlation coefficient.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 34 NTS

Page 35: Slides IT2 SS2012

MULTIVARIATE GAUSSIAN DISTRIBUTION

The determinant of R is given by

det{R} = σ_X² σ_Y² (1 − ρ²)

and, therefore, the n-variate pdf reduces to the so-called bivariate pdf

f_{X,Y}(x, y) = 1/(2π σ_X σ_Y √(1 − ρ²)) · exp{ −1/(2(1 − ρ²)) [ (x−µ_X)²/σ_X² − 2ρ(x−µ_X)(y−µ_Y)/(σ_X σ_Y) + (y−µ_Y)²/σ_Y² ] }

The maximum of this function is located at the point {x = µ_X, y = µ_Y} and the maximal value is

max{f_{X,Y}(x, y)} = f_{X,Y}(µ_X, µ_Y) = 1/(2π σ_X σ_Y √(1 − ρ²))

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 35 NTS

Page 36: Slides IT2 SS2012

MULTIVARIATE GAUSSIAN DISTRIBUTION

In the case of uncorrelated X and Y, ρ = 0 and we have

f_{X,Y}(x, y) = 1/(2π σ_X σ_Y) exp{ −(1/2) [ (x−µ_X)²/σ_X² + (y−µ_Y)²/σ_Y² ] }
             = ( (1/(√(2π) σ_X)) e^{−(x−µ_X)²/(2σ_X²)} ) ( (1/(√(2π) σ_Y)) e^{−(y−µ_Y)²/(2σ_Y²)} )
             = f_X(x) f_Y(y)

i.e., the variables X and Y become statistically independent. This is a very important result showing that uncorrelated Gaussian random variables are also statistically independent! Note that for non-Gaussian random variables this is not true in general.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 36 NTS

Page 37: Slides IT2 SS2012

MULTIVARIATE GAUSSIAN DISTRIBUTION: EXAMPLE

Contour plots of the bivariate Gaussian pdf with the parameters µ_X = µ_Y = 0 and σ_X = σ_Y = 1.

(Figures, pages 37-45: contour plots over x, y ∈ [−3, 3] for the correlation coefficients ρ = 0, 0.25, 0.5, 0.75, 0.95, −0.25, −0.5, −0.75, and −0.95; the contours change from circles at ρ = 0 to increasingly elongated ellipses as |ρ| grows, oriented along the diagonal for ρ > 0 and along the anti-diagonal for ρ < 0.)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 45 NTS

Page 46: Slides IT2 SS2012

BASICS OF INFORMATION THEORY

Shannon: Information is the resolution of uncertainty about some statistical event:

• Before the event occurs, there is an amount of uncertainty.
• After the occurrence of the event, there is no uncertainty anymore, but there is a gain in the amount of information.
• Highly expected messages deliver a small amount of information, while highly unexpected ones deliver a large amount of information. Hence, the amount of information should be inversely proportional to the probability of the message.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 46 NTS

Page 47: Slides IT2 SS2012

Information and entropy

The amount of information of the symbol x with the probability P(x):

I(x) = log(1/P(x)) = −log(P(x))   with [I(x)] = bit

Considering a source with the alphabet X = {x_1, ..., x_N}, the entropy is defined as the statistically averaged amount of information (mean of I(X)):

H(X) = E{I(X)} = E{−log(P(X))} = ∑_{i=1}^{N} −P(x_i) log(P(x_i)) = ∑_{i=1}^{N} P(x_i) log(1/P(x_i))   with [H(X)] = bit/symbol

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 47 NTS
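A minimal helper for this definition can be sketched as follows (assumed example, not part of the slides; base-2 logarithms are used so the result is in bits per symbol):

```python
import numpy as np

def entropy(p):
    """H(X) = -sum_i P(x_i) log2 P(x_i), ignoring zero-probability symbols."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

print(entropy([0.5, 0.5]))    # 1.0 bit/symbol (fair binary source)
print(entropy([0.9, 0.1]))    # ~0.469 bit/symbol (highly predictable source)
print(entropy([0.25] * 4))    # 2.0 bit/symbol (uniform quaternary source)
```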

Page 48: Slides IT2 SS2012

Example

Entropy of a non-symmetric binary source with the probabilities P(0) = p and P(1) = 1 − p:

H_B(X) = −p log(p) − (1 − p) log(1 − p)

(Figure: H_B(X) in bit/symbol versus p; the curve is zero at p = 0 and p = 1 and reaches its maximum of 1 bit at p = 0.5, the point of maximum uncertainty.)

• The entropy characterizes the source uncertainty.
• The entropy is a concave function of the probability.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 48 NTS

Page 49: Slides IT2 SS2012

SOME “DERIVATIVES” OF ENTROPY: Joint Entropy

The definition of entropy can be extended to a pair of random variables X and Y (two discrete sources X = {x_1, ..., x_N} and Y = {y_1, ..., y_M}).

The joint entropy H(X, Y) is defined as:

H(X, Y) = −E{log(P(X, Y))} = −∑_{i=1}^{N} ∑_{l=1}^{M} P(x_i, y_l) log(P(x_i, y_l))

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 49 NTS

Page 50: Slides IT2 SS2012

Conditional Entropy

The conditional entropy H(Y|X) is the amount of uncertainty remaining about the random variable Y after the random variable X has been observed:

H(Y|X) = −E_{X,Y}{log(P(Y|X))}
       = −∑_{i=1}^{N} ∑_{l=1}^{M} P(x_i, y_l) log(P(y_l|x_i))
       = −∑_{i=1}^{N} P(x_i) ∑_{l=1}^{M} P(y_l|x_i) log(P(y_l|x_i))

where we use the Bayes rule

P(x_i, y_l) = P(x_i|y_l) P(y_l) = P(y_l|x_i) P(x_i)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 50 NTS

Page 51: Slides IT2 SS2012

Useful properties

Important conditional entropy property:

H(X, Y) = H(X) + H(Y|X)

Hence the entropy, the conditional entropy, and the joint entropy are related quantities.

Another important property: conditioning reduces entropy:

H(X|Y) ≤ H(X)

with equality if and only if X and Y are statistically independent.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 51 NTS

Page 52: Slides IT2 SS2012

Mutual information

Let us consider two random variables (sources). The amount of information exchanged between two symbols x_i and y_l can be defined as:

I(x_i; y_l) = log( P(x_i|y_l) / P(x_i) ) = log( P(x_i, y_l) / (P(x_i) P(y_l)) )   with [I(x_i; y_l)] = bit

where we again use the Bayes rule P(x_i, y_l) = P(x_i|y_l) P(y_l).

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 52 NTS

Page 53: Slides IT2 SS2012

Mutual information

The amount of mutual information exchanged between two sources X and Y can be obtained by averaging I(x_i; y_l):

I(X; Y) = ∑_{i=1}^{N} ∑_{l=1}^{M} P(x_i, y_l) log( P(x_i|y_l) / P(x_i) )
        = ∑_{i=1}^{N} ∑_{l=1}^{M} P(x_i, y_l) log( P(x_i, y_l) / (P(x_i) P(y_l)) )   with [I(X; Y)] = bit/symbol

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 53 NTS

Page 54: Slides IT2 SS2012

Mutual information

Mutual information is the reduction in the uncertainty of X due to the knowledge of Y:

I(X; Y) = H(X) − H(X|Y)

Relation of mutual information to the entropies and the joint entropy:

I(X; Y) = H(X) + H(Y) − H(X, Y)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 54 NTS
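The identities above can be verified numerically from a joint pmf. The sketch below (assumed example, not from the slides; the 2x2 joint pmf is arbitrary) computes H(X), H(Y), H(X,Y), H(X|Y), and I(X;Y):

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# assumed joint pmf P(x_i, y_l) of a toy 2x2 source pair (rows: X, columns: Y)
Pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
Px, Py = Pxy.sum(axis=1), Pxy.sum(axis=0)   # marginal pmfs

H_X, H_Y, H_XY = H(Px), H(Py), H(Pxy)
H_X_given_Y = H_XY - H_Y                    # from H(X,Y) = H(Y) + H(X|Y)
I_XY = H_X + H_Y - H_XY                     # mutual information

print(H_X, H_X_given_Y)                     # conditioning reduces entropy: H(X|Y) <= H(X)
print(I_XY, H_X - H_X_given_Y)              # I(X;Y) = H(X) - H(X|Y)
```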

Page 55: Slides IT2 SS2012

Channel capacity

The input probabilities P(x_i) are independent of the channel. We can therefore maximize the mutual information I(X; Y) w.r.t. P(x_i). The channel capacity can then be defined as the maximum mutual information in any single use of the channel, where the maximization is over P(x_i) (i = 1, ..., N):

C = max_{ {P(x_i)} } I(X; Y)   with [C] = bit/symbol

or bits per channel use (bpcu).

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 55 NTS

Page 56: Slides IT2 SS2012

Example

Channel capacity of a binary symmetric channel:

(Figure: binary symmetric channel with inputs x_1, x_2 and outputs y_1, y_2; each input is received correctly with probability 1 − p and flipped with probability p.)

C_B = 1 + p log(p) + (1 − p) log(1 − p) = 1 − H_B(X)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 56 NTS
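A small sketch of this formula (assumed helper, not from the slides) computes C_B = 1 − H_B(p) in bits per channel use:

```python
import numpy as np

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def bsc_capacity(p):
    return 1.0 - binary_entropy(p)

for p in (0.0, 0.1, 0.5):
    print(p, bsc_capacity(p))   # 1.0, ~0.531, 0.0 (a half-flipping channel carries nothing)
```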

Page 57: Slides IT2 SS2012

Entropy/capacity of a binary symmetric channel

(Figure: the binary entropy H_B(X) in bit/symbol, equal to 0 at p = 0 and p = 1 with its maximum of 1 at p = 0.5, and the BSC capacity C_B = 1 − H_B(X) in bit/symbol, equal to 1 at p = 0 and p = 1 and 0 at p = 0.5.)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 57 NTS

Page 58: Slides IT2 SS2012

Channel coding/decoding

The inevitable presence of noise in a channel causes errors between the output and input data sequences of a digital communication system. To reduce these errors, we resort to channel coding.

The channel encoder maps the incoming source data into a channel input sequence. It adds redundancy to these data to protect them from errors.

The channel decoder inversely maps the channel output sequence into an output data sequence in such a way that the overall effect of the channel noise on the system is minimized.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 58 NTS

Page 59: Slides IT2 SS2012

Shannon’s Channel-Coding Theorem

Let information be transmitted through a discrete memoryless channel of capacity C. If the transmission rate satisfies

R < C

then there exists a channel coding scheme for which the source output can be transmitted over the channel with an arbitrarily small probability of error.

Conversely, if

R ≥ C

then it is impossible to transmit information over the channel with an arbitrarily small probability of error.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 59 NTS

Page 60: Slides IT2 SS2012

Joint source-channel coding theorem

If

H(X) > C

then it is impossible to transmit the source outputs over the channel with an arbitrarily small probability of error.

The latter theorem follows from the direct combination of the source-coding and channel-coding theorems.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 60 NTS

Page 61: Slides IT2 SS2012

Continuous sources

The mutual information between two continuous sources X and Y with the joint pdf f_{X,Y}(x, y) is given by

I(X; Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) log( f_X(x|y) / f_X(x) ) dx dy
        = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) log( f_{X,Y}(x, y) / (f_X(x) f_Y(y)) ) dx dy

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 61 NTS

Page 62: Slides IT2 SS2012

What is the relationship between the discrete and continuous mutual information?

It can be shown that the definitions of mutual information in the continuous and discrete cases are essentially similar.

This property enables us to use the continuous mutual information to define the capacity in the case of continuously distributed (infinite-alphabet) sources.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 62 NTS

Page 63: Slides IT2 SS2012

Continuous-time bandlimited channel

Consider a continuous-time bandlimited channel with additive white Gaussian noise (AWGN). The output of such an AWGN channel can be described as

Y(t) = (X(t) + Z(t)) ∗ h(t)

where X(t) and Z(t) are the signal and noise waveforms, respectively, and h(t) is the impulse response of an ideal lowpass filter with the cutoff frequency B.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 63 NTS

Page 64: Slides IT2 SS2012

(Figure: bandlimited AWGN channel; the input x(t) is corrupted by AWGN n(t) with noise power spectral density N_0/2 and passed through an ideal lowpass filter H(f) of bandwidth B to produce y(t).)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 64 NTS

Page 65: Slides IT2 SS2012

Capacity

Capacity of the bandlimited channel:

C = B log(1 + P/(N_0 B))   bits per second

where it is taken into account that P_N = N_0 B.

Shannon's bound:

C_∞ = lim_{B→∞} B log(1 + P/(N_0 B)) = log(e) · P/N_0 ≈ 1.44 P/N_0

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 65 NTS
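These two formulas can be illustrated with a short sketch (assumed example, not from the slides; the power and noise density values are arbitrary) that evaluates C for growing B and compares it with Shannon's bound:

```python
import numpy as np

P = 1.0       # received signal power (W), assumed
N0 = 1e-2     # noise power spectral density (W/Hz), assumed

def capacity(B):
    """C = B log2(1 + P / (N0 B)) in bits per second."""
    return B * np.log2(1.0 + P / (N0 * B))

for B in (1e1, 1e2, 1e3, 1e4):
    print(B, capacity(B))           # capacity saturates as B grows

print(np.log2(np.e) * P / N0)       # Shannon's bound C_inf ~ 1.44 P/N0
```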

Page 66: Slides IT2 SS2012

Parallel AWGN channels

Consider multiple parallel AWGN channels

Y_i = X_i + Z_i,   i = 1, ..., K

with a common power constraint

E{ ∑_{i=1}^{K} X_i² } = ∑_{i=1}^{K} E{X_i²} = ∑_{i=1}^{K} P_i ≤ P

where Z_i ∼ N(0, P_{N,i}), the noise is statistically independent from channel to channel, and P_i = E{X_i²}.

How should the power P be distributed among the channels to maximize the total capacity?

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 66 NTS

Page 67: Slides IT2 SS2012

Water-filling

Result (water-filling): The total capacity is maximized when

P_i = (ν − P_{N,i})^+

where the value of ν is chosen such that

∑_{i=1}^{K} P_i = ∑_{i=1}^{K} (ν − P_{N,i})^+ ≤ P

and (·)^+ denotes the positive part, i.e., for any x,

(x)^+ = x if x ≥ 0, and (x)^+ = 0 if x < 0

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 67 NTS
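A minimal water-filling sketch (assumed helper, not from the slides; the water level ν is found by bisection, and the noise powers and total power are example values):

```python
import numpy as np

def waterfill(noise_powers, P_total, iters=100):
    """Return the powers (nu - P_N_i)^+ whose sum equals P_total."""
    pn = np.asarray(noise_powers, dtype=float)
    lo, hi = pn.min(), pn.max() + P_total        # the water level nu lies in this interval
    for _ in range(iters):
        nu = 0.5 * (lo + hi)
        if np.maximum(nu - pn, 0.0).sum() > P_total:
            hi = nu
        else:
            lo = nu
    return np.maximum(nu - pn, 0.0)

pn = np.array([0.1, 0.5, 1.0, 2.5])              # per-channel noise powers P_N,i (assumed)
p = waterfill(pn, P_total=2.0)
print(p, p.sum())                                # very noisy channels may get zero power
print(0.5 * np.log2(1.0 + p / pn).sum())         # resulting capacity in bit per channel use
```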

Page 68: Slides IT2 SS2012

Water-filling

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 68 NTS

Page 69: Slides IT2 SS2012

SKETCH OF THE PROOF

The mutual information of a system with multiple Gaussian channels can be shown to be upper-bounded by the value

(1/2) ∑_{i=1}^{K} log(1 + P_i / P_{N,i})

Equality is achieved when X = [X_1, X_2, ..., X_K]^T is a Gaussian vector:

X ∼ N(0, P)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 69 NTS

Page 70: Slides IT2 SS2012

SKETCH OF THE PROOF

The covariance matrix is

P = diag{P_1, ..., P_K}

Hence, the capacity of multiple Gaussian channels is given by

C = (1/2) ∑_{i=1}^{K} log(1 + P_i / P_{N,i})

Let us now maximize C over {P_i}_{i=1}^{K} subject to the constraints ∑_{i=1}^{K} P_i = P and P_i ≥ 0 for i = 1, ..., K.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 70 NTS

Page 71: Slides IT2 SS2012

SKETCH OF THE PROOF

We use the Lagrange multiplier method. The Lagrangian function can be written as

L(P_1, ..., P_K) = (1/2) ∑_{i=1}^{K} log(1 + P_i/P_{N,i}) + λ_0 (P − ∑_{i=1}^{K} P_i) + ∑_{i=1}^{K} λ_i P_i

where λ_0, ..., λ_K are the Lagrange multipliers. Differentiating L(P_1, ..., P_K) w.r.t. P_i, we have

∂L/∂P_i = ∂/∂P_i ( (1/2) log(e) ln(1 + P_i/P_{N,i}) + λ_0 (P − ∑_{i=1}^{K} P_i) + ∑_{i=1}^{K} λ_i P_i )
        = (1/2) log(e) · (1/P_{N,i}) / (1 + P_i/P_{N,i}) − λ_0 + λ_i
        = (log(e)/2) · 1/(P_i + P_{N,i}) − λ_0 + λ_i

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 71 NTS

Page 72: Slides IT2 SS2012

SKETCH OF THE PROOF

From the so-called Karush-Kuhn-Tucker (KKT) conditions for constrained convex optimization problems:

∑_{i=1}^{K} P_i* = P,   P_i* ≥ 0   (constraint satisfaction)

(log(e)/2) · 1/(P_i* + P_{N,i}) − λ_0* + λ_i* = 0   (zero gradient)

λ_i* P_i* = 0   (complementary slackness)

λ_i* ≥ 0,   i = 1, ..., K   (for the inequality constraints)

Thus P_i* ≥ 0 and ∑_{i=1}^{K} P_i* = P, as well as

P_i* (λ_0* − (log(e)/2) · 1/(P_i* + P_{N,i})) = 0   and   λ_0* ≥ (log(e)/2) · 1/(P_i* + P_{N,i})

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 72 NTS

Page 73: Slides IT2 SS2012

SKETCH OF THE PROOF

From the KKT conditions: P_i* ≥ 0 and ∑_{i=1}^{K} P_i* = P, as well as

P_i* (λ_0* − (log(e)/2) · 1/(P_i* + P_{N,i})) = 0   and   λ_0* ≥ (log(e)/2) · 1/(P_i* + P_{N,i})

Thus, if

λ_0* < (log(e)/2) · 1/P_{N,i},

then the inequality above can only hold with P_i* > 0, which by the complementary slackness condition implies that

λ_0* = (log(e)/2) · 1/(P_i* + P_{N,i}).

Hence, for ν* = log(e)/(2λ_0*),

P_i* = ν* − P_{N,i}.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 73 NTS

Page 74: Slides IT2 SS2012

SKETCH OF THE PROOF

From the KKT conditions: P_i* ≥ 0 and ∑_{i=1}^{K} P_i* = P, as well as

P_i* (λ_0* − (log(e)/2) · 1/(P_i* + P_{N,i})) = 0   and   λ_0* ≥ (log(e)/2) · 1/(P_i* + P_{N,i})

Conversely, if

λ_0* ≥ (log(e)/2) · 1/P_{N,i},

then P_i* > 0 is impossible, as it would imply that

λ_0* ≥ (log(e)/2) · 1/P_{N,i} > (log(e)/2) · 1/(P_i* + P_{N,i}),

which violates the complementary slackness condition. We conclude that P_i* = 0 when P_{N,i} ≥ ν*, and P_i* = ν* − P_{N,i} otherwise.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 74 NTS

Page 75: Slides IT2 SS2012

EXTENDED DEFINITIONS OF CAPACITY: Ergodic capacity

Ergodic capacity: In the case of a random Gaussian channel, it is sometimes more useful to separate the effects of the transmitted signal and the channel as

Y(i) = X(i) H(i) + Z(i)

where H(i) is the channel gain in the i-th channel use. In contrast to the noise and signal waveforms, the channel gain is usually treated as a non-random (deterministic) value.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 75 NTS

Page 76: Slides IT2 SS2012

Ergodic capacity

For this model,

P = E{X²}

can be interpreted as the transmitted signal power, whereas

E{(XH)²} = E{X²} H² = P H²

can be interpreted as the received signal power.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 76 NTS

Page 77: Slides IT2 SS2012

Ergodic capacity

In this case, the capacity formula reads

C = (1/2) log(1 + P H² / P_N)

Note that the conventional capacity is instantaneous, that is, it characterizes the maximal achievable rate for a particular given realization of the channel gain H.

How can we characterize the maximal achievable rate on average rather than for some particular channel gain?

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 77 NTS

Page 78: Slides IT2 SS2012

Ergodic capacity

In practice, wireless channels are random and, therefore, should be treated as random.

Based on this fact, the ergodic capacity is defined as the instantaneous capacity C averaged over the channel realizations:

C_E = E_H{C}

where E_H{·} denotes statistical expectation over the random channel gain.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 78 NTS

Page 79: Slides IT2 SS2012

Ergodic capacity

Assume that we know the channel gain pdf f_H(h). In this case, we can compute the ergodic capacity as

C_E = ∫_{−∞}^{∞} f_H(h) C(h) dh

Ergodic capacity provides another look at the achievable transmission rate as compared to the conventional instantaneous capacity, because it gives the average rather than the instantaneous picture.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 79 NTS
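For a concrete fading model, the ergodic capacity can be estimated by Monte Carlo averaging of the instantaneous capacity. A minimal sketch (assumed example, not from the slides; a Rayleigh-distributed gain with E{H²} = 1 and arbitrary power values are used):

```python
import numpy as np

rng = np.random.default_rng(1)
P, PN = 1.0, 0.1                                       # transmit power and noise power (assumed)
h = rng.rayleigh(scale=1/np.sqrt(2), size=200_000)     # |H| Rayleigh with E{H^2} = 1

C_inst = 0.5 * np.log2(1.0 + P * h**2 / PN)            # instantaneous capacities C(h)
print(C_inst.mean())                                   # ergodic capacity estimate C_E
print(0.5 * np.log2(1.0 + P / PN))                     # capacity of the average channel, for comparison
```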

Page 80: Slides IT2 SS2012

Outage

Outage capacity: the transmission rate C_{p_out} that exceeds the instantaneous capacity C in p_out × 100 percent of the channel realizations.

The quantity p_out is called the outage probability.

An outage is defined as the event where, for some particular channel realization, the chosen transmission rate is higher than the instantaneous capacity (that is, where no error-free transmission is possible).

For small p_out (roughly speaking, p_out ≤ 0.1), outage-induced errors can be cured by means of channel coding.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 80 NTS

Page 81: Slides IT2 SS2012

Outage

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 81 NTS

Page 82: Slides IT2 SS2012

Outage

The outage capacity can be characterized as follows. Let the pdf of the instantaneous capacity C = C(H) be f_C(c), where f_C(c) = 0 for c < 0. Then, the outage capacity is defined by the equation

p_out = P(C < C_{p_out}) = ∫_0^{C_{p_out}} f_C(c) dc

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 82 NTS

Page 83: Slides IT2 SS2012

Channel coding

Channel encoding and decoding are used to correct errors that may occur during signal transmission over the channel.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 83 NTS

Page 84: Slides IT2 SS2012

Linear block codes

Linear binary block codes: the coding/decoding operations can be described using linear algebra. Binary codes use modulo-2 arithmetic.

A code is said to be linear if the modulo-2 sum of any two codewords in the code gives another codeword of the code.

A code is denoted as an (n, k) linear block code if n is the total number of bits of the codeword and k is the number of bits containing the message.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 84 NTS

Page 85: Slides IT2 SS2012

Linear block codes

Row-vector notation:

m = [m_1, ..., m_k]
b = [b_1, ..., b_{n−k}]
c = [b_1, ..., b_{n−k}, m_1, ..., m_k] = [b, m]

Block codes use the message bits to generate parity-check bits according to the equation

b = mP

where P is the k × (n − k) coefficient matrix. Noting that c = [b, m], we get

c = [b, m] = [mP, m] = m[P, I_k] = mG

where G is the k × n generator matrix.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 85 NTS

Page 86: Slides IT2 SS2012

Hamming codes

Hamming codes: a family of codes with

n = 2^m − 1
k = 2^m − m − 1,   n − k = m

Generator matrix of the (7,4) Hamming code (n = 7, m = 3, k = 4):

G = [P, I_4] = [ 1 1 0 1 0 0 0
                 0 1 1 0 1 0 0
                 1 1 1 0 0 1 0
                 1 0 1 0 0 0 1 ]

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 86 NTS

Page 87: Slides IT2 SS2012

Message | Codeword | Hamming weight
0000    | 0000000  | 0
0001    | 1010001  | 3
0010    | 1110010  | 4
0011    | 0100011  | 3
0100    | 0110100  | 3
0101    | 1100101  | 4
...     | ...      | ...

For the given Hamming code, d_min = 3. Therefore, it is a single-error-correcting code.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 87 NTS
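The codeword table can be reproduced from the generator matrix above. A minimal sketch (assumed example, not from the slides) generates all 2^k codewords c = mG in modulo-2 arithmetic and verifies that d_min = 3:

```python
import itertools
import numpy as np

G = np.array([[1, 1, 0, 1, 0, 0, 0],
              [0, 1, 1, 0, 1, 0, 0],
              [1, 1, 1, 0, 0, 1, 0],
              [1, 0, 1, 0, 0, 0, 1]])

codewords = []
for m in itertools.product([0, 1], repeat=4):        # all 16 messages m
    c = np.mod(np.array(m) @ G, 2)                   # c = mG (mod 2)
    codewords.append(c)
    print("".join(map(str, m)), "".join(map(str, c)), int(c.sum()))

weights = [int(c.sum()) for c in codewords if c.any()]
print("d_min =", min(weights))                       # 3 -> single-error correcting
```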

Page 88: Slides IT2 SS2012

MULTI-ANTENNA CHANNELS

Consider a multiple-input multiple-output (MIMO) channel:

(Figure: transmitter with N antennas, channel, receiver with M antennas.)

In the frequency-flat fading case, the signal at the m-th receive antenna is

Y_m(t) = ∑_{n=1}^{N} H_mn(t) X_n(t) + Z_m(t),   m = 1, ..., M

where H_mn is the channel coefficient between the m-th receive and the n-th transmit antenna, X_n is the signal sent from the n-th transmit antenna, and Z_m is the noise at the m-th receive antenna.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 88 NTS

Page 89: Slides IT2 SS2012

MIMO channel

Defining the M × N channel matrix

H = [ H_11 H_12 · · · H_1N
      H_21 H_22 · · · H_2N
       ...
      H_M1 H_M2 · · · H_MN ]

and the transmit signal, receive signal, and noise column vectors

x = [X_1, ..., X_N]^T,   y = [Y_1, ..., Y_M]^T,   z = [Z_1, ..., Z_M]^T

we can write the system input-output relationship in matrix form as

y = Hx + z

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 89 NTS

Page 90: Slides IT2 SS2012

SIMO channel

One particular case of the MIMO channel is the single-input multiple-output (SIMO) channel:

(Figure: transmitter with 1 antenna, receiver with N antennas.)

In the frequency-flat fading case, the signal at the n-th receive antenna is

Y_n(t) = H_n(t) X(t) + Z_n(t),   n = 1, ..., N

where H_n is the channel coefficient between the n-th receive antenna and the transmit antenna, and X(t) is the signal sent from the transmit antenna.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 90 NTS

Page 91: Slides IT2 SS2012

SIMO channel

Defining the N × 1 channel vector

h = [H_1, ..., H_N]^T

we can write the system input-output relationship in vector form as

y = hX + z

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 91 NTS

Page 92: Slides IT2 SS2012

MISO channel

Another particular case of the MIMO channel is the multiple-input single-output (MISO) channel:

(Figure: transmitter with N antennas, receiver with 1 antenna.)

In the frequency-flat fading case, the signal at the receive antenna is

Y(t) = ∑_{n=1}^{N} H_n(t) X_n(t) + Z(t)

where H_n is the channel coefficient between the receive antenna and the n-th transmit antenna, and X_n(t) is the signal sent from the n-th transmit antenna.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 92 NTS

Page 93: Slides IT2 SS2012

MISO channel

Defining the 1 × N channel row vector

h = [H_1, ..., H_N]

we can write the system input-output relationship in vector form as

Y = hx + Z

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 93 NTS

Page 94: Slides IT2 SS2012

Capacity in the case of an informed transmitter

Let us consider the MIMO case assuming that z ∼ N_C(0, σ²I). Then, the equation

y = Hx + z

describes a vector Gaussian channel. If the channel is known at the transmitter, the capacity can be computed by decomposing this channel into a set of parallel independent scalar Gaussian sub-channels.

Singular value decomposition (SVD) of H:

H = U Λ V^H

where the M × M matrix U and the N × N matrix V are unitary, that is, U^H U = U U^H = I and V^H V = V V^H = I.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 94 NTS

Page 95: Slides IT2 SS2012

SVD for any n × m matrix A

A = U Λ V^H = ∑_i λ_i u_i v_i^H

(Figure: the shapes of the factors U, Λ, and V^H in the decomposition A = UΛV^H, for the cases n < m and n > m; Λ has nonzero entries only on its main diagonal.)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 95 NTS

Page 96: Slides IT2 SS2012

MIMO capacity (informed transmitter)

Using the SVD of H, the MIMO model equation becomes

y = U Λ V^H x + z

Multiplying this equation by U^H from the left, and using the unitary property of U, we have

U^H y = Λ V^H x + U^H z

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 96 NTS

Page 97: Slides IT2 SS2012

MIMO capacity (informed transmitter)

Introducing the notation ỹ = U^H y, x̃ = V^H x, z̃ = U^H z, we obtain a system of parallel Gaussian channels

ỹ = Λ x̃ + z̃

where E{z̃ z̃^H} = U^H E{z z^H} U = σ²I and, therefore,

z̃ ∼ N_C(0, σ²I)

Moreover,

‖x̃‖² = x^H V V^H x = ‖x‖²

Thus, the power is preserved!

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 97 NTS

Page 98: Slides IT2 SS2012

MIMO capacity (informed transmitter)

The system of parallel channels can also be written componentwise:

Ỹ_i = λ_i X̃_i + Z̃_i,   i = 1, ..., n_o

where n_o = min{N, M}. The transition to this equivalent system corresponds to the pre-processing

x = V x̃

at the transmitter and the post-processing

ỹ = U^H y

at the receiver. Hence, the pre- and post-processing operators are V and U^H, respectively.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 98 NTS

Page 99: Slides IT2 SS2012

MIMO capacity (informed transmitter)

To implement the pre-/post-processing operations, the original vector to be transmitted is x̃. It is pre-processed at the transmitter to obtain

x = V x̃

The vector x is then sent over the channel. At the receiver (omitting the noise term for brevity), we have

y = Hx = U Λ V^H V x̃ = U Λ x̃

and after post-processing we obtain ỹ = U^H y = U^H U Λ x̃ = Λ x̃

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 99 NTS

Page 100: Slides IT2 SS2012

MIMO capacity (informed transmitter)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 100 NTS

Page 101: Slides IT2 SS2012

MIMO capacity (informed transmitter)

The capacity of the resulting system of parallel independent channels is

C = B ∑_{i=1}^{n_o} log(1 + P_i λ_i² / σ²)   bits/s

where the P_i are the water-filling power allocations

P_i = (ν − σ²/λ_i²)^+

and the water level ν is obtained from the total power constraint ∑_{i=1}^{n_o} P_i ≤ P.

Each λ_i corresponds to an eigenmode of the channel, also called an eigenchannel.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 101 NTS
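Putting the SVD and the water-filling allocation together, a minimal sketch of the informed-transmitter MIMO capacity (assumed example, not from the slides; the channel matrix, power, and noise level are arbitrary, and the water level is found by bisection):

```python
import numpy as np

def mimo_capacity_csit(H, P_total, sigma2=1.0, B=1.0, iters=100):
    lam = np.linalg.svd(H, compute_uv=False)          # singular values lambda_i
    lam = lam[lam > 1e-12]
    inv_gain = sigma2 / lam**2                        # effective noise level per eigenchannel
    lo, hi = 0.0, inv_gain.max() + P_total
    for _ in range(iters):                            # bisection for the water level nu
        nu = 0.5 * (lo + hi)
        if np.maximum(nu - inv_gain, 0.0).sum() > P_total:
            hi = nu
        else:
            lo = nu
    p = np.maximum(nu - inv_gain, 0.0)                # water-filling powers P_i
    return B * np.sum(np.log2(1.0 + p * lam**2 / sigma2)), p

rng = np.random.default_rng(2)
H = rng.standard_normal((4, 4))                       # assumed 4x4 real channel matrix
C, p = mimo_capacity_csit(H, P_total=10.0)
print(C, p)
```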

Page 102: Slides IT2 SS2012

Wireless MIMO channel

(Figure: wireless MIMO channel with N transmit and M receive antennas.)

Page 103: Slides IT2 SS2012

System model

Assume perfect channel state information at the transmitter.

(Figure: MIMO system model; each receive antenna observes a superposition of the transmitted signals plus additive noise.)

Page 104: Slides IT2 SS2012

MIMO channel equation in matrix notation

(Figure: N transmit and M receive antennas; the MIMO channel equation y = Hx + z in matrix notation.)

What is the optimum transmission and power allocation scheme if the channel matrix H is known at the transmitter?

Page 105: Slides IT2 SS2012

Capacity of a MIMO channel

Maximize the achievable rate subject to a sum power constraint, which is due to hardware limitations and/or regulations. (The detailed optimization problem with constraints (1) and (2) shown on the slide is not reproduced here.)

Page 106: Slides IT2 SS2012

Singular Value Decomposition of MIMO channel

H = U Λ V^H, where U and V are unitary:

• u_i and v_i are the left and right singular vectors.
• λ_i is the corresponding singular value (≥ 0).

Page 107: Slides IT2 SS2012

Decoupling the channels using linear transformation

Pre-processing with V at the transmitter and post-processing with U^H at the receiver decouple the MIMO channel into

Ỹ_i = λ_i X̃_i + Z̃_i,   for i = 1, ..., r

i.e., r independent parallel channels.

Page 108: Slides IT2 SS2012

Independent parallel channel representation

(Figure: block diagram of the r independent parallel channels, each with its own gain λ_i and additive noise.)

Page 109: Slides IT2 SS2012

Optimization problem

Capacity: maximize the sum rate of the r parallel channels subject to constraints (1) and (2) (not reproduced here), where p_i is the power assigned to the i-th input signal.

Page 110: Slides IT2 SS2012

Water-filling principle

Page 111: Slides IT2 SS2012

Water-filling principle

Page 112: Slides IT2 SS2012

Water-filling principle

Page 113: Slides IT2 SS2012

Water-filling principle

Page 114: Slides IT2 SS2012

Water-filling principle

Page 115: Slides IT2 SS2012

High SNR regime: What are the key parameters that determine the performance?

At high SNR, the water level is high and the policy of allocating equal amounts of power to each channel is asymptotically optimal. In this case,

C ≈ B ∑_{i=1}^{r} log(1 + P λ_i² / (r σ²))
  ≈ B ∑_{i=1}^{r} log(P λ_i² / (r σ²))
  ≈ r B log(SNR) + B ∑_{i=1}^{r} log(λ_i² / r)   bits/s

where r = rank{H} and SNR = P/σ².

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 115 NTS

Page 116: Slides IT2 SS2012

High SNR regime: What are the key parameters that determine the performance?

It can be proved that, among the channels with the same total power gain, the channels whose singular values are all equal result in the highest capacity.

This means that well-conditioned channel matrices are preferable in the high SNR regime.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 116 NTS

Page 117: Slides IT2 SS2012

Low SNR regime: What are the key parameters that determine the performance?

In this regime, the optimal policy is to allocate the power to the channel with the strongest eigenmode:

C ≈ B log(1 + P λ_max² / σ²)

and ill-conditioned (rank-one) channel matrices are preferable.

Using the property log(1 + x) ≈ x log(e), valid for x ≪ 1, we have

C ≈ B P λ_max² log(e) / σ²

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 117 NTS

Page 118: Slides IT2 SS2012

MIMO capacity (uninformed transmitter)

Let us now obtain the MIMO channel capacity from general considerations, assuming that H is fixed while the other quantities (x, y, and z) are random. In this case, no assumption on channel knowledge at the transmitter is made, but the receiver is assumed to know H.

Capacity via mutual information:

C = max_{p(x)} I(x; y) = max_{p(x)} [H(y) − H(y|x)]

The output covariance matrix is given by

R = E{y y^H} = H P H^H + σ²I

where P = E{x x^H}.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 118 NTS

Page 119: Slides IT2 SS2012

MIMO capacity (uninformed transmitter)

Result (Telatar, 1995; Foschini and Gans, 1998): Consider the model

y = Hx + z

where x ∼ N_C(0, P), z ∼ N_C(0, σ²I), and H is fixed. Let B be the channel bandwidth in Hz. Then, the MIMO channel capacity is equal to

C = B log det{ I + (1/σ²) H P H^H }   bits/s

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 119 NTS
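A minimal sketch of this formula (assumed helper, not from the slides; the channel realization and power values are arbitrary, and slogdet is used for numerical robustness) evaluates C for a given H and input covariance P:

```python
import numpy as np

def mimo_capacity(H, P, sigma2=1.0, B=1.0):
    M = H.shape[0]
    A = np.eye(M) + (H @ P @ H.conj().T) / sigma2
    sign, logdet = np.linalg.slogdet(A)      # log-determinant of I + HPH^H/sigma^2
    return B * logdet / np.log(2.0)          # convert from nats to bits

rng = np.random.default_rng(3)
N, M, Ptot = 4, 4, 8.0
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
P_iso = (Ptot / N) * np.eye(N)               # uninformed transmitter: equal, uncorrelated powers
print(mimo_capacity(H, P_iso))               # bit/s for B = 1 Hz (i.e., bit/s/Hz)
```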

Page 120: Slides IT2 SS2012

Result 1

Let X_1, ..., X_n have a multivariate complex circular Gaussian distribution with the mean µ_X and covariance matrix P:

f_X(x) = 1/(π^n det{P}) · e^{−(x−µ_X)^H P^{−1} (x−µ_X)}

Then

H(X) = H(X_1, ..., X_n) = log((πe)^n det{P})

Proof:

H(X) = ∫ f_X(x) (x − µ_X)^H P^{−1} (x − µ_X) dx + ln(π^n det{P})

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 120 NTS

Page 121: Slides IT2 SS2012

Result 1 (proof)

= E{(x − µ_X)^H P^{−1} (x − µ_X)} + ln(π^n det{P})
= E{tr(P^{−1} (x − µ_X)(x − µ_X)^H)} + ln(π^n det{P})
= tr(P^{−1} E{(x − µ_X)(x − µ_X)^H}) + ln(π^n det{P})
= tr(P^{−1} P) + ln(π^n det{P})
= n + ln(π^n det{P})
= ln((πe)^n det{P})   nats
= log((πe)^n det{P})   bits

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 121 NTS

Page 122: Slides IT2 SS2012

Result 2

Let the random vector x ∈ C^n have zero mean and covariance E{x x^H} = P. Then

H(X) = H(X_1, ..., X_n) ≤ log((πe)^n det{P})

with equality if and only if X ∼ N_C(0, P).

Proof: Let g(x) be a pdf with covariance [P]_ij = ∫ g(x) x_i x_j* dx, and let φ_P(x) be the complex circular Gaussian pdf N_C(0, P).

Note that the logarithm of the complex circular Gaussian pdf, log φ_P(x) ∼ −(x − µ_x)^H P^{−1} (x − µ_x), is a quadratic form in x.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 122 NTS

Page 123: Slides IT2 SS2012

Result 2 (proof)

Then the Kullback-Leibler distance D(g(x)||φ_P(x)) between the two pdfs is given as

0 ≤ D(g(x)||φ_P(x)) = ∫ g(x) log( g(x)/φ_P(x) ) dx
                    = −H_g(X) − ∫ g(x) log(φ_P(x)) dx
                    = −H_g(X) − ∫ φ_P(x) log(φ_P(x)) dx
                    = −H_g(X) + H_{φ_P}(X)

(the second integral depends on g only through the second moment of X, because log φ_P(x) is a quadratic form, so g can be replaced by φ_P)

⇒ H_{φ_P}(X) ≥ H_g(X)

The Gaussian distribution maximizes the entropy over all distributions with the same covariance.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 123 NTS

Page 124: Slides IT2 SS2012

Proof of MIMO capacity result

It has been shown that, among all random vectors with the covariance matrix R, the entropy of y is maximized when y is zero-mean circularly symmetric complex Gaussian. This is the case exactly when the input vector x is zero-mean circularly symmetric complex Gaussian, which is therefore the optimal distribution for X.

Using these facts, the capacity formula can be proved by obtaining explicit expressions for H(Y) and H(Y|X).

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 124 NTS

Page 125: Slides IT2 SS2012

Proof of MIMO capacity result

Recall the signal model:

Y = HX + Z

Then the mutual information between X and Y is given as

I(X; Y) = H(Y) − H(Y|X)
        = H(Y) − H(HX + Z|X)
        = H(Y) − H(Z|X)      (given X, the term HX is deterministic)
        = H(Y) − H(Z)        (Z is independent of X)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 125 NTS

Page 126: Slides IT2 SS2012

Proof of MIMO capacity result

From Result 2 we know that the entropy H(Y) is maximized for a complex circular Gaussian input distribution; thus

max_{all pdfs with R} I(X; Y) = max_{all pdfs with R} H(Y) − H(Z)
   = log{(πe)^n det(H P H^H + σ²I)} − log{(πe)^n det(σ²I)}
   = log det(I + (1/σ²) H P H^H)   bits per channel use

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 126 NTS

Page 127: Slides IT2 SS2012

Transition to the classic Shannon's capacity result

Assuming a single-input single-output (SISO) system with N = M = 1 and a constant channel gain H, which transmits with power P, we have

H = H,   P = P,   I = 1

and, therefore,

C = B log det{ I + (1/σ²) H P H^H } = B log{ 1 + |H|² P / σ² }

This is the classical Shannon capacity formula for a bandlimited channel!

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 127 NTS

Page 128: Slides IT2 SS2012

Channel known at the transmitter

If the channel matrix H is known at the transmitter, then in general unequal powers should be chosen, and P is not a scaled identity matrix.

Eigenchannels and water-filling power allocation should be used, as discussed above.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 128 NTS

Page 129: Slides IT2 SS2012

Channel unknown at the transmitter

If the channel matrix H is unknown at the transmitter, then it follows from symmetry reasons that P should be a scaled identity matrix. Using the power constraint

tr{P} = P

we obtain that P has to be chosen as

P = (P/N) I

Indeed, the power constraint is satisfied because

tr{P} = tr{(P/N) I} = (P/N) tr{I} = P

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 129 NTS

Page 130: Slides IT2 SS2012

Channel unknown at the transmitter

Choosing P = (P/N) I, we obtain that the MIMO capacity in the uninformed transmitter case is given by

C = B log det{ I + (P/(σ²N)) H H^H }

Assuming that, although fixed, the entries of H are statistically independent random values with unit variance, and using the law of large numbers, we obtain that for a large number of transmit antennas and a fixed number of receive antennas,

(1/N) H H^H → I

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 130 NTS

Page 131: Slides IT2 SS2012

Channel unknown at the transmitter

Using the latter property, we obtain that for large N,

C = B log det{ (1 + P/σ²) I } = B log{ (1 + P/σ²)^M } = M B log(1 + P/σ²)

which is M times the SISO Shannon capacity!

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 131 NTS

Page 132: Slides IT2 SS2012

Parallel SISO channel interpretation

Consider the general MIMO channel capacity formula. Let the eigendecomposition of the positive semi-definite Hermitian matrix H P H^H be

H P H^H = ∑_{i=1}^{r} λ_i u_i u_i^H = U Λ U^H

where U^H U = I and r = rank{H P H^H}. The matrices U and Λ should not be confused with those of the SVD of the matrix H used earlier!

We will use the property

det{I + AB} = det{I + BA}

valid for any matrices A and B of conformable dimensions.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 132 NTS

Page 133: Slides IT2 SS2012

Parallel SISO channel interpretation

Setting A = U and B = ΛU^H, we obtain

C = B log det{ I + (1/σ²) H P H^H }
  = B log det{ I + (1/σ²) U Λ U^H }
  = B log det{ I + (1/σ²) Λ U^H U }
  = B log det{ I + (1/σ²) Λ }
  = B log{ ∏_{i=1}^{r} (1 + λ_i/σ²) }
  = B ∑_{i=1}^{r} log{ 1 + λ_i/σ² }

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 133 NTS
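The equality of the first and last expressions can be checked numerically. A minimal sketch (assumed example, not from the slides; the channel and input covariance are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
M, N, sigma2 = 3, 5, 0.5
H = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
P = (1.0 / N) * np.eye(N)                       # uninformed-transmitter covariance

A = H @ P @ H.conj().T
lam = np.linalg.eigvalsh(A)                     # eigenvalues lambda_i of the Hermitian matrix HPH^H
lhs = np.linalg.slogdet(np.eye(M) + A / sigma2)[1] / np.log(2)
rhs = np.sum(np.log2(1.0 + lam / sigma2))
print(lhs, rhs, np.isclose(lhs, rhs))           # the two expressions agree
```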

Page 134: Slides IT2 SS2012

Parallel SISO channel interpretation

The latter formula interprets the capacity of the MIMO channel as the sum of the capacities of r parallel SISO channels.

Assuming the uninformed transmitter case (P = (P/N) I), r can be interpreted as the rank of H → full-rank channels are preferable!

If H is drawn randomly, then almost surely

rank{H} = min{M, N}

This leads us to the conclusion that the capacity grows nearly proportionally to min{M, N}.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 134 NTS

Page 135: Slides IT2 SS2012

Assume M = N and let the Frobenius norm of H be given. What type of channel maximizes the MIMO capacity?

Result: The capacity is maximized when H is orthogonal:

H^H H = H H^H = ζ I

where ζ is a constant. In this case (with P = (P/N) I),

C = B log det{ (1 + ζP/(σ²N)) I } = B log(1 + ζP/(σ²N))^N = N B log(1 + ζP/(σ²N))

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 135 NTS

Page 136: Slides IT2 SS2012

SIMO channel capacity

Consider a SIMO column-vector channel h with one transmit and N receive antennas. The capacity formula becomes

C = B log det{ I + (1/σ²) P h h^H } = B log(1 + (1/σ²) P h^H h) = B log(1 + (P/σ²) ‖h‖²)

Hence, the SIMO channel comprises only one spatial data pipe. The addition of receive antennas yields only a logarithmic (rather than linear) increase in capacity.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 136 NTS

Page 137: Slides IT2 SS2012

MISO channel capacity

Consider a MISO row-vector channel h with one receive and N transmit antennas. The capacity formula becomes

C = B log(1 + (1/σ²) h P h^H) = B log(1 + (1/σ²) ‖h P^{1/2}‖²)

The situation is similar to that in the SIMO case. The increase in capacity is only logarithmic (rather than linear).

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 137 NTS

Page 138: Slides IT2 SS2012

Ergodic MIMO channel capacity

The channel matrix H is no longer fixed but is treated as random. The capacity formula can be averaged over H:

E_H{C} = B E_H[ log det{ I + (1/σ²) H P H^H } ]

Result (Telatar, 1999): Let H be a Gaussian random matrix with i.i.d. elements. Then, the average capacity is maximized subject to the power constraint tr{P} ≤ P when

P = (P/N) I

That is, to maximize the average capacity, the antennas should transmit uncorrelated streams with the same power, an intuitively appealing fact.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 138 NTS

Page 139: Slides IT2 SS2012

Ergodic MIMO channel capacity:Proof (sketch)

Let H be a Gaussian random matrix with i.i.d. elements.

C = max_{P: tr P ≤ P} E_H[ log det{ I + (1/σ²) H P H^H } ]

Decompose P = ∆_P + P_off, where ∆_P is the diagonal part of P and P_off collects the off-diagonal entries. Then

C = max_{P: tr P ≤ P} E_H[ log det{ I + (1/σ²) H ∆_P H^H + (1/σ²) H P_off H^H } ]
  = max_{∆_P: tr ∆_P ≤ P} E_H[ log det{ I + (1/σ²) H ∆_P H^H + (1/σ²) H P_off H^H } ]
  ≤ max_{∆_P: tr ∆_P ≤ P} log det{ E_H[ I + (1/σ²) H ∆_P H^H + (1/σ²) H P_off H^H ] }

where the last inequality follows from Jensen's inequality.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 139 NTS

Page 140: Slides IT2 SS2012

Ergodic MIMO channel capacity:Proof (sketch)

C ≤ max_{∆_P: tr ∆_P ≤ P} log det{ E_H[ I + (1/σ²) H ∆_P H^H + (1/σ²) H P_off H^H ] }
  = max_{∆_P: tr ∆_P ≤ P} log det{ E_H[ I + (1/σ²) H ∆_P H^H ] + E_H[ (1/σ²) H P_off H^H ] }

where the last term in the second equation is identically zero due to the statistical independence of the entries of H.

We conclude that restricting the transmit covariance to exhibit the diagonal structure P = ∆_P does not reduce the achievable capacity.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 140 NTS

Page 141: Slides IT2 SS2012

Ergodic MIMO channel capacity:Proof (sketch)

Thus

C = max_{∆_P: tr ∆_P ≤ P} E_H[ log det{ I + (1/σ²) H ∆_P H^H } ]

We can show that, due to the i.i.d. property of H, the objective function is symmetric w.r.t. the input variables, i.e., exchanging the order of the entries P_1, ..., P_N does not change the function value. Furthermore, the function is concave.

We conclude that the optimal power allocation strategy in this case is to distribute the power equally among the transmitted symbols, i.e., to choose P_1 = P_2 = ... = P_N.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 141 NTS

Page 142: Slides IT2 SS2012

Ergodic MIMO channel capacity

Note that the latter choice of P coincides with our earlier choice of this matrix in the case of a fixed channel and an uninformed transmitter.

Choosing P = (P/N) I, the maximal average capacity (which is commonly referred to as the ergodic capacity) becomes

C_E = B E_H[ log det{ I + (P/(σ²N)) H H^H } ]

Ergodic capacity has an important advantage w.r.t. the fixed-channel capacity, as it gives an average rather than an instantaneous picture.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 142 NTS

Page 143: Slides IT2 SS2012

Ergodic MIMO channel capacity

Using the parallel SISO channel interpretation and denoting the singular values of H as γ_i, we obtain

C_E = B E_H[ ∑_{i=1}^{r} log{ 1 + P γ_i² / (σ²N) } ] = B ∑_{i=1}^{r} E_H[ log{ 1 + P γ_i² / (σ²N) } ]

Please note the difference with the water-filling capacity: in the latter expression, equal powers are used for each eigenchannel.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 143 NTS
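The ergodic capacity can be estimated by Monte Carlo averaging over channel realizations. A minimal sketch (assumed example, not from the slides; an i.i.d. complex Gaussian channel and arbitrary SNR are used):

```python
import numpy as np

rng = np.random.default_rng(5)
N = M = 4
P, sigma2 = 10.0, 1.0
trials = 5000

caps = np.empty(trials)
for t in range(trials):
    H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
    A = np.eye(M) + (P / (sigma2 * N)) * (H @ H.conj().T)
    caps[t] = np.linalg.slogdet(A)[1] / np.log(2)

print(caps.mean())                       # ergodic capacity estimate in bit/s/Hz
print(M * np.log2(1 + P / sigma2))       # large-N approximation M log2(1 + P/sigma^2)
```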

Page 144: Slides IT2 SS2012

Large antenna regime

Let us denote SNR = P/σ². Then, the capacity formula becomes

C_E = B ∑_{i=1}^{r} E_H[ log{ 1 + SNR γ_i² / N } ]

Assume M = N and i.i.d. Rayleigh fading. Then, using random matrix theory, it can be shown that for any SNR

lim_{N→∞} C_E / N = const

Therefore, the capacity grows linearly in N at any SNR in this asymptotic regime!

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 144 NTS

Page 145: Slides IT2 SS2012

Outage capacity

A value C_out which is larger than the capacity C in p_out percent of the channel realizations. In other words,

Pr(C_out > C) = p_out

If one wants to transmit with C_out bits per second, then the channel capacity is less than C_out with probability p_out. Hence, the transmission is impossible (the system is in outage) in p_out · 100 percent of the time.

Alternatively, we can write

Pr(C_out ≤ C) = 1 − p_out

and, hence, in (1 − p_out) · 100 percent of the time the transmission is possible, as the system is not in outage.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 145 NTS

Page 146: Slides IT2 SS2012

Outage capacity

1 − p_out is called the non-outage probability.

Using the instantaneous MIMO capacity formula, we can define the MIMO outage capacity by means of the following expression:

min_{tr{P} ≤ P} Pr( C_out > B log det{ I + (1/σ²) H P H^H } ) = p_out

where we additionally use the opportunity to minimize the outage probability by means of a proper choice of P. This particular choice, of course, depends on the statistics of the random channel matrix H.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 146 NTS
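For the uninformed-transmitter choice P = (P/N)I, the outage capacity can be estimated as the p_out-quantile of the instantaneous capacity over random channel realizations. A minimal sketch (assumed example, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(6)
N = M = 2
P, sigma2, p_out = 10.0, 1.0, 0.1
trials = 20000

caps = np.empty(trials)
for t in range(trials):
    H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
    A = np.eye(M) + (P / (sigma2 * N)) * (H @ H.conj().T)
    caps[t] = np.linalg.slogdet(A)[1] / np.log(2)

C_out = np.quantile(caps, p_out)   # Pr(C < C_out) = p_out, i.e. outage in 10% of realizations
print(C_out, caps.mean())
```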

Page 147: Slides IT2 SS2012

Example: Rayleigh fading channel

Rayleigh fading: the channel coefficients are circularly symmetric complex Gaussian with zero mean and unit variance; a) channel known at the transmitter, b) channel unknown at the transmitter.

(Figures: outage capacity in bit/s/Hz versus SNR in dB (from −10 to 40 dB) for p_out = 0.01, 0.1, and 0.5, for the informed-transmitter case a) and the uninformed-transmitter case b).)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 147 NTS

Page 148: Slides IT2 SS2012

MULTIUSER CHANNELS

Why multiuser channels?

• Up to now, we have considered point-to-point communication links.
• Most communication systems serve multiple users. Therefore, multiuser channels are of great interest.
• In multiuser channels, one user can interfere with another user. This type of interference is called multiuser interference (MUI).

Common multiuser channel types:

• Multiple-access channels
• Broadcast channels
• Relay channels

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 148 NTS

Page 149: Slides IT2 SS2012

Multiple-access channel

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 149 NTS

Page 150: Slides IT2 SS2012

Broadcast channel

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 150 NTS

Page 151: Slides IT2 SS2012

Relay channel

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 151 NTS

Page 152: Slides IT2 SS2012

Multiple-access channels

Two-user multiple-access Gaussian channel:

Y(i) = X1(i) + X2(i) + Z(i),   Z(i) ∼ N_C(0, σ²)

In the point-to-point (single-user) case, the rate limit is the channel capacity. The achievable rate region is, therefore, given by:

R < B log(1 + P/σ²)

In the two-user case, we should extend this concept to a capacity region C, which is the set of all pairs (R1, R2) such that users 1 and 2 can simultaneously reliably communicate at rates R1 and R2, respectively.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 152 NTS

Page 153: Slides IT2 SS2012

Multiple-access channels

Since the two users share the same bandwidth, there is a tradeoff between the

rates R1 and R2: if one user wants to communicate at a higher rate, then the

other user may need to lower its rate.

Example of tradeoff: In orthogonal multiple access schemes such as OFDM, the

tradeoff can be achieved by varying the number of subcarriers allocated to each

user.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 153 NTS

Page 154: Slides IT2 SS2012

Rate region

Different scalar performance measures can be obtained from the capacity region:

I The symmetric capacity

Csym = max_{(R,R)∈C} R

is the maximum common rate at which both users can simultaneously reliably communicate.

I The sum capacity

Csum = max_{(R1,R2)∈C} (R1 + R2)

is the maximum total throughput that can be achieved.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 154 NTS

Page 155: Slides IT2 SS2012

Rate region

If we have two users with the powers P1 and P2, then the capacity region for the two-user channel is defined by the following inequalities:

R1 < B log(1 + P1/σ²)
R2 < B log(1 + P2/σ²)
R1 + R2 < B log(1 + (P1 + P2)/σ²)

The first two constraints say that the rate of each individual user cannot exceed the capacity of the point-to-point link with the other user absent.

The last constraint says that the total throughput cannot exceed the capacity of a point-to-point link with a single user whose power is the sum of the two users' powers.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 155 NTS

Page 156: Slides IT2 SS2012

Rate region

That is, not only the rates R1 and R2 are limited, but their sum is limited as well. This means that the signal of each user may be viewed as interference by the other user.

Result: The two-user capacity region is a pentagon.
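The pentagon is determined by the three bounds above; its two non-trivial corner points correspond to the two successive-cancellation orders discussed on the following slides. A small sketch (assumed powers, log base 2) that computes the corner points:

```python
# Sketch: corner points of the two-user Gaussian MAC pentagon (assumed parameters, rates in bit/s).
# Corner A: user 2 decoded first (treating user 1 as noise), so user 1 gets its single-user rate.
# Corner B: the opposite decoding order.
import numpy as np

def mac_corners(P1, P2, sigma2, B=1.0):
    r = lambda s, n: B * np.log2(1 + s / n)
    corner_A = (r(P1, sigma2), r(P2, P1 + sigma2))
    corner_B = (r(P1, P2 + sigma2), r(P2, sigma2))
    c_sum = B * np.log2(1 + (P1 + P2) / sigma2)
    return corner_A, corner_B, c_sum

A, Bpt, Csum = mac_corners(P1=10.0, P2=5.0, sigma2=1.0)
print(A, Bpt, Csum)       # R1 + R2 equals Csum at both corners (up to floating-point rounding)
```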

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 156 NTS

Page 157: Slides IT2 SS2012

Rate region: multiple-access channel

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 157 NTS

Page 158: Slides IT2 SS2012

Rate region: multiple-access channel

Remark: Surprisingly, user 1 can achieve its single-user rate bound R1 = B log(1 + P1/σ²) while, at the same time, user 2 can get a non-zero rate as high as R2 = B log(1 + P2/(P1 + σ²)). This corresponds to point A of the capacity region plot. Indeed,

R1 + R2 = B log( (1 + P1/σ²)·(1 + P2/(P1 + σ²)) )
        = B log( 1 + P1/σ² + P2/(P1 + σ²) + P1P2/(σ²(P1 + σ²)) )
        = B log( 1 + (P1² + P1σ² + P2σ² + P1P2)/(σ²(P1 + σ²)) )
        = B log( 1 + (P1 + P2)/σ² )

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 158 NTS

Page 159: Slides IT2 SS2012

Successive interference cancellation

How to achieve this?

Each user should encode its data using a capacity-achieving channel code. The receiver should decode the information of both users in two stages:

I In the first stage, the data of user 2 are decoded treating user 1 as AWGN. The maximum rate user 2 can then achieve is R2 = B log(1 + P2/(P1 + σ²)).

I In the second stage, the reconstructed (decoded) signal of user 2 is subtracted from the aggregate received signal, and then the data of user 1 are decoded. Since the signal of user 2 has already been subtracted and only the background AWGN is left in the system, the achieved rate for user 1 will be R1 = B log(1 + P1/σ²).

This two-stage decoding is called successive interference cancellation.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 159 NTS

Page 160: Slides IT2 SS2012

Successive interference cancellation

If one reverses the order of cancellation then one can achieve point B rather than

A.

All other rate points on the segment AB can be obtained by time-sharing between

the multiple-access strategies of points A and B.

The segment AB contains all the optimal operating points of the channel, in the

sense that any point in the capacity region is dominated by some point on AB.

That is, for any point within the capacity region that corresponds to the rates R1* and R2*, we can always find a point on the segment AB whose rates R1 and R2 satisfy:

R1* ≤ R1,   R2* ≤ R2

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 160 NTS

Page 161: Slides IT2 SS2012

Pareto-optimal

The points on the segment AB are called Pareto-optimal.

One can always increase the user rates to move to a point on the segment AB,

and there is no reason not to do this.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 161 NTS

Page 162: Slides IT2 SS2012

The concrete choice of the point on AB depends on our particular objectives:

I To maximize the sum capacity Csum, any point on AB is equally fine. Note that we have already computed the sum of R1 and R2 at point A. Hence,

Csum = B log(1 + (P1 + P2)/σ²)

I To maximize the symmetric capacity Csym, we should take the point on AB that gives us equal rates R1 and R2.

I Some operating points on AB may not be fair, especially if the received power of one user is much higher than that of the other user. In this case, we should consider operating at the corner point in which the stronger user is decoded first.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 162 NTS

Page 163: Slides IT2 SS2012

How does the system with successive cancellation compare to a standard CDMA system in terms of achievable rate?

The principal difference between CDMA detection and successive cancellation

detection is that:

I In the CDMA system, each user is decoded treating the other users as interference. This corresponds to the single-user receiver principle, and we immediately conclude that the performance of the CDMA system is suboptimal; i.e., it achieves a point strictly in the interior of the capacity region.

I In contrast to CDMA, the successive cancellation receiver is a multiuser

receiver: only one of the users (say, user 1) is decoded treating user 2 as

interference, but user 2 is decoded with the benefit of the signal of user 1

being already removed.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 163 NTS

Page 164: Slides IT2 SS2012

In the successive cancellation receiver case,

R1 = B log(1 + P1/σ²),   R2 = B log(1 + P2/(P1 + σ²))

or

R1 = B log(1 + P1/(P2 + σ²)),   R2 = B log(1 + P2/σ²)

In the CDMA receiver case,

R1 = B log(1 + P1/(P2 + σ²)),   R2 = B log(1 + P2/(P1 + σ²))

That is, one of the rates in the CDMA case is always lower than in the case of successive cancellation!
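This comparison can be made concrete with a few lines of code (a sketch with assumed powers, log base 2): the CDMA receiver always matches the weaker of the two rates of each successive-cancellation order.

```python
# Sketch: rate pairs for successive interference cancellation (both orders) vs. a CDMA receiver.
import numpy as np

def rate(B, signal, noise):
    return B * np.log2(1 + signal / noise)

P1, P2, sigma2, B = 10.0, 5.0, 1.0, 1.0
sic_A = (rate(B, P1, sigma2), rate(B, P2, P1 + sigma2))       # user 2 decoded first
sic_B = (rate(B, P1, P2 + sigma2), rate(B, P2, sigma2))       # user 1 decoded first
cdma  = (rate(B, P1, P2 + sigma2), rate(B, P2, P1 + sigma2))  # every user treats the other as noise
print(sic_A, sic_B, cdma)
```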

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 164 NTS

Page 165: Slides IT2 SS2012

Correspondingly, in the successive cancellation receiver case,

Csum = B log(1 + (P1 + P2)/σ²)

In the CDMA receiver case, the sum rate is

B log(1 + P1/(P2 + σ²)) + B log(1 + P2/(P1 + σ²))
  = B log( (1 + P1/(P2 + σ²))·(1 + P2/(P1 + σ²)) )
  = B log( 1 + (P1 + P2)/σ² − P1P2(P1 + P2 + σ²)/(σ²(P1 + σ²)(P2 + σ²)) )
  < Csum

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 165 NTS

Page 166: Slides IT2 SS2012

K -user multiple-access Gaussian channel

Y(i) = Σ_{k=1}^{K} Xk(i) + Z(i),   Z(i) ∼ N_C(0, σ²)

Similar to the two-user case, in the case of K users, all of them share the same

bandwidth, and there is a tradeoff between the rates Rk (k = 1, 2, ... , K ). If one

(or more) users want to communicate at higher rate(s), then the other user(s)

may need to lower their rate(s).

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 166 NTS

Page 167: Slides IT2 SS2012

In the K-user case, we can define the capacity region C as the set of all (R1, R2, ..., RK) such that users 1, 2, ..., K can simultaneously reliably communicate at rates R1, R2, ..., RK, respectively.

This capacity region is described by the 2^K − 1 constraints:

Rk < B log(1 + Pk/σ²),   k = 1, ..., K
Rk + Ri < B log(1 + (Pk + Pi)/σ²),   k, i = 1, ..., K (k ≠ i)
Rk + Ri + Rl < B log(1 + (Pk + Pi + Pl)/σ²),   k, i, l = 1, ..., K (all distinct)
· · ·
Σ_{k=1}^{K} Rk < B log(1 + Σ_{k=1}^{K} Pk / σ²)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 167 NTS

Page 168: Slides IT2 SS2012

K -user multiple-access Gaussian channel

The K-user capacity region can be written in a short form as

Σ_{k∈S} Rk < B log(1 + Σ_{k∈S} Pk / σ²)   for all S ⊂ {1, ..., K}

The right-hand side

B log(1 + Σ_{k∈S} Pk / σ²)

is the maximum sum rate that can be achieved by a single transmitter with the total power of the users in S and with no other users in the system.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 168 NTS

Page 169: Slides IT2 SS2012

The sum capacity can be defined as

Csum = max_{(R1,...,RK)∈C} Σ_{k=1}^{K} Rk

It can be shown that

Csum = B log(1 + Σ_{k=1}^{K} Pk / σ²)

and that there are exactly K! corner points in the capacity region, each one corresponding to a different successive cancellation order among the users.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 169 NTS

Page 170: Slides IT2 SS2012

In the equal power case (P1 = P2 = · · · = PK = P),

Csum = B log(1 + KP/σ²)

Observe that the sum capacity is unbounded as the number of users grows. In contrast, in the conventional CDMA receiver (decoding each user treating all the other users as noise), the sum rate will be only

B·K·log(1 + P/((K − 1)P + σ²))

which approaches

B·K·P/((K − 1)P + σ²)·log e ≈ B log e

as K → ∞. The growing interference is a limiting factor here!
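A short numerical illustration of the two growth behaviours (a sketch with assumed unit powers, log base 2): the optimal sum capacity keeps growing like log K, while the CDMA sum rate saturates near B·log₂e ≈ 1.44·B.

```python
# Sketch: optimal sum capacity vs. conventional CDMA sum rate as the number of users K grows.
import numpy as np

B, P, sigma2 = 1.0, 1.0, 1.0
for K in (1, 2, 4, 16, 64, 256):
    c_sum  = B * np.log2(1 + K * P / sigma2)
    c_cdma = B * K * np.log2(1 + P / ((K - 1) * P + sigma2))
    print(K, round(c_sum, 2), round(c_cdma, 2))
print("CDMA limit:", round(np.log2(np.e), 2))      # B*log2(e) for B = 1
```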

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 170 NTS

Page 171: Slides IT2 SS2012

The symmetric capacity can be defined as

Csym = max_{(R,R,...,R)∈C} R

It can be shown that in the equal power case (P1 = P2 = · · · = PK = P),

Csym = (B/K)·log(1 + KP/σ²)

This rate for each user can be obtained by orthogonal multiplexing where each user is allocated a fraction 1/K of the total degrees of freedom (for example, of the total bandwidth B).

Note that Csym = Csum/K.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 171 NTS

Page 172: Slides IT2 SS2012

Broadcast channels

Two-user broadcast AWGN channel:

Yk(i) = hk·X(i) + Zk(i),   k = 1, 2;   Zk(i) ∼ N_C(0, σ²)

where hk is the fixed complex channel gain corresponding to the kth user.

The broadcast case is often referred to as downlink.

The transmit power constraint: the average power of the transmit signal is P.

As in the multiple-access (uplink) channel case, we can define the capacity region C as the region of rates (R1, R2) at which both users can simultaneously reliably communicate.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 172 NTS

Page 173: Slides IT2 SS2012

Broadcast channels

We have just two single-user bounds:

Rk < B log(1 + P|hk|²/σ²),   k = 1, 2

For any k, this upper bound on Rk can be attained by using all the transmit power to communicate to user k (with the rate of the remaining user being zero). Thus, we have two extreme points:

R1 = B log(1 + P|h1|²/σ²),   R2 = 0
R2 = B log(1 + P|h2|²/σ²),   R1 = 0

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 173 NTS

Page 174: Slides IT2 SS2012

Rate region in the symmetric case |h1| = |h2|

Further, we can share the degrees of freedom (time and bandwidth) between the users in an orthogonal manner to obtain any rate pair on the line joining these two extreme points.

Hence, for the symmetric case of |h1| = |h2|, the capacity region is a triangle.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 174 NTS

Page 175: Slides IT2 SS2012

Rate region in the symmetric case |h1| = |h2|

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 175 NTS

Page 176: Slides IT2 SS2012

In the symmetric case |h1| = |h2| = |h|, the sum rate can be shown to be bounded by the single-user capacity:

R1 + R2 < B log(1 + P|h|²/σ²)

The latter conclusion follows from the triangular form of the capacity region.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 176 NTS

Page 177: Slides IT2 SS2012

As has already been mentioned, the rate pairs in the capacity region can be achieved by sharing the degrees of freedom (bandwidth and time) between the two users. What are the alternative ways to achieve the boundary of the capacity region?

The structure of the channel suggests an alternative, natural approach:

I Let the channel of user 2 be stronger than that of user 1 (|h1| < |h2|). Thus, if user 1 can successfully decode its data from Y1, then user 2 (which has a higher SNR) should also be able to decode the data of user 1 from Y2. Then, user 2 can subtract the data of user 1 from its received signal Y2 to better decode its own data; i.e., it can perform successive interference cancellation.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 177 NTS

Page 178: Slides IT2 SS2012

Consider the following transmission strategy that superposes the signals of two

users, much like in a spread-spectrum CDMA system. The transmitted signal is

the sum of two signals:

X (i) = X1(i) + X2(i)

where Xk(i) is the signal intended for user k .

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 178 NTS

Page 179: Slides IT2 SS2012

Superposition coding

Weaker user 1 decodes its own signal by treating the signal for user 2 as noise. Stronger user 2 performs successive interference cancellation: it first decodes the data of user 1 by treating X2 as noise, subtracts the so-determined signal of user 1 from Y2, and then extracts its own data. As a result, for any possible power split P = P1 + P2, the following rate pair can be achieved

R1 = B log(1 + P1|h1|²/(P2|h1|² + σ²))
R2 = B log(1 + P2|h2|²/σ²)

This strategy is commonly referred to as superposition coding.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 179 NTS

Page 180: Slides IT2 SS2012

Orthogonal scheme

On the other hand, in orthogonal schemes, for any power split P = P1 + P2 and degree-of-freedom split α ∈ [0, 1], the following rates are jointly achieved

R1 = α·B log(1 + P1|h1|²/(α·σ²))
R2 = (1 − α)·B log(1 + P2|h2|²/((1 − α)·σ²))

Here, α can be interpreted, for example, as the fraction of bandwidth assigned to user 1 (both the bandwidth B and the noise power are then reduced by the factor α). Alternatively, α can be interpreted as the fraction of time assigned to user 1 (user 1 transmits only during a fraction α of the time, so that its power budget P1 is concentrated into that fraction).
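For |h1| < |h2|, sweeping the split shows that superposition coding dominates the orthogonal scheme. A minimal sketch (assumed gains and powers, log base 2, using the same split for power and degrees of freedom):

```python
# Sketch: broadcast rate pairs with superposition coding vs. an orthogonal (bandwidth-splitting)
# scheme, for |h1| < |h2|. Assumed parameters; the degree-of-freedom split equals the power split.
import numpy as np

B, P, sigma2 = 1.0, 10.0, 1.0
h1, h2 = 0.5, 1.0                                   # user 1 is the weaker user
for a in np.linspace(0.1, 0.9, 5):                  # a = share of resources given to user 1
    P1, P2 = a * P, (1 - a) * P
    r1_sc = B * np.log2(1 + P1 * h1**2 / (P2 * h1**2 + sigma2))       # superposition coding
    r2_sc = B * np.log2(1 + P2 * h2**2 / sigma2)
    r1_or = a * B * np.log2(1 + P1 * h1**2 / (a * sigma2))            # orthogonal scheme
    r2_or = (1 - a) * B * np.log2(1 + P2 * h2**2 / ((1 - a) * sigma2))
    print(round(r1_sc, 2), round(r2_sc, 2), "|", round(r1_or, 2), round(r2_or, 2))
```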

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 180 NTS

Page 181: Slides IT2 SS2012

Rate region in the symmetric case |h1| = |h2| = |h|

Assume that superposition coding is used and that the power is split such that P1 + P2 ≤ P. In this case, if user 1 can decode its data treating the data of user 2 as noise, then user 2 can also decode the data of user 1, subtract it from its received signal, and decode its own data. Hence, the following rate pairs are supported:

R1 ≤ B log(1 + P1|h1|²/(P2|h1|² + σ²)) = B log(1 + (P1 + P2)|h1|²/σ²) − B log(1 + P2|h1|²/σ²)
R2 ≤ B log(1 + P2|h2|²/σ²)

Thus, for |h1| = |h2| = |h| and the power constraint P1 + P2 ≤ P, the sum capacity is given by

R1 + R2 ≤ B log(1 + P|h|²/σ²)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 181 NTS

Page 182: Slides IT2 SS2012

Rate region in the general case |h1| ≤ |h2|

Solid line: optimal power split using superposition coding.

Dashed line: optimal degrees of freedom split using orthogonal coding.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 182 NTS

Page 183: Slides IT2 SS2012

In the K-user broadcast case, the boundary of the capacity region can be proved to be given by

Rk = log(1 + Pk|hk|²/(σ² + (Σ_{l=k+1}^{K} Pl)·|hk|²)),   k = 1, ..., K

for all possible power splits P = Σ_{k=1}^{K} Pk of the total power at the base station.

The optimal points are achieved by superposition coding and successive interference cancellation at the receivers. The cancellation order at every receiver should always be to decode the weaker users' data before decoding its own data.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 183 NTS

Page 184: Slides IT2 SS2012

Fading channels

Until now, all multiuser channels have been considered without random channel fading.

Let us now include fading in the signal model. The availability of channel state information becomes a critical issue in such cases.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 184 NTS

Page 185: Slides IT2 SS2012

Multiple-access fading channels

K-user multiple-access fading channel:

Y(i) = Σ_{k=1}^{K} hk(i)·Xk(i) + Z(i)

where {hk(i)} is the random fading process of user k.

We assume that

E{|hk(i)|²} = 1,   k = 1, ..., K

and that the fading processes of different users are i.i.d.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 185 NTS

Page 186: Slides IT2 SS2012

Slow fading

The time-scale of communication is short relative to the channel coherence time of all users. Hence, hk(i) = hk for all i and k.

Suppose all users transmit at the rate R. Conditioned on each realization of h1, ..., hK, we have the standard multiple-access AWGN channel with the received SNR of user k equal to |hk|²P/σ². If the symmetric capacity is less than R, then this results in outage. Using the expressions for the K-user capacity region, the outage probability can be written as

pout = Pr{ B log(1 + SNR·Σ_{k∈S} |hk|²) < |S|·R for some S ⊂ {1, ..., K} }

where |S| denotes the cardinality of S and SNR = P/σ².
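This probability can be estimated by Monte Carlo simulation over the 2^K − 1 subsets, as in the following sketch (assumed i.i.d. Rayleigh fading and example parameters, log base 2).

```python
# Sketch: Monte Carlo estimate of the slow-fading MAC outage probability
# p_out = Pr{ B*log2(1 + SNR*sum_{k in S} |h_k|^2) < |S|*R  for some subset S }.
import itertools
import numpy as np

def mac_outage(K, R, snr, B=1.0, trials=20_000, seed=3):
    rng = np.random.default_rng(seed)
    subsets = [s for r in range(1, K + 1) for s in itertools.combinations(range(K), r)]
    outages = 0
    for _ in range(trials):
        h2 = rng.exponential(size=K)              # |h_k|^2 for i.i.d. Rayleigh fading
        if any(B * np.log2(1 + snr * h2[list(S)].sum()) < len(S) * R for S in subsets):
            outages += 1
    return outages / trials

print(mac_outage(K=2, R=1.0, snr=10.0))
```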

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 186 NTS

Page 187: Slides IT2 SS2012

Fast fading

Each hk(i) is modelled as a time-varying ergodic process.

The sum capacity in the fast fading case:

Csum = E{ B log(1 + Σ_{k=1}^{K} |hk|²·P/σ²) }

How does this compare to the sum capacity of the uplink channel without fading?

Let us use Jensen's inequality, which basically says that

E{f(X)} ≤ f(E{X})

for any concave function f(·) and random variable X.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 187 NTS

Page 188: Slides IT2 SS2012

Using this inequality, we obtain that

Csum = E{ B log(1 + Σ_{k=1}^{K} |hk|²·P/σ²) } ≤ B log(1 + E{Σ_{k=1}^{K} |hk|²}·P/σ²) = B log(1 + KP/σ²)

where the property E{|hk(i)|²} = 1 (k = 1, ..., K) has been used in the last step.

The last expression can be identified as the sum capacity of the AWGN multiple-access channel. Hence, without channel state information at the transmitter, fading can only hurt.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 188 NTS

Page 189: Slides IT2 SS2012

However, if the number of users K becomes large, then, by the law of large numbers,

Σ_{k=1}^{K} |hk|² ≈ K

and the penalty due to fading vanishes. Basically, the effect of fading is averaged out over a large number of users.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 189 NTS

Page 190: Slides IT2 SS2012

Let us now assume that we have full (possibly also non-causal) channel state information at both the transmitter and receiver sides.

Block-fading model:

Y(i) = Σ_{k=1}^{K} hk(i)·Xk(i) + Z(i)

where hk(i) = hk,l remains constant over the lth channel coherence period of Tc (Tc ≫ 1) symbols and is i.i.d. across different coherence periods.

The channel over L such coherence periods can be viewed as L parallel “sub-channels” which fade independently. Therefore, we can again use the water-filling philosophy.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 190 NTS

Page 191: Slides IT2 SS2012

For a given realization of the channel gains hk,l (k = 1, ..., K; l = 1, ..., L), the sum capacity is given by

max_{Pk,l} (B/L)·Σ_{l=1}^{L} log(1 + Σ_{k=1}^{K} Pk,l·|hk,l|²/σ²)

subject to Pk,l ≥ 0 (k = 1, ..., K; l = 1, ..., L) and the average power constraint

(1/L)·Σ_{l=1}^{L} Pk,l = P,   k = 1, ..., K

The solution to this optimization problem as L → ∞ yields the appropriate power allocation policy.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 191 NTS

Page 192: Slides IT2 SS2012

This leads to a variable-rate scheme: in the lth “sub-channel”, the rates dictated by the above optimization problem are used.

Optimal strategy: The sum rate in the lth “sub-channel”

B log(1 + Σ_{k=1}^{K} Pk,l·|hk,l|²/σ²)

for a given total power Σ_{k=1}^{K} Pk,l allocated to this “sub-channel” is maximized by giving all this power to the user with the strongest channel gain. That is, at each time only the one user with the best channel is allowed to transmit. Under this strategy, the multiuser channel for each time l reduces to a point-to-point channel with the channel gain

max_{k=1,...,K} |hk,l|²
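Ignoring the water-filling of power over time (i.e., simply using a fixed power P in every block), the effect of this best-user scheduling can be illustrated with a short sketch (assumed i.i.d. Rayleigh blocks, log base 2): the per-block sum rate grows with the number of users K.

```python
# Sketch: per-block sum rate when, in each coherence block, all power goes to the user with the
# strongest channel gain (fixed power per block; the true optimum also water-fills over time).
import numpy as np

def best_user_sum_rate(K, L, P=1.0, sigma2=1.0, B=1.0, seed=4):
    rng = np.random.default_rng(seed)
    h2 = rng.exponential(size=(L, K))              # |h_{k,l}|^2 for i.i.d. Rayleigh fading
    best = h2.max(axis=1)                          # strongest user in each block l
    return B / L * np.sum(np.log2(1 + P * best / sigma2))

for K in (1, 2, 4, 8, 16):
    print(K, round(best_user_sum_rate(K, L=10_000), 3))   # grows with K: multiuser diversity
```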

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 192 NTS

Page 193: Slides IT2 SS2012

Broadcast fading channels

K -user downlink fading channel:

Yk(i) = hk(i)X (i) + Zk(i), k = 1, ... , K

where {hk(i)} is the random fading process of user k .

Similar to the uplink case, we assume that

E{|hk(i)|²} = 1,   k = 1, ..., K

and that the fading processes of different users are i.i.d.

The transmit power is constrained to be equal to P.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 193 NTS

Page 194: Slides IT2 SS2012

Let us first consider the case when the channel state information is available only

at the receiver.

We have the following single-user bounds:

Rk < B·E{ log(1 + P|h|²/σ²) },   k = 1, ..., K

where h is a random channel gain.

For any k , this upper bound on Rk can be attained by using all the transmit power

to communicate to user k (with the rate to the remaining users being zero). Thus,

as in the non-fading case, we have K extreme points of the capacity region.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 194 NTS

Page 195: Slides IT2 SS2012

Similar to the non-fading case, it can be shown that the sum rate is also bounded by the same quantity

Σ_{k=1}^{K} Rk < B·E{ log(1 + P|h|²/σ²) }

This bound can be achieved by transmitting only to one user or by time-sharing between any number of users.

It can be shown that the rate pairs in the capacity region can be achieved by both

orthogonal schemes and superposition coding.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 195 NTS

Page 196: Slides IT2 SS2012

Let us now consider the case when the channel state information is available both

at the transmitter and receiver.

Let us focus on the sum capacity. As in the uplink case, it can be shown that the

sum capacity is achieved by transmitting only to the best user at each time. Under

this strategy, the downlink channel reduces to a point-to-point channel with the

channel gain

max_{k=1,...,K} |hk|²

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 196 NTS

Page 197: Slides IT2 SS2012

Multiuser diversity

We have seen that, in the full channel state information case and from the sum capacity perspective, the optimal strategy both in the uplink and in the downlink reduces the multiuser case to the single-user (point-to-point) case with the fading magnitude max_k |hk(i)|. Compared to a system with a single user, the multiuser diversity gain comes from:

I the increase of the total transmit power in the uplink case;

I the improvement of the effective channel gain at time i from |hk(i)|² to max_{k=1,...,K} |hk(i)|².

The second effect appears entirely due to the ability to dynamically schedule resources among the users as a function of the channel state.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 197 NTS

Page 198: Slides IT2 SS2012

Remarks

I The multiuser diversity gain comes from the following effect: when many users fade independently, at any time there is a high probability that one of them has a strong channel. By allowing only that user to transmit (or, vice versa, transmitting only to that user), the shared channel resource is used in the most efficient manner and the total throughput is maximized.

I The larger the number of users, the higher the multiuser diversity gain.

I The amount of multiuser diversity gain depends critically on the tail of the distribution of |hk|²: the heavier the tail, the more likely there is a user with a strong channel, and the larger the multiuser diversity gain.
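The growth of the effective channel gain with the number of users can be checked numerically, as in the sketch below (assumed i.i.d. Rayleigh fading, so |hk|² is exponential and E{max_k |hk|²} equals the K-th harmonic number ≈ ln K + 0.577).

```python
# Sketch: the effective channel gain max_k |h_k|^2 grows (roughly like ln K) with the number of
# users K under i.i.d. Rayleigh fading -- this is the multiuser diversity gain.
import numpy as np

rng = np.random.default_rng(5)
for K in (1, 2, 4, 8, 16, 32, 64):
    h2 = rng.exponential(size=(50_000, K))                 # |h_k|^2 samples for K users
    print(K, round(h2.max(axis=1).mean(), 2), round(np.log(K) + 0.5772, 2))  # simulated vs ln K + gamma
```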

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 198 NTS

Page 199: Slides IT2 SS2012

System requirements to extract the multiuser diversity benefits

I the base station has to have access to the channel quality of each user:
  I in the downlink, each user has to track its own channel SNR and feed the channel quality back to the base station;
  I in the uplink, the base station has to track the channel quality (SNR) of each user.

I the base station has to schedule transmissions among the users as well as adapt the data rate as a function of the instantaneous channel quality.

Such a scheduling procedure is often called opportunistic scheduling.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 199 NTS

Page 200: Slides IT2 SS2012

Fairness and delay

I In reality, the fading statistics of different users may be non-symmetric: some users are closer to the base station and have a better average SNR; some users are stationary (non-moving) or have no scatterers around them.

I The multiuser diversity strategy is only concerned with maximizing long-term average throughputs. In practice, there are latency requirements, that is, the average throughput over the tolerable delay is the performance metric of interest.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 200 NTS

Page 201: Slides IT2 SS2012

Channel measurement and feedback

I All scheduling decisions are done as a function of user channel states. Hence,

the quality of channel estimation is a primary issue, and feedback from the

users to the base station is needed in the downlink case.

I Both the error in channel measurement and the delay/error in feeding the

channel state back are significant bottlenecks of practical applications of the

multiuser diversity strategy.

Slow or limited fading:

I We have observed that the use of the multiuser diversity strategy requires the fading to be rich and fast. It is not useful in line-of-sight scenarios or in cases with little scattering or slowly changing environments.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 201 NTS

Page 202: Slides IT2 SS2012

Proportional fair downlink scheduling

I The scheduler keeps track of the average throughput Tk(i) (k = 1, ..., K) of each user over some (e.g., exponentially weighted) time-window of length tW.

I In the ith time-slot, the base station receives the requested/supportable rates Rk(i) (k = 1, ..., K) from all users, and transmits to the user k* with the largest ratio γ = Rk(i)/Tk(i).

I The average throughputs are updated as:

Tk(i+1) = (1 − 1/tW)·Tk(i) + Rk(i)/tW   for k = k*
Tk(i+1) = (1 − 1/tW)·Tk(i)              for k ≠ k*

This algorithm is used in the downlink mode of the 3G system IS-856.
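A minimal sketch of this scheduler (assumed parameters; the supportable rates are simulated here rather than fed back by real users) is given below; with fading statistics of the same shape, the users end up being scheduled a comparable fraction of the slots.

```python
# Sketch of the proportional fair scheduler update described above (assumed parameters).
import numpy as np

def proportional_fair(rates, t_w=100.0):
    """rates: array of shape (num_slots, K) holding the supportable rates R_k(i)."""
    num_slots, K = rates.shape
    T = np.full(K, 1e-6)                      # average throughputs (tiny init avoids division by zero)
    share = np.zeros(K)                       # fraction of slots given to each user
    for i in range(num_slots):
        k_star = int(np.argmax(rates[i] / T)) # user with the largest R_k(i)/T_k(i)
        share[k_star] += 1
        T = (1 - 1 / t_w) * T                 # exponentially weighted window update
        T[k_star] += rates[i, k_star] / t_w
    return T, share / num_slots

rng = np.random.default_rng(6)
# assumed example: user 0 has twice the average supportable rate of user 1
R = np.column_stack([2 * rng.exponential(size=5000), rng.exponential(size=5000)])
print(proportional_fair(R))
```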

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 202 NTS

Page 203: Slides IT2 SS2012

Combination of multiuser diversity and superposition coding

I Divide the users into several classes (say, two classes, depending on whether they are near the base station or near the cell edge). Then, users in each class have statistically comparable channel strengths.

I Users whose current channel is instantaneously strongest in their own class are scheduled for simultaneous transmission using superposition coding. Users of the “stronger” classes (e.g., nearby users) receive less power, still enjoying very good rates while minimally affecting the performance of the “weak” classes of users.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 203 NTS

Page 204: Slides IT2 SS2012

ADVANCES IN CHANNEL CODING

We have already discussed linear block channel codes in Information Theory I. Now, we will discuss cyclic codes as well as convolutional codes.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 204 NTS

Page 205: Slides IT2 SS2012

Cyclic codes

An important subclass of linear block codes.

Consider an n-tuple

c = [c0, c1, ... , cn−1]

Cyclically shifting the components of c, we have

c(1) = [cn−1, c0, ... , cn−2]

Using i subsequent cyclic shifts, we have

c(i) = [cn−i , cn−i+1, ... , cn−1, c0, c1, ... , cn−i−1]

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 205 NTS

Page 206: Slides IT2 SS2012

Definition: cyclic codes

An (n, k) linear block code C is called a cyclic code if every cyclic shift of any

codeword in C is also a codeword in C .

Properties:

I Linearity: the sum of any two codewords is also a codeword;

I Cyclic property: Any cyclic shift of any codeword is also a codeword.

To develop the theory of cyclic codes, let us treat the components of the

codeword c as the coefficients of the following polynomial:

c(X) = c0 + c1·X + ··· + cn−1·X^{n−1}

where X is an indeterminate.

The fact that all ci are binary is taken into account by using binary (modulo-2) arithmetic for all polynomial coefficients when operating with polynomials.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 206 NTS

Page 207: Slides IT2 SS2012

Cyclic codes

There is a one-to-one correspondence between the vector c and the polynomial c(X). We will call c(X) the code polynomial of c.

Each power of X in the polynomial c(X) represents a one-bit shift in time. Hence, multiplication of c(X) by X may be viewed as a shift to the right.

Key question: how to make such a shift cyclic?

Let c(X) be multiplied by X^i, yielding

X^i·c(X) = X^i·(c0 + c1·X + ... + cn−i−1·X^{n−i−1} + cn−i·X^{n−i} + ... + cn−1·X^{n−1})
         = c0·X^i + c1·X^{i+1} + ... + cn−i−1·X^{n−1} + cn−i·X^n + ... + cn−1·X^{n+i−1}
         = cn−i·X^n + ... + cn−1·X^{n+i−1} + c0·X^i + c1·X^{i+1} + ... + cn−i−1·X^{n−1}

where, in the last line, we have just rearranged the terms.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 207 NTS

Page 208: Slides IT2 SS2012

Cyclic codes

Recognizing, for example, that cn−i + cn−i = 0 in modulo-2 arithmetic, we can manipulate the first i terms as follows:

X^i·c(X) = cn−i + ... + cn−1·X^{i−1} + c0·X^i + c1·X^{i+1} + ... + cn−i−1·X^{n−1}
           + cn−i·(X^n + 1) + ... + cn−1·X^{i−1}·(X^n + 1)

Defining

c^(i)(X) ≜ cn−i + ... + cn−1·X^{i−1} + c0·X^i + c1·X^{i+1} + ... + cn−i−1·X^{n−1}
q(X) ≜ cn−i + cn−i+1·X + ... + cn−1·X^{i−1}

we can reformulate the first equation on this page in the following compact form

X^i·c(X) = q(X)·(X^n + 1) + c^(i)(X)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 208 NTS

Page 209: Slides IT2 SS2012

Cyclic codes

The polynomial c^(i)(X) can be recognized as the code polynomial of the codeword c^(i) obtained by applying i cyclic shifts to the codeword c.

Moreover, from the latter equation, we readily see that c^(i)(X) is the remainder that results from dividing X^i·c(X) by (X^n + 1).

Hence, we may formally state the cyclic property in polynomial notation as follows: if c(X) is a code polynomial, then the polynomial

c^(i)(X) = X^i·c(X) mod (X^n + 1)

is also a code polynomial for any cyclic shift i, where mod (X^n + 1) denotes taking the remainder after division by (X^n + 1).

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 209 NTS

Page 210: Slides IT2 SS2012

Cyclic codes

Note that n cyclic shifts of any codeword do not change it, which means that X^n = 1, and hence X^n + 1 = 0, in modulo-(X^n + 1) arithmetic!

Generator polynomial: a polynomial g(X) of minimal degree that completely specifies the code and is a factor of X^n + 1. The degree of g(X) is equal to the number of parity-check bits of the code, n − k.

It can be shown that any cyclic code is uniquely determined by its generator polynomial, in that each code polynomial in the code can be expressed in the form of a polynomial product as follows:

c(X) = a(X)·g(X)

where a(X) is a polynomial of degree at most k − 1.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 210 NTS

Page 211: Slides IT2 SS2012

Cyclic codes

Given the generator polynomial g(X), we want to encode the message [m0, ..., mk−1] in an (n, k) systematic form. The codeword structure is

[b0, b1, ..., bn−k−1, m0, m1, ..., mk−1]

Define the message-bit and parity-bit polynomials as

m(X) ≜ m0 + m1·X + ... + mk−1·X^{k−1}
b(X) ≜ b0 + b1·X + ... + bn−k−1·X^{n−k−1}

We want the code polynomial to be of the form

c(X) = b(X) + X^{n−k}·m(X)

This means that b0, ..., bn−k−1 occupy the first n − k positions of each codeword, whereas the message bits start from the (n − k + 1)st position.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 211 NTS

Page 212: Slides IT2 SS2012

Cyclic codes

Using the equation c(X) = a(X)·g(X) yields

a(X)·g(X) = b(X) + X^{n−k}·m(X)

Equivalently,

X^{n−k}·m(X)/g(X) = a(X) + b(X)/g(X)

which means that b(X) is the remainder left over after dividing X^{n−k}·m(X) by g(X).

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 212 NTS

Page 213: Slides IT2 SS2012

Example: A (7,4) cyclic code

We start with the polynomial X^7 − 1 and factorize it into three irreducible polynomials as

X^7 − 1 = (1 + X)·(1 + X^2 + X^3)·(1 + X + X^3)

where by an irreducible polynomial we mean a polynomial that cannot be factored into polynomials of lower degree with binary coefficients.

Let us take

g(X) = 1 + X + X^3

as the generator polynomial, whose degree is equal to the number of parity bits.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 213 NTS

Page 214: Slides IT2 SS2012

Example: A (7,4) cyclic code

We can also define a parity-check polynomial

h(X) = 1 + Σ_{i=1}^{k−1} hi·X^i + X^k

such that

g(X)·h(X) = X^n + 1

or, equivalently,

g(X)·h(X) mod (X^n + 1) = 0

For our example, the parity-check polynomial is

h(X) = 1 + X + X^2 + X^4

so that h(X)·g(X) = (1 + X + X^2 + X^4)·(1 + X + X^3) = X^7 + 1.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 214 NTS

Page 215: Slides IT2 SS2012

Example: A (7,4) cyclic code

How to encode, for example, the message sequence 1001?

The corresponding message polynomial is

m(X) = 1 + X^3

Multiplying m(X) by X^{n−k} = X^3, we have

X^{n−k}·m(X) = X^3 + X^6

Dividing X^{n−k}·m(X) by g(X), we have

(X^3 + X^6)/(1 + X + X^3) = X + X^3 + (X + X^2)/(1 + X + X^3)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 215 NTS

Page 216: Slides IT2 SS2012

Example: A (7,4) cyclic code

That is,

a(X) = X + X^3,   b(X) = X + X^2

and the encoded message is

c(X) = b(X) + X^{n−k}·m(X) = X + X^2 + X^3·(1 + X^3) = X + X^2 + X^3 + X^6

or, alternatively,

c = [0111001]
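The same systematic encoding step can be carried out programmatically via polynomial division over GF(2); the sketch below (with hypothetical helper names) reproduces the (7,4) example above.

```python
# Sketch of systematic cyclic encoding over GF(2): b(X) is the remainder of X^(n-k)*m(X) / g(X),
# and the codeword is [b | m]. Polynomials are coefficient lists [c0, c1, ...] (lowest power first).
def gf2_remainder(dividend, divisor):
    r = list(dividend)
    dg = len(divisor) - 1                              # degree of g(X)
    for i in range(len(r) - 1, dg - 1, -1):            # eliminate the highest powers first
        if r[i]:
            for j, d in enumerate(divisor):
                r[i - dg + j] ^= d                     # XOR = modulo-2 subtraction
    return r[:dg]

def cyclic_encode(m, g, n):
    shifted = [0] * (n - len(m)) + list(m)             # X^(n-k) * m(X)
    b = gf2_remainder(shifted, g)                      # parity bits b(X)
    return b + list(m)                                 # systematic codeword [b0..b_{n-k-1}, m0..m_{k-1}]

g = [1, 1, 0, 1]                                       # g(X) = 1 + X + X^3
print(cyclic_encode([1, 0, 0, 1], g, n=7))             # -> [0, 1, 1, 1, 0, 0, 1], i.e. 0111001
```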

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 216 NTS

Page 217: Slides IT2 SS2012

Relationship to conventional linear block codes

For the considered (7, 4) code, we can construct the generator matrix from the generator polynomial by using

g(X) = 1 + X + X^3
X·g(X) = X + X^2 + X^4
X^2·g(X) = X^2 + X^3 + X^5
X^3·g(X) = X^3 + X^4 + X^6

as the rows of the 4 × 7 generator matrix

G = [ 1 1 0 1 0 0 0
      0 1 1 0 1 0 0
      0 0 1 1 0 1 0
      0 0 0 1 1 0 1 ]

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 217 NTS

Page 218: Slides IT2 SS2012

Relationship to conventional linear block codes

Clearly, the latter generator matrix is in non-systematic form. We can put it in systematic form by row operations, that is, by adding the first row to the third row and adding the sum of the first two rows to the fourth row. Then, we get

G = [ 1 1 0 1 0 0 0
      0 1 1 0 1 0 0
      1 1 1 0 0 1 0
      1 0 1 0 0 0 1 ]

Decoding of cyclic codes can be done in the same way as for any other linear block codes, e.g., using the syndrome.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 218 NTS

Page 219: Slides IT2 SS2012

Popular cyclic codes are the so-called cyclic redundancy check (CRC) codes, Bose–Chaudhuri–Hocquenghem (BCH) codes, and non-binary Reed–Solomon (RS) codes. They are part of many international communication standards, e.g., the digital subscriber line (DSL) standards.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 219 NTS

Page 220: Slides IT2 SS2012

Convolutional codes

One of the most powerful classes of linear codes.

Similar to linear block codes, the encoder of a convolutional code accepts k-bit message blocks and produces an encoded sequence of n-bit blocks. However, each encoded block depends not only on the corresponding k-bit message block, but also on the M previous message blocks.

Such an encoder is said to have a memory order of M.

The ratio

R = k/n

is called the code rate.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 220 NTS

Page 221: Slides IT2 SS2012

Convolutional codes

The message sequence m = [m0, m1, m2, ...] enters the encoder one bit at a time. The encoder output sequences are obtained as the convolution of the input sequence with the encoder generator sequences. For an encoder with memory order M, the length of these sequences is M + 1. For example, in the case of two generator (impulse response) sequences,

g^(0) = [g^(0)_0, ..., g^(0)_M],   g^(1) = [g^(1)_0, ..., g^(1)_M]

we can write the encoding equations

c^(0) = m ∗ g^(0),   c^(1) = m ∗ g^(1)

where ∗ denotes the discrete convolution and all operations are modulo-2.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 221 NTS

Page 222: Slides IT2 SS2012

Convolutional codes

The convolution operation implies that

c^(j)_l = Σ_{i=0}^{M} m_{l−i}·g^(j)_i,   j = 0, 1

where m_{l−i} = 0 for all l < i.

After encoding, the output sequences are multiplexed into a single sequence called the codeword

c = [c^(0)_0, c^(1)_0, c^(0)_1, c^(1)_1, ...]

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 222 NTS

Page 223: Slides IT2 SS2012

Convolutional codes

Defining a matrix

G = [ g^(0)_0 g^(1)_0   g^(0)_1 g^(1)_1   · · ·   g^(0)_M g^(1)_M
                        g^(0)_0 g^(1)_0   g^(0)_1 g^(1)_1   · · ·   g^(0)_M g^(1)_M
                                          ·  ·  ·                                   ]

where all blank areas are zeros (each row is the previous one shifted by one n-bit block), we can rewrite the encoding equations in matrix form as

c = m·G

The form of this equation is equivalent to that of linear block codes! Therefore, we call G the generator matrix of the code.

In the case of a semi-infinite message sequence, the matrix G is semi-infinite as well. However, if m has finite length, then G becomes a finite matrix as well.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 223 NTS

Page 224: Slides IT2 SS2012

Example: R = 1/2 code

With the generator sequences:

g(0) = [1011]

g(1) = [1111]

Let the message sequence be

m = [10111]

Encoding equations yield

c(0) = [10111] ∗ [1011] = [10000001]

c(1) = [10111] ∗ [1111] = [11011101]

and, hence, the 2(k + M)-bit codeword

c = [11 01 00 01 01 01 00 11]

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 224 NTS

Page 225: Slides IT2 SS2012

Example: R = 1/2 code

Alternatively, we can write the k × 2(k + M) generator matrix as

G = [ 11 01 11 11 00 00 00 00
      00 11 01 11 11 00 00 00
      00 00 11 01 11 11 00 00
      00 00 00 11 01 11 11 00
      00 00 00 00 11 01 11 11 ]

and obtain the same codeword as

c = [10111]·G = [11 01 00 01 01 01 00 11]
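The same example can be reproduced in a couple of lines by direct modulo-2 convolution and multiplexing, as in the sketch below.

```python
# Sketch of the encoding above: modulo-2 convolution with each generator sequence, then
# multiplexing the two output streams into a single codeword.
import numpy as np

def conv_encode(m, g0, g1):
    c0 = np.convolve(m, g0) % 2                  # c(0) = m * g(0)  (mod 2)
    c1 = np.convolve(m, g1) % 2                  # c(1) = m * g(1)  (mod 2)
    return np.ravel(np.column_stack([c0, c1]))   # c = [c0_0, c1_0, c0_1, c1_1, ...]

m, g0, g1 = [1, 0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 1, 1]
print(conv_encode(m, g0, g1))                    # -> 1 1 0 1 0 0 0 1 0 1 0 1 0 0 1 1
```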

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 225 NTS

Page 226: Slides IT2 SS2012

Code tree and trellis

Let us discuss the concepts of code tree and trellis using a particular example of

the R = 1/2 convolutional code with M = 2 and the impulse responses

g(0) = [111], g(1) = [101]

Consider the input sequence m = [10011]. Similar to the example above, it can be

shown that the codeword becomes

c = [11 10 11 11 01 01 11]

To enforce the R = 1/2 property, let us truncate the codeword by dropping the

last 2M = 4 bits (the effect of truncation becomes negligible if longer messages

and codewords are used). Then, the codeword becomes [11 10 11 11 01]

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 226 NTS

Page 227: Slides IT2 SS2012

Convolutional Encoder

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 227 NTS

Page 228: Slides IT2 SS2012

The code tree is defined as follows: each branch of the tree represents an input

symbol (0 or 1). The corresponding output (coded) symbols are indicated on each

branch. A specific path can be traced for each message sequence. The

corresponding coded symbols on the branches following this path form the output

sequence.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 228 NTS

Page 229: Slides IT2 SS2012

Code tree

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 229 NTS

Page 230: Slides IT2 SS2012

State diagram

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 230 NTS

Page 231: Slides IT2 SS2012

Trellis diagram

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 231 NTS

Page 232: Slides IT2 SS2012

Trellis diagram

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 232 NTS

Page 233: Slides IT2 SS2012

Trellis diagram

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 233 NTS

Page 234: Slides IT2 SS2012

Trellis diagram

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 234 NTS

Page 235: Slides IT2 SS2012

Trellis diagram

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 235 NTS

Page 236: Slides IT2 SS2012

Complexity of Viterbi Decoder

Over L binary intervals, the total number of comparisons made by the Viterbi algorithm is 2^{K−1}·L (where K = M + 1 is the constraint length), rather than the 2^L comparisons required by the standard maximum-likelihood procedure (full tree search).
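For the R = 1/2, M = 2 example code above, 2^{K−1} = 4 states suffice. The following is a minimal hard-decision Viterbi decoder sketch (assumed helper names, Hamming branch metric) that corrects a single flipped bit in the example codeword.

```python
# Minimal hard-decision Viterbi decoder sketch for the R = 1/2, M = 2 example code
# with g(0) = [111], g(1) = [101]; one survivor path per state, 2^(K-1) = 4 states.
import numpy as np

G0, G1 = (1, 1, 1), (1, 0, 1)

def encode_step(bit, state):                       # state = (previous bit, the bit before that)
    regs = (bit,) + state
    out = (sum(r * g for r, g in zip(regs, G0)) % 2,
           sum(r * g for r, g in zip(regs, G1)) % 2)
    return out, (bit, state[0])                    # encoder output and next state

def viterbi_decode(received_pairs):
    states = [(0, 0), (0, 1), (1, 0), (1, 1)]
    metric = {s: (0 if s == (0, 0) else np.inf) for s in states}   # start in the all-zero state
    paths = {s: [] for s in states}
    for r in received_pairs:
        new_metric = {s: np.inf for s in states}
        new_paths = {s: [] for s in states}
        for s in states:
            for bit in (0, 1):                     # the two branches leaving state s in the trellis
                out, ns = encode_step(bit, s)
                m = metric[s] + (out[0] != r[0]) + (out[1] != r[1])   # Hamming branch metric
                if m < new_metric[ns]:             # keep only the best (survivor) path into ns
                    new_metric[ns], new_paths[ns] = m, paths[s] + [bit]
        metric, paths = new_metric, new_paths
    best = min(states, key=lambda s: metric[s])
    return paths[best]

# received sequence = codeword [11 10 11 11 01 01 11] with its 4th bit flipped
rx = [(1, 1), (1, 1), (1, 1), (1, 1), (0, 1), (0, 1), (1, 1)]
print(viterbi_decode(rx))                          # -> [1, 0, 0, 1, 1, 0, 0]: message 10011 plus tail bits
```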

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 236 NTS

Page 237: Slides IT2 SS2012

Probability of deviating from the correct path

Let a(d) denote the number of paths at Hamming distance d that deviate from, and then return to, the all-zero path. The error probability Pe of deviating from the correct path is then upper bounded by

Pe < Σ_{d=dF}^{∞} a(d)·Pd

where Pd denotes the probability that d bits are received in error and dF denotes the minimum free distance.

The inequality sign appears because the path error events are not mutually exclusive (union bound).

Pe depends critically on the minimum free distance dF!

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 237 NTS

Page 238: Slides IT2 SS2012

CONCLUSION

I We have studied advanced information theory, including the capacity characterization of multi-antenna and multi-user channels (and the resulting concept of multiuser diversity), as well as advanced channel coding approaches such as cyclic and convolutional codes.

I To apply these concepts and approaches in practice, or to do research in these fields, a deeper study is required.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 238 NTS