Lecture Course: Information Theory II
Marius Pesavento
Communication Systems Group
Institute of Telecommunications
Technische Universität Darmstadt
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 1 NTS
COURSE ORGANIZATION
- Instructor: Dr.-Ing. Marius Pesavento, S3/06/204, [email protected], FG Nachrichtentechnische Systeme (NTS)
- Teaching assistant: Yong Cheng, S3/06/205, e-mail:
- Website: http://www.nts.tu-darmstadt.de/
- Lecture notes and slides will be posted in TUCAN
- Office hours: on request (please send an e-mail to the TA or instructor)
- Written final exam (closed-book)
- Examination date (presumably): Tuesday, July 31, 2012, 12:00-14:00
RECOMMENDED TEXTBOOKS
1. D. Tse, Fundamentals of Wireless Communication, Cambridge University Press, 2005. (main reference)
2. A. El Gamal and Y.-H. Kim, Network Information Theory, Cambridge University Press, 2012.
3. A. Goldsmith, Wireless Communications, Cambridge University Press, 2005.
4. T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, 1991.
COURSE OUTLINE
- Overview of basics of information theory
- Entropy, mutual information, capacity
- Source coding and channel coding theorems
- Memoryless Gaussian channel
- Multi-antenna channel capacity, water-filling
- Basic theory of network information theory
- Multi-access channels
- Broadcast channels
- Relay channels
- Cyclic codes
- Convolutional codes
- Turbo codes
Topics of the earlier basic IT course
- Information, entropy, mutual information, and their derivatives
- Basic theory of source coding, Shannon’s source coding theorem, Huffman coding, Lempel-Ziv coding
- Channel capacity, Shannon’s channel coding theorem, Gaussian channel, bandlimited channel, Shannon’s limit, multiple Gaussian channels, multiple colored-noise channels, water-filling, ergodic and outage capacities, basics of MIMO channels
- Basic theory of channel coding, linear block coding, Reed-Muller codes, Golay code
REVIEW OF PROBABILITY THEORY: CDF AND PDF
Let X be a continuous random variable with the cumulative distribution function (cdf)

F_X(x) = Probability{X ≤ x} = P(X ≤ x)

Probability density function (pdf):

f_X(x) = dF_X(x)/dx

where

F_X(x₀) = ∫_{−∞}^{x₀} f_X(x) dx
NORMALIZATION PROPERTY OF CDFs
Since F_X(∞) = 1, we obtain the so-called normalization property

∫_{−∞}^{∞} f_X(x) dx = 1

Simple interpretation:

f_X(x) = lim_{Δ→0} P{x − Δ/2 ≤ X ≤ x + Δ/2} / Δ

[Figure: the area under f_X(x) between x₁ and x₂ equals Probability{x₁ < X < x₂}.]
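As a quick numerical sanity check (my own illustration, not from the slides), a midpoint Riemann sum of the standard Gaussian pdf over a wide interval should come out very close to 1:

```python
import math

def gauss_pdf(x, mu=0.0, sigma=1.0):
    # Gaussian pdf, used here purely as a test case for the normalization property
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

dx = 1e-3
# Midpoint Riemann sum over [-8, 8]; the tail mass beyond is negligible
total = sum(gauss_pdf(-8 + (i + 0.5) * dx) * dx for i in range(int(16 / dx)))
print(round(total, 6))  # 1.0
```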
EXAMPLE 1
Let the real-valued random variable X be uniformly distributed in the interval [0, T]. Then

f_X(x) = 1/T for 0 ≤ x ≤ T, and 0 otherwise

and F_X(x) rises linearly from 0 at x = 0 to 1 at x = T.

[Figure: plots of F_X(x) and f_X(x).]
EXAMPLE 2
Let the real-valued random variable X have the so-called Gaussian (normal) distribution

f_X(x) = (1/√(2πσ_X²)) e^{−(x−µ_X)²/(2σ_X²)}

where σ_X² = var{X} is the variance and µ_X is the mean. The corresponding distribution function is given by

F_X(x) = (1/√(2πσ_X²)) ∫_{−∞}^{x} e^{−(ξ−µ_X)²/(2σ_X²)} dξ
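The Gaussian cdf integral above has no elementary closed form, but it can be expressed through the error function; a small sketch (the evaluation points are arbitrary):

```python
import math

def gauss_cdf(x, mu=0.0, sigma=1.0):
    # Closed form of the integral above in terms of the error function:
    # F_X(x) = (1/2)(1 + erf((x - mu)/(sigma*sqrt(2))))
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

print(gauss_cdf(0.0))             # 0.5 at the mean
print(round(gauss_cdf(1.96), 3))  # 0.975
```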
CDF AND PDF OF A GAUSSIAN RANDOM VARIABLE
[Figure: F_X(x) rises from 0 to 1, passing 0.5 at x = µ_X; f_X(x) is a bell curve of width governed by σ_X², with maximum (2πσ_X²)^{−1/2} at x = µ_X.]
PROBABILITY MASS FUNCTION
Let X now be a discrete random variable which takes the values x_i (i = 1, ..., I) with the probabilities P(x_i) (i = 1, ..., I), respectively.
For discrete variables, we define the probability mass function

P(x_i) = Probability(X = x_i)

The normalization condition:

∑_{i=1}^{I} P(x_i) = 1
EXTENSION TO DISCRETE VARIABLES
How can the concepts of pdf and cdf be extended to discrete variables?
Define the unit step function as

u(x) = { 0, x < 0;  1, x ≥ 0 }

Define the Dirac delta function as

δ(x) = { ∞, x = 0;  0, x ≠ 0 },    ∫_{−∞}^{∞} δ(x) dx = 1
EXTENSION TO DISCRETE VARIABLES
Relationships between the delta function and the unit step function:

∫_{−∞}^{x} δ(ξ) dξ = u(x),    δ(x) = du(x)/dx

Sifting property of the delta function:

∫_{−∞}^{∞} g(x) δ(x − y) dx = g(y)

Using the definition of the unit step function, we can express the cdf as

F_X(x) = ∑_{i=1}^{I} P(x_i) u(x − x_i)
EXTENSION TO DISCRETE VARIABLES
Then, the pdf can be expressed as

f_X(x) = ∑_{i=1}^{I} P(x_i) δ(x − x_i)

Using the sifting property of the delta function, we have

∫_{−∞}^{∞} f_X(x) dx = ∫_{−∞}^{∞} ∑_{i=1}^{I} P(x_i) δ(x − x_i) dx = ∑_{i=1}^{I} ∫_{−∞}^{∞} P(x_i) δ(x − x_i) dx = ∑_{i=1}^{I} P(x_i) = 1
EXAMPLE 1
Let the random variable X be the outcome of a coin-tossing experiment, with P(0) = P(1) = 0.5.

[Figure: F_X(x) is a staircase with steps of height 0.5 at x = 0 and x = 1; f_X(x) consists of two delta impulses of weight 0.5 at x = 0 and x = 1.]
EXAMPLE 2
Let the random variable X be the outcome of a die-throwing experiment.

[Figure: F_X(x) is a staircase with steps of height 1/6 at x = 1, ..., 6; f_X(x) consists of six delta impulses of weight 1/6 at x = 1, ..., 6.]
STATISTICAL EXPECTATION
Expected value (mean) of a continuous random variable:

µ_X = E{X} = ∫_{−∞}^{∞} x f_X(x) dx

For a discrete random variable:

µ_X = E{X} = ∫_{−∞}^{∞} x f_X(x) dx = ∫_{−∞}^{∞} x ∑_{i=1}^{I} P(x_i) δ(x − x_i) dx
    = ∑_{i=1}^{I} ∫_{−∞}^{∞} x P(x_i) δ(x − x_i) dx = ∑_{i=1}^{I} x_i P(x_i)
STATISTICAL EXPECTATION
We can also compute the expected value of a function of a continuous random variable:

E{g(X)} = ∫_{−∞}^{∞} g(x) f_X(x) dx

For a discrete random variable:

E{g(X)} = ∑_{i=1}^{I} g(x_i) P(x_i)
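The discrete expectation formula can be evaluated exactly for a fair die (a small made-up example), using rational arithmetic to avoid rounding:

```python
from fractions import Fraction

# Discrete expectation E{g(X)} = sum_i g(x_i) P(x_i), for a fair six-sided die
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expect(g, pmf):
    return sum(g(x) * p for x, p in pmf.items())

print(expect(lambda x: x, pmf))      # 7/2
print(expect(lambda x: x * x, pmf))  # 91/6
```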
VARIANCE OF A RANDOM VARIABLE

var{X} = E{(X − E{X})²} = E{X²} − E{X}² = σ_X²

where σ_X is commonly called the standard deviation.
The variance and standard deviation can be interpreted as measures of the statistical dispersion of a random variable w.r.t. the expected value.
EXAMPLE
Compute the mean and variance of a random variable uniformly distributed in the interval [0, 1], i.e., f_X(x) = 1 for 0 ≤ x ≤ 1:

µ_X = ∫_0^1 x dx = x²/2 |_0^1 = 1/2

σ_X² = ∫_0^1 x² dx − µ_X² = x³/3 |_0^1 − 1/4 = 1/3 − 1/4 = 1/12
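The exact values 1/2 and 1/12 can also be checked by simple Monte Carlo sampling (an illustration, not part of the slides):

```python
import random

# Sample X uniform on [0, 1] and compare the sample mean and variance with
# the exact values 1/2 and 1/12 computed above
random.seed(0)
n = 200_000
xs = [random.random() for _ in range(n)]
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n
print(round(mean, 2), round(var, 4))  # close to 0.5 and 0.0833
```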
JOINT DISTRIBUTION
Let us now consider two random variables X and Y jointly.
Joint distribution function:

F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y)

Joint pdf:

f_{X,Y}(x, y) = ∂²F_{X,Y}(x, y) / (∂x ∂y)
JOINT DISTRIBUTION
The inverse relationship:

F_{X,Y}(x₀, y₀) = ∫_{−∞}^{x₀} ∫_{−∞}^{y₀} f_{X,Y}(x, y) dx dy

Any joint pdf satisfies the following normalization property:

∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) dx dy = 1

Also, the marginal pdfs are obtained as

∫_{−∞}^{∞} f_{X,Y}(x, y) dx = f_Y(y),    ∫_{−∞}^{∞} f_{X,Y}(x, y) dy = f_X(x)
CONDITIONAL DISTRIBUTION
In practical problems, we are often interested in the pdf of one random variable X conditioned on the fact that a second random variable Y has some specific value y. It is obvious that

P(X ≤ x, Y ≤ y) = P(X ≤ x | Y ≤ y) P(Y ≤ y)

Then, the conditional cdf is defined as

F_X(x|y) = P(X ≤ x | Y ≤ y) = F_{X,Y}(x, y) / F_Y(y)

By symmetry, it also follows that

F_Y(y|x) = F_{X,Y}(x, y) / F_X(x)
CONDITIONAL DISTRIBUTION
Similarly, the conditional pdfs are

f_X(x|y) = f_{X,Y}(x, y) / f_Y(y),    f_Y(y|x) = f_{X,Y}(x, y) / f_X(x)

From the last two equations, we obtain the Bayes rule

f_X(x|y) f_Y(y) = f_Y(y|x) f_X(x)
NORMALIZATION CONDITION

∫_{−∞}^{∞} f_X(x|y) dx = ∫_{−∞}^{∞} f_{X,Y}(x, y)/f_Y(y) dx = (1/f_Y(y)) ∫_{−∞}^{∞} f_{X,Y}(x, y) dx = 1

Conditional expectation:

E{g(X)|y} = ∫_{−∞}^{∞} g(x) f_X(x|y) dx
STATISTICAL INDEPENDENCE
Two random variables X and Y are statistically independent if

f_{X,Y}(x, y) = f_X(x) f_Y(y)

Substituting this equation into the conditional pdf, we see that statistical independence implies

f_X(x|y) = f_X(x)

That is, the variable Y does not have any influence on the variable X.
EXAMPLE
Let

f_{X,Y}(x, y) = { 4xy, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1;  0, otherwise }

Are the variables X and Y statistically dependent?

f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy = 4x ∫_0^1 y dy = 4x · y²/2 |_0^1 = { 2x, 0 ≤ x ≤ 1;  0, otherwise }

Similarly,

f_Y(y) = { 2y, 0 ≤ y ≤ 1;  0, otherwise }

Hence f_{X,Y}(x, y) = f_X(x) f_Y(y) and, therefore, the variables are independent!
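A numeric check of this example (my own illustration): integrating f(x, y) = 4xy over y should recover the marginal f_X(x) = 2x, and the joint pdf should equal the product of the marginals (by symmetry, f_Y has the same form as f_X):

```python
def f_joint(x, y):
    # Joint pdf from the example: 4xy on the unit square, 0 elsewhere
    return 4 * x * y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def marginal(x, dy=1e-4):
    # Midpoint-rule integration over y in [0, 1]
    n = int(1 / dy)
    return sum(f_joint(x, (j + 0.5) * dy) * dy for j in range(n))

fx, fy = marginal(0.3), marginal(0.7)  # should be close to 0.6 and 1.4
print(round(fx, 6), round(f_joint(0.3, 0.7), 6), round(fx * fy, 6))
```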
CORRELATION AND COVARIANCE
Two fundamental characteristics of linear statistical dependence are the correlation

r_XY = E{XY}

and the covariance

cov{X, Y} = E{(X − E{X})(Y − E{Y})} = E{XY} − E{X}E{Y} = E{XY} − µ_X µ_Y

For X = Y, the covariance boils down to the variance:

cov{X, X} = E{X²} − µ_X² = var{X}
SOME USEFUL PROPERTIES
- var{X + Y} = var{X} + var{Y} + 2 cov{X, Y}.
- If the variables X and Y are statistically independent, then for any functions h and g, E{h(X)g(Y)} = E{h(X)}E{g(Y)}.
- If the variables X and Y are statistically independent, then cov{X, Y} = 0. Therefore, covariance is sometimes used as a measure of statistical dependence. However, the reverse statement is not necessarily true!
- If the variables X and Y are statistically independent, then var{X + Y} = var{X} + var{Y}.
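The first property can be verified empirically on sample data (an illustrative check; the identity holds exactly for sample moments when var and cov use the same normalization):

```python
import random

# Empirical check of var{X+Y} = var{X} + var{Y} + 2 cov{X,Y}
random.seed(1)
n = 50_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [x + random.gauss(0, 1) for x in xs]  # deliberately correlated with xs

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

lhs = var([a + b for a, b in zip(xs, ys)])
rhs = var(xs) + var(ys) + 2 * cov(xs, ys)
print(abs(lhs - rhs) < 1e-6)  # True
```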
EXTENSION TO MULTIVARIATE DISTRIBUTIONS
We may also consider multiple (more than two) random variables X₁, ..., X_n.
Joint distribution function:

F_{X₁,...,X_n}(x₁, ..., x_n) = P(X₁ ≤ x₁, X₂ ≤ x₂, ..., X_n ≤ x_n)

Joint pdf:

f_{X₁,...,X_n}(x₁, ..., x_n) = ∂ⁿF_{X₁,...,X_n}(x₁, ..., x_n) / (∂x₁ ∂x₂ ··· ∂x_n)
MULTIVARIATE DISTRIBUTIONS
Introducing the vectors

X = [X₁, X₂, ..., X_n]ᵀ,    x = [x₁, x₂, ..., x_n]ᵀ

we rewrite the previous equations in symbolic (vector) notation as

F_X(x) = P(X ≤ x),    f_X(x) = ∂ⁿF_X(x) / (∂x₁ ∂x₂ ··· ∂x_n)

Normalization condition:

∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} f_X(x) dx = 1
MULTIVARIATE DISTRIBUTIONS
Statistical expectation can be defined as

E{g(X)} = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} g(x) f_X(x) dx

where g(X) is some function of the random vector X.
In the particular bivariate case,

E{g(X, Y)} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f_{X,Y}(x, y) dx dy
MULTIVARIATE GAUSSIAN DISTRIBUTION
Jointly Gaussian random variables have the following joint multivariate pdf:

f_X(x) = (1 / ((√(2π))ⁿ det{R}^{1/2})) e^{−(1/2)(x−µ_X)ᵀ R⁻¹ (x−µ_X)}

where the mean is

µ_X = E{X}

and the covariance matrix is

R = E{(X − E{X})(X − E{X})ᵀ} = E{XXᵀ} − µ_X µ_Xᵀ

In symbolic notation,

X ~ N(µ_X, R)
MULTIVARIATE GAUSSIAN DISTRIBUTION
In the case of a single (n = 1) random variable X = X₁, the n-variate Gaussian pdf reduces to

f_X(x) = (1/√(2πσ_X²)) e^{−(x−µ_X)²/(2σ_X²)}

which is the well-known Gaussian pdf.
In the case of two (n = 2) random variables X = X₁ and Y = X₂, we have

R = [ σ_X²        ρ σ_X σ_Y
      ρ σ_X σ_Y   σ_Y²      ],    ρ = E{(X − µ_X)(Y − µ_Y)} / (σ_X σ_Y)

Note that ρ = ρ_XY is nothing else than the correlation coefficient.
MULTIVARIATE GAUSSIAN DISTRIBUTION
The determinant of R is given by

det{R} = σ_X² σ_Y² (1 − ρ²)

and, therefore, the n-variate pdf reduces to the so-called bivariate pdf

f_{X,Y}(x, y) = (1 / (2π σ_X σ_Y √(1 − ρ²))) · exp{ −(1/(2(1 − ρ²))) [ (x−µ_X)²/σ_X² − 2ρ(x−µ_X)(y−µ_Y)/(σ_X σ_Y) + (y−µ_Y)²/σ_Y² ] }

The maximum of this function is located at the point {x = µ_X, y = µ_Y}, and the maximal value is

max f_{X,Y}(x, y) = f_{X,Y}(µ_X, µ_Y) = 1 / (2π σ_X σ_Y √(1 − ρ²))
MULTIVARIATE GAUSSIAN DISTRIBUTION
In the case of uncorrelated X and Y, ρ = 0 and we have

f_{X,Y}(x, y) = (1/(2π σ_X σ_Y)) exp{ −(1/2) [ (x−µ_X)²/σ_X² + (y−µ_Y)²/σ_Y² ] }
             = ( (1/(√(2π) σ_X)) e^{−(x−µ_X)²/(2σ_X²)} ) ( (1/(√(2π) σ_Y)) e^{−(y−µ_Y)²/(2σ_Y²)} )
             = f_X(x) f_Y(y)

i.e., the variables X and Y become statistically independent. This is a very important result, showing that uncorrelated Gaussian random variables are also statistically independent! Note that for non-Gaussian random variables, this is not true in general.
MULTIVARIATE GAUSSIAN DISTRIBUTION: EXAMPLES
Contour plots of the bivariate Gaussian pdf with the parameters µ_X = µ_Y = 0 and σ_X = σ_Y = 1, for the correlation coefficients ρ = 0, 0.25, 0.5, 0.75, 0.95 and ρ = −0.25, −0.5, −0.75, −0.95.

[Figures: for ρ = 0 the contours are circles; as |ρ| grows they become increasingly elongated ellipses, oriented along y = x for ρ > 0 and along y = −x for ρ < 0.]
BASICS OF INFORMATION THEORY
Shannon: information is the resolution of uncertainty about some statistical event:
- Before the event occurs, there is an amount of uncertainty.
- After the occurrence of the event, there is no uncertainty anymore, but there is a gain in the amount of information.
- Highly expected messages deliver a small amount of information, while highly unexpected ones deliver a large amount of information. Hence, the amount of information should be inversely related to the probability of the message.
Information and entropy
The amount of information of the symbol x with probability P(x):

I(x) = log(1/P(x)) = −log(P(x))    with [I(x)] = bit

Considering a source with the alphabet X = {x₁, ..., x_N}, the entropy is defined as the statistically averaged amount of information (mean of I(X)):

H(X) = E{I(X)} = E{−log(P(X))} = −∑_{i=1}^{N} P(x_i) log(P(x_i)) = ∑_{i=1}^{N} P(x_i) log(1/P(x_i))    with [H(X)] = bit/symbol
Example
Entropy of a non-symmetric binary source with the probabilities P(0) = p and P(1) = 1 − p:

H_B(X) = −p log(p) − (1 − p) log(1 − p)

[Figure: H_B(X) in bit/symbol versus p; it equals 0 at p = 0 and p = 1 and reaches its maximum of 1 bit/symbol (maximum uncertainty) at p = 0.5.]

- The entropy characterizes the source uncertainty.
- The entropy is a concave function of the probability.
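The binary entropy function is easy to evaluate directly; a small sketch:

```python
import math

# Binary entropy H_B(p) = -p log2(p) - (1-p) log2(1-p) from the slide
def h_binary(p):
    if p in (0.0, 1.0):
        return 0.0  # limit value: 0 log 0 := 0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(h_binary(0.5))            # 1.0 bit/symbol, the maximum
print(round(h_binary(0.1), 3))  # 0.469
```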
SOME “DERIVATIVES” OF ENTROPY: Joint entropy
The definition of entropy can be extended to a pair of random variables X and Y (two discrete sources X = {x₁, ..., x_N} and Y = {y₁, ..., y_M}).
The joint entropy H(X, Y) is defined as:

H(X, Y) = −E{log(P(X, Y))} = −∑_{i=1}^{N} ∑_{l=1}^{M} P(x_i, y_l) log(P(x_i, y_l))
Conditional Entropy
The conditional entropy H(Y|X) is the amount of uncertainty remaining about the random variable Y after the random variable X has been observed:

H(Y|X) = −E_{X,Y}{log(P(Y|X))} = −∑_{i=1}^{N} ∑_{l=1}^{M} P(x_i, y_l) log(P(y_l|x_i)) = −∑_{i=1}^{N} P(x_i) ∑_{l=1}^{M} P(y_l|x_i) log(P(y_l|x_i))

where we use the Bayes rule

P(x_i, y_l) = P(x_i|y_l) P(y_l) = P(y_l|x_i) P(x_i)
Useful properties
Important conditional entropy property:

H(X, Y) = H(X) + H(Y|X)

Hence the entropy, conditional entropy, and joint entropy are related quantities.
Another important property: conditioning reduces entropy:

H(X|Y) ≤ H(X)

with equality if and only if X and Y are statistically independent.
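The property H(X, Y) = H(X) + H(Y|X) can be checked numerically on a small joint pmf (the 2×2 distribution below is made up for illustration):

```python
import math

# Made-up joint pmf P(x, y) over {0,1} x {0,1}
p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

def H_joint(p):
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

# Marginal P(x)
p_x = {}
for (x, _), v in p_xy.items():
    p_x[x] = p_x.get(x, 0) + v

H_x = -sum(v * math.log2(v) for v in p_x.values())
# H(Y|X) = -sum_{x,y} P(x,y) log2 P(y|x), with P(y|x) = P(x,y)/P(x)
H_y_given_x = -sum(v * math.log2(v / p_x[x]) for (x, _), v in p_xy.items())

print(abs(H_joint(p_xy) - (H_x + H_y_given_x)) < 1e-9)  # True
```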
Mutual information
Let us consider two random variables (sources). The amount of information exchanged between two symbols x_i and y_l can be defined as:

I(x_i; y_l) = log( P(x_i|y_l) / P(x_i) ) = log( P(x_i, y_l) / (P(x_i) P(y_l)) )    with [I(x_i; y_l)] = bit

where we again use the Bayes rule P(x_i, y_l) = P(x_i|y_l) P(y_l).
Mutual information
The amount of mutual information exchanged between two sources X and Y can be obtained by averaging I(x_i; y_l):

I(X; Y) = ∑_{i=1}^{N} ∑_{l=1}^{M} P(x_i, y_l) log( P(x_i|y_l) / P(x_i) ) = ∑_{i=1}^{N} ∑_{l=1}^{M} P(x_i, y_l) log( P(x_i, y_l) / (P(x_i) P(y_l)) )    with [I(X; Y)] = bit/symbol
Mutual information
Mutual information is the reduction in the uncertainty of X due to the knowledge of Y:

I(X; Y) = H(X) − H(X|Y)

Relation of mutual information to the entropies and the joint entropy:

I(X; Y) = H(X) + H(Y) − H(X, Y)
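Likewise, the entropy identity I(X; Y) = H(X) + H(Y) − H(X, Y) can be checked against the double-sum definition on a made-up joint pmf:

```python
import math

# Made-up joint pmf with marginals P(X=0)=P(X=1)=0.5, P(Y=0)=0.6, P(Y=1)=0.4
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
p_x = {0: 0.5, 1: 0.5}
p_y = {0: 0.6, 1: 0.4}

def H(p):
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

# Double-sum definition of I(X;Y)
I_def = sum(v * math.log2(v / (p_x[x] * p_y[y])) for (x, y), v in p_xy.items())
# Entropy identity
I_ent = H(p_x) + H(p_y) - H(p_xy)

print(abs(I_def - I_ent) < 1e-9)  # True
```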
Channel capacity
The input probabilities P(x_i) are independent of the channel. We can therefore maximize the mutual information I(X; Y) w.r.t. P(x_i). The channel capacity can then be defined as the maximum mutual information in any single use of the channel, where the maximization is over all input distributions {P(x_i)}:

C = max_{P(x_i)} I(X; Y)    with [C] = bit/symbol

or bits per channel use (bpcu).
Example
Channel capacity of a binary symmetric channel with crossover probability p:

[Figure: BSC with inputs x₁, x₂ and outputs y₁, y₂; a bit is received correctly with probability 1 − p and flipped with probability p.]

C_B = 1 + p log(p) + (1 − p) log(1 − p) = 1 − H_B(p)
Entropy/capacity of a binary symmetric channel
[Figure: H_B(X) and C_B in bit/symbol versus p; H_B peaks at 1 for p = 0.5, while C_B = 1 − H_B(p) equals 1 at p = 0 and p = 1 and drops to 0 at p = 0.5.]
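The BSC capacity formula C_B = 1 − H_B(p) can be sketched as:

```python
import math

# BSC capacity C_B = 1 - H_B(p), where H_B is the binary entropy of the
# crossover probability p
def bsc_capacity(p):
    if p in (0.0, 1.0):
        return 1.0  # deterministic channel: no uncertainty about the input bit
    hb = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return 1.0 - hb

print(bsc_capacity(0.0))  # 1.0: noiseless channel
print(bsc_capacity(0.5))  # 0.0: output independent of input
```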
Channel coding/decoding
The inevitable presence of noise in a channel causes errors between the output and input data sequences of a digital communication system. To reduce these errors, we resort to channel coding.
The channel encoder maps the incoming source data into a channel input sequence. It adds redundancy to these data to protect them from errors.
The channel decoder inversely maps the channel output sequence into an output data sequence in such a way that the overall effect of the channel noise on the system is minimized.
Shannon’s Channel-Coding Theorem
Let information be transmitted through a discrete memoryless channel of capacity C. If the transmission rate satisfies

R < C

then there exists a channel coding scheme for which the source output can be transmitted over the channel with an arbitrarily small probability of error.
Conversely, if

R ≥ C

then it is impossible to transmit information over the channel with an arbitrarily small probability of error.
Joint source-channel coding theorem
If

H(X) > C

then it is impossible to transmit the source outputs over the channel with an arbitrarily small probability of error.
This theorem follows from the direct combination of the source-coding and channel-coding theorems.
Continuous sources
The mutual information between two continuous random sources X and Y with the joint pdf f_{X,Y}(x, y) is given by

I(X; Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) log( f_X(x|y) / f_X(x) ) dx dy = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) log( f_{X,Y}(x, y) / (f_X(x) f_Y(y)) ) dx dy
What is the relationship between the discrete and continuous mutual information?
It can be shown that the definitions of mutual information in the continuous and discrete cases are essentially similar.
This property enables us to use the continuous mutual information to define the capacity in the case of continuously distributed (infinite-alphabet) sources.
Continuous-time bandlimited channel
Consider a continuous-time bandlimited channel with additive white Gaussian noise (AWGN). The output of such an AWGN channel can be described as

Y(t) = (X(t) + Z(t)) ∗ h(t)

where X(t) and Z(t) are the signal and noise waveforms, respectively, and h(t) is the impulse response of an ideal lowpass filter with cutoff frequency B.
[Figure: bandlimited AWGN channel: the input x(t) plus white Gaussian noise n(t) passes through an ideal lowpass filter H(f) of bandwidth B to produce y(t); the noise power spectral density is S_N(f) = N₀/2.]
Capacity
Capacity of the bandlimited channel:

C = B log(1 + P/(N₀B))    bits per second

where it is taken into account that the noise power is P_N = N₀B.
Shannon’s bound:

C_∞ = lim_{B→∞} B log(1 + P/(N₀B)) = log(e) · P/N₀ ≈ 1.44 P/N₀
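The capacity formula and its wideband limit can be sketched numerically (the parameter values P and N₀ below are arbitrary):

```python
import math

# Bandlimited AWGN capacity C = B log2(1 + P/(N0 B)) in bits per second
def capacity(B, P, N0):
    return B * math.log2(1 + P / (N0 * B))

P, N0 = 1.0, 1e-2
c_inf = math.log2(math.e) * P / N0  # Shannon's bound, about 144.27 here

print(round(capacity(1e3, P, N0), 2))  # 137.5
print(round(c_inf, 2))                 # 144.27; capacity(B) approaches this as B grows
```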
Parallel AWGN channels
Consider multiple parallel AWGN channels

Y_i = X_i + Z_i,    i = 1, ..., K

with a common power constraint

E{ ∑_{i=1}^{K} X_i² } = ∑_{i=1}^{K} E{X_i²} = ∑_{i=1}^{K} P_i ≤ P

where Z_i ~ N(0, P_{N,i}), the noise is statistically independent from channel to channel, and P_i = E{X_i²}.
How should the power P be distributed among the channels to maximize the total capacity?
Water-filling
Result (water-filling): the total capacity is maximized when

P_i = (ν − P_{N,i})⁺

where the value of ν is chosen such that

∑_{i=1}^{K} P_i = ∑_{i=1}^{K} (ν − P_{N,i})⁺ ≤ P

and (·)⁺ denotes the positive part, i.e., for any x,

(x)⁺ := { x, if x ≥ 0;  0, if x < 0 }
Water-filling
[Figure: water-filling interpretation: the noise levels P_{N,i} form an uneven floor; power is poured until the water level ν is reached, so each channel receives P_i = (ν − P_{N,i})⁺.]
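The water-filling rule can be sketched numerically; the bisection search for the water level ν and the example noise powers below are my own illustrative choices, not from the slides:

```python
# Water-filling: find nu so that sum_i (nu - PN_i)+ = P, then P_i = (nu - PN_i)+
def water_fill(PN, P, iters=100):
    lo, hi = 0.0, max(PN) + P  # the water level nu must lie in this interval
    for _ in range(iters):
        nu = (lo + hi) / 2
        used = sum(max(nu - pn, 0.0) for pn in PN)
        if used > P:
            hi = nu  # pouring too much power: lower the level
        else:
            lo = nu  # budget not exhausted: raise the level
    return [max(nu - pn, 0.0) for pn in PN]

PN = [1.0, 2.0, 4.0]  # per-channel noise powers
P = 4.0               # total power budget
powers = water_fill(PN, P)
print([round(p, 3) for p in powers])  # [2.5, 1.5, 0.0]: the noisiest channel gets nothing
```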
SKETCH OF THE PROOF
The mutual information of a system with multiple Gaussian channels can be shown to be upper-bounded by the value

(1/2) ∑_{i=1}^{K} log(1 + P_i/P_{N,i})

Equality is achieved when X = [X₁, X₂, ..., X_K]ᵀ is a Gaussian vector:

X ~ N(0, P)
SKETCH OF THE PROOF
The covariance matrix is

P = diag{P₁, ..., P_K}

Hence, the capacity of multiple Gaussian channels is given by

C = (1/2) ∑_{i=1}^{K} log(1 + P_i/P_{N,i})

Let us now maximize C over {P_i}_{i=1}^{K} subject to the constraints ∑_{i=1}^{K} P_i = P and P_i ≥ 0 for i = 1, ..., K.
SKETCH OF THE PROOF
We use the Lagrange multiplier method. The Lagrangian function can be written as

L(P₁, ..., P_K) = (1/2) ∑_{i=1}^{K} log(1 + P_i/P_{N,i}) + λ₀(P − ∑_{i=1}^{K} P_i) + ∑_{i=1}^{K} λ_i P_i

where λ₀, ..., λ_K are the Lagrange multipliers. Differentiating L(P₁, ..., P_K) w.r.t. P_i, we have

∂L/∂P_i = ∂/∂P_i ( (1/2) log(e) ln(1 + P_i/P_{N,i}) + λ₀(P − ∑_{i=1}^{K} P_i) + ∑_{i=1}^{K} λ_i P_i )
        = (1/2) log(e) · (1/P_{N,i}) / (1 + P_i/P_{N,i}) − λ₀ + λ_i
        = (log(e)/2) · 1/(P_i + P_{N,i}) − λ₀ + λ_i
SKETCH OF THE PROOF
From the so-called Karush-Kuhn-Tucker (KKT) conditions for constrained convex optimization problems:

∑_{i=1}^{K} P_i* = P,  P_i* ≥ 0    (constraint satisfaction)
(log(e)/2) · 1/(P_i* + P_{N,i}) − λ₀* + λ_i* = 0    (zero gradient)
λ_i* P_i* = 0    (complementary slackness)
λ_i* ≥ 0,  i = 1, ..., K    (for the inequality constraints)

Thus P_i* ≥ 0 and ∑_{i=1}^{K} P_i* = P, as well as

P_i* (λ₀* − (log(e)/2) · 1/(P_i* + P_{N,i})) = 0   and   λ₀* ≥ (log(e)/2) · 1/(P_i* + P_{N,i})
SKETCH OF THE PROOF
From the KKT conditions: P_i* ≥ 0 and ∑_{i=1}^{K} P_i* = P, as well as

P_i* (λ₀* − (log(e)/2) · 1/(P_i* + P_{N,i})) = 0   and   λ₀* ≥ (log(e)/2) · 1/(P_i* + P_{N,i})

Thus, if

λ₀* < (log(e)/2) · 1/P_{N,i}

then from the last inequality we have P_i* > 0, which by the complementary slackness condition implies that

λ₀* = (log(e)/2) · 1/(P_i* + P_{N,i})

and thus, for ν* = log(e)/(2λ₀*),

P_i* = ν* − P_{N,i}
SKETCH OF THE PROOF
Conversely, if

λ₀* ≥ (log(e)/2) · 1/P_{N,i}

then P_i* > 0 is impossible, as complementary slackness would then imply

λ₀* = (log(e)/2) · 1/(P_i* + P_{N,i}) < (log(e)/2) · 1/P_{N,i} ≤ λ₀*

which is a contradiction. We conclude that for P_{N,i} ≥ ν* we have P_i* = 0, and P_i* = ν* − P_{N,i} otherwise.
EXTENDED DEFINITIONS OF CAPACITY: Ergodic capacity
Ergodic capacity: in the case of a random Gaussian channel, it is sometimes more useful to separate the effects of the transmitted signal and the channel as

Y(i) = X(i)H(i) + Z(i)

where H(i) is the channel gain in the ith channel use. In contrast to the noise and signal waveforms, the channel gain is usually treated as a non-random (deterministic) value.
Ergodic capacity
For this model,

P = E{X²}

can be interpreted as the transmitted signal power, whereas

E{(XH)²} = E{X²}H² = PH²

can be interpreted as the received signal power.
Ergodic capacity
In this case, the capacity formula reads

C = (1/2) log(1 + PH²/P_N)

Note that the conventional capacity is instantaneous; that is, it characterizes the maximal achievable rate for a particular given realization of the channel gain H.
How can we characterize the maximal achievable rate on average, rather than for some particular channel gain?
Ergodic capacity
In practice, wireless channels are random and should therefore be treated statistically.
Based on this fact, the ergodic capacity is defined as the instantaneous capacity C averaged over the channel realizations:

C_E = E_H{C}

where E_H{·} denotes statistical expectation over the random channel gain.
Ergodic capacity
Assume that we know the channel gain pdf f_H(h). In this case, we can compute the ergodic capacity as

C_E = ∫_{−∞}^{∞} f_H(h) C(h) dh

Ergodic capacity provides another view of the achievable transmission rate compared to the conventional instantaneous capacity, because it gives the average rather than the instantaneous picture.
Outage
Outage capacity: the transmission rate C_{p_out} that does not exceed the instantaneous capacity C in (1 − p_out) × 100 percent of the channel realizations.
The quantity p_out is called the outage probability.
An outage is the event where, for some particular channel realization, the chosen transmission rate is higher than the instantaneous capacity (that is, where no error-free transmission is possible).
In the case of small p_out (roughly speaking, p_out ≤ 0.1), outage-induced errors can be cured by means of channel coding.
Outage
The outage capacity can be characterized as follows. Let the pdf of the instantaneous capacity C = C(H) be f_C(c), where f_C(c) = 0 for c < 0. Then, the outage capacity is defined by the equation

p_out = P(C < C_{p_out}) = ∫_0^{C_{p_out}} f_C(c) dc
Channel coding
Channel encoding and decoding are used to correct errors that may occur during signal transmission over the channel.
Linear block codes
Linear binary block codes: the coding/decoding operations can be described using linear algebra. Binary codes use modulo-2 arithmetic.
A code is said to be linear if the modulo-2 sum of any two codewords in the code gives another codeword of this code.
A code is denoted an (n, k) linear block code if n is the total number of bits of a codeword and k is the number of bits carrying the message.
Linear block codes
Row-vector notation:

m = [m₁, ..., m_k]
b = [b₁, ..., b_{n−k}]
c = [b₁, ..., b_{n−k}, m₁, ..., m_k] = [b, m]

Block codes use the message bits to generate the parity-check bits according to the equation

b = mP

where P is the k × (n − k) coefficient matrix. Noting that c = [b, m], we get

c = [b, m] = [mP, m] = m[P, I_k] = mG

where G = [P, I_k] is the k × n generator matrix.
Hamming codes
Hamming codes are a family of codes with

n = 2^m − 1,    k = 2^m − m − 1,    n − k = m

Generator matrix of the (7,4) Hamming code (n = 7, m = 3, k = 4):

G = [P, I₄] = [ 1 1 0 1 0 0 0
                0 1 1 0 1 0 0
                1 1 1 0 0 1 0
                1 0 1 0 0 0 1 ]
Message  Codeword  Hamming weight
0000     0000000   0
0001     1010001   3
0010     1110010   4
0011     0100011   3
0100     0110100   3
0101     1100101   4
...      ...       ...

For the given Hamming code, d_min = 3. Therefore, it is a single-error-correcting code.
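The codeword table can be generated programmatically from G; this sketch enumerates all 16 codewords and confirms d_min = 3 (for a linear code, d_min equals the minimum nonzero codeword weight):

```python
# Generator matrix of the (7,4) Hamming code from the slide
G = [
    [1, 1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [1, 1, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0, 1],
]

def encode(m):
    # c = mG over GF(2): XOR together the rows of G selected by the message bits
    return [sum(m[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]

codewords = [encode([(w >> i) & 1 for i in range(4)]) for w in range(16)]

d_min = min(sum(c) for c in codewords if any(c))
print(d_min)  # 3
```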
MULTI-ANTENNA CHANNELS
Consider a multiple-input multiple-output (MIMO) channel with N transmit and M receive antennas.

[Figure: Tx with N antennas, channel, Rx with M antennas.]

In the frequency-flat fading case, the signal at the mth receive antenna is

Y_m(t) = ∑_{n=1}^{N} H_mn(t) X_n(t) + Z_m(t),    m = 1, ..., M

where H_mn is the channel coefficient between the mth receive and nth transmit antennas, X_n is the signal sent from the nth transmit antenna, and Z_m is the noise at the mth receive antenna.
MIMO channel
Defining the M × N channel matrix

H = [ H₁₁  H₁₂  ···  H₁N
      H₂₁  H₂₂  ···  H₂N
       ⋮    ⋮    ⋱    ⋮
      H_M1 H_M2 ···  H_MN ]

and the transmit signal, receive signal, and noise column vectors

x = [X₁, ..., X_N]ᵀ,    y = [Y₁, ..., Y_M]ᵀ,    z = [Z₁, ..., Z_M]ᵀ

we can write the system input-output relationship in matrix form as

y = Hx + z
SIMO channel
One particular case of the MIMO channel is the single-input multiple-output (SIMO) channel, with one transmit antenna and N receive antennas.

In the frequency-flat fading case, the signal at the nth receive antenna is

Y_n(t) = H_n(t) X(t) + Z_n(t),    n = 1, ..., N

where H_n is the channel coefficient between the nth receive antenna and the transmit antenna, and X(t) is the signal sent from the transmit antenna.
SIMO channel
Defining the N × 1 channel vector

h = [H₁, ..., H_N]ᵀ

we can write the system input-output relationship in vector form as

y = hX + z
MISO channel
Another particular case of the MIMO channel is the multiple-input single-output (MISO) channel, with N transmit antennas and one receive antenna.

In the frequency-flat fading case, the signal at the receive antenna is

Y(t) = ∑_{n=1}^{N} H_n(t) X_n(t) + Z(t)

where H_n is the channel coefficient between the receive antenna and the nth transmit antenna, and X_n(t) is the signal sent from the nth transmit antenna.
MISO channel
Defining the 1 × N channel row vector

h = [H₁, ..., H_N]

we can write the system input-output relationship in vector form as

Y = hx + Z
Capacity in the case of an informed transmitter
Let us consider the MIMO case assuming that z ∼ NC(0, σ²I). Then, the equation

y = Hx + z

describes a vector Gaussian channel. When the channel is known at the
transmitter, the capacity can be computed by decomposing this channel into a set
of parallel independent scalar Gaussian sub-channels.

Singular value decomposition (SVD) of H:

H = UΛV^H

where the M × M matrix U and the N × N matrix V are unitary, that is,
U^H U = UU^H = I and V^H V = VV^H = I.
SVD for any n ×m matrix A
A = UΛV^H = Σ_i λi ui vi^H

(Figure: block structure of the SVD, showing the dimensions of U, Λ, and V^H for the cases n < m and n > m.)
MIMO capacity (informed transmitter)
Using the SVD of H, the MIMO model equation becomes
y = UΛVHx + z
Multiplying this equation by U^H from the left, and using the unitary property of U,
we have
UHy = ΛVHx + UHz
MIMO capacity (informed transmitter)
Introducing the notation ỹ ≜ U^H y, x̃ ≜ V^H x, z̃ ≜ U^H z, we obtain a system
of parallel Gaussian channels

ỹ = Λx̃ + z̃

where E{z̃z̃^H} = U^H E{zz^H} U = σ²I, and, therefore,

z̃ ∼ NC(0, σ²I)

Moreover,

‖x̃‖² = x^H VV^H x = ‖x‖²

Thus, the power is preserved!
MIMO capacity (informed transmitter)
The system of parallel channels can also be written componentwise:

Ỹi = λi X̃i + Z̃i ,  i = 1, ... , no

where no = min{N, M}. The transition to this equivalent system corresponds to
the pre-processing

x = Vx̃

at the transmitter and the post-processing

ỹ = U^H y

at the receiver. Hence, the pre- and post-processing operators are V and U^H,
respectively.
MIMO capacity (informed transmitter)
To implement the pre-/post-processing operations, the original vector to be
transmitted is x̃. It is pre-processed at the transmitter to obtain

x = Vx̃

The vector x is then sent over the channel. At the receiver (ignoring noise for
clarity), we have

y = Hx = UΛV^H Vx̃ = UΛx̃

and after post-processing we obtain

ỹ = U^H y = U^H UΛx̃ = Λx̃
MIMO capacity (informed transmitter)
The capacity of the resulting system of parallel independent channels is

C = B Σ_{i=1}^{no} log(1 + Pi λi²/σ²)  bits/s

where the Pi are the water-filling power allocations

Pi = (ν − σ²/λi²)⁺

and the water level ν is obtained from the total power constraint Σ_{i=1}^{no} Pi ≤ P.

Each λi corresponds to an eigenmode of the channel, also called an eigenchannel.
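The water-filling allocation above can be sketched numerically. In the following sketch, the bisection search on the water level ν and the example eigenmode gains are illustrative choices, not part of the lecture notes:

```python
import numpy as np

def water_filling(lam2, sigma2, P):
    """Water-filling over eigenmodes with gains lam2 (λi²), noise power sigma2,
    total power P. Returns Pi = max(ν − σ²/λi², 0) with sum(Pi) ≈ P."""
    floors = sigma2 / np.asarray(lam2, dtype=float)   # "vessel floor" σ²/λi²
    lo, hi = floors.min(), floors.max() + P           # bracket for the water level ν
    for _ in range(100):                              # bisection on ν
        nu = 0.5 * (lo + hi)
        if np.maximum(nu - floors, 0.0).sum() > P:
            hi = nu
        else:
            lo = nu
    return np.maximum(nu - floors, 0.0)

lam2 = np.array([4.0, 1.0, 0.25])                     # example eigenmode gains λi²
p = water_filling(lam2, sigma2=1.0, P=3.0)
C = np.log2(1.0 + p * lam2).sum()                     # bits per channel use (B = 1)
```

On these example gains, the weakest eigenmode (λ² = 0.25) ends up with zero power, while the two stronger modes share a common water level ν, as the figure on the water-filling slide suggests.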
Wireless MIMO channel

(Figure: system model with N Tx and M Rx antennas; perfect channel state information is assumed at the transmitter.)

What is the optimum transmission and power allocation scheme if the channel matrix H is known at the transmitter?

Capacity of a MIMO channel: maximize the rate subject to a sum power constraint, imposed by hardware limitations and/or regulations.

Singular value decomposition of the MIMO channel: H = UΛV^H, where U and V are unitary, ui and vi are the left and right singular vectors, and λi is the corresponding singular value (λi ≥ 0).

Decoupling the channels using linear transformations at the transmitter and receiver yields r independent parallel channels, i = 1, ... , r.

Optimization problem: maximize the capacity over the powers pi assigned to the i-th input signal, subject to the sum power constraint.

Water-filling principle

(Figure: water-filling power allocation over the eigenchannels.)
High SNR regime

What are the key parameters that determine the performance?

At high SNR, the water level is high and the policy of allocating equal amounts of
power to each channel is asymptotically optimal. In this case,

C ≃ B Σ_{i=1}^{r} log(1 + Pλi²/(rσ²))
  ≃ B Σ_{i=1}^{r} log(Pλi²/(rσ²))
  ≃ rB log SNR + B Σ_{i=1}^{r} log(λi²/r)  bits/s

where r ≜ rank{H} and SNR = P/σ².
High SNR regime

It can be proved that among the channels with the same power gain, the channels
whose singular values are equally spread result in the highest capacity.

This means that well-conditioned channel matrices are preferable in the high-SNR
regime.
Low SNR regime

What are the key parameters that determine the performance?

In this regime, the optimal policy is to allocate all power to the strongest
eigenmode:

C ≃ B log(1 + Pλmax²/σ²)

and ill-conditioned (rank-one) channel matrices are preferable.

Using the property log(1 + x) ≃ x log e, valid for x ≪ 1, we have

C ≃ B Pλmax² log e / σ²
MIMO capacity (uninformed transmitter)
Let us now obtain the MIMO channel capacity based on general considerations
assuming that H is fixed, while the other values (x, y and z) are random. In such
a case, no assumption on the channel knowledge at the transmitter is used at this
time, but the receiver is assumed to know H.
Capacity via mutual information:

C = max_{p(x)} I(x; y) = max_{p(x)} [H(y) − H(y|x)]

The output covariance matrix is given by

R = E{yy^H} = HPH^H + σ²I

where

P ≜ E{xx^H}
MIMO capacity (uninformed transmitter)
Result: (Telatar, 1995; Foschini and Gans, 1998): Consider the model
y = Hx + z
where x ∼ NC (0, P), z ∼ NC (0,σ2I), and H is fixed. Let B be the channel
bandwidth in Hz. Then, the MIMO channel capacity is equal to
C = B log det{I + (1/σ²) HPH^H}  bits/s
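Telatar's log-det formula is straightforward to evaluate numerically. The sketch below uses `numpy.linalg.slogdet` for numerical robustness; the dimensions, input covariance, and σ² are example values chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, sigma2, B = 4, 4, 1.0, 1.0
# example i.i.d. complex Gaussian channel realization (fixed, known at the receiver)
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
P = (1.0 / N) * np.eye(N)         # example input covariance with tr(P) = 1

# C = B log det(I + H P H^H / σ²); slogdet returns (sign, log_e |det|)
sign, logdet = np.linalg.slogdet(np.eye(M) + H @ P @ H.conj().T / sigma2)
C = B * logdet / np.log(2)        # convert nats to bits
```

Since I + HPH^H/σ² is Hermitian positive definite, the determinant is real and positive, so `sign` comes out as 1 and `C` is a well-defined positive rate.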
Result 1
Let X1, ... , Xn have a multivariate complex circular Gaussian distribution with the
mean µ and covariance matrix P:
fX(x) = (1/(πⁿ det{P})) exp(−(x − µX)^H P⁻¹ (x − µX))

Then

H(X) = H(X1, ... , Xn) = log((πe)ⁿ det{P})

Proof (working in nats):

H(X) = ∫ fX(x) (x − µX)^H P⁻¹ (x − µX) dx + ln(πⁿ det{P})
Result 1 (proof)
= E{(x − µX)^H P⁻¹ (x − µX)} + ln(πⁿ det{P})
= E{tr(P⁻¹ (x − µX)(x − µX)^H)} + ln(πⁿ det{P})
= tr(P⁻¹ E{(x − µX)(x − µX)^H}) + ln(πⁿ det{P})
= tr(P⁻¹ P) + ln(πⁿ det{P})
= n + ln(πⁿ det{P})
= ln((πe)ⁿ det{P})  nats
= log((πe)ⁿ det{P})  bits
Result 2
Let the random vector x ∈ Cⁿ have zero mean and covariance E{xx^H} = P. Then

H(X) = H(X1, ... , Xn) ≤ log((πe)ⁿ det{P})

with equality if and only if X ∼ NC(0, P).

Proof: Let g(x) be a pdf with covariance [P]ij = ∫ g(x) xi xj* dx, and let φP(x) be
the complex circular Gaussian pdf NC(0, P).

Note that the logarithm of the complex circular Gaussian pdf,
log φP(x) ∼ −x^H P⁻¹ x, is a quadratic form in x.
Result 2 (proof)
Then the Kullback-Leibler distance D(g(x)‖φP(x)) between the two pdfs is given as

0 ≤ D(g(x)‖φP(x)) = ∫ g(x) log(g(x)/φP(x)) dx
                  = −Hg(X) − ∫ g(x) log φP(x) dx
                  = −Hg(X) − ∫ φP(x) log φP(x) dx
                  = −Hg(X) + HφP(X)

⇒ HφP(X) ≥ Hg(X)

Here g can be replaced by φP in the second integral because log φP(x) is a
quadratic form, so the integral depends only on the second moments of X, which are
the same for both pdfs.

The Gaussian distribution maximizes the entropy over all distributions with the
same covariance.
Proof of MIMO capacity result
It has been shown that among all random vectors with a given covariance matrix R, the
entropy of y is maximized when y is a zero-mean circularly symmetric complex
Gaussian vector. Since y = Hx + z, this is the case exactly when the input vector x is
zero-mean circularly symmetric complex Gaussian; therefore, this is the optimal
distribution for X.
Using these facts, the capacity formula can be proved by obtaining explicit
expressions for H(Y) and H(Y|X).
Proof of MIMO capacity result
Recall the signal model:
Y = HX + Z
Then the mutual information between X and Y is given as
I(X; Y) = H(Y) − H(Y|X)
        = H(Y) − H(HX + Z|X)
        = H(Y) − H(HX|X) − H(Z|X)     (H(HX|X) = 0, since HX is determined by X)
        = H(Y) − H(Z)
Proof of MIMO capacity result
From Result 2 we know that the entropy H(Y) is maximized for the complex
circular Gaussian input distribution, thus
max_{all pdfs with covariance R} I(X; Y) = max_{all pdfs with covariance R} H(Y) − H(Z)
  = log{(πe)^M det(HPH^H + σ²I)} − log{(πe)^M det(σ²I)}
  = log det(I + (1/σ²) HPH^H)  bits per channel use
Transition to the classic Shannon capacity result

Assuming a single-input single-output (SISO) system with N = M = 1 and
constant channel gain H, transmitting with power P, we have

H = H,  P = P,  I = 1

and, therefore,

C = B log det{I + (1/σ²) HPH^H} = B log{1 + |H|²P/σ²}

This is the classical Shannon capacity formula for a bandlimited channel!
Channel known at the transmitter
If the channel matrix H is known at the transmitter, then in general unequal
powers should be chosen, so P is not a scaled identity matrix.
Eigenchannels and water-filling power allocation should be used, as discussed
above.
Channel unknown at the transmitter
If the channel matrix H is unknown at the transmitter, then it follows from
symmetry that P should be a scaled identity matrix. Using the power
constraint
tr{P} = P
we obtain that P has to be chosen as
P = (P/N)I
Indeed, the power constraint is satisfied because
tr{P} = tr {(P/N)I} = (P/N)tr{I} = P
Channel unknown at the transmitter
Choosing P = (P/N)I, we obtain that the MIMO capacity in the uninformed
transmitter case is given by
C = B log det{I + (P/(σ²N)) HH^H}

Assuming that, although fixed, the entries of H are statistically independent
random values with unit variance, and using the law of large numbers, we
obtain that for a large number of transmit antennas and a fixed number of receive
antennas

(1/N) HH^H → I
Channel unknown at the transmitter
Using the latter property, we obtain that for large N,

C = B log det{(1 + P/σ²) I}
  = B log{(1 + P/σ²)^M}
  = MB log(1 + P/σ²)

which is M times the SISO Shannon capacity!
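This large-N behaviour can be checked by simulation. In the sketch below, M, the SNR, and the N values are example choices; with growing N, the sample matrix HH^H/N approaches the identity and the capacity approaches M times the SISO Shannon capacity:

```python
import numpy as np

rng = np.random.default_rng(2)
M, snr, B = 2, 10.0, 1.0                  # example: M receive antennas, P/σ² = 10

for N in (10, 1000):
    # unit-variance i.i.d. complex Gaussian entries
    H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
    G = H @ H.conj().T / N                # law of large numbers: G -> I as N grows
    _, logdet = np.linalg.slogdet(np.eye(M) + (snr / N) * (H @ H.conj().T))
    C = B * logdet / np.log(2)            # uninformed-transmitter capacity, bits

limit = M * B * np.log2(1 + snr)          # M times the SISO Shannon capacity
```

After the loop, `G` and `C` hold the values for N = 1000, which should already lie close to the asymptotic limit.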
Parallel SISO channel interpretation
Consider the general MIMO channel capacity formula. Let the eigendecomposition
of the positive semi-definite Hermitian matrix HPHH be
HPH^H = Σ_{i=1}^{r} λi ui ui^H = UΛU^H

where U^H U = I, and r ≜ rank{HPH^H}. The matrices U and Λ should not be
confused with those of the SVD of the matrix H used earlier!
We will use the property
det{I + AB} = det{I + BA}
valid for any matrices A and B of conformable dimensions.
Parallel SISO channel interpretation
Setting A = U and B = ΛU^H, we obtain

C = B log det{I + (1/σ²) HPH^H}
  = B log det{I + (1/σ²) UΛU^H}
  = B log det{I + (1/σ²) ΛU^H U}
  = B log det{I + (1/σ²) Λ}
  = B log{ Π_{i=1}^{r} (1 + λi/σ²) }
  = B Σ_{i=1}^{r} log{1 + λi/σ²}
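The equivalence between the log-det capacity and the sum of parallel SISO capacities can be verified numerically. The matrices below are arbitrary examples, chosen only to exercise the identity:

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, sigma2 = 3, 4, 0.5
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
P = np.diag([0.4, 0.3, 0.2, 0.1])          # example diagonal transmit covariance

S = H @ P @ H.conj().T                     # positive semi-definite Hermitian matrix
lam = np.linalg.eigvalsh(S)                # its eigenvalues λi

# B log det(I + S/σ²) equals the sum of parallel SISO terms B Σ log(1 + λi/σ²)
_, logdet = np.linalg.slogdet(np.eye(M) + S / sigma2)
C_logdet = logdet / np.log(2)              # bits per channel use, B = 1
C_parallel = np.log2(1 + lam / sigma2).sum()
```

Both expressions agree to machine precision, since the determinant of a Hermitian matrix is the product of its eigenvalues.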
Parallel SISO channel interpretation
The latter formula interprets the capacity of the MIMO channel as the sum of
capacities of r parallel SISO channels.
Assuming the case of uninformed transmitter (P = (P/N)I), r can be interpreted
as the rank of H → full rank channels are preferable!
If H is drawn randomly, then almost surely

rank{H} = min{M, N}
This leads us to the conclusion that the capacity grows nearly proportionally to
min{M, N}.
Assume M = N and let the Frobenius norm of H be given. What type of channel
will maximize the MIMO capacity?
Result: The capacity is maximized in the case when H is orthogonal:
HHH = HHH = ζI
where ζ is a constant. In this case,
C = B log det{(1 + Pζ/(σ²N)) I}
  = B log(1 + Pζ/(σ²N))^N
  = NB log(1 + Pζ/(σ²N))
SIMO channel capacity
Consider a SIMO column-vector channel h with one transmit and N receive
antennas. The capacity formula becomes
C = B log det{I + (P/σ²) hh^H}
  = B log(1 + (P/σ²) h^H h)
  = B log(1 + (P/σ²) ‖h‖²)

Hence, the SIMO channel comprises only one spatial data pipe. The addition of
receive antennas yields only a logarithmic (rather than linear) increase in capacity.
MISO channel capacity
Consider a MISO row-vector channel h with one receive and N transmit antennas.
The capacity formula becomes
C = B log(1 + (1/σ²) hPh^H)
  = B log(1 + (1/σ²) ‖hP^{1/2}‖²)

The situation is similar to that in the SIMO case. The increase in capacity is only
logarithmic (rather than linear).
Ergodic MIMO channel capacity
The channel matrix H is no longer fixed, but is treated as random. The capacity
formula can be averaged over H:
E_H{C} = B E_H[log det{I + (1/σ²) HPH^H}]

Result (Telatar, 1999): Let H be a Gaussian random matrix with i.i.d. elements.
Then, the average capacity is maximized subject to the power constraint
tr{P} ≤ P when

P = (P/N) I
That is, to maximize the average capacity, the antennas should transmit
uncorrelated streams with the same power – an intuitively appealing fact.
Ergodic MIMO channel capacity: Proof (sketch)

Let H be a Gaussian random matrix with i.i.d. elements. Then

C = max_{P: tr{P}≤P} E_H[log det{I + (1/σ²) HPH^H}]

Decompose P = ΔP + Poff into its diagonal part ΔP and its off-diagonal part Poff. Then

C = max_{P: tr{P}≤P} E_H[log det{I + (1/σ²) HΔPH^H + (1/σ²) HPoffH^H}]
  = max_{ΔP: tr{ΔP}≤P} E_H[log det{I + (1/σ²) HΔPH^H + (1/σ²) HPoffH^H}]
  ≤ max_{ΔP: tr{ΔP}≤P} log det{E_H[I + (1/σ²) HΔPH^H + (1/σ²) HPoffH^H]}

where the last inequality follows from Jensen's inequality, since log det is concave.
Ergodic MIMO channel capacity: Proof (sketch)

C ≤ max_{ΔP: tr{ΔP}≤P} log det{E_H[I + (1/σ²) HΔPH^H + (1/σ²) HPoffH^H]}
  = max_{ΔP: tr{ΔP}≤P} log det{E_H[I + (1/σ²) HΔPH^H] + E_H[(1/σ²) HPoffH^H]}

where the last term in the second equation is identically zero due to the statistical
independence of the entries of H.

We conclude that restricting the transmit covariance to the diagonal
structure P = ΔP does not reduce the achievable capacity.
Ergodic MIMO channel capacity:Proof (sketch)
Thus,

C = max_{ΔP: tr{ΔP}≤P} E_H[log det{I + (1/σ²) HΔPH^H}]

Due to the i.i.d. property of H, the objective function is symmetric with respect to
the powers P1, ... , PN: exchanging their order does not change the function value.
Furthermore, the function is concave.

We conclude that the optimal power allocation strategy in this case is to distribute
the power equally among the transmitted symbols, i.e., to choose
P1 = P2 = ... = PN.
Ergodic MIMO channel capacity
Note that the latter choice of P coincides with our earlier choice of this matrix in
the case of fixed channel and uninformed transmitter.
Choosing P = (P/N) I, the maximal average capacity (which is commonly
referred to as ergodic capacity) becomes
CE = B E_H[log det{I + (P/(σ²N)) HH^H}]

Ergodic capacity has an important advantage over fixed-channel capacity, as it
gives an average rather than an instantaneous picture.
Ergodic MIMO channel capacity
Using the parallel SISO channel interpretation and denoting the singular values of
H as γi , we obtain
CE = B E_H[ Σ_{i=1}^{r} log{1 + Pγi²/(σ²N)} ]
   = B Σ_{i=1}^{r} E_H[ log{1 + Pγi²/(σ²N)} ]

Please note the difference from the water-filling capacity: in the
latter expression, equal powers are used for each eigenchannel.
Large antenna regime
Let us denote SNR = P/σ². Then, the capacity formula becomes

CE = B Σ_{i=1}^{r} E_H[log{1 + SNR γi²/N}]

Assume M = N and i.i.d. Rayleigh fading. Then, using random matrix theory, it
can be shown that for any SNR

lim_{N→∞} CE/N = const

Therefore, the capacity grows linearly in N at any SNR in this asymptotic regime!
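A small Monte Carlo experiment illustrates the linear growth: the per-antenna ergodic capacity CE/N stays roughly constant as N increases. The SNR, trial count, and antenna numbers below are example values for this sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
snr, trials = 10.0, 200                    # example SNR = P/σ² and sample size

def ergodic_capacity(N, snr, trials, rng):
    """Monte Carlo estimate of CE = E[log2 det(I + (SNR/N) HH^H)], M = N, B = 1."""
    acc = 0.0
    for _ in range(trials):
        H = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
        _, logdet = np.linalg.slogdet(np.eye(N) + (snr / N) * (H @ H.conj().T))
        acc += logdet / np.log(2)
    return acc / trials

per_antenna = [ergodic_capacity(N, snr, trials, rng) / N for N in (4, 8, 16)]
```

The three per-antenna values should be close to each other, in line with the limit CE/N → const.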
Outage capacity
A value Cout that exceeds the capacity C in pout percent of channel
realizations. In other words,

Pr(Cout > C) = pout

If one wants to transmit at Cout bits per second, then the channel capacity is
less than Cout with probability pout. Hence, transmission is impossible (the
system is in outage) during pout · 100 percent of the time.

Alternatively, we can write

Pr(Cout ≤ C) = 1 − pout

and, hence, during (1 − pout) · 100 percent of the time transmission is possible, as
the system is not in outage.
Outage capacity
1− pout is called non-outage probability.
Using the instantaneous MIMO capacity formula, we can define the MIMO outage
capacity by means of the following expression:

min_{tr{P}≤P} Pr(Cout > B log det{I + (1/σ²) HPH^H}) = pout

where we additionally use the opportunity to minimize the outage probability
through a proper choice of P. This particular choice, of course, depends on the
statistics of the random channel matrix H.
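For the uninformed-transmitter choice P = (P/N)I, the outage capacity Cout is simply the pout-quantile of the instantaneous-capacity distribution, which is easy to estimate by Monte Carlo. All parameters below are example values for this sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
M = N = 2
snr, trials, p_out = 10.0, 5000, 0.1       # example SNR, sample size, outage level

# Instantaneous capacities (bits/s/Hz, B = 1) with P = (P/N)I, i.i.d. Rayleigh H
caps = np.empty(trials)
for t in range(trials):
    H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
    _, logdet = np.linalg.slogdet(np.eye(M) + (snr / N) * (H @ H.conj().T))
    caps[t] = logdet / np.log(2)

# Cout is exceeded by the instantaneous capacity in (1 - p_out) of realizations
C_out = np.quantile(caps, p_out)
```

By construction, roughly a fraction p_out of the sampled channel realizations fall below C_out, matching Pr(Cout > C) = pout.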
Example: Rayleigh fading channel
In Rayleigh fading, the channel coefficients are circularly symmetric complex
Gaussian with zero mean and unit variance: a) known channel at the transmitter;
b) unknown channel at the transmitter.

(Figure: outage capacity in bits/s/Hz versus SNR in dB for pout = 0.01, 0.1, and 0.5, for cases a) and b).)
MULTIUSER CHANNELS
Why multiuser channels:
I Up to now, we have considered point-to-point communications links.
I Most of communication systems serve multiple users. Therefore, multiuser
channels are of great interest.
I In multiuser channels, one user can interfere with another user. This type of
interference is called multiuser interference (MUI).
Common multiuser channel types:
I Multiple-access channels
I Broadcast channels
I Relay channels
Multiple-access channel
Broadcast channel
Relay channel
Multiple-access channels
Two-user multiple-access Gaussian channel:
Y (i) = X1(i) + X2(i) + Z (i), Z (i) ∼ NC (0,σ2)
In the point-to-point (single user) case, the rate limit is the channel capacity. The
achievable rate region is, therefore, given by:
R < B log(1 + P/σ²)

In the two-user case, we extend this concept to a capacity region C, which is the
set of all pairs (R1, R2) such that users 1 and 2 can simultaneously reliably
communicate at rates R1 and R2, respectively.
Multiple-access channels
Since the two users share the same bandwidth, there is a tradeoff between the
rates R1 and R2: if one user wants to communicate at a higher rate, then the
other user may need to lower its rate.
Example of tradeoff: In orthogonal multiple access schemes such as OFDM, the
tradeoff can be achieved by varying the number of subcarriers allocated to each
user.
Rate region
Different scalar performance measures can be obtained from the capacity region:
I The symmetric capacity

Csym = max_{(R,R)∈C} R

is the maximum common rate at which both users can simultaneously reliably
communicate.
I The sum capacity

Csum = max_{(R1,R2)∈C} (R1 + R2)

is the maximum total throughput that can be achieved.
Rate region
If we have two users with the powers P1 and P2, then the capacity region for the
two-user channel is defined by the following inequalities:
R1 < B log(1 + P1/σ²)
R2 < B log(1 + P2/σ²)
R1 + R2 < B log(1 + (P1 + P2)/σ²)

The first two constraints say that the rate of each individual user cannot exceed
the capacity of the point-to-point link with the other user absent.

The last constraint says that the total throughput cannot exceed the capacity of
a point-to-point link whose transmit power is the sum of the powers of the two
users.
Rate region
That is, not only the rates R1 and R2 are limited, but their sum is limited as well.
This means that the signal of each user may be viewed as interference for the
other user.
Result: The two-user capacity region is a pentagon.
Rate region: multiple-access channel
Rate region: multiple-access channel
Remark: Surprisingly, user 1 can achieve its single-user rate bound
R1 = B log(1 + P1/σ²) while, at the same time, user 2 can get a non-zero rate as
high as R2 = B log(1 + P2/(P1 + σ²)). This corresponds to point A of the capacity
region plot. Indeed,

R1 + R2 = B log((1 + P1/σ²)(1 + P2/(P1 + σ²)))
        = B log(1 + P1/σ² + P2/(P1 + σ²) + P1P2/(σ²(P1 + σ²)))
        = B log(1 + (P1² + P1σ² + P2σ² + P1P2)/(σ²(P1 + σ²)))
        = B log(1 + (P1 + P2)/σ²)
Successive interference cancellation

How can this rate pair be achieved? Each user should encode its data using a
capacity-achieving channel code. The receiver should decode the information of
both users in two stages:
I In the first stage, the data of user 2 are decoded, treating user 1 as AWGN.
The maximum rate user 2 can then achieve is R2 = B log(1 + P2/(P1 + σ²)).
I In the second stage, the reconstructed (decoded) signal of user 2 is
subtracted from the aggregate received signal, and then the data of user 1
are decoded. Since the signal of user 2 has already been subtracted and only
the background AWGN is left in the system, the achieved rate of user 1 is
R1 = B log(1 + P1/σ²).

This two-stage decoding is called successive interference cancellation.
Successive interference cancellation
If one reverses the order of cancellation then one can achieve point B rather than
A.
All other rate points on the segment AB can be obtained by time-sharing between
the multiple-access strategies of points A and B.
The segment AB contains all the optimal operating points of the channel, in the
sense that any point in the capacity region is dominated by some point on AB.
That is, for any point within the capacity region that corresponds to the rates R1*
and R2*, we can always find a point on the segment AB whose rates R1 and R2
satisfy

R1* ≤ R1,  R2* ≤ R2
Pareto-optimal
The points on the segment AB are called Pareto-optimal.
One can always increase the user rates to move to a point on the segment AB,
and there is no reason not to do this.
The concrete choice of the point on AB depends on our particular objectives:
I To maximize the sum capacity Csum, any point on AB is equally good. Note
that we have already computed the sum of R1 and R2 at point A. Hence,

Csum = B log(1 + (P1 + P2)/σ²)

I To maximize the symmetric capacity Csym, we should take the point on AB
that gives equal rates R1 and R2.
I Some operating points on AB may be unfair, especially if the received
power of one user is much higher than that of the other user. In this case, we
should consider operating at the corner point at which the stronger user is
decoded first.
How does a system with successive cancellation compare to
a standard CDMA system in terms of achievable rate?
The principal difference between CDMA detection and successive cancellation
detection is that:
I In the CDMA system, each user is decoded treating other users as
interference. This corresponds to the single-user receiver principle and we
immediately conclude that the performance of the CDMA system is
suboptimal; i.e., it achieves the point which is strictly in the interior of the
capacity region.
I In contrast to CDMA, the successive cancellation receiver is a multiuser
receiver: only one of the users (say, user 1) is decoded treating user 2 as
interference, but user 2 is decoded with the benefit of the signal of user 1
being already removed.
In the successive cancellation receiver case,

R1 = B log(1 + P1/σ²),  R2 = B log(1 + P2/(P1 + σ²))

or

R1 = B log(1 + P1/(P2 + σ²)),  R2 = B log(1 + P2/σ²)

In the CDMA receiver case,

R1 = B log(1 + P1/(P2 + σ²)),  R2 = B log(1 + P2/(P1 + σ²))

That is, one of the rates in the CDMA case is always lower than in the case of
successive cancellation!
Correspondingly, in the successive cancellation receiver case,

Csum = B log(1 + (P1 + P2)/σ²)

In the CDMA receiver case, the sum rate is

B log(1 + P1/(P2 + σ²)) + B log(1 + P2/(P1 + σ²))
  = B log((1 + P1/(P2 + σ²))(1 + P2/(P1 + σ²)))
  = B log(1 + (P1 + P2)/σ² − P1P2(P1 + P2 + σ²)/(σ²(P1 + σ²)(P2 + σ²)))
  < Csum
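The gap between the two receivers, including the CDMA penalty term on this slide, can be checked numerically; B, σ², P1, and P2 are example values:

```python
import numpy as np

B, sigma2, P1, P2 = 1.0, 1.0, 3.0, 2.0            # example parameters

# Successive interference cancellation achieves the sum-rate bound
C_sic = B * np.log2(1 + (P1 + P2) / sigma2)

# CDMA: each user decoded treating the other one as noise
C_cdma = (B * np.log2(1 + P1 / (P2 + sigma2))
          + B * np.log2(1 + P2 / (P1 + sigma2)))

# penalty term of the slide's sum-rate expression
penalty = P1 * P2 * (P1 + P2 + sigma2) / (sigma2 * (P1 + sigma2) * (P2 + sigma2))
```

The CDMA sum rate matches the closed-form expression with the penalty term and is strictly below the SIC sum capacity.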
K-user multiple-access Gaussian channel

Y(i) = Σ_{k=1}^{K} Xk(i) + Z(i),  Z(i) ∼ NC(0, σ²)
Similar to the two-user case, in the case of K users, all of them share the same
bandwidth, and there is a tradeoff between the rates Rk (k = 1, 2, ... , K ). If one
(or more) users want to communicate at higher rate(s), then the other user(s)
may need to lower their rate(s).
In the K-user case, we can define the capacity region C as the set of all rate tuples
(R1, R2, ... , RK) such that users 1, 2, ... , K can simultaneously reliably
communicate at rates R1, R2, ... , RK, respectively.

This capacity region is described by 2^K − 1 constraints:

Rk < B log(1 + Pk/σ²),  k = 1, ... , K
Rk + Ri < B log(1 + (Pk + Pi)/σ²),  k, i = 1, ... , K
Rk + Ri + Rl < B log(1 + (Pk + Pi + Pl)/σ²),  k, i, l = 1, ... , K
· · ·
Σ_{k=1}^{K} Rk < B log(1 + (Σ_{k=1}^{K} Pk)/σ²)
K-user multiple-access Gaussian channel

The K-user capacity region can be written in short form as

Σ_{k∈S} Rk < B log(1 + (Σ_{k∈S} Pk)/σ²)   for all S ⊂ {1, ... , K}

The right-hand side

B log(1 + (Σ_{k∈S} Pk)/σ²)

is the maximum sum rate that can be achieved by a single transmitter with the
total power of the users in S and with no other users in the system.
The sum capacity can be defined as

Csum = max_{(R1,...,RK)∈C} Σ_{k=1}^{K} Rk

It can be shown that

Csum = B log(1 + (Σ_{k=1}^{K} Pk)/σ²)

and that there are exactly K! corner points in the capacity region, each one
corresponding to a different successive cancellation order among the users.
In the equal power case (P1 = P2 = · · · = PK = P),

Csum = B log(1 + KP/σ²)

Observe that the sum capacity is unbounded as the number of users grows. In
contrast, with the conventional CDMA receiver (decoding each user while treating
all the other users as noise), the sum rate is only

BK log(1 + P/((K − 1)P + σ²))

which approaches

BKP/((K − 1)P + σ²) · log e ≃ B log e

as K → ∞. The growing interference is the limiting factor here!
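The contrast between the unbounded SIC sum capacity and the saturating CDMA sum rate can be tabulated; B, σ², and P below are example values:

```python
import numpy as np

B, sigma2, P = 1.0, 1.0, 2.0                      # example parameters

def cdma_sum_rate(K):
    """CDMA: each of the K users treats the other K-1 users as noise."""
    return B * K * np.log2(1 + P / ((K - 1) * P + sigma2))

def sic_sum_rate(K):
    """Successive cancellation achieves the K-user sum-capacity bound."""
    return B * np.log2(1 + K * P / sigma2)

rates = {K: (sic_sum_rate(K), cdma_sum_rate(K)) for K in (2, 10, 100, 1000)}
limit = B * np.log2(np.e)                         # CDMA ceiling B log e as K -> ∞
```

As K grows, the SIC sum rate keeps increasing like log K, while the CDMA sum rate flattens out near B log e ≈ 1.44 bits.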
The symmetric capacity can be defined as

Csym = max_{(R,R,...,R)∈C} R

It can be shown that in the equal power case (P1 = P2 = · · · = PK = P),

Csym = (B/K) log(1 + KP/σ²)

This rate can be achieved for each user by orthogonal multiplexing, where each
user is allocated a fraction 1/K of the total degrees of freedom (for example, of
the total bandwidth B).

Note that Csym = Csum/K.
Broadcast channels
Two-user broadcast AWGN channel:

Yk(i) = hk X(i) + Zk(i),  k = 1, 2;  Zk(i) ∼ NC(0, σ²)

where hk is the fixed complex channel gain corresponding to the kth user.

The broadcast case is often referred to as the downlink.

The transmit power constraint: the average power of the transmit signal is P.

As in the multiple-access (uplink) channel case, we can define the capacity region C
as the region of rates (R1, R2) at which both users can simultaneously reliably
communicate.
Broadcast channels
We have just two single-user bounds:

Rk < B log(1 + P|hk|²/σ²),  k = 1, 2

For each k, this upper bound on Rk can be attained by using all the transmit
power to communicate with user k (with the rate of the remaining user being zero).
Thus, we have two extreme points:

R1 = B log(1 + P|h1|²/σ²),  R2 = 0
R2 = B log(1 + P|h2|²/σ²),  R1 = 0
Rate region in the symmetric case |h1| = |h2|
Further, we can share the degrees of freedom (time and bandwidth) between the
users in an orthogonal manner to obtain any rate pair on the line joining these two
extreme points.

Hence, for the symmetric case |h1| = |h2|, the capacity region is a triangle.
Rate region in the symmetric case |h1| = |h2|
In the symmetric case |h1| = |h2| ≜ |h|, the sum rate can be shown to be bounded
by the single-user capacity:

R1 + R2 < B log(1 + P|h|²/σ²)

The latter conclusion follows from the triangular form of the capacity region.
As already mentioned, the rate pairs in the capacity region can be
achieved by sharing the degrees of freedom (bandwidth and time) between the
two users. What are the alternative ways to achieve the boundary of the capacity
region?

The structure of the channel suggests an alternative natural approach:
I Let the channel of user 2 be stronger than that of user 1 (|h1| < |h2|).
Then, if user 1 can successfully decode its data from Y1, user 2 (which
has the higher SNR) should also be able to decode the data of user 1 from Y2.
User 2 can then subtract the data of user 1 from its received signal Y2 to
better decode its own data; i.e., it can perform successive interference
cancellation.
Consider the following transmission strategy that superposes the signals of two
users, much like in a spread-spectrum CDMA system. The transmitted signal is
the sum of two signals:
X (i) = X1(i) + X2(i)
where Xk(i) is the signal intended for user k.
Superposition coding
The weaker user 1 decodes its own signal by treating the signal of user 2 as noise.
The stronger user 2 performs successive interference cancellation: it first decodes
the data of user 1 by treating X2 as noise, subtracts the so-determined signal of
user 1 from Y2, and then extracts its own data. As a result, for any power split
P = P1 + P2, the following rate pair can be achieved:

R1 = B log(1 + P1|h1|²/(P2|h1|² + σ²))
R2 = B log(1 + P2|h2|²/σ²)

This strategy is commonly referred to as superposition coding.
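Sweeping the power split P = P1 + P2 traces out the superposition-coding rate pairs. The channel gains, total power, and noise level below are example values for this sketch:

```python
import numpy as np

B, sigma2, P = 1.0, 1.0, 4.0
h1, h2 = 0.5, 1.0                         # example gains, |h1| < |h2|: user 1 is weaker

def superposition_rates(P1):
    """Rate pair achieved by superposition coding for the power split P = P1 + P2."""
    P2 = P - P1
    g1, g2 = abs(h1) ** 2, abs(h2) ** 2
    R1 = B * np.log2(1 + P1 * g1 / (P2 * g1 + sigma2))   # weak user: X2 is noise
    R2 = B * np.log2(1 + P2 * g2 / sigma2)               # strong user: after SIC
    return R1, R2

pairs = [superposition_rates(P1) for P1 in np.linspace(0.0, P, 9)]
```

The endpoints P1 = 0 and P1 = P recover the two single-user extreme points, and moving power toward user 1 monotonically trades R2 for R1.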
Orthogonal scheme
On the other hand, in orthogonal schemes, for any power split P = P1 + P2 and
degree-of-freedom split α ∈ [0, 1], the following rates are jointly achieved
R1 = α B log(1 + P1|h1|² / (α σ²))
R2 = (1 − α) B log(1 + P2|h2|² / ((1 − α) σ²))
Here, α can be interpreted, for example, as the fraction of bandwidth assigned to user 1 (both the bandwidth B and the noise power are then reduced by the factor α). Alternatively, α can be interpreted as the fraction of time assigned to user 1 (the signal power P1 is then consumed only during this fraction α of the time).
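The orthogonal rate pair can be evaluated in the same way; a minimal sketch with an illustrative function name, again assuming base-2 logarithms:

```python
import math

def orthogonal_rates(P1, P2, h1, h2, B, sigma2, alpha):
    """Rate pair of an orthogonal scheme (e.g. FDMA): user 1 gets the
    fraction alpha of the degrees of freedom and power P1, user 2 the rest."""
    R1 = alpha * B * math.log2(1 + P1 * abs(h1)**2 / (alpha * sigma2))
    R2 = (1 - alpha) * B * math.log2(1 + P2 * abs(h2)**2 / ((1 - alpha) * sigma2))
    return R1, R2
```

In the symmetric case with the split α = P1/P, this scheme also attains the sum capacity, consistent with the rate-region discussion below.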
Rate region in the symmetric case |h1| = |h2| = |h|
Assume that superposition coding is used and that the power is split such that P1 + P2 ≤ P. In this case, if user 1 can decode its data treating the data of user 2 as noise, then user 2 can also decode the data of user 1, subtract it from its received signal, and decode its own data. Hence, the following rate pairs are supported.
R1 ≤ B log(1 + P1|h1|² / (P2|h1|² + σ²)) = B log(1 + (P1 + P2)|h1|² / σ²) − B log(1 + P2|h1|² / σ²)
R2 ≤ B log(1 + P2|h2|² / σ²)
Thus, for |h1| = |h2| = |h| and the power constraint P1 + P2 ≤ P, the sum capacity is given by
R1 + R2 ≤ B log(1 + P|h|² / σ²)
Rate region in the general case |h1| ≤ |h2|
Solid line: optimal power split using superposition coding.
Dashed line: optimal degrees of freedom split using orthogonal coding.
In the K-user broadcast case, the boundary of the capacity region can be proved to be given by
Rk = log(1 + Pk|hk|² / (σ² + (Σ_{l=k+1}^K Pl)|hk|²)),  k = 1, ..., K
for all possible power splits P = Σ_{k=1}^K Pk of the total power at the base station.
The optimal points are achieved by superposition coding and successive
interference cancellation at the receivers. The cancellation order at every receiver
should always be to decode the weaker users before decoding its own data.
Fading channels
Until now, all multi-user channels have been considered without random channel
fading.
Let us now include fading in the signal model. The availability of channel state information becomes a critical issue in such cases.
Multiple-access fading channels
K -user multiple-access fading channel:
Y(i) = Σ_{k=1}^K hk(i) Xk(i) + Z(i)
where {hk(i)} is the random fading process of user k .
We assume that
E{|hk(i)|²} = 1,  k = 1, ..., K
and that the fading processes of different users are i.i.d.
Slow fading
The time-scale of communication is short relative to the channel coherence time
of all users. Hence, hk(i) = hk for all i.
Suppose all users transmit at the rate R. Conditioned on each realization of
h1, ... , hK , we have the standard multiple-access AWGN channel with received
SNR of user k equal to |hk |2P/σ2. If the symmetric capacity is less than R, then
this results in outage. Using the expressions for the K -user capacity region, the
outage probability can be written as
pout = Pr{ B log(1 + SNR Σ_{k∈S} |hk|²) < |S| R  for some S ⊂ {1, ..., K} }
where |S| denotes the cardinality of S and SNR = P/σ2.
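The outage condition can be checked for a given channel realization by enumerating all user subsets S; a small sketch with an illustrative function name, assuming base-2 logarithms:

```python
import math
from itertools import combinations

def in_outage(h, R, B, snr):
    """Slow-fading MAC outage test for one realization h = [h_1, ..., h_K],
    all K users transmitting at rate R: outage occurs if the capacity
    constraint fails for some subset S of users."""
    K = len(h)
    for size in range(1, K + 1):
        for S in combinations(range(K), size):
            if B * math.log2(1 + snr * sum(abs(h[k])**2 for k in S)) < size * R:
                return True
    return False
```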
Fast fading
Each hk(i) is modelled as a time-varying ergodic process.
The sum capacity in the fast fading case:
Csum = E{ B log(1 + Σ_{k=1}^K |hk|² P / σ²) }
How does this compare to the sum capacity of the uplink channel without fading?
Let us use Jensen’s inequality, which states that
E{f(X)} ≤ f(E{X})
for any concave function f(·) and random variable X.
Using this inequality, we obtain that
Csum = E{ B log(1 + Σ_{k=1}^K |hk|² P / σ²) }
     ≤ B log(1 + E{ Σ_{k=1}^K |hk|² } P / σ²)
     = B log(1 + K P / σ²)
where the property E{|hk(i)|²} = 1 (k = 1, ..., K) has been used in the last line.
The last expression can be identified as the sum capacity of the AWGN
multiple-access channel. Hence, without channel state information at the
transmitter, fading can only hurt.
However, if the number of users K becomes large, then
Σ_{k=1}^K |hk|² → K
and the penalty due to fading vanishes. Basically, the effect of fading is averaged
over a large number of users.
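Both effects, the Jensen penalty and its vanishing for large K, can be illustrated by simulation; a sketch assuming Rayleigh fading (so |hk|² is exponential with unit mean), base-2 logarithms, and an illustrative function name:

```python
import math
import random

def fast_fading_sum_capacity(K, snr, B=1.0, trials=20000, seed=1):
    """Monte Carlo estimate of Csum = E{ B log2(1 + snr * sum_k |h_k|^2) },
    assuming Rayleigh fading: |h_k|^2 is exponential with unit mean."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        s = sum(rng.expovariate(1.0) for _ in range(K))  # sum_k |h_k|^2
        total += B * math.log2(1 + snr * s)
    return total / trials
```

For K = 4 and unit SNR the estimate stays below the AWGN value B log2(1 + K·SNR) = log2(5), as Jensen's inequality predicts, and the gap shrinks as K grows.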
Let us now assume that we have full (possibly also non-causal) channel state
information at both the transmitter and receiver sides.
Block-fading model:
Y(i) = Σ_{k=1}^K hk(i) Xk(i) + Z(i)
where hk(i) = hk,l remains constant over the lth channel coherence period of Tc (Tc ≫ 1) symbols and is i.i.d. across different coherence periods.
The channel over L such coherence periods can be viewed as L parallel “sub-channels” which fade independently. Therefore, we can again use the water-filling philosophy.
For a given realization of the channel gains hk,l (k = 1, ... , K ; l = 1, ... , L), the
sum capacity is given by
max_{Pk,l} (B/L) Σ_{l=1}^L log(1 + Σ_{k=1}^K Pk,l |hk,l|² / σ²)
subject to Pk,l ≥ 0 (k = 1, ..., K; l = 1, ..., L) and the average power constraint
(1/L) Σ_{l=1}^L Pk,l = P,  k = 1, ..., K
The solution to this optimization problem as L→∞ yields the appropriate power
allocation policy.
This leads to a variable-rate scheme: in each “sub-channel” l, the rates dictated by the above optimization problem are used.
Optimal strategy: The sum rate in the lth “sub-channel”
B log(1 + Σ_{k=1}^K Pk,l |hk,l|² / σ²)
for a given total power Σ_{k=1}^K Pk,l allocated to this “sub-channel” is maximized by
giving all this power to the user with the strongest channel gain. That is, each
time only one user with the best channel is allowed to transmit. Under this
strategy, the multiuser channel for each time l reduces to a point-to-point channel
with the channel gain
max_{k=1,...,K} |hk,l|²
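The remaining step, allocating power across the “sub-channels” given the best user's gain in each, is standard water-filling; a bisection sketch (function name, iteration count, and the bisection approach itself are illustrative choices, not from the slides):

```python
def waterfill(gains, P, sigma2, iters=60):
    """Water-filling over L parallel sub-channels with power gains
    gains[l] (here: the best user's |h_{k,l}|^2 in coherence period l),
    subject to the average power constraint (1/L) * sum_l P_l = P.
    The water level mu is found by bisection; P_l = max(0, mu - sigma2/gains[l])."""
    L = len(gains)
    lo, hi = 0.0, P + sigma2 * max(1.0 / g for g in gains)
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if sum(max(0.0, mu - sigma2 / g) for g in gains) / L > P:
            hi = mu
        else:
            lo = mu
    mu = 0.5 * (lo + hi)
    return [max(0.0, mu - sigma2 / g) for g in gains]
```

For a flat channel the allocation is uniform; stronger sub-channels receive more power, weaker ones less or none.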
Broadcast fading channels
K -user downlink fading channel:
Yk(i) = hk(i)X (i) + Zk(i), k = 1, ... , K
where {hk(i)} is the random fading process of user k .
Similar to the uplink case, we assume that
E{|hk(i)|²} = 1,  k = 1, ..., K
and that the fading processes of different users are i.i.d.
The transmit power is constrained to be equal to P.
Let us first consider the case when the channel state information is available only
at the receiver.
We have the following single-user bounds:
Rk < B E{ log(1 + P|h|² / σ²) },  k = 1, ..., K
where h is a random channel gain.
For any k , this upper bound on Rk can be attained by using all the transmit power
to communicate to user k (with the rate to the remaining users being zero). Thus,
as in the non-fading case, we have K extreme points of the capacity region.
Similar to the non-fading case, it can be shown that the sum rate is also bounded
by the same quantity
Σ_{k=1}^K Rk < B E{ log(1 + P|h|² / σ²) }
This bound can be achieved by transmitting only to one user or by time-sharing
between any number of users.
It can be shown that the rate pairs in the capacity region can be achieved by both
orthogonal schemes and superposition coding.
Let us now consider the case when the channel state information is available both
at the transmitter and receiver.
Let us focus on the sum capacity. As in the uplink case, it can be shown that the
sum capacity is achieved by transmitting only to the best user at each time. Under
this strategy, the downlink channel reduces to a point-to-point channel with the
channel gain
max_{k=1,...,K} |hk|²
Multiuser diversity
We have seen that in the full channel state information case, from the sum
capacity perspective, the optimal strategy both in the uplink and downlink cases
reduces the multiuser case to the single-user (point-to-point) case with the fading
of magnitude maxk |hk(i)|. Compared to a system with a single user, the multiuser
diversity gain comes from:
I the increase of the total transmit power in the uplink case;
I the improvement of the effective channel gain at time i from |hk(i)|² to
max_{k=1,...,K} |hk(i)|².
The second effect appears entirely due to the ability to dynamically schedule
resources among the users as a function of the channel state.
Remarks
I The multiuser diversity gain comes from the following effect: when many
users fade independently, at any time there is a high probability that one of
them has a strong channel. By allowing only that user to transmit in the uplink or,
conversely, transmitting only to that user in the downlink, the shared channel resource is used in
the most efficient manner, and the total throughput is maximized.
I The larger the number of users, the higher is the multiuser diversity gain.
I The amount of multiuser diversity gain depends critically on the tail of the
distribution of |hk |2: the heavier the tail, the more likely there is a user with
the strong channel, and the larger the multiuser diversity gain.
System requirements to extract the multiuser diversity benefits
I the base station has to assess the channel quality of each user:
I in the downlink, each user has to track its own channel SNR and feed the channel quality back to the base station.
I in the uplink, the base station has to track the user channel quality (user SNRs).
I the base station has to schedule transmissions among the users as well as to
adapt the data rate as a function of instantaneous channel quality.
Such a scheduling procedure is often called opportunistic scheduling.
Fairness and delay
I In reality, the fading statistics of different users may be non-symmetric: some users are closer to the base station and thus have a better average SNR; some users are stationary (non-moving) or have no scatterers around.
I The multiuser diversity strategy is only concerned with maximizing long-term
average throughputs. In practice, there are latency requirements, that is, the average throughput within the tolerable delay is the performance metric of interest.
Channel measurement and feedback
I All scheduling decisions are done as a function of user channel states. Hence,
the quality of channel estimation is a primary issue, and feedback from the
users to the base station is needed in the downlink case.
I Both the error in channel measurement and the delay/error in feeding the
channel state back are significant bottlenecks in practical applications of the
multiuser diversity strategy.
Slow or limited fading:
I We have observed that the multiuser diversity strategy requires fading to be rich and fast. It is not useful in line-of-sight scenarios or in environments with little scattering or slowly changing channels.
Proportional fair downlink scheduling
I Keeps track of the average throughput Tk(i) (k = 1, ..., K) of each user in some (e.g., exponentially weighted) time window of length tW.
I In the ith time slot, the base station receives the requested/supportable rates Rk(i) (k = 1, ..., K) from all users, and transmits to the user k* with the largest ratio
γk = Rk(i)/Tk(i)
I The average throughputs are updated as:
Tk(i + 1) = (1 − 1/tW) Tk(i) + Rk(i)/tW  if k = k*
Tk(i + 1) = (1 − 1/tW) Tk(i)             if k ≠ k*
This algorithm is used in the downlink mode of the 3G system IS-856.
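One slot of this scheduling and update rule can be sketched as follows (the function name is illustrative; rates and throughputs are in consistent units):

```python
def pf_schedule(rates, T, tW):
    """One slot of the proportional fair scheduler: pick the user k* with
    the largest ratio R_k(i)/T_k(i) and update the exponentially weighted
    average throughputs T_k with window length tW."""
    kstar = max(range(len(rates)), key=lambda k: rates[k] / T[k])
    T_new = [(1 - 1 / tW) * T[k] + (rates[k] / tW if k == kstar else 0.0)
             for k in range(len(T))]
    return kstar, T_new
```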
Combination of multiuser diversity and superposition coding
I Divide the users into several classes (say, two classes depending on whether they are near the base station or near the cell edge). Then, users in each class have statistically comparable channel strengths.
I Users whose current channel is instantaneously strongest in their own class
are scheduled for simultaneous transmission using superposition coding. Users
of “stronger” classes (e.g., nearby users) receive less power, still enjoying very
good rates and minimally affecting the performance of the “weak” classes of
users.
ADVANCES IN CHANNEL CODING
We have already discussed linear block codes in Information Theory I. Now, we will discuss cyclic codes as well as convolutional codes.
Cyclic codes
An important subclass of linear block codes.
Consider an n-tuple
c = [c_0, c_1, ..., c_{n−1}]
Cyclically shifting the components of c, we have
c^(1) = [c_{n−1}, c_0, ..., c_{n−2}]
Using i subsequent cyclic shifts, we have
c^(i) = [c_{n−i}, c_{n−i+1}, ..., c_{n−1}, c_0, c_1, ..., c_{n−i−1}]
Definition: cyclic codes
An (n, k) linear block code C is called a cyclic code if every cyclic shift of any
codeword in C is also a codeword in C .
Properties:
I Linearity: the sum of any two codewords is also a codeword;
I Cyclic property: Any cyclic shift of any codeword is also a codeword.
To develop the theory of cyclic codes, let us treat the components of the
codeword c as the coefficients of the following polynomial:
c(X) = c_0 + c_1X + · · · + c_{n−1}X^{n−1}
where X is an indeterminate.
The fact that all c_i are binary is taken into account by using binary (modulo-2) arithmetic for all polynomial coefficients when operating with polynomials.
Cyclic codes
There is a one-to-one correspondence between the vector c and the polynomial
c(X ). We will call c(X ) the code polynomial of c.
Each power of X in the polynomial c(X ) represents a one-bit shift in time. Hence,
multiplication of c(X) by X may be viewed as a shift to the right.
Key question: how to make such a shift cyclic?
Let c(X) be multiplied by X^i, yielding
X^i c(X) = X^i (c_0 + c_1X + ... + c_{n−i−1}X^{n−i−1} + c_{n−i}X^{n−i} + ... + c_{n−1}X^{n−1})
         = c_0X^i + c_1X^{i+1} + ... + c_{n−i−1}X^{n−1} + c_{n−i}X^n + ... + c_{n−1}X^{n+i−1}
         = c_{n−i}X^n + ... + c_{n−1}X^{n+i−1} + c_0X^i + c_1X^{i+1} + ... + c_{n−i−1}X^{n−1}
where, in the last line, we have just rearranged the terms.
Cyclic codes
Recognizing, for example, that c_{n−i} + c_{n−i} = 0 in modulo-2 arithmetic, we can manipulate the first i terms as follows:
X^i c(X) = c_{n−i} + ... + c_{n−1}X^{i−1} + c_0X^i + c_1X^{i+1} + ... + c_{n−i−1}X^{n−1} + c_{n−i}(X^n + 1) + ... + c_{n−1}X^{i−1}(X^n + 1)
Defining
c^(i)(X) ≜ c_{n−i} + ... + c_{n−1}X^{i−1} + c_0X^i + c_1X^{i+1} + ... + c_{n−i−1}X^{n−1}
q(X) ≜ c_{n−i} + c_{n−i+1}X + ... + c_{n−1}X^{i−1}
we can reformulate the first equation on this page in the following compact form:
X^i c(X) = q(X)(X^n + 1) + c^(i)(X)
Cyclic codes
The polynomial c^(i)(X) can be recognized as the code polynomial of the codeword c^(i) obtained by applying i cyclic shifts to the codeword c.
Moreover, from the latter equation, we readily see that c^(i)(X) is the remainder that results from dividing X^i c(X) by (X^n + 1).
Hence, we may formally state the cyclic property in polynomial notation as follows: if c(X) is a code polynomial, then the polynomial
c^(i)(X) = X^i c(X) mod (X^n + 1)
is also a code polynomial for any cyclic shift i, where mod (X^n + 1) stands for taking the remainder after division by (X^n + 1).
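The equivalence between i cyclic shifts and reduction of X^i c(X) modulo (X^n + 1) is easy to verify numerically; a small Python sketch with illustrative function names:

```python
def cyclic_shift(c, i):
    """c^(i): i cyclic shifts of the codeword c = [c_0, ..., c_{n-1}],
    giving [c_{n-i}, ..., c_{n-1}, c_0, ..., c_{n-i-1}]."""
    n = len(c)
    i %= n
    return c[n - i:] + c[:n - i]

def shift_via_poly_mod(c, i):
    """The same shift computed as X^i c(X) mod (X^n + 1) over GF(2):
    the coefficient of X^m in c(X) ends up at X^((m + i) mod n)."""
    n = len(c)
    out = [0] * n
    for m, bit in enumerate(c):
        out[(m + i) % n] ^= bit
    return out
```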
Cyclic codes
Note that n cyclic shifts of any codeword do not change it, which means that X^n = 1, and hence X^n + 1 = 0, in modulo-(X^n + 1) arithmetic!
Generator polynomial: a polynomial g(X) of minimal degree that completely specifies the code and is a factor of X^n + 1. The degree of g(X) is equal to the number of parity-check bits of the code, n − k.
It can be shown that any cyclic code is uniquely determined by its generator polynomial in that each code polynomial in the code can be expressed in the form of a polynomial product as follows:
c(X) = a(X)g(X)
where a(X) is a polynomial of degree at most k − 1.
Cyclic codes
Given the generator polynomial g(X ), we want to encode the message
[m_0, ..., m_{k−1}] in an (n, k) systematic form. The codeword structure is
[b_0, b_1, ..., b_{n−k−1}, m_0, m_1, ..., m_{k−1}]
Define the message-bit and parity-bit polynomials as
m(X) ≜ m_0 + m_1X + ... + m_{k−1}X^{k−1}
b(X) ≜ b_0 + b_1X + ... + b_{n−k−1}X^{n−k−1}
We want the code polynomial to be of the form
c(X) = b(X) + X^{n−k} m(X)
This means that b_0, ..., b_{n−k−1} occupy the first n − k positions of each codeword, whereas the message bits start from the (n − k + 1)st position.
Cyclic codes
Using the equation c(X) = a(X)g(X) yields
a(X)g(X) = b(X) + X^{n−k} m(X)
Equivalently,
X^{n−k} m(X) / g(X) = a(X) + b(X) / g(X)
which means that b(X) is the remainder left over after dividing X^{n−k} m(X) by g(X).
Example: A (7,4) cyclic code
We start with the polynomial X^7 − 1 and factorize it into three irreducible polynomials as
X^7 − 1 = (1 + X)(1 + X^2 + X^3)(1 + X + X^3)
where by an irreducible polynomial we mean a polynomial that cannot be factored using only polynomials with binary coefficients.
Let us take
g(X) = 1 + X + X^3
as the generator polynomial, whose degree is equal to the number of parity bits.
Example: A (7,4) cyclic code
We can also define a parity-check polynomial
h(X) = 1 + Σ_{i=1}^{k−1} h_i X^i + X^k
such that
g(X)h(X) = X^n + 1
or, equivalently,
g(X)h(X) mod (X^n + 1) = 0
For our example, the parity-check polynomial is
h(X) = 1 + X + X^2 + X^4
so that h(X)g(X) = (1 + X + X^2 + X^4)(1 + X + X^3) = X^7 + 1.
Example: A (7,4) cyclic code
How to encode, for example, the message sequence 1001?
The corresponding message polynomial is
m(X) = 1 + X^3
Multiplying m(X) by X^{n−k} = X^3, we have
X^{n−k} m(X) = X^3 + X^6
Dividing X^{n−k} m(X) by g(X), we have
(X^3 + X^6) / (1 + X + X^3) = X + X^3 + (X + X^2) / (1 + X + X^3)
Example: A (7,4) cyclic code
That is,
a(X) = X + X^3,  b(X) = X + X^2
and the encoded message is
c(X) = b(X) + X^{n−k} m(X) = X + X^2 + X^3(1 + X^3) = X + X^2 + X^3 + X^6
or, alternatively,
c = [0111001]
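The systematic encoding steps above (multiply by X^{n−k}, divide by g(X), append the remainder) can be sketched in Python, representing GF(2) polynomials as integers with bit i holding the coefficient of X^i (function names are illustrative):

```python
def gf2_divmod(dividend, divisor):
    """Division of polynomials over GF(2), represented as ints
    (bit i = coefficient of X^i).  Returns (quotient, remainder)."""
    q = 0
    while dividend.bit_length() >= divisor.bit_length():
        shift = dividend.bit_length() - divisor.bit_length()
        q ^= 1 << shift
        dividend ^= divisor << shift
    return q, dividend

def cyclic_encode(msg_bits, g, n, k):
    """Systematic (n, k) cyclic encoding: b(X) is the remainder of
    X^(n-k) m(X) divided by g(X), and c(X) = b(X) + X^(n-k) m(X)."""
    m = sum(bit << j for j, bit in enumerate(msg_bits))   # m(X) as an int
    _, b = gf2_divmod(m << (n - k), g)                    # parity polynomial b(X)
    c = b ^ (m << (n - k))
    return [(c >> j) & 1 for j in range(n)]               # [c_0, ..., c_{n-1}]
```

For the slide's example, `cyclic_encode([1, 0, 0, 1], 0b1011, 7, 4)` (g(X) = 1 + X + X^3 encoded as 0b1011) returns [0, 1, 1, 1, 0, 0, 1], i.e. the codeword 0111001.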
Relationship to conventional linear block codes
For the considered (7, 4) code, we can construct the generator matrix from the generator polynomial by using
g(X) = 1 + X + X^3
X g(X) = X + X^2 + X^4
X^2 g(X) = X^2 + X^3 + X^5
X^3 g(X) = X^3 + X^4 + X^6
as the rows of the 4× 7 generator matrix
G =
1 1 0 1 0 0 0
0 1 1 0 1 0 0
0 0 1 1 0 1 0
0 0 0 1 1 0 1
Relationship to conventional linear block codes
Clearly, the latter generator matrix is in non-systematic form. We can put it in systematic form by elementary row operations, that is, by adding the first row to the third row and adding the sum of the first two rows to the fourth row. Then, we get
G =
1 1 0 1 0 0 0
0 1 1 0 1 0 0
1 1 1 0 0 1 0
1 0 1 0 0 0 1
Decoding of cyclic codes can be done in the same way as for any other linear block codes, e.g., using the syndrome.
Popular cyclic codes are the so-called cyclic redundancy check (CRC) codes,
Bose-Chaudhuri-Hocquenghem (BCH) codes, and non-binary Reed-Solomon (RS)
codes. They are part of various international communication standards, e.g., digital subscriber line (DSL) standards.
Convolutional codes
One of the most powerful classes of linear codes.
Similar to linear block codes, the encoder of a convolutional code accepts k-bit
message blocks and produces an encoded sequence of n-bit blocks. However, each
encoded block depends not only on the corresponding k-bit message block, but
also on the M previous message blocks.
Such an encoder is said to have a memory order of M.
The ratio
R = k/n
is called the code rate.
Convolutional codes
The message sequence m = [m0, m1, m2, ...] enters the encoder one bit at a time.
The encoder output sequences are obtained as the convolution of the input
sequence with the encoder generator sequences. For an encoder with the memory
order M, the length of these sequences is M + 1. For example, in the case of two generator sequences (impulse responses)
g^(0) = [g^(0)_0, ..., g^(0)_M],  g^(1) = [g^(1)_0, ..., g^(1)_M]
we can write the encoding equations
c^(0) = m ∗ g^(0),  c^(1) = m ∗ g^(1)
where ∗ denotes the discrete convolution and all operations are modulo-2.
Convolutional codes
The convolution operation implies that
c^(j)_l = Σ_{i=0}^M m_{l−i} g^(j)_i,  j = 0, 1
where m_{l−i} = 0 for all l < i.
After encoding, the output sequences are multiplexed into a single sequence called the codeword
c = [c^(0)_0, c^(1)_0, c^(0)_1, c^(1)_1, ...]
Convolutional codes
Defining a matrix
G = [ g^(0)_0 g^(1)_0  g^(0)_1 g^(1)_1  · · ·  g^(0)_M g^(1)_M
                       g^(0)_0 g^(1)_0  · · ·  g^(0)_{M−1} g^(1)_{M−1}  g^(0)_M g^(1)_M
                                        · · ·                                          ]
where each row is the previous row shifted to the right by one block of two bits and all blank areas are zeros, we can rewrite the encoding equations in matrix form as
c = mG
The form of this equation is the same as for linear block codes! Therefore, we call G the generator matrix of the code.
In the case of a semi-infinite message sequence, the matrix G is semi-infinite as well. However, if m has finite length, then G is finite as well.
Example: R = 1/2 code
With the generator sequences:
g^(0) = [1011],  g^(1) = [1111]
Let the message sequence be
m = [10111]
The encoding equations yield
c^(0) = [10111] ∗ [1011] = [10000001]
c^(1) = [10111] ∗ [1111] = [11011101]
and, hence, the 2(k + M)-bit codeword
c = [11 01 00 01 01 01 00 11]
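The modulo-2 convolution and multiplexing steps can be sketched generically (the function name is illustrative; generator sequences are passed as bit lists):

```python
def conv_encode(m, gens):
    """Rate-1/n convolutional encoding by modulo-2 convolution of the
    message m with each generator sequence, then multiplexing the
    output streams bit by bit: c = [c0_0, c1_0, c0_1, c1_1, ...]."""
    n_out = len(m) + len(gens[0]) - 1        # length of each output stream
    streams = []
    for g in gens:
        c = [sum(g[i] * m[l - i]
                 for i in range(len(g)) if 0 <= l - i < len(m)) % 2
             for l in range(n_out)]
        streams.append(c)
    return [s[l] for l in range(n_out) for s in streams]
```

For the slide's example, `conv_encode([1, 0, 1, 1, 1], [[1, 0, 1, 1], [1, 1, 1, 1]])` reproduces the codeword 11 01 00 01 01 01 00 11.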
Example: R = 1/2 code
Alternatively, we can write the k × 2(k + M) generator matrix as
G = [ 11 01 11 11
         11 01 11 11
            11 01 11 11
               11 01 11 11
                  11 01 11 11 ]
where the blank areas are zeros and each row is shifted by one two-bit block, and obtain the same codeword as
c = [10111] G = [11 01 00 01 01 01 00 11]
Code tree and trellis
Let us discuss the concepts of code tree and trellis using a particular example of
the R = 1/2 convolutional code with M = 2 and the impulse responses
g^(0) = [111],  g^(1) = [101]
Consider the input sequence m = [10011]. Similar to the example above, it can be
shown that the codeword becomes
c = [11 10 11 11 01 01 11]
To enforce the R = 1/2 property, let us truncate the codeword by dropping the
last 2M = 4 bits (the effect of truncation becomes negligible if longer messages
and codewords are used). Then, the codeword becomes [11 10 11 11 01]
Convolutional Encoder
The code tree is defined as follows: each branch of the tree represents an input
symbol (0 or 1). The corresponding output (coded) symbols are indicated on each
branch. A specific path can be traced for each message sequence. The
corresponding coded symbols on the branches following this path form the output
sequence.
Code tree
State diagram
Trellis diagram
Complexity of Viterbi Decoder
Over L binary intervals, the total number of comparisons made by the Viterbi algorithm is 2^{K−1} L (with K the constraint length), rather than the 2^L comparisons required by the standard maximum-likelihood procedure (full tree search).
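For the example code above (g^(0) = [111], g^(1) = [101], M = 2, constraint length K = 3, hence 2^{K−1} = 4 states), a hard-decision Viterbi decoder can be sketched as follows; names are illustrative, and the decoder keeps one survivor path per state, making 2 branch comparisons per state per step:

```python
def encode(bits):
    """Encoder for the example R = 1/2, M = 2 code with g(0) = [111],
    g(1) = [101], realized as a two-element shift register (m1, m2)."""
    m1 = m2 = 0
    out = []
    for b in bits:
        out += [b ^ m1 ^ m2, b ^ m2]   # taps of g(0) and g(1)
        m2, m1 = m1, b
    return out

def viterbi(rx):
    """Hard-decision Viterbi decoding over the 2^M = 4 trellis states."""
    INF = float('inf')
    metric = [0, INF, INF, INF]        # start in the all-zero state
    paths = [[], [], [], []]
    for l in range(0, len(rx), 2):
        new_metric, new_paths = [INF] * 4, [None] * 4
        for s in range(4):
            if metric[s] == INF:
                continue
            m1, m2 = s >> 1, s & 1
            for b in (0, 1):
                c0, c1 = b ^ m1 ^ m2, b ^ m2           # branch output
                d = (c0 != rx[l]) + (c1 != rx[l + 1])  # Hamming branch metric
                ns = (b << 1) | m1                     # next state
                if metric[s] + d < new_metric[ns]:
                    new_metric[ns] = metric[s] + d
                    new_paths[ns] = paths[s] + [b]
        metric, paths = new_metric, new_paths
    return paths[min(range(4), key=lambda s: metric[s])]
```

On the error-free truncated codeword 11 10 11 11 01 of the message 10011, `viterbi(encode([1, 0, 0, 1, 1]))` recovers the message exactly.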
Probability of deviating from correct path
Let a(d) denote the number of pathes with a Hamming distance d deviating from,
and then returning to, the all-0 test path. The error probability of Pe of deviating
from the correct path is then upper bounded by
Pe <
∞∑d=dF
a(d)Pd
where Pd denotes that probability that d bits are received in error and dF denotes
the minimum free distance.
Inequality sign because pathes a not mutually exclusive.
Pe depends critically on the minimum free distance dF !
CONCLUSION
I We have studied advanced information theory, including the capacity characterization of multi-antenna and multi-user channels (and the resulting concept of multiuser diversity), as well as advanced channel coding approaches such as cyclic and convolutional codes.
I To apply these concepts and approaches in practice, or to do research in these fields, a deeper study is required.