Noise, Information Theory, and Entropy

CS414 Spring 2007
By Roger Cheng, Karrie Karahalios, Brian Bailey
-
Communication system abstraction

Sender side:   Information source -> Encoder -> Modulator
                                                   |
                                                Channel
                                                   |
Receiver side: Output signal <- Decoder <- Demodulator
-
The additive noise channel

Transmitted signal s(t) is corrupted by noise source n(t), and the resulting received signal is r(t) = s(t) + n(t)

Noise could result from many sources, including electronic components and transmission interference
-
Random processes

A random variable is the result of a single measurement

A random process is an indexed collection of random variables, or equivalently a non-deterministic signal that can be described by a probability distribution

Noise can be modeled as a random process
-
[Slide 5: white Gaussian noise; figure-only slide, content not captured in the transcript]
-
WGN continued

When an additive noise channel has a white Gaussian noise source, we call it an AWGN channel

Most frequently used model in communications

Reasons why we use this model:
It's easy to understand and compute
It applies to a broad class of physical channels
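Below is a minimal simulation sketch of an AWGN channel in Python; the sinusoidal signal, the noise variance, and the numpy usage are illustrative assumptions, not part of the slides.

```python
import numpy as np

# Sketch: additive noise channel r(t) = s(t) + n(t) with a white
# Gaussian noise source (an AWGN channel). Signal and variance are
# made-up example values.
rng = np.random.default_rng(0)

t = np.linspace(0, 1, 1000)
s = np.sin(2 * np.pi * 5 * t)                    # transmitted signal s(t)
n = rng.normal(0.0, np.sqrt(0.1), size=t.shape)  # WGN n(t), variance 0.1
r = s + n                                        # received signal r(t)
```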
-
Signal energy and power
Energy is defined as
E_x = ∫ |x(t)|² dt, integrated over all time

Power is defined as
P_x = lim_{T→∞} (1/T) ∫_{-T/2}^{T/2} |x(t)|² dt

Most signals are either finite energy and zero power, or infinite energy and finite power

Noise power is hard to compute in the time domain; the power of WGN is its variance σ²
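A quick numerical sketch (assumed numpy example) of the last point: the time-average power of zero-mean WGN converges to its variance.

```python
import numpy as np

# Sketch: estimate the power of WGN as the time average of |n(t)|^2;
# it should come out close to the variance sigma^2 (0.25 here).
rng = np.random.default_rng(1)
sigma2 = 0.25
n = rng.normal(0.0, np.sqrt(sigma2), size=1_000_000)

power = np.mean(np.abs(n) ** 2)
print(power)   # ~0.25
```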
-
Signal to Noise Ratio (SNR)
Defined as the ratio of signal power to the noise power corrupting the signal

Usually more practical to measure SNR on a dB scale

Obviously, want as high an SNR as possible
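For concreteness, a tiny sketch of the dB conversion (the example powers are assumed):

```python
import math

# Sketch: SNR as a ratio of powers, and the same SNR on a dB scale.
signal_power = 4.0
noise_power = 0.25

snr = signal_power / noise_power
snr_db = 10 * math.log10(snr)
print(snr, snr_db)   # 16.0, ~12.04 dB
```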
-
Analog vs. Digital
Analog system: any amount of noise will create distortion at the output

Digital system: a relatively small amount of noise will cause no harm at all, but too much noise will make decoding of the received signal impossible

Both: the goal is to limit the effects of noise to a manageable/satisfactory amount
-
Information theory and entropy
Information theory tries to solve the problem of communicating as much data as possible over a noisy channel

Measure of data is entropy

Claude Shannon first demonstrated that reliable communication over a noisy channel is possible (jump-started the digital age)
-
Review of Entropy Coding

Alphabet: finite, non-empty set, e.g. A = {a, b, c, d, e}

Symbol (S): element from the set

String: sequence of symbols from A

Codeword: sequence of bits representing a coded string, e.g. 0110010111101001010

p_i: probability of symbol i in the string, with Σ_{i=1}^{N} p_i = 1

L_i: length of the codeword of symbol i in bits
-
"The fundamental problem ofcommunication is that of reproducing atone point, either exactly or approximately,a message selected at another point."
-Shannon, 1944
-
Measure of Information
Information content of symbol s_i (in bits): -log2 p(s_i)

Examples:
p(s_i) = 1 carries no information
A smaller p(s_i) carries more information, as the symbol is unexpected or surprising
-
Entropy
Weigh the information content of each source symbol by its probability of occurrence; the resulting value is called entropy (H):

H = -Σ_{i=1}^{n} p(s_i) log2 p(s_i)

Produces a lower bound on the number of bits needed to represent the information with code words
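A direct translation of the formula into a small Python helper (the function name is an arbitrary choice):

```python
import math

# Sketch: entropy H in bits from a list of symbol probabilities.
# Terms with p = 0 contribute nothing (0 * log 0 is taken as 0).
def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.4, 0.6]))   # ~0.97 bits (the example on the next slide)
```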
-
Entropy Example
Alphabet = {A, B}
p(A) = 0.4; p(B) = 0.6

Compute entropy (H):
H = -0.4 * log2 0.4 - 0.6 * log2 0.6 = 0.97 bits

Maximum uncertainty (largest H) occurs when all probabilities are equal
-
Entropy definitions
Shannon entropy: H(X) = -Σ_x p(x) log2 p(x)

Binary entropy formula: H(p) = -p log2 p - (1-p) log2 (1-p)

Differential entropy (for a continuous variable with density f): h(X) = -∫ f(x) log f(x) dx
-
Properties of entropy
Can be defined as the expectation of -log p(x) (i.e., H(X) = E[-log p(X)])

Is not a function of a variable's values; it is a function of the variable's probabilities

Usually measured in bits (using logs of base 2) or nats (using logs of base e)

Maximized when all values are equally likely (i.e., a uniform distribution)

Equal to 0 when only one value is possible
-
Joint and conditional entropy
Joint entropy is the entropy of the pairing (X, Y)

Conditional entropy is the entropy of X if the value of Y was known

Relationship between the two: H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
-
Mutual information
Mutual information is how much information about X can be obtained by observing Y:
I(X;Y) = H(X) - H(X|Y) = H(X) + H(Y) - H(X,Y)
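A small sketch computing I(X;Y) from an assumed joint distribution, using the identity I(X;Y) = H(X) + H(Y) - H(X,Y):

```python
import math

# Sketch: mutual information from a made-up joint distribution p(x, y).
def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
px = [sum(p for (x, _), p in joint.items() if x == v) for v in (0, 1)]
py = [sum(p for (_, y), p in joint.items() if y == v) for v in (0, 1)]

I = H(px) + H(py) - H(joint.values())
print(I)   # ~0.28 bits about X gained by observing Y
```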
-
Mathematical model of a channel

Assume that our input to the channel is X, and the output is Y

Then the characteristics of the channel can be defined by its conditional probability distribution p(y|x)
-
Channel capacity and rate
Channel capacity is defined as the maximum possible value of the mutual information:
C = max over input distributions p(x) of I(X;Y)

We choose the best input distribution p(x) to achieve C

For any rate R < C, we can transmit information with arbitrarily small probability of error
-
Binary symmetric channel
Correct bit transmitted with probability 1-p

Wrong bit transmitted with probability p (sometimes called the crossover probability)

Capacity C = 1 - H(p, 1-p)
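The capacity formula is easy to evaluate; a sketch with the binary entropy written out inline:

```python
import math

# Sketch: BSC capacity C = 1 - H(p, 1-p), where H(p, 1-p) is the
# binary entropy of the crossover probability p.
def bsc_capacity(p):
    if p in (0.0, 1.0):
        return 1.0   # noiseless (or deterministically inverted) channel
    return 1 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

print(bsc_capacity(0.1))   # ~0.531 bits per channel use
print(bsc_capacity(0.5))   # 0.0: output tells us nothing about input
```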
-
Binary erasure channel
Correct bit transmitted with probability 1-p
Erasure transmitted with probability p
Capacity C = 1 - p
-
Coding theory
Information theory only gives us an upper bound on the communication rate

Need to use coding theory to find a practical method to achieve a high rate

2 types:
Source coding - compresses source data to a smaller size
Channel coding - adds redundancy bits to make transmission across a noisy channel more robust
-
Source-channel separation theorem

Shannon showed that when dealing with one transmitter and one receiver, we can break up source coding and channel coding into separate steps without loss of optimality

Does not apply when there are multiple transmitters and/or receivers; need to use network information theory principles in those cases
-
Coding Intro
Assume alphabet K of {A, B, C, D, E, F, G, H}

In general, if we want to distinguish n different symbols, we will need to use ⌈log2 n⌉ bits per symbol, i.e., 3 bits here

Can code alphabet K as:
A 000  B 001  C 010  D 011
E 100  F 101  G 110  H 111
-
Coding Intro
BACADAEAFABBAAAGAH is encoded as the string of 54 bits
001000010000011000100000101000001001000000000110000111
(fixed-length code)
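A one-liner check of the 54-bit count (the dict comprehension is an assumed convenience, not from the slides):

```python
# Sketch: fixed-length encoding with the 3-bit code A=000 ... H=111.
code = {s: format(i, "03b") for i, s in enumerate("ABCDEFGH")}

bits = "".join(code[s] for s in "BACADAEAFABBAAAGAH")
print(len(bits))   # 54 = 18 symbols * 3 bits
```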
-
Coding Intro
With this coding:
A 0     B 100   C 1010  D 1011
E 1100  F 1101  G 1110  H 1111

100010100101101100011010100100000111001111

42 bits, saves more than 20% in space
-
Huffman Tree
Symbol frequencies: A (9), B (3), C (1), D (1), E (1), F (1), G (1), H (1)
-
Huffman Encoding
Use the probability distribution to determine how many bits to use for each symbol

Higher-frequency symbols are assigned shorter codes

An entropy-based, block-variable coding scheme
-
Huffman Encoding
Produces a code which uses a minimum number of bits to represent each symbol

Cannot represent the same sequence using fewer bits per symbol when using code words

Optimal when using code words, but this may differ slightly from the theoretical lower limit

Lossless

Build a Huffman tree to assign codes
-
Informal Problem Description
Given a set of symbols from an alphabet and their probability distribution
(assumes the distribution is known and stable)

Find a prefix-free binary code with minimum weighted path length
(prefix-free means no codeword is a prefix of any other codeword)
-
Huffman Algorithm
Construct a binary tree of codes:
leaf nodes represent symbols to encode
interior nodes represent cumulative probability
edges are assigned 0 or 1 as the output code

Construct the tree bottom-up:
connect the two nodes with the lowest probability until no more nodes remain to connect
(a code sketch follows)
-
Huffman Example
Construct the Huffman coding tree (in class)

Symbol (S)   P(S)
A            0.25
B            0.30
C            0.12
D            0.15
E            0.18
-
Characteristics of Solution
Lowest probability symbol is always furthest from the root

Assignment of 0/1 to children edges is arbitrary; other solutions are possible, but the lengths remain the same

If two nodes have equal probability, can select any two

Notes: prefix-free code, O(n lg n) complexity

Symbol (S)   Code
A            11
B            00
C            010
D            011
E            10
-
Example Encoding/Decoding
Encode BEAD:
001011011

Decode 0101100

Symbol (S)   Code
A            11
B            00
C            010
D            011
E            10
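A short decoding sketch: scan bits left to right and emit a symbol whenever the buffer matches a codeword (unambiguous because the code is prefix-free):

```python
# Sketch: table-driven decoding of the exercise string 0101100.
code = {"A": "11", "B": "00", "C": "010", "D": "011", "E": "10"}
decode_map = {v: k for k, v in code.items()}

buf, out = "", []
for b in "0101100":
    buf += b
    if buf in decode_map:          # complete codeword read
        out.append(decode_map[buf])
        buf = ""
print("".join(out))                # CAB
```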
-
Entropy (Theoretical Limit)
H = -Σ_{i=1}^{N} p(s_i) log2 p(s_i)

H = -(.25 * log2 .25 + .30 * log2 .30 + .12 * log2 .12 + .15 * log2 .15 + .18 * log2 .18)

H = 2.24 bits

Symbol   P(S)   Code
A        0.25   11
B        0.30   00
C        0.12   010
D        0.15   011
E        0.18   10
-
Average Codeword Length
L = Σ_{i=1}^{N} p(s_i) * length(code(s_i))

L = .25(2) + .30(2) + .12(3) + .15(3) + .18(2)

L = 2.27 bits

Symbol   P(S)   Code
A        0.25   11
B        0.30   00
C        0.12   010
D        0.15   011
E        0.18   10
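Both numbers are easy to recompute from the table (a small sketch):

```python
import math

# Sketch: entropy H and average codeword length L for the example.
table = {"A": (0.25, "11"), "B": (0.30, "00"), "C": (0.12, "010"),
         "D": (0.15, "011"), "E": (0.18, "10")}

H = -sum(p * math.log2(p) for p, _ in table.values())
L = sum(p * len(c) for p, c in table.values())
print(round(H, 2), round(L, 2))   # 2.24 bits, 2.27 bits
```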
-
Code Length Relative to Entropy
Huffman reaches the entropy limit when all probabilities are negative powers of 2
i.e., 1/2, 1/4, 1/8, 1/16, etc.

In general, H <= L < H + 1
-
Example
H = -.01 * log2 .01 - .99 * log2 .99 = .08 bits

L = .01(1) + .99(1) = 1 bit

Symbol   P(S)   Code
A        0.01   1
B        0.99   0
-
Exercise
Compute entropy (H)

Build the Huffman tree

Compute the average code length

Code BCCADE

Symbol (S)   P(S)
A            0.1
B            0.2
C            0.4
D            0.2
E            0.1
-
Solution
Compute entropy (H): H = 2.1 bits

Build the Huffman tree

Compute the average code length: L = 2.2 bits

Code BCCADE => 10000111101110

Symbol   P(S)   Code
A        0.1    111
B        0.2    100
C        0.4    0
D        0.2    101
E        0.1    110
-
Limitations
Diverges from the lower limit when the probability of a particular symbol becomes high, since it always uses an integral number of bits

Must send the code book with the data, which lowers overall efficiency

Must determine the frequency distribution, which must remain stable over the data set
-
Error detection and correction
Error detection is the ability to detect errors that are made due to noise or other impairments during transmission from the transmitter to the receiver.

Error correction has the additional feature of localizing the errors and correcting them.

Error detection always precedes error correction.
(more next week)