Noise Info Theory 2



  • Slide 1/44

Noise, Information Theory, and Entropy

    CS414 Spring 2007, by Roger Cheng, Karrie Karahalios, Brian Bailey

  • Slide 2/44

Communication system abstraction

    [Block diagram: Information source → Encoder → Modulator → Channel → Demodulator → Decoder → Output signal. The information source, encoder, and modulator form the sender side; the demodulator, decoder, and output form the receiver side.]

  • Slide 3/44

    The additive noise channel

Transmitted signal s(t) is corrupted by a noise source n(t), and the resulting received signal is r(t)

    Noise could result from many sources, including electronic components and transmission interference

[Diagram: an adder combines s(t) and n(t) to give r(t) = s(t) + n(t)]
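
    A minimal numerical sketch of this channel in Python (the sine test signal, sample count, and noise level are illustrative assumptions, not values from the slides):

    import numpy as np

    rng = np.random.default_rng(0)

    t = np.linspace(0.0, 1.0, 1000)            # time axis: 1 s, 1000 samples
    s = np.sin(2 * np.pi * 5 * t)              # transmitted signal s(t)
    n = rng.normal(0.0, 0.2, size=t.shape)     # noise n(t) with std dev 0.2
    r = s + n                                  # received signal r(t) = s(t) + n(t)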

  • Slide 4/44

    Random processes

A random variable is the result of a single measurement

    A random process is an indexed collection of random variables, or equivalently a non-deterministic signal that can be described by a probability distribution

    Noise can be modeled as a random process

  • Slide 5/44

  • Slide 6/44

    WGN continued

When an additive noise channel has a white Gaussian noise source, we call it an AWGN channel

    Most frequently used model in communications

Reasons why we use this model:

    It's easy to understand and compute

    It applies to a broad class of physical channels

  • Slide 7/44

    Signal energy and power

Energy is defined as E_x = \int_{-\infty}^{\infty} |x(t)|^2 \, dt

    Power is defined as P_x = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} |x(t)|^2 \, dt

    Most signals are either finite energy and zero power, or infinite energy and finite power

    Noise power is hard to compute in the time domain; the power of WGN is its variance \sigma^2
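
    A short numerical sketch of these definitions (the test signal, duration, and noise variance are assumptions for illustration); power is estimated as the time average of |x(t)|^2:

    import numpy as np

    rng = np.random.default_rng(0)

    dt = 1e-3
    t = np.arange(0.0, 10.0, dt)
    x = np.sin(2 * np.pi * 5 * t)               # finite-power test signal
    noise = rng.normal(0.0, 0.5, size=t.shape)  # WGN with variance 0.25

    power_x = np.mean(np.abs(x) ** 2)           # ~0.5 for a unit-amplitude sine
    power_noise = np.mean(np.abs(noise) ** 2)   # ~0.25, i.e. the noise variance
    print(power_x, power_noise)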

  • Slide 8/44

    Signal to Noise Ratio (SNR)

Defined as the ratio of signal power to the noise power corrupting the signal

    Usually more practical to measure SNR on a dB scale

    Obviously, want as high an SNR as possible
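
    A minimal sketch of the dB conversion, SNR_dB = 10 log10(P_signal / P_noise) (the function name and example arrays are illustrative assumptions):

    import numpy as np

    def snr_db(signal, noise):
        """Signal-to-noise ratio in decibels."""
        p_signal = np.mean(np.abs(signal) ** 2)
        p_noise = np.mean(np.abs(noise) ** 2)
        return 10.0 * np.log10(p_signal / p_noise)

    # Example: a unit-power sine and noise of variance 0.25 give roughly 6 dB.
    rng = np.random.default_rng(0)
    sig = np.sqrt(2) * np.sin(2 * np.pi * 5 * np.linspace(0, 1, 1000))
    print(snr_db(sig, rng.normal(0.0, 0.5, 1000)))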

  • Slide 9/44

    Analog vs. Digital

Analog system: any amount of noise will create distortion at the output

    Digital system: a relatively small amount of noise will cause no harm at all

    Too much noise will make decoding of the received signal impossible

    Both: the goal is to limit the effects of noise to a manageable/satisfactory amount

  • Slide 10/44

    Information theory and entropy

Information theory tries to solve the problem of communicating as much data as possible over a noisy channel

    Measure of data is entropy

    Claude Shannon first demonstrated that reliable communication over a noisy channel is possible (jump-started the digital age)

  • Slide 11/44

Review of Entropy Coding

    Alphabet: finite, non-empty set

    A = {a, b, c, d, e}

    Symbol (S): element from the set

    String: sequence of symbols from A

Codeword: bit sequence representing a coded string, e.g. 0110010111101001010

    p_i: probability of symbol i in the string, with \sum_{i=1}^{N} p_i = 1

    L_i: length of the codeword of symbol i, in bits

  • Slide 12/44

    "The fundamental problem ofcommunication is that of reproducing atone point, either exactly or approximately,a message selected at another point."

    -Shannon, 1944

  • Slide 13/44

    Measure of Information

Information content of symbol s_i (in bits): -\log_2 p(s_i)

    Examples

    p(s_i) = 1 carries no information

    Smaller p(s_i) carries more information, as the symbol was unexpected or surprising

  • Slide 14/44

    Entropy

Weigh the information content of each source symbol by its probability of occurrence; the resulting value is called entropy (H):

    H = -\sum_{i=1}^{n} p(s_i) \log_2 p(s_i)

    Produces a lower bound on the number of bits needed to represent the information with code words

  • Slide 15/44

    Entropy Example

    Alphabet = {A, B}

    p(A) = 0.4; p(B) = 0.6

    Compute Entropy (H)

-0.4 \log_2 0.4 - 0.6 \log_2 0.6 = 0.97 bits

    Maximum uncertainty (largest H) occurs when all probabilities are equal
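
    A minimal Python sketch of this computation (the function name is an illustrative addition, not from the slides); it reproduces H({0.4, 0.6}) ≈ 0.97 bits and the equal-probability maximum:

    import math

    def entropy(probs):
        """Shannon entropy in bits of a discrete distribution."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy([0.4, 0.6]))   # ~0.971 bits
    print(entropy([0.5, 0.5]))   # 1.0 bit, the maximum for two symbols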

  • Slide 16/44

    Entropy definitions

    Shannon entropy

    Binary entropy formula

    Differential entropy
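
    The slide's formulas were not captured in the transcript; the standard textbook forms of these three definitions are:

    H(X)   = -\sum_{x} p(x) \log_2 p(x)                 (Shannon entropy)

    H_b(p) = -p \log_2 p - (1-p) \log_2 (1-p)           (binary entropy)

    h(X)   = -\int f(x) \log f(x) \, dx                 (differential entropy, where f is the pdf of X)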

  • Slide 17/44

    Properties of entropy

Can be defined as the expectation of -log p(x), i.e. H(X) = E[-log p(x)]

    Is not a function of the variable's values; it is a function of the variable's probabilities

    Usually measured in bits (using logs of base 2) or nats (using logs of base e)

    Maximized when all values are equally likely (i.e. a uniform distribution)

    Equal to 0 when only one value is possible

  • Slide 18/44

    Joint and conditional entropy

Joint entropy H(X,Y) is the entropy of the pairing (X,Y)

    Conditional entropy H(X|Y) is the entropy of X if the value of Y is known

    Relationship between the two
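
    The relationship shown on the slide was not captured in the transcript; it is the standard chain rule:

    H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)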

  • Slide 19/44

    Mutual information

Mutual information I(X;Y) is how much information about X can be obtained by observing Y
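
    The slide's formula was not captured; in terms of the entropies above, the standard definition is:

    I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)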

  • Slide 20/44

Mathematical model of a channel

    Assume that our input to the channel is X, and the output is Y

    Then the characteristics of the channel can be defined by its conditional probability distribution p(y|x)

  • Slide 21/44

    Channel capacity and rate

Channel capacity is defined as the maximum possible value of the mutual information

    We choose the best input distribution f(x) to maximize C

    For any rate R < C, we can transmit information with an arbitrarily small probability of error
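
    Written as a formula (not captured in the transcript; this is the standard definition, maximizing over the input distribution f(x)):

    C = \max_{f(x)} I(X;Y)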

  • Slide 22/44

    Binary symmetric channel

    Correct bit transmitted with probability 1-p

    Wrong bit transmitted with probability p

    Sometimes called cross-over probability

    Capacity C = 1 - H(p,1-p)
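
    A small sketch of this capacity formula (illustrative, not from the slides), where H(p, 1-p) is the binary entropy:

    import math

    def binary_entropy(p):
        """H(p, 1-p) in bits."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def bsc_capacity(p):
        """Capacity of a binary symmetric channel with crossover probability p."""
        return 1.0 - binary_entropy(p)

    print(bsc_capacity(0.1))   # ~0.531 bits per channel use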

  • Slide 23/44

    Binary erasure channel

    Correct bit transmitted with probability 1-p

    Erasure transmitted with probability p

    Capacity C = 1 - p

  • Slide 24/44

    Coding theory

Information theory only gives us an upper bound on communication rate

    Need to use coding theory to find a practical method to achieve a high rate

    2 types

    Source coding - compress source data to a smaller size

    Channel coding - add redundancy bits to make transmission across a noisy channel more robust

  • Slide 25/44

Source-channel separation theorem

    Shannon showed that when dealing with one transmitter and one receiver, we can break up source coding and channel coding into separate steps without loss of optimality

    Does not apply when there are multiple transmitters and/or receivers

    Need to use network information theory principles in those cases

  • Slide 26/44

    Coding Intro

Assume alphabet K of {A, B, C, D, E, F, G, H}

    In general, if we want to distinguish n different symbols, we will need to use \log_2 n bits per symbol, i.e. 3 here.

    Can code alphabet K as: A 000, B 001, C 010, D 011, E 100, F 101, G 110, H 111

  • Slide 27/44

    Coding Intro

BACADAEAFABBAAAGAH is encoded as the string of 54 bits

    001000010000011000100000101000001001000000000110000111

    (fixed length code)

  • Slide 28/44

    Coding Intro

With this coding: A 0, B 100, C 1010, D 1011, E 1100, F 1101, G 1110, H 1111

    100010100101101100011010100100000111001111

    42 bits, saves more than 20% in space
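
    A quick sketch (illustrative, not from the slides) that reproduces both encodings of BACADAEAFABBAAAGAH and confirms the 54-bit vs. 42-bit count:

    fixed = {c: format(i, "03b") for i, c in enumerate("ABCDEFGH")}   # A=000 ... H=111
    variable = {"A": "0", "B": "100", "C": "1010", "D": "1011",
                "E": "1100", "F": "1101", "G": "1110", "H": "1111"}

    message = "BACADAEAFABBAAAGAH"
    fixed_bits = "".join(fixed[c] for c in message)
    variable_bits = "".join(variable[c] for c in message)

    print(len(fixed_bits), len(variable_bits))          # 54 42
    print(1 - len(variable_bits) / len(fixed_bits))     # ~0.22, i.e. saves about 22%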

  • Slide 29/44

    Huffman Tree

    A (8), B (3), C(1), D(1), E(1), F(1), G(1), H(1)

  • Slide 30/44

    Huffman Encoding

Use the probability distribution to determine how many bits to use for each symbol

    Higher-frequency symbols are assigned shorter codes

    Entropy-based, block-variable coding scheme

  • Slide 31/44

    Huffman Encoding

Produces a code which uses a minimum number of bits to represent each symbol

    Cannot represent the same sequence using fewer real bits per symbol when using code words

    Optimal when using code words, but this may differ slightly from the theoretical lower limit

    lossless

    Build Huffman tree to assign codes

  • Slide 32/44

    Informal Problem Description

Given a set of symbols from an alphabet and their probability distribution

    assumes distribution is known and stable

    Find a prefix-free binary code with minimum weighted path length

    prefix-free means no codeword is a prefix of any other codeword

  • Slide 33/44

    Huffman Algorithm

    Construct a binary tree of codes

    leaf nodes represent symbols to encode

    interior nodes represent cumulative probability

    edges assigned 0 or 1 output code

    Construct the tree bottom-up

connect the two nodes with the lowest probability until no more nodes remain to connect
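
    A compact Python sketch of this bottom-up construction (an illustrative implementation; the function name and tie-breaking details are assumptions, not course code):

    import heapq

    def huffman_codes(probs):
        """Build a prefix-free code {symbol: bitstring} from symbol probabilities."""
        # Each heap entry is (probability, tie-breaker, {symbol: partial code}).
        heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(sorted(probs.items()))]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            p0, _, left = heapq.heappop(heap)    # the two lowest-probability nodes
            p1, _, right = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in left.items()}
            merged.update({s: "1" + c for s, c in right.items()})
            heapq.heappush(heap, (p0 + p1, counter, merged))
            counter += 1
        return heap[0][2]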

  • Slide 34/44

    Huffman Example

Construct the Huffman coding tree (in class)

    Symbol (S)   P(S)
    A            0.25
    B            0.30
    C            0.12
    D            0.15
    E            0.18
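
    Running the huffman_codes sketch from slide 33 on this distribution (illustrative; the exact 0/1 labels may differ from the in-class tree, but the code lengths match slide 35):

    probs = {"A": 0.25, "B": 0.30, "C": 0.12, "D": 0.15, "E": 0.18}
    codes = huffman_codes(probs)
    print({s: len(c) for s, c in sorted(codes.items())})
    # Code lengths: A 2, B 2, C 3, D 3, E 2 — average 2.27 bits (see slide 38)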

  • Slide 35/44

    Characteristics of Solution

Lowest probability symbol is always furthest from the root

    Assignment of 0/1 to children edges is arbitrary; other solutions are possible, but the code lengths remain the same

    If two nodes have equal probability, can select any two

    Notes: prefix-free code, O(n lg n) complexity

    Symbol (S)   Code
    A            11
    B            00
    C            010
    D            011
    E            10

  • Slide 36/44

    Example Encoding/Decoding

    Encode BEAD

    001011011

    Decode 0101100

Symbol (S)   Code
    A            11
    B            00
    C            010
    D            011
    E            10
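
    A small encode/decode sketch for this table (illustrative, not from the slides); it reproduces BEAD -> 001011011 and decodes 0101100 to CAB:

    code = {"A": "11", "B": "00", "C": "010", "D": "011", "E": "10"}

    def encode(msg):
        return "".join(code[c] for c in msg)

    def decode(bits):
        inverse = {v: k for k, v in code.items()}
        out, buffer = [], ""
        for b in bits:
            buffer += b
            if buffer in inverse:        # prefix-free: first match is a full symbol
                out.append(inverse[buffer])
                buffer = ""
        return "".join(out)

    print(encode("BEAD"))     # 001011011
    print(decode("0101100"))  # CAB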

  • Slide 37/44

    Entropy (Theoretical Limit)

H = -\sum_{i=1}^{N} p(s_i) \log_2 p(s_i)

    H = -0.25 \log_2 0.25 - 0.30 \log_2 0.30 - 0.12 \log_2 0.12 - 0.15 \log_2 0.15 - 0.18 \log_2 0.18

    H = 2.24 bits

    Symbol   P(S)   Code
    A        0.25   11
    B        0.30   00
    C        0.12   010
    D        0.15   011
    E        0.18   10

  • Slide 38/44

    Average Codeword Length

L = \sum_{i=1}^{N} p(s_i) \, \mathrm{codelength}(s_i)

    L = 0.25(2) + 0.30(2) + 0.12(3) + 0.15(3) + 0.18(2)

    L = 2.27 bits

    Symbol   P(S)   Code
    A        0.25   11
    B        0.30   00
    C        0.12   010
    D        0.15   011
    E        0.18   10

  • Slide 39/44

    Code Length Relative to Entropy

Huffman reaches the entropy limit when all probabilities are negative powers of 2

    i.e., 1/2, 1/4, 1/8, 1/16, etc.


  • Slide 40/44

    Example

H = -0.01 \log_2 0.01 - 0.99 \log_2 0.99 = 0.08 bits

    L = 0.01(1) + 0.99(1) = 1 bit

    Symbol   P(S)   Code
    A        0.01   1
    B        0.99   0

  • Slide 41/44

    Exercise

    Compute Entropy (H)

    Build Huffman tree

Compute average code length

    Code BCCADE

Symbol (S)   P(S)
    A            0.1
    B            0.2
    C            0.4
    D            0.2
    E            0.1

  • Slide 42/44

    Solution

Compute Entropy (H): H = 2.1 bits

    Build Huffman tree

    Compute average code length: L = 2.2 bits

    Code BCCADE => 10000111101110

    Symbol   P(S)   Code
    A        0.1    111
    B        0.2    100
    C        0.4    0
    D        0.2    101
    E        0.1    110
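
    A quick check of this solution (illustrative, not from the slides): it gives H ≈ 2.12 bits (rounded to 2.1 on the slide), L = 2.2 bits, and the same encoding of BCCADE:

    import math

    probs = {"A": 0.1, "B": 0.2, "C": 0.4, "D": 0.2, "E": 0.1}
    codes = {"A": "111", "B": "100", "C": "0", "D": "101", "E": "110"}

    H = -sum(p * math.log2(p) for p in probs.values())
    L = sum(probs[s] * len(c) for s, c in codes.items())
    print(round(H, 2), round(L, 2))             # 2.12 2.2
    print("".join(codes[c] for c in "BCCADE"))  # 10000111101110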

  • Slide 43/44

    Limitations

Diverges from the lower limit when the probability of a particular symbol becomes high

    Always uses an integral number of bits

    Must send the code book with the data

    Lowers overall efficiency

    Must determine the frequency distribution

    It must remain stable over the data set


  • Slide 44/44

    Error detection and correction

Error detection is the ability to detect errors that are made due to noise or other impairments during transmission from the transmitter to the receiver.

    Error correction has the additional feature of localizing the errors and correcting them.

    Error detection always precedes error correction.

    (more next week)