Noise Info Theory 2



  • Slide 1/44

Noise, Information Theory, and Entropy

    CS414 Spring 2007, by Roger Cheng, Karrie Karahalios, Brian Bailey

  • Slide 2/44

Communication system abstraction

    [Block diagram: Information source → Encoder → Modulator → Channel → Demodulator → Decoder → Output signal. The information source, encoder, and modulator form the sender side; the demodulator, decoder, and output form the receiver side.]

  • Slide 3/44

    The additive noise channel

Transmitted signal s(t) is corrupted by a noise source n(t), and the resulting received signal is r(t)

    Noise could result from many sources, including electronic components and transmission interference

[Diagram: an adder combines s(t) and n(t) to give r(t) = s(t) + n(t)]
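
    A minimal numerical sketch of this channel in Python (the sine test signal, sample count, and noise level are illustrative assumptions, not values from the slides):

    import numpy as np

    rng = np.random.default_rng(0)

    t = np.linspace(0.0, 1.0, 1000)            # time axis: 1 s, 1000 samples
    s = np.sin(2 * np.pi * 5 * t)              # transmitted signal s(t)
    n = rng.normal(0.0, 0.2, size=t.shape)     # noise n(t) with std dev 0.2
    r = s + n                                  # received signal r(t) = s(t) + n(t)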

  • Slide 4/44

    Random processes

A random variable is the result of a single measurement

    A random process is an indexed collection of random variables, or equivalently a non-deterministic signal that can be described by a probability distribution

    Noise can be modeled as a random process

  • Slide 5/44

  • Slide 6/44

    WGN continued

When an additive noise channel has a white Gaussian noise source, we call it an AWGN channel

    Most frequently used model in communications

Reasons why we use this model:

    It's easy to understand and compute

    It applies to a broad class of physical channels

  • Slide 7/44

    Signal energy and power

Energy is defined as E_x = \int_{-\infty}^{\infty} |x(t)|^2 \, dt

    Power is defined as P_x = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} |x(t)|^2 \, dt

    Most signals are either finite energy and zero power, or infinite energy and finite power

    Noise power is hard to compute in the time domain; the power of WGN is its variance \sigma^2
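
    A short numerical sketch of these definitions (the test signal, duration, and noise variance are assumptions for illustration); power is estimated as the time average of |x(t)|^2:

    import numpy as np

    rng = np.random.default_rng(0)

    dt = 1e-3
    t = np.arange(0.0, 10.0, dt)
    x = np.sin(2 * np.pi * 5 * t)               # finite-power test signal
    noise = rng.normal(0.0, 0.5, size=t.shape)  # WGN with variance 0.25

    power_x = np.mean(np.abs(x) ** 2)           # ~0.5 for a unit-amplitude sine
    power_noise = np.mean(np.abs(noise) ** 2)   # ~0.25, i.e. the noise variance
    print(power_x, power_noise)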

  • Slide 8/44

    Signal to Noise Ratio (SNR)

Defined as the ratio of signal power to the noise power corrupting the signal

    Usually more practical to measure SNR on a dB scale

    Obviously, want as high an SNR as possible
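
    A minimal sketch of the dB conversion, SNR_dB = 10 log10(P_signal / P_noise) (the function name and example arrays are illustrative assumptions):

    import numpy as np

    def snr_db(signal, noise):
        """Signal-to-noise ratio in decibels."""
        p_signal = np.mean(np.abs(signal) ** 2)
        p_noise = np.mean(np.abs(noise) ** 2)
        return 10.0 * np.log10(p_signal / p_noise)

    # Example: a unit-power sine and noise of variance 0.25 give roughly 6 dB.
    rng = np.random.default_rng(0)
    sig = np.sqrt(2) * np.sin(2 * np.pi * 5 * np.linspace(0, 1, 1000))
    print(snr_db(sig, rng.normal(0.0, 0.5, 1000)))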

  • Slide 9/44

    Analog vs. Digital

Analog system: any amount of noise will create distortion at the output

    Digital system: a relatively small amount of noise will cause no harm at all

    Too much noise will make decoding of the received signal impossible

    Both: the goal is to limit the effects of noise to a manageable/satisfactory amount

  • Slide 10/44

    Information theory and entropy

Information theory tries to solve the problem of communicating as much data as possible over a noisy channel

    Measure of data is entropy

    Claude Shannon first demonstrated that reliable communication over a noisy channel is possible (jump-started the digital age)

  • Slide 11/44

Review of Entropy Coding

    Alphabet: finite, non-empty set

    A = {a, b, c, d, e}

    Symbol (S): element from the set

    String: sequence of symbols from A

Codeword: bit sequence representing a coded string, e.g. 0110010111101001010

    p_i: probability of symbol i in the string, with \sum_{i=1}^{N} p_i = 1

    L_i: length of the codeword of symbol i, in bits

  • Slide 12/44

    "The fundamental problem ofcommunication is that of reproducing atone point, either exactly or approximately,a message selected at another point."

    -Shannon, 1944

  • Slide 13/44

    Measure of Information

Information content of symbol s_i (in bits): -\log_2 p(s_i)

    Examples

    p(s_i) = 1 carries no information

    Smaller p(s_i) carries more information, as the symbol was unexpected or surprising

  • Slide 14/44

    Entropy

Weigh the information content of each source symbol by its probability of occurrence; the resulting value is called entropy (H):

    H = -\sum_{i=1}^{n} p(s_i) \log_2 p(s_i)

    Produces a lower bound on the number of bits needed to represent the information with code words

  • Slide 15/44

    Entropy Example

    Alphabet = {A, B}

    p(A) = 0.4; p(B) = 0.6

    Compute Entropy (H)

-0.4 \log_2 0.4 - 0.6 \log_2 0.6 = 0.97 bits

    Maximum uncertainty (largest H) occurs when all probabilities are equal
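
    A minimal Python sketch of this computation (the function name is an illustrative addition, not from the slides); it reproduces H({0.4, 0.6}) ≈ 0.97 bits and the equal-probability maximum:

    import math

    def entropy(probs):
        """Shannon entropy in bits of a discrete distribution."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy([0.4, 0.6]))   # ~0.971 bits
    print(entropy([0.5, 0.5]))   # 1.0 bit, the maximum for two symbols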

  • Slide 16/44

    Entropy definitions

    Shannon entropy

    Binary entropy formula

    Differential entropy
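
    The slide's formulas were not captured in the transcript; the standard textbook forms of these three definitions are:

    H(X)   = -\sum_{x} p(x) \log_2 p(x)                 (Shannon entropy)

    H_b(p) = -p \log_2 p - (1-p) \log_2 (1-p)           (binary entropy)

    h(X)   = -\int f(x) \log f(x) \, dx                 (differential entropy, where f is the pdf of X)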

  • Slide 17/44

    Properties of entropy

Can be defined as the expectation of -log p(x), i.e. H(X) = E[-log p(x)]

    Is not a function of the variable's values; it is a function of the variable's probabilities

    Usually measured in bits (using logs of base 2) or nats (using logs of base e)

    Maximized when all values are equally likely (i.e. a uniform distribution)

    Equal to 0 when only one value is possible

  • Slide 18/44

    Joint and conditional entropy

Joint entropy H(X,Y) is the entropy of the pairing (X,Y)

    Conditional entropy H(X|Y) is the entropy of X if the value of Y is known

    Relationship between the two
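
    The relationship shown on the slide was not captured in the transcript; it is the standard chain rule:

    H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)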

  • Slide 19/44

    Mutual information

Mutual information I(X;Y) is how much information about X can be obtained by observing Y
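
    The slide's formula was not captured; in terms of the entropies above, the standard definition is:

    I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)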

  • Slide 20/44

Mathematical model of a channel

    Assume that our input to the channel is X, and the output is Y

    Then the characteristics of the channel can be defined by its conditional probability distribution p(y|x)

  • Slide 21/44

    Channel capacity and rate

Channel capacity is defined as the maximum possible value of the mutual information

    We choose the best input distribution f(x) to maximize C

    For any rate R < C, we can transmit information with an arbitrarily small probability of error
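
    Written as a formula (not captured in the transcript; this is the standard definition, maximizing over the input distribution f(x)):

    C = \max_{f(x)} I(X;Y)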

  • Slide 22/44

    Binary symmetric channel

    Correct bit transmitted with probability 1-p

    Wrong bit transmitted with probability p

    Sometimes called cross-over probability

    Capacity C = 1 - H(p,1-p)
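
    A small sketch of this capacity formula (illustrative, not from the slides), where H(p, 1-p) is the binary entropy:

    import math

    def binary_entropy(p):
        """H(p, 1-p) in bits."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def bsc_capacity(p):
        """Capacity of a binary symmetric channel with crossover probability p."""
        return 1.0 - binary_entropy(p)

    print(bsc_capacity(0.1))   # ~0.531 bits per channel use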

  • Slide 23/44

    Binary erasure channel

    Correct bit transmitted with probability 1-p

    Erasure transmitted with probability p

    Capacity C = 1 - p

  • Slide 24/44

    Coding theory

Information theory only gives us an upper bound on communication rate

    Need to use coding theory to find a practical method to achieve a high rate

    2 types

    Source coding - compress source data to a smaller size

    Channel coding - add redundancy bits to make transmission across a noisy channel more robust

  • Slide 25/44

Source-channel separation theorem

    Shannon showed that when dealing with one transmitter and one receiver, we can break up source coding and channel coding into separate steps without loss of optimality

    Does not apply when there are multiple transmitters and/or receivers

    Need to use network information theory principles in those cases

  • Slide 26/44

    Coding Intro

Assume alphabet K of {A, B, C, D, E, F, G, H}

    In general, if we want to distinguish n different symbols, we will need to use \log_2 n bits per symbol, i.e. 3 here.

    Can code alphabet K as: A 000, B 001, C 010, D 011, E 100, F 101, G 110, H 111

  • Slide 27/44

    Coding Intro

BACADAEAFABBAAAGAH is encoded as the string of 54 bits

    001000010000011000100000101000001001000000000110000111

    (fixed length code)

  • Slide 28/44

    Coding Intro

With this coding: A 0, B 100, C 1010, D 1011, E 1100, F 1101, G 1110, H 1111

    100010100101101100011010100100000111001111

    42 bits, saves more than 20% in space
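
    A quick sketch (illustrative, not from the slides) that reproduces both encodings of BACADAEAFABBAAAGAH and confirms the 54-bit vs. 42-bit count:

    fixed = {c: format(i, "03b") for i, c in enumerate("ABCDEFGH")}   # A=000 ... H=111
    variable = {"A": "0", "B": "100", "C": "1010", "D": "1011",
                "E": "1100", "F": "1101", "G": "1110", "H": "1111"}

    message = "BACADAEAFABBAAAGAH"
    fixed_bits = "".join(fixed[c] for c in message)
    variable_bits = "".join(variable[c] for c in message)

    print(len(fixed_bits), len(variable_bits))          # 54 42
    print(1 - len(variable_bits) / len(fixed_bits))     # ~0.22, i.e. saves about 22%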

  • Slide 29/44

    Huffman Tree

    A (8), B (3), C(1), D(1), E(1), F(1), G(1), H(1)

  • Slide 30/44

    Huffman Encoding

Use the probability distribution to determine how many bits to use for each symbol

    Higher-frequency symbols are assigned shorter codes

    Entropy-based, block-variable coding scheme

  • Slide 31/44

    Huffman Encoding

Produces a code which uses a minimum number of bits to represent each symbol

    Cannot represent the same sequence using fewer real bits per symbol when using code words

    Optimal when using code words, but this may differ slightly from the theoretical lower limit

    lossless

    Build Huffman tree to assign codes

  • Slide 32/44

    Informal Problem Description

Given a set of symbols from an alphabet and their probability distribution

    assumes distribution is known and stable

    Find a prefix-free binary code with minimum weighted path length

    prefix-free means no codeword is a prefix of any other codeword

  • Slide 33/44

    Huffman Algorithm

    Construct a binary tree of codes

    leaf nodes represent symbols to encode

    interior nodes represent cumulative probability

    edges assigned 0 or 1 output code

    Construct the tree bottom-up

connect the two nodes with the lowest probability until no more nodes remain to connect
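
    A compact Python sketch of this bottom-up construction (an illustrative implementation; the function name and tie-breaking details are assumptions, not course code):

    import heapq

    def huffman_codes(probs):
        """Build a prefix-free code {symbol: bitstring} from symbol probabilities."""
        # Each heap entry is (probability, tie-breaker, {symbol: partial code}).
        heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(sorted(probs.items()))]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            p0, _, left = heapq.heappop(heap)    # the two lowest-probability nodes
            p1, _, right = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in left.items()}
            merged.update({s: "1" + c for s, c in right.items()})
            heapq.heappush(heap, (p0 + p1, counter, merged))
            counter += 1
        return heap[0][2]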

  • Slide 34/44

    Huffman Example

Construct the Huffman coding tree (in class)

    Symbol (S)   P(S)
    A            0.25
    B            0.30
    C            0.12
    D            0.15
    E            0.18
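
    Running the huffman_codes sketch from slide 33 on this distribution (illustrative; the exact 0/1 labels may differ from the in-class tree, but the code lengths match slide 35):

    probs = {"A": 0.25, "B": 0.30, "C": 0.12, "D": 0.15, "E": 0.18}
    codes = huffman_codes(probs)
    print({s: len(c) for s, c in sorted(codes.items())})
    # Code lengths: A 2, B 2, C 3, D 3, E 2 — average 2.27 bits (see slide 38)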

  • Slide 35/44

    Characteristics of Solution

Lowest probability symbol is always furthest from the root

    Assignment of 0/1 to children edges is arbitrary; other solutions are possible, but the code lengths remain the same

    If two nodes have equal probability, can select any two

    Notes: prefix-free code, O(n lg n) complexity

    Symbol (S)   Code
    A            11
    B            00
    C            010
    D            011
    E            10

  • Slide 36/44

    Example Encoding/Decoding

    Encode BEAD

    001011011

    Decode 0101100

Symbol (S)   Code
    A            11
    B            00
    C            010
    D            011
    E            10
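
    A small encode/decode sketch for this table (illustrative, not from the slides); it reproduces BEAD -> 001011011 and decodes 0101100 to CAB:

    code = {"A": "11", "B": "00", "C": "010", "D": "011", "E": "10"}

    def encode(msg):
        return "".join(code[c] for c in msg)

    def decode(bits):
        inverse = {v: k for k, v in code.items()}
        out, buffer = [], ""
        for b in bits:
            buffer += b
            if buffer in inverse:        # prefix-free: first match is a full symbol
                out.append(inverse[buffer])
                buffer = ""
        return "".join(out)

    print(encode("BEAD"))     # 001011011
    print(decode("0101100"))  # CAB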

  • Slide 37/44

    Entropy (Theoretical Limit)

H = -\sum_{i=1}^{N} p(s_i) \log_2 p(s_i)

    H = -0.25 \log_2 0.25 - 0.30 \log_2 0.30 - 0.12 \log_2 0.12 - 0.15 \log_2 0.15 - 0.18 \log_2 0.18

    H = 2.24 bits

    Symbol   P(S)   Code
    A        0.25   11
    B        0.30   00
    C        0.12   010
    D        0.15   011
    E        0.18   10

  • Slide 38/44

    Average Codeword Length

L = \sum_{i=1}^{N} p(s_i) \, \mathrm{codelength}(s_i)

    L = 0.25(2) + 0.30(2) + 0.12(3) + 0.15(3) + 0.18(2)

    L = 2.27 bits

    Symbol   P(S)   Code
    A        0.25   11
    B        0.30   00
    C        0.12   010
    D        0.15   011
    E        0.18   10

  • Slide 39/44

    Code Length Relative to Entropy

Huffman reaches the entropy limit when all probabilities are negative powers of 2

    i.e., 1/2, 1/4, 1/8, 1/16, etc.


  • Slide 40/44

    Example

H = -0.01 \log_2 0.01 - 0.99 \log_2 0.99 = 0.08 bits

    L = 0.01(1) + 0.99(1) = 1 bit

    Symbol   P(S)   Code
    A        0.01   1
    B        0.99   0

  • Slide 41/44

    Exercise

    Compute Entropy (H)

    Build Huffman tree

Compute average code length

    Code BCCADE

Symbol (S)   P(S)
    A            0.1
    B            0.2
    C            0.4
    D            0.2
    E            0.1

  • Slide 42/44

    Solution

Compute Entropy (H): H = 2.1 bits

    Build Huffman tree

    Compute average code length: L = 2.2 bits

    Code BCCADE => 10000111101110

    Symbol   P(S)   Code
    A        0.1    111
    B        0.2    100
    C        0.4    0
    D        0.2    101
    E        0.1    110
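
    A quick check of this solution (illustrative, not from the slides): it gives H ≈ 2.12 bits (rounded to 2.1 on the slide), L = 2.2 bits, and the same encoding of BCCADE:

    import math

    probs = {"A": 0.1, "B": 0.2, "C": 0.4, "D": 0.2, "E": 0.1}
    codes = {"A": "111", "B": "100", "C": "0", "D": "101", "E": "110"}

    H = -sum(p * math.log2(p) for p in probs.values())
    L = sum(probs[s] * len(c) for s, c in codes.items())
    print(round(H, 2), round(L, 2))             # 2.12 2.2
    print("".join(codes[c] for c in "BCCADE"))  # 10000111101110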

  • Slide 43/44

    Limitations

Diverges from the lower limit when the probability of a particular symbol becomes high

    Always uses an integral number of bits

    Must send the code book with the data

    Lowers overall efficiency

    Must determine the frequency distribution

    It must remain stable over the data set


  • Slide 44/44

    Error detection and correction

Error detection is the ability to detect errors that are made due to noise or other impairments during transmission from the transmitter to the receiver.

    Error correction has the additional feature of localizing the errors and correcting them.

    Error detection always precedes error correction.

    (more next week)