
Information theory I
Fisica dell'Energia - a.a. 2015/2016

What is Information

http://www.reddit.com/r/askscience/comments/2ixu0a

Communication

• Communication is the transfer of information from one place to another.

• This should be done
  • as efficiently as possible
  • with as much fidelity/reliability as possible
  • as securely as possible

• Communication system: components/subsystems act together to accomplish information transfer/exchange

Communication

• Verbal communication
  • Spoken communication
  • Languages and dialects

• Written communication
  • Symbols, hieroglyphics, and drawings

Communication

• Smoke signals, telegraph, telephone…

• 1895: invention of the radio by Marconi

• 1901: transatlantic communication

• …

• nowadays: everything communicates, Internet of Things (IoT)

Communication system

A Mathematical Theory of Communication - By C. E. SHANNON

Communication system

• The message produced by a source must be converted by a transducer to a form suitable for the particular type of communication system.

• Example: In electrical communications, speech waves are converted by a microphone to voltage variation.


Communication system

• The transmitter processes the input signal to produce a signal suited to the characteristics of the transmission channel.

• Signal processing for transmission almost always involves modulation and may also include coding. In addition to modulation, other functions performed by the transmitter are amplification, filtering and coupling the modulated signal to the channel.


Communication system

• The receiver’s function is to extract the desired signal from the received signal at the channel output and to convert it to a form suitable for the output transducer.

• Other functions performed by the receiver: amplification (the received signal may be extremely weak), demodulation and filtering.


Communication system

• The output transducer converts the electric signal at its input into the form desired by the system user.

• Examples: loudspeakers, personal computers (PCs), tape recorders.


Communication system

• The channel can have different forms: the atmosphere (or free space), coaxial cable, fiber optic, waveguide, etc…

• The signal undergoes some amount of degradation from noise, interference and distortion


Communication system: frequency bands

Wavelength | Frequency designation       | Frequency | Representative applications
1 cm       | Extra High Frequency (EHF)  | 100 GHz   |
10 cm      | Super High Frequency (SHF)  | 10 GHz    | Satellite, microwave relay, earth-satellite radar
1 m        | Ultra High Frequency (UHF)  | 1 GHz     | Wireless comm. service, cellular, pagers, UHF TV
10 m       | Very High Frequency (VHF)   | 100 MHz   | Mobile, aeronautical, VHF TV and FM, mobile radio
100 m      | High Frequency (HF)         | 10 MHz    | Amateur radio, civil defense
1 km       | Medium Frequency (MF)       | 1 MHz     | AM broadcasting
10 km      | Low Frequency (LF)          | 100 kHz   |
100 km     | Very Low Frequency (VLF)    | 10 kHz    | Aeronautical, submarine cable, navigation, transoceanic radio

Transmission media spanning these bands: waveguide, coaxial cable, wire pairs.
Propagation modes spanning these bands: line-of-sight radio, sky-wave radio, ground-wave radio.

Line-of-sight (LOS) propagation

• Transmitting and receiving antennas must be within line of sight

• Examples: Satellite communication, Ground communication

Sky wave (ionospheric) propagation

• Signal reflected from an ionized layer of the atmosphere. The signal can travel a number of hops, bouncing back and forth between the ionosphere and the earth's surface

• Example: shortwave radio

Ground wave propagation

• Follows the contour of the earth; can propagate over considerable distances

• Frequencies up to 2 MHz. Example: AM radio

Signal

• The information to be transmitted can be encoded by modulating the amplitude (AM) or the frequency (FM) of a signal

• According to Fourier analysis, any composite signal is a combination of simple sine waves with different frequencies, amplitudes, and phases

• The information transmission rate is limited by the transmitter, the medium and the receiver

Harry Nyquist

• Johnson-Nyquist noise

• Telegraphy

• Facsimile (Fax)

• Television

Harry Nyquist

• Determined that the number of independent pulses that could be put through a telegraph channel per unit time is limited to twice the bandwidth of the channel

• Certain Factors Affecting Telegraph Speed (1924)

• Certain Topics in Telegraph Transmission Theory (1928)

• This rule is essentially a dual of what is now known as the Nyquist–Shannon sampling theorem

Nyquist–Shannon sampling theorem

• From continuous signal to discrete signal

• Sampling is the process of converting a signal into a numeric sequence

• Applies to signals whose Fourier transform is zero outside of a finite region of frequencies

• The fidelity of the result depends on the rate at which the original signal is sampled

• Provided the sampling rate is high enough, no actual information is lost during the sampling process

Nyquist–Shannon sampling theorem

• If a function x(t) contains no frequencies higher than B cps (Hz), it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart

• A sufficient sample-rate is therefore 2B samples/second, or anything larger

• For a given sample rate fs the bandlimit for perfect reconstruction is B ≤ fs/2

• 2B: Nyquist rate

• fs/2: Nyquist frequency
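The reconstruction implied by the theorem is the Whittaker–Shannon interpolation formula, x(t) = Σ_n x(nT) sinc((t − nT)/T) with T = 1/(2B). A minimal numpy sketch of it (the test signal, bandlimit and sample window below are made-up example values, not from the slides):

```python
import numpy as np

B = 3.0                      # assumed bandlimit of the signal, in Hz
fs = 2 * B                   # Nyquist rate: 2B samples per second
T = 1.0 / fs                 # sample spacing 1/(2B) seconds

def x(t):
    # a band-limited test signal: two sinusoids below B Hz
    return np.sin(2 * np.pi * 1.0 * t) + 0.5 * np.cos(2 * np.pi * 2.5 * t)

n = np.arange(-200, 201)     # finite window of sample indices
samples = x(n * T)

def x_hat(t):
    # Whittaker–Shannon interpolation from the samples
    return np.sum(samples * np.sinc((t - n * T) / T))

t0 = 0.123
print(x(t0), x_hat(t0))      # the two values agree up to truncation error
```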

Aliasing

https://www.youtube.com/watch?v=vIsS4TP73AU

Two different sinusoids that fit the same set of samples.
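The figure with the two sinusoids is not reproduced here; a small numeric check (with assumed example frequencies) makes the same point: a 9 Hz sine sampled at 8 Hz yields exactly the same samples as a 1 Hz sine.

```python
import numpy as np

fs = 8.0                              # sampling rate (Hz), below the Nyquist rate for 9 Hz
t = np.arange(32) / fs                # sample instants

low  = np.sin(2 * np.pi * 1.0 * t)    # 1 Hz sine
high = np.sin(2 * np.pi * 9.0 * t)    # 9 Hz sine, aliases onto 1 Hz

print(np.allclose(low, high))         # True: the samples are indistinguishable
```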

Aliasing: critical frequency

Measuring information

Telegraphy

• Telegraphy (from Greek tele, "at a distance", and graphein, "to write")

• Long-distance transmission of textual/symbolic messages

• The method used for encoding the message must be known to both sender and receiver

• Even e-mail is an example of telegraphy

Morse code

Measuring information

• s: symbol rate (number of symbols per second)

• n: number of states (binary, decimal, …)

• n^s: possible messages per unit time

• The problem is to estimate the quantity of information carried by a message (see the sketch below)
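A quick numeric illustration of how the count of possible messages grows, and why a logarithm is the natural measure (the values of n and s are made-up examples):

```python
import math

n = 2      # number of states per symbol (binary)
s = 10     # symbols per unit time

messages = n ** s                 # n^s possible messages per unit time
print(messages)                   # 1024

# a logarithm turns this multiplicative count into an additive measure
print(math.log2(messages))        # 10.0, i.e. s * log2(n)
```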

6 flips of a coin

Reduction to YES or NO answers

Alice holds the sequence HTTHHT. For each flip, Bob asks "is it T?" and Alice answers Yes or No; one question per flip, and Bob reconstructs the sequence symbol by symbol: H, HT, HTT, HTTH, HTTHH, HTTHHT.

Transmission of 6 symbols requires 6 questions

Now consider a word composed of 6 characters.

Reduction to YES or NO answers

Alice holds the word "ginger". For each character, Bob asks in turn "is it 'a'?", "is it 'b'?", "is it 'c'?", …

Inefficient! A maximum of 26 questions per character, roughly 13 on average (if the character outcomes are i.i.d.)

Reduction to YES or NO answers

ABCDEFGHIJKLMNOPQRSTUVWXYZis it lesser than “N”?

ABCDEFGHIJKLMNOPQRSTUVWXYZis it lesser than “F”?

ABCDEFGHIJKLMNOPQRSTUVWXYZis it lesser than “J”?

ABCDEFGHIJKLMNOPQRSTUVWXYZis it lesser than “H”?

ABCDEFGHIJKLMNOPQRSTUVWXYZ

after maximum 5 questions we correctly individuate the character
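A minimal sketch of the bisection strategy in plain Python, counting the yes/no questions (the helper name is made up for illustration):

```python
import string

def find_letter(target):
    # narrow the candidate range by asking "is it less than letters[mid]?"
    letters = string.ascii_uppercase
    lo, hi = 0, len(letters)          # candidates are letters[lo:hi]
    questions = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        questions += 1                # one yes/no question
        if target < letters[mid]:
            hi = mid                  # answer: yes
        else:
            lo = mid                  # answer: no
    return letters[lo], questions

print(find_letter("G"))                                        # ('G', 4)
print(max(find_letter(c)[1] for c in string.ascii_uppercase))  # 5 questions at most
```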

Minimum number of questions

• 2^(# questions) = 26 (for an alphabet character)

• # questions = log2(26) ≈ 4.7 expected number of questions

• for a word composed of 6 characters, 6 × 4.7 = 28.2 questions are needed (checked in the snippet below)
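The same arithmetic, checked in a couple of lines:

```python
import math

q = math.log2(26)    # questions needed to pin down one of 26 letters
print(q)             # ≈ 4.70
print(6 * q)         # ≈ 28.2 questions for a 6-character word
```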

Reduction to YES or NO answers

• Rationale: at each iteration, reduce the size of the candidate set by one half

• Build a decision tree whose leaves are the available symbols

• The maximum number of questions equals the height of the tree

Decision tree

• The height of the tree is given by log2(# leaves)

Reduction to YES or NO answers

• What about a poker hand?
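One way to answer the slide's question with the same log2 counting (a sketch, not part of the original slides):

```python
import math

hands = math.comb(52, 5)     # number of distinct 5-card poker hands
print(hands)                 # 2598960
print(math.log2(hands))      # ≈ 21.3 yes/no questions (bits) to identify one hand
```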

Ralph Hartley

• R. Hartley was an electronics researcher

• Contributed to the foundations of information theory

• The hartley, a unit of information equal to one decimal digit, is named after him


A mathematical theory of communication by Claude Shannon

Information source

• How is an information source to be described mathematically?

• How much information in bits per second is produced in a given source?

How is an information source to be described mathematically?

In telegraphy, for example, the messages to be transmitted consist of sequences of letters. These sequences, however, are not completely random. In general, they form sentences and have the statistical structure of, say, English. The letter E occurs more frequently than Q, the sequence TH more frequently than XP, etc. The existence of this structure allows one to make a saving in time (or channel capacity) by properly encoding the message sequences into signal sequences.

How is an information source to be described mathematically?

We can think of a discrete source as generating the message, symbol by symbol. It will choose successive symbols according to certain probabilities depending, in general, on preceding choices as well as the particular symbols in question. A physical system, or a mathematical model of a system which produces such a sequence of symbols governed by a set of probabilities, is known as a stochastic process.

The series of approximations to English

Stochastic process which generates a sequence of symbols

Using the same five letters (A, B, C, D, E), let the probabilities be .4, .1, .2, .2, .1, respectively, with successive choices independent. A typical message from this source is then:

• A A A C D C B D C E A A D A D A C E D A

• E A D C A B E D A D D C E C A A A A A D
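A minimal sketch reproducing this independent-choice source (numpy; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
symbols = list("ABCDE")
probs = [0.4, 0.1, 0.2, 0.2, 0.1]    # probabilities from the slide

# 20 successive symbols chosen independently with these probabilities
message = "".join(rng.choice(symbols, size=20, p=probs))
print(message)
```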

Stochastic process which generates a sequence of symbols

A more complicated structure is obtained if successive symbols are not chosen independently, but their probabilities depend on preceding letters.

In the simplest case of this type a choice depends only on the preceding letter and not on the ones before that.

The statistical structure can then be described by a set of transition probabilities p_i(j), the probability that letter i is followed by letter j (see the sketch below).
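A sketch of the same idea with transition probabilities p_i(j); the matrix values below are made up for illustration, not Shannon's:

```python
import numpy as np

rng = np.random.default_rng(1)
symbols = "ABC"
# hypothetical transition probabilities: row i = current letter,
# entry j = probability that letter j comes next
P = np.array([[0.1, 0.6, 0.3],
              [0.5, 0.2, 0.3],
              [0.4, 0.4, 0.2]])

state = 0                                # start from "A"
out = [symbols[state]]
for _ in range(19):
    state = rng.choice(3, p=P[state])    # next letter depends only on the current one
    out.append(symbols[state])
print("".join(out))
```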

Choice, Uncertainty and Entropy

• We have represented a discrete information source as a Markoff process. Can we define a quantity which will measure, in some sense, how much information is “produced” by such a process, or better, at what rate information is produced?

• Suppose we have a set of possible events whose probabilities of occurrence are p1 ; p2 ;…; pn. These probabilities are known but that is all we know concerning which event will occur. Can we find a measure of how much “choice” is involved in the selection of the event or of how uncertain we are of the outcome?

Choice, Uncertainty and Entropy

Theorem 2: such a measure has the form

H = -K \sum_{i=1}^{n} p_i \log p_i

where the constant K merely amounts to a choice of a unit of measure
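A direct transcription of this definition into a few lines of Python (K and the logarithm base are left as parameters; the ABCDE distribution from the earlier slide is used as a test):

```python
import math

def H(probs, K=1.0, base=2):
    # Shannon entropy: H = -K * sum_i p_i * log_b(p_i)
    return -K * sum(p * math.log(p, base) for p in probs if p > 0)

print(H([0.4, 0.1, 0.2, 0.2, 0.1]))   # ≈ 2.12 bits per symbol
```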

Shannon entropy characteristics

• Continuity: the measure should be continuous, so that changing the values of the probabilities by a very small amount should only change the entropy by a small amount.

• Symmetry: the measure should be unchanged if the outcomes are re-ordered

Hn (p1, p2, . . .) = Hn (p2, p1, . . .)

Shannon entropy characteristics

• Additivity: the amount of entropy should be independent of how the process is regarded as being divided into parts. If p1 and p2 are independent:

Hn(p1, p2) = Hn(p1) + Hn(p2)

Shannon entropy characteristics

• Maximum: the measure should be maximal if all the outcomes are equally likely (uncertainty is highest when all possible events are equiprobable). For equiprobable events the entropy should increase with the number of outcomes:

H_n\left(\underbrace{\tfrac{1}{n}, \dots, \tfrac{1}{n}}_{n}\right) = \log_b(n) < \log_b(n+1) = H_{n+1}\left(\underbrace{\tfrac{1}{n+1}, \dots, \tfrac{1}{n+1}}_{n+1}\right)

H_n(p_1, \dots, p_n) \le H_n\left(\tfrac{1}{n}, \dots, \tfrac{1}{n}\right) = \log_b(n)
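A quick check of the maximum property, using the H function sketched above:

```python
print(H([0.25, 0.25, 0.25, 0.25]))    # 2.0 bits = log2(4), the maximum for 4 outcomes
print(H([0.7, 0.1, 0.1, 0.1]))        # ≈ 1.36 bits, strictly smaller
```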

Entropy in the case of two possibilities

• Entropy in the case of two possibilities with probabilities p and q = 1 - p
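The curve normally shown here is the binary entropy function; with K = 1 and base-2 logarithms it reads

H(p) = -p \log_2 p - q \log_2 q = -p \log_2 p - (1 - p) \log_2 (1 - p)

which is 0 for p = 0 or p = 1 and reaches its maximum of 1 bit at p = q = 1/2.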

Choice, Uncertainty and Entropy

• Let’s suppose that all symbols are equiprobable and independent with probability pi=1/q (q symbols)

the entropy of a message can be written as

H = -K \sum_i p_i \log p_i

H = -K N \sum_{i=1}^{q} \frac{1}{q} \log\left(\frac{1}{q}\right) = -K N \, q \left(\frac{1}{q}\right) \log\left(\frac{1}{q}\right) = K N \log(q)

Choice, Uncertainty and Entropy

• If the number of symbols is equal to 2 (binary system) and assuming K = 1 (with base-2 logarithms), the entropy of the message coincides with its length:

H = K N \log_2(q) = N \log_2(2) = N

Shannon entropy

• Quantitative measure of information

• Quantitative measure of "surprise"

Unit of measure

• Depending on the base b of the logarithm (and on the constant K), the unit of measure of information changes:

• b = 2 -> bit

• b = e -> nat

• b = 10 -> dit (or hartley)
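Continuing the small Python sketch from earlier (same H function and math import), the same distribution measured in the three units; they differ only by constant factors, e.g. 1 hartley = log2(10) ≈ 3.32 bits:

```python
p = [0.4, 0.1, 0.2, 0.2, 0.1]
print(H(p, base=2))         # ≈ 2.12 bits
print(H(p, base=math.e))    # ≈ 1.47 nats
print(H(p, base=10))        # ≈ 0.64 hartleys (dits)
```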