Bioinformatics lectures at Rice University Lecture 4: Shannon entropy and mutual information


Page 1:

Bioinformatics lectures at Rice University

Lecture 4: Shannon entropy and mutual information

Page 2:

-- from Science 16 December 2011

Page 3:

The definition of Shannon entropy

In information theory, entropy is a measure of the uncertainty associated with a random variable. In this context, the term usually refers to the Shannon entropy, which quantifies the expected value of the information contained in a message, usually in units such as bits. Here, a 'message' means a specific realization of the random variable.

Equivalently, the Shannon entropy is a measure of the average information content one is missing when one does not know the value of the random variable. The concept was introduced by Claude E. Shannon in his 1948 paper "A Mathematical Theory of Communication".
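
To make the definition concrete, here is a minimal sketch (not from the slides) of computing H for a discrete distribution in bits; the function name and the example probabilities are illustrative only:

```python
import numpy as np

def shannon_entropy(p, base=2):
    """Shannon entropy of a discrete probability distribution p (bits by default)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                          # outcomes with p = 0 contribute nothing
    return -np.sum(p * np.log(p)) / np.log(base)

print(shannon_entropy([0.5, 0.5]))        # fair coin: 1.0 bit per toss
print(shannon_entropy([0.99, 0.01]))      # heavily biased coin: ~0.08 bits per toss
```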

Page 4:
Page 5:

How do we measure information in a message?

Definition of a message: a string of symbols. The following was Shannon's argument:

From Shannon's 'A Mathematical Theory of Communication'
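
Treating a message as a string of symbols, its information content per symbol can be estimated from the empirical symbol frequencies. A small sketch along those lines (the example strings anticipate the comparison made on a later slide):

```python
from collections import Counter
from math import log2

def message_entropy(message):
    """Entropy in bits per symbol of the empirical symbol distribution of a message."""
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(message_entropy("010111000111"))   # two symbols: just under 1 bit/symbol
print(message_entropy("qwertasdfg12"))   # twelve distinct symbols: log2(12) ~ 3.58 bits/symbol
```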

Page 6:

Shannon entropy was established in the context of telegraph communication

Page 7:
Page 8:
Page 9:

Shannon's argument for calling H the entropy:

Page 10:

Some properties of H:

• The amount of entropy is not always an integer number of bits.

• Many data bits may not convey information. For example, data structures often store information redundantly, or have identical sections regardless of the information in the data structure.

• For a message of n characters, H is larger when a larger character set is used. Thus '010111000111' carries less information than 'qwertasdfg12'.

• However, if a character is rarely used, its contribution to H is small, because p·log(p) → 0 as p → 0. Likewise, if one character makes up the vast majority of the message, e.g. the 1s in '10111111111011', its contribution to H is small, because p·log(p) → 0 as p → 1.

• For a random variable with n outcomes, H reaches its maximum when the outcomes are all equally probable, i.e. each has probability 1/n, and then H = log(n) (see the sketch after this list).

• When x is a continuous variable with a fixed variance, Shannon proved that H reaches its maximum when x follows a Gaussian distribution.
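
A quick numerical check of the uniform-maximum property, with n = 4 outcomes (the probabilities are chosen only for illustration):

```python
import numpy as np

def H(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

uniform = np.full(4, 0.25)
skewed = np.array([0.85, 0.05, 0.05, 0.05])

print(H(uniform))   # log2(4) = 2.0, the maximum for 4 outcomes
print(H(skewed))    # ~0.85, well below the maximum
```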

Page 11:

Shannon’s proof:

Page 12:

Summary

Or simply:
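
Written out, the quantity being summarized is the standard Shannon entropy

H(X) = -\sum_i p_i \log_2 p_i

where the sum runs over the possible outcomes and p_i is the probability of outcome i.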

Note that:

• H ≥ 0
• Σᵢ pᵢ = 1
• H is larger when there are more possible states.
• H can be computed whenever a probability distribution p is available.

Page 13:

Mutual information

Mutual information (MI) is a measure of the dependence between two random variables.

The concept was introduced by Shannon in 1948 and has since become widely used in many different fields.

Page 14:

Formulation of mutual information
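
For two discrete random variables X and Y, the standard formulation is

I(X;Y) = \sum_x \sum_y p(x,y) \log_2 \frac{p(x,y)}{p(x)\,p(y)}

where p(x,y) is the joint distribution and p(x), p(y) are the marginals.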

Page 15:

Meaning of MI
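
As an illustration of what MI captures (a sketch, not from the slides): MI is zero when the two variables are independent and positive when knowing one variable reduces uncertainty about the other. The function and the two example joint tables below are illustrative only:

```python
import numpy as np

def mutual_information(joint, base=2):
    """MI in bits of two discrete variables, given their joint probability table."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)    # marginal of X (rows)
    py = joint.sum(axis=0, keepdims=True)    # marginal of Y (columns)
    mask = joint > 0
    return np.sum(joint[mask] * np.log(joint[mask] / (px @ py)[mask])) / np.log(base)

# Independent: the joint table is the outer product of the marginals, so MI = 0.
independent = np.outer([0.5, 0.5], [0.3, 0.7])
# Perfectly dependent: knowing one variable determines the other, so MI = 1 bit here.
dependent = np.array([[0.5, 0.0],
                      [0.0, 0.5]])

print(mutual_information(independent))   # ~0.0 (up to floating-point error)
print(mutual_information(dependent))     # 1.0
```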

Page 16:

MI and H
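
The standard identities linking MI to the entropies defined earlier are

I(X;Y) = H(X) + H(Y) - H(X,Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)

so MI is the reduction in uncertainty about one variable obtained by observing the other; it is symmetric, non-negative, and zero exactly when X and Y are independent.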

Page 17:

Properties of MI

Connection between correlation and MI.
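
One standard point of connection: if X and Y are jointly Gaussian with correlation coefficient ρ, then

I(X;Y) = -\frac{1}{2} \ln(1 - ρ^2)   (in nats)

so MI is zero exactly when ρ = 0 and grows without bound as |ρ| → 1. Unlike the correlation coefficient, MI also detects nonlinear dependence.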

Page 18:

Example of MI application

[Figure] Home advantage: estimated MI between the goals a team scored in a game and whether the team was playing at home or away. The heights of the grey bars give the approximate 95% points of the null distribution; the Canada value is drawn below the line because it would otherwise have been hidden by the grey shading above it.
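
A sketch of how such an analysis could be set up, with synthetic data standing in for the real goal counts; the plug-in estimator, the Poisson scoring rates, the sample size, and the permutation-based 95% null threshold are all assumptions for illustration, not details taken from the original analysis:

```python
import numpy as np

rng = np.random.default_rng(0)

def mi_from_samples(x, y, base=2):
    """Plug-in MI estimate in bits from paired samples of two discrete variables."""
    xi = np.unique(x, return_inverse=True)[1]
    yi = np.unique(y, return_inverse=True)[1]
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return np.sum(joint[mask] * np.log(joint[mask] / (px @ py)[mask])) / np.log(base)

# Synthetic stand-in for the real data: a home/away flag and goals per game,
# with home games drawn from a slightly higher-scoring Poisson distribution.
home = rng.integers(0, 2, size=500)
goals = rng.poisson(lam=np.where(home == 1, 1.6, 1.2))

observed = mi_from_samples(goals, home)

# Permutation null: shuffling the home/away labels destroys any real dependence,
# so the 95% quantile of the shuffled MIs plays the role of the grey bars above.
null = np.array([mi_from_samples(goals, rng.permutation(home)) for _ in range(1000)])
threshold = np.quantile(null, 0.95)

print(observed, threshold, observed > threshold)
```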

Page 19:

Reference reading

A correlation for the 21st century. Terry Speed. Commentary on MIC. http://www.sciencemag.org/content/334/6062/1502.full

Detecting Novel Associations in Large Data Sets. Reshef et al., Science, December 2011.

Some data analyses using mutual information. David Brillinger. http://www.stat.berkeley.edu/~brill/Papers/bjps1.pdf