Large Deviations: An Introduction
DESCRIPTION
This is a talk on Large Deviations I gave as part of the PSS at the University of Bath.

TRANSCRIPT
Ladies and gentlemen…
The whole purpose of these slides is to let you know what Large Deviations are.
So, without further ado…
Go to the PSS talk, they said…It’s going to be fun, they said…
It’s not as tough as it seems…
Let’s first do an example to set the mood…And the simplest examples always have to do with tossing coins!
The fancy way of saying ‘tossing a coin’ used by probabilists is ‘let X be a Bernoulli random variable’.
The probability mass function of a Bernoulli random variable is given by the function

P(X = 1) = p, P(X = 0) = 1 - p, for some p in [0, 1].

Or alternatively:

P(X = k) = p^k (1 - p)^(1 - k), for k in {0, 1}.
One coin? Seriously, one coin?... That’s right, you guessed correctly... Probabilists (like other common human beings) prefer loads of coins rather than just one…
So that we don’t get lost, we assign numbers to our coins from 1 to n, and call the result of the k-th coin X_k, mathematically:
And assume coins are independent from each other.
We are interested in studying the average “random behaviour” (which probabilists call distribution) of throwing all coins.
If we throw 3 coins, we can get 1 of 8 possible results:
Which we encode as vectors of 0s and 1s, for example:
001
110
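For the curious, the eight outcomes are easy to enumerate in code (a throwaway Python sketch, not part of the original slides):

```python
# Enumerate all 2^3 = 8 outcomes of tossing 3 coins, encoded as 0/1 vectors.
from itertools import product

outcomes = list(product([0, 1], repeat=3))
for o in outcomes:
    print("".join(map(str, o)))  # prints 000, 001, ..., 111
```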
This simple strategy has saved probabilists from using actual coins for centuries! But we want to look at the average (which we call the mean):

(number of heads) / (number of coins),

which for 3 coins takes values like 1/3 or 2/3. Mathematically, the average of n tosses is the sum of the tosses over the number of coins:

S_n / n = (X_1 + X_2 + … + X_n) / n.
It so happens that we can find an explicit formula for the probability of throwing k heads in n tosses (this is called a Binomial distribution with parameters n and p):

P(S_n = k) = (n choose k) p^k (1 - p)^(n - k).
Therefore, applying Stirling’s formula to the binomial coefficient:

P(S_n / n ≈ x) ≈ e^{-n I(x)}, where I(x) = x log(x/p) + (1 - x) log((1 - x)/(1 - p)).

And this is precisely the main idea: probabilities of unlikely averages decay exponentially in n, at a speed governed by the mythical rate function.
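We can check this claim numerically (a sketch with illustrative choices p = 1/2 and x = 3/4; the helper names below are mine, not from the slides): using the exact Binomial probabilities, -(1/n) log P(S_n = nx) approaches the Bernoulli rate function as n grows.

```python
import math

p = 0.5  # fair coin (illustrative choice)

def rate(x):
    # Bernoulli rate function I(x) = x log(x/p) + (1-x) log((1-x)/(1-p))
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

def neg_log_prob(n, x):
    # -(1/n) log P(S_n = nx), computed from the exact Binomial pmf
    k = round(n * x)
    log_p = math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)
    return -log_p / n

for n in (100, 1000, 10000):
    print(n, neg_log_prob(n, 0.75), rate(0.75))  # first column approaches the second
```

The gap between the two columns shrinks like (log n)/n, which is exactly the sub-exponential correction Stirling’s formula sweeps under the rug.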
Let’s revisit the definition of a large deviation principle. There are 3 things to consider:

1. The upper bound: limsup_{n} (1/n) log P(X_n in C) <= -inf_{x in C} I(x) for every closed set C.
2. The lower bound: liminf_{n} (1/n) log P(X_n in O) >= -inf_{x in O} I(x) for every open set O.
3. The rate function itself.

Basically we need to know what a rate function is: a function I taking values in [0, ∞] whose level sets {x : I(x) <= c} are closed (in other words, I is lower semicontinuous). We say it is a good rate function if these sets are not only closed, but also compact.
[Plots: the Bernoulli rate function for p = 0.5, and for p = 0.25 and p = 0.75.]
Now see the upper and lower bounds at work. Imagine that B is a set that is both open and closed; then the two inequalities read:

-inf_{x in B} I(x) <= liminf_{n} (1/n) log P(X_n in B),
limsup_{n} (1/n) log P(X_n in B) <= -inf_{x in B} I(x).

But the liminf is always less than or equal to the limsup, therefore the limit exists:

lim_{n} (1/n) log P(X_n in B) = -inf_{x in B} I(x).

Comparing with what we got from the Bernoulli example, this is exactly the exponential decay rate we computed with Stirling’s formula.
That’s basically the idea. The formal definition talks about the probability of a set rather than a single element, and then takes the infimum of the rate function over that set.
As Frank den Hollander puts it:
Any large deviation is done in the least unlikely of all the unlikely ways!
Fortunately, we don’t need to extract the linear-in-n component of the exponential form of the distribution by hand every time. The theory has developed other techniques.
We traded a manipulation of the actual distribution, which depends on the whole sequence, for an optimisation problem involving a function that depends only on the distribution of a single random variable. For our example:

1. Calculate the moment generating function (a.k.a. the Laplace transform of the density).
2. Find the Legendre transform of the logarithm of the moment generating function (a.k.a. find the rate function).
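These two steps can be carried out numerically. A sketch for the Bernoulli case (p = 1/2 and the helper names are illustrative choices of mine): compute Λ(λ) = log E[e^{λX}] = log(1 - p + p e^λ), take its Legendre transform by maximising λx - Λ(λ), and compare with the closed-form rate function.

```python
import math

p = 0.5  # illustrative choice

def cgf(lam):
    # Logarithmic moment generating function of Bernoulli(p)
    return math.log(1 - p + p * math.exp(lam))

def legendre(x, lo=-20.0, hi=20.0, iters=200):
    # I(x) = sup over lam of { lam*x - cgf(lam) }; the objective is concave
    # in lam, so a ternary search over a wide bracket finds the supremum.
    f = lambda lam: lam * x - cgf(lam)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f((lo + hi) / 2)

def rate(x):
    # Closed form from the Stirling computation
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

for x in (0.25, 0.6, 0.9):
    print(x, legendre(x), rate(x))  # the two columns agree
```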
And there’s more!
In our example, we found an LDP for the unbiased estimator of the mean. But what if we want the variance estimator?
We can see, after some algebraic manipulations, that the variance estimator is a function of the mean estimator, so we can define our transformation accordingly and apply the contraction principle. Since this function is linear, its inverse is again a function, so we need no infimum: we find the rate function immediately.
And we can plot both rate functions… have you noticed the zeros?
[Plots: the rate function for the mean estimator and for the unbiased variance estimator.]
If the good rate function has a unique minimiser, then the sequence converges in probability to that minimiser.
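This connection is easy to see in the coin example (a toy check; p = 1/2 and the random seed are illustrative choices of mine): the Bernoulli rate function vanishes only at x = p, and that is exactly where the sample mean concentrates.

```python
import math
import random

p = 0.5
random.seed(0)

def rate(x):
    # Bernoulli rate function; its unique zero (= minimiser) is at x = p
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

print(rate(p))  # 0.0 -- the unique minimiser

n = 100_000
mean = sum(random.random() < p for _ in range(n)) / n
print(mean)  # close to p = 0.5, as the law of large numbers promises
```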
Ok, cool… but…
It so happens that there actually exist problems that require knowing the probability of rare events. Here are just a few…
Risk management deals with the identification, assessment, and prioritisation of risks followed by an application of resources to control the impact of unfortunate events.
Consider N obligors to which a portfolio is exposed. Let Y_k be the default indicator of the k-th obligor and c_k the loss resulting from the default of the k-th obligor.
Then the total loss from defaults is

L = c_1 Y_1 + c_2 Y_2 + … + c_N Y_N.

And risk managers are interested in finding the tail probabilities, that is,

P(L > x) for large thresholds x.
We can see how these sort of problems may arise in insurance and financial problems.
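As a toy illustration (every number here is an assumption of mine: 100 obligors, unit exposures, independent defaults with probability 0.05), a crude Monte Carlo estimate of such a tail probability looks like this:

```python
import random

random.seed(1)
N = 100              # obligors (assumed)
q = 0.05             # default probability per obligor (assumed)
c = [1.0] * N        # loss per default; unit exposures (assumed)
threshold = 10.0     # tail level: twice the expected loss E[L] = N*q = 5

trials = 200_000
hits = 0
for _ in range(trials):
    loss = sum(ck for ck in c if random.random() < q)
    hits += loss > threshold
print(hits / trials)  # Monte Carlo estimate of P(L > 10)
```

For much larger thresholds the event becomes so rare that plain Monte Carlo returns zero hits; that is precisely where large-deviation estimates, and the importance sampling schemes built on them, earn their keep.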
Information theory is a branch of applied mathematics, electrical engineering, and computer science involving the quantification of information.
The setting is as follows: consider a finite alphabet {x_1, …, x_n} and let X = (X_1, …, X_N) be a word of N of those characters, each drawn independently and identically distributed from a probability measure m.
In the word X, define the empirical frequency of the character x:

L_N(x) = (1/N) #{ k <= N : X_k = x }.

Then the vector of all n of these frequencies is called the empirical measure, because it can be identified with a probability measure on the alphabet.
One of the key concepts in Information Theory is the relative entropy between two probability measures ν and m:

H(ν | m) = Σ_x ν(x) log( ν(x) / m(x) ).
The magic comes here. Sanov’s theorem says that the empirical measure satisfies an LDP with rate function H(· | m):

P(L_N ≈ ν) ≈ e^{-N H(ν | m)}.

So this relative entropy, which quantifies the amount of information needed to describe ν given m, is the rate function of an LDP for the empirical measure!
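We can verify Sanov’s decay rate exactly for a small alphabet (the 3-letter alphabet and both measures below are illustrative choices of mine, not from the talk): the exact multinomial probability of observing empirical measure ν decays at rate H(ν | m).

```python
import math

mu = [0.5, 0.3, 0.2]  # the source measure m (illustrative)
nu = [0.2, 0.3, 0.5]  # the target empirical measure (illustrative)

def rel_entropy(nu, mu):
    # H(nu | mu) = sum over x of nu(x) log(nu(x)/mu(x))
    return sum(a * math.log(a / b) for a, b in zip(nu, mu) if a > 0)

def neg_log_prob(N):
    # -(1/N) log P(empirical measure of an N-letter word equals nu),
    # using the exact multinomial probability
    counts = [round(N * a) for a in nu]
    log_p = math.log(math.factorial(N))
    for k, b in zip(counts, mu):
        log_p += k * math.log(b) - math.log(math.factorial(k))
    return -log_p / N

for N in (100, 1000, 10000):
    print(N, neg_log_prob(N), rel_entropy(nu, mu))  # first column -> second
```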
Moreover, from Sanov’s theorem we can prove Cramér’s theorem (but that I leave as an exercise to the reader…)
Consider a collection of n particles that interact with one another through some forces.
Assume that this particle system can be in any of a finite set of configurations. As we increase the number of particles n and rescale appropriately, we can recover the macroscopic behaviour from the microscopic behaviour via the thermodynamic limit. At this point a large deviation question should be almost natural to ask…
It so happens that my research is precisely done on large deviations of hydrodynamic limits of a particular interacting particle system… But that is another story and shall be told another time.
Hydrodynamic limit
Large Deviations

Finally! I’ve been doing some serious probability and finally grasp the concept of LDP!
Turns out one just has to find the exponential decay of the probabilities of unlikely events!
Sounds tough, though. What if I have an LDP for a sequence but I need a transformation?
Well, there are other techniques, like Cramér’s Theorem or the Contraction Principle.
Interesting! How do you do that?
Fascinating! So it doesn’t just work for coin tossing, right?
Right! There are other applications too!
Important questions arise from risk management and information theory.
And why do you need to know this stuff?
Me? Well…
Actually I just thought it would be awesome to be the cool slide of a lame probability talk. Nothing fancy, really.
Go use these awesome fonts at www.cooltext.com
Read more Dinosaur comics at http://www.qwantz.com
And, of course, go read some more on this fascinating topic: