18.175: Lecture 13: More large deviations
math.mit.edu/~sheffield/175/Lecture13.pdf

TRANSCRIPT
18.175: Lecture 13
More large deviations
Scott Sheffield
MIT
18.175 Lecture 13
Outline
Legendre transform
Large deviations
Legendre transform

▶ Define the Legendre transform (or Legendre dual) of a function Λ : R^d → R by

  Λ∗(x) = sup_{λ ∈ R^d} {(λ, x) − Λ(λ)}.

▶ Let's describe the Legendre dual geometrically when d = 1: Λ∗(x) is where the tangent line to Λ of slope x intersects the real axis. We can "roll" this tangent line around the convex hull of the graph of Λ to get all Λ∗ values.
▶ Is the Legendre dual always convex?
▶ What is the Legendre dual of x²? Of the function equal to 0 at 0 and ∞ everywhere else?
▶ How are the derivatives of Λ and Λ∗ related?
▶ What is the Legendre dual of the Legendre dual of a convex function?
▶ What's the higher-dimensional analog of rolling the tangent line?
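The second question above can be checked numerically. A minimal sketch (the `legendre` helper and the grid bounds are our illustrative choices, not from the lecture) verifying that the Legendre dual of λ² is x²/4:

```python
import numpy as np

# Numerical Legendre transform on a grid: Lambda*(x) = sup_lambda {lambda * x - Lambda(lambda)}.
# We check the classic fact that the dual of Lambda(l) = l^2 is Lambda*(x) = x^2 / 4.
def legendre(Lam, lambdas, x):
    return np.max(lambdas * x - Lam(lambdas))

lambdas = np.linspace(-50, 50, 200001)  # grid wide enough that the sup is attained inside it
for x in [-3.0, 0.0, 1.0, 2.5]:
    approx = legendre(lambda l: l**2, lambdas, x)
    exact = x**2 / 4                    # the maximizer is lambda = x/2
    assert abs(approx - exact) < 1e-4, (x, approx, exact)
print("dual of lambda^2 is x^2/4 (checked on a grid)")
```

The same grid routine can be used to experiment with the other questions, e.g. applying it twice to a convex function to see the dual of the dual.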
Recall: moment generating functions

▶ Let X be a random variable.
▶ The moment generating function of X is defined by M(t) = M_X(t) := E[e^{tX}].
▶ When X is discrete, we can write M(t) = Σ_x e^{tx} p_X(x). So M(t) is a weighted average of countably many exponential functions.
▶ When X is continuous, we can write M(t) = ∫_{−∞}^{∞} e^{tx} f(x) dx. So M(t) is a weighted average of a continuum of exponential functions.
▶ We always have M(0) = 1.
▶ If b > 0 and t > 0, then E[e^{tX}] ≥ E[e^{t min{X,b}}] ≥ P{X ≥ b} e^{tb}.
▶ If X takes both positive and negative values with positive probability, then M(t) grows at least exponentially fast in |t| as |t| → ∞.
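A small concrete check of two of these bullets (the pmf below is an illustrative choice of ours): M(0) = 1, and the tail bound E[e^{tX}] ≥ P{X ≥ b} e^{tb}.

```python
import math

# MGF of a simple discrete random variable: X uniform on {-1, 0, 1, 2}.
pmf = {-1: 0.25, 0: 0.25, 1: 0.25, 2: 0.25}

def M(t):
    # weighted average of exponentials, as on the slide
    return sum(p * math.exp(t * x) for x, p in pmf.items())

assert abs(M(0) - 1.0) < 1e-12                    # M(0) = E[e^0] = 1

t, b = 1.0, 2.0
tail = sum(p for x, p in pmf.items() if x >= b)   # P(X >= 2) = 0.25
assert M(t) >= tail * math.exp(t * b)             # the stated tail bound
print(M(t), tail * math.exp(t * b))
```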
Recall: moment generating functions for i.i.d. sums

▶ We showed that if Z = X + Y and X and Y are independent, then M_Z(t) = M_X(t) M_Y(t).
▶ If X_1, …, X_n are i.i.d. copies of X and Z = X_1 + … + X_n, then what is M_Z?
▶ Answer: M_X^n.
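The identity M_Z = M_X^n can be verified exactly in a case where Z has a known law: Bernoulli(p) summands, so that Z is Binomial(n, p). A sketch with illustrative parameters:

```python
import math

# For Z = X_1 + ... + X_n i.i.d., M_Z = (M_X)^n.  Checked exactly for
# Bernoulli(p) summands, where Z is Binomial(n, p).
p, n, t = 0.3, 10, 0.7

mgf_X = 1 - p + p * math.exp(t)                      # M_X(t) for Bernoulli(p)
mgf_Z = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) * math.exp(t * k)
            for k in range(n + 1))                   # direct sum over the Binomial pmf

assert abs(mgf_Z - mgf_X**n) < 1e-9 * mgf_Z
print("M_Z(t) == M_X(t)^n:", mgf_Z)
```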
Large deviations

▶ Consider i.i.d. random variables X_i with S_n = X_1 + … + X_n. Can we show that P(S_n ≥ na) → 0 exponentially fast when a > E[X_i]?
▶ This is a kind of quantitative form of the weak law of large numbers. The empirical average A_n is very unlikely to be more than ε away from its expected value (where "very" means with probability less than some exponentially decaying function of n).
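The exponential decay is easy to see numerically for fair coin flips (X_i ∈ {0, 1}, a = 0.75 > E[X_i] = 1/2; the values of a and n are illustrative). The exact binomial tail gives (1/n) log P(S_n ≥ na) settling down to a negative constant:

```python
import math

# Exact tail P(S_n >= n*a) for n fair coin flips, a = 0.75.
# The normalized log-probability (1/n) log P stays bounded away from 0,
# i.e. the tail decays exponentially in n.
a = 0.75
for n in [20, 80, 320]:
    tail = sum(math.comb(n, k) for k in range(math.ceil(n * a), n + 1)) / 2**n
    print(n, math.log(tail) / n)
```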
General large deviation principle

▶ More general framework: a large deviation principle describes the limiting behavior as n → ∞ of a family {µ_n} of measures on a measure space (X, B) in terms of a rate function I.
▶ The rate function is a lower-semicontinuous map I : X → [0, ∞]. (The sets {x : I(x) ≤ a} are closed; the rate function is called "good" if these sets are compact.)
▶ DEFINITION: {µ_n} satisfy the LDP with rate function I and speed n if for all Γ ∈ B,

  − inf_{x ∈ Γ°} I(x) ≤ lim inf_{n→∞} (1/n) log µ_n(Γ) ≤ lim sup_{n→∞} (1/n) log µ_n(Γ) ≤ − inf_{x ∈ Γ̄} I(x),

  where Γ° is the interior of Γ and Γ̄ its closure.
▶ INTUITION: "near x" the probability density function of µ_n tends to zero like e^{−n I(x)} as n → ∞.
▶ Simple case: I is continuous and Γ is the closure of its interior.
▶ Question: How would I change if we replaced the measures µ_n by weighted measures e^{(nλ, ·)} µ_n?
▶ Replace I(x) by I(x) − (λ, x)? What is inf_x I(x) − (λ, x)?
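The closing question has a clean answer when I is a Legendre dual: inf_x [I(x) − (λ, x)] = −sup_x [(λ, x) − I(x)] = −I∗(λ). A grid sketch for the fair-coin rate function I(x) = x log x + (1−x) log(1−x) + log 2 (our illustrative example), where I∗ = Λ recovers the log MGF:

```python
import math

# Check inf_x [I(x) - lambda*x] = -Lambda(lambda) for a fair coin,
# i.e. the tilted rate function's minimum recovers the log MGF (duality).
def I(x):  # relative entropy with respect to the fair coin
    if x in (0.0, 1.0):
        return math.log(2)
    return x * math.log(x) + (1 - x) * math.log(1 - x) + math.log(2)

xs = [i / 10000 for i in range(10001)]
for lam in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    inf_val = min(I(x) - lam * x for x in xs)
    Lambda = math.log((1 + math.exp(lam)) / 2)   # log MGF of a fair coin
    assert abs(inf_val + Lambda) < 1e-4, (lam, inf_val, Lambda)
print("inf_x [I(x) - lambda x] = -Lambda(lambda): verified on a grid")
```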
Cramér's theorem

▶ Let µ_n be the law of the empirical mean A_n = (1/n) Σ_{j=1}^n X_j for i.i.d. vectors X_1, X_2, …, X_n in R^d with the same law as X.
▶ Define the log moment generating function of X by

  Λ(λ) = Λ_X(λ) = log M_X(λ) = log E e^{(λ, X)},

  where (·, ·) is the inner product on R^d.
▶ Define the Legendre transform of Λ by

  Λ∗(x) = sup_{λ ∈ R^d} {(λ, x) − Λ(λ)}.

▶ CRAMÉR'S THEOREM: the µ_n satisfy the LDP with convex rate function Λ∗.
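For a fair coin, Λ(λ) = log((1 + e^λ)/2) and the rate function works out to the entropy expression Λ∗(x) = x log x + (1−x) log(1−x) + log 2 on (0, 1). A sketch computing Λ∗ straight from the definition (grid sup over λ; grid bounds are illustrative) and comparing:

```python
import math

# Lambda*(x) for a fair coin via the definition, compared with the
# closed form x log x + (1-x) log(1-x) + log 2 on (0, 1).
def Lambda(lam):                      # log MGF of X with P(X=0) = P(X=1) = 1/2
    return math.log((1 + math.exp(lam)) / 2)

lams = [l / 100 for l in range(-3000, 3001)]   # lambda grid on [-30, 30]
for x in [0.1, 0.5, 0.75, 0.9]:
    dual = max(lam * x - Lambda(lam) for lam in lams)
    closed = x * math.log(x) + (1 - x) * math.log(1 - x) + math.log(2)
    assert abs(dual - closed) < 1e-3, (x, dual, closed)
print("Cramer rate for a fair coin matches the entropy formula")
```

Note Λ∗(1/2) = 0: deviations of size zero from the mean carry no exponential cost, as they must.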
Thinking about Cramér's theorem

▶ Let µ_n be the law of the empirical mean A_n = (1/n) Σ_{j=1}^n X_j.
▶ CRAMÉR'S THEOREM: the µ_n satisfy the LDP with convex rate function

  I(x) = Λ∗(x) = sup_{λ ∈ R^d} {(λ, x) − Λ(λ)},

  where Λ(λ) = log M(λ) = log E e^{(λ, X_1)}.
▶ This means that for all Γ ∈ B we have this asymptotic lower bound on the probabilities µ_n(Γ):

  − inf_{x ∈ Γ°} I(x) ≤ lim inf_{n→∞} (1/n) log µ_n(Γ),

  so (up to sub-exponential error) µ_n(Γ) ≥ e^{−n inf_{x ∈ Γ°} I(x)},
▶ and this asymptotic upper bound on the probabilities µ_n(Γ):

  lim sup_{n→∞} (1/n) log µ_n(Γ) ≤ − inf_{x ∈ Γ̄} I(x),

  which says (up to sub-exponential error) µ_n(Γ) ≤ e^{−n inf_{x ∈ Γ̄} I(x)}.
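For fair coin flips and Γ = [a, 1] with a = 0.75 (an illustrative choice where the infima over the interior and the closure of Γ coincide, so the two bounds pinch), (1/n) log µ_n(Γ) should approach −Λ∗(a). A sketch comparing the exact tail with the Cramér rate:

```python
import math

# (1/n) log mu_n([a, 1]) vs -Lambda*(a) for n fair coin flips, a = 0.75.
# Interior and closure of Gamma = [a, 1] give the same infimum of I here,
# so both Cramer bounds force convergence to -Lambda*(a).
a = 0.75
rate = a * math.log(a) + (1 - a) * math.log(1 - a) + math.log(2)   # Lambda*(a)

for n in [50, 200, 800]:
    tail = sum(math.comb(n, k) for k in range(math.ceil(n * a), n + 1)) / 2**n
    print(n, math.log(tail) / n, -rate)
```

The printed normalized log-probabilities creep up toward −Λ∗(a) ≈ −0.1308, with the remaining gap being the sub-exponential correction.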
Proving the Cramér upper bound

▶ Recall that $I(x) = \Lambda^*(x) = \sup_{\lambda \in \mathbb{R}^d} \{(\lambda, x) - \Lambda(\lambda)\}$.
▶ For simplicity, assume that $\Lambda$ is finite for all $\lambda$ (which implies that $X$ has moments of all orders, that $\Lambda$ and $\Lambda^*$ are strictly convex, and that the derivatives of $\Lambda$ and $\Lambda^*$ are inverses of each other). It is also enough to consider the case where $X$ has mean zero, which implies that $\Lambda(0) = 0$ is the minimum of $\Lambda$ and $\Lambda^*(0) = 0$ is the minimum of $\Lambda^*$.
▶ We aim to show (up to sub-exponential error) that $\mu_n(\Gamma) \le e^{-n \inf_{x \in \Gamma} I(x)}$.
▶ If $\Gamma$ were the singleton set $\{x\}$, we could find the $\lambda$ corresponding to $x$, so that $\Lambda^*(x) = (\lambda, x) - \Lambda(\lambda)$. Note then that
  $$E e^{(n\lambda, A_n)} = E e^{(\lambda, S_n)} = M_X(\lambda)^n = e^{n\Lambda(\lambda)},$$
  and also $E e^{(n\lambda, A_n)} \ge e^{n(\lambda, x)} \mu_n\{x\}$. Taking logs and dividing by $n$ gives $\Lambda(\lambda) \ge \frac{1}{n} \log \mu_n\{x\} + (\lambda, x)$, so that $\frac{1}{n} \log \mu_n(\Gamma) \le -\Lambda^*(x)$, as desired.
▶ General $\Gamma$: cut it into finitely many pieces and bound each piece?
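The singleton argument above is just the optimized exponential Markov (Chernoff) bound, and for a half-line $\Gamma = [a, \infty)$ in $d = 1$ it needs no cutting at all: $\mu_n(\Gamma) \le e^{-n\Lambda^*(a)}$ holds for every $n$, not just asymptotically. A sketch checking this, again assuming fair-coin increments (an example chosen for this transcript):

```python
import math

def rate(a):
    # Λ*(a) for Bernoulli(1/2) increments, valid for 0 < a < 1.
    return a * math.log(2 * a) + (1 - a) * math.log(2 * (1 - a))

def tail(n, a):
    # Exact μ_n([a, ∞)) = P(S_n ≥ na) with S_n ~ Binomial(n, 1/2).
    k0 = math.ceil(n * a - 1e-9)
    return sum(math.comb(n, k) for k in range(k0, n + 1)) / 2**n

# The Chernoff bound P(S_n ≥ na) ≤ e^{-n Λ*(a)} is exact for every n
# and every a above the mean 1/2:
for n in (20, 100, 400):
    for a in (0.6, 0.75, 0.9):
        assert tail(n, a) <= math.exp(-n * rate(a))
```

The bound is non-asymptotic because Markov's inequality $P(S_n \ge na) \le e^{-\lambda n a} E e^{\lambda S_n}$ holds exactly; taking the best $\lambda$ produces $e^{-n\Lambda^*(a)}$.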
Proving the Cramér lower bound

▶ Recall that $I(x) = \Lambda^*(x) = \sup_{\lambda \in \mathbb{R}^d} \{(\lambda, x) - \Lambda(\lambda)\}$.
▶ We aim to show that asymptotically $\mu_n(\Gamma) \ge e^{-n \inf_{x \in \Gamma^o} I(x)}$.
▶ It is enough to show that for each given $x \in \Gamma^o$ we have, asymptotically, $\mu_n(\Gamma) \ge e^{-n I(x)}$.
▶ The idea is to weight the law of $X$ by the factor $e^{(\lambda, \cdot)}$ for a suitable $\lambda$ and normalize, obtaining a new (tilted) measure whose expectation is exactly the point $x$. Under this new measure, $A_n$ is "typically" in $\Gamma$ for large $n$, so the probability is of order one.
▶ But by how much did we have to modify the measure to make this typical? By no more than a factor of $e^{-n I(x)}$, up to sub-exponential error.
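The tilting idea can be made concrete as an importance-sampling experiment (a sketch for fair-coin increments; the target $a = 0.7$, sample count, and seed are arbitrary choices made for this transcript). Sample $S_n$ under the tilted law with mean $a$, reweight each sample by the likelihood ratio $e^{-\lambda S_n + n\Lambda(\lambda)}$, and the rare probability $\mu_n([a, \infty))$ is recovered:

```python
import math
import random

def exact_tail(n, a):
    # Exact μ_n([a, ∞)) = P(S_n ≥ na) with S_n ~ Binomial(n, 1/2).
    k0 = math.ceil(n * a - 1e-9)
    return sum(math.comb(n, k) for k in range(k0, n + 1)) / 2**n

def is_estimate(n, a, samples, seed=0):
    # Tilt Bernoulli(1/2) so the tilted mean is a: λ solves e^λ/(1+e^λ) = a.
    lam = math.log(a / (1 - a))
    Lam = math.log((1 + math.exp(lam)) / 2)  # Λ(λ) = log E e^{λX}
    rng = random.Random(seed)
    threshold = math.ceil(n * a - 1e-9)
    total = 0.0
    for _ in range(samples):
        # Under the tilted law each X_i is Bernoulli(a), so A_n ≈ a typically.
        s = sum(1 for _ in range(n) if rng.random() < a)
        if s >= threshold:
            # Likelihood ratio dμ^n/dν^n = e^{-λ S_n + n Λ(λ)}.
            total += math.exp(-lam * s + n * Lam)
    return total / samples

n, a = 100, 0.7
est = is_estimate(n, a, 20000)
exact = exact_tail(n, a)
```

Under the tilted measure roughly half the samples land in $\Gamma$, and each carries a weight near $e^{-nI(a)}$, which is exactly the "cost of the modification" in the last bullet above.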