
A Dirichlet Form approach to MCMC Optimal Scaling

Giacomo Zanella, Wilfrid S. Kendall, and Mylène Bédard.

[email protected], [email protected],[email protected]

Supported by EPSRC Research Grants EP/D002060, EP/K013939

LMS Durham Symposium on Stochastic Analysis

12th July 2017


Outline

Introduction

MCMC and optimal scaling

Dirichlet forms and optimal scaling

Results and methods of proofs

Conclusion


Introduction to Markov chain Monte Carlo (MCMC)

General reference: Brooks et al. (2011) MCMC Handbook.

Suppose x represents an unknown (and therefore random!) parameter, and y represents data depending on the unknown parameter, with joint probability density p(x, y).

    p(x|y) = p(x, y) / Z

Here p(x|y) is the conditional density and p(x, y) the joint probability density: build a Markov chain with p(x|y) as its equilibrium (no need to know Z). The norming constant Z can be hard to compute!

Simulate the Markov chain until approximate equilibrium.


Example: MCMC for Anglo-Saxon statistics

Some historians conjecture that Anglo-Saxon placenames cluster by dissimilar names. Zanella (2015, 2016) uses MCMC: the data provide some support, resulting in a useful clustering.


MCMC and optimal scaling


MCMC idea

Goal: estimate E = E_π[h(X)].

Method: simulate an ergodic Markov chain with stationary distribution π; use the empirical estimate

    E_n = (1/n) ∑_{m=n₀}^{n₀+n} h(X_m).

(Much easier to apply theory if the chain is reversible.)

Theory: E_n → E almost surely.
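A minimal sketch of this ergodic-average estimator (not from the slides), under illustrative assumptions: a Gaussian AR(1) chain (reversible, stationary law N(0, 1)) and test function h(x) = x², so the true answer E = 1 is known.

    import numpy as np

    # Gaussian AR(1) chain X_{m+1} = rho*X_m + sqrt(1 - rho^2)*Z_{m+1}:
    # reversible, with stationary distribution N(0, 1).
    rng = np.random.default_rng(0)
    rho, n0, n = 0.9, 1_000, 100_000      # burn-in n0, then n recorded steps
    h = lambda x: x ** 2                  # estimate E_pi[h(X)], true value 1

    x, total = 0.0, 0.0
    for m in range(n0 + n):
        x = rho * x + np.sqrt(1 - rho ** 2) * rng.standard_normal()
        if m >= n0:
            total += h(x)

    E_n = total / n
    print(f"E_n = {E_n:.3f}  (target E = 1.0)")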


Varieties of MH-MCMC

Here is the famous Metropolis-Hastings recipe for drawing from a distribution with density f:

Propose Y using conditional density q(y|x);
Accept/Reject the move from X to Y, based on the ratio f(Y) q(X|Y) / (f(X) q(Y|X)).

Options:

1. Independence sampler: proposal q(y|x) = q(y) doesn't depend on x;

2. Random walk (RW MH-MCMC): proposal q(y|x) = q(y − x) behaves as a random walk;

3. MALA MH-MCMC: proposal q(y|x) = q(y − x − λ grad log f(x)) drifts towards high target density f.

We shall focus on RW MH-MCMC with Gaussian proposals (a generic accept/reject step is sketched below).
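One generic MH step, written out as a sketch (not code from the talk); f_density, q_sample and q_density are placeholder names for a concrete model.

    import numpy as np

    def mh_step(x, f_density, q_sample, q_density, rng):
        # One MH step targeting density f, with proposal density q;
        # convention: q_density(a, b) = q(a | b).
        y = q_sample(x, rng)                          # propose Y ~ q(. | x)
        ratio = (f_density(y) * q_density(x, y)) / (f_density(x) * q_density(y, x))
        if rng.uniform() < min(1.0, ratio):           # accept with probability min(1, ratio)
            return y
        return x                                      # reject: stay at X

For a symmetric random-walk proposal, q(x|y) = q(y|x), so the q-factors cancel; that special case is exactly the snippet on the next slide.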


Gaussian RW MH-MCMC

Simple Python code for Gaussian RW MH-MCMC, using normal and exponential from NumPy:

Propose a multivariate Gaussian step;
Test whether to accept the proposal by comparing an exponential random variable with the negative of the log MH ratio;
Implement the step if accepted (vector addition).

(Here mcmc is a driver object holding the current state x, the dimension dim, and the potential phi, i.e. minus the log target density.)

    from numpy.random import normal, exponential

    while not mcmc.stopped():
        z = normal(0, tau, size=mcmc.dim)                        # propose Gaussian step
        if exponential() > mcmc.phi(mcmc.x + z) - mcmc.phi(mcmc.x):
            mcmc.x += z                                          # accept: take the step
        mcmc.record_result()                                     # record current state

What is the best choice of scale / standard deviation tau?
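The snippet above leaves the mcmc driver abstract. Here is a self-contained sketch (an illustration, not the authors' code) using the same accept rule, with the assumption phi(x) = ∑ x_i⁴, chosen to match the exp(−x⁴) marginals of the next slides.

    import numpy as np

    rng = np.random.default_rng(0)
    dim, tau, n_steps = 10, 0.1, 10_000
    phi = lambda x: np.sum(x ** 4)          # minus log target density, up to a constant

    x = np.zeros(dim)
    accepted = 0
    samples = np.empty((n_steps, dim))
    for k in range(n_steps):
        z = rng.normal(0.0, tau, size=dim)                  # propose Gaussian step
        if rng.exponential() > phi(x + z) - phi(x):         # accept iff Exp(1) > phi(x+z) - phi(x)
            x = x + z
            accepted += 1
        samples[k] = x                                      # record current state

    print(f"acceptance rate: {accepted / n_steps:.1%}")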


RW MH-MCMC with Gaussian proposals (smooth target, marginal ∝ exp(−x⁴))

Target is given by 10 i.i.d. coordinates.

Scale parameter for proposal: τ = 1 is too large! Acceptance ratio 1.7%.

Scale parameter for proposal: τ = 0.1 is better. Acceptance ratio 76.5%.

Scale parameter for proposal: τ = 0.01 is too small. Acceptance ratio 98.5%.
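A sketch reproducing the comparison above under the same assumptions (d = 10, marginals ∝ exp(−x⁴)); exact acceptance rates will differ slightly from the quoted figures, depending on run length and starting point.

    import numpy as np

    def acceptance_rate(tau, dim=10, n_steps=50_000, seed=0):
        # RW MH-MCMC for the target with iid marginals proportional to exp(-x^4).
        rng = np.random.default_rng(seed)
        phi = lambda x: np.sum(x ** 4)
        x, accepted = np.zeros(dim), 0
        for _ in range(n_steps):
            z = rng.normal(0.0, tau, size=dim)
            if rng.exponential() > phi(x + z) - phi(x):
                x = x + z
                accepted += 1
        return accepted / n_steps

    for tau in (1.0, 0.1, 0.01):
        print(f"tau = {tau:5.2f}: acceptance rate ~ {acceptance_rate(tau):.1%}")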


MCMC Optimal Scaling: classic result (I)

RW MH-MCMC on (R^d, π^⊗d):

    π(dx_i) = e^{−φ(x_i)} dx_i;   MH acceptance rule A^(d) = 0 or 1.

    X^(d)_0 = (X_1, …, X_d),                           X_i iid ∼ π
    X^(d)_1 = (X_1 + A^(d)W_1, …, X_d + A^(d)W_d),     W_i iid ∼ N(0, σ_d²)

Questions: (1) complexity as d ↑ ∞? (2) optimal σ_d?

Theorem (Roberts, Gelman and Gilks, 1997)

Given σ_d² = σ²/d, Lipschitz φ′, and finite E_π[(φ′)⁸], E_π[(φ″)⁴],

    {X^(d)_{⌊td⌋,1}}_t ⇒ Z   where   dZ_t = s(σ)^{1/2} dB_t − (1/2) s(σ) φ′(Z_t) dt .

Answers: (1) mix in O(d) steps; (2) σ_max = arg max_σ s(σ).


MCMC Optimal Scaling: classic result (II)

Optimization: maximize s(σ)!

Given I = E_π[φ′(X)²] and normal CDF Φ,

    s(σ) = σ² · 2Φ(−σ√I/2) = σ² A(σ) = (4/I) (Φ⁻¹(A(σ)/2))² A(σ).

So s(σ) is maximized by choosing the asymptotic acceptance rate

    A(σ_max) = arg max_{A∈[0,1]} { (Φ⁻¹(A/2))² A } ≈ 0.234.

Strengths:
Establishes complexity as d → ∞;
Practical information on how to tune the proposal;
Does not depend on φ (CLT-type universality).

Some weaknesses that we will address (there are others):
Convergence of the marginal rather than the joint distribution;
Strong regularity assumptions: Lipschitz φ′, finite E[(φ′)⁸], E[(φ″)⁴].
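A quick numerical check of the 0.234 figure (a verification sketch using SciPy, not part of the original slides):

    from scipy.optimize import minimize_scalar
    from scipy.stats import norm

    # Maximize (Phi^{-1}(A/2))^2 * A over A in (0, 1) by minimizing its negative.
    objective = lambda A: -(norm.ppf(A / 2.0) ** 2) * A
    res = minimize_scalar(objective, bounds=(1e-6, 1 - 1e-6), method="bounded")
    print(f"optimal asymptotic acceptance rate ~ {res.x:.3f}")   # about 0.234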


MCMC Optimal Scaling: classic result (III)

There is a wide range of extensions: for example,

Langevin / MALA, for which the magic acceptance probability is 0.574 (Roberts and Rosenthal, 1998);
Non-identically distributed independent target coordinates (Bédard, 2007);
Gibbs random fields (Breyer and Roberts, 2000);
Infinite-dimensional random fields (Mattingly, Pillai and Stuart, 2012);
Markov chains on a hypercube (Roberts, 1998);
Adaptive MCMC: adjust online to optimize the acceptance probability (Andrieu and Thoms, 2008; Rosenthal, 2011).

All these build on the s.d.e. approach of Roberts, Gelman and Gilks (1997); hence the regularity conditions tend to be severe (but see Durmus et al., 2016).


Dirichlet forms and optimal scaling


Dirichlet forms and MCMC 1

Definition of Dirichlet form

A (symmetric) Dirichlet form E on a Hilbert space H is a closed bilinear function E(u, v), defined / finite for all u, v ∈ D ⊆ H, which satisfies:

1. D is a dense linear subspace of H;

2. E(u, v) = E(v, u) for u, v ∈ D, so E is symmetric;

3. E(u) = E(u, u) ≥ 0 for u ∈ D;

4. D is a Hilbert space under the ("Sobolev") inner product ⟨u, v⟩ + E(u, v);

5. If u ∈ D then u* = (u ∧ 1) ∨ 0 ∈ D, and moreover E(u*, u*) ≤ E(u, u).

Relate to a Markov process if (quasi-)regular.

Regular Dirichlet form, for locally compact Polish E: D ∩ C₀(E) is E₁^{1/2}-dense in D and uniformly dense in C₀(E).


Dirichlet forms and MCMC 2

Two examples

1. Dirichlet form obtained from the (re-scaled) RW MH-MCMC:

    E_d(h) = (d/2) E[(h(X^(d)_1) − h(X^(d)_0))²].

(E_d can be viewed as the Dirichlet form arising from speeding up the RW MH-MCMC by rate d.)

2. Heuristic "infinite-dimensional diffusion" limit of this form under scaling:

    E_∞(h) = (s(σ)/2) E_{π^⊗∞}[|∇h|²].

Under mild conditions this is: closable √, Dirichlet √, quasi-regular √.

Can we deduce that the RW MH-MCMC scales to look like the "infinite-dimensional diffusion", by showing that E_d "converges" to E_∞?
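A Monte Carlo illustration of E_d(h) approaching its heuristic limit (a sketch with assumptions: Gaussian marginals φ(x) = x²/2, so that I = 1 and exact stationary sampling is easy, and test function h(x) = x_1, for which the heuristic limit is s(σ)/2 = σ²Φ(−σ/2)):

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    d, sigma, n_rep = 100, 2.4, 20_000

    x0 = rng.standard_normal((n_rep, d))                       # X_0 drawn exactly from the stationary product law
    w = rng.normal(0.0, sigma / np.sqrt(d), size=(n_rep, d))   # proposal increments, variance sigma^2 / d
    delta_phi = 0.5 * ((x0 + w) ** 2 - x0 ** 2).sum(axis=1)    # phi(X_0 + W) - phi(X_0)
    accept = rng.exponential(size=n_rep) > delta_phi           # acceptance indicator A^(d)
    diff_h = accept * w[:, 0]                                  # h(X_1) - h(X_0) for h(x) = x_1
    print(f"E_d(h)          ~ {0.5 * d * np.mean(diff_h ** 2):.3f}")
    print(f"heuristic limit = {sigma ** 2 * norm.cdf(-sigma / 2):.3f}")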


Useful modes of convergence for Dirichlet forms

1. Gamma-convergence: E_n "Γ-converges" to E_∞ if
(Γ1) E_∞(h) ≤ lim inf_n E_n(h_n) whenever h_n → h in H;
(Γ2) for every h ∈ H there are h_n → h in H such that E_∞(h) ≥ lim sup_n E_n(h_n).

2. Mosco (1994) introduces stronger conditions:
(M1) E_∞(h) ≤ lim inf_n E_n(h_n) whenever h_n → h weakly in H;
(M2) for every h ∈ H there are h_n → h strongly in H such that E_∞(h) ≥ lim sup_n E_n(h_n).

3. Mosco (1994, Theorem 2.4.1, Corollary 2.6.1): conditions (M1) and (M2) imply convergence of the associated resolvent operators, and indeed of the associated semigroups.

4. Sun (1998) gives further conditions which imply weak convergence of the associated processes: these conditions are implied by the existence of a finite constant C such that E_n(h) ≤ C(|h|² + E(h)) for all h ∈ H.


Results and methods of proofs


Results

Theorem (Zanella, Bédard and WSK, 2016)

Consider the Gaussian RW MH-MCMC based on proposal variance σ²/d with target π^⊗d, where dπ = f dx = e^{−φ} dx. Suppose that

    I = ∫_{−∞}^{∞} |φ′|² f dx < ∞   (finite Fisher information),

and |φ′(x + v) − φ′(x)| < κ max{|v|^γ, |v|^α} for some κ > 0, 0 < γ < 1, and α > 1.

Let E_d be the corresponding Dirichlet form scaled as above. Then E_d Mosco-converges to

    E[1 ∧ exp(N(−(1/2)σ²I, σ²I))] · E_∞,

so the corresponding L² semigroups also converge.

Corollary

Suppose in the above that φ′ is globally Lipschitz. Then the correspondingly scaled processes exhibit weak convergence.
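A numerical check (a verification sketch, not from the slides) that the constant E[1 ∧ exp(N(−(1/2)σ²I, σ²I))] appearing in the theorem coincides with the asymptotic acceptance rate A(σ) = 2Φ(−σ√I/2) of the classic result:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    sigma, I = 2.4, 1.0
    g = rng.normal(-0.5 * sigma ** 2 * I, sigma * np.sqrt(I), size=1_000_000)
    print(f"Monte Carlo E[min(1, exp(G))]       ~ {np.mean(np.minimum(1.0, np.exp(g))):.4f}")
    print(f"closed form 2*Phi(-sigma*sqrt(I)/2) = {2 * norm.cdf(-sigma * np.sqrt(I) / 2):.4f}")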


Methods of proof 1: a CLT result

Lemma (A conditional CLT)

Under the conditions of the Corollary, almost surely (in x with invariant measure π^⊗∞) the log Metropolis-Hastings ratio converges weakly (in W) as d → ∞:

    log ∏_{i=1}^d [ f(x_i + σW_i/√d) / f(x_i) ] = −∑_{i=1}^d ( φ(x_i + σW_i/√d) − φ(x_i) ) ⇒ N(−(1/2)σ²I, σ²I) .

We may use this to deduce the asymptotic acceptance rate of the RW MH-MCMC sampler.
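A simulation sketch of this conditional CLT, under illustrative assumptions: φ(x) = x²/2 (so π = N(0, 1) and I = 1) and a single fixed x drawn from the product law.

    import numpy as np

    rng = np.random.default_rng(0)
    d, sigma, n_rep = 20_000, 2.4, 2_000
    phi = lambda t: 0.5 * t ** 2              # I = E[phi'(X)^2] = 1 for N(0, 1)

    x = rng.standard_normal(d)                # one fixed x, drawn from the product law
    log_ratio = np.empty(n_rep)
    for k in range(n_rep):
        w = rng.standard_normal(d)
        log_ratio[k] = -np.sum(phi(x + sigma * w / np.sqrt(d)) - phi(x))

    print(f"mean ~ {log_ratio.mean():+.3f}  (theory {-0.5 * sigma ** 2:+.3f})")
    print(f"var  ~ {log_ratio.var():.3f}   (theory {sigma ** 2:.3f})")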


Key idea for CLT

Use exact Taylor expansion techniques:

    ∑_{i=1}^d ( φ(x_i + σW_i/√d) − φ(x_i) )
        = ∑_{i=1}^d φ′(x_i) σW_i/√d + ∑_{i=1}^d (σW_i/√d) ∫₀¹ ( φ′(x_i + uσW_i/√d) − φ′(x_i) ) du .

Condition implicitly on x for the first 2.5 steps.

1. The first summand converges weakly to N(0, σ²I).

2. Decompose the variance of the second summand to deduce that
   Var[ ∑_{i=1}^d (σW_i/√d) ∫₀¹ ( φ′(x_i + uσW_i/√d) − φ′(x_i) ) du ] → 0.

3. Use Hoeffding's inequality then absolute expectations:
   E[ ∑_{i=1}^d (σW_i/√d) ∫₀¹ ( φ′(x_i + uσW_i/√d) − φ′(x_i) ) du ] → (1/2)σ²I,
   so the log MH ratio, which is the negative of the displayed sum, converges weakly to N(−(1/2)σ²I, σ²I), as in the Lemma.


Methods of proof 2: establishing condition (M2)

For every h ∈ L²(π^⊗∞), find h_n → h (strongly) in L²(π^⊗∞) such that E_∞(h) ≥ lim sup_n E_n(h_n).

1. Sufficient to consider the case E_∞(h) < ∞.

2. Find a sequence of smooth cylinder functions h_n with "compact cylindrical support", such that |E_∞(h) − E_∞(h_n)| ≤ 1/n.

3. Using smoothness etc., E_m(h_n) → E_∞(h_n) as m → ∞.

4. Subsequences . . . .


Methods of proof 3: establishing condition (M1)

If h_n → h weakly in L²(π^⊗∞), show E_∞(h) ≤ lim inf_n E_n(h_n). Detailed stochastic analysis involves:

1. Set Ψ_n(h) = √(n/2) ( h(X^(n)_0) − h(X^(n)_1) ).

2. Integrate against a test function ξ(X_{1:N}, W_{1:N}) I(U < a(X_{1:N}, W_{1:N})), for ξ smooth with compact support and U a Uniform(0,1) random variable. Apply Cauchy-Schwarz.

3. Use integration by parts, careful analysis and the conditions on φ′.


Doing even better

Durmus et al. (2016) introduce Lp mean differentiability: there is φ̇ (an Lp-mean derivative, playing the role of φ′) such that, for some p > 2 and some α > 0,

    φ(X + u) − φ(X) = ( φ̇(X) + R(X, u) ) u ,      E[ |R(X, u)|^p ]^{1/p} = o(|u|^α) .

Also I = E[|φ̇|²] < ∞.

Durmus et al. (2016) obtain optimal scaling results when p > 4 and E[|φ̇|⁶] < ∞.

Lp mean differentiability applies straightforwardly to the Zanella, Bédard and WSK (2016) argument mutatis mutandis: the regularity conditions can be weakened even more, at least for vague convergence.

23


Conclusion

Introduction

MCMC and optimal scaling

Dirichlet forms and optimal scaling

Results and methods of proofs

Conclusion

24


Conclusion

The Dirichlet form approach allows significant relaxation of the conditions required for optimal scaling results;

Combine with L^p mean differentiability to obtain further relaxation of regularity conditions;

Soft argument for ½·variance + mean ≈ 0, as implied by N(−½σ²I, σ²I) (see the worked identity below);

MALA generalization (exercise in progress);

Need to explore developments beyond i.i.d. targets; e.g. can regularity be similarly relaxed in more general random field settings?

Apply to discrete Markov chain cases? (cf. Roberts, 1998);

Investigate applications to Adaptive MCMC.
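As a one-line check of that heuristic (my worked identity, not on the slide): presumably the soft argument rests on E[e^Z] = 1 at stationarity for the limiting log-acceptance-ratio Z, and for Z ~ N(−½σ²I, σ²I) the lognormal moment formula gives

\[
\mathbb{E}\bigl[ e^{Z} \bigr]
 \;=\; \exp\Bigl( \mathbb{E}[Z] + \tfrac12 \operatorname{Var}(Z) \Bigr)
 \;=\; \exp\Bigl( -\tfrac12 \sigma^2 I + \tfrac12 \sigma^2 I \Bigr)
 \;=\; 1 ,
\]

i.e. mean + ½·variance = 0, as asserted.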

25


References:

Andrieu, Christophe and Johannes Thoms (2008). “A tutorial on adaptive MCMC”. In: Statistics and Computing 18.4, pp. 343–373.
Bédard, Mylène (2007). “Weak convergence of Metropolis algorithms for non-I.I.D. target distributions”. In: Annals of Applied Probability 17.4, pp. 1222–1244.

Breyer, L A and Gareth O Roberts (2000). “From Metropolis to diffusions: Gibbs states and optimal scaling”. In: Stochastic Processes and their Applications 90.2, pp. 181–206.
Brooks, Stephen P, Andrew Gelman, Galin L Jones and Xiao-Li Meng (2011). Handbook of Markov Chain Monte Carlo. Boca Raton: Chapman & Hall/CRC, pp. 592+xxv.

26


Durmus, Alain, Sylvain Le Corff, Eric Moulines and Gareth O Roberts (2016). “Optimal scaling of the Random Walk Metropolis algorithm under L^p mean differentiability”. In: arXiv 1604.06664.
Geyer, Charlie (1999). “Likelihood inference for spatial point processes”. In: Stochastic Geometry: likelihood and computation. Ed. by Ole E Barndorff-Nielsen, WSK and M N M van Lieshout. Boca Raton: Chapman & Hall/CRC. Chap. 4, pp. 79–140.
Hastings, W K (1970). “Monte Carlo sampling methods using Markov chains and their applications”. In: Biometrika 57, pp. 97–109.
Mattingly, Jonathan C., Natesh S. Pillai and Andrew M. Stuart (2012). “Diffusion limits of the random walk Metropolis algorithm in high dimensions”. In: Annals of Applied Probability 22.3, pp. 881–890.

27


Metropolis, Nicholas, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller and Edward Teller (1953). “Equation of state calculations by fast computing machines”. In: Journal of Chemical Physics 21.6, pp. 1087–1092.
Mosco, Umberto (1994). “Composite media and asymptotic Dirichlet forms”. In: Journal of Functional Analysis 123.2, pp. 368–421.
Roberts, Gareth O (1998). “Optimal Metropolis algorithms for product measures on the vertices of a hypercube”. In: Stochastics and Stochastic Reports June 2013, pp. 37–41.
Roberts, Gareth O, A Gelman and W Gilks (1997). “Weak Convergence and Optimal Scaling of Random Walk Algorithms”. In: The Annals of Applied Probability 7.1, pp. 110–120.

28


Roberts, Gareth O and Jeffrey S Rosenthal (1998). “Optimal scaling of discrete approximations to Langevin diffusions”. In: J. R. Statist. Soc. B 60.1, pp. 255–268.
Rosenthal, Jeffrey S (2011). “Optimal Proposal Distributions and Adaptive MCMC”. In: Handbook of Markov Chain Monte Carlo 1, pp. 93–112.
Sun, Wei (1998). “Weak convergence of Dirichlet processes”. In: Science in China Series A: Mathematics 41.1, pp. 8–21.
Thompson, Elizabeth A (2005). “MCMC in the analysis of genetic data on pedigree”. In: Markov chain Monte Carlo: Innovations and Applications. Ed. by WSK, Faming Liang and Jian-Sheng Wang. Singapore: World Scientific. Chap. 5, pp. 183–216.
Zanella, Giacomo (2015). “Bayesian Complementary Clustering, MCMC, and Anglo-Saxon Placenames”. PhD Thesis. University of Warwick.

29


Zanella, Giacomo (2016). “Random Partition Models and Complementary Clustering of Anglo-Saxon Placenames”. In: Annals of Applied Statistics 9.4, pp. 1792–1822.

Zanella, Giacomo, Mylène Bédard and WSK (2016). “A Dirichlet Form approach to MCMC Optimal Scaling”. In: arXiv 1606.01528, 22pp.

30


Version 1.34 (Wed Jul 12 06:27:58 2017 +0100)
================================================
commit d81469c5f14c484aa363616e3d701c6e3fbb1141
Author: Wilfrid Kendall <[email protected]>

Made it explicit that L^p mean differentiability still doesn’t cover weak convergence without extra regularity: need to beat this!

31