Adaptive Increasingly Rarely Markov Chain Monte Carlo (AirMCMC) · 2017-07-28

Adaptive MCMC / Air MCMC

Adaptive Increasingly Rarely Markov Chain Monte Carlo (AirMCMC)

Krys Latuszynski (University of Warwick, UK), Cyril Chimisov, Gareth O. Roberts (both Warwick)

Durham, June 28th, 2017



Page 2:

Adaptive MCMC
Air MCMC

Page 3: Markov chain Monte Carlo

- let π be a target probability distribution on X, e.g. in order to evaluate
  θ := ∫_X f(x) π(dx)
- direct sampling from π is not possible or is impractical
- the MCMC approach is to simulate (X_n)_{n≥0}, an ergodic Markov chain with transition kernel P and limiting distribution π, and to take ergodic averages as an estimate of θ
- it is easy to design an ergodic transition kernel P, e.g. using generic Metropolis or Gibbs recipes
- it is difficult to design a transition kernel P with good convergence properties, especially if X is high dimensional
- trying to find an optimal P would be a hopelessly hard problem
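The recipe above can be sketched as a minimal random-walk Metropolis sampler. The target π = N(0, 1) and the test function f(x) = x² (so θ = 1) are illustrative choices, not taken from the slides:

```python
import math
import random

def metropolis_rw(logpi, x0, n_iter, scale, rng):
    """Random-walk Metropolis: simulate an ergodic chain (X_n) whose
    limiting distribution is pi, given log pi up to an additive constant."""
    x, lp = x0, logpi(x0)
    chain = []
    for _ in range(n_iter):
        y = x + rng.gauss(0.0, scale)              # symmetric Gaussian proposal
        lq = logpi(y)
        if math.log(rng.random()) < lq - lp:       # accept w.p. min(1, pi(y)/pi(x))
            x, lp = y, lq
        chain.append(x)
    return chain

rng = random.Random(1)
# target pi = N(0,1); estimate theta = E_pi[X^2] (true value 1) by an ergodic average
chain = metropolis_rw(lambda x: -0.5 * x * x, 0.0, 50_000, 2.4, rng)
theta_hat = sum(x * x for x in chain) / len(chain)
```

The ergodic average `theta_hat` converges to θ as the chain length grows, exactly as the slide describes.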


Page 9: Optimal Scaling of Metropolis-Hastings P

- for Metropolis chains there is a "prescription" for how to scale proposals as the dimension d → ∞
- if σ_d² = l² d⁻¹ then, based on an elegant mathematical result (Roberts 1997):
- consider Z_t^(d) := X^(d,1)_⌊td⌋; then, as d → ∞,
- Z_t^(d) converges to the solution Z of the SDE
  dZ_t = h(l)^{1/2} dB_t + (1/2) h(l) ∇ log f(Z_t) dt
- so maximise h(l) to optimise Metropolis-Hastings
- there is a one-to-one correspondence between l_opt and a mean acceptance rate of 0.234
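For an i.i.d. product target with unit Fisher information, the optimal-scaling analysis gives the explicit speed function h(l) = 2 l² Φ(−l/2) and limiting acceptance rate a(l) = 2 Φ(−l/2), where Φ is the standard normal CDF (these explicit forms are an assumption drawn from that analysis, not stated on the slide). Maximising h numerically recovers the famous 0.234:

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def h(l):
    # diffusion speed for an i.i.d. product target with unit Fisher information
    return 2.0 * l * l * Phi(-l / 2.0)

def acceptance(l):
    # limiting mean acceptance rate at scale parameter l
    return 2.0 * Phi(-l / 2.0)

# grid-search the maximiser of h over l in (0, 6)
ls = [i / 1000.0 for i in range(1, 6000)]
l_opt = max(ls, key=h)
a_opt = acceptance(l_opt)
```

The maximiser lands near l_opt ≈ 2.38 and the corresponding acceptance rate near 0.234, matching the one-to-one correspondence the slide mentions.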


Page 15: Adaptive MCMC

- use a scale γ for the proposal such that the mean acceptance rate of M-H is 0.234
- one needs to learn π in order to apply this
- trial run? high dimensions? Metropolis-within-Gibbs?
- adaptive MCMC: update the scale on the fly
- for adaptive scaling of Metropolis-Hastings one may use
  log(γ_n) = log(γ_{n−1}) + n^{−0.7} (α(X_{n−1}, Y_n) − 0.234)
- so the kernel P_n used for obtaining X_n | X_{n−1} may depend on {X_0, …, X_{n−1}}
- however, the process is now not Markovian, so the possible benefit comes at the price of a more involved theoretical analysis
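A sketch of this adaptive-scaling rule, reading the step size as the diminishing sequence n^{−0.7}; the standard-normal target and the run length are illustrative assumptions:

```python
import math
import random

def adaptive_rw_metropolis(logpi, x0, n_iter, rng, target=0.234):
    """Random-walk Metropolis whose proposal scale gamma_n is updated on the fly:
    log(gamma_n) = log(gamma_{n-1}) + n^{-0.7} * (alpha(X_{n-1}, Y_n) - target)."""
    x, lp = x0, logpi(x0)
    log_gamma = 0.0                                   # gamma_0 = 1
    chain = []
    for n in range(1, n_iter + 1):
        y = x + rng.gauss(0.0, math.exp(log_gamma))   # proposal Y_n at current scale
        lq = logpi(y)
        alpha = math.exp(min(0.0, lq - lp))           # acceptance prob alpha(X_{n-1}, Y_n)
        if rng.random() < alpha:
            x, lp = y, lq
        log_gamma += n ** (-0.7) * (alpha - target)   # diminishing adaptation step
        chain.append(x)
    return chain, math.exp(log_gamma)

rng = random.Random(0)
chain, gamma_final = adaptive_rw_metropolis(lambda x: -0.5 * x * x, 0.0, 20_000, rng)
```

Because the update size n^{−0.7} shrinks, the scale settles down over time while the kernel P_n used at step n still depends on the past, which is precisely why the process is no longer Markovian.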


Page 22: Success story of Adaptive Metropolis-Hastings

- the adaptation rule is mathematically appealing (diffusion limit)
- the adaptation rule is computationally simple (acceptance rate)
- it works in applications (it seems to improve convergence significantly)
- it improves convergence even in settings that are neither high dimensional nor satisfy the other assumptions needed for the diffusion limit

- adaptive scaling beyond Metropolis-Hastings?
- YES: similar optimal scaling results are available for MALA, HMC, etc. Each yields an adaptive version of the algorithm!

- what can you optimise beyond scale?
- e.g. the covariance matrix of the proposal


Page 31: The fly in the ointment

- P_γ, γ ∈ Γ: a parametric family of π-invariant kernels; adaptive MCMC steps:
  (1) sample X_{n+1} from P_{γ_n}(X_n, ·)
  (2) given {X_0, …, X_{n+1}, γ_0, …, γ_n}, update γ_{n+1} according to some adaptation rule
- adaptive MCMC is not Markovian
- the standard MCMC theory does not apply
- the theoretical properties of adaptive MCMC have been studied using a range of techniques, such as coupling, martingale approximations, and stability of stochastic approximation (Roberts, Rosenthal, Moulines, Andrieu, Vihola, Saksman, Fort, Atchade, …)
- still, the theoretical underpinning of adaptive MCMC is (even) weaker and (even) less operational than that of standard MCMC


Page 36: Adaptive Increasingly Rarely Markov Chain Monte Carlo (AirMCMC) · 2017-07-28 · Adaptive MCMC Air MCMC Adaptive Increasingly Rarely Markov Chain Monte Carlo (AirMCMC) Krys Latuszynski

Adaptive MCMCAir MCMC

Post-mortem of the fly

I Standard assumptions to validate Adaptive MCMC are e.g. as follows:I (DA) Diminishing Adaptation:

limn→∞ Dn = 0, in probability, whereDn = supx∈X ‖Pγn+1(x, ·)− Pγn(x, ·)‖TV .

I (C) Containment:The sequence {Mε(Xn, γn)}∞n=0 is bounded in probability, whereMε : X × Γ→ N is defined asMε(x, γ) := inf{n ≥ 1 : ‖Pn

γ(x, ·)− π(·)‖TV ≤ ε}.I DA + C guarantee ergodicity, i.e. convergence in distribution

(Roberts + Rosenthal 2007)I and also nondeterioration (KL + Rosenthal 2014)I but for SLLN, you need additional conditions!

(Roberts + Rosenthal 2007; Fort + Moulines + Priouret 2011; Atchade + Fort2010)



Post-mortem of the fly

I Two settings to verify containment:
I (SGE) Simultaneous Geometric Ergodicity:
PγV(x) ≤ λV(x) + b·IC(x), and Pγ(x, ·) ≥ δν(·) for all x ∈ C, with the same λ, b, C, δ, ν for all γ ∈ Γ.
I (SPE) Simultaneous Polynomial Ergodicity:
PγV(x) − V(x) ≤ −cV^α(x) + b·IC(x), and Pγ(x, ·) ≥ δν(·) for all x ∈ C, with the same c, α, b, C, δ, ν for all γ ∈ Γ.
I How do you verify SGE or SPE?
I You ask Jim Hobert!
I It has been done for fairly general classes of Adaptive Metropolis under tail decay conditions on π (Bai + Roberts + Rosenthal 2011),
I and for similarly general Adaptive Metropolis within Adaptive Gibbs (KL + Roberts + Rosenthal 2013).



Adaptive Gibbs Sampler - a generic algorithm

I AdapRSG:
1. Set pn := Rn(pn−1, Xn−1, ..., X0) ∈ Y ⊂ [0, 1]^d.
2. Choose coordinate i ∈ {1, ..., d} according to the selection probabilities pn.
3. Draw Y ∼ π(· | Xn−1,−i).
4. Set Xn := (Xn−1,1, ..., Xn−1,i−1, Y, Xn−1,i+1, ..., Xn−1,d).
I Given the target distribution π, what are the optimal selection probabilities p?
I Pretend π is a Gaussian - the optimal p is known for Gaussians - and it works outside the Gaussian class (Chimisov + KL + Roberts 2017).
I How to verify containment?
I Simultaneous Geometric Drift will not work!!!
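A minimal AdapRSG sketch in Python. Everything problem-specific here is an assumption for illustration: the target is a bivariate normal with correlation 0.9 (so the full conditionals in step 3 are available in closed form), and the adaptation rule Rn is a hypothetical choice making pn proportional to running marginal standard deviations.

```python
import math
import random

random.seed(2)
RHO = 0.9  # correlation of the illustrative bivariate normal target

def adap_rsg(n_iter, d=2):
    x = [0.0] * d
    p = [1.0 / d] * d          # selection probabilities p_n
    s1 = [0.0] * d             # running sums, used by the (hypothetical) rule R_n
    s2 = [0.0] * d             # running sums of squares
    chain = []
    for n in range(1, n_iter + 1):
        # 1. p_n := R_n(p_{n-1}, X_{n-1}, ..., X_0): here proportional to
        #    running marginal standard deviations (an illustrative rule)
        if n > 20:
            sd = [math.sqrt(max(s2[j] / n - (s1[j] / n) ** 2, 1e-12))
                  for j in range(d)]
            p = [v / sum(sd) for v in sd]
        # 2. choose coordinate i according to p_n
        i = random.choices(range(d), weights=p)[0]
        # 3. draw Y ~ pi(. | x_{-i}): conditional of the bivariate normal
        x[i] = random.gauss(RHO * x[1 - i], math.sqrt(1 - RHO ** 2))
        # 4. all other coordinates stay fixed
        for j in range(d):
            s1[j] += x[j]
            s2[j] += x[j] ** 2
        chain.append(tuple(x))
    return chain, p

chain, p = adap_rsg(5000)
```

By symmetry of this toy target, the rule drives pn toward (1/2, 1/2); with heterogeneous coordinates it would weight the slow directions more heavily.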



AirMCMC - Adapting increasingly rarely

I Pγ, γ ∈ Γ - a parametric family of π-invariant kernels. Adaptive MCMC steps:
(1) Sample Xn+1 from Pγn(Xn, ·).
(2) Given {X0, ..., Xn+1, γ0, ..., γn}, update γn+1 according to some adaptation rule.
I How to tweak the strategy to make the theory easier?
I Do we need to adapt in every step?
I How about adapting increasingly rarely?
I AirMCMC Sampler:
Initiate X0 ∈ X, γ0 ∈ Γ. Set γ := γ0, k := 1, n := 0.
(1) For i = 1, ..., nk:
1.1. sample Xn+i ∼ Pγ(Xn+i−1, ·);
1.2. given {X0, ..., Xn+i, γ0, ..., γn+i−1}, update γn+i according to some adaptation rule.
(2) Set n := n + nk, k := k + 1, γ := γn, and go to (1).
I Will such a strategy be efficient? With, say, nk = ck^β?
I Will it be mathematically more tractable?
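With polynomially growing blocks nk = ceil(c·k^β), the number of adaptations performed over N total iterations grows only like N^(1/(β+1)), since Σ_{k≤K} c·k^β ≈ c·K^(β+1)/(β+1). A quick sketch of the schedule (the concrete c and β below are assumed for illustration):

```python
import math

def air_schedule(total, c=1.0, beta=2.0):
    """Block lengths n_k = ceil(c * k**beta): the kernel P_gamma is held
    fixed within each block, and gamma is updated only between blocks."""
    blocks = []
    n, k = 0, 1
    while n < total:
        nk = math.ceil(c * k ** beta)
        blocks.append(nk)
        n += nk
        k += 1
    return blocks

blocks = air_schedule(100_000, c=1.0, beta=2.0)
# len(blocks) adaptations cover 100,000 iterations
```

With β = 2, roughly (3N)^(1/3) ≈ 67 adaptations suffice for N = 100,000 iterations, so the adaptation overhead vanishes relative to the sampling cost.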



AirMCMC - a simulation study

I π(x) = l(|x|)/|x|^(1+r), x ∈ R.
I Air version of RWM with adaptive scaling.
I The example is polynomially ergodic (not easy for the sampler).
I AirRWM:
Initiate X0 ∈ R, γ ∈ [q1, q2], k := 1, n := 0, a := 0, and a sequence {ck}k≥1.
(1) For i = 1, ..., nk:
(1.1.) sample Y ∼ N(Xn+i−1, γ) and set aγ := min{1, φ(Y)/φ(Xn+i−1)};
(1.2.) set Xn+i := Y with probability aγ, and Xn+i := Xn+i−1 otherwise;
(1.3.) set a := a + aγ.
If i = nk, then γ := exp(log(γ) + ck(a/nk − 0.44)).
(2) Set n := n + nk, k := k + 1, a := 0, and go to (1).
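The AirRWM pseudo-code above can be sketched in Python. The specifics here are assumptions for illustration, not the talk's setup: a proper heavy-tailed stand-in target φ(x) ∝ (1 + |x|)^(−(1+r)) replaces l(|x|)/|x|^(1+r), and the schedule nk = k², ck = 1/k is one admissible choice.

```python
import math
import random

random.seed(3)
R = 1.5  # tail index: heavy tails, the polynomially ergodic regime

def log_phi(x):
    """Log of an (unnormalised) heavy-tailed stand-in for the target."""
    return -(1 + R) * math.log1p(abs(x))

def air_rwm(n_blocks, x=0.0, gamma=1.0, q=(1e-3, 1e3)):
    chain = []
    for k in range(1, n_blocks + 1):
        nk = k * k            # block length n_k = k^2 (assumed schedule)
        ck = 1.0 / k          # adaptation step c_k (assumed sequence)
        a = 0.0
        for _ in range(nk):
            y = random.gauss(x, math.sqrt(gamma))                  # (1.1)
            a_gamma = min(1.0, math.exp(log_phi(y) - log_phi(x)))
            if random.random() < a_gamma:                          # (1.2)
                x = y
            a += a_gamma                                           # (1.3)
            chain.append(x)
        # adapt only once per block, steering the acceptance rate toward 0.44
        gamma = math.exp(math.log(gamma) + ck * (a / nk - 0.44))
        gamma = min(max(gamma, q[0]), q[1])   # project back onto [q1, q2]
    return chain, gamma

chain, gamma = air_rwm(40)
```

Within each block the chain is an ordinary time-homogeneous Markov chain with fixed γ, which is what makes the Air strategy analytically tractable.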



AirMCMC - a simulation study

Figure: Autocorrelations (ACF) up to lag 50 for the three samplers: RWM, ARWM, and AirRWM. [three-panel ACF plot omitted]


AirMCMC - a simulation study

Figure: Error in quantile level estimation for RWM, ARWM, and AirRWM. X-axis: quantile levels (0.05 to 0.95). Y-axis: error in estimation. [bar plot omitted]


AirMCMC - a simulation study

Figure: Error in estimation of the 0.95 quantile (running error) for RWM, ARWM, and AirRWM with c = 1, 10, 100, 1000. X-axis: number of iterations after burn-in (up to 1e5). Y-axis: error in estimation. [line plot omitted]


AirMCMC - theory - the SGE case

I Theorem 1. Assume:
I the kernels are SGE;
I nk ≥ ck^β, β > 0;
I sup_x |f(x)| / V^(1/2)(x) < ∞.
Then the WLLN holds, and also, for any δ ∈ (0, 2),
lim N→∞ E | (1/N) Σ_{i=0}^{N−1} f(Xi) − π(f) |^(2−δ) = 0.
I Theorem 2. Assume:
I the kernels are SGE and reversible;
I dν/dπ ∈ L2(X, π);
I nk ≥ ck^β, β > 0;
I sup_x |f(x)| / V^(β/(2(β+1)) − δ)(x) < ∞, for some δ > 0.
Then the SLLN holds.
I Note that diminishing adaptation is not needed!



AirMCMC - theory - the local SGE case

I Theorem 3. Assume:
I Γ is compact;
I the kernels are locally SGE;
I nk ≥ ck^β, β > 0, and adaptation takes place only if the chain is in a compact set B;
I sup_x |f(x)| / Vi^(1/2)(x) < ∞.
Then the WLLN holds, and also, for any δ ∈ (0, 2),
lim N→∞ E | (1/N) Σ_{i=0}^{N−1} f(Xi) − π(f) |^(2−δ) = 0.
I Theorem 4. Assume:
I Γ is compact;
I the kernels are locally SGE and reversible;
I dν/dπ ∈ L2(X, π);
I nk ≥ ck^β, β > 0, and adaptation takes place only if the chain is in a compact set B;
I sup_x |f(x)| / Vi^(β/(2(β+1)) − δ)(x) < ∞, for some δ > 0.
Then the SLLN holds.


AirMCMC - theory - the SPE case

I Theorem 5. Assume:
I the kernels are SPE with α > 2/3;
I nk ≥ ck^β, with
β > 2α(1−α)/(2α−1) if α < 3/4,
β > α/(4α−2) if α ≥ 3/4;
I sup_x |f(x)| / V^(3α/2 − 1)(x) < ∞.
Then the WLLN holds.
