Riemannian Manifolds and Statistical Models


Page 1

CSML Lunch Time Talk, Friday 23rd November 2012

Ben Calderhead, Research Fellow

CoMPLEX, University College London

Riemannian Manifolds and Statistical Models

The use of geometry in Markov chain Monte Carlo

Page 3

“Riemann Manifold Langevin and Hamiltonian Monte Carlo Methods”, Mark Girolami and Ben Calderhead, Journal of the Royal Statistical Society: Series B (with discussion)

www.ucl.ac.uk/statistics/research/csi

Page 4

[Images: Bernhard Riemann, a manifold, William Hamilton, Paul Langevin, a casino...]

Page 5

Differential Geometric MCMC Methods and Applications, Ben Calderhead, PhD Thesis, University of Glasgow (2011)

(google “ben calderhead thesis”)

Page 6

Many statistical models have a natural geometric structure that is Riemannian in nature.


Can we use this geometric information to design better Markov chain Monte Carlo (MCMC) algorithms?

Page 8

• Ben Calderhead, Differential Geometric MCMC Methods and Applications, PhD Thesis, University of Glasgow (2011)

• Ben Calderhead & Mark Girolami, Statistical Analysis of Nonlinear Dynamical Systems using Differential Geometric Sampling Methods, Journal of the Royal Society Interface Focus (2011)

• Mark Girolami & Ben Calderhead, Riemann Manifold Langevin and Hamiltonian Monte Carlo Methods, Journal of the Royal Statistical Society: Series B (with discussion) (2011), Vol. 73(2), 123-214

• Ben Calderhead & Mark Girolami, Estimating Bayes Factors via Thermodynamic Integration and Population MCMC, Computational Statistics and Data Analysis, Elsevier (2009), Vol. 53, 4028-4045

SOME MOTIVATION

Page 9

HIGHLY NONLINEAR MODELS

• Often sparse, uncertain data with unobserved species

• Often multiple network topologies consistent with the known biology

Page 10

MODELLING QUESTIONS

• Which parameters should we use for a given model?

• Which model structure is most appropriate to describe the system of interest?

[Diagrams: two candidate network topologies for the BCR-ABL / JAK2 / STAT5 signalling pathway, with inhibitors TKI and JAKI, a growth factor input, self-phosphorylation, dephosphorylation and nuclear export, and rate constants k1-k16. Model 1: no interaction between BCR-ABL and JAK2. Model 2: with interaction between BCR-ABL and JAK2.]

Nonlinear dynamics, correlation structure, identifiability... all create problems for standard MCMC.

Page 11

BAYESIAN APPROACH

Posterior distribution characterises the uncertainty in the parameters

[Thesis excerpt:]

... and rational framework for making sense of the world around us, letting us explicitly state our assumptions and update our current knowledge in light of newly acquired data.

Probability theory has been around since the 18th century (16, 51) as a means of making inferences in light of incomplete information. The axiomatic formulation of probability theory by Kolmogorov (110), together with a derivation by Cox (39) from a set of postulates that satisfy the desirable properties we would wish to have in a system of reasoning, have made Bayesian methods arguably the preferred method for inductive inference. Recent contributions by Knuth and Skilling (180) add further support for the use of Bayesian probability; based on symmetry assumptions, they show that one is led to the probability calculus as the only logical and consistent calculus for reasoning under uncertainty.

Bayes' theorem is simply an expression based on conditional probability, and it states the conditional probability of an event A given an event B in terms of the probability of A and the probability of B given A. In the context of a statistical model, the posterior distribution of the model parameters, $\theta = [\theta_1 \dots \theta_D]^T$, given the data, $y = [y_1 \dots y_N]^T$, is proportional to the prior distribution of the parameters multiplied by the likelihood of the data given the parameters:

$$p(\theta \,|\, y) = \frac{p(y \,|\, \theta)\, p(\theta)}{p(y)} = \frac{p(y \,|\, \theta)\, p(\theta)}{\int p(y \,|\, \theta)\, p(\theta)\, d\theta} \qquad (1.3)$$

Here the marginal likelihood in the denominator normalises the posterior density, such that it integrates to one and is a correctly defined probability distribution.

1.4 Monte Carlo Methods

For the purpose of making predictions, we often want to calculate expectations of a function with respect to the posterior distribution

$$\mu_f = \mathbb{E}_{p(\theta|y)}[f(\theta)] = \int f(\theta)\, p(\theta \,|\, y)\, d\theta \qquad (1.4)$$

Since calculating an expectation is essentially just the same task as evaluating an integral, we could use quadrature methods and other numerical integration schemes.

Page 12


We can consider an MCMC approach the “gold standard”, as we can calculate quantities to arbitrary precision, given sufficient samples.

Page 13


The challenge: How do we do this most efficiently?

Page 14

ODES AS STATISTICAL MODELS

We can define the log-likelihood as the sum, over observation times, of the log densities of the data given the ODE solution; with a Gaussian error model,

$$L(\theta) = \sum_n \log \mathcal{N}\big( y_n \,\big|\, x(t_n; \theta),\, \sigma^2 \big)$$

Derivatives of the log-likelihood require the sensitivities, $\partial x(t_n; \theta) / \partial \theta_i$.

Page 15

SOLVING ODES

General form for an ODE and its sensitivities:

$$\dot{x} = f(x, \theta, t), \qquad S^{(i)} = \frac{\partial x}{\partial \theta_i}$$

Sensitivities of an ODE may be computed by integrating the auxiliary system

$$\dot{S}^{(i)} = \frac{\partial f}{\partial x}\, S^{(i)} + \frac{\partial f}{\partial \theta_i}$$

alongside the states. Sensitivity information is important for exploring this parameter space.
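As a concrete illustration, here is a minimal Python sketch (not from the talk) of this forward sensitivity computation: the state is augmented with the sensitivity matrix and both are integrated together. The one-species Michaelis-Menten decay model and all parameter values are illustrative assumptions.

```python
# A minimal sketch of forward sensitivity analysis for dx/dt = f(x, theta):
# augment the state with S = dx/dtheta, which satisfies dS/dt = (df/dx) S + df/dtheta.
import numpy as np
from scipy.integrate import solve_ivp

def f(x, theta):
    # Toy one-species Michaelis-Menten decay: dx/dt = -Vmax * x / (Km + x)
    vmax, km = theta
    return np.array([-vmax * x[0] / (km + x[0])])

def dfdx(x, theta):
    vmax, km = theta
    return np.array([[-vmax * km / (km + x[0])**2]])

def dfdtheta(x, theta):
    vmax, km = theta
    return np.array([[-x[0] / (km + x[0]), vmax * x[0] / (km + x[0])**2]])

def augmented_rhs(t, z, theta, nx, ntheta):
    # z packs the state x and the sensitivity matrix S (nx x ntheta)
    x, S = z[:nx], z[nx:].reshape(nx, ntheta)
    dx = f(x, theta)
    dS = dfdx(x, theta) @ S + dfdtheta(x, theta)
    return np.concatenate([dx, dS.ravel()])

theta = np.array([1.0, 0.5])
x0 = np.array([2.0])
nx, ntheta = 1, 2
z0 = np.concatenate([x0, np.zeros(nx * ntheta)])  # S(0) = 0 since x0 is fixed
sol = solve_ivp(augmented_rhs, (0.0, 5.0), z0, args=(theta, nx, ntheta),
                t_eval=np.linspace(0.0, 5.0, 11))
print(sol.y[:nx, -1])   # state at t = 5
print(sol.y[nx:, -1])   # sensitivities dx/dVmax, dx/dKm at t = 5
```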

Page 16

POSTERIOR DISTRIBUTIONS

• Strong correlation structure

• Possible heavy tails

• High dimensional?

Page 17

QUICK REMINDER OF THE MCMC BASICS

Page 18

REMINDER OF MCMC BASICS

We can use a Monte Carlo estimator:

$$\hat{\mu}_f = \frac{1}{N} \sum_{i=1}^{N} f\big(\theta^{(i)}\big), \qquad \theta^{(i)} \sim p(\theta \,|\, y)$$

The rate of convergence is independent of dimensionality, given independently drawn samples.
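A minimal sketch of the estimator in code (the stand-in samples and the functional are illustrative); note the standard error scales as $O(1/\sqrt{N})$ regardless of the dimension of $\theta$:

```python
# Monte Carlo estimate of E[f(theta)] from i.i.d. samples.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=(100_000, 10))      # stand-in for i.i.d. posterior samples, D = 10
f_vals = np.sum(theta**2, axis=1)           # example functional f(theta) = ||theta||^2
est = f_vals.mean()
se = f_vals.std(ddof=1) / np.sqrt(len(f_vals))
print(f"estimate {est:.3f} +/- {se:.3f}")   # true expectation here is D = 10
```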

Page 19

REMINDER OF MCMC BASICS

We may obtain samples from an ergodic Markov process with the required stationary distribution. We can construct such a Markov chain that converges if the target is left invariant,

$$\pi(\theta') = \int K(\theta' \,|\, \theta)\, \pi(\theta)\, d\theta$$

where $\pi$ is the target distribution and $K(\theta' \,|\, \theta)$ is the transition kernel of the chain.

Page 20

REMINDER OF MCMC BASICS

Detailed balance is a convenient, sufficient condition,

$$\pi(\theta)\, K(\theta' \,|\, \theta) = \pi(\theta')\, K(\theta \,|\, \theta')$$

and chains that satisfy this are reversible.
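For concreteness, a minimal random-walk Metropolis-Hastings sketch (the target and step size are illustrative); the symmetric Gaussian proposal together with the accept/reject step satisfies detailed balance by construction:

```python
# Random-walk Metropolis-Hastings with a symmetric Gaussian proposal.
import numpy as np

def metropolis_hastings(log_target, theta0, n_samples, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    samples, logp = [], log_target(theta)
    for _ in range(n_samples):
        prop = theta + step * rng.normal(size=theta.shape)  # symmetric proposal
        logp_prop = log_target(prop)
        if np.log(rng.uniform()) < logp_prop - logp:        # accept/reject
            theta, logp = prop, logp_prop
        samples.append(theta.copy())
    return np.array(samples)

# Example: sample a standard 2D Gaussian.
draws = metropolis_hastings(lambda t: -0.5 * np.sum(t**2), np.zeros(2), 5000)
print(draws.mean(axis=0), draws.var(axis=0))
```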

Page 21

REMINDER OF MCMC BASICS

Convergence is guaranteed...


as the number of samples tends to infinity.


Unfortunately we don’t have an infinite amount of time!

Page 24

WHAT CAN GO WRONG?

If the output of the model depends on a combination of two parameters, the conditional distributions may be much more constrained than the marginal distributions.


Page 26

THE MAIN ISSUES

• Global convergence (is our target multimodal?)

• Local mixing (once we’ve found the mode)

• Computational cost (we want distant moves to be accepted with high probability)

Page 27

MCMC FLAVOURS

• “Vanilla” Metropolis-Hastings
• Slice Sampling
• Reversible Jump MCMC
• Adaptive MCMC
• Particle MCMC
• Metropolis-adjusted Langevin Algorithm
• Hamiltonian Monte Carlo
• Bridge Sampling/Simulated Tempering
• Differential Geometric MCMC

Page 28

MCMC FLAVOURS

Introducing auxiliary variables is a common trick for developing new MCMC algorithms.

• Slice Sampling

• Hamiltonian Monte Carlo

• Bridge Sampling/Simulated Tempering

• Riemannian Manifold HMC

Page 29

MCMC FLAVOURS

Estimating local covariance structure to improve proposals.

• Adaptive MCMC

• Particle MCMC

Page 30

MCMC FLAVOURS

MCMC proposals based on physically inspired dynamics.

• Metropolis-adjusted Langevin Algorithm

• Hamiltonian Monte Carlo

Page 31

METROPOLIS-ADJUSTED LANGEVIN DIFFUSION

The following stochastic differential equation defines a Langevin diffusion:

$$d\theta(t) = \frac{1}{2} \nabla_\theta L(\theta(t))\, dt + db(t)$$

An Euler discretisation gives us the following proposal mechanism,

$$\theta^* = \theta + \frac{\epsilon^2}{2} \nabla_\theta L(\theta) + \epsilon z$$

where $z \sim \mathcal{N}(0, I)$ and $\epsilon$ is the integration stepsize. We propose $\theta^* \sim q(\theta^* \,|\, \theta)$, where

$$q(\theta^* \,|\, \theta) = \mathcal{N}\Big( \theta^* \,\Big|\, \theta + \frac{\epsilon^2}{2} \nabla_\theta L(\theta),\; \epsilon^2 I \Big)$$

and accept with probability

$$\min\left\{ 1,\; \frac{p(\theta^*)\, q(\theta \,|\, \theta^*)}{p(\theta)\, q(\theta^* \,|\, \theta)} \right\}$$
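A minimal Python sketch of MALA as defined above (the target, its gradient, and the stepsize are illustrative):

```python
# MALA: Langevin-drift proposal plus Metropolis-Hastings correction.
import numpy as np

def mala(log_target, grad_log_target, theta0, n_samples, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)

    def log_q(to, frm):
        # log N(to | frm + (eps^2/2) grad L(frm), eps^2 I), up to a constant
        mean = frm + 0.5 * eps**2 * grad_log_target(frm)
        return -0.5 * np.sum((to - mean)**2) / eps**2

    samples = []
    for _ in range(n_samples):
        prop = (theta + 0.5 * eps**2 * grad_log_target(theta)
                + eps * rng.normal(size=theta.shape))
        log_alpha = (log_target(prop) - log_target(theta)
                     + log_q(theta, prop) - log_q(prop, theta))
        if np.log(rng.uniform()) < log_alpha:
            theta = prop
        samples.append(theta.copy())
    return np.array(samples)

# Example: standard 2D Gaussian target.
draws = mala(lambda t: -0.5 * np.sum(t**2), lambda t: -t, np.zeros(2), 5000)
print(draws.mean(axis=0))
```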

Page 32

DIFFERENTIAL GEOMETRY IN MCMC

Page 33

1st order geometry can be useful... but sometimes misleading.


Page 35

Instead of estimating local covariance structure, why not calculate it directly?

Page 36

SOME INTUITION

Actual distances depend not only on location, but also on the geometry at that point.

Page 37

• Expected Fisher Information defines a metric (symmetric, bilinear, positive-definite), such that the parameter space can be represented as a Riemannian manifold (Rao, 1945)

• Defines a local basis for a vector space at each point, and a distance measure at each point on the manifold; this links with the sensitivity of the statistical model

[Thesis excerpt, Section 3.2, An Introduction to Riemannian Geometry:]

... intrinsic qualities rather than the extrinsic parameter-dependent description, which hints at the power of using a differential geometric approach in statistics.

For now let us focus on some manifold, M, whose points are given by the parameters of an unnormalised density function, which we can consider as a log-likelihood of some statistical model given some data. At a particular point $\theta$, the derivatives of the log-likelihood are tangent to the manifold and form a basis for the tangent space at $\theta$, denoted by $T_\theta M$. These tangent basis vectors are simply the score vectors at $\theta$,

$$\nabla_\theta L = \left[ \frac{\partial L}{\partial \theta_1} \dots \frac{\partial L}{\partial \theta_n} \right]^T \qquad (3.13)$$

The tangent space is a linear approximation of the manifold at a given point and it has the same dimensionality. A natural inner product for this vector space is given by the covariance of the basis score vectors, since the covariance function satisfies the same properties, namely symmetry, bilinearity, and positive-definiteness. This inner product then turns out simply to be the Expected Fisher Information

$$G_{i,j} = \mathrm{Cov}\left( \frac{\partial L}{\partial \theta_i},\, \frac{\partial L}{\partial \theta_j} \right) \qquad (3.14)$$

$$\phantom{G_{i,j}} = \mathbb{E}_{p(x|\theta)}\left[ \frac{\partial L}{\partial \theta_i}^T \frac{\partial L}{\partial \theta_j} \right] \qquad (3.15)$$

which follows from the fact that the expectation of the score is zero,

$$\mathbb{E}_{p(x|\theta)}\left( \frac{\partial L}{\partial \theta_i} \right) = \int \frac{1}{p(x \,|\, \theta)} \frac{\partial}{\partial \theta_i} p(x \,|\, \theta)\; p(x \,|\, \theta)\, dx \qquad (3.16)$$

$$= \frac{\partial}{\partial \theta_i} \int p(x \,|\, \theta)\, dx \qquad (3.17)$$

$$= 0 \qquad (3.18)$$

The Expected Fisher Information can also be expressed in terms of second partial derivatives, which may be easier to compute for certain problems. This can be obtained by considering the expectation of the score function, ...


Page 38

• Expected Fisher Information is equivalent to the covariance of the tangent vectors at a point

• Metric tensors transform using the Jacobian of any reparameterisation - squared distance is invariant!
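A quick numeric check of this invariance (all values are illustrative; $J$ plays the role of the Jacobian of the reparameterisation):

```python
# Under a reparameterisation phi = h(theta) with Jacobian J = d theta / d phi,
# the metric transforms as G_phi = J^T G_theta J, so the squared line element
# d theta^T G_theta d theta is invariant.
import numpy as np

rng = np.random.default_rng(1)
G_theta = np.array([[2.0, 0.3], [0.3, 1.0]])    # metric in theta coordinates
J = np.array([[1.0, 0.5], [0.0, 2.0]])          # Jacobian d theta / d phi
G_phi = J.T @ G_theta @ J                       # transformed metric

dphi = rng.normal(size=2)                       # small displacement in phi
dtheta = J @ dphi                               # same displacement in theta
print(dtheta @ G_theta @ dtheta, dphi @ G_phi @ dphi)  # equal
```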

Page 39

MANIFOLD MCMC

We can make MCMC proposals based on:

• Diffusion processes across the manifold

• Hamiltonian dynamics across the manifold

Page 40

MANIFOLD MALA

The following stochastic differential equation defines a Langevin diffusion on a Riemannian manifold:

$$d\theta(t) = \frac{1}{2} \tilde{\nabla}_\theta L(\theta(t))\, dt + d\tilde{b}(t)$$

where the natural gradient is denoted by

$$\tilde{\nabla}_\theta L(\theta) = G^{-1}(\theta)\, \nabla_\theta L(\theta)$$

and the Brownian motion on the manifold is

$$d\tilde{b}_i(t) = |G(\theta(t))|^{-1/2} \sum_j \frac{\partial}{\partial \theta_j} \Big( G^{-1}_{ij}(\theta(t))\, |G(\theta(t))|^{1/2} \Big)\, dt + \Big( \sqrt{G^{-1}(\theta(t))}\; db(t) \Big)_i$$

Page 41

Discretising gives us the update step for a Markov chain; the drift now contains additional terms involving the derivatives of the metric, $\partial G(\theta)/\partial \theta_j$.

If we assume a locally constant metric tensor we obtain

$$\theta^* = \theta + \frac{\epsilon^2}{2} G^{-1}(\theta)\, \nabla_\theta L(\theta) + \epsilon \sqrt{G^{-1}(\theta)}\, z, \qquad z \sim \mathcal{N}(0, I)$$

which we can compare to a pre-conditioned MALA proposal with a fixed matrix $M$ in place of the position-dependent $G^{-1}(\theta)$.

We may therefore use the proposal

$$q(\theta^* \,|\, \theta) = \mathcal{N}\Big( \theta^* \,\Big|\, \theta + \frac{\epsilon^2}{2} G^{-1}(\theta)\, \nabla_\theta L(\theta),\; \epsilon^2\, G^{-1}(\theta) \Big)$$

and acceptance probability $\min\{1,\; p(\theta^*)\, q(\theta \,|\, \theta^*) \,/\, (p(\theta)\, q(\theta^* \,|\, \theta))\}$.
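A minimal sketch of the simplified (locally constant metric) proposal above; the metric function here is an illustrative stand-in for the expected Fisher Information:

```python
# Simplified mMALA: a position-dependent preconditioner G(theta)^{-1}
# replaces the fixed mass matrix of pre-conditioned MALA.
import numpy as np
from scipy.stats import multivariate_normal

def simplified_mmala(log_target, grad, metric, theta0, n_samples, eps=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    samples = []

    def mean_cov(t):
        Ginv = np.linalg.inv(metric(t))
        return t + 0.5 * eps**2 * Ginv @ grad(t), eps**2 * Ginv

    for _ in range(n_samples):
        m, C = mean_cov(theta)
        prop = rng.multivariate_normal(m, C)
        m_back, C_back = mean_cov(prop)
        log_alpha = (log_target(prop) - log_target(theta)
                     + multivariate_normal.logpdf(theta, m_back, C_back)
                     - multivariate_normal.logpdf(prop, m, C))
        if np.log(rng.uniform()) < log_alpha:
            theta = prop
        samples.append(theta.copy())
    return np.array(samples)

# Example: strongly correlated Gaussian target, using its exact Fisher metric.
Sigma = np.array([[1.0, 0.95], [0.95, 1.0]])
P = np.linalg.inv(Sigma)
draws = simplified_mmala(lambda t: -0.5 * t @ P @ t, lambda t: -P @ t,
                         lambda t: P, np.zeros(2), 5000)
print(draws.mean(axis=0))
```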

Page 42

AN EXAMPLE

Page 43

CIRCADIAN RHYTHM MODEL

Note the saturating reaction rates due to the Michaelis-Menten kinetic terms

Page 44

CIRCADIAN ODE MODEL

Page 45

CALCULATING THE METRIC

We can define the log-likelihood with a Gaussian error model as

$$L(\theta) = \sum_k \sum_n \log \mathcal{N}\big( y_{k,n} \,\big|\, x_k(t_n; \theta),\, \sigma_k^2 \big)$$

Derivatives of the log-likelihood require the sensitivities, $S^{(i)}_{k,n} = \partial x_k(t_n; \theta) / \partial \theta_i$.

The metric tensor also requires the sensitivities; for this error model the expected Fisher Information reduces to

$$G_{ij}(\theta) = \sum_k \frac{1}{\sigma_k^2} \sum_n S^{(i)}_{k,n}\, S^{(j)}_{k,n}$$

Page 46

ESTIMATING THE METRIC

• There may be cases where we cannot calculate the metric tensor directly. For example, employing a robust Student-t error model renders the expected Fisher Information intractable

• In such cases we can employ the standard trick of extending the state space, and use a sampling scheme to estimate the metric tensor at each iteration

Page 47

In particular, we may define the extended state-space to have a joint distribution $p(\theta, z) = p(\theta \,|\, y)\, p(z \,|\, \theta)$. Given the current position $\theta$ we can propose a new state $z^* \sim p(z^* \,|\, \theta)$, which we accept with probability

$$\min\left\{ 1,\; \frac{p(\theta, z^*)\, p(z \,|\, \theta)}{p(\theta, z)\, p(z^* \,|\, \theta)} \right\} = 1$$

This is a reversible transition, and we may define $p(z \,|\, \theta)$ to be the likelihood function, such that $z$ represents samples of pseudodata, which we can use to obtain an empirical estimate of the expected Fisher Information at each iteration, simply by calculating the covariance of the tangent vectors, since the expected Fisher Information is exactly the covariance of the score vectors.
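A minimal sketch of this estimation step for a toy model (a Gaussian likelihood whose true Fisher Information is the identity; the model and sample sizes are illustrative, not the talk's exact scheme):

```python
# Estimate the expected Fisher Information at theta as the empirical covariance
# of score vectors computed on pseudodata drawn from the likelihood p(z | theta).
import numpy as np

def estimate_metric(theta, n_pseudo=5000, seed=0):
    rng = np.random.default_rng(seed)
    # Toy likelihood: z ~ N(theta, I), drawn independently per pseudodataset
    z = rng.normal(loc=theta, scale=1.0, size=(n_pseudo, theta.size))
    scores = z - theta              # score of N(z | theta, I) w.r.t. theta
    return np.cov(scores, rowvar=False)

theta = np.array([0.5, -1.0])
print(estimate_metric(theta))       # approx the 2x2 identity, the true Fisher info
```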

Page 48

Results from the circadian ODE model with a Student-t likelihood for Metropolis-Hastings, MALA and mMALA.

Page 49

CONCLUSIONS

• Riemannian geometry is extremely useful in MCMC

• Such algorithms can help us sample efficiently from high-dimensional and strongly correlated distributions by following the geometric structure of the manifold

Page 50

SOME FURTHER THOUGHTS

• Currently developing software for a highly parallelised version of differential geometric MCMC for use on HECToR (the UK national supercomputer), under an EPSRC grant in collaboration with NAG.