TRANSCRIPT
Gaussian Markov Random Fields
Gaussian Markov Random Fields: Theory and Applications
Håvard Rue
Department of Mathematical Sciences, NTNU, Norway
September 26, 2008
Gaussian Markov Random Fields
Part I
Definition and basic properties
Gaussian Markov Random Fields
Outline I
Introduction
  Why?
  Outline of this course
What is a GMRF?
  The precision matrix
  Definition of a GMRF
  Example: Auto-regressive process
Why are GMRFs important?
Main features of GMRFs
Properties of GMRFs
  Interpretation of elements of Q
  Markov properties
  Conditional density
  Specification through full conditionals
  Proof via Brook's lemma
Gaussian Markov Random Fields
Introduction
Why?
Why is it a good idea to learn about Gaussian Markov random fields (GMRFs)? That is a good question!
What's there to learn? Isn't it all just Gaussian?
That is also a good question!
Gaussian Markov Random Fields
Introduction
Why?
Why?
• Gaussians (and most often in the form of a GMRF) are extensively used in statistical models.
• However, their excellent computational properties are often left unexplored.
• There is a lack of knowledge of general-purpose and near-optimal numerical algorithms.
Gaussian Markov Random Fields
Introduction
Why?
Why?
• Fast(er) MCMC-based inference
Recent developments (the really nice stuff...)
• Near-instant (compared to MCMC) approximate Bayesian inference is often possible
• Deterministic
• Relative error
• In practice: no "error"
Gaussian Markov Random Fields
Introduction
Why?
Longitudinal mixed effects model: Epil-example from BUGS
Gaussian Markov Random Fields
Introduction
Why?
Longitudinal mixed effects model: Epil-example from BUGS...
In this example
x = (a0, aBase, aTrt, aBT, aAge, aV4, b1j, bjk)
is a GMRF.
Two (hyper-)parameters: Precision(b1) and Precision(b).
Poisson observations.
Gaussian Markov Random Fields
Outline of this course
Outline of this course: Day I
Lecture 1 GMRFs: definition and basic properties
Lecture 2 Simulation algorithms for GMRFs
Lecture 3 Numerical methods for sparse matrices
Lecture 4 Intrinsic GMRFs
Lecture 5 Hierarchical GMRF models and MCMC algorithmsfor such models
Lecture 6 Case-studies
Gaussian Markov Random Fields
Outline of this course
Outline of this course: Day II
Lecture 1 Motivation for Approximate Bayesian inference
Lecture 2 INLA: Integrated Nested Laplace Approximations
Lecture 3 The inla-program: examples
Lecture 4 Using R-interface to the inla-program (Sara Martino)
Lecture 5 Case studies with R (Sara Martino)
Lecture 6 Discussion
Gaussian Markov Random Fields
Outline of this course
What is a GMRF?
What is a Gaussian Markov random field (GMRF)?
A GMRF is a simple construct
• A normally distributed random vector
x = (x1, . . . , xn)ᵀ
• Additional Markov properties:
xi ⊥ xj | x−ij,
i.e. xi and xj are conditionally independent (CI) given the rest.
Gaussian Markov Random Fields
Outline of this course
The precision matrix
If xi ⊥ xj | x−ij for a set of pairs {i, j}, then we need to constrain the parametrisation of the GMRF.
• Covariance matrix: difficult
• Precision matrix: easy
Gaussian Markov Random Fields
Outline of this course
The precision matrix
Conditional independence and the precision matrix
The density of a zero-mean Gaussian is
π(x) ∝ |Q|^{1/2} exp( −(1/2) xᵀQx ).
Constraining the parametrisation to obey the CI properties:
Theorem
xi ⊥ xj | x−ij ⇐⇒ Qij = 0
Gaussian Markov Random Fields
Outline of this course
The precision matrix
Use an (undirected) graph G = (V, E) to represent the CI properties:
V Vertices: 1, 2, . . . , n.
E Edges {i, j}:
• no edge between i and j if xi ⊥ xj | x−ij;
• an edge between i and j if xi and xj are not conditionally independent given x−ij.
Gaussian Markov Random Fields
Outline of this course
Definition of a GMRF
Definition of a GMRF
Definition (GMRF)
A random vector x = (x1, . . . , xn)ᵀ is called a GMRF wrt the graph G = (V = {1, . . . , n}, E) with mean µ and precision matrix Q > 0, iff its density has the form
π(x) = (2π)^{−n/2} |Q|^{1/2} exp( −(1/2)(x − µ)ᵀQ(x − µ) )
and
Qij ≠ 0 ⇐⇒ {i, j} ∈ E for all i ≠ j.
Gaussian Markov Random Fields
Outline of this course
Example: Auto-regressive process
Simple example of a GMRF
Auto-regressive process of order 1:
xt | xt−1, . . . , x1 ∼ N(φ xt−1, 1), t = 2, . . . , n,
and x1 ∼ N(0, (1 − φ²)⁻¹).
Tridiagonal precision matrix:
Q =
  1    −φ
 −φ   1+φ²   −φ
        ⋱     ⋱     ⋱
             −φ    1+φ²   −φ
                    −φ     1
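As a concrete check (not from the slides; a minimal Python/numpy sketch), one can build this tridiagonal Q for a small n and verify that its inverse is the familiar AR(1) covariance Cov(xs, xt) = φ^|s−t|/(1 − φ²):

```python
import numpy as np

def ar1_precision(n, phi):
    """Tridiagonal precision matrix of a stationary AR(1) process, unit innovation variance."""
    Q = np.zeros((n, n))
    for t in range(n):
        Q[t, t] = 1.0 + phi**2 if 0 < t < n - 1 else 1.0
        if t > 0:
            Q[t, t - 1] = Q[t - 1, t] = -phi
    return Q

n, phi = 6, 0.8
Q = ar1_precision(n, phi)
# Inverting Q recovers the dense AR(1) covariance phi^|s-t| / (1 - phi^2).
Sigma = np.array([[phi ** abs(s - t) / (1 - phi**2) for t in range(n)] for s in range(n)])
assert np.allclose(np.linalg.inv(Q), Sigma)
```

Note the contrast: Q is sparse (tridiagonal) while its inverse, the covariance matrix, is completely dense.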
Gaussian Markov Random Fields
Outline of this course
Why are GMRFs important?
Usage of GMRFs (I)
Structural time-series analysis
• Autoregressive models.
• Gaussian state-space models.
• Computational algorithms based on the Kalman filter and itsvariants.
Analysis of longitudinal and survival data
• temporal GMRF priors
• state-space approaches
• spatial GMRF priors
used to analyse longitudinal and survival data.
Gaussian Markov Random Fields
Outline of this course
Why are GMRFs important?
Usage of GMRFs (II)
Graphical models
• A key model
• Estimate Q and its (associated) graph from data.
• Often used in a larger context.
Semiparametric regression and splines
• Model a smooth curve in time or a surface in space.
• Intrinsic GMRF models and random walk models
• Discretely observed integrated Wiener processes are GMRFs
• GMRFs models for coefficients in B-splines.
Gaussian Markov Random Fields
Outline of this course
Why are GMRFs important?
Usage of GMRFs (III)
Image analysis
• The first main area for spatial models
• Image restoration using the Wiener filter.
• Texture modelling and texture discrimination.
• Segmentation
• Deformable templates
• Object identification
• 3D reconstruction
• Restoring ultrasound images
Gaussian Markov Random Fields
Outline of this course
Why are GMRFs important?
Usage of GMRFs (IV)
Spatial statistics
• Latent GMRF model analysis of spatial binary data
• Geostatistics using GMRFs
• Analysis of data in social sciences and spatial econometrics
• Spatial and space-time epidemiology
• Environmental statistics
• Inverse problems
Gaussian Markov Random Fields
Outline of this course
Main features of GMRFs
Main features
• Analytical tractable
• Modelling using conditional independence
• Merging GMRFs using conditioning (hierarchical models)
• Unified framework for• understanding• representation• computation using numerical methods for sparse matrices
• Fits nicely into the MCMC world
• Can construct fast and reliable block-MCMC algorithms.
Gaussian Markov Random Fields
Properties of GMRFs
Constraining the parametrisation to obey CI properties
Theorem
xi ⊥ xj | x−ij ⇐⇒ Qij = 0
Gaussian Markov Random Fields
Properties of GMRFs
Proof
Start with the factorisation theorem.
Theorem
x ⊥ y | z ⇐⇒ π(x , y , z) = f (x , z)g(y , z) (1)
for some functions f and g, and for all z with π(z) > 0.
Example
For
π(x, y, z) ∝ exp(x + xz + yz)
on some bounded region, we see that x ⊥ y | z. However, this is not the case for
π(x, y, z) ∝ exp(xyz).
Gaussian Markov Random Fields
Properties of GMRFs
Proof...
π(xi, xj, x−ij) ∝ exp( −(1/2) ∑_{k,l} xk Qkl xl )
∝ exp( −(1/2) xi xj (Qij + Qji)  −  (1/2) ∑_{{k,l} ≠ {i,j}} xk Qkl xl ).
The first term involves xi xj iff Qij ≠ 0; the second term does not involve xi xj.
We now see that π(xi, xj, x−ij) = f(xi, x−ij) g(xj, x−ij) for some functions f and g, iff Qij = 0. Now use the factorisation theorem.
Gaussian Markov Random Fields
Properties of GMRFs
Interpretation of elements of Q
Interpretation of elements of Q
Let x be a GMRF wrt G = (V, E) with mean µ and precision matrix Q > 0. Then
E(xi | x−i) = µi − (1/Qii) ∑_{j: j∼i} Qij (xj − µj),
Prec(xi | x−i) = Qii, and
Corr(xi, xj | x−ij) = −Qij / √(Qii Qjj),  i ≠ j.
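These identities are easy to check numerically; a small sketch (Python/numpy, not from the slides) with a generic SPD precision matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)            # a generic precision matrix Q > 0
Sigma = np.linalg.inv(Q)

i, j = 0, 1
rest = [k for k in range(n) if k != i]
# Prec(x_i | x_-i) = Q_ii: compare 1/Q_ii with the Schur-complement conditional variance.
cond_var = Sigma[i, i] - Sigma[i, rest] @ np.linalg.solve(Sigma[np.ix_(rest, rest)],
                                                          Sigma[rest, i])
assert np.isclose(cond_var, 1.0 / Q[i, i])

# Corr(x_i, x_j | x_-ij): the conditional precision of (x_i, x_j) given the rest
# is the 2x2 block of Q, so its inverse is the conditional covariance.
C = np.linalg.inv(Q[np.ix_([i, j], [i, j])])
assert np.isclose(C[0, 1] / np.sqrt(C[0, 0] * C[1, 1]),
                  -Q[i, j] / np.sqrt(Q[i, i] * Q[j, j]))
```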
Gaussian Markov Random Fields
Properties of GMRFs
Markov properties
Markov properties
Let x be a GMRF wrt G = (V, E). Then the following are equivalent.
The pairwise Markov property:
xi ⊥ xj | x−ij if {i, j} ∉ E and i ≠ j.
Gaussian Markov Random Fields
Properties of GMRFs
Markov properties
Markov properties
Let x be a GMRF wrt G = (V, E). Then the following are equivalent.
The local Markov property:
xi ⊥ x_{−{i, ne(i)}} | x_{ne(i)} for every i ∈ V.
Gaussian Markov Random Fields
Properties of GMRFs
Markov properties
Markov properties
Let x be a GMRF wrt G = (V, E). Then the following are equivalent.
The global Markov property:
xA ⊥ xB | xC
for all disjoint sets A, B and C, where C separates A and B, and A and B are non-empty.
Gaussian Markov Random Fields
Properties of GMRFs
Conditional density
(Induced) subgraph
Let A ⊂ V. Let G^A denote the graph restricted to A:
• remove all nodes not belonging to A, and
• remove all edges where at least one node does not belong to A.
Example
If A = {1, 2}, then V^A = {1, 2} and E^A = {{1, 2}}.
Gaussian Markov Random Fields
Properties of GMRFs
Conditional density
Conditional density I
Let V = A ∪ B where A ∩ B = ∅, and partition
x = (xA, xB)ᵀ, µ = (µA, µB)ᵀ, Q = ( Q_AA Q_AB ; Q_BA Q_BB ).
Result: xA | xB is then a GMRF wrt the subgraph G^A with parameters µ_{A|B} and Q_{A|B} > 0, where
µ_{A|B} = µA − Q_AA⁻¹ Q_AB (xB − µB) and Q_{A|B} = Q_AA.
Gaussian Markov Random Fields
Properties of GMRFs
Conditional density
Conditional density II
µ_{A|B} = µA − Q_AA⁻¹ Q_AB(xB − µB) and Q_{A|B} = Q_AA.
• We have explicit knowledge of Q_{A|B}: it is the principal submatrix Q_AA.
• Passing to the subgraph G^A does not change the structure among the nodes in A; nodes and edges outside A are simply removed.
• The conditional mean only depends on nodes in A ∪ ne(A).
• If Q_AA is sparse, then µ_{A|B} is the solution of a sparse linear system:
Q_AA(µ_{A|B} − µA) = −Q_AB(xB − µB)
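A sketch of that computation (Python/scipy; the AR(1)-type Q and the partition into A and B are illustrative, not from the slides):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

n, phi = 1000, 0.9
main = np.r_[1.0, np.full(n - 2, 1.0 + phi**2), 1.0]
off = np.full(n - 1, -phi)
Q = sp.diags([off, main, off], [-1, 0, 1], format="csr")   # sparse AR(1) precision

A_idx = np.arange(n // 2)                 # condition the first half of the nodes ...
B_idx = np.arange(n // 2, n)              # ... on the second half
mu = np.zeros(n)
xB = np.random.default_rng(1).standard_normal(B_idx.size)  # "observed" values

QAA = Q[A_idx, :][:, A_idx]
QAB = Q[A_idx, :][:, B_idx]
# Solve the sparse system Q_AA (mu_{A|B} - mu_A) = -Q_AB (x_B - mu_B).
mu_AB = mu[A_idx] + spsolve(QAA.tocsc(), -QAB @ (xB - mu[B_idx]))
```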
Gaussian Markov Random Fields
Properties of GMRFs
Conditional density
Conditional density III
[figure-only slides illustrating the conditional density]
Gaussian Markov Random Fields
Properties of GMRFs
Specification through full conditionals
Specification through full conditionals
An alternative to specifying a GMRF by its mean and precision matrix is to specify it implicitly through the full conditionals
π(xi | x−i), i = 1, . . . , n.
This approach was pioneered by Besag (1974, 1975).
Also known as conditional autoregressions or CAR models.
Gaussian Markov Random Fields
Properties of GMRFs
Specification through full conditionals
Example: AR-process
Specify full conditionals
E (x1 | x−1) = 0.3x2 Prec(x1 | x−1) = 1
E (x3 | x−3) = 0.2x2 + 0.4x4 Prec(x3 | x−3) = 2
and so on.
Gaussian Markov Random Fields
Properties of GMRFs
Specification through full conditionals
Example: Spatial model
Specify full conditionals for each i (assuming zero mean)
E(xi | x−i) = ∑_j wij xj,  Prec(xi | x−i) = κi
Gaussian Markov Random Fields
Properties of GMRFs
Specification through full conditionals
Potential problems
• Even though the "full conditionals" are well defined, are we sure that the joint density exists and is unique?
• What are the compatibility conditions required for a joint GMRF to exist with the prescribed Markov properties?
Gaussian Markov Random Fields
Properties of GMRFs
Specification through full conditionals
In general
Specify the full conditionals as normals with
E(xi | x−i) = µi − ∑_{j: j∼i} βij (xj − µj)   (2)
Prec(xi | x−i) = κi > 0   (3)
for i = 1, . . . , n, for some {βij : i ≠ j} and vectors µ and κ.
Clearly, ∼ is defined implicitly by the non-zero terms among the βij.
Gaussian Markov Random Fields
Properties of GMRFs
Specification through full conditionals
E(xi | x−i) = µi − ∑_{j: j∼i} βij(xj − µj) and Prec(xi | x−i) = κi > 0
There must exist a joint density π(x) with these as its full conditionals.
Comparing term by term with (2) gives
Qii = κi and Qij = κi βij.
Q must be symmetric, i.e.
κi βij = κj βji.
We then have a candidate for a joint density, provided Q > 0.
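To make the construction concrete, a tiny sketch (Python/numpy, with illustrative numbers) that assembles Q from given κ and β and checks the compatibility conditions:

```python
import numpy as np

# Full-conditional specification: E(x_i | x_-i) = -sum_j beta_ij x_j (zero mean),
# Prec(x_i | x_-i) = kappa_i. Values chosen so that kappa_i beta_ij = kappa_j beta_ji.
kappa = np.array([1.0, 2.0, 2.0])
beta = np.array([[0.00, -0.30, 0.00],
                 [-0.15, 0.00, -0.20],
                 [0.00, -0.20, 0.00]])

Q = np.diag(kappa) + kappa[:, None] * beta   # Q_ii = kappa_i, Q_ij = kappa_i beta_ij
assert np.allclose(Q, Q.T)                   # symmetry: kappa_i beta_ij = kappa_j beta_ji
assert np.all(np.linalg.eigvalsh(Q) > 0)     # Q > 0: a valid joint density exists
```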
Gaussian Markov Random Fields
Properties of GMRFs
Proof via Brook’s lemma
Lemma (Brook's lemma)
Let π(x) be the density for x ∈ ℝⁿ and define Ω = {x ∈ ℝⁿ : π(x) > 0}. Let x, x′ ∈ Ω; then
π(x)/π(x′) = ∏_{i=1}^{n} [ π(xi | x1, . . . , x_{i−1}, x′_{i+1}, . . . , x′_n) / π(x′_i | x1, . . . , x_{i−1}, x′_{i+1}, . . . , x′_n) ]
= ∏_{i=1}^{n} [ π(xi | x′_1, . . . , x′_{i−1}, x_{i+1}, . . . , xn) / π(x′_i | x′_1, . . . , x′_{i−1}, x_{i+1}, . . . , xn) ].
Gaussian Markov Random Fields
Properties of GMRFs
Proof via Brook’s lemma
Assume µ = 0 and fix x′ = 0. Then
log π(x)/π(0) = −(1/2) ∑_{i=1}^{n} κi xi² − ∑_{i=2}^{n} ∑_{j=1}^{i−1} κi βij xi xj   (4)
and
log π(x)/π(0) = −(1/2) ∑_{i=1}^{n} κi xi² − ∑_{i=1}^{n−1} ∑_{j=i+1}^{n} κi βij xi xj.   (5)
Since (4) and (5) must be identical, κi βij = κj βji for i ≠ j, and
log π(x) = const − (1/2) ∑_{i=1}^{n} κi xi² − (1/2) ∑_{i≠j} κi βij xi xj;
hence x is zero-mean multivariate normal, provided Q > 0.
Gaussian Markov Random Fields
Part II
Simulation algorithms for GMRFs
Gaussian Markov Random Fields
Outline I
Introduction
Simulation algorithms for GMRFs
  Task
  Summary of the simulation algorithms
Basic numerical linear algebra
  Cholesky factorisation and the Cholesky triangle
  Solving linear equations
  Avoid computing the inverse
Unconditional sampling
  Simulation algorithm
  Evaluating the log-density
Conditional sampling
  Canonical parameterisation
  Conditional distribution in the canonical parameterisation
  Sampling from a canonical parameterised GMRF
Gaussian Markov Random Fields
Outline II
Sampling under hard linear constraints
  Introduction
  General algorithm (slow)
  Specific algorithm with not too many constraints (fast)
  Example
  Evaluating the log-density
Sampling under soft linear constraints
  Introduction
  Specific algorithm with not too many constraints (fast)
  The algorithm
  Evaluate the log-density
  Example
Gaussian Markov Random Fields
Introduction
Sparse precision matrix
Recall that
E(xi − µi | x−i) = −(1/Qii) ∑_{j∼i} Qij (xj − µj),  Prec(xi | x−i) = Qii
In most cases:
• Total number of neighbours is O(n).
• Only O(n) of the n2 terms in Q will be non-zero.
• Use this to construct exact simulation algorithms for GMRFs, using numerical algorithms for sparse matrices.
Gaussian Markov Random Fields
Introduction
Example of a typical precision matrix
[figure: non-zero pattern of a typical Q]
Each node has on average 7 neighbours.
Gaussian Markov Random Fields
Simulation algorithms for GMRFs
Task
Simulation algorithms for GMRFs
Can we take advantage of the sparse structure of Q?
• It is faster to factorise a sparse Q compared to a dense Q.
• The speedup depends on the “pattern” in Q, not only thenumber of non-zero terms.
Our task
• Formulate all algorithms to use only sparse matrices.
• Unconditional simulation
• Conditional simulation
  • condition on a subset of variables
  • condition on linear constraints
  • condition on linear constraints with normal noise
• Evaluation of the log-density in all cases.
Gaussian Markov Random Fields
Simulation algorithms for GMRFs
Summary of the simulation algorithms
The result
In most cases, the cost is
• O(n) for temporal GMRFs
• O(n3/2) for spatial GMRFs
• O(n2) for spatio-temporal GMRFs
including evaluation of the log-density.
Condition on k linear constraints, add O(k3).
These are general algorithms only depending on the graph G notthe numerical values in Q.
The core is numerical algorithms for sparse matrices.
Gaussian Markov Random Fields
Basic numerical linear algebra
Cholesky factorisation and the Cholesky triangle
Cholesky factorisation
Let A be an n × n positive definite matrix (A > 0). Then there exists a unique Cholesky triangle L, such that L is lower triangular and
A = LLᵀ.
Computing L costs n³/3 flops.
This factorisation is the basis for solving systems like
Ax = b or AX = B
for k right hand sides, or equivalently, computing
x = A−1b or X = A−1B
Gaussian Markov Random Fields
Basic numerical linear algebra
Solving linear equations
Algorithm 1 Solving Ax = b where A > 0
1: Compute the Cholesky factorisation, A = LLᵀ
2: Solve Lv = b
3: Solve Lᵀx = v
4: Return x
Step 2 is called forward-substitution and costs O(n²) flops.
The solution v is computed in a forward loop:
vi = (1/Lii) ( bi − ∑_{j=1}^{i−1} Lij vj ),  i = 1, . . . , n   (6)
Gaussian Markov Random Fields
Basic numerical linear algebra
Solving linear equations
Algorithm 1 Solving Ax = b where A > 0
1: Compute the Cholesky factorisation, A = LLᵀ
2: Solve Lv = b
3: Solve Lᵀx = v
4: Return x
Step 3 is called back-substitution and costs O(n²) flops.
The solution x is computed in a backward loop:
xi = (1/Lii) ( vi − ∑_{j=i+1}^{n} Lji xj ),  i = n, . . . , 1   (7)
Gaussian Markov Random Fields
Basic numerical linear algebra
Avoid computing the inverse
To compute A⁻¹B, where B is an n × k matrix, we compute the solution X of
A Xj = Bj
for each of the k columns of X.
Algorithm 2 Solving AX = B where A > 0
1: Compute the Cholesky factorisation, A = LLᵀ
2: for j = 1 to k do
3:   Solve Lv = Bj
4:   Solve Lᵀ Xj = v
5: end for
6: Return X
Gaussian Markov Random Fields
Unconditional sampling
Simulation algorithm
Sample x ∼ N(µ, Q⁻¹)
If Q = LLᵀ and z ∼ N(0, I), then x defined by
Lᵀx = z
has covariance
Cov(x) = Cov(L⁻ᵀz) = (LLᵀ)⁻¹ = Q⁻¹.
Algorithm 3 Sampling x ∼ N(µ, Q⁻¹)
1: Compute the Cholesky factorisation, Q = LLᵀ
2: Sample z ∼ N(0, I)
3: Solve Lᵀv = z
4: Compute x = µ + v
5: Return x
Gaussian Markov Random Fields
Unconditional sampling
Evaluating the log-density
The log-density is
log π(x) = −(n/2) log 2π + ∑_{i=1}^{n} log Lii − (1/2)(x − µ)ᵀQ(x − µ),
where the quadratic form is denoted q. If x was sampled by Algorithm 3, then q = zᵀz; otherwise, compute it as
• u = x − µ
• v = Qu
• q = uᵀv
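A matching sketch (Python/numpy; L is the Cholesky triangle from the sampler above):

```python
import numpy as np

def gmrf_logdensity(x, mu, Q, L):
    """log pi(x) for x ~ N(mu, Q^{-1}), with L the Cholesky triangle of Q."""
    n = len(x)
    u = x - mu                      # u = x - mu
    q = u @ (Q @ u)                 # v = Q u, then q = u^T v
    return -0.5 * n * np.log(2.0 * np.pi) + np.sum(np.log(np.diag(L))) - 0.5 * q
```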
Gaussian Markov Random Fields
Conditional sampling
Canonical parameterisation
The canonical parameterisation
A GMRF x wrt G with canonical parameterisation (b, Q) has density
π(x) ∝ exp( −(1/2) xᵀQx + bᵀx )   (8)
i.e., precision matrix Q and mean µ = Q⁻¹b.
Write this as
x ∼ N_C(b, Q).   (9)
The relation to the usual parameterisation is
N(µ, Q⁻¹) = N_C(Qµ, Q).
Gaussian Markov Random Fields
Conditional sampling
Conditional distribution in the canonical parameterisation
Conditional simulation of a GMRF
Decompose x as
(xA, xB)ᵀ ∼ N( (µA, µB)ᵀ, ( Q_AA Q_AB ; Q_BA Q_BB )⁻¹ ).
Then xA | xB has canonical parameterisation
xA − µA | xB ∼ N_C( −Q_AB(xB − µB), Q_AA ).   (10)
Simulate using an algorithm for the canonical parameterisation.
Gaussian Markov Random Fields
Conditional sampling
Sampling from a canonical parameterised GMRF
Sample x ∼ N_C(b, Q)
Recall that "N_C(b, Q) = N(Q⁻¹b, Q⁻¹)", so we need to compute the mean as well.
Algorithm 4 Sampling x ∼ N_C(b, Q)
1: Compute the Cholesky factorisation, Q = LLᵀ
2: Solve Lw = b
3: Solve Lᵀµ = w
4: Sample z ∼ N(0, I)
5: Solve Lᵀv = z
6: Compute x = µ + v
7: Return x
Gaussian Markov Random Fields
Sampling under hard linear constraints
Introduction
Sampling from x | Ax = e
• A is a k × n matrix, 0 < k < n, with rank k,
• e is a vector of length k.
This case occurs quite frequently:
a sum-to-zero constraint corresponds to k = 1, A = 1ᵀ and e = 0.
We will denote this problem sampling under a hard constraint.
Note that π(x | Ax) is singular with rank n − k.
Gaussian Markov Random Fields
Sampling under hard linear constraints
Introduction
The linear constraint makes the conditional distribution Gaussian,
• but it is singular, as the rank of the constrained covariance matrix is n − k;
• more care must be exercised when sampling from this distribution.
We have that
E(x | Ax = e) = µ − Q⁻¹Aᵀ(AQ⁻¹Aᵀ)⁻¹(Aµ − e)   (11)
Cov(x | Ax = e) = Q⁻¹ − Q⁻¹Aᵀ(AQ⁻¹Aᵀ)⁻¹AQ⁻¹   (12)
This is typically a dense-matrix case, which must be solved using general O(n³) algorithms (see the next frame for details).
Gaussian Markov Random Fields
Sampling under hard linear constraints
General algorithm (slow)
We can sample from this distribution as follows. As the covariance matrix is singular, we compute its eigenvalues and eigenvectors and write the covariance matrix as VΛVᵀ, where the columns of V are the eigenvectors and Λ is a diagonal matrix with the eigenvalues on the diagonal. This corresponds to a different factorisation than the Cholesky triangle, but any matrix C satisfying CCᵀ = Σ will do. Note that k of the eigenvalues are zero. We can produce a sample by computing v = Cz, where C = VΛ^{1/2} and z ∼ N(0, I), and then adding the conditional mean. We can compute the log-density as
log π(x | Ax = e) = −((n − k)/2) log 2π − (1/2) ∑_{i: Λii > 0} log Λii − (1/2)(x − µ*)ᵀ Σ⁻ (x − µ*)   (13)
where µ* is given by (11) and Σ⁻ = VΛ⁻Vᵀ, with (Λ⁻)ii = Λii⁻¹ if Λii > 0 and zero otherwise. In total, this is a quite computationally demanding procedure, as the algorithm is not able to take advantage of the sparse structure of Q.
Gaussian Markov Random Fields
Sampling under hard linear constraints
Specific algorithm with not too many constraints (fast)
Conditioning via Kriging
There is an alternative algorithm which corrects for the constraints at nearly no cost when k ≪ n.
Let x ∼ N(µ, Q⁻¹); then compute
x* = x − Q⁻¹Aᵀ(AQ⁻¹Aᵀ)⁻¹(Ax − e).   (14)
Now x* has the correct conditional distribution!
AQ⁻¹Aᵀ is a k × k matrix, hence its factorisation is fast to compute for small k.
Gaussian Markov Random Fields
Sampling under hard linear constraints
Specific algorithm with not too many constraints (fast)
Algorithm 5 Sampling x | Ax = e when x ∼ N(µ, Q⁻¹)
1: Compute the Cholesky factorisation, Q = LLᵀ
2: Sample z ∼ N(0, I)
3: Solve Lᵀv = z
4: Compute x = µ + v
5: Compute V_{n×k} = Q⁻¹Aᵀ with Algorithm 2
6: Compute W_{k×k} = AV
7: Compute U_{k×n} = W⁻¹Vᵀ using Algorithm 2
8: Compute c = Ax − e
9: Compute x* = x − Uᵀc
10: Return x*
If z = 0 in Algorithm 5, then x* is the conditional mean.
The extra cost is only O(k³) for large k!
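Conditioning via kriging, (14) and Algorithm 5, as a sketch (Python/numpy; the sum-to-zero constraint serves as the example):

```python
import numpy as np

def correct_for_constraint(x, Q, A, e):
    """Turn a sample x ~ N(mu, Q^{-1}) into x* distributed as x | Ax = e, as in (14)."""
    V = np.linalg.solve(Q, A.T)                  # V = Q^{-1} A^T (Algorithm 2 in the sparse case)
    W = A @ V                                    # the small k x k matrix A Q^{-1} A^T
    return x - V @ np.linalg.solve(W, A @ x - e)

rng = np.random.default_rng(4)
n = 5
Q = np.eye(n)                                    # independent unit-variance components
A = np.ones((1, n))                              # sum-to-zero: k = 1, A = 1^T, e = 0
x_star = correct_for_constraint(rng.standard_normal(n), Q, A, np.zeros(1))
assert np.isclose(x_star.sum(), 0.0)
```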
Gaussian Markov Random Fields
Sampling under hard linear constraints
Example
Example
Let x1, . . . , xn be independent Gaussian variables with variances σi² and means µi. To sample x conditioned on
∑ xi = 0,
we first sample
xi ∼ N(µi, σi²), i = 1, . . . , n,
and compute the constrained sample x* as
x*_i = xi − c σi²   (15)
where
c = (∑_j xj) / (∑_j σj²).
Gaussian Markov Random Fields
Sampling under hard linear constraints
Evaluating the log-density
The log-density can be rapidly computed using the following identity:
π(x | Ax) = π(x) π(Ax | x) / π(Ax)   (16)
Each term on the rhs is easier to compute than the lhs.
π(x)-term. This is a GMRF, and the log-density is easy to compute using L from Algorithm 5, step 1.
Gaussian Markov Random Fields
Sampling under hard linear constraints
Evaluating the log-density
The log-density can be rapidly computed using the following identity:
π(x | Ax) = π(x) π(Ax | x) / π(Ax)   (16)
Each term on the rhs is easier to compute than the lhs.
π(Ax | x)-term. This is a degenerate density, which is either zero or a constant:
log π(Ax | x) = −(1/2) log |AAᵀ|   (17)
The determinant of a k × k matrix can be found from its Cholesky factorisation.
Gaussian Markov Random Fields
Sampling under hard linear constraints
Evaluating the log-density
The log-density can be rapidly computed using the following identity:
π(x | Ax) = π(x) π(Ax | x) / π(Ax)   (16)
Each term on the rhs is easier to compute than the lhs.
π(Ax)-term. Ax is Gaussian with mean Aµ and covariance matrix AQ⁻¹Aᵀ, with the Cholesky triangle available from Algorithm 5, step 7.
Gaussian Markov Random Fields
Sampling under soft linear constraints
Introduction
Sampling from x | Ax = ε
Let x be a GMRF which is observed through e, where e | x ∼ N(Ax, Σε).
• e is a vector of length k < n
• A is a k × n matrix with rank k
• Σε > 0 is the covariance matrix of the noise.
The conditional for x is
log π(x | e) = −(1/2)(x − µ)ᵀQ(x − µ) − (1/2)(e − Ax)ᵀΣε⁻¹(e − Ax) + const,   (18)
so
x | e ∼ N_C( Qµ + AᵀΣε⁻¹e, Q + AᵀΣε⁻¹A ).   (19)
This precision matrix is most often a dense matrix, and the nice sparse structure of Q is lost.
Gaussian Markov Random Fields
Sampling under soft linear constraints
Introduction
Example
Observing the sum of the xi with unit-variance noise, the posterior precision is
Q + 11ᵀ,
which is a dense matrix.
In general, we have to sample from (18) using a general algorithm, which is computationally expensive for large n.
Gaussian Markov Random Fields
Sampling under soft linear constraints
Specific algorithm with not too many constraints (fast)
Alternative approach
This is a similar problem to sampling under a hard constraint Ax = e, but now "e" is stochastic:
Ax = ε where ε ∼ N(e, Σε).   (20)
We denote this case sampling under a soft constraint.
If we extend (14) to
x* = x − Q⁻¹Aᵀ(AQ⁻¹Aᵀ + Σε)⁻¹(Ax − ε),   (21)
where ε and x are samples from their respective distributions, then x* has the correct conditional distribution.
Gaussian Markov Random Fields
Sampling under soft linear constraints
The algorithm
Algorithm 6 Sampling x | Ax = ε when ε ∼ N(e, Σε) and x ∼ N(µ, Q⁻¹)
1: Compute the Cholesky factorisation, Q = LLᵀ
2: Sample z ∼ N(0, I)
3: Solve Lᵀv = z
4: Compute x = µ + v
5: Compute V_{n×k} = Q⁻¹Aᵀ using Algorithm 2 and L
6: Compute W_{k×k} = AV + Σε
7: Compute U_{k×n} = W⁻¹Vᵀ using Algorithm 2
8: Sample ε ∼ N(e, Σε)
9: Compute c = Ax − ε
10: Compute x* = x − Uᵀc
11: Return x*
When z = 0 and ε = e, then x* is the conditional mean.
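The soft-constraint version differs from the hard one only by the added Σε in the k × k system and by sampling ε (a sketch):

```python
import numpy as np

def correct_for_soft_constraint(x, Q, A, e, Sigma_eps, rng):
    """Turn x ~ N(mu, Q^{-1}) into a sample from x | Ax = eps, eps ~ N(e, Sigma_eps), as in (21)."""
    eps = rng.multivariate_normal(e, Sigma_eps)      # the stochastic "right-hand side"
    V = np.linalg.solve(Q, A.T)                      # Q^{-1} A^T
    W = A @ V + Sigma_eps                            # A Q^{-1} A^T + Sigma_eps
    return x - V @ np.linalg.solve(W, A @ x - eps)
```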
Gaussian Markov Random Fields
Sampling under soft linear constraints
Evaluate the log-density
The log-density is computed via
π(x | e) = π(x) π(e | x) / π(e)   (22)
All Cholesky triangles required are available from the simulation algorithm.
π(x)-term. This is a GMRF.
π(e | x)-term. e | x is Gaussian with mean Ax and covariance Σε.
π(e)-term. e is Gaussian with mean Aµ and covariance matrix AQ⁻¹Aᵀ + Σε.
Gaussian Markov Random Fields
Sampling under soft linear constraints
Example
Example
Let x1, . . . , xn be independent Gaussian variables with variances σi² and means µi. We now observe
e ∼ N(∑ xi, σε²).
To sample from π(x | e) we first sample
xi ∼ N(µi, σi²), i = 1, . . . , n,
and then
ε ∼ N(e, σε²).
A conditional sample x* is then given by
x*_i = xi − c σi², where c = (∑_j xj − ε) / (∑_j σj² + σε²).
Gaussian Markov Random Fields
Part III
Numerical methods for sparse matrices
Gaussian Markov Random Fields
Outline I
Introduction
Cholesky factorisation
The Cholesky triangle
  Interpretation
  The zero-pattern in L
  Example: A simple graph
  Example: auto-regressive processes
Band matrices
  Bandwidth is preserved
  Cholesky factorisation for band-matrices
Reordering schemes
  Introduction
  Reordering to band-matrices
  Reordering using the idea of nested dissection
What to do with sparse matrices?
Gaussian Markov Random Fields
Outline II
A numerical case-study
  Introduction
  GMRF-models in time
  GMRF-models in space
  GMRF-models in time×space
Gaussian Markov Random Fields
Introduction
Numerical methods for sparse matrices
We have shown that computations on GMRFs can be expressed such that the main tasks are
1. compute the Cholesky factorisation Q = LLᵀ, and
2. solve Lv = b and Lᵀx = z.
The second task is much faster than the first, but sparsity is an advantage here as well.
Gaussian Markov Random Fields
Introduction
The goal is to explain
• why a sparse Q allows for fast factorisation,
• how we can take advantage of it,
• why we gain if we permute the vertices before factorising the matrix,
• how statisticians can benefit from recent research in this area by numerical mathematicians.
At the end we present a small case study factorising some typical matrices for GMRFs, using classical and more recent methods for factorising matrices.
Gaussian Markov Random Fields
Cholesky factorisation
How to compute the Cholesky factorisation
Qij = ∑_{k=1}^{j} Lik Ljk,  i ≥ j.
Define
vi = Qij − ∑_{k=1}^{j−1} Lik Ljk,  i ≥ j.
Then
• Ljj² = vj, and
• Lij Ljj = vi for i > j.
If we know vi for fixed j, then
Ljj = √vj and Lij = vi/√vj for i = j + 1, . . . , n.
This gives the jth column of L.
Gaussian Markov Random Fields
Cholesky factorisation
Cholesky factorization of Q > 0
Algorithm 7 Computing the Cholesky triangle L of Q
1: for j = 1 to n do
2:   v_{j:n} = Q_{j:n, j}
3:   for k = 1 to j − 1 do v_{j:n} = v_{j:n} − L_{j:n,k} Ljk
4:   L_{j:n,j} = v_{j:n}/√vj
5: end for
6: Return L
The overall process requires n³/3 flops.
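Algorithm 7 written out (a didactic Python/numpy sketch; production code would call LAPACK or a sparse solver):

```python
import numpy as np

def cholesky_columns(Q):
    """Column-by-column Cholesky: returns lower-triangular L with Q = L L^T."""
    n = Q.shape[0]
    L = np.zeros_like(Q, dtype=float)
    for j in range(n):
        v = Q[j:, j].astype(float).copy()     # v_{j:n} = Q_{j:n, j}
        for k in range(j):
            v -= L[j:, k] * L[j, k]           # v_{j:n} -= L_{j:n,k} L_{jk}
        L[j:, j] = v / np.sqrt(v[0])          # L_{j:n,j} = v_{j:n} / sqrt(v_j)
    return L

M = np.random.default_rng(5).standard_normal((4, 4))
Q = M @ M.T + 4 * np.eye(4)
assert np.allclose(cholesky_columns(Q), np.linalg.cholesky(Q))
```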
Gaussian Markov Random Fields
The Cholesky triangle
Interpretation
Interpretation of L (I)
Let Q = LLᵀ. Then the solution of
Lᵀx = z where z ∼ N(0, I)
is N(0, Q⁻¹) distributed.
Since Lᵀ is upper triangular, x is computed in a backward loop:
xn = zn / Lnn
x_{n−1} = (z_{n−1} − L_{n,n−1} xn) / L_{n−1,n−1}
. . .
Gaussian Markov Random Fields
The Cholesky triangle
Interpretation
Interpretation of L (II)
Theorem
Let x be a GMRF wrt the labelled graph G, with mean µ and precision matrix Q > 0. Let L be the Cholesky triangle of Q. Then for i ∈ V,
E(xi | x_{(i+1):n}) = µi − (1/Lii) ∑_{j=i+1}^{n} Lji (xj − µj) and
Prec(xi | x_{(i+1):n}) = Lii².
Gaussian Markov Random Fields
The Cholesky triangle
The zero-pattern in L
Determine the zero-pattern in L (I)
Theorem
Let x be a GMRF wrt G, with mean µ and precision matrix Q > 0. Let L be the Cholesky triangle of Q and define, for 1 ≤ i < j ≤ n, the set
F(i, j) = {i + 1, . . . , j − 1, j + 1, . . . , n},
which is the future of i except j. Then
xi ⊥ xj | x_{F(i,j)} ⇐⇒ Lji = 0.
If we can verify that Lji is zero, we do not have to compute it when factorising Q.
Gaussian Markov Random Fields
The Cholesky triangle
The zero-pattern in L
Proof
Assume µ = 0 and fix 1 ≤ i < j ≤ n. The theorem on the interpretation of L gives that
π(x_{i:n}) ∝ exp( −(1/2) ∑_{k=i}^{n} Lkk² ( xk + (1/Lkk) ∑_{j=k+1}^{n} Ljk xj )² )
= exp( −(1/2) x_{i:n}ᵀ Q^{(i:n)} x_{i:n} ),
where Q^{(i:n)}_{ij} = Lii Lji. Then
xi ⊥ xj | x_{F(i,j)} ⇐⇒ Lii Lji = 0,
which is equivalent to Lji = 0, since Lii > 0 as Q^{(i:n)} > 0.
Gaussian Markov Random Fields
The Cholesky triangle
The zero-pattern in L
Determine the zero-pattern in L (II)
The global Markov property provides a simple and sufficient criterion for checking whether Lji = 0.
Corollary
If F(i, j) separates i < j in G, then Lji = 0.
Corollary
If i ∼ j, then F(i, j) does not separate i < j.
The idea is simple
• Use the global Markov property to check if Lji = 0.
• Compute only the non-zero terms in L, so that Q = LLT .
Gaussian Markov Random Fields
The Cholesky triangle
Example: A simple graph
Example
Q =
× × × ·
× × · ×
× · × ×
· × × ×

L =
×
× ×
× ? ×
? × × ×

("?" marks an entry that is zero in Q but may or may not become non-zero in L.)
Gaussian Markov Random Fields
The Cholesky triangle
Example: A simple graph
Example
Q =
× × × ·
× × · ×
× · × ×
· × × ×

L =
×
× ×
× √ ×
· × × ×

Using the global Markov property: F(1, 4) = {2, 3} separates 1 and 4, so L41 = 0, while L32 (marked √) is a genuine fill-in.
Gaussian Markov Random Fields
The Cholesky triangle
Example: auto-regressive processes
Example: AR(1)-process
xt | x1:(t−1) ∼ N (φxt−1, σ2), t = 1, . . . , n
Q =
× ×× × ×× × ×× × ×× × ×× × ×× ×
L =
×× ×× ×× ×× ×× ×× ×
Gaussian Markov Random Fields
Band matrices
Bandwidth is preserved
Bandwidth is preserved
Similarly, for an AR(p)-process:
• Q has bandwidth p.
• L has lower bandwidth p.
Theorem
Let Q > 0 be a band matrix with bandwidth p and dimension n; then the Cholesky triangle of Q has (lower) bandwidth p.
It is easy to modify existing Cholesky-factorisation code to use only entries where |i − j| ≤ p.
Gaussian Markov Random Fields
Band matrices
Cholesky factorisation for band-matrices
Avoid computing Lij and reading Qij for |i − j | > p.
Algorithm 8 Band-Cholesky factorization of Q with bandwidth p
1: for j = 1 to n do2: λ = minj + p, n3: vj :λ = Qj :λ,j
4: for k = max1, j − p to j − 1 do5: i = mink + p, n6: vj :i = vj :i − Lj :i ,kLjk
7: end for8: Lj :λ,j = vj :λ/
√vj
9: end for10: Return L
Cost is now n(p2 + 3p) flops assuming n p.
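The same loop restricted to the band (a sketch; LAPACK's DPBTRF, mentioned later, is the production version):

```python
import numpy as np

def band_cholesky(Q, p):
    """Cholesky triangle of a banded SPD matrix, touching only entries with |i - j| <= p."""
    n = Q.shape[0]
    L = np.zeros_like(Q, dtype=float)
    for j in range(n):
        lam = min(j + p + 1, n)                 # lambda = min{j + p, n} (exclusive bound)
        v = Q[j:lam, j].astype(float).copy()
        for k in range(max(0, j - p), j):
            i = min(k + p + 1, n)               # i = min{k + p, n}
            v[: i - j] -= L[j:i, k] * L[j, k]
        L[j:lam, j] = v / np.sqrt(v[0])
    return L

Q = np.diag(np.full(6, 2.0)) + np.diag(np.full(5, -1.0), 1) + np.diag(np.full(5, -1.0), -1)
assert np.allclose(band_cholesky(Q, 1), np.linalg.cholesky(Q))
```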
Gaussian Markov Random Fields
Reordering schemes
Introduction
Reorder the vertices
We can permute the vertices:
select one of the n! possible permutations and define the corresponding permutation matrix P, such that i_P = Pi, where i = (1, . . . , n)ᵀ, is the new ordering of the vertices.
Choose P, if possible, such that
Q_P = PQPᵀ   (23)
is a band matrix with a small bandwidth.
Gaussian Markov Random Fields
Reordering schemes
Introduction
• It is impossible in general to obtain the optimal permutation; n! is too large!
• A sub-optimal ordering will do as well.
• Solve Qµ = b as follows:
  • compute b_P = Pb;
  • solve Q_P µ_P = b_P;
  • map the solution back, µ = Pᵀµ_P.
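scipy ships a bandwidth-reducing permutation (reverse Cuthill-McKee), which plays the same role; a sketch with an arrowhead-type matrix (one "global" node coupled to all others):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee
from scipy.sparse.linalg import spsolve

n = 50
Q = sp.lil_matrix((n, n))
Q.setdiag(np.full(n, float(n)))              # diagonally dominant, hence SPD
Q[0, 1:] = -1.0
Q[1:, 0] = -1.0
Q = Q.tocsr()

perm = reverse_cuthill_mckee(Q, symmetric_mode=True)   # the permutation i_P
QP = Q[perm, :][:, perm]                               # Q_P = P Q P^T

b = np.ones(n)
muP = spsolve(QP.tocsc(), b[perm])                     # solve in the permuted ordering
mu = np.empty(n)
mu[perm] = muP                                         # map the solution back
assert np.allclose(Q @ mu, b)
```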
Gaussian Markov Random Fields
Reordering schemes
Reordering to band-matrices
Reordering to band-matrices
[figure-only slides: sparsity patterns before and after the bandwidth reordering]
Gaussian Markov Random Fields
Reordering schemes
Reordering to band-matrices
Reordering to band-matrices
• Factorisation of Q_P required about 0.0018 seconds.
• Solving Qµ = b required about 0.0006 seconds.
• On a 1200 MHz laptop.
Gaussian Markov Random Fields
Reordering schemes
Reordering using the idea of nested dissection
More optimal reordering schemes
A "global" node ordered first (a neighbour of all the others):
Q =
× × × × × ×
× × · · · ·
× · × · · ·
× · · × · ·
× · · · × ·
× · · · · ×

L =
×
× ×
× √ ×
× √ √ ×
× √ √ √ ×
× √ √ √ √ ×

(√ marks fill-in: L becomes completely dense.)
Ordering the global node last instead:
Q =
× · · · · ×
· × · · · ×
· · × · · ×
· · · × · ×
· · · · × ×
× × × × × ×

L =
×
· ×
· · ×
· · · ×
· · · · ×
× × × × × ×

(no fill-in at all).
Gaussian Markov Random Fields
Reordering schemes
Reordering using the idea of nested dissection
Nested dissection reordering (I)
The idea generalises as follows.
• Select a (small) set of nodes whose removal divides the graph into two disconnected subgraphs of almost equal size.
• Order the chosen nodes after ordering all the nodes in both subgraphs.
• Apply this procedure recursively to the nodes in each subgraph.
Costs in the spatial case:
• Factorisation O(n^{3/2})
• Fill-in O(n log n)
• Optimal in the order sense.
Gaussian Markov Random Fields
Reordering schemes
Reordering using the idea of nested dissection
Nested dissection reordering (II)
Gaussian Markov Random Fields
What to do with sparse matrices?
Statisticians and numerical methods for sparse matrices
Gupta (2002) summarises his findings on recent advances in sparse linear solvers:
". . . recent sparse solvers have significantly improved the state of the art of the direct solution of general sparse systems . . . recent years have seen some remarkable advances in the general sparse direct-solver algorithms and software."
• Good news:
• we can just use their software.
Gaussian Markov Random Fields
A numerical case-study
Introduction
A numerical case-study of typical GMRFs
Divide GMRF models into three categories.
1. GMRF models in time or on a line: auto-regressive models and models for smooth functions. Neighbours of xt are those xs with |s − t| ≤ p.
2. Spatial GMRF models: a regular lattice, or an irregular lattice induced by a tessellation or by regions of land. Neighbours of xi (spatial index i) are those j spatially "close" to i, where "close" is defined from the context.
3. Spatio-temporal GMRF models: often an extension of spatial models to also include dynamic changes.
Gaussian Markov Random Fields
A numerical case-study
Introduction
Also include some "global" nodes.
Let x be a GMRF with a common mean µ,
x | µ ∼ N(µ1, Q⁻¹),   (24)
and assume µ ∼ N(0, σ²).
Then (x, µ) is also a GMRF, where the node µ is a neighbour of all the xi's.
Gaussian Markov Random Fields
A numerical case-study
Introduction
Two different algorithms
1. The band-Cholesky factorisation (BCF) as in Algorithm 8. Here we use the LAPACK routines DPBTRF and DTBSV for the factorisation and the forward/back-substitution, respectively, and the Gibbs-Poole-Stockmeyer algorithm for bandwidth reduction.
2. The multifrontal supernodal Cholesky factorisation (MSCF) implementation in the library TAUCS, using the nested dissection reordering from the library METIS.
Both solvers are available in the GMRFLib library.
Gaussian Markov Random Fields
A numerical case-study
Introduction
The tasks we want to investigate are
1. factorising Q into LLᵀ, and
2. solving LLᵀµ = b.
Producing a random sample from the GMRF is half the cost of solving the linear system in step 2.
All tests reported here were conducted on a 1200 MHz laptop with 512 MB memory running Linux.
New machines are nearly 3-10 times as fast...
Gaussian Markov Random Fields
A numerical case-study
GMRF-models in time
GMRF models in time
Let Q be a band matrix with bandwidth p and dimension n. For such a problem, using BCF will be (theoretically) optimal, as the fill-in will be zero.

CPU-time (seconds):
            n = 10³            n = 10⁴            n = 10⁵
            p = 5    p = 25    p = 5    p = 25    p = 5    p = 25
Factorise   0.0005   0.0019    0.0044   0.0271    0.0443   0.2705
Solve       0.0000   0.0004    0.0031   0.0109    0.0509   0.1052

"Long and thin" is fast! The MSCF is less optimal for band matrices: fill-in and more complicated data structures.
Gaussian Markov Random Fields
A numerical case-study
GMRF-models in time
Add 10 global nodes:

CPU-time (seconds):
            n = 10³            n = 10⁴            n = 10⁵
            p = 5    p = 25    p = 5    p = 25    p = 5    p = 25
Factorise   0.0119   0.0335    0.1394   0.4085    1.6396   4.1679
Solve       0.0007   0.0035    0.0138   0.0306    0.1541   0.3078

• The fill-in is ≈ pn.
• The nested dissection ordering gives good results in all cases considered so far.
Gaussian Markov Random Fields
A numerical case-study
GMRF-models in space
Spatial GMRF models
Regular lattice
Gaussian Markov Random Fields
A numerical case-study
GMRF-models in space
Spatial GMRF models
Irregular lattice
Gaussian Markov Random Fields
A numerical case-study
GMRF-models in space
CPU-time (seconds):
                      n = 100²         n = 150²         n = 200²
CPU-time   Method     3×3     5×5      3×3     5×5      3×3      5×5
Factorise  BCF        0.51    1.02     2.60    4.93     13.30    38.12
           MSCF       0.17    0.62     0.55    1.92     1.91     4.90
Solve      BCF        0.03    0.05     0.10    0.16     0.24     0.43
           MSCF       0.01    0.04     0.04    0.11     0.08     0.21

• For the largest lattice the MSCF really outperforms the BCF.
• The reason is the O(n^{3/2}) cost for the MSCF compared to O(n²) for the BCF.
Gaussian Markov Random Fields
A numerical case-study
GMRF-models in time×space
Spatio-temporal GMRF models
Spatio-temporal GMRF models are often an extension of spatial GMRF models to account for time variation.
[figure: space×time graph]
Use this model and the graph of Germany, for T = 10 and T = 100.
Gaussian Markov Random Fields
A numerical case-study
GMRF-models in time×space
CPU-time (seconds; last two columns with 10 global nodes added):
                                 10 global nodes
CPU-time     T = 10   T = 100    T = 10   T = 100
Factorise    0.25     39.96      0.31     39.22
Solve        0.02     0.42       0.02     0.42

• The results show a quite heavy dependency on T.
• Spatio-temporal GMRFs are more demanding than spatial ones: O(n²) for an n^{1/3} × n^{1/3} × n^{1/3} cube.
Gaussian Markov Random Fields
Part IV
Intrinsic GMRFs
Gaussian Markov Random Fields
Outline I
Introduction
Definition of IGMRFs
  Improper GMRFs
IGMRFs of first order
  Forward differences
  On the line with regular locations
  On the line with irregular locations
  Why IGMRFs are useful in applications
  IGMRFs of first order on irregular lattices
  IGMRFs of first order on regular lattices
IGMRFs of second order
  RW2 model for regular locations
  The CRWk model for regular and irregular locations
  IGMRFs of general order on regular lattices
  IGMRFs of second order on regular lattices
Gaussian Markov Random Fields
Introduction
Intrinsic GMRFs (IGMRF)
• IGMRFs are improper, i.e., they have precision matrices not offull rank.
• Often used as prior distributions in various applications.
• Of particular importance are IGMRFs that are invariant to anytrend that is a polynomial of the locations of the nodes up toa specific order.
Gaussian Markov Random Fields
Definition of IGMRFs
Improper GMRFs
Improper GMRF
Definition
Let Q be an n × n SPSD matrix with rank n − k > 0. Then x = (x1, . . . , xn)ᵀ is an improper GMRF of rank n − k with parameters (µ, Q), if its density is
π(x) = (2π)^{−(n−k)/2} (|Q|*)^{1/2} exp( −(1/2)(x − µ)ᵀQ(x − µ) ).   (25)
Further, x is an improper GMRF wrt the labelled graph G = (V, E), where
Qij ≠ 0 ⇐⇒ {i, j} ∈ E for all i ≠ j.
| · |* denotes the generalised determinant: the product of the non-zero eigenvalues.
Gaussian Markov Random Fields
Definition of IGMRFs
Improper GMRFs
Markov properties
The Markov properties are interpreted as those obtained from the limit of a proper density.
Let the columns of Aᵀ span the null space of Q, and define
Q(γ) = Q + γAᵀA.   (26)
The Markov properties of Q(γ) tend to the corresponding ones of Q as γ → 0.
In particular,
E(xi | x−i) = µi − (1/Qii) ∑_{j∼i} Qij(xj − µj)
is interpreted in the limit γ → 0.
Gaussian Markov Random Fields
IGMRFs of first order
IGMRF of first order
An intrinsic GMRF of first order is an improper GMRF of rank n − 1 where Q1 = 0.
The condition Q1 = 0 means that
∑_j Qij = 0, i = 1, . . . , n.
Gaussian Markov Random Fields
IGMRFs of first order
Let µ = 0, so
E(xi | x−i) = −(1/Qii) ∑_{j: j∼i} Qij xj   (27)
with weights satisfying
−∑_{j: j∼i} Qij/Qii = 1.
• The conditional mean of xi is a weighted mean of its neighbours.
• There is no shrinkage towards an overall level.
• Many IGMRFs are constructed such that the deviation from the overall level is a smooth curve in time or a smooth surface in space.
Gaussian Markov Random Fields
IGMRFs of first order
Forward differences
Definition (Forward difference)
Define the first-order forward difference of a function f(·) as
∆f(z) = f(z + 1) − f(z).
Higher-order forward differences are defined recursively,
∆^k f(z) = ∆∆^{k−1} f(z),
so
∆²f(z) = f(z + 2) − 2f(z + 1) + f(z)   (28)
and in general, for k = 1, 2, . . .,
∆^k f(z) = (−1)^k ∑_{j=0}^{k} (−1)^j C(k, j) f(z + j),
where C(k, j) is the binomial coefficient.
Gaussian Markov Random Fields
IGMRFs of first order
Forward differences
For a vector z = (z1, z2, . . . , zn)ᵀ, ∆z has elements
∆zi = z_{i+1} − zi, i = 1, . . . , n − 1.
The forward difference of kth order is an approximation to the kth derivative of f(z), recalling that
f′(z) = lim_{h→0} (f(z + h) − f(z))/h.
Gaussian Markov Random Fields
IGMRFs of first order
On the line with regular locations
IGMRFs of first order on the line
The location of node i is i (think "time").
Assume independent increments:
∆xi ∼ N(0, κ⁻¹), independently for i = 1, . . . , n − 1,   (29)
so that
xj − xi ∼ N(0, (j − i)κ⁻¹) for i < j.   (30)
If the intersection between {i, . . . , j} and {k, . . . , l} is empty for i < j and k < l, then
Cov(xj − xi, xl − xk) = 0.   (31)
These properties coincide with those of a Wiener process.
Gaussian Markov Random Fields
IGMRFs of first order
On the line with regular locations
The density for x is
π(x | κ) ∝ κ^{(n−1)/2} exp( −(κ/2) ∑_{i=1}^{n−1} (∆xi)² )   (32)
= κ^{(n−1)/2} exp( −(κ/2) ∑_{i=1}^{n−1} (x_{i+1} − xi)² ),   (33)
or
π(x) ∝ κ^{(n−1)/2} exp( −(κ/2) xᵀRx ).   (34)
Gaussian Markov Random Fields
IGMRFs of first order
On the line with regular locations
The structure matrix:
R =
  1  −1
 −1   2  −1
     −1   2  −1
         ⋱   ⋱   ⋱
            −1   2  −1
                −1   2  −1
                    −1   1
   (35)
• The n − 1 independent increments ensure that the rank of Q is n − 1.
• We denote this model by RW1(κ), or RW1 for short.
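R can be built directly as DᵀD, with D the (n−1) × n first-order difference matrix (a Python/numpy sketch, not from the slides):

```python
import numpy as np

def rw1_structure(n):
    """Structure matrix R = D^T D of the RW1 model, D being the first-difference matrix."""
    D = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    return D.T @ D

R = rw1_structure(5)
assert np.allclose(R @ np.ones(5), 0.0)          # invariant to adding a constant: R 1 = 0
assert np.linalg.matrix_rank(R) == 4             # rank n - 1
```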
Gaussian Markov Random Fields
IGMRFs of first order
On the line with regular locations
Full conditionals
xi | x−i, κ ∼ N( (x_{i−1} + x_{i+1})/2, 1/(2κ) ), 1 < i < n.   (36)
Alternative interpretation:
• Fit a first-order polynomial
p(j) = β0 + β1 j   (37)
locally through the points (i − 1, x_{i−1}) and (i + 1, x_{i+1}).
• The conditional mean is then p(i).
Gaussian Markov Random Fields
IGMRFs of first order
On the line with regular locations
Example
[figure]
Samples with n = 99 and κ = 1, obtained by conditioning on the constraint ∑ xi = 0.
Gaussian Markov Random Fields
IGMRFs of first order
On the line with regular locations
Example
Marginal variances Var(xi ) for i = 1, . . . , n.
Gaussian Markov Random Fields
IGMRFs of first order
On the line with regular locations
Example
Corr(xn/2, xi ) for i = 1, . . . , n
Gaussian Markov Random Fields
IGMRFs of first order
On the line with irregular locations
The first order RW for irregular locations
The location of xi is si, but si+1 − si is not constant. Assume
s1 < s2 < · · · < sn
and let
δi := si+1 − si.   (38)
Gaussian Markov Random Fields
IGMRFs of first order
On the line with irregular locations
Wiener process
Consider xi as the realisation of a Brownian motion in continuous time, i.e. a Wiener process W(t), observed at time si.
Definition (Wiener process)
• A Wiener process with precision κ is a continuous-time stochastic process W(t), t ≥ 0, with W(0) = 0, such that the increments W(t) − W(s) are Gaussian with mean 0 and variance (t − s)/κ for any 0 ≤ s < t.
• Furthermore, increments over non-overlapping time intervals are independent.
• For κ = 1, this process is called a standard Wiener process.
Gaussian Markov Random Fields
IGMRFs of first order
On the line with irregular locations
Brownian bridge
The full conditional is in this case known as the Brownian bridge:
E(xi | x−i, κ) = (δi x_{i−1} + δ_{i−1} x_{i+1}) / (δ_{i−1} + δi)   (39)
and
Prec(xi | x−i, κ) = κ (1/δ_{i−1} + 1/δi).   (40)
κ is a precision parameter.
Gaussian Markov Random Fields
IGMRFs of first order
On the line with irregular locations
The precision matrix now has, for 1 < i < n,
Qij = κ (1/δ_{i−1} + 1/δi) for j = i, Qij = −κ/δi for j = i + 1, and Qij = 0 otherwise.   (41)
A proper correction at the boundary gives the remaining diagonal terms, Q11 = κ/δ1 and Qnn = κ/δ_{n−1}.
Gaussian Markov Random Fields
IGMRFs of first order
On the line with irregular locations
The joint density is
π(x | κ) ∝ κ^{(n−1)/2} exp( −(κ/2) ∑_{i=1}^{n−1} (x_{i+1} − xi)²/δi ),   (42)
and it is invariant to the addition of a constant.
Gaussian Markov Random Fields
IGMRFs of first order
On the line with irregular locations
• The interpretation of the RW1 model as a discretely observed Wiener process justifies the corrections needed.
• The underlying model is the same; it is only observed differently.
[figure]
Gaussian Markov Random Fields
IGMRFs of first order
Why IGMRFs are useful in applications
When the mean is only locally constant
Alternative IGMRF
π(x) ∝ κ^{(n−1)/2} exp( −(κ/2) ∑_{i=1}^{n} (xi − x̄)² )   (43)
where x̄ is the empirical mean of x.
Gaussian Markov Random Fields
IGMRFs of first order
Why IGMRFs are useful in applications
Assume n is even and
xi = 0 for 1 ≤ i ≤ n/2, and xi = 1 for n/2 < i ≤ n.   (44)
Thus x is locally constant with two levels.
Evaluating the density at this configuration under the RW1 model and under the alternative (43), we obtain
κ^{(n−1)/2} exp(−κ/2) and κ^{(n−1)/2} exp(−nκ/8),   (45)
respectively. The log-ratio of the two densities is then of order O(n).
Gaussian Markov Random Fields
IGMRFs of first order
Why IGMRFs are useful in applications
• The RW1 model only penalises the local deviation from aconstant level.
• The alternative penalises the global deviation from a constantlevel.
This local behaviour is advantageous in applications if the mean level of x is approximately or locally constant.
A similar argument also applies to polynomial IGMRFs of higher order, constructed using forward differences of order k as independent Gaussian increments.
Gaussian Markov Random Fields
IGMRFs of first order
IGMRFs of first order on irregular lattices
First order IGMRFs on irregular lattices
Consider the map of the 544 regions in Germany. Two regions are neighbours if they share a common border.
Gaussian Markov Random Fields
IGMRFs of first order
IGMRFs of first order on irregular lattices
Between neighbouring regions i and j we define an "independent" Gaussian increment
xi − xj ∼ N(0, κ⁻¹).   (46)
Assuming independent increments yields
π(x) ∝ κ^{(n−1)/2} exp( −(κ/2) ∑_{i∼j} (xi − xj)² ).   (47)
Here "i ∼ j" denotes the set of all unordered pairs of neighbours.
The number of increments |i ∼ j| is larger than n, but the rank of the corresponding precision matrix is still n − 1.
Gaussian Markov Random Fields
IGMRFs of first order
IGMRFs of first order on irregular lattices
There are hidden constraints in the increments, due to the more complicated geometry of a lattice compared to the line.
Example
Let n = 3 where all nodes are neighbours. Then x1 − x2 = ε1, x2 − x3 = ε2, and x3 − x1 = ε3, where ε1, ε2 and ε3 are the increments.
This implies that
ε1 + ε2 + ε3 = 0,
which is the "hidden" linear constraint.
Gaussian Markov Random Fields
IGMRFs of first order
IGMRFs of first order on irregular lattices
Let ni denote the number of neighbours of region i. The precision matrix Q is
Qij = κ ni for i = j, Qij = −κ for i ∼ j, and Qij = 0 otherwise,   (48)
with full conditionals
xi | x−i, κ ∼ N( (1/ni) ∑_{j: j∼i} xj, 1/(ni κ) ).   (49)
Gaussian Markov Random Fields
IGMRFs of first order
IGMRFs of first order on irregular lattices
Example
Gaussian Markov Random Fields
IGMRFs of first order
IGMRFs of first order on irregular lattices
• In general there is no longer an underlying continuousstochastic process that we can relate to this density.
• If we change the spatial resolution or split a region into twonew ones, we change the model.
Gaussian Markov Random Fields
IGMRFs of first order
IGMRFs of first order on irregular lattices
Weighted variants
Incorporate symmetric weights wij for each pair of adjacent nodes i and j; for example, wij = 1/d(i, j).
Assuming independent increments
xi − xj ∼ N(0, 1/(wij κ)),   (50)
we obtain
π(x) ∝ κ^{(n−1)/2} exp( −(κ/2) ∑_{i∼j} wij (xi − xj)² ).   (51)
Gaussian Markov Random Fields
IGMRFs of first order
IGMRFs of first order on irregular lattices
The precision matrix is
Qij = κ ∑_{k: k∼i} wik for i = j, Qij = −κ wij for i ∼ j, and Qij = 0 otherwise,   (52)
with conditional mean and precision
E(xi | x−i) = ( ∑_{j: j∼i} wij xj ) / ( ∑_{j: j∼i} wij ) and Prec(xi | x−i) = κ ∑_{j: j∼i} wij,   (53)
respectively.
Gaussian Markov Random Fields
IGMRFs of first order
IGMRFs of first order on regular lattices
First order IGMRFs on regular lattices
For a lattice I_N with n = n1 n2 nodes, let i = (i1, i2) denote the node in the i1th row and i2th column.
Use the nearest four sites of i as its neighbours:
(i1 + 1, i2), (i1 − 1, i2), (i1, i2 + 1), (i1, i2 − 1).
The precision matrix is
Qij = κ ni for i = j, Qij = −κ for i ∼ j, and Qij = 0 otherwise,   (54)
and the full conditionals for xi are
xi | x−i, κ ∼ N( (1/ni) ∑_{j: j∼i} xj, 1/(ni κ) ).   (55)
Gaussian Markov Random Fields
IGMRFs of first order
IGMRFs of first order on regular lattices
Limiting behaviour
This model has an important property:
the process converges to the de Wijs process, a Gaussian process with variogram log(distance).
This is important because
• there is a limiting process, and
• the de Wijs process has a dense precision matrix, whereas the IGMRF is very sparse!
Gaussian Markov Random Fields
IGMRFs of second order
RW2 model for regular locations
The RW2 model for regular locations
Let si = i for i = 1, . . . , n, with a constant distance between consecutive nodes.
Use the second-order increments
∆²xi ∼ N(0, κ⁻¹), independently,   (56)
for i = 1, . . . , n − 2, to define the joint density of x:
π(x) ∝ κ^{(n−2)/2} exp( −(κ/2) ∑_{i=1}^{n−2} (xi − 2x_{i+1} + x_{i+2})² )   (57)
= κ^{(n−2)/2} exp( −(κ/2) xᵀRx ).   (58)
Gaussian Markov Random Fields
IGMRFs of second order
RW2 model for regular locations
R =
  1 −2  1
 −2  5 −4  1
  1 −4  6 −4  1
     1 −4  6 −4  1
        ⋱  ⋱  ⋱  ⋱  ⋱
           1 −4  6 −4  1
              1 −4  6 −4  1
                 1 −4  5 −2
                    1 −2  1
   (59)
• One can verify directly that QS1 = 0 and that the rank of Q is n − 2.
• An IGMRF of second order is invariant to adding a line to x.
• Known as the second-order random walk model, denoted RW2(κ) or simply RW2.
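Analogously to RW1, R = DᵀD with D the (n−2) × n second-difference matrix reproduces (59); a sketch that also checks the invariances:

```python
import numpy as np

def rw2_structure(n):
    """Structure matrix R = D^T D of the RW2 model, D being the second-difference matrix."""
    D = np.zeros((n - 2, n))
    idx = np.arange(n - 2)
    D[idx, idx] = 1.0
    D[idx, idx + 1] = -2.0
    D[idx, idx + 2] = 1.0
    return D.T @ D

R = rw2_structure(7)
line = np.arange(7, dtype=float)
assert np.allclose(R @ np.ones(7), 0.0)      # invariant to adding a constant ...
assert np.allclose(R @ line, 0.0)            # ... and to adding a line, so the rank is n - 2
```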
Gaussian Markov Random Fields
IGMRFs of second order
RW2 model for regular locations
Remarks
• This is the RW2 model defined and used in the literature.
• We cannot extend it consistently to the case where thelocations are irregular.
• Similar problems occur if we increase the resolution from n to2n locations, say.
• This is in contrast to the RW1 model.
• There exists an alternative (somewhat more involved)formulation with the desired continuous time interpretation.
Gaussian Markov Random Fields
IGMRFs of second order
RW2 model for regular locations
The conditional mean and precision are
E(xi | x−i, κ) = (4/6)(x_{i+1} + x_{i−1}) − (1/6)(x_{i+2} + x_{i−2}),   (60)
Prec(xi | x−i, κ) = 6κ,   (61)
respectively, for 2 < i < n − 2.
Gaussian Markov Random Fields
IGMRFs of second order
RW2 model for regular locations
Consider the second-order polynomial
p(j) = β0 + β1 j + (1/2) β2 j²   (62)
and compute the coefficients by a local least-squares fit to the points
(i − 2, x_{i−2}), (i − 1, x_{i−1}), (i + 1, x_{i+1}), (i + 2, x_{i+2}).   (63)
It turns out that
E(xi | x−i, κ) = p(i).
Gaussian Markov Random Fields
IGMRFs of second order
RW2 model for regular locations
Samples with n = 99 and κ = 1 by conditioning on the constraints.
Gaussian Markov Random Fields
IGMRFs of second order
RW2 model for regular locations
Marginal variances Var(xi ) for i = 1, . . . , n.
Gaussian Markov Random Fields
IGMRFs of second order
RW2 model for regular locations
Corr(xn/2, xi ) for i = 1, . . . , n
Gaussian Markov Random Fields
The CRWk model for regular and irregular locations
Random walk models: irregular locations
• Locations s1 < s2 < . . . < sn.
• Discretely observed (integrated) Wiener process
• Augment with the velocities
• Valid for any order
• Connection to spline-models
Gaussian Markov Random Fields
The CRWk model for regular and irregular locations
Random walk models: irregular locations
The precision matrix has the block-tridiagonal structure
  A1    B1
  B1ᵀ   A2 + C1   B2
        B2ᵀ       A3 + C2   B3
                  ⋱         ⋱          B_{n−1}
                            B_{n−1}ᵀ   C_{n−1}
Gaussian Markov Random Fields
The CRWk model for regular and irregular locations
Random walk models: irregular locations
The blocks are
Ai = ( 12/δi³   6/δi²
        6/δi²   4/δi ),
Bi = ( −12/δi³   6/δi²
        −6/δi²   2/δi ),
Ci = ( 12/δi³   −6/δi²
       −6/δi²    4/δi ),
where δi = si+1 − si.
Gaussian Markov Random Fields
IGMRFs of general order on regular lattices
IGMRFs of higher order on regular lattices
The precision matrix is orthogonal to a polynomial design matrix of a certain degree.
Definition (IGMRF of order k in dimension d)
An IGMRF of order k in dimension d is an improper GMRF of rank n − m_{k−1,d} where Q S_{k−1,d} = 0.
Gaussian Markov Random Fields
IGMRFs of second order on regular lattices
A second order IGMRF in two dimensions
Consider a regular lattice I_N in d = 2 dimensions, where s_{i1} = i1 and s_{i2} = i2.
Choose the independent increments
( x_{(i1+1,i2)} + x_{(i1−1,i2)} + x_{(i1,i2+1)} + x_{(i1,i2−1)} ) − 4 x_{(i1,i2)}.   (64)
The motivation for this choice is that (64) equals
( ∆²_{i1} + ∆²_{i2} ) x_{(i1−1,i2−1)},   (65)
which is invariant to adding a first-order polynomial
p_{1,2}(i1, i2) = β00 + β10 i1 + β01 i2.
Gaussian Markov Random Fields
IGMRFs of second order on regular lattices
The precision matrix (apart from boundary effects) should have non-zero elements given by
−( ∆²_{i1} + ∆²_{i2} )² = −( ∆⁴_{i1} + 2 ∆²_{i1}∆²_{i2} + ∆⁴_{i2} ),   (66)
which is a negative difference approximation to the biharmonic differential operator
( ∂²/∂x² + ∂²/∂y² )² = ∂⁴/∂x⁴ + 2 ∂⁴/∂x²∂y² + ∂⁴/∂y⁴.   (67)
The fundamental solution of the biharmonic equation
( ∂⁴/∂x⁴ + 2 ∂⁴/∂x²∂y² + ∂⁴/∂y⁴ ) φ(x, y) = 0   (68)
is the thin plate spline.
Gaussian Markov Random Fields
IGMRFs of second order on regular lattices
Full conditionals in the interior
In the interior, the conditional mean weights the four nearest neighbours, the four diagonal neighbours, and the four neighbours two steps away along the axes:
E(xi | x−i) = (1/20) ( 8 × [sum of the four nearest neighbours] − 2 × [sum of the four diagonal neighbours] − 1 × [sum of the four axial neighbours at distance two] )   (69)
and
Prec(xi | x−i) = 20κ.   (70)
Gaussian Markov Random Fields
IGMRFs of second order on regular lattices
Alternative IGMRFs in two dimensions
Using only the five-point stencil (the centre and its four nearest neighbours)   (71)
to obtain a difference approximation to
∂²/∂x² + ∂²/∂y²   (72)
is not optimal.
The discretisation error in the 45-degree directions differs from that in the main directions; hence we should expect a "directional" effect.
Gaussian Markov Random Fields
IGMRFs of second order on regular lattices
Use an isotropic approximation (e.g. the "Mehrstellen" stencil):
−(10/3) × [centre] + (2/3) × [sum of the four nearest neighbours] + (1/6) × [sum of the four diagonal neighbours]   (73)
The resulting full conditionals in the interior are
E(xi | x−i) = (1/468) ( 144 × [four nearest neighbours] − 18 × [four diagonal neighbours] + 8 × [four axial neighbours at distance two] − 8 × [eight "knight's-move" neighbours] − 1 × [four diagonal neighbours at distance two] )   (74)
and
Prec(xi | x−i) = 13κ.   (75)
Gaussian Markov Random Fields
IGMRFs of second order on regular lattices
[figure]
Samples.
Gaussian Markov Random Fields
IGMRFs of second order on regular lattices
[figure]
Samples.
Gaussian Markov Random Fields
Part V
Hierarchical GMRF models
Gaussian Markov Random Fields
Outline I
Hierarchical GMRF models
  A simple example
  Classes of hierarchical GMRF models
A brief introduction to MCMC
  Blocking
  Rate of convergence
  Example
  One-block algorithm
Block algorithms for hierarchical GMRF models
  General setup
  The one-block algorithm
  The sub-block algorithm
  GMRF approximations
Merging GMRFs
  Introduction
Gaussian Markov Random Fields
Outline II
Why is it so?
Gaussian Markov Random Fields
Hierarchical GMRF models
Hierarchical GMRF models
Characterised through several stages of observables and parameters. A typical scenario is as follows.
Stage 1 Formulate a distributional assumption for the observables, dependent on latent parameters.
• For a time series of binary observations y we may assume yi ∼ B(pi), i = 1, . . . , n.
• We assume the observations to be conditionally independent.
Gaussian Markov Random Fields
Hierarchical GMRF models
Hierarchical GMRF models
Characterised through several stages of observables and parameters. A typical scenario is as follows.
Stage 2 Assign a prior model, i.e. a GMRF, to the unknown parameters, here the pi.
• Choose an autoregressive model for the logit-transformed probabilities xi = logit(pi).
Gaussian Markov Random Fields
Hierarchical GMRF models
Hierarchical GMRF models
Characterised through several stages of observables and parameters. A typical scenario is as follows.
Stage 3 Assign priors to the unknown parameters (or hyperparameters) of the GMRF:
• the precision parameter κ
• the "strength" of the dependency.
Add further stages if needed.
Gaussian Markov Random Fields
Hierarchical GMRF models
A simple example
Tokyo rainfall data
Stage 1 Binomial data:
yi ∼ Binomial(2, p(xi)) or Binomial(1, p(xi))
Gaussian Markov Random Fields
Hierarchical GMRF models
A simple example
Tokyo rainfall data
Stage 2 Assume a smooth latent x:
x ∼ RW2(κ), logit(pi) = xi
Gaussian Markov Random Fields
Hierarchical GMRF models
A simple example
Tokyo rainfall data
Stage 3 Gamma(α, β)-prior on κ
Gaussian Markov Random Fields
Hierarchical GMRF models
Classes of hierarchical GMRF models
Classes of hierarchical GMRF models
• Normal data
  • Block-MCMC
• Non-normal data that allows for a normal-mixture representation
  • Student-t distribution
  • Logistic and Laplace (binary regression)
  • Block-MCMC with auxiliary variables
• Non-normal data
  • Poisson
  • and others...
  • Block-MCMC with GMRF approximations
Gaussian Markov Random Fields
A brief introduction to MCMC
MCMC
Construct a Markov chain
θ^(1), θ^(2), . . . , θ^(k), . . .
that converges (under some conditions) to π(θ).
1. Start with θ^(0) where π(θ^(0)) > 0. Set k = 1.
2. Generate a proposal θ* from some proposal kernel q(θ* | θ^(k−1)). Set θ^(k) = θ* with probability
α = min{ 1, [π(θ*) q(θ^(k−1) | θ*)] / [π(θ^(k−1)) q(θ* | θ^(k−1))] };
otherwise set θ^(k) = θ^(k−1).
3. Set k = k + 1 and go back to 2.
Gaussian Markov Random Fields
A brief introduction to MCMC
Depending on the specific choice of the proposal kernel q(θ*|θ), very different algorithms result.
• When q(θ*|θ) does not depend on the current value of θ, the proposal is called an independence proposal.
• When q(θ*|θ) = q(θ|θ*) we have a Metropolis proposal. These include so-called random-walk proposals.
The rate of convergence towards π(θ) and the degree of dependence between successive samples of the Markov chain (mixing) will depend on the chosen proposal.
Gaussian Markov Random Fields
A brief introduction to MCMC
Single-site algorithms
• Most MCMC algorithms have been based on updating each scalar component
θi, i = 1, . . . , p,
of θ conditional on θ−i.
• Apply the MH-algorithm in turn to every component θi of θ, with arbitrary proposal kernels
qi(θ∗i | θi, θ−i).
• As long as we update each component of θ, this algorithm will converge to the target distribution π(θ).
Gaussian Markov Random Fields
A brief introduction to MCMC
However...
• Such single-site updating can be disadvantageous if parameters are highly dependent in the posterior distribution π(θ).
• The problem is that the Markov chain may move around very slowly in its target (posterior) distribution.
• A general approach to circumvent this problem is to update parameters in larger blocks, θj: a vector of components of θ.
• The choice of blocks is often controlled by what is possible to do in practice.
• Ideally, we should choose a small number of blocks with large dependence within the blocks but less dependence between blocks.
Gaussian Markov Random Fields
Blocking
Blocking
Assume the following
• θ = (κ, x), where
  • κ is the precision and
  • x is a GMRF.
Two-block approach
• sample x ∼ π(x|κ), and
• sample κ ∼ π(κ|x)
There is often strong dependence between κ and x in the posterior; this is resolved using a joint update of (x, κ).
Why this modification is important and why it works is discussed next.
Gaussian Markov Random Fields
Blocking
Rate of convergence
Rate of convergence
Let θ(1), θ(2), . . . denote a Markov chain with target distribution π(θ) and initial value θ(0) ∼ π(θ).
Rate of convergence ρ: how quickly E(h(θ(t)) | θ(0)) approaches the stationary value E(h(θ)), for all square π-integrable functions h(·).
Let ρ be the minimum number such that, for all h(·) and for all r > ρ,

lim_{k→∞} E[ ( E( h(θ(k)) | θ(0) ) − E( h(θ) ) )² r^(−2k) ] = 0. (76)
Gaussian Markov Random Fields
Blocking
Example
Example
Let x be a first-order autoregressive process,

xt − µ = γ(xt−1 − µ) + νt, t = 2, . . . , n, (77)

where |γ| < 1, the νt are iid normal with zero mean and variance σ², and x1 ∼ N(µ, σ²/(1 − γ²)).

Let γ, σ², and µ be fixed parameters.
At each iteration, a single-site Gibbs sampler samples xt from the full conditional π(xt | x−t) for t = 1, . . . , n:

xt | x−t ∼ N( µ + γ(x2 − µ), σ² )                                   for t = 1,
xt | x−t ∼ N( µ + [γ/(1 + γ²)](xt−1 + xt+1 − 2µ), σ²/(1 + γ²) )     for t = 2, . . . , n − 1,
xt | x−t ∼ N( µ + γ(xn−1 − µ), σ² )                                 for t = n.
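As an illustration (ours, not from the slides), this single-site Gibbs sampler is a few lines of Python; running it with γ close to one exhibits the slow mixing quantified on the next slide:

```python
import numpy as np

def single_site_gibbs_ar1(n=100, gamma=0.95, sigma2=1.0, mu=0.0,
                          n_iter=2000, rng=None):
    """Single-site Gibbs sampler for the AR(1) model, cycling through
    the full conditionals pi(x_t | x_{-t}) given above."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.full(n, mu)
    sd_end = np.sqrt(sigma2)                    # t = 1 and t = n
    sd_mid = np.sqrt(sigma2 / (1 + gamma**2))   # t = 2, ..., n - 1
    trace = np.empty(n_iter)
    for k in range(n_iter):
        x[0] = mu + gamma * (x[1] - mu) + sd_end * rng.standard_normal()
        for t in range(1, n - 1):
            m = mu + gamma / (1 + gamma**2) * (x[t - 1] + x[t + 1] - 2 * mu)
            x[t] = m + sd_mid * rng.standard_normal()
        x[-1] = mu + gamma * (x[-2] - mu) + sd_end * rng.standard_normal()
        trace[k] = x[n // 2]   # monitor one component; its autocorrelation
    return trace               # decays at the slow geometric rate below
```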
Gaussian Markov Random Fields
Blocking
Example
• For large n, the rate of convergence is

ρ = 4γ²/(1 + γ²)². (78)

• For |γ| close to one the rate of convergence can be slow: if γ = 1 − δ for small δ > 0, then ρ = 1 − δ² + O(δ³).
To circumvent this problem, we may update x in one block. This is possible as x is a GMRF.
This yields immediate convergence.
Gaussian Markov Random Fields
Blocking
Example
Relax the assumptions of fixed hyperparameters.
Consider a hierarchical formulation where the mean of xt, µ, is unknown and assigned a standard normal prior,

µ ∼ N(0, 1) and x | µ ∼ N(µ1, Q⁻¹),

where Q is the precision matrix of the GMRF x | µ.
The joint density of (µ, x) is normal.
We have two natural blocks, µ and x.
Gaussian Markov Random Fields
Blocking
Example
A two-block Gibbs sampler updates µ and x with samples from their full conditionals,

µ(k) | x(k−1) ∼ N( 1ᵀQx(k−1) / (1 + 1ᵀQ1), (1 + 1ᵀQ1)⁻¹ )
x(k) | µ(k) ∼ N( µ(k)1, Q⁻¹ ).     (79)
The presence of the hyperparameter µ will slow down the convergence compared to the case when µ is fixed.
Due to the nice structure of (79) we can characterise explicitly the marginal chain of µ(k).
Gaussian Markov Random Fields
Blocking
Example
Theorem
The marginal chain µ(1), µ(2), . . . from the two-block Gibbs sampler defined in (79), started in equilibrium, is a first-order autoregressive process

µ(k) = φ µ(k−1) + εk,

where

φ = 1ᵀQ1 / (1 + 1ᵀQ1)

and the εk are iid N(0, 1 − φ²).
Gaussian Markov Random Fields
Blocking
Example
Proof. It follows directly that the marginal chain µ(1), µ(2), . . . is a first-order autoregressive process.
The coefficient φ is found by computing the covariance at lag 1,

Cov(µ(k), µ(k+1)) = E( µ(k) µ(k+1) )
                  = E( µ(k) E( µ(k+1) | µ(k) ) )
                  = E( µ(k) E( E( µ(k+1) | x(k) ) | µ(k) ) )
                  = E( µ(k) E( 1ᵀQx(k) / (1 + 1ᵀQ1) | µ(k) ) )
                  = [ 1ᵀQ1 / (1 + 1ᵀQ1) ] Var(µ(k)),

which is known to be φ times the variance Var(µ(k)) for a first-order autoregressive process.
Gaussian Markov Random Fields
Blocking
Example
For our model
φ = [n(1 − γ)²/σ²] / [1 + n(1 − γ)²/σ²] = 1 − (Var(xt)/n) · (1 − γ²)/(1 − γ)² + O(1/n²).
When n is large, φ is close to 1 and the chain will both mix andconverge slowly even though we use a two-block Gibbs sampler.
Note that
• increasing variance of xt improves the convergence.
However, φ→ 1 as n→∞, which is bad.
Gaussian Markov Random Fields
Blocking
Example
What is going on?
[Figure: (a) trace of µ(k); (b) the pairs (µ(k), 1ᵀQx(k)), with µ(k) on the horizontal axis.]
Gaussian Markov Random Fields
Blocking
One-block algorithm
One-block algorithm
So far: blocking mainly improves mixing within the block. If there is strong dependence between blocks, the MCMC algorithm may still suffer from slow convergence.
Update (µ, x) jointly by delaying the accept/reject step until x is updated as well.
µ∗ ∼ q(µ∗ | µ(k−1))
x∗ | µ∗ ∼ N(µ∗1, Q⁻¹)     (80)

then accept/reject (µ∗, x∗) jointly.

Here, q(µ∗ | µ(k−1)) can be a simple random-walk proposal or some other suitable proposal distribution.
Gaussian Markov Random Fields
Blocking
One-block algorithm
What is going on?
[Figure: (a) trace of µ(k); (b) the pairs (µ(k), 1ᵀQx(k)), with µ(k) on the horizontal axis.]
Gaussian Markov Random Fields
Blocking
One-block algorithm
With a symmetric µ-proposal,
α = min{ 1, exp( −½ ((µ∗)² − (µ(k−1))²) ) }. (81)

• Only the marginal density of µ is needed in (81): we effectively integrate x out of the target density.
• The minor modification to delay the accept/reject step until x is updated as well can give a large improvement.
• In this case: a random walk on a one-dimensional density.
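A minimal sketch of one such joint update in Python (our illustration; it assumes the Cholesky factor L of Q, with Q = LLᵀ, is available, and uses the simplified acceptance probability (81)):

```python
import numpy as np

def one_block_update(mu, x, L, step=0.5, rng=None):
    """One joint (mu, x) update as in (80): random-walk proposal for mu,
    then x* | mu* ~ N(mu* 1, Q^{-1}) via the Cholesky factor L of Q.
    With a symmetric mu-proposal the acceptance probability is (81)."""
    rng = np.random.default_rng() if rng is None else rng
    mu_star = mu + step * rng.standard_normal()
    # Solving L^T z = e with e ~ N(0, I) gives z ~ N(0, Q^{-1}).
    z = np.linalg.solve(L.T, rng.standard_normal(L.shape[0]))
    x_star = mu_star + z
    log_alpha = -0.5 * (mu_star**2 - mu**2)   # eq. (81)
    if np.log(rng.uniform()) < log_alpha:
        return mu_star, x_star
    return mu, x
```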
Gaussian Markov Random Fields
Block algorithms for hierarchical GMRF models
General setup
A more general setup
• Hyperparameters θ (low dimension)
• GMRF x | θ of size n
• Observe x with data y.
The posterior is
π(x,θ | y) ∝ π(θ) π(x | θ) π(y | x,θ).
Assume we are able to sample from π(x | θ, y), i.e., the full conditional of x is a GMRF.
Gaussian Markov Random Fields
Block algorithms for hierarchical GMRF models
The one-block algorithm
The one-block algorithm
The following proposal updates (θ, x) in one block:
θ∗ ∼ q(θ∗ | θ(k−1))
x∗ ∼ π(x | θ∗, y).(82)
The proposal (θ∗, x∗) is then accepted/rejected jointly.
We denote this as the one-block algorithm.
Gaussian Markov Random Fields
Block algorithms for hierarchical GMRF models
The one-block algorithm
Consider the θ-chain: we are in fact sampling from the posterior marginal π(θ | y) using the proposal

θ∗ ∼ q(θ∗ | θ(k−1)).

The dimension of θ is typically low (1-5, say).

The proposed algorithm should not experience any serious mixing problems.
Gaussian Markov Random Fields
Block algorithms for hierarchical GMRF models
The one-block algorithm
The one-block algorithm is not always feasible
The one-block algorithm is not always feasible, for the following reasons:

1. The full conditional of x can be a GMRF with a precision matrix that is not sparse. This prohibits a fast factorisation, hence a joint update is feasible but not computationally efficient.
2. The data can be non-normal, so the full conditional of x is not a GMRF and sampling x∗ using (82) is not possible (in general).
Gaussian Markov Random Fields
Block algorithms for hierarchical GMRF models
The sub-block algorithm
...not sparse precision matrix
These cases can often be approached using sub-blocks of (θ, x): the sub-block algorithm.

Assume a natural splitting exists for both θ and x into

(θa, xa), (θb, xb) and (θc, xc). (83)

The sets a, b, and c do not need to be disjoint.

One class of examples where such an approach is fruitful is (geo-)additive models, where a, b, and c represent three different covariate effects with their respective hyperparameters.
Gaussian Markov Random Fields
Block algorithms for hierarchical GMRF models
GMRF approximations
...non-Gaussian full conditionals
Auxiliary variables
• can help achieve Gaussian full conditionals, e.g. for
  • logit and probit regression models for binary and multi-categorical data, and
  • Student-tν distributed observations.

GMRF approximations
• approximate π(x | θ, y) using a second-order Taylor expansion.
• Prominent example: Poisson regression.
• Such approximations can be surprisingly accurate in many cases and can be interpreted as approximately integrating x out of π(x, θ | y).
Gaussian Markov Random Fields
Merging GMRFs
Introduction
Merging GMRFs using conditioning (I)
µ ∼ N (0, 1)
x− µ | µ ∼ AR(1)
z | x ∼ N (x, I)
y | z ∼ N (z, I)
• x∗ = (µ, x, z, y) is a GMRF
• x∗ | y is a GMRF
• Additional hyperparameters θ
Gaussian Markov Random Fields
Merging GMRFs
Introduction
Merging GMRFs using conditioning (II)
Gaussian Markov Random Fields
Merging GMRFs
Why is it so?
Merging GMRFs using conditioning (III)
If
x ∼ N (0,Q−1)
y | x ∼ N (x,K−1)
z | x, y ∼ N (y,H−1)
then
Prec(x, y, z) =
[ Q + K    −K        0  ]
[ −K       K + H    −H  ]
[ 0        −H        H  ]

which is sparse if Q, K and H are.
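As a sketch (ours, not from the slides), the joint precision can be assembled directly with scipy's sparse block tools; the toy Q, K, H below are assumptions for illustration:

```python
import numpy as np
import scipy.sparse as sp

def joint_precision(Q, K, H):
    """Assemble Prec(x, y, z) for x ~ N(0, Q^{-1}), y | x ~ N(x, K^{-1}),
    z | x, y ~ N(y, H^{-1}); the result is sparse if Q, K, H are."""
    return sp.bmat([[Q + K, -K,    None],
                    [-K,    K + H, -H  ],
                    [None,  -H,    H   ]], format="csr")

# Toy example: tridiagonal Q and diagonal K, H of the same size.
n = 5
Q = sp.diags([-np.ones(n - 1), 2 * np.ones(n), -np.ones(n - 1)], [-1, 0, 1])
P = joint_precision(Q, sp.identity(n), sp.identity(n))  # 3n x 3n, sparse
```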
Gaussian Markov Random Fields
Part VI
Applications
Gaussian Markov Random Fields
Outline I

Normal data
  Munich rental data
Normal-mixture response models
  Introduction
  Hierarchical-t formulations
  Binary regression
  Summary
Non-normal response models
  GMRF approximation
  Joint analysis of diseases
Gaussian Markov Random Fields
Normal data
Normal data: Munich rental guide
• Response variable yi: rent per m²
• location
• floor space
• year of construction
• various indicator variables, such as
  • no central heating
  • no bathroom
  • large balcony

German law: an increase in the rent must be based on the ‘average rent’ of a comparable flat.
Gaussian Markov Random Fields
Normal data
Munich rental data
Spatial regression model
yi ∼ N( µ + xS(i) + xC(i) + xL(i) + ziᵀβ, 1/κy )

xS(i)  floor space (continuous RW2)
xC(i)  year of construction (continuous RW2)
xL(i)  location (first-order IGMRF)
β      parameters for the indicator variables
• 380 spatial locations
• 2035 observations
• sum-to-zero constraint on the spatial IGMRF model and the year of construction covariate
Gaussian Markov Random Fields
Normal data
Munich rental data
Inference using MCMC
Sub-block approach
(xS , κS), (xC , κC ), (xL, κL), and (β, µ, κy).
Update one block at a time, using

κL∗ ∼ q(κL∗ | κL)
xL,∗ ∼ π(xL,∗ | the rest)

and then accept/reject (κL∗, xL,∗) jointly.
Gaussian Markov Random Fields
Normal data
Munich rental data
Log-RW proposal for precisions
It’s convenient to propose new precisions using

κnew = κold · f,   f ∼ π(f) ∝ 1 + 1/f,

for f in [1/F, F] and F > 1. With this choice,

q(κold | κnew) / q(κnew | κold) = 1

and

Var(κnew | κold) ∝ (κold)².
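A sketch of this proposal in Python (ours; the rejection step for drawing f is one of several possible implementations):

```python
import numpy as np

def sample_f(F, rng):
    """Draw f with density proportional to 1 + 1/f on [1/F, F] by
    rejection from a uniform envelope; the density peaks at f = 1/F."""
    while True:
        f = rng.uniform(1.0 / F, F)
        if rng.uniform() < (1.0 + 1.0 / f) / (1.0 + F):
            return f

def propose_precision(kappa_old, F=2.0, rng=None):
    """Multiplicative proposal kappa_new = kappa_old * f; the ratio
    q(kappa_old | kappa_new) / q(kappa_new | kappa_old) equals 1,
    so it drops out of the acceptance probability."""
    rng = np.random.default_rng() if rng is None else rng
    return kappa_old * sample_f(F, rng)
```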
Gaussian Markov Random Fields
Normal data
Munich rental data
Full conditional π(xL,∗|the rest)
Introduce ‘fake’ data ỹ,

ỹi = yi − ( µ + xS(i) + xC(i) + ziᵀβ ).

The full conditional of xL is

π(xL | the rest) ∝ exp( −(κL/2) Σ_{i∼j} (xLi − xLj)² ) × exp( −(κy/2) Σk (ỹk − xL(k))² ).

The data ỹ do not introduce extra dependence between the xLi’s, as ỹi acts as a noisy observation of xLi.
Gaussian Markov Random Fields
Normal data
Munich rental data
Denote by ni the number of neighbours of location i, and let L(i) be the set of observations taken at location i,

L(i) = {k : xL(k) = xLi},

with size |L(i)|.

The full conditional of xL is a GMRF with parameters (Q⁻¹b, Q), where

bi = κy Σ_{k∈L(i)} ỹk

and

Qij =
  κL ni + κy |L(i)|   if i = j,
  −κL                 if i ∼ j,
  0                   otherwise.
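A sketch (ours) of how (b, Q) could be assembled; the adjacency-list and observation-to-location encodings are assumptions made for illustration:

```python
import numpy as np
import scipy.sparse as sp

def location_full_conditional(adj, obs_loc, y_tilde, kappa_L, kappa_y):
    """Canonical parameters (b, Q) of pi(x^L | the rest).
    adj[i]     : list of neighbours of location i
    obs_loc[k] : location index of observation k
    y_tilde    : the 'fake' data defined above"""
    n = len(adj)
    counts = np.bincount(obs_loc, minlength=n)                      # |L(i)|
    b = kappa_y * np.bincount(obs_loc, weights=y_tilde, minlength=n)
    rows, cols, vals = [], [], []
    for i, nbrs in enumerate(adj):
        rows.append(i); cols.append(i)
        vals.append(kappa_L * len(nbrs) + kappa_y * counts[i])      # diagonal
        for j in nbrs:
            rows.append(i); cols.append(j); vals.append(-kappa_L)   # i ~ j
    Q = sp.csr_matrix((vals, (rows, cols)), shape=(n, n))
    return b, Q   # the full conditional is N(Q^{-1} b, Q^{-1})
```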
Gaussian Markov Random Fields
Normal data
Munich rental data
Results
Floor space
Gaussian Markov Random Fields
Normal data
Munich rental data
Results
Year of construction
Gaussian Markov Random Fields
Normal data
Munich rental data
Results
Location
Gaussian Markov Random Fields
Normal-mixture response models
Introduction
Normal-mixture representation
Theorem (Kelker, 1971)
If x has density f(x) symmetric around 0, then there exist independent random variables z and v, with z standard normal, such that x = z/v iff the derivatives of f(x) satisfy

(−d/dy)^k f(√y) ≥ 0

for y > 0 and for k = 1, 2, . . ..
• Student-t
• Logistic and Laplace
Gaussian Markov Random Fields
Normal-mixture response models
Introduction
The corresponding mixing distributions for the precision parameter λ generate these distributions as scale mixtures of normals:

Distribution of x   Mixing distribution of λ
Student-tν          G(ν/2, ν/2)
Logistic            1/(2K)², where K is Kolmogorov-Smirnov distributed
Laplace             1/(2E), where E is exponentially distributed
Gaussian Markov Random Fields
Normal-mixture response models
Hierarchical-t formulations
RW1 with tν-increments

Replace the assumption of normally distributed increments by a Student-tν distribution to allow for larger jumps in the sequence x.

Introduce n − 1 independent G(ν/2, ν/2) scale-mixture variables λi:

Δxi | λi ∼ N(0, (κλi)⁻¹), independently for i = 1, . . . , n − 1.
Gaussian Markov Random Fields
Normal-mixture response models
Hierarchical-t formulations
Observe data yi ∼ N(xi, κy⁻¹) for i = 1, . . . , n.

The posterior density for (x, λ) is

π(x, λ | y) ∝ π(x | λ) π(λ) π(y | x).

Note that
• x | (y, λ) is now a GMRF, while
• λ1, . . . , λn−1 | (x, y) are conditionally independent and gamma distributed with parameters (ν + 1)/2 and (ν + κ(Δxi)²)/2.
Use the sub-block algorithm with blocks
(θ, x), λ
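The λ-block of this sub-block algorithm is a one-line Gibbs update; a sketch (ours):

```python
import numpy as np

def update_lambda(x, kappa, nu, rng=None):
    """Gibbs update: given x, the lambda_i are independent
    Gamma((nu + 1)/2, (nu + kappa * (dx_i)^2)/2) in shape/rate form."""
    rng = np.random.default_rng() if rng is None else rng
    dx = np.diff(x)                        # increments, length n - 1
    shape = 0.5 * (nu + 1.0)
    rate = 0.5 * (nu + kappa * dx**2)
    return rng.gamma(shape, 1.0 / rate)    # numpy parametrises by scale
```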
Gaussian Markov Random Fields
Normal-mixture response models
Binary regression
Example: Binary regression
GMRF x and Bernoulli data
yi ∼ B(g⁻¹(xi))

g(p) =
  log(p/(1 − p))   logit link
  Φ⁻¹(p)           probit link

Equivalent representation using auxiliary variables w (for the probit link):

εi iid∼ N(0, 1)
wi = xi + εi
yi = 1 if wi > 0, and 0 otherwise.
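For the probit link, the auxiliary variables have truncated-normal full conditionals; a sketch of the w-update using scipy (our illustration):

```python
import numpy as np
from scipy.stats import truncnorm

def update_w(x, y, rng=None):
    """Gibbs update of w in the probit model: w_i | x_i, y_i is N(x_i, 1)
    truncated to (0, inf) if y_i = 1 and to (-inf, 0] if y_i = 0."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    lo = np.where(y == 1, 0.0, -np.inf)
    hi = np.where(y == 1, np.inf, 0.0)
    # truncnorm takes the bounds standardised by loc and scale
    return truncnorm.rvs(lo - x, hi - x, loc=x, scale=1.0, random_state=rng)
```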
Gaussian Markov Random Fields
Normal-mixture response models
Binary regression
MCMC Inference
• Full conditional for the GMRF prior is a GMRF
• Full conditional for the auxiliary variables
π(w | . . .) = ∏i π(wi | . . .)
Use the sub-block algorithm with blocks
(θ, x), w
Gaussian Markov Random Fields
Normal-mixture response models
Binary regression
Simple example: Tokyo rainfall data
Stage 1 Binomial data
yi ∼ Binomial(2, p(xi)) for most days, and
yi ∼ Binomial(1, p(xi)) for 29 February.
Gaussian Markov Random Fields
Normal-mixture response models
Binary regression
Simple example: Tokyo rainfall data
Stage 2 Assume a smooth latent x,
x ∼ RW2(κ), logit(pi) = xi
Gaussian Markov Random Fields
Normal-mixture response models
Binary regression
Simple example: Tokyo rainfall data
Stage 3 Gamma(α, β)-prior on κ
Gaussian Markov Random Fields
Normal-mixture response models
Binary regression
Simple example: Tokyo rainfall data
[Figure: trace plots of the precision, block vs. single-site algorithm.]
Gaussian Markov Random Fields
Normal-mixture response models
Binary regression
Simple example: Tokyo rainfall data
[Figure: estimated probabilities, block vs. single-site algorithm.]
Gaussian Markov Random Fields
Normal-mixture response models
Summary
• These auxiliary-variable methods work better than one might think.
• The dependence between the blocks (θ, x) and w is not that strong.
• Performance is nearly as good as with normal data.
Gaussian Markov Random Fields
Non-normal response models
GMRF approximation
Non-normal response models
Example of a full conditional:

π(x | y) ∝ exp( −½ xᵀQx − Σi exp(xi) )
         ≈ exp( −½ (x − µ)ᵀ (Q + diag(ci)) (x − µ) )

Construct the GMRF approximation:
• locate the mode x∗,
• expand to second order,
• to obtain the GMRF approximation.
• The graph is unchanged.
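A dense-matrix sketch (ours) of these steps for the particular log-density above, where the second-order term is ci = exp(µi):

```python
import numpy as np

def gmrf_approximation(Q, max_iter=50, tol=1e-10):
    """GMRF approximation to pi(x) prop. to exp(-x'Qx/2 - sum_i exp(x_i)):
    Newton steps locate the mode mu; expanding to second order there
    gives precision Q + diag(c) with c_i = exp(mu_i)."""
    mu = np.zeros(Q.shape[0])
    for _ in range(max_iter):
        c = np.exp(mu)
        grad = -Q @ mu - c            # gradient of the log-density
        H = Q + np.diag(c)            # negative Hessian; same graph as Q
        step = np.linalg.solve(H, grad)
        mu = mu + step
        if np.max(np.abs(step)) < tol:
            break
    return mu, Q + np.diag(np.exp(mu))   # mean and precision
```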
Gaussian Markov Random Fields
Non-normal response models
GMRF approximation
Why use the mode?
Gaussian Markov Random Fields
Non-normal response models
GMRF approximation
How to find the mode?
• Numerical gradient-search methods
  • evaluate the gradient and Hessian in O(n) flops
  • faster for large GMRFs
• Newton-Raphson method
  • current x is the initial value
  • Taylor-expand around x
  • compute the mean in the GMRF approximation
  • repeat until convergence
• Further complications arise with linear constraints Ax = b.
Gaussian Markov Random Fields
Non-normal response models
GMRF approximation
“Taylor” or not? (I)
Consider a non-quadratic function g(x),

g(x) = g(x0) + (x − x0) g′(x0) + ½ (x − x0)² g″(x0) + . . .

• the error is zero at x0
• the error typically increases with |x − x0|
Gaussian Markov Random Fields
Non-normal response models
GMRF approximation
“Taylor” or not? (II)
Alternative approximation:

g(x) ≈ ĝ(x) = a + bx + ½ cx², (84)

where a, b and c are chosen such that

ĝ(x0) = g(x0), ĝ(x0 ± h) = g(x0 ± h).

• More “uniformly distributed” error
• Better for approximating distributions
• ... if h is a rough guess of the region of interest.

Numerical approximations of the derivatives are preferred to analytical expressions.
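A sketch (ours) of computing a, b and c in (84) from the three evaluations g(x0 − h), g(x0), g(x0 + h):

```python
def quadratic_fit(g, x0, h):
    """Fit g_hat(x) = a + b*x + 0.5*c*x**2 through g at x0 - h, x0, x0 + h."""
    g0, gp, gm = g(x0), g(x0 + h), g(x0 - h)
    d1 = (gp - gm) / (2.0 * h)          # tends to g'(x0) as h -> 0
    c = (gp - 2.0 * g0 + gm) / h**2     # tends to g''(x0) as h -> 0
    # Rewrite g0 + d1*(x - x0) + 0.5*c*(x - x0)**2 as a + b*x + 0.5*c*x**2:
    a = g0 - d1 * x0 + 0.5 * c * x0**2
    b = d1 - c * x0
    return a, b, c
```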
Gaussian Markov Random Fields
Non-normal response models
Joint analysis of diseases
Joint analysis of diseases
[Figure: standardised mortality ratios (SMR), oral and lung cancer.]
Gaussian Markov Random Fields
Non-normal response models
Joint analysis of diseases
Separate analysis: model
yij: number of cases in area i for cancer type j,

yij ∼ P(eij exp(ηij)),

ηij: log relative risk in area i for disease j; eij: constants.

• Decompose the log relative risk η as
η = µ1 + u + v
• µ is the overall mean
• u is a spatially structured component (IGMRF)
• v is an unstructured component (random effects)
Gaussian Markov Random Fields
Non-normal response models
Joint analysis of diseases
Separate analysis: model
Posterior
π(x, κ | y) ∝ κv^(n/2) κu^((n−1)/2) exp( −½ xᵀQx ) × exp( Σi [yi ηi − ei exp(ηi)] ) π(κ)

with x = (µ, u, η) and

Q =
[ κµ + nκv   κv1ᵀ        −κv1ᵀ ]
[ κv1        κuR + κvI   −κvI  ]
[ −κv1       −κvI         κvI  ]

Q is a (2n + 1) × (2n + 1) matrix, where n = 544.

The spatially structured term u has a sum-to-zero constraint, 1ᵀu = 0.
Gaussian Markov Random Fields
Non-normal response models
Joint analysis of diseases
Separate analysis: MCMC
Joint update of all parameters:
κ∗u ∼ q(κ∗u | κu)
κ∗v ∼ q(κ∗v | κv)
x∗ ∼ π(x | κ∗, y)
Gaussian Markov Random Fields
Non-normal response models
Joint analysis of diseases
Separate analysis: Results
[Figure: separate-analysis results, oral and lung cancer.]
Gaussian Markov Random Fields
Non-normal response models
Joint analysis of diseases
Joint analysis: Model
• Oral cavity and lung cancer relates to tobacco smoking (u1)
• Oral cancer relates to alcohol consumption (u2)
• Latent shared and specific spatial components:
η1 | u1, u2, µ, κ ∼ N( µ11 + δu1 + u2, κη1⁻¹ I )
η2 | u1, u2, µ, κ ∼ N( µ21 + δ⁻¹u1, κη2⁻¹ I )
Gaussian Markov Random Fields
Non-normal response models
Joint analysis of diseases
Joint analysis: MCMC
Update all parameters in one block
• Simple (log-)RW for the hyperparameters
• Sample (µ1, η1, u1, µ2, η2, u2) from the GMRF approximation.
• Accept/reject jointly.
Gaussian Markov Random Fields
Non-normal response models
Joint analysis of diseases
Joint analysis: Results
[Figure: estimated shared (“tobacco”) and specific (“alcohol”) spatial components.]
Gaussian Markov Random Fields
Non-normal response models
Joint analysis of diseases
Independence samplers
For many of the examples it is quite easy to construct independence samplers:
• (more or less) exact samples
• the basis for methods that avoid MCMC completely
...continue tomorrow!
Gaussian Markov Random Fields
Part VII
Advanced topics
Gaussian Markov Random Fields
Outline I
Computing marginal variances for GMRFs∗
  Introduction
  Statistical derivation
  Marginal variances under hard and soft constraints
Gaussian Markov Random Fields
Computing marginal variances for GMRFs∗
Introduction
Computing marginal variances for GMRFs∗
Let
Q = VDVᵀ,
• where D is a diagonal matrix, and
• V is a lower-triangular matrix with ones on the diagonal.

The matrix identity

Σ = D⁻¹V⁻¹ + (I − Vᵀ)Σ

defines recursions which can be used to compute
• Var(xi) and Cov(xi, xj) for i ∼ j
essentially without cost once the Cholesky triangle L is known. This is a result from 1973.
Gaussian Markov Random Fields
Computing marginal variances for GMRFs∗
Statistical derivation
Statistical derivation
Recall for a zero mean GMRF that
xi | xi+1, . . . , xn ∼ N( −(1/Lii) Σ_{k=i+1}^n Lki xk, 1/Lii² ), i = n, . . . , 1,

provides a sequential representation of the GMRF backward in “time” i.
Gaussian Markov Random Fields
Computing marginal variances for GMRFs∗
Statistical derivation
Multiplying by xj, j ≥ i, and taking expectations yields

Σij = δij/Lii² − (1/Lii) Σ_{k∈I(i)} Lki Σkj, j ≥ i, i = n, . . . , 1,

where I(i) is the set of those k for which Lki is non-zero,

I(i) = {k > i : Lki ≠ 0},

and δij is one if i = j and zero otherwise.
Gaussian Markov Random Fields
Computing marginal variances for GMRFs∗
Statistical derivation
We can use
Σij = δij/Lii² − (1/Lii) Σ_{k∈I(i)} Lki Σkj, j ≥ i, i = n, . . . , 1,
to compute Σij for each ij :
• Outer loop i = n, . . . , 1
• Inner loop j = n, . . . , i
Gaussian Markov Random Fields
Computing marginal variances for GMRFs∗
Statistical derivation
Example
Let n = 3, I(1) = {2, 3}, I(2) = {3}; then we get

Σ33 = 1/L33²
Σ23 = −(1/L22) (L32 Σ33)
Σ22 = 1/L22² − (1/L22) (L32 Σ32)
Σ13 = −(1/L11) (L21 Σ23 + L31 Σ33)
Σ12 = −(1/L11) (L21 Σ22 + L31 Σ32)
Σ11 = 1/L11² − (1/L11) (L21 Σ21 + L31 Σ31)

where we also need to use that Σ is symmetric.
Gaussian Markov Random Fields
Computing marginal variances for GMRFs∗
Statistical derivation
• Assume we want to compute all marginal variances.
• To do so, we need to compute Σij (or Σji) for all ij in some set S.
• If the recursions can be solved by only computing Σij for all ij ∈ S, we say that the recursions are solvable using S.
Gaussian Markov Random Fields
Computing marginal variances for GMRFs∗
Statistical derivation
From
Σij = δij/Lii² − (1/Lii) Σ_{k∈I(i)} Lki Σkj, j ≥ i, i = n, . . . , 1, (85)
it is evident that S must satisfy
ij ∈ S and k ∈ I(i) =⇒ kj ∈ S (86)
We also need that ii ∈ S for i = 1, . . . , n.
Gaussian Markov Random Fields
Computing marginal variances for GMRFs∗
Statistical derivation
• S = V × V is a valid set, but we want |S| to be minimal to avoid unnecessary computations.
• Such a minimal set depends, however, on the numerical values in L, or Q implicitly.
• Denote by S(Q) a minimal set.

Theorem
The union of S(Q) over all Q > 0 with fixed graph G is a subset of

S∗ = {ij ∈ V × V : j ≥ i, i and j are not separated by F(i, j)},

and S∗ is solvable.
Gaussian Markov Random Fields
Computing marginal variances for GMRFs∗
Statistical derivation
Proof. We first note that ii ∈ S∗ for i = 1, . . . , n, since i and i are not separated by F(i, i). We now verify that the recursions are solvable using S∗. The global Markov property ensures that if ij ∉ S∗ then Lji = 0 for all Q > 0 with fixed graph G. We use this to replace I(i) with I∗(i) = {k > i : ik ∈ S∗} in (86), which is legal since I(i) ⊆ I∗(i) and the difference only identifies terms Lki which are zero. It is now sufficient to show that

ij ∈ S∗ and ik ∈ S∗ =⇒ kj ∈ S∗, (87)

which implies (86). Eq. (87) is trivially true for i ≤ k = j. Fix now i < k < j.

Then ij ∈ S∗ says that there exists a path i, i1, . . . , in, j, where i1, . . . , in are all smaller than i, and ik ∈ S∗ says that there exists a path i, i′1, . . . , i′n′, k, where i′1, . . . , i′n′ are all smaller than i. Then there is a path from k to i and from i to j where all nodes are less than or equal to i, but then also less than k since i < k.

Hence, k and j are not separated by F(k, j), so kj ∈ S∗. Finally, since S∗ contains {11, . . . , nn} and only depends on G, it must contain the union of all S(Q), since each S(Q) is minimal.
Gaussian Markov Random Fields
Computing marginal variances for GMRFs∗
Statistical derivation
Interpretation of S∗
• S∗ is the set of all possible non-zero elements in L, based on G only.
• This is the set of Lji’s that are computed when computing Q = LLᵀ.
• Since in general Lji ≠ 0 when i ∼ j, we also compute Cov(xi, xj) for i ∼ j.
• Some of the Lij’s might turn out to be zero, depending on the conditional independence properties of the marginal density of xi:n for i = n, . . . , 1. This might cause a slight problem in practice (implementation dependent).
Gaussian Markov Random Fields
Computing marginal variances for GMRFs∗
Statistical derivation
General algorithm
for i = n, . . . , 1
  for decreasing j in I(i)
    compute Σij from Eq. (85)
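A dense-matrix sketch (ours) of these recursions, with a check against direct inversion; a sparse implementation would restrict both loops to the non-zero pattern S∗:

```python
import numpy as np

def marginal_covariances(L):
    """Solve the recursions (85) for a dense Cholesky triangle L (Q = L L^T)."""
    n = L.shape[0]
    S = np.zeros((n, n))
    for i in range(n - 1, -1, -1):
        for j in range(n - 1, i - 1, -1):        # j = n, ..., i
            acc = 0.0
            for k in range(i + 1, n):            # k in I(i) iff L[k, i] != 0
                if L[k, i] != 0.0:
                    acc += L[k, i] * S[k, j]
            S[i, j] = (1.0 if i == j else 0.0) / L[i, i]**2 - acc / L[i, i]
            S[j, i] = S[i, j]                    # Sigma is symmetric
    return S

# Check: the recursions reproduce Q^{-1}.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Q = A @ A.T + 5 * np.eye(5)
L = np.linalg.cholesky(Q)
assert np.allclose(marginal_covariances(L), np.linalg.inv(Q))
```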
Gaussian Markov Random Fields
Computing marginal variances for GMRFs∗
Statistical derivation
Band matrices
for i = n, . . . , 1
  for j = min(i + bw, n), . . . , i
    compute Σij from Eq. (85)

Equivalent to the Kalman recursions for smoothing.
Gaussian Markov Random Fields
Computing marginal variances for GMRFs∗
Marginal variances under hard and soft constraints
Marginal variances under hard and soft constraints
Let Σ̃ be the covariance with the constraints and Σ the one without. Then Σ̃ relates to Σ as

Σ̃ = Σ − Q⁻¹Aᵀ (AQ⁻¹Aᵀ)⁻¹ AQ⁻¹,

hence

Σ̃ii = Σii − ( Q⁻¹Aᵀ (AQ⁻¹Aᵀ)⁻¹ AQ⁻¹ )ii, i = 1, . . . , n,

for the hard constraint Ax = b.
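A dense sketch (ours) of the corrected marginal variances; note that b does not enter, since conditioning on Ax = b shifts only the mean:

```python
import numpy as np

def constrained_variances(Q, A):
    """Marginal variances under the hard constraint Ax = b, via
    Var~_ii = Sigma_ii - (Q^{-1}A' (A Q^{-1} A')^{-1} A Q^{-1})_ii."""
    Sigma = np.linalg.inv(Q)                 # unconstrained covariance
    W = Sigma @ A.T                          # Q^{-1} A'
    C = W @ np.linalg.solve(A @ W, W.T)      # the rank-k correction term
    return np.diag(Sigma) - np.diag(C)

# Example: variances of a GMRF under a sum-to-zero constraint 1'x = 0.
n = 4
Q = 2.0 * np.eye(n)
v = constrained_variances(Q, np.ones((1, n)))   # each equals 0.5 - 0.125
```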
Gaussian Markov Random Fields
Computing marginal variances for GMRFs∗
Marginal variances under hard and soft constraints
Computational costs
Time O(n)
Spatial O(n log(n)²)
Spatio-temporal O(n^(5/3)).