Session 1: Introduction
Geir-Arne Fuglstad <[email protected]>
Department of Mathematical Sciences, NTNU
April 14, 2016
www.ntnu.no G.-A. Fuglstad, Introduction
Instructors
— Geir-Arne Fuglstad
  • Postdoc with Håvard Rue
  • Working on sensitivity analysis and robustification of results from INLA
  • PhD on non-stationary spatial modelling with SPDE models
— Jingyi Guo
  • PhD student with Andrea Riebler and Håvard Rue
  • Master in Statistics from Lund University
Practical information
Slides and code for the sessions are available at http://www.math.ntnu.no/~fuglstad/Lund2016
Practical information
Information about the software, examples, papers, and help can be found at http://www.r-inla.org
What is INLA?
We distinguish between three different parts:
1. The INLA method
2. The SPDE models
3. The INLA R package
The INLA method
An approach for fast Bayesian inference with “latent Gaussian models”
Read the paper:
Rue, H., Martino, S., and Chopin, N. (2009). “Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations.” Journal of the Royal Statistical Society: Series B, 71, 319–392.
The SPDE models
A novel way to get around the computational inefficiencies of continuously indexed spatial fields (Gaussian random fields, GRFs)
Read the paper:
Lindgren, F., Rue, H., and Lindström, J. (2011). “An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach.” Journal of the Royal Statistical Society: Series B, 73, 423–498.
The INLA package
The R package is an implementation of the INLA method and the SPDE models with a flexible and simple interface
Download with:
source("http://www.math.ntnu.no/inla/givemeINLA-testing.R")
The history of INLA
— The development has been driven by Håvard Rue and is the result of many years of hard work
— Around 2002–2004, he and Leonhard Held started to realize the importance of the class of models that INLA handles
— In 2005, Håvard Rue and Leonhard Held wrote the book “Gaussian Markov Random Fields: Theory and Applications”
The history of INLA
— The first implementation in C was finished in 2007, but required hand-crafted input files
— Arnoldo Frigessi (Oslo) suggested that an R interface was necessary to reach a broad audience
— Sara Martino wrote the first prototype of the R interface in January/February 2008
— The source code now consists of many, many, many lines...
— The source code is available at https://bitbucket.org/hrue/r-inla/
Who develops INLA?
Håvard Rue, Finn Lindgren, Daniel Simpson, Andrea Riebler, (Sara Martino, Thiago Guerrera Martins, Rupali Akerkar) and others (photo 2011)
Aims of the course
— Get an overview of latent Gaussian models
— Get an overview of the INLA method
— Learn how to use INLA for (generalized) linear models, and more
— Learn how to do spatial modelling with INLA
Structure of the course
Day 1:
10:30–11:45 Session 1: Introduction
13:15–15:00 Session 2: R-INLA
15:30–17:00 Session 3: Practical session with R-INLA
Day 2:
09:15–10:00 Session 4: Advanced example
10:30–11:45 Session 5: Spatial modelling with INLA
13:15–15:00 Session 6: Practical session with spatial modelling
In general
— Ask questions!
— Discuss with us!
— If you have questions, you can use the Google group or ...
Outline
Motivation
Bayesian hierarchical models
Latent Gaussian models
Deterministic inference
R and INLA
Why use INLA?
— Provides full Bayesian analysis
— Quick to write code; no need to write a sampler
— Runs quickly
— Can be used for a flexible class of models (latent Gaussian models)
Example: Ski flying records
We have ski flying world records y = (y_1, ..., y_n) and their dates x_1, ..., x_n, and want to fit a simple linear regression with Gaussian responses, where

E(y_i) = µ + βx_i,  Var(y_i) = τ^{-1},  i = 1, ..., n
[Scatter plot of world-record distance against year, 1960–2010]
Frequentist analysis
mod = lm(Length ~ Date, data = skiData)
summary(mod)
[Scatter plot of the data with the fitted regression line]

Estimates:
µ: −3986 (66), β: 2.10 (0.03), σ = 1/√τ: 3.98
Bayesian analysis
res = inla(Length ~ Date, data = skiData)
res$summary.fixed[, 1:2]; res$summary.hyperpar
[Posterior marginals:
mu: Mean = −3986, SD = 65
beta: Mean = 2.10, SD = 0.03
standard deviation: posterior marginal of σ]
Real-world problems are typically more complicated!

Often we need to
— include complicated dependency structures
— stabilize the inference

This can be achieved with hierarchical Bayesian modelling, but...

Two main challenges:
— Need computationally efficient methods to calculate posteriors
— Need to select priors in a sensible way
Bayesian hierarchical models

INLA can analyze Bayesian hierarchical models specified in three stages:

Stage 1: What is the distribution of the responses?
Stage 1.5: How is the mean/variance/probability of the response linked to the underlying unobserved components?
Stage 2: What is the distribution of the underlying unobserved components?
Stage 3: What are our prior beliefs about the parameters controlling the components in the model?
Stage 1
How are the data (y) generated from the underlying components (x) and hyperparameters (θ) in the model:
— Gaussian response?
— Count data? (e.g., Poisson, negative binomial)
— Zero-inflation?
— Point pattern? (e.g., log-Gaussian Cox process)
— Binary data?
The response distribution is connected to x and θ through the likelihood π(y | x, θ)
Stage 1.5
In INLA, Stage 1 and Stage 2 must be connected through linear predictors by

π(y | x, θ) = ∏_{i=1}^{n} π(y_i | η_i, θ),

where each η_i is a linear combination of the model components x.

For example, η_i = µ + βx_i can be combined with
Gaussian: η_i = µ_i
Poisson: η_i = log(µ_i)
Binomial: η_i = logit(p_i)
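As a minimal numerical sketch (not from the slides), the three links above map the same linear predictor η_i to the response mean or probability in different ways:

```python
import math

# Sketch: the three links listed above, applied to a linear predictor eta.
def gaussian_mean(eta):
    return eta                       # identity link: mu_i = eta_i

def poisson_mean(eta):
    return math.exp(eta)             # log link: eta_i = log(mu_i)

def binomial_prob(eta):
    return 1 / (1 + math.exp(-eta))  # logit link: eta_i = logit(p_i)

eta = 0.0
print(gaussian_mean(eta))  # 0.0
print(poisson_mean(eta))   # 1.0
print(binomial_prob(eta))  # 0.5
```

The same η_i = µ + βx_i thus yields a mean, a rate, or a probability depending on the chosen likelihood.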
Stage 2
The underlying unobserved components x are called latent components and can be:
— Covariates
— Unstructured random effects (individual effects, group effects)
— Structured random effects (AR(1), regional effects, continuously indexed spatial effects)
The distribution of the model components is specified by π(x | θ)
Stage 3
The likelihood and the latent model typically have hyperparameters that control their behavior. The hyperparameters θ can include:

— Variance of unstructured effects
— Range and variance of spatial effects
— Autocorrelation parameters
— Variance of observation noise
— Probability of a zero (zero-inflated models)

The a priori beliefs about these parameters are placed in the prior π(θ)
Statistical jargon
The model can be phrased with equations as

Stage 1: y | x, θ ∼ π(y | x, θ) = ∏_{i=1}^{n} π(y_i | η_i, θ)
Stage 1.5: Each η_i is a linear combination of elements of x
Stage 2: x | θ ∼ π(x | θ)
Stage 3: θ ∼ π(θ)
Example: Disease mapping in Germany
We have observed larynx cancer mortality counts for males in 544 districts of Germany from 1986 to 1990 and want to build a model.
Information given:
y_i: The count in district i.
E_i: An offset; the expected number of cases in district i.
c_i: A covariate (level of smoking consumption) in district i.
s_i: Spatial location i (here, district).
[Map of the districts colored by mortality ratio, scale 0.0–2.5]
Stage 1: The data
First we decide on the likelihood for our data y
— Need a distribution for counts
— We decide to model our responses as

y_i | η_i ∼ Poisson(E_i exp(η_i)),

where η_i is a linear function of the latent components
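A small simulation sketch (with assumed illustrative offsets and linear predictors, not the real German data) shows how counts arise from this likelihood:

```python
import random, math

# Sketch: simulate counts y_i ~ Poisson(E_i * exp(eta_i)) for a few districts.
def rpois(lam):
    # Knuth's inversion-by-multiplication algorithm, fine for small lambda
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

random.seed(1)
E   = [10.0, 25.0, 5.0]   # offsets: expected numbers of cases (assumed)
eta = [0.2, -0.1, 0.5]    # linear predictors (assumed)
y   = [rpois(Ei * math.exp(e)) for Ei, e in zip(E, eta)]
print(y)
```

Each count scatters around its own mean E_i exp(η_i), so districts with larger offsets naturally produce larger counts.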
Stage 2: The latent model
We choose four components:
— Intercept µ
— Spatially structured effect f_s and unstructured effect u
— Covariate effect f(c_i) of the exposure covariate c_i

Combine with the linear predictor η_i = µ + f_s(s_i) + f(c_i) + u_i, and the full latent field x = (µ, {f_s(·)}, {f(·)}, u_1, u_2, ..., u_n)
Stage 3: Hyperparameters
The structured spatial effect, the unstructured effect, and the smooth covariate effect are each controlled by one parameter:

— τ_c, τ_f, τ_η: the precisions (inverse variances) of the covariate effect, spatial effect and unstructured effect, respectively.

The hyperparameters θ = (τ_c, τ_f, τ_η) must be given a prior π(τ_c, τ_f, τ_η).
Quantities of interest
[Left: median of the structured spatial effect, exp(f_s(s_i)), scale 0.8–1.6. Right: covariate effect f(c_i) with the 0.025, 0.5 and 0.975 quantiles.]
Latent Gaussian models
A key feature of the example is that it is contained in the very flexible and useful class of models called latent Gaussian models

— The characteristic property is that the latent part of the hierarchical model is Gaussian, x | θ ∼ N(0, Q^{-1})
  • The expected value is 0
  • The precision matrix (inverse covariance matrix) is Q

Together with the linear predictor restriction, this defines the class of models INLA can handle
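To make the N(0, Q^{-1}) parameterization concrete, here is a toy sketch (assumed 2x2 numbers, not from the slides) showing that the covariance of the latent field is the matrix inverse of the precision Q:

```python
# Sketch: for a 2x2 precision matrix Q, the covariance of
# x | theta ~ N(0, Q^{-1}) is the matrix inverse of Q.
Q = [[2.0, -1.0],
     [-1.0, 2.0]]

det = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
cov = [[ Q[1][1] / det, -Q[0][1] / det],
       [-Q[1][0] / det,  Q[0][0] / det]]

print(cov)  # [[0.666..., 0.333...], [0.333..., 0.666...]]
```

A zero entry in Q means conditional independence of the corresponding pair of components, which is exactly what makes Q sparse for Markov models.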
The general set-up
The class contains GLMs, GLMMs, GAMs, GAMMs, and more. It can be constructed by connecting the mean µ_i to the linear predictor η_i through a link function g,

η_i = g(µ_i) = α + z_i^T β + u_i + ∑_γ w_{γ,i} f_γ(c_{γ,i}),  i = 1, 2, ..., n

where
α: Intercept
β: Fixed effects of covariates z
u: Unstructured error terms
{f_γ(·)}: Non-linear/smooth effects of covariates c
{w_{γ,i}}: Known weights defined for each observed data point
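The additive structure of the linear predictor can be sketched directly (toy numbers and a hypothetical smooth-effect form, purely for illustration):

```python
# Sketch: assemble eta_i = alpha + z_i' beta + u_i + sum_g w_gi * f_g(c_gi)
alpha = 1.0
z     = [0.5, 2.0]          # covariates for observation i (assumed)
beta  = [1.0, -0.5]         # fixed effects (assumed)
u_i   = 0.1                 # unstructured error term (assumed)
w     = [1.0]               # known weights (assumed)
f     = [lambda c: c ** 2]  # one smooth effect, hypothetical form
c     = [0.3]               # covariate entering the smooth effect

eta_i = (alpha
         + sum(zj * bj for zj, bj in zip(z, beta))
         + u_i
         + sum(wg * fg(cg) for wg, fg, cg in zip(w, f, c)))
print(eta_i)  # 1.0 + (0.5 - 1.0) + 0.1 + 0.09 = 0.69
```

Dropping a term from the sum removes that effect; appending another (w, f, c) triple adds one, which is the additivity discussed next.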
Flexibility through f-functions

The functions {f_γ} provide very different types of random effects:
— f(time): e.g., an AR(1) process, RW1 or RW2
— f(spatial location): e.g., a Matérn field
— f(covariate): e.g., a RW1 or RW2 on the covariate values
— f(time, spatial location): a spatio-temporal effect
— And much more
Additivity
— One of the most useful features of the framework is the additivity
— Effects can easily be added and removed
— Each component may add a new latent part and new hyperparameters, but the modelling framework and computations stay the same
Example: Smoothing binary time-series
— Observed a sequence y_1, y_2, ..., y_n of 0s and 1s
— Each time t has an associated covariate x_t
— We want to smooth the time series by inferring the sequence p_t, for t = 1, 2, ..., n, of probabilities of a 1 at each time step
Example: Smoothing time series
Stage 1: Bernoulli distribution for the responses

y_t | η_t ∼ Bernoulli( exp(η_t) / (1 + exp(η_t)) )

Stage 2: Covariates, an AR(1) component and random noise are connected to the likelihood by

η_t = β_0 + β_1 x_t + a_t + v_t

Stage 3: ρ: the dependence parameter of the AR(1) process
σ_a²: the marginal variance of the AR(1) process
σ_v²: the variance of the unstructured error
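The three stages can be simulated forward; this sketch uses assumed illustrative parameter values and a toy covariate, not data from the course:

```python
import random, math

# Sketch: simulate the binary time series from the three stages above.
random.seed(2)
n, b0, b1 = 100, -0.5, 1.0
rho, sigma_a, sigma_v = 0.9, 1.0, 0.3          # assumed values

x = [math.sin(2 * math.pi * t / n) for t in range(n)]  # toy covariate x_t
a = [random.gauss(0, sigma_a)]                          # AR(1) component a_t
for _ in range(1, n):
    a.append(rho * a[-1] + random.gauss(0, sigma_a * math.sqrt(1 - rho**2)))

eta = [b0 + b1 * x[t] + a[t] + random.gauss(0, sigma_v) for t in range(n)]
p   = [1 / (1 + math.exp(-e)) for e in eta]             # inverse logit
y   = [1 if random.random() < pt else 0 for pt in p]
print(sum(y), "ones out of", n)
```

Inference runs this logic in reverse: given y, recover the smooth probability sequence p_t and the hyperparameters ρ, σ_a², σ_v².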
Loads of examples
— Dynamic linear models
— Stochastic volatility models
— Generalised linear (mixed) models
— Generalised additive (mixed) models
— Measurement error models
— Spline smoothing
— Semi-parametric regression
— Disease mapping
— Log-Gaussian Cox processes
— Spatio-temporal models
— Survival analysis
— And more!
Computations
Now we have a modelling framework.
But how are the calculations actually done?
It depends on what you want to compute!
What are we interested in?
— Quantiles for the fixed effects
— A linear combination of elements from the latent field (e.g., the average of a spatial effect over an area, or the difference of two effects)
— A single hyperparameter (e.g., the range)
— A non-linear combination of hyperparameters (e.g., breeding values for livestock)
— Predictions at unobserved locations
What do we need to compute?
Often we are interested in the marginal posteriors of the latent field,

π(x_i | y),

or the marginal posteriors of the hyperparameters,

π(θ_i | y),

or the posterior of another statistic,

π(f(x, θ) | y).

However, these can almost never be computed analytically.
Traditional approach with MCMC
Construct Markov chains with the target posterior distribution as the stationary distribution.
— Extensively used for Bayesian inference since the 1980s
— Flexible and general
— There are generic tools such as JAGS and OpenBUGS
— And more specific tools for more specific models, e.g., BayesX
Alternative approach
— MCMC “works” for everything, but it can be incredibly slow
— Is it possible to make a quicker, more specialized inference scheme that only needs to work for this limited class of models?
Our model framework
Latent Gaussian models:

Stage 1: y | x, θ ∼ ∏_i π(y_i | η_i, θ)
Stage 2: x | θ ∼ N(0, Q(θ)^{-1})   Gaussian!
Stage 3: θ ∼ π(θ)

where the precision matrix Q(θ) is sparse. Gaussian distributions with such sparse precision matrices are called Gaussian Markov random fields (GMRFs).

The sparseness can be exploited for very quick computations for the Gaussian part of the model through numerical algorithms for sparse matrices.
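As an illustration of why sparsity helps (a sketch with toy numbers, not INLA's actual solver), a GMRF with a tridiagonal precision matrix Q can be solved in O(n) operations with the Thomas algorithm, instead of O(n³) for a dense factorization:

```python
# Sketch: solve Q m = b for a tridiagonal (sparse) precision matrix Q,
# given as its three non-zero diagonals. Toy AR(1)-like values.
def thomas_solve(lower, diag, upper, b):
    n = len(diag)
    c, d = upper[:], b[:]
    c[0] /= diag[0]
    d[0] /= diag[0]
    for i in range(1, n):                      # forward sweep
        denom = diag[i] - lower[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] /= denom
        d[i] = (d[i] - lower[i - 1] * d[i - 1]) / denom
    for i in range(n - 2, -1, -1):             # back substitution
        d[i] -= c[i] * d[i + 1]
    return d

n = 5
lower = [-1.0] * (n - 1)
diag  = [2.5] * n
upper = [-1.0] * (n - 1)
b     = [1.0] * n
m = thomas_solve(lower, diag, upper, b)
print(m)
```

General sparse precision matrices are handled with sparse Cholesky factorizations, but the banded case already shows the principle: only the non-zero entries are touched.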
The INLA idea
Directly approximate the posterior marginals

π(x_i | y) and π(θ_j | y)

using the posterior distribution

π(x, θ | y) ∝ π(θ) π(x | θ) π(y | x, θ).
Toy example: Smoothing
Observations

y_i = m(i) + ε_i,  i = 1, ..., n

— ε_i is i.i.d. Gaussian noise with known precision τ_0
— m(i) is an unknown smooth function w.r.t. i

n = 50
idx = 1:n
fun = 100*((idx - n/2)/n)^3
y = fun + rnorm(n, mean = 0, sd = 1)
plot(idx, y)

[Scatter plot of the simulated data y against idx]
Assumed hierarchical model
1. Data: Gaussian observations with known precision

   y_i | x_i, θ ∼ N(x_i, τ_0^{-1})

2. Latent model: a second-order random walk for the underlying smooth function¹

   π(x | θ) ∝ θ^{(n−2)/2} exp( −(θ/2) ∑_{i=3}^{n} (x_i − 2x_{i−1} + x_{i−2})² )

3. Hyperparameter: the smoothing parameter θ is assigned a Γ(a, b) prior

   π(θ) ∝ θ^{a−1} exp(−bθ),  θ > 0

¹ model="rw2"
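The RW2 exponent is just a sum of squared second differences; a quick sketch with toy values (0-based indexing, so the sum starts at position 2 rather than the slide's i = 3) makes the structure explicit:

```python
# Sketch: evaluate the RW2 quadratic form sum_i (x_i - 2 x_{i-1} + x_{i-2})^2
# that appears in the latent density above, on toy values.
def rw2_quadform(x):
    return sum((x[i] - 2 * x[i - 1] + x[i - 2]) ** 2 for i in range(2, len(x)))

x = [0.0, 0.5, 1.5, 3.0, 5.0]
diffs = [x[i] - 2 * x[i - 1] + x[i - 2] for i in range(2, len(x))]
print(diffs)            # [0.5, 0.5, 0.5]
print(rw2_quadform(x))  # 0.75
```

Because each term couples only three neighbouring values, the implied precision matrix is banded, i.e. exactly the kind of sparse GMRF structure INLA exploits.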
Derivation of posterior marginals (I)

Since

x, y | θ ∼ N(·, ·)

(derived using π(x, y | θ) ∝ π(y | x, θ) π(x | θ)), we can compute (numerically) all marginals, using that

π(θ | y) ∝ π(x, y | θ) π(θ) / π(x | y, θ),

where both π(x, y | θ) and π(x | y, θ) are Gaussian.
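This identity holds for any value of x, which is what makes it usable in practice. A 1-D sketch (assumed toy model: x | θ ∼ N(0, 1/θ), y | x ∼ N(x, 1/τ₀), θ ∼ Γ(a, b); not the slides' example) evaluates it on a grid of θ values:

```python
import math

# Sketch: evaluate pi(theta|y) ∝ pi(x,y|theta) pi(theta) / pi(x|y,theta)
# at an arbitrary x (here x = 0) for a 1-D conjugate toy model.
tau0, a, b, y = 1.0, 1.0, 0.1, 0.7   # assumed values

def log_norm(z, mean, prec):
    return 0.5 * math.log(prec) - 0.5 * prec * (z - mean) ** 2

def unnorm_log_post(theta, x=0.0):
    log_joint = log_norm(y, x, tau0) + log_norm(x, 0.0, theta)  # pi(x,y|theta)
    prec_c = theta + tau0                                       # precision of x|y,theta
    mean_c = tau0 * y / prec_c                                  # mean of x|y,theta
    log_cond = log_norm(x, mean_c, prec_c)                      # pi(x|y,theta)
    log_prior = (a - 1) * math.log(theta) - b * theta           # Gamma(a,b) prior
    return log_joint + log_prior - log_cond

grid = [0.1 * k for k in range(1, 60)]
post = [math.exp(unnorm_log_post(t)) for t in grid]
Z = sum(post) * 0.1                       # normalize numerically
post = [p / Z for p in post]
print(max(post))
```

Since both the joint and the conditional are Gaussian, the ratio is available in closed form for each θ, and the value does not depend on which x is plugged in.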
Posterior marginal for hyperparameter
[Two figures: the unnormalised posterior marginal for θ, and the normalised posterior marginal for θ, both plotted against log(theta) from 1 to 7]
Derivation of posterior marginals (II)
From

x | y, θ ∼ N(·, ·)

we can compute

π(x_i | y) = ∫ π(x_i | θ, y) π(θ | y) dθ ≈ ∑_k π(x_i | θ_k, y) π(θ_k | y) ∆_k,

where π(x_i | θ, y) is Gaussian, the points θ_k, k = 1, ..., K, are representative points of θ | y, and the ∆_k are the corresponding weights.
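The sum above is a finite mixture of Gaussians; a sketch with assumed conditional means, standard deviations and weights (toy numbers, not computed from a real model) shows how the marginal is assembled:

```python
import math

# Sketch: approximate pi(x_i|y) as a weighted mixture of the Gaussians
# pi(x_i|theta_k, y) over K = 3 integration points theta_k.
def dnorm(z, mean, sd):
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

cond   = [(0.8, 0.50), (1.0, 0.40), (1.2, 0.35)]  # (mean, sd) of x_i|theta_k,y
weight = [0.25, 0.50, 0.25]                        # pi(theta_k|y) * Delta_k

def marginal(z):
    return sum(w * dnorm(z, m, s) for w, (m, s) in zip(weight, cond))

print(marginal(1.0))
```

If the weights sum to one, the mixture is itself a proper density, which is why the weighted curves on the next slides integrate correctly once combined.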
Posterior marginal for latent parameters
[Posterior marginals for x_1 for each θ (unweighted), x_1 from −13 to −6]
Posterior marginal for latent parameters
[Posterior marginals for x_1 for each θ (weighted), x_1 from −13 to −6]
Posterior marginal for latent parameters
[The combined posterior marginal for x_1, x_1 from −13 to −6]
Fitted spline
The posterior marginals are used to calculate summary statistics, like means, variances and credible intervals:
[The data with the fitted spline and credible band, idx from 0 to 50]
Comparison with maximum likelihood
The red line is the Bayesian posterior, the blue line is the “posterior” using the MLE of θ, and the vertical line is the observed value y_1
[Posterior densities for x_1 under the two approaches]
Extensions
This is the simple basic idea behind INLA.

However, we need to extend this basic idea to deal with:
— More than one hyperparameter
— Non-Gaussian observations
Extension: More than one hyperparameter

Step 1: Explore π(θ | y)
— Locate the mode
— Use the Hessian to construct new variables
— Grid-search

Step 2: Approximate marginals based on these integration points
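A 1-D sketch of Step 1 (with an assumed toy log-posterior, not a real model) shows how the mode and the curvature determine sensibly spaced grid points:

```python
import math

# Sketch: locate the mode of log pi(theta|y), use the curvature (Hessian)
# at the mode to set a step size, and lay out grid points around the mode.
def log_post(theta):                 # assumed toy log-posterior
    return -0.5 * (theta - 2.0) ** 2 / 0.3

# locate the mode by a crude grid search
mode = max((0.01 * k for k in range(1, 500)), key=log_post)

# numerical second derivative (the 1-D Hessian)
h = 1e-3
hess = (log_post(mode + h) - 2 * log_post(mode) + log_post(mode - h)) / h**2
step = 1.0 / math.sqrt(-hess)        # standard-deviation-sized steps

grid = [mode + k * step for k in range(-3, 4)]
print(round(mode, 2), [round(g, 2) for g in grid])
```

In more than one dimension the same idea applies: the Hessian's eigendecomposition defines rotated coordinates in which a regular grid covers the bulk of π(θ | y) efficiently.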
Non-Gaussian observations
— π(x | y, θ) is often very close to a Gaussian distribution even with a non-Gaussian likelihood, and can be replaced with a Laplace approximation
— All the difficult high-dimensional integrals w.r.t. the latent field then become easy, and only the integrals w.r.t. the hyperparameters remain
— These integrals can be done efficiently numerically when the number of hyperparameters is low
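A Laplace approximation matches a Gaussian to the mode and curvature of a non-Gaussian density. A 1-D sketch (assumed toy Poisson-style log-likelihood plus a Gaussian prior, not INLA's internal routine):

```python
import math

# Sketch: Laplace-approximate the posterior of x under an assumed
# toy target log pi(x|y) = y*x - exp(x) - 0.5*x^2 with y = 3.
def log_target(x):
    y, prior_prec = 3.0, 1.0
    return y * x - math.exp(x) - 0.5 * prior_prec * x ** 2

# find the mode with Newton's method (gradient/Hessian of log_target)
x = 0.0
for _ in range(50):
    g = 3.0 - math.exp(x) - x        # first derivative
    H = -math.exp(x) - 1.0           # second derivative
    x -= g / H

mode, prec = x, math.exp(x) + 1.0    # Gaussian approximation: N(mode, 1/prec)
print(round(mode, 3), round(1 / math.sqrt(prec), 3))
```

The resulting N(mode, 1/prec) stands in for π(x | y, θ) in the marginal computations, which is why non-Gaussian likelihoods cost little extra.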
Limitations
— The dimension of the latent field x can be large (10²–10⁶)
— But the dimension of the hyperparameters θ must be small (≤ 9)

In other words, each random effect can be big, but there cannot be too many random effects unless they share parameters.
How to use INLA?
INLA is implemented through the package INLA in the R software, which
— is the most popular computing language in applied statistics
— is open source and free
— has a lot of packages that extend the base functionality
— has a very user-friendly formula interface

linear_model <- lm(weight ~ group)

fits the linear model

weight_i = µ + group_i + ε_i
Summary of INLA
Three main ingredients in INLA:
— Latent Gaussian models
— Laplace approximations
— Gaussian Markov random fields

These ingredients lead to a very nice tool for Bayesian inference that is:
— fast
— accurate
— scales well for moderate sizes