Fitting Covariance and Multioutput Gaussian Processes

Neil D. Lawrence

GPSS, 16th September 2014



Outline

Parametric Models are a Bottleneck

Constructing Covariance

GP Limitations

Kalman Filter


Nonparametric Gaussian Processes

- We've seen how we go from parametric to non-parametric.
- The limit implies infinite dimensional w.
- Gaussian processes are generally non-parametric: combine data with covariance function to get model.
- This representation cannot be summarized by a parameter vector of a fixed size.

The Parametric Bottleneck

- Parametric models have a representation that does not respond to increasing training set size.
- Bayesian posterior distributions over parameters contain the information about the training data.
- Use Bayes' rule from training data, p(w|y, X),
- Make predictions on test data

  p(y∗|X∗, y, X) = ∫ p(y∗|w, X∗) p(w|y, X) dw.

- w becomes a bottleneck for information about the training set to pass to the test set.
- Solution: increase m so that the bottleneck is so large that it no longer presents a problem.
- How big is big enough for m? Non-parametrics says m → ∞.

The Parametric Bottleneck

- Now no longer possible to manipulate the model through the standard parametric form.
- However, it is possible to express parametric models as GPs:

  k(xᵢ, xⱼ) = φ(xᵢ)⊤φ(xⱼ).

- These are known as degenerate covariance matrices.
- Their rank is at most m; non-parametric models have full rank covariance matrices.
- Most well known is the "linear kernel", k(xᵢ, xⱼ) = xᵢ⊤xⱼ.

Making Predictions

- For non-parametrics, prediction at new points f∗ is made by conditioning on f in the joint distribution.
- In GPs this involves combining the training data with the covariance function and the mean function.
- Parametric is a special case when conditional prediction can be summarized in a fixed number of parameters.
- Complexity of parametric model remains fixed regardless of the size of our training data set.
- For a non-parametric model the required number of parameters grows with the size of the training data.

Covariance Functions and Mercer Kernels

- Mercer kernels and covariance functions are similar.
- The kernel perspective does not make a probabilistic interpretation of the covariance function.
- Algorithms can be simpler, but probabilistic interpretation is crucial for kernel parameter optimization.


Constructing Covariance Functions

- Sum of two covariances is also a covariance function.

  k(x, x′) = k1(x, x′) + k2(x, x′)

Constructing Covariance Functions

- Product of two covariances is also a covariance function.

  k(x, x′) = k1(x, x′)k2(x, x′)

Multiply by Deterministic Function

- If f(x) is a Gaussian process.
- g(x) is a deterministic function.
- h(x) = f(x)g(x).
- Then

  kh(x, x′) = g(x)kf(x, x′)g(x′)

  where kh is covariance for h(·) and kf is covariance for f(·).
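A minimal sketch of these three closure rules, assuming covariance functions are represented as callables taking two input arrays and returning a kernel matrix (the helper names k_sum, k_prod and k_scale are mine, not from the talk):

    import numpy as np

    def k_sum(k1, k2):
        """Sum of two covariance functions is a covariance function."""
        return lambda X, X2: k1(X, X2) + k2(X, X2)

    def k_prod(k1, k2):
        """Product of two covariance functions is a covariance function."""
        return lambda X, X2: k1(X, X2) * k2(X, X2)

    def k_scale(kf, g):
        """Covariance of h(x) = f(x) g(x) for deterministic g:
        k_h(x, x') = g(x) k_f(x, x') g(x')."""
        return lambda X, X2: g(X)[:, None] * kf(X, X2) * g(X2)[None, :]

    # example with a linear kernel k(x, x') = x x' on 1-d inputs
    k_lin = lambda X, X2: np.outer(X, X2)
    k = k_sum(k_lin, k_prod(k_lin, k_lin))
    X = np.linspace(-1.0, 1.0, 4)
    print(k(X, X))   # 4 x 4 covariance matrix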

Covariance Functions

MLP Covariance Function

  k(x, x′) = α asin( (wx⊤x′ + b) / ( √(wx⊤x + b + 1) √(wx′⊤x′ + b + 1) ) )

- Based on infinite neural network model.
- Example parameters: w = 40, b = 4.
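A sketch of this covariance in numpy, with the slide's w = 40, b = 4 as defaults (the function name is mine):

    import numpy as np

    def mlp_kernel(X, X2, alpha=1.0, w=40.0, b=4.0):
        """MLP (arcsin) covariance; rows of X and X2 are input points."""
        inner = w * X @ X2.T + b
        nx = np.sqrt(w * np.sum(X * X, axis=1) + b + 1.0)
        nx2 = np.sqrt(w * np.sum(X2 * X2, axis=1) + b + 1.0)
        return alpha * np.arcsin(inner / (nx[:, None] * nx2[None, :]))

    X = np.linspace(-1.0, 1.0, 5)[:, None]
    print(mlp_kernel(X, X))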

Covariance Functions

Linear Covariance Function

  k(x, x′) = αx⊤x′

- Bayesian linear regression.
- Example parameter: α = 1.

Gaussian Process Interpolation

[Figure: sequence of plots of f(x) against x, a GP interpolating the training points one by one. Real example: BACCO (see e.g. Oakley and O'Hagan, 2002). Interpolation through outputs from slow computer simulations (e.g. atmospheric carbon levels).]

Gaussian Noise

- Gaussian noise model,

  p(yᵢ|fᵢ) = N(yᵢ|fᵢ, σ²)

  where σ² is the variance of the noise.
- Equivalent to a covariance function of the form

  k(xᵢ, xⱼ) = δᵢ,ⱼσ²

  where δᵢ,ⱼ is the Kronecker delta function.
- Additive nature of Gaussians means we can simply add this term to existing covariance matrices.
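In code the noise model is just a diagonal term added to a precomputed kernel matrix (illustrative sketch):

    import numpy as np

    def add_noise(K, sigma2):
        """Gaussian noise contributes sigma^2 * delta_ij, i.e. a diagonal
        term on the n x n kernel matrix K."""
        return K + sigma2 * np.eye(K.shape[0])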

Gaussian Process Regression

[Figure: sequence of plots of y(x) against x, a GP regression fit with Gaussian noise. Examples include WiFi localization and the C14 calibration curve.]

General Noise Models

Graph of a GP:

- Relates input variables, X, to vector, y, through f given kernel parameters θ.
- Plate notation indicates independence of yᵢ|fᵢ.
- In general p(yᵢ|fᵢ) is non-Gaussian.
- We approximate with a Gaussian, p(yᵢ|fᵢ) ≈ N(mᵢ|fᵢ, βᵢ⁻¹).

[Figure: the Gaussian process depicted graphically, with a plate over i = 1 … n containing fᵢ and yᵢ, and X and θ as parents of fᵢ.]

Gaussian Noise

[Figure: inclusion of a data point with Gaussian noise, showing the prior prediction p(f∗|X, x∗, y), the likelihood p(y∗ = 0.6|f∗), and the updated posterior p(f∗|X, x∗, y, y∗).]

Expectation Propagation

Local Moment Matching

- Easiest to consider a single previously unseen data point, y∗, x∗.
- Before seeing data point, prediction of f∗ is a GP, q(f∗|y, X).
- Update prediction using Bayes' rule,

  p(f∗|y, y∗, X, x∗) = p(y∗|f∗) p(f∗|y, X, x∗) / p(y, y∗|X, x∗).

  This posterior is not a Gaussian process if p(y∗|f∗) is non-Gaussian.

Classification Noise Model

Probit Noise Model

[Figure: the probit model (classification). The plot shows p(yᵢ|fᵢ) against fᵢ for yᵢ = −1 and yᵢ = 1.] For yᵢ = 1 we have

  p(yᵢ|fᵢ) = φ(fᵢ) = ∫₋∞^fᵢ N(z|0, 1) dz.
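A sketch of this likelihood using the standard normal cdf from scipy (for labels yᵢ ∈ {−1, +1}, p(yᵢ|fᵢ) = φ(yᵢ fᵢ) covers both classes):

    import numpy as np
    from scipy.stats import norm

    def probit_likelihood(y, f):
        """p(y_i | f_i) = Phi(y_i * f_i) for labels y_i in {-1, +1}."""
        return norm.cdf(y * f)

    f = np.linspace(-4.0, 4.0, 9)
    print(probit_likelihood(1, f))    # rises from ~0 to ~1 as f increases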

Expectation Propagation II

Match Moments

- Idea behind EP: approximate with a Gaussian process at this stage by matching moments.
- This is equivalent to minimizing the following KL divergence, where q(f∗|y, y∗, X, x∗) is constrained to be a GP:

  q(f∗|y, y∗, X, x∗) = argmin_q KL( p(f∗|y, y∗, X, x∗) || q(f∗|y, y∗, X, x∗) ).

- This is equivalent to setting

  ⟨f∗⟩_q(f∗|y,y∗,X,x∗) = ⟨f∗⟩_p(f∗|y,y∗,X,x∗)

  ⟨f∗²⟩_q(f∗|y,y∗,X,x∗) = ⟨f∗²⟩_p(f∗|y,y∗,X,x∗)
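For the probit likelihood the moments of the tilted distribution p(y∗|f∗)N(f∗|μ, σ²) are available in closed form (the standard result, e.g. Rasmussen and Williams, eq. 3.58); a sketch:

    import numpy as np
    from scipy.stats import norm

    def probit_moments(mu, var, y):
        """Mean and variance of Phi(y f) N(f | mu, var) / Z, the Gaussian
        that moment-matches the tilted distribution."""
        z = y * mu / np.sqrt(1.0 + var)
        ratio = norm.pdf(z) / norm.cdf(z)
        mu_new = mu + y * var * ratio / np.sqrt(1.0 + var)
        var_new = var - var**2 * ratio * (z + ratio) / (1.0 + var)
        return mu_new, var_new

    print(probit_moments(0.0, 1.0, y=1))   # observing y* = 1 shifts the mean up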

Expectation Propagation III

Equivalent Gaussian

- This is achieved by replacing p(y∗|f∗) with a Gaussian distribution:

  p(f∗|y, y∗, X, x∗) = p(y∗|f∗) p(f∗|y, X, x∗) / p(y, y∗|X, x∗)

  becomes

  q(f∗|y, y∗, X, x∗) = N(m∗|f∗, βₘ⁻¹) p(f∗|y, X, x∗) / p(y, y∗|X, x∗).

Classification

[Figure: an EP style update with a classification noise model, showing the prior prediction p(f∗|X, x∗, y), the likelihood p(y∗ = 1|f∗), the non-Gaussian posterior p(f∗|X, x∗, y, y∗), and the Gaussian approximation q(f∗|X, x∗, y).]

Ordinal Noise Model

Ordered Categories

[Figure: the ordered categorical noise model (ordinal regression). The plot shows p(yᵢ|fᵢ) against fᵢ for yᵢ = −1, yᵢ = 0 and yᵢ = 1. Here we have assumed three categories.]

Laplace Approximation

- Equivalent Gaussian is found by making a local 2nd order Taylor approximation at the mode.
- Laplace was the first to suggest this, so it's known as the Laplace approximation.

Learning Covariance Parameters

Can we determine covariance parameters from the data?

  N(y|0, K) = (1 / ((2π)^{n/2} |K|^{1/2})) exp( −y⊤K⁻¹y / 2 )

Taking the log,

  log N(y|0, K) = −(1/2) log|K| − y⊤K⁻¹y/2 − (n/2) log 2π,

so (up to a constant) the negative log likelihood gives the objective

  E(θ) = (1/2) log|K| + y⊤K⁻¹y/2.

The parameters are inside the covariance function (matrix),

  kᵢ,ⱼ = k(xᵢ, xⱼ; θ).
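A sketch of this objective in numpy, using a Cholesky factorisation rather than an explicit inverse (the function name is mine; the constant term is dropped as above):

    import numpy as np

    def gp_objective(K, y):
        """E(theta) = 0.5 log|K| + 0.5 y' K^{-1} y, via Cholesky."""
        L = np.linalg.cholesky(K)                              # K = L L'
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # alpha = K^{-1} y
        return np.sum(np.log(np.diag(L))) + 0.5 * y @ alpha    # log|K| = 2 sum log L_ii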

Eigendecomposition of Covariance

A useful decomposition for understanding the objective function:

  K = RΛ²R⊤

where Λ is a diagonal matrix and R⊤R = I. The diagonal of Λ represents distance along the axes; R gives a rotation of these axes.

Useful representation since |K| = |Λ²| = |Λ|².
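A quick numerical check of this identity (numpy's eigh returns K = R diag(eigenvalues) R⊤, so Λ is the square root of the eigenvalues):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))
    K = A @ A.T + 1e-6 * np.eye(3)        # symmetric positive definite
    eigval, R = np.linalg.eigh(K)
    Lam = np.diag(np.sqrt(eigval))        # so K = R Lam^2 R'
    assert np.allclose(R @ Lam**2 @ R.T, K)
    assert np.isclose(np.linalg.det(K), np.linalg.det(Lam)**2)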

Capacity control: log |K|

[Figure: build-up showing |Λ| as the volume of the ellipsoid whose axes are the diagonal entries of Λ.]

In two dimensions,

  Λ = [λ1 0; 0 λ2],  |Λ| = λ1λ2.

In three dimensions,

  Λ = diag(λ1, λ2, λ3),  |Λ| = λ1λ2λ3.

A rotation R = [w1,1 w1,2; w2,1 w2,2] leaves the determinant unchanged:

  |RΛ| = λ1λ2.

Data Fit: y⊤K⁻¹y/2

[Figure: contours of the Gaussian over (y1, y2) for three settings of the eigenvalues λ1, λ2.]

Learning Covariance Parameters

Can we determine length scales and noise levels from the data?

[Figure: a sequence of GP fits y(x) against x for different length scales, alongside the objective plotted against length scale ℓ from 10⁻¹ to 10¹.]

  E(θ) = (1/2) log|K| + y⊤K⁻¹y/2
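A sketch of scanning this objective over length scales for a toy data set (exponentiated quadratic covariance plus a small noise term; all names and values here are illustrative):

    import numpy as np

    def eq_kernel(x, x2, alpha=1.0, ell=1.0):
        """Exponentiated quadratic covariance on 1-d inputs."""
        return alpha * np.exp(-0.5 * (x[:, None] - x2[None, :])**2 / ell**2)

    rng = np.random.default_rng(0)
    x = np.linspace(-2.0, 2.0, 30)
    y = np.sin(2.0 * x) + 0.1 * rng.standard_normal(30)

    for ell in [0.05, 0.5, 5.0]:
        K = eq_kernel(x, x, ell=ell) + 0.01 * np.eye(30)   # kernel plus noise
        L = np.linalg.cholesky(K)
        E = np.sum(np.log(np.diag(L))) + 0.5 * y @ np.linalg.solve(L.T, np.linalg.solve(L, y))
        print(f"length scale {ell:4.2f}: E(theta) = {E:.2f}")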

Gene Expression Example

- Given expression levels in the form of a time series from Della Gatta et al. (2008).
- Want to detect if a gene is expressed or not; fit a GP to each gene (Kalaitzis and Lawrence, 2011).

[Slide shows the first page of the paper: A Simple Approach to Ranking Differentially Expressed Gene Expression Time Courses through Gaussian Process Regression, Alfredo A. Kalaitzis and Neil D. Lawrence, BMC Bioinformatics 2011, 12:180. Abstract:]

Background: The analysis of gene expression from time series underpins many biological studies. Two basic forms of analysis recur for data of this type: removing inactive (quiet) genes from the study and determining which genes are differentially expressed. Often these analysis stages are applied disregarding the fact that the data is drawn from a time series. In this paper we propose a simple model for accounting for the underlying temporal nature of the data based on a Gaussian process.

Results: We review Gaussian process (GP) regression for estimating the continuous trajectories underlying gene expression time series. We present a simple approach which can be used to filter quiet genes, or, for the case of time series in the form of expression ratios, quantify differential expression. We assess via ROC curves the rankings produced by our regression framework and compare them to a recently proposed hierarchical Bayesian model for the analysis of gene expression time series (BATS). We compare on both simulated and experimental data, showing that the proposed approach considerably outperforms the current state of the art.

Conclusions: Gaussian processes offer an attractive trade-off between efficiency and usability for the analysis of microarray time series. The Gaussian process framework offers a natural way of handling biological replicates and missing values and provides confidence intervals along the estimated curves of gene expression. Therefore, we believe Gaussian processes should be a standard tool in the analysis of gene expression time series.

[Figure: contour plot of the Gaussian process likelihood over log10 SNR and log10 length scale, shown alongside GP fits y(x) at three local optima.]

- Optimum 1: length scale of 1.2221 and log10 SNR of 1.9654; log likelihood is −0.22317.
- Optimum 2: length scale of 1.5162 and log10 SNR of 0.21306; log likelihood is −0.23604.
- Optimum 3: length scale of 2.9886 and log10 SNR of −4.506; log likelihood is −2.1056.


Limitations of Gaussian Processes

- Inference is O(n³) due to the matrix inverse (in practice use Cholesky).
- Gaussian processes don't deal well with discontinuities (financial crises, phosphorylation, collisions, edges in images).
- Widely used exponentiated quadratic covariance (RBF) can be too smooth in practice (but there are many alternatives!).
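A sketch of GP prediction using a Cholesky factorisation rather than an explicit inverse, as the first bullet suggests (function name and noise default are mine):

    import numpy as np

    def gp_predict(K, K_star, k_ss, y, noise=1e-2):
        """Posterior mean and variance at test points. K is n x n,
        K_star is n_test x n, k_ss the prior variances at test points.
        One O(n^3) Cholesky, then O(n^2) work per test point."""
        L = np.linalg.cholesky(K + noise * np.eye(len(y)))
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        mean = K_star @ alpha
        v = np.linalg.solve(L, K_star.T)
        var = k_ss - np.sum(v**2, axis=0)
        return mean, var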


Simple Markov Chain

- Assume a 1-d latent state, a vector over time, x = [x1 … xT].
- Markov property,

  xᵢ = xᵢ₋₁ + εᵢ,  εᵢ ∼ N(0, α)  ⟹  xᵢ ∼ N(xᵢ₋₁, α)

- Initial state, x0 ∼ N(0, α0).
- If x0 ∼ N(0, α) we have a Markov chain for the latent states.
- The Markov chain is specified by an initial distribution (Gaussian) and a transition distribution (Gaussian).

Gauss Markov Chain

[Figure: a sampled path x against t for t = 0, …, 9.] With x0 = 0 and εᵢ ∼ N(0, 1), one sampled path:

  x1 = 0.000 − 2.24 = −2.24
  x2 = −2.24 + 0.457 = −1.78
  x3 = −1.78 + 0.178 = −1.6
  x4 = −1.6 − 0.292 = −1.89
  x5 = −1.89 − 0.501 = −2.39
  x6 = −2.39 + 1.32 = −1.08
  x7 = −1.08 + 0.989 = −0.0881
  x8 = −0.0881 − 0.842 = −0.93
  x9 = −0.93 − 0.410 = −1.34
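A sketch reproducing a draw from this chain (the particular values depend on the seed and will not match the slide's draw):

    import numpy as np

    def sample_markov_chain(T=9, alpha=1.0, x0=0.0, seed=0):
        """Sample x_i = x_{i-1} + eps_i with eps_i ~ N(0, alpha)."""
        rng = np.random.default_rng(seed)
        eps = rng.normal(0.0, np.sqrt(alpha), size=T)
        return np.concatenate(([x0], x0 + np.cumsum(eps)))

    print(sample_markov_chain())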

Multivariate Gaussian Properties: Reminder

If

  z ∼ N(μ, C)  and  x = Wz + b

then

  x ∼ N(Wμ + b, WCW⊤).

Multivariate Gaussian Properties: Reminder

Simplified: if

  z ∼ N(0, σ²I)  and  x = Wz

then

  x ∼ N(0, σ²WW⊤).
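A quick empirical check of the simplified property (illustrative numpy sketch):

    import numpy as np

    # z ~ N(0, sigma^2 I), x = W z  =>  cov(x) ~= sigma^2 W W'
    rng = np.random.default_rng(0)
    W = rng.standard_normal((3, 5))
    sigma2 = 0.5
    z = rng.normal(0.0, np.sqrt(sigma2), size=(5, 200000))
    x = W @ z
    print(np.max(np.abs(np.cov(x) - sigma2 * W @ W.T)))   # close to zero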

Matrix Representation of Latent Variables

Stacking the chain into a matrix equation, x = L1ε with L1 the lower triangular matrix of ones,

  L1 = [1 0 0 0 0; 1 1 0 0 0; 1 1 1 0 0; 1 1 1 1 0; 1 1 1 1 1],

so that, row by row,

  x1 = ε1
  x2 = ε1 + ε2
  x3 = ε1 + ε2 + ε3
  x4 = ε1 + ε2 + ε3 + ε4
  x5 = ε1 + ε2 + ε3 + ε4 + ε5.
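A one-line construction of L1 and a check that x = L1ε is the cumulative sum of the noise (numpy sketch):

    import numpy as np

    T = 5
    L1 = np.tril(np.ones((T, T)))              # lower-triangular matrix of ones
    eps = np.random.default_rng(0).standard_normal(T)
    x = L1 @ eps
    print(np.allclose(x, np.cumsum(eps)))      # x_i = eps_1 + ... + eps_i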

Multivariate Process

- Since x is linearly related to ε we know x is a Gaussian process.
- Trick: we only need to compute the mean and covariance of x to determine that Gaussian.

Latent Process Mean

With x = L1ε and ε ∼ N(0, αI),

  ⟨x⟩ = ⟨L1ε⟩ = L1⟨ε⟩ = L1 · 0 = 0.

Latent Process Covariance

Since x = L1ε implies x⊤ = ε⊤L1⊤, and ε ∼ N(0, αI),

  ⟨xx⊤⟩ = ⟨L1εε⊤L1⊤⟩ = L1⟨εε⊤⟩L1⊤ = αL1L1⊤.

Latent Process

  x = L1ε,  ε ∼ N(0, αI)  ⟹  x ∼ N(0, αL1L1⊤)

Covariance for Latent Process II

- Make the variance dependent on the time interval.
- Assume variance grows linearly with time.
- Justification: the sum of two Gaussian distributed random variables is distributed as a Gaussian with the sum of the variances.
- If the variable's movement is additive over time (as described), the variance scales linearly with time.

Covariance for Latent Process II

- Given

  ε ∼ N(0, αI)  ⟹  x ∼ N(0, αL1L1⊤),

  then

  ε ∼ N(0, ∆tαI)  ⟹  x ∼ N(0, ∆tαL1L1⊤),

  where ∆t is the time interval between observations.

Covariance for Latent Process II

  ε ∼ N(0, α∆tI),  x ∼ N(0, α∆tL1L1⊤)

  K = α∆tL1L1⊤

  kᵢ,ⱼ = α∆t l:,ᵢ⊤l:,ⱼ

where l:,ₖ is a vector from the kth row of L1: the first k elements are one, the next T − k are zero. Hence

  kᵢ,ⱼ = α∆t min(i, j).

Defining tᵢ = i∆t gives

  kᵢ,ⱼ = α min(tᵢ, tⱼ) = k(tᵢ, tⱼ).
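A sketch of this covariance (the Brownian motion kernel) and sampling from it (names are mine; the jitter term is for numerical stability):

    import numpy as np

    def brownian_kernel(t, t2, alpha=1.0):
        """Markov process covariance k(t, t') = alpha * min(t, t')."""
        return alpha * np.minimum(t[:, None], t2[None, :])

    t = np.linspace(0.05, 2.0, 100)
    K = brownian_kernel(t, t) + 1e-9 * np.eye(100)   # jitter
    samples = np.random.default_rng(0).multivariate_normal(np.zeros(100), K, size=3)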

Covariance Functions

Where did this covariance matrix come from?

Markov Process

  k(t, t′) = α min(t, t′)

- Covariance matrix is built using the inputs to the function, t.

[Figure: sample paths drawn from this covariance.]

Covariance Functions
Where did this covariance matrix come from?

Markov Process

Visualization of the inverse covariance (precision).

I The precision matrix is sparse: only neighbours in the matrix are non-zero.
I This reflects conditional independencies in the data.
I In this case, Markov structure.
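A quick numerical illustration of this sparsity (a sketch under the assumption of an evenly spaced grid, t_i = iΔt): invert the Brownian motion covariance and check that everything beyond the first off-diagonals is zero.

```python
import numpy as np

alpha, T = 1.0, 8
t = np.arange(1, T + 1, dtype=float)        # t_i = i * Delta_t, with Delta_t = 1
K = alpha * np.minimum.outer(t, t)
P = np.linalg.inv(K)                        # the precision matrix

# Look only outside the tridiagonal band; what remains should be round-off noise.
mask = np.abs(np.subtract.outer(np.arange(T), np.arange(T))) > 1
print(np.max(np.abs(P[mask])))              # ~1e-15: only neighbours interact
```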

Covariance Functions
Where did this covariance matrix come from?

Exponentiated Quadratic Kernel Function (RBF, Squared Exponential, Gaussian)

k(x, x′) = α exp(−‖x − x′‖₂² / (2ℓ²))

I The covariance matrix is built using the inputs to the function, x.
I For the example above it was based on Euclidean distance.
I The covariance function is also known as a kernel.
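A minimal implementation sketch of this covariance function (parameter values are illustrative):

```python
import numpy as np
from scipy.spatial.distance import cdist

def exponentiated_quadratic(X, X2, alpha=1.0, lengthscale=1.0):
    """k(x, x') = alpha * exp(-||x - x'||^2 / (2 * lengthscale^2))."""
    sqdist = cdist(X, X2, metric="sqeuclidean")
    return alpha * np.exp(-0.5 * sqdist / lengthscale**2)

X = np.linspace(0, 2, 5)[:, None]          # inputs as an (n, 1) array
print(exponentiated_quadratic(X, X, alpha=1.0, lengthscale=0.5).round(3))
```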


Covariance Functions
Where did this covariance matrix come from?

Exponentiated Quadratic

Visualization of the inverse covariance (precision).

I The precision matrix is not sparse.
I Each point is dependent on all the others.
I In this case, non-Markovian.


Simple Kalman Filter I

I We have a state matrix X = [x_{:,1} . . . x_{:,q}] ∈ R^{T×q}, and if each state dimension evolves independently we have

p(X) = ∏_{i=1}^{q} p(x_{:,i}),   p(x_{:,i}) = N(x_{:,i} | 0, K).

I We want to obtain outputs through:

y_{i,:} = W x_{i,:}

Stacking and Kronecker Products I

I Represent with a 'stacked' system:

p(x) = N(x | 0, I ⊗ K),

where the stacking places each column of X one on top of another:

x = [x_{:,1}; x_{:,2}; … ; x_{:,q}]

Kronecker Product

[a b; c d] ⊗ K = [aK bK; cK dK]
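NumPy computes this directly; a small sketch with illustrative values:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
K = np.array([[2.0, 1.0],
              [1.0, 2.0]])
print(np.kron(A, K))   # each entry of A scales a full copy of K
# [[2. 1. 4. 2.]
#  [1. 2. 2. 4.]
#  [6. 3. 8. 4.]
#  [3. 6. 4. 8.]]
```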



Column Stacking

[Figure: the covariance I ⊗ K for the column-stacked system is block diagonal, with K repeated in each of the q blocks.]

For this stacking the marginal distribution over time is given by the block diagonals.

Two Ways of Stacking

Can also stack each row of X to form a column vector:

x = [x_{1,:}; x_{2,:}; … ; x_{T,:}]

p(x) = N(x | 0, K ⊗ I)
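The two orderings are related by a permutation (the commutation matrix). A minimal numerical check, with illustrative sizes:

```python
import numpy as np

T, q = 4, 2
t = np.arange(1, T + 1, dtype=float)
K = np.minimum.outer(t, t)                  # any valid covariance over time

cov_col = np.kron(np.eye(q), K)             # column stacking: I (x) K
cov_row = np.kron(K, np.eye(q))             # row stacking: K (x) I

# P maps the row-stacked ordering to the column-stacked one.
P = np.zeros((T * q, T * q))
for i in range(T):
    for j in range(q):
        P[j * T + i, i * q + j] = 1.0
print(np.allclose(cov_col, P @ cov_row @ P.T))   # True
```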

Row Stacking

[Figure: the covariance K ⊗ I for the row-stacked system; each entry k_{i,j} of K scales a q × q identity block.]

For this stacking the marginal distribution over the latent dimensions is given by the block diagonals.

Observed Process

The observations are related to the latent points by a linear mapping matrix:

y_{i,:} = W x_{i,:} + ε_{i,:},   ε ∼ N(0, σ²I).

Mapping from Latent Process to Observed

[W 0 0; 0 W 0; 0 0 W] × [x_{1,:}; x_{2,:}; x_{3,:}] = [W x_{1,:}; W x_{2,:}; W x_{3,:}]
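This block-diagonal map is exactly I ⊗ W applied to the row-stacked latent vector; a quick check (sizes are illustrative):

```python
import numpy as np
from scipy.linalg import block_diag

T, q, p = 3, 2, 3
rng = np.random.default_rng(0)
W = rng.standard_normal((p, q))
x = rng.standard_normal(T * q)              # [x_{1,:}; x_{2,:}; x_{3,:}] stacked

y_block = block_diag(W, W, W) @ x
y_kron = np.kron(np.eye(T), W) @ x
print(np.allclose(y_block, y_kron))         # True
```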

Output Covariance

This leads to a covariance of the form

(I ⊗ W)(K ⊗ I)(I ⊗ Wᵀ) + σ²I.

Using (A ⊗ B)(C ⊗ D) = AC ⊗ BD this leads to

K ⊗ WWᵀ + σ²I,

or, under the column-stacked ordering,

y ∼ N(0, WWᵀ ⊗ K + σ²I).
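The mixed-product property can be verified numerically; a small sketch with illustrative sizes:

```python
import numpy as np

T, q, p = 4, 2, 3
rng = np.random.default_rng(0)
W = rng.standard_normal((p, q))
t = np.arange(1, T + 1, dtype=float)
K = np.minimum.outer(t, t)

lhs = np.kron(np.eye(T), W) @ np.kron(K, np.eye(q)) @ np.kron(np.eye(T), W.T)
rhs = np.kron(K, W @ W.T)
print(np.allclose(lhs, rhs))   # True: (I (x) W)(K (x) I)(I (x) W^T) = K (x) WW^T
```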

Kernels for Vector Valued Outputs: A Review

[Cover and contents of the review: M. A. Alvarez, L. Rosasco and N. D. Lawrence. "Kernels for Vector-Valued Functions: A Review". Foundations and Trends in Machine Learning, 4(3):195–266, 2012. DOI: 10.1561/2200000036.]

Kronecker Structure GPs

I This Kronecker structure leads to several published models:

(K(x, x′))_{d,d′} = k(x, x′) k_T(d, d′),

where k takes the input x and k_T takes the output index d.
I Can think of multiple output covariance functions as covariances with augmented input.
I Alongside x we also input the d associated with the output of interest.

Separable Covariance Functions

I Taking B = WWᵀ we have a matrix expression across outputs:

K(x, x′) = k(x, x′) B,

where B is a p × p symmetric and positive semi-definite matrix.
I B is called the coregionalization matrix.
I We call this class of covariance functions separable due to their product structure.

Sum of Separable Covariance Functions

I In the same spirit a more general class of kernels is given by

K(x, x′) = ∑_{j=1}^{q} k_j(x, x′) B_j.

I This can also be written as

K(X, X) = ∑_{j=1}^{q} B_j ⊗ k_j(X, X).

I This is like several Kalman filter-type models added together, but each one with a different set of latent functions.
I We call this class of kernels sum of separable kernels (SoS kernels).
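A minimal sketch of building a SoS kernel matrix. The coregionalization matrices and length-scales below are taken from the LMC example later in the slides; the exponentiated quadratic choice for each k_j is an assumption for illustration:

```python
import numpy as np
from scipy.spatial.distance import cdist

def eq_kernel(X, lengthscale):
    """Exponentiated quadratic kernel with unit variance."""
    return np.exp(-0.5 * cdist(X, X, "sqeuclidean") / lengthscale**2)

def sos_kernel(X, Bs, lengthscales):
    """Sum of separable kernels: K(X, X) = sum_j B_j (x) k_j(X, X)."""
    return sum(np.kron(B, eq_kernel(X, ell)) for B, ell in zip(Bs, lengthscales))

X = np.linspace(0, 1, 50)[:, None]
B1 = np.array([[1.4, 0.5], [0.5, 1.2]])
B2 = np.array([[1.0, 0.5], [0.5, 1.3]])
K = sos_kernel(X, [B1, B2], [1.0, 0.2])
print(K.shape)   # (100, 100): two outputs at 50 inputs each
```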

Geostatistics

I Use of GPs in geostatistics is called kriging.
I These multi-output GPs were pioneered in geostatistics: prediction over vector-valued output data is known as cokriging.
I The model in geostatistics is known as the linear model of coregionalization (LMC, Journel and Huijbregts (1978); Goovaerts (1997)).
I Most machine learning multitask models can be placed in the context of the LMC model.

Weighted Sum of Latent Functions

I In the linear model of coregionalization (LMC) outputs are expressed as linear combinations of independent random functions.
I In the LMC, each component f_d is expressed as a linear sum

f_d(x) = ∑_{j=1}^{q} w_{d,j} u_j(x),

where the latent functions are independent and have covariance functions k_j(x, x′).
I The processes {u_j(x)}_{j=1}^{q} are independent for j ≠ j′.

Kalman Filter Special Case

I The Kalman filter is an example of the LMC where u_i(x) → x_i(t).
I I.e. we've moved from a time input to a more general input space.
I In matrix notation:

  1. Kalman filter: F = WX
  2. LMC: F = WU

  where the rows of the matrices F, X, U each contain q samples from their corresponding functions at a different time (Kalman filter) or spatial location (LMC).

Intrinsic Coregionalization Model

I If one covariance is used for the latent functions (as in the Kalman filter).
I This is called the intrinsic coregionalization model (ICM, Goovaerts (1997)).
I The kernel matrix corresponding to a dataset X takes the form

K(X, X) = B ⊗ k(X, X).
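A minimal sketch of sampling from an ICM covariance, using the rank-1 B = wwᵀ that appears on the following slides (the exponentiated quadratic k and its length-scale are illustrative assumptions):

```python
import numpy as np
from scipy.spatial.distance import cdist

X = np.linspace(0, 1, 100)[:, None]
k = np.exp(-0.5 * cdist(X, X, "sqeuclidean") / 0.2**2)   # EQ kernel, lengthscale 0.2
w = np.array([[1.0], [5.0]])
B = w @ w.T                                              # [[1, 5], [5, 25]]

K = np.kron(B, k)
L = np.linalg.cholesky(K + 1e-6 * np.eye(K.shape[0]))    # jitter: rank-1 B makes K singular
f = L @ np.random.randn(K.shape[0])
f1, f2 = f[:100], f[100:]      # with rank-1 B the outputs are fully correlated: f2 ~= 5 * f1
print(f1.shape, f2.shape)
```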

Autokrigeability

I If outputs are noise-free, maximum likelihood is equivalent to independent fits of B and k(x, x′) (Helterbrand and Cressie, 1994).
I In geostatistics this is known as autokrigeability (Wackernagel, 2003).
I In multitask learning it is the cancellation of inter-task transfer (Bonilla et al., 2008).

Intrinsic Coregionalization Model

K(X, X) = wwᵀ ⊗ k(X, X)

w = [1; 5],   B = wwᵀ = [1 5; 5 25]


Intrinsic Coregionalization Model

K(X, X) = B ⊗ k(X, X)

B = [1 0.5; 0.5 1.5]


LMC Samples

K(X, X) = B₁ ⊗ k₁(X, X) + B₂ ⊗ k₂(X, X)

B₁ = [1.4 0.5; 0.5 1.2],   ℓ₁ = 1
B₂ = [1 0.5; 0.5 1.3],     ℓ₂ = 0.2


LMC in Machine Learning and Statistics

I Used in machine learning for GPs for multivariate regression and in statistics for computer emulation of expensive multivariate computer codes.
I Imposes the correlation of the outputs explicitly through the set of coregionalization matrices.
I Setting B = I_p assumes the outputs are conditionally independent given the parameters θ (Minka and Picard, 1997; Lawrence and Platt, 2004; Yu et al., 2005).
I More recent approaches for multiple output modeling are different versions of the linear model of coregionalization.

Semiparametric Latent Factor Model

I Coregionalization matrices are rank 1 (Teh et al., 2005). Rewriting the sum of separable kernels above:

K(X, X) = ∑_{j=1}^{q} w_{:,j} w_{:,j}ᵀ ⊗ k_j(X, X).

I Like the Kalman filter, but each latent function has a different covariance.
I The authors suggest using an exponentiated quadratic with a characteristic length-scale for each input dimension.

Semiparametric Latent Factor Model Samples

K(X, X) = w_{:,1} w_{:,1}ᵀ ⊗ k₁(X, X) + w_{:,2} w_{:,2}ᵀ ⊗ k₂(X, X)

w_{:,1} = [0.5; 1],   w_{:,2} = [1; 0.5]


Gaussian Processes for Multi-task, Multi-output and Multi-class

I Bonilla et al. (2008) suggest the ICM for multitask learning.
I Use a PPCA form for B: similar to our Kalman filter example.
I Refer to the autokrigeability effect as the cancellation of inter-task transfer.
I Also discuss the similarities between the multi-task GP and the ICM, and its relationship to the SLFM and the LMC.

Multitask Classification

I Mostly restricted to the case where the outputs are conditionally independent given the hyperparameters φ (Minka and Picard, 1997; Williams and Barber, 1998; Lawrence and Platt, 2004; Seeger and Jordan, 2004; Yu et al., 2005; Rasmussen and Williams, 2006).
I The intrinsic coregionalization model has been used in the multiclass scenario. Skolidis and Sanguinetti (2011) use the intrinsic coregionalization model for classification, by introducing a probit noise model as the likelihood.
I The posterior distribution is no longer analytically tractable: approximate inference is required.

Computer Emulation

I A statistical model used as a surrogate for a computationally expensive computer model.
I Higdon et al. (2008) use the linear model of coregionalization to model images representing the evolution of the implosion of steel cylinders.
I Conti and O'Hagan (2009) use the ICM to model a vegetation model: the Sheffield Dynamic Global Vegetation Model (Woodward et al., 1998).


References I

E. V. Bonilla, K. M. Chai, and C. K. I. Williams. Multi-task Gaussian process prediction. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems, volume 20, Cambridge, MA, 2008. MIT Press.

S. Conti and A. O'Hagan. Bayesian emulation of complex multi-output and dynamic computer models. Journal of Statistical Planning and Inference, 140(3):640–651, 2009.

G. Della Gatta, M. Bansal, A. Ambesi-Impiombato, D. Antonini, C. Missero, and D. di Bernardo. Direct targets of the trp63 transcription factor revealed by a combination of gene expression profiling and reverse engineering. Genome Research, 18(6):939–948, Jun 2008.

P. Goovaerts. Geostatistics For Natural Resources Evaluation. Oxford University Press, 1997.

J. D. Helterbrand and N. A. C. Cressie. Universal cokriging under intrinsic coregionalization. Mathematical Geology, 26(2):205–226, 1994.

D. M. Higdon, J. Gattiker, B. Williams, and M. Rightley. Computer model calibration using high dimensional output. Journal of the American Statistical Association, 103(482):570–583, 2008.

A. G. Journel and C. J. Huijbregts. Mining Geostatistics. Academic Press, London, 1978.

A. A. Kalaitzis and N. D. Lawrence. A simple approach to ranking differentially expressed gene expression time courses through Gaussian process regression. BMC Bioinformatics, 12(180), 2011.

N. D. Lawrence and J. C. Platt. Learning to learn with the informative vector machine. In R. Greiner and D. Schuurmans, editors, Proceedings of the International Conference in Machine Learning, volume 21, pages 512–519. Omnipress, 2004.

T. P. Minka and R. W. Picard. Learning how to learn is learning with point sets. Available on-line, 1997. Revised 1999, available at http://www.stat.cmu.edu/~minka/.

J. Oakley and A. O'Hagan. Bayesian inference for the uncertainty distribution of computer model outputs. Biometrika, 89(4):769–784, 2002.

C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006.

M. Seeger and M. I. Jordan. Sparse Gaussian process classification with multiple classes. Technical Report 661, Department of Statistics, University of California at Berkeley, 2004.

References II

G. Skolidis and G. Sanguinetti. Bayesian multitask classification with Gaussian process priors. IEEE Transactions on Neural Networks, 22(12):2011–2021, 2011.

Y. W. Teh, M. Seeger, and M. I. Jordan. Semiparametric latent factor models. In R. G. Cowell and Z. Ghahramani, editors, Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, pages 333–340, Barbados, 6–8 January 2005. Society for Artificial Intelligence and Statistics.

H. Wackernagel. Multivariate Geostatistics: An Introduction With Applications. Springer-Verlag, 3rd edition, 2003.

C. K. I. Williams and D. Barber. Bayesian classification with Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1342–1351, 1998.

I. Woodward, M. R. Lomas, and R. A. Betts. Vegetation-climate feedbacks in a greenhouse world. Philosophical Transactions: Biological Sciences, 353(1365):29–39, 1998.

K. Yu, V. Tresp, and A. Schwaighofer. Learning Gaussian processes from multiple tasks. In Proceedings of the 22nd International Conference on Machine Learning (ICML 2005), pages 1012–1019, 2005.