TRANSCRIPT

Gaussian Processes: From the Basics to the State-of-the-Art

Dr. Richard E. Turner ([email protected])
Computational and Biological Learning Lab, Department of Engineering, University of Cambridge

with Thang Bui, Yingzhen Li, Jose Miguel Hernandez Lobato, Daniel Hernandez Lobato, Josiah Jan, Alex Navarro, Felipe Tobar, and Maneesh Sahani
Motivation: non-linear regression

[Figure, built up over several slides: observed data points y(x) for x between 1 and 20, with y roughly in the range −2 to 2. The slides ask what value the function takes at an unobserved input ("?") and sketch candidate non-linear fits through the data.]

Can we do this with a plain old Gaussian?
Gaussian distribution

p(y|Σ) ∝ exp(−(1/2) yᵀ Σ⁻¹ y)

[Figure, repeated over several slides: contour plots of the bivariate density over (y1, y2) on [−3, 3]², shown for

Σ = [1 .7; .7 1], [1 .6; .6 1], [1 .4; .4 1], [1 .1; .1 1], [1 .0; .0 1],

and back to Σ = [1 .7; .7 1]. As the off-diagonal correlation shrinks, the elliptical contours relax towards circles.]
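The density on the slide can be evaluated directly; a minimal sketch in NumPy (function and variable names are mine, and the density is left unnormalised exactly as on the slide):

```python
import numpy as np

# Unnormalised log-density from the slide: log p(y|Σ) = -1/2 y^T Σ^{-1} y + const.
def log_density_unnorm(y, Sigma):
    return -0.5 * y @ np.linalg.solve(Sigma, y)

Sigma = np.array([[1.0, 0.7],
                  [0.7, 1.0]])

# With positive correlation, a point where y1 and y2 agree is more probable
# than one of the same length where they disagree.
a = log_density_unnorm(np.array([1.0, 1.0]), Sigma)
b = log_density_unnorm(np.array([1.0, -1.0]), Sigma)
```

This is why the contour ellipses on the slide tilt along the y1 = y2 diagonal for positive correlation.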
Gaussian distribution

p(y2|y1, Σ) ∝ exp(−(1/2)(y2 − µ∗) Σ∗⁻¹ (y2 − µ∗))

[Figure, repeated over several slides: the same correlated bivariate Gaussian, now with y1 observed. Slicing the joint density at the observed y1 gives the conditional over y2: a 1-D Gaussian whose mean µ∗ is shifted towards the observation and whose variance Σ∗ is smaller than the prior variance.]
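The conditioning operation these slides illustrate has a closed form for a zero-mean bivariate Gaussian; a minimal sketch (function names are mine):

```python
import numpy as np

# Joint covariance from the slides: unit variances, correlation 0.7.
Sigma = np.array([[1.0, 0.7],
                  [0.7, 1.0]])

def condition_y2_given_y1(y1, Sigma):
    """Mean and variance of p(y2 | y1) for a zero-mean bivariate Gaussian."""
    s11, s12, s22 = Sigma[0, 0], Sigma[0, 1], Sigma[1, 1]
    mu_star = s12 / s11 * y1           # conditional mean: shifted towards the observation
    var_star = s22 - s12 ** 2 / s11    # conditional variance: smaller than the prior s22
    return mu_star, var_star

mu, var = condition_y2_given_y1(1.0, Sigma)
```

Observing y1 = 1 shifts y2's mean to 0.7 and shrinks its variance from 1 to 0.51, exactly the "slice" behaviour pictured on the slides.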
New visualisation

Σ = [1 .9; .9 1]

[Figure, repeated over many animation frames: the left panel shows samples from the bivariate Gaussian in the (y1, y2) plane; the right panel re-plots each sample against the variable index (1, 2), so a single draw becomes two connected points. With correlation .9 the two values in each draw land close together.]
New visualisation

Σ = [1 .9 .8 .6 .4; .9 1 .9 .8 .6; .8 .9 1 .9 .8; .6 .8 .9 1 .9; .4 .6 .8 .9 1]

[Figure, repeated over many animation frames: samples from this 5-D Gaussian plotted against the variable index 1–5. Because nearby indices are highly correlated and distant ones less so, each draw traces out a smooth curve.]
New visualisation

[Figure, repeated over many animation frames: samples from a 20-dimensional Gaussian plotted against the variable index 1–20, with y on [−3, 3]. Σ is shown as a 20×20 image (axes labelled 1, 10, 20) in which the correlation decays smoothly with index separation, so each draw is a smooth curve over the 20 indices.]
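The index-versus-value plots on these slides are easy to reproduce; a minimal sketch, assuming a squared-exponential form for the pictured covariance (the slide shows Σ only as an image, so the exact matrix and the length-scale of 4 are my assumptions):

```python
import numpy as np

# A smooth 20x20 covariance like the one pictured: correlation decays
# with index separation (squared-exponential form is an assumption).
n, ell = 20, 4.0
idx = np.arange(1, n + 1)
Sigma = np.exp(-0.5 * (idx[:, None] - idx[None, :]) ** 2 / ell ** 2)

rng = np.random.default_rng(0)
# Small jitter keeps the near-singular matrix numerically positive definite.
samples = rng.multivariate_normal(np.zeros(n), Sigma + 1e-9 * np.eye(n), size=3)
# Each row, plotted against the variable index, is one smooth curve
# like those on the slides.
```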
Regression using Gaussians

[Figure, built up over several slides: the 20-index Gaussian again, now with a handful of the indices observed as data; the remaining values are predicted by conditioning.]

Σ(x1, x2) = K(x1, x2) + I σ²_y
K(x1, x2) = σ² e^(−(x1−x2)²/(2l²))
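The two formulas above can be sketched directly in NumPy (function names and the illustrative noise level are mine):

```python
import numpy as np

def se_kernel(x1, x2, sigma=1.0, ell=1.0):
    """Squared-exponential kernel K(x1, x2) = sigma^2 exp(-(x1-x2)^2 / (2 l^2))."""
    d = x1[:, None] - x2[None, :]
    return sigma ** 2 * np.exp(-0.5 * d ** 2 / ell ** 2)

x = np.linspace(1, 20, 20)          # the 20 input locations from the slides
sigma_y = 0.1                       # observation-noise std (illustrative value)
K = se_kernel(x, x, sigma=1.0, ell=2.0)
Sigma = K + sigma_y ** 2 * np.eye(len(x))   # Sigma(x1, x2) = K(x1, x2) + I sigma_y^2
```

The kernel makes the covariance between two outputs a function of the distance between their inputs, which is what turns "correlated indices" into "smooth functions of x".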
Regression: probabilistic inference in function space

Non-parametric (∞-parametric) model:

p(y|θ) = N(0, Σ),  Σ(x1, x2) = K(x1, x2) + I σ²_y,  K(x1, x2) = σ² e^(−(x1−x2)²/(2l²))

Equivalent parametric view:

y(x) = f(x; θ) + σ_y ε,  ε ∼ N(0, 1)

[Figure, built up over several slides: the GP produces a function estimate with uncertainty. Annotations mark σ_y as the noise, l as the horizontal scale, and σ as the vertical scale of the functions.]
Mathematical Foundations: Definition

Gaussian process = generalisation of the multivariate Gaussian distribution to infinitely many variables.

Definition: a Gaussian process is a collection of random variables, any finite number of which have (consistent) Gaussian distributions.

A Gaussian distribution is fully specified by a mean vector, µ, and covariance matrix Σ:

f = (f1, . . . , fn) ∼ N(µ, Σ), indices i = 1, . . . , n

A Gaussian process is fully specified by a mean function m(x) and covariance function K(x, x′):

f(x) ∼ GP(m(x), K(x, x′)), indices x
Mathematical Foundations: Regression

Q1. What's the formal justification for how we were using GPs for regression?

- generative model (like non-linear regression)
- place a GP prior over the non-linear function (smoothly wiggling functions expected)
- a sum of Gaussian variables is Gaussian: this induces a GP over the noisy observations y
Mathematical Foundations: Marginalisation

Q2. A GP is "like" a Gaussian distribution with an infinitely long mean vector and an "infinite by infinite" covariance matrix, so how do we represent it on a computer?

We are saved by the marginalisation property: we only need to represent finite-dimensional projections of the GP on the computer.
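The marginalisation property is what makes this practical: marginalising a Gaussian onto a subset of its variables requires no integration at all, just reading off sub-blocks. A minimal sketch (the 5-variable example and names are mine):

```python
import numpy as np

# Marginalisation property: the marginal over a subset of Gaussian variables
# just reads off the corresponding entries of mu and sub-block of Sigma.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
Sigma = A @ A.T + 5 * np.eye(5)      # a random valid 5x5 covariance
mu = np.zeros(5)

keep = [0, 2]                        # marginalise onto variables 1 and 3
mu_marg = mu[keep]
Sigma_marg = Sigma[np.ix_(keep, keep)]
# No integral needed: this is why an infinite-dimensional GP can be handled
# by only ever representing finite-dimensional projections.
```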
Mathematical Foundations: Prediction

Q3. How do we make predictions?

[The slides build up the standard Gaussian conditioning result, with annotations:]

- predictive mean: linear in the data
- predictive covariance: the prior uncertainty minus a reduction in uncertainty, so predictions are more confident than the prior
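The equations on these slides appear only as images; the standard zero-mean conditioning formulas they annotate are µ∗ = K∗ K⁻¹ y and Σ∗ = K∗∗ − K∗ K⁻¹ K∗ᵀ. A minimal sketch (function names, the squared-exponential kernel, and the toy data are mine):

```python
import numpy as np

def se_kernel(x1, x2, sigma=1.0, ell=1.0):
    d = x1[:, None] - x2[None, :]
    return sigma ** 2 * np.exp(-0.5 * d ** 2 / ell ** 2)

def gp_predict(x_train, y_train, x_test, sigma_y=0.1, **kern):
    """Standard GP regression conditionals (zero prior mean)."""
    K = se_kernel(x_train, x_train, **kern) + sigma_y ** 2 * np.eye(len(x_train))
    Ks = se_kernel(x_test, x_train, **kern)
    Kss = se_kernel(x_test, x_test, **kern)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha                           # predictive mean: linear in the data
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)   # prior uncertainty minus reduction
    return mean, cov

x = np.array([1.0, 3.0, 5.0])
y = np.sin(x)
m, C = gp_predict(x, y, np.array([2.0, 4.0]))
# The posterior variance never exceeds the prior variance at the same input:
# predictions are more confident than the prior.
```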
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f (x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2e−1
2l2(x1−x2)2
160 / 335
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f (x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2e−1
2l2(x1−x2)2
161 / 335
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f (x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2e−1
2l2(x1−x2)2
162 / 335
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f (x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2e−1
2l2(x1−x2)2
163 / 335
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f (x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2e−1
2l2(x1−x2)2
164 / 335
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f (x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2e−1
2l2(x1−x2)2
165 / 335
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f (x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2e−1
2l2(x1−x2)2
166 / 335
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f (x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2e−1
2l2(x1−x2)2
167 / 335
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f (x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2e−1
2l2(x1−x2)2
168 / 335
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f (x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2e−1
2l2(x1−x2)2
169 / 335
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f (x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2e−1
2l2(x1−x2)2
170 / 335
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f (x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2e−1
2l2(x1−x2)2
171 / 335
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f (x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2e−1
2l2(x1−x2)2
172 / 335
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f (x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2e−1
2l2(x1−x2)2
173 / 335
What effect do the hyper-parameters have?

[Figure: samples drawn with a short horizontal length-scale, alongside the covariance matrix Σ]

short horizontal length-scale

Non-parametric (∞-parametric) model:
Σ(x1, x2) = K(x1, x2) + I σ_y²
p(y|θ) = N(0, Σ)
K(x1, x2) = σ² exp(−(x1 − x2)² / (2l²))

Parametric model:
y(x) = f(x; θ) + σ_y ε,  ε ∼ N(0, 1)

181 / 335
What effect do the hyper-parameters have?

[Figure: samples drawn with a long horizontal length-scale, alongside the covariance matrix Σ]

long horizontal length-scale

Non-parametric (∞-parametric) model:
Σ(x1, x2) = K(x1, x2) + I σ_y²
p(y|θ) = N(0, Σ)
K(x1, x2) = σ² exp(−(x1 − x2)² / (2l²))

Parametric model:
y(x) = f(x; θ) + σ_y ε,  ε ∼ N(0, 1)

202 / 335
What effect do the hyper-parameters have?

K(x1, x2) = σ² exp(−(x1 − x2)² / (2l²))

Hyper-parameters have a strong effect:
- l controls the horizontal length-scale
- σ² controls the vertical scale of the data

⟹ need automatic learning of hyper-parameters from data, e.g.

arg max over l, σ² of log p(y|θ)

[Figure: panels of prior samples for l = short, l = medium, and l = long]

223 / 335
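The learning step above can be sketched directly: evaluate log p(y|θ) for a grid of length-scales and keep the maximiser. A minimal sketch under the slide's model (the grid-search approach and all names here are illustrative; in practice the objective is usually optimised with gradients):

```python
import numpy as np

def log_marginal_likelihood(y, x, sigma2, l, sigma2_y=0.1):
    # log p(y | theta) for y ~ N(0, K + I sigma_y^2) under an SE kernel
    d = x[:, None] - x[None, :]
    K = sigma2 * np.exp(-0.5 * d**2 / l**2) + sigma2_y * np.eye(len(x))
    sign, logdet = np.linalg.slogdet(K)
    alpha = np.linalg.solve(K, y)
    return -0.5 * (y @ alpha + logdet + len(x) * np.log(2 * np.pi))

# Generate data from a GP with a known length-scale, then recover it by grid search.
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 60)
d = x[:, None] - x[None, :]
K_true = np.exp(-0.5 * d**2 / 0.7**2) + 0.1 * np.eye(len(x))
y = rng.multivariate_normal(np.zeros(len(x)), K_true)

lengthscales = np.linspace(0.1, 3.0, 50)
lmls = [log_marginal_likelihood(y, x, 1.0, l) for l in lengthscales]
l_hat = lengthscales[int(np.argmax(lmls))]
print(l_hat)  # should land near the true value of 0.7
```

The marginal likelihood automatically trades data fit against model complexity, which is why maximising it gives a sensible length-scale rather than always preferring the most flexible (shortest) one.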
Higher dimensional input spaces

[Figure: a sample y from a GP over a two-dimensional input space (x1, x2), plotted as a surface over a 100 × 100 grid]

225 / 335
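Samples over a two-dimensional input space are generated exactly as in one dimension, with the kernel applied to the Euclidean distance over both coordinates. A sketch — the grid size and length-scale here are chosen for speed, not taken from the slide:

```python
import numpy as np

def se_kernel_2d(X1, X2, sigma2=1.0, l=1.0):
    # Squared distance summed over both input coordinates
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return sigma2 * np.exp(-0.5 * d2 / l**2)

g = np.linspace(0, 1, 20)
X = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)  # 400 grid points in 2-D
K = se_kernel_2d(X, X, l=0.2)
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(X)))  # jitter for numerical stability
y = (L @ np.random.default_rng(0).standard_normal(len(X))).reshape(20, 20)
print(y.shape)  # (20, 20)
```

Reshaping the draw back onto the grid gives a random smooth surface like the one in the figure; the cubic cost of the Cholesky in the number of grid points is what motivates the scalable approximations later in the deck.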
What effect does the form of the covariance function have?

[Figure: samples drawn with the Laplacian covariance function, alongside the covariance matrix Σ]

Laplacian covariance function
(Brownian motion / Ornstein–Uhlenbeck)

K(x1, x2) = σ² exp(−|x1 − x2| / (2l²))

241 / 335
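The qualitative difference from squared-exponential samples — continuous but nowhere-differentiable paths — shows up directly in the size of successive differences along a sample. A sketch using the slide's parameterisation of the kernel:

```python
import numpy as np

def ou_kernel(x1, x2, sigma2=1.0, l=1.0):
    # Laplacian / Ornstein-Uhlenbeck, in the slide's parameterisation:
    # K(x1, x2) = sigma^2 exp(-|x1 - x2| / (2 l^2))
    return sigma2 * np.exp(-np.abs(x1[:, None] - x2[None, :]) / (2 * l**2))

def se_kernel(x1, x2, sigma2=1.0, l=1.0):
    d = x1[:, None] - x2[None, :]
    return sigma2 * np.exp(-0.5 * d**2 / l**2)

def sample(K, rng):
    # Draw one sample from N(0, K) via a jittered Cholesky factor
    L = np.linalg.cholesky(K + 1e-6 * np.eye(len(K)))
    return L @ rng.standard_normal(len(K))

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
f_ou = sample(ou_kernel(x, x), rng)
f_se = sample(se_kernel(x, x), rng)

# OU samples are rough: neighbouring values differ far more than for the
# smooth squared-exponential samples on the same grid.
print(np.mean(np.abs(np.diff(f_ou))) > np.mean(np.abs(np.diff(f_se))))  # True
```

This roughness is inherited from the kink of the kernel at x1 = x2; any stationary kernel that is not differentiable at zero produces non-differentiable sample paths.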
What effect does the form of the covariance function have?

[Figure: samples drawn with the rational quadratic covariance function, alongside the covariance matrix Σ]

Rational Quadratic

K(x1, x2) = σ² (1 + |x1 − x2| / (2αl²))^(−α)

262 / 335
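A useful property of the rational quadratic is its α → ∞ limit: (1 + d/(2αl²))^(−α) → exp(−d/(2l²)), so for large α it approaches the exponential kernel of the previous slide with the same length-scale. A sketch checking this numerically — note it follows the slide's |x1 − x2| form; the more common variant of the rational quadratic uses the squared distance instead:

```python
import numpy as np

def rq_kernel(x1, x2, sigma2=1.0, l=1.0, alpha=1.0):
    # K(x1, x2) = sigma^2 (1 + |x1 - x2| / (2 alpha l^2))^(-alpha)
    d = np.abs(x1[:, None] - x2[None, :])
    return sigma2 * (1 + d / (2 * alpha * l**2)) ** (-alpha)

x = np.linspace(-3, 3, 5)
K = rq_kernel(x, x, alpha=1000.0)
K_exp = np.exp(-np.abs(x[:, None] - x[None, :]) / (2 * 1.0**2))
print(np.max(np.abs(K - K_exp)))  # small: large alpha recovers the exponential kernel
```

For finite α the kernel behaves like a mixture over a range of length-scales, which is why its samples show structure at several scales at once.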
What effect does the form of the covariance function have?

[Figure: samples drawn with the periodic covariance function, alongside the covariance matrix Σ]

Periodic (sinusoid × squared exponential)

K(x1, x2) = σ² cos(ω(x1 − x2)) exp(−(x1 − x2)² / (2l²))

283 / 335
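This covariance is a product of a cosine and a squared exponential, and a product of valid stationary covariances is itself a valid covariance. Samples therefore oscillate at frequency ω while decorrelating over the length-scale l. A minimal sketch (the value of ω is chosen arbitrarily for illustration):

```python
import numpy as np

def periodic_kernel(x1, x2, sigma2=1.0, l=1.0, omega=2 * np.pi):
    # K(x1, x2) = sigma^2 cos(omega (x1 - x2)) exp(-(x1 - x2)^2 / (2 l^2))
    d = x1[:, None] - x2[None, :]
    return sigma2 * np.cos(omega * d) * np.exp(-0.5 * d**2 / l**2)

x = np.linspace(-3, 3, 200)
K = periodic_kernel(x, x)
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(x)))  # jitter for numerical stability
f = L @ np.random.default_rng(0).standard_normal(len(x))
print(f.shape)  # (200,)
```

Because the envelope decays, the samples are only quasi-periodic: oscillations far apart in x drift out of phase, with l controlling how quickly the phase coherence is lost.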
The covariance function has a large effect

[figure: data fits and posterior samples under three covariance functions, plotted over x ∈ [−3, 3]]

OU        RQ        periodic

Bayesian model comparison: place a prior over models, then compare via the marginal likelihood.

304–307 / 335
The covariance function has a large effect

Health warnings: the marginal likelihood is hard to compute (approximations are needed), and the results are often very sensitive to the priors.

[figure: data fits and posterior samples under the OU, RQ and periodic covariance functions]

Bayesian model comparison: prior over models → marginal likelihood

308 / 335
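For GP regression the marginal likelihood is available in closed form, so comparing covariance functions on a given dataset reduces to comparing log p(y | X) = −½ yᵀ(K + σₙ²I)⁻¹y − ½ log|K + σₙ²I| − (n/2) log 2π under each kernel. A hedged sketch on synthetic data (kernels, hyperparameters and data below are illustrative, not from the lecture):

```python
import numpy as np

def ou_kernel(x, ell=1.0, sigma=1.0):
    # Ornstein-Uhlenbeck (exponential) covariance: rough sample paths
    return sigma**2 * np.exp(-np.abs(x[:, None] - x[None, :]) / ell)

def per_kernel(x, omega=2.0, ell=1.0, sigma=1.0):
    # periodic: sinusoid × squared exponential, as on the earlier slide
    d = x[:, None] - x[None, :]
    return sigma**2 * np.cos(omega * d) * np.exp(-0.5 * d**2 / ell**2)

def log_marginal_likelihood(K, y, noise=0.1):
    # log p(y) = -1/2 y^T Ky^{-1} y - 1/2 log|Ky| - n/2 log(2 pi)
    n = len(y)
    L = np.linalg.cholesky(K + noise**2 * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 60)
y = np.cos(2.0 * x) + 0.1 * rng.standard_normal(60)   # periodic-looking data

lml_ou = log_marginal_likelihood(ou_kernel(x), y)
lml_per = log_marginal_likelihood(per_kernel(x), y)
print(lml_ou, lml_per)   # the periodic kernel should score higher here
```

This is the quantity behind the slide's "marginal likelihood" arrow; the health warnings apply once the likelihood is non-Gaussian and this integral must be approximated.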
Scaling Gaussian Processes to Large Datasets
309 / 335
Motivation: Gaussian Process Regression

[figure: scattered input–output data with a query input marked "?"]

inference & learning → intractabilities: computational and analytic

310 / 335
A Brief History of Gaussian Process Approximations

Methods employing pseudo-data fall into two camps:
approximate generative model / exact inference: FITC, PITC, DTC
exact generative model / approximate inference: VFE, EP, PP

FITC: Snelson et al., "Sparse Gaussian Processes using Pseudo-inputs"
PITC: Snelson et al., "Local and global sparse Gaussian process approximations"
EP: Csató and Opper 2002 / Qi et al., "Sparse-posterior Gaussian Processes for general likelihoods"
VFE: Titsias, "Variational Learning of Inducing Variables in Sparse Gaussian Processes"
DTC / PP: Seeger et al., "Fast Forward Selection to Speed Up Sparse Gaussian Process Regression"

Unifying views:
A Unifying View of Sparse Approximate Gaussian Process Regression, Quiñonero-Candela & Rasmussen, 2005 (FITC, PITC, DTC)
A Unifying Framework for Sparse Gaussian Process Approximation using Power Expectation Propagation, Bui, Yan and Turner, 2016 (VFE, EP, FITC, PITC, ...)

311 / 335
EP pseudo-point approximation

[diagram: the true posterior — likelihood factors times the prior, yielding the marginal likelihood and posterior — is replaced by an approximate posterior in which the true likelihoods are swapped for 'pseudo' data: their input locations, outputs and covariance define the exact joint of a new GP regression model]

312 / 335
EP algorithm

1. remove: take out one pseudo-observation likelihood → cavity
2. include: add in one true observation likelihood → tilted
3. project: project onto the approximating family (KL between unnormalised stochastic processes)
4. update: update the pseudo-observation likelihood (rank 1)

Properties of the projection:
1. minimum: moments matched at the pseudo-inputs
2. Gaussian regression: matches moments everywhere

313 / 335
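The four steps can be made concrete on a toy 1-D model. This is an illustrative reconstruction, not the lecture's sparse-GP implementation: a Gaussian prior times heavy-tailed (Cauchy) likelihood factors, with each site held as an unnormalised Gaussian in natural parameters and the moment match done by quadrature on a grid.

```python
import numpy as np

# Toy EP: p(theta) ∝ N(theta; 0, 1) * prod_i t_i(theta), t_i Cauchy-shaped.
y = np.array([-1.0, 0.5, 2.0])           # toy observations
grid = np.linspace(-10, 10, 4001)
dg = grid[1] - grid[0]

tau = np.zeros_like(y)                   # site precisions
nu = np.zeros_like(y)                    # site precision-weighted means
tau0, nu0 = 1.0, 0.0                     # Gaussian prior N(0, 1)

for sweep in range(20):
    for i in range(len(y)):
        # 1. remove: divide site i out of the approximation (cavity)
        tau_cav = tau0 + tau.sum() - tau[i]
        nu_cav = nu0 + nu.sum() - nu[i]
        # 2. include: multiply in the true likelihood (tilted distribution)
        log_tilt = (-0.5 * tau_cav * grid**2 + nu_cav * grid
                    - np.log1p((y[i] - grid) ** 2))   # Cauchy factor
        tilt = np.exp(log_tilt - log_tilt.max())
        Z = tilt.sum() * dg
        # 3. project: match first and second moments (min KL onto Gaussians)
        m = (grid * tilt).sum() * dg / Z
        v = ((grid - m) ** 2 * tilt).sum() * dg / Z
        # 4. update: new site = projected posterior / cavity
        tau[i] = 1.0 / v - tau_cav
        nu[i] = m / v - nu_cav

post_var = 1.0 / (tau0 + tau.sum())
post_mean = post_var * (nu0 + nu.sum())
print(post_mean, post_var)
```

In the sparse-GP case the moment match happens at the pseudo-inputs and each site update is rank 1, but the remove/include/project/update cycle is exactly this loop.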
Fixed points of EP = FITC approximation

This interpretation resolves open issues with FITC: why does it work so well? And are we allowed to increase M with N?

315 / 335
Power EP algorithm (as tractable as EP)

1. remove: take out a fraction of one pseudo-observation likelihood → cavity
2. include: add in a fraction of one true observation likelihood → tilted
3. project: project onto the approximating family (KL between unnormalised stochastic processes)
4. update: update the pseudo-observation likelihood (rank 1)

Properties of the projection:
1. minimum: moments matched at the pseudo-inputs
2. Gaussian regression: matches moments everywhere

317 / 335
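Relative to plain EP, Power EP changes only three things: the cavity removes a fraction α of the site, the tilted distribution includes the likelihood raised to the power α, and the site update is rescaled by 1/α. A sketch on the same illustrative 1-D toy model as before (not the lecture's sparse-GP code); α = 1 recovers EP, and in the sparse-GP setting α → 0 corresponds to the VFE solution.

```python
import numpy as np

def power_ep(y, alpha, sweeps=30):
    # Gaussian prior N(0, 1) times Cauchy-shaped likelihood factors.
    grid = np.linspace(-10, 10, 4001)
    dg = grid[1] - grid[0]
    tau = np.zeros_like(y)               # site precisions
    nu = np.zeros_like(y)                # site precision-weighted means
    tau0, nu0 = 1.0, 0.0                 # prior N(0, 1)
    for _ in range(sweeps):
        for i in range(len(y)):
            # 1. remove a *fraction* alpha of pseudo-observation i (cavity)
            tau_cav = tau0 + tau.sum() - alpha * tau[i]
            nu_cav = nu0 + nu.sum() - alpha * nu[i]
            # 2. include a fraction alpha of the true likelihood (tilted)
            log_tilt = (-0.5 * tau_cav * grid**2 + nu_cav * grid
                        - alpha * np.log1p((y[i] - grid) ** 2))
            tilt = np.exp(log_tilt - log_tilt.max())
            Z = tilt.sum() * dg
            # 3. project: moment match (KL projection onto Gaussians)
            m = (grid * tilt).sum() * dg / Z
            v = ((grid - m) ** 2 * tilt).sum() * dg / Z
            # 4. update the site, rescaling the change by 1/alpha
            tau[i] = (1.0 / v - tau_cav) / alpha
            nu[i] = (m / v - nu_cav) / alpha
    var = 1.0 / (tau0 + tau.sum())
    return var * (nu0 + nu.sum()), var   # posterior mean, variance

print(power_ep(np.array([-1.0, 0.5, 2.0]), alpha=1.0))   # plain EP
print(power_ep(np.array([-1.0, 0.5, 2.0]), alpha=0.5))   # intermediate
```

The per-update cost is unchanged, which is why the slide calls Power EP "as tractable as EP".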
Power EP: a unifying framework

Special cases: FITC (Csató and Opper, 2002; Snelson and Ghahramani, 2005) and VFE (Titsias, 2009).

[table: GP regression and classification methods organised by inference scheme (VFE / PEP / EP), covering FITC, PITC, inter-domain and structured variants]

[4] Quiñonero-Candela et al., 2005; [5] Snelson et al., 2005; [6] Snelson, 2006; [7] Schwaighofer, 2002; [8] Titsias, 2009; [9] Csató, 2002; [10] Csató et al., 2002; [11] Seeger et al., 2003; [12] Naish-Guzman et al., 2007; [13] Qi et al., 2010; [14] Hensman et al., 2015; [15] Hernández-Lobato et al., 2016; [16] Matthews et al., 2016; [17] Figueiras-Vidal et al., 2009
* = optimised pseudo-inputs; ** = structured versions of VFE recover VFE

318–319 / 335
How should I set the power parameter α?

Classification: 6 UCI classification datasets, 20 random splits, M = 10, 50, 100; hypers and inducing inputs optimised.
Regression: 8 UCI regression datasets, 20 random splits, M = 0–200; hypers and inducing inputs optimised.

[plots: average error rank and log-loss rank against α ∈ [0, 1] for regression and classification]

α = 0.5 does well on average

320 / 335
Deep Gaussian Processes for Regression
321 / 335
Pros and cons of Gaussian Process Regression

Probabilistic ✓
we need well-calibrated predictive uncertainty (decision making), even for big models with small data; neural networks do not provide well-calibrated uncertainty estimates

Non-parametric (∞-parametric) ✓
model parameters are an information bottleneck between training and test data:
training data → θ → test data
an unbounded model (θ grows with the data) is pruned by the data

Deep (and wide) ✗
real-world data have intrinsic hierarchical structure; it is simpler to learn a series of weak non-linearities

322 / 335
From Gaussian Processes to Deep Gaussian Processes

[figure: build-up of a deep GP — the input is passed through a GP draw, whose output becomes the input to the next GP draw (one with a long lengthscale, one short)]

DGP = a multi-layer neural network with infinitely wide hidden layers

323 / 335
Deep Gaussian Processes

fl ∼ GP(0, k(·, ·))
yn = g(xn) = fL(fL−1(· · · f2(f1(xn)))) + εn
hL−1,n := fL−1(· · · f1(xn)),  yn = fL(hL−1,n) + εn

Deep GPs [a] are a multi-layer generalisation of Gaussian processes, equivalent to deep neural networks with infinitely wide hidden layers.

Questions:
How do we perform inference and learning tractably?
How do Deep GPs compare to alternatives, e.g. Bayesian neural networks?

[a] Damianou and Lawrence (2013) [unsupervised learning]

[graphical model: xn → h1,n → h2,n → yn via f1, f2, f3, with a plate over N]

324 / 335
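The generative process yn = fL(fL−1(· · · f1(xn))) + εn can be sampled directly: draw a function from each layer's GP on the current inputs and feed its values forward. A minimal sketch of one draw from the prior (squared-exponential kernels and the lengthscales below are illustrative assumptions, not the lecture's setup):

```python
import numpy as np

def se_kernel(a, b, sigma=1.0, ell=1.0):
    # squared exponential: k(a, b) = sigma^2 exp(-(a - b)^2 / (2 ell^2))
    d2 = (a[:, None] - b[None, :]) ** 2
    return sigma**2 * np.exp(-0.5 * d2 / ell**2)

def sample_gp(x, ell, rng):
    # one draw from GP(0, k) at the points x (jitter keeps Cholesky stable)
    K = se_kernel(x, x, ell=ell) + 1e-6 * np.eye(len(x))
    return np.linalg.cholesky(K) @ rng.standard_normal(len(x))

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
h = x.copy()
for ell in [1.0, 0.5, 1.0]:              # three layers, varying lengthscales
    h = sample_gp(h, ell, rng)           # h_l = f_l(h_{l-1})
y = h + 0.05 * rng.standard_normal(len(x))   # observation noise eps_n
print(y.shape)   # (200,)
```

Sampling forwards is easy; the hard part, as the slide's questions note, is inverting this composition to do inference and learning tractably.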
Approximate inference for (Deep) Gaussian Processes

[table: GP regression and classification methods organised by inference scheme (VFE / PEP / EP), covering FITC, PITC, inter-domain and structured variants — as on pages 318–319]

325 / 335
Experiment: Value function of the mountain car problem

[figure: car on a hill; reaching the goal gives r = 10, each step r = −1; the state is position and velocity]
[surface plots over (x1, x2): the true value function, the GP fit, and the DGP fit]

326 / 335
Experiment: Comparison to Bayesian neural networks

Compare DGPs with GPs and Bayesian neural networks with one and two hidden layers using:

VI(G): Graves' VI [diagonal Gaussian, without the reparam. trick]
VI(KW): Kingma and Welling's VI [with the reparam. trick]
PBP: ADF with Probabilistic Backpropagation
Dropout: combining dropout predictions at test time
SGLD: stochastic gradient Langevin dynamics
HMC: Hamiltonian Monte Carlo [only for small networks]

327 / 335
Experiment: Comparison to Bayesian neural networks

Rankings of all methods across all datasets

[plot: average MLL rank (1–15, with critical difference CD) for PBP-1, Dropout-1, SGLD-1, DGP-1 50, SGLD-2, VI(KW)-1, GP 50, DGP-3 100, DGP-2 100, DGP-3 50, DGP-2 50, GP 100, HMC-1, VI(KW)-2, DGP-1 100]

328 / 335
Experiment: Comparison to Bayesian neural networks [Best results]

[plots: average test log-likelihood/nats for BNN-deterministic, BNN-sampling, GP and DGP on ten UCI datasets — boston (N = 506, D = 13), concrete (N = 1030, D = 8), energy (N = 768, D = 8), kin8nm (N = 8192, D = 8), naval (N = 11934, D = 16), power (N = 9568, D = 4), protein (N = 45730, D = 9), red wine (N = 1588, D = 11), yacht (N = 308, D = 6), year (N = 515345, D = 90)]

329 / 335
333 / 335
Pros and cons of Gaussian Process Regression
Probabilistic Xneed well-calibrated predictive uncertainty (decision making)big models (even with small data)neural networks do not provide well calibrated uncertainty estimates
Non-parametric (∞ - parametric) Xmodel parameters: information bottleneck between training and test data
training data→ θθθ → test data
unbounded model (θθθ grows with data) pruned by data
Deep (and wide) Xintrinsic hierarchical structure in real world datasimpler to learn a series of weak non-linearities
334 / 335
References (hyperlinked)

Deep Gaussian Processes:
Deep Gaussian Processes for Regression using Approximate Expectation Propagation, ICML 2016

Approximate inference in GPs:
A Unifying Framework for Sparse Gaussian Process Approximation using Power Expectation Propagation, arXiv preprint 2016

Scalable approximate inference:
Stochastic Expectation Propagation, NIPS 2015
Black-box α-divergence Minimization, ICML 2016

335 / 335