

PARG University of Oxford

Bayes, birds and brains: applications of inference and probabilistic modelling

Stephen Roberts

Pattern Analysis & Machine Learning Research Group, University of Oxford

http://www.robots.ox.ac.uk/~parg


Introduction

• Bayesian inference has had a profound impact on the principled handling of uncertainty in practical computation

• What this talk aims to do:
– Give an overview of Bayesian inference applied to several real-world problem domains

• What it does not aim to do:
– Give endless equations; these are important and elegant, but are in the open literature


What’s wrong with sampling?

• Nothing – apart from speed and the occasional frequentist way the samples are used…

– Not much we can do about speed, though lots of clever methods out there help

– Bayesian sampling (using Gaussian processes)
  • Bayes-Hermite Quadrature [O’Hagan, 1992]
  • Bayesian Monte Carlo [Rasmussen and Ghahramani, 2003]

• Variational Bayes


Variational Bayes - 1

Log evidence (log marginal likelihood) is bounded below by the variational free energy
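
The bound named on this slide can be written out in one line. This is a sketch under assumed notation that does not appear in the transcript: D is the data, θ collects the hidden variables and parameters, and q(θ) is the approximating distribution. It also assumes the sign convention in which F is a lower bound to be maximised; with the opposite sign, F is an energy to be minimised.

```latex
\log p(D)
  = \underbrace{\int q(\theta)\,\log\frac{p(D,\theta)}{q(\theta)}\,d\theta}_{F[q]}
  + \underbrace{\int q(\theta)\,\log\frac{q(\theta)}{p(\theta\mid D)}\,d\theta}_{\mathrm{KL}[\,q\,\|\,p(\theta\mid D)\,]\;\ge\;0}
  \quad\Longrightarrow\quad
  \log p(D) \;\ge\; F[q]
```

Because the KL term is non-negative and vanishes only when q equals the true posterior, raising F tightens the bound and pulls q towards the posterior at the same time.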


Variational Bayes - 2

• A slow and (often) painful derivation leads to an iterative node update for DAGs

• This converges to a local optimum – like EM and many other energy minimization approaches – get the priors right!

Figure: joint density p(x1,x2) and its factorised approximation q(x1)q(x2), plotted over x1 and x2.
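
As an illustration of the iterative update with a factorised q(x1)q(x2), here is a minimal sketch for the textbook case where p(x1,x2) is a correlated two-dimensional Gaussian. The function name, the example mean and the correlation of 0.9 are assumptions chosen for illustration, not values from the talk.

```python
def mean_field_gaussian(mu, prec, iters=200):
    """Coordinate-wise VB updates for a 2-D Gaussian p(x1, x2) with mean `mu`
    and precision matrix `prec`, approximated by Gaussian factors q(x1) q(x2)."""
    (l11, l12), (l21, l22) = prec
    m1, m2 = 0.0, 0.0  # arbitrary starting means for the two factors
    for _ in range(iters):
        # Update each factor's mean while holding the other factor fixed.
        m1 = mu[0] - (l12 / l11) * (m2 - mu[1])
        m2 = mu[1] - (l21 / l22) * (m1 - mu[0])
    # The factor variances are set by the diagonal of the precision matrix.
    return (m1, 1.0 / l11), (m2, 1.0 / l22)


# Hypothetical target: unit marginal variances, correlation 0.9.
rho = 0.9
det = 1.0 - rho ** 2
prec = [[1.0 / det, -rho / det], [-rho / det, 1.0 / det]]
(q1_mean, q1_var), (q2_mean, q2_var) = mean_field_gaussian([1.0, -1.0], prec)
print(q1_mean, q1_var, q2_mean, q2_var)
```

In this example the factor means converge to the true means, but each factor variance (1 - rho², about 0.19) is much smaller than the true marginal variance of 1. This is the usual price of a fully factorised approximation.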


Variational Bayes – 3

• Some relief via Variational Message Passing
– Same update equations as VB but at a fraction of the pain
– Conjugate exponential family only
– Pearl-style message passing on the graphical model using sufficient statistics only

• For many applications the factored nature of the proposal degrades performance; non-factored proposals are needed, at extra computational cost (e.g. some VB models with mixture model nodes)


Priors & model selection

• Sensitivity to priors
– posterior distributions conjugate with priors
– empirically can be a problem – know the domain

• Model selection
– evaluate set of models for VFE. Rank or integrate
– use VFE in ‘quasi-RJMCMC’ approach
– use ridge regression (ARD, weight decay) priors
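
One way to read "rank or integrate" is to treat each model's free energy as an approximation to its log evidence and turn the set into approximate posterior model probabilities. The sketch below assumes that reading and a uniform prior over models; the model names and bound values are made up for illustration.

```python
import math


def model_posteriors(log_evidence_bounds, log_priors=None):
    """Approximate P(model | data) from per-model lower bounds on log evidence."""
    n = len(log_evidence_bounds)
    if log_priors is None:
        log_priors = [-math.log(n)] * n  # uniform prior over models (assumption)
    scores = [b + lp for b, lp in zip(log_evidence_bounds, log_priors)]
    top = max(scores)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical bounds for three candidate models (not results from the talk).
names = ["2 sources", "3 sources", "4 sources"]
bounds = [-1520.3, -1498.7, -1501.2]
probs = model_posteriors(bounds)
ranking = sorted(zip(names, probs), key=lambda pair: pair[1], reverse=True)
print(ranking)
```

Ranking keeps only the top entry. Integrating would instead average each model's predictions using `probs` as weights.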


Simple example - ICA

• ICA (Bell & Sejnowski, Attias, Amari, …)
• Bayesian ICA (Roberts 1998, Attias 1999, Miskin & MacKay 2000, Choudrey & Roberts 2000)

$\mathrm{KL}\left[\, p(\mathbf{s}) \,\|\, \prod_i p(s_i) \,\right]$
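
Assuming the expression on this slide is the KL divergence between the joint source density and the product of its marginals, it is zero exactly when the sources are independent. The sketch below evaluates it for two small discrete joint distributions; both tables are invented for illustration.

```python
import math


def independence_kl(joint):
    """KL[p(s1, s2) || p(s1) p(s2)] for a discrete joint given as a 2-D table."""
    row_marginal = [sum(row) for row in joint]
    col_marginal = [sum(col) for col in zip(*joint)]
    kl = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:  # terms with p = 0 contribute nothing
                kl += p * math.log(p / (row_marginal[i] * col_marginal[j]))
    return kl


# Outer product of marginals (0.4, 0.6) and (0.3, 0.7): independent by construction.
independent = [[0.12, 0.28], [0.18, 0.42]]
# Mass concentrated on the diagonal: the two variables are dependent.
dependent = [[0.4, 0.1], [0.1, 0.4]]

kl_independent = independence_kl(independent)
kl_dependent = independence_kl(dependent)
print(kl_independent, kl_dependent)
```

ICA can be viewed as searching for the unmixing that drives this quantity towards zero for the recovered sources.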


vbICA – graphical model


vbICA – simple example


How many sources?


vbICA - VFE


vbICA - RJMCMC


vbICA – source suppression


Mixtures of ICAs

(Choudrey & Roberts, 2001)


VFE and RR (ARD) work


Example (& a cautionary tale)


… a cautionary tale


Ridge regression…


Variational free energy


Recovered images


A cautionary conclusion…

• In high noise regimes use ARD to focus on a small subset of models

• These are then investigated in more detail using variational free energy


Priors

• If we have prior knowledge regarding the sources or the mixing, we can use it.

• Spatial information
• Positive mixing
• Positive sources
• Structured observations


Positivity


An example


ICA with different priors

Which is ‘correct’ though?


Which is ‘correct’?


Epilepsy data


Structure priors

• To be an ICA matrix, a matrix must lie on the manifold of decorrelating matrices. These form ‘great circles’ in the matrix space.

• Can parameterize using co-ordinates on the manifold.

• Where do priors lie?
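
The idea of co-ordinates on the decorrelating manifold can be made concrete in two dimensions. Ignoring reflections, every decorrelating matrix there is a rotation applied to one fixed whitening matrix, so a single angle serves as the co-ordinate. The sketch below is an illustration under that assumption; the covariance values and helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def whitening_matrix(cov):
    """Inverse Cholesky factor of a 2x2 covariance: one decorrelating matrix."""
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 ** 2)
    # Inverse of the lower-triangular factor [[l11, 0], [l21, l22]].
    return [[1.0 / l11, 0.0], [-l21 / (l11 * l22), 1.0 / l22]]


def decorrelating_matrix(cov, angle):
    """Point on the decorrelating manifold with co-ordinate `angle`."""
    rot = [[math.cos(angle), -math.sin(angle)],
           [math.sin(angle), math.cos(angle)]]
    return matmul(rot, whitening_matrix(cov))


cov = [[2.0, 0.8], [0.8, 1.0]]  # hypothetical observation covariance
results = []
for angle in (0.0, 0.7, 2.1):
    w = decorrelating_matrix(cov, angle)
    results.append(matmul(matmul(w, cov), transpose(w)))  # should be the identity
print(results)
```

Every angle gives outputs with identity covariance, so second-order statistics cannot pick a point on this circle. That choice is left to the non-Gaussian source model and to any priors.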


Gaussian priors

Gaussian priors on the mixing process just form great circles – they have little impact if we already compute on the decorrelating manifold as they are aligned with the manifold.


Structure priors

• Sensor coupling has spatial structure: nearby sensors have similar coupling weights

• Gaussian process prior: still gives great circle in matrix space but very informative as not aligned along decorrelating manifold
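
A Gaussian process prior of this kind encodes "nearby sensors have similar weights" through its covariance function. The sketch below uses a squared-exponential covariance over sensor positions. The kernel choice, the length scale and the positions are all assumptions for illustration; the talk does not state which covariance was used.

```python
import math


def sq_exp_cov(positions, length_scale=1.0, variance=1.0):
    """Prior covariance between coupling weights at the given sensor positions."""
    def kernel(p, q):
        dist2 = sum((a - b) ** 2 for a, b in zip(p, q))
        return variance * math.exp(-0.5 * dist2 / length_scale ** 2)
    return [[kernel(p, q) for q in positions] for p in positions]


# Hypothetical 2-D sensor positions: two close together, one far away.
sensors = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0)]
K = sq_exp_cov(sensors)
print(K[0][1], K[0][2])
```

The first two sensors get a prior correlation near 0.88, while the distant pair is close to uncorrelated. This is the spatial smoothness the prior is meant to express.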

Potential from brain source – dipole potential (Knuth)


Phantom head experiments

Figure panels: without prior; with prior


Brain-Computer Interfaces

‘direct’ control in real-time using ‘thought’


Motor cortex

• When we plan a movement, changes take place in the motor cortex, whether or not the movement takes place.

• When we change cognitive task, changes take place in the cortex.


Cursor control – real time BCI

Figure: cursor-control results with dT = 50 ms. Legend: max, median, min. Methods shown: baseline; Bayes with rejection; Bayes.
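
The "Bayes with rejection" label presumably refers to a reject option: act only when the posterior over actions is confident enough, and otherwise do nothing. A minimal sketch of that rule follows. The threshold of 0.9 and the action names are hypothetical, not taken from the experiment.

```python
def decide(posteriors, threshold=0.9):
    """Return the most probable action, or None when no action is confident enough."""
    label, prob = max(posteriors.items(), key=lambda item: item[1])
    return label if prob >= threshold else None


confident = decide({"left": 0.97, "right": 0.03})   # acts
uncertain = decide({"left": 0.60, "right": 0.40})   # rejects
print(confident, uncertain)
```

Rejecting uncertain windows trades response rate for accuracy, which matters when every wrong cursor move is fed straight back to the user.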


The curse of feedback

Figure axes: bits against t (secs)


Information Engines

DATA ENGINE (MODEL) INFORMATION

potential entropy machine useful

1110000101010101010101001001010101001010111010100101010100101010010010100101010010101010001010010100001001001000

P(action|data) = 0.95

“If all you have is a hammer, everything looks like a nail.”


111000010101010101010100100101010100101011101010010101010010101001001010010101001010101000101001010000100100100010010101

Inside or outside the box?

The inferences we make, and the actions decided upon have an impact on the data

Learning with changing objectives


Sequential Bayesian inference

• Particle filter (SIR)

• Humble (variational) Kalman filter: Bayesian inference assuming generalized (non-) linear Gaussian system

• Adaptive system using sequential variational Bayes
– BCI application
– (Musical score following)
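
To make the first two bullets concrete, the sketch below runs a scalar Kalman filter and a sampling-importance-resampling (SIR) particle filter on the same linear-Gaussian model, where the two should agree. It shows only the standard filters, not the variational or adaptive versions mentioned above. The model parameters, the observations and the particle count are assumptions for illustration.

```python
import math
import random


def kalman_step(mean, var, y, a=1.0, q=0.1, r=0.5):
    """One predict/update cycle for x_t = a x_{t-1} + N(0, q), y_t = x_t + N(0, r)."""
    pred_mean = a * mean
    pred_var = a * a * var + q
    gain = pred_var / (pred_var + r)
    new_mean = pred_mean + gain * (y - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var


def sir_step(particles, y, rng, a=1.0, q=0.1, r=0.5):
    """One SIR cycle: propagate through the dynamics, weight by likelihood, resample."""
    proposed = [a * x + rng.gauss(0.0, math.sqrt(q)) for x in particles]
    weights = [math.exp(-0.5 * (y - x) ** 2 / r) for x in proposed]
    return rng.choices(proposed, weights=weights, k=len(proposed))


observations = [0.9, 1.1, 1.3, 0.8, 1.0]  # hypothetical measurements
rng = random.Random(0)
particles = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # samples from the N(0, 1) prior
kf_mean, kf_var = 0.0, 1.0
for y in observations:
    kf_mean, kf_var = kalman_step(kf_mean, kf_var, y)
    particles = sir_step(particles, y, rng)
pf_mean = sum(particles) / len(particles)
print(kf_mean, pf_mean)
```

On this model the Kalman filter is exact and cheap, and the particle filter only approximates it. The particle filter earns its cost when the dynamics or the observation model stop being linear and Gaussian.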


Generalised non-linear dynamic classifier

Copes with missing inputs & labels, input noise and bit errors on labels as well as time-delayed information

Penny, Sykacek, Lowne


Foul stuff…


What it buys us


Thanks to John Gann


PART II: BIRDS


Hidden Markov birds…

• Global Positioning System (GPS)

• 15g units
• Strapped to back of bird
• Gives position every second

Roberts, Guilford, Biro, Lau 2004,5 JTB


Strategies, navigation & uncertainty

• Dynamics of flight gives indication of navigation strategy

• Easy to measure from GPS co-ordinates
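
As a sketch of how flight dynamics might be read off a GPS track, the function below computes speed and turning angle between successive fixes. It assumes the fixes have already been projected to planar co-ordinates in metres and arrive once per second. Which features the study actually used is not stated in the transcript, so these two are assumptions.

```python
import math


def flight_features(track, dt=1.0):
    """Speed and turning angle between successive (x, y) fixes spaced dt seconds apart."""
    speeds, turns = [], []
    prev_heading = None
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        dx, dy = x1 - x0, y1 - y0
        speeds.append(math.hypot(dx, dy) / dt)
        heading = math.atan2(dy, dx)
        if prev_heading is not None:
            # Wrap the heading change into [-pi, pi].
            delta = heading - prev_heading
            turns.append(math.atan2(math.sin(delta), math.cos(delta)))
        prev_heading = heading
    return speeds, turns


# Hypothetical track: straight flight east, then a 90 degree left turn.
track = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (20.0, 10.0)]
speeds, turns = flight_features(track)
print(speeds, turns)
```

Sequences of such features are the kind of observation a hidden Markov model can segment into navigation states.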


States – Hidden Markov Models


Variational Bayes

Posterior P(H|V) intractable. Tractable proposal Q(H) gives a strict lower bound on log P(V)

Hidden Markov Model States S with observation model parameters θ

Learning involves minimization of F wrt S, θ


Posteriors over states
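
For fixed model parameters, posteriors over hidden Markov model states come from the forward-backward recursions. The sketch below shows only that step. It is not the variational learning scheme used in the talk, and the two-state transition matrix and likelihood values are invented for illustration.

```python
def state_posteriors(init, trans, emit_lik):
    """Forward-backward smoothing for a discrete-state HMM.

    init[k]        : P(s_1 = k)
    trans[j][k]    : P(s_t = k | s_{t-1} = j)
    emit_lik[t][k] : p(observation_t | s_t = k)
    Returns post[t][k] = P(s_t = k | all observations).
    """
    n, k = len(emit_lik), len(init)
    # Forward pass, normalised at each step to avoid underflow.
    alphas, prev = [], None
    for t in range(n):
        if t == 0:
            a = [init[s] * emit_lik[0][s] for s in range(k)]
        else:
            a = [emit_lik[t][s] * sum(prev[j] * trans[j][s] for j in range(k))
                 for s in range(k)]
        z = sum(a)
        prev = [v / z for v in a]
        alphas.append(prev)
    # Backward pass, combining with the forward messages as it goes.
    beta = [1.0] * k
    post = [None] * n
    for t in range(n - 1, -1, -1):
        g = [alphas[t][s] * beta[s] for s in range(k)]
        z = sum(g)
        post[t] = [v / z for v in g]
        beta = [sum(trans[s][j] * emit_lik[t][j] * beta[j] for j in range(k))
                for s in range(k)]
        zb = sum(beta)
        beta = [v / zb for v in beta]
    return post


# Hypothetical two-state example with sticky transitions.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit_lik = [[0.9, 0.2], [0.8, 0.3], [0.1, 0.7], [0.2, 0.9]]
post = state_posteriors(init, trans, emit_lik)
print(post)
```

In this example the early observations favour state 0 and the late ones favour state 1, and the smoothed posteriors switch accordingly.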


Influence from landscape

• Navigation states dependent upon many factors
– ‘Markov pigeon property’
– Visual landscape
– Magnetic field (Hall effect sensor)
– Other birds
– etc…

• Allow for coupling between factors


Coupled models


Variable lag between chains

GM now loopy. Sample, node cluster or use EP (thanks Iead Rezek)


Gaussian process pigeons…

Gives a null hypothesis as GP paths independent of landscape

Mike Osborne & Richard Mann


Birds are known to be highly right-eye dominant

Primary attention to edges in landscape


Bird-brained conclusions

• Simple explanatory ‘navigation states’

• Terrain information may be important

• Co-released birds may co-operate

• It may be possible to infer the perceptive field from tracks

• Edges may be primary information source for navigation


Thanks…

• Steve Reece (for being the skeptic)

• Mike Osborne & Charles Fox (for glaring at me if I start straying from the path of Bayes)

• Riz Choudrey, Will Addison, Will Penny, Richard Everson, Evangelos Roussos (ICA)

• John Gann, Duncan Lowne, Peter Sykacek (brains)

• Tim Guilford, Rob Freeman, Richard Mann, Mike Osborne (bird brains)


Recent sequential VB

• Very successful in musical score following

– or ‘How to replace your bass player with a laptop’

– Charles Fox


Single fragment Bayes net

• uncertain number and matchings of notes

• Inference is hard because of loops

Figure: Bayes net for a single fragment, with nodes for start-time, (inverse) tempo, beats, notes and data.


Start with good priors


Converged


Start with very poor priors


Failed

Arrg! Floating point exception…
