1PARG University of Oxford
Bayes, birds and brains: applications of inference and probabilistic modelling
Stephen Roberts
Pattern Analysis & Machine Learning Research Group, University of Oxford
http://www.robots.ox.ac.uk/~parg
Introduction
• Bayesian inference has had a profound impact on the principled handling of uncertainty in practical computation
• What this talk aims to do:
– Give an overview of Bayesian inference applied to several real-world problem domains
• What it does not aim to do:
– Give endless equations; these are important and elegant, but are in the open literature
What’s wrong with sampling?
• Nothing – apart from speed and the occasional frequentist way the samples are used…
– Not much we can do for speed, though lots of clever methods out there help
– Bayesian sampling (using Gaussian processes)
  • Bayes-Hermite Quadrature [O’Hagan, 1992]
  • Bayesian Monte Carlo [Rasmussen and Ghahramani, 2003]
• Variational Bayes
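The idea behind Bayesian quadrature can be illustrated with a minimal sketch: place a Gaussian process on the integrand, then integrate the GP posterior mean in closed form. The code below assumes a 1-D integrand, a squared-exponential kernel and a zero-mean Gaussian weighting density; the function name, node placement and hyperparameters are illustrative choices, not taken from the talk or the cited papers.

```python
import numpy as np

def bayes_quadrature(f, nodes, length_scale=1.0, prior_std=1.0, jitter=1e-10):
    """Estimate the integral of f(x) N(x; 0, prior_std^2) dx with a GP on f.

    Assumes a zero-mean GP with kernel k(x, x') = exp(-(x - x')^2 / (2 l^2)).
    """
    x = np.asarray(nodes, dtype=float)
    # Gram matrix of the kernel at the evaluation nodes (jitter for stability).
    diff = x[:, None] - x[None, :]
    K = np.exp(-0.5 * diff**2 / length_scale**2) + jitter * np.eye(x.size)
    # Kernel mean: z_i = integral of k(x, x_i) N(x; 0, prior_std^2) dx,
    # which is available in closed form for this kernel/density pair.
    total_var = length_scale**2 + prior_std**2
    z = length_scale / np.sqrt(total_var) * np.exp(-0.5 * x**2 / total_var)
    # Posterior mean of the integral is z^T K^{-1} f(x).
    weights = np.linalg.solve(K, z)
    return float(weights @ f(x))

# Integral of cos(x) against a standard normal is exp(-1/2).
estimate = bayes_quadrature(np.cos, np.linspace(-5.0, 5.0, 21))
exact = np.exp(-0.5)
```

Only 21 function evaluations are used here, and unlike plain Monte Carlo the estimate depends on where the nodes are, not on the order in which they were drawn.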
Variational Bayes - 2
• A slow and (often) painful derivation leads to an iterative node update for DAGs
• This converges to a local optimum – like EM and many other energy minimization approaches – get the priors right!
[Figure: the joint p(x1, x2) and its factored approximation q(x1)q(x2), plotted over axes x1 and x2]
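As a minimal sketch of the iterative update, consider approximating a correlated bivariate Gaussian p(x1, x2) by a factored q(x1)q(x2). This is a textbook toy case chosen for illustration, not a model from the talk; the function name and the example mean and covariance are assumptions.

```python
import numpy as np

def mean_field_gaussian(mu, cov, n_iter=50):
    """Coordinate-wise mean-field updates for a 2-D Gaussian target.

    Each factor q(x_i) is Gaussian with fixed variance 1 / Lambda_ii;
    only the factor means need to be iterated.
    """
    prec = np.linalg.inv(cov)          # precision matrix Lambda
    m = np.zeros(2)                    # initial factor means
    for _ in range(n_iter):
        # Update q(x1) holding q(x2) fixed, then the reverse.
        m[0] = mu[0] - prec[0, 1] * (m[1] - mu[1]) / prec[0, 0]
        m[1] = mu[1] - prec[1, 0] * (m[0] - mu[0]) / prec[1, 1]
    var = 1.0 / np.diag(prec)          # factor variances
    return m, var

mu = np.array([1.0, -1.0])
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
q_mean, q_var = mean_field_gaussian(mu, cov)
```

In this toy case the factor means converge to the true mean, while the factor variances come out smaller than the true marginal variances: the factored approximation is too compact when the variables are correlated.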
Variational Bayes – 3
• Some relief via Variational Message Passing
– Same update equations as VB but at a fraction of the pain
– Conjugate exponential family only
– Pearl-style message passing on the graphical model using sufficient statistics only
• For many applications the factored nature of the proposal q degrades performance; non-factored proposals are needed, at the cost of extra computation (e.g. some VB models with mixture model nodes)
Priors & model selection
• Sensitivity to priors
– posterior distributions conjugate with priors
– empirically can be a problem, so know the domain
• Model selection
– evaluate the variational free energy (VFE) for a set of models, then rank or integrate
– use VFE in a ‘quasi-RJMCMC’ approach
– use ridge regression (ARD, weight decay) priors
Simple example - ICA
• ICA (Bell & Sejnowski, Attias, Amari…)
• Bayesian ICA (Roberts 1998, Attias 1999, Miskin & MacKay 2000, Choudrey & Roberts 2000)
$\mathrm{KL}\left[\, p(\mathbf{s}) \,\|\, \prod_i p(s_i) \,\right]$
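Reading the expression on this slide as the Kullback-Leibler divergence between the joint source density and the product of its marginals (an interpretation of the garbled original, with $H[\cdot]$ denoting differential entropy), it expands as follows:

```latex
\mathrm{KL}\left[\, p(\mathbf{s}) \,\Big\|\, \prod_i p(s_i) \right]
  = \int p(\mathbf{s}) \ln \frac{p(\mathbf{s})}{\prod_i p(s_i)} \, d\mathbf{s}
  = \sum_i H[s_i] - H[\mathbf{s}] \;\ge\; 0
```

This is the mutual information between the sources. It is zero exactly when the sources are independent, which is why it serves as an ICA objective.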
A cautionary conclusion…
• In high noise regimes use ARD to focus on a small subset of models
• These are then investigated in more detail using variational free energy
Priors
• If we have prior knowledge regarding the sources or the mixing, we can use it.
• Spatial information
• Positive mixing
• Positive sources
• Structured observations
Structure priors
• To be an ICA matrix, a matrix must lie on the manifold of decorrelating matrices. These form ‘great circles’ in the matrix space.
• Can parameterize using co-ordinates on the manifold.
• Where do priors lie?
Gaussian priors
Gaussian priors on the mixing process just form great circles – they have little impact if we already compute on the decorrelating manifold as they are aligned with the manifold.
Structure priors
• Sensor coupling has spatial structure: nearby sensors have similar coupling weights
• Gaussian process prior: still gives great circle in matrix space but very informative as not aligned along decorrelating manifold
Potential from brain source – dipole potential (Knuth)
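A spatially structured prior of this kind can be sketched as a Gaussian process over sensor positions, so that nearby sensors receive correlated coupling weights. The snippet below is only an illustration: the squared-exponential covariance, the 1-D sensor layout and all names are assumptions, not the prior used in the talk.

```python
import numpy as np

def gp_coupling_prior(positions, length_scale=1.0, variance=1.0):
    """Prior covariance of one mixing-matrix column across sensors.

    Covariance decays with the squared distance between sensor positions.
    """
    pos = np.asarray(positions, dtype=float)
    if pos.ndim == 1:
        pos = pos[:, None]             # treat a flat list as 1-D positions
    sq_dist = np.sum((pos[:, None, :] - pos[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

# Hypothetical layout: six sensors spaced one unit apart on a line.
sensor_positions = np.linspace(0.0, 5.0, 6)
cov = gp_coupling_prior(sensor_positions)

# One prior draw of the coupling weights from a single source to all sensors.
rng = np.random.default_rng(0)
mixing_column = rng.multivariate_normal(np.zeros(len(sensor_positions)), cov)
```

Adjacent sensors share most of their prior variance while distant ones are nearly independent, which is what makes such a prior informative about the mixing.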
Motor cortex
• When we plan a movement, changes take place in the motor cortex, whether or not the movement takes place.
• When we change cognitive task, changes take place in the cortex.
Cursor control – real time BCI
[Figure: cursor control performance (max, median, min) at dT = 50 ms for three methods: baseline, Bayes, and Bayes with rejection]
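One plausible reading of the rejection condition on this slide is a classifier that declines to act when its posterior is not confident enough. The sketch below shows that idea only; the function name and the 0.9 threshold are assumptions, and the talk does not specify how rejection was implemented.

```python
def decide_with_rejection(posterior, threshold=0.9):
    """Return the index of the most probable action, or None to reject.

    `posterior` is a sequence of class probabilities P(action | data).
    """
    best = max(range(len(posterior)), key=lambda k: posterior[k])
    # Act only when the winning posterior clears the confidence threshold.
    return best if posterior[best] >= threshold else None

confident = decide_with_rejection([0.05, 0.95])   # acts on class 1
uncertain = decide_with_rejection([0.6, 0.4])     # rejects: no action taken
```

Rejecting costs responsiveness, since the cursor does not move on that step, but it avoids acting on ambiguous data.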
Information Engines
DATA (potential) → ENGINE / MODEL (entropy machine) → INFORMATION (useful)
Example: a raw bit stream goes in, P(action|data) = 0.95 comes out
“If all you have is a hammer, everything looks like a nail.”
Inside or outside the box?
The inferences we make, and the actions decided upon have an impact on the data
Learning with changing objectives
Sequential Bayesian inference
• Particle filter (SIR)
• Humble (variational) Kalman filter: Bayesian inference assuming a generalized (non-)linear Gaussian system
• Adaptive system using sequential variational Bayes
– BCI application
– (Musical score following)
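To make the sequential update concrete, here is a minimal sketch of the standard scalar Kalman filter, the linear-Gaussian special case. It is not the variational variant mentioned on the slide; the random-walk state model and all names are assumptions made for illustration.

```python
def kalman_step(mean, var, y, process_var, obs_var):
    """One predict/update cycle for x_t = x_{t-1} + w_t, y_t = x_t + v_t."""
    # Predict: the random-walk dynamics keep the mean and inflate the variance.
    pred_mean = mean
    pred_var = var + process_var
    # Update: blend prediction and observation according to their precisions.
    gain = pred_var / (pred_var + obs_var)
    new_mean = pred_mean + gain * (y - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var

# Start from a N(0, 1) prior and observe the constant value 2.0 a hundred times.
mean, var = 0.0, 1.0
for y in [2.0] * 100:
    mean, var = kalman_step(mean, var, y, process_var=0.0, obs_var=1.0)
```

With no process noise this reduces to ordinary Bayesian updating of a Gaussian mean, so after 100 observations the posterior precision is 101 and the mean is 200/101.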
Generalised non-linear dynamic classifier
Copes with missing inputs & labels, input noise and bit errors on labels as well as time-delayed information
Penny, Sykacek, Lowne
Hidden Markov birds…
• Global Positioning System (GPS)
• 15g units
• Strapped to back of bird
• Gives position every second
Roberts, Guilford, Biro, Lau 2004,5 JTB
Strategies, navigation & uncertainty
• Dynamics of flight gives indication of navigation strategy
• Easy to measure from GPS co-ordinates
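As a rough sketch of what measuring flight dynamics from GPS fixes can involve, the snippet below derives speed and turning angle from successive positions. It assumes the fixes have already been projected to local metric x/y coordinates and are sampled at a fixed interval; the function name and the example track are hypothetical.

```python
import numpy as np

def flight_dynamics(xy, dt=1.0):
    """Speed (m/s) and turning angle (rad) from positions sampled every dt seconds."""
    xy = np.asarray(xy, dtype=float)
    step = np.diff(xy, axis=0)                      # displacement per sample
    speed = np.linalg.norm(step, axis=1) / dt
    heading = np.arctan2(step[:, 1], step[:, 0])    # direction of travel
    turn = np.diff(heading)
    turn = (turn + np.pi) % (2 * np.pi) - np.pi     # wrap to [-pi, pi)
    return speed, turn

# Hypothetical track: two steps east, then one step north, 10 m per second.
track = np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 0.0], [20.0, 10.0]])
speed, turn = flight_dynamics(track)
```

Sequences of such features are the kind of observations a hidden-state model of navigation strategy could be fitted to.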
Variational Bayes
Posterior P(H|V) intractable. Tractable proposal Q(H) gives a free energy F, with −F a strict lower bound on the log evidence log P(V)
Hidden Markov Model States S with observation model parameters θ
Learning involves minimization of F wrt S, θ
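For reference, the bound being used here can be written out as follows. This is the standard variational identity, with H the hidden variables (here the states S and parameters θ) and V the observations; the integral is read as a sum over any discrete variables.

```latex
F[Q] = \int Q(H) \ln \frac{Q(H)}{P(H, V)} \, dH
     = -\ln P(V) + \mathrm{KL}\left[\, Q(H) \,\|\, P(H \mid V) \,\right]
     \;\ge\; -\ln P(V)
```

Minimizing F therefore tightens the bound on the log evidence and, at the same time, pulls Q(H) towards the true posterior P(H|V).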
Influence from landscape
• Navigation states dependent upon many factors
– ‘Markov pigeon property’
– Visual landscape
– Magnetic field (Hall effect sensor)
– Other birds
– etc…
• Allow for coupling between factors
Variable lag between chains
The graphical model is now loopy. Sample, cluster nodes, or use expectation propagation (EP) (thanks Iead Rezek)
Gaussian process pigeons…
Gives a null hypothesis as GP paths independent of landscape
Mike Osborne & Richard Mann
Birds are known to be highly right-eye dominant
Primary attention to edges in landscape
Bird-brained conclusions
• Simple explanatory ‘navigation states’
• Terrain information may be important
• Co-released birds may co-operate
• It may be possible to infer the perceptive field from tracks
• Edges may be primary information source for navigation
Thanks…
• Steve Reece (for being the skeptic)
• Mike Osborne & Charles Fox (for glaring at me if I start straying from the path of Bayes)
• Riz Choudrey, Will Addison, Will Penny, Richard Everson, Evangelos Roussos (ICA)
• John Gann, Duncan Lowne, Peter Sykacek (brains)
• Tim Guilford, Rob Freeman, Richard Mann, Mike Osborne (bird brains)
Recent sequential VB
• Very successful in musical score following
– or ‘How to replace your bass player with a laptop’
– Charles Fox
Single fragment Bayes net
• uncertain number and matchings of notes
• Inference is hard because of loops
[Figure: Bayes net with layers, top to bottom: start-time and (inverse) tempo, beats, notes, data]