![Page 1: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/1.jpg)
Amos Storkey, School of Informatics.
when training and test distributions are different
characterising learning transfer
![Page 2: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/2.jpg)
acknowledgements
Joint work with Masashi Sugiyama, Jon Clayden and Mark Bastin
![Page 3: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/3.jpg)
characterising learning transfer
Learning transfer
Covers many current cases of dataset shift
Will benefit from an inclusive framework that characterises the general problem
Can be formalised
Is practical
![Page 4: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/4.jpg)
dataset shift
[Figure: predictive and generative models learnt on training data, with the test setting marked "?".]
![Page 5: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/5.jpg)
real life
Characterising the change:
• Simple covariate shift
• Prior probability shift
• Sample selection bias
• Imbalanced data
• Domain shift
• Source component shift
Focus on the prediction problem: Given X predict Y
![Page 6: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/6.jpg)
simple covariate shift
Learnt conditional predictive model
Change: The distribution of X changes; P(Y|X) does not.
Modelling implication: None (given suitable modelling class)
[Figure: graphical model over X and Y, with a plot of y against x.]
![Page 7: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/7.jpg)
no modelling implication?
[Figure: plot of y against x.]
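A minimal sketch of why the "suitable modelling class" caveat matters, assuming a hypothetical noise-free target y = x² with training inputs on [0, 1] and test inputs on [1, 2] (none of these choices come from the slides). P(y|x) is identical in both sets, yet a straight line fitted on the training range extrapolates badly, while a model class that contains the true function is unaffected by the change in the distribution of X.

```python
def fit_line(us, ys):
    """Ordinary least squares for y = a + b * u; returns (a, b)."""
    n = len(us)
    mean_u, mean_y = sum(us) / n, sum(ys) / n
    slope = (sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
             / sum((u - mean_u) * (u - mean_u) for u in us))
    return mean_y - slope * mean_u, slope


def mse(a, b, us, ys):
    """Mean squared error of the fitted line on (us, ys)."""
    return sum((a + b * u - y) ** 2 for u, y in zip(us, ys)) / len(us)


def target(x):
    # Hypothetical true relationship, shared by training and test data.
    return x * x


# Covariate shift: same target function, different input ranges.
train_x = [i / 100 for i in range(101)]       # inputs on [0, 1]
test_x = [1 + i / 100 for i in range(101)]    # inputs on [1, 2]
train_y = [target(x) for x in train_x]
test_y = [target(x) for x in test_x]


def evaluate(feature):
    """Fit on training data using the given feature map; return (train, test) MSE."""
    train_u = [feature(x) for x in train_x]
    test_u = [feature(x) for x in test_x]
    a, b = fit_line(train_u, train_y)
    return mse(a, b, train_u, train_y), mse(a, b, test_u, test_y)


# Misspecified class: straight line in x.
line_train, line_test = evaluate(lambda x: x)
# Suitable class: linear in x squared, which contains the target.
quad_train, quad_test = evaluate(lambda x: x * x)

print("line:      train MSE", line_train, "test MSE", line_test)
print("quadratic: train MSE", quad_train, "test MSE", quad_test)
```

The straight line looks acceptable on the training range and fails on the shifted range; the quadratic feature gives essentially zero error on both.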
![Page 8: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/8.jpg)
prior probability shift
Learnt generative model
Change: The distribution of Y changes; P(X|Y) does not.
Modelling implication: Use a different P(Y) in Bayes' rule.
[Figure: graphical model over Y and X, with plots over (x1, x2) and of y against x.]
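As a small illustration of swapping P(Y) in Bayes' rule, the sketch below keeps the class-conditional likelihoods P(x|y) fixed and recomputes the posterior under a new prior. The class labels, likelihood values and priors are hypothetical numbers chosen for the example.

```python
def posterior(likelihoods, prior):
    """Bayes' rule: P(y|x) is proportional to P(x|y) * P(y)."""
    joint = {y: likelihoods[y] * prior[y] for y in prior}
    evidence = sum(joint.values())
    return {y: value / evidence for y, value in joint.items()}


# Hypothetical values of P(x|y) for one observed x; unchanged by the shift.
likelihoods = {"a": 0.2, "b": 0.6}

train_prior = {"a": 0.5, "b": 0.5}   # P(Y) in the training data
test_prior = {"a": 0.9, "b": 0.1}    # P(Y) after the prior probability shift

p_train = posterior(likelihoods, train_prior)
p_test = posterior(likelihoods, test_prior)

print(p_train)  # class "b" is favoured under the training prior
print(p_test)   # class "a" is favoured under the test prior
```

The same observation is assigned to different classes once the prior changes, which is why the generative decomposition is convenient here: only P(Y) has to be re-estimated.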
![Page 9: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/9.jpg)
sample selection bias
Learnt conditional predictive model
Change: Sample selection rule V determines what samples occur in the data.
Modelling implication: Sample selection estimation
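[Figure: graphical models over X, Y and the selection variable V, with a plot of y against x; the second model is labelled "= covariate shift".]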
![Page 10: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/10.jpg)
imbalanced data
Learn conditional classification model on balanced data
Change: Training data: V rejects many samples for the common class. Test on the full imbalanced data (a special case of sample selection bias).
Modelling implication: Adapt classification probability thresholds to account for the change.
[Figure: graphical model over X, Y and V, with a plot over (X1, X2).]
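One way to adapt the threshold, sketched under the assumption that selection depends only on the class label (so P(x|y) is the same in training and test): the test-time odds equal the training-time odds divided by the ratio of training to test prior odds. The function names and the example proportions are hypothetical.

```python
def prior_odds_ratio(train_pos, test_pos):
    """Training prior odds of the positive class divided by its test prior odds."""
    return (train_pos * (1.0 - test_pos)) / ((1.0 - train_pos) * test_pos)


def adapted_threshold(train_pos, test_pos):
    """Threshold on the training-time probability that corresponds to a
    threshold of 0.5 on the test-time probability."""
    k = prior_odds_ratio(train_pos, test_pos)
    return k / (1.0 + k)


def corrected_probability(p, train_pos, test_pos):
    """Convert a training-time P(y=1|x) (with p < 1) into a test-time one."""
    odds = p / (1.0 - p) / prior_odds_ratio(train_pos, test_pos)
    return odds / (1.0 + odds)


# Classifier trained on balanced data; the positive class is rare at test time.
threshold = adapted_threshold(train_pos=0.5, test_pos=0.1)
print(threshold)                               # about 0.9
print(corrected_probability(0.9, 0.5, 0.1))    # about 0.5
```

With balanced training data and a 10% positive rate at test time, the balanced classifier has to be about 90% confident before the rare class becomes the more probable one.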
![Page 11: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/11.jpg)
domain shift
Learn conditional classification model on balanced data
Change: X is dynamic. Xnew = f(Xold), so Y(Xnew) = Y(f(Xold)).
Modelling implication: Need to learn the functional map f.
[Figure: graphical model over Xo, X, Y and the map F.]
![Page 12: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/12.jpg)
source component shift
Various sources for data.
Change:
• Proportions of different source components vary between datasets
• Within-source conditional models are the same
Modelling implication: Estimate sources and proportion changes. Learn a mixture-of-experts model.
[Figure: graphical model over X, Y and the source index R, with a plot of y against x.]
![Page 13: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/13.jpg)
sample selection v source component
Sample selection bias as source component shift: Let R index rejection-equiprobable regions. P(X,Y|R) gives the distributions for those regions, which are consistent for both training and test. P(R) varies to account for rejection in training.
[Figure: two graphical models, one over X, Y and V, the other over X, Y and R.]
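The correspondence can be written out as follows. This is a sketch in assumed notation, not taken from the slides: v = 1 marks an accepted sample, the subscripts tr and te denote training and test, the test data are taken to be unselected, and the acceptance probability is constant within each region r.

```latex
\begin{align}
P_{\mathrm{tr}}(x, y) &= \sum_{r} P(x, y \mid r)\, P_{\mathrm{tr}}(r), \\
P_{\mathrm{te}}(x, y) &= \sum_{r} P(x, y \mid r)\, P_{\mathrm{te}}(r), \\
P_{\mathrm{tr}}(r) &= \frac{P(v = 1 \mid r)\, P_{\mathrm{te}}(r)}
                         {\sum_{r'} P(v = 1 \mid r')\, P_{\mathrm{te}}(r')}.
\end{align}
```

Because acceptance does not depend on (x, y) once r is known, conditioning on v = 1 leaves P(x, y | r) unchanged, so only the region proportions differ between the two datasets.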
![Page 14: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/14.jpg)
modelling source component shift
[Figure: model structure with regressors P1(y|x), P2(y|x), input densities P11(x), P12(x), P13(x), P21(x), P22(x), P23(x), and data nodes D1i, D2i, T1i indexed by i.]
![Page 15: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/15.jpg)
EM for source component shift
Effectively a Gaussian mixture model with shared components, and different priors.
Can use the EM algorithm:
• Compute responsibilities for components
• Learn parameters of Gaussians
• Learn parameters for regressors
All subject to constraints on which data points can be generated from which model.
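The EM loop can be sketched for a deliberately reduced case: a one-dimensional Gaussian mixture over the inputs only, with components shared across datasets and a separate set of mixing proportions per dataset. The regressors and the generation constraints mentioned above are omitted, and the synthetic data, initialisation and function names are assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_shared_components(datasets, means, variances, n_iter=50):
    """EM for Gaussians shared across datasets, with per-dataset priors."""
    means, variances = list(means), list(variances)
    k = len(means)
    weights = [[1.0 / k] * k for _ in datasets]
    for _ in range(n_iter):
        # E-step: responsibilities, using each dataset's own prior.
        resp = []
        for w, xs in zip(weights, datasets):
            rows = []
            for x in xs:
                joint = [w[j] * normal_pdf(x, means[j], variances[j])
                         for j in range(k)]
                total = sum(joint)
                rows.append([p / total for p in joint])
            resp.append(rows)
        # M-step: mixing proportions are re-estimated separately per dataset.
        weights = [[sum(row[j] for row in rows) / len(rows) for j in range(k)]
                   for rows in resp]
        # M-step: Gaussian parameters are re-estimated from all datasets pooled.
        for j in range(k):
            pairs = [(row[j], x) for rows, xs in zip(resp, datasets)
                     for row, x in zip(rows, xs)]
            mass = sum(r for r, _ in pairs)
            means[j] = sum(r * x for r, x in pairs) / mass
            variances[j] = max(
                sum(r * (x - means[j]) ** 2 for r, x in pairs) / mass, 1e-6)
    return weights, means, variances


# Synthetic inputs: two sources near 0 and 10 whose proportions differ
# between the training set (80/20) and the test set (30/70).
rng = random.Random(0)


def draw(n_low, n_high):
    return ([rng.gauss(0.0, 1.0) for _ in range(n_low)]
            + [rng.gauss(10.0, 1.0) for _ in range(n_high)])


train_x = draw(80, 20)
test_x = draw(30, 70)
pooled = train_x + test_x
centre = sum(pooled) / len(pooled)
spread = sum((x - centre) ** 2 for x in pooled) / len(pooled)

weights, means, variances = em_shared_components(
    [train_x, test_x], [min(pooled), max(pooled)], [spread, spread])
print("training proportions:", weights[0])
print("test proportions:    ", weights[1])
print("shared means:        ", means)
```

The recovered proportions differ between the two datasets while the component parameters are common to both, which is the structure the slide describes.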
![Page 16: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/16.jpg)
![Page 17: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/17.jpg)
tests
1D linear, sample from prior form, BIC model selection, 100 tests.
![Page 18: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/18.jpg)
tests
4D nonlinear, auto-mpg data, Gaussian process regressors, BIC.
Trained on one origin of car
Tested on 2 other origins
![Page 19: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/19.jpg)
issues
Single training dataset
No targets for new domain
Semi-supervised: a few target values might help to distinguish between different potential shift models.
Dataset shift Transfer Learning
![Page 20: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/20.jpg)
from here...
Transference: dealing with the more general problem of multiple datasets and multiple domains.
• Topic modelling and multilevel topic modelling
• What is a domain or dataset anyway? Structured data.
• More general than regression. Varying fields. Missing data. Semi-supervised learning.
• Characterising the general case.
Mixtures and mixing
Dataset production
Non-parametric methods and local minima reduction
![Page 21: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/21.jpg)
interim
Transference is really structure modelling.
Dataset shift implies unsupervised learning!
Using conditional models implies a particular full generative model under dataset shift scenarios.
But in unsupervised learning people have been dealing with dataset shift for a long time… by modelling for it.
e.g. intra- versus inter-subject variability.
In real life, modelling for the variability is the most common approach. Never simple.
![Page 22: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/22.jpg)
Diffusion Tensor Imaging
Brain MRI imaging technique looking at the anisotropy of water diffusion in the brain.
![Page 23: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/23.jpg)
the white matter
![Page 24: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/24.jpg)
diffusion tensor
The diffusion of water at each voxel is commonly modelled as a three-dimensional, second-order tensor, D.
Think of it as an ellipsoid with some principal direction.
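The ellipsoid picture can be made concrete with a short sketch. It computes the principal direction of a symmetric 3×3 tensor by power iteration, and fractional anisotropy (FA), a standard scalar summary of how elongated the ellipsoid is, from the trace and squared Frobenius norm. The example tensors are hypothetical, and the power iteration assumes a positive semi-definite tensor with a unique largest eigenvalue.

```python
import math


def fractional_anisotropy(D):
    """FA = sqrt(3/2) * ||lambda - mean(lambda)|| / ||lambda||, computed from
    trace(D) (sum of eigenvalues) and the squared Frobenius norm of D
    (sum of squared eigenvalues, since D is symmetric)."""
    trace = D[0][0] + D[1][1] + D[2][2]
    frob2 = sum(D[i][j] ** 2 for i in range(3) for j in range(3))
    if frob2 == 0.0:
        return 0.0
    return math.sqrt(max(0.0, 1.5 * (frob2 - trace ** 2 / 3.0) / frob2))


def principal_direction(D, n_iter=200):
    """Unit eigenvector of the largest eigenvalue, by power iteration."""
    v = [1.0, 1.0, 1.0]
    for _ in range(n_iter):
        w = [sum(D[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


# Hypothetical tensors: one elongated along the first axis, one isotropic.
elongated = [[3.0, 0.0, 0.0],
             [0.0, 1.0, 0.0],
             [0.0, 0.0, 1.0]]
isotropic = [[1.0, 0.0, 0.0],
             [0.0, 1.0, 0.0],
             [0.0, 0.0, 1.0]]

fa_elongated = fractional_anisotropy(elongated)
fa_isotropic = fractional_anisotropy(isotropic)
direction = principal_direction(elongated)
print(fa_elongated, fa_isotropic, direction)
```

An isotropic tensor (a sphere) has FA of zero, and FA grows towards one as diffusion concentrates along a single direction.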
![Page 25: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/25.jpg)
The problem
“White matter integrity” matters in studies of ageing.
But to study white matter integrity, we have to compare across subjects, and within subjects.
But subjects' brains are different anyway.
Need to account for shifts between brains in mapping results.
Use diffusion tensor imaging. Currently: use fractional anisotropy (FA).
![Page 26: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/26.jpg)
Tractography
Would like to combine local direction components into consistent “tracts”.
But the measurements are noisy…
Set up a Markov Random Field
![Page 27: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/27.jpg)
Behrens et al
And then sample streamlines from the random field. Can either work with streamline samples, or compute marginals: P(tract goes through X | same tract goes through SEED).
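Given a set of sampled streamlines from one seed, the marginal can be approximated by the fraction of streamlines that visit each voxel. The sketch below assumes streamlines are already discretised into lists of voxel coordinates; the data and the function name are hypothetical.

```python
def visit_probability(streamlines):
    """Monte Carlo estimate of P(tract goes through voxel | tract goes through seed)."""
    counts = {}
    for line in streamlines:
        # Count each voxel at most once per streamline.
        for voxel in set(line):
            counts[voxel] = counts.get(voxel, 0) + 1
    n = len(streamlines)
    return {voxel: count / n for voxel, count in counts.items()}


# Four hypothetical streamlines, all started from the seed voxel (0, 0, 0).
streamlines = [
    [(0, 0, 0), (1, 0, 0), (2, 0, 0)],
    [(0, 0, 0), (1, 0, 0), (2, 1, 0)],
    [(0, 0, 0), (1, 1, 0), (2, 1, 0)],
    [(0, 0, 0), (1, 0, 0), (2, 0, 0)],
]

marginals = visit_probability(streamlines)
for voxel in sorted(marginals):
    print(voxel, marginals[voxel])
```

The seed voxel always has probability one, and the estimate for every other voxel improves as more streamlines are sampled.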
![Page 28: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/28.jpg)
Seed points as hypotheses
Single seed point is more specific than a seeding region
But tract reconstruction is highly sensitive to seed placement
Neighbourhood tractography (NT) treats a group of “candidate” seed points as hypotheses
Uses tract shape and length to find best resulting match to a reference tract
Clayden et al., NeuroImage, 2006
![Page 29: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/29.jpg)
Bayesian model comparison
Given some reference tract from one brain.
Is this tract in a second brain the same tract as the reference tract?
Compare P(tract) with P(tract|reference tract)
but
Want consistency! The reference tract is just any other tract. Need a model with
$$P(\mathrm{tract}) = \int P(\mathrm{tract} \mid \mathrm{reftract})\, P(\mathrm{reftract})\, d\,\mathrm{reftract}$$
![Page 30: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/30.jpg)
Model choice
Model Comparison or Model choice?
In fact we have a number of candidate matches.
Presume at most one is right. Could be that none match.
Compute P(this is right match).
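A sketch of the resulting calculation, assuming each candidate has a log-likelihood under a "matched" and an "unmatched" model, and that the hypotheses are "candidate i is the match, all others unmatched" plus "none match". Under those assumptions the posterior for candidate i is proportional to its prior times its likelihood ratio. The prior on "none match" and the example numbers are hypothetical.

```python
import math


def match_posteriors(log_matched, log_unmatched, prior_none=0.5):
    """Posterior over hypotheses: index 0 is "none match", index i >= 1 is
    "candidate i is the single right match". The remaining prior mass is
    spread evenly over the candidates."""
    n = len(log_matched)
    prior_each = (1.0 - prior_none) / n
    # The unmatched likelihood of every candidate is common to all
    # hypotheses, so only the log-likelihood ratios matter.
    log_scores = [math.log(prior_none)] + [
        math.log(prior_each) + lm - lu
        for lm, lu in zip(log_matched, log_unmatched)
    ]
    top = max(log_scores)  # subtract the maximum for numerical stability
    scores = [math.exp(s - top) for s in log_scores]
    total = sum(scores)
    return [s / total for s in scores]


# Three hypothetical candidates; only the first fits the matched model
# better than the unmatched one (likelihood ratio of 8).
log_matched = [math.log(8.0), 0.0, 0.0]
log_unmatched = [0.0, 0.0, 0.0]

posteriors = match_posteriors(log_matched, log_unmatched, prior_none=0.25)
print("P(none match):", posteriors[0])
print("P(candidate i is right):", posteriors[1:])
```

Treating it as a choice among mutually exclusive hypotheses means the posteriors sum to one across candidates and "none", so a weak best candidate is reported as weak rather than being selected by default.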
![Page 31: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/31.jpg)
median tract, spline fit
Work with streamlines. Reduce to a median tract.
Fit a B-spline to the 3D median tract.
Adjust knot point positions to constrain error on the reference tract.
[Figure: median tract with the seed point marked.]
![Page 32: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/32.jpg)
Two models:
P(cos[]) and
P(cos[], cos[r] | cos[]) = P(cos[]) P(cos[r] | cos[], cos[])
Derive the second from the assumption that v1* is symmetric about v1.
[Figure: vectors v0 and v1.]
![Page 33: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/33.jpg)
model
cos() is uniform if the direction is uniform on the unit sphere.
Use a Beta distribution + uniform component to model probabilities. Compute using hand-labelled training data.
Model the whole tract as a product of individual step probabilities.
2 cases: unmatched, matched.
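Both statements can be illustrated with a short sketch. The first part checks empirically that the cosine of the angle to a fixed axis is uniform on [-1, 1] when directions are uniform on the sphere. The second scores a tract as a sum of log step densities under a Beta-plus-uniform mixture. Mapping the cosine to [0, 1] via (cos + 1) / 2, and the Beta parameters and mixture weight, are assumptions for illustration; in the talk the parameters are learnt from hand-labelled data.

```python
import math
import random


def random_direction(rng):
    """Uniform direction on the unit sphere via a normalised Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def beta_pdf(u, a, b):
    """Beta(a, b) density on [0, 1]; this sketch assumes a >= 1 and b >= 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm) * u ** (a - 1.0) * (1.0 - u) ** (b - 1.0)


def step_density(cos_angle, a, b, weight):
    """Mixture of a Beta component and a uniform component on u in [0, 1]."""
    u = 0.5 * (cos_angle + 1.0)   # assumed mapping from [-1, 1] to [0, 1]
    return weight * beta_pdf(u, a, b) + (1.0 - weight) * 1.0


def tract_log_prob(cos_angles, a=5.0, b=1.0, weight=0.8):
    """Whole tract as a product of step probabilities (sum of logs)."""
    return sum(math.log(step_density(c, a, b, weight)) for c in cos_angles)


# Part 1: cosine to the z-axis for uniformly random directions.
rng = random.Random(0)
cosines = [random_direction(rng)[2] for _ in range(20000)]
mean_cos = sum(cosines) / len(cosines)
frac_below_half = sum(1 for c in cosines if c < 0.5) / len(cosines)
print(mean_cos, frac_below_half)  # near 0 and near 0.75 for a uniform law

# Part 2: a well-aligned tract scores higher than a poorly aligned one.
aligned = tract_log_prob([0.95, 0.9, 0.99])
misaligned = tract_log_prob([0.0, -0.3, 0.5])
print(aligned, misaligned)
```

The uniform component keeps the density strictly positive for every step, so a single badly aligned step lowers a tract's score without driving it to zero.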
![Page 34: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/34.jpg)
results
![Page 35: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/35.jpg)
match quality
Posterior probabilities for the second and third subjects:
1: 0.332 2: 0.344 3: 0.822 4: 0.588 5: 0.877
For the first subject, the best match (top): 0.464, next best (middle): 0.116.
Three tracts >0.1, five >0.05 (all plausible matches). This is out of 220 candidate seeds. The posterior for the “central seed” (bottom) was 5.28 × 10^-6.
![Page 36: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/36.jpg)
Use match
Now we can compare like with like across brains: compute tract integrity measures.
Major improvement in comparative results.
Clayden J.D., A.J. Storkey, S. Muñoz Maniega & M.E. Bastin (2009). Reproducibility of tract segmentation between sessions using an unsupervised modelling-based approach. NeuroImage 45:377-385.
Bastin M.E., J.P. Piatkowski, A.J. Storkey, L.J. Brown, A.M. MacLullich & J.D. Clayden (2008). Tract shape modelling provides evidence of topological change in corpus callosum genu during normal ageing. NeuroImage 43:20-28.
Bastin M.E. , S. Muñoz Maniega, K.J. Ferguson, L.J. Brown, J.M. Wardlaw, A.M. MacLullich & J.D. Clayden (2010). Quantifying the effects of normal ageing on white matter structure using unsupervised tract shape modelling. NeuroImage 51(1):1-10.
Penke L., S. Muñoz Maniega, L.M. Houlihan, C. Murray, A.J. Gow, J.D. Clayden, M.E. Bastin, J.M. Wardlaw & I.J. Deary (2010). White matter integrity in the splenium of the corpus callosum is related to successful cognitive aging and partly mediates the protective effect of an ancestral polymorphism in ADRB2. Behavior Genetics 40(2):146-156.
![Page 37: Amos Storkey, School of Informatics. when training and test distributions are different characterising learning transfer](https://reader035.vdocuments.site/reader035/viewer/2022070413/5697bfd91a28abf838caf9ac/html5/thumbnails/37.jpg)
Conclusions
Dataset shift happens all the time
There are some common generic causes
Modelling involves a full generative understanding.
In many realistic scenarios accommodating shifts is non-trivial.
Model for likely changes.