python for brain mining:
TRANSCRIPT
![Page 1: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/1.jpg)
Python for brain mining:(neuro)science with state of the art machine learningand data visualization
Gael Varoquaux
1. Data-driven science“Brain mining”
2. Data mining in PythonMayavi, scikit-learn, joblib
![Page 2: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/2.jpg)
1 Brain miningLearning models of brain function
Gael Varoquaux 2
![Page 3: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/3.jpg)
1 Imaging neuroscience
Brainimages
Models offunction
Cognitive tasks
Gael Varoquaux 3
![Page 4: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/4.jpg)
1 Imaging neuroscience
Brainimages
Models offunction
Cognitive tasks
Data-driven science
i ~ ∂
∂t Ψ = HΨ
Gael Varoquaux 3
![Page 5: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/5.jpg)
1 Brain functional data
Rich data50 000 voxels perframeComplex underlyingdynamics
Few observations∼ 100
Drawing scientific conclusions?Ill-posed statistical problem
Gael Varoquaux 4
![Page 6: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/6.jpg)
1 Brain functional data
Rich data50 000 voxels perframeComplex underlyingdynamics
Few observations∼ 100
Drawing scientific conclusions?Ill-posed statistical problem
Modern complex system studies:from strong hypothesizes to rich data
Gael Varoquaux 4
![Page 7: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/7.jpg)
1 Statistics: the curse of dimensionalityy function of x1
y function of x1 and x2
More fit parameters?⇒ need exponentially more data
y function of 50 000 voxels?Expert knowledge
(pick the right ones) Machine learning
Gael Varoquaux 5
![Page 8: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/8.jpg)
1 Statistics: the curse of dimensionalityy function of x1 y function of x1 and x2
More fit parameters?⇒ need exponentially more data
y function of 50 000 voxels?Expert knowledge
(pick the right ones) Machine learning
Gael Varoquaux 5
![Page 9: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/9.jpg)
1 Statistics: the curse of dimensionalityy function of x1 y function of x1 and x2
More fit parameters?⇒ need exponentially more data
y function of 50 000 voxels?Expert knowledge
(pick the right ones) Machine learning
Gael Varoquaux 5
![Page 10: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/10.jpg)
1 Brain readingPredict from brain images the object viewed
Correlation analysis
Gael Varoquaux 6
![Page 11: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/11.jpg)
1 Brain readingPredict from brain images the object viewed
Inverse problem
Observations Spatial code
Correlation analysis
Inject prior: regularize
Sparse regression= compressive sensing
Gael Varoquaux 6
![Page 12: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/12.jpg)
1 Brain readingPredict from brain images the object viewed
Inverse problem
Observations Spatial code
Correlation analysis
Inject prior: regularize
Extract brain regionsTotal variationregression
[Michel, Trans Med Imag 2011]Gael Varoquaux 6
![Page 13: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/13.jpg)
1 Brain readingPredict from brain images the object viewed
Inverse problem
Observations Spatial code
Correlation analysis
Inject prior: regularize
Extract brain regionsTotal variationregression
[Michel, Trans Med Imag 2011]
Cast the problem in a prediction task:supervised learning.
Prediction is a model-selection metric
Gael Varoquaux 6
![Page 14: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/14.jpg)
1 On-going/spontaneous activity
95% of theactivity is
unrelated to task
Gael Varoquaux 7
![Page 15: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/15.jpg)
1 Learning regions from spontaneous activity
Multi-subject dictionarylearning
SpatialmapTime series
Sparsity + spatial continuity + spatial variability
⇒ Individual maps + functional regions atlas
[Varoquaux, Inf Proc Med Imag 2011]
Gael Varoquaux 8
![Page 16: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/16.jpg)
1 Graphical models: interactions between regions
Estimate covariance structureMany parameters to learn
Regularize: conditionalindependence= sparsity on inverse
covariance
[Varoquaux NIPS 2010]
Gael Varoquaux 9
![Page 17: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/17.jpg)
1 Graphical models: interactions between regions
Estimate covariance structureMany parameters to learn
Regularize: conditionalindependence= sparsity on inverse
covariance
[Varoquaux NIPS 2010]
Find structure via a density estimation:unsupervised learning.
Model selection: likelihood of new data
Gael Varoquaux 9
![Page 18: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/18.jpg)
2 My data-science software stackMayavi, scikit-learn, joblib
Gael Varoquaux 10
![Page 19: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/19.jpg)
2 Mayavi: 3D data visualizationRequirements
large 3D datainteractive visualizationeasy scripting
SolutionVTK: C++ data visualizationUI (traits)
+ pylab-inspired API
Black-box solutions don’t yield new intuitions
Limitationshard to installclunky & complexC++ leaking through3D visualization doesn’tpay in academia
Tragedy of thecommons or
niche product?
Gael Varoquaux 11
![Page 20: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/20.jpg)
2 scikit-learn: statistical learningVision
Address non-machine-learning expertsSimplify but don’t dumb downPerformance: be state of the artEase of installation
Gael Varoquaux 12
![Page 21: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/21.jpg)
2 scikit-learn: statistical learningTechnical choices
Prefer Python or Cython, focus on readabilityDocumentation and examples are paramountLittle object-oriented design. Opt for simplicityPrefer algorithms to frameworkCode quality: consistency and testing
Gael Varoquaux 13
![Page 22: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/22.jpg)
2 scikit-learn: statistical learningAPI
Inputs are numpy arraysLearn a model from the data:estimator.fit(X train, Y train)
Predict using learned modelestimator.predict(X test)
Test goodness of fitestimator.score(X test, y test)
Apply change of representationestimator.transform(X, y)
Gael Varoquaux 14
![Page 23: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/23.jpg)
2 scikit-learn: statistical learningComputational performance
scikit-learn mlpy pybrain pymvpa mdp shogunSVM 5.2 9.47 17.5 11.52 40.48 5.63LARS 1.17 105.3 - 37.35 - -Elastic Net 0.52 73.7 - 1.44 - -kNN 0.57 1.41 - 0.56 0.58 1.36PCA 0.18 - - 8.93 0.47 0.33k-Means 1.34 0.79 ∞ - 35.75 0.68
Algorithms rather than low-level optimizationconvex optimization + machine learning
Avoid memory copies
Gael Varoquaux 15
![Page 24: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/24.jpg)
2 scikit-learn: statistical learningCommunity
35 contributors since 2008, 103 github forks25 contributors in latest release (3 months span)
Why this success?Trendy topic?Low barrier of entryFriendly and very skilled mailing listCredit to people
Gael Varoquaux 16
![Page 25: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/25.jpg)
2 joblib: Python functions on steroidsWe keep recomputing the same things
Nested loops with overlapping sub-problemsVarying parameters I/O
Standard solution:pipelines
ChallengesDependencies modelingParameter tracking
Gael Varoquaux 17
![Page 26: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/26.jpg)
2 joblib: Python functions on steroids
PhilosophySimple don’t change your codeMinimal no dependenciesPerformant big dataRobust never fail
joblib’s solution = lazy recomputation:Take an MD5 hash of function arguments,Store outputs to disk
Gael Varoquaux 18
![Page 27: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/27.jpg)
2 joblibLazy recomputing
>>> from j o b l i b import Memory>>> mem = Memory( c a c h e d i r =’/tmp/ joblib ’)>>> import numpy as np>>> a = np. v a n d e r (np. a r a n g e (3))>>> s q u a r e = mem. cache (np. s q u a r e )>>> b = s q u a r e (a)
[Memory] C a l l i n g s q u a r e ...s q u a r e ( a r r a y ([[0 , 0, 1],
[1, 1, 1],[4, 2, 1]]))
s q u a r e - 0.0 s>>> c = s q u a r e (a)>>> # No recomputation
Gael Varoquaux 19
![Page 28: Python for brain mining:](https://reader031.vdocuments.site/reader031/viewer/2022021120/6206071acf456418c32f16ff/html5/thumbnails/28.jpg)
Conclusion
Data-driven science will needmachine learning because of thecurse of dimensionality
Scikit-learn and joblib: focus onlarge-data performance and easy ofuse
Cannot develop software and science separatelyGael Varoquaux 20