introduction to machine learning for bioinformatics › dotclear › public › lectures ›...
TRANSCRIPT
![Page 1: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/1.jpg)
Introduction to Machine Learning for Bioinformatics
S1133 2019
ChloéAgathe AzencottCentre for Computational Biology, Mines ParisTech
![Page 2: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/2.jpg)
Overview● What kind of problems can machine learning solve,
particularly in bioinformatics?● Some popular supervised ML algorithms:
– Linear models– Support vector machines– Random forests– Neural networks
● How do we select a machine learning algorithm?● What is overfitting, and how can we avoid it?● How do we deal with highdimensional data?
![Page 3: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/3.jpg)
3
What is (Machine) Learning?
![Page 4: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/4.jpg)
4
Why Learn?● Learning:
Aquire a skill from experience, practice.● Machine learning: Programming computers to
– model phenomena– by means of optimizing an objective function– using example data.
![Page 5: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/5.jpg)
5
Why Learn?● There is no need to “learn” to calculate payroll.● Learning is used when
– Human expertise does not exist (bioinformatics);– Humans are unable to explain their expertise (speech
recognition, computer vision).
Classical program
Machine learningprogram
Answers
Rules
Data
Rules
Data
Answers
![Page 6: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/6.jpg)
6
What about AI?
![Page 7: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/7.jpg)
7
Artificial Intelligence
● Weak AI: a software or machine focused on one narrow task
E.g. Siri (Apple), DeepBlue (IBM), AlphaGo (Alphabet), Alexa (Amazon).
● Strong AI: a machine – able to address any problem
– possessing consciousness, sentience, a mind of its own
– always active and interacting with the world
E.g. Hal (Clarke), Skynet (Cameron), Rachael (Dick), Samantha (Jonze).
![Page 8: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/8.jpg)
8
Artificial Intelligence
ML is a component of AI.
So are robotics, data bases, language processing...
![Page 9: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/9.jpg)
9
Zoo of ML Problems
![Page 10: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/10.jpg)
10
Unsupervised learning
Data
Images, text, measurements, omics data...
ML algo Data!
Learn a new representation of the data
X
p
n
![Page 11: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/11.jpg)
11
Clustering
Data
Group similar data points together
– Understand general characteristics of the data;– Infer some properties of an object based on
how it relates to other objects.
ML algo
![Page 12: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/12.jpg)
12
Clustering: applications– Customer segmentation
Find groups of customers with similar buying behavior.
– Topic modelingFind groups of documents with similar content (and thus, hopefully, topics).
– Gene expression clustering– Disease subtyping
Find groups of patients with similar molecular basis for the disease / similar symptomes.
– Tumor cell clusteringUntangle tumor heterogeneity.
![Page 13: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/13.jpg)
13
Dimensionality reduction
Data ML algo
Find a lowerdimensional representation
Data
X
p
n
Images, text, measurements, omics data... X
m
n
![Page 14: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/14.jpg)
14
Dimensionality reduction
Data
Find a lowerdimensional representation
– Reduce storage space & computational time– Remove redundances– Visualization (in 2 or 3 dimensions) and interpretability.
ML algo Data
![Page 15: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/15.jpg)
15
Supervised learning
Data
ML algo
Make predictions
Labels
Predictor
decision function
X
p
n yn
![Page 16: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/16.jpg)
16
ClassificationMake discrete predictions
Binary classification
Multiclass classification
Data
ML algo
Labels
Predictor
![Page 17: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/17.jpg)
17
Classification
gene 1
Healthy
Sickg
ene
2
![Page 18: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/18.jpg)
18
Classification: Applications– Face recognition – Handwritten character recognition
– What is the function of this gene?– Is this DNA sequence a microRNA?– Does this blood sample come from a diseased
or a healthy individual?– Is this drug appropriate for this patient?– What side effects could this new drug have?
![Page 19: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/19.jpg)
19
RegressionMake continuous predictions
Data
ML algo
Labels
Predictor
![Page 20: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/20.jpg)
20
Regression
gene 1
ge
ne 2
![Page 21: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/21.jpg)
21
Regression: Applications
– Click predictionHow many people will click on this ad? Comment on this post? Share this article on social media?
– Load predictionHow many users will my service have at a given time?
– Gene expression regulation: how much of this gene will be expressed?
– When will this patient relapse?– Drug efficacy: how well does this drug work on this tumor?– What is the binding affinity between these two molecules?– How soluble is this chemical in water?
![Page 22: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/22.jpg)
22
Parametric models● Decision function has a set form● Model complexity ≈ number of parameters
● Decision function can have “arbitrary” form● Model complexity grows with the number of
samples.
Nonparametric models
![Page 23: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/23.jpg)
23
Training a supervised model● Ingredients:
– Data
– Hypotheses classType of decision function f
– Cost functionError of f on a sample
● Recipe: empirical risk minimzation (ERM)Find, among all functions of the hypotheses class, one that minimizes the average error of f on all samples.
![Page 24: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/24.jpg)
24
Linear models
![Page 25: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/25.jpg)
25
Linear regression
● Leastsquares fit:
● Equivalent to maximizing the likelihood under the assumption of Gaussian noise
● Exact solution if X has full column rank
![Page 26: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/26.jpg)
26
Classification: logistic regressionLinear function probability→
● Solve by maximizing the likelihood● No analytical solution● Use gradient descent.
![Page 27: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/27.jpg)
27
Support Vector Machines
![Page 28: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/28.jpg)
28
Large margin classifier● Find the separating hyperplane with the largest
margin.
f(x)
prediction errorinverse of the margin
![Page 29: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/29.jpg)
29
When lines are not enough
![Page 30: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/30.jpg)
30
When lines are not enough
![Page 31: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/31.jpg)
31
When lines are not enough
● Nonlinear mapping to a feature space
![Page 32: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/32.jpg)
32
When lines are not enough● Nonlinear mapping to a space of higher
dimension.● https://www.youtube.com/watch?v=3liCbRZPrZA
● Example:
![Page 33: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/33.jpg)
33
The kernel trick● The solution & SVMsolving algorithm can be
expressed using only● Never need to explicitly compute
● k: kernel
– must be positive semidefinite – can be interpreted as a similarity function.
● Support vectors: training points for which α ≠ 0.
![Page 34: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/34.jpg)
34
RBF kernel● Radial Basis Function kernel, or Gaussian kernel
● The feature space is infinitedimensional.
![Page 35: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/35.jpg)
35
Random Forests
![Page 36: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/36.jpg)
36
Decision trees
● Well suited to categorical features● Naturally handle multiclass classification● Interpretable● Perform poorly.
Color?
Shape?Size?
Apple BananaGrape Apple
YellowGreen
BigSmall Curved Ball
![Page 37: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/37.jpg)
37
Ensemble learning● Combining weak learners averages out their
individual errors (wisdom of crowds)● Final prediction:
– Classification: majority vote– Regression: average.
● Bagging: weak learners are trained on bootstraped samples of the data (Breiman, 1996).
bootstrap: sample n, with replacement.● Boosting: weak learners are built iteratively, based on
performance (Shapire, 1990).
![Page 38: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/38.jpg)
38
Random forests● Combine many decision trees
● Each tree is trained on a data set created using
– A bootstrap sample (sample with replacement) of the data– A random sample of the features.
● Very powerful in practice.
Majority vote
![Page 39: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/39.jpg)
39
(Deep) Neural Networks
![Page 40: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/40.jpg)
40
(Deep) neural networks
...
... hidden layer
● Nothing more than a (possibly complicated) parametric model
![Page 41: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/41.jpg)
41
(Deep) neural networks
...
... hidden layer
● Fitting weights:– Nonconvex optimization problem– Solved with gradient descent– Can be difficult to tune.
![Page 42: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/42.jpg)
42
(Deep) neural networks
...
... hidden layer
● Learn an internal representation of the data on which a linear model works well.
● Currently one of the most powerful supervised learning algorithms for large training sets.
![Page 43: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/43.jpg)
43
(Deep) neural networks
Yann Le Cun et al. (1990)
Internal representation of the digits data
![Page 44: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/44.jpg)
44
Adversarial examples
panda gibbon
Goodfellow et al. ICLR 2015https://arxiv.org/pdf/1412.6572v3.pdf
![Page 45: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/45.jpg)
45
October 14Deep learning for image recognition
Source: DOI: 10.1109/ICPR.2016.7900002
![Page 46: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/46.jpg)
46
Generalization & overfitting
![Page 47: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/47.jpg)
47
Generalization● Goal: build models that make good predictions
on new data.● Models that work “too well” on the data we learn
on tend to model noise as well as the underlying phenomenon: overfitting.
![Page 48: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/48.jpg)
48
Overfitting (Classification)
![Page 49: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/49.jpg)
49
Overfitting (Regression)
Overfitting
Underfitting
![Page 50: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/50.jpg)
50
Model complexitySimple models are
– more plausible (Occam’s Razor)– easier to train, use, and interpret.
Pre
dic
tion
err
or
Model complexity
On training data
On new data
![Page 51: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/51.jpg)
51
● Prevent overfitting by designing an objective function that accounts not only for the prediction error but also for model complexity.
min (empirical_error + λ*model_complexity)
● Rember the SVM
Regularization
f(x)
prediction errorinverse of the margin
![Page 52: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/52.jpg)
52
Ridge regression
● Unique solution, always exists
● Grouped selection:Correlated variables get similar weights.
● ShrinkageCoefficients shrink towards 0.
prediction error regularizer
hyperparameter
![Page 53: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/53.jpg)
53
Geometry of ridge regression
feasible region
unconstrained minimum
![Page 54: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/54.jpg)
54
Model selection & evaluation
![Page 55: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/55.jpg)
55
No free lunch theorem
● Any two optimization algorithms are equivalent when their performance is averaged over all possible problems.
● For any learner, there is a learning task that it will not learn well.
Wolpert & Macready, 1997
![Page 56: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/56.jpg)
56
Evaluation on heldout data● If we evaluate the model on the data we’ve used to
train it, we risk overestimating performance.
● Proper procedure:
– Separate the data in train/test sets
– Use a crossvalidation on the train set to find the best algorithm + hyperparameter(s)
– Train this best algorithm + hyperparameter(s) on the entire train set
– The performance on the test set estimates generalization performance.
![Page 57: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/57.jpg)
57
Evaluation on heldout data● If we evaluate the model on the data we’ve used to
train it, we risk overestimating performance.
● Proper procedure:
– Separate the data in train/test sets
– Use a crossvalidation on the train set to find the best algorithm + hyperparameter(s)
– Train this best algorithm + hyperparameter(s) on the entire train set
– The performance on the test set estimates generalization performance.
Train Test
![Page 58: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/58.jpg)
58
Evaluation on heldout data● If we evaluate the model on the data we’ve used to
train it, we risk overestimating performance.
● Proper procedure:
– Separate the data in train/test sets
– Use a crossvalidation on the train set to find the best algorithm + hyperparameter(s)
– Train this best algorithm + hyperparameter(s) on the entire train set
– The performance on the test set estimates generalization performance.
Val
ValTrain
Val
Val
Val
Train
Train
Train
Train
perf
perf
perf
perf
perf
CV score
Train Test
![Page 59: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/59.jpg)
59
Evaluation on heldout data● If we evaluate the model on the data we’ve used to
train it, we risk overestimating performance.
● Proper procedure:
– Separate the data in train/test sets
– Use a crossvalidation on the train set to find the best algorithm + hyperparameter(s)
– Train this best algorithm + hyperparameter(s) on the entire train set
– The performance on the test set estimates generalization performance.
Train Test
![Page 60: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/60.jpg)
60
Practical
![Page 61: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/61.jpg)
61
ML in high dimension
![Page 62: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/62.jpg)
62
“Big data” vs “fat data”
Challenges when p>>n:– Computational challenges (e.g. linear regression)– Curse of dimensionality– Higher risk of overfitting.
![Page 63: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/63.jpg)
63
Curse of dimensionality● Methods/intuitions that work in low dimension do
not necessarily apply in high dimension.● Hyperspace is very big and everything is far apart
Most points inside a cube are outside of the sphere inscribed in this cube.
● Dimensionality reduction:– Feature extraction– Feature selection.
r
![Page 64: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/64.jpg)
64
Feature extraction
● Project the data onto a lowerdimensional space● Creates new features → interpretability?● Matrix factorization techniques:
PCA & kPCA, factorial analysis, nonnegative matrix factorization.
● Manifold learning techniques
MDS, tSNE
● Autoencoders: neural networks
Goal: recover the input as well as possible.
![Page 65: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/65.jpg)
65
Principal Components Analysis● Find a space of dimension m s.t. the variance of
the data projected onto that space is maximal.
Projection on x1
Pro
jec
tion
on
x 2
![Page 66: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/66.jpg)
66
Principal Components Analysis● Find a space of dimension m s.t. the variance of
the data projected onto that space is maximal.● Find orthonormal directions s.t.
–
–
–
● These directions, called principal components are the eigenvectors of , sorted by decreasing eigenvalue.
![Page 67: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/67.jpg)
67
Principal Components Analysis
Novembre et al, Nature 2008
SNP data (Affymetrix 500K) for 1387 Europeans
![Page 68: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/68.jpg)
68
Supervised feature selectionKeep relevant features only
● Filter approaches: apply a statistical test to assign a score to each feature
● Wrapper approaches: use a greedy search to find the best set of features for a given ML algorithm
● Embedded approaches: fit a sparse model, i.e. that is encouraged to not use all the features.
![Page 69: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/69.jpg)
69
A p>>n simulation
p=1 000, n=100, 10 causal features.
![Page 70: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/70.jpg)
70
A p>>n simulation
p=1 000, n=100, 10 causal features.
![Page 71: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/71.jpg)
71
A p>>n simulation
Statistical test
![Page 72: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/72.jpg)
72
A p>>n simulation
Linear regression
![Page 73: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/73.jpg)
73
Lasso
feasible region
unconstrained minimum
prediction error regularizer
● Create sparse models: drive coefficients of β towards 0.
● No explicit solution, use gradient descent.
![Page 74: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/74.jpg)
74
A p>>n simulation
Lasso
![Page 75: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/75.jpg)
75
ElasticNet● Lasso tends to be unstable:
– Picks one of several correlated variables at random
– Different results on similar data sets
● ElasticNet: combine Ridge with Lasso
![Page 76: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/76.jpg)
76
ML Toolboxes● Python: scikitlearnhttp://scikit-learn.org
● R: Machine Learning Task Viewhttp://cran.r-project.org/web/views/MachineLearning.html
● Matlab™: Machine Learning with MATLABhttp://mathworks.com/machine-learning/index.html
– Statistics and Machine Learning Toolbox– Deep Learning Toolbox
● Deep learning software
– Theano http://deeplearning.net/software/theano/ – TensorFlow http://www.tensorflow.org/
– Caffe http://caffe.berkeleyvision.org/
– Keras https://keras.io/
![Page 77: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/77.jpg)
77
![Page 78: Introduction to Machine Learning for Bioinformatics › dotclear › public › lectures › s1133_2019 › ...2019/09/30 · Introduction to Machine Learning for Bioinformatics S1133](https://reader033.vdocuments.site/reader033/viewer/2022060210/5f047c597e708231d40e34c0/html5/thumbnails/78.jpg)
78
SummaryMachine learning =
data + model + objective function● Catalog:
– Supervised vs unsupervised– Parametric vs nonparametric– Linear models, SVMs, random forests, neural networks.
● Key concerns: – Avoid overfitting regularization⇒
– Measure generalization performance.● p >> n setting:
– Feature selection– Sparsity (Lasso, ElasticNet)