Introduction to Super Learning
Ted Westling, PhD
Postdoctoral Researcher
Center for Causal Inference
Perelman School of Medicine
University of Pennsylvania
September 25, 2018
Learning Goals
• Conceptual understanding of Super Learning (SL)
• Comfort with the SuperLearner R package
• Awareness of the mathematical backbone of SL
Outline
I. Motivation and description of SL (30 minutes)
II. Lab 1: Vanilla SL for a continuous outcome (30 minutes)
III. Mathematical presentation of SL (20 minutes)
IV. Lab 2: Vanilla SL for a binary outcome (30 minutes)
15 minute break
V. Bells and whistles: Screens, weights, and CV-SL (30 minutes)
VI. Lab 3: Binary outcome redux (40 minutes)
VII. Lab 4: Case-control analysis of Fluzone vaccine (30 minutes)
I. Motivation and description of Super Learning
Notation
• Y is a univariate outcome
• X is a p-variate set of predictors
• We observe n independent copies $(Y_1, X_1), \dots, (Y_n, X_n)$ from the joint distribution of $(Y, X)$.
The problem
• We want to estimate a function, e.g.:
– Conditional mean (regression) function
– Conditional quantile function
– Conditional density function
– Conditional hazard function
• Super Learning can be applied in all of the above settings
• We will focus on estimating the regression function $\mu(x) := E[Y \mid X = x]$.
Why?
1. Exploratory analysis
2. Imputation of missing values
3. Prediction for new observations
4. Assessing prediction quality/comparing competing estimators
5. Use as a nuisance parameter estimator
6. Confirmatory analysis/hypothesis testing (not our goal here)
We want to estimate $\mu(x) = E[Y \mid X = x]$.
How should we do it?
• GAM
• Random Forest
• Neural network
• GLM
How do we choose which algorithm to use?
Super Learning is:
An ensemble method for combining predictions from many candidate machine learning algorithms
Measuring algorithm performance
• Suppose $\hat\mu_1, \dots, \hat\mu_K$ are candidate estimators of $\mu$.
• $k$ will always index estimators, and $i$ will always index observations (e.g. study participants).
• The mean squared error of $\hat\mu_k$,
$$\mathrm{MSE}(\hat\mu_k) = E\left[(Y - \hat\mu_k(X))^2\right],$$
measures the performance of $\hat\mu_k$ as an estimator of $\mu$.
• If we knew $\mathrm{MSE}(\hat\mu_k)$, we could choose the $\hat\mu_k$ with the smallest $\mathrm{MSE}(\hat\mu_k)$.
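For intuition on why a smaller MSE corresponds to a better estimate of $\mu$, a standard decomposition can be worked out. This is a sketch that treats $\hat\mu_k$ as a fixed function (for example, one trained on data independent of $(Y, X)$); that assumption is added here and is not stated on the slide.

```latex
\begin{aligned}
\mathrm{MSE}(\hat\mu_k)
  &= E\left[\left(Y - \mu(X) + \mu(X) - \hat\mu_k(X)\right)^2\right] \\
  &= E\left[(Y - \mu(X))^2\right] + E\left[(\mu(X) - \hat\mu_k(X))^2\right].
\end{aligned}
```

The cross term vanishes because $E[Y - \mu(X) \mid X] = 0$. The first term does not depend on $k$, so under this assumption ranking candidates by MSE is the same as ranking them by their mean squared distance from $\mu$.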
Estimating MSE
$$\mathrm{MSE}(\hat\mu_k) = E\left[(Y - \hat\mu_k(X))^2\right]$$
• It is tempting to take $\widehat{\mathrm{MSE}}(\hat\mu_k) = \frac{1}{n}\sum_{i=1}^{n} [Y_i - \hat\mu_k(X_i)]^2$.
• This estimator will favor $\hat\mu_k$ which are overfit, because $\hat\mu_k$ are trained on the same data used to evaluate the MSE.
• Analogy: a student has the exam questions before taking the exam!
• Instead, we estimate MSE using cross-validation.
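The overfitting problem can be seen in a small simulation. The sketch below is illustrative only: the data-generating model (Y = X plus standard normal noise), the two candidates (a training-mean predictor and a 1-nearest-neighbor predictor), and all function names are assumptions made for this example, not part of the slides.

```python
import random

def fit_mean(xs, ys):
    """Candidate 1: ignore x and always predict the training mean of y."""
    ybar = sum(ys) / len(ys)
    return lambda x: ybar

def fit_1nn(xs, ys):
    """Candidate 2: predict the y of the closest training x (very flexible)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(predict, xs, ys):
    """Average squared error of a fitted predictor on the given data."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

def simulate(n, rng):
    """Assumed toy model: X ~ Uniform(0, 1), Y = X + N(0, 1) noise."""
    xs = [rng.random() for _ in range(n)]
    ys = [x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

rng = random.Random(2018)
x_train, y_train = simulate(200, rng)
x_new, y_new = simulate(2000, rng)  # fresh data, never used for fitting

candidates = {"mean": fit_mean, "1-NN": fit_1nn}
in_sample, fresh = {}, {}
for name, fit in candidates.items():
    predict = fit(x_train, y_train)
    in_sample[name] = mse(predict, x_train, y_train)  # same data: optimistic
    fresh[name] = mse(predict, x_new, y_new)          # data not used in fitting
    print(f"{name:>5}: in-sample MSE = {in_sample[name]:.3f}, "
          f"fresh-data MSE = {fresh[name]:.3f}")
```

In-sample, 1-NN reproduces every training outcome and so has an estimated MSE of zero (barring ties in X), which would make it look like the best candidate. On fresh data under this toy model it should do clearly worse than the simple mean.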
Cross-validation
1. Split the data into $V$ “folds” of size roughly $n/V$.
2. For each fold $v = 1, \dots, V$:
• the data in folds other than v is called the training set;
• the data in fold v is called the test/validation set.
Figure (panels Fold 1 to Fold 10, each drawn over data blocks 1 to 10): Schematic of 10-fold cross-validation. Gray: training sets. Yellow: validation sets.
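• we obtain $\hat\mu_{k,v}$ using the training set;
• we obtain $\hat\mu_{k,v}(X_i)$ for $X_i$ in the validation set $\mathcal{V}_v$.
3. Our cross-validated MSE is
$$\widehat{\mathrm{MSE}}_{CV}(\hat\mu_k) = \frac{1}{V}\sum_{v=1}^{V} \frac{1}{|\mathcal{V}_v|} \sum_{i \in \mathcal{V}_v} [Y_i - \hat\mu_{k,v}(X_i)]^2.$$
We average the MSEs of the V validation sets.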
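The three steps above can be sketched in code as follows. This is a minimal illustration, not the SuperLearner implementation: the helper names (`make_folds`, `cv_mse`), the two toy candidates, and the simulated data are assumptions introduced here. Picking the candidate with the smallest cross-validated MSE follows the selection idea from the earlier slide on measuring performance.

```python
import random

def fit_mean(xs, ys):
    """Toy candidate: always predict the training mean of y."""
    ybar = sum(ys) / len(ys)
    return lambda x: ybar

def fit_1nn(xs, ys):
    """Toy candidate: predict the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def make_folds(n, V, rng):
    """Step 1: randomly split indices 0..n-1 into V folds of size about n/V."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[v::V] for v in range(V)]

def cv_mse(fit, xs, ys, folds):
    """Steps 2 and 3: fit on each training set, score on its validation set,
    then average the V validation-set MSEs."""
    fold_mses = []
    for val_idx in folds:
        held_out = set(val_idx)
        train_idx = [i for i in range(len(ys)) if i not in held_out]
        predict = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        sq_errors = [(ys[i] - predict(xs[i])) ** 2 for i in val_idx]
        fold_mses.append(sum(sq_errors) / len(sq_errors))
    return sum(fold_mses) / len(fold_mses)

# Assumed toy data: X ~ Uniform(0, 1), Y = X + N(0, 1) noise.
rng = random.Random(2018)
n, V = 500, 10
xs = [rng.random() for _ in range(n)]
ys = [x + rng.gauss(0.0, 1.0) for x in xs]

folds = make_folds(n, V, rng)
cv_results = {name: cv_mse(fit, xs, ys, folds)
              for name, fit in {"mean": fit_mean, "1-NN": fit_1nn}.items()}
for name, value in cv_results.items():
    print(f"{name:>5}: cross-validated MSE = {value:.3f}")
best = min(cv_results, key=cv_results.get)
print("smallest cross-validated MSE:", best)
```

Because every prediction is made for an observation that was held out of the corresponding fit, the flexible 1-NN candidate no longer gets credit for memorizing its training data.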
Figure (panels Fold 1 to Fold 10 plus a final column labeled “CV preds.”, each drawn over data blocks 1 to 10): Schematic of 10-fold cross-validation. Gray: training sets. Yellow: validation sets.
How do we choose V?
• Large V :
– more training data, so better for small n
– more computation time
– well-suited to high-dimensional covariates
– well-suited to complicated or non-smooth µ
• Small V :
– more test data
– less computation time.
(People typically use V = 5 or V = 10.)
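The trade-off can be made concrete with a little arithmetic. The helper below is a hypothetical illustration: the name `cv_budget` and the parameter `K` (the number of candidate algorithms) are assumptions. It reports how many observations each fit trains on, how many it is validated on, and how many model fits V-fold cross-validation requires.

```python
def cv_budget(n, V, K=1):
    """Summarize V-fold cross-validation for n observations and K candidates.

    Fold sizes are floor(n/V) or ceil(n/V); each fit trains on the rest.
    """
    small, extra = divmod(n, V)      # `extra` folds get one more observation
    large = small + (1 if extra else 0)
    return {
        "validation_size": (small, large),        # min and max fold size
        "training_size": (n - large, n - small),  # min and max training size
        "model_fits": V * K,                      # one fit per fold per candidate
    }

for V in (2, 5, 10, 100):
    print(V, cv_budget(n=100, V=V, K=4))
```

With n = 100, moving from V = 5 to V = 10 raises the training-set size from 80 to 90 observations while doubling the number of model fits.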
17 / 48
![Page 52: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/52.jpg)
How do we choose V?
• Large V :
– more training data, so better for small n
– more computation time
– well-suited to high-dimensional covariates
– well-suited to complicated or non-smooth µ
• Small V :
– more test data
– less computation time.
(People typically use V = 5 or V = 10.)
17 / 48
![Page 53: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/53.jpg)
How do we choose V?
• Large V :
– more training data, so better for small n
– more computation time
– well-suited to high-dimensional covariates
– well-suited to complicated or non-smooth µ
• Small V :
– more test data
– less computation time.
(People typically use V = 5 or V = 10.)
17 / 48
![Page 60: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/60.jpg)
“Discrete” Super Learner
• At this point, we have cross-validated MSE estimates
$\widehat{\mathrm{MSE}}_{CV}(\hat\mu_1), \dots, \widehat{\mathrm{MSE}}_{CV}(\hat\mu_K)$
for each of our candidate algorithms.
• We could simply take as our estimator the $\hat\mu_k$ minimizing
these cross-validated MSEs.
• We call this the “discrete Super Learner”.
18 / 48
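As a rough illustration of the discrete Super Learner, the sketch below computes the cross-validated MSE of two toy candidates and keeps the one with the smaller value. It is written in plain Python under stated assumptions: the candidates `fit_mean` and `fit_line`, the simulated data, and the choice of V = 5 are all hypothetical and are not the library used in the lab.

```python
import random

def fit_mean(xs, ys):
    """Candidate 1: ignore x and predict the training mean of y."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate 2: least-squares line in a single covariate."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return lambda x: ybar + slope * (x - xbar)

def cv_mse(fit, xs, ys, folds):
    """Average over folds of the validation-set MSE of the fold-specific fit."""
    total = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum((ys[i] - predict(xs[i])) ** 2 for i in fold) / len(fold)
    return total / len(folds)

rng = random.Random(1)
xs = [rng.uniform(-1, 1) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]  # simulated data, truly linear
idx = list(range(len(xs)))
rng.shuffle(idx)
folds = [idx[v::5] for v in range(5)]           # V = 5

library = {"mean": fit_mean, "line": fit_line}
cv_risks = {name: cv_mse(fit, xs, ys, folds) for name, fit in library.items()}
discrete_sl = min(cv_risks, key=cv_risks.get)   # candidate with the smallest CV-MSE
print(cv_risks, "->", discrete_sl)
```

Because the simulated truth is linear, the line should win here; with a different data-generating process the selection could go the other way.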
![Page 63: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/63.jpg)
Super Learner
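• Let $\lambda = (\lambda_1, \dots, \lambda_K)$ be an element of $S_K$, the
$K$-dimensional simplex: each $\lambda_k \in [0,1]$ and $\sum_k \lambda_k = 1$.
• Super Learner considers as its set of candidate algorithms
all convex combinations $\hat\mu_\lambda := \sum_{k=1}^K \lambda_k \hat\mu_k$.
• The Super Learner is $\hat\mu_{\hat\lambda}$, where
$$\hat\lambda := \operatorname*{arg\,min}_{\lambda \in S_K} \widehat{\mathrm{MSE}}_{CV}\left(\sum_{k=1}^K \lambda_k \hat\mu_k\right).$$
(We use constrained optimization to compute the argmin.)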
19 / 48
![Page 66: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/66.jpg)
Super Learner
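$$\hat\lambda := \operatorname*{arg\,min}_{\lambda \in S_K} \widehat{\mathrm{MSE}}_{CV}\left(\sum_{k=1}^K \lambda_k \hat\mu_k\right),$$
$$\widehat{\mathrm{MSE}}_{CV}\left(\sum_{k=1}^K \lambda_k \hat\mu_k\right) = \frac{1}{V} \sum_{v=1}^V \frac{1}{|V_v|} \sum_{i \in V_v} \left[Y_i - \sum_{k=1}^K \lambda_k \hat\mu_{k,v}(X_i)\right]^2.$$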
20 / 48
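The objective on this slide depends on the data only through the cross-validated predictions, so it can be evaluated for any weight vector once those predictions are stored. Here is a small sketch in plain Python; the function name `cv_mse_combination` and the four-observation toy data are assumptions made up for illustration.

```python
def cv_mse_combination(lam, cv_preds, ys, folds):
    """CV-MSE of the convex combination sum_k lam[k] * mu_hat_k.

    cv_preds[i][k] is the prediction for observation i from algorithm k,
    trained without the fold that contains i.
    """
    if any(l < 0 for l in lam) or abs(sum(lam) - 1.0) > 1e-9:
        raise ValueError("lam must lie in the simplex")
    total = 0.0
    for fold in folds:
        sq_err = 0.0
        for i in fold:
            combo = sum(l * p for l, p in zip(lam, cv_preds[i]))
            sq_err += (ys[i] - combo) ** 2
        total += sq_err / len(fold)   # inner average over the validation fold
    return total / len(folds)         # outer average over the V folds

# Toy inputs (hypothetical): K = 2 algorithms, V = 2 folds, n = 4 observations.
ys = [1.0, 2.0, 3.0, 4.0]
cv_preds = [[0.0, 2.0], [2.0, 2.0], [2.0, 4.0], [4.0, 4.0]]
folds = [[0, 1], [2, 3]]

print(cv_mse_combination([1.0, 0.0], cv_preds, ys, folds))  # algorithm 1 alone: 0.5
print(cv_mse_combination([0.0, 1.0], cv_preds, ys, folds))  # algorithm 2 alone: 0.5
print(cv_mse_combination([0.5, 0.5], cv_preds, ys, folds))  # equal mix: 0.0
```

In this contrived example each algorithm alone has CV-MSE 0.5 while the equal-weight combination has CV-MSE 0, which shows why searching over the whole simplex can improve on the discrete Super Learner.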
![Page 69: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/69.jpg)
Super Learner: steps
Putting it all together:
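1. Define a library of candidate algorithms $\hat\mu_1, \dots, \hat\mu_K$.
2. Obtain the CV-predictions $\hat\mu_{k,v}(X_i)$ for all $k$, $v$ and $i \in V_v$.
3. Use constrained optimization to compute the SL weights
$\hat\lambda := \operatorname*{arg\,min}_{\lambda \in S_K} \widehat{\mathrm{MSE}}_{CV}\left(\sum_{k=1}^K \lambda_k \hat\mu_k\right)$.
4. Take $\hat\mu_{SL} = \sum_{k=1}^K \hat\lambda_k \hat\mu_k$.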
21 / 48
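The four steps above can be sketched end to end in plain Python. Everything here is an assumption chosen for illustration: the three toy candidates, the simulated data, and in particular the solver, a basic Frank-Wolfe iteration with exact line search that stands in for whichever constrained optimizer a real implementation uses. It is not the SuperLearner R package.

```python
import random

def fit_mean(xs, ys):
    """Candidate 1: constant prediction at the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate 2: least-squares line in one covariate."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return lambda x: ybar + slope * (x - xbar)

def fit_step(xs, ys):
    """Candidate 3: separate means for x < 0 and x >= 0."""
    overall = sum(ys) / len(ys)
    left = [y for x, y in zip(xs, ys) if x < 0]
    right = [y for x, y in zip(xs, ys) if x >= 0]
    m_left = sum(left) / len(left) if left else overall
    m_right = sum(right) / len(right) if right else overall
    return lambda x: m_left if x < 0 else m_right

def weighted_risk(lam, cv_preds, ys, obs_w):
    """CV-MSE of the combination: sum_i w_i * (Y_i - sum_k lam_k * Z_ik)^2."""
    return sum(w * (y - sum(l * p for l, p in zip(lam, z))) ** 2
               for w, y, z in zip(obs_w, ys, cv_preds))

def simplex_weights(cv_preds, ys, obs_w, n_iter=500):
    """Frank-Wolfe with exact line search; every iterate stays in the simplex."""
    K = len(cv_preds[0])
    vertices = [[1.0 if j == k else 0.0 for j in range(K)] for k in range(K)]
    # Start at the best single algorithm (the discrete Super Learner).
    lam = min(vertices, key=lambda v: weighted_risk(v, cv_preds, ys, obs_w))
    for _ in range(n_iter):
        resid = [y - sum(l * p for l, p in zip(lam, z)) for y, z in zip(ys, cv_preds)]
        grad = [-2.0 * sum(w * r * z[k] for w, r, z in zip(obs_w, resid, cv_preds))
                for k in range(K)]
        k_star = min(range(K), key=lambda k: grad[k])
        # Move toward the vertex with the smallest gradient coordinate.
        d = [(1.0 if k == k_star else 0.0) - lam[k] for k in range(K)]
        zd = [sum(dk * p for dk, p in zip(d, z)) for z in cv_preds]
        denom = sum(w * v * v for w, v in zip(obs_w, zd))
        if denom <= 1e-15:
            break
        step = sum(w * r * v for w, r, v in zip(obs_w, resid, zd)) / denom
        step = max(0.0, min(1.0, step))  # exact minimizer along d, clipped to [0, 1]
        lam = [l + step * dk for l, dk in zip(lam, d)]
    return lam

rng = random.Random(2)
n, V = 200, 5
xs = [rng.uniform(-1, 1) for _ in range(n)]
ys = [x + (1.0 if x >= 0 else 0.0) + rng.gauss(0, 0.3) for x in xs]  # simulated data

# Step 1: library of candidate algorithms.
library = [fit_mean, fit_line, fit_step]

# Step 2: CV-predictions; cv_preds[i][k] comes from a fit that never saw observation i.
idx = list(range(n))
rng.shuffle(idx)
folds = [idx[v::V] for v in range(V)]
cv_preds = [None] * n
obs_w = [0.0] * n
for fold in folds:
    held_out = set(fold)
    train = [i for i in range(n) if i not in held_out]
    fits = [fit([xs[i] for i in train], [ys[i] for i in train]) for fit in library]
    for i in fold:
        cv_preds[i] = [f(xs[i]) for f in fits]
        obs_w[i] = 1.0 / (V * len(fold))  # the 1/V and 1/|V_v| factors of the CV-MSE

# Step 3: SL weights by constrained optimization over the simplex.
lam_hat = simplex_weights(cv_preds, ys, obs_w)

# Step 4: refit every algorithm on all the data and combine with the weights.
full_fits = [fit(xs, ys) for fit in library]

def mu_sl(x):
    return sum(l * f(x) for l, f in zip(lam_hat, full_fits))

print([round(l, 3) for l in lam_hat], round(mu_sl(0.5), 3))
```

Starting the iteration at the best single candidate and never taking a step that increases the objective means the resulting CV-MSE is no worse than that of the discrete Super Learner on the same folds.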
![Page 74: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/74.jpg)
II. Lab 1: Vanilla SL for a continuous outcome
21 / 48
![Page 75: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/75.jpg)
III. Into the weeds: a mathematical presentation of SL
21 / 48
![Page 76: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/76.jpg)
Review
Recall the construction of SL for a continuous outcome:
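1. Define a library of candidate algorithms $\hat\mu_1, \dots, \hat\mu_K$.
2. Obtain the CV-predictions $\hat\mu_{k,v}(X_i)$ for all $k$, $v$ and $i \in V_v$.
3. Use constrained optimization to compute the SL weights
$\hat\lambda := \operatorname*{arg\,min}_{\lambda \in S_K} \widehat{\mathrm{MSE}}_{CV}\left(\sum_{k=1}^K \lambda_k \hat\mu_k\right)$.
4. Take $\hat\mu_{SL} = \sum_{k=1}^K \hat\lambda_k \hat\mu_k$.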
22 / 48
![Page 78: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/78.jpg)
In this section, we generalize this procedure to estimation
of any summary of the observed data distribution given an
appropriate loss for the summary of interest.
23 / 48
![Page 79: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/79.jpg)
Loss and risk: setup
• Denote by $O$ the observed data unit, e.g. $O = (Y, X)$.
• Denote by $\mathcal{O}$ the sample space of $O$.
• Let $\mathcal{M}$ denote our statistical model.
• Denote by $P_0 \in \mathcal{M}$ the true distribution of $O$.
• Thus, we observe i.i.d. copies $O_1, \dots, O_n \sim P_0$.
• Suppose we want to estimate a parameter $\theta : \mathcal{M} \to \Theta$.
• Denote by $\theta_0 := \theta(P_0)$ the true parameter value.
24 / 48
![Page 86: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/86.jpg)
Loss and risk
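• Let $L$ be a map from $\mathcal{O} \times \Theta$ to $\mathbb{R}$.
• We call $L$ a loss function for $\theta$ if it holds that
$$\theta_0 = \operatorname*{arg\,min}_{\theta \in \Theta} E_{P_0}[L(O, \theta)].$$
• $R_0(\theta) = E_{P_0}[L(O, \theta)]$ is called the oracle risk.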
• These definitions of loss and risk come from the statistical
learning literature (see, e.g. Vapnik, 1992, 1999, 2013)
and are not to be confused with loss and risk from the
decision theory literature (e.g. Ferguson, 2014).
25 / 48
![Page 90: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/90.jpg)
Loss and risk: MSE example
MSE is the oracle risk corresponding to a
squared-error loss function
• $O = (Y, X)$.
• $\theta(P) = \mu(P) = \{x \mapsto E_P[Y \mid X = x]\}$
• $L(O, \mu) = [Y - \mu(X)]^2$ is the squared-error loss.
• $R_0(\mu) = \mathrm{MSE}(\mu) = E_{P_0}[Y - \mu(X)]^2$.
26 / 48
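To see why squared-error loss qualifies as a loss function for the conditional mean in the sense defined earlier, one can expand the risk around $\mu_0 := \mu(P_0)$. This is the standard argument, sketched here under the added assumption that $Y$ has a finite second moment:

```latex
\begin{align*}
R_0(\mu)
  &= E_{P_0}\big[\{Y - \mu_0(X)\} + \{\mu_0(X) - \mu(X)\}\big]^2 \\
  &= E_{P_0}[Y - \mu_0(X)]^2 + E_{P_0}[\mu_0(X) - \mu(X)]^2
     + 2\,E_{P_0}\big[\{Y - \mu_0(X)\}\{\mu_0(X) - \mu(X)\}\big] \\
  &= R_0(\mu_0) + E_{P_0}[\mu_0(X) - \mu(X)]^2
  \;\ge\; R_0(\mu_0).
\end{align*}
```

The cross term vanishes because $E_{P_0}[Y - \mu_0(X) \mid X] = 0$, so the oracle risk is minimized at $\mu = \mu_0$, with equality only when $\mu(X) = \mu_0(X)$ almost surely.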
![Page 95: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/95.jpg)
Estimating the oracle risk
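$$\theta_0 = \operatorname*{arg\,min}_{\theta \in \Theta} R_0(\theta), \qquad R_0(\theta) = E_{P_0}[L(O, \theta)]$$
• Suppose that $\hat\theta_1, \dots, \hat\theta_K$ are candidate estimators.
• As before, we need to estimate $R_0(\theta)$ to evaluate each $\hat\theta_k$.
• The naive estimator is $\hat R(\hat\theta_k) = \frac{1}{n} \sum_{i=1}^n L(O_i, \hat\theta_k)$.
• We instead estimate $R_0(\theta)$ using the cross-validated risk
$$\hat R_{CV}(\hat\theta_k) = \frac{1}{V} \sum_{v=1}^V \frac{1}{|V_v|} \sum_{i \in V_v} L(O_i, \hat\theta_{k,v}).$$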
27 / 48
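The gap between the naive and cross-validated risk estimates is easiest to see with an estimator that overfits. The sketch below, in plain Python, takes the loss as an argument so the same code applies to any loss $L(O, \theta)$; the 1-nearest-neighbour candidate `fit_1nn`, the simulated data, and V = 5 are assumptions chosen only to make the contrast visible.

```python
import random

def fit_1nn(data):
    """1-nearest-neighbour regression: predict the y of the closest training x."""
    def predict(x):
        return min(data, key=lambda o: abs(o[1] - x))[0]
    return predict

def squared_error(o, theta):
    """Squared-error loss L(O, theta) with O = (Y, X)."""
    y, x = o
    return (y - theta(x)) ** 2

def naive_risk(fit, loss, data):
    """Empirical risk evaluated on the same data used to fit the estimator."""
    theta_hat = fit(data)
    return sum(loss(o, theta_hat) for o in data) / len(data)

def cv_risk(fit, loss, data, folds):
    """Cross-validated risk: each fold is evaluated by a fit that excluded it."""
    total = 0.0
    for fold in folds:
        held_out = set(fold)
        theta_v = fit([o for i, o in enumerate(data) if i not in held_out])
        total += sum(loss(data[i], theta_v) for i in fold) / len(fold)
    return total / len(folds)

rng = random.Random(3)
data = []
for _ in range(100):
    x = rng.uniform(-1, 1)
    data.append((x + rng.gauss(0, 0.5), x))  # simulated O = (Y, X)
idx = list(range(len(data)))
rng.shuffle(idx)
folds = [idx[v::5] for v in range(5)]        # V = 5

naive = naive_risk(fit_1nn, squared_error, data)
cv = cv_risk(fit_1nn, squared_error, data, folds)
print("naive risk:", naive)
print("CV risk:   ", cv)
```

The nearest neighbour of each training point is the point itself, so the naive risk is exactly zero even though the outcome is noisy, whereas the cross-validated risk stays well above zero.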
![Page 100: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/100.jpg)
Super Learner: general steps
Using this framework, we can generalize the SL recipe:
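1. Define a library of candidate algorithms $\hat\theta_1, \dots, \hat\theta_K$.
2. Obtain the CV-Risks $\hat R_{CV}(\hat\theta_k)$, $k = 1, \dots, K$.
3. Use constrained optimization to compute the SL weights
$\hat\lambda := \operatorname*{arg\,min}_{\lambda \in S_K} \hat R_{CV}\left(\sum_{k=1}^K \lambda_k \hat\theta_k\right)$.
4. Take $\hat\theta_{SL} = \sum_{k=1}^K \hat\lambda_k \hat\theta_k$.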
28 / 48
![Page 105: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/105.jpg)
Theoretical guarantees
van der Vaart et al. (2006) showed that, under some conditions,
the oracle risk of the SL estimator is as good as the oracle
risk of the oracle minimizer up to a multiple of $\log n / n$, as long as
the number of candidate algorithms is polynomial in $n$.
29 / 48
![Page 106: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/106.jpg)
Loss functions for a binary outcome
We return to $O = (Y, X)$, $\theta = \mu$.
• For continuous $Y$, we used squared-error loss.
• For binary $Y$, squared-error loss is still valid.
• However, there are (at least) two alternative loss
functions for a binary outcome.
– Negative log-likelihood loss:
$L(O, \mu) = -Y \log \mu(X) - [1 - Y] \log[1 - \mu(X)]$.
– AUC loss.
30 / 48
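The two pointwise losses on this slide can be compared directly for a single binary observation. This is a minimal sketch in plain Python; the function names are hypothetical, and the clipping constant `eps` is an added numerical safeguard that is not part of the slide's definition.

```python
import math

def squared_error_loss(y, p):
    """Squared-error loss for a binary outcome y and predicted probability p."""
    return (y - p) ** 2

def neg_log_lik_loss(y, p, eps=1e-12):
    """Negative log-likelihood loss: -y*log(p) - (1 - y)*log(1 - p).

    p is clipped to [eps, 1 - eps] so the logarithm stays finite (an assumption
    added here for numerical safety).
    """
    p = min(max(p, eps), 1.0 - eps)
    return -y * math.log(p) - (1 - y) * math.log(1.0 - p)

# Penalty for predicting probability p when the outcome is actually 0.
for p in (0.5, 0.9, 0.99):
    print(p, round(squared_error_loss(0, p), 4), round(neg_log_lik_loss(0, p), 4))
```

Squared-error loss is bounded by 1 for a binary outcome, while the negative log-likelihood grows without bound as a confident prediction turns out to be wrong, so the two losses can rank candidate algorithms differently.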
![Page 111: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/111.jpg)
Loss functions for a binaryoutcome
We return to O = (Y ,X), θ = µ.
• For continuous Y , we used squared-error loss.
• For binary Y , squared-error loss is still valid.
• However, there are (at least) two other alternative loss
functions for a binary outcome.
– Negative log-likelihood loss:
L(O, µ) = −Y logµ(X)− [1− Y ] log[1− µ(X)].
– AUC loss.30 / 48
![Page 112: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/112.jpg)
IV. Lab 2: Vanilla SL for a binary outcome
30 / 48
![Page 113: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/113.jpg)
15 minute break
30 / 48
![Page 114: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/114.jpg)
V. Bells and whistles: Screens, weights, and CV-SL
30 / 48
![Page 115: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/115.jpg)
Overview
In this section, we will introduce three of the add-ons to SL that
are frequently useful in practice: variable screens,
observation weights, and cross-validated SL.
31 / 48
![Page 116: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/116.jpg)
Variable screens
• We think of a candidate algorithm as a two-step procedure:
1. Select a subset of the covariates.
2. Use the selected subset to fit a model.
• We call step 1 a screening procedure.
• While we could program steps 1 and 2 by hand into each
candidate algorithm, the SuperLearner package has
built-in functionality to ease this process.
Screening algorithms allow us to guide the SL using our
domain knowledge.
32 / 48
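The two-step view of a candidate algorithm (screen, then fit) can be sketched in a few lines. This is a minimal illustration in plain Python, not the SuperLearner package's implementation: the correlation screen, the 1-nearest-neighbor learner, the threshold of 0.5, and all names and data are assumptions chosen to keep the example self-contained.

```python
import math

def pearson(u, v):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    if suu == 0 or svv == 0:
        return 0.0
    return suv / math.sqrt(suu * svv)

def screen_by_correlation(X, y, min_abs_cor=0.5):
    """Step 1: keep the columns whose |correlation| with y meets the threshold."""
    columns = list(zip(*X))
    return [j for j, col in enumerate(columns)
            if abs(pearson(col, y)) >= min_abs_cor]

def fit_nearest_neighbor(X, y):
    """Step 2 learner: predict the outcome of the closest training row."""
    def predict(row):
        dists = [sum((a - b) ** 2 for a, b in zip(row, xi)) for xi in X]
        return y[dists.index(min(dists))]
    return predict

def fit_with_screen(screen, fit, X, y):
    """Apply the screen, fit on the selected columns, and return both."""
    keep = screen(X, y)
    model = fit([[row[j] for j in keep] for row in X], y)
    return keep, (lambda row: model([row[j] for j in keep]))

# Hypothetical data: column 0 drives the outcome, column 1 is noise.
X = [[0, 5], [1, -5], [2, 5], [3, -5], [4, 5], [5, -5]]
y = [0, 3, 6, 9, 12, 15]

keep, predict = fit_with_screen(screen_by_correlation, fit_nearest_neighbor, X, y)
prediction = predict([2.2, 100])  # the noise column is ignored after screening
```

Pairing each screen with each learner in this way yields a larger library of candidates without rewriting any learner.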
![Page 122: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/122.jpg)
Example use-cases of screening
• If we have a high-dimensional set of covariates, we can try
different ways of reducing the dimensionality.
• If we have a large number of “raw” measurements, we
might try providing a smaller number of summary
measures – e.g. mean, median, min, max.
• If we have measurements collected at multiple time
points, we might try providing just baseline, or just the last
time point, or some summaries of the trajectory.
• We can force certain variables to always be used.
33 / 48
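Some of these use-cases can be expressed as small helper functions. The sketch below is only illustrative: the helper names, the choice of summaries, and the stand-in screen are all hypothetical, and a screen is assumed to be any function that returns a list of column indices.

```python
import statistics

def summarize_raw(measurements):
    """Replace many raw measurements with a few summary measures."""
    return {
        "mean": statistics.mean(measurements),
        "median": statistics.median(measurements),
        "min": min(measurements),
        "max": max(measurements),
    }

def summarize_trajectory(values_over_time):
    """Summaries of a time-ordered series: baseline, last value, and change."""
    return {
        "baseline": values_over_time[0],
        "last": values_over_time[-1],
        "change": values_over_time[-1] - values_over_time[0],
    }

def force_variables(screen, forced):
    """Wrap a screen so that the forced column indices are always selected."""
    def wrapped(X, y):
        return sorted(set(forced) | set(screen(X, y)))
    return wrapped

raw_summary = summarize_raw([3, 1, 4, 1, 5])
trajectory = summarize_trajectory([10, 12, 15])

# Stand-in screen (hypothetical) that always picks columns 0 and 1.
keep_first_two = lambda X, y: [0, 1]
screen = force_variables(keep_first_two, forced=[3])
selected = screen([[0, 0, 0, 0]], [0])
```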
![Page 126: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/126.jpg)
Observation weights
• In some applications, we need to include observation
weights in the procedure, e.g. for case-control sampling
or as a simple way to account for loss to follow-up.
• Observation weights can be included directly in a call to
SuperLearner, but method.AUC does not make correct
use of weights!
• Note that some SuperLearner wrappers might not make
use of observation weights.
34 / 48
![Page 129: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/129.jpg)
Case-control weights
• Let Y represent disease status at the end of a study.
• Suppose specimens from all n_case cases (Y_i = 1) are
assayed.
• A random subset of N_control controls (Y_i = 0), out of
n_control total controls, is assayed.
• We will use this case-control cohort to predict disease
status using the results of the assay and other covariates.
35 / 48
![Page 133: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/133.jpg)
Case-control weights
• We can use SL with observation weights.
• Cases have weight w_i = 1.
• Controls have weight w_i = n_control / N_control, the inverse of the
fraction of controls that were sampled.
• Control weights could also be estimated using a logistic
regression of the indicator of inclusion in the control cohort
on baseline covariates.
36 / 48
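The simple (non-estimated) case-control weights can be computed directly from the sampling design. The sketch below is a minimal illustration with hypothetical names and numbers; normalizing the weighted risk by the sum of the weights is an assumption made here, not something taken from the SuperLearner package.

```python
def case_control_weights(y, n_control_total):
    """Weight 1 for cases; (total controls) / (sampled controls) for controls."""
    n_control_sampled = sum(1 for yi in y if yi == 0)
    control_weight = n_control_total / n_control_sampled
    return [1.0 if yi == 1 else control_weight for yi in y]

def weighted_risk(y, p, w):
    """Weighted squared-error risk, normalized by the sum of the weights."""
    total = sum(wi * (yi - pi) ** 2 for yi, pi, wi in zip(y, p, w))
    return total / sum(w)

# Hypothetical sample: 2 cases and 3 sampled controls out of 30 controls.
y = [1, 1, 0, 0, 0]
p = [0.7, 0.6, 0.2, 0.1, 0.3]

weights = case_control_weights(y, n_control_total=30)
risk = weighted_risk(y, p, weights)
```

In this toy example the weights sum to 32, the size of the full cohort (2 cases plus 30 controls), so the weighted sample stands in for the cohort from which the controls were drawn.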
![Page 137: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/137.jpg)
Right-censored outcomes
• Suppose Y = I(T ≤ t0) indicates that disease occurs
by time t0.
• T is subject to right-censoring by C: we observe
Ỹ = min{T, C} and ∆ = I(T ≤ C).
• We want to estimate
µ(x) = P(T ≤ t0 | X = x) = E[Y | X = x].
37 / 48
![Page 140: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/140.jpg)
Right-censored outcomes
$$\mu_0 = \arg\min_{\mu} E_{P_0}\left\{ \frac{\Delta}{G_0(\tilde{Y} \mid X)} \, L((Y, X), \mu) \right\}$$
• Here, Ỹ = min{T, C} is the observed follow-up time and
G0(t | x) = P0(C > t | X = x).
• L is either squared-error or negative log-likelihood loss.
• If we knew G0, we could use SL with weight ∆ / G0(Ỹ | X).
• Instead, we estimate G0 and plug in this estimator to
obtain an estimated weight.
• If C is independent of T, we can use a Kaplan-Meier estimator for G0;
otherwise we might use a Cox model.
38 / 48
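The plug-in weights can be sketched with a hand-rolled Kaplan-Meier estimator of the censoring distribution, in which censoring (∆ = 0) plays the role of the event. This is a minimal illustration assuming censoring is independent of the covariates and that no event and censoring times are tied; the function name and toy data are hypothetical.

```python
def km_censoring_survival(times, delta):
    """Kaplan-Meier estimate of G(t) = P(C > t) from observed follow-up times.

    times: observed follow-up times min(T, C); delta: 1 if the event was
    observed, 0 if censored. Censoring is treated as the "event" here.
    """
    censor_times = sorted({t for t, d in zip(times, delta) if d == 0})
    steps = []  # (censoring time, value of G from that time onward)
    surv = 1.0
    for c in censor_times:
        at_risk = sum(1 for t in times if t >= c)
        n_censored = sum(1 for t, d in zip(times, delta) if t == c and d == 0)
        surv *= 1.0 - n_censored / at_risk
        steps.append((c, surv))

    def G(t):
        value = 1.0
        for c, s in steps:
            if c <= t:
                value = s
            else:
                break
        return value

    return G

# Hypothetical follow-up data: five subjects, two of them censored.
times = [2, 3, 4, 5, 6]
delta = [1, 0, 1, 0, 1]

G = km_censoring_survival(times, delta)
# Estimated weight: delta / G(observed time); censored subjects get weight 0.
weights = [d / G(t) if d == 1 else 0.0 for t, d in zip(times, delta)]
```

Subjects whose events are observed late are up-weighted, since they represent comparable subjects who were censored before their events could be seen.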
![Page 144: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/144.jpg)
CV-Super Learner
• The standard SL framework gives us CV risks for each
candidate algorithm.
• However, the SL and discrete SL are obtained using all the
data, so their estimated risks will be optimistic.
• We can rectify this using a second layer of
cross-validation.
39 / 48
![Page 148: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/148.jpg)
CV-Super Learner
1. Split the data into V1 folds.
2. For v = 1, ..., V1:
   a. Run regular SL on the training set for fold v using
      V2-fold CV.
   b. Obtain discrete SL and SL predictions for the
      validation set for fold v.
3. Combine the validation sets to obtain CV-risks for the
discrete SL and SL.
40 / 48
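The two layers of cross-validation can be sketched for the discrete SL alone, which only needs to pick the candidate with the smallest inner CV risk. This is a simplified illustration, not the SuperLearner package's CV.SuperLearner: the two candidates, the round-robin fold assignment, the squared-error loss, and the toy data are all assumptions, and the ensemble SL (which also needs a weight-fitting step) is omitted.

```python
def make_folds(n, V):
    """Assign indices 0..n-1 to V folds in round-robin order."""
    return [list(range(v, n, V)) for v in range(V)]

def fit_mean(x, y):
    """Candidate 1: always predict the training mean."""
    m = sum(y) / len(y)
    return lambda x_new: m

def fit_line(x, y):
    """Candidate 2: simple linear regression on one covariate."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x_new: ybar + slope * (x_new - xbar)

def cv_risk(fit, x, y, V):
    """V-fold cross-validated mean squared error of one candidate."""
    n, total = len(x), 0.0
    for fold in make_folds(n, V):
        held = set(fold)
        train = [i for i in range(n) if i not in held]
        model = fit([x[i] for i in train], [y[i] for i in train])
        total += sum((y[i] - model(x[i])) ** 2 for i in fold)
    return total / n

def discrete_sl(candidates, x, y, V):
    """Pick the candidate with the smallest CV risk, then refit it on (x, y)."""
    risks = {name: cv_risk(fit, x, y, V) for name, fit in candidates.items()}
    best = min(risks, key=risks.get)
    return best, candidates[best](x, y)

def cv_discrete_sl(candidates, x, y, V1, V2):
    """Outer V1-fold CV around a discrete SL that uses inner V2-fold CV."""
    n, total, chosen = len(x), 0.0, []
    for fold in make_folds(n, V1):
        held = set(fold)
        train = [i for i in range(n) if i not in held]
        name, model = discrete_sl(
            candidates, [x[i] for i in train], [y[i] for i in train], V2)
        chosen.append(name)
        total += sum((y[i] - model(x[i])) ** 2 for i in fold)
    return total / n, chosen

# Hypothetical data: a linear trend plus small alternating noise.
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x)]

candidates = {"mean": fit_mean, "line": fit_line}
cv_risk_dsl, chosen = cv_discrete_sl(candidates, x, y, V1=5, V2=4)
```

Because the candidate selection is repeated inside every outer training set, the outer validation observations play no part in choosing the learner that predicts them, which is what removes the optimism in the risk estimate.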
![Page 153: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/153.jpg)
VI. Lab 3: Binary outcome redux
40 / 48
![Page 154: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/154.jpg)
VII. Lab 4: Case-control analysis of Fluzone vaccine
40 / 48
![Page 155: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/155.jpg)
FLUVACS trial
• Healthy adults aged 18–49 years, Michigan, 2007–2008.
• Randomly assigned to:
– Fluzone – inactivated influenza vaccine (IIV)
– FluMist – live-attenuated influenza vaccine (LAIV)
– placebo.
• We are only interested in Fluzone vs placebo.
• Followed for one flu season.
• Endpoint = laboratory-confirmed influenza.
41 / 48
![Page 159: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/159.jpg)
FLUVACS trial
42 / 48
![Page 160: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/160.jpg)
FLUVACS trial
• All 52 cases and 52 random controls were assayed for a
variety of markers (HAI, NAI, MN, AM titers,
proteins/virus/peptide magnitude/breadth).
• Measured variables:
– Demographics: age, vaccinated in last year
(EVERVAX)
– Day 0 markers
– Day 30 markers
– Difference markers = Day 30 markers - Day 0 markers
43 / 48
![Page 165: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/165.jpg)
Variable sets
1. Demo.
2. Demo. + Day 0 markers
3. Demo. + Day 30 markers
4. Demo. + Difference markers
5. Demo. + Day 0 markers + EVERVAX × Day 0 markers
6. Demo. + Day 30 markers + EVERVAX × Day 30 markers
7. Demo. + Diff. markers + EVERVAX × Diff. markers
8. Demo. + Day 0 + Day 30 + EVERVAX × (Day 0 + Day 30)
9. Demo. + Day 0 + Diff. + EVERVAX × (Day 0 + Diff.)
44 / 48
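The nine variable sets above can be sketched as lists of column names. This is a minimal Python illustration, not the SuperLearner R workflow used in the talk; the marker names (`HAI`, `NAI`, `MN`), the column prefixes, and the helper functions are assumptions made for the example, since the slides do not list the actual columns.

```python
# Hypothetical column names: the slides do not give the real marker columns.
DEMO = ["age", "EVERVAX"]
MARKERS = ["HAI", "NAI", "MN"]  # illustrative subset of the assayed markers


def cols(prefix, markers=MARKERS):
    """Column names for one marker group, e.g. day0_HAI."""
    return [f"{prefix}_{m}" for m in markers]


def interact(columns):
    """Names of the EVERVAX x marker interaction terms."""
    return [f"EVERVAX:{c}" for c in columns]


def variable_sets():
    """The nine variable sets, keyed by their number on the slide."""
    day0, day30, diff = cols("day0"), cols("day30"), cols("diff")
    return {
        1: DEMO,
        2: DEMO + day0,
        3: DEMO + day30,
        4: DEMO + diff,
        5: DEMO + day0 + interact(day0),
        6: DEMO + day30 + interact(day30),
        7: DEMO + diff + interact(diff),
        8: DEMO + day0 + day30 + interact(day0 + day30),
        9: DEMO + day0 + diff + interact(day0 + diff),
    }


def add_derived(row):
    """Copy one participant's record, adding difference and interaction columns."""
    out = dict(row)
    for m in MARKERS:
        # Difference marker = Day 30 marker - Day 0 marker
        out[f"diff_{m}"] = row[f"day30_{m}"] - row[f"day0_{m}"]
    for c in cols("day0") + cols("day30") + cols("diff"):
        out[f"EVERVAX:{c}"] = row["EVERVAX"] * out[c]
    return out


sets = variable_sets()
example = add_derived({
    "age": 30, "EVERVAX": 1,
    "day0_HAI": 3.0, "day0_NAI": 2.0, "day0_MN": 4.0,
    "day30_HAI": 6.0, "day30_NAI": 2.5, "day30_MN": 7.0,
})
design_row = [example[c] for c in sets[9]]  # predictors for variable set 9
```

Keeping each set as a list of names means the same learners can be refit on every set by selecting columns, which is what makes the sets directly comparable.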
![Page 174: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/174.jpg)
Analysis goals
• We want to compare the quality of these nine sets of
variables for predicting flu status in the placebo and
Fluzone arms separately.
• We also want to compare the predictive quality of IgA, IgG,
and both IgA + IgG measurements.
• We will use cross-validated Super Learning to do this.
45 / 48
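As a rough illustration of the comparison described above, the sketch below estimates V-fold cross-validated AUC for two toy learners on simulated data. It is plain Python under stated assumptions, not the SuperLearner R package: `fit_mean`, `fit_centroid`, `cv_auc`, and the simulated markers are all hypothetical stand-ins, and only the per-learner cross-validated AUC step is shown, not the super learner's weighting of learners.

```python
import random


def auc(scores, labels):
    """AUC as the Mann-Whitney statistic: P(case score > control score), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_mean(X, y):
    """Intercept-only learner: always predicts the training prevalence."""
    p = sum(y) / len(y)
    return lambda x: p


def fit_centroid(X, y):
    """Toy learner: score = projection onto (case mean - control mean)."""
    d, n1 = len(X[0]), sum(y)
    n0 = len(y) - n1
    mu1 = [sum(x[j] for x, yi in zip(X, y) if yi == 1) / n1 for j in range(d)]
    mu0 = [sum(x[j] for x, yi in zip(X, y) if yi == 0) / n0 for j in range(d)]
    w = [a - b for a, b in zip(mu1, mu0)]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def cv_auc(fit, X, y, V=5, seed=0):
    """Average validation-fold AUC over V folds, stratified by outcome."""
    rng = random.Random(seed)
    folds = [[] for _ in range(V)]
    for label in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for k, i in enumerate(idx):
            folds[k % V].append(i)
    aucs = []
    for val in folds:
        held_out = set(val)
        train = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        aucs.append(auc([predict(X[i]) for i in val], [y[i] for i in val]))
    return sum(aucs) / V


# Simulated stand-in for 52 cases and 52 controls (not the FLUVACS data).
rng = random.Random(1)
y = [1] * 52 + [0] * 52
signal = [[rng.gauss(3.0 * yi, 1.0)] for yi in y]  # marker shifted in cases
noise = [[rng.gauss(0.0, 1.0)] for _ in y]         # marker unrelated to outcome

results = {
    "mean": cv_auc(fit_mean, signal, y),
    "centroid, informative marker": cv_auc(fit_centroid, signal, y),
    "centroid, noise marker": cv_auc(fit_centroid, noise, y),
}
```

Because each learner is scored only on observations it was not trained on, the resulting AUCs can be compared across learners and across variable sets without rewarding overfitting. The intercept-only learner gives a natural reference point, since its constant predictions always yield an AUC of 0.5.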
![Page 177: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/177.jpg)
[Figure: AUC (x-axis, 0.4 to 1.0) by learner (y-axis), one panel per variable set.
Panels: Baseline; Day 0; Day 30; Day 30 − Day 0; EV x Day 0; EV x Day 30; EV x (Day 30 − Day 0); EV x (Day 0, Day 30); EV x (Day 0, Diff).
Learners: SL, Discrete SL, SL.glm, SL.bayesglm, SL.glmnet, SL.earth, SL.gam, SL.xgboost, SL.ranger, SL.mean.
Legends: Screen (All, screen.marginal.05, screen.marginal.10); marker group (Both, IgA, IgG, Neither).]
46 / 48
![Page 178: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/178.jpg)
[Figure: AUC (x-axis, 0.2 to 1.0) by learner (y-axis), one panel per variable set.
Panels: Baseline; Day 0; Day 30; Day 30 − Day 0; EV x Day 0; EV x Day 30; EV x (Day 30 − Day 0); EV x (Day 0, Day 30); EV x (Day 0, Diff).
Learners: SL, Discrete SL, and one individual learner per panel (SL.xgboost, SL.bayesglm, SL.glm, or SL.gam).
Legends: Screen (All, screen.marginal.05, screen.marginal.10); marker group (Both, IgA, IgG, Neither).]
47 / 48
![Page 179: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/179.jpg)
[Figure: AUC (x-axis, 0.4 to 1.0) by learner (y-axis), one panel per variable set.
Panels: Baseline; Day 0; Day 30; Day 30 − Day 0; EV x Day 0; EV x Day 30; EV x (Day 30 − Day 0); EV x (Day 0, Day 30); EV x (Day 0, Diff).
Learners: SL, Discrete SL, SL.glm, SL.bayesglm, SL.glmnet, SL.earth, SL.gam, SL.xgboost, SL.ranger, SL.mean.
Legends: Screen (All, screen.marginal.05, screen.marginal.10); marker group (Both, IgA, IgG, Neither).]
48 / 48
![Page 180: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/180.jpg)
[Figure: AUC (x-axis, 0.2 to 0.8) by learner (y-axis), one panel per variable set.
Panels: Baseline; Day 0; Day 30; Day 30 − Day 0; EV x Day 0; EV x Day 30; EV x (Day 30 − Day 0); EV x (Day 0, Day 30); EV x (Day 0, Diff).
Learners: SL, Discrete SL, and one individual learner per panel (SL.bayesglm or SL.ranger).
Legend: marker group (Both, IgA, IgG, Neither).]
49 / 48
![Page 181: Introduction to Super Learning - UW Faculty Web Serverfaculty.washington.edu/peterg/Sanofi2018/Slides/... · Introduction to Super Learning Ted Westling, PhD Postdoctoral Researcher](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec44bb77037d603ac3c68fd/html5/thumbnails/181.jpg)
Ferguson, T. S. (2014). Mathematical statistics: A decision theoretic
approach. Academic Press.
van der Vaart, A. W., Dudoit, S., and van der Laan, M. J. (2006). Oracle
inequalities for multi-fold cross validation. Statistics & Decisions,
24(3):351–371.
Vapnik, V. (1992). Principles of risk minimization for learning theory. In
Advances in Neural Information Processing Systems, pages 831–838.
Vapnik, V. (2013). The nature of statistical learning theory. Springer Science
& Business Media.
Vapnik, V. N. (1999). An overview of statistical learning theory. IEEE
Transactions on Neural Networks, 10(5):988–999.
50 / 48