Causal forests: A tutorial in high-dimensional causal inference
TRANSCRIPT
Intro Potential outcomes Algorithm Sample split Regularization + confounding BART
Causal forests: A tutorial in high-dimensional causal inference
Ian Lundberg
General Exam: Frontiers of Causal Inference
12 October 2017
Photo credit: Michael Schweppe via Wikimedia Commons, CC BY-SA 2.0
Note: These slides assume randomized treatment assignment until the section labeled "confounding."
Causal inference: A missing data problem

| ID | Education Xi | Treated Wi | Yi(0): no job training | Yi(1): job training | τi = Yi(1) − Yi(0) |
|----|--------------|------------|------------------------|---------------------|--------------------|
| 1  | High school  | 0          | 0                      | 1                   | 1                  |
| 2  | High school  | 1          | 0                      | 1                   | 1                  |
| 3  | College      | 0          | 1                      | 1                   | 0                  |
| 4  | College      | 1          | 1                      | 1                   | 0                  |

(Yi(0) and Yi(1) are potential employment outcomes without and with job training.)

If Wi ⊥⊥ {Yi(0), Yi(1)}, then

τ̂ = Ȳ(i: Wi = 1) − Ȳ(i: Wi = 0) = 1 − 0.5 = 0.5

What if we want to study τi = f(Xi)?

τ̂(High school) = Ȳ(i: Wi = 1, Xi = High school) − Ȳ(i: Wi = 0, Xi = High school) = 1 − 0 = 1

τ̂(College) = Ȳ(i: Wi = 1, Xi = College) − Ȳ(i: Wi = 0, Xi = College) = 1 − 1 = 0

What if there are dozens of X variables? What if X is continuous? It's hard to know which subgroups of X might show interesting effect heterogeneity.
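The difference-in-means estimates above can be reproduced in a few lines. A minimal numpy sketch of the slide's example (not part of the original deck; the function name `diff_in_means` is my own):

```python
import numpy as np

# Observed data from the slide's example: one row per person.
# X: education, W: job-training treatment, Y: observed employment.
X = np.array(["High school", "High school", "College", "College"])
W = np.array([0, 1, 0, 1])
Y = np.array([0, 1, 1, 1])

def diff_in_means(y, w):
    """Difference-in-means estimate of the average treatment effect,
    valid when W is independent of the potential outcomes."""
    return y[w == 1].mean() - y[w == 0].mean()

tau_hat = diff_in_means(Y, W)  # 1 - 0.5 = 0.5
tau_hs = diff_in_means(Y[X == "High school"], W[X == "High school"])  # 1 - 0 = 1.0
tau_col = diff_in_means(Y[X == "College"], W[X == "College"])         # 1 - 1 = 0.0
print(tau_hat, tau_hs, tau_col)  # 0.5 1.0 0.0
```

With dozens of X variables, writing one such subgroup estimate per cell quickly becomes infeasible, which is what motivates the tree-based search below.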
Causal inference: A missing data problem

In reality, each person reveals only one potential outcome; the rest are missing:

| ID | Education Xi | Treated Wi | Yi(0) | Yi(1) | τi |
|----|--------------|------------|-------|-------|----|
| 1  | High school  | 0          | 0     | ?     | ?  |
| 2  | High school  | 1          | ?     | 1     | ?  |
| 3  | College      | 0          | 1     | ?     | ?  |
| 4  | College      | 1          | ?     | 1     | ?  |

The estimators above use only the observed cells, which is why the assumption Wi ⊥⊥ {Yi(0), Yi(1)} is needed.
Start with a simpler prediction question.
Which subgroups of X have very different
average outcomes?
Prediction: One tree

Start with all observations in one leaf:

MSE0 = (1/n) ∑ (Yi − Ȳ)²

First split on Xi < k vs. Xi ≥ k, choosing k to minimize

MSE1 = (1/n) ∑ (Yi − Ȳ(j: xj ∈ ℓ(xi | Π1)))²

(in the running example, the chosen split is k = 16). Then consider splitting a resulting leaf again, on Xi < k1 vs. Xi ≥ k1 (e.g. k1 = 12) or on Xi < k2 vs. Xi ≥ k2, choosing k1 or k2 to minimize MSE2; a further split (MSE3) might use Zi = White vs. Zi ≠ White, and so on.

We could continue until all leaves had only one observation: unbiased but uselessly high variance! Instead, regularize: keep only splits that improve MSE by more than c.

A partition Π ∈ P is a set of leaves, e.g. {ℓ1 = {xi: xi < 16}, ℓ2 = {xi: xi ≥ 16}}.

Prediction rule for new x: μ̂(x) = Ȳ(j: xj ∈ ℓ(x | Π)).

Could we use this method to find causal effects τ(x) that are heterogeneous between leaves?
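The greedy split search with the regularization threshold c can be sketched as follows. This is an illustrative numpy implementation on simulated data (the data, the seed, and the value of c are my own assumptions, not from the deck):

```python
import numpy as np

def best_split(x, y):
    """Search candidate thresholds k and return (k, mse) for the split
    x < k vs. x >= k that minimizes the within-leaf MSE."""
    best_k, best_mse = None, np.inf
    for k in np.unique(x)[1:]:  # candidate splits between observed values
        left, right = y[x < k], y[x >= k]
        mse = (np.sum((left - left.mean()) ** 2)
               + np.sum((right - right.mean()) ** 2)) / len(y)
        if mse < best_mse:
            best_k, best_mse = k, mse
    return best_k, best_mse

# Simulated data with a jump in the mean outcome at x = 16.
rng = np.random.default_rng(0)
x = rng.uniform(0, 30, size=200)
y = (x >= 16).astype(float) + rng.normal(scale=0.1, size=200)

mse0 = np.mean((y - y.mean()) ** 2)   # MSE with no splits
k, mse1 = best_split(x, y)
c = 0.01                              # regularization threshold
if mse0 - mse1 > c:                   # keep only splits that improve MSE by > c
    print(f"keep split at x < {k:.1f}")  # recovers a threshold near 16
```

A full tree would apply `best_split` recursively inside each new leaf, stopping when no candidate split improves MSE by more than c.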
Causal tree: What's different?

1. We do not observe the ground truth τi.
2. Honest estimation: one sample to choose the partition, a separate sample to estimate the leaf effects.

Why is the split critical? Fitting both on the training sample risks overfitting: estimating many "heterogeneous effects" that are really just noise idiosyncratic to the sample. We want to search for true heterogeneity, not noise.
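Honest estimation can be sketched in a few lines. This is a toy illustration, not the authors' algorithm: the data are simulated, and the partition (a split at X < 0.5) is assumed rather than searched for, standing in for whatever the training half would have chosen:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
X = rng.uniform(0, 1, size=n)
W = rng.integers(0, 2, size=n)            # randomized treatment
tau = np.where(X < 0.5, 1.0, 0.0)         # true heterogeneous effect
Y = X + tau * W + rng.normal(scale=0.5, size=n)

# Honest split: one half would choose the partition, the other half
# estimates the leaf-level treatment effects.
train = np.arange(n) < n // 2             # would be used to search for splits
est = ~train                              # used only to estimate leaf effects

leaf = X < 0.5                            # hypothetical partition from the training half

def leaf_effect(mask):
    """Difference-in-means treatment effect inside a leaf,
    using only the estimation sample."""
    m = mask & est
    return Y[m & (W == 1)].mean() - Y[m & (W == 0)].mean()

print(leaf_effect(leaf), leaf_effect(~leaf))  # roughly 1.0 and 0.0
```

Because the estimation half played no role in choosing the split, the leaf estimates are not inflated by the noise that made the split look attractive.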
Sample splitting

MSEµ(S^te, S^est, Π) ≡ (1 / #(S^te)) ∑(i ∈ S^te) { (Yi − μ̂(Xi; S^est, Π))² − Yi² }

The first term is the usual MSE criterion; the −Yi² term is added by the authors.

EMSEµ(Π) ≡ E(S^te, S^est) [ MSEµ(S^te, S^est, Π) ]

Honest criterion: Maximize

Q^H(π) ≡ −E(S^te, S^est, S^tr) [ MSEµ(S^te, S^est, π(S^tr)) ]

where π : ℝ^(p+1) → P is a function that takes a training sample S^tr (observations in ℝ^(p+1)) and outputs a partition Π ∈ P. In the classical approach, the estimation sample S^est is just S^tr itself.

Note: The authors include the −Yi² term to simplify the math; it just shifts the estimator by a constant.
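That the −Yi² term is only a constant shift is easy to check numerically: it subtracts the same mean(Y²) from every partition's MSE, so comparisons between partitions are unchanged. A small numpy sketch with made-up data and two arbitrary partitions (all names here are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=100)
x = rng.uniform(size=100)

def mse(partition):
    """Within-leaf MSE for a partition given as a list of boolean masks."""
    return sum(np.sum((y[m] - y[m].mean()) ** 2) for m in partition) / len(y)

pi1 = [x < 0.5, x >= 0.5]
pi2 = [x < 0.3, x >= 0.3]

shift = np.mean(y ** 2)  # the -Y_i^2 term: identical for every partition

# The gap between the two partitions' criteria is the same with or
# without the shift, so the same partition is preferred either way.
print(mse(pi1) - mse(pi2))
print((mse(pi1) - shift) - (mse(pi2) - shift))
```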
Analytic estimator for $\mathrm{EMSE}_\mu(\Pi)$ (p. 7356)

Goal: Estimate the expected MSE using only the training sample. This estimate will be used to place splits when growing a tree.
$$
\begin{aligned}
-\mathrm{EMSE}_\mu(\Pi)
&= -\mathbb{E}_{S^{\mathrm{te}}, S^{\mathrm{est}}}\Big[\big(Y_i - \hat\mu(X_i \mid S^{\mathrm{est}}, \Pi)\big)^2 - Y_i^2\Big] \\
&= -\mathbb{E}_{S^{\mathrm{te}}, S^{\mathrm{est}}}\Big[\big(\underbrace{Y_i - \mu(X_i \mid \Pi)}_{A} + \underbrace{\mu(X_i \mid \Pi) - \hat\mu(X_i \mid S^{\mathrm{est}}, \Pi)}_{B}\big)^2 - Y_i^2\Big] && \text{(add a zero)} \\
&= -\mathbb{E}_{S^{\mathrm{te}}, S^{\mathrm{est}}}\Big[\big(Y_i - \mu(X_i \mid \Pi)\big)^2 - Y_i^2\Big] && (A^2) \\
&\quad\; - \mathbb{E}_{S^{\mathrm{te}}, S^{\mathrm{est}}}\Big[\big(\mu(X_i \mid \Pi) - \hat\mu(X_i \mid S^{\mathrm{est}}, \Pi)\big)^2\Big] && (B^2) \\
&\quad\; - \mathbb{E}_{S^{\mathrm{te}}, S^{\mathrm{est}}}\Big[2\big(Y_i - \mu(X_i \mid \Pi)\big)\big(\mu(X_i \mid \Pi) - \hat\mu(X_i \mid S^{\mathrm{est}}, \Pi)\big)\Big] && (2AB)
\end{aligned}
$$

Here the first line is the expected mean squared error for a partition $\Pi$, over estimation sets used to estimate the leaf-specific $\mu$ and test sets used to evaluate those estimates; $\hat\mu(X_i \mid S^{\mathrm{est}}, \Pi)$ is the prediction based on $S^{\mathrm{est}}$ from the leaf $\ell(X_i)$ containing $X_i$.

The cross term vanishes: $\mathbb{E}(A) = 0$ by assumption, and $\mathrm{Cov}(A, B) = 0$ because $Y_i$ is from a sample independent of $S^{\mathrm{est}}$, so

$$
\mathrm{Cov}(A, B) = \mathbb{E}(AB) - \mathbb{E}(A)\,\mathbb{E}(B)
\;\Longrightarrow\;
0 = \mathbb{E}(AB) - 0
\;\Longrightarrow\;
\mathbb{E}(AB) = 0.
$$
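The vanishing of the cross term can be checked by simulation. This is an illustrative sketch (not from the slides' source): with $A = Y_i - \mu(X_i \mid \Pi)$ from a test draw independent of $S^{\mathrm{est}}$ and $B = \mu(X_i \mid \Pi) - \hat\mu(X_i \mid S^{\mathrm{est}}, \Pi)$, the Monte Carlo mean of $2AB$ should be near zero.

```python
# Numerical check that E[2*A*B] ~ 0 for a two-leaf partition of a scalar X,
# because the test draw (Y_i, X_i) is independent of S_est and E[A | X_i] = 0.
import numpy as np

rng = np.random.default_rng(1)

def mu_true(x):
    # Population leaf means for the partition {x < 0.5, x >= 0.5}
    return np.where(x < 0.5, 0.0, 1.0)

cross_terms = []
for _ in range(2000):
    # Fresh estimation sample plus one independent test observation
    x_est = rng.uniform(0, 1, 200)
    y_est = mu_true(x_est) + rng.normal(scale=0.5, size=200)
    x_i = rng.uniform()
    y_i = mu_true(x_i) + rng.normal(scale=0.5)
    in_leaf = x_est < 0.5 if x_i < 0.5 else x_est >= 0.5
    mu_hat = y_est[in_leaf].mean()          # leaf mean from S_est
    A = y_i - mu_true(x_i)
    B = mu_true(x_i) - mu_hat
    cross_terms.append(2 * A * B)

print(np.mean(cross_terms))   # close to 0
```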
$$
\begin{aligned}
-\mathrm{EMSE}_\mu(\Pi)
&= -\mathbb{E}_{(Y_i, X_i), S^{\mathrm{est}}}\Big[\big(Y_i - \mu(X_i \mid \Pi)\big)^2 - Y_i^2\Big] - \mathbb{E}_{X_i, S^{\mathrm{est}}}\Big[\big(\hat\mu(X_i \mid S^{\mathrm{est}}, \Pi) - \mu(X_i \mid \Pi)\big)^2\Big] \\
&= -\mathbb{E}_{(Y_i, X_i), S^{\mathrm{est}}}\Big[Y_i^2 + \mu^2(X_i \mid \Pi) - 2 Y_i\,\mu(X_i \mid \Pi) - Y_i^2\Big] - \mathbb{E}_{X_i, S^{\mathrm{est}}}\Big[\big(\hat\mu(X_i \mid S^{\mathrm{est}}, \Pi) - \mu(X_i \mid \Pi)\big)^2\Big] \\
&= -\mathbb{E}_{(Y_i, X_i), S^{\mathrm{est}}}\Big[\mu^2(X_i \mid \Pi) - 2\,\mu(X_i \mid \Pi)\,\mu(X_i \mid \Pi)\Big] - \mathbb{E}_{X_i, S^{\mathrm{est}}}\Big[\big(\hat\mu(X_i \mid S^{\mathrm{est}}, \Pi) - \mu(X_i \mid \Pi)\big)^2\Big] \\
&= \mathbb{E}_{X_i}\Big[\mu^2(X_i \mid \Pi)\Big] - \mathbb{E}_{S^{\mathrm{est}}, X_i}\Big[\mathbb{V}\big(\hat\mu(X_i \mid S^{\mathrm{est}}, \Pi)\big)\Big]
\end{aligned}
$$

In the second line the $Y_i^2$ terms cancel; in the third, $\mathbb{E}_{(Y_i, X_i), S^{\mathrm{est}}}(Y_i) = \mathbb{E}_{X_i, S^{\mathrm{est}}}\,\mu(X_i \mid \Pi)$ within each leaf, so $\mathbb{E}[Y_i\,\mu(X_i \mid \Pi)] = \mathbb{E}[\mu^2(X_i \mid \Pi)]$.

Note: Athey & Imbens (2016, p. 7356) have $\hat\mu^2$ here, but I think they are wrong; I think it should be $\mu^2$.
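The identity just derived can be sanity-checked by Monte Carlo. This is an illustrative sketch (not the authors' code): for a two-leaf partition, compare a direct simulation of $-\mathbb{E}[(Y_i - \hat\mu)^2 - Y_i^2]$ against $\mathbb{E}[\mu^2(X_i \mid \Pi)] - \mathbb{E}[\mathbb{V}(\hat\mu(X_i \mid S^{\mathrm{est}}, \Pi))]$.

```python
# Monte Carlo check of -EMSE(Pi) = E[mu^2(X|Pi)] - E[V(mu_hat(X|S_est,Pi))]
# for the partition {x < 0.5, x >= 0.5} with population leaf means 0 and 1.
import numpy as np

rng = np.random.default_rng(4)

def mu_true(x):
    return np.where(x < 0.5, 0.0, 1.0)   # population leaf means

n_est, sigma, reps = 100, 0.5, 20000
lhs_terms, mu_hats = [], {0: [], 1: []}
for _ in range(reps):
    x_est = rng.uniform(0, 1, n_est)
    y_est = mu_true(x_est) + rng.normal(scale=sigma, size=n_est)
    x_i = rng.uniform()
    y_i = mu_true(x_i) + rng.normal(scale=sigma)
    leaf = int(x_i >= 0.5)
    mu_hat = y_est[(x_est >= 0.5) == bool(leaf)].mean()
    lhs_terms.append(-((y_i - mu_hat) ** 2 - y_i ** 2))
    mu_hats[leaf].append(mu_hat)

lhs = np.mean(lhs_terms)
# RHS: E[mu^2] = 0.5*0^2 + 0.5*1^2, minus V(mu_hat) averaged over the leaves
rhs = 0.5 * (0.0 ** 2 + 1.0 ** 2) - 0.5 * (np.var(mu_hats[0]) + np.var(mu_hats[1]))
print(lhs, rhs)   # the two should agree up to simulation noise
```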
$$
-\mathrm{EMSE}_\mu(\Pi) = \mathbb{E}_{X_i}\Big[\mu^2(X_i \mid \Pi)\Big] - \mathbb{E}_{S^{\mathrm{est}}, X_i}\Big[\mathbb{V}\big(\hat\mu(X_i \mid S^{\mathrm{est}}, \Pi)\big)\Big]
$$

Estimate the variance term with

$$
\widehat{\mathbb{V}}\big(\hat\mu(x \mid S^{\mathrm{est}}, \Pi)\big) \equiv \frac{S^2_{S^{\mathrm{tr}}}(\ell(x \mid \Pi))}{N^{\mathrm{est}}(\ell(x \mid \Pi))}
$$

so that, with $\#\ell$ denoting the number of leaves,

$$
\mathbb{E}_{X_i}\Big[\mathbb{V}_{S^{\mathrm{est}}}\big(\hat\mu(X_i \mid S^{\mathrm{est}}, \Pi)\big) \,\Big|\, i \in S^{\mathrm{te}}\Big]
= \sum_\ell p_\ell \frac{S^2_{S^{\mathrm{tr}}}(\ell)}{N^{\mathrm{est}}(\ell)}
\;\approx\; \sum_\ell \frac{1}{\#\ell} \cdot \frac{S^2_{S^{\mathrm{tr}}}(\ell)}{N^{\mathrm{est}} / \#\ell}
= \frac{1}{N^{\mathrm{est}}} \sum_{\ell \in \Pi} S^2_{S^{\mathrm{tr}}}(\ell)
$$

(assuming approximately equal leaf sizes). For the squared-mean term, use $\mathbb{V}(\hat\mu \mid x, \Pi) = \mathbb{E}(\hat\mu^2 \mid x, \Pi) - \big[\mathbb{E}(\hat\mu \mid x, \Pi)\big]^2$, i.e.

$$
\frac{S^2_{S^{\mathrm{tr}}}(\ell(x \mid \Pi))}{N^{\mathrm{tr}}(\ell(x \mid \Pi))} \approx \hat\mu^2(x \mid S^{\mathrm{tr}}, \Pi) - \mu^2(x \mid \Pi)
\quad\Longrightarrow\quad
\mu^2(x \mid \Pi) \approx \hat\mu^2(x \mid S^{\mathrm{tr}}, \Pi) - \frac{S^2_{S^{\mathrm{tr}}}(\ell(x \mid \Pi))}{N^{\mathrm{tr}}(\ell(x \mid \Pi))}
$$

$$
\mathbb{E}_{X_i}\big(\mu^2(X_i \mid \Pi)\big)
\approx \frac{1}{N^{\mathrm{tr}}} \sum_{i \in S^{\mathrm{tr}}} \hat\mu^2(x_i \mid S^{\mathrm{tr}}, \Pi) - \sum_\ell \frac{1}{\#\ell} \cdot \frac{S^2_{S^{\mathrm{tr}}}(\ell)}{N^{\mathrm{tr}} / \#\ell}
= \frac{1}{N^{\mathrm{tr}}} \sum_{i \in S^{\mathrm{tr}}} \hat\mu^2(x_i \mid S^{\mathrm{tr}}, \Pi) - \frac{1}{N^{\mathrm{tr}}} \sum_\ell S^2_{S^{\mathrm{tr}}}(\ell)
$$

Combining the two pieces:

$$
\begin{aligned}
-\widehat{\mathrm{EMSE}}_\mu(S^{\mathrm{tr}}, N^{\mathrm{est}}, \Pi)
&= \frac{1}{N^{\mathrm{tr}}} \sum_{i \in S^{\mathrm{tr}}} \hat\mu^2(X_i \mid S^{\mathrm{tr}}, \Pi) - \frac{1}{N^{\mathrm{tr}}} \sum_{\ell \in \Pi} S^2_{S^{\mathrm{tr}}}(\ell) - \frac{1}{N^{\mathrm{est}}} \sum_{\ell \in \Pi} S^2_{S^{\mathrm{tr}}}(\ell) \\
&= \underbrace{\frac{1}{N^{\mathrm{tr}}} \sum_{i \in S^{\mathrm{tr}}} \hat\mu^2(X_i \mid S^{\mathrm{tr}}, \Pi)}_{\text{Conventional CART criterion}} - \underbrace{\left(\frac{1}{N^{\mathrm{tr}}} + \frac{1}{N^{\mathrm{est}}}\right) \sum_{\ell \in \Pi} S^2_{S^{\mathrm{tr}}}(\ell)}_{\text{Uncertainty about leaf means}}
\end{aligned}
$$
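The final criterion is simple to compute from the training sample alone. Below is an illustrative sketch (function names are ours, not the authors'): the score is the conventional CART term, the mean of squared fitted leaf means, minus a penalty proportional to the within-leaf sample variances.

```python
# Sketch of the honest split criterion:
#   -EMSE(S_tr, N_est, Pi) = (1/N_tr) * sum_i mu_hat^2(X_i | S_tr, Pi)
#                            - (1/N_tr + 1/N_est) * sum_leaves S^2_tr(leaf)
# for a partition of a scalar X given by a list of cutpoints.
import numpy as np

rng = np.random.default_rng(2)

def neg_emse(y_tr, x_tr, cutpoints, n_est):
    leaves = np.searchsorted(cutpoints, x_tr)
    n_tr = len(y_tr)
    fit = 0.0       # (1/N_tr) * sum of squared fitted leaf means
    penalty = 0.0   # sum over leaves of within-leaf sample variance
    for leaf in np.unique(leaves):
        y_leaf = y_tr[leaves == leaf]
        fit += len(y_leaf) * y_leaf.mean() ** 2 / n_tr
        penalty += y_leaf.var(ddof=1)
    return fit - (1.0 / n_tr + 1.0 / n_est) * penalty

# Simulated data: Y is a step function of X (jump at 0.5) plus noise
n = 1000
x = rng.uniform(0, 1, n)
y = (x > 0.5).astype(float) + rng.normal(scale=0.5, size=n)

# The split at the true discontinuity should score highest
for c in [0.3, 0.5, 0.7]:
    print(c, neg_emse(y, x, cutpoints=[c], n_est=n))
```

A split is placed wherever this score is maximized; relative to conventional CART, partitions with noisily estimated leaf means are penalized.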
Honest inference for treatment effects

Note: We still assume randomized treatment assignment.
Population-average potential outcomes within leaves:

$$
\mu(w, x \mid \Pi) \equiv \mathbb{E}\big[Y_i(w) \mid X_i \in \ell(x \mid \Pi)\big]
$$

This is the potential outcome under treatment $w$ (heterogeneous by $X_i$), averaged over the covariates $X_i$ in the leaf.

Average causal effect:

$$
\tau(x \mid \Pi) \equiv \mathbb{E}\big[Y_i(1) - Y_i(0) \mid X_i \in \ell(x \mid \Pi)\big] = \mu(1, x \mid \Pi) - \mu(0, x \mid \Pi)
$$

This is the average effect, a difference in potential outcomes, among observations in the leaf $\ell$, evaluated at the (potentially moderating) covariate value $x$; the right-hand side is compact notation.
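With randomized treatment assignment, as the slides assume, the honest estimate of $\tau(x \mid \Pi)$ in a leaf is the treated-minus-control mean of $Y$ on the estimation sample, with the partition chosen on a separate training sample. A minimal illustrative sketch (the function name is ours):

```python
# Leaf-wise tau_hat = mean(Y | W=1) - mean(Y | W=0), computed within each
# leaf of a partition of a scalar X given by a list of cutpoints.
import numpy as np

rng = np.random.default_rng(3)

def tau_hat_by_leaf(y, w, x, cutpoints):
    leaves = np.searchsorted(cutpoints, x)
    out = {}
    for leaf in np.unique(leaves):
        in_leaf = leaves == leaf
        out[leaf] = y[in_leaf & (w == 1)].mean() - y[in_leaf & (w == 0)].mean()
    return out

# Simulated RCT: the effect is 0 for x < 0.5 and 1 for x >= 0.5
n = 4000
x = rng.uniform(0, 1, n)
w = rng.integers(0, 2, n)
tau = (x >= 0.5).astype(float)
y = 0.2 * x + tau * w + rng.normal(scale=0.5, size=n)

print(tau_hat_by_leaf(y, w, x, cutpoints=[0.5]))  # leaf 0 near 0, leaf 1 near 1
```

Honesty requires that these leaf-level differences be computed on data not used to choose the partition; here the partition is fixed in advance, so the same logic applies directly to an estimation sample.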
![Page 63: Causal forests: A tutorial in high-dimensional causal inference › sites › default › files › b... · IntroPotential outcomesAlgorithmSample splitRegularization + confoundingBART](https://reader030.vdocuments.site/reader030/viewer/2022040205/5f189dc7859eb8158b3dc44e/html5/thumbnails/63.jpg)
Intro Potential outcomes Algorithm Sample split Regularization + confounding BART
Honest inference for treatment effects
Population-average potential outcomes within leaves:
µ(w , x | Π) ≡ E[Yi (w) | Xi ∈ `(x | Π)
]
Potential outcome for
treatment w
(heterogeneous by Xi )
Averaged over controls
Xi in the leaf
Average causal effect:
τ(x | Π) ≡ E[Yi (1)− Yi (0) | Xi ∈ `(x | Π)
]= µ(1, x | Π)− µ(0, x | Π)τ(x | Π) ≡ E
[Yi (1)− Yi (0) | Xi ∈ `(x | Π)
]= µ(1, x | Π)− µ(0, x | Π)
Average effect evaluated at (potentially moderating)covariate value x
τ(x | Π) ≡ E[Yi (1)− Yi (0) | Xi ∈ `(x | Π)
]= µ(1, x | Π)− µ(0, x | Π)
Difference in potential outcomes
τ(x | Π) ≡ E[Yi (1)− Yi (0) | Xi ∈ `(x | Π)
]= µ(1, x | Π)− µ(0, x | Π)
Among observations in the leaf `
τ(x | Π) ≡ E[Yi (1)− Yi (0) | Xi ∈ `(x | Π)
]= µ(1, x | Π)− µ(0, x | Π)
Compact notation
Estimate:

µ̂(w, x | S, Π) ≡ (1 / #{i ∈ S_w : Xi ∈ ℓ(x | Π)}) Σ_{i ∈ S_w : Xi ∈ ℓ(x | Π)} Yi^obs

MSE for treatment effects:

MSE_τ(S^te, S^est, Π) ≡ (1 / #(S^te)) Σ_{i ∈ S^te} { (τi − τ̂(Xi | S^est, Π))² − τi² }

Challenge! τi is never observed.
Adapt EMSE_µ to estimate EMSE_τ

−EMSE_µ(S^tr, N^est, Π) = (1/N^tr) Σ_{i ∈ S^tr} µ̂²(Xi | S^tr, Π) − (1/N^tr + 1/N^est) Σ_{ℓ ∈ Π} S²_{S^tr}(ℓ)

(first term: the conventional CART criterion; second term: uncertainty about leaf means)

−EMSE_τ(S^tr, N^est, Π) = (1/N^tr) Σ_{i ∈ S^tr} τ̂²(Xi | S^tr, Π) − (1/N^tr + 1/N^est) Σ_{ℓ ∈ Π} (S²_{S^tr,treat}(ℓ)/p + S²_{S^tr,control}(ℓ)/(1 − p))

(first term: variance of treatment effects across leaves, which prefers leaves with heterogeneous effects; second term: uncertainty about leaf treatment effects, which prefers leaves with good fit, i.e., leaf-specific effects estimated precisely)
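The −EMSE_τ criterion can be computed directly from the training half. Below is a minimal Python sketch for evaluating one candidate partition, assuming a known treatment probability p and leaves that contain both treated and control units; the function name and signature are illustrative, not from the paper.

```python
import numpy as np

def neg_emse_tau(y, w, leaf_ids, p, n_tr, n_est):
    """-EMSE_tau for one candidate partition, computed on the training half.

    y, w: outcomes and 0/1 treatment indicators for S^tr; leaf_ids: leaf
    membership under the partition Pi; p: treatment probability; n_tr, n_est:
    sizes of the training and estimation halves. Assumes every leaf contains
    both treated and control observations."""
    hetero, penalty = 0.0, 0.0
    for leaf in np.unique(leaf_ids):
        in_leaf = leaf_ids == leaf
        y_t, y_c = y[in_leaf & (w == 1)], y[in_leaf & (w == 0)]
        tau_hat = y_t.mean() - y_c.mean()            # leaf treatment effect
        hetero += in_leaf.sum() * tau_hat ** 2       # effect-variance term
        s2_t = y_t.var(ddof=1) if len(y_t) > 1 else 0.0
        s2_c = y_c.var(ddof=1) if len(y_c) > 1 else 0.0
        penalty += s2_t / p + s2_c / (1 - p)         # uncertainty term
    return hetero / n_tr - (1 / n_tr + 1 / n_est) * penalty
```

Among candidate splits, the one with the largest value is preferred: large between-leaf variation in effects, small within-leaf variance.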
Four partitioning estimators
1. Causal trees
Split by

−EMSE_τ(S^tr, N^est, Π) = (1/N^tr) Σ_{i ∈ S^tr} τ̂²(Xi | S^tr, Π) − (1/N^tr + 1/N^est) Σ_{ℓ ∈ Π} (S²_{S^tr,treat}(ℓ)/p + S²_{S^tr,control}(ℓ)/(1 − p))

The first term prefers leaves with heterogeneous effects; the penalty term prefers leaves with good fit (leaf-specific effects estimated precisely).

Benefit: Prioritizes heterogeneity (τ varies a lot) and fit (within-leaf precision)

Drawback: Cannot be done with off-the-shelf CART methods
2. Transformed outcome trees

Transform the outcome:

Y*_i = Yi (Wi − p) / (p(1 − p)), so that E(Y*_i | Xi = x) = τ(x)

Why (shown unconditionally for brevity; the argument conditional on Xi = x is identical):

E(Y*_i) = E[Yi (Wi − p) / (p(1 − p))]
        = E[Yi Wi] / (p(1 − p)) − p E[Yi] / (p(1 − p))

Substituting Yi = Yi(1)Wi + Yi(0)(1 − Wi) and using Wi ⊥⊥ {Yi(0), Yi(1)} with E[Wi] = p:

        = E[Yi(1)] p / (p(1 − p)) − p (p E[Yi(1)] + (1 − p) E[Yi(0)]) / (p(1 − p))
        = E[Yi(1)] p(1 − p) / (p(1 − p)) − E[Yi(0)] p(1 − p) / (p(1 − p))
        = E[Yi(1)] − E[Yi(0)] = τ
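The identity above is easy to check by simulation. A minimal sketch, assuming a randomized binary treatment with known p = 0.5 and a single binary moderator; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200_000, 0.5
x = rng.binomial(1, 0.5, n)            # binary moderator
y0 = rng.normal(0.0, 1.0, n)           # potential outcome under control
y1 = y0 + 1.0 + x                      # tau(x) = 1 + x
w = rng.binomial(1, p, n)              # randomized treatment
y = np.where(w == 1, y1, y0)           # observed outcome

y_star = y * (w - p) / (p * (1 - p))   # transformed outcome

print(y_star[x == 0].mean())           # approximately 1 = tau(0)
print(y_star[x == 1].mean())           # approximately 2 = tau(1)
```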
2. Transformed outcome trees
Benefit: Can use off-the-shelf CART methods for prediction
Drawbacks: Inefficient: treatment assignment is ignored after transforming the outcome. If within a leaf the treated share differs from p (by chance), then the sample average of Y* within the leaf is a poor estimator of τ.
3. Fit-based trees

Replace

MSE_µ(S^te, S^est, Π) ≡ (1 / #(S^te)) Σ_{i ∈ S^te} { (Yi − µ̂(Xi; S^est, Π))² − Yi² }

with the fit-based split rule

MSE_µ,W(S^te, S^est, Π) ≡ Σ_{i ∈ S^te} { (Yi − µ̂_w(Wi, Xi; S^est, Π))² − Yi² }

which measures loss by model fit within each leaf: the squared difference of each Yi from the expected value for the treatment group of observation i.
Benefit: Prefers splits that lead to better fit.
Drawback: Does not prefer splits that lead to variation in treatment effects.
Zeileis et al. 2008
4. Squared T-statistic trees

Split based on:

T² ≡ N (τ̂_L − τ̂_R)² / (S²/N_L + S²/N_R)

where τ̂_L and τ̂_R are the estimated treatment effects in the left and right leaves.

Benefit: Prefers splits that lead to variation in treatment effects.

Drawback: Missed opportunity to improve fit: ignores useful splits between leaves with similar treatment effects but very different average values.
Su et al. 2009
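A minimal Python sketch of this statistic, with illustrative argument names; here s2 stands in for the pooled variance estimate S².

```python
def t_squared(tau_left, tau_right, s2, n_left, n_right):
    """Squared t-statistic for the difference in estimated treatment effects
    between the left and right child leaves (after Su et al. 2009).
    s2 stands in for the pooled variance estimate S^2; names illustrative."""
    n = n_left + n_right
    return n * (tau_left - tau_right) ** 2 / (s2 / n_left + s2 / n_right)
```

A split is chosen to maximize T², i.e., to make the child-leaf effects as distinguishable as possible.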
From trees to forests: Double-sample trees

An individual tree can be noisy. Instead, we might fit a forest.

1. Draw a sample of size s
2. Split it into an I sample and a J sample
3. Grow a tree on the J sample
4. Estimate leaf-specific τ̂_ℓ using the I sample

Repeat many times.

Advantages of forests:
  Consistent for the true τ(x)
  Asymptotic normality
  Asymptotic variance is estimable

Why double-sample forests:
  Advantage: Trees search for heterogeneous effects
  Disadvantage: Requires sample splitting
Wager & Athey 2017
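The four steps above can be sketched as follows. This is a schematic illustration only: for brevity it reuses off-the-shelf CART (sklearn's DecisionTreeRegressor) rather than the causal-tree splitting rule, and all names and parameter values are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def double_sample_tree_tau(x_new, X, y, w, rng, s=200, min_leaf=10):
    """One double-sample tree: grow the structure on the J half only,
    then estimate the leaf effect honestly on the held-out I half."""
    idx = rng.choice(len(X), size=s, replace=False)   # step 1: subsample
    i_idx, j_idx = idx[: s // 2], idx[s // 2:]        # step 2: split I / J
    tree = DecisionTreeRegressor(min_samples_leaf=min_leaf, random_state=0)
    tree.fit(X[j_idx], y[j_idx])                      # step 3: J builds tree
    leaf_new = tree.apply(np.atleast_2d(x_new))[0]
    in_leaf = tree.apply(X[i_idx]) == leaf_new        # step 4: I estimates
    yi, wi = y[i_idx][in_leaf], w[i_idx][in_leaf]
    if wi.sum() == 0 or (1 - wi).sum() == 0:
        return np.nan                                 # leaf lacks one arm
    return yi[wi == 1].mean() - yi[wi == 0].mean()

# "Forest": repeat over many subsamples and average.
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 2))                 # covariates (pure noise here)
w = rng.binomial(1, 0.5, n)                 # randomized treatment
y = 2.0 * w + rng.normal(0, 1, n)           # true tau(x) = 2 everywhere
taus = [double_sample_tree_tau(np.zeros(2), X, y, w, rng) for _ in range(100)]
print(np.nanmean(taus))                     # close to 2
```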
From trees to forests: Propensity trees

An individual tree can be noisy. Instead, we might fit a forest.

1. Draw a sample of size s
2. Grow a tree on that sample to predict W
   (each leaf must have at least k observations of each treatment class)
3. Estimate τ̂_ℓ within each leaf

Repeat many times.

Advantages of forests:
  Consistent for the true τ(x)
  Asymptotic normality
  Asymptotic variance is estimable

Why propensity forests:
  Advantage: Can use the full sample
  Disadvantage: Does not search for heterogeneous effects
Wager & Athey 2017
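A sketch of the propensity-tree idea, again schematic rather than the paper's implementation: sklearn's DecisionTreeClassifier stands in for the tree that predicts W, and min_samples_leaf only approximates the at-least-k-per-arm requirement.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def propensity_tree_tau(x_new, X, y, w, k=10, seed=0):
    """Propensity-tree sketch: split on covariates to predict treatment W
    (the outcome never influences the splits), then estimate tau inside the
    leaf containing x_new. min_samples_leaf = 2k only approximates the
    at-least-k-observations-per-arm requirement; it does not enforce it."""
    tree = DecisionTreeClassifier(min_samples_leaf=2 * k, random_state=seed)
    tree.fit(X, w)                                    # uses the full sample
    in_leaf = tree.apply(X) == tree.apply(np.atleast_2d(x_new))[0]
    yl, wl = y[in_leaf], w[in_leaf]
    if wl.sum() < k or (1 - wl).sum() < k:
        return np.nan                                 # leaf too unbalanced
    return yl[wl == 1].mean() - yl[wl == 0].mean()

# Confounded assignment, constant true effect tau = 2:
rng = np.random.default_rng(2)
n = 4000
X = rng.normal(size=(n, 1))                           # confounder
w = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))       # treated more at high X
y = 2.0 * w + X[:, 0] + rng.normal(0, 1, n)
ests = []
for _ in range(50):                                   # bootstrap "forest"
    b = rng.choice(n, n, replace=True)
    ests.append(propensity_tree_tau(np.zeros(1), X[b], y[b], w[b]))
print(np.nanmean(ests))  # near 2; the naive mean difference is biased upward
```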
Summary of causal trees and forests

There is no ground truth: we never observe τi.

Causal trees search for leaves with
  heterogeneous effects across leaves
  precisely estimated leaf effects
They require extra sample splitting and work well with randomized treatments.

With selection on observables, the general recommendation is propensity forests:
  Maximize the goal of addressing confounding by ignoring heterogeneous effects when choosing splits.
  Generalized random forests also perform well (Athey, Tibshirani, & Wager 2017).

But "the challenge in using adaptive methods... is that selection bias can be difficult to quantify" (Wager & Athey, p. 24).
If treatment is not randomized

Causal trees find heterogeneous effects but cannot guarantee that confounding is addressed.

Next we focus on why high-dimensional confounding is hard.
Why aren't causal trees guaranteed to address confounding?

Plan
1. What does address confounding? Standardization
2. Why is tree-based standardization biased? Regularization
3. Is there anything we can do? Chernozhukov et al.
What works: Nonparametric standardization

What if independence {Yi(0), Yi(1)} ⊥⊥ Wi fails, but conditional independence {Yi(0), Yi(1)} ⊥⊥ Wi | Xi holds?

We need to estimate τ within each level of Xi.

Complete (never fully observed) potential employment:

ID  Education Xi  Treated Wi  No job training Yi(0)  Job training Yi(1)  Effect τi = Yi(1) − Yi(0)
1   High school   0           0                      1                   1
2   High school   0           0                      1                   1
3   High school   1           0                      1                   1
4   College       0           1                      1                   0
5   College       1           1                      1                   0
6   College       1           1                      1                   0

What we actually observe:

ID  Education Xi  Treated Wi  No job training Yi(0)  Job training Yi(1)  Effect τi = Yi(1) − Yi(0)
1   High school   0           0                      ?                   ?
2   High school   0           0                      ?                   ?
3   High school   1           ?                      1                   ?
4   College       0           1                      ?                   ?
5   College       1           ?                      1                   ?
6   College       1           ?                      1                   ?

τ̂ = Σ_{x ∈ Support of X} P̂(X = x) (Ȳ_{i:Wi=1,Xi=x} − Ȳ_{i:Wi=0,Xi=x})
  = P̂(Xi = High school)(Ȳ_{i:Wi=1,Xi=High school} − Ȳ_{i:Wi=0,Xi=High school})
    + P̂(Xi = College)(Ȳ_{i:Wi=1,Xi=College} − Ȳ_{i:Wi=0,Xi=College})
  = (1/2)(1 − 0) + (1/2)(1 − 1) = 0.5 + 0 = 0.5
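The standardization computation above, as a short Python sketch over the six observed rows (using only each unit's observed outcome):

```python
import numpy as np

# The six observed rows from the table above; the "?" cells are simply
# unavailable, so each unit contributes only its observed outcome.
x = np.array(["High school"] * 3 + ["College"] * 3)
w = np.array([0, 0, 1, 0, 1, 1])
y = np.array([0, 0, 1, 1, 1, 1])

tau_hat = 0.0
for level in np.unique(x):
    m = x == level
    diff = y[m & (w == 1)].mean() - y[m & (w == 0)].mean()
    tau_hat += m.mean() * diff        # weight by P(X = x)

print(tau_hat)  # 0.5
```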
![Page 92: Causal forests: A tutorial in high-dimensional causal inference › sites › default › files › b... · IntroPotential outcomesAlgorithmSample splitRegularization + confoundingBART](https://reader030.vdocuments.site/reader030/viewer/2022040205/5f189dc7859eb8158b3dc44e/html5/thumbnails/92.jpg)
Intro Potential outcomes Algorithm Sample split Regularization + confounding BART
What works: Nonparametric standardization
What if {Yi (0),Yi (1)} 6⊥⊥Wi but {Yi (0),Yi (1)} ⊥⊥Wi | Xi?
We need to estimate τ within each level of Xi .
Potential employment
Education Treated No job training Job training Treatment effect
ID Xi Wi Yi (0) Yi (1) τi = Yi (1)− Yi (0)
1 High school 0 0 1 1
2 High school 0 0 1 1
3 High school 1 0 1 1
4 College 0 1 1 0
5 College 1 1 1 0
6 College 1 1 1 0
ˆτ =∑
x∈Support of X
P(X = x)
(Yi :Wi=1,Xi=x − Yi :Wi=0,Xi=x
)= P(Xi = High school)
+ P(Xi = College)
=1
2+
1
2= 0.5 + 0 =
![Page 93: Causal forests: A tutorial in high-dimensional causal inference › sites › default › files › b... · IntroPotential outcomesAlgorithmSample splitRegularization + confoundingBART](https://reader030.vdocuments.site/reader030/viewer/2022040205/5f189dc7859eb8158b3dc44e/html5/thumbnails/93.jpg)
Intro Potential outcomes Algorithm Sample split Regularization + confounding BART
What works: Nonparametric standardization
What if {Yi (0),Yi (1)} 6⊥⊥Wi but {Yi (0),Yi (1)} ⊥⊥Wi | Xi?
We need to estimate τ within each level of Xi .
Potential employment
Education Treated No job training Job training Treatment effect
ID Xi Wi Yi (0) Yi (1) τi = Yi (1)− Yi (0)
1 High school 0 0 ? ?
2 High school 0 0 ? ?
3 High school 1 ? 1 ?
4 College 0 1 ? ?
5 College 1 ? 1 ?
6 College 1 ? 1 ?
ˆτ =∑
x∈Support of X
P(X = x)
(Yi :Wi=1,Xi=x − Yi :Wi=0,Xi=x
)= P(Xi = High school)
+ P(Xi = College)
=1
2+
1
2= 0.5 + 0 =
![Page 94: Causal forests: A tutorial in high-dimensional causal inference › sites › default › files › b... · IntroPotential outcomesAlgorithmSample splitRegularization + confoundingBART](https://reader030.vdocuments.site/reader030/viewer/2022040205/5f189dc7859eb8158b3dc44e/html5/thumbnails/94.jpg)
Intro Potential outcomes Algorithm Sample split Regularization + confounding BART
What works: Nonparametric standardization
What if {Yi (0),Yi (1)} 6⊥⊥Wi but {Yi (0),Yi (1)} ⊥⊥Wi | Xi?
We need to estimate τ within each level of Xi .
Potential employment
Education Treated No job training Job training Treatment effect
ID Xi Wi Yi (0) Yi (1) τi = Yi (1)− Yi (0)
1 High school 0 0 ? ?
2 High school 0 0 ? ?
3 High school 1 ? 1 ?
4 College 0 1 ? ?
5 College 1 ? 1 ?
6 College 1 ? 1 ?
ˆτ =∑
x∈Support of X
P(X = x)
(Yi :Wi=1,Xi=x − Yi :Wi=0,Xi=x
)= P(Xi = High school)
+ P(Xi = College)
=1
2+
1
2= 0.5 + 0 =
![Page 95: Causal forests: A tutorial in high-dimensional causal inference › sites › default › files › b... · IntroPotential outcomesAlgorithmSample splitRegularization + confoundingBART](https://reader030.vdocuments.site/reader030/viewer/2022040205/5f189dc7859eb8158b3dc44e/html5/thumbnails/95.jpg)
Intro Potential outcomes Algorithm Sample split Regularization + confounding BART
What works: Nonparametric standardization
What if {Yi (0),Yi (1)} 6⊥⊥Wi but {Yi (0),Yi (1)} ⊥⊥Wi | Xi?
We need to estimate τ within each level of Xi .
Potential employment
Education Treated No job training Job training Treatment effect
ID Xi Wi Yi (0) Yi (1) τi = Yi (1)− Yi (0)
1 High school 0 0 ? ?
2 High school 0 0 ? ?
3 High school 1 ? 1 ?
4 College 0 1 ? ?
5 College 1 ? 1 ?
6 College 1 ? 1 ?
ˆτ =∑
x∈Support of X
P(X = x)
(Yi :Wi=1,Xi=x − Yi :Wi=0,Xi=x
)
= P(Xi = High school)
(Yi :Wi=1,Xi=High school − Yi :Wi=0,Xi=High school
)+ P(Xi = College)
(Yi :Wi=1,Xi=College − Yi :Wi=0,Xi=College
)=
1
2(1− 0) +
1
2(1− 1) = 0.5 + 0 =
![Page 96: Causal forests: A tutorial in high-dimensional causal inference › sites › default › files › b... · IntroPotential outcomesAlgorithmSample splitRegularization + confoundingBART](https://reader030.vdocuments.site/reader030/viewer/2022040205/5f189dc7859eb8158b3dc44e/html5/thumbnails/96.jpg)
Intro Potential outcomes Algorithm Sample split Regularization + confounding BART
What works: Nonparametric standardization
What if {Yi (0),Yi (1)} 6⊥⊥Wi but {Yi (0),Yi (1)} ⊥⊥Wi | Xi?
We need to estimate τ within each level of Xi .
Potential employment
Education Treated No job training Job training Treatment effect
ID Xi Wi Yi (0) Yi (1) τi = Yi (1)− Yi (0)
1 High school 0 0 ? ?
2 High school 0 0 ? ?
3 High school 1 ? 1 ?
4 College 0 1 ? ?
5 College 1 ? 1 ?
6 College 1 ? 1 ?
ˆτ =∑
x∈Support of X
P(X = x)
(Yi :Wi=1,Xi=x − Yi :Wi=0,Xi=x
)
= P(Xi = High school)
(Yi :Wi=1,Xi=High school − Yi :Wi=0,Xi=High school
)+ P(Xi = College)
(Yi :Wi=1,Xi=College − Yi :Wi=0,Xi=College
)=
1
2(1− 0) +
1
2(1− 1) = 0.5 + 0 =
Potential employment
Education Treated No job training Job training Treatment effect
ID Xi Wi Yi (0) Yi (1) τi = Yi (1)− Yi (0)
1 High school 0 0 ? ?
2 High school 0 0 ? ?
3 High school 1 ? 1 ?
4 College 0 1 ? ?
5 College 1 ? 1 ?
6 College 1 ? 1 ?
![Page 97: Causal forests: A tutorial in high-dimensional causal inference › sites › default › files › b... · IntroPotential outcomesAlgorithmSample splitRegularization + confoundingBART](https://reader030.vdocuments.site/reader030/viewer/2022040205/5f189dc7859eb8158b3dc44e/html5/thumbnails/97.jpg)
Intro Potential outcomes Algorithm Sample split Regularization + confounding BART
What works: Nonparametric standardization
What if {Yi (0),Yi (1)} 6⊥⊥Wi but {Yi (0),Yi (1)} ⊥⊥Wi | Xi?
We need to estimate τ within each level of Xi .
Potential employment
Education Treated No job training Job training Treatment effect
ID Xi Wi Yi (0) Yi (1) τi = Yi (1)− Yi (0)
1 High school 0 0 ? ?
2 High school 0 0 ? ?
3 High school 1 ? 1 ?
4 College 0 1 ? ?
5 College 1 ? 1 ?
6 College 1 ? 1 ?
ˆτ =∑
x∈Support of X
P(X = x)
(Yi :Wi=1,Xi=x − Yi :Wi=0,Xi=x
)
= P(Xi = High school)
(Yi :Wi=1,Xi=High school − Yi :Wi=0,Xi=High school
)+ P(Xi = College)
(Yi :Wi=1,Xi=College − Yi :Wi=0,Xi=College
)=
1
2(1− 0) +
1
2(1− 1) = 0.5 + 0 =
Potential employment
Education Treated No job training Job training Treatment effect
ID Xi Wi Yi (0) Yi (1) τi = Yi (1)− Yi (0)
1 High school 0 0 ? ?
2 High school 0 0 ? ?
3 High school 1 ? 1 ?
4 College 0 1 ? ?
5 College 1 ? 1 ?
6 College 1 ? 1 ?
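The standardization estimate above can be checked in a few lines. This is a toy illustration (not from the slides) using only the six observed rows of the table:

```python
from collections import defaultdict

# The six observed rows of the table: (education X, treatment W, observed Y)
data = [
    ("High school", 0, 0), ("High school", 0, 0), ("High school", 1, 1),
    ("College", 0, 1), ("College", 1, 1), ("College", 1, 1),
]

# Group observed outcomes by covariate cell and treatment arm
cells = defaultdict(lambda: {0: [], 1: []})
for x, w, y in data:
    cells[x][w].append(y)

# tau-hat = sum over x of P(X = x) * (treated mean - control mean within cell x)
n = len(data)
tau_hat = sum(
    (len(arms[0]) + len(arms[1])) / n
    * (sum(arms[1]) / len(arms[1]) - sum(arms[0]) / len(arms[0]))
    for arms in cells.values()
)
print(tau_hat)  # 0.5
```

Each cell contributes its within-cell treated-minus-control difference, weighted by the cell's share of the sample, reproducing the 0.5 computed by hand.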
What works: Nonparametric standardization
But when the covariates Xi define many cells, nonparametric standardization is impossible: most cells will contain only treated or only control observations, or none at all.
Why is tree-based standardization biased? Regularization
With no regularization, a tree would grow until each leaf was completely homogeneous in Xi. But this tree would be very noisy! We prune our trees so that leaves contain more observations:
- Treatment effects are more precisely estimated.
- But treatment effects are biased if there is confounding within leaves.
Is there anything we can do? Chernozhukov et al.

\[
\underbrace{Y = D\theta_0 + g_0(X) + U}_{\text{Outcome equation}}
\qquad
\underbrace{D = m_0(X) + V}_{\text{Treatment assignment}}
\]

One might be tempted to estimate g0(X) by machine learning and then state:

\[
\hat{\theta}_0 = \frac{\frac{1}{n}\sum_{i \in I} D_i \left(Y_i - \hat{g}_0(X_i)\right)}{\frac{1}{n}\sum_{i \in I} D_i^2}
\]

This will be biased because the estimator \(\hat{g}_0\) is regularized:

\[
b = \frac{1}{E(D_i^2)} \frac{1}{\sqrt{n}} \sum_{i \in I} \underbrace{m_0(X_i)\left(g_0(X_i) - \hat{g}_0(X_i)\right)}_{\text{does not have mean } 0} + o_P(1)
\]

Key: Di is centered at m0(X) ≠ 0. We should recenter Di.
Is there anything we can do? Chernozhukov et al.

\[
\underbrace{Y = D\theta_0 + g_0(X) + U}_{\text{Outcome equation}}
\qquad
\underbrace{D = m_0(X) + V}_{\text{Treatment assignment}}
\]

1. Split the sample into I and J.
2. Estimate \(\hat{g}_0(X)\) using sample J.
3. Estimate \(\hat{m}_0(X)\) using sample J.
4. Orthogonalize D on X (approximately): \(\hat{V} = D - \hat{m}_0(X)\).
5. Estimate the treatment effect:

\[
\underbrace{\hat{\theta}_0 = \frac{\frac{1}{n}\sum_{i \in I} D_i \left(Y_i - \hat{g}_0(X_i)\right)}{\frac{1}{n}\sum_{i \in I} D_i^2}}_{\text{Biased}}
\qquad
\underbrace{\check{\theta}_0 = \frac{\frac{1}{n}\sum_{i \in I} \hat{V}_i \left(Y_i - \hat{g}_0(X_i)\right)}{\frac{1}{n}\sum_{i \in I} \hat{V}_i D_i}}_{\text{De-biased}}
\]

Chernozhukov et al. 2016
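The five steps above can be sketched in a small simulation. This is an illustration, not from the slides: cubic polynomial fits stand in for the machine-learning estimators of the nuisance functions, and the fitted \(\hat{g}\) is obtained by regressing Y on X (so it estimates E(Y | X) rather than g0 itself; the orthogonalized estimator is robust to this, which is exactly the point of recentering D):

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta0 = 4000, 1.0
x = rng.uniform(-1, 1, n)
d = 1 + x + rng.normal(0, 1, n)               # D = m0(X) + V with m0(x) = 1 + x, so D is not centered at 0
y = theta0 * d + x**2 + rng.normal(0, 1, n)   # Y = D*theta0 + g0(X) + U with g0(x) = x^2

# 1. Split the sample into I and J
J, I = np.arange(n // 2), np.arange(n // 2, n)

# 2-3. Estimate the nuisance functions on sample J only
g_hat = np.poly1d(np.polyfit(x[J], y[J], 3))  # regress Y on X (stand-in for g0-hat)
m_hat = np.poly1d(np.polyfit(x[J], d[J], 3))  # regress D on X (stand-in for m0-hat)

# 4. Orthogonalize D on X: V-hat = D - m0-hat(X), computed on sample I
v = d[I] - m_hat(x[I])

# 5. Estimate the treatment effect both ways
theta_naive = np.sum(d[I] * (y[I] - g_hat(x[I]))) / np.sum(d[I] ** 2)
theta_dml = np.sum(v * (y[I] - g_hat(x[I]))) / np.sum(v * d[I])
print(theta_naive, theta_dml)  # naive is badly biased; orthogonalized lands near theta0 = 1
```

Because m0(x) = 1 + x is far from zero, the naive ratio is pulled well away from the truth, while the recentered version recovers it.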
Bias remaining in de-biased estimator (Chernozhukov et al.)

\[
\sqrt{n}\left(\check{\theta}_0 - \theta_0\right) = a^* + b^* + c^*
\]
\[
a^* = \frac{1}{E(V^2)} \frac{1}{\sqrt{n}} \sum_{i \in I} V_i U_i \to N(0, \Sigma)
\]

Because a* converges to a mean-zero normal distribution, we don't worry about it.
Regularization bias:

\[
b^* = \frac{1}{E(V^2)} \frac{1}{\sqrt{n}} \sum_{i \in I} \left(m_0(X_i) - \hat{m}_0(X_i)\right)\left(g_0(X_i) - \hat{g}_0(X_i)\right)
\]

This vanishes "under a broad range of data-generating processes." It is bounded above by \(\sqrt{n}\, n^{-\psi_m} n^{-\psi_g}\), where \(\psi_m\) and \(\psi_g\) are the rates of convergence of \(\hat{m}_0 \to m_0\) and \(\hat{g}_0 \to g_0\).
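One consequence of this bound is worth spelling out (my gloss, not on the slide): the bound vanishes exactly when the two convergence rates are jointly fast enough.

```latex
\sqrt{n}\, n^{-\psi_m} n^{-\psi_g}
\;=\; n^{\frac{1}{2} - \psi_m - \psi_g}
\;\to\; 0
\quad \Longleftrightarrow \quad
\psi_m + \psi_g > \tfrac{1}{2}
```

So neither nuisance estimator needs the parametric rate \(\psi = 1/2\) on its own; for example, \(\psi_m = \psi_g = 1/4\) suffices, because only the product of the two estimation errors enters b*.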
An example of the third term in the partially linear model:

\[
c^* = \frac{1}{\sqrt{n}} \sum_{i \in I} V_i \left(g_0(X_i) - \hat{g}_0(X_i)\right)
\]

If \(\hat{g}_0\) is estimated on an auxiliary sample J, then \(V_i\) and \(\hat{g}_0(X_i)\) will be uncorrelated and \(E(c^*) = 0\).
BART: Bayesian Additive Regression Trees

How it differs from random forests:
- Fixed number of trees
- Backfits repeatedly over the fixed number of trees
- A strong prior encourages shallow trees
- Uncertainty comes automatically from posterior samples

Chipman, George, & McCulloch 2010
BART model

\[
Y = \sum_{j=1}^{m} g_j(x \mid T_j, M_j) + \varepsilon,
\qquad \varepsilon \sim N(0, \sigma^2)
\]

Prior on each tree \(T_j\):

\[
P(\underbrace{D_j = d}_{\text{tree depth}}) = \alpha(1 + d)^{-\beta}
\]

Split variable ∼ Uniform(available variables)
Split value ∼ Uniform(available split values)

Prior on leaf means \(\mu_{ij} \mid T_j\):

\[
\underbrace{\mu_{ij}}_{\text{leaf } i \text{ of tree } j} \sim N(\mu_\mu,\; \underbrace{\sigma_\mu^2}_{\substack{\text{chosen so that } E(Y \mid x) \in (y_{\min},\, y_{\max}) \\ \text{with high probability}}})
\]

Prior on \(\sigma\):

\[
\sigma^2 \sim \frac{\nu\lambda}{\chi^2_\nu} \quad \text{(inverse chi-square)}
\]

They recommend {α = 0.95, β = 2}, which puts 97% of the prior probability on trees with 4 or fewer terminal nodes.

Chipman, George, & McCulloch 2010
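The 97% figure can be reproduced with a short recursion. This is my own illustration, not from the slides: in Chipman et al.'s formulation, \(\alpha(1+d)^{-\beta}\) is the probability that a node at depth d splits, and the leaf-count distribution of a subtree follows by convolving the two child subtrees whenever the root splits:

```python
from functools import lru_cache

ALPHA, BETA = 0.95, 2.0   # recommended defaults from Chipman, George & McCulloch
CAP = 16                  # track leaf counts up to CAP; larger trees never contribute to P(<= 4 leaves)
D_MAX = 12                # treat nodes at this depth as terminal (split prob is ~0.006 there)

@lru_cache(maxsize=None)
def leaf_probs(depth):
    """Return tuple p with p[k] = P(a subtree rooted at `depth` ends with k leaves), k = 1..CAP."""
    if depth >= D_MAX:
        return tuple([0.0, 1.0] + [0.0] * (CAP - 1))
    p_split = ALPHA * (1 + depth) ** (-BETA)  # probability this node splits
    probs = [0.0] * (CAP + 1)
    probs[1] = 1 - p_split                    # node stays terminal: exactly one leaf
    child = leaf_probs(depth + 1)
    # If the node splits, its leaf count is the sum of the two children's leaf counts
    for kl in range(1, CAP):
        for kr in range(1, CAP + 1 - kl):
            probs[kl + kr] += p_split * child[kl] * child[kr]
    return tuple(probs)

p = leaf_probs(0)
p_at_most_4 = sum(p[1:5])
print(round(p_at_most_4, 3))  # close to 0.97
```

With α = 0.95 and β = 2 the recursion gives about 0.969, matching the slide's "97% on 4 or fewer terminal nodes."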
BART for causal inference

Goal: Model the response surface as a function of treatment and pre-treatment covariates.

1. Fit a flexible model for Y = f(X, W)
2. Set W = 0 to predict Yi(0) for all i
3. Set W = 1 to predict Yi(1) for all i
4. Difference to estimate τi
5. Plot effects

Hill 2011
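A minimal sketch of this recipe on simulated data (an illustration, not from the slides; sklearn's GradientBoostingRegressor stands in for a BART fit, so the automatic posterior-uncertainty step is omitted):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(-1, 1, (n, 1))
w = rng.integers(0, 2, n).astype(float)       # randomized binary treatment
tau = 1 + x[:, 0]                             # heterogeneous effect: tau(x) = 1 + x
y = x[:, 0] ** 2 + tau * w + rng.normal(0, 0.5, n)

# 1. Fit one flexible model for Y = f(X, W)
f = GradientBoostingRegressor(random_state=0).fit(np.column_stack([x, w]), y)

# 2-3. Predict both potential outcomes for every unit i
y0_hat = f.predict(np.column_stack([x, np.zeros(n)]))
y1_hat = f.predict(np.column_stack([x, np.ones(n)]))

# 4. Difference to estimate tau_i (step 5 would plot tau_hat against x)
tau_hat = y1_hat - y0_hat
print(tau_hat.mean())  # sample average effect; the true average here is 1.0
```

With a true BART fit, repeating steps 2-4 for each posterior draw of f would give uncertainty intervals for each τi for free.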
BART: Benefits and drawbacks

Benefits:
- Less researcher discretion over tuning parameters
- Automatic posterior uncertainty estimates

Drawbacks:
- Not guaranteed to address confounding due to regularization
- No theoretical guarantee that estimates are centered on the truth
- Splitting is based on prediction and is not explicitly optimized for causal inference within leaves
Summary
- Causal trees can detect high-dimensional, covariate-based treatment effect heterogeneity and work well with high-order interactions.
- Causal forests give theoretically valid confidence intervals.
- Bayesian approaches (BART) are less theoretically verified but give easy uncertainty estimates.
- With high-dimensional confounding, all methods are biased but can be designed to be consistent.