Machine Learning for Healthcare HST.956, 6.S897
Lecture 14: Causal Inference Part 1
David Sontag
Course announcements
• Please fill out mid-semester survey
• Project proposals
  – You will receive e-mail feedback this week
  – Office hours next Tuesday, 10-11:30am
• Problem sets
  – PS1-4 graded (see Stellar)
  – PS5 out tonight, due next Tuesday, April 9
  – Last problem set, PS6, released in ~2 weeks
• Recitation this week will be a discussion of
  – Brat et al., Postsurgical prescriptions for opioid naïve patients and association with overdose and misuse, BMJ 2018
  – Bertsimas et al., Personalized diabetes management using electronic medical records, Diabetes Care 2017
Does gastric bypass surgery prevent onset of diabetes?
• In Lecture 4 & PS2 we used machine learning for early detection of Type 2 diabetes
• Health system doesn't want to know how to predict diabetes – they want to know how to prevent it
• Gastric bypass surgery is the highest negative weight (9th most predictive feature)
  – Does this mean it would be a good intervention?
[Figure: three panels labelled 1994, 2000, and 2013, with legend bins <4.5%, 4.5%–5.9%, 6.0%–7.4%, 7.5%–8.9%, >9.0%]
• Such predictive models are widely used to stage patients. Should we initiate treatment? How aggressive?
• What could go wrong if we trained a model to predict survival, and then used it to guide patient care?
Mammography (86K subjects)
Competitive Period Launch: Nov 18, 2016
Competitive Period Close: May 9, 2017
Out of 1000 women screened, only 5 will have breast cancer
Goal: develop algorithms for risk stratification of screening mammograms that can be used to improve breast cancer detection
What is the likelihood this patient, with breast cancer, will survive 5 years?
[Figure: timeline (Time axis) for patient “Mary” marking Diagnosis, Treatment, and Death, with labels X and Y]
A long survival time may be because of treatment!
• People respond differently to treatment
• Goal: use data from other patients and their journeys to guide future treatment decisions
• What could go wrong if we trained to predict (past) treatment decisions?
What treatment should we give this patient?
[Figure: expansion pathology with DNA-FISH and Protein-IF (image from Andy Beck). Blue = HER2 protein, red = HER2 amplicon, green = centromeric probe. Panels: negative for HER2 amplification; HER2 amplified]
[Figure: past patients and the treatments they received. “David”: Treatment A; “Juana”: Treatment A; “John”: Treatment B]
Best this can do is match current medical practice!
Does smoking cause lung cancer?
• Doing a randomized control trial is unethical
• Could we simply answer this question by comparing Pr(lung cancer | smoker) vs Pr(lung cancer | nonsmoker)?
• No! Answering such questions from observational data is difficult because of confounding
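To properly answer, need to formulate as causal questions:
[Figure: diagram relating Patient, X (including all confounding factors), Intervention, T (e.g. medication, procedure), and Outcome, Y, with a “?” marking the effect in question. Annotations: high dimensional; observational data]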
Potential Outcomes Framework (Rubin-Neyman Causal Model)
• Each unit (individual) $x_i$ has two potential outcomes:
  – $Y_0(x_i)$ is the potential outcome had the unit not been treated: “control outcome”
  – $Y_1(x_i)$ is the potential outcome had the unit been treated: “treated outcome”
• Conditional average treatment effect for unit $i$:
  $$CATE(x_i) = \mathbb{E}_{Y_1 \sim p(Y_1 \mid x_i)}[Y_1 \mid x_i] - \mathbb{E}_{Y_0 \sim p(Y_0 \mid x_i)}[Y_0 \mid x_i]$$
• Average Treatment Effect:
  $$ATE := \mathbb{E}[Y_1 - Y_0] = \mathbb{E}_{x \sim p(x)}[CATE(x)]$$
• Observed factual outcome: $y_i = t_i Y_1(x_i) + (1 - t_i) Y_0(x_i)$
• Unobserved counterfactual outcome: $y_i^{CF} = (1 - t_i) Y_1(x_i) + t_i Y_0(x_i)$
“The fundamental problem of causal inference”
We only ever observe one of the two outcomes
Example – Blood pressure and age
[Figure, shown in several builds: blood pressure ($y$) against age ($x$) with the two potential-outcome curves $Y_1(x)$ and $Y_0(x)$. Successive builds annotate $CATE(x)$, then the $ATE$, then the observed treated and control units, and finally the counterfactual treated and counterfactual control points.]
(age, gender, exercise, treatment)   Sugar levels had they    Sugar levels had they    Observed
                                     received medication A    received medication B    sugar levels
(45, F, 0, A)                        6                        5.5                      6
(45, F, 1, B)                        7                        6.5                      6.5
(55, M, 0, A)                        7                        6                        7
(55, M, 1, B)                        9                        8                        8
(65, F, 0, B)                        8.5                      8                        8
(65, F, 1, A)                        7.5                      7                        7.5
(75, M, 0, B)                        10                       9                        9
(75, M, 1, A)                        8                        7                        8
(Example from Uri Shalit)
With the columns labelled as potential outcomes, for the same eight patients:
Y0: Sugar levels had they received medication A
Y1: Sugar levels had they received medication B
Comparing the observed groups with the potential outcomes of all eight patients in the table:
mean(sugar | medication B) – mean(sugar | medication A) = 7.875 − 7.125 = 0.75
mean(sugar | had they received B) – mean(sugar | had they received A) = 7.125 − 7.875 = −0.75
The observed comparison suggests medication B raises sugar levels by 0.75, while the potential outcomes show it lowers them by 0.75.
(Example from Uri Shalit)
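The two differences can be reproduced from the table itself. The sketch below is only an illustration: it takes the eight rows as given, reads the treatment each patient received from the first version of the table, and uses variable names chosen for this example.

```python
# Rows of the table: (age, gender, exercise, treatment received, Y_A, Y_B),
# where Y_A / Y_B are sugar levels had the patient received medication A / B.
rows = [
    (45, "F", 0, "A", 6.0, 5.5),
    (45, "F", 1, "B", 7.0, 6.5),
    (55, "M", 0, "A", 7.0, 6.0),
    (55, "M", 1, "B", 9.0, 8.0),
    (65, "F", 0, "B", 8.5, 8.0),
    (65, "F", 1, "A", 7.5, 7.0),
    (75, "M", 0, "B", 10.0, 9.0),
    (75, "M", 1, "A", 8.0, 7.0),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Observed (factual) sugar level: the potential outcome of the medication received.
observed = [(t, y_a if t == "A" else y_b) for (_, _, _, t, y_a, y_b) in rows]

# Naive comparison of the two observed groups.
naive_diff = (mean(y for t, y in observed if t == "B")
              - mean(y for t, y in observed if t == "A"))

# Causal comparison: uses both potential outcomes of every patient,
# which is only possible here because the example table lists them.
causal_diff = mean(r[5] for r in rows) - mean(r[4] for r in rows)

print(naive_diff)   # 0.75
print(causal_diff)  # -0.75
```

The sign flips because, in this table, medication B was mostly given to patients whose sugar levels are higher under either medication.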
Typical assumption – no unmeasured confounders
$Y_0, Y_1$: potential outcomes for control and treated
$x$: unit covariates (features)
$T$: treatment assignment
We assume (ignorability): $$(Y_0, Y_1) \perp\!\!\!\perp T \mid x$$
The potential outcomes are independent of treatment assignment, conditioned on covariates $x$
Ignorability: $(Y_0, Y_1) \perp\!\!\!\perp T \mid x$
[Figure: graph over covariates (features) $x$, treatment $T$, and potential outcomes $Y_0$, $Y_1$. Example labels: $x$ = age, gender, weight, diet, heart rate at rest, …; $T$ = anti-hypertensive medication; potential outcomes = blood pressure after medication A, blood pressure after medication B.]
No ignorability
[Figure: the same graph with an additional variable $h$ (diabetic) that is not part of $x$, so that $(Y_0, Y_1) \perp\!\!\!\perp T \mid x$ no longer holds.]
Typical assumption – common support
$Y_0, Y_1$: potential outcomes for control and treated
$x$: unit covariates (features)
$T$: treatment assignment
We assume: $$p(T = t \mid X = x) > 0 \quad \forall t, x$$
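For discrete covariates, a rough empirical check of common support is to look for covariate values where some treatment was never observed. The sketch below is an assumption-laden illustration: the function name `strata_lacking_support` and the toy data are hypothetical, and an empty result on a finite sample does not prove the population condition holds.

```python
from collections import defaultdict

def strata_lacking_support(covariates, treatments):
    """Return {x: missing treatments} for every stratum x in which some
    treatment value was never observed, i.e. where the empirical estimate
    of p(T = t | X = x) is zero."""
    all_treatments = set(treatments)
    seen = defaultdict(set)
    for x, t in zip(covariates, treatments):
        seen[x].add(t)
    return {x: all_treatments - ts for x, ts in seen.items() if ts != all_treatments}

# Hypothetical toy data: age group is the only covariate.
ages = ["<50", "<50", "50-70", "50-70", ">70", ">70"]
meds = ["A", "B", "A", "B", "B", "B"]

print(strata_lacking_support(ages, meds))  # {'>70': {'A'}}
```

Here nobody over 70 received medication A, so the data say nothing about what A does in that group.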
Framing the question
1. Where could we go to for data to answer these questions?
2. What should X, T, and Y be to satisfy ignorability?
3. What is the specific causal inference question that we are interested in?
4. Are you worried about common support?
Outline for lecture
• How to recognize a causal inference problem
• Potential outcomes framework
  – Average treatment effect (ATE)
  – Conditional average treatment effect (CATE)
• Algorithms for estimating ATE and CATE
Average Treatment Effect
The expected causal effect of $T$ on $Y$: $ATE := \mathbb{E}[Y_1 - Y_0]$
Average Treatment Effect – the adjustment formula
• Assuming ignorability, we will derive the adjustment formula (Hernán & Robins 2010, Pearl 2009)
• The adjustment formula is extremely useful in causal inference
• Also called G-formula
Average Treatment Effect
$$\mathbb{E}[Y_1] = \mathbb{E}_{x \sim p(x)}\big[\mathbb{E}_{Y_1 \sim p(Y_1 \mid x)}[Y_1 \mid x]\big] \quad \text{(law of total expectation)}$$
$$= \mathbb{E}_{x \sim p(x)}\big[\mathbb{E}_{Y_1 \sim p(Y_1 \mid x)}[Y_1 \mid x, T = 1]\big] \quad \text{(ignorability: } (Y_0, Y_1) \perp\!\!\!\perp T \mid x\text{)}$$
$$= \mathbb{E}_{x \sim p(x)}\big[\mathbb{E}[Y_1 \mid x, T = 1]\big] \quad \text{(shorter notation)}$$
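The same steps for $Y_0$, now conditioning on $T = 0$:
$$\mathbb{E}[Y_0] = \mathbb{E}_{x \sim p(x)}\big[\mathbb{E}_{Y_0 \sim p(Y_0 \mid x)}[Y_0 \mid x]\big]$$
$$= \mathbb{E}_{x \sim p(x)}\big[\mathbb{E}_{Y_0 \sim p(Y_0 \mid x)}[Y_0 \mid x, T = 0]\big]$$
$$= \mathbb{E}_{x \sim p(x)}\big[\mathbb{E}[Y_0 \mid x, T = 0]\big]$$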
The adjustment formula
Under the assumption of ignorability, we have that:
$$ATE = \mathbb{E}[Y_1 - Y_0] = \mathbb{E}_{x \sim p(x)}\big[\,\mathbb{E}[Y_1 \mid x, T = 1] - \mathbb{E}[Y_0 \mid x, T = 0]\,\big]$$
Quantities we can estimate from data: $\mathbb{E}[Y_1 \mid x, T = 1]$ and $\mathbb{E}[Y_0 \mid x, T = 0]$
Quantities we cannot directly estimate from data: $\mathbb{E}[Y_0 \mid x, T = 1]$, $\mathbb{E}[Y_1 \mid x, T = 0]$, $\mathbb{E}[Y_0 \mid x]$, $\mathbb{E}[Y_1 \mid x]$
Empirically we have samples from $p(x \mid T = 1)$ or $p(x \mid T = 0)$. Extrapolate to $p(x)$.
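With a single discrete covariate, the adjustment formula reduces to a weighted average of within-stratum differences. The following is a minimal sketch on made-up records (the data and variable names are hypothetical), assuming ignorability holds given $x$ and that every stratum contains both treated and control units.

```python
from collections import defaultdict

# Hypothetical records (x, t, y): x is one binary covariate, t the treatment, y the outcome.
data = [
    (0, 1, 2.0), (0, 0, 3.0), (0, 0, 3.0), (0, 0, 3.0),
    (1, 1, 5.0), (1, 1, 5.0), (1, 1, 5.0), (1, 0, 6.0),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Naive contrast E[Y | T = 1] - E[Y | T = 0], ignoring x.
naive = mean(y for _, t, y in data if t == 1) - mean(y for _, t, y in data if t == 0)

# Group outcomes by stratum x and treatment t.
by_stratum = defaultdict(lambda: {0: [], 1: []})
for x, t, y in data:
    by_stratum[x][t].append(y)

# Adjustment formula: sum over x of p(x) * (E[Y | x, T = 1] - E[Y | x, T = 0]),
# with p(x) estimated from the pooled sample (treated and control together).
adjusted = sum(
    (len(groups[0]) + len(groups[1])) / len(data) * (mean(groups[1]) - mean(groups[0]))
    for groups in by_stratum.values()
)

print(naive, adjusted)  # 0.5 -1.0
```

Weighting by the pooled $p(x)$ rather than by $p(x \mid T = 1)$ or $p(x \mid T = 0)$ is what separates the adjusted estimate from the naive one in this toy data.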
Many methods!
• Covariate adjustment
• Propensity score re-weighting
• Doubly robust estimators
• Matching
• …
Covariate adjustment
• Explicitly model the relationship between treatment, confounders, and outcome
• Also called “Response Surface Modeling”
• Used for both ITE and ATE
• A regression problem
[Figure: regression model $f(x, T)$ taking covariates (features) $x_1, x_2, \ldots$ and treatment $T$ as inputs and producing the outcome $y$. A second build labels nuisance parameters and the parameter of interest.]
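Covariate adjustment (parametric g-formula)
• Explicitly model the relationship between treatment, confounders, and outcome
• Under ignorability, the expected causal effect of $T$ on $Y$:
  $$\mathbb{E}_{x \sim p(x)}\big[\mathbb{E}[Y_1 \mid T = 1, x] - \mathbb{E}[Y_0 \mid T = 0, x]\big]$$
• Fit a model $f(x, t) \approx \mathbb{E}[Y \mid T = t, x]$
  $$\widehat{ATE} = \frac{1}{n} \sum_{i=1}^{n} \big[f(x_i, 1) - f(x_i, 0)\big]$$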
• The fitted model $f$ also gives an estimate for each unit:
  $$\widehat{CATE}(x_i) = f(x_i, 1) - f(x_i, 0)$$
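The estimators above can be sketched with an ordinary least-squares fit. Everything in this block is an assumption made for illustration: the data are simulated (age confounds treatment and blood pressure, with a constant true effect of −5), and the linear response surface with an interaction term is just one possible choice of $f$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated data: older patients are treated more often and have higher blood pressure.
age = rng.uniform(30, 80, size=n)
p_treat = 1 / (1 + np.exp(-(age - 55) / 10))
t = rng.binomial(1, p_treat)
true_effect = -5.0
y = 100 + 0.8 * age + true_effect * t + rng.normal(0, 2, size=n)

def features(x, t):
    # Design matrix for f(x, t) = b0 + b1*x + b2*t + b3*x*t
    return np.column_stack([np.ones_like(x), x, t, x * t])

# Fit f(x, t) ~ E[Y | T = t, x] by least squares.
beta, *_ = np.linalg.lstsq(features(age, t), y, rcond=None)

def f(x, t_value):
    # Predict the outcome for every unit as if its treatment were t_value.
    return features(x, np.full_like(x, t_value)) @ beta

cate_hat = f(age, 1.0) - f(age, 0.0)   # CATE estimate for each x_i
ate_hat = cate_hat.mean()              # (1/n) * sum_i [f(x_i, 1) - f(x_i, 0)]
naive = y[t == 1].mean() - y[t == 0].mean()

print(f"naive difference: {naive:.2f}, covariate-adjusted ATE: {ate_hat:.2f}")
```

In this simulation the adjusted estimate should land close to −5, while the naive difference is pulled upward because the treated group is older. That only works because the model family contains the true response surface and both groups cover the same age range.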
Covariate adjustment
[Figure, in builds: blood pressure ($y$) against age ($x$) with curves $Y_1(x)$ and $Y_0(x)$, the observed treated and control units, and the counterfactual treated and counterfactual control points given by the fitted model $f$.]
Example of how covariate adjustment fails when there is no overlap
[Figure: blood pressure ($y$) against age ($x$) with curves $Y_1(x)$ and $Y_0(x)$; the treated and control units do not overlap in age.]
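A small simulation shows how the estimate then depends on extrapolation. This is a sketch under invented assumptions: noise-free data, controls all younger than 60, treated all 60 or older, a quadratic true response, and a true effect of −5 at every age. Fitting one polynomial per treatment group is used here as one simple form of $f(x, t)$.

```python
import numpy as np

# Simulated, noise-free data with no overlap in age between the groups.
age_control = np.arange(30, 60)   # only younger patients are untreated
age_treated = np.arange(60, 81)   # only older patients are treated

def y0_true(age):
    return 100 + 0.02 * (age - 30) ** 2

def y1_true(age):
    return y0_true(age) - 5.0     # true effect is -5 at every age

age_all = np.concatenate([age_control, age_treated])

def ate_hat(degree):
    # Fit f(x, 0) and f(x, 1) as separate polynomials of the given degree,
    # then average f(x_i, 1) - f(x_i, 0) over all units.
    f0 = np.polyfit(age_control, y0_true(age_control), degree)
    f1 = np.polyfit(age_treated, y1_true(age_treated), degree)
    return np.mean(np.polyval(f1, age_all) - np.polyval(f0, age_all))

ate_linear = ate_hat(1)      # straight lines extrapolated outside each group's age range
ate_quadratic = ate_hat(2)   # recovers -5 only because the simulated truth is quadratic

print(f"linear fit: {ate_linear:.2f}, quadratic fit: {ate_quadratic:.2f}, truth: -5.00")
```

The two model classes disagree because each must predict outcomes at ages where its group was never observed, and nothing in the data can tell us which extrapolation is right.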