Automatic learning of adaptive treatment strategies
using kernel-based reinforcement learning
Presented by: Joelle Pineau, School of Computer Science, McGill University
Joint work with: Marc G. Bellemare, Adrian Ghizaru, Zaid Zawaideh, Susan Murphy (Univ. Michigan), John Rush (UT Southwestern Medical Center)
Biostatistics Seminar, McGill University, March 7, 2007
Automatic learning of adaptive treatment strategies Joelle Pineau
Outline
• Background on adaptive treatment design
• Reinforcement learning primer
• Automatic construction of treatment strategies
• Preliminary results from the STAR*D trial
• Discussion
Adaptive treatment strategies
• Individually tailored treatments, where treatment type, dosage, and duration vary according to patient outcomes.
• Goal is to improve longer-term outcomes, as opposed to focusing on only short-term benefits, for patients with chronic disorders.
• Recently developed for treatment of chronic illnesses, e.g.
» Brooner et al. (2002) Treatment of Cocaine Addiction
» Breslin et al. (1999) Treatment of Alcohol Addiction
» Prochaska et al. (2001) Treatment of Tobacco Addiction
» Rush et al. (2003) Treatment of Depression
Example of an adaptive treatment strategy
Following graduation from an intensive outpatient program, alcohol-dependent patients are provided naltrexone (Treatment 1). In the ensuing two months, patients are monitored.
If during that time the patient has 5 or more heavy drinking days (Nonresponder), then the medication is switched to acamprosate (Treatment 2a).
If the patient reports no more than 4 heavy drinking days (Responder), then the patient is continued on naltrexone and monitored monthly for signs of relapse (Treatment 2b).
Why use adaptive treatment strategies?
• In Prevention
– Individuals at risk of problem behaviors for different reasons (multiple causes).
– Periods of high and low risk.
• In Treatment
– High heterogeneity in response to any one treatment.
» What works for one person may not work for another.
– Improvement often marred by relapse.
» What works now for a person may not work later.
– Co-occurring disorders may be common.
The big questions
• What is the best sequencing of treatments?
• What information do we use to make these decisions?
SMART Trials
• SMART = Sequential Multiple Assignment Randomized Trial [Murphy, 2005]
• These are multi-stage trials; conceptually, a randomization takes place at each stage.
• Goal is to inform the construction of an adaptive treatment strategy.
Example of a SMART study
Legend:
MedA = Naltrexone
MedB = Acamprosate
CBT = Cognitive Behavioral Therapy
TDM = Telephone Disease Management
EM = Motivation therapy for adherence
Question
• Why not use data from multiple trials to construct the dynamic treatment regime?
Choose the best initial treatment on the basis of a randomized trial of initial treatments.
Choose the best secondary treatment on the basis of a randomized trial of secondary treatments.
Etc.
Delayed effects
Negative synergies:
Initial treatment may produce a higher proportion of responders, but also result in side effects severe enough that nonresponders are less likely to adhere to subsequent treatments.
Positive synergies:
A treatment may not be best initially, but may have enhanced long-term effectiveness when followed by a particular maintenance treatment.
Diagnostic effects:
Initial treatment may elicit an informative patient response that permits a better choice of subsequent treatment.
Examples of SMART designs:
• Thall et al. (2000) Treatment of Prostate Cancer
• CATIE (2001) Treatment of Psychosis in Schizophrenia
• STAR*D (2003) Treatment of Depression
• Oslin (on-going) Treatment of Alcohol Dependence
STAR*D
• STAR*D = Sequenced Treatment Alternatives to Relieve Depression
– Largest study of depression (14 regional centers, 4041 patients).
– Longitudinal (5 years), NIMH-sponsored.
– Multiple patient randomizations (up to 4 treatments: medication or psychotherapy).
[Figure: Entrance evaluation → First treatment → First evaluation → … → Fourth treatment → Fourth evaluation → …]
STAR*D Randomizations (slightly simplified!)
Level 1: Treatment: CIT
Level 2: Treatment options: SER, BUP, VEN, CT, CIT+BUP, CIT+BUS, CIT+CT; Treatment strategy: Switch or Augment
Level 3: Treatment options: MRT, NTP, SER+Li, BUP+Li, VEN+Li, CIT+Li; Treatment strategy: Switch or Augment
Level 4: Treatment options: TCP, VEN+MRT; Treatment strategy: Switch
Clinical data
• Each evaluation measures 20+ variables, e.g.
– Quick Inventory of Depressive Symptoms (QIDS)
– Frequency and intensity of side-effects
– Work and productive activity impairment
– Quality of life enjoyment and satisfaction
– Medication compliance
• Research outcomes measured (during clinic visit or telephone interview) at:
– beginning of treatment level,
– every ~6 weeks during treatment,
– exit from treatment level,
– every month during follow-up.
• Remission = QIDS at end of treatment level falls below threshold (QIDS ≤ 5).
Research objectives for computer scientists
• Long-term:
– Develop new methodologies for automatic learning of adaptive treatment strategies.
• Current project:
– Automatically construct decision rules from the STAR*D trial dataset using reinforcement learning techniques.
Decision rules
• Adaptive treatment strategies are composed of a series of decision rules, one per treatment step.
• Decision rule:
IF baseline assessments = Z0
AND prior treatments at Steps 1 through t = {A1, …, At}
AND clinical assessments at Steps 1 through t = {Z1, …, Zt}
THEN at Step t+1 apply treatment At+1
Goal is to learn such rules directly from data.
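Such a rule can be sketched as a plain function over the patient history. The sketch below is illustrative only: the function name, the `qids` field, and the toy logic are hypothetical, loosely echoing the STAR*D-style example later in the talk, not actual STAR*D decision rules.

```python
# Hypothetical sketch of a single decision rule; names and logic are
# illustrative, not taken from the STAR*D trial.

def decision_rule(baseline, treatments, assessments):
    """Map (Z0, {A1..At}, {Z1..Zt}) to the treatment for Step t+1.

    baseline    -- Z0, baseline assessments (a dict here)
    treatments  -- [A1, ..., At], treatments applied so far
    assessments -- [Z1, ..., Zt], clinical assessments so far
    """
    # Toy logic: if the most recent depression score indicates
    # remission (QIDS <= 5), retain the current treatment;
    # otherwise signal a switch.
    if assessments and assessments[-1]["qids"] <= 5:
        return treatments[-1]
    return "switch"

print(decision_rule({"qids": 20}, ["CIT"], [{"qids": 4}]))   # -> CIT
print(decision_rule({"qids": 20}, ["CIT"], [{"qids": 12}]))  # -> switch
```

Learning replaces the hand-written condition with one estimated from trial data, but the input/output shape of a rule stays exactly this.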
An adaptive treatment strategy in STAR*D
Patient is diagnosed with depression. Treat with citalopram.
If the patient's depression remits and medication is tolerated, then retain the patient on citalopram.
If the patient does not remit, or cannot tolerate citalopram, then ascertain the patient's preference for a medication switch or a treatment augmentation.
If the patient prefers a switch in treatment, then provide sertraline.
If the patient prefers an augmentation and the patient was poorly adherent to citalopram, then provide cognitive therapy in addition to citalopram.
If the patient prefers an augmentation and the patient was adherent to citalopram, then provide bupropion in addition to citalopram.
Learning decision rules
• Specify set of candidate decision rules.
• For each decision rule, evaluate:
– its expected reward (or cost),
– its expected effect.
• Select the sequence of decision rules with the highest expected cumulative reward (or lowest cost).
• Formal framework for such problems: reinforcement learning.
What is reinforcement learning (RL)?
• Inspired by trial-and-error learning studied in the psychology of animal learning:
– good actions are positively reinforced,
– poor actions are negatively reinforced.
• Formalized in computer science and operations research [Bellman, 1957; Sutton & Barto, 1998].
What is reinforcement learning (RL)? (cont’d)
• Mathematical framework to estimate the usefulness of taking sequences of actions in an evolving, time-varying system.
Intuition: A decision-maker tries (random) actions and observes their
effect until it gradually learns when to apply which action.
• Used to evaluate treatment based on immediate and long-term effects.
• Can evaluate both therapeutic and diagnostic effects.
A reinforcement learning problem is described by:
States: What we know
– History of measurements and medications
Actions: What we can do
– Treatment choice: CIT, CIT+CT, BUP, psychotherapy, …
Effects: How the action affects the state
– E.g. QIDS(t=1) = 20 + A = CIT → QIDS(t=2) = 15
Rewards: Time-dependent outcomes
– Duration of treatment, final outcome (remission, no remission, left trial), side-effect burden, …
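These four ingredients can be captured in a small data structure. A minimal sketch, assuming a toy encoding (the field names and the sample trajectory are invented for illustration, not STAR*D data):

```python
# Hypothetical encoding of one patient trajectory as (state, action,
# reward) steps; the numbers below are invented, not STAR*D data.
from dataclasses import dataclass

@dataclass
class Step:
    state: dict      # what we know: measurements and medication history
    action: str      # treatment choice, e.g. "CIT", "CIT+CT", "BUP"
    reward: float    # time-dependent outcome signal

trajectory = [
    Step(state={"qids": 20}, action="CIT", reward=0.0),
    Step(state={"qids": 15}, action="BUP", reward=0.0),
    Step(state={"qids": 4}, action="none", reward=1.0),  # remission reached
]

total_reward = sum(step.reward for step in trajectory)
print(total_reward)  # -> 1.0
```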
A graphical model of the STAR*D trial
where s = state, a = action, e = effect, r = reward.
[Figure: s0 (pretreatment data: demographics, screening assessments, baseline assessments, baseline IVRs) → a1 = CIT → s1 (Level 1 data: treatment preference, assigned treatment, clinical assessments, IVR assessments, duration) → a2 = BUP → s2 (Level 2 data: same categories) → …, with an effect e and rewards r0, r1, r2 attached at each step.]
Reinforcement learning objective
• To learn the best treatment strategy for each state.
“best” = the one that maximizes a given reward function.
• The reward function r(s) is a numerical indicator of whether a state is good or bad, capturing time-varying outcomes, e.g.:
– Depression score (e.g. QIDS) [Lower score = higher reward]
– Severity of side-effects [Lower score = higher reward]
– Length of treatment [Shorter length = higher reward]
Reinforcement learning in STAR*D
Q: Which treatment a2 has the highest total (immediate + future) reward?
[Figure: the same chain as in the previous slide, with the Level 2 action left open: s0 (pretreatment data) → a1 = CIT → s1 (Level 1 data) → a2 = ?? → s2 (Level 2 data) → …, with rewards r0, r1, r2.]
Maximizing the reward function
• To maximize the expected total reward: E[ r(s0) + r(s1) + r(s2) + r(s3) + r(s4) ]
• We define the value function: Vt(s) = maxa ∑s' Pa_ss' [ r(s,a,s') + Vt+1(s') ],
where Pa_ss' captures the effect of treatment on state.
• If we know Pa_ss' and r(s,a,s'), then we can evaluate by dynamic programming.
• Problem: we lack a model to estimate Pa_ss'!
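When the model Pa_ss' and reward r(s,a,s') are known, the value function is computed by backwards induction. A minimal sketch on a toy two-state model (the states, actions, probabilities, and rewards below are invented for illustration, not a real treatment model):

```python
# Finite-horizon dynamic programming for
# V_t(s) = max_a sum_{s'} P[a][s][s'] * ( r(s, a, s') + V_{t+1}(s') )
# on a toy 2-state, 2-action model.

def value_iteration(P, r, horizon):
    """P[a][s][s'] is a transition probability; r(s, a, s') a reward."""
    states = list(next(iter(P.values())).keys())
    V = {s: 0.0 for s in states}            # terminal values: V_horizon = 0
    for _ in range(horizon):                # backwards induction
        V = {s: max(sum(P[a][s][sp] * (r(s, a, sp) + V[sp])
                        for sp in states)
                    for a in P)
             for s in states}
    return V

# Toy model: "treat" moves "sick" -> "well" with probability 0.6;
# each step spent in "well" yields reward 1.
P = {"treat": {"sick": {"sick": 0.4, "well": 0.6},
               "well": {"sick": 0.1, "well": 0.9}},
     "wait":  {"sick": {"sick": 0.9, "well": 0.1},
               "well": {"sick": 0.2, "well": 0.8}}}
r = lambda s, a, sp: 1.0 if sp == "well" else 0.0

V = value_iteration(P, r, horizon=2)
print(round(V["sick"], 2), round(V["well"], 2))  # -> 1.38 1.77
```

The slide's problem is exactly that P is unknown for STAR*D patients, which motivates the model-free kernel approach next.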
Kernel-based Reinforcement Learning
Consider the value function: Vt(s) = maxa ∑s' Pa_ss' [ r(s,a,s') + Vt+1(s') ]
Approximate using kernel-based averaging:
Vt(si) = maxa ∑<s,a,s'> wa(si, s) [ r(s,a,s') + Vt+1(s') ]
where <s, a, s'> is a given data instance and
wa(si, s) = φ(||si − s||/b) / ∑<s,a,s'> φ(||si − s||/b)
is a normalized weight kernel.
Provides a natural way of incorporating measures of patient similarity.
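A sketch of one such kernel-based backup over observed <s, a, s'> transitions, with scalar states and a Gaussian φ standing in for the similarity measure (the data and reward below are invented for illustration):

```python
# One kernel-based value backup: for each action, average the backed-up
# values of observed transitions, weighted by similarity to the query
# state si; then take the max over actions.
import math

def phi(u):
    """Gaussian kernel profile."""
    return math.exp(-u * u / 2.0)

def kernel_backup(si, data, V_next, r, b=1.0):
    """V_t(si) = max_a sum_<s,a,s'> w_a(si, s) * (r(s,a,s') + V_{t+1}(s'))."""
    best = -float("inf")
    for a in {a for (_, a, _) in data}:
        trans = [(s, sp) for (s, aa, sp) in data if aa == a]
        Z = sum(phi(abs(si - s) / b) for (s, _) in trans)  # normalizer
        v = sum(phi(abs(si - s) / b) / Z * (r(s, a, sp) + V_next(sp))
                for (s, sp) in trans)
        best = max(best, v)
    return best

# Toy transitions (state, action, next_state); reward = drop in score.
data = [(20, "BUP", 12), (18, "BUP", 15), (19, "SER", 10)]
r = lambda s, a, sp: s - sp

V = kernel_backup(19.0, data, V_next=lambda sp: 0.0, r=r)
print(V)  # -> 9.0 (the single SER transition matches the query exactly)
```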
Specifics of kernel regression
We assume a Gaussian kernel: φ(||si − sj||/b) = exp(−d²(si, sj)/2σ²).
We allocate one kernel function per training example <s, a, s'>.
The width of the kernel, σ², controls the neighbourhood of influence of each training sample:
– σ² = 0 means no generalization to unseen states.
– σ² = ∞ means every state has the (identical) mean value.
The distance function d²(si, sj) measures similarity between patients via their values on the independent variables.
Bias-variance trade-off
• The width of the kernel, σ², controls the neighbourhood of influence of each training sample:
– Small bandwidth = more variance.
– Large bandwidth = more bias.
• Bandwidth should be chosen to balance the bias-variance trade-off:
– Need to shrink the bandwidth as the sample size m increases.
– With the right shrinking rate, we can guarantee ||Vt − Vt*|| → 0 in probability as m → ∞.
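The bandwidth effect can be seen directly in the normalized weights: a narrow kernel puts nearly all weight on the closest training sample (low bias, high variance), while a wide kernel spreads weight almost uniformly (high bias, low variance). A small sketch with invented sample values:

```python
# Normalized Gaussian weights of one query state against three training
# states, for a narrow and a wide kernel width sigma^2.
import math

def weights(si, samples, sigma2):
    k = [math.exp(-((si - s) ** 2) / (2.0 * sigma2)) for s in samples]
    Z = sum(k)
    return [x / Z for x in k]

samples = [10.0, 15.0, 25.0]
w_small = weights(11.0, samples, sigma2=0.5)   # narrow: nearest dominates
w_large = weights(11.0, samples, sigma2=500.0) # wide: nearly uniform

print([round(w, 3) for w in w_small])
print([round(w, 3) for w in w_large])
```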
Instance-based Reinforcement Learning
Estimate a non-parametric value function using kernel regression:
Vt(s) = maxa ∑s' wa_ss' Vt+1(s')
[Figure: the backup sweeps over the patient database (Patient 1 … Patient N). Each record contains pretreatment data, CIT at Level 1, and subsequent treatment data (e.g. BUP, VEN, SER, or CIT+BUP at Level 2; BUP+Li or NTP at Level 3). For each state, all acceptable treatments are considered; s' sweeps over the patient database, with weight wa_ss' proportional to the similarity of measurements, and Vt+1 evaluated at each matched record.]
Caveat #1: Feature selection
• State representation at level i:
si = pretreatment data + treatment 1 + measurements at level 1 + … + treatment i + measurements at level i
• Total state representation:
– dozens of variables, multi-valued (up to 60+ values in some cases).
• Urgent need for feature selection! But…
– Data is extremely sparse.
– Feature selection in decision-making is an open research question.
• Currently, select features by hand based on discussion with clinicians.
Caveat #2: Reward elicitation
• Reward function should be provided by clinicians.
• How do we elicit this information from them?
– Reward needs to have sufficient discriminative information.
– Reward is an abstract concept for a clinician.
– Clinicians may not be consistent in reward assessment.
• Results presented today assume hand-crafted reward function.
• Ongoing work on a decision-theoretic method for eliciting preferences [Zawaideh & Pineau, submitted].
Current implementation
State variables:
– Pre-treatment depression score QIDS-C0 = {1, …, 25}
– Switch/Augment preference during levels 1…t
– Gradient over depression scores during levels 1…t
– Treatments at levels 1…t
Decisions: Level 2 and 3 treatments
Reward function: r = 1 if QIDS-C ≤ 5; r = 0 otherwise
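The hand-crafted reward above is a simple threshold on the exit depression score; a direct transcription:

```python
# The slide's hand-crafted reward: r = 1 on remission (QIDS-C <= 5),
# r = 0 otherwise.

def reward(qids_c):
    return 1.0 if qids_c <= 5 else 0.0

print(reward(4))   # -> 1.0 (remission)
print(reward(12))  # -> 0.0
```

A binary remission indicator is the bluntest choice; the reward-elicitation caveat above is about replacing it with clinician-derived preferences.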
Level 2 treatment - Switch group
[Figure: learned Level 2 treatment for the Switch group, plotted against Entrance QIDS-C (x-axis) and Level 1 QIDS-C (y-axis).]
Level 2 myopic treatment - Augment group
Value function at level 2
• Recall the value function: Vt(s) = maxa ∑s' wa_ss' Vt+1(s')
[Figure: estimated value function at Level 2, plotted against QIDS at exit from level 0 (x-axis) and QIDS at exit from level 1 (y-axis).]
Estimated Level 3 value for patients with pretreatment QIDS-C = 15 and BUP as Level 2 treatment
[Figure: estimated Level 3 value, plotted against QIDS at exit from level 1 (Level 1 QIDS-C, x-axis) and QIDS at exit from level 2 (Level 2 QIDS-C, y-axis).]
Estimated Level 3 value for patients with pretreatment QIDS-C = 15 and SER as Level 2 treatment
[Figure: estimated Level 3 value, plotted against QIDS at exit from level 1 (Level 1 QIDS-C, x-axis) and QIDS at exit from level 2 (Level 2 QIDS-C, y-axis).]
Summary
• Adaptive treatment strategies can be constructed using:
– data from SMART studies
– reinforcement learning analysis.
• The analysis highlights the role of treatment as a diagnostic tool.
• The analysis presented is generating new hypotheses regarding the treatment of depression.
Clinical challenges
• Establishing a close level of collaboration between clinical researchers and methodologists.
• Designing confirmatory trial for learned adaptive strategies.
• Performing preference (reward) elicitation studies.
Methodological challenges
• Extending RL methods to provide confidence measures.
• Using RL with data from observational studies.
• Improving problem design:
» variables that are predictive of costs/rewards,
» variables that interact with treatment.
Related ongoing project
Learning adaptive treatment strategies for epilepsy (joint work with Massimo Avoli, MNI)
– Similar RL framework, different function approximation (not kernel RL).
– Gather data from generative model of epilepsy (rather than SMART trial).
– Faster treatment cycle, strategy can adapt to patient over time.
Follow-up
• Recent paper (proofs available by email: jpineau@cs.mcgill.ca):
Pineau, Ghizaru, Bellemare, Rush & Murphy. "Improving the management of chronic disorders by learning adaptive treatment strategies". Drug and Alcohol Dependence, Supplemental Issue on Adaptive Treatment Strategies. Elsevier. In press.
• Grad student position available for this project.
• Acknowledgements:
– Data generously provided by the STAR*D team.
– Partial funding provided by NIH (grant R21 DA019800).
– Some slide material borrowed from: http://www.stat.lsa.umich.edu/~samurphy/nida/seminars.html