
Automatic learning of adaptive treatment strategies

using kernel-based reinforcement learning

Presented by: Joelle Pineau, School of Computer Science, McGill University

Joint work with: Marc G. Bellemare, Adrian Ghizaru, Zaid Zawaideh, Susan Murphy (Univ. Michigan), John Rush (UT Southwestern Medical Center)

Biostatistics Seminar, McGill University, March 7, 2007


Outline

• Background on adaptive treatment design

• Reinforcement learning primer

• Automatic construction of treatment strategies

• Preliminary results from the STAR*D trial

• Discussion


Adaptive treatment strategies

• Individually tailored treatments, where treatment type, dosage, and duration vary according to patient outcomes.

• Goal is to improve longer-term outcomes for patients with chronic disorders, as opposed to focusing only on short-term benefits.

• Recently developed for treatment of chronic illnesses, e.g.
  » Brooner et al. (2002) Treatment of Cocaine Addiction
  » Breslin et al. (1999) Treatment of Alcohol Addiction
  » Prochaska et al. (2001) Treatment of Tobacco Addiction
  » Rush et al. (2003) Treatment of Depression


Example of an adaptive treatment strategy

Following graduation from an intensive outpatient program, alcohol-dependent patients are provided naltrexone (Treatment 1). In the ensuing two months, patients are monitored. If during that time the patient has 5 or more heavy drinking days (Nonresponder), then the medication is switched to acamprosate (Treatment 2a). If the patient reports no more than 4 heavy drinking days (Responder), then the patient is continued on naltrexone and monitored monthly for signs of relapse (Treatment 2b).


Why use adaptive treatment strategies?

• In Prevention

– Individuals at risk of problem behaviors for different reasons (multiple causes).

– Periods of high and low risk.

• In Treatment
  – High heterogeneity in response to any one treatment.

» What works for one person may not work for another.

– Improvement often marred by relapse.
  » What works now for a person may not work later.

– Co-occurring disorders may be common.


The big questions

• What is the best sequencing of treatments?

• What information do we use to make these decisions?


SMART Trials

• SMART = Sequential Multiple Assignment Randomized Trial [Murphy, 2005]

• These are multi-stage trials; conceptually a randomization takes place at each stage.

• Goal is to inform the construction of an adaptive treatment strategy.


Example of a SMART study

Legend:

MedA = Naltrexone

MedB = Acamprosate

CBT = Cognitive Behavioral Therapy

TDM = Telephone Disease Management

EM = Motivation therapy for adherence


Question

• Why not use data from multiple trials to construct the dynamic treatment regime?

Choose the best initial treatment on the basis of a randomized trial of initial treatments.

Choose the best secondary treatment on the basis of a randomized trial of secondary treatments.

Etc.


Delayed effects

Negative synergies:

Initial treatment may produce a higher proportion of responders, but also result in side effects severe enough that nonresponders are less likely to adhere to subsequent treatments.

Positive synergies:

A treatment may not be best initially but may have enhanced long-term effectiveness when followed by a particular maintenance treatment.

Diagnostic effects:

Initial treatment may elicit an informative patient response that permits a better choice of subsequent treatment.


Examples of SMART designs:

• Thall et al. (2000) Treatment of Prostate Cancer

• CATIE (2001) Treatment of Psychosis in Schizophrenia

• STAR*D (2003) Treatment of Depression

• Oslin (on-going) Treatment of Alcohol Dependence


STAR*D

• STAR*D = Sequenced Treatment Alternatives to Relieve Depression
  – Largest study of depression (14 regional centers, 4041 patients).
  – Longitudinal (5 years), NIMH-sponsored.
  – Multiple patient randomizations (up to 4 treatments: medication or psychotherapy).

[Timeline: Entrance evaluation → First treatment → First evaluation → … → Fourth treatment → Fourth evaluation]


STAR*D Randomizations (slightly simplified!)

Level 1: Treatment: CIT

Level 2: Treatment options (strategy: Switch or Augment):
  Switch: SER, BUP, VEN, CT
  Augment: CIT+BUP, CIT+BUS, CIT+CT

Level 3: Treatment options (strategy: Switch or Augment):
  Switch: MRT, NTP
  Augment: SER+Li, BUP+Li, VEN+Li, CIT+Li

Level 4: Treatment options (strategy: Switch):
  TCP, VEN+MRT


Clinical data

• Each evaluation measures 20+ variables, e.g.
  – Quick Inventory of Depressive Symptoms (QIDS)
  – Frequency and intensity of side-effects
  – Work and productive activity impairment
  – Quality of life enjoyment and satisfaction
  – Medication compliance

• Research outcomes measured (during clinic visit or telephone interview) at:
  – beginning of treatment level,
  – every ~6 weeks during treatment,
  – exit from treatment level,
  – every month during follow-up.

• Remission = QIDS at end of treatment level falls at or below threshold (QIDS ≤ 5).


Research objectives for computer scientists

• Long-term:

– Develop new methodologies for automatic learning of adaptive treatment strategies.

• Current project:

– Automatically construct decision rules from the STAR*D trial dataset using reinforcement learning techniques.


Decision rules

• Adaptive treatment strategies are composed of a series of decision rules, one per treatment step.

• Decision rule:
  IF baseline assessments = Z0
  AND prior treatments at Steps 1 through t = {A1, …, At}
  AND clinical assessments at Steps 1 through t = {Z1, …, Zt}
  THEN at Step t+1 apply treatment At+1

Goal is to learn such rules directly from data.
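
To make this concrete, such a rule can be written as a small function of the observed history. The sketch below is illustrative only: the treatment names come from the slides, but the threshold and branching logic are hypothetical, not from any trial protocol.

```python
# A toy decision rule in the IF/THEN form above.

def decision_rule(z0, prior_treatments, prior_assessments):
    """Return the treatment to apply at Step t+1 given the history.
    z0 (baseline assessment) is available to the rule but unused
    in this toy example."""
    latest_qids = prior_assessments[-1]
    if prior_treatments[-1] == "CIT" and latest_qids > 5:
        return "SER"                 # no remission under CIT: switch
    return prior_treatments[-1]      # doing well: stay the course

# Usage: baseline QIDS z0 = 20, one prior step of CIT with exit QIDS 14.
print(decision_rule(z0=20, prior_treatments=["CIT"], prior_assessments=[14]))
```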


An adaptive treatment strategy in STAR*D

Patient is diagnosed with depression. Treat with citalopram.

If the patient’s depression remits and medication is tolerated, then retain the patient on citalopram.

If the patient does not remit, or cannot tolerate citalopram, then ascertain the patient’s preference for a medication switch or treatment augmentation.

If the patient prefers a switch in treatment then provide sertraline.

If the patient prefers an augmentation and the patient was poorly adherent to citalopram, then provide cognitive therapy in addition to citalopram.

If the patient prefers an augmentation and the patient was adherent to citalopram, then provide bupropion in addition to citalopram.


Learning decision rules

• Specify set of candidate decision rules.

• For each decision rule, evaluate:

– its expected reward (or cost),

– its expected effect.

• Select the sequence of decision rules with the highest expected cumulative reward (or lowest cost).

• Formal framework for such problems: reinforcement learning.


What is reinforcement learning (RL)?

• Inspired by trial-and-error learning studied in the psychology of animal learning:

– good actions are positively reinforced,

– poor actions are negatively reinforced.

• Formalized in computer science and operations research [Bellman, 1957; Sutton & Barto, 1998].


What is reinforcement learning (RL)? (cont’d)

• Mathematical framework to estimate the usefulness of taking sequences of actions in an evolving, time-varying system.

Intuition: A decision-maker tries (random) actions and observes their effects until it gradually learns when to apply which action.

• Used to evaluate treatment based on immediate and long-term effects.

• Can evaluate both therapeutic and diagnostic effects.


A reinforcement learning problem is described by:

States: What we know
  – History of measurements and medications

Actions: What we can do
  – Treatment choice: CIT, CIT+CT, BUP, psychotherapy, …

Effects: How the action affects the state
  – E.g. QIDS(t=1) = 20 + A = CIT → QIDS(t=2) = 15

Rewards: Time-dependent outcomes
  – Duration of treatment, final outcome (remission, no remission, left trial), side-effect burden, …
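
One plausible way to organize trial data for such an analysis (an assumption of this sketch, not a STAR*D specification) is as per-step transition records <s, a, s', r>:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Transition:
    """One treatment step from a patient trajectory: <s, a, s'> plus reward."""
    state: List[float]       # measurements and medication history so far
    action: str              # treatment chosen at this step, e.g. "CIT"
    next_state: List[float]  # measurements after this treatment level
    reward: float            # time-dependent outcome, e.g. remission indicator

# Toy example: QIDS drops from 20 to 15 under CIT; no remission yet, so reward 0.
step1 = Transition(state=[20.0], action="CIT", next_state=[15.0], reward=0.0)
```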


A graphical model of the STAR*D trial

[Figure: graphical model of the trial, where s = state, a = action, e = effect, r = reward.
s0 = pretreatment data (demographics, screening assessments, baseline assessments, baseline IVRs);
action a1 = CIT leads to s1 = Level 1 data (treatment preference, assigned treatment, clinical assessments, IVR assessments, duration);
action a2 = BUP leads to s2 = Level 2 data (same variables); and so on, with effects e and rewards r0, r1, r2 attached to each step.]


Reinforcement learning objective

• To learn the best treatment strategy for each state.

“best” = the one that maximizes a given reward function.

• The reward function r(s) is a numerical indicator of whether a state is good or bad, capturing time-varying outcomes, e.g.:

– Depression score (e.g. QIDS) [Lower score = higher reward]

– Severity of side-effects [Lower score = higher reward]

– Length of treatment [Shorter length = higher reward]


Reinforcement learning in STAR*D

Q: Which treatment a2 has the highest total (immediate + future) reward?

[Figure: the same graphical model as before, with a1 = CIT fixed and the Level 2 action a2 = ?? to be chosen.]


Maximizing the reward function

• To maximize the expected total reward: E[ r(s0) + r(s1) + r(s2) + r(s3) + r(s4) ]

• We define the value function:
  Vt(s) = max_a ∑_s' P^a_ss' [ r(s,a,s') + Vt+1(s') ]
  where P^a_ss' captures the effect of treatment on state.

• If we know P^a_ss' and r(s,a,s'), we can compute the value function by dynamic programming.

• Problem: we lack a model to estimate P^a_ss'!
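
For intuition, here is a toy finite-horizon dynamic-programming sketch in which P^a_ss' and the rewards are known and invented purely for illustration; as the last bullet notes, trial data gives us no such model, which is what motivates the kernel-based approach on the next slide.

```python
import numpy as np

# Toy MDP: 3 states, 2 actions, horizon 4. The transition model P[a][s][s']
# and reward R[s'] are made up for illustration.
n_states, n_actions, horizon = 3, 2, 4
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = np.array([0.0, 0.5, 1.0])    # e.g. state 2 = remission

V = np.zeros(n_states)           # boundary condition: V_horizon(s) = 0
for t in reversed(range(horizon)):
    # Bellman backup: V_t(s) = max_a sum_s' P^a_ss' [ R(s') + V_{t+1}(s') ]
    Q = np.einsum("asn,n->sa", P, R + V)
    V = Q.max(axis=1)

print("V_0:", V)  # expected total reward of acting optimally from each state
```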


Kernel-based Reinforcement Learning

Consider the value function:
  Vt(s) = max_a ∑_s' P^a_ss' [ r(s,a,s') + Vt+1(s') ]

Approximate it using kernel-based averaging:
  Vt(si) = max_a ∑_<s,a,s'> w_a(si, s) [ r(s,a,s') + Vt+1(s') ]

where <s, a, s'> is a given data instance and

  w_a(si, sj) = φ(||si − sj|| / b) / ∑_<s,a,s'> φ(||si − s|| / b)

is the normalized weight kernel.

Provides a natural way of incorporating measures of patient similarity.
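
A minimal sketch of this kernel-averaged backup, assuming a dataset of <s, a, s', r> instances and a generic kernel profile φ (the Gaussian choice is given on the next slide); the function and variable names are mine, not the authors':

```python
import numpy as np

def kernel_backup(query_s, dataset, v_next, phi, bandwidth):
    """Kernel-averaged Bellman backup at state query_s.

    dataset: iterable of (s, a, s_next, r) instances with vector states.
    v_next:  estimate of V_{t+1}, applied to each s_next.
    phi:     kernel profile, e.g. a Gaussian bump.
    Returns the max over actions of the normalized weighted average of
    r + V_{t+1}(s_next) over the instances that took that action.
    """
    totals = {}  # action -> (weighted sum, weight sum)
    for s, a, s_next, r in dataset:
        w = phi(np.linalg.norm(np.asarray(query_s) - np.asarray(s)) / bandwidth)
        num, den = totals.get(a, (0.0, 0.0))
        totals[a] = (num + w * (r + v_next(s_next)), den + w)
    return max(num / den for num, den in totals.values() if den > 0)

# Toy usage: two Level-2 instances; QIDS is the only state feature here.
data = [([20.0], "BUP", [12.0], 0.0), ([21.0], "SER", [4.0], 1.0)]
v = kernel_backup([19.0], data, v_next=lambda s: 0.0,
                  phi=lambda u: np.exp(-u**2 / 2), bandwidth=5.0)
```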


Specifics of kernel regression

We assume a Gaussian kernel: φ(||si − sj||/b) = exp( −d²(si, sj) / 2σ² ).

We allocate one kernel function per training example <s, a, s’>.

The width of the kernel, σ², controls the neighbourhood of influence of each training sample.

  – σ² = 0 means no generalization to unseen states.

  – σ² = ∞ means every state has the (identical) mean value.

The distance function d²(si, sj) measures similarity between patients via their values on the independent variables.
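
Concretely, the Gaussian profile and a simple weighted Euclidean distance might look as follows; the per-feature weights are a hypothetical stand-in for a clinically informed similarity measure, not something specified in the talk.

```python
import numpy as np

def gaussian_phi(d2, sigma2):
    """Gaussian profile: exp(-d^2 / (2 sigma^2))."""
    return np.exp(-d2 / (2.0 * sigma2))

def patient_distance2(si, sj, feature_weights):
    """Squared distance d^2(si, sj) over the independent variables,
    with illustrative per-feature weights."""
    diff = np.asarray(si, dtype=float) - np.asarray(sj, dtype=float)
    return float(np.sum(np.asarray(feature_weights) * diff ** 2))

# Usage: patients described by (entrance QIDS-C, Level 1 QIDS-C).
w = gaussian_phi(patient_distance2([20, 14], [18, 15], [1.0, 1.0]), sigma2=4.0)
```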


Bias-variance trade-off

• Width of the kernel, σ², controls the neighbourhood of influence of each training sample.

– Small bandwidth = more variance.

– Large bandwidth = more bias.

• Bandwidth should be chosen to balance the bias-variance trade-off.

– Need to shrink bandwidth as sample size increases.

– With the right shrinking rate, we can guarantee:

  || Vt − Vt* || → 0 in probability as m → ∞
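
As a sketch of such a shrinking schedule: the rate below, σ ∝ m^(−1/5), is the classic univariate nonparametric-regression choice, assumed here since the talk does not state the actual rate used.

```python
def bandwidth_schedule(sigma0, m, rate=0.2):
    """Shrink kernel width as sample size m grows: sigma = sigma0 * m**(-rate).
    rate = 1/5 is a standard nonparametric-regression choice (an assumption)."""
    return sigma0 * m ** (-rate)

print([round(bandwidth_schedule(2.0, m), 3) for m in (10, 100, 1000)])
```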


Instance-based Reinforcement Learning

Estimate a non-parametric value function using kernel regression:
  Vt(s) = max_a ∑_s' w^a_ss' Vt+1(s')

[Figure: the patient database. Each patient contributes one trajectory, e.g.:
  Patient 1: pretreatment data → CIT, Level 1 data
  Patient 2: pretreatment data → CIT, Level 1 data → BUP, Level 2 data
  Patient 3: pretreatment data → CIT, Level 1 data → BUP, Level 2 data → BUP+Li, Level 3 data
  Patient 4: pretreatment data → CIT, Level 1 data → VEN, Level 2 data
  …
  Patient N: pretreatment data → CIT, Level 1 data → SER, Level 2 data → NTP, Level 3 data
For a new patient n, we consider all acceptable treatments (BUP, VEN, CIT+BUP, …); s' sweeps over the patient database; the weight w^a_ss' is proportional to the similarity of measurements; and Vt+1 is propagated from the matched instances.]


Caveat #1: Feature selection

• State representation at level i:
  si = pretreatment data + treatment 1 + measurements at level 1 + … + treatment i + measurements at level i

• Total state representation:
  – dozens of variables, multi-valued (up to 60+ values in some cases)

• Urgent need for feature selection! But…
  – Data is extremely sparse.
  – Feature selection in decision-making is an open research question.

• Currently, features are selected by hand based on discussions with clinicians.


Caveat #2: Reward elicitation

• Reward function should be provided by clinicians.

• How do we elicit this information from them?

– Reward needs to have sufficient discriminative information.

– Reward is an abstract concept for a clinician.

– Clinicians may not be consistent in reward assessment.

• Results presented today assume hand-crafted reward function.

• Ongoing work on a decision-theoretic method for eliciting preferences [Zawaideh & Pineau, submitted].


Current implementation

State variables:
  – Pre-treatment depression score QIDS-C0 = {1, …, 25}
  – Switch/Augment preference during levels 1 … t
  – Gradient over depression scores during levels 1 … t
  – Treatments at levels 1 … t

Decisions: Level 2 and 3 treatments

Reward function:
  r = 1 if QIDS-C ≤ 5
  r = 0 otherwise
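
In code, this remission-based reward is just an indicator, transcribing the definition above:

```python
def reward(qids_c: float) -> float:
    """r = 1 if the exit QIDS-C score indicates remission (QIDS-C <= 5), else 0."""
    return 1.0 if qids_c <= 5 else 0.0

assert reward(4) == 1.0 and reward(12) == 0.0
```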


Level 2 treatment - Switch group

[Figure: recommended Level 2 treatment for the Switch group, as a function of Entrance QIDS-C (x-axis) and Level 1 QIDS-C (y-axis).]


Level 2 treatment - Augment group


Level 2 myopic treatment - Augment group


Value function at level 2

• Recall the value function: Vt(s) = max_a ∑_s' w^a_ss' Vt+1(s')

[Figure: estimated value at Level 2, as a function of QIDS at exit from level 0 (x-axis) and QIDS at exit from level 1 (y-axis).]


Estimated Level 3 value for patients with pretreatment QIDS-C = 15 and BUP as Level 2 treatment

[Figure: value surface over Level 1 QIDS-C (QIDS at exit from level 1) and Level 2 QIDS-C (QIDS at exit from level 2).]

Estimated Level 3 value for patients with pretreatment QIDS-C = 15 and SER as Level 2 treatment

[Figure: value surface over Level 1 QIDS-C (QIDS at exit from level 1) and Level 2 QIDS-C (QIDS at exit from level 2).]


Summary

• Adaptive treatment strategies can be constructed using:

– data from SMART studies

– reinforcement learning analysis.

• The analysis highlights the role of treatment as a diagnostic tool.

• The analysis presented is generating new hypotheses regarding the treatment of depression.


Clinical challenges

• Establishing a close collaboration between clinical researchers and methodologists.

• Designing a confirmatory trial for learned adaptive strategies.

• Performing preference (reward) elicitation studies.


Methodological challenges

• Extending RL methods to provide confidence measures.

• Using RL with data from observational studies.

• Improving problem design:
  » variables that are predictive of costs/rewards,
  » variables that interact with treatment.


Related ongoing project

Learning adaptive treatment strategies for epilepsy (joint work with Massimo Avoli, MNI)

– Similar RL framework, different function approximation (not kernel RL).

– Gather data from a generative model of epilepsy (rather than a SMART trial).

– Faster treatment cycle; the strategy can adapt to the patient over time.


Follow-up

• Recent paper (proofs available by email: [email protected]):
  Pineau, Ghizaru, Bellemare, Rush & Murphy. “Improving the management of chronic disorders by learning adaptive treatment strategies.” Drug and Alcohol Dependence, Supplemental Issue on Adaptive Treatment Strategies. Elsevier. In press.

• Grad student position available for this project.

• Acknowledgements:
  – Data generously provided by the STAR*D team.
  – Partial funding provided by NIH (grant R21 DA019800).
  – Some slide material borrowed from: http://www.stat.lsa.umich.edu/~samurphy/nida/seminars.html