applied bandits: supporting health-related researchquality score parameter selection online analysis...

67
Applied bandits: Supporting health-related research Audrey Durand July 2, 2019

Upload: others

Post on 23-Jan-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Applied bandits:Supporting health-related research

Audrey Durand

July 2, 2019

Page 2: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

We are going to talk about

Bandits with structure → Neuroscience research

Application to microscopy imaging parameters

Bandits with contexts → Cancer research

Application to adaptive treatment allocation

Page 3: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Stochastic bandits

For each episode 𝑡:• Select a action 𝑘# ∈ 1, 2, … , 𝐾

• Observe outcome 𝑟#~𝐷(𝜇/0)

1 2 3

𝐾

𝜇2 𝜇3 𝜇4 𝜇5

Page 4: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Stochastic bandits

For each episode 𝑡:• Select a action 𝑘# ∈ 1, 2, … , 𝐾

• Observe outcome 𝑟#~𝐷(𝜇/0)

1 2 3

𝐾

𝜇2 𝜇3 𝜇4 𝜇5

Goal: Maximize expected rewards

𝑘⋆=argmax/∈{2,3,…,5}

𝜇/

Minimize 𝑅 𝑇 = ∑#B2C 𝜇/⋆ − 𝜇/0

Page 5: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Exploration/Exploitation trade-off

Exploit: Potentially minimize regret• 𝑘# = argmax

/∈{2,3,…,5}E𝜇/

Explore: Gain information

Many strategies:• 𝜖-Greedy• Optimism in front of uncertainty (UCB)• Thompson Sampling• Best Empirical Sampled Average (BESA)

Theory showingsublinear regret underproper assumptions

Page 6: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

In practice

We cannot compute regret: 𝔼∑#B2C 𝜇/⋆ − 𝜇/0

• Instead we minimize cumulative bad events, e.g. system failures, fractures, patient dropout

• Or we maximize cumulative good events, e.g. clicks, minutes spent on website, lives saved

Page 7: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

In practice

We cannot compute regret: ∑#B2C 𝜇/⋆ − 𝜇/0

• Instead we minimize cumulative bad events, e.g. system failures, fractures, patient dropout

• Or we maximize cumulative good events, e.g. clicks, minutes spent on website, lives saved

We need to face constraints and challenges specific to applications

Page 8: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

0.0 0.2 0.4 0.6 0.8 1.0

Actions

0.0

0.5

1.0

Reward

Many actions → Structured bandits

Expected reward is a functionof the action features

𝑓:𝒳 ↦ ℝ

For each episode 𝑡:• Select an action 𝑥# ∈ 𝒳• Obtain a reward 𝑟#~ 𝒟(𝑓(𝑥#))

𝔼[

]

Page 9: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

0.0 0.2 0.4 0.6 0.8 1.0

Actions

0.0

0.5

1.0

Reward

Many actions → Structured bandits

Expected reward is a functionof the action features

𝑓:𝒳 ↦ ℝ

For each episode 𝑡:• Select an action 𝑥# ∈ 𝒳• Obtain a reward 𝑟#~ 𝒟(𝑓(𝑥#))

𝔼[

]

Goal: Maximize rewards Find 𝑥⋆= argmaxQ∈𝒳

𝑓(𝑥)

Page 10: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

0.0 0.2 0.4 0.6 0.8 1.0

Actions

0.0

0.5

1.0

Reward

Capture structure: Linear model

• Unknown 𝜃 ∈ ℝS

• Mapping 𝜙:𝒳 ↦ ℝS

• 𝑓 𝑥 = 𝜙 𝑥 , 𝜃

𝒳 𝒦𝜙(V)

𝔼[

]

Page 11: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

0.0 0.2 0.4 0.6 0.8 1.0

Actions

0.0

0.5

1.0

Reward

Online function approximation

• Unknown 𝜃 ∈ ℝS

• 𝑓 𝑥 = 𝜙 𝑥 , 𝜃

• 𝑥⋆ = argmaxQ∈𝒳

𝜙 𝑥 , 𝜃

For each episode 𝑡:• Select an action 𝑥# ∈ 𝒳• Observe outcome 𝑦# = 𝑓(𝑥#) + 𝜉#

with noise ξ[ ∼ 𝒩(0, σ3)

𝔼[

]

Minimize 𝔼∑#B2C 𝑓 𝑥⋆ − 𝑓(𝑥#)

𝑓

𝑥⋆

Page 12: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Kernel regression

• Kernel 𝑘 𝑥, 𝑥′ = 𝜙 𝑥 , 𝜙(𝑥′)

• Gaussian prior 𝜃 ∼ 𝒩S 0, Σ with Σ =bc

d𝐼 for 𝜆 > 0

𝐊i = 𝑘 𝑥j, 𝑥k 2lj,kliand 𝐤i 𝑥 = 𝑘 𝑥, 𝑥j 2ljli

ℙ 𝑓|𝑥2, … , 𝑥i, 𝑦2, … , 𝑦i ∼ 𝒩 𝑓i 𝑥Q∈𝒳

, 𝑘i 𝑥, 𝑥p Q,Qq∈𝒳

𝑓i 𝑥 = 𝐤i 𝑥 r 𝐊i + 𝜆𝐼s2𝐲i

𝑘i 𝑥, 𝑥p = 𝑘 𝑥, 𝑥′ − 𝐤i 𝑥 r 𝐊i + 𝜆𝐼s2𝐤i(𝑥

p)

𝜙:𝒳 ↦ ℝS

𝑑 can be very large!

Page 13: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Kernel regression

• Kernel 𝑘 𝑥j, 𝑥k = 𝜙 𝑥j , 𝜙(𝑥k)

• Gaussian prior 𝜃 ∼ 𝒩S 0, Σ with Σ =bc

d𝐼 for 𝜆 > 0

For 𝜆 = 𝜎3 → Gaussian Process (Rasmussen and Williams, 2006)

• Example: Pointwise posterior mean and standard deviation

𝜙:𝒳 ↦ ℝS

𝑑 can be very large!

Page 14: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Streaming kernel regression

• Next input location 𝑥# is selected based on the 𝑡 − 1 past observations

• Many algorithm variants bandits, e.g.Kernel UCB, Kernel TS, GP-UCB, GP-TS

Page 15: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Let’s apply those bandits!

Page 16: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Optimizing super-resolution imaging parameters

• Flavie Lavoie-Cardinal• Theresa Wiesner• Anthony Bilodeau• Paul De Koninck

• Louis-Émile Robitaille• Marc-André Gardner• Christian Gagné

Joint work with

D., Wiesner, Gardner, Robitaille, Bilodeau, Gagné, De Koninck, and Lavoie-Cardinal(Nature Comm 2018)

Page 17: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Observing structures at the nanoscale (Hell and Wichmann, 1994)

Page 18: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Problem

Biology: The optimal parameters are not always the same

Typical strategy:• Split samples in two groups A and B• Find good parameters on group A• Perform imaging task on group B

From abberior-instruments.com

Page 19: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Structured bandit problem

Find good parameters during the imaging task• Maximize the acquisition of useful images • Minimize trials of poor parameters

Imaging parameters

Feedback

→ Identify best parameters→ Explore wisely

Page 20: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Optimizing image quality

Recall goal: Maximize the acquisition of images useful to researchers

Imaging parameters

ImageQuality score

Page 21: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Imaging parameters

ImageQuality score

ThompsonSampling

−1 0 1

Parameter

−5.0

−2.5

0.0

2.5

5.0

Objective

N = 3

Qua

lity

Page 22: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Thompson Sampling for selecting imaging parameters

−1 0 1

Parameter

−5.0

−2.5

0.0

2.5

5.0

Objective

N = 3

f fN f̃ Previous points Next location

−1 0 1

Parameter

−1.0

−0.5

0.0

0.5

Objective

N = 10

−1 0 1

Parameter

−1.0

−0.5

0.0

0.5

Objective

N = 29

Qua

lity

Page 23: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

What is good image quality?

Avoiding images like these:

Getting more like these:

Page 24: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

But imaging is a destructive process…

Trade-off image quality and photobleaching

Outcomesoptions

Quality score

Parameterselection

Online analysis Images% bleach

Page 25: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Thompson Sampling for generating outcome options

• One kernel regression model w𝑓j per objective 𝑖

• Sample one function y𝑓j per objective 𝑖

• Option y𝑓(𝑥) at parameter 𝑥: Concatenate y𝑓j(𝑥) for all 𝑖

−1 0 1

Parameter

−4

−2

0

2

4

Objective

N = 3

Quality

−1 0 1

Parameter

−4

−2

0

2

4

Objective

N = 3

Photobleaching

10

10

y𝑓 0 = (0.75, 0.65)

Page 26: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Presenting estimates to the expert• Exploration/Exploitation

in the cloud!

• Expert acts as an argmaxon the preference function

Image quality

Pho

tobl

each

ing

Page 27: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Experiments on neuronal imaging

Three parameters (1000 configurations): • Excitation laser power • Depletion laser power• Duration of imaging per pixel

Different imaging targets:• Neuron: Rat neuron• PC12: Rat tumor cell line• HEK293: Human embryonic kidney cells

Acquire two STED images with ↑ 1st STED quality and ↓ photobleaching

Page 28: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Acquire good images and control photobleaching

Not super-resolution →

Quality

Photobleaching

Page 29: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Sublinear regret, as suggested by theory

Different imaging targets:• Neuron: Rat neuron• PC12: Rat tumor cell line• HEK293: Human

embryonic kidney cells

Page 30: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Fully automated process

Outcomesoptions

Quality score

Parameterselection

Online analysis Images% bleach

Page 31: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Fully automated imaging

Bottom left corner:Not super-resolution

Beginning of optim →

End of optim →

Page 32: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Towards the next application:Getting closer to the patient

Page 33: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Randomized trials

Ran

dom

izat

ion

Option A

Option B

Time

Comparison&

Decision

Page 34: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Adaptive trials

Dynamically adapt the study designbased on previous observations

Effective

Favorable

Promising

Unfavorable

Futile

Page 35: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Writing down the setting…

Treatments:

Probabilityof effectiveness:

For each patient 𝑡:• Select a treatment 𝑘# ∈ 1, 2, … , 𝐾

• Observe outcome 𝑟#~𝐷(𝜇/0)

1 2 3

𝐾

𝜇2 𝜇3 𝜇4 𝜇5

This is stochastic bandits!(Thompson, 1933)

Page 36: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

In the absence of one size fits all strategy

Treatments:

Probabilityof effectiveness:

1 2 3

𝐾

𝜇2,~ 𝜇3,~ 𝜇4,~ 𝜇5,~

𝜇2,� 𝜇3,� 𝜇4,� 𝜇5,�

Context

Page 37: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

In the absence of one size fits all strategy

Treatments:

Probabilityof effectiveness:

1 2 3

𝐾

𝜇2,~ 𝜇3,~ 𝜇4,~ 𝜇5,~

𝜇2,� 𝜇3,� 𝜇4,� 𝜇5,�

You can treat them as independent bandit problems!

Page 38: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

In the absence of one size fits all strategy

Treatments:

Probabilityof effectiveness:

1 2 3

𝐾

𝜇2,~ 𝜇3,~ 𝜇4,~ 𝜇5,~

𝜇2,� 𝜇3,� 𝜇4,� 𝜇5,�

What happens if the number of contexts grows large?

Page 39: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Contextual bandits

Exploit the underlying structure on the context space

Reward = treatment effect

0.0 0.2 0.4 0.6 0.8 1.0

Context

-1

0

1

Reward

Action 1

Action 2

Action 3

Action 4

𝑓2(0.6)

𝑓3(0.6)

𝔼[

] 𝑓4(0.6)

𝑓�(0.6)

Page 40: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Contextual bandits

Expected reward of action 𝑘 is afunction 𝑓/ of the context features

𝑓/: 𝒮 ↦ ℝ

For each episode 𝑡:• Observe a context 𝑠# ∼ Π

• Select an action 𝑘# ∈ 1, 2, … , 𝐾

• Observe a reward 𝑟#~𝐷(𝑓/0(𝑠#))

0.0 0.2 0.4 0.6 0.8 1.0

Context

-1

0

1

Reward

Action 1

Action 2

Action 3

Action 4

𝔼[

]

Page 41: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Contextual bandits

Expected reward of action 𝑘 is afunction 𝑓/ of the context

𝑓/: 𝒮 ↦ ℝ

For each episode 𝑡:• Observe a context 𝑠# ∼ Π

• Select an action 𝑘# ∈ 1, 2, … , 𝐾

• Observe a reward 𝑟#~𝐷(𝑓/0(𝑠#))

0.0 0.2 0.4 0.6 0.8 1.0

Context

-1

0

1

Reward

Action 1

Action 2

Action 3

Action 4

𝔼[

]

Goal: Maximize rewards Find 𝑘#⋆= argmax/∈{2,3,…,5}

𝑓/(𝑠#)

Page 42: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

0.0 0.2 0.4 0.6 0.8 1.0

Context

-1

0

1

Reward

Action 1

Action 2

Action 3

Action 4

Online function approximation

• Unknown 𝜃/ ∈ ℝS for each action 𝑘

• 𝑓/ 𝑠 = 𝜙 𝑠 , 𝜃/• 𝑘#

⋆ = argmax/∈{2,3,…,5}

𝜙 𝑠# , 𝜃/

For each episode 𝑡:• Observe a context 𝑠# ∼ Π

• Select an action 𝑘# ∈ 1, 2, … , 𝐾

• Observe reward 𝑟# = 𝑓/0(𝑠#) + 𝜉#

with noise ξ[ ∼ 𝒩(0, σ3)𝔼[

]

Minimize 𝔼∑#B2C 𝑓/0⋆ 𝑠# − 𝑓/0(𝑠#)

Page 43: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Adaptive treatment allocation for mice trials

• Georgios D. Mitsis• Joelle Pineau

• Charis Achilleos• Demetris Iacovides• Katerina Strati

Joint work with

D., Achilleos, Iacovides, Strati, Mitsis, and Pineau (MLHC 2018)

Page 44: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Data acquisition problem

• Mice with induced cancer tumours

• Treatment options: 5FU, Imiquimod, 5FU+Imiquimod, None

• Treatment allocation twice a week

Which treatment should be allocated to patients with cancer given the stage of their disease?

Squamous Cell

Carcinoma

Page 45: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Data acquisition problem

• Mice with induced cancer tumours

• Treatment options: 5FU, Imiquimod, 5FU+Imiquimod, None

• Treatment allocation twice a week

Which treatment should be allocated to patients with cancer given the stage of their disease?

Squamous Cell

Carcinoma

Tumour volume

Page 46: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Phase 1: Randomized allocation (only exploration)

• 6 mice

Processing a mouse:• 2x/week:

– Measure volume of tumours– If all tumours are below a critical level

» Randomly assign one of the four treatment options– Otherwise terminate this animal

Page 47: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Phase 1: Randomized allocation (only exploration)

Result: 12 usable tumours• 163 triplets (tumour volume, treatment, next tumour volume)

Page 48: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

0 100 200 300

Volume (mm3)

0

5

10

15

20

25

Number

ofmeasurements

0 25 50 75 100

Number of days

0

200

400

Volume(m

m3)

Phase 1: Randomized allocation (only exploration)

Exponential tumour growth → Few data collected for larger tumours

Page 49: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Phase 2: Adaptive trial (exploration/exploitation)

• 10 mice• Select treatment 2x/week

Adaptive Clinical Trial• Do not fix the experiment design a priori• Adapt treatment allocation based on previous observations• Favor selection of better treatments

• Reduce exposition to less effective treatments

Page 50: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Contextual bandit problem

Improve treatment allocation online

• Maximize amount of acquired data• Minimize trials of poor treatments

Treatment

Tumour volume after treatment

Tumour volume before treatment

→ Identify best action given context→ Explore wisely

Page 51: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Alert: Contexts are not independent of actions!

Recall contextual bandits:

For each episode 𝑡:• Observe a context 𝑠# ∼ Π

• Select an action 𝑘# ∈ 1, 2, … , 𝐾

• Obtain a reward 𝑟#~𝒟(𝑓/0(𝑠#))

Beware of traps!

Page 52: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Reward shaping

Natural reward definition could be 𝑟# = 𝑠# − 𝑠#�2

Tumour volume reduction

Page 53: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Reward shaping

Natural reward definition could be 𝑟# = 𝑠# − 𝑠#�2

Controlling the disease, i.e. maintain tumour constant, has the same value independently of the tumour volume

Tumour volume reduction

Page 54: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Reward shaping

Natural reward definition could be 𝑟# = 𝑠# − 𝑠#�2

Controlling the disease, i.e. maintain tumour constant, has the same value independently of the tumour volume

What we used instead: 𝑟# = −𝑠#�2

Tumour volume reduction

Page 55: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Exploration/Exploitation strategy

Best Empirical Sampled Average: BESA (Baransi et al., 2014)

• Fair comparison of empirical estimators• Opportunities for actions to show how good they are

Samples # Action 1 Action 2

1 0 1

2 1 0

3 1 1

4 0

… …

100 1

Expected reward

E𝜇2

E𝜇3

Page 56: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Exploration/Exploitation strategy

Best Empirical Sampled Average: BESA (Baransi et al., 2014)

• Fair comparison of empirical estimators• Opportunities for actions to show how good they are

Samples # Action 1 Action 2

1 0 1

2 1 0

3 1 1

4 0

… …

100 1

Expected reward

E𝜇2

E𝜇3

Page 57: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Exploration/Exploitation strategy

Best Empirical Sampled Average: BESA (Baransi et al., 2014)

• Fair comparison of empirical estimators• Opportunities for actions to show how good they are

Samples # Action 1 Action 2

1 0 1

2 1 0

3 1 1

4 0

… …

100 1

Expected reward

E𝜇2

E𝜇3

Page 58: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Exploration/Exploitation strategy

Best Empirical Sampled Average: BESA (Baransi et al., 2014)

• Fair comparison of empirical estimators• Opportunities for actions to show how good they are

Samples # Action 1 Action 2

1 0 1

2 1 0

3 1 1

4 0

… …

100 1

Expected reward

E𝜇2

E𝜇3

Page 59: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

0 75 150 225

Tumour volume

0

150

300

Treatmenteffect

f

f̃N

0 75 150 225

Tumour volume

0

150

300

Treatmenteffect

f

f̃N

GP BESA: Extension to contextual bandits

0 75 150 225

Tumour volume

0

150

300

Treatmenteffect

f

f̃N

0 75 150 225

Tumour volume

0

150

300

Treatmenteffect

f

f̃N

𝑁 = 50 𝑁 = 20

𝑁 = 10 𝑁 = 5

Page 60: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

0 75 150 225 300

Tumour volume

0

150

300

Treatmenteff

ect

st

A1 (N = 5)

A2 (N = 5)

Example of exploration

Action 1: 100 observationsAction 2: 5 observations

0 75 150 225 300

Tumour volume

0

150

300

Treatmenteff

ect

st

A1 (N = 100)

A2 (N = 5)

0 75 150 225 300

Tumour volume

0

150

300

Treatmenteffect

st

A1

A2

True functions: Predictions (all data): Predictions (subsample):

Page 61: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Experimental setting

• 10 mice total

• Processing a group of mice (2-3 subjects)– Twice a week:

» For each mouse in the group:• Measure tumour → reward for last treatment• Select treatment to assign now

– Until death/sacrifice of all mice in group

• Update algorithm with tuples of (volume, treatment, next volume)

• Start next group

Page 62: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Animals live longer

Algorithm updated after each group

None Random 5-FU GP BESA

50

100

150

Lifeduration(days)

None Random 5-FU GP

50

100

150

Lifeduration(days)

A B C D

50

100

150Lifeduration(days)

Page 63: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Evolution of tumour volumes

Slowing the exponential growth

Recall phase 1:

0

200

400

Volume(m

m3)

Group A

0

200

400

Volume(m

m3)

Group B

0

200

400

Volume(m

m3)

Group C

0 50 100

Days

0

200

400

Volume(m

m3)

Group D

0 25 50 75 100

Number of days

0

200

400

Volume(m

m3)

Page 64: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

A better state space covering

Using data in a next phase• More information on the tumor growth process• 40% more data points of volume > 70mm3

0 100 200 300

Volume (mm3)

0

5

10

15

20

25

Number

demeasurements

Page 65: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

0.0

0.5

1.0

Probability

Group A

None 5-FU Imiquimod 5-FU + Imiquimod

Group B

0 50 100 150 200 250 300

Tumour volume (mm3)

0.0

0.5

1.0

Probability

Group C

0 50 100 150 200 250 300

Tumour volume (mm3)

Group D

Evolution of the policy

Page 66: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Conclusion + Take homes

• Bandits is a nice framework for theory, but also has applications! J

• We often break theoretical guarantees in practice L

• How to design algorithms that don’t make unrealistic assumptions?

• Other aspects important in practice were not considered here, e.g.– Fairness in exploration– Safe exploration

Page 67: Applied bandits: Supporting health-related researchQuality score Parameter selection Online analysis Images % bleach Thompson Sampling for generating outcome options • One kernel

Huge thanks again to my collaborators!

Questions?…and more