Thompson sampling for web optimisation 29 Jan 2016 David S. Leslie


Plan

• Contextual bandits on the web
• Thompson sampling in bandits
• Selecting multiple adverts
• Optimising a web server

Contextual bandits . . .

• Receive state signal $x_t$
• Select $a_t$ from a finite set of actions $A$
• Rewards stationary over time, but depend on both $x_t$ and $a_t$

$r_t = r(x_t, a_t) + \varepsilon_t$


. . . on the web

Natural solution method

$r_t = r(x_t, a_t) + \varepsilon_t$

• For each $a \in A$, estimate the function $r(\cdot, a)$ of $x$ using some statistical procedure
• When $x_t$ is presented, calculate the estimate $\hat{r}_t(x_t, a)$, or the posterior $p(r(x_t, a) \mid H_t)$ given the history $H_t$, for each $a$ and select an action

Objective

Maximise average reward, minimise regret, select “correct” actions eventually


Simple bandits

Two actions, L and R.

• No state signal $x_t$
• Finite set of actions $a \in A$
• Rewards stationary over time, depending only on $a_t$

$r_t = r(a_t) + \varepsilon_t$

• Estimate $r(L)$ and $r(R)$ using very simple statistics
• On trial $t$, calculate $p(r(a) \mid H_t)$ for each $a$ and select an action


Solution methods

Full Bayesian decision theory (Gittins indices etc.)

• Beautiful optimality theory
• Action selected optimises the true objective
• Marginalises over all possible future outcomes
• Impossible to use in all but the simplest settings

Alternative approach

Heuristics to balance exploration and exploitation. Often involve randomisation

Undirected action selection

Select based purely on expected values $\hat{r}_t(a)$

Greedy: Action $a_t$ maximises $\hat{r}_t(a)$

ε-greedy: Select the greedy action with probability $1 - \varepsilon$, otherwise explore a random action

Softmax: $P(a_t = a \mid H_t) \propto \exp\{\hat{r}_t(a)/\tau\}$
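The three undirected rules above can be sketched in a few lines. This is an illustrative sketch only: the function names, the list-of-estimates representation and the uniform choice in the exploration step are assumptions, not something taken from the slides.

```python
import math
import random


def greedy(estimates):
    """Return the index of the action with the largest estimated reward."""
    return max(range(len(estimates)), key=lambda a: estimates[a])


def epsilon_greedy(estimates, epsilon, rng):
    """With probability epsilon pick uniformly at random, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return greedy(estimates)


def softmax_probabilities(estimates, tau):
    """Selection probabilities proportional to exp(estimate / tau)."""
    top = max(estimates)  # subtracting the max leaves the ratios unchanged
    weights = [math.exp((r - top) / tau) for r in estimates]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_select(estimates, tau, rng):
    """Draw one action index from the softmax distribution."""
    probs = softmax_probabilities(estimates, tau)
    return rng.choices(range(len(estimates)), weights=probs)[0]
```

All three rules take only the point estimates as input, which is why they cannot distinguish a well-explored action from a barely-tried one with the same mean.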

Spot the difference!

[Figure: two panels, each plotting the posterior density $p(r \mid H)$ against $r$ (axis from 0 to 30) for a red and a blue action.]

Solid lines are the posterior densities of the expected reward for the red/blue actions. Dashed lines are the means of these distributions. Undirected methods treat the left and right panels identically.

Myopic action selection

Give up on full optimality. Heuristics, usually using more than just $\hat{r}_t(a)$, to explore ‘sensibly’

Optimism in face of uncertainty: create confidence intervals for each action, select the action with the highest “top” of CI.

Thompson sampling: sample a value from the posterior for each action, select the action with the highest sample

Main idea
CI and posterior both narrow as more data are observed for that action: exploration is more likely for less-visited actions.
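A minimal sketch of Thompson sampling may make the rule concrete. It assumes Bernoulli (0/1) rewards and independent uniform Beta(1, 1) priors on each arm; neither assumption comes from the slides, and `thompson_select` and `run_bandit` are hypothetical names.

```python
import random


def thompson_select(successes, failures, rng):
    """Sample each arm's Beta posterior and return the arm with the largest draw."""
    draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda a: draws[a])


def run_bandit(true_probs, n_rounds, seed=0):
    """Simulate Thompson sampling on Bernoulli arms; return pull counts per arm."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(n_rounds):
        arm = thompson_select(successes, failures, rng)
        # Observe a 0/1 reward and update that arm's posterior counts.
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]
```

Calling `run_bandit([0.2, 0.8], 2000)` returns how often each arm was played; as an arm's counts grow its Beta posterior narrows, so a clearly worse arm is sampled above the better one less and less often.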


Thompson sampling properties

Posteriors over action values → Thompson sampling → Probabilistic action selection

$P(a_t = a \mid H_t) = P(r(a) \text{ is maximal} \mid H_t)$

Proof idea:
• Let $Q_t(a) \sim p(r(a) \mid H_t)$
• $\{a_t = a\} = \{Q_t(a) > Q_t(b)\ \forall b \neq a\}$
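The probability-matching property can be checked numerically. The sketch below assumes independent Gaussian posteriors, which is purely an illustrative choice, and estimates by Monte Carlo how often each action would have the largest sample; `selection_probabilities` is a hypothetical helper.

```python
import random


def selection_probabilities(means, sds, n_draws, rng):
    """Estimate P(action has the largest posterior sample) by repeated sampling."""
    counts = [0] * len(means)
    for _ in range(n_draws):
        samples = [rng.gauss(m, s) for m, s in zip(means, sds)]
        best = max(range(len(samples)), key=lambda a: samples[a])
        counts[best] += 1
    return [c / n_draws for c in counts]
```

With posterior means (10, 12), giving the first action a wide posterior (standard deviation 5 rather than 0.5) raises its selection probability substantially, even though the means are unchanged. An undirected method would not see any difference between the two cases.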

Thompson sampling properties

Suboptimal actions with high uncertainty are selected with larger probability than those with low uncertainty

[Figure: two panels plotting the posterior density $p(r \mid H)$ against $r$ (axis from 0 to 30).]

Thompson sampling properties

Fixed posteriors for unplayed actions ⇒ infinite exploration

Proof idea:
• Suppose L is only played finitely often. Then
  • the posterior for $r(L)$ freezes
  • R is played infinitely often, and the posterior for $r(R)$ converges
  • so sampled values for R converge to $r(R)$
• So the probability of playing L is bounded below
• So $\sum_t P(a_t = L \mid H_t) = \infty$, which by Borel–Cantelli means L is played infinitely often: a contradiction

Thompson sampling properties

Asymptotic average reward is $\max_a r(a)$

Proof idea:
• Infinite exploration ⇒ posteriors converge to $r(a)$
• For all large $t$, sampled values for $a$ are close to $r(a)$ with high probability
• $\forall \varepsilon > 0$, the probability of selecting the best action is larger than $1 - \varepsilon$ for large $t$
• Coupling argument ⇒ average reward converges to $\max_a r(a)$

Theory
May, Korda, Lee and DL, JMLR 2012

Theorem
In contextual bandit problems with stationary reward functions $r(x, a)$, if Thompson sampling is used then

$$\frac{\sum_{t=1}^{T} r(x_t, a_t)}{\sum_{t=1}^{T} \max_a r(x_t, a)} \to 1 \quad \text{as } T \to \infty$$

(In English: The average reward is as good as it could be)

Cleverer theory: finite time regret properties, in more restricted settings (see Korda, Agrawal and others)


A problem

• Let $Q_t(a) \sim p(r(a) \mid H_t)$ be the sampled value for action $a$
• Decompose as $Q_t(a) = \hat{r}_t(a) + \text{exploratory bonus}$
• Thompson sampling gives negative exploratory bonuses ????

[Figure: two panels plotting the posterior density $p(r \mid H)$ against $r$ (axis from 0 to 30).]

Reduced probability of selecting high variance optimal actions


Optimistic Bayesian Sampling
May, Korda, Lee and DL, JMLR 2012

• Let $Q_t(a) \sim p(r(a) \mid H_t)$ be the sampled value for action $a$
• Set $Q_t^{\mathrm{OBS}}(a) = \max\{Q_t(a), \hat{r}_t(a)\}$
• Select the action that maximises $Q_t^{\mathrm{OBS}}$

All proofs go through as before
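As a sketch of the rule above, assuming Gaussian posteriors whose mean plays the role of $\hat{r}_t(a)$ (an illustrative assumption; `obs_scores` and `obs_select` are hypothetical names):

```python
import random


def obs_scores(means, sds, rng):
    """Sample each posterior, then floor each sample at the posterior mean."""
    return [max(rng.gauss(m, s), m) for m, s in zip(means, sds)]


def obs_select(means, sds, rng):
    """Return the action whose optimistic score is largest."""
    scores = obs_scores(means, sds, rng)
    return max(range(len(scores)), key=lambda a: scores[a])
```

Because each score is at least the posterior mean, the exploratory bonus $Q_t^{\mathrm{OBS}}(a) - \hat{r}_t(a)$ can never be negative, which is the defect of plain Thompson sampling that the previous slide points out.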

Emergent software
with Barry Porter and Matthew Grieves

[Diagram: component architecture of the web server. Each interface has several interchangeable implementations.]

• App <interface>: WebServer. Main method: opens a server socket and accepts client connections, each of which is passed to a request handler.
• RequestHandler <interface>: RequestHandler, RequestHandlerPT (a thread-per-client implementation and a thread-pool implementation). Takes a client socket, applies a concurrency approach, and passes the socket on to the HTTP handler.
• HTTPHandler <interface>: HTTPHandler (implementation without caching or compression), HTTPHandlerCMP (implementation with compression), HTTPHandlerCH (implementation with caching), HTTPHandlerCHCMP (implementation with caching and compression). Takes a client socket, parses HTTP request headers and formulates a response.
• Compressor <interface>: GZip, Deflate
• Cache <interface>: Cache, CacheLFU, CacheLRU, CacheFS, CacheMRU, CacheRR

Emergent software
with Barry Porter and Matthew Grieves

• Each component of the server can be provided by several implementations: 42 different valid configurations
• Configurations perform well under different traffic scenarios
• Learn to use the best configuration

Framework:
Every 10 seconds, try a configuration, observe performance

Uh oh: trying each configuration only once takes 7 minutes (42 × 10 s = 420 s). . .

Regression model
similar approach to Scott (2010)

Each component corresponds to a factor variable:

ResponseTime ∼ RequestHandler + HTTPHandler + Compressor + Cache

A configuration conf corresponds to a binary vector $x_{\mathrm{conf}}$.

Expected response time for deploying conf is given by

$x_{\mathrm{conf}} \beta$

where $\beta$ is unknown.

Only 11 regression coefficients
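One way to arrive at 11 coefficients is treatment (dummy) coding with an intercept and the first implementation of each factor as the baseline: 1 + 1 + 3 + 1 + 5 = 11. The slides do not state the coding used, so the sketch below is an assumption. It also assumes every configuration names one level for each of the four factors; the level names are taken from the architecture diagram.

```python
# Factor levels from the architecture diagram; the first level of each
# factor is treated as the baseline (an assumed coding, not from the slides).
FACTORS = {
    "RequestHandler": ["RequestHandler", "RequestHandlerPT"],
    "HTTPHandler": ["HTTPHandler", "HTTPHandlerCMP", "HTTPHandlerCH", "HTTPHandlerCHCMP"],
    "Compressor": ["GZip", "Deflate"],
    "Cache": ["Cache", "CacheLFU", "CacheLRU", "CacheFS", "CacheMRU", "CacheRR"],
}


def encode(conf):
    """Map a configuration (factor name -> level name) to a binary vector."""
    x = [1]  # intercept
    for factor, levels in FACTORS.items():
        choice = conf[factor]
        if choice not in levels:
            raise ValueError(f"unknown level {choice!r} for {factor}")
        # One dummy column per non-baseline level.
        x.extend(1 if choice == level else 0 for level in levels[1:])
    return x
```

The payoff of the additive model is that an observation of one configuration informs the estimate for every other configuration sharing a component, so there is no need to try all 42 configurations individually.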

Iterative decision-making

In each 10 second slot:

• Choose an action based on the fitted model
• Observe the outcome
• Add the observation to the pool of data
• Update the statistical model

Challenge

Need to manage explore–exploit, as in simple bandits

Thompson sampling

Thompson sampling implementation:

Use Bayesian linear regression. Then for each $t$
• sample a $\beta^{\mathrm{Th}}$ from the posterior at time $t$
• deploy the conf which maximises $x_{\mathrm{conf}} \beta^{\mathrm{Th}}$

That’s it!
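The two steps above can be sketched with a conjugate Gaussian model. The sketch assumes a $N(0, \text{prior\_var} \cdot I)$ prior on $\beta$ and a known noise variance, and treats the response `y` as whatever performance measure is being maximised. None of these choices, nor the class and function names, come from the slides.

```python
import math
import random


class BayesLinReg:
    """Gaussian linear regression with known noise variance (assumed model)."""

    def __init__(self, dim, prior_var=10.0, noise_var=1.0):
        self.dim = dim
        self.noise_var = noise_var
        # Posterior precision A and vector b; the posterior mean is A^{-1} b.
        self.A = [[(1.0 / prior_var if i == j else 0.0) for j in range(dim)]
                  for i in range(dim)]
        self.b = [0.0] * dim

    def update(self, x, y):
        """Add one observation (feature vector x, response y)."""
        for i in range(self.dim):
            self.b[i] += x[i] * y / self.noise_var
            for j in range(self.dim):
                self.A[i][j] += x[i] * x[j] / self.noise_var

    def _cholesky(self):
        """Lower-triangular L with L L^T = A."""
        n = self.dim
        L = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1):
                s = self.A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
                L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
        return L

    def sample(self, rng):
        """Draw beta ~ N(A^{-1} b, A^{-1}) as L^{-T} (L^{-1} b + z), z ~ N(0, I)."""
        n = self.dim
        L = self._cholesky()
        w = [0.0] * n
        for i in range(n):  # forward substitution for L^{-1} b, plus noise
            u = (self.b[i] - sum(L[i][k] * (w[k] - 0.0) for k in range(0))) / 1.0
            w[i] = u
        # Recompute cleanly: solve L u = b, then add standard normal noise.
        u = [0.0] * n
        for i in range(n):
            u[i] = (self.b[i] - sum(L[i][k] * u[k] for k in range(i))) / L[i][i]
        w = [u[i] + rng.gauss(0.0, 1.0) for i in range(n)]
        beta = [0.0] * n
        for i in reversed(range(n)):  # back substitution with L^T
            beta[i] = (w[i] - sum(L[k][i] * beta[k] for k in range(i + 1, n))) / L[i][i]
        return beta


def thompson_choose(model, candidates, rng):
    """Sample beta once and return the candidate vector with the largest x . beta."""
    beta = model.sample(rng)
    return max(candidates, key=lambda x: sum(xi * bi for xi, bi in zip(x, beta)))
```

In a loop one would call `thompson_choose`, deploy the chosen configuration for a slot, then pass the observed outcome to `update`. A numerical library would normally replace the hand-written Cholesky steps; they are spelled out here only to keep the sketch dependency-free.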

Initial results
Repeatedly requesting a small text file

“Loss” is the difference between the reciprocal of the optimal response time at that instant, and the reciprocal of the actual response time

Changing request patterns
Low/High text and Low/High Entropy

Different configurations are better for different request patterns

Changing request patterns
Alternating traffic characteristics

The request pattern alternates, switching every 10 iterations. Poor performance.

Using context
Coding the context

At the end of iteration $t$, categorise the traffic as HighEnt/LowEnt and as HighText/LowText.

Include Ent and Text as factors in the regression
Also the interactions Ent:Cache and Text:Compressor

Performance under different traffic characteristics is learned

Using context
Decision-making

Thompson sampling implementation:

Use Bayesian linear regression. Then for each $t$
• sample a $\beta^{\mathrm{Th}}$ from the posterior at time $t$
• deploy the conf which maximises $((\mathrm{Ent}_{t-1}, \mathrm{Text}_{t-1}) \star x_{\mathrm{conf}})\, \beta^{\mathrm{Th}}$

This makes the working assumption that $(\mathrm{Ent}_t, \mathrm{Text}_t) = (\mathrm{Ent}_{t-1}, \mathrm{Text}_{t-1})$
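A sketch of how the context might be combined with a configuration vector, including the Ent:Cache and Text:Compressor interactions. The column layout, the 0/1 coding of the context and both function names are assumptions made for illustration; the slides do not specify them.

```python
def contextual_features(dummies, high_ent, high_text):
    """Build the regression vector for one configuration under a given context.

    dummies maps each factor name to its list of 0/1 dummy columns
    (assumed treatment coding, baseline level omitted).
    """
    ent = 1 if high_ent else 0
    text = 1 if high_text else 0
    x = [1]  # intercept
    for factor in ("RequestHandler", "HTTPHandler", "Compressor", "Cache"):
        x.extend(dummies[factor])
    x.extend([ent, text])                                # main effects of context
    x.extend(ent * d for d in dummies["Cache"])          # Ent:Cache interaction
    x.extend(text * d for d in dummies["Compressor"])    # Text:Compressor interaction
    return x


def choose_configuration(candidates, prev_ent, prev_text, beta_sample):
    """Index of the candidate with the largest score under the previous context."""
    def score(d):
        x = contextual_features(d, prev_ent, prev_text)
        return sum(xi * bi for xi, bi in zip(x, beta_sample))
    return max(range(len(candidates)), key=lambda i: score(candidates[i]))
```

Here `beta_sample` stands for a draw of $\beta^{\mathrm{Th}}$ from the posterior. The context passed in is the one observed in the previous slot, matching the working assumption stated above.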

Using context
Results

The request pattern alternates, switching every 10 iterations. Good performance.

Conclusion

• Contextual bandits and Thompson sampling: simple and (provably and empirically) effective
• Optimistic Bayesian sampling: removes negative exploratory bonus
• Extremely simple to deploy in more complicated settings
• Basic statistical approaches are a revelation to (some) ‘Data Scientists’


Backup slides

Copify
With G Malhotra, W Simm and R McVey

• Marketplace matching copywriting jobs with authors
• Copywriters select from the (ever-changing) available jobs


A Copify brief

The writer’s view

Copify’s challenge

The brief
Offer appropriate jobs to a writer when they log in

Main differentiating features:

Jobs: a relatively small amount of free text
Writers: history of jobs accepted/declined

Challenges include:
• only light computation is allowed
• zero to moderate data per writer
• each job is completed by only one writer
• a different set of available jobs on each login

Encoding a brief

Whenever a job arrives, it is coded into a regression vector $x$, consisting of:
• price
• reported topic category
• (SVD compressed) ‘bag of semantic topics’ counts

Learning writer preferences

For each writer $w$, we know
• which briefs they have been shown
• which briefs they have accepted

Simple logistic regression to estimate writer ‘preferences’ $\hat{\beta}_w$ and covariance $\Sigma_w = \mathrm{var}(\hat{\beta}_w)$. Updated each night for each writer.

If insufficient data (< 20 previous jobs), set $\hat{\beta}_w$ and $\Sigma_w$ to a globally-estimated version with inflated covariance

Displaying jobs

On page load, there are jobs $j = 1, \ldots, J$ waiting to be accepted

Thompson sampling principle:

System selects job $j$ with the probability that job $j$ is the best

Implementation in regression framework:

• sample $\beta^{\mathrm{TS}}_w \sim N(\hat{\beta}_w, \Sigma_w)$,
• select $\arg\max_j x_j \beta^{\mathrm{TS}}_w$

Optimistic version: replace $x_j \beta^{\mathrm{TS}}_w$ with $\max\{x_j \beta^{\mathrm{TS}}_w, x_j \hat{\beta}_w\}$

Displaying jobs

Implementation in regression framework, ranking all jobs rather than selecting one:

• sample $\beta^{\mathrm{TS}}_w \sim N(\hat{\beta}_w, \Sigma_w)$,
• rank jobs according to $x_j \beta^{\mathrm{TS}}_w$

Optimistic version: replace $x_j \beta^{\mathrm{TS}}_w$ with $\max\{x_j \beta^{\mathrm{TS}}_w, x_j \hat{\beta}_w\}$
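The ranking step, with its optimistic variant, might look like the sketch below. To stay dependency-free it assumes $\Sigma_w$ is diagonal and takes only its diagonal (`sigma_diag`), which is a simplification of the full covariance on the slide; `rank_jobs` is a hypothetical name.

```python
import math
import random


def dot(x, beta):
    """Inner product of a job vector and a coefficient vector."""
    return sum(xi * bi for xi, bi in zip(x, beta))


def rank_jobs(jobs, beta_hat, sigma_diag, rng, optimistic=False):
    """Return job indices ordered from highest to lowest sampled score.

    Assumes a diagonal covariance: each coefficient is sampled independently.
    """
    beta_ts = [rng.gauss(b, math.sqrt(v)) for b, v in zip(beta_hat, sigma_diag)]

    def score(x):
        sampled = dot(x, beta_ts)
        # Optimistic version: never score a job below its point estimate.
        return max(sampled, dot(x, beta_hat)) if optimistic else sampled

    return sorted(range(len(jobs)), key=lambda j: score(jobs[j]), reverse=True)
```

One draw of the coefficients per page load is used for all jobs. A writer with little history has a large covariance, so the ordering they see varies more from login to login.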


Effectiveness

The new brief is ranked highly. It is for a blog post about fantasyfootball. This writer has completed many tasks to do with football.The editorial team also know the writer to be “football mad”.


Effectiveness

Hopefully some performance stats