1©MapR Technologies - Confidential
Real-time Learning
Contact:
– [email protected]
– @ted_dunning
Slides and such (available late tonight):
– http://slideshare.net/tdunning
Hash tags: #mapr #hivedata
We have a product to sell … from a web-site
What picture?
What tag-line?
What call to action?
The Challenge
Design decisions affect probability of success
– Cheesy web-sites don’t even sell cheese
The best designers do better when allowed to fail
– Exploration juices creativity
But failing is expensive
– If only because we could have succeeded
– But also because offending or disappointing customers is bad
More Challenges
Too many designs
– 5 pictures
– 10 tag-lines
– 4 calls to action
– 3 back-ground colors
=> 5 x 10 x 4 x 3 = 600 designs
It gets worse quickly
– What about changes on the back-end?
– Search engine variants?
– Checkout process variants?
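The design count can be checked by enumerating every combination. This is only a sketch: the option names below are hypothetical placeholders, and only the counts (5, 10, 4, 3) come from the slide.

```r
# Hypothetical option labels; only the number of options per factor is from the slide.
designs <- expand.grid(
  picture    = paste0("picture_", 1:5),
  tagline    = paste0("tagline_", 1:10),
  action     = paste0("action_", 1:4),
  background = paste0("background_", 1:3)
)

# One row per distinct design: 5 * 10 * 4 * 3 combinations.
n_designs <- nrow(designs)
print(n_designs)
```

Every additional factor (a search engine variant, a checkout variant) multiplies the row count again, which is why testing each design separately stops being practical.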
Example – AB testing in real-time
I have 15 versions of my landing page
Each visitor is assigned to a version
– Which version?
A conversion or sale or whatever can happen
– How long to wait?
Some versions of the landing page are horrible
– Don’t want to give them traffic
A Quick Diversion
You see a coin
– What is the probability of heads?
– Could it be larger or smaller than that?
I flip the coin and while it is in the air ask again
I catch the coin and ask again
I look at the coin (and you don’t) and ask again
Why does the answer change?
– And did it ever have a single value?
A Philosophical Conclusion
Probability as expressed by humans is subjective and depends on information and experience
I Dunno
5 heads out of 10 throws
2 heads out of 12 throws
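The three coin slides can be read as a belief about P(heads) that tightens as throws accumulate. A minimal sketch, assuming a uniform Beta(1, 1) prior for the "I dunno" state (the prior and the helper name `coin_posterior` are assumptions, not from the slides):

```r
# Posterior for P(heads) after `heads` heads in `throws` throws,
# assuming a uniform Beta(1, 1) prior (an assumption made for this sketch).
coin_posterior <- function(heads, throws) {
  a <- heads + 1
  b <- (throws - heads) + 1
  list(
    alpha    = a,
    beta     = b,
    mean     = a / (a + b),
    interval = qbeta(c(0.05, 0.95), a, b)  # central 90% credible interval
  )
}

no_data <- coin_posterior(0, 0)    # "I dunno": uniform on [0, 1]
fair    <- coin_posterior(5, 10)   # 5 heads out of 10 throws
biased  <- coin_posterior(2, 12)   # 2 heads out of 12 throws

print(fair$interval)
print(biased$interval)
```

The same evidence-to-distribution step is what lets a bandit keep an honest estimate of how uncertain it still is about each alternative.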
So now you understand Bayesian probability
Another Quick Diversion
Let’s play a shell game
This is a special shell game
It costs you nothing to play
The pea has constant probability of being under each shell (trust me)
How do you find the best shell?
How do you find it while maximizing the number of wins?
Pause for short con-game
Interim Thoughts
Can you identify winners or losers without trying them out?
Can you ever completely eliminate a shell with a bad streak?
Should you keep trying apparent losers?
Pause for second con-game
So now you understand multi-armed bandits
Conclusions
Can you identify winners or losers without trying them out?
– No
Can you ever completely eliminate a shell with a bad streak?
– No
Should you keep trying apparent losers?
– Yes, but at a decreasing rate
Is there an optimum strategy?
Bayesian Bandit
Compute distributions based on data so far
Sample p1, p2 and p3 from these distributions
Pick shell i where i = argmax_i p_i
Lemma 1: The probability of picking shell i will match the probability it is the best shell
Lemma 2: This is as good as it gets
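Lemma 1 can be illustrated numerically: if each shell's win probability is drawn from its posterior and the largest draw is picked, the long-run pick frequency is a Monte Carlo estimate of the probability that the shell is best. The counts below are hypothetical, and a uniform prior is assumed.

```r
set.seed(1)

# Hypothetical counts so far: one row per shell, columns = (losses, wins).
k <- matrix(c(8, 2,
              5, 5,
              3, 7), ncol = 2, byrow = TRUE)

n_draws <- 20000
# Column i holds posterior draws of the win probability for shell i.
draws <- sapply(1:nrow(k), function(i) rbeta(n_draws, k[i, 2] + 1, k[i, 1] + 1))

# For every joint draw, record which shell had the largest sampled probability.
picked <- max.col(draws, ties.method = "first")
p_best <- tabulate(picked, nbins = nrow(k)) / n_draws
print(round(p_best, 3))
```

With these counts the shell with the worst record should still be picked occasionally, which matches the earlier conclusion that apparent losers are never eliminated outright, only tried at a decreasing rate.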
And it works!
Video Demo
The Code
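Select an alternative

    # k has one row per alternative; column 1 counts losses, column 2 counts wins
    select = function(k) {
      n = dim(k)[1]
      p0 = rep(0, length.out=n)
      for (i in 1:n) {
        # draw a plausible win rate from the Beta posterior for alternative i
        p0[i] = rbeta(1, k[i,2]+1, k[i,1]+1)
      }
      return (which(p0 == max(p0)))
    }

Select and learn

    # function name and arguments are assumed; the slide shows only the body
    # test() is not defined on the slide; it runs one trial of alternative i
    learn = function(k, steps) {
      for (z in 1:steps) {
        i = select(k)
        j = test(i)   # column to increment: 1 = loss, 2 = win
        k[i,j] = k[i,j]+1
      }
      return (k)
    }

But we already know how to count!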
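To try the selection and learning loop end to end, the trial function needs a concrete definition. The sketch below is self-contained and simulates three shells. The true win probabilities, the `test` implementation, and the `run_bandit` wrapper are all assumptions made for illustration.

```r
set.seed(42)

# Hypothetical true win probabilities for three shells (unknown to the player).
true_p <- c(0.2, 0.5, 0.7)

# Selection: one Beta draw per shell (uniform prior assumed), pick the largest.
select <- function(k) {
  p0 <- rbeta(nrow(k), k[, 2] + 1, k[, 1] + 1)
  which.max(p0)
}

# Simulated trial: returns column index 1 for a loss, 2 for a win.
test <- function(i) {
  if (runif(1) < true_p[i]) 2 else 1
}

# Learning is just counting outcomes per shell.
run_bandit <- function(steps) {
  k <- matrix(0, nrow = length(true_p), ncol = 2)  # columns: losses, wins
  for (z in 1:steps) {
    i <- select(k)
    j <- test(i)
    k[i, j] <- k[i, j] + 1
  }
  k
}

k <- run_bandit(2000)
plays <- rowSums(k)
print(cbind(losses = k[, 1], wins = k[, 2], plays = plays))
```

In a run like this most plays should end up on the shell with the highest true probability, while the weaker shells receive only a small share of the traffic.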
The Basic Idea
We can encode a distribution by sampling
Sampling allows unification of exploration and exploitation
Can be extended to more general response models
The Original Problem
[Figure labels: x1, x2, x3]
Response Function
Generalized Banditry
Suppose we have an infinite number of bandits
– suppose they are each labeled by two real numbers x and y in [0,1]
– also that expected payoff is a parameterized function f of x and y, with parameters θ
– now assume a distribution for θ that we can learn online
Selection works by sampling θ, then computing f
Learning works by propagating updates back to θ
– If f is linear, this is very easy
We are not limited to two labels; we could have labels and context
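For the linear case, one way to realise "sample θ, compute f, propagate updates back" is conjugate Bayesian linear regression. The sketch below is not from the slides. It assumes f(x, y) = θ0 + θ1·x + θ2·y, Gaussian noise with known standard deviation, a N(0, I) prior on θ, and a finite grid standing in for the infinite set of bandits. The true parameters exist only to simulate payoffs.

```r
set.seed(7)

# Linear response: f(x, y) = theta0 + theta1 * x + theta2 * y (assumed form).
features <- function(x, y) cbind(1, x, y)

# Hypothetical "true" parameters and noise level, used only to simulate payoffs.
theta_true <- c(0.1, 0.5, -0.3)
noise_sd <- 0.1

# Candidate bandits: a finite grid approximating the continuum of (x, y) labels.
grid <- expand.grid(x = seq(0, 1, by = 0.1), y = seq(0, 1, by = 0.1))
phi_grid <- features(grid$x, grid$y)

# Gaussian posterior over theta, stored as precision matrix A and vector b.
A <- diag(3)     # prior: theta ~ N(0, I)
b <- rep(0, 3)

for (step in 1:300) {
  Sigma <- solve(A)
  mu <- drop(Sigma %*% b)
  # Selection: sample theta from the posterior, then compute f for every candidate.
  theta <- mu + drop(t(chol(Sigma)) %*% rnorm(3))
  best <- which.max(phi_grid %*% theta)
  phi <- phi_grid[best, ]
  # Observe a noisy payoff for the chosen bandit.
  z <- sum(phi * theta_true) + rnorm(1, sd = noise_sd)
  # Learning: conjugate update of the posterior over theta.
  A <- A + outer(phi, phi) / noise_sd^2
  b <- b + phi * z / noise_sd^2
}

mu <- drop(solve(A, b))
chosen <- grid[which.max(phi_grid %*% mu), ]
print(round(mu, 2))
print(chosen)
```

Because f is linear, each update is a rank-one change to A and an addition to b, which is the sense in which the linear case is easy. A non-linear f would need an approximate posterior instead.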
Context Variables
[Figure labels: x1, x2, x3; context variables: user.geo, env.time, env.day_of_week, env.weekend]
Caveats
Original Bayesian Bandit only requires real-time
Generalized Bandit may require access to long history for learning
– Pseudo online learning may be easier than true online
Bandit variables can include content, time of day, day of week
Context variables can include user id, user features
Bandit × context variables provide the real power
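One common way to build bandit × context features for a linear response model is an interaction term in a design matrix. The sketch below uses made-up impressions with a hypothetical bandit variable `design` and a hypothetical context variable `weekend`.

```r
# Hypothetical logged impressions: a bandit variable and a context variable.
impressions <- data.frame(
  design  = factor(c("A", "B", "A", "B")),
  weekend = c(0, 0, 1, 1)
)

# `design * weekend` expands to both main effects plus the design:weekend interaction.
X <- model.matrix(~ design * weekend, data = impressions)
print(colnames(X))
print(X)
```

The interaction column is non-zero only for design B shown on a weekend, so a model with a weight on it can learn that a design performs differently depending on context, rather than learning one global winner.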
You can do this yourself!
Thank You