![Page 1: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/1.jpg)
Léon Bottou
Learning to Interact
![Page 2: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/2.jpg)
this
I’ll be back!
![Page 3: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/3.jpg)
![Page 4: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/4.jpg)
Using log statistics to justify product changes
![Page 5: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/5.jpg)
Product logic
Logged data
Outcomes
Almost there!
![Page 6: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/6.jpg)
Counterfactual reasoning
Offline policy evaluation
Contextual bandits
Explore/Exploit
Multi-world testing
And many more people working on connected topics.
![Page 7: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/7.jpg)
![Page 8: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/8.jpg)
![Page 9: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/9.jpg)
reason about the outcome of manipulations
![Page 10: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/10.jpg)
![Page 11: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/11.jpg)
A: Highlight
suggestion in red
B: User
takes suggestion
![Page 12: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/12.jpg)
after
A: Highlight
suggestion in red
B: User
takes suggestion
?
![Page 13: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/13.jpg)
A: Highlight
suggestion in red
B: User
takes suggestion
C: User often
takes suggestions
![Page 14: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/14.jpg)
Red highlight increases take rate.
despite the fact that he dislikes red
Red highlight decreases take rate.
A: Highlight
suggestion in red
B: User
takes suggestion
C: User often
takes suggestions
?
![Page 15: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/15.jpg)
| | Average take rate | Take rate for C=false | Take rate for C=true |
|---|---|---|---|
| No highlight | 24/200 (12%) | | |
| Red highlight | 30/200 (15%) | | |

1. Users take suggestions more often with red highlights.
![Page 16: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/16.jpg)
| | Average take rate | Take rate for C=false | Take rate for C=true |
|---|---|---|---|
| No highlight | 24/200 (12%) | •/182 | •/18 |
| Red highlight | 30/200 (15%) | •/150 | •/50 |

1. Users take suggestions more often with red highlights.
2. Bug favors red highlights when user is known to like suggestions.
![Page 17: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/17.jpg)
| | Average take rate | Take rate for C=false | Take rate for C=true |
|---|---|---|---|
| No highlight | 24/200 (12%) | 18/182 (10%) | 6/18 (33.3%) |
| Red highlight | 30/200 (15%) | 14/150 (9.3%) | 16/50 (32%) |

1. Users take suggestions more often with red highlights.
2. Bug favors red highlights when user is known to like suggestions.
3. In fact, the users take the suggestion because they like suggestions, and despite slightly disliking the red highlights!

This effect is named “Simpson’s paradox” (1951).
![Page 18: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/18.jpg)
| | Average take rate | Take rate for C=false | Take rate for C=true |
|---|---|---|---|
| No highlight | 24/200 (12%) | 18/182 (10%) | 6/18 (33.3%) |
| Red highlight | 30/200 (15%) | 14/150 (9.3%) | 16/50 (32%) |

1. Users take suggestions more often with red highlights.
2. Bug favors red highlights when user is known to like suggestions.
3. Red highlights are a bad idea for both kinds of users.

Using more data won’t make the answer correct!
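The reversal can be checked directly from the counts on the slide. The sketch below is only an illustration: the function names are made up here, and the reweighting step assumes that C is the only common cause of A and B, which the slide does not claim in general.

```python
from fractions import Fraction

# (takes, impressions) per highlight arm and user segment, from the slide.
COUNTS = {
    "no highlight": {"C=false": (18, 182), "C=true": (6, 18)},
    "red highlight": {"C=false": (14, 150), "C=true": (16, 50)},
}
SEGMENTS = ("C=false", "C=true")


def pooled_rate(arm):
    """Take rate over everyone shown this arm, ignoring C."""
    takes = sum(COUNTS[arm][s][0] for s in SEGMENTS)
    shown = sum(COUNTS[arm][s][1] for s in SEGMENTS)
    return Fraction(takes, shown)


def segment_rate(arm, segment):
    """Take rate for one arm within one user segment."""
    takes, shown = COUNTS[arm][segment]
    return Fraction(takes, shown)


def adjusted_rate(arm):
    """Take rate reweighted to each segment's share of the whole log.

    Assumption for this sketch: C is the only confounder.
    """
    total = sum(COUNTS[a][s][1] for a in COUNTS for s in SEGMENTS)
    rate = Fraction(0)
    for s in SEGMENTS:
        segment_total = sum(COUNTS[a][s][1] for a in COUNTS)
        rate += Fraction(segment_total, total) * segment_rate(arm, s)
    return rate


if __name__ == "__main__":
    for arm in COUNTS:
        print(
            arm,
            "pooled:", float(pooled_rate(arm)),
            "by segment:", [float(segment_rate(arm, s)) for s in SEGMENTS],
            "adjusted:", float(adjusted_rate(arm)),
        )
```

Pooled, red looks better (30/200 against 24/200). Within each segment, and after reweighting, it looks worse. Collecting more rows with the same biased assignment would reproduce the same pooled ordering.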
![Page 19: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/19.jpg)
Unaware
positive correlation
more red highlights
a bad idea all along
![Page 20: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/20.jpg)
A: Randomly highlight
suggestion in red
B: User
takes suggestion
Possible unknown
common causes
![Page 21: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/21.jpg)
no
no
A: Randomly select
treatment
B: Patient recovers
Possible unknown
common causes
?
![Page 22: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/22.jpg)
![Page 23: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/23.jpg)
![Page 24: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/24.jpg)
A/B
Testing
“Multi-world”
Testing
randomization.
![Page 25: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/25.jpg)
Same as
randomized
drug testing!
![Page 26: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/26.jpg)
Outcomes
One
user
BingAB
![Page 27: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/27.jpg)
Users
Advertisers
Queries
Ads & Bids
Ads
Prices
Clicks (and other outcomes)
Click Prediction
ADVERTISER FEEDBACK LOOP
CLICK PREDICTION FEEDBACK LOOP
USER FEEDBACK LOOP
Ad Placement
![Page 28: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/28.jpg)
Formulate
“”
Code product-grade
Allocate traffic
Collect data long enough
![Page 29: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/29.jpg)
Collect data carefully randomized
Formulate
“
when the data was collected?”
Answer offline
Pay the data collection price only once.
Iterate very quickly because evaluation happens offline.
![Page 30: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/30.jpg)
Reward 𝑟
Action Probability
10%
47%
25%
18%
Policy computes the
probability of each action.
One action is
randomly selected.
The reward depends on
both context and action.
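A minimal sketch of the randomized selection step, assuming the policy is a function from context to a probability per action. The action names and the `example_policy` placeholder are hypothetical; only the four probabilities come from the slide.

```python
import random


def example_policy(context):
    """Placeholder policy: returns the slide's probabilities for any context.

    A real policy would compute these from the context. The action names
    are hypothetical labels for this sketch.
    """
    return {"action_1": 0.10, "action_2": 0.47, "action_3": 0.25, "action_4": 0.18}


def choose_action(probabilities, rng):
    """Draw one action at random and return it with its probability."""
    actions = list(probabilities)
    weights = [probabilities[a] for a in actions]
    action = rng.choices(actions, weights=weights, k=1)[0]
    return action, probabilities[action]


if __name__ == "__main__":
    rng = random.Random(0)
    probabilities = example_policy(context={"query": "example"})
    action, probability = choose_action(probabilities, rng)
    print(action, probability)
```

Returning the probability together with the action is a deliberate choice: it is the quantity that later offline reweighting needs.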
![Page 31: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/31.jpg)
policy 𝜋′ data collection policy 𝜋?”
Observed
Reward 𝑟
| Action | 𝝅 | 𝝅′ |
|---|---|---|
| 𝒂𝟏 | 10% | 3% |
| 𝒂𝟐 | 25% | 12% |
| 𝒂𝟑 | 47% | 35% |
| 𝒂𝟒 | 18% | 50% |

Observed reward 𝑟 occurs with p=47% under policy 𝜋, and with p=35% under policy 𝜋′.
Estimate the average reward under policy 𝜋′ by giving weight 35/47 to this reward 𝑟.
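The reweighting can be sketched as an importance-weighted average over the log. This is a simplified illustration, not the implementation used in any of the systems named in this talk. It ignores context, the per-action mean rewards are invented for the simulation, and all function names are hypothetical. The two probability tables are the ones on the slide.

```python
import random

# Action probabilities from the slide.
PI = {"a1": 0.10, "a2": 0.25, "a3": 0.47, "a4": 0.18}      # data collection policy
PI_NEW = {"a1": 0.03, "a2": 0.12, "a3": 0.35, "a4": 0.50}  # policy to evaluate

# Invented for the simulation only: chance of reward 1 for each action.
HYPOTHETICAL_MEAN_REWARD = {"a1": 0.1, "a2": 0.2, "a3": 0.3, "a4": 0.4}


def importance_weight(action, logged_probability, target_policy):
    """Ratio of the new policy's probability to the logged probability."""
    return target_policy[action] / logged_probability


def ips_estimate(log, target_policy):
    """Average reward under target_policy, estimated from logged data.

    log: sequence of (action, logged_probability, reward) tuples.
    """
    if not log:
        raise ValueError("empty log")
    total = 0.0
    for action, logged_probability, reward in log:
        total += importance_weight(action, logged_probability, target_policy) * reward
    return total / len(log)


def simulate_log(n, rng, mean_reward):
    """Collect n records by sampling actions from PI (context omitted)."""
    actions = list(PI)
    weights = [PI[a] for a in actions]
    log = []
    for _ in range(n):
        action = rng.choices(actions, weights=weights, k=1)[0]
        reward = 1.0 if rng.random() < mean_reward[action] else 0.0
        log.append((action, PI[action], reward))
    return log


if __name__ == "__main__":
    rng = random.Random(0)
    log = simulate_log(100_000, rng, HYPOTHETICAL_MEAN_REWARD)
    truth = sum(PI_NEW[a] * HYPOTHETICAL_MEAN_REWARD[a] for a in PI_NEW)
    print("weight for a3:", importance_weight("a3", PI["a3"], PI_NEW))
    print("offline estimate:", ips_estimate(log, PI_NEW), "simulated truth:", truth)
```

The estimate relies on the logged probability being recorded correctly and being nonzero for every action the new policy can take. Actions the data collection policy rarely chose get large weights and make the estimate noisier.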
![Page 32: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/32.jpg)
Work in progress
A cloud service targeting contextual-bandit-style problems.
![Page 33: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/33.jpg)
Yahoo (2011) – Personalized news.
AdCenter (2011) – “Metropolis” randomization.
LinkedIn (2013) – LinkedIn ad placement.
Bing (2014) – Optimization of click metrics in Bing Speller.
Li, Chu, Langford, Wang – “Unbiased offline evaluation …”, WSDM 2011.
Bottou et al. – “Counterfactual reasoning and learning systems …”, JMLR 2013.
Agarwal – Simons Foundation Workshop, Berkeley, 2013.
Li et al. – Submitted, KDD 2014.
![Page 34: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/34.jpg)
Envision the full spectrum of user interaction policies from the start.
Deploy simple policy with randomized exploration.
Use interaction data to tune the policy and learn how to “delight users”.
Logging is often under-appreciated.
Correctness and completeness of the logs are critical.
Example: logging outcomes rather than decisions…
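One way to picture a log record that keeps the decision as well as the outcome is sketched below. The field names and the completeness check are assumptions for illustration, not a schema from any system mentioned in this talk.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class LoggedDecision:
    """Hypothetical record: what was seen, what was decided, what happened."""
    context: Mapping[str, Any]       # inputs available when the decision was made
    action: str                      # the decision that was taken
    probability: float               # chance the policy gave to that decision
    reward: Optional[float] = None   # outcome, filled in once it is observed


def usable_for_offline_evaluation(record):
    """True if the record has a valid probability and an observed outcome."""
    return 0.0 < record.probability <= 1.0 and record.reward is not None


if __name__ == "__main__":
    complete = LoggedDecision({"query": "example"}, "a3", 0.47, reward=1.0)
    outcome_missing = LoggedDecision({"query": "example"}, "a3", 0.47)
    print(usable_for_offline_evaluation(complete))
    print(usable_for_offline_evaluation(outcome_missing))
```

A log that stores only the outcome cannot be reweighted later, because the action and its probability are no longer known.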
![Page 35: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/35.jpg)
![Page 36: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/36.jpg)
![Page 37: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/37.jpg)
Large scale user interaction data offers
extraordinary opportunities to improve our products.
This is more complex than vanilla machine learning
because correlation does not imply causation.
MSR has world-class expertise and technology in this area.
![Page 38: Learning to Interact...Learning to Interact. this I’ll be back! Using log statistics to justify product changes. Product logic Logged data Outcomes Almost there! Counterfactual reasoning](https://reader035.vdocuments.site/reader035/viewer/2022062919/5f01cc407e708231d4011634/html5/thumbnails/38.jpg)