Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15
TRANSCRIPT
![Page 1: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/1.jpg)
Combining Statistics and Expert Human Judgment
for Better Recommendations
Brad Klingenberg, Stitch Fix
MLconf San Francisco 2015
Three lessons
![Page 2: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/2.jpg)
Lessons from having humans in the loop
Humans in the loop
![Page 3: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/3.jpg)
Lessons from having humans in the loop
Humans in the loop
It works really well, but it’s complicated
![Page 4: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/4.jpg)
Lessons from having humans in the loop
Humans in the loop:
It works really well, but it’s complicated
Lesson 1: There’s more than one way to measure success
![Page 5: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/5.jpg)
Lessons from having humans in the loop
Humans in the loop:
It works really well, but it’s complicated
Lesson 1: There’s more than one way to measure success
Lesson 2: You have to think carefully about what you’re predicting
![Page 6: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/6.jpg)
Lessons from having humans in the loop
Humans in the loop:
It works really well, but it’s complicated
Lesson 1: There’s more than one way to measure success
Lesson 2: You have to think carefully about what you’re predicting
Lesson 3: Humans can say “no”, and this complicates experiments
![Page 7: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/7.jpg)
Humans in the loop at Stitch Fix
![Page 8: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/8.jpg)
Stitch Fix
![Page 9: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/9.jpg)
Stitch Fix
![Page 10: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/10.jpg)
Stitch Fix
![Page 11: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/11.jpg)
Stitch Fix
![Page 12: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/12.jpg)
Styling at Stitch Fix
Personal styling
Inventory
![Page 13: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/13.jpg)
Styling at Stitch Fix: personalized recommendations
Inventory → Statistics → Algorithmic recommendations
![Page 14: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/14.jpg)
Styling at Stitch Fix: expert human curation
Human curation
Algorithmic recommendations
![Page 15: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/15.jpg)
Lesson 1: There’s more than one way to measure success
![Page 16: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/16.jpg)
Traditional recommenders
Learning through feedback
![Page 17: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/17.jpg)
Humans in the loop
Learning through feedback
![Page 18: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/18.jpg)
Measuring success
In the end, you are usually interested in optimizing success, and this may make sense for the combined system.
But when optimizing an algorithm, it is important to consider selection.
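The sketch below shows why selection matters. It assumes a hypothetical log in which each algorithmic recommendation records whether the stylist selected it and, if so, whether it succeeded with the client. The combined system's success rate per recommendation then factors into P(selected) × P(success | selected), and an algorithm change can move either term.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecLog:
    """One algorithmic recommendation shown to a stylist (hypothetical schema)."""
    selected: bool           # did the stylist put the item in the fix?
    success: Optional[bool]  # client outcome; None when the item was never sent


def success_decomposition(logs: list[RecLog]) -> dict[str, float]:
    """Split per-recommendation success into selection and conditional success."""
    chosen = [r for r in logs if r.selected]
    p_select = len(chosen) / len(logs)
    # Outcomes exist only for selected items.
    p_success_given_select = sum(bool(r.success) for r in chosen) / len(chosen)
    return {
        "p_select": p_select,
        "p_success_given_select": p_success_given_select,
        # Marginal success of a recommendation for the combined human + algorithm system.
        "p_success_overall": p_select * p_success_given_select,
    }


logs = [RecLog(True, True), RecLog(True, False), RecLog(True, True), RecLog(False, None)]
result = success_decomposition(logs)
print(result)
```

Two algorithms with the same overall rate can split it very differently between the two factors.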
![Page 19: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/19.jpg)
Optimizing interaction
For a set of algorithms with the same marginal performance,
we generally prefer the algorithms that
![Page 20: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/20.jpg)
Optimizing interaction
For a set of algorithms with the same marginal performance,
we generally prefer the algorithms that
● increase agreement and reduce needed searching (credible and useful recommendations)
![Page 21: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/21.jpg)
Optimizing interaction
For a set of algorithms with the same marginal performance,
we generally prefer the algorithms that
● increase agreement and reduce needed searching (credible and useful recommendations)
● make the humans more efficient (effortless curation)
![Page 22: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/22.jpg)
Optimizing interaction
For a set of algorithms with the same marginal performance,
we generally prefer the algorithms that
● increase agreement and reduce needed searching (credible and useful recommendations)
● make the humans more efficient (effortless curation)
● have a better user experience (fewer bad or annoying recommendations)
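The sketch below encodes these preferences. Algorithms whose marginal success falls within a small tolerance of the best are treated as tied. Ties are then broken by three criteria, in order:

- agreement: the share of recommendations that stylists select
- search effort: the average list position of selected items
- bad-recommendation rate

`AlgoStats`, its field names, and the 0.005 tolerance are illustrative assumptions, not actual Stitch Fix metrics.

```python
from dataclasses import dataclass


@dataclass
class AlgoStats:
    name: str
    success_rate: float       # marginal performance of the combined system
    agreement_rate: float     # share of recommendations the stylist selected
    mean_search_depth: float  # average list position of selected items (lower = less searching)
    bad_rec_rate: float       # share of recommendations flagged as bad or annoying


def pick_algorithm(stats: list[AlgoStats], tolerance: float = 0.005) -> AlgoStats:
    """Among algorithms within `tolerance` of the best success rate, prefer more
    agreement, then less searching, then fewer bad recommendations."""
    best = max(s.success_rate for s in stats)
    tied = [s for s in stats if best - s.success_rate <= tolerance]
    return min(tied, key=lambda s: (-s.agreement_rate, s.mean_search_depth, s.bad_rec_rate))


candidates = [
    AlgoStats("A", 0.600, 0.30, 8.0, 0.05),
    AlgoStats("B", 0.598, 0.42, 5.5, 0.03),
    AlgoStats("C", 0.550, 0.60, 2.0, 0.01),  # more agreement, but clearly worse success
]
print(pick_algorithm(candidates).name)  # B
```

A lexicographic ordering keeps the criteria easy to interpret. A weighted score is an alternative when the trade-offs between criteria need to be explicit.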
![Page 23: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/23.jpg)
Logging selection
This means logging and analyzing selection data
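The sketch below shows one way to log selection data, assuming a hypothetical JSON-lines format and record layout. The key point is to record every item shown to the stylist, not just the ones chosen, so that selection rates can be computed later.

```python
import io
import json
from dataclasses import dataclass, asdict
from typing import Iterable, TextIO


@dataclass
class SelectionEvent:
    """Hypothetical log record: one recommended item as shown to a stylist."""
    fix_id: str
    algorithm: str
    item_id: str
    rank: int       # position in the recommendation list (1 = top)
    selected: bool  # whether the stylist chose it


def log_recommendations(fix_id: str, algorithm: str, ranked_items: list[str],
                        chosen: set[str], sink: TextIO) -> None:
    """Write one JSON line per shown item, including items the stylist skipped."""
    for rank, item_id in enumerate(ranked_items, start=1):
        event = SelectionEvent(fix_id, algorithm, item_id, rank, item_id in chosen)
        sink.write(json.dumps(asdict(event)) + "\n")


def selection_rate_by_rank(lines: Iterable[str]) -> dict[int, float]:
    """Fraction of shown items that were selected at each list position."""
    shown: dict[int, int] = {}
    picked: dict[int, int] = {}
    for line in lines:
        event = json.loads(line)
        shown[event["rank"]] = shown.get(event["rank"], 0) + 1
        picked[event["rank"]] = picked.get(event["rank"], 0) + int(event["selected"])
    return {rank: picked[rank] / shown[rank] for rank in sorted(shown)}


buf = io.StringIO()
log_recommendations("fix-1", "algo-A", ["i1", "i2", "i3"], {"i1", "i3"}, buf)
log_recommendations("fix-2", "algo-A", ["i4", "i5", "i6"], {"i4"}, buf)
rates = selection_rate_by_rank(buf.getvalue().splitlines())
print(rates)  # {1: 1.0, 2: 0.0, 3: 0.5}
```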
![Page 24: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/24.jpg)
Lesson 2: You have to think carefully about what you’re predicting
![Page 25: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/25.jpg)
Training a model
What should you predict?
Naive approach: ignore selection and train on success data
Advantages
● “traditional” supervised problem
● simple historical data
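The sketch below illustrates the naive approach, assuming hypothetical rows of (item features, selected, outcome). The supervised dataset is simply the rows with an observed outcome. Items the stylist skipped have no outcome and silently drop out, which is how selection censors the data.

```python
from typing import Optional

# Hypothetical historical row: (item features, stylist selected it?, client outcome or None)
Row = tuple[dict[str, str], bool, Optional[bool]]


def naive_success_dataset(rows: list[Row]) -> list[tuple[dict[str, str], int]]:
    """Keep only rows with an observed outcome, as a 'traditional' supervised dataset."""
    return [
        (features, int(outcome))
        for features, selected, outcome in rows
        if selected and outcome is not None  # skipped items never get a label
    ]


rows: list[Row] = [
    ({"print": "polka dot"}, True, True),
    ({"print": "solid"}, True, False),
    ({"print": "floral"}, False, None),
]
data = naive_success_dataset(rows)
print(len(data), "of", len(rows), "rows usable")  # 2 of 3 rows usable
```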
![Page 26: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/26.jpg)
Censoring through selection
Problem: selection can censor your data
![Page 27: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/27.jpg)
Censoring through selection
Problem: selection can censor your data
![Page 28: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/28.jpg)
Censoring through selection
Problem: selection can censor your data
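(Diagram, example item “Arms flaunted”: stylist selection yes/no vs. client success yes/no, with probabilities p and 1-p; success is marked “?” where it is never observed.)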
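The toy simulation below illustrates censoring under assumed numbers:

- Each recommended item has a true success probability q, drawn uniformly from [0, 1].
- The stylist selects the item with probability q.
- The outcome is observed only for selected items.

Because stylists favor items that are likely to succeed, the success rate among observed outcomes overstates the rate for recommendations as a whole.

```python
import random


def simulate_censoring(n: int = 100_000, seed: int = 0) -> tuple[float, float]:
    """Return (true success rate over all recommendations, naive rate on observed outcomes)."""
    rng = random.Random(seed)
    all_outcomes, observed = [], []
    for _ in range(n):
        q = rng.random()               # true success probability of this item (assumed uniform)
        success = rng.random() < q     # what would happen if the item were sent
        all_outcomes.append(success)
        if rng.random() < q:           # stylist selection, correlated with q (assumption)
            observed.append(success)   # outcome visible only for selected items
    true_rate = sum(all_outcomes) / n
    naive_rate = sum(observed) / len(observed)
    return true_rate, naive_rate


true_rate, naive_rate = simulate_censoring()
print(f"true rate {true_rate:.3f}, naive rate on selected items {naive_rate:.3f}")
```

Analytically, the true rate here is E[q] = 1/2, while the observed rate is E[q²]/E[q] = 2/3. A success model trained only on selected items can inherit that optimism.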
![Page 29: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/29.jpg)
Predicting selection
What about predicting selection?
![Page 30: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/30.jpg)
Predicting selection
● Simple, but selection is not really success
● There is a much more direct feedback loop
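The sketch below is a simple selection model, assuming logs of (attribute value, selected) for every item shown. It estimates a selection rate per attribute value, smoothed with a Beta prior. Every shown item has a label, so nothing is censored. However, the labels describe what stylists chose from the algorithm's own output, not what clients liked.

```python
from collections import defaultdict
from typing import Iterable


def fit_selection_rates(events: Iterable[tuple[str, bool]],
                        alpha: float = 1.0, beta: float = 1.0) -> dict[str, float]:
    """Posterior-mean estimate of P(selected | attribute value) under a Beta(alpha, beta) prior."""
    shown: dict[str, int] = defaultdict(int)
    picked: dict[str, int] = defaultdict(int)
    for value, selected in events:
        shown[value] += 1
        picked[value] += int(selected)
    return {v: (picked[v] + alpha) / (shown[v] + alpha + beta) for v in shown}


events = [("polka dot", False), ("polka dot", False), ("solid", True), ("solid", False)]
rates = fit_selection_rates(events)
print(rates)  # {'polka dot': 0.25, 'solid': 0.5}
```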
![Page 31: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/31.jpg)
Training a model
You should probably consider both.
It is most interesting when they disagree.
Selection model vs. success model
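The sketch below compares the two models. It assumes each recommendation has a predicted success probability and a predicted selection probability from hypothetical models. Ranking items by the gap between the two predictions surfaces the disagreements worth reviewing. The 0.3 threshold and the labels are illustrative assumptions.

```python
def disagreement(p_success: float, p_select: float, gap: float = 0.3) -> str:
    """Label one recommendation by how the success and selection models compare."""
    diff = p_success - p_select
    if diff >= gap:
        # Model expects success but stylists rarely pick it: inappropriate for
        # this request (good disagreement) or a trust problem (bad disagreement)?
        return "success_over_selection"
    if diff <= -gap:
        # Stylists like it more than the success model does: context the model
        # lacks, or a stylist habit worth examining.
        return "selection_over_success"
    return "agree"


def most_interesting(items: dict[str, tuple[float, float]], k: int = 3) -> list[str]:
    """Item ids with the largest |p_success - p_select|, i.e. the biggest disagreements."""
    return sorted(items, key=lambda i: abs(items[i][0] - items[i][1]), reverse=True)[:k]


items = {"item_a": (0.85, 0.20), "item_b": (0.60, 0.55), "item_c": (0.30, 0.75)}
print(most_interesting(items, k=2))  # ['item_a', 'item_c']
print(disagreement(*items["item_a"]))  # success_over_selection
```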
![Page 32: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/32.jpg)
Good disagreement
Ignoring an inappropriate recommendation
Client request: “I need an outfit for a glamorous night out!”
![Page 33: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/33.jpg)
Good disagreement
Ignoring an inappropriate recommendation
Client request: “I need an outfit for a glamorous night out!”
![Page 34: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/34.jpg)
Bad disagreement
Stylist not choosing something that would be successful
Predicted probability of success = 85%
![Page 35: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/35.jpg)
Bad disagreement
Stylist not choosing something that would be successful
Could lack trust in the recommendation: importance of transparency
Predicted probability of success = 85%
“Based on her recent purchase”
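The sketch below shows transparency in a stylist-facing tool, following the slide's “Based on her recent purchase” example. It assumes a hypothetical `Recommendation` record that carries a human-readable reason alongside the score.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    item_id: str
    p_success: float  # model's predicted probability of success
    reason: str       # short justification shown to the stylist


def render_for_stylist(rec: Recommendation) -> str:
    """Format a recommendation with its score and the reason behind it."""
    return f"{rec.item_id}: predicted success {rec.p_success:.0%} ({rec.reason})"


rec = Recommendation("item_123", 0.85, "based on her recent purchase")
print(render_for_stylist(rec))  # item_123: predicted success 85% (based on her recent purchase)
```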
![Page 36: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/36.jpg)
Lesson 3: Humans can say “no”, and this complicates experiments
-or-
“the downside of free will”
![Page 37: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/37.jpg)
Testing with humans in the loop
Toy example: Suppose we want to test a (bad) new policy
![Page 38: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/38.jpg)
Testing with humans in the loop
New rule: all fixes must contain polka dots!
Toy example: Suppose we want to test a (bad) new policy
![Page 39: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/39.jpg)
An experiment
Control vs. Test (Polka Dots Rule)
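The sketch below shows one way clients might be randomized into the two arms: hashing a client id. This is an illustrative choice, not necessarily how Stitch Fix assigns experiments. Hashing keeps each client in the same arm across fixes, and including the experiment name gives each experiment an independent split.

```python
import hashlib


def assign_arm(client_id: str, experiment: str = "polka_dots_rule",
               test_share: float = 0.5) -> str:
    """Deterministically assign a client to 'control' or 'test'."""
    digest = hashlib.sha256(f"{experiment}:{client_id}".encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # approximately uniform in [0, 1)
    return "test" if u < test_share else "control"


arms = [assign_arm(f"client-{i}") for i in range(10_000)]
print(arms.count("test") / len(arms))
```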
![Page 40: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/40.jpg)
Selective non-compliance
Humans may not comply. Or they may comply only selectively.
“Hmm, no.”
“Please don’t send me any polka dots” - client X
Test (Polka Dots Rule)
![Page 41: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/41.jpg)
Selective non-compliance
Control vs. Test (Polka Dots Rule)
![Page 42: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/42.jpg)
Selective non-compliance
Control vs. Test (Polka Dots Rule)
![Page 43: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/43.jpg)
Selective non-compliance
Humans help avoid bad choices - this is great for the client!
But this can obscure the effect you are trying to measure.
![Page 44: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/44.jpg)
Selective non-compliance
Humans help avoid bad choices - this is great for the client!
But this can obscure the effect you are trying to measure.
Helpful analogy: non-compliance in clinical trials, which has been intensively studied.
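The toy simulation below models the polka-dot experiment with assumed numbers:

- 30% of clients dislike polka dots, and stylists in the test arm skip the rule for them.
- Those clients succeed at 0.5 either way.
- For everyone else, the rule lowers success from 0.6 to 0.5.

Borrowing from the clinical-trials literature, the intention-to-treat (ITT) effect compares the arms as randomized. Dividing ITT by the difference in compliance rates gives the Wald (instrumental-variable) estimate of the effect on compliers. That estimate is valid under assumptions such as the control arm never receiving the rule and assignment affecting outcomes only through what is sent.

```python
import random


def simulate_trial(n: int = 200_000, seed: int = 1) -> list[tuple[bool, bool, bool]]:
    """Rows of (assigned_to_test, polka_dots_sent, success); all rates are assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5           # randomized assignment
        hates_dots = rng.random() < 0.3  # "Please don't send me any polka dots"
        d = z and not hates_dots         # stylists skip the rule for these clients
        if hates_dots:
            p = 0.5                      # never receive dots, so the rule has no effect
        else:
            p = 0.5 if d else 0.6        # true effect of the rule on compliers: -0.10
        rows.append((z, d, rng.random() < p))
    return rows


def _mean(xs: list[bool]) -> float:
    return sum(xs) / len(xs)


def analyze(rows: list[tuple[bool, bool, bool]]) -> tuple[float, float, float]:
    """Return (ITT effect, compliance difference, Wald estimate for compliers)."""
    itt = _mean([y for z, _, y in rows if z]) - _mean([y for z, _, y in rows if not z])
    compliance = _mean([d for z, d, _ in rows if z]) - _mean([d for z, d, _ in rows if not z])
    return itt, compliance, itt / compliance


itt, compliance, cace = analyze(simulate_trial())
print(f"ITT {itt:.3f}, compliance {compliance:.3f}, complier effect {cace:.3f}")
```

Here ITT comes out near -0.07 (0.7 × -0.10), diluted by selective non-compliance. ITT divided by compliance recovers roughly -0.10 for the clients the rule actually reached.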
![Page 45: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/45.jpg)
Lessons from having humans in the loop
Humans in the loop
It works really well, but it’s complicated
Lesson 1: There’s more than one way to measure success
Lesson 2: You have to think carefully about what you’re predicting
Lesson 3: Humans can say “no”, and this complicates experiments
![Page 46: Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 11/13/15](https://reader034.vdocuments.site/reader034/viewer/2022052117/588628531a28ab8f2c8b6561/html5/thumbnails/46.jpg)
Thanks!
Questions? (we’re hiring!)