building a recommendation engine - a balancing act

Building a Recommendation Engine: A Balancing Act. Elad Rosenheim, Software Architect. June 2015

Upload: elad-rosenheim

Post on 08-Aug-2015


TRANSCRIPT

Page 1: Building a Recommendation Engine - A Balancing act

Building a Recommendation Engine

A Balancing Act

Elad Rosenheim, Software Architect

June 2015

Page 2: Building a Recommendation Engine - A Balancing act

Who am I

• Architect at Dynamic Yield

• Ex-Team Lead - “Predictors”

• Current Team Lead - Data Team

Who are We?

Dynamic Yield is a SaaS-based solution

for real-time personalization.

Page 3: Building a Recommendation Engine - A Balancing act

Agenda

• “Previously on ML…”

• A New (Medium) Challenge

• A New (Big) Challenge

• Tooling Up

Page 4: Building a Recommendation Engine - A Balancing act

Previously on ML

• What’s the most effective layout?

- a ton of questions

• From the Stone Age to A/B Tests

- Good: data beats opinion

- Bad: A/B Testing takes time

• Time is Money

• We don’t have time:

- Trending stories, products on sale…

Page 5: Building a Recommendation Engine - A Balancing act

From A/B Testing to Personalization

• “Multi-Armed Bandits” (Google et al.)

- Converge towards the leader

- Neither know nor care

about specific users

• ML: We can do better!

- Each of us is a beautiful and unique feature vector

- Find the best variation for each user,

and beat the “on average” leader! (with science OMG)

[Figure: three variations, labeled 2.4%, 1.7% and 0.4%]
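The "converge towards the leader" behaviour can be sketched with an epsilon-greedy bandit. This is only one of several bandit strategies and is not necessarily the one the talk has in mind; the class name, the `epsilon` value and the click-based reward are assumptions made for illustration.

```python
import random


class EpsilonGreedyBandit:
    """Serve the current leader most of the time, explore the rest."""

    def __init__(self, variations, epsilon=0.1, seed=None):
        self.epsilon = epsilon  # share of traffic used for exploration (assumed value)
        self.shows = {v: 0 for v in variations}
        self.clicks = {v: 0 for v in variations}
        self.rng = random.Random(seed)

    def rate(self, variation):
        """Observed click rate; 0.0 until the variation has been shown."""
        shows = self.shows[variation]
        return self.clicks[variation] / shows if shows else 0.0

    def leader(self):
        """Variation with the best observed click rate so far."""
        return max(self.shows, key=self.rate)

    def choose(self):
        """Pick a random variation with probability epsilon, else the leader."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.shows))
        return self.leader()

    def record(self, variation, clicked):
        """Feed back the outcome of one impression."""
        self.shows[variation] += 1
        self.clicks[variation] += int(clicked)
```

Note that `choose()` takes no arguments: every visitor gets the same answer, which is exactly the "neither know nor care about specific users" limitation.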

Page 6: Building a Recommendation Engine - A Balancing act

ML for Online Learning

• Collaborative Filtering?

- Cold Start, Batch process

- Not suitable in this case

• It’s also not a classification problem

• We need online learning over a stream: Contextual Bandits

- Have the model ready at all times!

- Ensure all variations are explored!
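A minimal sketch of a contextual bandit, assuming one small online logistic model per variation and epsilon-greedy exploration. The talk does not say which learner is actually used; the class names, learning rate and feature encoding here are hypothetical. The model is updated one event at a time, so it is usable at any moment, and the random branch keeps every variation explored.

```python
import math
import random


class OnlineLogisticModel:
    """Logistic regression trained one example at a time (plain SGD)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.learning_rate = learning_rate

    def predict(self, features):
        z = sum(w * x for w, x in zip(self.weights, features))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() in a safe range
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, clicked):
        error = float(clicked) - self.predict(features)
        self.weights = [w + self.learning_rate * error * x
                        for w, x in zip(self.weights, features)]


class ContextualBandit:
    """Pick a variation per user, based on that user's feature vector."""

    def __init__(self, variations, n_features, epsilon=0.1, seed=None):
        self.models = {v: OnlineLogisticModel(n_features) for v in variations}
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def best(self, features):
        """Variation with the highest predicted click probability."""
        return max(self.models, key=lambda v: self.models[v].predict(features))

    def choose(self, features):
        """Explore with probability epsilon, otherwise exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.models))
        return self.best(features)

    def learn(self, variation, features, clicked):
        """Update only the model of the variation that was shown."""
        self.models[variation].update(features, clicked)
```

With features such as `[1.0, is_mobile]`, two users can receive different variations from the same bandit, which is how it can beat the "on average" leader.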

Page 7: Building a Recommendation Engine - A Balancing act

Nice, but…

• Limited to a small set of variations (~5-50)

• Model-based, rather than Content-based

- Hard to “cut to the chase”

What to do with a feed of 300 Top Videos?

Page 8: Building a Recommendation Engine - A Balancing act

Shrinking the problem with Heuristics!

• We think we know something about the world,

and want to formalize our intuitions.

• Apply ranking and filter to the top few dozen,

using a formula like those of Reddit, StumbleUpon, Hacker News…

- Votes, views, time decay, social signals, etc.

• Now the problem is “small” again (yay). Just add ML and shake.
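As an illustration of "rank, then keep the top few dozen", here is a sketch loosely modeled on the widely quoted Hacker News formula, (votes - 1) / (age + 2)^gravity. The gravity of 1.8, the field names and the limit of 30 are assumptions; a real formula would mix in views, social signals and so on.

```python
def hot_score(votes, age_hours, gravity=1.8):
    """Votes push an item up, age pulls it down (time decay)."""
    return max(votes - 1, 0) / (age_hours + 2) ** gravity


def top_items(items, limit=30):
    """Rank the whole feed and keep only the top few dozen for the ML step."""
    ranked = sorted(
        items,
        key=lambda item: hot_score(item["votes"], item["age_hours"]),
        reverse=True,
    )
    return ranked[:limit]
```

A feed of 300 videos shrinks to `limit` candidates, which is a variation count the bandit-style models can handle.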

Page 9: Building a Recommendation Engine - A Balancing act

That’s also nice, but…

• What to do with a product catalog of 30k items?

- Classifiers would have a hard (and long) time

• Back to the basics:

- Collaborative Filtering to the rescue?

- Heuristics in a big way

Page 10: Building a Recommendation Engine - A Balancing act

“Collaborative Filtering, What is it Good For?”

• The Classic Case: Netflix

• Explicit vs. Implicit Feedback

- Yehuda Koren Strikes Again

• Heuristics might enter here,

through the back door

• I can haz history data?

This would help warm the model

…but it would still apply only to returning users

Page 11: Building a Recommendation Engine - A Balancing act

So, no more (just) black box?

• I want a (rather) quick bootstrap

• I must target new users

• Let’s think about the contexts in which we recommend

- Homepage

- Product Page

- Cart

Page 12: Building a Recommendation Engine - A Balancing act

The No-Context Context

• What to show on the homepage?

• Let’s apply a ranking formula, this time for e-commerce

- Long-time data, recent trends,

views, carts, promotions,

search, social, …?

• I have to add component <X> to the mix!

- No you don’t. You don’t know all forces at play.

- Better test it.

Page 13: Building a Recommendation Engine - A Balancing act

The Specific Product Context

• Substitute vs. Complementary products

- Not the same for everyone, nor at all times

• “Similar Items”

- What is similar?

• “Bought Together”

- Hehe, that’s easy right?

Page 14: Building a Recommendation Engine - A Balancing act

Similar Items to <x>

• Keyword Similarity

- Score rare keywords higher!

- We’re in the realm of search engines

- TF/IDF, BM25, Lucene’s Practical Scoring Function

• Balancing Similarity with Popularity

- Somewhat popular “close” items, very popular “far” items

- At a high level: a·similarity + b·popularity

- The secret sauce: achieving a stable formula that works.
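A rough sketch of the a·similarity + b·popularity idea, assuming keyword sets per item, an IDF-weighted overlap as the similarity measure, and popularity already normalized to the 0..1 range. The weights `a` and `b` and all function names are hypothetical; finding stable values for them is the "secret sauce" the slide refers to.

```python
import math
from collections import Counter


def idf_weights(catalog):
    """catalog maps item id -> set of keywords. Rare keywords get higher weight."""
    n_items = len(catalog)
    doc_freq = Counter(kw for keywords in catalog.values() for kw in keywords)
    return {kw: math.log(1 + n_items / df) for kw, df in doc_freq.items()}


def similarity(a_keywords, b_keywords, idf):
    """IDF-weighted overlap: weight of shared keywords over weight of all keywords."""
    shared = sum(idf[kw] for kw in a_keywords & b_keywords)
    total = sum(idf[kw] for kw in a_keywords | b_keywords)
    return shared / total if total else 0.0


def similar_items(item_id, catalog, popularity, a=0.7, b=0.3, limit=5):
    """Blend similarity with popularity (a dict of item id -> value in 0..1)."""
    idf = idf_weights(catalog)
    target = catalog[item_id]
    scores = {
        other_id: a * similarity(target, keywords, idf)
        + b * popularity.get(other_id, 0.0)
        for other_id, keywords in catalog.items()
        if other_id != item_id
    }
    return sorted(scores, key=scores.get, reverse=True)[:limit]
```

In practice a search engine would do the keyword scoring; the point here is only how the two terms trade off against each other.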

Page 15: Building a Recommendation Engine - A Balancing act

Bought Together with <x>

• Items commonly in the same cart. Sort and serve!

• Wrong, because I always buy milk.

• We must mitigate the effect of the globally popular

• Again a TF/IDF-type problem

• A very good read:

“People who like this also like…”
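One common way to mitigate the globally popular is to score pairs by lift rather than raw co-occurrence. This is a sketch under that assumption, not necessarily the normalization used in the talk or in the recommended article; the `min_pairs` cutoff is a hypothetical guard against one-off coincidences.

```python
from collections import Counter
from itertools import combinations


def bought_together(carts, item, limit=5, min_pairs=2):
    """Rank items by lift: how much more often they share a cart with
    `item` than their overall popularity alone would predict."""
    item_count = Counter()
    pair_count = Counter()
    for cart in carts:
        unique = set(cart)
        item_count.update(unique)
        pair_count.update(frozenset(p) for p in combinations(sorted(unique), 2))

    n_carts = len(carts)
    scores = {}
    for pair, together in pair_count.items():
        if item not in pair or together < min_pairs:
            continue
        (other,) = pair - {item}
        # lift = P(item and other) / (P(item) * P(other))
        scores[other] = together * n_carts / (item_count[item] * item_count[other])
    return sorted(scores, key=scores.get, reverse=True)[:limit]
```

If milk is in every cart, its lift with anything is 1.0, so it no longer outranks items that really go together.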

Page 16: Building a Recommendation Engine - A Balancing act

Tying it all together

• Which strategy to use, and when?

- CF when you have enough data (on the whole and for the user)

- You can also use heuristics to determine the best strategy,

but don’t go crazy with those

- You can also A/B test, tune, repeat

• ML for strategy selection and item selection?

Can we make the problem “small” enough again?

You’d need to sign the NDA, sorry.
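The first point on this slide, choosing a strategy by how much data is available, might look roughly like the sketch below. The thresholds, the strategy names and the mapping from page type to strategy are all made up for illustration; they are exactly the kind of knobs one would A/B test and tune.

```python
def pick_strategy(page_type, user_events, site_events,
                  min_user_events=20, min_site_events=100_000):
    """Choose a recommendation strategy for one request.

    All thresholds and strategy names are hypothetical.
    """
    # CF only when there is enough data, on the whole and for this user.
    if user_events >= min_user_events and site_events >= min_site_events:
        return "collaborative_filtering"
    # Otherwise fall back to the heuristic that fits the page context.
    if page_type == "product":
        return "similar_items"
    if page_type == "cart":
        return "bought_together"
    return "popularity_ranking"
```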

Page 17: Building a Recommendation Engine - A Balancing act

Let’s talk a bit about

Tools

Page 18: Building a Recommendation Engine - A Balancing act

ElasticSearch, so hot right now

• In the center:

- Receive catalog updates, collect all signals

- Pre-calculate and bake rankings

- Intermediate data vs. final index for recommendations

• Should ElasticSearch be the database?

- Yes and no. It’s cool to have an all-in-one document store,

but we can always recreate from “the” big-data store (HBase)

Page 19: Building a Recommendation Engine - A Balancing act

Nice, but… I want to recommend globally!

• For latency, high-availability, security and sanity.

• ES doesn’t do XDCR, and we don’t need the whole thing anyway.

• Stand-alone ES machines in each geo

- Take snapshots in the center & restore in each geo,

via S3 with the AWS Cloud Plugin.

- Fine-grained access control via IAM. Never use “root”-level keys.

• Redis (yay) for pushing notifications and metadata
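The snapshot-in-the-center, restore-in-each-geo flow maps onto three calls of the Elasticsearch snapshot REST API. The sketch below only builds those requests; the repository, bucket, snapshot and index names are placeholders, and the `send` helper is a bare-bones assumption (no authentication, retries or error handling). It assumes the S3 repository type provided by the AWS plugin is installed on every cluster.

```python
import json
import urllib.request


def register_repo(repo, bucket, region):
    """Register an S3-backed snapshot repository (needed on center and geos)."""
    body = {"type": "s3", "settings": {"bucket": bucket, "region": region}}
    return ("PUT", f"/_snapshot/{repo}", body)


def create_snapshot(repo, snapshot, indices):
    """Center site: snapshot the final recommendation index to S3."""
    path = f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true"
    return ("PUT", path, {"indices": indices})


def restore_snapshot(repo, snapshot, indices):
    """Each geo: restore that snapshot into the stand-alone ES node."""
    return ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", {"indices": indices})


def send(base_url, request):
    """Send one (method, path, body) request to an Elasticsearch node."""
    method, path, body = request
    http_request = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(http_request) as response:
        return json.loads(response.read())
```

A geo node would then run something like `send("http://localhost:9200", restore_snapshot("recs_repo", "snap_1", "recommendations"))` after being notified that a new snapshot exists. Note that an index which already exists generally has to be closed before it can be restored over.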

Page 20: Building a Recommendation Engine - A Balancing act

[Diagram: at the Center Site, a Worker Process takes in page views, carts, search, etc. and catalog updates, and feeds a Recommender in each geo: US East, US West, Europe and APJ.]

Page 21: Building a Recommendation Engine - A Balancing act

Parting words

• It’s all about balance

• Formulae won’t make you any less of a Jon Snow

• So start small & test

Page 22: Building a Recommendation Engine - A Balancing act

Thank You

Elad [email protected]

For real science: Idan Michaeli

[email protected]

#hiring_but_of_course