Building Personalized Applications at Scale

Garrett Wu, Director of Engineering, Odiago, Inc.

Upload: wibidata

Post on 25-Jun-2015

DESCRIPTION

Garrett Wu presents WibiData to the Bay Area Software Engineering meetup.

TRANSCRIPT

Page 1: Building Personalized Applications at Scale

Building Personalized Applications at Scale

Garrett Wu, Director of Engineering

Odiago, Inc.

Page 2: Building Personalized Applications at Scale

Personalized Applications

Page 4: Building Personalized Applications at Scale

Examples

● Recommendations
  ○ Amazon
  ○ Netflix
● Ad Targeting
  ○ Hulu
  ○ YouTube
● Fraud Detection
  ○ Visa
  ○ JPMC
● Spam
  ○ GMail
● Search Personalization
  ○ Google

Page 5: Building Personalized Applications at Scale

Overall Requirements

● React to events in near real time.
  ○ Low-latency reads and writes.
  ○ Event-driven analysis (not just batch).
● Web scale: hundreds of millions of users.
  ○ High-throughput reads and writes.
● Reliable.
  ○ Distributed, fault tolerant, graceful degradation.
● Flexible.
  ○ Evolvable schema.
  ○ Support ad hoc experimentation and analyses.

Page 6: Building Personalized Applications at Scale

Data Flow

Page 8: Building Personalized Applications at Scale

Datastore Requirements

1. Random writes.
2. Analysis (MapReduce).
3. Random reads.

Page 10: Building Personalized Applications at Scale

Data Model Requirements

1. Write user-centric data.
  ○ "Bob bought the Hunger Games book."
  ○ "Sally viewed product page X."
2. Query user-centric data.
  ○ "What were Jim's most recent 5 purchases?"
  ○ "What are Sue's top 3 recommendations?"

Given everything we know about John:
● Transactions.
● Tweets.
● Likes.

... recommend, classify, predict, cluster, profile.

Page 11: Building Personalized Applications at Scale

User-centric Data Model

Page 12: Building Personalized Applications at Scale

User-centric Data Model

<column>
  <name>email</name>
  <description>Email address</description>
  <schema>"string"</schema>
</column>

Cells have Avro schemas for evolvable storage and retrieval.

Page 13: Building Personalized Applications at Scale

User-centric Data Model

● 3-D storage with timestamps.
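A rough sketch of what "3-D storage with timestamps" can mean in practice: each value is addressed by row (the user), column, and timestamp, so a question like "What were Jim's most recent 5 purchases?" becomes a single-row lookup. The `UserTable` class, its method names, and the sample data are hypothetical and held in memory; this is not the WibiData or HBase API.

```python
from collections import defaultdict


class UserTable:
    """Toy table: row (user) -> column -> {timestamp: value}."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, user, column, timestamp, value):
        """Write one timestamped cell into a user's row."""
        self._rows[user][column][timestamp] = value

    def most_recent(self, user, column, n=1):
        """Return up to n values from one cell, newest first."""
        cells = self._rows.get(user, {}).get(column, {})
        newest_first = sorted(cells.items(), key=lambda kv: kv[0], reverse=True)
        return [value for _, value in newest_first[:n]]


table = UserTable()
# Hypothetical purchase events; timestamps are arbitrary integers.
table.put("jim", "purchases", 1001, "The Hunger Games")
table.put("jim", "purchases", 1002, "Catching Fire")
table.put("jim", "purchases", 1003, "Mockingjay")

recent_two = table.most_recent("jim", "purchases", n=2)
print(recent_two)
```

Because every write and read touches exactly one user's row, both operations stay cheap no matter how many users the table holds.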

Page 14: Building Personalized Applications at Scale

Analyzing Data: Producers

● produce() generates derived data for a single row:
  ○ recommend
  ○ profile
  ○ classify
  ○ etc.

Page 15: Building Personalized Applications at Scale

Analyzing Data: Gatherers

● gather() aggregates data across all rows.
  ○ build association rules for collaborative filtering.
  ○ train classifier models.
  ○ compute prior probabilities for events.
  ○ etc.
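As one concrete reading of "compute prior probabilities for events", the sketch below scans every row and returns the fraction of users for whom each event occurred at least once. The `gather_priors` function, the row layout, and the click data are all hypothetical; a real gatherer would run as a distributed job rather than a loop over a Python list.

```python
from collections import Counter


def gather_priors(rows, column):
    """Fraction of rows in which each value of `column` appears at least once."""
    counts = Counter()
    for row in rows:
        # Use a set so a user who repeats an event is counted only once.
        counts.update(set(row.get(column, [])))
    total = len(rows)
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.items()}


# Hypothetical per-user rows with the ads each user clicked.
rows = [
    {"user": "u1", "clicked_ads": ["ESPN.com"]},
    {"user": "u2", "clicked_ads": ["ESPN.com", "Petco.com"]},
    {"user": "u3", "clicked_ads": []},
    {"user": "u4", "clicked_ads": ["Petco.com", "Petco.com"]},
]

priors = gather_priors(rows, "clicked_ads")
print(priors)
```

Unlike a producer, which only ever sees one row, this computation needs every row before it can emit a result.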

Page 16: Building Personalized Applications at Scale

Example: Ad Targeting

| User | Games | Interests | Recommended Ads |
|---|---|---|---|
| Alex | MiniGolf Pro, Extreme Pond Fishing | | |
| Bob | Kitten Krash | | |
| Carol | Apples Everywhere, Underground Racer | | |

| Game | Categories |
|---|---|
| MiniGolf Pro | Golf, Sports |
| Kitten Krash | Cats, Racing |
| Apples Everywhere | Puzzles |

Page 17: Building Personalized Applications at Scale

Example: Ad Targeting

| User | Games | Interests | Recommended Ads |
|---|---|---|---|
| Alex | MiniGolf Pro, Extreme Pond Fishing | Golf, Sports | |
| Bob | Kitten Krash | | |
| Carol | Apples Everywhere, Underground Racer | | |

| Game | Categories |
|---|---|
| MiniGolf Pro | Golf, Sports |
| Kitten Krash | Cats, Racing |
| Apples Everywhere | Puzzles |

Producer
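The producer step on this slide can be sketched as a function that looks at a single user's row and derives the Interests column from that user's games. The `produce_interests` function and the row layout are hypothetical; the game-to-category data is copied from the slide.

```python
# Game -> categories lookup, as shown on the slide.
GAME_CATEGORIES = {
    "MiniGolf Pro": ["Golf", "Sports"],
    "Kitten Krash": ["Cats", "Racing"],
    "Apples Everywhere": ["Puzzles"],
}


def produce_interests(row):
    """Derive a user's interests from the categories of the games they play."""
    interests = []
    for game in row["games"]:
        # Games missing from the lookup contribute no categories.
        for category in GAME_CATEGORIES.get(game, []):
            if category not in interests:
                interests.append(category)
    return {**row, "interests": interests}


alex = {"user": "Alex", "games": ["MiniGolf Pro", "Extreme Pond Fishing"]}
alex_with_interests = produce_interests(alex)
print(alex_with_interests["interests"])
```

The function reads only one row plus a small lookup table, so it can run independently for every user.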

Page 18: Building Personalized Applications at Scale

Example: Ad Targeting

| User | Games | Interests | Recommended Ads |
|---|---|---|---|
| Alex | MiniGolf Pro, Extreme Pond Fishing | Golf, Sports | ESPN.com |
| Bob | Kitten Krash | | |
| Carol | Apples Everywhere, Underground Racer | | |

| Category | Advertisement |
|---|---|
| Golf | ESPN.com |
| Animals | Petco.com |
| Racing | Nascar.com |

Producer
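A second producer can fill in the Recommended Ads column in the same per-row fashion, this time joining the user's interests against the category-to-advertisement table. The `produce_recommended_ads` function and the row layout are hypothetical; the lookup data is copied from the slide.

```python
# Category -> advertisement lookup, as shown on the slide.
CATEGORY_ADS = {
    "Golf": "ESPN.com",
    "Animals": "Petco.com",
    "Racing": "Nascar.com",
}


def produce_recommended_ads(row):
    """Recommend the ad associated with each of a user's interests."""
    ads = []
    for interest in row.get("interests", []):
        ad = CATEGORY_ADS.get(interest)
        # Skip interests with no associated ad and avoid duplicates.
        if ad is not None and ad not in ads:
            ads.append(ad)
    return {**row, "recommended_ads": ads}


alex = {"user": "Alex", "interests": ["Golf", "Sports"]}
alex_with_ads = produce_recommended_ads(alex)
print(alex_with_ads["recommended_ads"])
```

"Sports" has no entry in the lookup, so only the Golf association produces a recommendation for Alex.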

Page 19: Building Personalized Applications at Scale

Example: Ad Targeting

| User | Games | Interests | Recommended Ads |
|---|---|---|---|
| Alex | MiniGolf Pro, Extreme Pond Fishing | Golf, Sports | ESPN.com |
| Bob | Kitten Krash | | |
| Carol | Apples Everywhere, Underground Racer | | |

| Category | Advertisement |
|---|---|
| Golf | ESPN.com |
| Animals | Petco.com |
| Racing | Nascar.com |

Producer

Wait, where did this come from?

Page 20: Building Personalized Applications at Scale

Example: Gathering Associations

| User | Games | Interests | Clicked Ads |
|---|---|---|---|
| Alex | MiniGolf Pro, Extreme Pond Fishing | Golf, Sports | |
| Bob | Kitten Krash | | |
| Carol | Apples Everywhere, Underground Racer | | |

Page 22: Building Personalized Applications at Scale

Example: Gathering Associations

Page 26: Building Personalized Applications at Scale

Example: Gathering Associations

[Diagram: Map phase]

Page 27: Building Personalized Applications at Scale

Example: Gathering Associations

[Diagram: Map phase, followed by Reduce]
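The slide diagrams are not reproduced in this transcript, so the following is only a guess at the kind of job they depict: a map step emits an (interest, clicked ad) pair for each user row, a reduce step sums the counts per pair, and the most-clicked ad per interest becomes the association table. All function names and all click data below are hypothetical, and the shuffle is simulated in memory rather than run on Hadoop.

```python
from collections import defaultdict


def map_row(row):
    """Map: emit ((interest, ad), 1) for every interest/click pair in one row."""
    for interest in row["interests"]:
        for ad in row["clicked_ads"]:
            yield (interest, ad), 1


def reduce_counts(key, values):
    """Reduce: total the occurrences of one (interest, ad) pair."""
    return key, sum(values)


def gather_associations(rows):
    """Run map, an in-memory shuffle, and reduce; keep the top ad per interest."""
    grouped = defaultdict(list)
    for row in rows:
        for key, value in map_row(row):
            grouped[key].append(value)
    counts = dict(reduce_counts(key, values) for key, values in grouped.items())

    best = {}
    for (interest, ad), count in counts.items():
        if interest not in best or count > best[interest][1]:
            best[interest] = (ad, count)
    return {interest: ad for interest, (ad, _) in best.items()}


# Hypothetical click history; the slides leave the Clicked Ads column empty.
rows = [
    {"user": "Alex", "interests": ["Golf", "Sports"], "clicked_ads": ["ESPN.com"]},
    {"user": "Bob", "interests": ["Cats", "Racing"], "clicked_ads": ["Nascar.com"]},
    {"user": "Dana", "interests": ["Golf"], "clicked_ads": ["ESPN.com"]},
    {"user": "Erin", "interests": ["Racing"], "clicked_ads": ["Nascar.com"]},
]

associations = gather_associations(rows)
print(associations["Golf"], associations["Racing"])
```

The resulting mapping has the same shape as the Category/Advertisement table used by the ad-targeting producer, which is one plausible answer to "where did this come from?".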

Page 28: Building Personalized Applications at Scale

Final Thoughts

● A user-centric data storage model has great advantages:
  ○ Fast per-user reads and writes.
  ○ Already pivoted by your most common analysis.
● HBase provides fast, reliable random access and scans.
  ○ Billions of rows, millions of columns.
  ○ Integrates well with MapReduce for analysis.
● Build scalable personalized applications with WibiData.
  ○ Check out www.wibidata.com

Garrett Wu | [email protected]