Replication in data science: a dance between data science & machine learning (Strata 2016)

Upload: june-andrews

Post on 13-Apr-2017

TRANSCRIPT

Pinterest

Iterative supervised clustering: A dance between data science and machine learning

Dr June Andrews — September 2016

Agenda

1. Explore Pinterest’s content
2. Question our understanding
3. Inspire the future

Design system

Clothing Cooking Decorating Beauty Teaching Carpentry Cars Animated GIFs

Electronics Stereos Fashion Sewing Articles Painting Photography Nature

Cute cats Tattoos Hair Microscopy TV shows Apps Self help Motorcycles

Chairs

Fashion

Travel

Garden

Chairs

Food

Links are behind every Pin

How are users engaging with link domains?

Tool | Pros | Cons
Cluster algorithms (SVM, K-Means, Spectral) | Considers all users; Accurate | Tough to communicate; Definitions change over time
User experience studies | Deep knowledge; Captures the immeasurable | Costly; Considers few users
Domain expert hypothesis | Human interpretable | Inaccurate

Current cluster analysis

1. Clean and load data into favorite clustering algorithm
2. Build visualizations on top of clusters
3. Fiddle with parameters in clustering algorithm
4. Add human labels to each cluster
5. Share human interpretation of clusters

Fatal flaw

Human in the loop computing

Community membership identification from small seed sets (Kloumann & Kleinberg)

[Diagram labels: T, Domain Expert, Favorite Clustering Algorithm]

Human in the loop computing

When machine confidence dips, engage with domain expert

[Diagram labels: T, Domain Expert, Favorite Clustering Algorithm, ?, Unsure, Confident]
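The hand-off on this slide can be sketched as a simple routing rule. This is a minimal sketch and not the system shown in the talk: the function name `label_with_expert_fallback`, the `min_confidence` cutoff of 0.8, and the `(label, confidence)` input shape are all assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple


def label_with_expert_fallback(
    predictions: Dict[str, Tuple[str, float]],
    ask_expert: Callable[[str], str],
    min_confidence: float = 0.8,  # hypothetical cutoff, not from the slides
) -> Dict[str, str]:
    """Keep the machine label when it is confident, otherwise ask the expert."""
    labels = {}
    for item, (machine_label, confidence) in predictions.items():
        if confidence >= min_confidence:
            # "Confident" branch: accept the machine's label as is.
            labels[item] = machine_label
        else:
            # "Unsure" branch: engage the domain expert for this item.
            labels[item] = ask_expert(item)
    return labels
```

In practice `ask_expert` would be an interactive prompt; passing it in as a callable keeps the routing logic testable without a human present.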

Human in the loop computing

Domain expert determines when labeling is done

[Diagram labels: T, Domain Expert, Favorite Clustering Algorithm]

That's all!

Current analysis methodology

1. Clean and load data into favorite clustering algorithm
2. Build visualizations on top of clusters
3. Fiddle with parameters in clustering algorithm
4. Add human labels to each cluster
5. Share human interpretation of clusters

Human in the loop computing

Stage 1: Machine clusters data (Favorite Clustering Algorithm)
Stage 2: Domain expert creates 1 human interpretable cluster (Domain Expert)
Stage 3: Remove human labeled clusters and iterate (Favorite Clustering Algorithm, Domain Expert)
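The three stages can be read as a loop. The sketch below is one possible shape for that loop, not the notebook used in the talk. The names `iterative_supervised_clustering`, `machine_cluster`, and `expert_pick_rule` are hypothetical, and the expert's cluster is assumed to be expressed as a named yes/no rule over an item's features.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Rule = Callable[[dict], bool]


def iterative_supervised_clustering(
    items: Dict[str, dict],
    machine_cluster: Callable[[Dict[str, dict]], List[Set[str]]],
    expert_pick_rule: Callable[[List[Set[str]]], Optional[Tuple[str, Rule]]],
) -> Tuple[Dict[str, Set[str]], Set[str]]:
    """Return the human labeled clusters and the set of items left unlabeled."""
    remaining = dict(items)
    labeled: Dict[str, Set[str]] = {}
    while remaining:
        # Stage 1: the machine proposes clusters over what is still unlabeled.
        candidates = machine_cluster(remaining)
        # Stage 2: the expert turns one proposal into a named rule,
        # or returns None to signal that labeling is done.
        decision = expert_pick_rule(candidates)
        if decision is None:
            break
        name, rule = decision
        members = {key for key, features in remaining.items() if rule(features)}
        if not members:
            # A rule that matches nothing would loop forever, so stop here.
            break
        labeled[name] = members
        # Stage 3: remove the labeled items and iterate on the rest.
        for key in members:
            del remaining[key]
    return labeled, set(remaining)
```

Keeping the expert's decision as an explicit rule means each cluster has a definition that can be shared and re-applied, which is the property the earlier slides say plain clustering output lacks.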

How are users engaging with link domains?

• For a sample set of link domains we’re interested in:
  • All Pin creates in their first year on Pinterest
  • All repins in their first year on Pinterest
• 100k link domains sampled total

Python Notebook

• Provides guided iteration
• Sample visualization for each cluster

[Plot: sample cluster visualization; panels: Pin creates, Repins; axis scale labels: Few, Many]

Iteration 1

Title: Dark content

Description: Fewer than 2 Pins a week on average

Examples: Noisy low quality content
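The slides give no calculation for this cluster. A minimal sketch of the description as a rule might look like the following, assuming a yearly Pin count per domain is available. The function name, the parameter names, and the choice to average over 52 weeks are assumptions, and the slides do not say whether "Pins" counts Pin creates, repins, or both.

```python
def detect_dark_content(n_pins_in_year, pins_per_week_cutoff=2.0, weeks_in_year=52):
    """Return True when a domain averages fewer than the cutoff Pins per week."""
    average_pins_per_week = n_pins_in_year / weeks_in_year
    return average_pins_per_week < pins_per_week_cutoff
```

For example, a domain with 52 Pins in its first year averages one Pin a week and is flagged, while a domain with 500 Pins is not.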

Iteration 2: 42% of domains left

[Plots: Cluster 1, Cluster 2, Cluster 3; each with Pin creates and Repins panels; axis scale labels as extracted: Few, Many, Few, Some, Few, Many]

Iteration 2: Pinterest specials

Description: Domains with few Pins, but these Pins thrive in the Pinterest ecosystem

Calculation:

def detect_pinterest_specials(domain_engagement):
    ratio = domain_engagement.n_repins / max(1.0, float(domain_engagement.n_pin_creates))
    return domain_engagement.n_pin_creates <= X and ratio >= Y

Examples: Fashion and impulse sites

[Plot: Pinterest specials; panels: Pin creates, Repins; axis scale labels: Few, Many]
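The slide leaves the thresholds X and Y unspecified. To try the rule on sample numbers, a runnable sketch could pass them in as parameters. The `DomainEngagement` tuple and the default values 10 and 50.0 are placeholders chosen for illustration, not the values used in the talk.

```python
from collections import namedtuple

# Hypothetical container; only the two fields used by the slide's rule.
DomainEngagement = namedtuple("DomainEngagement", ["n_pin_creates", "n_repins"])


def detect_pinterest_specials(domain_engagement, max_pin_creates=10, min_ratio=50.0):
    """Few Pin creates, but many repins per Pin create."""
    # max(1.0, ...) avoids division by zero for domains with no Pin creates.
    ratio = domain_engagement.n_repins / max(1.0, float(domain_engagement.n_pin_creates))
    return domain_engagement.n_pin_creates <= max_pin_creates and ratio >= min_ratio
```

With these placeholder thresholds, `DomainEngagement(5, 1000)` has a ratio of 200 and qualifies, while `DomainEngagement(500, 1000)` fails on the Pin create limit.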

Iteration 3: 33% of domains left

[Plots: Cluster 1, Cluster 2, Cluster 3; each with Pin creates and Repins panels; axis scale labels as extracted: Few, Few, Few, Some, Few, Many]

Iteration 3: Steady growth

Description: Active Pin creates and steady growth throughout the year

Calculation:

def detect_steady_growth(domain_engagement):
    (growth_rate, intercept) = np.polyfit(
        range(len(domain_engagement.monthly_repins)),
        domain_engagement.monthly_repins, 1)
    return months_pins_created >= X and growth_rate >= Y

Examples: Recipe and DIY sites

[Plot: Steady growth; panels: Pin creates, Repins; axis scale labels: Some, Many]
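The growth rate here is the slope of a degree 1 least squares fit over the monthly repin counts. The sketch below makes that runnable under stated assumptions: `months_pins_created` is read as the number of months with at least one Pin create, which the slide does not define, and the thresholds 10 and 5.0 stand in for the unspecified X and Y.

```python
import numpy as np


def repin_growth_rate(monthly_repins):
    """Slope of a straight line fitted to monthly repin counts."""
    months = np.arange(len(monthly_repins))
    growth_rate, _intercept = np.polyfit(months, monthly_repins, 1)
    return float(growth_rate)


def detect_steady_growth(monthly_pin_creates, monthly_repins,
                         min_active_months=10, min_growth_rate=5.0):
    # Assumed meaning: count of months in which any Pin was created.
    months_pins_created = sum(1 for n in monthly_pin_creates if n > 0)
    return (months_pins_created >= min_active_months
            and repin_growth_rate(monthly_repins) >= min_growth_rate)
```

A domain whose repins rise by 10 each month has a fitted slope of about 10, so it passes the growth test whenever it also has enough active months.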

Iteration 4: 25% of domains left

[Plots: Cluster 1, Cluster 2, Cluster 3; each with Pin creates and Repins panels; axis scale labels as extracted: Few, Some, Many, Some, Few, Some]

Iteration 4: Slow growth

Description: Similar to steady growth, but not as fast

Calculation:

def detect_steady_growth(domain_engagement):
    (growth_rate, intercept) = np.polyfit(
        range(len(domain_engagement.monthly_repins)),
        domain_engagement.monthly_repins, 1)
    return months_pins_created >= X and growth_rate >= Y

Examples: Little lower quality recipe and DIY sites

[Plot: Slow growth; panels: Pin creates, Repins; axis scale labels: Few, Many]

Iteration 5: Churning

Description: Slowly fade through the year

Calculation:

def detect_churning(domain_engagement):
    (repin_growth, intercept) = np.polyfit(
        range(len(domain_engagement.monthly_repins) - 2),
        domain_engagement.monthly_repins[2:], 1)
    (pin_create_growth, intercept) = np.polyfit(
        range(len(domain_engagement.monthly_repins) - 2),
        domain_engagement.monthly_pin_creates[2:], 1)
    return repin_growth < 0 and pin_create_growth < 0

Examples: Fashion sale and click bait sites

[Plot: Churning; panels: Pin creates, Repins; axis scale labels: Few, Many]
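The churning rule fits a line to each series after dropping the first two months, then requires both slopes to be negative. The slides do not explain why the first two months are skipped. The sketch below factors that step into a hypothetical helper, `trend_after_warmup`, and takes plain lists instead of the `domain_engagement` object.

```python
import numpy as np


def trend_after_warmup(monthly_counts, warmup_months=2):
    """Slope of a straight line fitted to the counts after the warm-up months."""
    tail = monthly_counts[warmup_months:]
    slope, _intercept = np.polyfit(np.arange(len(tail)), tail, 1)
    return float(slope)


def detect_churning(monthly_pin_creates, monthly_repins):
    # Both repins and Pin creates must trend downward.
    return (trend_after_warmup(monthly_repins) < 0
            and trend_after_warmup(monthly_pin_creates) < 0)
```

Fitting each series over its own length also avoids the slide's implicit assumption that the two monthly lists have the same number of entries.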

Iteration 6: Yearly

Description: Slowly fade through the year

Calculation:

def detect_churning(domain_engagement):
    (repin_growth, intercept) = np.polyfit(
        range(len(domain_engagement.monthly_repins) - 2),
        domain_engagement.monthly_repins[2:], 1)
    (pin_create_growth, intercept) = np.polyfit(
        range(len(domain_engagement.monthly_repins) - 2),
        domain_engagement.monthly_pin_creates[2:], 1)
    return repin_growth < 0 and pin_create_growth < 0

Examples: Seasonal fashion, such as snow boots

[Plot: Yearly; panels: Pin creates, Repins; axis scale labels: Few, Many]

Iteration 7: Late bloomer

Description: Peak mid year

Calculation:

def detect_late_bloomer(domain_engagement):
    (concavity, pin_growth, intercept) = np.polyfit(
        range(len(domain_engagement.monthly_repins) - 2),
        [r + p for (r, p) in zip(domain_engagement.monthly_repins[2:],
                                 domain_engagement.monthly_pin_creates[2:])],
        2)
    return concavity < 0

Examples: Blogs that get off to a slow start

[Plot: Pinterest late bloomer; panels: Pin creates, Repins; axis scale labels: Few, Many]
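The late bloomer rule fits a degree 2 polynomial to combined monthly activity and checks the sign of the leading coefficient. A worked sketch, with a hypothetical helper name and plain lists in place of `domain_engagement`, could look like this:

```python
import numpy as np


def concavity_of_activity(monthly_pin_creates, monthly_repins, warmup_months=2):
    """Leading coefficient of a quadratic fitted to repins plus Pin creates."""
    activity = [r + p for r, p in zip(monthly_repins[warmup_months:],
                                      monthly_pin_creates[warmup_months:])]
    # np.polyfit returns coefficients from the highest degree down.
    concavity, _slope, _intercept = np.polyfit(np.arange(len(activity)), activity, 2)
    return float(concavity)


def detect_late_bloomer(monthly_pin_creates, monthly_repins):
    return concavity_of_activity(monthly_pin_creates, monthly_repins) < 0
```

A negative leading coefficient means the fitted curve bends downward. That is consistent with a mid year peak, but it would also hold for activity that rises and then levels off, so a stricter variant might additionally check that the fitted vertex falls inside the year.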

Clusters

• Dark content
• Pinterest specials
• Steady growth
• Slow growth
• Churning
• Yearly
• Late bloomer

Agenda

1. Explore Pinterest’s content
2. Question our understanding
3. Inspire the future

Design system

Does asking twice yield the same answer?
Should we cluster again?

Data science is expensive
Cost of replicating analysis is leaving other business opportunities on the table

Would it make a difference?
Unknown

Replication Crisis in Psychology
Silberzahn & Uhlmann; Crowdsourced research: Many hands make tight work
Nature August 2015

Crowd sourced study on red cards in soccer
Silberzahn & Uhlmann; Crowdsourced research: Many hands make tight work
Nature October 2015

The New York Times on predicting the presidency, September 2016
Cohn; We Gave Four Good Pollsters the Same Raw Data. They Had Four Different Results.

… but we’ve lowered the cost!

Data science is expensive

… 9 data scientists and machine learning engineers. Same data, same UI, same day. Everyone finished in ~1 hour.

…so we did it again

Models a real world situation with limited resources

9 is huge!

Were the results the same?

Everything was the same

Existing clusters as our baseline

Columns: Baseline clusters | Results e | Results l | Results d | Results m | Results z | Results b | Results k

Baseline clusters:
• Dark content
• Pinterest specials
• Steady growth
• Slow growth
• Churning
• Yearly
• Late bloomer

90% Matches

Columns: Baseline clusters | Results e | Results l | Results d | Results m | Results z | Results b | Results k

• Dark content: Unpopular (95%), Trailing (90%)
• Pinterest specials: Trailing (100%), Viral on Pinterest (98%), Pin creates drop off (97%)
• Steady growth: Increasing repins (94%), Continuous growth (94%)
• Slow growth, Churning, Yearly, Late bloomer: no entries
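The slides do not say how the match percentages were computed. One plausible reading, sketched here purely as an assumption, is the share of a replicated cluster's domains that also fall in a baseline cluster. The function names and the set-of-domains representation are hypothetical.

```python
def match_percent(baseline_members, result_members):
    """Percent of the result cluster's domains that also sit in the baseline cluster."""
    result = set(result_members)
    if not result:
        return 0.0
    overlap = len(set(baseline_members) & result)
    return 100.0 * overlap / len(result)


def matches_above(baseline, results, cutoff):
    """For each baseline cluster, list result clusters matching at or above the cutoff."""
    found = {}
    for base_name, base_members in baseline.items():
        hits = []
        for result_name, members in results.items():
            percent = match_percent(base_members, members)
            if percent >= cutoff:
                hits.append((result_name, percent))
        found[base_name] = hits
    return found
```

Under this definition the measure is asymmetric: a small result cluster fully contained in a large baseline cluster scores 100%, which may be why one analyst's cluster can appear against more than one baseline row.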

50% Matches

Columns: Baseline clusters | Results e | Results l | Results d | Results m | Results z | Results b | Results k

• Dark content: Unpopular (95%), Trailing (90%), Original Pinny (84%)
• Pinterest specials: Trailing (100%), Minimal original Pins (66%), Viral on Pinterest (98%), Pin creates drop off (97%)
• Steady growth: Pinterest viral content (62%), Other (53%), Original Pinny (51%), Viral on the internet (69%), Increasing repins (94%), Continuous growth (94%), Suspected Save button high Pin creates (73%)
• Slow growth: Pinterest viral content (55%), Original Pinny (82%), Viral on the internet (65%), Increasing repins (65%), Continuous growth (86%), Suspected Save button high Pin creates (51%)
• Churning: Original Pinny (68%), Viral on the internet (53%)
• Yearly: Original Pinny (71%)
• Late bloomer: Original Pinny (71%), Continuous growth (55%), Suspected Save button high Pin creates (59%)

Conceptually similar clusters
But not related in implementation

Columns: Baseline clusters | Results e | Results l | Results d | Results m | Results z | Results b | Results k

• Yearly: Seasonal, Throwback, Seasonal, Annual
• Steady growth: Gaining popularity, Increasing repins, Continuous growth, High engagement
• Pinterest specials: Initial flurry, Minimal original Pins, Viral on Pinterest, Pin create drop off, Unpopular domains with good content

Two roots of variations

• Good vs. bad
• Differences in perspective

Signs of suboptimal clustering

• Leading with biases
• Cherry-picking: responding to a limited subset of the data

[Plot: Seasonal; panels: Pin creates, Repins; axis scale labels: Few, Few]

Differences of perspective

• Results m - Viral growth centric
  • Viral on Pinterest
  • Viral on the internet
  • Lame
• Results d - Original content centric
  • Persistent original Pins
  • Minimal original Pins
  • Original Pinny
• Results l - Return on investment centric
  • Underserved
  • Draught
  • Trailing

Impact implications

9 data scientists, 9 answers

• Products depending on cluster used
  • Viral mechanisms
  • Speeding Pin demotion
  • Promoting underserved Pins
• For same product, domains impacted differ for
  • Seasonality
  • Steady growth
  • Pinterest specials

Bottom line
It matters which data scientist does an analysis

Agenda

1. Explore Pinterest’s content
2. Question our understanding
3. Inspire the future

Design system

Let’s ask the hard question and brave the answer together

When is data science a house of cards?

Avalanche of Resources
Measuring data science impact

• Experimental systems are now standard
• Data scientists are more available
• Reproducible analysis
• [Now] Fast replicable analysis

Utilize Resources
Experiment

• Record end to end from analysis to impact
• Innovate on processes
• Borrow ideas on replication from science
• Tailor our techniques for replication

Concrete experiments
Break down the problem and build up

• Narrow Difference in Perception through Priming analysts
• Develop a rubric of excellence
• Train analysts on generated data
• Add process stabilizers

Pinterest is interested

pin.it/Data

Reach out!

Dr June Andrews [email protected] / DrAndrews/ DrJuneAndrews

Let’s data science, data science!
Let’s crack the code to systematic innovation

Thank you!

We are hiring! https://engineering.pinterest.com/
