replication in data science

When is Data Science a House of Cards?

Replication in Data Science

Dr. Frances Haugen and Dr. June Andrews

March 2017

Agenda

1. Explore Pinterest’s content
2. Pinterest Replication Study
3. Inspire the future

Clothing Cooking Decorating Beauty Teaching Carpentry Cars Animated GIFs Electronics

Stereos Fashion Sewing Articles Painting Photography Nature Cute cats Tattoos Hair

Microscopy TV shows Apps Self help Motorcycles

Chairs

Fashion

Travel

Garden

Chairs

Food

How are users engaging with link domains?

Links are behind every Pin

General Methodology

1. Refine Goals with Stakeholders
2. ETL data
3. Analyze data (Some iteration involved)
4. Draw Conclusions
5. Share Conclusions & Support Stakeholders

Want Increased Visibility

Reproducibility in Data Science

Same Data + Same Code = Same Results

Jason Chin Writing a Genome Assembler with IPython: http://nbviewer.jupyter.org/github/cschin/Write_A_Genome_Assembler_With_IPython/blob/master/Write_An_Assembler.ipynb


Replication in Data Science

Same Goal + Same Data = Same Conclusions

Treat with Favipiravir

How can we replicate?

Can we afford to replicate?

Would it make a difference?

Replication?

Replication Crisis in Psychology

Nature August 2015

Monya Baker - Over half of psychology studies fail reproducibility test

Crowdsourced study on red cards in soccer

Nature October 2015

Silberzahn & Uhlmann; Crowdsourced research: Many hands make tight work

Cohn; We Gave Four Good Pollsters the Same Raw Data. They Had Four Different Results.

Data science is expensive

… let’s lower the cost!


For a sample set of link domains we’re interested in:

• All Pin creates in their first year on Pinterest

• All repins in their first year on Pinterest

• 100k link domains sampled total
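The calculations later in the deck read fields such as n_pin_creates, n_repins, monthly_repins, and monthly_pin_creates from a domain_engagement object. A minimal sketch of such a record, assuming one count per month for a domain's first year on Pinterest. The class name, the domain field, and the sample numbers are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DomainEngagement:
    """First-year engagement for one link domain (hypothetical layout)."""
    domain: str
    monthly_pin_creates: List[int]  # one count per month on Pinterest
    monthly_repins: List[int]       # one count per month on Pinterest

    @property
    def n_pin_creates(self) -> int:
        # Total Pin creates over the observed months.
        return sum(self.monthly_pin_creates)

    @property
    def n_repins(self) -> int:
        # Total repins over the observed months.
        return sum(self.monthly_repins)


example = DomainEngagement(
    domain="example.com",  # placeholder domain, not real data
    monthly_pin_creates=[5, 3, 2, 0, 1, 0, 0, 2, 1, 0, 0, 1],
    monthly_repins=[10, 20, 15, 8, 9, 4, 6, 12, 7, 3, 2, 5],
)
print(example.n_pin_creates, example.n_repins)
```

Keeping the monthly series and deriving the totals from them means the totals cannot drift out of sync with the series used for the growth fits.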


Current cluster analysis

1. ETL data into clustering algorithm
2. Build cluster visualizations
3. Tune parameters
4. Add human labels to each cluster
5. Share human interpretation of clusters



Step 3, Tune parameters: Expensive

| Tool | Pros | Cons |
|---|---|---|
| Cluster algorithms (SVM, K-Means, Spectral) | Considers all users; Accurate | Tough to communicate; Definitions change over time |
| User experience studies | Deep knowledge; Captures the immeasurable | Costly; Considers few users |
| Domain expert hypothesis | Human interpretable | Inaccurate |

Human in the loop computing

Community membership identification from small seed sets

Kloumann & Kleinberg - Community Membership Identification from Small Seed Sets - KDD

[Diagram labels: Domain Expert, Favorite Clustering Algorithm]

Human in the loop computing

• When machine confidence dips, engage with domain expert
• Iterate through problem space
• Terminate when Domain Expert determines labeling is done

[Diagram labels: Domain Expert, Unsure, Confident, ?, “That’s all!”]
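The idea of engaging the domain expert only when machine confidence dips can be sketched as a single labeling pass. This is an illustration under assumptions, not the method from the slides or from Kloumann & Kleinberg: the function name, the 0.8 threshold, the ask_expert callback, and the toy items are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def label_with_expert(
    predictions: Dict[str, Tuple[str, float]],
    ask_expert: Callable[[str], str],
    confidence_threshold: float = 0.8,
) -> Tuple[Dict[str, str], List[str]]:
    """Keep confident machine labels; route unsure items to the expert.

    predictions maps item -> (machine_label, confidence in [0, 1]).
    Returns the final labels and the items the expert was asked about.
    """
    labels: Dict[str, str] = {}
    asked: List[str] = []
    for item, (machine_label, confidence) in predictions.items():
        if confidence >= confidence_threshold:
            labels[item] = machine_label      # machine is confident
        else:
            labels[item] = ask_expert(item)   # machine is unsure
            asked.append(item)
    return labels, asked


# Toy run with made-up items and a stand-in "expert".
predictions = {
    "domain_a": ("steady growth", 0.95),
    "domain_b": ("churning", 0.55),
    "domain_c": ("dark content", 0.90),
}
labels, asked = label_with_expert(predictions, ask_expert=lambda item: "yearly")
print(labels)
print(asked)
```

In a full loop the expert's answers would feed back into the model before the next pass, and the expert decides when to stop.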


Human in the loop computing

Stage 1: Machine clusters data

Stage 2: Domain expert creates 1 human interpretable cluster

Stage 3: Remove human labeled clusters and iterate

[Diagram labels: Domain Expert, Favorite Clustering Algorithm]

Provides guided iteration
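The three stages can be read as a peeling loop: each expert rule claims its domains from whatever is still unlabeled, and the next iteration sees only the remainder. A minimal sketch under assumptions: the machine clustering of Stage 1 is left out, and the function name, dictionary keys, and thresholds are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Rule = Callable[[dict], bool]


def peel_clusters(
    domains: List[dict],
    expert_rules: List[Tuple[str, Rule]],
) -> Tuple[Dict[str, List[dict]], List[dict]]:
    """Apply expert rules in order; each claims matches from what remains."""
    remaining = list(domains)
    clusters: Dict[str, List[dict]] = {}
    for name, rule in expert_rules:
        # Stage 1 (machine clustering of `remaining`) is omitted here.
        # Stage 2: the expert's rule defines one interpretable cluster.
        clusters[name] = [d for d in remaining if rule(d)]
        # Stage 3: remove the labeled domains before the next iteration.
        remaining = [d for d in remaining if not rule(d)]
    return clusters, remaining


# Made-up data and thresholds, for illustration only.
domains = [
    {"name": "a", "n_pin_creates": 3, "n_repins": 2},
    {"name": "b", "n_pin_creates": 4, "n_repins": 900},
    {"name": "c", "n_pin_creates": 500, "n_repins": 700},
]
rules = [
    ("dark content", lambda d: d["n_pin_creates"] + d["n_repins"] < 104),
    ("pinterest specials", lambda d: d["n_pin_creates"] <= 10),
]
clusters, unlabeled = peel_clusters(domains, rules)
print({k: [d["name"] for d in v] for k, v in clusters.items()})
print([d["name"] for d in unlabeled])
```

Rule order matters: domain "a" would also satisfy the second rule, but it was already removed by the first, which is why two analysts applying similar rules in a different order can end up with different clusters.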

Python Notebook

Sample visualization for each cluster

[Figure: per-cluster panels for Months Active, Peak Distance, Pin Creates, Repins, Total Pins, and Repin/Create Ratio, plus a Pin creates / Repins panel on a Few-to-Many scale]

Iteration 1: Dark content

Description: Fewer than 2 Pins a week on average

Examples: Noisy, low-quality content

Machine Cluster 0

Cluster Size: 60587 link domains representing 58.00% of link domains

Feature Quantities: Pin Creates, Repins, Repins + Pin Creates, Months Active, Peak Distance, Repin to Pin Create Ratio
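The slide gives no calculation for dark content, only the description of fewer than 2 Pins a week on average. One way that description could be turned into a rule is sketched here, assuming that Pin creates and repins both count as Pins and that the average is taken over 52 weeks. The function name and both assumptions are illustrative, not from the slides.

```python
WEEKS_IN_YEAR = 52


def detect_dark_content(n_pin_creates: int, n_repins: int,
                        max_pins_per_week: float = 2.0) -> bool:
    """True when a domain averages fewer than max_pins_per_week Pins."""
    total_pins = n_pin_creates + n_repins  # assumption: both count as Pins
    return total_pins / WEEKS_IN_YEAR < max_pins_per_week


print(detect_dark_content(n_pin_creates=20, n_repins=40))    # 60 Pins in a year
print(detect_dark_content(n_pin_creates=100, n_repins=400))  # 500 Pins in a year
```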

Iteration 2: Pinterest Specials

Description: Domains with few Pins, but these Pins thrive in the Pinterest ecosystem

Calculation:

    def detect_pinterest_specials(domain_engagement):
        # X and Y are thresholds; their values are not given on the slide.
        ratio = domain_engagement.n_repins / max(1.0, float(domain_engagement.n_pin_creates))
        return domain_engagement.n_pin_creates <= X and ratio >= Y

Examples: Fashion and impulse sites

Iteration 3: Steady growth

Description: Active Pin creates and steady growth throughout the year

Calculation:

    import numpy as np

    def detect_steady_growth(domain_engagement):
        # Slope of a straight-line fit to the monthly repin counts.
        (growth_rate, intercept) = np.polyfit(
            range(len(domain_engagement.monthly_repins)),
            domain_engagement.monthly_repins, 1)
        # months_pins_created is not defined on the slide; X and Y are unspecified thresholds.
        return months_pins_created >= X and growth_rate >= Y

Examples: Recipe and DIY sites
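The growth rate in these calculations is the slope of a straight line fitted to the monthly counts, which is what a degree-1 np.polyfit returns as its first coefficient. A dependency-free sketch of the same ordinary least-squares slope, with made-up series; the function name is hypothetical.

```python
from typing import Sequence


def linear_growth_rate(monthly_counts: Sequence[float]) -> float:
    """Least-squares slope of counts against month index 0, 1, 2, ..."""
    n = len(monthly_counts)
    if n < 2:
        raise ValueError("need at least two months to fit a line")
    mean_x = (n - 1) / 2.0
    mean_y = sum(monthly_counts) / n
    covariance = sum((x - mean_x) * (y - mean_y)
                     for x, y in enumerate(monthly_counts))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    return covariance / variance


growing = [10, 12, 14, 16, 18, 20]   # made-up series, +2 repins per month
fading = [20, 18, 16, 14, 12, 10]    # made-up series, -2 repins per month
print(linear_growth_rate(growing))
print(linear_growth_rate(fading))
```

A positive slope above some threshold would correspond to the growth clusters, and a negative slope to the fading ones; where the threshold sits is the analyst's choice.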

Iteration 4: Slow growth

Description: Similar to steady growth, but not as fast

Calculation:

    def detect_steady_growth(domain_engagement):
        # Same calculation and function name as Iteration 3 on the slide.
        (growth_rate, intercept) = np.polyfit(
            range(len(domain_engagement.monthly_repins)),
            domain_engagement.monthly_repins, 1)
        return months_pins_created >= X and growth_rate >= Y

Examples: Slightly lower-quality recipe and DIY sites

Iteration 5: Churning

Description: Slowly fades through the year

Calculation:

    def detect_churning(domain_engagement):
        # Fit both series from the third month onward.
        (repin_growth, intercept) = np.polyfit(
            range(len(domain_engagement.monthly_repins) - 2),
            domain_engagement.monthly_repins[2:], 1)
        (pin_create_growth, intercept) = np.polyfit(
            range(len(domain_engagement.monthly_repins) - 2),
            domain_engagement.monthly_pin_creates[2:], 1)
        # Churning: both repins and Pin creates trend downward.
        return repin_growth < 0 and pin_create_growth < 0

Examples: Fashion sale and click bait sites

Iteration 6: Yearly

Description: Slowly fade through the year

Calculation:

    def detect_churning(domain_engagement):
        (repin_growth, intercept) = np.polyfit(
            range(len(domain_engagement.monthly_repins) - 2),
            domain_engagement.monthly_repins[2:], 1)
        (pin_create_growth, intercept) = np.polyfit(
            range(len(domain_engagement.monthly_repins) - 2),
            domain_engagement.monthly_pin_creates[2:], 1)
        return repin_growth < 0 and pin_create_growth < 0

Examples: Seasonal fashion, such as snow boots

Iteration 7: Late Bloomer

Description: Peak mid year

Calculation:

    def detect_late_bloomer(domain_engagement):
        # Quadratic fit to combined monthly activity from the third month onward.
        (concavity, pin_growth, intercept) = np.polyfit(
            range(len(domain_engagement.monthly_repins) - 2),
            [r + p for (r, p) in zip(domain_engagement.monthly_repins[2:],
                                     domain_engagement.monthly_pin_creates[2:])],
            2)
        # A negative leading coefficient means the fitted curve bends downward.
        return concavity < 0

Examples: Blogs that get off to a slow start
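The concavity test relies on np.polyfit returning coefficients from the highest degree down, so the first value of a degree-2 fit is the quadratic term. A small sketch on made-up series shows how the sign separates a curve that rises and then falls from one that keeps accelerating; the helper name and the data are hypothetical.

```python
import numpy as np


def is_concave_down(series) -> bool:
    """True when a quadratic fit to the series bends downward."""
    months = np.arange(len(series))
    # polyfit returns coefficients from the highest degree down.
    concavity, slope, intercept = np.polyfit(months, series, 2)
    return bool(concavity < 0)


# Made-up monthly activity (repins + Pin creates).
peaks_mid_year = [5, 12, 20, 26, 30, 29, 25, 18, 11, 4]
keeps_accelerating = [1, 2, 4, 7, 11, 16, 22, 29, 37, 46]

print("peaks mid-year:", is_concave_down(peaks_mid_year))
print("keeps accelerating:", is_concave_down(keeps_accelerating))
```

Note that the sign alone does not say where the peak falls; a series that flattens late in the year can also fit a downward-bending curve.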

Clusters

• Dark content

• Pinterest specials

• Steady growth

• Slow growth

• Churning

• Yearly

• Late bloomer

Data science is expensive

… but we lowered the cost!



Step 3, Tune parameters: Interactive Notebook

9 data scientists and machine learning engineers. Same data, same UI, same day. Everyone finished in about one hour.

So we did it again

9 is huge!

Everything was the same

Existing clusters as our baseline

Columns: Baseline clusters | Results e | Results l | Results d | Results m | Results z | Results b | Results k

Baseline clusters: Dark content, Pinterest specials, Steady growth, Slow growth, Churning, Yearly, Late bloomer


90% Matches

• Dark content: Unpopular (95%), Trailing (90%)
• Pinterest specials: Trailing (100%), Viral on Pinterest (98%), Pin creates drop off (97%)
• Steady growth: Increasing repins (94%), Continuous growth (94%)
• Slow growth: (none)
• Churning: (none)
• Yearly: (none)
• Late bloomer: (none)


50% Matches

• Dark content: Unpopular (95%), Trailing (90%), Original Pinny (84%)
• Pinterest specials: Trailing (100%), Minimal original Pins (66%), Viral on Pinterest (98%), Pin creates drop off (97%)
• Steady growth: Pinterest viral content (62%), Other (53%), Original Pinny (51%), Viral on the internet (69%), Increasing repins (94%), Suspected Save button high Pin creates (73%)
• Slow growth: Pinterest viral content (55%), Original Pinny (82%), Viral on the internet (65%), Increasing repins (65%)
• Churning: Original Pinny (68%), Viral on the internet (53%)
• Yearly: Original Pinny (71%)
• Late bloomer: Original Pinny (71%), Continuous growth (55%)
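The slides do not say how the match percentages were computed. One plausible reading, offered only as an assumption, is the share of a participant's cluster that falls inside a baseline cluster, with a match reported when that share meets the 90% or 50% threshold. The function name, cluster contents, and domain identifiers below are hypothetical.

```python
from typing import Dict, Set


def best_matches(
    baseline: Dict[str, Set[str]],
    participant: Dict[str, Set[str]],
    threshold: float,
) -> Dict[str, Dict[str, float]]:
    """For each baseline cluster, list participant clusters whose overlap
    share (size of intersection / size of participant cluster) meets
    the threshold."""
    matches: Dict[str, Dict[str, float]] = {name: {} for name in baseline}
    for p_name, p_domains in participant.items():
        if not p_domains:
            continue  # an empty cluster cannot match anything
        for b_name, b_domains in baseline.items():
            share = len(p_domains & b_domains) / len(p_domains)
            if share >= threshold:
                matches[b_name][p_name] = share
    return matches


# Made-up domains, for illustration only.
baseline = {
    "Dark content": {"d1", "d2", "d3", "d4"},
    "Yearly": {"d5", "d6"},
}
participant = {
    "Unpopular": {"d1", "d2", "d3", "d5"},   # 3 of 4 in Dark content
    "Seasonal": {"d5", "d6"},                # 2 of 2 in Yearly
}
print(best_matches(baseline, participant, threshold=0.9))
print(best_matches(baseline, participant, threshold=0.5))
```

Under this definition, lowering the threshold can only add matches, which is consistent with the 50% table containing every entry of the 90% table that survived extraction.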


Ideologically similar clusters, but not related in implementation

• Yearly: Seasonal, Throwback, Seasonal, Annual
• Steady growth: Gaining popularity, Increasing repins, Continuous growth, High engagement
• Pinterest specials: Initial flurry, Minimal original Pins, Viral on Pinterest, Pin create drop off, Unpopular domains with good content

9 data scientists, 9 answers

Impact implications

• Build different products
• Same product applied to different users


Signs of suboptimal clustering

• Leading with biases
• Cherry-picking: responding to a limited subset of the data

[Figure: Pin creates / Repins panel on a Few-to-Many scale, labeled Seasonal]

Differences of perspective

Cluster m - Viral growth centric

• Viral on Pinterest

• Viral on the internet

• Lame

Roots of variations

• Good vs. Bad Answers
• Differences in Perspective
• Chaotic Solution Space

Bottom Line: It matters which data scientist does an analysis

Let’s ask the hard question and brave the answer together

When is data science a house of cards?

Turning of the tide

Measuring data science impact

• Experimental systems are now standard
• Data scientists are more available
• Reproducibility is saving analysis
• [Now] Fast and cheap analysis by multiple people from changing algorithms and open source contributions [Prophet]

Next Steps

Test variations in analysis

• Record analysis decisions to product outcomes
• Toss analysis variations at experimental systems
• Borrow from additional fields for rigorous processes
• Tailor our analysis techniques to replication

Concrete experiments

Break down the problem and build up

• Prime analysts before jumping into the clustering
• Set expectations of what good is
• Train analysts with generated data
• Add process reminders for the goal
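Training analysts with generated data needs series whose true pattern is known in advance, so that an analyst's labels can be checked against the answer. A minimal sketch of such a generator, using only the standard library; the pattern shapes, noise level, and function name are assumptions chosen for illustration, not values from the study.

```python
import random
from typing import Dict, List


def generate_monthly_repins(pattern: str, rng: random.Random,
                            months: int = 12) -> List[int]:
    """Return a noisy monthly series that follows a known pattern."""
    if pattern == "steady growth":
        base = [20 + 10 * m for m in range(months)]
    elif pattern == "churning":
        base = [130 - 10 * m for m in range(months)]
    elif pattern == "late bloomer":
        peak = (months - 1) / 2.0
        base = [100 - 3 * (m - peak) ** 2 for m in range(months)]
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    # Small noise so the series is not perfectly clean; counts stay >= 0.
    return [max(0, round(b + rng.uniform(-3, 3))) for b in base]


rng = random.Random(0)  # fixed seed so the exercise is repeatable
training_set: Dict[str, List[int]] = {
    pattern: generate_monthly_repins(pattern, rng)
    for pattern in ("steady growth", "churning", "late bloomer")
}
for pattern, series in training_set.items():
    print(pattern, series)
```

Because the generating pattern is the ground truth, disagreement between analysts on this data points to the analysis process rather than to ambiguity in the data.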

Let’s crack the code to systematic innovation

Let’s data science, data science!

Pinterest

Dr. Frances Haugen: FrancesHaugen, Frances_Haugen

Dr. June Andrews: DrAndrews, DrJuneAndrews

We’re hiring! https://engineering.pinterest.com/

pin.it/data
