data science @nyt ; inaugural data science initiative lecture

Upload: chris-wiggins

Post on 05-Jan-2017

TRANSCRIPT

Page 1: data science @NYT ; inaugural Data Science Initiative Lecture

data science @ The New York Times

[email protected] [email protected] @chrishwiggins

references: bit.ly/brown-refs

Page 2:

data science @ The New York Times

Page 3:

data science @ The New York Times

Page 4:

“data science” jobs, jobs, jobs

Page 5:

“data science” jobs, jobs, jobs

Page 6:

data science: mindset & toolset

drew conway, 2010

Page 7:

modern history:2009

Page 8:

modern history:2009

Page 9:

“data science” ancient history: 2001

Page 10:

“data science” ancient history: 2001

Page 11:

data science context

Page 12:

home schooled

Page 13:

B.A. & M.Sc. from Brown

Page 14:

PhD in topology

Page 15:

“By the end of late 1945, I was a statistician rather than a topologist”

Page 16:

invented: “bit”

Page 17:

invented: “software”

Page 18:

invented: “FFT”

Page 19:

“the progenitor of data science.” - @mshron

Page 20:

“The Future of Data Analysis,” 1962, John W. Tukey

Page 21:

introduces: “Exploratory data analysis”

Page 22:

Tukey 1965, via John Chambers

Page 23:

TUKEY BEGAT S WHICH BEGAT R

Page 24:

Tukey 1972

Page 25:

Tukey 1975

In 1975, while at Princeton, Tufte was asked to teach a statistics course to a group of journalists who were visiting the school to study economics. He developed a set of readings and lectures on statistical graphics, which he further developed in joint seminars he subsequently taught with renowned statistician John Tukey (a pioneer in the field of information design). These course materials became the foundation for his first book on information design, The Visual Display of Quantitative Information.

Page 26:

TUKEY BEGAT VDQI

Page 27:

Tukey 1977

Page 28:

TUKEY BEGAT EDA

Page 29:

fast forward -> 2001

Page 30:

“The primary agents for change should be university departments themselves.”

Page 31:

data science @ The New York Times

histories

1. slow burn @Bell: as heretical statistics (see also Breiman)

2. caught fire 2009-now: as job description

historical rant: bit.ly/data-rant

Page 32:

biology: 1892 vs. 1995

Page 33:

biology: 1892 vs. 1995

biology changed for good.

Page 34:

biology: 1892 vs. 1995

new toolset, new mindset

Page 35:

genetics: 1837 vs. 2012

ML toolset; data science mindset

Page 36:

genetics: 1837 vs. 2012

Page 37:

genetics: 1837 vs. 2012

ML toolset; data science mindset

arxiv.org/abs/1105.5821 ; github.com/rajanil/mkboost

Page 38:

data science: mindset & toolset

Page 39:

1851

Page 40:

news: 20th century

church state

Page 41:

church

Page 42:

church

Page 43:

church

Page 44:

news: 20th century

church state

Page 45:

news: 21st century

church state

engineering

Page 46:

1851 1996

newspapering: 1851 vs. 1996

Page 47:

example:

millions of views per hour, 2015

Page 48:

Page 49:

“...social activities generate large quantities of potentially valuable data... The data were not generated for the purpose of learning; however, the potential for learning is great”

Page 50:

“...social activities generate large quantities of potentially valuable data... The data were not generated for the purpose of learning; however, the potential for learning is great” - J. Chambers, Bell Labs, 1993

Page 51:

data science: the web

Page 52:

data science: the web

is your “online presence”

Page 53:

data science: the web

is a microscope

Page 54:

data science: the web

is an experimental tool

Page 55:

1851 1996

newspapering: 1851 vs. 1996 vs. 2008

2008

Page 56:

“a startup is a temporary organization in search of a repeatable and scalable business model” —Steve Blank

Page 57:

every publisher is now a startup

Page 58:

every publisher is now a startup

Page 59:

Page 60:

news: 21st century

church state

engineering

Page 61:

news: 21st century

church state

engineering

Page 62:

learnings

Page 63:

learnings

- predictive modeling
- descriptive modeling
- prescriptive modeling

Page 64:

(actually ML, shhhh…)

- (supervised learning)
- (unsupervised learning)
- (reinforcement learning)

Page 65:

learnings

- predictive modeling
- descriptive modeling
- prescriptive modeling

cf. modelingsocialdata.org

Page 66:

predictive modeling, e.g.,

cf. modelingsocialdata.org

Page 67:

predictive modeling, e.g.,

“the funnel”

cf. modelingsocialdata.org
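
A minimal sketch of predictive (supervised) modeling for a funnel, assuming "the funnel" means readers moving toward a subscription. The slide does not say which model or features are used: the logistic regression, the feature names (`visits`, `sections`), and the synthetic data below are all illustrative assumptions, not the NYT pipeline.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Predicted probability of conversion for one feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=1.0, epochs=500):
    """Fit logistic regression by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of log loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic (hypothetical) readers: two engagement features scaled to [0, 1].
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    visits = rng.random()    # assumed feature: scaled visits per week
    sections = rng.random()  # assumed feature: scaled number of sections read
    p_true = sigmoid(4.0 * visits + 2.0 * sections - 3.0)  # made-up ground truth
    X.append([visits, sections])
    y.append(1 if rng.random() < p_true else 0)

w, b = fit_logistic(X, y)
high = predict_proba(w, b, [0.9, 0.9])  # highly engaged reader
low = predict_proba(w, b, [0.1, 0.1])   # barely engaged reader
print(f"P(convert | high engagement) = {high:.2f}")
print(f"P(convert | low engagement)  = {low:.2f}")
```

The fitted weights stay readable (one coefficient per feature), which is the sense in which a model like this is interpretable as well as predictive.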

Page 68:

interpretable predictive modeling

super cool stuff

cf. modelingsocialdata.org

Page 69:

interpretable predictive modeling

super cool stuff

cf. modelingsocialdata.org

arxiv.org/abs/q-bio/0701021

Page 70:

optimization & learning, e.g.,

“How The New York Times Works,” Popular Mechanics, 2015

Page 71:

optimization & prediction, e.g.,

“How The New York Times Works,” Popular Mechanics, 2015

(some models)

(some moneys)

Page 72:

recommendation as predictive modeling

Page 73:

recommendation as predictive modeling

bit.ly/AlexCTM

Page 74:

descriptive modeling, e.g.,

cf. daeilkim.com ; import bnpy

Page 75:

modeling your audience

bit.ly/Hughes-Kim-Sudderth-AISTATS15

Page 76:

modeling your audience

(optimization, ultimately)

Page 77:

modeling your audience

also allows insight+targeting as inference

Page 78:

prescriptive modeling

Page 79:

prescriptive modeling

cf. modelingsocialdata.org

Page 80:

prescriptive modeling

aka “A/B testing”; RCT

cf. modelingsocialdata.org
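
A minimal sketch of how an A/B test (a two-arm randomized controlled trial) might be read out, assuming a binary outcome such as click or no click. The slide names no test statistic. The pooled two-proportion z-test, the function name, and the counts below are illustrative assumptions.

```python
import math


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of arm A (control) and arm B (treatment).

    Returns (lift, z, p_value), where lift = rate_B - rate_A and p_value is
    two-sided under the normal approximation with a pooled standard error.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both arms need at least one observation")
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (rate_b - rate_a) / se
    # Two-sided tail of the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return rate_b - rate_a, z, p_value


# Hypothetical counts: 10,000 randomly assigned readers per arm.
lift, z, p_value = two_proportion_ztest(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"lift = {lift:.4f}, z = {z:.2f}, p = {p_value:.4f}")
```

Random assignment is what makes this prescriptive rather than merely predictive: because the arms differ only in the treatment, the measured lift can be read as the effect of the change.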

Page 81:

prescriptive modeling, e.g.,

Page 82:

prescriptive modeling, e.g.,

Page 83:

prescriptive modeling, e.g.,

Page 84:

Reporting

Learning

Test

Optimizing

Explore

descriptive:

predictive:

prescriptive:

Page 85:

Reporting

Learning

Test

Optimizing

Explore

descriptive:

predictive:

prescriptive:

Page 86:

common requirements in data science:

Page 87:

common requirements in data science:

1. people
2. ideas
3. things

cf. John Boyd, USAF

Page 88:

data science: ideas

Page 89:

data skills

data science and…

- data engineering
- data embeds
- data product
- data multiliteracies

cf. “data scientists at work”, ch 1

Page 90:

data science: ideas

- new mindset > new toolset

Page 91:

data science: people

Page 92:

thanks to the data science team!

Page 93:

data science @ The New York Times

[email protected] [email protected] @chrishwiggins

references: bit.ly/brown-refs