data science @nyt ; inaugural data science initiative lecture

Post on 05-Jan-2017

TRANSCRIPT

data science @ The New York Times

chris.wiggins@columbia.edu ; chris.wiggins@nytimes.com ; @chrishwiggins

references: bit.ly/brown-refs

“data science” jobs, jobs, jobs

data science: mindset & toolset

drew conway, 2010

modern history: 2009

“data science” ancient history: 2001

data science context

home schooled

B.A. & M.Sc. from Brown

PhD in topology

“By the end of late 1945, I was a statistician rather than a topologist”

invented: “bit”

invented: “software”

invented: “FFT”

“the progenitor of data science.” - @mshron

“The Future of Data Analysis,” John W. Tukey, 1962

introduces: “Exploratory data analysis”

Tukey 1965, via John Chambers

TUKEY BEGAT S WHICH BEGAT R

Tukey 1972

Tukey 1975

In 1975, while at Princeton, Tufte was asked to teach a statistics course to a group of journalists who were visiting the school to study economics. He developed a set of readings and lectures on statistical graphics, which he further developed in joint seminars he subsequently taught with renowned statistician John Tukey (a pioneer in the field of information design). These course materials became the foundation for his first book on information design, The Visual Display of Quantitative Information.

TUKEY BEGAT VDQI

Tukey 1977

TUKEY BEGAT EDA

fast forward -> 2001

“The primary agents for change should be university departments themselves.”

histories

1. slow burn @Bell: as heretical statistics (see also Breiman)

2. caught fire 2009-now: as job description

historical rant: bit.ly/data-rant

biology: 1892 vs. 1995

biology changed for good.

new toolset, new mindset

genetics: 1837 vs. 2012

ML toolset; data science mindset

arxiv.org/abs/1105.5821 ; github.com/rajanil/mkboost

data science: mindset & toolset

1851

news: 20th century

church, state

news: 21st century

church, state, engineering

newspapering: 1851 vs. 1996

example:

millions of views per hour, 2015

"...social activities generate large quantities of potentially valuable data...The data were not generated for the purpose of learning; however, the potential for learning is great" - J. Chambers, Bell Labs, 1993

data science: the web

- is your “online presence”
- is a microscope
- is an experimental tool

newspapering: 1851 vs. 1996 vs. 2008

“a startup is a temporary organization in search of a repeatable and scalable business model” —Steve Blank

every publisher is now a startup

news: 21st century

church, state, engineering

learnings

- predictive modeling
- descriptive modeling
- prescriptive modeling

(actually ML, shhhh…)

- (supervised learning)
- (unsupervised learning)
- (reinforcement learning)

cf. modelingsocialdata.org

predictive modeling, e.g., “the funnel”

interpretable predictive modeling (super cool stuff)

arxiv.org/abs/q-bio/0701021
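
As a rough illustration of what "interpretable predictive modeling" can mean in a funnel setting, here is a minimal sketch: a logistic regression fit by gradient descent, whose learned weights can be read directly. Everything in it is an assumption made for illustration (the synthetic data, the `visits` and `noise` features, the function names); it is not the model described in the talk or in the cited paper.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(rows, labels, lr=1.0, epochs=500):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
            err = sigmoid(z) - y
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Synthetic, hypothetical data: feature 0 is a centered "visits per week"
# signal that drives subscription; feature 1 is pure noise.
rng = random.Random(0)
rows, labels = [], []
for _ in range(400):
    visits = rng.random() - 0.5
    noise = rng.random() - 0.5
    p_subscribe = sigmoid(6.0 * visits)
    rows.append([visits, noise])
    labels.append(1 if rng.random() < p_subscribe else 0)

bias, w_visits, w_noise = fit_logistic(rows, labels)
print(f"bias={bias:.2f} visits={w_visits:.2f} noise={w_noise:.2f}")
```

The point of the sketch is the last line: a large positive weight on `visits` and a weight near zero on `noise` is something a non-specialist can read, which is the sense of "interpretable" assumed here.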

optimization & learning, e.g.,

“How The New York Times Works,” Popular Mechanics, 2015

optimization & prediction, e.g.,

(some models)

(some moneys)

recommendation as predictive modeling

bit.ly/AlexCTM
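
One way to read "recommendation as predictive modeling" is: predict a score for each (reader, article) pair, then rank unseen articles by that score. The sketch below assumes a simple dot-product score over hypothetical topic vectors; the article names, the three topics, and the `recommend` helper are all invented for illustration and do not reproduce the method behind the link above.

```python
from typing import Dict, List, Set


def score(user_vec: List[float], item_vec: List[float]) -> float:
    """Predicted affinity: dot product of user and item vectors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


def recommend(user_vec: List[float], items: Dict[str, List[float]],
              seen: Set[str], k: int = 2) -> List[str]:
    """Return the k unseen items with the highest predicted score."""
    candidates = [
        (score(user_vec, vec), name)
        for name, vec in items.items()
        if name not in seen
    ]
    candidates.sort(reverse=True)
    return [name for _, name in candidates[:k]]


# Hypothetical 3-topic space: (politics, sports, cooking).
items = {
    "election-recap": [0.9, 0.0, 0.1],
    "playoff-preview": [0.0, 1.0, 0.0],
    "weeknight-pasta": [0.0, 0.1, 0.9],
    "stadium-food": [0.0, 0.5, 0.5],
}
reader = [0.1, 0.2, 0.7]  # a reader who mostly reads cooking
print(recommend(reader, items, seen={"weeknight-pasta"}))
```

In a real system the vectors would be learned from reading histories and article text rather than written by hand; the ranking step would look much the same.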

descriptive modeling, e.g.,

cf. daeilkim.com ; import bnpy

modeling your audience

bit.ly/Hughes-Kim-Sudderth-AISTATS15

(optimization, ultimately)

also allows insight + targeting as inference
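
To make "descriptive modeling" of an audience concrete, here is a minimal clustering sketch. It swaps in plain k-means, which is much simpler than the Bayesian nonparametric models that bnpy provides, and it runs on hypothetical data: each reader is a pair (share of reading in news, share in lifestyle). The function name and initial centers are assumptions for illustration.

```python
import math


def kmeans(points, centers, iterations=10):
    """Plain k-means: alternate assignment and mean-update steps."""
    centers = [list(c) for c in centers]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        assignment = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment


# Hypothetical readers: (share of reading in news, share in lifestyle).
readers = [
    (0.90, 0.10), (0.80, 0.20), (0.85, 0.15),
    (0.10, 0.90), (0.20, 0.80), (0.15, 0.85),
]
centers, assignment = kmeans(readers, centers=[readers[0], readers[3]])
print(assignment)
print(centers)
```

The cluster centers are the "description": each one summarizes a segment of the audience, and assigning a new reader to the nearest center is the simplest form of the targeting-as-inference idea mentioned above.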

prescriptive modeling

aka “A/B testing”; RCT

cf. modelingsocialdata.org
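
A minimal sketch of how an A/B test (a randomized controlled trial on a web page) might be analyzed: a two-proportion z-test comparing conversion rates between two randomly assigned groups. The counts and the headline scenario are hypothetical, and the pooled z-test is only one common choice of analysis, not necessarily the one used by the team.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (difference in rates, z statistic, two-sided p-value)."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    # Two-sided p-value from the standard normal distribution via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_b - rate_a, z, p_value


# Hypothetical counts: headline A vs. headline B, readers assigned at random.
diff, z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"lift={diff:.4f} z={z:.2f} p={p:.4f}")
```

Random assignment is what makes this prescriptive rather than predictive: because readers did not choose their group, a difference in rates can be attributed to the change being tested.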

prescriptive modeling, e.g.,

[diagram: Reporting, Learning, Test, Optimizing, Explore; labeled descriptive, predictive, prescriptive]

common requirements in data science:

1. people
2. ideas
3. things

cf. John Boyd, USAF

data science: ideas

data skills

data science and…

- data engineering
- data embeds
- data product
- data multiliteracies

cf. “data scientists at work”, ch 1

data science: ideas

- new mindset > new toolset

data science: people

thanks to the data science team!
