data science @nyt ; inaugural data science initiative lecture
data science @ The New York Times
[email protected]@nytimes.com@chrishwiggins
references: bit.ly/brown-refs
“data science” jobs, jobs, jobs
data science: mindset & toolset
drew conway, 2010
modern history: 2009
“data science” ancient history: 2001
data science context
home schooled
B.A. & M.Sc. from Brown
PhD in topology
“By the end of late 1945, I was a statistician rather than a topologist”
coined: “bit”
coined: “software”
co-developed: the FFT (with J. W. Cooley, 1965)
“the progenitor of data science.” - @mshron
John W. Tukey, “The Future of Data Analysis,” 1962
introduces: “Exploratory data analysis”
Tukey 1965, via John Chambers
TUKEY BEGAT S WHICH BEGAT R
Tukey 1972
Tukey 1975
In 1975, while at Princeton, Tufte was asked to teach a statistics course to a group of journalists who were visiting the school to study economics. He developed a set of readings and lectures on statistical graphics, which he further developed in joint seminars he subsequently taught with renowned statistician John Tukey (a pioneer in the field of information design). These course materials became the foundation for his first book on information design, “The Visual Display of Quantitative Information.”
TUKEY BEGAT VDQI
Tukey 1977
TUKEY BEGAT EDA
fast forward -> 2001
“The primary agents for change should be university departments themselves.”
histories
1. slow burn @Bell: as heretical statistics (see also Breiman)
2. caught fire 2009-now: as job description
historical rant: bit.ly/data-rant
biology: 1892 vs. 1995
biology changed for good.
new toolset, new mindset
genetics: 1837 vs. 2012
ML toolset; data science mindset
arxiv.org/abs/1105.5821 ; github.com/rajanil/mkboost
data science: mindset & toolset
1851
news: 20th century
church / state
news: 21st century
church / state / engineering
newspapering: 1851 vs. 1996
example: millions of views per hour, 2015
“...social activities generate large quantities of potentially valuable data... The data were not generated for the purpose of learning; however, the potential for learning is great” - J. Chambers, Bell Labs, 1993
data science: the web
- is your “online presence”
- is a microscope
- is an experimental tool
newspapering: 1851 vs. 1996 vs. 2008
“a startup is a temporary organization in search of a repeatable and scalable business model” - Steve Blank
every publisher is now a startup
news: 21st century
church / state / engineering
learnings
- predictive modeling (supervised learning)
- descriptive modeling (unsupervised learning)
- prescriptive modeling (reinforcement learning)
(actually ML, shhhh…)
cf. modelingsocialdata.org
predictive modeling, e.g., “the funnel”
cf. modelingsocialdata.org
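“The funnel” can be framed as a classification problem: predict whether a reader moves on to the next stage (here, subscribing). The sketch below fits a logistic regression by batch gradient descent in plain Python. The features, the toy data, and the names `fit_logistic` and `predict_proba` are hypothetical illustrations, not the model actually used at The New York Times.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 5000) -> List[float]:
    """Fit weights (bias first) by batch gradient descent on the mean log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi  # derivative of the log loss w.r.t. the linear score
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability that a reader with features x converts at this funnel stage."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Hypothetical toy data: [articles read last week, hit the paywall (0/1)] -> subscribed?
X = [[1, 0], [2, 0], [3, 1], [8, 1], [10, 1], [12, 1], [0, 0], [9, 0]]
y = [0, 0, 0, 1, 1, 1, 0, 1]
weights = fit_logistic(X, y)
print(round(predict_proba(weights, [11, 1]), 3))
```

Logistic regression is chosen here because its weights can be read directly as each feature's push toward or away from conversion; any calibrated classifier could fill the same role.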
interpretable predictive modeling
super cool stuff
cf. modelingsocialdata.org
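One classic route to prediction that stays interpretable is boosting over decision stumps: the final model is a weighted vote of single-feature threshold rules that a person can read off directly. The sketch below is plain AdaBoost in Python, with labels in {-1, +1}; we do not claim it is the method of the papers cited in this talk, and the toy data and names (`adaboost`, `best_stump`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Vote `polarity` if feature j exceeds the threshold, else the opposite."""
    j, thr, pol = stump
    return pol if x[j] > thr else -pol


def best_stump(X, y, w) -> Tuple[Stump, float]:
    """Exhaustively pick the stump with the lowest weighted training error."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (1, -1):
                s = (j, thr, pol)
                err = sum(wi for xi, yi, wi in zip(X, y, w) if stump_predict(s, xi) != yi)
                if err < best_err:
                    best, best_err = s, err
    return best, best_err


def adaboost(X, y, rounds: int = 10) -> List[Tuple[float, Stump]]:
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Up-weight the examples this stump got wrong, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x) -> int:
    score = sum(alpha * stump_predict(s, x) for alpha, s in model)
    return 1 if score >= 0 else -1


# Hypothetical toy data: two features, label +1 when feature 0 exceeds 3.
X = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 0], [6, 1]]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost(X, y, rounds=5)
alpha, (j, thr, pol) = model[0]
print(f"weight {alpha:.2f}: feature {j} > {thr} votes {pol:+d}")
```

Each `(alpha, (feature, threshold, polarity))` entry is a human-readable rule, which is what makes the ensemble inspectable.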
arxiv.org/abs/q-bio/0701021
optimization & learning, e.g.,
optimization & prediction, e.g.,
“How The New York Times Works,” Popular Mechanics, 2015
(some models)
(some moneys)
recommendation as predictive modeling
bit.ly/AlexCTM
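Recommendation fits the predictive frame: predict how strongly a reader will engage with an article they have not yet seen. The linked work (bit.ly/AlexCTM) concerns collaborative topic modeling, which also uses article text. The sketch below swaps in plain matrix factorization trained by stochastic gradient descent, i.e. only the collaborative-filtering half, and the interaction data and function names are hypothetical.

```python
import random
from typing import List, Tuple


def factorize(ratings: List[Tuple[int, int, float]], n_users: int, n_items: int,
              k: int = 2, lr: float = 0.05, reg: float = 0.01,
              epochs: int = 1000, seed: int = 0):
    """Learn user vectors U and item vectors V so that dot(U[u], V[i]) ~ rating."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - sum(a * b for a, b in zip(U[u], V[i]))
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                # Gradient step on squared error with L2 regularization.
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V


def score(U, V, u: int, i: int) -> float:
    """Predicted engagement of user u with item i."""
    return sum(a * b for a, b in zip(U[u], V[i]))


# Hypothetical (user, article, engaged) triples: users 0-1 like articles 0-1,
# users 2-3 like articles 2-3; user 0 has not seen articles 1 or 3.
ratings = [(0, 0, 1), (0, 2, 0), (1, 0, 1), (1, 1, 1), (1, 2, 0), (1, 3, 0),
           (2, 0, 0), (2, 1, 0), (2, 2, 1), (2, 3, 1), (3, 1, 0), (3, 2, 1), (3, 3, 1)]
U, V = factorize(ratings, n_users=4, n_items=4)
print(round(score(U, V, 0, 1), 2), round(score(U, V, 0, 3), 2))
```

Ranking unseen articles by `score` for one user is the recommendation step; a topic-model prior on `V` is what collaborative topic modeling would add.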
descriptive modeling, e.g.,
cf. daeilkim.com ; import bnpy
modeling your audience
bit.ly/Hughes-Kim-Sudderth-AISTATS15
(optimization, ultimately)
also allows insight + targeting as inference
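Descriptive modeling of an audience often means grouping readers by behavior without any labels. The bnpy package cited above fits Bayesian nonparametric mixture models that also infer how many groups there are; the sketch below swaps in plain k-means with a fixed k, only to illustrate the unsupervised idea. The reader features and data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], k: int, iters: int = 50, seed: int = 0):
    """Lloyd's algorithm: alternate nearest-center assignment and mean updates."""
    rng = random.Random(seed)
    centers: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each reader joins the nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Hypothetical readers: (share of reading in sports, share in politics).
readers = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15), (0.1, 0.9), (0.2, 0.8), (0.15, 0.85)]
centers, labels = kmeans(readers, k=2)
print(labels)
```

The fitted centers are the "insight" (what each segment reads), and the labels are what targeting would act on.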
prescriptive modeling
aka “A/B testing”; RCT
cf. modelingsocialdata.org
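An A/B test is a randomized controlled trial applied to a product choice. One standard way to read out a two-arm test on a binary outcome (click or no click) is the pooled two-proportion z-test, sketched below using only the standard library. The counts are hypothetical, and in practice the sample size would be fixed in advance rather than checking the p-value repeatedly.

```python
import math
from typing import Tuple


def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    H0: arms A and B share the same underlying conversion rate.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided normal tail.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 200/10,000 clicks on headline A vs. 260/10,000 on headline B.
z, p = two_proportion_ztest(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```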
prescriptive modeling, e.g.,
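Moving from a fixed A/B split toward the reinforcement-learning framing, a multi-armed bandit shifts traffic toward better-performing variants while the test is still running. The sketch below simulates Beta-Bernoulli Thompson sampling over three hypothetical headline variants; the click rates are made up, and we do not claim this is the system the talk describes.

```python
import random
from typing import List


def thompson_sampling(true_rates: List[float], rounds: int = 5000, seed: int = 0) -> List[int]:
    """Simulate Beta-Bernoulli Thompson sampling; return how often each arm was shown."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        # Draw a plausible click rate for each arm from its Beta(1 + s, 1 + f) posterior.
        draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
        arm = max(range(len(draws)), key=draws.__getitem__)
        pulls[arm] += 1
        # The simulated reader clicks with the arm's (hypothetical) true rate.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls


pulls = thompson_sampling([0.03, 0.12, 0.05])
print(pulls)
```

Unlike the fixed 50/50 split of a classic A/B test, most impressions end up on the best variant, at the cost of a less clean comparison between the losing arms.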
[diagram: Reporting, Learning, Test, Optimizing, Explore, arranged across descriptive / predictive / prescriptive]
common requirements in data science:
1. people
2. ideas
3. things
cf. John Boyd, USAF
data science: ideas
data skills
data science and…
- data engineering
- data embeds
- data product
- data multiliteracies
cf. “Data Scientists at Work,” ch. 1
data science: ideas
- new mindset > new toolset
data science: people
thanks to the data science team!