Transcript

data science @ The New York Times and how a 164-year-old content company became data-driven

[email protected] @nytimes.com @chrishwiggins

references: bit.ly/icerm

“data science” jobs, jobs, jobs

data science: mindset & toolset

drew conway, 2010

modern history: 2009

“data science” blogs, blogs, blogs

The first time I heard "data science" was in 2007 while reading a proposal that my adviser had passed along, outlining an academic program similar to what we think of as data science.

“data science” ancient history: 2001

data science context

home schooled

PhD in topology

“By the end of late 1945, I was a statistician rather than a topologist”

invented: “bit”

invented: “software”

invented: “FFT”

“the progenitor of data science.” - @mshron

“The Future of Data Analysis,” 1962, John W. Tukey

introduces: “Exploratory data analysis”

Tukey 1965, via John Chambers

TUKEY BEGAT S WHICH BEGAT R

Tukey 1972

? 1972

Jerome H. Friedman

Tukey 1975

In 1975, while at Princeton, Tufte was asked to teach a statistics course to a group of journalists who were visiting the school to study economics. He developed a set of readings and lectures on statistical graphics, which he further developed in joint seminars he subsequently taught with renowned statistician John Tukey (a pioneer in the field of information design). These course materials became the foundation for his first book on information design, The Visual Display of Quantitative Information.

TUKEY BEGAT VDQI

Tukey 1977

TUKEY BEGAT EDA

fast forward -> 2001

“The primary agents for change should be university departments themselves.”

histories

1. in academia -> Bell: as heretical statistics (see also Breiman)

2. in industry: as job description

historical rant: bit.ly/data-rant

biology: 1892 vs. 1995

biology changed for good.

genetics: 1837 vs. 2012

ML toolset; data science mindset

arxiv.org/abs/1105.5821 ; github.com/rajanil/mkboost

data science: mindset & toolset

1851

news: 20th century

church state

news: 21st century

church state

engineering

1851 1996

newspapering: 1851 vs. 1996

example:

millions of views per hour, 2015

data science: the web

- is your “online presence”
- is a microscope
- is an experimental tool
- is an optimization tool

1851 1996

newspapering: 1851 vs. 1996 vs. 2008

2008

“a startup is a temporary organization in search of a repeatable and scalable business model” —Steve Blank

every publisher is now a startup

news: 21st century

church state

engineering

learnings

- supervised learning
- unsupervised learning
- reinforcement learning

cf. modelingsocialdata.org

stats.stackexchange.com

from “are you a bayesian or a frequentist” —michael jordan

$$L = \sum_{i=1}^{N} \varphi\bigl(y_i f(x_i; \beta)\bigr) + \lambda \lVert \beta \rVert$$

supervised learning, e.g.,

cf. modelingsocialdata.org
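A minimal sketch of the regularized loss on this slide, under assumptions the slide does not state: a linear score f(x; beta) = beta . x, the logistic surrogate phi(m) = log(1 + exp(-m)), labels in {-1, +1}, and a Euclidean-norm penalty weighted by lam. The function names are illustrative only.

```python
import math


def logistic_loss(margin):
    """phi(m) = log(1 + exp(-m)); one common convex surrogate (an assumption here)."""
    # Split on the sign of the margin so exp() never overflows.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def linear_score(x, beta):
    """f(x; beta) as a plain dot product (assumed model form)."""
    return sum(xj * bj for xj, bj in zip(x, beta))


def regularized_loss(xs, ys, beta, lam):
    """Sum of phi(y_i * f(x_i; beta)) over the data, plus lam * ||beta||."""
    data_term = sum(
        logistic_loss(y * linear_score(x, beta)) for x, y in zip(xs, ys)
    )
    penalty = lam * math.sqrt(sum(b * b for b in beta))
    return data_term + penalty


# Toy data: labels are +1 / -1, features are two-dimensional.
xs = [[1.0, 0.0], [2.0, 1.0], [-1.0, 0.0], [-2.0, -1.0]]
ys = [1, 1, -1, -1]

print(regularized_loss(xs, ys, [0.0, 0.0], lam=0.1))  # all margins are zero
print(regularized_loss(xs, ys, [1.0, 0.0], lam=0.1))  # every point is on the correct side
```

The penalty term trades fit against the size of beta: raising lam pushes the minimizer toward smaller coefficients.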

supervised learning, e.g.,

“the funnel”

cf. modelingsocialdata.org

interpretable supervised learning

super cool stuff

cf. modelingsocialdata.org

arxiv.org/abs/q-bio/0701021

optimization & learning, e.g.,

“How The New York Times Works,” Popular Mechanics, 2015

recommendation as supervised learning

unsupervised learning, e.g.,

cf. daeilkim.com ; import bnpy

modeling your audience: bit.ly/Hughes-Kim-Sudderth-AISTATS15

modeling your audience (optimization, ultimately)

modeling your audience: also allows recommendation as inference
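The slides point to bnpy for audience modeling. The sketch below does not use bnpy or its models. It is a much simpler stand-in, plain k-means, meant only to illustrate the unsupervised idea of grouping readers without labels. The reader vectors (share of reading time per section) and the `kmeans` helper are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at the first k points so runs are repeatable."""
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k), key=lambda c: squared_distance(p, centroids[c])
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Hypothetical readers: (share of time on politics, share of time on sports).
readers = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
segments, centers = kmeans(readers, k=2)
print(segments)
print(centers)
```

Once readers are grouped, a recommendation can be read off the group (for example, the sections its members favor), which is the "recommendation as inference" idea in its crudest form.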

Reporting

Learning

Test, aka “A/B testing”;

business as usual

(esp. supervised)
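One common way to read out an A/B test is a two-proportion z-test on conversion counts. The sketch below assumes that setup; the slides do not say which test is used, and the counts and the `two_proportion_z_test` name are made up for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: variant A converts 100 of 1000, variant B 150 of 1000.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(z, p)
```

The function divides by zero if the pooled rate is exactly 0 or 1; a production version would guard that case.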

Some of the most recognizable personalization in our service is the collection of “genre” rows. … Members connect with these rows so well that we measure an increase in member retention by placing the most tailored rows higher on the page instead of lower.

cf. modelingsocialdata.org

reinforcement learning: from A/B to…

real-time A/B -> “bandits”

GOOG blog:

cf. modelingsocialdata.org
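To make the step from A/B tests to bandits concrete, here is a small sketch of one well-known bandit strategy, Thompson sampling with Beta posteriors over click rates. It is not necessarily the method the slide or the cited blog has in mind; the class name, the simulated click rates, and the seeds are all assumptions.

```python
import random


class BetaBernoulliBandit:
    """Thompson sampling for arms with 0/1 rewards, one Beta posterior per arm."""

    def __init__(self, n_arms, seed=0):
        self.successes = [0] * n_arms
        self.failures = [0] * n_arms
        self.rng = random.Random(seed)

    def choose(self):
        # Draw one plausible rate per arm from Beta(successes + 1, failures + 1)
        # and play the arm whose draw is largest.
        draws = [
            self.rng.betavariate(s + 1, f + 1)
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, reward):
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


def simulate(true_rates, rounds, seed=0):
    """Run the bandit against simulated arms; return how often each arm was played."""
    bandit = BetaBernoulliBandit(len(true_rates), seed)
    env = random.Random(seed + 1)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        arm = bandit.choose()
        reward = env.random() < true_rates[arm]
        bandit.update(arm, reward)
        pulls[arm] += 1
    return pulls


# Two hypothetical headlines with click rates of 10% and 60%.
pulls = simulate([0.1, 0.6], rounds=2000)
print(pulls)
```

Unlike a fixed-split A/B test, the allocation shifts toward the better arm while the experiment is still running, so less traffic is spent on the weaker variant.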

Reporting

Learning

Test

Optimizing

Explore

unsupervised:

supervised:

reinforcement:

common requirements in data science:

1. people
2. ideas
3. things

cf. USAF

things: what does DS team deliver?

- build data prototypes
- build APIs
- impact roadmaps

- build data prototypes

cf. daeilkim.com

- in puppet, w/python2.7
- collaboration w/pers. team

- build APIs

- impact roadmaps

flickr/McJex

data science: ideas

data skills

- data engineering
- data science
- data visualization
- data product
- data multiliteracies
- data embeds

cf. “data scientists at work”, ch 1

data science: people

- new mindset > new toolset

summary: pay attention to:

1. people
2. ideas
3. things

cf. USAF

thanks to the data science team!
