data science history / data science @ nyt

Post on 14-Aug-2015

TRANSCRIPT

  1. data science @ The New York Times, and how a 164-year-old content company became data-driven; chris.wiggins@columbia.edu, chris.wiggins@nytimes.com, @chrishwiggins; references: bit.ly/icerm
  8. data science jobs, jobs, jobs
  11. data science: mindset & toolset (drew conway, 2010)
  12. modern history: 2009
  13. data science blogs, blogs, blogs
  14. data science blogs, blogs, blogs: The first time I heard "data science" was in 2007 while reading a proposal that my adviser had passed along, outlining an academic program similar to what we think of as data science.
  16. data science ancient history: 2001
  18. data science context
  19. home schooled
  20. PhD in topology
  21. "By the end of late 1945, I was a statistician rather than a topologist"
  22. invented: bit
  23. invented: software
  24. invented: FFT
  25. "the progenitor of data science." - @mshron
  26. The Future of Data Analysis, 1962, John W. Tukey
  27. introduces: exploratory data analysis
  28. Tukey 1965, via John Chambers
  29. TUKEY BEGAT S WHICH BEGAT R
  30. Tukey 1972
  31. ? 1972
  32. Jerome H. Friedman
  33. Tukey 1975: In 1975, while at Princeton, Tufte was asked to teach a statistics course to a group of journalists who were visiting the school to study economics. He developed a set of readings and lectures on statistical graphics, which he further developed in joint seminars he subsequently taught with renowned statistician John Tukey (a pioneer in the field of information design). These course materials became the foundation for his first book on information design, The Visual Display of Quantitative Information.
  34. TUKEY BEGAT VDQI
  35. Tukey 1977
  36. TUKEY BEGAT EDA
  37. fast forward -> 2001
  38. "The primary agents for change should be university departments themselves."
  39. data science @ The New York Times, and how a 164-year-old content company became data-driven. histories: (1) in academia -> Bell: as heretical statistics (see also Breiman); (2) in industry: as job description. historical rant: bit.ly/data-rant
  40. data science @ The New York Times, and how a 164-year-old content company became data-driven
  41. biology: 1892 vs. 1995. biology changed for good.
  42. genetics: 1837 vs. 2012. ML toolset; data science mindset
  43. genetics: 1837 vs. 2012
  44. genetics: 1837 vs. 2012. ML toolset; data science mindset. arxiv.org/abs/1105.5821 ; github.com/rajanil/mkboost
  45. data science: mindset & toolset
  46. 1851
  47. news: 20th century: church, state
  48. church
  51. news: 20th century: church, state
  52. news: 21st century: church, state, engineering
  53. newspapering: 1851 vs. 1996
  54. example: millions of views per hour, 2015
  56. data science: the web
  57. data science: the web is your online presence
  58. data science: the web is a microscope
  59. data science: the web is an experimental tool
  60. data science: the web is an optimization tool
  61. newspapering: 1851 vs. 1996 vs. 2008
  62. "a startup is a temporary organization in search of a repeatable and scalable business model" - Steve Blank
  63. every publisher is now a startup
  64. news: 21st century: church, state, engineering
  66. learnings
  67. learnings: supervised learning; unsupervised learning; reinforcement learning
  68. learnings: supervised learning; unsupervised learning; reinforcement learning. cf. modelingsocialdata.org
  69. stats.stackexchange.com
  70. from "are you a bayesian or a frequentist" (michael jordan): $L = \sum_{i=1}^{N} \varphi(y_i f(x_i; \theta)) + \lambda \|\theta\|$
  71. supervised learning, e.g., cf. modelingsocialdata.org
  72. supervised learning, e.g., the funnel. cf. modelingsocialdata.org
  73. interpretable supervised learning: supercoolstuff. cf. modelingsocialdata.org
  74. interpretable supervised learning: supercoolstuff. cf. modelingsocialdata.org ; arxiv.org/abs/q-bio/0701021
  75. optimization & learning, e.g., "How The New York Times Works" (popular mechanics, 2015)
  76. recommendation as supervised learning
  77. unsupervised learning, e.g., cf. daeilkim.com ; import bnpy
  78. modeling your audience: bit.ly/Hughes-Kim-Sudderth-AISTATS15
  79. modeling your audience (optimization, ultimately)
  80. modeling your audience: also allows recommendation as inference
  81. Reporting, Learning, Test (aka A/B testing; business as usual, esp. supervised). "Some of the most recognizable personalization in our service is the collection of genre rows. Members connect with these rows so well that we measure an increase in member retention by placing the most tailored rows higher on the page instead of lower." cf. modelingsocialdata.org. reinforcement learning: from A/B to.
  82. real-time A/B -> bandits. GOOG blog: cf. modelingsocialdata.org
  83. Reporting, Learning, Test, Optimizing, Explore; unsupervised, supervised, reinforcement
  85. common requirements in data science:
  86. common requirements in data science: (1) people, (2) ideas, (3) things. cf. USAF
  87. things: what does DS team deliver?
  88. things: what does DS team deliver? build data prototypes; build APIs; impact roadmaps
  89. build data prototypes
  90. build data prototypes. cf. daeilkim.com
  92. build APIs: in puppet, w/python2.7; collaboration w/pers. team
  93. impact roadmaps (flickr/McJex)
  94. data science: ideas
  95. data skills: data engineering; data science; data visualization; data product; data multiliteracies; data embeds. cf. data scientists at work, ch 1
  97. data science: people; new mindset > new toolset
  98. summary: pay attention to: (1) people, (2) ideas, (3) things. cf. USAF
  99. thanks to the data science team!
  100. data science @ The New York Times, and how a 164-year-old content company became data-driven; chris.wiggins@columbia.edu, chris.wiggins@nytimes.com, @chrishwiggins
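
Slide 70 writes supervised learning as minimizing a loss of the form "sum over examples of a margin penalty, plus a penalty on the parameters". The sketch below is one possible instance of that form, not the talk's own code. Everything concrete in it is an assumption made for illustration: the logistic loss as the margin penalty `phi`, a linear score `f`, a squared L2 norm as the penalty, the parameter name `theta`, the weight `lam`, the toy data, and plain gradient descent as the optimizer.

```python
import math

def phi(z):
    """Logistic loss on the margin z = y * f(x; theta), computed stably."""
    # log(1 + exp(-z)) without overflow for large |z|
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

def f(x, theta):
    """Linear score f(x; theta) = theta . x (an assumed model family)."""
    return sum(t * xi for t, xi in zip(theta, x))

def loss(theta, xs, ys, lam):
    """L = sum_i phi(y_i * f(x_i; theta)) + lam * ||theta||^2 (squared L2 assumed)."""
    data_term = sum(phi(y * f(x, theta)) for x, y in zip(xs, ys))
    penalty = lam * sum(t * t for t in theta)
    return data_term + penalty

def grad(theta, xs, ys, lam):
    """Gradient of the loss above with respect to theta."""
    g = [2.0 * lam * t for t in theta]
    for x, y in zip(xs, ys):
        z = y * f(x, theta)
        # d/dz log(1 + exp(-z)) = -1 / (1 + exp(z)), computed stably
        if z >= 0:
            dphi = -math.exp(-z) / (1.0 + math.exp(-z))
        else:
            dphi = -1.0 / (1.0 + math.exp(z))
        for j, xj in enumerate(x):
            g[j] += dphi * y * xj
    return g

def fit(xs, ys, lam=0.1, step=0.1, iters=500):
    """Minimize the loss with fixed-step gradient descent, starting from zero."""
    theta = [0.0] * len(xs[0])
    for _ in range(iters):
        g = grad(theta, xs, ys, lam)
        theta = [t - step * gj for t, gj in zip(theta, g)]
    return theta

# Toy data (hypothetical): labels y in {-1, +1}; the last feature is a constant bias term.
xs = [[2.0, 1.0], [1.5, 1.0], [-1.0, 1.0], [-2.5, 1.0]]
ys = [1, 1, -1, -1]
theta_hat = fit(xs, ys)
```

Swapping `phi` for a hinge or exponential loss, or the squared norm for an L1 norm, changes the method while keeping the same overall shape of the objective.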
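
Slide 82 points from real-time A/B testing toward bandits. As a rough illustration of that idea, the sketch below uses Thompson sampling with Beta posteriors over click rates. This is one common bandit strategy chosen here for illustration; the slides do not say which algorithm was used. The function name, the simulated click rates, and the number of rounds are all hypothetical.

```python
import random

def thompson_sampling(true_rates, n_rounds, seed=0):
    """Simulate a Bernoulli bandit; return (pulls, wins) per variant."""
    rng = random.Random(seed)
    k = len(true_rates)
    wins = [0] * k    # observed clicks per variant
    losses = [0] * k  # observed non-clicks per variant
    for _ in range(n_rounds):
        # Draw a plausible click rate for each variant from its Beta posterior
        # (uniform Beta(1, 1) prior), then show the variant with the best draw.
        draws = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(k)]
        arm = max(range(k), key=lambda i: draws[i])
        # Simulated visitor: clicks with the hidden true rate of that variant.
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    pulls = [w + l for w, l in zip(wins, losses)]
    return pulls, wins

# Hypothetical click rates for three variants of a page element.
pulls, wins = thompson_sampling([0.05, 0.10, 0.30], n_rounds=5000)
```

Unlike a fixed-split A/B test, traffic here shifts toward the better-performing variant while the experiment is still running, which is the trade between exploring and exploiting that the slide's arrow suggests.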