Dark Data: A Data Scientist's Exploration of the Unknown, by Rob Witoff (PyData SV 2014)

Dark Data: A Data Scientist’s Exploration of the Unknown. Rob Witoff, Data Scientist, IT CTO Office (@rwitoff), Jet Propulsion Laboratory, California Institute of Technology

Upload: pydata

Post on 27-Jan-2015


Category:

Technology



DESCRIPTION

Modern Data Science is enabling NASA's engineers to uncover actionable information from our "dark" data coffers. From starting small to operating at scale, Rob will discuss applications in telemetry, workforce analytics and liberating data from the Mars Rovers. Tools include IPython, Pandas, Boto and more.

TRANSCRIPT

1. Dark Data: A Data Scientist's Exploration of the Unknown. Rob Witoff, Data Scientist, IT CTO Office, @rwitoff, Jet Propulsion Laboratory, California Institute of Technology
2. NASA Explores our Universe
3. Exploration brings home data, a lot of it
4. This talk is about how we're enabling _ to uncover and act on our Dark Data
5. The Situation
6. Goldstone, USA; Madrid, Spain; Canberra, Australia
7. Amazon SWF Decider Data Processing Tasks File Transfer Tasks Decision Tasks Data Processing Workers EC2 EC2 EC2 EC2 S3 JPL Data Center Decider File Transfer Workers Data Processing Workers Polyphony Cloud Computing
8.
9. 1967
10. Today
11. 209 Data Sources documented ^
12. Dark Matter: 84.5% of the known Universe
13. Dark Data: ??% of our universe
14. Dark Data is the data all around us that we can't use today. can't use
15. find access interpret understand share re-interface process store question experiment can't use
16. can't imagine
17. If this species is to survive indefinitely we need to become a multi-planet species. We need to go to Mars, and Mars is a stepping stone to other solar systems. -NASA Administrator, Charlie Bolden
18. Curiosity: Successful Landing
19. Mars 2020
20. Mars Sample Return
21. And Beyond!
22. Towards a solution
23. After exploring the universe for decades, what unfound discoveries lie in our data?
24. The Data Scientist
25. The Data Scientist: Hypothesize, Experiment, Explore
26. http://www.cgtrader.com/3d-models/character-people/man/nasa-astronaut-apollo Tools: Software & Libs; World Watching: Open Source; Gloves: Interpreted Languages; Camera: Data Viz; Mission Control: Active Communities; Vehicles: Scalable Storage & Compute
27. One: Liberate Dark Data. Two: Enable Engineers
28. Liberate
29. JPL Data Center Decider File Transfer Workers Data Processing Workers Polyphony Amazon SWF Decider Data Processing Tasks File Transfer Tasks Decision Tasks Data Processing Workers EC2 EC2 EC2 EC2 S3 Cloud Computing
30. http://json.jpl.nasa.gov
31. Closed Silos
32. How Do You Liberate New Data?
33. Greedy Solution: Explore For Data; Wise Solution: Explore For Problems; Best Solution: Explore For Questions
34. Greedy Solution: Explore For Data Find APIs!
35. Greedy Solution: Explore For Data Explore Find Get Excited! Find APIs! miss the point
36. Greedy Solution: Explore For Data Explore :-( Data is a means, not the end Find
37. Wise Solution: Explore For Problems Explore Integrate Increment! Incremental Successes!
38. Wise Solution: Explore For Problems Explore Integrate :-/ Incremental Expectations.
39. Best Solution: Explore For Questions Explore Rapid Prototype Like it's 2014 Reset Expectations
40. Hacker News: Heatmaps! Humanity!
41. DataTau: ML! Optimization! Hum anL! Lessons! Hum anL! Markov Chains! NoSQL! Sports!
42. Reddit!
43. Know What's Out There.
44. Solve our Problems Together.
45. Best Solution: Explore For Questions
46. Data: What can we do with this brain data?
47. Data, Problem: How healthy is my lobe?
48. Data, Problem, Question: What if we could see the brain evolve?
49. Simple Sankey Flow Diagram
50. Added more data.
51. Added dimensions.
52. Data, Problem, Idea: What about the rest of our brain data?
53. what about this? Or this?
54. what about this? Or this? ?
55. People ? Projects
56. People Projects ? Investments Projects People Engineering Science ? Exploration
57. ? Investments Projects People Engineering Science Exploration Calendars Degrees Orgs HR Helpdesk Finance Sentiment? Resumes Recruiting ? Confer
58. ? Engineering Science Exploration Calendars Degrees HR Helpdesk Finance Sentiment? Resumes Recruiting ? Confer Investments Projects People Orgs
59. ? Investments Helpdesk Finance Sentiment? Resumes Recruiting People Engineering Science Exploration Degrees Confer HR Projects Orgs Calendars
60. ? Investments Helpdesk Finance Sentiment? Resumes Recruiting People Engineering Science Exploration Degrees Confer HR Projects Orgs Calendars Connect Your Dots
61. Enable
62. Expertise Data Expertise Data
63. Expertise Data
64. Python + REPL Remote Browser
65. AWESOME + Python REPL Remote Browser
66. https://github.com/ipython/nbviewer
67. https://github.com/ipython/nbviewer
68. Human Problems Won't be Solved by Root Mean Square Error -Drew Conway
69. Engage
70. Through Visualization
71. Engage before you Answer
72. http://xkcd.com/1356/
73. Pandas! Vincent Vega D3
74. Outside the Notebook
75. 12k Interesting Files 12k Documents Dynamo Results ReST
76. Making a Difference
77. Amazon SWF Decider Data Processing Tasks File Transfer Tasks Decision Tasks Data Processing Workers EC2 EC2 EC2 EC2 S3 JPL Data Center Decider File Transfer Workers Data Processing Workers Polyphony
78. Q: What if we could interact with ALL of our Data?
79. Q: What if we were even closer to our data?
80. http://json.jpl.nasa.gov
81. Liberate your Dark Data. Enable your Engineers. Let's Grow Data Science Together
82. Thank you! @rwitoff
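
Slides 49 to 51 mention a simple Sankey flow diagram that grew as more data and dimensions were added. The transcript does not show how it was built, so the following is only a minimal sketch of one common preparation step: counting flows between adjacent dimensions to produce the source/target/value links that Sankey renderers such as D3 typically consume. The records, field names, and the `sankey_links` helper are all hypothetical and are not taken from the talk.

```python
from collections import Counter

# Hypothetical records: one row per person. Field names and values are
# illustrative assumptions, not data from the talk.
records = [
    {"degree": "Engineering", "org": "Flight Systems", "project": "Rover"},
    {"degree": "Engineering", "org": "Flight Systems", "project": "Orbiter"},
    {"degree": "Science", "org": "Flight Systems", "project": "Rover"},
    {"degree": "Science", "org": "Instruments", "project": "Rover"},
]


def sankey_links(rows, dimensions):
    """Count how many rows flow between each pair of adjacent dimensions."""
    counts = Counter()
    for row in rows:
        # Pair each dimension with the next one: degree->org, org->project.
        for src_dim, dst_dim in zip(dimensions, dimensions[1:]):
            counts[(row[src_dim], row[dst_dim])] += 1
    # Sort for a stable, reproducible link order.
    return [
        {"source": src, "target": dst, "value": n}
        for (src, dst), n in sorted(counts.items())
    ]


# "Adding a dimension" is just a longer list of column names.
links = sankey_links(records, ["degree", "org", "project"])
for link in links:
    print(link)
```

Because nodes are keyed by value alone, this sketch assumes that a label never appears in two different dimensions; prefixing each label with its dimension name would remove that assumption.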