For Interactive
Data Science Collaboration
CineGrid December 10, 2015
HELLO
CAROL WILLING
➤ Python Software Foundation, Director
➤ Project Jupyter, Contributor
➤ Fab Lab San Diego, Geek in Residence
WRITER
MANAGER AND ANALYST
ENGINEER
ARTIST
TEACHER
WONDER AND CURIOSITY
PROJECT JUPYTER
Just the Facts
JUPYTER NOTEBOOK
The Notebook: “Literate Computing”
Computational Narratives
❖ Computers deal with code and data.
❖ Humans deal with narratives that communicate.
Literate Computing (not Literate Programming): narratives anchored in a live computation that communicate a story based on data and results.
Cf. Mathematica, Maple, MuPAD, Sage…
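As a rough illustration of a narrative anchored in a computation, the sketch below builds a tiny notebook-style document that mixes a prose cell with a code cell, then round-trips it through JSON. The cell contents and the `summarize` helper are invented for this example; the field names follow the version 4 notebook JSON layout, but this is a sketch under those assumptions, not a validated notebook file.

```python
import json

# A minimal notebook-like document: one narrative cell, one code cell.
# The cell contents are made up for illustration.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Mean of a sample\n", "We average three measurements."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["sum([1.0, 2.0, 3.0]) / 3"],
        },
    ],
}


def summarize(nb):
    """Count cells by type (hypothetical helper, not part of Jupyter)."""
    counts = {}
    for cell in nb["cells"]:
        counts[cell["cell_type"]] = counts.get(cell["cell_type"], 0) + 1
    return counts


# Notebooks are stored as JSON text, so they can be shared like any other file.
text = json.dumps(notebook, indent=1)
roundtrip = json.loads(text)
print(summarize(roundtrip))
```

The point of the layout is that prose and code sit side by side in one shareable file, so the story and the computation travel together.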
“Project Jupyter serves not only the academic and scientific communities but also a much broader constituency of data scientists in research, education, industry and journalism…”
- Fernando Pérez, UC Berkeley
“…we see uses of our tools that range from high school education in programming to the nation’s supercomputing facilities and the leaders of the tech industry.”
- Fernando Pérez, UC Berkeley
“More than a million people are currently using Jupyter for everything from…”
- Prof. Brian Granger, Cal Poly
“…analyzing massive gene sequencing datasets to processing images from the Hubble Space Telescope and developing models of financial markets.”
- Prof. Brian Granger, Cal Poly
“We are excited by the potential of Project Jupyter to reach even wider audiences and to contribute to increased cross-disciplinary collaboration in the sciences.”
- Betsy Fader, Helmsley Charitable Trust
“Jupyter Notebook… will enable data exploration, visualization, and analysis in a way that encourages sound science and speeds progress.”
- Chris Mentzel, The Gordon and Betty Moore Foundation
DATA CHALLENGES Constraints or Opportunities?
SCALE
SPEED
CHOICES
CONNECTIONS
OPPORTUNITIES
Use our strengths
“The purpose of computing is insight, not numbers.”
– Hamming, 1962
The Lifecycle of a Scientific Idea (schematically)
1. Individual exploratory work
2. Collaborative development
3. Parallel production runs (HPC, cloud, ...)
4. Publication & communication (reproducibly!)
5. Education
6. Goto 1.
JUPYTERHUB
and Project Jupyter ecosystem
EDUCATION
nbviewer: seamless notebook sharing
❖ Zero-install reading of notebooks
❖ Just share a URL
❖ nbviewer.ipython.org
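To make "just share a URL" concrete, here is a small sketch that assembles an nbviewer link for a notebook hosted on GitHub. The function name, the `https` scheme, the `master` default branch, and the example repository path are assumptions made for illustration; the `/github/<user>/<repo>/blob/<branch>/<path>` layout reflects how nbviewer links to GitHub-hosted notebooks are commonly written, so verify it against the service before relying on it.

```python
from urllib.parse import quote

# Host as named on the slide; the https scheme is an assumption.
NBVIEWER_HOST = "https://nbviewer.ipython.org"


def nbviewer_github_url(user, repo, path, branch="master"):
    """Build a shareable nbviewer link (hypothetical helper)."""
    parts = [user, repo, "blob", branch] + path.split("/")
    # Percent-encode each segment so spaces in file names survive in a URL.
    return NBVIEWER_HOST + "/github/" + "/".join(quote(p) for p in parts)


print(nbviewer_github_url("ipython", "ipython", "examples/Index.ipynb"))
```

A reader who receives such a link needs only a browser, which is what zero-install reading amounts to in practice.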
Executable books
❖ Springer hardcover book
❖ Chapters: IPython Notebooks
❖ Posted as a blog entry
❖ All available as a GitHub repo
Python for Signal Processing, by José Unpingco
University Courses
These are just some we are aware of!
A collaborative MOOC on OpenEdX
http://lorenabarba.com/news/announcing-practical-numerical-methods-with-python-mooc
❖ Lorena Barba at George Washington University, USA.
❖ Ian Hawke at Southampton, UK.
❖ Carlos Jerez at Pontifical Catholic University of Chile.
❖ All materials on GitHub.
Changing the scientific culture
http://www.nature.com/news/interactive-notebooks-sharing-the-code-1.16261
Executable papers: the future?
http://www.nature.com/news/ipython-interactive-demo-7.21492?article=1.16261
Notebook Workflows: The Big Picture
Image credit: Joshua Barratt
Lots more! The IPython Gallery
https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks
GOVERNMENT
Shreyas Cholia & Oliver Ruebel
NERSC Data & Analytics Services Group
JupyterHub Day, July 17, 2015
Jupyterhub at NERSC and OpenMSI
NERSC is the Production HPC & Data Facility for DOE Office of Science Research
Largest funder of physical science research in the U.S.
❖ Bio Energy, Environment
❖ Computing
❖ Materials, Chemistry, Geophysics
❖ Particle Physics, Astrophysics
❖ Nuclear Physics
❖ Fusion Energy, Plasma Physics
ART
BUSINESS
Quantopian: algorithmic trading
Karen Rubin, Dir. Product Management at Quantopian
Quantopian Research Post Fortune.com
Microsoft: Python Tools for Visual Studio
Shahrokh Mortazavi, Dino Viehland, Wenming Ye, Dennis Gannon.
Microsoft Azure: Notebooks in the Cloud
Google CoLaboratory
Kayur Patel, Kester Tong, Mark Sanders, Corinna Cortes @ Google
Matt Turk @ NCSA/UIUC
IBM Watson
SCIENCE
JupyterHub: multiuser support
❖ Out of the box
❖ Unix accounts
❖ Local single-user notebooks
❖ Customizable
❖ Authentication: OAuth, LDAP, etc.
❖ Subprocess control: Docker, VMs, etc.
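The customization points above might look roughly like this in a `jupyterhub_config.py` file. This is a minimal sketch, assuming the separately installed `oauthenticator` and `dockerspawner` packages; the choice of GitHub OAuth and Docker is only one possible combination, and the `try`/`except` stand-in exists solely so the fragment runs on its own outside JupyterHub.

```python
# Sketch of a jupyterhub_config.py fragment.
# JupyterHub normally supplies get_config() when it loads this file;
# the fallback below is a hypothetical stand-in for running it standalone.
try:
    c = get_config()  # defined by JupyterHub at load time
except NameError:
    from types import SimpleNamespace
    c = SimpleNamespace(JupyterHub=SimpleNamespace())

# Authentication: swap the default Unix-account login for OAuth
# (assumes the oauthenticator package is installed).
c.JupyterHub.authenticator_class = "oauthenticator.GitHubOAuthenticator"

# Subprocess control: start each user's notebook server in a Docker
# container instead of a local process (assumes dockerspawner is installed).
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
```

Leaving both settings out keeps the out-of-the-box behavior listed above: Unix accounts and local single-user notebook servers.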
JupyterHub in Education @ Berkeley
https://developer.rackspace.com/blog/deploying-jupyterhub-for-education
❖ Computationally intensive course, ~220 students
❖ Fully hosted environment, zero-install
❖ Homework management and grading (with B. Granger)
Jess Hamrick @ Cal
K. Kelley @ Rackspace
M. Ragan-Kelley @ Cal
B. Granger @ Cal Poly
COLLABORATION
Why?
A ten year journey.
Optimism and hope for the future.
IMAGINE THE POSSIBILITIES
TRY.JUPYTER.ORG
WE’RE OPEN FOR YOU.
THANK YOU
try.jupyter.org
www.jupyter.org
numfocus.org ipython.org
CREDITS AND ATTRIBUTION
➤ Sources
➤ Jupyter website www.jupyter.org [11, 31, 65, 66, 69]
➤ Fernando Pérez [12, 28, 29, 33-40, 48-52, 53-55] http://fperez.org/ BIDS http://bids.berkeley.edu/
➤ Cal Poly and UC Berkeley Press Releases http://calpolynews.calpoly.edu/news_releases/2015/July/jupyter.html, http://bids.berkeley.edu/news/project-jupyter-gets-6m-expand-collaborative-data-science-software [14-19]
➤ Jupyterhub at NERSC and OpenMSI, S. Cholia and O. Ruebel, Jupyterhub Day presentation, July 17, 2015 [42, 43]
➤ music21 website http://web.mit.edu/music21/ [45]
➤ Jeremy Freeman http://jeremyfreeman.net/ PyData Talk NYC Winter 2015 https://github.com/freeman-lab/talk-nyc-winter-2015 [56, 57, 58]
➤ CodeNeuro website http://codeneuro.org/ [59-60]
➤ Binder website http://mybinder.org/ [61]
➤ Images
➤ [2, 10, 21, 27, 30, 62, 64] Galaxy
➤ [23] Hummingbird https://flic.kr/p/mo5pa1
➤ [25] Netflix Prize Christopher Hefele https://flic.kr/p/6LWT6K
➤ [3-7, 8 (artwork FabLab interns), 9, 20, 22, 24, 26, 42, 43, 46, 57, 63] Carol Willing. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
➤ For additional information
➤ Jupyter www.jupyter.org
➤ Python Software Foundation www.python.org
➤ Carol Willing, [email protected], @willingcarol, GitHub: willingc