JupyterHub for Interactive Data Science Collaboration

For Interactive Data Science Collaboration CineGrid December 10, 2015

Upload: carol-willing

Post on 19-Jan-2017

Category:

Software


TRANSCRIPT

Page 1: JupyterHub for Interactive Data Science Collaboration

For Interactive

Data Science Collaboration

CineGrid December 10, 2015

Page 2: JupyterHub for Interactive Data Science Collaboration

HELLO

Page 3: JupyterHub for Interactive Data Science Collaboration

CAROL WILLING

➤ Python Software Foundation, Director

➤ Project Jupyter, Contributor

➤ Fab Lab San Diego, Geek in Residence

Page 4: JupyterHub for Interactive Data Science Collaboration

WRITER

Page 5: JupyterHub for Interactive Data Science Collaboration

MANAGER AND

ANALYST

Page 6: JupyterHub for Interactive Data Science Collaboration

ENGINEER

Page 7: JupyterHub for Interactive Data Science Collaboration

ARTIST

Page 8: JupyterHub for Interactive Data Science Collaboration

TEACHER

Page 9: JupyterHub for Interactive Data Science Collaboration

WONDER AND

CURIOSITY

Page 10: JupyterHub for Interactive Data Science Collaboration

PROJECT JUPYTER

Just the Facts

Page 11: JupyterHub for Interactive Data Science Collaboration

JUPYTER NOTEBOOK

Page 12: JupyterHub for Interactive Data Science Collaboration

The Notebook: “Literate Computing”

Computational Narratives

❖ Computers deal with code and data.

❖ Humans deal with narratives that communicate.

Literate Computing (not Literate Programming)

narratives anchored in a live computation, that communicate a story based on data and results.

Cf: Mathematica, Maple, MuPad, Sage…
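As a rough illustration of how a notebook interleaves narrative and computation, the sketch below builds a minimal notebook document in the v4 JSON layout using only the standard library. The cell contents are invented for illustration and are not taken from the talk.

```python
import json

# A minimal notebook in the v4 JSON layout: one narrative (markdown) cell
# followed by one code cell. The cell contents are invented for illustration.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# Mean of the measurements\n",
                "The narrative explains what the next cell computes.",
            ],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # not yet executed
            "metadata": {},
            "outputs": [],
            "source": ["data = [2, 4, 6]\n", "sum(data) / len(data)"],
        },
    ],
}

# A .ipynb file on disk is this structure serialized as plain JSON text.
ipynb_text = json.dumps(notebook, indent=1)

# List the cell types in order: narrative first, then computation.
cell_types = [cell["cell_type"] for cell in notebook["cells"]]
print(cell_types)
```

Because the narrative and the code live in one document, the story and the computation that backs it travel together.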

Page 13: JupyterHub for Interactive Data Science Collaboration
Page 14: JupyterHub for Interactive Data Science Collaboration

“Project Jupyter serves not only the academic and scientific communities but also a much broader constituency of data scientists in research, education, industry and journalism…

- Fernando Pérez UC Berkeley

Page 15: JupyterHub for Interactive Data Science Collaboration

“…we see uses of our tools that range from high school education in programming to the nation’s supercomputing facilities and the leaders of the tech industry.

- Fernando Pérez UC Berkeley

Page 16: JupyterHub for Interactive Data Science Collaboration

“More than a million people are currently using Jupyter for everything from…

-Prof. Brian Granger Cal Poly

Page 17: JupyterHub for Interactive Data Science Collaboration

“…analyzing massive gene sequencing datasets to processing images from the Hubble Space Telescope and developing models of financial markets.

-Prof. Brian Granger Cal Poly

Page 18: JupyterHub for Interactive Data Science Collaboration

“We are excited by the potential of Project Jupyter to reach even wider audiences and to contribute to increased cross-disciplinary collaboration in the sciences.

-Betsy Fader Helmsley Charitable Trust

Page 19: JupyterHub for Interactive Data Science Collaboration

“Jupyter Notebook… will enable data exploration, visualization, and analysis in a way that encourages sound science and speeds progress.

-Chris Mentzel The Gordon and Betty Moore Foundation

Page 20: JupyterHub for Interactive Data Science Collaboration
Page 21: JupyterHub for Interactive Data Science Collaboration

DATA CHALLENGES Constraints or Opportunities?

Page 22: JupyterHub for Interactive Data Science Collaboration

SCALE

Page 23: JupyterHub for Interactive Data Science Collaboration

SPEED

Page 24: JupyterHub for Interactive Data Science Collaboration

CHOICES

Page 25: JupyterHub for Interactive Data Science Collaboration

CONNECTIONS

Page 26: JupyterHub for Interactive Data Science Collaboration
Page 27: JupyterHub for Interactive Data Science Collaboration

OPPORTUNITIES

Use our strengths

Page 28: JupyterHub for Interactive Data Science Collaboration

“The purpose of computing is insight, not numbers”

– Hamming '62

Page 29: JupyterHub for Interactive Data Science Collaboration

The Lifecycle of a Scientific Idea (schematically)

1. Individual exploratory work

2. Collaborative development

3. Parallel production runs (HPC, cloud, ...)

4. Publication & communication (reproducibly!)

5. Education

6. Goto 1.

Page 30: JupyterHub for Interactive Data Science Collaboration

JUPYTERHUB

and Project Jupyter ecosystem

Page 31: JupyterHub for Interactive Data Science Collaboration
Page 32: JupyterHub for Interactive Data Science Collaboration

EDUCATION

Page 33: JupyterHub for Interactive Data Science Collaboration

nbviewer: seamless notebook sharing

❖ Zero-install reading of notebooks

❖ Just share a URL

❖ nbviewer.ipython.org
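"Just share a URL" can be sketched as follows for a notebook stored in a GitHub repository. The helper name, the example user and repository, and the default host `nbviewer.org` are assumptions made for illustration (the slide lists nbviewer.ipython.org). The path layout `/github/<user>/<repo>/blob/<ref>/<path>` is the form nbviewer uses for GitHub-hosted notebooks.

```python
from urllib.parse import quote


def nbviewer_github_url(user, repo, ref, path, host="nbviewer.org"):
    """Build a read-only nbviewer link for a notebook stored on GitHub.

    Hypothetical helper: `host` is an assumed default, not taken from the talk.
    """
    parts = [user, repo, "blob", ref] + path.split("/")
    # Percent-encode each path segment so spaces and similar characters survive.
    return "https://" + host + "/github/" + "/".join(quote(p) for p in parts)


# Placeholder user, repository, and notebook path.
url = nbviewer_github_url(
    "some-user", "some-repo", "master", "notebooks/Demo Analysis.ipynb"
)
print(url)
```

The reader needs only a browser; nothing is installed on their side.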

Page 34: JupyterHub for Interactive Data Science Collaboration

Executable books

❖ Springer hardcover book

❖ Chapters: IPython Notebooks

❖ Posted as a blog entry

❖ All available as a GitHub repo

Python for Signal Processing, by José Unpingco

Page 35: JupyterHub for Interactive Data Science Collaboration

University Courses

These are just some we are aware of!

Page 36: JupyterHub for Interactive Data Science Collaboration

A collaborative MOOC on OpenEdX

http://lorenabarba.com/news/announcing-practical-numerical-methods-with-python-mooc

❖ Lorena Barba at George Washington University, USA.

❖ Ian Hawke at Southampton, UK.

❖ Carlos Jerez at Pontifical Catholic University of Chile.

❖ All materials on GitHub.

Page 37: JupyterHub for Interactive Data Science Collaboration

Changing the scientific culture

http://www.nature.com/news/interactive-notebooks-sharing-the-code-1.16261

Page 38: JupyterHub for Interactive Data Science Collaboration

Executable papers: the future?

http://www.nature.com/news/ipython-interactive-demo-7.21492?article=1.16261

Page 39: JupyterHub for Interactive Data Science Collaboration

Notebook Workflows: The Big Picture

Image credit: Joshua Barratt

Page 40: JupyterHub for Interactive Data Science Collaboration

Lots more! The IPython Gallery

https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks

Page 41: JupyterHub for Interactive Data Science Collaboration

GOVERNMENT

Page 42: JupyterHub for Interactive Data Science Collaboration

Shreyas Cholia & Oliver Ruebel

NERSC Data & Analytics Services Group

Jupyterhub Day, July 17 2015

Jupyterhub at NERSC and OpenMSI

Page 43: JupyterHub for Interactive Data Science Collaboration

NERSC is the Production HPC & Data Facility for DOE Office of Science Research

Bio Energy, Environment

Computing

Materials, Chemistry, Geophysics

Particle Physics, Astrophysics

Largest funder of physical science research in U.S.

Nuclear Physics

Fusion Energy, Plasma Physics

Page 44: JupyterHub for Interactive Data Science Collaboration

ART

Page 45: JupyterHub for Interactive Data Science Collaboration
Page 46: JupyterHub for Interactive Data Science Collaboration
Page 47: JupyterHub for Interactive Data Science Collaboration

BUSINESS

Page 48: JupyterHub for Interactive Data Science Collaboration

Quantopian: algorithmic trading

Karen Rubin, Dir. Product Management at Quantopian

Quantopian Research Post Fortune.com

Page 49: JupyterHub for Interactive Data Science Collaboration

Microsoft: Python Tools for Visual Studio

Shahrokh Mortazavi, Dino Viehland, Wenming Ye, Dennis Gannon.

Page 50: JupyterHub for Interactive Data Science Collaboration

Microsoft Azure: Notebooks in the Cloud

Page 51: JupyterHub for Interactive Data Science Collaboration

Google CoLaboratory

Kayur Patel, Kester Tong, Mark Sanders, Corinna Cortes @ Google

Matt Turk @ NCSA/UIUC

Page 52: JupyterHub for Interactive Data Science Collaboration

IBM Watson

Page 53: JupyterHub for Interactive Data Science Collaboration

SCIENCE

Page 54: JupyterHub for Interactive Data Science Collaboration

JupyterHub: multiuser support

❖ Out of the box

❖ Unix accounts

❖ Local single-user notebooks

❖ Customizable

❖ Authentication: OAuth, LDAP, etc.

❖ Subprocess control: Docker, VMs, etc.
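The customization points above map onto a `jupyterhub_config.py` file. What follows is a minimal sketch rather than a recommended deployment. It assumes the separate `oauthenticator` and `dockerspawner` packages are installed, and it leaves out the OAuth credentials and Docker image settings a real setup would need. The `SimpleNamespace` fallback is only there so the fragment also runs as a plain script.

```python
from types import SimpleNamespace

try:
    # `get_config` is injected by JupyterHub when it loads jupyterhub_config.py.
    c = get_config()  # noqa: F821
except NameError:
    # Stand-in object so this sketch also runs outside JupyterHub.
    c = SimpleNamespace(JupyterHub=SimpleNamespace())

# Out of the box, JupyterHub authenticates against local Unix accounts and
# starts each user's single-user notebook server as a local process.
# The two settings below swap in the customizations named on the slide.

# Authentication: OAuth through GitHub (assumes `oauthenticator` is installed).
c.JupyterHub.authenticator_class = "oauthenticator.GitHubOAuthenticator"

# Subprocess control: one Docker container per user (assumes `dockerspawner`).
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
```

Leaving both settings out gives the default behavior of Unix accounts and local single-user notebooks.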

Page 55: JupyterHub for Interactive Data Science Collaboration

JupyterHub in Education @ Berkeley

https://developer.rackspace.com/blog/deploying-jupyterhub-for-education

❖ Computationally intensive course, ~220 students

❖ Fully hosted environment, zero-install

❖ Homework management and grading (with B. Granger)

Jess Hamrick @ Cal

K. Kelley @ Rackspace

M. Ragan-Kelley @ Cal

B. Granger @ Cal Poly

Page 56: JupyterHub for Interactive Data Science Collaboration
Page 57: JupyterHub for Interactive Data Science Collaboration
Page 58: JupyterHub for Interactive Data Science Collaboration
Page 59: JupyterHub for Interactive Data Science Collaboration
Page 60: JupyterHub for Interactive Data Science Collaboration
Page 61: JupyterHub for Interactive Data Science Collaboration
Page 62: JupyterHub for Interactive Data Science Collaboration

COLLABORATION

Why?

Page 63: JupyterHub for Interactive Data Science Collaboration

A ten year journey.

Optimism and hope for the future.

Page 64: JupyterHub for Interactive Data Science Collaboration

IMAGINE THE POSSIBILITIES

Page 65: JupyterHub for Interactive Data Science Collaboration

TRY.JUPYTER.ORG

Page 66: JupyterHub for Interactive Data Science Collaboration

WE’RE OPEN FOR YOU.

Page 67: JupyterHub for Interactive Data Science Collaboration

THANK YOU

try.jupyter.org

www.jupyter.org

numfocus.org ipython.org

Page 68: JupyterHub for Interactive Data Science Collaboration
Page 69: JupyterHub for Interactive Data Science Collaboration

CREDITS AND ATTRIBUTION

➤ Sources

➤ Jupyter website www.jupyter.org [11, 31, 65, 66, 69]

➤ Fernando Pérez [12, 28, 29, 33-40, 48-52, 53-55] http://fperez.org/ BIDS http://bids.berkeley.edu/

➤ Cal Poly and UC Berkeley Press Releases http://calpolynews.calpoly.edu/news_releases/2015/July/jupyter.html, http://bids.berkeley.edu/news/project-jupyter-gets-6m-expand-collaborative-data-science-software [14-19]

➤ Jupyterhub at NERSC and OpenMSI, S. Cholia and O. Ruebel, Jupyterhub Day presentation, July 17, 2015 [42, 43]

➤ music21 website http://web.mit.edu/music21/ [45]

➤ Jeremy Freeman http://jeremyfreeman.net/ PyData Talk NYC Winter 2015 https://github.com/freeman-lab/talk-nyc-winter-2015 [56, 57, 58]

➤ CodeNeuro website http://codeneuro.org/ [59-60]

➤ Binder website http://mybinder.org/ [61]

➤ Images

➤ [2, 10, 21, 27, 30, 62, 64] Galaxy

➤ [23] Hummingbird https://flic.kr/p/mo5pa1

➤ [25] Netflix Prize Christopher Hefele https://flic.kr/p/6LWT6K

➤ [3-7, 8 (artwork FabLab interns), 9, 20, 22, 24, 26, 42, 43, 46, 57, 63] Carol Willing.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

➤ For additional information

➤ Jupyter www.jupyter.org

➤ Python Software Foundation www.python.org

➤ Carol Willing, @willingcarol, GitHub: willingc