data science apps: beyond notebooks

Data Science Apps: Beyond Notebooks Natalino Busa - Head of Data Science

Upload: natalino-busa

Post on 21-Feb-2017

TRANSCRIPT

Linkedin and Twitter:

@natbusa

Icons made by Gregor Cresnar from www.flaticon.com, licensed under CC 3.0 BY

Learning: The Scientific Method

Ørsted's "First Introduction to General Physics" (1811) https://en.m.wikipedia.org/wiki/History_of_scientific_method

observation hypothesis deduction synthesis

Hans Christian Ørsted

experiment

Data Scientist Experience

CloudTools AI & ML

The Jupyter Project
http://jupyter.org

Jupyter notebook: what is it?

The Jupyter Notebook

Jupyter notebook: why?

Language of choice

The Notebook has support for over 40 programming languages, including those popular in Data Science such as Python, R, Julia and Scala.

Share notebooks

Notebooks can be shared with others using email, Dropbox, GitHub and the Jupyter Notebook Viewer.

Interactive widgets

Code can produce rich output such as images, videos, LaTeX, and JavaScript. Interactive widgets can be used to manipulate and visualize data in realtime.

Big data integration

Leverage big data tools, such as Apache Spark, from Python, R and Scala. Explore that same data with pandas, scikit-learn, ggplot2, dplyr, etc.

Text Cell

Code Cell

Cell Input

Cell Output

Edit, Run, Kernel, Widgets menus

Kernel Type

Cell output: ASCII, HTML, image, etc.
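The cell anatomy above maps onto the notebook file itself, which is plain JSON. The sketch below builds a minimal notebook in the nbformat 4 layout with one text cell and one code cell; the cell contents and the `python3` kernel name are illustrative assumptions, not taken from the slides.

```python
import json

# A minimal notebook document in the nbformat 4 layout.
# Cell contents and kernel name are illustrative assumptions.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {
        # Kernel type: which kernel runs the code cells.
        "kernelspec": {
            "name": "python3",
            "display_name": "Python 3",
            "language": "python",
        }
    },
    "cells": [
        # Text cell: markdown source, no outputs.
        {"cell_type": "markdown", "metadata": {}, "source": "# A text cell"},
        # Code cell: the input lives in "source", the output in "outputs".
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": "print('hello')",
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": "hello\n"}
            ],
        },
    ],
}

# An .ipynb file is this structure serialized as JSON.
serialized = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(serialized)["cells"]]
print(cell_types)
```

Because the format is plain JSON, any web application can read, render or generate notebooks without going through the notebook web app.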

Architecture of a Jupyter Notebook

• Modular architecture:

Web App, Server, Kernel

• Kernels:

Python, R, Scala, Julia, Bash, SPARKQL

• Web App:

Asynchronous, rich editing, syntax highlighting, export and share

Jupyter Notebook

● Narratives and Use Cases

Narratives are collaborative, shareable, publishable, and reproducible. We believe that Narratives help both yourself and other researchers by sharing your use of Jupyter projects, technical specifics of your deployment, and installation and configuration tips so that others can learn from your experiences.

From https://jupyter.readthedocs.io/en/latest/use-cases/content-user.html

Jupyter is more than Notebooks

Examples of Jupyter-powered narratives

Orioles: A powerful educational narrative

Build your own narrative!

What do you need?

1. Understand how to communicate with the Jupyter server
   Two ways: WebSockets or HTTP API endpoints

2. Build your own web application
   Many ways: e.g. Angular, Polymer, Dart, etc.
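As a rough illustration of the WebSocket route, the sketch below builds the JSON frame a client would send to a running kernel to execute code. The message layout follows the Jupyter messaging protocol and the URL follows the kernel channels endpoint (`/api/kernels/<kernel-id>/channels`); the host, the client name `narrative-app` and the protocol version string are assumptions for illustration, and the kernel id would normally come from a prior `POST /api/kernels` call.

```python
import json
import uuid
from datetime import datetime, timezone


def kernel_channels_url(host, kernel_id):
    """WebSocket URL for the channels of a running kernel."""
    return f"ws://{host}/api/kernels/{kernel_id}/channels"


def build_execute_request(code, session_id=None):
    """Build an execute_request message in the Jupyter messaging layout."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session_id or uuid.uuid4().hex,
            "username": "narrative-app",  # hypothetical client name
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.0",  # assumed protocol version
        },
        "parent_header": {},
        "metadata": {},
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
        },
        "channel": "shell",  # execution requests travel on the shell channel
        "buffers": [],
    }


message = build_execute_request("1 + 1")
frame = json.dumps(message)  # the text frame to send over the WebSocket
print(kernel_channels_url("localhost:8888", "<kernel-id>"))
print(message["header"]["msg_type"], "->", message["channel"])
```

Sending the frame needs a WebSocket client library, which is left out here to keep the sketch dependency-free. The HTTP route avoids this message handling entirely, at the cost of defining each endpoint up front.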

Example: autoscience demo

Purpose:

- Quick exploration of data sets

- No coding required

- Visual analysis of outliers

Jupyter Gateway: expose API endpoints

Declare the endpoint

Produce the JSON payload

GET http://localhost:8800/cog/datasets/1
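The notebook code for this slide is not in the transcript, so here is a minimal sketch of what such an endpoint cell could look like in the kernel gateway's notebook-http mode. The parts that come from the gateway are the `# GET /path/:param` annotation comment, the injected `REQUEST` JSON string, and the use of stdout as the response body. Everything else is assumed for illustration: the `DATASETS` store, the `describe_dataset` helper, and the payload shape, which is inferred from the fields the front-end template reads (`ds.rows`, `ds.cols`, `ds.na`, `vars`).

```python
import json

# Hypothetical in-memory data standing in for a real dataset store.
DATASETS = {
    "1": {
        "age": [34, 51, None, 29],
        "city": ["Utrecht", None, "Delft", "Leiden"],
    }
}


def describe_dataset(dataset_id, columns):
    """Summarize a dataset using the field names the front end reads."""
    n_rows = len(next(iter(columns.values()), []))
    # Rows with at least one missing value.
    na_rows = sum(
        any(col[i] is None for col in columns.values()) for i in range(n_rows)
    )
    # Columns with at least one missing value.
    na_cols = sum(any(v is None for v in col) for col in columns.values())
    variables = [
        {
            "id": idx,
            "name": name,
            "sample": [v for v in col if v is not None][:3],
            "type": {"nan": sum(v is None for v in col)},
        }
        for idx, (name, col) in enumerate(columns.items())
    ]
    return {
        "ds": {
            "id": dataset_id,
            "rows": n_rows,
            "cols": len(columns),
            "na": {"rows": na_rows, "cols": na_cols},
        },
        "vars": variables,
    }


# --- endpoint cell: in a notebook, the annotation is the cell's first line ---
# GET /cog/datasets/:id
try:
    REQUEST  # injected by the kernel gateway as a JSON string
except NameError:
    # Fallback so the sketch also runs outside the gateway.
    REQUEST = json.dumps({"path": {"id": "1"}})

request = json.loads(REQUEST)
dataset_id = request["path"]["id"]
payload = describe_dataset(dataset_id, DATASETS[dataset_id])
print(json.dumps(payload))  # stdout becomes the HTTP response body
```

In a real notebook the helper and the data loading would live in earlier cells, and the annotated cell would hold only the request handling.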

Jupyter Gateway: consume the data

Consume the JSON payload

GET http://localhost:8800/cog/datasets/1

app.controller('datasetCtrl', function ($scope, $routeParams, $http) {
  var id = $routeParams.id;
  $http({
    method: 'GET',
    url: '/cog/datasets/' + id
  }).then(function successCallback(response) {
    // this callback will be called asynchronously
    // when the response is available
    $scope.d = response.data;
  }, function errorCallback(response) {
    // called asynchronously if an error occurs
    // or server returns response with an error status.
  });
});
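The same endpoint can also be checked outside the browser. The sketch below fetches the payload with the Python standard library and prints the summary line the template renders. The base URL comes from the slide; the `fetch_dataset` and `summary_line` helpers and the injectable `opener` parameter are assumptions added so the example runs without a live gateway.

```python
import io
import json
from urllib.request import urlopen


def fetch_dataset(dataset_id, base_url="http://localhost:8800", opener=urlopen):
    """GET /cog/datasets/<id> and decode the JSON payload."""
    url = f"{base_url}/cog/datasets/{dataset_id}"
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))


def summary_line(d):
    """Mirror the template's headline: '<rows> obs. of <cols> variables'."""
    ds = d["ds"]
    return f'{ds["rows"]} obs. of {ds["cols"]} variables'


def fake_opener(url):
    """Offline stand-in for urlopen that returns a canned response."""
    canned = {"ds": {"id": "1", "rows": 4, "cols": 2}, "vars": []}
    return io.BytesIO(json.dumps(canned).encode("utf-8"))


d = fetch_dataset(1, opener=fake_opener)
line = summary_line(d)
print(line)
```

Dropping the `opener` argument makes the call go to the real gateway, which is a quick way to test the API notebook before wiring up the web front end.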

<div class="row">
  <div class="col-md-9 offset-md-2">
    <p class="small">{{d.ds.rows}} obs. of {{d.ds.cols}} variables <br/>
      NA rows:{{d.ds.na.rows}}, columns:{{d.ds.na.cols}}</p>
  </div>
</div>

...
<tr ng-repeat="v in d.vars">
  <td><a href="#/ds/{{d.ds.id}}/variables/{{v.id}}">{{v.name}}</a></td>
  <td class="small">{{ v.sample.toString() }}</td>
  <td>{{v.type.vtype}}</td>
  <td>{{v.type.tcoerce}}</td>
  <td>{{v.type.unique}}</td>
  <td>{{v.type.nan}}</td>
  <td>{{v.type.valid}}</td>
  <td>{{v.type.quality}}</td>

...

Jupyter Gateway: consume the data $scope.d

Render the angular scope object

Jupyter: docker stacks

Docker container: Jupyter Notebook + Apache Toree

Dockerize your jupyter gateway api

Add the jupyter gateway

FROM jupyter/all-spark-notebook

...

# add some extra packages
ADD packages /srv/
RUN pip install -r /srv/packages

# install the kernel gateway
RUN pip install jupyter_kernel_gateway
ENV JUPYTER_GATEWAY=1

# REST API is designed as notebooks
ADD notebooks /srv/notebooks

Add the notebook which powers the API

Dockerize your jupyter gateway api

IMAGE=autoscience/kernel_gateway

docker build -t "$IMAGE" .

docker run --rm -ti -p 8888:8888 "$IMAGE" \
  jupyter kernelgateway --KernelGatewayApp.ip=0.0.0.0 \
  --KernelGatewayApp.port=8888 \
  --KernelGatewayApp.api=notebook-http \
  --KernelGatewayApp.seed_uri=/srv/notebooks/autoscience.ipynb

Dockerize your jupyter gateway api

∅MQ

Notebook files

HTTP REST API

Docker Containers

Summary

• Jupyter notebook is a great way to create and share data-driven use cases and projects

• Jupyter is more than notebooks

– gateway, kernels, hub, etc.

• Narratives powered by Jupyter

– O'Reilly Orioles

– build your own: autoscience example
