Continuum Analytics and Python
Travis Oliphant, CEO & Co-Founder, Continuum Analytics

Upload: travis-oliphant

Post on 17-Jan-2017


TRANSCRIPT

Page 1: Continuum Analytics and Python

Continuum Analytics and Python

Travis Oliphant CEO, Co-Founder Continuum Analytics

Page 2: Continuum Analytics and Python

ABOUT CONTINUUM

2

Page 3: Continuum Analytics and Python

Travis Oliphant - CEO

3

• PhD 2001 from Mayo Clinic in Biomedical Engineering
• MS/BS degrees in Elec. Comp. Engineering
• Creator of SciPy (1999-2009)
• Professor at BYU (2001-2007)
• Author of NumPy (2005-2012)
• Started Numba (2012)
• Founding Chair of NumFOCUS / PyData
• Previous PSF Director

SciPy

Page 4: Continuum Analytics and Python

Started as a Scientist / Engineer

4

Images from BYU CERS Lab

Page 5: Continuum Analytics and Python

Science led to Python

5

Raja Muthupillai

Armando Manduca

Richard Ehman Jim Greenleaf

1997

\rho_0 (2\pi f)^2 U_i(a, f) = [C_{ijkl}(a, f)\, U_{k,l}(a, f)]_{,j}, \qquad \Xi = \nabla \times U

Page 6: Continuum Analytics and Python

“Distractions” led to my calling

6

Page 7: Continuum Analytics and Python

7

Data Science

Page 8: Continuum Analytics and Python

8

• Data volume is growing exponentially within companies. Most don't know how to harvest its value or how to really compute on it.

• Growing mess of tools, databases, and products. New products increase integration headaches, instead of simplifying.

• New hardware & architectures are tempting, but are avoided or sit idle because of software challenges.

• Promises of the "Big Red Solve" button continue to disappoint. (If someone can give you this button, they are your competitor.)

Data Scientist Challenges

Page 9: Continuum Analytics and Python

Our Solution

9

• A language-based platform is needed. No simple point-and-click app is enough to solve these business problems, which require advanced modeling and data exploration.

• That language must be powerful, yet still be accessible to domain experts and subject matter experts.

• That language must leverage the growing capability and rapid innovation in open-source.

Anaconda Platform: Enterprise platform for Data Exploration, Advanced Analytics, and Rapid Application Development and Deployment (RADD)

Harnesses the exploding popularity of the Python ecosystem that our principals helped create.

Page 10: Continuum Analytics and Python

Why Python?

10

Analyst
• Uses graphical tools
• Can call functions, cut & paste code
• Can change some variables
Gets paid for: Insight
Tools: Excel, VB, Tableau, ... Python

Analyst / Data Developer
• Builds simple apps & workflows
• Used to be "just an analyst"
• Likes coding to solve problems
• Doesn't want to be a "full-time programmer"
Gets paid (like a rock star) for: Code that produces insight
Tools: SAS, R, Matlab, ... Python

Programmer
• Creates frameworks & compilers
• Uses IDEs
• Degree in CompSci
• Knows multiple languages
Gets paid for: Code
Tools: C, C++, Java, JS, ... Python

(Python serves all three roles.)

Page 11: Continuum Analytics and Python

Python Is Sweeping Education

11

Page 12: Continuum Analytics and Python

Tools used for Data

12

Source: O’Reilly Strata attendee survey 2012 and 2013

Page 13: Continuum Analytics and Python

Python for Data Science

13

http://readwrite.com/2013/11/25/python-displacing-r-as-the-programming-language-for-data-science

Page 14: Continuum Analytics and Python

Python is the top language in schools!

14

Page 15: Continuum Analytics and Python

OUR CUSTOMERS & OUR MARKET

15

Page 16: Continuum Analytics and Python

Some Users

16

Page 17: Continuum Analytics and Python

Anaconda: Game-changing Python distribution

17

"Hands down, the best Scientific Python distribution products for analytics and machine learning."

"Thanks for such a great product, been dabbling in python for years (2002), always knew it was going to be massive as it hit the sweet spot in so many ways, with llvm-numba its magic!!!"

"It was quick and easy and had everything included, even a decent IDE. I'm so grateful it brings tears to my eyes."

"I love Anaconda more than the sun, moon, and stars…"

Page 18: Continuum Analytics and Python

Anaconda: Game-changing Python distribution

18

• 2 million downloads in last 2 years
• 200k / month and growing
• conda package manager serves up 5 million packages per month
• Recommended installer for IPython/Jupyter, Pandas, SciPy, Scikit-learn, etc.

Page 19: Continuum Analytics and Python

Conferences & Community

19

• PyData: London, Berlin, New York, Bay Area, Seattle
• Strata: Tutorials, PyData Track
• PyCon, SciPy, EuroSciPy, EuroPython, PyCon Brasil...
• Spark Summit
• JSM, SIAM, IEEE Vis, ICML, ODSC, SuperComputing

Page 20: Continuum Analytics and Python

Observations & Off-the-record

20

• Hype is out of whack with reality
• Dashboards are "old school" BI, but still important for narrative/confirmatory analysis
• Agility of data engineering and exploration is critical
  • Get POCs out faster; iterate faster on existing things
• Need cutting-edge tools, but production is hard
• Notebooks, reproducibility, provenance - all matter

Page 21: Continuum Analytics and Python

http://tuulos.github.io/sf-python-meetup-sep-2013/#/

Page 22: Continuum Analytics and Python

Data Science Platforms

22

• All-in-one new "platform" startups are walled gardens
• Cloud vendor native capabilities are all about lock-in: "Warehouse all your data here!"
• Machine Learning and Advanced Analytics is too early, and disrupting too fast, to place bets on any single walled garden.
  • Especially since most have no experience with "exotic" regulatory and security requirements

Page 23: Continuum Analytics and Python

Good News

23

You can have a modern, advanced analytics system that integrates well with your infrastructure.

Bad News
It's not available as a SKU from any vendor.

META-PLATFORM CONSTRUCTION KIT

Page 24: Continuum Analytics and Python

Great News

24

• If done well, it adds deep, fundamental business capability
• Many Wall Street banks and firms are using this.
• All major Silicon Valley companies know this: Facebook, LinkedIn, Uber, Tesla, SpaceX, Netflix, ...

Page 25: Continuum Analytics and Python

EXAMPLE PROJECTS

25

Page 26: Continuum Analytics and Python

26

Bitcoin Dataset
Bitcoin is a digital currency invented in 2008 that operates on a peer-to-peer system for transaction validation. This decentralized currency attempts to mimic physical currencies: there is a limited supply of Bitcoins in the world, each Bitcoin must be "mined", and each transaction can be verified for authenticity. Bitcoins are used to exchange everyday goods and services, but they also have known ties to black markets, illicit drugs, and illegal gambling transactions. The dataset is also strongly inclined toward anonymization of behavior, though true anonymization is rarely achieved.

The Bitcoin Dataset
The Bitcoin dataset was obtained from http://compbio.cs.uic.edu/data/bitcoin/ and captures transaction-level information. For each transaction, there can be multiple senders and multiple receivers, as detailed here: https://en.bitcoin.it/wiki/Transactions. This dataset presents a challenge in that multiple addresses are usually associated with a single entity or person. However, some initial work has been done to associate keys with a single user by looking at transactions that are related to each other (for example, if a transaction has multiple public keys as input, then a single user owns all of the corresponding private keys). The dataset provides these known associations by grouping addresses together under a single UserId (which then maps to the set of all associated addresses).

Key Challenge Questions

1. Can we detect bulk Bitcoin thefts by hackers? Can we track where the money went after thefts?
2. Can we detect illicit transactions based on Bitcoin transaction behavior? What sort of graph patterns emerge?
3. Can we detect attempts at money laundering (called a "mixing service" in Bitcoin)?
   a. Can we detect money laundering attempts and the people who use them? Note: current Bitcoin mixing services tend to mix Bitcoins amongst all the people who bother to use a mixing service - so does the mixing service actually obfuscate anything?
   b. Can we trace back the originator of these laundering attempts?
4. Can we detect currency manipulation (hackers trying to destabilize Bitcoin currency exchanges to deflate prices)?
5. Is Bitcoin gaining or losing traction among the regular population for use as a regular digital currency?
6. It is Bitcoin best practice to generate and use a new address with every transaction. Is this practice followed? If not, what can we learn from this?
7. Can we identify and extract organizational behavior amidst the Bitcoin transactions?
8. Can we determine which Bitcoin addresses belong to a single entity? While the initial pass over the data has yielded some resolution of entities, can we further improve this mapping?

Bitcoin Data Set Overview (May 15, 2013)
# Transactions: 15.8 Million+
# Edges: 37.4 Million+
# Senders: 5.4 Million+
# Receivers: 6.3 Million+
# Bitcoins Transacted: 1.4 Million+

Figure 1: Bitcoin Transactions Over Time

Bitcoin Blockchain
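The graph questions above all start from the same representation: transactions expanded into sender-to-receiver edges. A minimal sketch in plain Python, using entirely hypothetical rows (the real dataset's schema and UserIds differ):

```python
from collections import defaultdict

# Hypothetical rows: (tx_id, sender_user_id, receiver_user_id, amount_btc).
# Real transactions can have many senders and receivers, so one transaction
# expands into several edges, as in the dataset described above.
transactions = [
    (1, "u1", "u2", 0.5),
    (1, "u1", "u3", 0.2),
    (2, "u2", "u3", 0.4),
    (3, "u3", "u1", 0.1),
]

# Build a directed multigraph as an adjacency list of (receiver, amount) pairs.
graph = defaultdict(list)
for tx_id, sender, receiver, amount in transactions:
    graph[sender].append((receiver, amount))

# Simple per-user features: out-degree and total BTC sent -- the kind of
# starting point the mixing-service and bulk-theft questions build on.
out_degree = {u: len(edges) for u, edges in graph.items()}
total_sent = {u: sum(a for _, a in edges) for u, edges in graph.items()}
```

At dataset scale (37.4M+ edges) the same structure would live in an out-of-core store rather than a Python dict, but the features are computed the same way.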

Page 27: Continuum Analytics and Python

Microcap Stock Fraud

27

Page 28: Continuum Analytics and Python

Memex Dark/Deep Web Analytics

28

Page 29: Continuum Analytics and Python

TECH & DEMOS

29

Page 30: Continuum Analytics and Python

Data Science @ NYT

30

@jakevdp

eSciences Institute, Univ. Washington

Page 31: Continuum Analytics and Python

31

• conda: cross-platform, multi-language package & container tool
• bokeh: interactive web plotting for Python, R; no JS/HTML required
• numba: JIT compiler for Python & NumPy, using LLVM; supports GPUs
• blaze: deconvolve data, expression, and computation; data-web
• dask: lightweight, fast, Pythonic scheduler for medium data
• xray: easily handle heterogeneously-shaped dense arrays
• holoviews: slice & view dense cubes of data in the browser
• seaborn: easy, beautiful, powerful statistical plotting
• beaker: polyglot alternative Notebook-like project

Page 32: Continuum Analytics and Python

32

• Databricks Canvas • Graphlab Create • Zeppelin • Beaker • Microsoft AzureML • Domino • Rodeo? Sense? • H2O, DataRobot, ...

Notebooks Becoming Table Stakes

Page 33: Continuum Analytics and Python

33

Page 34: Continuum Analytics and Python

34

"With more than 200,000 Jupyter notebooks already on GitHub we're excited to level-up the GitHub-Jupyter experience."

Page 35: Continuum Analytics and Python

Anaconda

35

✦ Centralized analytics environment • browser-based interface • deploys on existing infrastructure

✦ Collaboration • cross-functional teams using same data and software

✦ Publishing • code • data • visualizations

Page 36: Continuum Analytics and Python

Bokeh

36

http://bokeh.pydata.org

• Interactive visualization • Novel graphics • Streaming, dynamic, large data • For the browser, with or without a server • No need to write Javascript
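Assuming Bokeh is installed, a minimal sketch of the "no JavaScript" point: a plot defined entirely in Python and saved as self-contained HTML (the output file name here is arbitrary).

```python
from bokeh.plotting import figure
from bokeh.io import output_file, save

# A minimal static plot; the same figure objects can also be served live
# by bokeh-server for streaming/dynamic data.
output_file("lines.html")  # arbitrary output path for this sketch
p = figure(title="Simple line example", x_axis_label="x", y_axis_label="y")
p.line([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], line_width=2)
save(p)  # writes self-contained HTML -- no JavaScript written by hand
```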

Page 37: Continuum Analytics and Python

Versatile Plots

37

Page 38: Continuum Analytics and Python

Novel Graphics

38

Page 39: Continuum Analytics and Python

Previous: Javascript code generation

39

[Diagram: server.py renders plot.js.template into a JS string ("<d3.js> <highchart.js> <etc.js>") that ships to the browser's app model as HTML, targeting libraries like D3, Highcharts, Flot, Crossfilter, etc. One-shot; no MVC interaction; no data streaming.]

Page 40: Continuum Analytics and Python

bokeh.py & bokeh.js

40

[Diagram: server.py builds a bokeh.py object graph; bokeh-server serializes it to JSON; BokehJS reconstructs the object graph in the browser and renders it into the app model.]

Page 41: Continuum Analytics and Python

41

Page 42: Continuum Analytics and Python

42

4GB Interactive Web Viz

Page 43: Continuum Analytics and Python

rBokeh

http://hafen.github.io/rbokeh

Page 44: Continuum Analytics and Python

44

Page 47: Continuum Analytics and Python

47

http://nbviewer.ipython.org/github/bokeh/bokeh-notebooks/blob/master/tutorial/00 - intro.ipynb#Interaction

Page 48: Continuum Analytics and Python

Additional Demos & Topics

48

• Airline flights • Pandas table • Streaming / Animation • Large data rendering

Page 49: Continuum Analytics and Python

49

Latest Cosmological Theory

Page 50: Continuum Analytics and Python

50

Dark Data: CSV, hdf5, npz, logs, emails, and other files in your company outside a traditional data store

Page 51: Continuum Analytics and Python

51

Dark Data: CSV, hdf5, npz, logs, emails, and other files in your company outside a traditional data store

Page 52: Continuum Analytics and Python

52

Database Approach

Data Sources

Data Store

Data Sources

Clients

Page 53: Continuum Analytics and Python

53

Bring the Database to the Data

Data Sources

Data Sources

Clients
Blaze (datashape, dask)
NumPy, Pandas, SciPy, sklearn, etc. (for analytics)

Page 54: Continuum Analytics and Python

Anaconda — portable environments

54

PYTHON & R OPEN SOURCE ANALYTICS

NumPy, SciPy, Pandas, Scikit-learn, Jupyter / IPython, Numba, Matplotlib, Spyder, Numexpr, Cython, Theano, Scikit-image, NLTK, NetworkX, IRKernel, dplyr, shiny, ggplot2, tidyr, caret, nnet and 330+ packages

conda

Easy to install Quick & agile data exploration Powerful data analysis Simple to collaborate Accessible to all

Page 55: Continuum Analytics and Python

55

• Infrastructure for meta-data, meta-compute, and expression graphs/dataflow
• Data glue for scale-up or scale-out
• Generic remote computation & query system
• (NumPy+Pandas+LINQ+OLAP+PADL).mashup()

Blaze is an extensible interface for data analytics. It feels like NumPy/Pandas. It drives other data systems. Blaze expressions enable high-level reasoning.

http://blaze.pydata.org

Blaze
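The "expressions enable high-level reasoning" idea can be sketched without Blaze at all: build a symbolic expression tree first, then evaluate it later against a concrete backend. This toy is illustrative only and is not the real Blaze API.

```python
# Illustrative only: a toy deferred-expression tree in plain Python, showing
# the "expressions first, execution later" pattern that Blaze generalizes
# across backends (NumPy, Pandas, SQL, Spark, ...).

class Expr:
    def __init__(self, op, *args):
        self.op, self.args = op, args
    def __add__(self, other):
        return Expr("add", self, other)
    def __mul__(self, other):
        return Expr("mul", self, other)

class Symbol(Expr):
    def __init__(self, name):
        super().__init__("sym", name)

def compute(expr, scope):
    """Walk the expression tree against a concrete backend (here: a dict)."""
    if expr.op == "sym":
        return scope[expr.args[0]]
    left, right = (compute(a, scope) for a in expr.args)
    return left + right if expr.op == "add" else left * right

t = Symbol("t")
result = compute(t * t + t, {"t": 3})   # 3*3 + 3 = 12
```

Because the tree exists before execution, a real system can inspect it, optimize it, and dispatch it to whichever runtime holds the data.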

Page 56: Continuum Analytics and Python

56

Blaze

?

Page 57: Continuum Analytics and Python

57

Expressions, Metadata, Runtime

Blaze

Page 58: Continuum Analytics and Python

58

Blaze
• Expressions: + - / * ^ [], join, groupby, filter, map, sort, take, where, topk
• Metadata: datashape, dtype, shape, stride
• Storage formats: hdf5, json, csv, xls, protobuf, avro, ...
• Backends: NumPy, Pandas, R, Julia, K, SQL, Spark, Mongo, Cassandra, ...

Page 59: Continuum Analytics and Python

59

Data Runtime Expressions
• Expressions: blaze
• Metadata: datashape
• Data / runtime: numpy, pandas, sql DB, spark
• Storage: castra, bcolz (moved via odo)
• Parallel / optimized: dask, numba, DyND

Page 60: Continuum Analytics and Python

60

Data Runtime Expressions
• APIs, syntax, language: blaze
• Metadata: datashape
• Storage/containers: odo
• Compute: dask (parallelize, optimize, JIT)

Page 61: Continuum Analytics and Python

61

Page 62: Continuum Analytics and Python

62

Blaze Server
Provide a RESTful web API over any data supported by Blaze.

Server side:

>>> my_spark_rdd = …
>>> from blaze import Server
>>> Server(my_spark_rdd).run()
Hosting computation on localhost:6363

Client side:

$ curl -H "Content-Type: application/json" \
    -d '{"expr": {"op": "sum", "args": [ ... ]}}' \
    my.domain.com:6363/compute.json

• Quickly share local data to collaborators on the web.

• Expose any system (Mongo, SQL, Spark, in-memory) simply

• Share local computation as well, sending computations to server to run remotely.

• Conveniently drive remote server with interactive Blaze client
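The shape of the request/response cycle above can be sketched with a toy evaluator for the JSON expression payload. This is illustrative only; the real Blaze server wire format is richer than a flat op/args pair.

```python
import json

# Toy evaluator for JSON expression payloads like the curl example above:
# the client ships an expression, the server computes it over local data.
OPS = {
    "sum": sum,
    "max": max,
    "min": min,
}

def compute_json(payload: str):
    expr = json.loads(payload)["expr"]
    return OPS[expr["op"]](expr["args"])

result = compute_json('{"expr": {"op": "sum", "args": [1, 2, 3, 4]}}')  # 10
```

In the real system the args would name server-side datasets (Mongo, SQL, Spark, in-memory) rather than carrying literal values.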

Page 63: Continuum Analytics and Python

63

Dask: Out-of-Core Scheduler
• A parallel computing framework
• That leverages the excellent Python ecosystem
• Using blocked algorithms and task scheduling
• Written in pure Python

Core Ideas
• Dynamic task scheduling yields sane parallelism
• Simple library to enable parallelism
• Dask.array/dataframe to encapsulate the functionality
• Distributed scheduler coming
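Dask's core representation is a plain dict mapping keys to either values or (function, dependency-keys...) tuples. A minimal sketch of that idea with a tiny recursive "scheduler" (dask's real schedulers add ordering, caching, and parallel execution on top of essentially this structure):

```python
from operator import add, mul

# A task graph as a plain dict: values, or (function, arg-keys...) tuples.
graph = {
    "x": 1,
    "y": 2,
    "z": (add, "x", "y"),      # z = x + y
    "w": (mul, "z", "z"),      # w = z * z
}

def get(dsk, key):
    """Tiny single-threaded scheduler: resolve a key by recursing on deps."""
    task = dsk[key]
    if isinstance(task, tuple) and callable(task[0]):
        func, *deps = task
        return func(*(get(dsk, d) for d in deps))
    return task

result = get(graph, "w")   # (1 + 2) * (1 + 2) = 9
```

Blocked algorithms (next slides) just generate larger graphs of this same shape, one task per chunk.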

Page 64: Continuum Analytics and Python

Example: Ocean Temp Data

64

• http://www.esrl.noaa.gov/psd/data/gridded/data.noaa.oisst.v2.highres.html

• Every 1/4 degree, 720x1440 array each day

Page 65: Continuum Analytics and Python

Bigger data...

65

36 years: 720 x 1440 x 12341 x 4 bytes = 51 GB uncompressed.
If you don't have this much RAM...

... better start chunking.
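"Chunking" here means a blocked reduction: visit one chunk at a time and combine partial results, never holding the full 51 GB in RAM. A small sketch with NumPy (tiny stand-in arrays instead of the daily 720 x 1440 grids); this is what dask.array automates.

```python
import numpy as np

def chunked_mean(make_chunk, n_chunks):
    """Blocked mean: accumulate (sum, count) per chunk, combine at the end."""
    total, count = 0.0, 0
    for i in range(n_chunks):
        chunk = make_chunk(i)          # e.g. one day's 720 x 1440 grid
        total += chunk.sum()
        count += chunk.size
    return total / count

# Small stand-in for the daily grids: day i is filled with the value i.
days = [np.full((4, 6), float(i)) for i in range(10)]
mean = chunked_mean(lambda i: days[i], len(days))   # mean of 0..9 = 4.5
```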

Page 66: Continuum Analytics and Python

DAG of Computation

66

Page 67: Continuum Analytics and Python

Simple Architecture

67

Page 68: Continuum Analytics and Python

Core Concepts

68

Page 69: Continuum Analytics and Python

dask.array: OOC, parallel, ND array

69

• Arithmetic: +, *, ...
• Reductions: mean, max, ...
• Slicing: x[10:, 100:50:-2]
• Fancy indexing: x[:, [3, 1, 2]]
• Some linear algebra: tensordot, qr, svd
• Parallel algorithms (approximate quantiles, topk, ...)
• Slightly overlapping arrays
• Integration with HDF5
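dask.array deliberately mirrors the NumPy interface, so the operations listed above look identical on a plain NumPy array (dask only adds chunking and a final .compute()). Assuming NumPy is installed:

```python
import numpy as np

x = np.arange(200 * 200).reshape(200, 200).astype(float)

y = x + 1                       # arithmetic
m = x.mean()                    # reductions
s = x[10:, 100:50:-2]           # slicing
f = x[:, [3, 1, 2]]             # fancy indexing
t = np.tensordot(x, x, axes=1)  # some linear algebra
q, r = np.linalg.qr(x)
```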

Page 70: Continuum Analytics and Python

dask.dataframe: OOC, parallel dataframe

70

• Elementwise operations: df.x + df.y
• Row-wise selections: df[df.x > 0]
• Aggregations: df.x.max()
• groupby-aggregate: df.groupby(df.x).y.max()
• Value counts: df.x.value_counts()
• Drop duplicates: df.x.drop_duplicates()
• Join on index: dd.merge(df1, df2, left_index=True, right_index=True)
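dask.dataframe implements a pandas-compatible subset, so each operation above has a direct pandas counterpart (dask only adds partitioning and .compute()). Assuming pandas is installed, with a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, -2, 3, 3], "y": [10, 20, 30, 40]})

elementwise = df.x + df.y                 # elementwise operations
selected = df[df.x > 0]                   # row-wise selections
max_x = df.x.max()                        # aggregations
grouped = df.groupby(df.x).y.max()        # groupby-aggregate
counts = df.x.value_counts()              # value counts
deduped = df.x.drop_duplicates()          # drop duplicates

df2 = pd.DataFrame({"z": [100, 200]}, index=[0, 1])
joined = pd.merge(df, df2, left_index=True, right_index=True)  # join on index
```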

Page 71: Continuum Analytics and Python

More Complex Graphs

71

cross validation

Page 72: Continuum Analytics and Python

72

http://continuum.io/blog/xray-dask

Page 73: Continuum Analytics and Python

73

from dask import dataframe as dd

columns = ["name", "amenity", "Longitude", "Latitude"]
data = dd.read_csv('POIWorld.csv', usecols=columns)
with_name = data[data.name.notnull()]
with_amenity = data[data.amenity.notnull()]
is_starbucks = with_name.name.str.contains('[Ss]tarbucks')
is_dunkin = with_name.name.str.contains('[Dd]unkin')

starbucks = with_name[is_starbucks]
dunkin = with_name[is_dunkin]

locs = dd.compute(starbucks.Longitude, starbucks.Latitude,
                  dunkin.Longitude, dunkin.Latitude)

# extract arrays of values from the series:
lon_s, lat_s, lon_d, lat_d = [loc.values for loc in locs]

%matplotlib inline
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

def draw_USA():
    """initialize a basemap centered on the continental USA"""
    plt.figure(figsize=(14, 10))
    return Basemap(projection='lcc', resolution='l',
                   llcrnrlon=-119, urcrnrlon=-64,
                   llcrnrlat=22, urcrnrlat=49,
                   lat_1=33, lat_2=45, lon_0=-95,
                   area_thresh=10000)

m = draw_USA()
# Draw map background
m.fillcontinents(color='white', lake_color='#eeeeee')
m.drawstates(color='lightgray')
m.drawcoastlines(color='lightgray')
m.drawcountries(color='lightgray')
m.drawmapboundary(fill_color='#eeeeee')

# Plot the values in Starbucks green and Dunkin' Donuts orange
style = dict(s=5, marker='o', alpha=0.5, zorder=2)
m.scatter(lon_s, lat_s, latlon=True, label="Starbucks",
          color='#00592D', **style)
m.scatter(lon_d, lat_d, latlon=True, label="Dunkin' Donuts",
          color='#FC772A', **style)
plt.legend(loc='lower left', frameon=False);

Page 74: Continuum Analytics and Python

74

• Dynamic, just-in-time compiler for Python & NumPy
• Uses LLVM
• Outputs x86 and GPU code (CUDA, HSA)
• (Premium version is in the Accelerate product)

http://numba.pydata.org

Numba

Page 75: Continuum Analytics and Python

Python Compilation Space

75

Ahead Of Time Just In Time

Relies on CPython / libpython

Cython Shedskin

Nuitka (today) Pythran

NumbaHOPE

Theano

Replaces CPython / libpython

Nuitka (future) Pyston PyPy

Page 76: Continuum Analytics and Python

Example

76

Numba

Page 77: Continuum Analytics and Python

77

@jit('void(f8[:,:],f8[:,:],f8[:,:])')
def filter(image, filt, output):
    M, N = image.shape
    m, n = filt.shape
    for i in range(m//2, M-m//2):
        for j in range(n//2, N-n//2):
            result = 0.0
            for k in range(m):
                for l in range(n):
                    result += image[i+k-m//2, j+l-n//2] * filt[k, l]
            output[i, j] = result

~1500x speed-up

Page 78: Continuum Analytics and Python

Features

78

• Windows, OS X, and Linux
• 32- and 64-bit x86 CPUs and NVIDIA GPUs
• Python 2 and 3
• NumPy versions 1.6 through 1.9
• Does not require a C/C++ compiler on the user's system
• < 70 MB to install
• Does not replace the standard Python interpreter (all of your existing Python libraries are still available)

Page 79: Continuum Analytics and Python

How Numba Works

79

[Diagram: a Python function's bytecode plus its runtime argument types flow through Bytecode Analysis → Type Inference → Numba IR → Rewrite IR → Lowering → LLVM IR → LLVM JIT → Machine Code, which is cached and then executed.

@jit
def do_math(a, b): ...
>>> do_math(x, y)]
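The first stage of that pipeline, bytecode analysis, can be peeked at with only the standard library: the `dis` module exposes the CPython bytecode that Numba starts from.

```python
import dis

# Numba begins by analyzing a function's CPython bytecode; `dis` shows
# exactly what that raw material looks like.
def do_math(a, b):
    return a * b + 1

instructions = [ins.opname for ins in dis.Bytecode(do_math)]
# e.g. load instructions, binary ops, and a final RETURN_VALUE
```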

Page 80: Continuum Analytics and Python

THE ANACONDA PLATFORM

80

Page 81: Continuum Analytics and Python

Anaconda — portable environments

81

PYTHON & R OPEN SOURCE ANALYTICS

NumPy, SciPy, Pandas, Scikit-learn, Jupyter / IPython, Numba, Matplotlib, Spyder, Numexpr, Cython, Theano, Scikit-image, NLTK, NetworkX, IRKernel, dplyr, shiny, ggplot2, tidyr, caret, nnet and 330+ packages

conda

Easy to install Quick & agile data exploration Powerful data analysis Simple to collaborate Accessible to all

Page 82: Continuum Analytics and Python

82

• cross-platform package manager
• can create sandboxes ("environments"), akin to Windows Portable Applications or WinSxS
• "un-container" for deploying data science/data processing workflows

http://conda.pydata.org

Conda

Page 83: Continuum Analytics and Python

System Package Managers

83

yum (rpm)

apt-get (dpkg)

Linux OSX

macports

homebrew

fink

Windows

chocolatey

npackd

Cross-platform

conda

Page 84: Continuum Analytics and Python

84

• Excellent support for “system-level” environments — like having mini VMs but much lighter weight than docker (micro containers)

• Minimizes code-copies (uses hard/soft links when possible)
• Simple format: binary tarball + metadata
• Metadata allows static analysis of dependencies
• Easy to create multiple "channels", which are repositories for packages
• User-installable (no root privileges needed)
• Integrates very well with pip
• Cross-platform

Conda features

Page 85: Continuum Analytics and Python

Anaconda Cloud: analytics repository

85

• Commercial long-term support
• Licensed for redistribution
• Private, on-premises available
• Proprietary tools for building custom distributions, like Anaconda
• Enterprise tools for managing custom packages and environments
• http://anaconda.org

Page 86: Continuum Analytics and Python

Anaconda Cluster: Anaconda + Hadoop + Spark

86

For data scientists:
• Rapidly, easily create clusters on EC2, DigitalOcean, on-prem cloud/provisioner
• Manage Python, R, Java, JS packages across the cluster

For operations & IT:
• Robustly manage runtime state across the cluster (outside the scope of rpm, chef, puppet, etc.)
• Isolate/sandbox packages & libraries for different jobs or groups of users, without introducing the complexity of Docker / virtualization
• Cross-platform: same tooling for laptops, workstations, servers, clusters

Page 87: Continuum Analytics and Python

Cluster Creation

87

$ conda cluster create mycluster --profile=spark_profile
$ conda cluster submit mycluster mycode.py
$ conda cluster destroy mycluster

spark_profile:
  provider: aws_east
  num_nodes: 4
  node_id: ami-3c994355
  node_type: m1.large

aws_east:
  secret_id: <aws_access_key_id>
  secret_key: <aws_secret_access_key>
  keyname: id_rsa.pub
  location: us-east-1
  private_key: ~/.ssh/id_rsa
  cloud_provider: ec2
  security_group: all-open

http://continuumio.github.io/conda-cluster/quickstart.html

Page 88: Continuum Analytics and Python

88

$ conda cluster manage mycluster list
... info -e
... install python=3 pandas flask
... set_env
... push_env <local> <remote>

$ conda cluster ssh mycluster
$ conda cluster run.cmd mycluster "cat /etc/hosts"

Package & environment management:

Easy SSH & remote commands:

http://continuumio.github.io/conda-cluster/manage.html

Cluster Management

Page 89: Continuum Analytics and Python

Anaconda Cluster & Spark

89

# example.py
conf = SparkConf()
conf.setMaster("yarn-client")
conf.setAppName("MY APP")
sc = SparkContext(conf=conf)
# analysis
sc.parallelize(range(1000)).map(lambda x: (x, x % 2)).take(10)

$ conda cluster submit MY_CLUSTER /path/to/example.py

Page 90: Continuum Analytics and Python

Python & Spark in Practice

90

Challenges of real-world usage
• Package management (perennial popular topic in Python)
• Python (& R) are outside the "normal" Java build toolchain
  • bash scripts, spark jobs to pip install or conda install <foo>
  • Kind of OK for batch; terrible for interactive use
• Rapid iteration
• Production vs dev/test clusters
• Data scientist needs vs Ops/IT concerns

Page 91: Continuum Analytics and Python

Fix it twice…

91

PEP 3118: Revising the buffer protocol

Basically the "structure" of NumPy arrays as a protocol in Python itself, establishing a memory-sharing standard between objects. It makes possible a heterogeneous world of powerful array-like objects outside of NumPy that can communicate with each other.

Falls short in not defining a general data description language (DDL).

http://python.org/dev/peps/pep-3118/
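The buffer protocol can be demonstrated with the standard library alone: `array.array` exports its buffer and `memoryview` consumes it, so two objects share one block of memory along with the NumPy-style format/shape metadata PEP 3118 standardized.

```python
import array

# Two stdlib objects sharing one memory block via the PEP 3118 protocol.
a = array.array("d", [1.0, 2.0, 3.0])
m = memoryview(a)            # zero-copy view over the array's buffer
m[0] = 99.0                  # writing through the view mutates the array

# The view carries the "structure of NumPy arrays" metadata:
fmt, itemsize, shape = m.format, m.itemsize, m.shape   # "d", 8, (3,)
```

NumPy consumes the same protocol, which is how `numpy.frombuffer` and similar zero-copy interchange work.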

Page 92: Continuum Analytics and Python

Putting Rest of NumPy in Std Lib

92

• memtype
  • dtype system on memory-views
  • extensible with Numba and C
  • extensible with Python
• gufunc
  • generalized low-level function dispatch on memtype
  • extensible with Numba and C
  • usable by any Python

Working on this now with a (small) team — could use funding

Page 93: Continuum Analytics and Python

93

• Python has had a long and fruitful history in Data Analytics
• It will have a long and bright future with your help!
• Contribute to the PyData community and make the world a better place!

The Future of Python

Page 94: Continuum Analytics and Python

© 2015 Continuum Analytics - Confidential & Proprietary

Thanks

October 1, 2015

• SIG for hosting tonight and inviting me to come
• DARPA XDATA program (Chris White and Wade Shen), which helped fund Numba, Blaze, Dask and Odo
• Investors of Continuum
• Clients and customers of Continuum who help support these projects
• NumFOCUS volunteers
• PyData volunteers