numba: flexible analytics written in python with machine-code speeds and avoiding the gil- travis...

Numba: Flexible analytics written in Python

With machine code speeds while potentially releasing the GIL

Space of Python Compilation

Relies on CPython / libpython
• Ahead Of Time: Cython, Shedskin, Nuitka (today), Pythran, Numba
• Just In Time: Numba, HOPE, Theano, Pyjion

Replaces CPython / libpython
• Ahead Of Time: Nuitka (future)
• Just In Time: Pyston, PyPy

Compiler overview

[Diagram: Parsing Frontend (C++, C, Fortran, ObjC) → Intermediate Representation (IR) → Code Generation Backend (x86, ARM, PTX)]

Numba

[Diagram: Python → Numba (Parsing Frontend) → Intermediate Representation (IR) → LLVM (Code Generation Backend) → x86, ARM, PTX]

Example: Numba

How Numba works

@jit
def do_math(a, b):
    …

>>> do_math(x, y)

[Diagram: the Python Function goes through Bytecode Analysis and the Function Arguments go through Type Inference to produce Numba IR → Rewrite IR → Lowering → LLVM IR → LLVM JIT → Machine Code → Cache → Execute!]
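The pipeline above can be observed from Python. The following is a minimal sketch, assuming a recent Numba release; the body of do_math is elided on the slide, so `scaled_sum` is a hypothetical stand-in, and the `try`/`except` fallback is an assumption added only so the sketch also runs as plain Python when Numba is not installed.

```python
try:
    from numba import jit
except ImportError:  # assumption: run as plain Python when Numba is absent
    def jit(*args, **kwargs):
        return args[0] if args and callable(args[0]) else (lambda f: f)


@jit(nopython=True)
def scaled_sum(a, b):
    # Hypothetical body; the types of a and b are inferred at the first call.
    return 2 * a + b


int_result = scaled_sum(3, 4)        # first call: compiles an integer specialization
float_result = scaled_sum(1.5, 2.0)  # new argument types: compiles a float specialization
cached_result = scaled_sum(5, 6)     # reuses the cached integer specialization

# With Numba present, the dispatcher records one signature per specialization.
print(getattr(scaled_sum, "signatures", "Numba not installed"))
```

Each distinct combination of argument types triggers one pass through the pipeline; later calls with the same types go straight to the cached machine code.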

Numba Features

• Numba supports:
  – Windows, OS X, and Linux
  – 32 and 64-bit x86 CPUs and NVIDIA GPUs
  – Python 2 and 3
  – NumPy versions 1.6 through 1.9
• Does not require a C/C++ compiler on the user’s system.
• < 70 MB to install.
• Does not replace the standard Python interpreter (all of your existing Python libraries are still available).

Numba Modes

• object mode: Compiled code operates on Python objects. The only significant performance improvement comes from compiling loops that can be compiled in nopython mode (see below).

• nopython mode: Compiled code operates on “machine native” data. Usually within 25% of the performance of equivalent C or FORTRAN.
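A minimal sketch of a nopython-mode function, assuming a recent Numba release where `njit` is shorthand for `jit(nopython=True)`. The function name `sum_of_squares` is hypothetical, and the fallback decorator is an assumption added so the sketch also runs without Numba.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: run as plain Python when Numba is absent
    def njit(*args, **kwargs):
        return args[0] if args and callable(args[0]) else (lambda f: f)


@njit  # nopython mode: the loop works on native float64 values
def sum_of_squares(values):
    total = 0.0
    for v in values:
        total += v * v
    return total


data = np.arange(5, dtype=np.float64)
result = sum_of_squares(data)
print(result)
```

In nopython mode, code that Numba cannot type raises a compilation error rather than silently running on Python objects, which makes it easier to tell whether the fast path is actually in use.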

How to Use Numba

1. Create a realistic benchmark test case. (Do not use your unit tests as a benchmark!)

2. Run a profiler on your benchmark. (cProfile is a good choice.)

3. Identify hotspots that could potentially be compiled by Numba with a little refactoring. (See the rest of this talk and the online documentation.)

4. Apply @numba.jit and @numba.vectorize as needed to critical functions. (Small rewrites may be needed to work around Numba limitations.)

5. Re-run the benchmark to check whether there was a performance improvement.
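Steps 1 to 3 can be sketched with the standard library alone. In this sketch `slow_kernel` and `benchmark` are hypothetical stand-ins for a real workload; only `cProfile` and `pstats` are used, and nothing here is specific to Numba.

```python
import cProfile
import io
import pstats


def slow_kernel(n):
    # Hypothetical hotspot: a pure-Python numerical loop.
    total = 0.0
    for i in range(n):
        total += (i % 7) * 0.5
    return total


def benchmark():
    # Stand-in for a realistic benchmark case.
    return sum(slow_kernel(20_000) for _ in range(20))


profiler = cProfile.Profile()
profiler.enable()
benchmark_result = benchmark()
profiler.disable()

# Report the functions with the largest cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

A function such as `slow_kernel`, which dominates the report and consists of a numerical loop, is the kind of hotspot that steps 4 and 5 would then target.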

A Whirlwind Tour of Numba Features

• Sometimes you can’t create a simple or efficient array expression or ufunc. Use Numba to work with array elements directly.

• Example: Suppose you have a boolean grid and you want to find the maximum number of neighbors a cell has in the grid.
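The boolean-grid example could look roughly like the sketch below. It is not the code from the slides: the name `max_neighbors` is hypothetical, and it assumes that a neighbor is any of the 8 surrounding cells, that every cell is considered (set or not), and that the grid does not wrap at its edges. The fallback decorator is an assumption so the sketch runs without Numba.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: run as plain Python when Numba is absent
    def njit(*args, **kwargs):
        return args[0] if args and callable(args[0]) else (lambda f: f)


@njit
def max_neighbors(grid):
    rows, cols = grid.shape
    best = 0
    for i in range(rows):
        for j in range(cols):
            count = 0
            # Visit the 8 surrounding cells, skipping positions off the grid.
            for di in range(-1, 2):
                for dj in range(-1, 2):
                    if di == 0 and dj == 0:
                        continue
                    ni = i + di
                    nj = j + dj
                    if 0 <= ni < rows and 0 <= nj < cols and grid[ni, nj]:
                        count += 1
            if count > best:
                best = count
    return best


grid = np.array([[1, 1, 0],
                 [1, 1, 0],
                 [0, 0, 0]], dtype=np.bool_)
busiest = max_neighbors(grid)
print(busiest)
```

Explicit element-wise loops like these are slow in the plain interpreter but are the style of code that nopython mode compiles well.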

The Basics

[Code example with annotations:]
• Numba decorator (nopython=True not required)
• Array Allocation
• Looping over ndarray x as an iterator
• Using numpy math functions
• Returning a slice of the array

2.7x speedup!
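The annotated features can be combined in one short function. This is an illustrative sketch and not necessarily the code shown on the slide; `drop_nans` is a hypothetical name, and the fallback decorator is an assumption so the sketch runs without Numba.

```python
import numpy as np

try:
    from numba import jit
except ImportError:  # assumption: run as plain Python when Numba is absent
    def jit(*args, **kwargs):
        return args[0] if args and callable(args[0]) else (lambda f: f)


@jit  # Numba decorator
def drop_nans(x):
    out = np.empty_like(x)         # array allocation
    n_kept = 0
    for element in x:              # looping over ndarray x as an iterator
        if not np.isnan(element):  # using a NumPy math function
            out[n_kept] = element
            n_kept += 1
    return out[:n_kept]            # returning a slice of the array


sample = np.array([1.0, np.nan, 2.5, np.nan, 4.0])
compacted = drop_nans(sample)
print(compacted)
```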

Calling Other Functions

[Code example with two annotated callees: “This function is not inlined” and “This function is inlined”]

9.8x speedup compared to doing this with numpy functions
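A minimal sketch of one compiled function calling another, assuming a recent Numba release. The functions `clip` and `clipped_mean` are hypothetical and are not the ones from the slide; whether the helper ends up inlined is left to the compiler here. The fallback decorator is an assumption so the sketch runs without Numba.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: run as plain Python when Numba is absent
    def njit(*args, **kwargs):
        return args[0] if args and callable(args[0]) else (lambda f: f)


@njit
def clip(value, lo, hi):
    # Small scalar helper, callable from other compiled functions.
    if value < lo:
        return lo
    if value > hi:
        return hi
    return value


@njit
def clipped_mean(values, lo, hi):
    total = 0.0
    for v in values:
        total += clip(v, lo, hi)  # call into the other compiled function
    return total / values.size


mean_value = clipped_mean(np.array([-1.0, 0.5, 2.0]), 0.0, 1.0)
print(mean_value)
```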

Making Ufuncs


Monte Carlo simulating 500,000 tournaments in 50 ms
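A minimal sketch of turning a scalar function into a ufunc with `@vectorize`. The tournament simulation itself is not reproduced here; `win_probability` is a hypothetical scalar kernel chosen only for illustration, and `np.vectorize` is used as an assumed stand-in when Numba is not installed.

```python
import math

import numpy as np

try:
    from numba import vectorize
except ImportError:  # assumption: fall back to the slower NumPy wrapper
    vectorize = np.vectorize


@vectorize
def win_probability(rating_a, rating_b):
    # Hypothetical scalar kernel: logistic win probability from a rating gap.
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b)))


ratings = np.array([0.0, 1.0, 2.0])
probs = win_probability(ratings, 0.0)  # broadcasts like any NumPy ufunc
print(probs)
```

The function body is written for scalars; broadcasting over arrays is handled by the ufunc machinery.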

Case-study -- j0 from scipy.special

• scipy.special was one of the first libraries I wrote (in 1999)
• extended “umath” module by adding new “universal functions” to compute many scientific functions by wrapping C and Fortran libs.
• Bessel functions are solutions to a differential equation:

$$x^2 \frac{d^2 y}{dx^2} + x \frac{dy}{dx} + (x^2 - \alpha^2)\, y = 0$$

$$y = J_\alpha(x)$$

$$J_n(x) = \frac{1}{\pi} \int_0^\pi \cos(n\tau - x \sin\tau)\, d\tau$$

scipy.special.j0 wraps cephes algorithm

Don’t need this anymore!

Result --- equivalent to compiled code

In [6]: %timeit vj0(x)
10000 loops, best of 3: 75 us per loop

In [7]: from scipy.special import j0

In [8]: %timeit j0(x)
10000 loops, best of 3: 75.3 us per loop

But! Now code is in Python and can be experimented with more easily (and moved to the GPU / accelerator more easily)!
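As one such experiment, the integral representation of the Bessel function can be evaluated directly. This sketch is not the cephes algorithm and not the `vj0` from the slides: `j0_integral` is a hypothetical name, the 200-point midpoint rule is an arbitrary choice that suits small arguments only, and `np.vectorize` is an assumed stand-in when Numba is not installed.

```python
import math

import numpy as np

try:
    from numba import vectorize
except ImportError:  # assumption: fall back to the slower NumPy wrapper
    vectorize = np.vectorize


@vectorize
def j0_integral(x):
    # Midpoint rule for (1/pi) * integral over [0, pi] of cos(x * sin(tau)),
    # which is the n = 0 case of the integral representation.
    n_steps = 200
    h = math.pi / n_steps
    total = 0.0
    for k in range(n_steps):
        tau = (k + 0.5) * h
        total += math.cos(x * math.sin(tau))
    return total * h / math.pi


# 2.404825557695773 is the first positive zero of J0.
xs = np.array([0.0, 1.0, 2.404825557695773])
values = j0_integral(xs)
print(values)
```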

Word starting to get out!

A recent Numba mailing list post reports the experiments of a SciPy author who got a 2x speed-up by removing their Cython type annotations and wrapping the function with numba.jit (with a few minor changes needed to the code).

As soon as Numba’s ahead-of-time compilation moves beyond the experimental stage, one can legitimately use Numba to create a library that is shipped to others (who then don’t need to have Numba installed, or just need a Numba run-time installed).

SciPy (and NumPy) would look very different if Numba had existed 16 years ago when SciPy was getting started… and you would all be happier.

Generators
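A minimal sketch of a compiled generator, assuming a recent Numba release. The slide's own example is not reproduced; `running_totals` is a hypothetical name, and the fallback decorator is an assumption so the sketch runs without Numba.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: run as plain Python when Numba is absent
    def njit(*args, **kwargs):
        return args[0] if args and callable(args[0]) else (lambda f: f)


@njit
def running_totals(values):
    # Yields the cumulative sum after each element.
    total = 0.0
    for v in values:
        total += v
        yield total


totals = [t for t in running_totals(np.array([1.0, 2.0, 3.5]))]
print(totals)
```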

Releasing the GIL

• Many fret about the GIL in Python
• With the PyData stack you often have multi-threaded code
• In the PyData stack we quite often release the GIL:
  – NumPy does it
  – SciPy does it (quite often)
  – Scikit-learn (now) does it
  – Pandas (now) does it when possible
  – Cython makes it easy
  – Numba makes it easy

Releasing the GIL

Only nopython mode functions can release the GIL

Releasing the GIL

2.8x speedup with 4 cores
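A minimal sketch of running a GIL-releasing function from several threads, assuming a recent Numba release. The workload is hypothetical (`chunk_sum_of_squares` is not from the slides) and far too small to show a speedup; it only illustrates the `nogil=True` option together with a standard thread pool. The fallback decorator is an assumption so the sketch runs without Numba.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

try:
    from numba import jit
except ImportError:  # assumption: run as plain Python when Numba is absent
    def jit(*args, **kwargs):
        return args[0] if args and callable(args[0]) else (lambda f: f)


@jit(nopython=True, nogil=True)  # the GIL is released while this function runs
def chunk_sum_of_squares(chunk):
    total = 0.0
    for v in chunk:
        total += v * v
    return total


data = np.arange(1_000, dtype=np.float64)
chunks = np.array_split(data, 4)  # one chunk per worker thread

with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(chunk_sum_of_squares, chunks))

total = sum(partial_sums)
print(total)
```

Because each thread spends its time inside compiled code that does not hold the GIL, the chunks can be processed on separate cores.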

CUDA Python (in open-source Numba!)

CUDA Development using Python syntax for optimal performance!

You have to understand CUDA at least a little: writing kernels that launch in parallel on the GPU

Example: Black-Scholes

Black-Scholes: Results

[Chart: Core i7 compared with GeForce GTX 560 Ti]

• About 9x faster on this GPU
• ~ same speed as CUDA-C

Other interesting things

• CUDA Simulator to debug your code in Python interpreter
• Generalized ufuncs (@guvectorize)
• Call ctypes and cffi functions directly and pass them as arguments
• Preliminary support for types that understand the buffer protocol
• Pickle Numba functions to run on remote execution engines
• “numba annotate” to dump HTML annotated version of compiled code
• See: http://numba.pydata.org/numba-doc/0.20.0/


What Doesn’t Work?

(A non-comprehensive list)
• Sets, lists, dictionaries, user defined classes (tuples do work!)
• List, set and dictionary comprehensions
• Recursion
• Exceptions with non-constant parameters
• Most string operations (buffer support is very preliminary!)
• yield from
• closures inside a JIT function (compiling JIT functions inside a closure works…)
• Modifying globals
• Passing an axis argument to numpy array reduction functions
• Easy debugging (you have to debug in Python mode).

The (Near) Future

(Also a non-comprehensive list)
• “JIT Classes”
• Better support for strings/bytes, buffers, and parsing use-cases
• More coverage of the Numpy API (advanced indexing, etc)
• Documented extension API for adding your own types, low level function implementations, and targets.
• Better debug workflows

Conclusion

• Lots of progress in the past year!
• Try out Numba on your numerical and Numpy-related projects: conda install numba
• Your feedback helps us make Numba better! Tell us what you would like to see: https://github.com/numba/numba
• Stay tuned for more exciting stuff this year…


221 W. 6th Street Suite #1550 Austin, TX 78701 +1 512.222.5440


@ContinuumIO

Thanks to the entire Numba team and Numba users! Stan Seibert, Antoine Pitrou, Siu Kwan Lam, Jon Riehl, Graham Markall, Oscar Villellas, Jay Borque and a host of others…
