Python as number crunching glue

Jiahao Chen
jiahao@mit.edu
@mitpostdoc

theochem.mit.edu

Thursday, September 22, 2011

This is not a crash course on scientific computing or numerical linear algebra

Recommended texts:

nr.com

NumPy and SciPy

How to say:

NumPy: no official pronunciation

SciPy: “sigh pie”

Where to get:

scipy.org, numpy.scipy.org

You might already have it

Otherwise, have fun installing it ;)

You may already know how to use numpy/scipy!

Similar to Matlab, Octave, Scilab, R.

see: http://mathesaurus.sourceforge.net/

In many cases, Matlab/Octave/Scilab code can be translated easily to use numpy+scipy+matplotlib.

Other interfaces exist: e.g. mlabwrap lets you wrap Python around Matlab.

Approximately continuous arithmetic: floating point*

- vs -

Exact discrete arithmetic: booleans, integers, strings, ...

*David Goldberg, “What every computer scientist should know about floating-point arithmetic”

Using numpy can make code cleaner

    a = range(10000000)
    b = range(10000000)
    c = []

    for i in range(len(a)):
        c.append(a[i] + b[i])

    import numpy as np
    a = np.arange(10000000)
    b = np.arange(10000000)
    c = a + b

What’s different?

    a = range(10000000)
    b = range(10000000)
    c = []  #a+b is concatenation

    for i in range(len(a)):
        c.append(a[i] + b[i])

    import numpy as np
    a = np.arange(10000000)
    b = np.arange(10000000)
    c = a + b  #vectorized addition

Using numpy can save lots of time

Pure Python loop: 7.050s
numpy: 0.333s (21x faster)

numpy is a convenient interface to compiled C/Fortran libraries: BLAS, LAPACK, FFTW, UMFPACK,...

range() creates a list of dynamically typed ints

np.arange() creates an ndarray of statically typed ints
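The comparison above can be reproduced with a small timing sketch. The array size N and the repeat count are assumptions chosen so the script finishes quickly (the slide uses 10000000 elements), and the measured ratio will differ from machine to machine.

```python
import timeit

import numpy as np

N = 100000  # assumed size, much smaller than the slide's 10000000


def add_with_loop(n=N):
    """Elementwise sum of two Python lists, one element at a time."""
    a = list(range(n))
    b = list(range(n))
    c = []
    for i in range(len(a)):
        c.append(a[i] + b[i])
    return c


def add_with_numpy(n=N):
    """Elementwise sum of two ndarrays in a single vectorized call."""
    a = np.arange(n)
    b = np.arange(n)
    return a + b


loop_result = add_with_loop()
numpy_result = add_with_numpy()

# Time five repetitions of each version; absolute numbers are machine dependent.
t_loop = timeit.timeit(add_with_loop, number=5)
t_numpy = timeit.timeit(add_with_numpy, number=5)
print("loop: %.4fs, numpy: %.4fs" % (t_loop, t_numpy))
```

Both versions compute the same values; only the time spent in the interpreter differs.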

Numerical software stack

[Diagram: Your code runs on Python, which calls external Fortran/C libraries: BLAS and LAPACK for linear algebra, plus Fourier transforms]

“One thing that graduate students eventually learn is that you can hide just about anything in a NxN matrix... (for sufficiently large N)” - anonymous string theorist

If your data can be put into a matrix/vector, numpy/scipy can help you!

You may already be working with matrix/vector data...

bitmap/video, waveform, database table, text, differential equation model

# Chapter NumPy SciPy

1 Scientific Computing

2 Systems of linear equations X X

3 Linear least squares X

4 Eigenvalue problems X X

5 Nonlinear equations X

6 Optimization X

7 Interpolation X

8 Numerical integration and differentiation X

9 Initial value problems for ODEs X

10 Boundary value problems for ODEs X

11 Partial differential equations X

12 Fast Fourier Transform X

13 Random numbers and stochastic simulation X

Table of contents from Michael Heath’s textbook

Outline:

* NumPy
    : explicit data typing with dtypes
    : array manipulation with ndarrays

* SciPy
    : high-level numerical routines
    : use cases

* NumPy/SciPy as code glue: f2py and weave

The most fundamental object in NumPy is the ndarray (N-dimensional array)

    v[:]            vector
    M[:,:]          matrix
    x[:,:,...,:]    higher order tensor

Unlike built-in Python data types, ndarrays are designed for homogeneous, explicitly typed data

numpy primitive dtypes

Bits  Boolean  Signed integer    Unsigned integer  Float           Complex
8     bool     int8              uint8
16             int16             uint16
32             int32             uint32            float32
64             int, intp, int64  uint, uint64      float, float64  complex64
128                                                float128        complex128
256                                                                complex256
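A short sketch of what explicit typing means in practice: each dtype has a fixed size in bytes, and fixed-width integers wrap around instead of growing like Python ints. The variable names here are illustrative only.

```python
import numpy as np

# Every dtype has a fixed storage size per element, in bytes.
sizes = {
    "uint8": np.dtype(np.uint8).itemsize,
    "float32": np.dtype(np.float32).itemsize,
    "float64": np.dtype(np.float64).itemsize,
    "complex128": np.dtype(np.complex128).itemsize,
}

# Fixed-width integers wrap around on overflow (arithmetic modulo 256 for uint8).
small = np.array([250, 251], dtype=np.uint8)
wrapped = small + np.array([10, 10], dtype=np.uint8)

# Converting to a wider type first avoids the wraparound.
widened = small.astype(np.int16) + 10

print(sizes)
print(wrapped, wrapped.dtype)
print(widened, widened.dtype)
```

Choosing the narrowest dtype that safely holds the data saves memory, at the cost of having to think about overflow.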

dtypes bring explicit typing to Python

Structured array: ndarray of data structure

    >>> mol = np.zeros(3, dtype=('uint8, 3float64'))
    >>> mol[0] = 8, (-0.464, 0.177, 0.0)
    >>> mol[1] = 1, (-0.464, 1.137, 0.0)
    >>> mol[2] = 1, (0.441, -0.143, 0.0)
    >>> mol
    array([(8, [-0.46400000000000002, 0.17699999999999999, 0.0]),
           (1, [-0.46400000000000002, 1.137, 0.0]),
           (1, [0.441, -0.14299999999999999, 0.0])],
          dtype=[('f0', '|u1'), ('f1', '<f8', (3,))])

Recarray: ndarray of data structure with named fields (record)

    >>> mol = np.array(mol, dtype={'atomicnum':('uint8',0), 'coords':('3float64',1)})
    >>> mol['atomicnum']
    array([8, 1, 1], dtype=uint8)
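As a complementary sketch, the field names can also be declared up front with a list-of-tuples dtype, which avoids the default names f0 and f1. The name atom_dtype and the centroid calculation are illustrative additions, not part of the slides.

```python
import numpy as np

# One record per atom: an 8-bit atomic number and three float64 coordinates.
atom_dtype = np.dtype([("atomicnum", np.uint8), ("coords", np.float64, (3,))])

mol = np.zeros(3, dtype=atom_dtype)
mol[0] = (8, (-0.464, 0.177, 0.0))
mol[1] = (1, (-0.464, 1.137, 0.0))
mol[2] = (1, (0.441, -0.143, 0.0))

# Indexing by field name returns a plain ndarray over that field.
atomic_numbers = mol["atomicnum"]
centroid = mol["coords"].mean(axis=0)

print(atomic_numbers)
print(centroid)
```

Each field behaves like an ordinary ndarray, so the usual vectorized operations apply to it directly.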

The most fundamental object in NumPy is the ndarray (N-dimensional array)

In 2D, the matrix class is also useful, especially when porting Matlab/Octave code.

* For matrices, a*b is matrix multiply. For ndarrays, a*b is elementwise multiply.

* Matrices have convenient attributes:
    M.T    transpose of M
    M.H    Hermitian conjugate of M
    M.I    matrix inverse of M

* Matrices are always 2D, no matter how you manipulate them.
  ****** This can lead to some very severe, insidious bugs. ******

Using asarray() and asmatrix() views allows the best of both worlds.
see: http://docs.scipy.org/doc/numpy/reference/arrays.classes.html#matrix-objects
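The differences listed above can be seen in a minimal sketch. The sample values are made up; np.dot is used as the explicit matrix product for ndarrays.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[0, 1], [1, 0]])

# ndarray: * is elementwise, np.dot is the matrix product.
elementwise = a * b
matrix_product = np.dot(a, b)

# matrix views of the same data: * is the matrix product.
A = np.asmatrix(a)
B = np.asmatrix(b)
matrix_star = A * B

# Indexing a row of a matrix stays 2D; the same row of an ndarray is 1D.
row_of_matrix = A[0]
row_of_array = np.asarray(A)[0]

print(elementwise)
print(matrix_star)
print(row_of_matrix.shape, row_of_array.shape)
```

The last line is where the "always 2D" behaviour tends to surprise: code written for 1D rows can silently receive a 1 x N matrix instead.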

Memory layout of matrices

column major: first dimension is contiguous in memory
    Fortran, Matlab, R,...

row major: last dimension is contiguous in memory
    C, Java, numpy,...

Why you should care:
• Cache locality
• Physically transposing a matrix (reordering it in memory) is very expensive
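A small sketch of how the two layouts show up in numpy, using strides (the number of bytes to step to reach the next element along each dimension). The array shape is arbitrary.

```python
import numpy as np

# Row major (C order) is numpy's default: the last index varies fastest.
c_order = np.zeros((3, 4), dtype=np.float64)

# Column major (Fortran order): the first index varies fastest.
f_order = np.asfortranarray(c_order)

print(c_order.strides)  # bytes per step along (rows, columns)
print(f_order.strides)

# .T only swaps the strides, so it is a cheap view on the same memory...
transposed_view = c_order.T

# ...whereas forcing the transpose into row major order copies every element.
transposed_copy = np.ascontiguousarray(transposed_view)
```

The copy in the last line is the expensive operation; the view costs nothing but iterates over memory in a cache-unfriendly order.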

Creating ndarrays

• from Python iterable: lists, tuples,...
    e.g. array([1, 2, 3]) == asarray((1, 2, 3))
• from intrinsic functions
    empty() allocates memory only
    zeros() initializes to 0
    ones() initializes to 1
    arange() creates a uniform range
    rand() initializes to uniform random
    randn() initializes to standard normal random
    ...
• from binary representation in string/buffer
• from file on disk

Generating ndarrays

fromfunction() creates an ndarray whose entries are functions of its indices

e.g. the Hilbert matrix

    >>> np.fromfunction(lambda i,j: 1./(i+j+1), (4,4))
    array([[ 1.        ,  0.5       ,  0.33333333,  0.25      ],
           [ 0.5       ,  0.33333333,  0.25      ,  0.2       ],
           [ 0.33333333,  0.25      ,  0.2       ,  0.16666667],
           [ 0.25      ,  0.2       ,  0.16666667,  0.14285714]])

arange(): like range() but accepts floats

    >>> import numpy as np
    >>> np.arange(2, 2.5, 0.1)
    array([ 2. ,  2.1,  2.2,  2.3,  2.4])

linspace(): creates array with specified number of elements, spaced equally between the specified beginning and ending.

    >>> np.linspace(2.0, 2.4, 5)
    array([ 2. ,  2.1,  2.2,  2.3,  2.4])
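A brief sketch tying these generators together: the Hilbert matrix from fromfunction() can be cross-checked against an index grid built from arange(), and linspace() fixes the number of points rather than the step. The comparison with broadcasting is an illustrative addition.

```python
import numpy as np

# Hilbert matrix from a function of the (row, column) indices.
hilbert = np.fromfunction(lambda i, j: 1.0 / (i + j + 1), (4, 4))

# The same matrix from explicit index vectors: a column of i plus a row of j.
i = np.arange(4).reshape(4, 1)
j = np.arange(4).reshape(1, 4)
hilbert_from_indices = 1.0 / (i + j + 1)

# linspace takes the number of points, so the endpoint and length are exact.
grid = np.linspace(2.0, 2.4, 5)

print(hilbert)
print(grid)
```

With a floating-point step, linspace() is usually the safer choice because the length of the result does not depend on rounding.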

ndarray native I/O

Format      Reader          Writer
pickle      pickle.loads()  dumps()
pickle      np.load()       dumps()
NPY         np.load()       np.save()
NPZ         np.load()       np.savez()
Memory map  np.memmap       np.memmap

NPY is numpy’s native binary format
NPZ is a zip file of NPYs
Memory map: a class useful for handling huge matrices; won’t load entire matrix into memory
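A round trip through the NPY and NPZ formats might look like the sketch below. The file names and the temporary directory are assumptions for illustration; mmap_mode="r" asks np.load() for a read-only memory map instead of reading the whole file.

```python
import os
import tempfile

import numpy as np

data = np.arange(12, dtype=np.float64).reshape(3, 4)

tmpdir = tempfile.mkdtemp()
npy_path = os.path.join(tmpdir, "data.npy")    # hypothetical file names
npz_path = os.path.join(tmpdir, "bundle.npz")

# NPY: one array per file.
np.save(npy_path, data)
from_npy = np.load(npy_path)

# NPZ: several named arrays zipped into one file.
np.savez(npz_path, first=data, second=data.T)
with np.load(npz_path) as bundle:
    names = sorted(bundle.files)
    second = bundle["second"]

# Memory map: only the parts that are indexed get read from disk.
mapped = np.load(npy_path, mmap_mode="r")
corner = np.array(mapped[:2, :2])
del mapped  # drop the map so the file is released
```

For arrays that fit in memory the plain np.load() call is simpler; the memory map pays off when only a slice of a very large file is needed.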

ndarray text I/O

Format         Reader                  Writer
String         eval()                  np.array_repr()
String         or below with StringIO  or below with StringIO
Text file      np.loadtxt()            savetxt()
               np.genfromtxt()
               np.recfromtxt()
CSV            np.recfromcsv()
Matrix Market  scipy.io.mmread()       mmwrite()
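The "with StringIO" rows mean that the text file readers and writers also accept in-memory file objects. A minimal sketch, assuming Python 3's io.StringIO and a made-up 2 x 2 table:

```python
import io

import numpy as np

table = np.array([[1.0, 2.5], [3.0, 4.5]])

# Write the array as comma-separated text into an in-memory buffer.
buffer = io.StringIO()
np.savetxt(buffer, table, fmt="%.3f", delimiter=",")
text = buffer.getvalue()

# Read it back from a second in-memory buffer.
round_trip = np.loadtxt(io.StringIO(text), delimiter=",")

print(text)
print(round_trip)
```

The same two calls work unchanged with a file name in place of the buffer.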

ndarray binary I/O

Format           Reader                       Writer
List             np.array()                   ndarray.tolist()
String           np.fromstring()              tostring()
String           or below with StringIO       or below with StringIO
Raw binary file  scipy.io.numpyio.fread()     fwrite()
                 np.fromfile()                .tofile()
MATLAB           scipy.io.loadmat()           savemat()
netCDF           scipy.io.netcdf.netcdf_file  scipy.io.netcdf.netcdf_file
WAV audio        scipy.io.wavfile.read()      write()
Image (via PIL)  scipy.misc.imread()          imsave()
                 scipy.misc.fromimage()       toimage()

Also video (OpenCV), HDF5 (PyTables), FITS (PyFITS)...

Indexing

    >>> x = np.arange(12).reshape(3,4); x
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11]])
    >>> x[1,2]  #row, then column
    6
    >>> x[2,-1]
    11
    >>> x[0][2]
    2
    >>> x[(2,2)]  #tuple
    10
    >>> x[:1]  #slices return views, not copies
    array([[0, 1, 2, 3]])
    >>> x[::2,1:4:2]
    array([[ 1,  3],
           [ 9, 11]])

Fancy indexing

    >>> x = np.arange(12).reshape(3,4); x
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11]])
    >>> x[(2,2)]
    10
    >>> x[np.array([2,2])]  #array index; same as x[[2,2]]
    array([[ 8,  9, 10, 11],
           [ 8,  9, 10, 11]])
    >>> x[np.array([1,0]), np.array([2,1])]
    array([6, 1])
    >>> x[x>8]  #Boolean mask
    array([ 9, 10, 11])
    >>> x>8
    array([[False, False, False, False],
           [False, False, False, False],
           [False,  True,  True,  True]], dtype=bool)
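One practical consequence of the two indexing styles is sketched below: a slice is a view onto the original data, while indexing with an integer array returns a copy. Assigning through a Boolean mask modifies the original in place. The specific values written are arbitrary.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

# A slice is a view: writing through it changes x.
row_view = x[:1]
row_view[0, 0] = 100

# Indexing with an integer array makes a copy: writing to it leaves x alone.
picked = x[np.array([2, 2])]
picked[0, 0] = -1

# Assignment through a Boolean mask updates x in place.
mask = x > 8
x[mask] = 0

print(x)
print(picked)
```

When in doubt, np.shares_memory() reports whether two arrays overlap in memory.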

Fancy indexing II

    >>> y = np.arange(1*2*3*4).reshape(1,2,3,4); y
    array([[[[ 0,  1,  2,  3],
             [ 4,  5,  6,  7],
             [ 8,  9, 10, 11]],

            [[12, 13, 14, 15],
             [16, 17, 18, 19],
             [20, 21, 22, 23]]]])
    >>> y[0, Ellipsis, 0]  # == y[0, ..., 0] == y[0,:,:,0]
    array([[ 0,  4,  8],
           [12, 16, 20]])
    >>> y[0, 0, 0, slice(2,4)]  # == y[0, 0, 0, 2:4]
    array([2, 3])

Broadcasting

What happens when you multiply ndarrays of different dimensions?

Case I: trailing dimensions match

    >>> x  #.shape = (3,4)
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11]])
    >>> y  #.shape = (1,2,3,4)
    array([[[[ 0,  1,  2,  3],
             [ 4,  5,  6,  7],
             [ 8,  9, 10, 11]],

            [[12, 13, 14, 15],
             [16, 17, 18, 19],
             [20, 21, 22, 23]]]])
    >>> y * x
    array([[[[  0,   1,   4,   9],
             [ 16,  25,  36,  49],
             [ 64,  81, 100, 121]],

            [[  0,  13,  28,  45],
             [ 64,  85, 108, 133],
             [160, 189, 220, 253]]]])

Broadcasting

What happens when you multiply ndarrays of different dimensions?

    >>> a = np.arange(4); a
    array([0, 1, 2, 3])
    >>> b = np.arange(4)[::-1]; b
    array([3, 2, 1, 0])
    >>> a + b
    array([3, 3, 3, 3])

Case II: trailing dimension is 1

    >>> b.shape = 4,1
    >>> a + b
    array([[3, 4, 5, 6],
           [2, 3, 4, 5],
           [1, 2, 3, 4],
           [0, 1, 2, 3]])

    >>> b.shape = 1,4
    >>> a + b
    array([[3, 3, 3, 3]])
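Both cases follow one rule: shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. The sketch below checks the result shapes with np.broadcast without computing any products; the mismatched example at the end is an illustrative addition.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
y = np.arange(24).reshape(1, 2, 3, 4)

# Case I: trailing dimensions (3, 4) match, so x is repeated over the leading ones.
shape_case1 = np.broadcast(y, x).shape

# Case II: a has shape (4,), b has shape (4, 1); the size-1 dimension is stretched.
a = np.arange(4)
b = np.arange(4)[::-1].reshape(4, 1)
shape_case2 = np.broadcast(a, b).shape

# Incompatible trailing dimensions (3 vs 4) raise an error instead of guessing.
try:
    np.arange(3) + np.arange(4)
    mismatch_raises = False
except ValueError:
    mismatch_raises = True

print(shape_case1, shape_case2, mismatch_raises)
```

No data is copied while broadcasting; the smaller operand is simply reused along the stretched dimensions.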

Matrix operations

Matrix functions

You can apply a function elementwise to a matrix...

    >>> from numpy import array, exp
    >>> X = array([[1, 1], [1, 0]])
    >>> exp(X)
    array([[ 2.71828183,  2.71828183],
           [ 2.71828183,  1.        ]])

...or a matrix version of that function

    >>> from scipy.linalg import expm
    >>> expm(X)
    array([[ 3.79824573,  2.01432273],
           [ 2.01432273,  1.783923  ]])

other functions in scipy.linalg.matfuncs
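For a symmetric matrix such as X, the matrix exponential can be cross-checked by exponentiating the eigenvalues, since expm(X) = V diag(exp(w)) V^T when X = V diag(w) V^T. The sketch below is an independent check of that identity, not how scipy computes expm internally.

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[1.0, 1.0], [1.0, 0.0]])

# Elementwise exponential: exp() applied to each entry separately.
elementwise = np.exp(X)

# Matrix exponential from scipy.
expm_direct = expm(X)

# Same quantity from the eigendecomposition of the symmetric matrix X.
w, V = np.linalg.eigh(X)
expm_via_eig = np.dot(V * np.exp(w), V.T)  # V * exp(w) scales the columns of V

print(elementwise)
print(expm_direct)
```

The two exponentials agree only for diagonal matrices; here the off-diagonal coupling makes them differ in every entry.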

SciPy by example

* Data fitting

* Signal matching

* Disease outbreak modeling (epidemiology)

http://scipy-central.org/

Least-squares curve fitting

    from numpy import cos, exp, linspace, pi
    from numpy.random import rand
    from scipy.optimize import leastsq
    from matplotlib.pyplot import plot, show

    #Make up data x(t) with noise
    num_points = 150
    t = linspace(5, 8, num_points)
    x = 11.86*cos(2*pi/0.81*t-1.32) + 0.64*t\
        +4*((0.5-rand(num_points))*\
        exp(2*rand(num_points)**2))

    # Target function
    model = lambda p, x: \
        p[0]*cos(2*pi/p[1]*x+p[2]) + p[3]*x
    # Distance to the target function
    error = lambda p, x, y: model(p, x) - y
    # Initial guess for the parameters
    p0 = [-15., 0.8, 0., -1.]
    # Fit data to model
    p1, _ = leastsq(error, p0, args=(t, x))

    t2 = linspace(t.min(), t.max(), 100)
    plot(t, x, "ro", t2, model(p1, t2), "b-")
    show()

Matching signals

Suppose I have a short audio clip

that I know to be part of a larger file

How can I figure out its offset?

Problem: naïve matching scales as O(N^2)

An O(N lg N) solution

Naïve matching scales as O(N^2)

How can we do faster?

phase correlation

Exploit Fourier transforms: they encode relative offsets in complex phase

From math to code

    import numpy

    #Make up some data
    N = 30000
    idx = 24700
    size = 300
    data = numpy.random.rand(N)
    frag_pad = numpy.zeros(N)
    frag = data[idx:idx+size]
    frag_pad[:size] = frag

    #Compute phase correlation
    data_ft = numpy.fft.rfft(data)
    frag_ft = numpy.fft.rfft(frag_pad)
    phase = data_ft * numpy.conj(frag_ft)
    phase /= abs(phase)
    cross_correlation = numpy.fft.irfft(phase)
    offset = numpy.argmax(cross_correlation)

    print('Input offset: %d, computed: %d' % (idx, offset))
    from matplotlib.pyplot import plot, show
    plot(cross_correlation)
    show() #Pause

Modeling a zombie apocalypse

http://blogs.cdc.gov/publichealthmatters/2011/05/preparedness-101-zombie-apocalypse/

http://www.scipy.org/Cookbook/Zombie_Apocalypse_ODEINT

Each person can be in one of three states: Normal (S), Zombie (Z), Dead (R)

Various processes connect these states: birth (P), normal death (d), transmission (B), resurrection (G), destruction (A)

From math to code

    from numpy import linspace
    from scipy.integrate import odeint

    P = 0       # birth rate
    d = 0.0001  # natural death rate
    B = 0.0095  # transmission rate
    G = 0.0001  # resurrection rate
    A = 0.0001  # destruction rate

    def f(y, t):
        Si, Zi, Ri = y
        return [P - B*Si*Zi - d*Si,
                B*Si*Zi + G*Ri - A*Si*Zi,
                d*Si + A*Si*Zi - G*Ri]

    y0 = [500, 0, 0]           # initial conditions
    t = linspace(0, 5., 1000)  # time grid

    soln = odeint(f, y0, t)    # solve ODE
    S, Z, R = soln[:, :].T
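A quick sanity check on this model: the three right-hand sides add up to P, so with the birth rate P = 0 the total population S + Z + R should stay at its initial value throughout the integration. The sketch below repeats the model with the same parameters and verifies that; the conservation check itself is an illustrative addition.

```python
import numpy as np
from scipy.integrate import odeint

P = 0       # birth rate
d = 0.0001  # natural death rate
B = 0.0095  # transmission rate
G = 0.0001  # resurrection rate
A = 0.0001  # destruction rate


def f(y, t):
    """Right-hand side of the S, Z, R rate equations."""
    Si, Zi, Ri = y
    return [P - B*Si*Zi - d*Si,
            B*Si*Zi + G*Ri - A*Si*Zi,
            d*Si + A*Si*Zi - G*Ri]


y0 = [500, 0, 0]
t = np.linspace(0, 5.0, 1000)
soln = odeint(f, y0, t)

# With P = 0 the rows of soln should all sum to the initial population.
total = soln.sum(axis=1)
print(total.min(), total.max())
```

Checks like this are cheap and catch sign errors in hand-written rate equations early.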

Using external code

“NumPy can get you most of the way to compiled speeds through vectorization. In situations where you still need the last ounce of speed in a critical section, or when it either requires a PhD in NumPy-ology to vectorize the solution or it results in too much memory overhead, you can reach for Cython or Weave. If you already know C/C++, then weave is a simple and speedy solution. If, however, you are not already familiar with C then you may find Cython to be exactly what you are looking for to get the speed you need out of Python.” - Travis Oliphant, 2011-06-20

see:
http://www.scipy.org/PerformancePython
http://technicaldiscovery.blogspot.com/2011/06/speeding-up-python-numpy-cython-and.html

Python as code glue

- numpy.f2py: wraps
    * C, Fortran 77/90/95 functions
    * Fortran 90/95 module data
    * Fortran 77 COMMON blocks

- scipy.weave
    * .inline: compiles & runs C/C++ code manipulating Python scalars/ndarrays
    * .blitz: interfaces with Blitz++

Other wrapper libraries and programs: see http://scipy.org/Topical_Software

numpy.f2py: Fortran/C

    $ cat>invsqrt.f
          real*8 function invsqrt (a)
          real*8 a
          invsqrt = 1.0/sqrt(a)
          end

    $ f2py -c -m invsqrt invsqrt.f
    $ python -c 'import invsqrt; print(invsqrt.invsqrt(4))'
    0.5

see: http://www.scipy.org/F2py

    $ cat>invsqrt.c
    #include <math.h>
    double invsqrt(double a) {
        return 1.0/sqrt(a);
    }
    $ cat>invsqrt.m
    python module invsqrt
    interface
        real*8 function invsqrt(x)
            intent(c) :: invsqrt
            real*8 intent(in) :: x
        end function invsqrt
    end interface
    end python module invsqrt
    $ f2py invsqrt.m invsqrt.c -c
    $ python -c 'import invsqrt; print(invsqrt.invsqrt(4))'
    0.5

scipy.weave.inline

    >>> from scipy.weave import inline
    >>> x = 4.0
    >>> inline('return_val = 1./sqrt(x);',['x'])
    0.5

see: https://github.com/scipy/scipy/blob/master/scipy/weave/doc/tutorial.txt

[Diagram: inline passes from python through scipy.weave and the distutils core to an on-the-fly compiled C/C++ program (Extension)]

scipy.weave.blitz

Uses the Blitz++ numerical library for C++
Converts between ndarrays and Blitz arrays

    >>> # Computes five-point average using numpy and weave.blitz
    >>> import numpy
    >>> from scipy.weave import blitz
    >>> a = numpy.zeros((4096,4096)); c = numpy.zeros((4096, 4096))
    >>> b = numpy.random.randn(4096,4096)
    >>> c[1:-1,1:-1] = (b[1:-1,1:-1] + b[2:,1:-1] + b[:-2,1:-1]
    ...                 + b[1:-1,2:] + b[1:-1,:-2]) / 5.0
    >>> blitz("a[1:-1,1:-1] = (b[1:-1,1:-1] + b[2:,1:-1] + b[:-2,1:-1]"
    ...       " + b[1:-1,2:] + b[1:-1,:-2]) / 5.")
    >>> (a == c).all()
    True

see: https://github.com/scipy/scipy/blob/master/scipy/weave/doc/tutorial.txt

Parallelization

The easy way: numpy/scipy’s primitives automatically use vectorization compiled into external BLAS/LAPACK/... libraries

The usual way:
- MPI interfaces (mpi4py,...)
- Python threads/multiprocessing/...
- OpenMP/pthreads... in external C/Fortran

see: http://www.scipy.org/ParallelProgramming

How I use NumPy/Scipy

[Workflow diagram. Boxes: Text input; Matrices; Test model; Visualize; Text output; scipy.optimize quasi-Newton optimizers; External binary; Binary output, read with np.fromfile()]

Beyond NumPy/SciPy

[Diagram: "My script" and "My interactive session" (Pylab) sit on top of Python, SciPy and external Fortran/C. Other packages shown: PyTables (HDF5), CVXOpt, VTK, matplotlib. Roles shown: file I/O, numerical optimization, visualization, molecule viz.]

many more examples at http://www.scipy.org/Topical_Software
