Advanced and Parallel Python

December 1st, 2016 - McGill HPC

http://tinyurl.com/cq-advanced-python-20161201

By: Bart Oldeman and Pier-Luc St-Onge


Financial Partners


Setup for the workshop

1. Get a user ID and password paper (provided in class):
   ##: userNMXXXXXXXXXX **********
2. Access to local computer (replace ## and ___ with appropriate values; “___” is provided in class):
   a. User name: csuser##
   b. Password: ___@[S##
3. HTTPS connection to Colosse (replace **********):
   a. https://jupyter.calculquebec.ca
   b. User name: userNM
   c. Password: **********
   d. If requested:
      i. click Start Server button, set walltime 8

Select Modules - Change Notebook Kernel

● In the Software tab, select:
  ○ compilers/llvm/3.7.1
  ○ compilers/gcc/4.8.5
● Open notebooks/01-stack.ipynb
  ○ File -> Save and Checkpoint

Import Examples and Exercises

In case the cq-formation-advanced-python folder is not in your home directory, open a Terminal and type:

module load apps/git/1.8.5.3 # If on Colosse
git clone -b ulaval \
    https://github.com/calculquebec/cq-formation-advanced-python.git
cd cq-formation-advanced-python

Outline

● Revisiting the Scientific Python Stack
● Why (and What) is Python?
  ○ Accelerating Python code: PyPy and Numpy
  ○ Using C code from Python code
● Finding Bottlenecks - Profiling code
● Compiling Python Code
  ○ Using Cython and Numba
● Parallelizing Python Programs
  ○ Parallel Programming Concepts
  ○ The multiprocessing Module
  ○ MPI for Python (mpi4py)

The Scientific Python stack

Scientific Python stack

In the introductory workshop we looked at:
● Python itself
● Numpy, for numerical array objects
● Scipy, for higher level routines
● IPython, an advanced Python shell
● Matplotlib, for plotting

On top of that we introduce some new components, for example:
● Cython, for speed and interfacing
● mpi4py, for using MPI in Python

Speeding up Python programs

Speeding up Python

Central example: approx_pi.c / approx_pi.py:

// approx_pi.c
double approx_pi(int intervals)
{
    double pi = 0.0;
    int i;
    for (i = 0; i < intervals; i++) {
        pi += (4 - ((i % 2) * 8)) /
              (double)(2 * i + 1);
    }
    return pi;
}

# approx_pi.py
def approx_pi(intervals):
    pi = 0.0
    for i in range(intervals):
        pi += (4 - 8 * (i % 2)) / float(2 * i + 1)
    return pi

Speeding up Python

Compile and run the C version:
$ gcc -O2 pi_collect.c approx_pi.c -o pi_collect
$ ./pi_collect 100000000
.. Time = 0.88 sec

Python run (example on Guillimin):
$ module load iomkl/2015b Python/3.5.0
$ python pi_collect.py approx_pi 100000000

The compiled C code runs about 75 times faster than the Python code (0.88 vs. 66 seconds with intervals = 100000000).

Note that “approx_pi” is the name of the module that pi_collect.py imports.
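The slides do not show pi_collect.py itself. A minimal sketch of such a driver, assuming it only has to import the module named on the command line, call its approx_pi() function and time the call (the helper name time_approx_pi is hypothetical):

```python
import importlib
import time


def time_approx_pi(module_name, intervals):
    """Import module_name, call its approx_pi(intervals), and time the call."""
    module = importlib.import_module(module_name)  # e.g. "approx_pi"
    start = time.perf_counter()
    value = module.approx_pi(intervals)
    elapsed = time.perf_counter() - start
    return value, elapsed

# Possible use from a script (assumed command line: <module> <intervals>):
#   import sys
#   value, elapsed = time_approx_pi(sys.argv[1], int(sys.argv[2]))
#   print("pi ~ %.12f, Time = %.2f sec" % (value, elapsed))
```

Because the module is looked up by name, the same driver can time every variant in this workshop (approx_pi, approx_pi_numpy, approx_pi_cython2, ...) without changes.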


Speeding up Python

How to speed up: two approaches
1. Make Python go faster
   a. Use the PyPy just-in-time compiler
   b. Use Numpy with vectorized code
   c. Use Cython
2. Call C code from Python
   a. Manually
   b. Use SWIG
   c. Use Ctypes
   d. Use Cython
   e. ....

Speeding up Python using PyPy

How to speed up: use PyPy:
$ module add pypy/3-2.4.0
$ pypy3 pi_collect.py approx_pi 100000000

gives 2.2 seconds (30 times faster)

An alternative to PyPy is Numba (not installed on Guillimin).
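A minimal sketch of the Numba alternative, assuming Numba's njit decorator is available. The fallback decorator is only there so the file still runs (uncompiled) where Numba is not installed. No timing was measured for this sketch.

```python
try:
    from numba import njit  # third-party package; may not be installed
except ImportError:
    def njit(func):
        """Fallback: leave the function uncompiled when Numba is missing."""
        return func


@njit
def approx_pi(intervals):
    # Same loop as approx_pi.py; Numba compiles it on the first call
    pi = 0.0
    for i in range(intervals):
        pi += (4 - 8 * (i % 2)) / (2 * i + 1)
    return pi
```

Unlike PyPy, this keeps the regular CPython interpreter and only compiles the decorated function.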


Speeding up with numpy

How to speed up: use vectorized code:

from __future__ import division # only needed for Python 2.x
import numpy

def approx_pi(intervals):
    pi1 = 4/numpy.arange(1, intervals*2, 4)
    pi2 = -4/numpy.arange(3, intervals*2, 4)
    return numpy.sum(pi1) + numpy.sum(pi2)

$ python3 pi_collect.py approx_pi_numpy 100000000

gives 1.4 seconds (47 times faster).
Drawback: extra memory use.

How to speed up: Cython: see later
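One way to limit the extra memory use is to vectorize one block of terms at a time. This is a sketch, not part of the workshop files: the function name approx_pi_chunked and the default chunk size are assumptions.

```python
import numpy


def approx_pi_chunked(intervals, chunk=1000000):
    """Sum the same series as approx_pi(), one block of terms at a time."""
    total = 0.0
    for start in range(0, intervals, chunk):
        stop = min(start + chunk, intervals)
        i = numpy.arange(start, stop)  # at most `chunk` indices in memory
        total += numpy.sum((4 - 8 * (i % 2)) / (2 * i + 1))
    return total
```

The temporary arrays never hold more than chunk elements, at the cost of a short Python-level loop over the blocks.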


Interfacing with C/C++/Fortran

Interfacing with C and C++

● There are at least 14 different ways to do it:
  1. By hand using the Python API (*)
  2. Pyrex
  3. Cython (**)
  4. SWIG (*)
  5. SIP
  6. Boost.Python
  7. PyCXX
  8. CTypes (*)
  9. Py++
  10. f2py (*)
  11. PyD
  12. Interrogate
  13. Robin
  14. Pybind11

(*) Quick introduction
(**) Most popular now, more thorough introduction

Using the Python API

● Pros: no extra dependencies
● Cons: a lot of boilerplate code, which can change between Python versions

/* Example of wrapping approx_pi() with the Python-C-API. */
#include <Python.h>
#include "approx_pi.h"

/* wrapped approx_pi() */
static PyObject* approx_pi_func(PyObject* self, PyObject* args)
{
    int value;
    double answer;

    /* parse input, from Python int to C int */
    if (!PyArg_ParseTuple(args, "i", &value))
        /* on failure PyArg_ParseTuple returns 0 with an appropriate Python
         * exception already set, and the function simply returns NULL */
        return NULL;

    answer = approx_pi(value);

    /* construct the output from approx_pi, from C double to Python float */
    return Py_BuildValue("d", answer);
}

Using the Python API

/* define functions in module */
static PyMethodDef PiMethods[] =
{
    {"approx_pi", approx_pi_func, METH_VARARGS, "approximate Pi"},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef PiModule = {
    PyModuleDef_HEAD_INIT, "approx_pi_pyapi", NULL, -1, PiMethods,
    NULL, NULL, NULL, NULL
};

/* module initialization: the new module object must be returned */
PyMODINIT_FUNC PyInit_approx_pi_pyapi(void)
{
    return PyModule_Create(&PiModule);
}

Compile using $ python3 setup_approx_pi_pyapi.py build_ext --inplace

# setup_approx_pi_pyapi.py
from distutils.core import setup, Extension

# define the extension module
module = Extension('approx_pi_pyapi', sources=['approx_pi_pyapi.c', 'approx_pi.c'])

setup(ext_modules=[module]) # run the setup

Using CTypes

● Pros: the ctypes package is in Python by default, pure Python solution
● Cons: wrapped code in shared lib, interface not fast

First compile approx_pi_ctypes.so:
$ gcc -fPIC -shared -O2 approx_pi.c -o approx_pi_ctypes.so

# approx_pi_ctypes.py
""" Example of wrapping approx_pi using ctypes. """
import ctypes

approx_pi_dll = ctypes.cdll.LoadLibrary('./approx_pi_ctypes.so') # find and load the library
approx_pi_dll.approx_pi.argtypes = [ctypes.c_int] # set the argument type
approx_pi_dll.approx_pi.restype = ctypes.c_double # set the return type

def approx_pi(arg):
    ''' Wrapper for approx_pi '''
    return approx_pi_dll.approx_pi(arg)

Using SWIG

● Mature solution
● Wrapper file is autogenerated from interface file.

/* approx_pi_swig.i */
/* Example of wrapping approx_pi using SWIG. */
%module approx_pi_swig
%{
/* the resulting C file should be built as a python extension */
#define SWIG_FILE_WITH_INIT
/* Includes the header in the wrapper code */
#include "approx_pi.h"
%}

/* Parse the header file to generate wrappers */
%include "approx_pi.h"

Using SWIG

● Use distutils as before (python3 setup_approx_pi_swig.py build_ext --inplace) but mention the interface file in the setup script.

from distutils.core import setup, Extension

approx_pi_module = Extension("_approx_pi_swig", sources=["approx_pi.c", "approx_pi_swig.i"])
setup(ext_modules=[approx_pi_module])

● This generates three files: approx_pi_swig.py, approx_pi_swig_wrap.c, and _approx_pi_swig*.so

Using f2py

● Fortran version: approx_pi.f90

subroutine approx_pi(intervals, pi)
  integer, intent(in) :: intervals
  double precision, intent(out) :: pi
  integer i
  pi = 0
  do i = 0, intervals - 1
    pi = pi + (4 - (mod(i,2) * 8)) / dble(2 * i + 1)
  enddo
end subroutine approx_pi

● Compile using
  f2py3 -c -m approx_pi_f2py approx_pi.f90
● Then do
  python3 pi_collect.py approx_pi_f2py 100000000

Cython

Cython

● Cython compiles from Python (with extensions) to C.
● Based on Pyrex
● Goals: faster execution (especially with those extensions) and easier interoperability with other C code.
● Cython files use the .pyx extension.

Cython

● Example: approx_pi_cython1.pyx (same as approx_pi.py)

def approx_pi(intervals):
    pi = 0.0
    for i in range(intervals):
        pi += (4 - 8 * (i % 2)) / float(2 * i + 1)
    return pi

● Executing python3 setup_cython.py build_ext --inplace, where setup_cython.py contains

from distutils.core import setup
from Cython.Build import cythonize

setup(ext_modules = cythonize("*.pyx"))

turns all .pyx files into .c files and .so modules
● Run python3 pi_collect.py approx_pi_cython1 100000000
  ○ 25 seconds: the C code uses only Python objects.

Cython: declare variables

● Need to declare variables using cdef to make it fast
● Example: approx_pi_cython2.pyx

def approx_pi(int intervals):
    cdef double pi
    cdef int i
    pi = 0.0
    for i in range(intervals):
        pi += (4 - 8 * (i % 2)) / float(2 * i + 1)
    return pi

● Execute python3 setup_cython.py build_ext --inplace
● Run python3 pi_collect.py approx_pi_cython2 100000000
  ○ 0.89 seconds: almost as fast as native C.

Cython: division

● Inspecting approx_pi_cython2.c, we find that it uses __Pyx_mod_long(__pyx_v_i, 2) instead of a plain __pyx_v_i % 2. This is because in C, -1%10=-1 but in Python, -1%10=9.
● Here i is never negative, so we can ignore this and tell Cython to use C behaviour, by adding this line at the top of the file (approx_pi_cython3.pyx):
  #cython:cdivision=True
● Execute python3 setup_cython.py build_ext --inplace
  ○ Check that approx_pi_cython3.c uses %.
● Run python3 pi_collect.py approx_pi_cython3 100000000
  ○ 0.88 seconds: the same as native C.
● Note: use Cython in IPython/Jupyter using “%load_ext cythonmagic” (“%load_ext Cython” with recent Cython versions) and “%%cython” in a cell.
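The difference between the two remainder conventions can be checked from plain Python. In this sketch the helper c_style_mod is hypothetical: it emulates C's % for integers, where the result takes the sign of the dividend.

```python
def c_style_mod(a, b):
    """Integer remainder as computed by C's %: sign follows the dividend a."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


print(-1 % 10)              # Python: sign follows the divisor
print(c_style_mod(-1, 10))  # C: sign follows the dividend

# For the non-negative loop index used in approx_pi, both conventions agree,
# which is why switching to C behaviour is safe in this example.
same = all(i % 2 == c_style_mod(i, 2) for i in range(1000))
```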


Cython: wrapping C code

● Last but not least: interfacing with C code:

# approx_pi_cython4.pyx
cdef extern from "approx_pi.h":
    double c_approx_pi "approx_pi" (int intervals)
    # C name: approx_pi, Cython name: c_approx_pi

def approx_pi(int intervals):
    return c_approx_pi(intervals)

● Plus special setup_cython4.py script

from distutils.core import setup, Extension
from Cython.Distutils import build_ext

setup(cmdclass={'build_ext': build_ext},
      ext_modules=[Extension("approx_pi_cython4",
                             sources=["approx_pi_cython4.pyx", "approx_pi.c"])])

● Execute python3 setup_cython4.py build_ext --inplace
● Run python3 pi_collect.py approx_pi_cython4 100000000

Parallel Programming Concepts


Vocabulary

● Serial tasks
  ○ Any task that cannot be split in two simultaneous sequences of actions
  ○ Examples: starting a process, reading a file, any communication between two processes
● Parallel tasks
  ○ Data parallelism: same action applied on different data. Could be serial tasks done in parallel.
  ○ Process parallelism: one action on one set of data. Action split in multiple processes or threads.
    ■ Data partitioning: rectangles or blocks

Parallel tasks

● Parallel efficiency (scaling)
  ○ Amdahl’s law: how long does it take to compute a task with an infinite number of processors?
  ○ Gustafson's law: what size of problem can we solve in a given time with N processors?
● Shared memory
  ○ Multiple threads share the same memory space in a single process: full read and write access.
● Distributed memory
  ○ Each process has its own memory space
  ○ Information is sent and received by messages
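The two scaling laws named above can be written as short formulas. This sketch uses their usual textbook forms, with p the fraction of the work that can run in parallel and n the number of processors (for Gustafson's law, p is measured on the parallel run). The function names are illustrative.

```python
def amdahl_speedup(p, n):
    """Fixed problem size: speedup on n processors."""
    return 1.0 / ((1.0 - p) + p / n)


def amdahl_limit(p):
    """Amdahl's law with an infinite number of processors."""
    return 1.0 / (1.0 - p)


def gustafson_speedup(p, n):
    """Fixed run time: scaled speedup when the problem grows with n."""
    return (1.0 - p) + p * n
```

For example, with p = 0.9 the fixed-size speedup can never exceed about 10, however many processors are used, while the scaled speedup keeps growing with n.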


Distributed Memory Model

[Figure: Process 1 and Process 2 are connected through a network. Each process holds its own A(10): different variables!]

Serial Code Parallelization

● Implicit Parallelization - minimum work for you
  ○ Threaded libraries (MKL, ACML, GOTO, etc.)
  ○ Compiler directives (OpenMP)
  ○ Good for desktops and shared memory machines
● Explicit Parallelization - work is required!
  ○ You tell what should be done on what CPU
  ○ Solution for distributed clusters (shared nothing!)
● Hybrid Parallelization - work is required!
  ○ Mix of implicit and explicit parallelization
    ■ Vectorization and parallel CPU instructions
  ○ Good for accelerators (CUDA, OpenCL, etc.)

The multiprocessing Module


The multiprocessing Module

● Because of the implementation of CPython (the global interpreter lock, GIL), only one thread at a time can execute Python code
  ○ This avoids some common issues with the shared memory model inside the interpreter: race conditions, ...
  ○ There is a threading module, but because of the GIL it does not speed up CPU-bound Python code
● Solution: the multiprocessing module!

Pool of Workers

For embarrassingly parallel tasks, the Pool class allows the creation of worker processes. Each process will compute different data.

Warning: only works in a script!

from multiprocessing import Pool

def prod(values):
    return values[0] * values[1]

if __name__ == '__main__':
    N = 12
    values = [(i + 1, N - i)
              for i in range(0, N)]
    print(values)

    workers = Pool(processes=4)
    results = workers.map(prod, values)
    print(results)

Pool of Workers

● Run: python script.py
● What happens with 4 workers:

Pool of Workers

Asynchronous map calls can be used in order to do something else in the main process. The map_async() method returns an AsyncResult object which can wait until all workers are done.

from multiprocessing import Pool
import time

def prod(values):
    time.sleep(1)
    return values[0] * values[1]

if __name__ == '__main__':
    N = 12
    values = [(i + 1, N - i)
              for i in range(0, N)]
    print(values)

    workers = Pool(processes=4)
    results = workers.map_async(prod, values)
    print('Waiting...')
    print(results.get(timeout=10))

Pool of Workers

Asynchronous map calls can use a callback function. Then, the main process has to wait by first closing the access to workers, and by joining the pool of workers.

from multiprocessing import Pool

def prod(values):
    return values[0] * values[1]

def printRes(results):
    # called once in the main process with the complete list of results
    print(results)

if __name__ == '__main__':
    N = 12
    values = [(i + 1, N - i)
              for i in range(0, N)]
    print(values)

    workers = Pool(processes=4)
    results = workers.map_async(prod,
        values, callback=printRes)
    print('Waiting...')

    workers.close()
    workers.join()

Pool of Workers

● class Pool([processes[,...]])
  ○ processes: number of worker processes. If None, processes=multiprocessing.cpu_count()
  ○ Methods:
    ■ map(func, iterable[, ...]): returns results
    ■ map_async(func, iterable[, ...]): returns an AsyncResult object
    ■ close(): closes access to worker processes
    ■ join(): waits for all workers to exit. Must call close() before.

Pool of Workers

● class AsyncResult
  ○ Methods:
    ■ get([timeout]): blocking, get results as soon as they are available. In case of error, get() re-raises the exception raised in the worker.
    ■ wait([timeout]): blocking, waits until the call is done
    ■ ready(): non-blocking, returns a boolean indicating if the call has completed.
    ■ successful(): non-blocking, returns a boolean indicating if the call has succeeded.
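The four AsyncResult methods can be tried together in one short sketch. Assumptions: the helper name sqrt_all is hypothetical, and the "fork" start method (Unix only) is requested explicitly so the sketch also runs when pasted into an interactive session. In a script, the plain Pool(...) of the previous slides serves equally.

```python
import math
import multiprocessing


def sqrt_all(values, processes=4):
    """Map math.sqrt over values asynchronously and report the call's status."""
    ctx = multiprocessing.get_context("fork")  # assumption: Unix-only start method
    with ctx.Pool(processes=processes) as workers:
        async_result = workers.map_async(math.sqrt, values)
        async_result.wait(timeout=10)      # block until done (or 10 s pass)
        ready = async_result.ready()       # True once the call has completed
        ok = async_result.successful()     # False if a worker raised
        try:
            results = async_result.get(timeout=10)
        except ValueError as error:        # exception re-raised from a worker
            results = error
    return ready, ok, results
```

Calling sqrt_all([4.0, -1.0]) shows the error path: math.sqrt(-1.0) raises ValueError in a worker, successful() reports False, and get() raises the same exception in the main process.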


Exercise - Baby Genomic

● Edit baby-genomic.py
  ○ Use a pool of 4 workers
  ○ Use the asynchronous map function
  ○ Provide a callback function that will print results at the end
  ○ Tip: use the edProxy() function in order to call the real editDistance() function.
● Run:
  time -p python baby-genomic.py

The Process class

● https://docs.python.org/2/library/multiprocessing.html
  ○ The Process class: manually spawn and control each process
    Process(target=fct, args=(arg1,arg2)).start()
  ○ Communication channels:
    ■ The Pipe class: to communicate between two processes, one sends data, one receives data
    ■ The Queue class: a shared pipe managed with locks and semaphores, one puts data, one gets data
  ○ Synchronization:
    ■ The Lock class: one acquires lock, one releases lock
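A minimal sketch of Process and Queue working together. The names worker and squares_via_processes are hypothetical, and the "fork" start method (Unix only) is an assumption made so the sketch does not depend on how the file is launched.

```python
import multiprocessing


def worker(number, queue):
    """Runs in a child process: put one result on the shared queue."""
    queue.put((number, number * number))


def squares_via_processes(numbers):
    ctx = multiprocessing.get_context("fork")  # assumption: Unix-only start method
    queue = ctx.Queue()
    processes = [ctx.Process(target=worker, args=(n, queue)) for n in numbers]
    for process in processes:
        process.start()
    # Collect one result per process before joining the children
    results = dict(queue.get(timeout=10) for _ in processes)
    for process in processes:
        process.join()
    return results
```

Results arrive in whatever order the children finish, which is why each one is tagged with its input number.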


MPI for Python (mpi4py)

MPI for Python

● The mpi4py package provides bindings from Python to MPI (Message Passing Interface).
● MPI functions are then available in Python but with some simplifications:
  ○ MPI_Init() and MPI_Finalize() are done automatically
  ○ The bindings can auto-detect many values that need to be specified as explicit parameters in the C and Fortran bindings.
  ○ Example:
    dest = 1; tag = 54321; MPI_Send( &matrix, count, MPI_INT, dest, tag, MPI_COMM_WORLD )
    becomes
    MPI.COMM_WORLD.Send(matrix, dest=1, tag=54321)

MPI for Python

● Import as from mpi4py import MPI
● Then often use comm = MPI.COMM_WORLD
● Two variations for most functions:
  a. all lowercase, e.g. comm.recv()
     ■ works on general Python objects, using pickle (can be slow)
     ■ received object (value) returned:
       ● matrix = comm.recv(source=0, tag=MPI.ANY_TAG)
  b. capitalized, e.g. comm.Recv()
     ■ works fast on numpy arrays & other buffers
     ■ received object given as parameter:
       ● comm.Recv(matrix, source=0, tag=MPI.ANY_TAG)
     ■ Specify [matrix, MPI.INT], or [data, count, MPI.INT] if autodetection fails.
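A minimal sketch of the lowercase variant, sending one Python object from rank 0 to rank 1. The function name exchange is hypothetical. The guards are assumptions added so the file also runs harmlessly without mpi4py or with a single process.

```python
try:
    from mpi4py import MPI  # third-party; needs an MPI library
except ImportError:
    MPI = None


def exchange(payload):
    """Send payload from rank 0 to rank 1; return what this rank received."""
    if MPI is None:
        return None                            # mpi4py not installed
    comm = MPI.COMM_WORLD
    if comm.Get_size() < 2:
        return None                            # needs at least two processes
    rank = comm.Get_rank()
    if rank == 0:
        comm.send(payload, dest=1, tag=54321)  # object is pickled
    elif rank == 1:
        return comm.recv(source=0, tag=54321)  # object is returned
    return None
```

Saved in a script that prints exchange(...), this would typically be launched with two processes, for example through mpirun -np 2. Only rank 1 then returns the payload.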


Conclusions

● Main techniques covered:
  ○ Speeding up: PyPy, Numba, CTypes, Cython
  ○ Parallel programming: multiprocessing, mpi4py
● Useful links:
  ○ http://www.scipy-lectures.org/advanced/interfacing_with_c/interfacing_with_c.html
  ○ https://github.com/kwmsmith/scipy-2015-cython-tutorial
  ○ https://docs.python.org/3/library/multiprocessing.html
  ○ http://materials.jeremybejarano.com/MPIwithPython