Parallel Processing with IPython
DESCRIPTION
In this screencast, Travis Oliphant gives an introduction to IPython, an extremely useful tool for task-based parallel processing with Python.
TRANSCRIPT
Parallel Processing with IPython
January 22, 2010
Enthought Python Distribution (EPD)
MORE THAN SIXTY INTEGRATED PACKAGES
• Python 2.6
• Science (NumPy, SciPy, etc.)
• Plotting (Chaco, Matplotlib)
• Visualization (VTK, Mayavi)
• Multi-language Integration (SWIG, Pyrex, f2py, weave)
• Repository access
• Data Storage (HDF, NetCDF, etc.)
• Networking (twisted)
• User Interface (wxPython, Traits UI)
• Enthought Tool Suite (Application Development Tools)
Enthought Training Courses
Python Basics, NumPy, SciPy, Matplotlib, Chaco, Traits, TraitsUI, …
PyCon
http://us.pycon.org/2010/tutorials/
Introduction to Traits
Introduction to Enthought Tool Suite
Fantastic deal (normally $700; at PyCon get the same material for $275)
Corran Webster
Upcoming Training Classes
March 1 – 5, 2009: Python for Scientists and Engineers, Austin, Texas, USA
March 1 – 5, 2009: Python for Quants, London, UK
http://www.enthought.com/training/
Parallel Processing with IPython
IPython.kernel
• IPython's interactive kernel provides a simple (but powerful) interface for task-based parallel programming.
• Allows fast development and tuning of task-parallel algorithm to better utilize resources.
Getting started -- local cluster
UNIX and OSX (and now WINDOWS)
# run ipcluster to start up a
# controller and a set of engines
$ ipcluster local -n 4
Your cluster is up and running.
...
You can then cleanly stop the cluster from IPython using:
mec.kill(controller=True)
You can also hit Ctrl-C to stop it, or use from the cmd line:
kill -INT 20465
Creates several key-files in ~/.ipython/security :
ipcontroller-engine.furl ipcontroller-mec.furl ipcontroller-tc.furl
WINDOWS (manually)
# run ipcontroller and then
# ipengine for each desired engine
> start /B C:\Python25\Scripts\ipcontroller.exe
> start /B C:\Python25\Scripts\ipengine.exe
> start /B C:\Python25\Scripts\ipengine.exe
> start /B C:\Python25\Scripts\ipengine.exe
...
2009-02-11 23:58:26-0600 [-] Log opened.
2009-02-11 23:58:28-0600 [-] Using furl file: C:\Documents and Settings\demo\_ipython\security\ipcontroller-engine.furl
2009-02-11 23:58:28-0600 [-] registered engine with id: 3
2009-02-11 23:58:28-0600 [-] distributing Tasks
2009-02-11 23:58:28-0600 [Negotiation,client] engine registration succeeded, got id: 3
Creates several key-files in %HOME%\_ipython\security :
ipcontroller-engine.furl ipcontroller-mec.furl ipcontroller-tc.furl
Getting started -- distributed
• Run ipcontroller on a host and create .furl files
  • Creates separate .furl files to be used by the different connections (engine, multiengine client, task client).
  • Places .furl files by default in ~/.ipython/security (UNIX or Mac OSX) or %HOME%\_ipython\security (Windows).
  • Takes --<connection>-furl-file=FILENAME options where <connection> is engine, multiengine, or task to place the .furl files somewhere else.
• Ensure the ipcontroller-engine.furl file is available to each host that will run an engine and run ipengine on these hosts.
  • Either place it in the default security directory
  • Or use the --furl-file=FILENAME option to ipengine
• Ensure the multiengine (task) .furl file is available to each host that will run a multiengine (task) client.
  • Either place it in the default security directory
  • Or pass the FILENAME as the first argument to the constructor
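The default locations listed above can be summarized in a small helper. This is only a sketch: `default_furl_path` and its arguments are hypothetical names, not part of IPython, and the function simply encodes the directory and file names shown on the slides.

```python
import ntpath
import posixpath

# File names as listed on the slides, keyed by connection type.
FURL_FILES = {
    "engine": "ipcontroller-engine.furl",
    "multiengine": "ipcontroller-mec.furl",
    "task": "ipcontroller-tc.furl",
}


def default_furl_path(connection, home, windows=False):
    """Return the default .furl location for a connection type.

    Hypothetical helper (not an IPython function); `home` stands in for
    ~ on UNIX/Mac OS X or %HOME% on Windows.
    """
    filename = FURL_FILES[connection]
    if windows:
        return ntpath.join(home, "_ipython", "security", filename)
    return posixpath.join(home, ".ipython", "security", filename)


print(default_furl_path("engine", "/home/demo"))
print(default_furl_path("task", r"C:\Documents and Settings\demo", windows=True))
```

A file found at one of these paths needs no extra option; anywhere else, the path has to be passed explicitly as described in the bullets.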
Initialize client
>>> from IPython.kernel import client
MULTIENGINECLIENT
# * allows fine-grained control
# * each engine has an id number
# * more intuitive for beginners
# optional argument can be
# location of mec furl-file
# created by the controller
>>> mec = client.MultiEngineClient()
>>> mec.get_ids()
[0, 1, 2, 3]
mec.map -- parallel map
mec.parallel -- parallel function
mec.execute -- execute in parallel
mec.push -- push data
mec.pull -- pull data
mec.scatter -- spread out
mec.gather -- collect back
mec.kill -- kill engines and controller
TASKCLIENT
# * does not expose individual
#   engines
# * presents a load-balanced,
#   fault-tolerant queue
# optional argument can be
# location of tc furl-file
# created by the controller
>>> tc = client.TaskClient()
tc.map -- parallel map
tc.parallel -- function decorator
tc.run -- run Tasks
tc.get_task_result -- get result
client.MapTask -- function-like
client.StringTask -- code-string
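To make the "spread out" and "collect back" entries concrete, here is a plain-Python sketch of the idea behind scatter and gather. It does not use IPython; the `scatter` and `gather` functions below are illustrative stand-ins, and the way IPython actually partitions a sequence among engines may differ.

```python
def scatter(seq, n_engines):
    """Split seq into n_engines contiguous chunks of near-equal size."""
    base, extra = divmod(len(seq), n_engines)
    chunks, start = [], 0
    for i in range(n_engines):
        # The first `extra` chunks absorb one leftover element each.
        size = base + (1 if i < extra else 0)
        chunks.append(seq[start:start + size])
        start += size
    return chunks


def gather(chunks):
    """Concatenate per-engine chunks back into one list, in engine order."""
    return [item for chunk in chunks for item in chunk]


chunks = scatter(list(range(10)), 4)
print(chunks)          # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
print(gather(chunks))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```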
MultiEngineClient
SCALAR FUNCTION -> PARALLEL VECTORIZED FUNCTION
# Using map
>>> def func(x):
...     return x**2.5 * (3*x - 2)
# standard map
>>> result = map(func, range(32))
# mec.map
>>> parallel_result = mec.map(func, range(32))
# mec.parallel
>>> pfunc = mec.parallel()(func)
or using decorators
@mec.parallel()
def pfunc(x):
    return x**2.5 * (3*x - 2)
>>> parallel_result2 = pfunc(range(32))
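The map and decorator patterns on this slide can be tried without a running cluster by substituting a local thread pool from the standard library. This is a sketch of the calling convention only: `parallel` below is a hypothetical decorator factory loosely modeled on `mec.parallel()`, not IPython's implementation, and threads will not speed up CPU-bound Python code the way separate engines can.

```python
from concurrent.futures import ThreadPoolExecutor


def func(x):
    return x**2.5 * (3*x - 2)


def parallel(max_workers=4):
    """Hypothetical decorator factory, loosely modeled on mec.parallel()."""
    def decorate(f):
        def vectorized(seq):
            with ThreadPoolExecutor(max_workers=max_workers) as pool:
                # Executor.map preserves input order, like the built-in map.
                return list(pool.map(f, seq))
        return vectorized
    return decorate


pfunc = parallel()(func)

result = list(map(func, range(32)))
parallel_result = pfunc(range(32))
print(parallel_result == result)  # True
```

The scalar function is turned into one that accepts a whole sequence, which is the "parallel vectorized function" the slide title refers to.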
TaskClient -- Load Balancing
SCALAR FUNCTION -> PARALLEL VECTORIZED FUNCTION
# Using map
>>> def func(x):
...     return x**2.5 * (3*x - 2)
# standard map
>>> result = map(func, range(32))
# tc.map
>>> parallel_result = tc.map(func, range(32))
# tc.parallel
>>> pfunc = tc.parallel()(func)
or using decorators
@tc.parallel()
def pfunc(x):
    return x**2.5 * (3*x - 2)
>>> parallel_result2 = pfunc(range(32))
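Why a load-balanced queue can help is easiest to see with uneven task costs. The following is a small simulation in plain Python, not IPython code: the function names and the task durations are made up for illustration, and the scheduling rule (each task goes to whichever worker is free first) is an assumed simplification of what a task queue does.

```python
import heapq


def static_makespan(durations, n_workers):
    """Finish time when tasks are pre-split into contiguous blocks."""
    base, extra = divmod(len(durations), n_workers)
    totals, start = [], 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)
        totals.append(sum(durations[start:start + size]))
        start += size
    return max(totals)


def queued_makespan(durations, n_workers):
    """Finish time when each task goes to whichever worker frees up first."""
    free_at = [0] * n_workers  # a list of equal values is already a valid heap
    for d in durations:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + d)
    return max(free_at)


durations = [5, 5, 1, 1, 1, 1, 1, 1]  # made-up task costs, arbitrary units
print(static_makespan(durations, 2))  # 12
print(queued_makespan(durations, 2))  # 8
```

With the fixed split, one worker receives both expensive tasks and the other sits idle; with the queue, the work evens out.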
MultiEngineClient
EXECUTE CODESTRING IN PARALLEL
>>> from enthought.blocks.api import func2str
# decorator that turns python-code into a string
>>> @func2str
... def code():
...     import numpy as np
...     a = np.random.randn(N,N)
...     eigs, vals = np.linalg.eig(a)
...     maxeig = max(abs(eigs))
>>> mec['N'] = 100
>>> result = mec.execute(code)
>>> print mec['maxeig']
[10.471428625885835, 10.322386155553213, 10.237638983818622, 10.614715948426941]
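The push, execute, pull pattern on this slide returns one value per engine because each engine keeps its own namespace. The toy class below imitates that behavior with ordinary dictionaries. It is an illustrative assumption, not how IPython is implemented: `ToyMultiEngine` and the injected `engine_id` variable are invented for this sketch, and the "engines" run one after another in a single process.

```python
class ToyMultiEngine:
    """Toy stand-in: one private namespace (a dict) per engine."""

    def __init__(self, n_engines):
        self.namespaces = [{} for _ in range(n_engines)]

    def __setitem__(self, name, value):
        # Push: copy the same value into every engine's namespace.
        for ns in self.namespaces:
            ns[name] = value

    def __getitem__(self, name):
        # Pull: collect one value per engine, in engine-id order.
        return [ns[name] for ns in self.namespaces]

    def execute(self, code):
        # Run the same code string once in each namespace (serially here).
        for engine_id, ns in enumerate(self.namespaces):
            ns["engine_id"] = engine_id  # toy-only variable, for illustration
            exec(code, ns)


toy = ToyMultiEngine(4)
toy["N"] = 100
toy.execute("total = N + engine_id")
print(toy["total"])  # [100, 101, 102, 103]
```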
TaskClient -- Load Balancing Queue
EXECUTE CODESTRING IN PARALLEL
>>> from enthought.blocks.api import func2str
# decorator that turns python-code into a string
>>> @func2str
... def code():
...     import numpy as np
...     a = np.random.randn(N,N)
...     eigs, vals = np.linalg.eig(a)
...     maxeig = max(abs(eigs))
>>> task = client.StringTask(str(code), push={'N':100}, pull='maxeig')
>>> ids = [tc.run(task) for i in range(4)]
>>> res = [tc.get_task_result(id) for id in ids]
>>> print [x['maxeig'] for x in res]
[10.439989436983467, 10.250842410862729, 10.040835983392991, 10.603885977189803]
Parallel FFT On Memory Mapped File
Processors   Time (seconds)   Speed Up
1            11.75            1.0
2            6.06             1.9
4            3.36             3.5
8            2.50             4.7
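The speed-up column follows directly from the timings: it is the single-processor time divided by the time on N processors. The short script below recomputes it and also derives parallel efficiency (speed-up divided by processor count), a quantity the slide does not show and which is added here only as an illustration.

```python
# (processors, seconds) pairs copied from the table above.
timings = [(1, 11.75), (2, 6.06), (4, 3.36), (8, 2.50)]

baseline = timings[0][1]  # single-processor time
rows = []
for procs, seconds in timings:
    speedup = baseline / seconds
    efficiency = speedup / procs  # 1.0 would be perfect linear scaling
    rows.append((procs, round(speedup, 1), efficiency))
    print("%d processors: speed-up %.1f, efficiency %.0f%%"
          % (procs, speedup, 100 * efficiency))
```

Efficiency drops from roughly 97% on two processors to under 60% on eight, so the returns from adding processors diminish for this workload.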
EPD: http://www.enthought.com/products/epd.php
Enthought Training: http://www.enthought.com/training/
Webinars: http://www.enthought.com/training/webinars.php