Implementations of Parallel Processing
DESCRIPTION
Final report for CSC 478 "Parallel Processing" at the University of Michigan – Flint, prepared for Dr. Michael Farmer. The report explores and briefly introduces four programming languages for parallel processing: C/C++, Charm++, Erlang, and JoCaml.
TRANSCRIPT
FLINT
Implementations of Parallel Processing: Languages and Thoughts
Robert J. Knuuti
University of Michigan – Flint
April 19, 2010
Knuuti (UM Flint) Implementations of Parallel Processing WI ’10 1 / 60
Outline
Topics
1 Purpose
2 Languages Studied: C and C++, Charm++, Erlang, JoCaml
3 Benchmark
4 Bibliography
Purpose
Philosophies
There are no silver bullets
There shall never be one implementation that works in every case. [2]
Use the right tool for the right job
Always let the project dictate what you should use rather than defaulting to the most general choice.
TIMTOWTDI
“There Is More Than One Way To Do It”. This is a very powerful statement developed by the practitioners of Larry Wall’s Perl scripting language. By having multiple ways to complete a task, new and innovative ways will be recognized, leading to the evolution of parallel utilities. [9]
Parallel Process Models
There are two widely accepted models for parallel processing:
Shared Memory (Thread-Centric)
Each executing thread has access to a common pool of memory. Requires a locking system to ensure that data is operated on safely.
Process (Message Passing)
Each executing process has its own unique copy of the data. Eliminates shared-data dependencies and the need for mutexes.
Considerations
Questions
Which implementation is the fastest?
Which implementation is the cleanest?
Which implementation is the most efficient?
Languages Studied
Overview of Paradigms
Procedural
Stepped execution. Has conditionals and branching, mutable data, and calls. Typically follows the Shared Memory Model.
Functional
Lambda execution (all operations are functions). Pattern matching and guards. Typically follows the Process Model.
Languages Studied: C and C++
Background
Arguably the most used language
Great deal of support and extensions
Parallelism achieved through libraries
Threads are slated to be built into the language in C++0x
Supported Paradigms
Message Passing
Uses the MPI and AMPI libraries to perform tasks. Requires a VM layer to facilitate network communication.
Shared Memory
MFC Threads (Windows library)
PThreads (Unix library, experimental Windows support)
Boost Threads (cross-platform wrapper)
Shared Memory is the most commonly used implementation.
Example: Hello World
Listing 1: C Hello World

#include <stdio.h>

int main() {
    printf("Hello World!");
    return 0;
}
Example: Factorial I
Listing 2: C Factorial

#include <stdio.h>
#include <stdlib.h>

long factorial(int n) {
    if (n <= 1) return 1;
    else return n * factorial(n - 1);
}

int main(int argc, char **argv) {
    long int f = factorial(atoi(argv[1]));
    printf("%ld\n", f);
    return 0;
}
PThread Library
Built on the POSIX standard (1003.1c-1995)
Has modules for thread creation, mutexes, and condition variables
Over 60 function calls
Has modifiers for manipulating each module’s actions
Implemented in C
PThread Code Index
Type system: pthread_module_t.
Functions: pthread_module_function.
Constants: PTHREAD_CONST.
This is covered in detail in the book.
Languages Studied: Charm++
Background
Event-driven, object-oriented programming language
Based on C++
Provides built-in primitives for parallel programming (using interface files to identify parallel C++ code)
Runs on a VM which is compiled and optimized for the compiled code.
[6]
Supported Paradigms
Message Passing
Utilizes an optimized VM layer to communicate between nodes. Uses built-in calls to communicate between VMs.
Shared Memory
Uses pooled namespace classes, known as “chares”. Global variables are required to be read-only.
Novelties
Mutexes are unneeded, as parallel data is essentially exclusive or marked read-only.
Pseudo-parallelism is generated automatically from program code, split by objects.
Example: Hello World I
Listing 3: Header File

#ifndef HELLO_H
#define HELLO_H

class Hello : public CBase_Hello {
public:
    Hello(CkArgMsg *msg);
    Hello(CkMigrateMessage *msg);
};

#endif // HELLO_H
Example: Hello World II
Listing 4: Source File

#include "hello.decl.h"
#include "hello.h"

Hello::Hello(CkArgMsg *msg) {
    CkPrintf("Hello World!\n");
    CkExit();
}

Hello::Hello(CkMigrateMessage *msg) {}

#include "hello.def.h"
Example: Hello World III
Listing 5: Interface File

mainmodule hello {
    mainchare Hello {
        entry Hello(CkArgMsg *m);
    };
};
Basics I
Files
Applications are composed of at least three files:
Source File (.C)
Header File (.h)
Common Interface File (.ci)
Basics II
CI file
The CI file is unique to the Charm system and identifies parallel execution agents; it results in two generated headers for the Charm system to read. It identifies:
A parent namespace, called module
A server class, called mainchare
Identification of reentrant functions using the keyword entry
Identification of read-only data using the term readonly.

module hello {
    array [1D] Hello {
        entry Hello();
        entry void sayHi(int);
    };
};
Basics III
mainmodule main {
    readonly CProxy_Main mainProxy;
    extern module hello;

    mainchare Main {
        entry Main(CkArgMsg *msg);
        entry void done();
    };
};
H file
The header files use the ordinary C++ class-header syntax, with the exception of inheriting from a generated class such as CBase_Main.
Basics IV
#ifndef HELLO_H
#define HELLO_H

class Hello : public CBase_Hello {
public:
    Hello();
    Hello(CkMigrateMessage *msg);
    void sayHi(int from);
};

#endif // HELLO_H
Basics V
#ifndef MAIN_H
#define MAIN_H

class Main : public CBase_Main {
private:
    int numElements;
    int doneCount;

public:
    Main(CkArgMsg *msg);
    Main(CkMigrateMessage *msg);
    void done();
};

#endif // MAIN_H
Basics VI
C file
The C file (which is actually a C++ file; the compiler looks for capital .C extensions) is a little abnormal, but keeps the familiar C++ syntax. A thing to note is the addition of the two headers generated by the ci file. These two includes contain the necessary bindings to use the charmrun command.
#include "hello.decl.h"
#include "hello.h"

extern /* readonly */ CProxy_Main mainProxy;

Hello::Hello() {}
Hello::Hello(CkMigrateMessage *msg) {}

void Hello::sayHi(int from) {
Basics VII
    CkPrintf("Hello from chare %d on P(%d), told by %d.\n",
             thisIndex, CkMyPe(), from);

    mainProxy.done();
}

#include "hello.def.h"

#include "main.decl.h"
#include "main.h"
#include "hello.decl.h"

/* readonly */ CProxy_Main mainProxy;

Main::Main(CkArgMsg *msg) {
    doneCount = 0;
    numElements = 5;
Basics VIII
    if (msg->argc > 1)
        numElements = atoi(msg->argv[1]);
    delete msg;

    CkPrintf("Running Hello World using %d elements over %d processors.\n",
             numElements, CkNumPes());

    mainProxy = thisProxy;
    CProxy_Hello helloArray = CProxy_Hello::ckNew(numElements);
    helloArray.sayHi(-1);
}

Main::Main(CkMigrateMessage *msg) {}
Basics IX
void Main::done() {
    doneCount++;
    if (doneCount >= numElements)
        CkExit();
}

#include "main.def.h"
Basics X
Building
Building the project into an executable is a bit detailed, but nothing really different from your typical managed C/C++ applications.
compile interface files using charmc
compile independent classes, creating object files
compile main class, also creating an object file
link the two together, using charmc

CHARMDIR = /path/to/charm/root
CHARMC   = $(CHARMDIR)/bin/charmc $(OPTS)

default: all

all: hello
Basics XI
hello: main.o hello.o
	$(CHARMC) -language charm++ -o hello main.o hello.o

main.o: main.C main.h main.decl.h main.def.h hello.decl.h
	$(CHARMC) -o main.o main.C

main.decl.h main.def.h: main.ci
	$(CHARMC) main.ci

hello.o: hello.C hello.h hello.decl.h hello.def.h main.decl.h
	$(CHARMC) -o hello.o hello.C

hello.decl.h hello.def.h: hello.ci
	$(CHARMC) hello.ci
Basics XII
clean:
	rm -f main.decl.h main.def.h main.o
	rm -f hello.decl.h hello.def.h hello.o
	rm -f hello charmrun
Nodefile
This is similar to how MPI works, where you must create a nodelist to specify which computers are identified as an element, and also the number of processing elements in each node. The list wraps, making it possible to have multiple processes over one single computer.
Basics XIII
group main ++shell ssh
    host member1
    host member2
    host member3
    host member4
    host member5
    host member6
    host member7
    host member8

group half
    host member2 ++shell ssh
    host member3 ++shell rsh

group local ++shell ssh
    host localhost
Basics XIV
Running
To run, all you have to do is prefix the application with the charmrun executable, giving the program name, the number of processing elements, and whatever other options you wish to have. Something like: ./charmrun ./hello +p4 ++verbose
This will run the hello program over 4 elements and display processing data verbosely to the screen.
Languages Studied: Erlang
Background
Very stable platform
Lightweight process spawning
Primitives built into the language for parallel programming
Immutable (unchanging) variable assignments (purely functional).
Conditional processing is performed by “guards”.
[3]
Supported Paradigms
Message Passing
Uses built-in syntax to communicate between processes. Any data can be passed between processes, including process IDs.
Example: Factorial I
Listing 6: Erlang factorial function

-module(factorial).
-export([factorial/1]).

factorial(0) ->
    1;
factorial(N) ->
    N * factorial(N-1).
Example: Hello World I
Listing 7: Erlang Hello World

-module(hello).
-export([start/0]).

start() ->
    spawn(fun() -> loop() end).

loop() ->
    receive
        hello ->
            io:format("Hello World!~n"),
            loop();
        goodbye ->
            ok
    end.
Basics I
REPL
Erlang is a VM and language specification; thus all programs are run within the VM. This is accessed through erl, which starts a Read-Evaluate-Print-Loop (REPL) environment. All commands, including compilation, are done in this environment.
Modules and Exports
Erlang uses a modular design philosophy, similar to namespacing in C++. Every file is a module, and every module declaration begins the file.
Exports are similar to public functions in the C++ class model. Only the functions listed in export can be accessed outside of the module.
Basics II
Pattern Matching
Erlang uses pattern matching for its definitions. As seen with the earlier factorial program, function definitions can have actual values placed inside them. This allows the compiler to identify which function definition to call.
Spawning
Spawning is really simple in Erlang. Simply call the spawn/3 function, which takes three arguments and returns a process ID to associate with the spawned process:
Module name (?MODULE is the current module name)
Function name
A list of arguments
Basics III
Message Passing
Message passing is built into the language, using the ! operator and the receive ... end block identifiers.
Sending a message uses an infix operator (!) where the right-hand side is the message (any data) to transfer and the left-hand side is the process to transfer it to.
Receiving a message uses a case-like structure, where a block is defined and a series of matching patterns are listed with associated execution expressions.
Basics IV
Compiling
Start the erlang shell
Call the c/1 function, passing the module name
Your module has been imported into the shell for execution.

-module(basicspawn).
-export([start/1, start_proc/2]).

start(Num) ->
    start_proc(Num, self()).

start_proc(0, Pid) ->
    Pid ! ok;
start_proc(Num, Pid) ->
    NPid = spawn(?MODULE, start_proc, [Num-1, Pid]),
    NPid ! ok,
Basics V
    receive ok -> ok end.
Languages Studied: JoCaml
Background
Extension of OCaml
Very strongly typed
Uses pattern matching and stream processing
Supported Paradigms
Shared Memory
Uses the Join Calculus to implement threads [4]. Mutex locking is unneeded for thread “channels”.
Pi Calculus
Pi Calculus (π-calculus) is a mathematical construct which provides the framework for basic process calculation. It defines a minimalistic language based on a BNF grammar to describe processes in terms of:
Concurrency Written P | Q, where P and Q are individual processes and the pipe operator means parallel execution.
Communication By way of input prefixing (a process waiting for input before its continuation executes) or output prefixing (a process sending data to another process).
Replication Written !P, which essentially means copy.
Creation Written (νx)P, stating the creation of a new name x inside of P.
Nil Written 0, identifying a process's complete halt.
Join Calculus
Join Calculus is a revised form of Pi Calculus with a focus on locality and mobility, providing a method to model asynchronous communication.
Added pattern-matching capabilities and several utility systems that were not provided in pi calculus, most notably:
Tests and equivalences
Tracing
Extended the pi calculus specification from 20 pages to 80 pages
Further information can be found in Fournet and Gonthier’s paper onJoin Calculus. [4]
Example: Hello World I
Listing 8: OCaml Hello World

let () = print_endline "Hello World";;
Example: Factorial I
Listing 9: OCaml Factorial

let rec factorial n =
  if n <= 1 then 1
  else factorial (n - 1) * n;;
Benchmark
What is a Benchmark?
Definition
A benchmark is a standard, defined within a context of “good” and “bad”.
Benchmarks provide a numerical or graphical representation of performance.
Professional benchmarks are often packaged as a “suite” of tools.
Challenges of Parallel Process Benchmarking
Keeping track of every node and process.
An acceptable way to sum each process execution.
Dealing with overhead vs. micro-benchmarking
The Real Challenges of Benchmarking
I was unable to record benchmarks, due to unforeseen MPI issues.
Bibliography
Sources I
Charm++ programming tutorial, April 2009. http://charm.cs.uiuc.edu/tutorial.
Brooks, Jr., F. P. The Mythical Man-Month, 1995 ed. Addison-Wesley, Crawfordsville, Indiana, 2009.
Cesarini, F., and Thompson, S. Erlang Programming, 1st ed. O'Reilly, June 2009.
Fournet, C., and Gonthier, G. The join calculus: a language for distributed mobile programming. Tech. rep., Microsoft Research and INRIA Rocquencourt, 2001. http://research.microsoft.com/en-us/um/people/fournet/papers/join-tutorial.pdf.
Sources II
Jordan, H. F., and Alaghband, G. Fundamentals of Parallel Processing, 1st ed. Prentice Hall, 2003.
Kale, L., and Krishnan, S. Charm++: Parallel Programming with Message-Driven Objects. University of Illinois at Urbana-Champaign, 1996. http://charm.cs.uiuc.edu.
Mandel, L., and Maranget, L. The JoCaml Language Release 3.11. Institut National de Recherche en Informatique et en Automatique, 2007. http://jocaml.inria.fr.
Sources III
R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2008. http://www.R-project.org.
Wall, L. Perl, the first postmodern computer language, March 1999. http://www.perl.com/pub/a/1999/03/pm.html.