TRANSCRIPT
Parallel Programming
Introduction
• Idea has been around since the 1960s – pseudo-parallel systems on multiprogrammable computers
• True parallelism – many processors connected to run in concert
– Multiprocessor system
– Distributed system – stand-alone systems connected with high-speed networks; more complex
Programming Languages
• Used to express algorithms to solve problems presented by parallel processing systems
• Used to write OSs that implement these solutions
• Used to harness capabilities of multiple processors efficiently
• Used to implement and express communication across networks
Two kinds of parallelism
• Existing in the underlying hardware
• As expressed in a programming language
– May not result in actual parallel processing
– Could be implemented with pseudo-parallelism
– Concurrent programming – expresses only the potential for parallelism
Some Basics
• Process – an instance of a program or program part that has been scheduled for independent execution
• Heavy-weight process – a full-fledged independent entity with all the memory and other resources ordinarily allocated by the OS
• Light-weight process or thread – shares resources with the program it came from
Primary requirements for organization
• Must be a way for processors to synchronize their activities
– 1st processor inputs and sorts the data
– 2nd processor waits to perform computations on the sorted data
• Must be a way for processors to communicate data among themselves
– 2nd processor needs the sorted data
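The sort-then-compute scenario above can be sketched with Java threads; the class and data here are illustrative, not from the lecture. The second computation synchronizes by joining on the sorting thread before touching the data:

```java
import java.util.Arrays;

public class SortThenSum {
    static int[] data = {5, 3, 1, 4, 2};

    public static void main(String[] args) throws InterruptedException {
        // "1st processor": sorts the data in its own thread
        Thread sorter = new Thread(() -> Arrays.sort(data));
        sorter.start();

        // "2nd processor": must synchronize (wait) before using the sorted data
        sorter.join();   // communication point: the data is now sorted
        System.out.println(Arrays.toString(data));
    }
}
```

Without the `join`, the second step could read the array mid-sort; the join is the synchronization the slide calls for.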
Architectures
• SIMD (single-instruction, multiple-data)
– One processor is the controller
– All processors execute the same instructions on their respective registers or data sets
– Multiprocessing
– Synchronous (all processors operate at the same speed)
– Implicit solution to the synchronization problem
• MIMD (multiple-instruction, multiple-data)
– All processors act independently
– Multiprocessor or distributed-processor systems
– Asynchronous (synchronization is a critical problem)
OS requirements for Parallelism
• Means of creating and destroying processes
• Means of managing the number of processors used by processes
• Mechanism for ensuring mutual exclusion on shared-memory systems
• Mechanism for creating and maintaining communication channels between processors on distributed-memory systems
Language requirements
• Machine independence
• Adhere to language design principles
• Some languages use a shared-memory model and provide facilities for mutual exclusion through a library
• Some assume a distributed-memory model and provide communication facilities
• A few include both
Common mechanisms
• Threads
• Semaphores
• Monitors
• Message passing
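Of these mechanisms, a counting semaphore can itself be sketched on top of Java's built-in monitors (`synchronized`/`wait`/`notifyAll`). The class below is our own illustration, not from the lecture; production code would use `java.util.concurrent.Semaphore`.

```java
// A counting semaphore built from a Java monitor (illustrative sketch).
public class CountingSemaphore {
    private int count;

    public CountingSemaphore(int initial) { count = initial; }

    // P / acquire: block while no permits remain
    public synchronized void acquire() throws InterruptedException {
        while (count == 0) wait();   // loop (not if) guards against spurious wakeups
        count--;
    }

    // V / release: return a permit and wake any waiting threads
    public synchronized void release() {
        count++;
        notifyAll();
    }

    public synchronized int available() { return count; }
}
```

This also shows why monitors subsume semaphores: the `synchronized` methods give mutual exclusion, and `wait`/`notifyAll` give the blocking behavior.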
2 common sample problems
• Bounded buffer problem – similar to the producer-consumer problem
• Parallel matrix multiplication – an N³ algorithm
– Assign a process to compute each element; with each process on a separate processor, N steps
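A minimal sketch of the bounded buffer, assuming a Java monitor-style implementation (class and method names are ours, not from the lecture):

```java
// Bounded buffer: producers block when the buffer is full,
// consumers block when it is empty (illustrative sketch).
public class BoundedBuffer {
    private final Object[] items;
    private int head = 0, tail = 0, count = 0;

    public BoundedBuffer(int capacity) { items = new Object[capacity]; }

    public synchronized void put(Object x) throws InterruptedException {
        while (count == items.length) wait();   // full: producer waits
        items[tail] = x;
        tail = (tail + 1) % items.length;
        count++;
        notifyAll();                            // wake any waiting consumers
    }

    public synchronized Object take() throws InterruptedException {
        while (count == 0) wait();              // empty: consumer waits
        Object x = items[head];
        head = (head + 1) % items.length;
        count--;
        notifyAll();                            // wake any waiting producers
        return x;
    }
}
```

The fixed capacity is what distinguishes this from the plain producer-consumer problem: both sides can block, not just the consumer.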
Without explicit language facilities
• One approach is not to be explicit
– Possible in some functional, logical, and object-oriented languages
– Certain inherent parallelism is implicit
• Language translators use optimization techniques to make automatic use of OS utilities, assigning different processors to different parts of the program
• Usually suboptimal
Another alternative without explicit language facilities
• Translator offers compiler options that let the programmer explicitly indicate areas where parallelism is called for
• Most effective in nested loops
• Example: Fortran
      integer a(100,100), b(100,100), c(100,100)
      integer i, j, k, numprocs, err
      numprocs = 10
C     code to read in a and b goes here
      err = m_set_procs(numprocs)
C$doacross share(a, b, c), local(j, k)
      do 10 i = 1, 100
        do 10 j = 1, 100
          c(i,j) = 0
          do 10 k = 1, 100
            c(i,j) = c(i,j) + a(i,k) * b(k,j)
10    continue
      call m_kill_procs
C     code to write out c goes here
      end
• C$doacross – compiler directive
• share – variables accessed by all processes
• local – variables local to each process
• m_set_procs – sets the number of processes
• 10 continue – synchronizes the processes: all processes wait for the entire loop to finish; one process continues after the loop
3rd way with explicit constructs
• Provide a library of functions
• This passes facilities provided by the OS directly to the programmer
• (This is essentially the same as providing them in the language)
• Example: C with library parallel.h
#include <parallel.h>
#define SIZE 100
#define NUMPROCS 10

shared int a[SIZE][SIZE], b[SIZE][SIZE], c[SIZE][SIZE];

void multiply (void)
{ int i, j, k;
  for (i = m_get_myid(); i < SIZE; i += NUMPROCS)
    for (j = 0; j < SIZE; j++)
      for (k = 0; k < SIZE; k++)
        c[i][j] += a[i][k] * b[k][j];
}

int main ()
{ int err;
  // code to read in a and b goes here
  err = m_set_procs (NUMPROCS);
  m_fork (multiply);
  m_kill_procs ();
  // code to write out c goes here
  return 0;
}
m_fork – creates the 10 processes, all instances of multiply (m_set_procs only sets the number of processes)
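The same interleaved-row SPMD decomposition can be sketched with plain Java threads. This is our own illustration, not the Sequent library: the thread index plays the role of m_get_myid(), and a small SIZE replaces the slide's 100.

```java
// SPMD-style matrix multiply: thread p computes rows p, p + NUMPROCS, ...
public class ParMatMul {
    public static final int SIZE = 4;       // small for illustration; the slide uses 100
    public static final int NUMPROCS = 2;
    public static int[][] a = new int[SIZE][SIZE];
    public static int[][] b = new int[SIZE][SIZE];
    public static int[][] c = new int[SIZE][SIZE];

    public static void multiply() throws InterruptedException {
        Thread[] workers = new Thread[NUMPROCS];
        for (int p = 0; p < NUMPROCS; p++) {
            final int id = p;               // plays the role of m_get_myid()
            workers[p] = new Thread(() -> {
                for (int i = id; i < SIZE; i += NUMPROCS)
                    for (int j = 0; j < SIZE; j++)
                        for (int k = 0; k < SIZE; k++)
                            c[i][j] += a[i][k] * b[k][j];
            });
            workers[p].start();             // fork
        }
        for (Thread w : workers) w.join();  // join – cf. m_kill_procs
    }
}
```

Each thread writes a disjoint set of rows of c, so no mutual exclusion is needed.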
4th and final alternative
• Simply rely on the OS
• Example: pipes in the Unix OS
ls | grep "java"
– runs ls and grep in parallel
– output of ls is piped to grep
Language with explicit mechanism
• 2 basic ways to create new processes
– SPMD (single program multiple data)
• split the current process into 2 or more that execute copies of the same program
– MPMD (multiple program multiple data)
• a segment of code is associated with each new process
• typical case: the fork-join model, in which a process creates several child processes, each with its own code (a fork), and then waits for the children to complete their execution (a join)
• the last example is similar, but m_kill_procs takes the place of the join
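A minimal fork-join sketch with MPMD children in Java (the tasks and names are illustrative): the parent forks two children running different code, then joins both.

```java
public class ForkJoinDemo {
    public static int sum, product;

    public static void main(String[] args) throws InterruptedException {
        int[] data = {1, 2, 3, 4};

        Thread summer = new Thread(() -> {       // child 1: its own code
            for (int x : data) sum += x;
        });
        Thread multiplier = new Thread(() -> {   // child 2: different code (MPMD)
            product = 1;
            for (int x : data) product *= x;
        });

        summer.start();      // fork
        multiplier.start();  // fork
        summer.join();       // join: parent waits for the children
        multiplier.join();

        System.out.println(sum + " " + product);
    }
}
```

The joins guarantee the parent reads sum and product only after both children have finished.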
Granularity
• Size of code assignable to separate processes
– fine-grained: statement-level parallelism
– medium-grained: procedure-level parallelism
– large-grained: program-level parallelism
• Can be an issue in program efficiency
– fine-grained: process-creation overhead may dominate
– large-grained: may not exploit all opportunities for parallelism
Thread
• Provides fine-grained or medium-grained parallelism without the overhead of full-blown process creation
Issues
• Does parent suspend execution while child processes are executing, or does it continue to execute alongside them?
• What memory, if any, does a parent share with its children or the children share among themselves?
Answers in Last example
• the parent process suspends execution while the children execute
• global variables shared by all processes must be indicated explicitly (shared)
Process Termination
• Simplest case
– a process executes its code to completion and then ceases to exist
• Complex case
– a process may need to continue executing until a certain condition is met and then terminate
Statement-Level Parallelism (parbegin/parend notation)
parbegin
  S1;
  S2;
  ...
  Sn;
parend;
Statement-Level Parallelism (Fortran95)
FORALL (I = 1:100, J = 1:100)
  C(I,J) = 0
END FORALL
DO K = 1, 100
  FORALL (I = 1:100, J = 1:100)
    C(I,J) = C(I,J) + A(I,K) * B(K,J)
  END FORALL
END DO
Procedure-Level Parallelism
x = newprocess(p);
…
…
killprocess(x);
• where p is a declared procedure and x is a process designator
• similar to tasks in Ada
Program-Level Parallelism (Unix)
• fork creates a process that is an exact copy of the calling process

if (fork() == 0)
{ /* ..child executes this part */ }
else
{ /* ..parent executes this part */ }

• a return value of 0 indicates that the process is the child
Java threads
• built into Java
• Thread class is part of the java.lang package
• reserved word synchronized
– establishes mutual exclusion
• create an instance of a Thread object
• define its run method, which will execute when the thread starts
Java threads
• 2 ways (I'll show you the second, more versatile way)
• Define a class that implements the Runnable interface (define its run method)
• Then pass an object of this class to the Thread constructor
• Note: every Java program is already executing inside a thread whose run method is main.
Java Thread Example
class MyRunner implements Runnable
{ public void run()
  { ... }
}

MyRunner m = new MyRunner();
Thread t = new Thread(m);
t.start(); // t will now execute the run method
Destroying threads
• let each thread run to completion – wait for other threads to finish

t.start();
// do some other work
t.join(); // wait for t to finish

• interrupt it

t.start();
// do some other work
t.interrupt(); // tell t we are waiting
...
t.join(); // wait for t to finish
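The interrupt-then-join idiom can be turned into a complete runnable sketch (our own example, not from the slides): the worker polls its interrupted status and shuts down cooperatively.

```java
public class InterruptDemo {
    public static volatile boolean stopped = false;

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> {
            // keep working until the main thread signals us
            while (!Thread.currentThread().isInterrupted()) {
                Thread.yield();
            }
            stopped = true;   // cooperative shutdown
        });
        t.start();
        // do some other work ...
        t.interrupt();  // tell t we are waiting
        t.join();       // wait for t to finish
    }
}
```

Interrupting does not kill the thread; it only sets a flag, so the worker must check that flag (or catch InterruptedException) and exit on its own.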
Mutual exclusion
class Queue
{ ...
  synchronized public Object dequeue()
  { if (empty()) throw ... }
  synchronized public Object enqueue(Object obj)
  { ... }
  ...
}
Mutual exclusion
class Remover implements Runnable
{ public Remover(Queue q) { ... }
  public void run()
  { ... q.dequeue() ... }
}

class Inserter implements Runnable
{ public Inserter(Queue q) { ... }
  public void run()
  { ... q.enqueue(...) ... }
}
Mutual exclusion
Queue myqueue = new Queue(..);
…
Remover r = new Remover(myqueue);
Inserter i = new Inserter(myqueue);
Thread t1 = new Thread (r);
Thread t2 = new Thread (i);
t1.start();
t2.start();
Manually stalling a thread and then reawakening it
class Queue
{ ...
  synchronized public Object dequeue()
  { try
    { while (empty()) wait();
    }
    catch (InterruptedException e) // reset interrupt
    { ... }
    ...
  }
  synchronized public Object enqueue(Object obj)
  { ...
    notifyAll();
  }
  ...
}
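Since the slide elides the queue's details, here is a self-contained sketch of the same stall/reawaken pattern (class name and driver are ours): dequeue stalls with wait() while the queue is empty, and enqueue reawakens any waiters with notifyAll().

```java
import java.util.LinkedList;

public class WaitNotifyQueue {
    private final LinkedList<Object> items = new LinkedList<>();

    public synchronized Object dequeue() throws InterruptedException {
        while (items.isEmpty()) wait();   // stall until an item arrives
        return items.removeFirst();
    }

    public synchronized void enqueue(Object obj) {
        items.addLast(obj);
        notifyAll();                      // reawaken any stalled consumers
    }

    public static void main(String[] args) throws InterruptedException {
        WaitNotifyQueue q = new WaitNotifyQueue();
        Thread consumer = new Thread(() -> {
            try { System.out.println(q.dequeue()); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        consumer.start();     // blocks: the queue is empty
        Thread.sleep(100);    // give the consumer time to reach wait()
        q.enqueue("hello");   // wakes the consumer
        consumer.join();
    }
}
```

The while loop around wait() is essential: notifyAll wakes every waiter, and each must recheck the condition before proceeding.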