13. Introduction to Parallel Programming. Fabrizio Perin, Prof. O. Nierstrasz
TRANSCRIPT
© Oscar Nierstrasz 2
Sources
> Section 4.4 of Concurrent Programming in Java (Doug Lea, Prentice Hall PTR, November 1999), which covers parallel decomposition in greater detail.
> Sections 6, 7 and 8 of Java Concurrency in Practice (Brian Goetz et al., Addison-Wesley Professional, May 2006)
> Doug Lea's concurrency-interest website (http://gee.cs.oswego.edu/dl/concurrency-interest/index.html): download the fork-join framework as part of the jsr166y package, and read the paper on its design.
Parallelism
Roadmap
> Concurrent programming and parallelism
> Why should we practice parallel programming? (Amdahl's law about speedup)
> Steps to create parallel programs
> Java library for concurrent and parallel programming
> Example (find the max)
> Example (parallel merge sort)
Concurrent programming and parallelism
> Concurrent computing is a form of computing in which programs are designed as collections of interacting computational processes that may be executed in parallel.
> Parallel computing is a form of computation in which many calculations are carried out simultaneously.
Wikipedia
Why should we practice parallel programming?
Because I want to keep my super cool multi-core computer busy!
Speedup = old running time / new running time
Speedup = 140 / 65 = 2.15
(the parallel version is 2.15 times faster)

[Figure: a program with two 20-unit sequential parts and a 100-unit parallelizable part runs sequentially in 140 time units; on 4 CPUs the 100-unit part splits into four 25-unit chunks that run simultaneously, for a total of 65 units.]
Amdahl's law

Speedup = 1 / ((1 − p) + p/n)

p = fraction of the code that is parallel
n = number of CPUs
Amdahl's law, applied to the example:

p = fraction of parallel code = 100/140 = 71%
n = number of CPUs = 4

speedup = old running time / new running time
        = 1 / ((1 − p) + p/n)
        = 1 / ((1 − 0.71) + 0.71/4)
        = 2.15
The maximum speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program.
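The law can be checked numerically. Here is a small sketch (not from the lecture; class and method names are made up) that reproduces the worked example and shows the ceiling imposed by the sequential fraction:

```java
public class Amdahl {
    // Speedup predicted by Amdahl's law: 1 / ((1 - p) + p/n),
    // where p is the parallel fraction and n the number of CPUs.
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        double p = 100.0 / 140.0;  // 71% of the example program is parallelizable
        System.out.printf("4 CPUs:  %.2f%n", speedup(p, 4));       // ~2.15, as above
        System.out.printf("ceiling: %.2f%n", speedup(p, 1 << 20)); // approaches 1/(1-p) = 3.5
    }
}
```

No matter how large n grows, the speedup never exceeds 1/(1 − p): the sequential 40 units always remain.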
Think about the problem!!!
> Don't try to force a non-parallel problem to be parallel
> Identify which program chunks provide the best speedup-to-effort ratio
[Figure: e.g., chunk A takes 50 units and can be sped up 5x (to 10); chunk B takes 100 units and can be sped up only 2x (to 50). B still offers the larger absolute saving.]
Kinds of parallelism (problem decomposition)
© Rodric Rabbah, IBM 20
> Data parallelism: the same task runs on different data in parallel
  You can divide the data between different tasks and perform the tasks in parallel
  No dependencies among the tasks force their results to be ordered or merged
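As a concrete sketch (not from the lecture; the class and method names are made up), the same summing task can be applied to different slices of an array using an ExecutorService from java.util.concurrent:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DataParallelSum {
    // Data parallelism: the same task (summing) applied to different slices of the data.
    static long parallelSum(int[] data, int nTasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nTasks);
        int chunk = (data.length + nTasks - 1) / nTasks;
        List<Future<Long>> parts = new ArrayList<>();
        for (int t = 0; t < nTasks; t++) {
            final int from = Math.min(t * chunk, data.length);
            final int to = Math.min(from + chunk, data.length);
            parts.add(pool.submit(() -> {
                long s = 0;
                for (int i = from; i < to; i++) s += data[i];
                return s;
            }));
        }
        long total = 0;
        for (Future<Long> f : parts) total += f.get(); // combine partial sums; order is irrelevant
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(parallelSum(data, 4)); // 1 + 2 + ... + 1000 = 500500
    }
}
```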
> Task parallelism: different tasks running on the same data
  Several functions applied to the same data (e.g. average, max, min)
  The tasks are independent, so they can run in parallel
> Hybrid data/task parallelism: a parallel pipeline of tasks, each of which might itself be data parallel; each stage can run in parallel (e.g. Unix pipes)
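A minimal sketch of such a pipeline (not from the lecture; names are made up) connects two stages through a BlockingQueue, much like a Unix pipe: stage one produces squares while stage two sums them concurrently.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class Pipeline {
    private static final int EOS = -1; // end-of-stream marker (real values are all positive)

    // Stage 1: compute squares and push them downstream.
    static Thread producer(BlockingQueue<Integer> out, int n) {
        return new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) out.put(i * i);
                out.put(EOS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    // Stage 2: sum values as they arrive; runs concurrently with stage 1.
    static long sumOfSquares(int n) throws InterruptedException {
        BlockingQueue<Integer> q = new ArrayBlockingQueue<>(16);
        Thread p = producer(q, n);
        p.start();
        long sum = 0;
        int v;
        while ((v = q.take()) != EOS) sum += v;
        p.join();
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(sumOfSquares(5)); // 1 + 4 + 9 + 16 + 25 = 55
    }
}
```

Each stage here could itself be made data parallel by replacing the single producer thread with several workers feeding the same queue.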
Java library for concurrent and parallel programming
> The java.util.concurrent package includes classes and extensible frameworks to support concurrent and parallel programming
Executor
> Executor is an interface used to define custom thread-like task-execution systems.
> Executor provides a method to submit tasks for execution and to manage them.
> Tasks may execute in a newly created thread, an existing task-execution thread, or the thread calling execute(), and may execute sequentially or concurrently.
public interface Executor {
    void execute(Runnable command);
}
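The interface deliberately decouples task submission from execution policy. Two minimal implementations (adapted from the Executor javadoc) illustrate the extremes: running the task synchronously in the caller, or spawning a fresh thread per task.

```java
import java.util.concurrent.Executor;

// Runs each task immediately in the thread that calls execute().
class DirectExecutor implements Executor {
    public void execute(Runnable command) {
        command.run();
    }
}

// Spawns a new thread for each task.
class ThreadPerTaskExecutor implements Executor {
    public void execute(Runnable command) {
        new Thread(command).start();
    }
}

public class ExecutorDemo {
    public static void main(String[] args) {
        new DirectExecutor().execute(
            () -> System.out.println("ran in " + Thread.currentThread().getName()));
    }
}
```

Code that depends only on the Executor interface can switch between these policies (or a thread pool) without changing the tasks themselves.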
Web server without Executor
class ThreadPerTaskWebServer {
    public static void main(String[] args) throws IOException {
        ServerSocket socket = new ServerSocket(80);
        while (true) {
            final Socket connection = socket.accept();
            Runnable task = new Runnable() {
                public void run() {
                    handleRequest(connection);
                }
            };
            new Thread(task).start();
        }
    }
}
Web server with Executor
class TaskExecutionWebServer {
    private static final int NTHREADS = 100;
    private static final Executor exec
        = Executors.newFixedThreadPool(NTHREADS);

    public static void main(String[] args) throws IOException {
        ServerSocket socket = new ServerSocket(80);
        while (true) {
            final Socket connection = socket.accept();
            Runnable task = new Runnable() {
                public void run() {
                    handleRequest(connection);
                }
            };
            exec.execute(task);
        }
    }
}
Thread Pool
> A thread pool manages a set of worker threads.
> The threads in the pool have a simple life cycle:
> Request the next task from the task queue
> Execute it
> Wait for another task
> Advantages of using a thread pool:
> Reduces the cost of thread creation and teardown
> Increases responsiveness
> By properly tuning the pool you always have the right number of threads (you don't run out of memory and all your CPUs stay busy)
> newFixedThreadPool: A fixed-size thread pool. A new thread is created for each task to execute up to the maximum pool size. Attempts to keep the pool size constant (threads that die for any reason are replaced by new threads).
> newCachedThreadPool: More flexible pool that removes idle threads when the size of the pool exceeds the demand for processing, and adds new threads when demand increases, but places no bounds on the size of the pool.
> newSingleThreadExecutor: A single-threaded executor creates a single worker thread to process tasks, replacing it if it dies unexpectedly. Tasks are guaranteed to be processed sequentially according to the order imposed by the task queue (FIFO, LIFO, priority order).
> newScheduledThreadPool: A fixed-size thread pool that supports delayed and periodic task execution, similar to Timer.
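For instance (a sketch, not from the lecture), a scheduled pool can run a task after a delay and still deliver its result through a Future:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class ScheduledDemo {
    public static void main(String[] args) throws Exception {
        ScheduledExecutorService sched = Executors.newScheduledThreadPool(1);
        // Run the task once, 100 ms from now (what Timer.schedule would do).
        ScheduledFuture<String> f =
            sched.schedule(() -> "done", 100, TimeUnit.MILLISECONDS);
        System.out.println(f.get()); // get() blocks until the task has run
        sched.shutdown();
    }
}
```

Unlike Timer, a scheduled thread pool can run several delayed tasks concurrently, and an exception in one task does not kill the timing thread.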
Select Max in parallel
…
public class SelectMaxProblem {
    …
    public int solveSequentially() {
        int max = Integer.MIN_VALUE;
        for (int i = start; i < end; i++) {
            int n = numbers[i];
            if (n > max)
                max = n;
        }
        return max;
    }
    …
…
import jsr166y.ForkJoinTask;
import jsr166y.RecursiveAction;

public class MaxWithFJ extends RecursiveAction {
    …
    @Override
    protected void compute() {
        if (problem.getSize() < threshold) {
            result = problem.solveSequentially();
            System.out.println(Thread.currentThread()
                + " is solving sequentially on: " + problem.getSize());
        } else {
            int midpoint = problem.getSize() / 2;
            // subproblem ranges are half-open [from, to), matching solveSequentially()
            MaxWithFJ left =
                new MaxWithFJ(problem.subproblem(0, midpoint), threshold);
            MaxWithFJ right =
                new MaxWithFJ(problem.subproblem(midpoint, problem.getSize()), threshold);
            invokeAll(left, right);
            System.out.println("solving in parallel on: " + ForkJoinTask.getPool());
            result = Math.max(left.getResult(), right.getResult());
        }
    }
    …
public class MaxWithFJTests {
    private final SelectMaxProblem problem;
    private final int threshold;
    private final int nThreads;
    private final int[] number = new int[500000];

    public MaxWithFJTests() { … }

    @Test
    public void parallelTest() {
        MaxWithFJ mfj = new MaxWithFJ(problem, threshold);
        ForkJoinPool fjPool = new ForkJoinPool(nThreads);
        fjPool.invoke(mfj);
        int result = mfj.getResult();
        …
        assertEquals(result, max);
    }
}
Merge sort
> Divide et impera algorithm:
> Divide: split the problem into sub-problems that are smaller parts of the original problem
> Impera: solve the sub-problems recursively (if a sub-problem is small enough, it is solved in a straightforward manner)
> Combine: combine the solutions to the sub-problems into the solution for the original problem
> Merge sort:
> Divide: split the n-element sequence to be sorted into two sub-sequences of n/2 elements each
> Impera: sort the two sub-sequences recursively using merge sort
> Combine: merge the two sorted sub-sequences to produce the sorted answer
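The three steps map directly onto code. A sequential sketch (not the lecture's parallel version, which follows below; the class name is made up) makes the structure explicit:

```java
import java.util.Arrays;

public class SeqMergeSort {
    static int[] sort(int[] a) {
        if (a.length <= 1) return a;                              // small enough: already sorted
        int mid = a.length / 2;                                   // divide
        int[] left = sort(Arrays.copyOfRange(a, 0, mid));         // impera: recurse on each half
        int[] right = sort(Arrays.copyOfRange(a, mid, a.length));
        int[] out = new int[a.length];                            // combine: merge the halves
        int i = 0, l = 0, r = 0;
        while (l < left.length && r < right.length)
            out[i++] = (left[l] <= right[r]) ? left[l++] : right[r++];
        while (l < left.length) out[i++] = left[l++];
        while (r < right.length) out[i++] = right[r++];
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sort(new int[]{5, 1, 4, 2, 3}))); // [1, 2, 3, 4, 5]
    }
}
```

The two recursive calls are independent, which is exactly what the parallel version exploits with invokeAll.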
Merge sort in parallel
. . .
public class MergeSort extends RecursiveAction {
    . . .
    private void merge(MergeSort left, MergeSort right) {
        int i = 0, leftPos = 0, rightPos = 0,
            leftSize = left.size(), rightSize = right.size();
        while (leftPos < leftSize && rightPos < rightSize)
            result[i++] = (left.result[leftPos] <= right.result[rightPos])
                ? left.result[leftPos++] : right.result[rightPos++];
        while (leftPos < leftSize) result[i++] = left.result[leftPos++];
        while (rightPos < rightSize) result[i++] = right.result[rightPos++];
    }
    . . .
. . .
public class MergeSort extends RecursiveAction {
    . . .
    public int size() { return endPos - startPos; }

    protected void compute() {
        if (size() < SEQUENTIAL_THRESHOLD) {
            System.arraycopy(numbers, startPos, result, 0, size());
            Arrays.sort(result, 0, size());
        } else {
            int midpoint = size() / 2;
            MergeSort left = new MergeSort(numbers, startPos, startPos + midpoint);
            MergeSort right = new MergeSort(numbers, startPos + midpoint, endPos);
            invokeAll(left, right);
            merge(left, right);
        }
    }

    public int[] getResult() { return result; }
}
package mergeSortParallel;

import java.util.Arrays;
import jsr166y.RecursiveAction;

public class MergeSort extends RecursiveAction {
    private static final int SEQUENTIAL_THRESHOLD = 50000;
    …
    private void merge(MergeSort left, MergeSort right) {
        int i = 0, leftPos = 0, rightPos = 0,
            leftSize = left.size(), rightSize = right.size();
        while (leftPos < leftSize && rightPos < rightSize)
            result[i++] = (left.result[leftPos] <= right.result[rightPos])
                ? left.result[leftPos++] : right.result[rightPos++];
        while (leftPos < leftSize) result[i++] = left.result[leftPos++];
        while (rightPos < rightSize) result[i++] = right.result[rightPos++];
    }

    public int size() { return endPos - startPos; }

    protected void compute() {
        if (size() < SEQUENTIAL_THRESHOLD) {
            System.arraycopy(numbers, startPos, result, 0, size());
            Arrays.sort(result, 0, size());
        } else {
            int midpoint = size() / 2;
            MergeSort left = new MergeSort(numbers, startPos, startPos + midpoint);
            MergeSort right = new MergeSort(numbers, startPos + midpoint, endPos);
            invokeAll(left, right);
            merge(left, right);
        }
    }

    public int[] getResult() { return result; }
}
WordCount example
[Figure: WordCount running time (ms, 0–5000) versus number of threads (1–10), one curve per sequential threshold: 50, 500, 5000, 50000, 500000.]
[ wordcount ]> java WordCount bigdict.txt
Total words = 241104
Total time = 225 ms
[ wordcount ]> java WordCount bigdict.txt
Total words = 241104
Total time = 226 ms
[ wordcount ]> java WordCountParallel bigdict.txt 8 50000
Total words = 241104
Total time = 148 ms
[ wordcount ]> java WordCountParallel bigdict.txt 8 50000
Total words = 241104
Total time = 148 ms
[ wordcount ]> java WordCountParallelAtomicInt bigdict.txt 8 50000
Total words = 241104
Total time = 133 ms
[ wordcount ]> java WordCountParallelAtomicInt bigdict.txt 8 50000
Total words = 241104
Total time = 132 ms
What you should know!
> What is the difference between concurrent computing and parallel computing?
> Why would you execute code in parallel?
> Which kinds of problem decomposition can you apply?
> What are the main functionalities of the java.util.concurrent package?
License
Attribution-ShareAlike 3.0 Unported

You are free:
to Share — to copy, distribute and transmit the work
to Remix — to adapt the work

Under the following conditions:
Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.

For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to this web page.
Any of the above conditions can be waived if you get permission from the copyright holder.
Nothing in this license impairs or restricts the author's moral rights.

http://creativecommons.org/licenses/by-sa/3.0/