Art of Multiprocessor Programming: Futures, Scheduling, and Work Distribution. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit. Modified by Rajeev Alur for CIS 640, University of Pennsylvania.

TRANSCRIPT

Page 1:

Futures, Scheduling, and Work Distribution

Companion slides for The Art of Multiprocessor Programming
by Maurice Herlihy & Nir Shavit
Modified by Rajeev Alur for CIS 640, University of Pennsylvania

Page 2:

How to write Parallel Apps?

• How to
– split a program into parallel parts
– in an effective way
– thread management

Page 3:

Matrix Multiplication

[Figure: C = A · B]

Page 4:

Matrix Multiplication

c_ij = Σ_{k=0}^{N-1} a_ik · b_kj

Page 5:

Matrix Multiplication

class Worker extends Thread {
  int row, col;
  Worker(int row, int col) {
    this.row = row; this.col = col;
  }
  public void run() {
    double dotProduct = 0.0;
    for (int i = 0; i < n; i++)
      dotProduct += a[row][i] * b[i][col];
    c[row][col] = dotProduct;
  }
}

Page 6:

Matrix Multiplication (same Worker code)

Extending Thread makes each Worker a thread.

Page 7:

Matrix Multiplication (same Worker code)

The row and col fields record which matrix entry to compute.

Page 8:

Matrix Multiplication (same Worker code)

The run() loop is the actual computation: a dot product.

Page 9:

Matrix Multiplication

void multiply() {
  Worker[][] worker = new Worker[n][n];
  for (int row …)
    for (int col …)
      worker[row][col] = new Worker(row, col);
  for (int row …)
    for (int col …)
      worker[row][col].start();
  for (int row …)
    for (int col …)
      worker[row][col].join();
}

Page 10:

Matrix Multiplication (same multiply() code)

Create n × n threads.

Page 11:

Matrix Multiplication (same multiply() code)

Start them.

Page 12:

Matrix Multiplication (same multiply() code)

Start them, then wait for them to finish.

Page 13:

Matrix Multiplication (same multiply() code)

What’s wrong with this picture?

Page 14:

Thread Overhead

• Threads require resources
– memory for stacks
– setup, teardown
• Scheduler overhead
• Worse for short-lived threads

Page 15:

Thread Pools

• More sensible to keep a pool of long-lived threads
• Threads are assigned short-lived tasks; each thread
– runs the task
– rejoins the pool
– waits for its next assignment

Page 16:

Thread Pool = Abstraction

• Insulates the programmer from the platform
– big machine, big pool
– and vice versa
• Portable code
– runs well on any platform
– no need to mix algorithm and platform concerns

Page 17:

ExecutorService Interface

• In java.util.concurrent
– Task = Runnable object
• if no result value is expected
• the pool calls its run() method
– Task = Callable<T> object
• if a result value of type T is expected
• the pool calls its T call() method
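A minimal, self-contained sketch of both task flavors submitted to a small pool (the pool size, lambda bodies, and printed values are illustrative, not from the slides):

```java
import java.util.concurrent.*;

public class TaskDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService exec = Executors.newFixedThreadPool(2);

        // Runnable: no result value; the pool calls run()
        Runnable hello = () -> System.out.println("hello from a pool thread");
        Future<?> f1 = exec.submit(hello);
        f1.get();                     // blocks until run() completes

        // Callable<T>: returns a T; the pool calls call()
        Callable<Integer> answer = () -> 6 * 7;
        Future<Integer> f2 = exec.submit(answer);
        System.out.println(f2.get()); // blocks until the value is ready

        exec.shutdown();
    }
}
```

The same submit() call handles both: the Runnable variant yields a Future<?> that only signals completion, while the Callable<Integer> variant yields a Future<Integer> carrying the result.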

Page 18:

Future<T>

Callable<T> task = …;
…
Future<T> future = executor.submit(task);
…
T value = future.get();

Page 19:

Future<T> (same code)

Submitting a Callable<T> task returns a Future<T> object.

Page 20:

Future<T> (same code)

The Future’s get() method blocks until the value is available.

Page 21:

Future<?>

Runnable task = …;
…
Future<?> future = executor.submit(task);
…
future.get();

Page 22:

Future<?> (same code)

Submitting a Runnable task returns a Future<?> object.

Page 23:

Future<?> (same code)

The Future’s get() method blocks until the computation is complete.

Page 24:

Matrix Addition

| C00 C01 |   | A00+B00  A01+B01 |
| C10 C11 | = | A10+B10  A11+B11 |

Page 25:

Matrix Addition (same block-matrix figure)

4 parallel additions

Page 26:

Matrix Addition Task

class AddTask implements Runnable {
  Matrix a, b; // multiply this!
  public void run() {
    if (a.dim == 1) {
      c[0][0] = a[0][0] + b[0][0]; // base case
    } else {
      (partition a, b into half-size matrices aij and bij)
      Future<?> f00 = exec.submit(add(a00,b00));
      …
      Future<?> f11 = exec.submit(add(a11,b11));
      f00.get(); …; f11.get();
      …
    }
  }
}

Page 27:

Matrix Addition Task (same code)

Base case: add directly.

Page 28:

Matrix Addition Task (same code)

Partitioning is a constant-time operation.

Page 29:

Matrix Addition Task (same code)

Submit 4 tasks.

Page 30:

Matrix Addition Task (same code)

Let them finish.

Page 31:

Dependencies

• The matrix example is not typical: its tasks are independent
– don’t need the results of one task …
– to complete another
• Often tasks are not independent

Page 32:

Fibonacci

F(n) = 1 if n = 0 or 1
F(n) = F(n-1) + F(n-2) otherwise

• Note
– potential parallelism
– dependencies

Page 33:

Disclaimer

• This Fibonacci implementation is egregiously inefficient
– so don’t deploy it!
– but it illustrates our point: how to deal with dependencies
• Exercise: make this implementation efficient!

Page 34:

Multithreaded Fibonacci

class FibTask implements Callable<Integer> {
  static ExecutorService exec =
    Executors.newCachedThreadPool();
  int arg;
  public FibTask(int n) { arg = n; }
  public Integer call() throws Exception {
    if (arg > 2) {
      Future<Integer> left  = exec.submit(new FibTask(arg-1));
      Future<Integer> right = exec.submit(new FibTask(arg-2));
      return left.get() + right.get();
    } else {
      return 1;
    }
  }
}

Page 35:

Multithreaded Fibonacci (same code)

The two submit() calls issue parallel calls.

Page 36:

Multithreaded Fibonacci (same code)

The get() calls pick up & combine the results.

Page 37:

Dynamic Behavior

• A multithreaded program is
– a directed acyclic graph (DAG)
– that unfolds dynamically
• Each node is a single unit of work

Page 38:

Fib DAG

[Figure: the DAG unfolded by fib(4): fib(4) submits fib(3) and fib(2); fib(3) submits fib(2) and fib(1); edges are labeled submit and get; leaves are fib(1) and fib(2) base cases]

Page 39:

Arrows Reflect Dependencies

[Figure: the same fib(4) DAG; the submit and get arrows show the dependencies]

Page 40:

How Parallel is That?

• Define work:
– total time on one processor
• Define critical-path length:
– longest dependency path
– can’t beat that!

Page 41:

Fib Work

[Figure: the fib(4) DAG with all of its nodes shown]

Page 42:

Fib Work

[Figure: the fib(4) DAG with its nodes numbered 1 through 17]

Work is 17.

Page 43:

Fib Critical Path

[Figure: the fib(4) DAG]

Page 44:

Fib Critical Path

[Figure: one longest dependency path through the fib(4) DAG, nodes numbered 1 through 8]

Critical-path length is 8.

Page 45:

Notation Watch

• T_P = time on P processors
• T_1 = work (time on 1 processor)
• T_∞ = critical-path length (time on ∞ processors)

Page 46:

Simple Bounds

• T_P ≥ T_1/P
– in one step, can’t do more than P work
• T_P ≥ T_∞
– can’t beat infinite resources

Page 47:

More Notation Watch

• Speedup on P processors
– the ratio T_1/T_P
– how much faster with P processors
• Linear speedup
– T_1/T_P = Θ(P)
• Max speedup (average parallelism)
– T_1/T_∞
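Plugging the fib(4) numbers from the preceding slides (T_1 = 17, T_∞ = 8) into these definitions; the processor count P = 4 is an arbitrary choice for illustration:

```java
public class SpeedupBounds {
    public static void main(String[] args) {
        double t1   = 17;  // work of fib(4), from the slides
        double tInf = 8;   // critical-path length of fib(4), from the slides
        int p = 4;         // example processor count (arbitrary)

        // Both lower bounds apply: T_P >= T_1/P and T_P >= T_inf,
        // so the critical path dominates here (8 > 17/4).
        double bound = Math.max(t1 / p, tInf);
        System.out.println("T_" + p + " >= " + bound);

        // Max speedup (average parallelism) is T_1/T_inf.
        System.out.println("max speedup = " + (t1 / tInf));
    }
}
```

So no matter how many processors we throw at fib(4), the speedup can never exceed 17/8 ≈ 2.125.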

Page 48:

Matrix Addition

[Figure: the same block-matrix addition as Page 24]

Page 49:

Matrix Addition (same figure)

4 parallel additions

Page 50:

Addition

• Let A_P(n) be the running time
– for an n × n matrix
– on P processors
• For example
– A_1(n) is the work
– A_∞(n) is the critical-path length

Page 51:

Addition

• Work is

A_1(n) = 4 A_1(n/2) + Θ(1)

(4 spawned additions; Θ(1) to partition, synch, etc.)

Page 52:

Addition

• Work is

A_1(n) = 4 A_1(n/2) + Θ(1) = Θ(n²)

Same as double-loop summation.

Page 53:

Addition

• Critical-path length is

A_∞(n) = A_∞(n/2) + Θ(1)

(spawned additions run in parallel; Θ(1) to partition, synch, etc.)

Page 54:

Addition

• Critical-path length is

A_∞(n) = A_∞(n/2) + Θ(1) = Θ(log n)

Page 55:

Matrix Multiplication Redux

[Figure: C = A · B]

Page 56:

Matrix Multiplication Redux

[Figure: C, A, and B each partitioned into 2 × 2 blocks with indices 11, 12, 21, 22]

Page 57:

First Phase …

[Figure: the 8 half-size block products A_ik · B_kj that contribute to C’s blocks]

8 multiplications

Page 58:

Second Phase …

[Figure: the pairs of block products are summed into C’s blocks]

4 additions

Page 59:

Multiplication

• Work is

M_1(n) = 8 M_1(n/2) + A_1(n)

(8 parallel multiplications; then the final addition)

Page 60:

Multiplication

• Work is

M_1(n) = 8 M_1(n/2) + Θ(n²) = Θ(n³)

Same as the serial triple-nested loop.

Page 61:

Multiplication

• Critical-path length is

M_∞(n) = M_∞(n/2) + A_∞(n)

(half-size multiplications run in parallel; then the final addition)

Page 62:

Multiplication

• Critical-path length is

M_∞(n) = M_∞(n/2) + A_∞(n)
       = M_∞(n/2) + Θ(log n)
       = Θ(log² n)

Page 63:

Parallelism

• M_1(n)/M_∞(n) = Θ(n³/log² n)
• To multiply two 1000 × 1000 matrices
– 1000³/10² = 10⁷
• Much more than the number of processors on any real machine

Page 64:

Work Distribution

[Figure: one thread buried in work while another sleeps — “zzz…”]

Page 65:

Work Dealing

[Figure: the busy thread deals work out to an idle one — “Yes!”]

Page 66:

The Problem with Work Dealing

[Figure: every thread tries to deal work at once — “D’oh!”]

Page 67:

Work Stealing

[Figure: an idle thread (“No work… zzz”) steals work from a busy one — “Yes!”]

Page 68:

Lock-Free Work Stealing

• Each thread has a pool of ready work
• Remove work without synchronizing
• If you run out of work, steal someone else’s
• Choose the victim at random
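The loop each pool thread runs can be sketched as follows. This uses java.util.concurrent’s lock-free ConcurrentLinkedDeque as a stand-in for the custom DEQueue developed on the following slides; the thread count, task count, and task bodies are illustrative assumptions:

```java
import java.util.*;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicInteger;

public class StealDemo {
    static final int THREADS = 2, TASKS = 100;
    static final List<ConcurrentLinkedDeque<Runnable>> pools = new ArrayList<>();
    static final AtomicInteger done = new AtomicInteger();

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < THREADS; i++) pools.add(new ConcurrentLinkedDeque<>());
        // Put every task in thread 0's pool, so thread 1 can only steal.
        for (int i = 0; i < TASKS; i++) pools.get(0).addLast(done::incrementAndGet);

        Thread[] ts = new Thread[THREADS];
        for (int i = 0; i < THREADS; i++) {
            final int me = i;
            ts[i] = new Thread(() -> {
                Random rnd = new Random();
                while (done.get() < TASKS) {
                    Runnable task = pools.get(me).pollLast();   // my own bottom
                    if (task == null) {                         // out of work:
                        int victim = rnd.nextInt(THREADS);      // pick a random victim
                        task = pools.get(victim).pollFirst();   // steal from its top
                    }
                    if (task != null) task.run();
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        System.out.println("completed " + done.get() + " tasks");
    }
}
```

The key asymmetry, developed in the next slides, is that the owner works at the bottom of its deque while thieves take from the top, so they rarely contend.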

Page 69:

Local Work Pools

Each work pool is a double-ended queue (DEQueue).

Page 70:

Work DEQueue¹

[Figure: a queue of work items; pushBottom and popBottom operate at one end]

1. Double-Ended Queue

Page 71:

Obtain Work

• Obtain work with popBottom
• Run the task until it blocks or terminates

Page 72:

New Work

• Unblock node
• Spawn node

(both added with pushBottom)

Page 73:

Whatcha Gonna do When the Well Runs Dry?

[Figure: a thread finds its DEQueue empty — “@&%$!!”]

Page 74:

Steal Work from Others

Pick a random thread’s DEQueue.

Page 75:

Steal this Task!

[Figure: the thief removes a task from the top of the victim’s DEQueue via popTop]

Page 76:

Task DEQueue

• Methods
– pushBottom
– popBottom
– popTop

pushBottom and popBottom never happen concurrently: only the owner calls them.

Page 77:

Task DEQueue

• Methods
– pushBottom
– popBottom
– popTop

pushBottom and popBottom are the most common — make them fast (minimize use of CAS).

Page 78:

Ideal

• Wait-free
• Linearizable
• Constant time

Page 79:

Compromise

• Method popTop may fail if
– a concurrent popTop succeeds, or
– a concurrent popBottom takes the last work

Blame the victim!

Page 80:

Dreaded ABA Problem

[Figure: a thief reads top and prepares a CAS]

Pages 81–86:

Dreaded ABA Problem

[Figure sequence: while the thief’s CAS is pending, other threads remove and add tasks; top ends up holding its old value again]

Page 87:

Dreaded ABA Problem

[Figure: the thief’s CAS on top now succeeds — “Yes!” — even though the queue has changed underneath it. Uh-oh …]

Page 88:

Fix for Dreaded ABA

[Figure: the DEQueue’s top field carries a stamp alongside the index; bottom is a plain index]
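The stamp turns top into an (index, stamp) pair whose stamp only moves forward, so a delayed CAS cannot be fooled when the index alone returns to an old value. A minimal demonstration with java.util.concurrent.atomic.AtomicStampedReference (the concrete index values are arbitrary; small Integer values compare correctly here because of Java’s Integer cache):

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class StampDemo {
    public static void main(String[] args) {
        // top holds an index plus a stamp that is bumped on every change
        AtomicStampedReference<Integer> top = new AtomicStampedReference<>(3, 0);

        int[] stampHolder = new int[1];
        int oldTop = top.get(stampHolder);   // read index and stamp together
        int oldStamp = stampHolder[0];

        // meanwhile, other threads move top away and back:
        // the index is 3 again, but the stamp has advanced
        top.compareAndSet(3, 1, 0, 1);
        top.compareAndSet(1, 3, 1, 2);

        // a CAS on the index alone would be fooled by this A-B-A;
        // the stamped CAS fails because the stamp moved from 0 to 2
        boolean ok = top.compareAndSet(oldTop, oldTop + 1, oldStamp, oldStamp + 1);
        System.out.println(ok);
    }
}
```

This prints false: the delayed CAS is correctly rejected, which is exactly the discipline the BDEQueue code on the next slides relies on.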

Page 89:

Bounded DEQueue

public class BDEQueue {
  AtomicStampedReference<Integer> top;
  volatile int bottom;
  Runnable[] tasks;
  …
}

Page 90:

Bounded DEQueue (same code)

top holds the index and stamp together (synchronized via CAS).

Page 91:

Bounded DEQueue (same code)

bottom is the index of the bottom task: no need to synchronize, but it does need a memory barrier (hence volatile).

Page 92

Bounded DEQueue

public class BDEQueue {
  AtomicStampedReference<Integer> top;
  volatile int bottom;
  Runnable[] tasks;
  …
}

Array holding tasks
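Pulling the three fields together, a minimal initialization sketch; the constructor, its capacity parameter, and the isEmpty helper are assumptions added for illustration:

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class BDEQueue {
    AtomicStampedReference<Integer> top;  // index + stamp, updated atomically
    volatile int bottom;                  // owner-only index; volatile for visibility
    Runnable[] tasks;                     // the bounded task array

    // Hypothetical constructor: the capacity parameter is an assumption.
    public BDEQueue(int capacity) {
        top = new AtomicStampedReference<>(0, 0);  // start at index 0, stamp 0
        bottom = 0;
        tasks = new Runnable[capacity];
    }

    // Empty when the owner's index has not passed the thieves' index.
    public boolean isEmpty() {
        return bottom <= top.getReference();
    }

    public static void main(String[] args) {
        System.out.println(new BDEQueue(16).isEmpty());  // freshly created: empty
    }
}
```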

Page 93

pushBottom()

public class BDEQueue {
  …
  void pushBottom(Runnable r) {
    tasks[bottom] = r;
    bottom++;
  }
  …
}

Page 94

pushBottom()

public class BDEQueue {
  …
  void pushBottom(Runnable r) {
    tasks[bottom] = r;
    bottom++;
  }
  …
}

Bottom is the index at which to store the new task in the array

Page 95

pushBottom()

public class BDEQueue {
  …
  void pushBottom(Runnable r) {
    tasks[bottom] = r;
    bottom++;
  }
  …
}

Adjust the bottom index

(diagram: stamp, top, bottom)
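The slide's pushBottom assumes the array never fills. A sketch with a capacity check added; the boolean return, the class name, and the fixed capacity are assumptions, not from the slides:

```java
public class PushDemo {
    volatile int bottom = 0;
    final Runnable[] tasks = new Runnable[4];   // small capacity for the demo

    boolean pushBottom(Runnable r) {
        if (bottom == tasks.length)
            return false;        // full: caller must handle the overflow
        tasks[bottom] = r;       // store the task first...
        bottom++;                // ...then publish it via the volatile write
        return true;
    }

    public static void main(String[] args) {
        PushDemo q = new PushDemo();
        int pushed = 0;
        while (q.pushBottom(() -> {}))   // fill to capacity
            pushed++;
        System.out.println("pushed " + pushed);
    }
}
```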

Page 96

Steal Work

public Runnable popTop() {
  int[] stamp = new int[1];
  int oldTop = top.get(stamp), newTop = oldTop + 1;
  int oldStamp = stamp[0], newStamp = oldStamp + 1;
  if (bottom <= oldTop)
    return null;
  Runnable r = tasks[oldTop];
  if (top.CAS(oldTop, newTop, oldStamp, newStamp))
    return r;
  return null;
}

Page 97

Steal Work

public Runnable popTop() {
  int[] stamp = new int[1];
  int oldTop = top.get(stamp), newTop = oldTop + 1;
  int oldStamp = stamp[0], newStamp = oldStamp + 1;
  if (bottom <= oldTop)
    return null;
  Runnable r = tasks[oldTop];
  if (top.CAS(oldTop, newTop, oldStamp, newStamp))
    return r;
  return null;
}

Read top (value & stamp)

Page 98

Steal Work

public Runnable popTop() {
  int[] stamp = new int[1];
  int oldTop = top.get(stamp), newTop = oldTop + 1;
  int oldStamp = stamp[0], newStamp = oldStamp + 1;
  if (bottom <= oldTop)
    return null;
  Runnable r = tasks[oldTop];
  if (top.CAS(oldTop, newTop, oldStamp, newStamp))
    return r;
  return null;
}

Compute new value & stamp

Page 99

Steal Work

public Runnable popTop() {
  int[] stamp = new int[1];
  int oldTop = top.get(stamp), newTop = oldTop + 1;
  int oldStamp = stamp[0], newStamp = oldStamp + 1;
  if (bottom <= oldTop)
    return null;
  Runnable r = tasks[oldTop];
  if (top.CAS(oldTop, newTop, oldStamp, newStamp))
    return r;
  return null;
}

Quit if queue is empty

(diagram: stamp, top, bottom)

Page 100

Steal Work

public Runnable popTop() {
  int[] stamp = new int[1];
  int oldTop = top.get(stamp), newTop = oldTop + 1;
  int oldStamp = stamp[0], newStamp = oldStamp + 1;
  if (bottom <= oldTop)
    return null;
  Runnable r = tasks[oldTop];
  if (top.CAS(oldTop, newTop, oldStamp, newStamp))
    return r;
  return null;
}

Try to steal the task

(diagram: stamp, top, CAS, bottom)

Page 101

Steal Work

public Runnable popTop() {
  int[] stamp = new int[1];
  int oldTop = top.get(stamp), newTop = oldTop + 1;
  int oldStamp = stamp[0], newStamp = oldStamp + 1;
  if (bottom <= oldTop)
    return null;
  Runnable r = tasks[oldTop];
  if (top.CAS(oldTop, newTop, oldStamp, newStamp))
    return r;
  return null;
}

Give up if conflict occurs
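A hypothetical thief loop built on top of popTop; the TaskQueue interface, the steal helper, and the yield-based back-off are assumptions, not from the slides. A null result means either an empty queue or a lost CAS, so the thief simply moves on:

```java
public class StealLoop {
    // Assumed minimal interface: just the part of the deque a thief uses.
    interface TaskQueue { Runnable popTop(); }

    // Scan the victims once, returning the first stolen task or null.
    static Runnable steal(TaskQueue[] victims) {
        for (TaskQueue q : victims) {
            Runnable r = q.popTop();   // null: empty queue or lost the CAS
            if (r != null)
                return r;              // stole one task; go run it
            Thread.yield();            // brief back-off before the next victim
        }
        return null;                   // nothing to steal right now
    }

    public static void main(String[] args) {
        TaskQueue empty = () -> null;
        TaskQueue full  = () -> () -> System.out.println("stolen task runs");
        Runnable r = steal(new TaskQueue[] { empty, full });
        if (r != null)
            r.run();
    }
}
```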

Page 102

Take Work

Runnable popBottom() {
  if (bottom == 0)
    return null;
  bottom--;
  Runnable r = tasks[bottom];
  int[] stamp = new int[1];
  int oldTop = top.get(stamp), newTop = 0;
  int oldStamp = stamp[0], newStamp = oldStamp + 1;
  if (bottom > oldTop)
    return r;
  if (bottom == oldTop) {
    bottom = 0;
    if (top.CAS(oldTop, newTop, oldStamp, newStamp))
      return r;
  }
  top.set(newTop, newStamp);
  return null;
}

Page 103

Take Work

Runnable popBottom() {
  if (bottom == 0)
    return null;
  bottom--;
  Runnable r = tasks[bottom];
  int[] stamp = new int[1];
  int oldTop = top.get(stamp), newTop = 0;
  int oldStamp = stamp[0], newStamp = oldStamp + 1;
  if (bottom > oldTop)
    return r;
  if (bottom == oldTop) {
    bottom = 0;
    if (top.CAS(oldTop, newTop, oldStamp, newStamp))
      return r;
  }
  top.set(newTop, newStamp);
  return null;
}

Make sure queue is non-empty

Page 104

Take Work

Runnable popBottom() {
  if (bottom == 0)
    return null;
  bottom--;
  Runnable r = tasks[bottom];
  int[] stamp = new int[1];
  int oldTop = top.get(stamp), newTop = 0;
  int oldStamp = stamp[0], newStamp = oldStamp + 1;
  if (bottom > oldTop)
    return r;
  if (bottom == oldTop) {
    bottom = 0;
    if (top.CAS(oldTop, newTop, oldStamp, newStamp))
      return r;
  }
  top.set(newTop, newStamp);
  return null;
}

Prepare to grab bottom task

Page 105

Take Work

Runnable popBottom() {
  if (bottom == 0)
    return null;
  bottom--;
  Runnable r = tasks[bottom];
  int[] stamp = new int[1];
  int oldTop = top.get(stamp), newTop = 0;
  int oldStamp = stamp[0], newStamp = oldStamp + 1;
  if (bottom > oldTop)
    return r;
  if (bottom == oldTop) {
    bottom = 0;
    if (top.CAS(oldTop, newTop, oldStamp, newStamp))
      return r;
  }
  top.set(newTop, newStamp);
  return null;
}

Read top, & prepare new values

Page 106

Take Work

Runnable popBottom() {
  if (bottom == 0)
    return null;
  bottom--;
  Runnable r = tasks[bottom];
  int[] stamp = new int[1];
  int oldTop = top.get(stamp), newTop = 0;
  int oldStamp = stamp[0], newStamp = oldStamp + 1;
  if (bottom > oldTop)
    return r;
  if (bottom == oldTop) {
    bottom = 0;
    if (top.CAS(oldTop, newTop, oldStamp, newStamp))
      return r;
  }
  top.set(newTop, newStamp);
  return null;
}

If top & bottom are 1 or more apart, no conflict

(diagram: stamp, top, bottom)

Page 107

Take Work

Runnable popBottom() {
  if (bottom == 0)
    return null;
  bottom--;
  Runnable r = tasks[bottom];
  int[] stamp = new int[1];
  int oldTop = top.get(stamp), newTop = 0;
  int oldStamp = stamp[0], newStamp = oldStamp + 1;
  if (bottom > oldTop)
    return r;
  if (bottom == oldTop) {
    bottom = 0;
    if (top.CAS(oldTop, newTop, oldStamp, newStamp))
      return r;
  }
  top.set(newTop, newStamp);
  return null;
}

At most one item left

(diagram: stamp, top, bottom)

Page 108

Take Work

Runnable popBottom() {
  if (bottom == 0)
    return null;
  bottom--;
  Runnable r = tasks[bottom];
  int[] stamp = new int[1];
  int oldTop = top.get(stamp), newTop = 0;
  int oldStamp = stamp[0], newStamp = oldStamp + 1;
  if (bottom > oldTop)
    return r;
  if (bottom == oldTop) {
    bottom = 0;
    if (top.CAS(oldTop, newTop, oldStamp, newStamp))
      return r;
  }
  top.set(newTop, newStamp);
  return null;
}

Try to steal the last task. Always reset bottom because the DEQueue will be empty even if unsuccessful (why?)

Page 109

Take Work

Runnable popBottom() {
  if (bottom == 0)
    return null;
  bottom--;
  Runnable r = tasks[bottom];
  int[] stamp = new int[1];
  int oldTop = top.get(stamp), newTop = 0;
  int oldStamp = stamp[0], newStamp = oldStamp + 1;
  if (bottom > oldTop)
    return r;
  if (bottom == oldTop) {
    bottom = 0;
    if (top.CAS(oldTop, newTop, oldStamp, newStamp))
      return r;
  }
  top.set(newTop, newStamp);
  return null;
}

(diagram: stamp, top, CAS, bottom)

I win CAS

Page 110

Take Work

Runnable popBottom() {
  if (bottom == 0)
    return null;
  bottom--;
  Runnable r = tasks[bottom];
  int[] stamp = new int[1];
  int oldTop = top.get(stamp), newTop = 0;
  int oldStamp = stamp[0], newStamp = oldStamp + 1;
  if (bottom > oldTop)
    return r;
  if (bottom == oldTop) {
    bottom = 0;
    if (top.CAS(oldTop, newTop, oldStamp, newStamp))
      return r;
  }
  top.set(newTop, newStamp);
  return null;
}

(diagram: stamp, top, CAS, bottom)

If I lose the CAS, the thief must have won…

Page 111

Take Work

Runnable popBottom() {
  if (bottom == 0)
    return null;
  bottom--;
  Runnable r = tasks[bottom];
  int[] stamp = new int[1];
  int oldTop = top.get(stamp), newTop = 0;
  int oldStamp = stamp[0], newStamp = oldStamp + 1;
  if (bottom > oldTop)
    return r;
  if (bottom == oldTop) {
    bottom = 0;
    if (top.CAS(oldTop, newTop, oldStamp, newStamp))
      return r;
  }
  top.set(newTop, newStamp);
  return null;
}

Failed to get the last task: must still reset top
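Putting the three slide methods together, a runnable single-threaded sketch; the class name, capacity, and demo harness are assumptions, and the slides' top.CAS shorthand is written out as Java's compareAndSet:

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class BDEQueueDemo {
    // Indices stay small here, so the JVM's small-Integer cache makes the
    // reference-based compareAndSet behave like a value comparison.
    final AtomicStampedReference<Integer> top = new AtomicStampedReference<>(0, 0);
    volatile int bottom = 0;
    final Runnable[] tasks;

    BDEQueueDemo(int capacity) { tasks = new Runnable[capacity]; }

    void pushBottom(Runnable r) {           // owner only
        tasks[bottom] = r;
        bottom++;                           // volatile write publishes the task
    }

    Runnable popTop() {                     // called by thieves
        int[] stamp = new int[1];
        int oldTop = top.get(stamp), newTop = oldTop + 1;
        int oldStamp = stamp[0], newStamp = oldStamp + 1;
        if (bottom <= oldTop)               // empty
            return null;
        Runnable r = tasks[oldTop];
        if (top.compareAndSet(oldTop, newTop, oldStamp, newStamp))
            return r;
        return null;                        // lost the race
    }

    Runnable popBottom() {                  // called by the owner
        if (bottom == 0)
            return null;
        bottom--;
        Runnable r = tasks[bottom];
        int[] stamp = new int[1];
        int oldTop = top.get(stamp), newTop = 0;
        int oldStamp = stamp[0], newStamp = oldStamp + 1;
        if (bottom > oldTop)                // no possible conflict with a thief
            return r;
        if (bottom == oldTop) {             // exactly one task: race the thieves
            bottom = 0;
            if (top.compareAndSet(oldTop, newTop, oldStamp, newStamp))
                return r;
        }
        top.set(newTop, newStamp);          // queue is empty either way
        return null;
    }

    // Single-threaded smoke test: push three tasks, steal one, take the rest.
    static int demo() {
        BDEQueueDemo q = new BDEQueueDemo(8);
        int[] count = new int[1];
        for (int i = 0; i < 3; i++)
            q.pushBottom(() -> count[0]++);
        Runnable stolen = q.popTop();       // thief takes tasks[0]
        if (stolen != null)
            stolen.run();
        for (Runnable r = q.popBottom(); r != null; r = q.popBottom())
            r.run();
        return count[0];                    // all three tasks should have run
    }

    public static void main(String[] args) {
        System.out.println("tasks run: " + demo());
    }
}
```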

Page 112

Variations

• Stealing is expensive
  – Pay CAS
  – Only one thread taken

• What if
  – Randomly balance loads?

Page 113

Work Stealing & Balancing

• Clean separation between app & scheduling layer

• Works well when number of processors fluctuates.

• Works on “black-box” operating systems