Predicting Parallel Performance


Post on 22-Feb-2016


Page 1: Predicting Parallel Performance

INTEL CONFIDENTIAL

Predicting Parallel Performance
Introduction to Parallel Programming – Part 10

Page 2: Predicting Parallel Performance

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.


Review & Objectives

Previously: Design and implementation of a task decomposition solution

At the end of this part you should be able to:
– Define speedup and efficiency
– Use Amdahl’s Law to predict maximum speedup

Page 3: Predicting Parallel Performance


Speedup

Speedup is the ratio between sequential execution time and parallel execution time

For example, if the sequential program executes in 6 seconds and the parallel program executes in 2 seconds, the speedup is 3X
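The definition above can be written as a one-line function. This is a minimal sketch; the function name `speedup` and its parameter names are illustrative and do not come from the slides.

```python
def speedup(sequential_time: float, parallel_time: float) -> float:
    """Ratio of sequential execution time to parallel execution time."""
    if parallel_time <= 0:
        raise ValueError("parallel_time must be positive")
    return sequential_time / parallel_time


# The example above: 6 seconds sequentially, 2 seconds in parallel.
print(f"{speedup(6.0, 2.0):.1f}X")  # prints 3.0X
```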

Speedup curves look like this:

[Figure: speedup (y-axis) versus cores (x-axis)]

Page 4: Predicting Parallel Performance


Efficiency

Efficiency
– A measure of core utilization
– Speedup divided by the number of cores

Example
– Program achieves speedup of 3 on 4 cores
– Efficiency is 3 / 4 = 75%
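Efficiency follows directly from speedup. A small sketch, with the hypothetical helper name `efficiency` chosen for illustration:

```python
def efficiency(speedup: float, cores: int) -> float:
    """Speedup divided by the number of cores, as a fraction of 1."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return speedup / cores


# The example above: a speedup of 3 on 4 cores.
print(f"{efficiency(3.0, 4):.0%}")  # prints 75%
```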

Efficiency curves look like this:

[Figure: efficiency (y-axis) versus cores (x-axis)]

Page 5: Predicting Parallel Performance


Speedup Example

Painting a picket fence
– 30 minutes of preparation (serial)
– One minute to paint a single picket
– 30 minutes of cleanup (serial)

Thus, 300 pickets take 360 minutes (serial time)

Speedup and Efficiency

Page 6: Predicting Parallel Performance


Computing Speedup

Number of painters   Time (minutes)         Speedup
1                    30 + 300 + 30 = 360    1.0X
2                    30 + 150 + 30 = 210    1.7X
10                   30 + 30 + 30 = 90      4.0X
100                  30 + 3 + 30 = 63       5.7X
Infinite             30 + 0 + 30 = 60       6.0X
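The table can be reproduced with a few lines of code. This sketch assumes, as the table does, that the pickets divide perfectly among the painters; the constant and function names are illustrative.

```python
import math

PREP_MIN = 30        # serial preparation
CLEANUP_MIN = 30     # serial cleanup
PICKETS = 300
MIN_PER_PICKET = 1


def fence_time(painters: float) -> float:
    """Total minutes, assuming painting divides perfectly among painters."""
    if math.isinf(painters):
        painting = 0.0
    else:
        painting = PICKETS * MIN_PER_PICKET / painters
    return PREP_MIN + painting + CLEANUP_MIN


serial = fence_time(1)
for painters in (1, 2, 10, 100, math.inf):
    t = fence_time(painters)
    print(f"{painters:>8} painters: {t:6.1f} min, speedup {serial / t:.1f}X")
```

The 60 minutes of preparation and cleanup never shrink, so the speedup cannot exceed 360 / 60 = 6 no matter how many painters are added.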


Page 7: Predicting Parallel Performance

Efficiency Example

Number of painters   Time (minutes)         Speedup   Efficiency
1                    360                    1.0X      100%
2                    30 + 150 + 30 = 210    1.7X      86%
10                   30 + 30 + 30 = 90      4.0X      40%
100                  30 + 3 + 30 = 63       5.7X      5.7%
Infinite             30 + 0 + 30 = 60       6.0X      very low (approaches 0%)
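The efficiency column can be checked the same way. A minimal sketch for the picket fence numbers, with the helper name `fence_speedup` chosen for illustration:

```python
def fence_speedup(painters: int) -> float:
    """Speedup for the picket fence: 30 min prep, 300 pickets, 30 min cleanup."""
    serial = 30 + 300 + 30
    parallel = 30 + 300 / painters + 30
    return serial / parallel


for painters in (1, 2, 10, 100):
    s = fence_speedup(painters)
    print(f"{painters:>3} painters: speedup {s:.1f}X, efficiency {s / painters:.1%}")
```

Computed from the unrounded speedup, the efficiency with two painters is about 85.7%. Each added painter raises the speedup a little but lowers the fraction of time every painter spends doing useful work.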


Page 8: Predicting Parallel Performance


Idea Behind Amdahl’s Law

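[Figure: execution time (y-axis) versus cores (x-axis). Each bar consists of a fixed portion s and a parallel portion that shrinks with the number of cores: 1-s, (1-s)/2, (1-s)/3, (1-s)/4, (1-s)/5]

s: Portion of computation that will be performed sequentially
1-s: Portion of computation that will be executed in parallel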

Page 9: Predicting Parallel Performance


Derivation of Amdahl’s Law

Speedup is ratio of execution time on 1 core to execution time on p cores

Execution time on 1 core is s + (1-s)
Execution time on p cores is at least s + (1-s)/p

$$\text{Speedup} \le \frac{s + (1-s)}{s + (1-s)/p} = \frac{1}{s + (1-s)/p}$$
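The bound derived above is easy to evaluate numerically. This is a sketch under the slide's assumptions (no overhead, perfect division of the parallel work); the function names are illustrative.

```python
import math


def amdahl_speedup(s: float, p: int) -> float:
    """Upper bound on speedup for sequential fraction s on p cores."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (s + (1.0 - s) / p)


def amdahl_limit(s: float) -> float:
    """Limit of the bound as the number of cores grows without bound."""
    return math.inf if s == 0 else 1.0 / s


# Picket fence: 60 of the 360 serial minutes are sequential, so s = 1/6.
print(f"{amdahl_speedup(1 / 6, 10):.1f}X")  # prints 4.0X
print(f"{amdahl_limit(1 / 6):.1f}X")        # prints 6.0X
```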

Page 10: Predicting Parallel Performance


Amdahl’s Law Is Too Optimistic

Amdahl’s Law ignores parallel processing overhead
Examples of this overhead include time spent creating and terminating threads
Parallel processing overhead is usually an increasing function of the number of cores (threads)
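To see why ignoring overhead makes the prediction optimistic, the sketch below adds an overhead term that grows linearly with the number of cores. The linear form and the values s = 0.1 and 0.009 per extra core are assumptions made purely for illustration; the slides do not specify a model for the overhead.

```python
def speedup_with_overhead(s: float, p: int, overhead_per_core: float) -> float:
    """Speedup with a hypothetical overhead of overhead_per_core * (p - 1).

    Times are normalized so that the one-core execution time is 1.
    """
    parallel_time = s + (1.0 - s) / p + overhead_per_core * (p - 1)
    return 1.0 / parallel_time


s, cost = 0.1, 0.009  # illustrative values, not measurements
for p in (1, 2, 4, 8, 16, 32):
    ideal = speedup_with_overhead(s, p, 0.0)   # Amdahl's Law, no overhead
    real = speedup_with_overhead(s, p, cost)
    print(f"p={p:>2}: Amdahl bound {ideal:5.2f}X, with overhead {real:5.2f}X")

best = max(range(1, 33), key=lambda p: speedup_with_overhead(s, p, cost))
print("speedup peaks at p =", best)
```

Under this assumed model the speedup stops improving once the added overhead outweighs the shrinking parallel portion, so adding cores past that point makes the program slower.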


Page 11: Predicting Parallel Performance


Graph with Parallel Overhead Added

[Figure: execution time (y-axis) versus cores (x-axis), with parallel overhead added. Parallel overhead increases with the number of cores]

Page 12: Predicting Parallel Performance


Other Optimistic Assumptions

Amdahl’s Law assumes that the computation divides evenly among the cores

In reality, the work often does not divide evenly among the cores

Core waiting time is another form of overhead
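A small calculation makes the waiting time concrete. This sketch assumes equal-sized, indivisible tasks handed out as evenly as possible; the function name `imbalance` and the 10-task example are illustrative.

```python
import math


def imbalance(tasks: int, cores: int, task_time: float = 1.0):
    """Return (ideal_time, actual_time, total_waiting_time).

    ideal_time assumes the work divides perfectly; actual_time is set by
    the core that receives the most tasks; the rest is waiting time.
    """
    ideal = tasks * task_time / cores
    actual = math.ceil(tasks / cores) * task_time
    waiting = cores * actual - tasks * task_time
    return ideal, actual, waiting


# 10 one-second tasks on 4 cores: two cores run 3 tasks, two cores run 2.
print(imbalance(10, 4))  # prints (2.5, 3.0, 2.0)
```

The program finishes after 3 seconds rather than the ideal 2.5, and the cores spend a combined 2 seconds waiting.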

[Figure: per-core timeline from task started to task completed, showing working time and waiting time]

Page 13: Predicting Parallel Performance


Graph with Workload Imbalance Added

[Figure: execution time (y-axis) versus cores (x-axis), with time lost due to workload imbalance added]

Page 14: Predicting Parallel Performance


Illustration of the Amdahl Effect

[Figure: speedup (y-axis) versus cores (x-axis) for n = 1,000, n = 10,000, and n = 100,000, compared with linear speedup]

Page 15: Predicting Parallel Performance


Using Amdahl’s Law

Program executes in 5 seconds
Profile reveals 80% of time spent in function alpha, which we can execute in parallel
What would be the maximum speedup on 2 cores?

$$\text{Speedup} \le \frac{1}{0.2 + (1 - 0.2)/2} = \frac{1}{0.6} \approx 1.67$$

New execution time ≥ 5 sec / 1.67 = 3 seconds
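The same calculation in code, as a sketch; `max_speedup` is an illustrative helper name, while the 5 seconds and 80% come from the example above.

```python
def max_speedup(serial_fraction: float, cores: int) -> float:
    """Amdahl's Law bound: 1 / (s + (1 - s) / p)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


total_seconds = 5.0
parallel_fraction = 0.80                 # time spent in function alpha
serial_fraction = 1.0 - parallel_fraction

bound = max_speedup(serial_fraction, 2)
print(f"maximum speedup: {bound:.2f}X")                      # prints 1.67X
print(f"minimum execution time: {total_seconds / bound:.1f} s")  # prints 3.0 s
```

Put another way, the 1 second of serial work is untouched and the 4 seconds in alpha are halved, giving 1 + 2 = 3 seconds at best.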

Page 16: Predicting Parallel Performance


Superlinear Speedup

According to our general speedup formula, the maximum speedup a program can achieve on p cores is p

Superlinear speedup is the situation where speedup is greater than the number of cores used

It means the computational rate of the cores is faster when the parallel program is executing

Superlinear speedup usually occurs because the cache hit rate of the parallel program is higher

Page 17: Predicting Parallel Performance


References

Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill (2004).

Page 18: Predicting Parallel Performance
Page 19: Predicting Parallel Performance


More General Speedup Formula

ψ(n,p): Speedup for problem of size n on p cores
σ(n): Time spent in sequential portion of code for problem of size n
φ(n): Time spent in parallelizable portion of code for problem of size n
κ(n,p): Parallel overhead

$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$$
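The general bound can be sketched as a function of the three times defined above: the sequential time, the parallelizable time, and the parallel overhead. The function and parameter names are illustrative, and the 10-minute overhead in the second call is a made-up value.

```python
def speedup_bound(sequential_time: float, parallel_time: float,
                  overhead: float, cores: int) -> float:
    """(seq + par) / (seq + par / cores + overhead)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    one_core = sequential_time + parallel_time
    many_cores = sequential_time + parallel_time / cores + overhead
    return one_core / many_cores


# Picket fence with 10 painters: 60 serial minutes, 300 parallelizable minutes.
print(f"{speedup_bound(60, 300, 0, 10):.1f}X")   # prints 4.0X (no overhead)
print(f"{speedup_bound(60, 300, 10, 10):.1f}X")  # prints 3.6X (10 min of overhead)
```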

Page 20: Predicting Parallel Performance


Amdahl’s Law: Maximum Speedup

$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$$

κ(n,p): This term is set to 0
φ(n)/p: Assumes parallel work divides perfectly among available cores

Page 21: Predicting Parallel Performance


The Amdahl Effect

$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$$

As n → ∞, the φ(n) terms dominate

Speedup is an increasing function of problem size
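The effect can be illustrated with made-up cost functions in which the parallelizable work grows faster than the sequential work and the overhead. The three cost functions below (n, n²/100, and n·log2 p) are assumptions chosen only for illustration; they do not describe any particular program.

```python
import math


def sequential_time(n: int) -> float:
    return float(n)                 # hypothetical: grows linearly with n


def parallel_time(n: int) -> float:
    return n * n / 100.0            # hypothetical: grows quadratically with n


def overhead(n: int, p: int) -> float:
    return n * math.log2(p)         # hypothetical: grows with n and with log p


def speedup_bound(n: int, p: int) -> float:
    one_core = sequential_time(n) + parallel_time(n)
    many_cores = sequential_time(n) + parallel_time(n) / p + overhead(n, p)
    return one_core / many_cores


for n in (1_000, 10_000, 100_000):
    print(f"n = {n:>7,}: speedup bound on 16 cores = {speedup_bound(n, 16):.2f}X")
```

With these assumed functions, the bound on 16 cores rises toward 16 as n grows, because the parallelizable term increasingly outweighs the sequential time and the overhead.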