Performance Evaluation of Parallel Processing


Page 1: Performance Evaluation of Parallel Processing. Why Performance?

Performance Evaluation of Parallel Processing

Page 2: Performance Evaluation of Parallel Processing. Why Performance?

Why Performance?

Page 3: Performance Evaluation of Parallel Processing. Why Performance?

Models of Speedup

Speedup

Scaled Speedup
◦ Parallel processing gain over sequential processing, where problem size scales up with computing power (having sufficient workload/parallelism)

$S_p = \dfrac{\text{Uniprocessor Execution Time}}{\text{Parallel Execution Time}}$


Page 4: Performance Evaluation of Parallel Processing. Why Performance?

Speedup

$T_s$ = time for the best serial algorithm

$T_p$ = time for the parallel algorithm using $p$ processors

$S_p = \dfrac{T_s}{T_p}$

Page 5: Performance Evaluation of Parallel Processing. Why Performance?

Example

(a) One processor: 100 time units
(b) Four processors: 25 time units each
(c) Four processors: 35 time units each

(b): $S_p = \dfrac{100}{25} = 4.0$ (perfect parallelization)

(c): $S_p = \dfrac{100}{35} \approx 2.85$ (perfect load balancing, but synch cost is 10)

Page 6: Performance Evaluation of Parallel Processing. Why Performance?

Example (cont.)

(d) Four processors: 30, 20, 40, and 10 time units
(e) Four processors: 50 time units each

(d): $S_p = \dfrac{100}{40} = 2.5$ (no synch, but load imbalance)

(e): $S_p = \dfrac{100}{50} = 2.0$ (load imbalance and synch cost)
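As a sketch of the arithmetic behind cases (b) through (e), assuming the parallel time is simply the slowest processor's time plus any synchronization cost (the function speedup and its arguments are illustrative, not from the slides):

def speedup(serial_time, per_processor_times, synch_cost=0.0):
    # Parallel time is dominated by the slowest processor,
    # plus any synchronization cost paid on top of it.
    parallel_time = max(per_processor_times) + synch_cost
    return serial_time / parallel_time

print(speedup(100, [25, 25, 25, 25]))                  # (b) perfect parallelization: 4.0
print(speedup(100, [25, 25, 25, 25], synch_cost=10))   # (c) synch cost of 10: ~2.85
print(speedup(100, [30, 20, 40, 10]))                  # (d) load imbalance, no synch: 2.5
print(speedup(100, [30, 20, 40, 10], synch_cost=10))   # (e) imbalance plus synch: 2.0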

Page 7: Performance Evaluation of Parallel Processing. Why Performance?

What Is “Good” Speedup?

Linear speedup: $S_p = p$

Superlinear speedup: $S_p > p$

Sub-linear speedup: $S_p < p$

Page 8: Performance Evaluation of Parallel Processing. Why Performance?

Speedup

[Figure: speedup plotted against the number of processors $p$]

Page 9: Performance Evaluation of Parallel Processing. Why Performance?

Ideal Speedup in Multiprocessor System

• Linear speedup: the execution time of a program on an n-processor system would be 1/n-th of its execution time on a one-processor system

Page 10: Performance Evaluation of Parallel Processing. Why Performance?

Limitations

• Interprocessor communication
• Synchronization
• Load Balancing

Page 11: Performance Evaluation of Parallel Processing. Why Performance?

Limitations of Interprocessor Communication

Whenever one processor generates (computes) a value that is needed by the fraction of the program running on another processor, that value must be communicated to the processors that need it, which takes time.

On a uniprocessor system, the entire program runs on one processor, so there is no time lost to interprocessor communication.

Page 12: Performance Evaluation of Parallel Processing. Why Performance?

Limitations of Synchronization

It is often necessary to synchronize the processors to ensure that they have all completed some phase of the program before any processor begins working on the next phase of the program.

Page 13: Performance Evaluation of Parallel Processing. Why Performance?

Load Balancing

In many parallel applications, it is difficult to divide the program evenly across the processors.

• When it is not possible for each processor to work for the same amount of time, some of the processors complete their tasks early and are then idle, waiting for the others to finish.

Page 14: Performance Evaluation of Parallel Processing. Why Performance?

Superlinear speedups

Achieving a speedup of greater than n on an n-processor system.

• This requires each of the processors in an n-processor multiprocessor to complete its fraction of the program in less than 1/n-th of the program's execution time on a uniprocessor.

Page 15: Performance Evaluation of Parallel Processing. Why Performance?

Factors That Limit Speedup

● Software Overhead
Even with a completely equivalent algorithm, software overhead arises in the concurrent implementation.

● Load Balancing
Speedup is generally limited by the speed of the slowest node, so an important consideration is to ensure that each node performs the same amount of work.

● Communication Overhead
Assuming that communication and calculation cannot be overlapped, any time spent communicating data between processors directly degrades the speedup.

Page 16: Performance Evaluation of Parallel Processing. Why Performance?


Degradations of Parallel Processing

Unbalanced Workload

Communication Delay

Overhead Increases with the Ensemble Size

Page 17: Performance Evaluation of Parallel Processing. Why Performance?

Degradations of Distributed Computing

Unbalanced Computing Power and Workload

Shared Computing and Communication Resource

Uncertainty, Heterogeneity, and Overhead Increases with the Ensemble Size

Page 18: Performance Evaluation of Parallel Processing. Why Performance?

Causes of Superlinear Speedup

• Cache size increased
• Overhead reduced
• Latency hidden
• Randomized algorithms
• Mathematical inefficiency of the serial algorithm
• Higher memory access cost in sequential processing

X.H. Sun and J. Zhu, "Performance Considerations of Shared Virtual Memory Machines," IEEE Trans. on Parallel and Distributed Systems, Nov. 1995.

Page 19: Performance Evaluation of Parallel Processing. Why Performance?

Efficiency

● Speedup does not measure how efficiently the processors are being used.

● Is it worth using 100 processors to get a speedup of 2?

● Efficiency is defined as the ratio of the speedup and the number of processors required to achieve it.

● Efficiency is given by E(P, N) = S(P, N) / P.

Page 20: Performance Evaluation of Parallel Processing. Why Performance?

If the best known serial algorithm takes 8 seconds (i.e., Ts = 8), while a parallel algorithm takes 2 seconds using 5 processors, then:
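Completing the arithmetic (not shown on the slide): the speedup is

$S = \dfrac{T_s}{T_p} = \dfrac{8}{2} = 4,$

and the efficiency is $E = \dfrac{S}{P} = \dfrac{4}{5} = 0.8$.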

 

Page 21: Performance Evaluation of Parallel Processing. Why Performance?

Say we have a program containing 100 operations, each of which takes 1 time unit. If 80 operations can be done in parallel (i.e., P = 80) and 20 operations must be done sequentially (i.e., S = 20), then, using 80 processors, find the speedup.
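One way to work this out (a completion not given on the slide): on 80 processors the 80 parallel operations take 80/80 = 1 time unit, and the 20 sequential operations take 20 time units, so

$S_p = \dfrac{100}{20 + 80/80} = \dfrac{100}{21} \approx 4.76.$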

Page 22: Performance Evaluation of Parallel Processing. Why Performance?

Speedup Metrics

Three performance models based on three speedup metrics are commonly used:

• Amdahl's law -- fixed problem size (fixed-size speedup)
• Gustafson's law -- fixed-time speedup
• Sun-Ni's law -- memory-bounded speedup

Three approaches to scalability analysis are based on:

• maintaining a constant efficiency,
• a constant speed, and
• a constant utilization.

Page 23: Performance Evaluation of Parallel Processing. Why Performance?

Amdahl's Law

The performance improvement that can be gained by a parallel implementation is limited by the fraction of time parallelism can actually be used in an application.

Let α = the fraction of the program (algorithm) that is serial and cannot be parallelized. For instance:
◦ Loop initialization
◦ Reading/writing to a single disk
◦ Procedure call overhead

Parallel run time is given by


$T_p = \left( \alpha + \dfrac{1 - \alpha}{p} \right) T_s$

Page 24: Performance Evaluation of Parallel Processing. Why Performance?

Amdahl’s Law

Amdahl's law gives a limit on the speedup in terms of the serial fraction α:


$T_p = \alpha T_s + \dfrac{(1 - \alpha)\, T_s}{p}$

$S_p = \dfrac{T_s}{T_p} = \dfrac{T_s}{\alpha T_s + (1 - \alpha)\, T_s / p} = \dfrac{1}{\alpha + \dfrac{1 - \alpha}{p}}$
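A consequence worth making explicit (standard, though not spelled out on this slide): as $p \to \infty$ the speedup is bounded by

$S_p \to \dfrac{1}{\alpha},$

so, for example, a serial fraction of α = 0.1 caps the speedup at 10 no matter how many processors are used.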

Page 25: Performance Evaluation of Parallel Processing. Why Performance?

• Fixed-Size Speedup (Amdahl's Law, 1967)


[Figure: under fixed-size speedup, the amount of work ($W_1$ sequential, $W_p$ parallel) stays the same as the number of processors p grows from 1 to 5, while the elapsed time ($T_1$ sequential, $T_p$ parallel) shrinks.]

Page 26: Performance Evaluation of Parallel Processing. Why Performance?

Consider the effect of the serial fraction F on the speedup produced for N = 10 and N = 1024.
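A small sketch (not part of the slides) of how this effect could be tabulated, using Amdahl's formula from the previous pages; the function name amdahl_speedup and the chosen values of F are illustrative assumptions.

def amdahl_speedup(serial_fraction, processors):
    # Amdahl's law: S = 1 / (F + (1 - F) / N)
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

for F in (0.0, 0.01, 0.05, 0.10, 0.25):
    print(f"F = {F:<4}  S(N=10) = {amdahl_speedup(F, 10):6.2f}  "
          f"S(N=1024) = {amdahl_speedup(F, 1024):7.2f}")

Even a 1% serial fraction limits the 1024-processor speedup to roughly 91, while the 10-processor speedup barely drops below 10.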

Page 27: Performance Evaluation of Parallel Processing. Why Performance?
Page 28: Performance Evaluation of Parallel Processing. Why Performance?

Comments on Amdahl's Law

The Amdahl fraction α(n, p) in practice depends on the problem size n and the number of processors p.

An effective parallel algorithm has α(n, p) → 0 as n grows (see the formulas below).

For such a case, even if one fixes p, we can get linear speedup by choosing a suitably large problem size: scalable speedup.

Practically, the problem size that we can run for a particular problem is limited by the time and memory of the parallel computer.


$\alpha(n, p) \to 0 \ \text{as} \ n \to \infty$

$S_p = \dfrac{T_s}{T_p} = \dfrac{1}{\alpha(n, p) + \dfrac{1 - \alpha(n, p)}{p}} \to p \ \text{as} \ n \to \infty$

Page 29: Performance Evaluation of Parallel Processing. Why Performance?

Gustafson's Law

Gustafson defined two "more relevant" notions of speedup:
» Scaled speedup
» Fixed-time speedup
and renamed Amdahl's version as "fixed-size" speedup.

Page 30: Performance Evaluation of Parallel Processing. Why Performance?

Gustafson’s Law

Page 31: Performance Evaluation of Parallel Processing. Why Performance?
Page 32: Performance Evaluation of Parallel Processing. Why Performance?

Gustafson's Law: Scaling for Higher Accuracy?

Under Amdahl's law, the problem size (workload) is fixed and cannot scale to match the available computing power as the machine size increases. Thus, Amdahl's law leads to a diminishing return when a larger system is employed to solve a small problem.

The sequential bottleneck in Amdahl's law can be alleviated by removing the restriction of a fixed problem size. Gustafson proposed a fixed-time concept that achieves an improved speedup by scaling the problem size with the increase in machine size.
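For reference, the standard form of Gustafson's fixed-time (scaled) speedup, not reproduced on these slides: if α is the serial fraction of the execution time measured on the parallel system, then

$S_p = \alpha + (1 - \alpha)\, p = p - \alpha\,(p - 1),$

which grows linearly with p rather than saturating at $1/\alpha$ as under Amdahl's fixed-size assumption.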