Chapter 4: Assessing and Understanding Performance
Bo Cheng
Which One Is Good?
Airplane           Passengers   Range (mi)   Speed (mph)
Boeing 737-100     101          630          598
Boeing 747         470          4150         610
BAC/Sud Concorde   132          4000         1350
Douglas DC-8-50    146          8720         544
The answer depends on the measure of performance:
• Cruising speed
• Longest range
• Largest capacity
Measuring Performance
• Elapsed time (wall-clock time or response time)
  – Total time to complete a task, including disk and memory accesses, I/O, etc.
  – A useful number, but often not good for comparison purposes
• CPU (execution) time
  – Doesn't count I/O or time spent running other programs
  – Can be broken up into system CPU time and user CPU time: CPU time = user CPU time + system CPU time
• Our focus: user CPU time
  – Time spent executing the lines of code that are "in" our program
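The sketch below shows how elapsed time and CPU time can come apart. It is written in Python and assumes that `os.times()` reports user and system CPU time for the current process. The helpers `busy_work` and `measure` are hypothetical names made up for this illustration. The sleep adds elapsed (wall-clock) time but almost no CPU time, so the two measurements diverge.

```python
import os
import time


def busy_work(n: int) -> int:
    """Pure computation: time spent here counts as user CPU time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def measure(n: int, wait_s: float = 0.2) -> tuple[float, float, float]:
    """Hypothetical helper: return (elapsed, user CPU, system CPU) in seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    busy_work(n)
    time.sleep(wait_s)  # waiting adds elapsed time but almost no CPU time
    cpu_end = os.times()
    elapsed = time.perf_counter() - wall_start
    user = cpu_end.user - cpu_start.user
    system = cpu_end.system - cpu_start.system
    return elapsed, user, system


if __name__ == "__main__":
    elapsed, user, system = measure(1_000_000)
    print(f"elapsed (wall-clock): {elapsed:.3f} s")
    print(f"user CPU: {user:.3f} s, system CPU: {system:.3f} s")
    print(f"CPU time = user + system = {user + system:.3f} s")
```

The exact numbers depend on the machine and on the timer resolution. The elapsed time, however, should always be larger than the CPU time, because the sleep is counted only in the elapsed time.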
CPU Performance Metrics
Response time: the time between the start and the completion of a task (in time units)
Throughput: the total amount of work done in a given time (in number of tasks per unit of time)
Performance
Problem: Machine A runs a program in 10 sec. Machine B runs the same program in 15 sec.
How much faster is A than B ?
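$$\text{Performance}_x = \frac{1}{\text{Execution time}_x}, \qquad \frac{\text{Performance}_x}{\text{Performance}_y} = \frac{\text{Execution time}_y}{\text{Execution time}_x} = n$$
$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{15}{10} = 1.5$$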
A is 1.5 times faster than B
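The "n times faster" definition is easy to check numerically. The sketch below is a minimal Python version, and the function names are hypothetical. It relies on the fact that the ratio of performances equals the inverse ratio of execution times.

```python
def performance(execution_time_s: float) -> float:
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_faster(time_x: float, time_y: float) -> float:
    """n such that X is n times faster than Y.

    Perf_X / Perf_Y = (1 / Time_X) / (1 / Time_Y) = Time_Y / Time_X
    """
    return time_y / time_x


if __name__ == "__main__":
    # Machine A: 10 s, machine B: 15 s (the values from the problem above)
    print(times_faster(10.0, 15.0))  # prints 1.5
```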
Clock Rate Measurement
Name          Example        Measurement (sec)
Millisecond   1 msec (ms)    1E-03
Microsecond   1 usec (us)    1E-06
Nanosecond    1 nsec (ns)    1E-09
Picosecond    1 psec (ps)    1E-12
Femtosecond   1 fsec (fs)    1E-15
10 nsec clock cycle => 100 MHz clock rate
1 nsec clock cycle => 1 GHz clock rate
500 psec clock cycle => 2 GHz clock rate
200 psec clock cycle => 5 GHz clock rate
• Clock cycle: the time for one clock period, running at a constant rate
• Clock rate is given in Hz (= 1/sec)
• clock_cycle_time = 1/clock_rate (in sec)
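The relation clock_cycle_time = 1/clock_rate works in both directions. The sketch below is in Python, and the function names are hypothetical. It reproduces the cycle-time/clock-rate pairs listed above, with times in seconds and rates in Hz.

```python
def rate_from_period(period_s: float) -> float:
    """clock_rate = 1 / clock_cycle_time (Hz from seconds)."""
    return 1.0 / period_s


def period_from_rate(rate_hz: float) -> float:
    """clock_cycle_time = 1 / clock_rate (seconds from Hz)."""
    return 1.0 / rate_hz


if __name__ == "__main__":
    # The cycle times listed above, converted to seconds
    for label, period in [("10 nsec", 10e-9), ("1 nsec", 1e-9),
                          ("500 psec", 500e-12), ("200 psec", 200e-12)]:
        print(f"{label} cycle -> {rate_from_period(period) / 1e6:,.0f} MHz")
```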
MHz
One MHz represents one million cycles per second.
The speed of microprocessors, called the clock speed, is measured in megahertz.
– For example, a microprocessor that runs at 200 MHz executes 200 million cycles per second.
One GHz represents 1 billion cycles per second.
http://www.webopedia.com/TERM/M/MHz.html
CPU Time or CPU Execution Time
The actual time the CPU spends computing for a specific task
This time covers the time the CPU spends computing the given program, including operating system routines executed on the program's behalf. It does not include time spent waiting for I/O or running other programs.
Performance of processor/memory = 1 / CPU_time
CPU Execution Time Formula
E = CPU Execution time for a program
N = Number of CPU clock cycles for a program
T = clock cycle Time
R = clock Rate
$$E = N \times T = \frac{N}{R}$$
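The two forms of the formula give the same answer, because T = 1/R. The Python sketch below checks this. The function names are hypothetical, and so is the program in the example (6 billion clock cycles on a 2 GHz clock).

```python
def cpu_time_from_period(cycles: float, period_s: float) -> float:
    """E = N * T."""
    return cycles * period_s


def cpu_time_from_rate(cycles: float, rate_hz: float) -> float:
    """E = N / R, equivalent to N * T because T = 1 / R."""
    return cycles / rate_hz


if __name__ == "__main__":
    # Hypothetical program: 6e9 clock cycles on a 2 GHz clock (T = 0.5 ns)
    n, r = 6e9, 2e9
    print(cpu_time_from_rate(n, r))        # prints 3.0
    print(cpu_time_from_period(n, 1 / r))  # same value, up to floating-point rounding
```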
Example
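Computer A has a 4 GHz clock and runs a job in 10 seconds. Computer B, with a clock rate of X GHz, runs the same job in 6 seconds but needs 1.2 times as many clock cycles as A. What clock rate R must B have?
$$10\ \text{s} = \frac{N}{4 \times 10^9\ \text{Hz}} \;\Rightarrow\; N = 40 \times 10^9 \text{ cycles}$$
$$6\ \text{s} = \frac{1.2 \times N}{R} \;\Rightarrow\; R = \frac{1.2 \times 40 \times 10^9}{6\ \text{s}} = 8 \times 10^9\ \text{Hz}$$
R = 8 GHz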
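The two steps of the example can be written as a small solver. This Python sketch is a rough illustration, and `required_clock_rate` is a hypothetical name. The first step recovers N = E × R from computer A. The second step applies R = N/E to computer B, after scaling the cycle count by the given factor.

```python
def required_clock_rate(time_a_s: float, rate_a_hz: float,
                        cycle_factor: float, time_b_s: float) -> float:
    """Clock rate B needs in order to finish in time_b_s (hypothetical helper)."""
    cycles_a = time_a_s * rate_a_hz     # N = E * R on computer A
    cycles_b = cycle_factor * cycles_a  # B needs cycle_factor times as many cycles
    return cycles_b / time_b_s          # R = N / E on computer B


if __name__ == "__main__":
    rate_b = required_clock_rate(10.0, 4e9, 1.2, 6.0)
    print(f"Computer B needs about {rate_b / 1e9:.1f} GHz")
```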
Clock cycles Per Instruction (CPI)
$$N = I \times C$$
N = Number of CPU clock cycles for a program
I = Total instructions for a program
C = CPI
• The average number of clock cycles per instruction for a program or program fragment
The Big Picture
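$$E = N \times T = \frac{N}{R} = I \times C \times T = \frac{I \times C}{R}$$
$$\text{Time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$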
• Instruction count depends on the architecture, but not on the exact implementation
• Average CPI depends on design details and on the mix of types of instructions executed in an application
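The Big Picture equation amounts to three multiplications. The Python sketch below uses the hypothetical name `cpu_time`. Its example program, with 2 billion instructions, an average CPI of 1.5 and a 3 GHz clock, is also made up for illustration.

```python
def cpu_time(instructions: float, cpi: float, clock_rate_hz: float) -> float:
    """E = I * C * T = I * C / R."""
    clock_cycles = instructions * cpi   # N = I * C
    return clock_cycles / clock_rate_hz  # E = N / R


if __name__ == "__main__":
    # Hypothetical program: 2e9 instructions, average CPI 1.5, 3 GHz clock
    print(cpu_time(2e9, 1.5, 3e9))  # prints 1.0 (seconds)
```

Each factor can be improved on its own: a better compiler lowers I, a better microarchitecture lowers C, and a faster clock raises R. As the bullets above note, these factors are not independent in practice.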
Understanding Program Performance
                       Instruction Count   CPI        Clock Rate
Algorithm              X                   Possibly
Programming Language   X                   X
Compiler               X                   X
ISA                    X                   X          X
Using Performance Equation
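              Clock Cycle Time   CPI
Computer A    250 ps             2.0
Computer B    500 ps             1.2
Both computers run the same program, so both execute the same number of instructions, I. Which computer is faster for this program, and by how much?
$$\text{CPU time}_A = I \times 2.0 \times 250\ \text{ps} = 500 \times I\ \text{ps}$$
$$\text{CPU time}_B = I \times 1.2 \times 500\ \text{ps} = 600 \times I\ \text{ps}$$
$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{CPU time}_B}{\text{CPU time}_A} = \frac{600 \times I}{500 \times I} = 1.2$$
Computer A is 1.2 times faster than computer B for this program.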
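Since the instruction count I is the same on both machines, it cancels out, and only CPI × cycle time matters. The Python sketch below encodes that comparison. The helper name is hypothetical.

```python
def time_per_instruction_ps(cpi: float, cycle_time_ps: float) -> float:
    """Average CPU time per instruction = CPI * T (the common count I cancels)."""
    return cpi * cycle_time_ps


if __name__ == "__main__":
    time_a = time_per_instruction_ps(2.0, 250.0)  # 500 ps per instruction
    time_b = time_per_instruction_ps(1.2, 500.0)  # 600 ps per instruction
    print(f"A is {time_b / time_a:.1f} times faster than B")  # A is 1.2 times faster than B
```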
Computing CPI
• Done by looking at the different types of instructions and using their individual cycle counts
$$\text{Clock cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times C_i)$$
C_i: the number of instructions of class i executed
CPI_i: the average number of cycles per instruction for that instruction class
n: the number of instruction classes
Example
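Instruction class                  A   B   C
CPI for this instruction class     1   2   3

                 Instruction counts for each class
Code sequence    A   B   C
1                2   1   2
2                4   1   1

$$CC_1 = (2 \times 1) + (1 \times 2) + (2 \times 3) = 10, \qquad \text{CPI}_1 = \frac{10}{5} = 2$$
$$CC_2 = (4 \times 1) + (1 \times 2) + (1 \times 3) = 9, \qquad \text{CPI}_2 = \frac{9}{6} = 1.5$$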
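The summation formula applied to the two code sequences can be written in a few lines of Python. The sketch below is minimal, and its names (`CLASS_CPI`, `clock_cycles`, `average_cpi`) are hypothetical. Notice that sequence 2 executes more instructions (6 vs. 5), yet it needs fewer clock cycles (9 vs. 10).

```python
CLASS_CPI = {"A": 1, "B": 2, "C": 3}  # CPI_i for each instruction class


def clock_cycles(counts: dict[str, int], class_cpi: dict[str, int]) -> int:
    """Clock cycles = sum over classes of CPI_i * C_i."""
    return sum(class_cpi[cls] * n for cls, n in counts.items())


def average_cpi(counts: dict[str, int], class_cpi: dict[str, int]) -> float:
    """Average CPI = total clock cycles / total instruction count."""
    return clock_cycles(counts, class_cpi) / sum(counts.values())


if __name__ == "__main__":
    sequences = {
        1: {"A": 2, "B": 1, "C": 2},
        2: {"A": 4, "B": 1, "C": 1},
    }
    for seq, counts in sequences.items():
        print(f"sequence {seq}: {clock_cycles(counts, CLASS_CPI)} cycles, "
              f"CPI = {average_cpi(counts, CLASS_CPI)}")
```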
Workload
A set of programs used for evaluating a computer or a system
Benchmarks: programs specifically chosen to measure performance.
SPEC 2000 benchmarks (12 integer, 14 floating-point programs).
Performance results given by benchmarks may be misleading if the system (or its compiler) is optimized specifically for the benchmarks.
Benchmark
• Programs specifically chosen to measure performance
• Best determined by running a real application
  – Use programs typical of the expected workload
  – e.g., compilers/editors, scientific applications, graphics...
• Small benchmarks
  – Nice for architects and designers
• SPEC (System Performance Evaluation Cooperative)
  – Companies have agreed on a set of real programs and inputs
Simplest Approach
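                  Computer A   Computer B
Program 1 (sec)   1            10
Program 2 (sec)   1000         100
Total (sec)       1001         110

$$\frac{\text{Performance}_B}{\text{Performance}_A} = \frac{\text{Execution Time}_A}{\text{Execution Time}_B} = \frac{1001}{110} = 9.1$$
By total execution time, B is 9.1 times faster than A.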
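The Python sketch below implements the "simplest approach": add up each computer's execution times, then compare the totals. The helper name is hypothetical. The program times are the ones in the table.

```python
def relative_performance(times_x: list[float], times_y: list[float]) -> float:
    """Perf_Y / Perf_X using total execution time: sum(times_x) / sum(times_y)."""
    return sum(times_x) / sum(times_y)


if __name__ == "__main__":
    computer_a = [1, 1000]  # program 1, program 2 (sec)
    computer_b = [10, 100]
    ratio = relative_performance(computer_a, computer_b)
    print(f"B is {ratio:.1f} times faster than A")  # B is 9.1 times faster than A
```

With this approach, program 2 dominates the result because it runs much longer than program 1. Computer A's tenfold advantage on program 1 has almost no effect on the totals.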
Evaluating Performance
Different classes and applications of computer require different types of benchmarks
• Desktop
  – CPU performance: the SPEC CPU benchmark measures CPU performance and response time
  – Benchmarks focusing on a specific task, such as DVD playback or the graphics performance of games
• Server
  – Benchmarks depend on the nature of the intended application
  – Throughput, with requirements on response time to individual events such as database queries and web page requests
  – SPECweb99
• Embedded computing
  – EEMBC
• Reproducibility: list everything another experimenter needs to duplicate the results
SPEC CPU2000 Benchmark
SPEC: CINT2000 and CFP2000
Relative Performance in Three Different Modes
Relative Energy Efficiency Comparison
Amdahl’s Law
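Execution Time After Improvement = (Execution Time Affected / Amount of Improvement) + Execution Time Unaffected
Example: Suppose a program runs in 100 seconds on a machine, with multiply operations responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 5 times faster?
Running 5 times faster means finishing in 100 / 5 = 20 seconds:
$$ET_{\text{after}} = \frac{80\ \text{sec}}{n} + (100 - 80)\ \text{sec} = 20\ \text{sec} \;\Rightarrow\; \frac{80}{n} = 0$$
No finite improvement n achieves this. The 20 seconds not spent on multiplication already use up the whole target time.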
Principle: Make the common case fast
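Amdahl's Law can be applied in both directions. The first function predicts the new execution time for a given improvement. The second solves for the improvement needed to reach a target time, returning `None` when the unaffected part alone is already too slow. This Python sketch uses hypothetical function names. The 25-second target is an extra illustrative case that is not in the example above.

```python
from typing import Optional


def time_after_improvement(affected_s: float, improvement: float,
                           unaffected_s: float) -> float:
    """ET_after = affected / improvement + unaffected."""
    return affected_s / improvement + unaffected_s


def required_improvement(affected_s: float, unaffected_s: float,
                         target_s: float) -> Optional[float]:
    """Solve target = affected / n + unaffected for n; None if impossible."""
    if target_s <= unaffected_s:
        return None  # the unaffected part alone already takes target_s or longer
    return affected_s / (target_s - unaffected_s)


if __name__ == "__main__":
    print(time_after_improvement(80, 4, 20))     # prints 40.0
    print(required_improvement(80, 20, 100 / 5))  # prints None: 5x faster is impossible
    print(required_improvement(80, 20, 25))       # prints 16.0 (a 4x overall speedup)
```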
MIPS (million instructions per second)
$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6}$$
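Example: two compilers generate code for the same program. The calculations below use a 4 GHz clock rate.

Instruction class   CPI
A                   1
B                   2
C                   3

Code from     Instruction counts (in billions)
              A    B    C
Compiler 1    5    1    1
Compiler 2    10   1    1

$$CC_1 = (5 \times 1 + 1 \times 2 + 1 \times 3) \times 10^9 = 10 \times 10^9, \qquad E_1 = \frac{10 \times 10^9}{4 \times 10^9} = 2.5\ \text{sec}$$
$$\text{MIPS}_1 = \frac{(5 + 1 + 1) \times 10^9}{2.5 \times 10^6} = 2800$$
$$CC_2 = (10 \times 1 + 1 \times 2 + 1 \times 3) \times 10^9 = 15 \times 10^9, \qquad E_2 = \frac{15 \times 10^9}{4 \times 10^9} = 3.75\ \text{sec}$$
$$\text{MIPS}_2 = \frac{(10 + 1 + 1) \times 10^9}{3.75 \times 10^6} = 3200$$
Compiler 2's code has the higher MIPS rating, yet compiler 1's code finishes sooner.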
Always trust the execution time metric!
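The compiler comparison can be reproduced in Python. This sketch assumes the 4 GHz clock and the class CPIs from the example. All names are hypothetical. It computes both metrics for each compiler's code and shows that MIPS ranks the two versions in the opposite order from execution time.

```python
CLASS_CPI = {"A": 1, "B": 2, "C": 3}  # cycles per instruction for each class
CLOCK_RATE_HZ = 4e9                   # 4 GHz, as in the example


def execution_time_s(counts_billions: dict[str, int]) -> float:
    """E = (sum of CPI_i * C_i) / R, with instruction counts given in billions."""
    cycles = sum(CLASS_CPI[cls] * n for cls, n in counts_billions.items()) * 1e9
    return cycles / CLOCK_RATE_HZ


def mips(counts_billions: dict[str, int]) -> float:
    """MIPS = instruction count / (execution time * 10^6)."""
    instructions = sum(counts_billions.values()) * 1e9
    return instructions / (execution_time_s(counts_billions) * 1e6)


if __name__ == "__main__":
    compilers = {
        "Compiler 1": {"A": 5, "B": 1, "C": 1},
        "Compiler 2": {"A": 10, "B": 1, "C": 1},
    }
    for name, counts in compilers.items():
        print(f"{name}: {execution_time_s(counts)} s, {mips(counts):.0f} MIPS")
```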
http://www.faculty.uaf.edu/ffdr/EE443/Handouts/Set5_Sp05_3pp.pdf
A Complete Example (I)
http://www.faculty.uaf.edu/ffdr/EE443/Handouts/Set5_Sp05_3pp.pdf
A Complete Example (II)
A Complete Example (III)
Three problems with using MIPS
MIPS specifies the instruction execution rate but does not take into account the capabilities of the instructions.
– We cannot compare computers with different instruction sets using MIPS, since the instruction counts will certainly differ.
MIPS varies between programs on the same computer;
– a computer cannot have a single MIPS rating for all programs.
MIPS can vary inversely with performance.