Integrated Circuits, Performance, Power
Alexander Nelson
February 17, 2020
University of Arkansas - Department of Computer Science and Computer Engineering
Integrated Circuits
Technologies Over Time
Transistors and Integrated Circuits
Transistors – On/Off Switch controlled by electric signal
Integrated Circuit – Many transistors in a single chip
VLSI/ULSI – Containing thousands/millions/billions of transistors
Intel 8080 (1974) – 6,000 Transistors
AMD Ryzen 9 3900X (2019) – 9.89B Transistors
45 years ≈ 1.65 million times more transistors (9.89 × 10^9 ÷ 6,000 ≈ 1,648,333)
Semiconductors
Semiconductor – Solid substance with conductivity between that of an
insulator and a conductor
Silicon – Natural semiconductor
Can be chemically modified to be:
• Conductor
• Insulator
• Areas that can conduct or insulate as a switch
Manufacturing Process
Yield – Proportion of working dies per wafer
Intel Core i7 Wafer
300 mm wafer, 280 chips,
32 nm technology
Each chip is 20.7 x 10.5 mm
Integrated Circuit Cost
Nonlinear relation to area and defect rate
• Wafer cost and area are fixed
• Defect rate determined by manufacturing process
• Die area determined by architecture & circuit design
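To make the nonlinear relation concrete, here is a minimal sketch of one common textbook cost model. The model itself is an assumption here, not taken from this transcript: dies per wafer ≈ wafer area ÷ die area, yield = 1 / (1 + defects per area × die area / 2)², and cost per die = wafer cost ÷ (dies per wafer × yield). The function names and all numbers (wafer cost, defect density, die areas) are hypothetical.

```python
import math

def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Rough estimate: wafer area divided by die area (ignores edge loss)."""
    wafer_area_mm2 = math.pi * (wafer_diameter_mm / 2) ** 2
    return wafer_area_mm2 / die_area_mm2

def die_yield(defects_per_mm2, die_area_mm2):
    """Assumed yield model: 1 / (1 + defects per area * die area / 2)^2."""
    return 1.0 / (1.0 + defects_per_mm2 * die_area_mm2 / 2.0) ** 2

def cost_per_die(wafer_cost, wafer_diameter_mm, die_area_mm2, defects_per_mm2):
    """Wafer cost spread over the dies that actually work."""
    good_dies = (dies_per_wafer(wafer_diameter_mm, die_area_mm2)
                 * die_yield(defects_per_mm2, die_area_mm2))
    return wafer_cost / good_dies

# Hypothetical numbers: same wafer and defect rate, die area doubled.
small = cost_per_die(5000.0, 300.0, 100.0, 0.001)
large = cost_per_die(5000.0, 300.0, 200.0, 0.001)
print(f"doubling die area multiplies cost per die by {large / small:.2f}")
```

Under these assumptions, doubling the die area more than doubles the cost per die, because fewer dies fit on the wafer and a smaller fraction of them work.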
Performance
How do you define CPU performance?
Example
Which airplane has the best performance?
Response Time vs. Throughput
Response Time – How long to finish a task
Throughput – Total work done per unit time
• e.g., tasks or transactions per hour
How are these two metrics affected by:
• Replacing processor with faster version?
• Adding more processors?
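One way to reason about the two questions above is a deliberately idealized model, sketched below. It assumes independent tasks, a queue that always has work waiting, and no communication overhead; the function names and numbers are hypothetical.

```python
def response_time(task_seconds, speedup=1.0):
    """Time to finish one task on one processor that is `speedup` times faster."""
    return task_seconds / speedup

def throughput(task_seconds, processors=1, speedup=1.0):
    """Tasks completed per second across all processors (idealized)."""
    return processors * speedup / task_seconds

base_rt, base_tp = response_time(10.0), throughput(10.0)

# Faster processor: response time drops and throughput rises.
fast_rt, fast_tp = response_time(10.0, speedup=2.0), throughput(10.0, speedup=2.0)

# More processors: throughput rises, but a single task takes just as long.
more_rt, more_tp = response_time(10.0), throughput(10.0, processors=2)

print(base_rt, fast_rt, more_rt)
print(base_tp, fast_tp, more_tp)
```

In this model, adding processors only helps response time indirectly, by shortening the wait in a queue, which the sketch does not model.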
Relative Performance
If Performance is defined as:
$\text{Performance} = \frac{1}{\text{Execution Time}}$
Then:
$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution Time}_Y}{\text{Execution Time}_X} = n$
Can be phrased as “Processor X is n times faster than Processor Y”
Example: Assume a program runs in:
• 10s on Processor A
• 15s on Processor B
Then:
$\frac{\text{Execution Time}_B}{\text{Execution Time}_A} = \frac{15\,\text{s}}{10\,\text{s}} = 1.5$
So, A is 1.5 times faster than B
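The same ratio can be checked with a few lines of Python. This is only a sketch of the definition above; the function name is hypothetical.

```python
def relative_performance(exec_time_x, exec_time_y):
    """Return n such that X is n times faster than Y (same time units for both)."""
    return exec_time_y / exec_time_x

# Program takes 10 s on Processor A and 15 s on Processor B.
n = relative_performance(exec_time_x=10.0, exec_time_y=15.0)
print(f"A is {n} times faster than B")  # A is 1.5 times faster than B
```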
Execution Time
How do you measure execution time?
Elapsed (real) time – Total response time, including all aspects
• Processing, I/O, OS overhead, idle time
Determines system performance
CPU Time – Time spent processing a given job
• Discounts I/O time, other jobs’ shares
Comprises user CPU time & system CPU time
Different programs are affected differently by CPU & system
performance
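As a rough illustration of the difference between elapsed time and CPU time, the sketch below uses Python's standard `time.perf_counter()` (wall clock) and `time.process_time()` (user + system CPU time of the current process). The helper names and the workload are hypothetical; the sleep stands in for I/O or idle time.

```python
import os
import time

def measure(func):
    """Return (elapsed_seconds, cpu_seconds) for one call of func."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

def busy_then_idle():
    sum(i * i for i in range(200_000))  # processing: counts toward CPU time
    time.sleep(0.2)                     # idle: counts only toward elapsed time

wall, cpu = measure(busy_then_idle)
print(f"elapsed: {wall:.3f} s, CPU: {cpu:.3f} s")

# os.times() reports the user/system split for the whole process so far.
t = os.times()
print(f"user: {t.user:.3f} s, system: {t.system:.3f} s")
```

Exact numbers vary from run to run, but the elapsed time should exceed the CPU time by roughly the sleep duration.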
CPU Clocking
Nearly all CPUs are governed by a clock
Clock Period – Duration of a clock cycle (in seconds)
Clock Frequency – # of clock cycles per second (in hertz)
Clock Frequency = 1 / Clock Period (e.g., a 250 ps period gives 4 GHz)
CPU Time
CPU Time can be defined as:
$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$
How to improve performance?
• Reduce number of clock cycles per operation or per program
• Increase clock rate
Hardware designers may need to trade off clock rate against cycle
count
CPU Time Example
Example:
Computer A: 2GHz Clock Frequency results in 10s CPU Time
Design Computer B such that:
• Aiming for 6s CPU Time
• May increase clock frequency, but causes 1.2X clock cycles
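$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$
$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9$
$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \frac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}$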
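The arithmetic in this example can be replayed step by step. The sketch below just applies CPU Time = Clock Cycles / Clock Rate; the function names are hypothetical.

```python
def clock_cycles(cpu_time_s, clock_rate_hz):
    """Clock Cycles = CPU Time x Clock Rate."""
    return cpu_time_s * clock_rate_hz

def required_clock_rate(cycles, target_cpu_time_s):
    """Clock Rate = Clock Cycles / CPU Time."""
    return cycles / target_cpu_time_s

cycles_a = clock_cycles(cpu_time_s=10.0, clock_rate_hz=2e9)  # Computer A
cycles_b = 1.2 * cycles_a                                    # B needs 1.2x the cycles
rate_b = required_clock_rate(cycles_b, target_cpu_time_s=6.0)

print(f"Computer B needs a {rate_b / 1e9:.1f} GHz clock")  # 4.0 GHz
```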
Instruction Counts and CPI
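$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles Per Instruction (CPI)}$
$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$
$\text{CPU Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$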
Instruction Count – Number of instructions for a particular
program
Depends on:
• Program
• ISA
• Compiler
Average Cycles Per Instruction – Determined by ISA/CPU
Hardware
Different instructions may have different CPI
Average CPI affected by % of instruction classes in program
CPI Example
Computer A – Cycle Time = 250ps, CPI = 2.0
Computer B – Cycle Time = 500ps, CPI = 1.2
Same ISA
Which is faster? By how much?
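Same ISA, so both computers execute the same Instruction Count:
$\text{CPU Time}_A = \text{Instruction Count} \times 2.0 \times 250\,\text{ps} = 500\,\text{ps} \times \text{Instruction Count}$
$\text{CPU Time}_B = \text{Instruction Count} \times 1.2 \times 500\,\text{ps} = 600\,\text{ps} \times \text{Instruction Count}$
$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{600\,\text{ps}}{500\,\text{ps}} = 1.2$
A is 1.2 times faster than B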
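A short sketch of the same comparison in code, using CPU Time = Instruction Count × CPI × Clock Cycle Time. The instruction count is a hypothetical placeholder; because both computers share an ISA, it cancels out of the ratio.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """CPU Time = Instruction Count x CPI x Clock Cycle Time."""
    return instruction_count * cpi * cycle_time_s

instruction_count = 1_000_000  # hypothetical; any value gives the same ratio

time_a = cpu_time(instruction_count, cpi=2.0, cycle_time_s=250e-12)
time_b = cpu_time(instruction_count, cpi=1.2, cycle_time_s=500e-12)

ratio = time_b / time_a
print(f"A is {ratio:.1f} times faster than B")  # A is 1.2 times faster than B
```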
CPI in Detail
Different instruction classes take different # of cycles
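$\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction Count}_i)$
Weighted Average CPI:
$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$
The ratio $\frac{\text{Instruction Count}_i}{\text{Instruction Count}}$ is the relative frequency of each class of instruction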
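The weighted average can be sketched as follows. The instruction classes, their CPI values, and the instruction counts are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def total_cycles(cpi_by_class, count_by_class):
    """Clock Cycles = sum over classes of CPI_i x Instruction Count_i."""
    return sum(cpi_by_class[c] * count_by_class[c] for c in count_by_class)

def average_cpi(cpi_by_class, count_by_class):
    """Weighted average CPI = Clock Cycles / total Instruction Count."""
    return total_cycles(cpi_by_class, count_by_class) / sum(count_by_class.values())

# Hypothetical instruction classes and mix.
cpi_by_class = {"A": 1, "B": 2, "C": 3}
count_by_class = {"A": 2, "B": 1, "C": 2}

cycles = total_cycles(cpi_by_class, count_by_class)  # 2*1 + 1*2 + 2*3 = 10
cpi = average_cpi(cpi_by_class, count_by_class)      # 10 / 5 = 2.0
print(cycles, cpi)
```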
CPI Example
Performance Summary
Performance Depends on:
• Algorithm – Affects instruction count, possibly CPI
• Programming language – Affects instruction count & CPI
• Compiler – Affects instruction count & CPI
• ISA – Affects instruction count, CPI, & Time per cycle
$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock Cycle}}$
Power
Power Trends
Intel Core i5-9400 9th Gen (2019) – 14nm, 2.90-4.1 GHz, 65W
Why isn’t clock frequency continuing to increase?
Battery Life Isn’t Keeping Up
Dynamic Power Consumption
CMOS Transistors – complementary metal oxide semiconductor
Best performance per watt since 1976
Gradually being replaced by FinFET technology (can go <20nm)
Primary energy consumption is dynamic energy
$\text{Power} \propto \frac{1}{2} \times \text{Capacitive Load} \times \text{Voltage}^2 \times \text{Frequency Switched}$
Reducing voltage from 5V→1V allowed a 1000x increase in clock
frequency with only a 30x increase in power consumption
Reducing Power
Suppose a new CPU has:
• 85% capacitive load of old CPU
• 15% voltage and 15% frequency reduction
Then:
$\frac{P_{new}}{P_{old}} = \frac{(C_{old} \times 0.85) \times (V_{old} \times 0.85)^2 \times (F_{old} \times 0.85)}{C_{old} \times V_{old}^2 \times F_{old}} = 0.85^4 = 0.52$
The new CPU uses 52% of the power of the old CPU
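Since dynamic power is proportional to capacitive load × voltage² × frequency, only the scale factors matter when comparing two designs. A minimal sketch of that ratio, with a hypothetical function name:

```python
def relative_dynamic_power(cap_scale, voltage_scale, freq_scale):
    """P_new / P_old when each quantity is scaled by the given factor.

    Follows Power ∝ Capacitive Load x Voltage^2 x Frequency; the constant
    1/2 and the old values cancel in the ratio.
    """
    return cap_scale * voltage_scale ** 2 * freq_scale

# 85% capacitive load, 15% lower voltage, 15% lower frequency.
ratio = relative_dynamic_power(0.85, 0.85, 0.85)
print(f"new CPU uses {ratio:.2f} of the old CPU's power")  # 0.52
```

Voltage is the most effective lever because it enters squared, which is why running out of room to lower it matters so much.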
However:
• Can’t reduce voltage further
• Can’t remove more heat
How else can we improve performance?
Uniprocessor Performance
Since 2003, constrained by power, instruction-level parallelism,
and memory latency
Multiprocessors
Multicore Microprocessors – More than one processor per chip
Requires explicitly parallel programming
• Compare with instruction level parallelism
  • Hardware executes multiple instructions at once
  • Hidden from the programmer!
• Hard to do
  • Programming for performance
  • Load balancing
  • Optimizing communication and synchronization
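To show what "explicitly parallel" means for the programmer, here is a small sketch using Python's standard `concurrent.futures.ThreadPoolExecutor`. The helper names are hypothetical, and the even split is just one simple load-balancing choice. In standard CPython, threads do not speed up CPU-bound work because of the global interpreter lock, so this illustrates the program structure only, not a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor

def split_evenly(items, parts):
    """Load balancing: cut items into `parts` chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

def parallel_sum(items, workers=4):
    """The programmer, not the hardware, decides how the work is divided."""
    chunks = split_evenly(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker sums its own chunk independently.
        partial_sums = list(pool.map(sum, chunks))
    # Communication/synchronization step: combine the partial results.
    return sum(partial_sums)

print(parallel_sum(list(range(1001))))  # 500500
```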
Benchmarking Processors
SPEC – System Performance Evaluation Cooperative – Funded by
computer vendors for standard set of benchmarks
SPECINTC2006 benchmarks on 2.66 GHz Intel Core i7 920
SPEC Power Benchmark
Power consumption of a server at different workload levels
• Performance: ssj_ops/sec
• Power: Watts (Joules/sec)
$\text{Overall ssj\_ops per Watt} = \frac{\sum_{i=0}^{10} \text{ssj\_ops}_i}{\sum_{i=0}^{10} \text{power}_i}$
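The overall metric is a ratio of sums over the 11 load levels, not an average of per-level ratios. A sketch with made-up measurements follows; the function name and every number are hypothetical, not real SPECpower results.

```python
def overall_ssj_ops_per_watt(ssj_ops, power_watts):
    """Sum of ssj_ops over all load levels divided by sum of power over the same levels."""
    if len(ssj_ops) != len(power_watts):
        raise ValueError("need one power reading per load level")
    return sum(ssj_ops) / sum(power_watts)

# Hypothetical readings at 100%, 90%, ..., 10%, 0% (idle) target load.
ssj_ops = [100_000 * k for k in range(10, -1, -1)]   # idle level does no work
power_watts = [100 + 15 * k for k in range(10, -1, -1)]  # idle still draws power

result = overall_ssj_ops_per_watt(ssj_ops, power_watts)
print(f"{result:.1f} ssj_ops per watt")
```

In this made-up data the idle level adds power to the denominator but no operations to the numerator, so idle power draw lowers the overall score.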
Power SPEC
SPECpower_ssj2008 for Xeon X5650
Pitfall: Amdahl’s Law
Improving one aspect does not guarantee a proportional
improvement in overall performance
Amdahl’s Law:
$T_{improved} = \frac{T_{affected}}{\text{improvement factor}} + T_{unaffected}$
Example: Multiply accounts for 80 of 100 seconds of execution
How much improvement in multiply performance is needed to get a 5x
overall improvement?
A 5x improvement means $T_{improved} = \frac{100}{5} = 20$ s, so:
$20 = \frac{80}{n} + 20$ – Can’t be done!
Corollary: Make the common case fast!
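Amdahl's Law is easy to experiment with in code. The sketch below solves the formula for the improvement factor n and reports when no finite n can reach the target; the function names are hypothetical.

```python
def improved_time(t_affected, t_unaffected, improvement_factor):
    """T_improved = T_affected / improvement factor + T_unaffected."""
    return t_affected / improvement_factor + t_unaffected

def required_factor(t_affected, t_unaffected, target_time):
    """Improvement factor needed to reach target_time, or None if unreachable."""
    remaining = target_time - t_unaffected
    if remaining <= 0:
        return None  # the unaffected part alone already uses up the target
    return t_affected / remaining

# Multiply is 80 s of a 100 s run; a 5x overall speedup means a 20 s target.
print(required_factor(80.0, 20.0, target_time=100.0 / 5))  # None
# A 2.5x overall speedup (40 s target) is reachable.
print(required_factor(80.0, 20.0, target_time=40.0))       # 4.0
```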