Integrated Circuits, Performance, Power Alexander Nelson February 17, 2020 University of Arkansas - Department of Computer Science and Computer Engineering

Page 1: Integrated Circuits, Performance, Powercsce.uark.edu/.../lecture3-processors-performance-power.pdf · 2020-02-17 · Transistors and Integrated Circuits Transistors { On/O Switch

Integrated Circuits, Performance, Power

Alexander Nelson

February 17, 2020

University of Arkansas - Department of Computer Science and Computer Engineering

Integrated Circuits

Technologies Over Time

Transistors and Integrated Circuits

Transistors – On/Off Switch controlled by electric signal

Integrated Circuit – Many transistors in a single chip

VLSI/ULSI – Containing thousands/millions/billions of transistors

Intel 8080 (1974) – 6,000 Transistors

AMD Ryzen 9 3900X (2019) – 9.89B Transistors

45 years = roughly 1,648,000 times more transistors (9.89B ÷ 6,000)
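
The growth figure can be checked by dividing the two transistor counts quoted above. The sketch below also derives the implied doubling period, which is an extra calculation not on the slide; it assumes smooth exponential growth between the two data points.

```python
import math

# Transistor counts as quoted on the slide.
intel_8080_1974 = 6_000
ryzen_3900x_2019 = 9.89e9
years = 2019 - 1974

ratio = ryzen_3900x_2019 / intel_8080_1974

# Assumption: growth was a steady exponential, so the number of
# doublings is log2 of the overall growth factor.
doublings = math.log2(ratio)
years_per_doubling = years / doublings

print(f"growth factor: {ratio:,.0f}x")
print(f"doublings over {years} years: {doublings:.1f}")
print(f"implied years per doubling: {years_per_doubling:.2f}")
```

The result is a doubling roughly every two years, under the stated assumption.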

Semiconductors

Semiconductor – Solid substance with conductivity between that of an insulator and a conductor

Silicon – Natural semiconductor

Can be chemically modified to be:

• Conductor

• Insulator

• Areas that can conduct or insulate as a switch

Manufacturing Process

Yield – Proportion of working dies per wafer

Intel Core i7 Wafer

300 mm wafer, 280 chips, 32 nm technology

Each chip is 20.7 × 10.5 mm

Integrated Circuit Cost

Nonlinear relation to area and defect rate

• Wafer cost and area are fixed

• Defect rate determined by manufacturing process

• Die area determined by architecture & circuit design
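
One commonly used textbook model (Patterson and Hennessy) relates these quantities as: cost per die = wafer cost ÷ (dies per wafer × yield), with yield = 1 ÷ (1 + defects per area × die area ÷ 2)². The sketch below applies that model; the wafer cost and defect density are hypothetical values chosen only for illustration, and the die count ignores wafer-edge loss.

```python
import math

def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> float:
    """Approximate die count: wafer area / die area (ignores edge loss)."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    return wafer_area / die_area_mm2

def die_yield(defects_per_mm2: float, die_area_mm2: float) -> float:
    """Fraction of good dies under the textbook yield model."""
    return 1.0 / (1.0 + defects_per_mm2 * die_area_mm2 / 2.0) ** 2

def cost_per_die(wafer_cost: float, wafer_diameter_mm: float,
                 die_area_mm2: float, defects_per_mm2: float) -> float:
    good_dies = (dies_per_wafer(wafer_diameter_mm, die_area_mm2)
                 * die_yield(defects_per_mm2, die_area_mm2))
    return wafer_cost / good_dies

# Hypothetical inputs: $5000 wafer, 300 mm diameter, 0.001 defects per mm^2.
small = cost_per_die(5000, 300, 100, 0.001)
large = cost_per_die(5000, 300, 200, 0.001)
print(f"100 mm^2 die: ${small:.2f}")
print(f"200 mm^2 die: ${large:.2f}")
print(f"cost ratio: {large / small:.2f}")
```

Doubling the die area more than doubles the cost per die, because fewer dies fit on the wafer and a smaller fraction of them work. That is the nonlinear relation the slide refers to.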

Performance

How do you define CPU performance?

Example

Which airplane has the best performance?

Response Time vs. Throughput

Response Time – How long to finish a task

Throughput – Total work done per unit time

• e.g., tasks or transactions per hour

How are these two metrics affected by:

• Replacing processor with faster version?

• Adding more processors?

Relative Performance

If performance is defined as:

$$\text{Performance} = \frac{1}{\text{Execution Time}}$$

Then:

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution Time}_Y}{\text{Execution Time}_X} = n$$

Can be phrased as “Processor X is n times faster than Processor Y”

Example: Assume a program runs in:

• 10s on Processor A

• 15s on Processor B

Then, with X = A and Y = B:

$$\frac{\text{Execution Time}_B}{\text{Execution Time}_A} = \frac{15\,\text{s}}{10\,\text{s}} = 1.5$$

So, A is 1.5 times faster than B
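
The same comparison as a minimal sketch; the function name `relative_performance` is illustrative and does not come from the slides.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y (n = time_y / time_x)."""
    return time_y / time_x

# Execution times from the example: A takes 10 s, B takes 15 s.
n = relative_performance(time_x=10.0, time_y=15.0)
print(f"A is {n} times faster than B")  # prints "A is 1.5 times faster than B"
```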

Execution Time

How do you measure execution time?

Elapsed Realtime – Total response time including all aspects

• Processing, I/O, OS overhead, idle time

Determines system performance

CPU Time – Time spent processing a given job

• Discounts I/O time, other jobs’ shares

Comprises user CPU time & system CPU time

Different programs are affected differently by CPU and system performance
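
The difference between elapsed time and CPU time can be observed directly. The sketch below uses Python's `time.perf_counter` (wall-clock time) and `time.process_time` (user plus system CPU time of the current process); the two task functions are hypothetical stand-ins, with `time.sleep` playing the role of waiting on I/O.

```python
import time

def measure(task):
    """Run task() and return (elapsed_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, cpu

def io_like_task():
    # Sleeping stands in for I/O wait: wall-clock time passes,
    # but the process uses almost no CPU.
    time.sleep(0.2)

def cpu_bound_task():
    # Pure computation: CPU time and elapsed time should be close.
    total = 0
    for i in range(200_000):
        total += i * i
    return total

for task in (io_like_task, cpu_bound_task):
    elapsed, cpu = measure(task)
    print(f"{task.__name__}: elapsed = {elapsed:.3f} s, CPU = {cpu:.3f} s")
```

For the sleeping task the CPU time is far below the elapsed time, which is why the two metrics rank programs differently.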

CPU Clocking

Nearly all CPUs are governed by a clock

Clock Period – Duration of a clock cycle (in seconds)

Clock Frequency – # of clock cycles per second (in hertz)
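
Clock period and clock frequency are reciprocals of each other. A short worked example, where the 4 GHz figure is chosen only for illustration:

```latex
\text{Clock Period} = \frac{1}{\text{Clock Frequency}}, \qquad
\frac{1}{4\,\text{GHz}} = \frac{1}{4 \times 10^{9}\,\text{Hz}} = 0.25\,\text{ns} = 250\,\text{ps}
```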

CPU Time

CPU Time can be defined as:

$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$

How to improve performance?

• Reduce number of clock cycles per operation or per program

• Increase clock rate

Hardware designers may need to trade off clock rate against cycle count

CPU Time Example

Example:

Computer A: 2GHz Clock Frequency results in 10s CPU Time

Design Computer B such that:

• Aiming for 6s CPU Time

• Clock frequency may be increased, but doing so requires 1.2× as many clock cycles

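$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \frac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}$$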
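
The same calculation as a short sketch. The function and variable names are illustrative; the numbers are the ones given in the example.

```python
def required_clock_rate(cycles_a: float, cycle_factor: float,
                        target_time_s: float) -> float:
    """Clock rate (Hz) needed to run cycle_factor * cycles_a cycles in target_time_s."""
    return cycle_factor * cycles_a / target_time_s

clock_rate_a = 2e9   # Computer A: 2 GHz
cpu_time_a = 10.0    # Computer A: 10 s of CPU time

# Clock cycles = CPU time x clock rate
cycles_a = cpu_time_a * clock_rate_a

# Computer B needs 1.2x the cycles and must finish in 6 s.
clock_rate_b = required_clock_rate(cycles_a, cycle_factor=1.2, target_time_s=6.0)
print(f"Computer B clock rate: {clock_rate_b / 1e9:.1f} GHz")
```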

Instruction Counts and CPI

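$$\text{Clock Cycles} = \text{Instruction Count} \times \text{CPI}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$$

$$\text{CPU Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$

where CPI is the number of cycles per instruction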

Instruction Count – Number of instructions for a particular program

Depends on:

• Program

• ISA

• Compiler

Average Cycles Per Instruction – Determined by ISA/CPU hardware

Different instructions may have different CPI

Average CPI is affected by the % of each instruction class in the program

CPI Example

Computer A – Cycle Time = 250ps, CPI = 2.0

Computer B – Cycle Time = 500ps, CPI = 1.2

Same ISA

Which is faster? By how much?

Answer: A is faster, by a factor of 1.2 in relative performance
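
The answer follows from CPU Time = Instruction Count × CPI × Clock Cycle Time. In the sketch below the instruction count is a hypothetical value; because both computers share an ISA and run the same program, it cancels out of the ratio.

```python
def cpu_time_ps(instruction_count: int, cpi: float, cycle_time_ps: float) -> float:
    """CPU time in picoseconds: instructions x cycles/instruction x ps/cycle."""
    return instruction_count * cpi * cycle_time_ps

instructions = 1_000_000  # hypothetical; any positive value gives the same ratio

time_a = cpu_time_ps(instructions, cpi=2.0, cycle_time_ps=250)  # 500 ps per instruction
time_b = cpu_time_ps(instructions, cpi=1.2, cycle_time_ps=500)  # 600 ps per instruction

print(f"B / A execution time ratio: {time_b / time_a:.2f}")
```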

CPI in Detail

Different instruction classes take different # of cycles

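$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$

Weighted Average CPI:

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$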

Each ratio of a class's instruction count to the total instruction count is the relative frequency of that instruction class
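
A minimal sketch of the weighted-average CPI calculation. The instruction mixes are hypothetical (three classes with CPI 1, 2, and 3) and are not taken from the slides.

```python
def average_cpi(classes):
    """classes: list of (cpi, instruction_count) pairs, one per instruction class."""
    total_cycles = sum(cpi * count for cpi, count in classes)
    total_instructions = sum(count for _, count in classes)
    return total_cycles / total_instructions

# Hypothetical code sequences compiled from the same source program.
sequence_1 = [(1, 2), (2, 1), (3, 2)]  # 5 instructions, 10 cycles
sequence_2 = [(1, 4), (2, 1), (3, 1)]  # 6 instructions, 9 cycles

print(average_cpi(sequence_1))  # 2.0
print(average_cpi(sequence_2))  # 1.5
```

Sequence 2 executes more instructions but needs fewer cycles, so instruction count alone does not predict performance.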

CPI Example

Performance Summary

Performance Depends on:

• Algorithm – Affects instruction count, possibly CPI

• Programming language – Affects instruction count & CPI

• Compiler – Affects instruction count & CPI

• ISA – Affects instruction count, CPI, & Time per cycle

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock Cycle}}$$

Power

Power Trends

Intel Core i5-9400 9th Gen (2019) – 14nm, 2.90-4.1 GHz, 65W

Why isn’t clock frequency continuing to increase?

Battery Life Isn’t Keeping Up

Dynamic Power Consumption

CMOS Transistors – complementary metal oxide semiconductor

Best performance per watt since 1976

Gradually being replaced by FinFET technology (can go <20nm)

The primary source of energy consumption is dynamic (switching) energy

$$\text{Power} \propto \frac{1}{2} \times \text{Capacitive Load} \times \text{Voltage}^2 \times \text{Frequency Switched}$$

Reducing supply voltage from 5 V to 1 V allowed a 1000× increase in clock frequency with only a 30× increase in power consumption

Reducing Power

Suppose a new CPU has:

• 85% of the capacitive load of the old CPU

• 15% lower voltage and 15% lower frequency

Then:

$$\frac{P_{new}}{P_{old}} = \frac{C_{old} \times 0.85 \times (V_{old} \times 0.85)^2 \times F_{old} \times 0.85}{C_{old} \times V_{old}^2 \times F_{old}} = 0.85^4 = 0.52$$

The new CPU uses 52% of the power of the old CPU
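
The ratio can be reproduced with a small sketch that scales each factor of the dynamic power relation (capacitive load × voltage² × frequency). The function name is illustrative.

```python
def dynamic_power_ratio(cap_scale: float, voltage_scale: float,
                        freq_scale: float) -> float:
    """New/old dynamic power when each factor is scaled by the given amount."""
    return cap_scale * voltage_scale ** 2 * freq_scale

# 85% capacitive load, 15% lower voltage, 15% lower frequency.
ratio = dynamic_power_ratio(0.85, 0.85, 0.85)
print(f"P_new / P_old = {ratio:.2f}")  # prints 0.52
```

Voltage enters squared, so it contributes two of the four 0.85 factors.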

However:

• Can’t reduce voltage further

• Can’t remove more heat

How else can we improve performance?

Uniprocessor Performance

Since 2003, constrained by power, instruction-level parallelism, and memory latency

Multiprocessors

Multicore Microprocessors – More than one processor per chip

Requires explicitly parallel programming

• Compare with instruction-level parallelism:

  • Hardware executes multiple instructions at once

  • Hidden from the programmer!

• Hard to do:

  • Programming for performance

  • Load balancing

  • Optimizing communication and synchronization

Benchmarking Processors

SPEC (System Performance Evaluation Cooperative, now the Standard Performance Evaluation Corporation) – Funded by computer vendors to maintain a standard set of benchmarks

SPECint2006 (CINT2006) benchmarks on a 2.66 GHz Intel Core i7 920

SPEC Power Benchmark

Power consumption of a server at different workload levels

• Performance: ssj_ops/sec

• Power: Watts (Joules/sec)

$$\text{Overall ssj\_ops per Watt} = \left(\sum_{i=0}^{10} \text{ssj\_ops}_i\right) \div \left(\sum_{i=0}^{10} \text{power}_i\right)$$
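
A sketch of the overall metric: total ssj_ops summed across all measurement levels, divided by total power summed across the same levels. The eleven data points below are hypothetical, and the assumption that the indices 0 to 10 run from full load down to idle in equal steps is ours.

```python
def overall_ssj_ops_per_watt(ssj_ops, power_watts):
    """Sum of ssj_ops over all load levels divided by sum of power over the same levels."""
    if len(ssj_ops) != len(power_watts):
        raise ValueError("need one power reading per load level")
    return sum(ssj_ops) / sum(power_watts)

# Hypothetical server: 11 load levels, from 100% target load down to idle (0%).
levels = range(10, -1, -1)
ssj_ops = [1000 * level for level in levels]     # 10000, 9000, ..., 0
power = [100 + 15 * level for level in levels]   # 250 W at full load, 100 W idle

result = overall_ssj_ops_per_watt(ssj_ops, power)
print(f"overall ssj_ops per watt: {result:.1f}")
```

Because idle power still counts in the denominator, a server that draws a lot of power at low load scores poorly even if its peak efficiency is good.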

Power SPEC

SPECpower_ssj2008 for Xeon X5650

Pitfall: Amdahl’s Law

Improving one aspect does not guarantee a proportional improvement in overall performance

Amdahl’s Law:

$$T_{improved} = \frac{T_{affected}}{\text{improvement factor}} + T_{unaffected}$$

Example: Multiply accounts for 80 s of a 100 s execution time

How much must multiply performance improve to get a 5× overall improvement?

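A 5× overall improvement means a target time of 100 s ÷ 5 = 20 s, with n as the improvement factor for multiply:

$$20 = \frac{80}{n} + 20$$

Can’t be done! It would require 80/n = 0, which no finite n satisfies.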

Corollary: Make the common case fast!
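
A short sketch of Amdahl's Law applied to the multiply example. The function name is illustrative, and the improvement factors tried are arbitrary.

```python
def improved_time(t_affected: float, t_unaffected: float,
                  improvement_factor: float) -> float:
    """Amdahl's Law: only the affected part of the run time is sped up."""
    return t_affected / improvement_factor + t_unaffected

total = 100.0  # original execution time in seconds (80 s multiply + 20 s other)

for n in (2, 10, 100, 1_000_000):
    t = improved_time(80.0, 20.0, n)
    print(f"n = {n:>9}: time = {t:.4f} s, overall speedup = {total / t:.3f}x")
```

However large n becomes, the time stays above the 20 s of unaffected work, so the overall speedup approaches 5× but never reaches it.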
