chapter 1 computer abstractions and technology.ppt
TRANSCRIPT
Chapter 1
Computer Abstractions Computer Abstractions and Technology
The Computer Revolution§1.1 Int
Progress in computer technology
roduction
Underpinned by Moore’s Law Makes novel applications feasible
n
Makes novel applications feasibleComputers in automobilesCell phonesHuman genome projectg jWorld Wide WebSearch EnginesSearch Engines
Computers are pervasive
Chapter 1 — Computer Abstractions and Technology — 2
Classes of ComputersDesktop computers
General purpose, variety of softwareSubject to cost/performance tradeoff
Server computersNetwork basedNetwork basedHigh capacity, performance, reliabilityRange from small servers to building sizedRange from small servers to building sized
Embedded computersHidden as components of systemsStringent power/performance/cost constraints
Chapter 1 — Computer Abstractions and Technology — 3
The Processor Market
Chapter 1 — Computer Abstractions and Technology — 4
What You Will LearnHow programs are translated into the machine language
And how the hardware executes themAnd how the hardware executes themThe hardware/software interfaceWhat determines program performance
And how it can be improvedpHow hardware designers improve performanceperformanceWhat is parallel processing
Chapter 1 — Computer Abstractions and Technology — 5
Understanding PerformanceAlgorithm
Determines number of operations executedProgramming language, compiler, architecture
Determine number of machine instructions executed per operation
Processor and memory systemDetermine how fast instructions are executed
I/O system (including OS)Determines how fast I/O operations are executedDetermines how fast I/O operations are executed
Chapter 1 — Computer Abstractions and Technology — 6
Below Your Program§1.2 B
e
Application software
elow Your
Written in high-level languageSystem software
r Program
Compiler: translates HLL code to machine code
m
Operating System: service codeHandling input/outputM i dManaging memory and storageScheduling tasks & sharing resources
HardwareHardwareProcessor, memory, I/O controllers
Chapter 1 — Computer Abstractions and Technology — 7
Levels of Program CodeHigh-level language
L l f b t ti lLevel of abstraction closer to problem domainProvides for productivityProvides for productivity and portability
Assembly languagey g gTextual representation of instructions
Hardware representationBinary digits (bits)Encoded instructions and data
Chapter 1 — Computer Abstractions and Technology — 8
Components of a Computer§1.3 U
n
Same components forll ki d f t
nder the CThe BIG Pictureall kinds of computer
Desktop, server,b dd d
Covers
embeddedInput/output includes
User-interface devicesDisplay, keyboard, mouse
Storage devicesHard disk, CD/DVD, flash
N t k d tNetwork adaptersFor communicating with other computers
Chapter 1 — Computer Abstractions and Technology — 9
other computers
Anatomy of a Computer
Output device
Network cable
Input device
Input device
Chapter 1 — Computer Abstractions and Technology — 10
Anatomy of a MouseOptical mouse
LED illuminates desktopS ll lSmall low-res cameraBasic image processor
L k fLooks for x, y movement
Buttons & wheelButtons & wheelSupersedes roller-ball mechanical mousemechanical mouse
Chapter 1 — Computer Abstractions and Technology — 11
Through the Looking GlassLCD screen: picture elements (pixels)
Mirrors content of frame buffer memory
Chapter 1 — Computer Abstractions and Technology — 12
Opening the Box
Chapter 1 — Computer Abstractions and Technology — 13
Processor Architecture
As programmers, we use the instruction setinstruction set architecture (ISA) as a useful abstraction to understand the
’ i lprocessor’s internal details.
Chapter 1 — Computer Abstractions and Technology — 14
How Do the Pieces Fit Together?
OperatingSystem
Application
Compiler
System
Instruction SetA hit t
Firmware
I/O systemInstr. Set Proc. ArchitectureMemory system
D t th & C t l
Digital DesignCircuit Design
Datapath & Control
Coordination of many levels of abstraction
Circuit Design
Coordination of many levels of abstractionUnder a rapidly changing set of forces
Chapter 1 — Computer Abstractions and Technology — 15Design, measurement, and evaluation
CISC vs. RISC
CISC emphasizes hardware complexity. RISC emphasizes compiler complexityRISC emphasizes compiler complexity
Chapter 1 — Computer Abstractions and Technology — 16
CISC
A microprogram is a small run-time interpreter that takesA microprogram is a small run time interpreter that takes the complex instruction and generates a sequence of simple instructions that can be executed by the hardware
Chapter 1 — Computer Abstractions and Technology — 17
hardware.
RISC
RISC systems use only simple SC sys e s use o y s p einstructions. RISC s stems ass me that the req iredRISC systems assume that the required operands are in the processor’s internal registers, not in the main memory.RISC designs on the other handRISC designs, on the other hand, eliminate the microprogram layer and use the hardware to directly executethe hardware to directly execute instructions.
Chapter 1 — Computer Abstractions and Technology — 18
1.1 The RISC design philosophy g yThe RISC philosophy is implemented with four major design rules:g
1. Instructions — RISC processors have a reduced number of instruction classes.
2. Pipelines — The processing of instructions is broken down into smaller units that can be executed in parallel by pipelines.
3. Registers — RISC machines have a large general-purpose register set.
4. Load-store architecture — The processor operates on data held in registers. Separate load and store instructions transfer data between the register bank and external memorybetween the register bank and external memory.
Chapter 1 — Computer Abstractions and Technology — 19
(vonNeumann) Processor Organization
Control needs toCPU
1. input instructions from Memory2. issue signals to control the
information flow between the
CPU
Control
Memory Devices
Input
information flow between the Datapath components and to control what operations they
f
Datapath Output
perform3. control instruction sequencing
Fetch
DecodeExecDatapath needs to have thecomponents – the functional units and
( i fil ) d d i istorage (e.g., register file) needed to execute instructionsinterconnects - components connected so that the instructions can be accomplished and so that data can be loaded from and stored
Chapter 1 — Computer Abstractions and Technology — 20
pto Memory
Inside the Processor (CPU)Datapath: performs operations on dataControl: sequences datapath, memory, ...Cache memoryCache memory
Small fast SRAM memory for immediate access to data
Chapter 1 — Computer Abstractions and Technology — 21
Inside the ProcessorAMD Barcelona: 4 processor cores
Chapter 1 — Computer Abstractions and Technology — 22
AbstractionsThe BIG Picture
Abstraction helps us deal with complexityHide lower-level detailHide lower-level detail
Instruction set architecture (ISA)The hardware/software interface
Application binary interfaceApplication binary interfaceThe ISA plus system software interface
I l iImplementationThe details underlying and interface
Chapter 1 — Computer Abstractions and Technology — 23
y g
A Safe Place for DataVolatile main memory
Loses instructions and data when power offLoses instructions and data when power off
Non-volatile secondary memoryMagnetic diskMagnetic diskFlash memoryOptical disk (CDROM, DVD)p ( )
Chapter 1 — Computer Abstractions and Technology — 24
NetworksCommunication and resource sharingLocal area network (LAN): Ethernet
Within a buildingWithin a buildingWide area network (WAN: the InternetWireless network: WiFi, Bluetooth
Chapter 1 — Computer Abstractions and Technology — 25
Technology TrendsElectronics t h ltechnology continues to evolve
DRAM capacity
Increased capacity and performanceR d d tReduced cost
Year Technology Relative performance/cost1951 Vacuum tube 11965 Transistor 351975 I t t d i it (IC) 9001975 Integrated circuit (IC) 9001995 Very large scale IC (VLSI) 2,400,0002005 Ultra large scale IC 6 200 000 000
Chapter 1 — Computer Abstractions and Technology — 26
2005 Ultra large scale IC 6,200,000,000
DRAM Capacity Growth
Chapter 1 — Computer Abstractions and Technology — 27
Processor Performance Increase
10000
1000
10000
Int)
DEC Alpha 21264/600DEC Alpha 21264A/667
Intel Xeon/2000
Intel Pentium 4/3000
100
1000
e (S
PEC
DEC Alpha 4/266DEC Alpha 5/500
DEC Alpha 5/300100
orm
ance
HP 9000/750
DEC AXP/500 IBM POWER 100
10
Perf
o
SUN-4/260 MIPS M/120MIPS M2000
IBM RS6000
11987 1989 1991 1993 1995 1997 1999 2001 2003
Year
SUN-4/260 MIPS M/120
Chapter 1 — Computer Abstractions and Technology — 28
Impacts of Advancing Technology
ProcessorProcessorlogic capacity: increases about 30% per yearperformance: 2x every 1.5 years
MemoryDRAM capacity: 4x every 3 years, now 2x every 2 yearsmemory speed: 1.5x every 10 yearscost per bit: decreases about 25% per year
DiskDiskcapacity: increases about 60% per year
Chapter 1 — Computer Abstractions and Technology — 29
Defining Performance§1.4 P
e
Which airplane has the best performance?
erformanc
BAC/Sud
Boeing 747
Boeing 777
BAC/Sud
Boeing 747
Boeing 777
ce
0 100 200 300 400 500
DouglasDC-8-50
BAC/SudConcorde
0 2000 4000 6000 8000 10000
Douglas DC-8-50
BAC/SudConcorde
0 100 200 300 400 500
Passenger Capacity
0 2000 4000 6000 8000 10000
Cruising Range (miles)
BAC/Sud
Boeing 747
Boeing 777
BAC/Sud
Boeing 747
Boeing 777
0 500 1000 1500
DouglasDC-8-50
Concorde
0 100000 200000 300000 400000
Douglas DC-8-50
Concorde
Chapter 1 — Computer Abstractions and Technology — 30
Cruising Speed (mph) Passengers x mph
Response Time and ThroughputResponse time
How long it takes to do a taskThroughput
Total work done per unit timee.g., tasks/transactions/… per hour
How are response time and throughput affected byy
Replacing the processor with a faster version?Adding more processors?g p
We’ll focus on response time for now…
Chapter 1 — Computer Abstractions and Technology — 31
Relative PerformanceDefine Performance = 1/Execution Time“X is n time faster than Y”
P fP fn== XY
YX
time Executiontime ExecutionePerformancePerformanc
Example: time taken to run a program10 A 1 B10s on A, 15s on BExecution TimeB / Execution TimeA= 15s / 10s = 1.5So A is 1.5 times faster than B
Chapter 1 — Computer Abstractions and Technology — 32
Measuring Execution TimeElapsed time
Total response time, including all aspectsProcessing, I/O, OS overhead, idle time
Determines system performanceCPU time
Time spent processing a given jobDiscounts I/O time, other jobs’ sharesDiscounts I/O time, other jobs shares
Comprises user CPU time and system CPU timeDifferent programs are affected differently by CPU and system performance
Chapter 1 — Computer Abstractions and Technology — 33
y p
CPU ClockingOperation of digital hardware governed by a
t t t l kconstant-rate clockClock period
Clock (cycles)
Data transferand computation
Update state
Clock period: duration of a clock cyclee g 250ps = 0 25ns = 250×10–12se.g., 250ps = 0.25ns = 250×10 12s
Clock frequency (rate): cycles per second9
Chapter 1 — Computer Abstractions and Technology — 34
e.g., 4.0GHz = 4000MHz = 4.0×109Hz
CPU Time
TimeCycleClockCyclesClockCPUTimeCPU ×=
CCycles Clock CPU
TimeCycleClockCyclesClockCPUTime CPU
=
×
Performance improved byRateClock
p yReducing number of clock cyclesIncreasing clock rateIncreasing clock rateHardware designer must often trade off clock
t i t l trate against cycle count
Chapter 1 — Computer Abstractions and Technology — 35
CPU Time ExampleComputer A: 2GHz clock, 10s CPU timeD i i C t BDesigning Computer B
Aim for 6s CPU timeCan do faster clock but causes 1 2 × clock cyclesCan do faster clock, but causes 1.2 × clock cycles
How fast must Computer B clock be?CyclesClock1 2CyclesClock
6sCyclesClock1.2
Time CPUCyclesClockRate Clock A
B
BB
×==
10202GHz10s
RateClockTimeCPUCycles Clock9
AAA
×=×=
×=
4GHz6
10246
10201.2Rate Clock
10202GHz10s99
B =×
=××
=
Chapter 1 — Computer Abstractions and Technology — 36
6s6sB
Instruction Count and CPInInstructio per CyclesCount nInstructioCycles Clock ×=
Time Cycle ClockCPICount nInstructioTime CPU ××=
Rate ClockCPICountnInstructio ×
=
Instruction Count for a programDetermined by program, ISA and compilerDetermined by program, ISA and compiler
Average cycles per instructionDetermined by CPU hardwareDetermined by CPU hardwareIf different instructions have different CPI
Average CPI affected by instruction mix
Chapter 1 — Computer Abstractions and Technology — 37
Average CPI affected by instruction mix
CPI ExampleComputer A: Cycle Time = 250ps, CPI = 2.0C t B C l Ti 500 CPI 1 2Computer B: Cycle Time = 500ps, CPI = 1.2Same ISAWhich is faster, and by how much?
ATimeCycleACPICountnInstructioATimeCPU ××=
TimeCycleCPICountnInstructioTimeCPU500psI250ps2.0I
ATimeCycleACPICountnInstructioATime CPU
×=××=
××
A is faster…
600psI500ps1.2IBTimeCycleBCPICountnInstructioBTime CPU
×=××=
××=
1.2500psI600psI
ATime CPUBTime CPU
=××
= …by this much
Chapter 1 — Computer Abstractions and Technology — 38
A
CPI in More DetailIf different instruction classes take different numbers of cycles
n
∑=
×=n
1iii )Count nInstructio(CPICycles Clock
Weighted average CPI
∑=
⎟⎠⎞
⎜⎝⎛ ×==
n
1i
ii CountnInstructio
Count nInstructioCPICountnInstructio
Cycles ClockCPI= ⎠⎝1i
Relative frequency
Chapter 1 — Computer Abstractions and Technology — 39
CPI ExampleAlternative compiled code sequences using instructions in classes A B Cinstructions in classes A, B, C
Class A B CC ass CCPI for class 1 2 3IC in sequence 1 2 1 2IC in sequence 2 4 1 1
Sequence 1: IC 5 Sequence 2: IC 6Sequence 1: IC = 5Clock Cycles= 2×1 + 1×2 + 2×3
Sequence 2: IC = 6Clock Cycles= 4×1 + 1×2 + 1×3= 2×1 + 1×2 + 2×3
= 10Avg CPI = 10/5 = 2 0
= 4×1 + 1×2 + 1×3= 9Avg CPI = 9/6 = 1 5
Chapter 1 — Computer Abstractions and Technology — 40
Avg. CPI = 10/5 = 2.0 Avg. CPI = 9/6 = 1.5
Performance SummaryThe BIG Picture
Secondscycles ClocknsInstructioTimeCPU ××=cycleClocknInstructioProgram
Time CPU ××
Performance depends onAlgorithm: affects IC, possibly CPIg , p yProgramming language: affects IC, CPICompiler: affects IC CPICompiler: affects IC, CPIInstruction set architecture: affects IC, CPI, Tc
Chapter 1 — Computer Abstractions and Technology — 41
Power Trends§1.5 The P
ower W
all
In CMOS IC technology
FrequencyVoltageloadCapacitivePower 2 FrequencyVoltageloadCapacitivePower 2 ××=
×1000×30 5V → 1V
Chapter 1 — Computer Abstractions and Technology — 42
×1000×30 5V → 1V
Reducing PowerSuppose a new CPU has
85% of capacitive load of old CPU15% voltage and 15% frequency reduction15% voltage and 15% frequency reduction
0.520.850.85F0.85)(V0.85CP 42
old2
oldoldnew ==×××××
= 0.520.85FVCP old
2oldoldold ××
The power wallThe power wallWe can’t reduce voltage furtherWe can’t remove more heat
How else can we improve performance?Chapter 1 — Computer Abstractions and Technology — 43
p p
Uniprocessor Performance§1.6 The S
ea Chhange: TThe S
witcch to M
ulttiprocesssors
Constrained by power, instruction-level parallelism, memory
Chapter 1 — Computer Abstractions and Technology — 44
y p , p , ylatency
MultiprocessorsMulticore microprocessors
More than one processor per chipRequires explicitly parallel programmingRequires explicitly parallel programming
Compare with instruction level parallelismH d t lti l i t ti tHardware executes multiple instructions at onceHidden from the programmer
Hard to doProgramming for performanceLoad balancingOptimizing communication and synchronization
Chapter 1 — Computer Abstractions and Technology — 45
Manufacturing ICs§1.7 R
eeal Stuff: The A
MDD
Opteron X
4
Yield: proportion of working dies per wafer
Chapter 1 — Computer Abstractions and Technology — 46
iPad A4 Processor
Clock = 1 GHzClock 1 GHz4 Cortex-A8 coresEach Cortex core is based on ARM.based on ARM.250-520 mWatt
ticonsumption
Chapter 1 — Computer Abstractions and Technology — 47
AMD Opteron X2 Wafer
X2: 300mm wafer, 117 chips, 90nm technologyX4: 45nm technology
Chapter 1 — Computer Abstractions and Technology — 48
gy
Integrated Circuit Cost waferperCostdieperCost
areaDieareaWaferwaferperDies
Yield waferper DiespdieperCost
≈
×=
2/2))Di(D f t(11Yield
areaDieareaWaferwaferper Dies
=
≈
Nonlinear relation to area and defect rate
2area/2))Dieareaper(Defects(1 ×+
Nonlinear relation to area and defect rateWafer cost and area are fixedDefect rate determined by manufacturing processDefect rate determined by manufacturing processDie area determined by architecture and circuit design
Chapter 1 — Computer Abstractions and Technology — 49
SPEC CPU BenchmarkPrograms used to measure performance
Supposedly typical of actual workloadSupposedly typical of actual workloadStandard Performance Evaluation Corp (SPEC)
Develops benchmarks for CPU, I/O, Web, …Develops benchmarks for CPU, I/O, Web, …
SPEC CPU2006Elapsed time to execute a selection of programsElapsed time to execute a selection of programs
Negligible I/O, so focuses on CPU performanceNormalize relative to reference machineSummarize as geometric mean of performance ratios
CINT2006 (integer) and CFP2006 (floating-point)
nn
1iiratio time Execution∏
Chapter 1 — Computer Abstractions and Technology — 50
1i=
CINT2006 for Opteron X4 2356Name Description IC×109 CPI Tc (ns) Exec time Ref time SPECratio
perl Interpreted string processing 2,118 0.75 0.40 637 9,777 15.3
bzip2 Block-sorting compression 2,389 0.85 0.40 817 9,650 11.8
gcc GNU C Compiler 1,050 1.72 0.47 24 8,050 11.1
mcf Combinatorial optimization 336 10.00 0.40 1,345 9,120 6.8
go Go game (AI) 1,658 1.09 0.40 721 10,490 14.6
hmmer Search gene sequence 2,783 0.80 0.40 890 9,330 10.5
C ( )sjeng Chess game (AI) 2,176 0.96 0.48 37 12,100 14.5
libquantum Quantum computer simulation 1,623 1.61 0.40 1,047 20,720 19.8
h264avc Video compression 3,102 0.80 0.40 993 22,130 22.3
omnetpp Discrete e ent sim lation 587 2 94 0 40 690 6 250 9 1omnetpp Discrete event simulation 587 2.94 0.40 690 6,250 9.1
astar Games/path finding 1,082 1.79 0.40 773 7,020 9.1
xalancbmk XML parsing 1,058 2.70 0.40 1,143 6,900 6.0
Geometric mean 11 7Geometric mean 11.7
High cache miss rates
Chapter 1 — Computer Abstractions and Technology — 51
SPEC Power BenchmarkPower consumption of server at different workload levels
Performance: ssj ops/secPerformance: ssj_ops/secPower: Watts (Joules/sec)
⎟⎠
⎞⎜⎝
⎛⎟⎠
⎞⎜⎝
⎛= ∑∑
==
10
0ii
10
0ii powerssj_ops Wattper ssj_ops Overall
⎠⎝⎠⎝ == 0i0i
Chapter 1 — Computer Abstractions and Technology — 52
SPECpower_ssj2008 for X4Target Load % Performance (ssj_ops/sec) Average Power (Watts)
100% 231,867 29590% 211,282 28680% 185,803 27580% 185,803 27570% 163,427 26560% 140,160 25650% 118,324 24640% 920,35 23330% 70,500 222,20% 47,126 20610% 23,066 1800% 0 1410% 0 141
Overall sum 1,283,590 2,605∑ssj_ops/ ∑power 493
Chapter 1 — Computer Abstractions and Technology — 53
Fallacy: Low Power at IdleLook back at X4 power benchmark
At 100% load: 295WAt 50% load: 246W (83%)At 50% load: 246W (83%)At 10% load: 180W (61%)
G l d t tGoogle data centerMostly operates at 10% – 50% loadyAt 100% load less than 1% of the time
Consider designing processors to makeConsider designing processors to make power proportional to load
Chapter 1 — Computer Abstractions and Technology — 54
Concluding Remarks§1.9 C
o
Cost/performance is improving
oncluding
Due to underlying technology developmentHierarchical layers of abstraction
g Rem
arkyIn both hardware and software
Instruction set architecture
ks
Instruction set architectureThe hardware/software interface
E ti ti th b t fExecution time: the best performance measurePower is a limiting factor
Use parallelism to improve performance
Chapter 1 — Computer Abstractions and Technology — 55
p p p