cpsc614 lec 1.1 based on lectures by: prof. david e culler prof. david patterson uc berkeley cpsc...
Post on 19-Dec-2015
217 views
TRANSCRIPT
![Page 1: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/1.jpg)
cpsc614Lec 1.1
Based on Lectures by:
Prof. David E Culler
Prof. David Patterson
UC Berkeley
CPSC 614:Graduate Computer Architecture2006 Spring
Introduction
![Page 2: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/2.jpg)
cpsc614Lec 1.2
Outline
• Why Take CPSC 614?• Fundamental Abstractions & Concepts• Instruction Set Architecture &
Organization• Pipelined Instruction Processing• Performance• The Memory Abstraction• Summary
![Page 3: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/3.jpg)
cpsc614Lec 1.3
What is “Computer Architecture”?
I/O systemInstr. Set Proc.
Compiler
OperatingSystem
Application
Digital DesignCircuit Design
Instruction Set Architecture
Firmware
•Coordination of many levels of abstraction•Under a rapidly changing set of forces•Design, Measurement, and Evaluation
Datapath & Control
Layout
![Page 4: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/4.jpg)
cpsc614Lec 1.4
Why take CPSC 614?
• To design the next great instruction set?...well...
• Tremendous organizational innovation relative to established ISA abstractions
• Many New instruction sets or equivalent– embedded space, controllers, specialized
devices, ...
• Design, analysis, implementation concepts vital to all aspects of EE & CS
– systems, PL, theory, circuit design, VLSI, comm.
• Equip you with an intellectual toolbox for dealing with a host of systems design challenges
![Page 5: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/5.jpg)
cpsc614Lec 1.5
Coping with CPSC 614• Students with too varied background?• Review: CPSC 321 and/or
“Computer Organization and Design (COD)2/e”
– Chapters 1 to 8 of COD if never took prerequisite– If took a class, be sure COD Chapters 2, 6, 7 are
familiar
• FAST review this week of basic concepts
![Page 6: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/6.jpg)
cpsc614Lec 1.6
Review of Fundamental Concepts
• Instruction Set Architecture• Machine Organization• Instruction Execution Cycle• Pipelining• Memory• Bus (Peripheral Hierarchy)• Performance Iron Triangle
![Page 7: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/7.jpg)
cpsc614Lec 1.7
Example Hot Developments ca. 2006
• Manipulating the instruction set abstraction– itanium: translate ISA64 -> micro-op sequences– transmeta: continuous dynamic translation of IA32– tinsilica: synthesize the ISA from the application– reconfigurable HW
• Virtualization– vmware: emulate full virtual machine– JIT: compile to abstract virtual machine, dynamically
compile to host• Parallelism
– wide issue, dynamic instruction scheduling, EPIC– multithreading (SMT)– chip multiprocessors
• Communication– network processors, network interfaces
• Exotic explorations– nanotechnology, quantum computing
High performance + Energy Efficiency + Reliability !!
![Page 8: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/8.jpg)
cpsc614Lec 1.8
Forces on Computer Architecture
ComputerArchitecture
Technology ProgrammingLanguages
OperatingSystems
History
Applications
(A = F / M)
![Page 9: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/9.jpg)
cpsc614Lec 1.9
Amazing Underlying Technology Change
![Page 10: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/10.jpg)
cpsc614Lec 1.10
A take on Moore’s LawTr
ansis
tors
1,000
10,000
100,000
1,000,000
10,000,000
100,000,000
1970 1975 1980 1985 1990 1995 2000 2005
Bit-level parallelism Instruction-level Thread-level (?)
i4004
i8008i8080
i8086
i80286
i80386
R2000
Pentium
R10000
R3000
![Page 11: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/11.jpg)
cpsc614Lec 1.11
Technology Trends
• Clock Rate: ~30% per year• Transistor Density: ~35%• Chip Area: ~15%• Transistors per chip: ~55%• Total Performance Capability: ~100%• by the time you graduate...
– 3x clock rate (3-4 GHz)– 10x transistor count (1 Billion transistors)– 30x raw capability
• plus 16x dram density, 32x disk density
![Page 12: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/12.jpg)
cpsc614Lec 1.12
Per
form
ance
0.1
1
10
100
1965 1970 1975 1980 1985 1990 1995
Supercomputers
Minicomputers
Mainframes
Microprocessors
Performance Trends
![Page 13: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/13.jpg)
cpsc614Lec 1.13
Measurement and EvaluationArchitecture is an iterative process
-- searching the space of possible designs -- at all levels of computer systems
Good IdeasGood Ideas
Mediocre IdeasBad Ideas
Cost /PerformanceAnalysis
Design
Analysis
Creativity
![Page 14: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/14.jpg)
cpsc614Lec 1.14
The Instruction Set: a Critical Interface
instruction set
software
hardware
![Page 15: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/15.jpg)
cpsc614Lec 1.15
Levels of Representation
High Level Language Program
Assembly Language Program
Machine Language Program
Control Signal Specification
Compiler
Assembler
Machine Interpretation
temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
lw $15,0($2)
lw $16,4($2)
sw $16,0($2)
sw $15,4($2)
0000 1001 1100 0110 1010 1111 0101 10001010 1111 0101 1000 0000 1001 1100 0110 1100 0110 1010 1111 0101 1000 0000 1001 0101 1000 0000 1001 1100 0110 1010 1111
°°
ALUOP[0:3] <= InstReg[9:11] & MASK
![Page 16: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/16.jpg)
cpsc614Lec 1.16
Instruction Set Architecture... the attributes of a [computing] system as
seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls the logic design, and the physical implementation. – Amdahl, Blaaw, and Brooks, 1964
SOFTWARESOFTWARE-- Organization of Programmable Storage
-- Data Types & Data Structures: Encodings & Representations
-- Instruction Formats
-- Instruction (or Operation Code) Set
-- Modes of Addressing and Accessing Data Items and Instructions
-- Exceptional Conditions
![Page 17: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/17.jpg)
cpsc614Lec 1.17
Organization
Logic Designer's View
ISA Level
FUs & Interconnect
•Capabilities & Performance Characteristics of Principal Functional Units
– (e.g., Registers, ALU, Shifters, Logic Units, ...)
•Ways in which these components are interconnected
•Information flows between components
•Logic and means by which such information flow is controlled.
•Choreography of FUs to realize the ISA
•Register Transfer Level (RTL) Description
![Page 18: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/18.jpg)
cpsc614Lec 1.18
Review: MIPS R3000 (core)0r0
r1°°°r31PClohi
Programmable storage
2^32 x bytes
31 x 32-bit GPRs (R0=0)
32 x 32-bit FP regs (paired DP)
HI, LO, PC
Data types ?
Format ?
Addressing Modes?
Arithmetic logical
Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU,
AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI
SLL, SRL, SRA, SLLV, SRLV, SRAV
Memory Access
LB, LBU, LH, LHU, LW, LWL,LWR
SB, SH, SW, SWL, SWR
Control
J, JAL, JR, JALR
BEq, BNE, BLEZ,BGTZ,BLTZ,BGEZ,BLTZAL,BGEZAL
32-bit instructions on word boundary
![Page 19: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/19.jpg)
cpsc614Lec 1.19
Types of Internal Storage
• Stack, Accumulator, A Set of Registers
![Page 20: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/20.jpg)
cpsc614Lec 1.20
Review: Basic ISA ClassesAccumulator:
1 address add A acc acc + mem[A]
1+x address addx A acc acc + mem[A + x]
Stack:
0 address add tos tos + next
General Purpose Register:
2 address add A B EA(A) EA(A) + EA(B)
3 address add A B C EA(A) EA(B) + EA(C)
Load/Store:
load Ra Rb Ra mem[Rb]
store Ra Rb mem[Rb] Ra
![Page 21: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/21.jpg)
cpsc614Lec 1.21
Instruction Formats
Variable:
Fixed:
Hybrid:
…
•Addressing modes–each operand requires addess specifier => variable format
•code size => variable length instructions
•performance => fixed length instructions–simple decoding, predictable operations
•With load/store instruction arch, only one memory address and few addressing modes
•=> simple format, address mode given by opcode
![Page 22: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/22.jpg)
cpsc614Lec 1.22
MIPS Addressing Modes & Formats• Simple addressing modes
• All instructions 32 bits wide
op rs rt rd
immed
register
Register (direct)
op rs rt
register
Base+index
+
Memory
immedop rs rtImmediate
immedop rs rt
PC
PC-relative
+
Memory
![Page 23: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/23.jpg)
cpsc614Lec 1.23
Cray-1: the original RISC
Op
015
Rd Rs1 R2
2689
Load, Store and Branch
35
Op
015
Rd Rs1 Immediate
2689 35 15 0
Register-Register
![Page 24: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/24.jpg)
cpsc614Lec 1.24
VAX-11: the canonical CISC
• Rich set of orthogonal address modes– immediate, offset, indexed, autoinc/dec,
indirect, indirect+offset– applied to any operand
• Simple and complex instructions– synchronization instructions– data structure operations (queues)– polynomial evaluation
OpCode A/M A/M A/M
Byte 0 1 n m
Variable format, 2 and 3 address instruction
![Page 25: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/25.jpg)
cpsc614Lec 1.25
RISC vs. CISC
• Pipelining• Ease of Hardware
Implementation• Simple Instructions• Simple Addressing
Mode• Fixed-Length
Formats• Large Number of
Registers• MIPS, …
• Simple Compilers• Powerful Addressing
Mode• Powerful
Instructions• Efficient Instruction
Encoding• Few Registers• VAX
![Page 26: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/26.jpg)
cpsc614Lec 1.26
Review: Load/Store Architectures
MEM reg° no memory reference per ALU instruction- 3 address GPR
° Register to register arithmetic° Load and store with simple addressing modes (reg +
immediate)° Simple conditionals
compare ops + branch zcompare&branchcondition code + branch on condition
° Simple fixed-format encoding
op
op
op
r r r
r r immed
offset
![Page 27: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/27.jpg)
cpsc614Lec 1.27
MIPS R3000 Instruction Set Architecture
° Machine Environment Target
° Instruction Categories Load/Store Computational Jump and Branch Floating Point (coprocessor)
OP Rs Rt Rd sa funct
OP Rs Rt Immediate
OP jump target
3 Instruction Formats: all 32 bits wide
R0 - R31
PCHILO
Registers
I:R:
J:
![Page 28: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/28.jpg)
cpsc614Lec 1.28
Execution Cycle
Instruction
Fetch
Instruction
Decode
Operand
Fetch
Execute
Result
Store
Next
Instruction
Obtain instruction from program storage
Determine required actions and instruction size
Locate and obtain operand data
Compute result value or status
Deposit results in storage for later use
Determine successor instruction
![Page 29: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/29.jpg)
cpsc614Lec 1.29
What’s a Clock Cycle?
• Old days: 10 levels of gates• Today: determined by numerous time-
of-flight issues + gate delays– clock propagation, wire lengths, drivers
Latchor
register
combinationallogic
![Page 30: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/30.jpg)
cpsc614Lec 1.30
Fast, Pipelined Instruction Interpretation
Instruction Register
Operand Registers
Instruction Address
Result Registers
Next Instruction
Instruction Fetch
Decode &Operand Fetch
Execute
Store Results
NIIF
DE
W
NIIF
DE
W
NIIF
DE
W
NIIF
DE
W
NIIF
DE
W
Time
Registers or Mem
![Page 31: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/31.jpg)
cpsc614Lec 1.31
Sequential Laundry
• Sequential laundry takes 6 hours for 4 loads• If they learned pipelining, how long would laundry take?
A
B
C
D
30 40 2030 40 2030 40 2030 40 20
6 PM 7 8 9 10 11 Midnight
Task
Order
Time
![Page 32: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/32.jpg)
cpsc614Lec 1.32
Pipelined LaundryStart work ASAP
• Pipelined laundry takes 3.5 hours for 4 loads
A
B
C
D
6 PM 7 8 9 10 11 Midnight
Task
Order
Time
30 40 40 40 40 20
![Page 33: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/33.jpg)
cpsc614Lec 1.33
Pipelining Lessons
• Pipelining doesn’t help latency of single task, it helps throughput of entire workload
• Pipeline rate limited by slowest pipeline stage
• Multiple tasks operating simultaneously
• Potential speedup = Number pipe stages
• Unbalanced lengths of pipe stages reduces speedup
• Time to “fill” pipeline and time to “drain” it reduces speedup
A
B
C
D
6 PM 7 8 9
Task
Order
Time
30 40 40 40 40 20
![Page 34: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/34.jpg)
Recap: A Single Cycle Datapath
° Rs, Rt, Rd and Imed16 hardwired into datapath from Fetch Unit° We have everything except control signals (underline)
• Today’s lecture will show you how to generate the control signals
32
ALUctr
Clk
busW
RegWr
32
32
busA
32
busB
55 5
Rw Ra Rb
32 32-bitRegisters
Rs
Rt
Rt
Rd
RegDst
Exten
der
Mu
x
Mux
3216imm16
ALUSrc
ExtOp
Mu
x
MemtoReg
Clk
Data InWrEn
32
Adr
DataMemory
32
MemWrA
LU
InstructionFetch Unit
Clk
Zero
Instruction<31:0>
0
1
0
1
01<
21:25>
<16:20>
<11:15>
<0:15>
Imm16RdRsRt
nPC_sel
![Page 35: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/35.jpg)
cpsc614Lec 1.35
5 Steps of MIPS DatapathFigure 3.1, Page 130, CA:AQA 2e
MemoryAccess
Write
Back
InstructionFetch
Instr. DecodeReg. Fetch
ExecuteAddr. Calc
LMD
ALU
MU
X
Mem
ory
Reg File
MU
XM
UX
Data
Mem
ory
MU
X
SignExtend
4
Ad
der Zero?
Next SEQ PC
Addre
ss
Next PC
WB Data
Inst
RD
RS1
RS2
Imm
![Page 36: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/36.jpg)
cpsc614Lec 1.36
5 Steps of MIPS DatapathFigure 3.4, Page 134 , CA:AQA 2e
MemoryAccess
Write
Back
InstructionFetch
Instr. DecodeReg. Fetch
ExecuteAddr. Calc
ALU
Mem
ory
Reg File
MU
XM
UX
Data
Mem
ory
MU
X
SignExtend
Zero?
IF/ID
ID/E
X
MEM
/WB
EX
/MEM
4
Ad
der
Next SEQ PC Next SEQ PC
RD RD RD WB
Data
Next PC
Addre
ss
RS1
RS2
Imm
MU
X
![Page 37: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/37.jpg)
cpsc614Lec 1.37
Visualizing PipeliningFigure 3.3, Page 133 , CA:AQA 2e
Instr.
Order
Time (clock cycles)
Reg
ALU
DMemIfetch Reg
Reg
ALU
DMemIfetch Reg
Reg
ALU
DMemIfetch Reg
Reg
ALU
DMemIfetch Reg
Cycle 1Cycle 2 Cycle 3Cycle 4 Cycle 6Cycle 7Cycle 5
![Page 38: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/38.jpg)
cpsc614Lec 1.38
Its Not That Easy for Computers
• Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle
– Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
– Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock)
– Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps).
![Page 39: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/39.jpg)
cpsc614Lec 1.39
Review of Performance
![Page 40: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/40.jpg)
cpsc614Lec 1.40
Which is faster?
• Time to run the task (ExTime)– Execution time, response time, latency
• Tasks per day, hour, week, sec, ns … (Performance)
– Throughput, bandwidth
Plane
Boeing 747
BAD/Sud Concorde
Speed
610 mph
1350 mph
DC to Paris
6.5 hours
3 hours
Passengers
470
132
Throughput (pmph)
286,700
178,200
![Page 41: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/41.jpg)
cpsc614Lec 1.41
Performance(X) Execution_time(Y)
n = =
Performance(Y) Execution_time(X)
Definitions
•Performance is in units of things per sec– bigger is better
•If we are primarily concerned with response time–performance(x) = 1
execution_time(x)
" X is n times faster than Y" means
![Page 42: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/42.jpg)
cpsc614Lec 1.42
Computer Performance
CPU time = Seconds = Instructions x Cycles x Seconds
Program Program Instruction Cycle
CPU time = Seconds = Instructions x Cycles x Seconds
Program Program Instruction Cycle
Inst Count CPI Clock RateProgram X
Compiler X (X)
Inst. Set. X X
Organization X X
Technology X
inst count
CPI
Cycle time
![Page 43: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/43.jpg)
cpsc614Lec 1.43
Cycles Per Instruction(Throughput)
“Instruction Frequency”
CPI = (CPU Time * Clock Rate) / Instruction Count = Cycles / Instruction Count
“Average Cycles per Instruction”
j
n
jj I CPI TimeCycle time CPU
1
Count nInstructio
I F where F CPI CPI j
j
n
jjj
1
![Page 44: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/44.jpg)
cpsc614Lec 1.44
Example: Calculating CPI bottom up
Typical Mix of instruction typesin program
Base Machine (Reg / Reg)
Op Freq Cycles CPI(i) (% Time)
ALU 50% 1 .5 (33%)
Load 20% 2 .4 (27%)
Store 10% 2 .2 (13%)
Branch 20% 2 .4 (27%)
1.5
![Page 45: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/45.jpg)
cpsc614Lec 1.45
Example: Branch Stall Impact
• Assume CPI = 1.0 ignoring branches (ideal)• Assume solution was stalling for 3 cycles• If 30% branch, Stall 3 cycles on 30%
• Op Freq Cycles CPI(i) (% Time)• Other 70% 1 .7 (37%)• Branch30% 4 1.2 (63%)
• => new CPI = 1.9• New machine is 1/1.9 = 0.52 times faster (i.e.
slow!)
![Page 46: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/46.jpg)
cpsc614Lec 1.46
Speed Up Equation for Pipelining
pipelined
dunpipeline
Time Cycle
Time Cycle
CPI pipelined
CPI dunpipeline Speedup
pipelined
dunpipeline
Time Cycle
Time Cycle
CPI stall Pipeline 1
1 Speedup
Instper cycles Stall Average CPI Ideal CPIpipelined
For simple RISC pipeline, CPI = 1:
![Page 47: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/47.jpg)
cpsc614Lec 1.47
• Ex) Suppose we enhance a machine to make all floating-point instructions run five times faster. If the execution time of some benchmark before the floating-point enhancement is 20 seconds, what will speedup be if half of the 20 seconds is spent executing floating-point instructions?
![Page 48: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/48.jpg)
cpsc614Lec 1.48
Summary
• Modern Computer Architecture is about managing and optimizing across several levels of abstraction wrt dramatically changing technology and application load
• Key Abstractions– instruction set architecture– memory– bus
• Key concepts– HW/SW boundary– Compile Time / Run Time– Pipelining– Caching
• Performance Iron Triangle relates combined effects
– Total Time = Inst. Count x CPI + Cycle Time
![Page 49: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/49.jpg)
cpsc614Lec 1.49
Instruction Pipelining
• Execute billions of instructions, so throughput is what matters
– except when?
• What is desirable in instruction sets for pipelining?
– Variable length instructions vs. all instructions same length?
– Memory operands part of any operation vs. memory operands only in loads or stores?
– Register operand many places in instruction format vs. registers located in same place?
![Page 50: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/50.jpg)
cpsc614Lec 1.50
MIPS R3000 ISA (Summary)
• Instruction Categories– Load/Store– Computational– Jump and Branch– Floating Point
» coprocessor– Memory Management– Special
R0 - R31
PCHI
LO
OP
OP
OP
rs rt rd sa funct
rs rt immediate
jump target
3 Instruction Formats: all 32 bits wide
Registers
![Page 51: Cpsc614 Lec 1.1 Based on Lectures by: Prof. David E Culler Prof. David Patterson UC Berkeley CPSC 614:Graduate Computer Architecture 2006 Spring Introduction](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d2a5503460f949fee81/html5/thumbnails/51.jpg)
cpsc614Lec 1.51
Example: MIPS (Note register location)
Op
31 26 01516202125
Rs1 Rd immediate
Op
31 26 025
Op
31 26 01516202125
Rs1 Rs2
target
Rd Opx
Register-Register
561011
Register-Immediate
Op
31 26 01516202125
Rs1 Rs2/Opx immediate
Branch
Jump / Call