1
ECE243
CPU
2
IMPLEMENTING A SIMPLE CPU
• How are machine instructions implemented?
• What components are there?
• How are they connected and controlled?
3
MINI ISA:
• every instruction is 1 byte wide
– data and address values are also 1 byte wide
• address space
– byte addressable (every byte has an address)
– 8 addr bits => 256 byte locations
• 4 registers:
– r0..r3
• PC (resets to $80)
• Condition codes:
– Z (zero), N (negative)
– these are used by branches
4
Some Definitions:
• IMM3: a 3-bit signed immediate, 2 parts:
– 1 sign bit: sign(IMM3)
– 2-bit value: value(IMM3)
• IMM4: a 4-bit signed immediate
• IMM5: a 5-bit unsigned immediate
• OpA, OpB: register variables
– each represents one of r0..r3
• SE8(X):
– means sign-extend value X to 8 bits
• NOTE: ALL INSTS DO THIS LAST:
– PC = PC + 1
5
Mini ISA Instructions
load OpA (OpB):
  OpA = mem[OpB]
  PC = PC + 1
store OpA (OpB):
  mem[OpB] = OpA
  PC = PC + 1
add OpA OpB:
  OpA = OpA + OpB
  IF (OpA == 0) Z = 1 ELSE Z = 0
  IF (OpA < 0) N = 1 ELSE N = 0
  PC = PC + 1
sub OpA OpB:
  OpA = OpA - OpB
  IF (OpA == 0) Z = 1 ELSE Z = 0
  IF (OpA < 0) N = 1 ELSE N = 0
  PC = PC + 1
6
Mini ISA Instructions
nand OpA OpB:
  OpA = OpA bitwise-NAND OpB
  IF (OpA == 0) Z = 1 ELSE Z = 0
  IF (OpA < 0) N = 1 ELSE N = 0
  PC = PC + 1
ori IMM5:
  r1 = r1 bitwise-OR IMM5
  IF (r1 == 0) Z = 1 ELSE Z = 0
  IF (r1 < 0) N = 1 ELSE N = 0
  PC = PC + 1
shift OpA IMM3:
  IF (sign(IMM3)) OpA = OpA << value(IMM3)
  ELSE OpA = OpA >> value(IMM3)
  IF (OpA == 0) Z = 1 ELSE Z = 0
  IF (OpA < 0) N = 1 ELSE N = 0
  PC = PC + 1
7
Mini ISA Instructions
bz IMM4:
  IF (Z == 1) PC = PC + SE8(IMM4)
  PC = PC + 1
bnz IMM4:
  IF (Z == 0) PC = PC + SE8(IMM4)
  PC = PC + 1
bpz IMM4:
  IF (N == 0) PC = PC + SE8(IMM4)
  PC = PC + 1
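The PC update shared by bz, bnz and bpz can be sketched as follows. This is only an illustration: it assumes IMM4 arrives as a raw 4-bit field and that the PC wraps around at 8 bits, and the names `se8_imm4` and `next_pc` are made up for this example.

```python
def se8_imm4(imm4):
    """Sign-extend a raw 4-bit field to 8 bits (result in 0..255)."""
    imm4 &= 0xF
    return (imm4 | 0xF0) if (imm4 & 0x8) else imm4


def next_pc(pc, imm4, taken):
    """Apply the optional branch offset, then the PC = PC + 1 every inst does."""
    if taken:
        pc = (pc + se8_imm4(imm4)) & 0xFF  # assumption: 8-bit wrap-around
    return (pc + 1) & 0xFF


# A taken branch with IMM4 = 1110 (-2) at PC = $80 lands on $7F,
# i.e. the target is relative to PC + 1.
print(hex(next_pc(0x80, 0b1110, True)))
```

Because the increment always happens last, a taken branch effectively jumps to PC + 1 + SE8(IMM4).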
8
ENCODINGS: Inst(opcode)
• Load(0000), store(0010), add(0100), sub(0110), nand(1000):
– bits 7-6: OpA, bits 5-4: OpB, bits 3-0: opcode
• Ori:
– bits 7-3: IMM5, bits 2-0: 111
9
ENCODINGS: Inst(opcode)
• Shift:
– bits 7-6: OpA, bits 5-3: IMM3, bits 2-0: 011
• BZ(0101), BNZ(1001), BPZ(1101):
– bits 7-4: IMM4, bits 3-0: opcode
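A decoder for these encodings might look like the sketch below. The field positions are an assumption pieced together from the per-instruction slides and the control table (OpA in bits 7-6, OpB in bits 5-4, IMM3 in bits 5-3, IMM5 in bits 7-3, IMM4 in bits 7-4); the function name `decode` and the tuple layout are invented for this example.

```python
def decode(inst):
    """Split one instruction byte into (mnemonic, operand fields...)."""
    inst &= 0xFF
    low3 = inst & 0b111
    if low3 == 0b111:                       # ori: IMM5 in bits 7-3
        return ("ori", inst >> 3)
    if low3 == 0b011:                       # shift: OpA in 7-6, IMM3 in 5-3
        return ("shift", inst >> 6, (inst >> 3) & 0b111)

    low4 = inst & 0xF
    branches = {0b0101: "bz", 0b1001: "bnz", 0b1101: "bpz"}
    if low4 in branches:                    # branches: raw IMM4 in bits 7-4
        return (branches[low4], inst >> 4)

    reg_ops = {0b0000: "load", 0b0010: "store", 0b0100: "add",
               0b0110: "sub", 0b1000: "nand"}
    if low4 in reg_ops:                     # OpA in bits 7-6, OpB in bits 5-4
        return (reg_ops[low4], inst >> 6, (inst >> 4) & 0b11)

    raise ValueError(f"undefined opcode in {inst:#04x}")


print(decode(0b01100100))  # add r1, r2
```

Checking the 3-bit opcodes first is safe because no 4-bit opcode ends in 111 or 011.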
10
DESIGNING A CPU
• Two main components:
– datapath and control
• datapath:
– registers, functional units, muxes, wires
– must be able to perform all steps of every inst
• control:
– a finite state machine (FSM)
– commands the datapath
– performs: fetch, decode, read, execute, write, get next inst
11
ECE243
CPU: basic components
12
REGISTERS
• can always be read
• we assume falling-edge-triggered
• 'in' is stored if REGWrite=1 on the falling clock edge
• we won't normally draw the clock input
[Figure: REG8, an 8-bit register with inputs in, REGWrite and clock, and output out]
13
MUXES
• ‘select’ signal chooses which input to route to output
[Figure: 2-to-1 mux with 8-bit inputs 0 and 1, a select input, and an 8-bit output out]
14
REGISTER FILE
• Out1 is the value of the reg indexed by OpA
• Out2 is the value of the reg indexed by OpB
• if REGWrite is 1 when the clock goes low
– then the value on ‘in’ is written to the reg indexed by Rwrite
[Figure: RegFILE(r0,r1,r2,r3) with 2-bit inputs OpA, OpB and Rwrite, 8-bit input in, control inputs REGWrite and clock, and 8-bit outputs Out1 and Out2]
15
ALU (arithmetic logic unit)
• ALUop:
– add = 000
– sub = 001
– or = 010
– nand = 011
– shift = 100
• Z = nor(out7,out6,out5…out0)
• N = out bit 7 (the sign bit; 1 implies negative)
[Figure: ALU with 8-bit inputs In0 and In1, 3-bit ALUop, 8-bit output out, and flag outputs Z and N]
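A behavioural sketch of this ALU is given below. It is not the course's implementation: the function name `alu` is invented, the right shift is assumed to be logical, and for shift the low three bits of In1 are assumed to carry IMM3 (sign bit set means shift left, as in the ISA definition).

```python
ADD, SUB, OR, NAND, SHIFT = 0b000, 0b001, 0b010, 0b011, 0b100


def alu(aluop, in0, in1):
    """Return (out, Z, N) for 8-bit inputs in0 and in1."""
    in0 &= 0xFF
    in1 &= 0xFF
    if aluop == ADD:
        out = in0 + in1
    elif aluop == SUB:
        out = in0 - in1
    elif aluop == OR:
        out = in0 | in1
    elif aluop == NAND:
        out = ~(in0 & in1)
    elif aluop == SHIFT:
        amount = in1 & 0b011           # value(IMM3)
        if in1 & 0b100:                # sign(IMM3) set: shift left
            out = in0 << amount
        else:                          # assumption: logical right shift
            out = in0 >> amount
    else:
        raise ValueError("unknown ALUop")
    out &= 0xFF                        # keep 8 bits
    z = int(out == 0)                  # Z = nor of all output bits
    n = out >> 7                       # N = output bit 7
    return out, z, n


print(alu(SUB, 3, 5))  # 3 - 5 wraps to 254 with N set
```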
16
MEMORY
• our CPU has two memories for simplicity:
– instruction memory and data memory
– known as a “Harvard architecture”
17
INSTRUCTION MEM
• is read only
• Iout is set to the value indexed by the address
[Figure: INSTMEM with 8-bit input addr and 8-bit output Iout]
18
DATA MEMORY
• can read or write
– but only one in a given clock cycle
• on falling clock edge:
– if MEMWrite==1: value on Din is stored at addr
– if MEMRead==1: value at addr is output on Dout
[Figure: DATAMEM with 8-bit inputs addr and Din, 8-bit output Dout, and control inputs MEMRead, MEMWrite and clock]
19
SE8(x): SIGN-EXTEND TO 8 BITS
• assuming 4-bit input
• Recall: want:
– SE8(0100) -> 00000100
– SE8(1100) -> 11111100
• In bits i3,i2,i1,i0; out bits o7…o0
[Figure: wiring from input bits I3..I0 to output bits O7..O0]
20
ZE8(x): ZERO-EXTEND TO 8 BITS
• assuming 5-bit input
• Recall: want:
– ZE8(00100) -> 00000100
– ZE8(11100) -> 00011100
• In bits i4,i3,i2,i1,i0; out bits o7…o0
[Figure: wiring from input bits I4..I0 and a constant 0 to output bits O7..O0]
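Both extenders are pure wiring, which the sketch below mimics with lists of bits (index 0 is the least significant bit). The helper names `to_bits`, `from_bits`, `se8` and `ze8` are illustrative only.

```python
def to_bits(value, width):
    """Integer -> list of bits, least significant bit first."""
    return [(value >> k) & 1 for k in range(width)]


def from_bits(bits):
    """List of bits (LSB first) -> integer."""
    return sum(bit << k for k, bit in enumerate(bits))


def se8(i):
    """4-bit sign extend: o3..o0 = i3..i0, o7..o4 are copies of i3."""
    return i[:4] + [i[3]] * 4


def ze8(i):
    """5-bit zero extend: o4..o0 = i4..i0, o7..o5 are tied to 0."""
    return i[:5] + [0] * 3


print(format(from_bits(se8(to_bits(0b1100, 4))), "08b"))
print(format(from_bits(ze8(to_bits(0b11100, 5))), "08b"))
```

No gates are needed in either case: sign extension fans the top input bit out to the upper outputs, and zero extension ties them to ground.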
21
ECE243
CPU: Single Cycle Implementation
22
SINGLE CYCLE DATAPATH
• each instruction executes entirely
– in one cycle of the cpu clock
• registers are triggered by the falling edge
– new values begin propagating through the datapath
– some values may be temporarily incorrect
• the clock period is large enough to ensure:
– that all values are correct before the next falling edge
[Figure: clock waveform in which Inst1 and Inst2 each take 1 cycle]
23
FETCH
• needed by every instruction
– i.e., every instruction must be fetched
[Figure: 8-bit PC (with PCwrite) drives addr of INSTMEM, which outputs the 8-bit inst]
24
PC = PC + 1
[Figure: fetch datapath: 8-bit PC (with PCwrite), INSTMEM addr, 8-bit inst]
25
BRANCHES: BZ IMM4
• (if branch is taken does: PC = PC + SE8(IMM4) + 1)
• Inst: IMM4 in bits 7-4, opcode in bits 3-0
[Figure: fetch datapath with an adder computing PC + 1]
26
ADD: add OpA OpB
• Does OpA = OpA + OpB
• same datapath for sub and nand
• Inst: OpA in bits i7-i6, OpB in bits i5-i4, opcode 0100 in bits i3-i0
[Figure: datapath so far: PC, INSTMEM, PC + 1 adder, SE8(IMM4) adder, and the PCsel mux]
27
SHIFT: shift OpA IMM3
• Inst: OpA in bits i7-i6, IMM3 in bits i5-i3, opcode 011 in bits i2-i0
[Figure: datapath so far: branch logic plus REGFILE (OpA, OpB, Rw, in, REGwrite, Out1, Out2) and ALU (ALUop, N, Z)]
28
ORI: ori IMM5
• does: r1 <- r1 bitwise-or IMM5
• Inst: IMM5 in bits i7-i3, opcode 111 in bits i2-i0
[Figure: datapath so far, with the ALU2 mux selecting between Out2 and IMM3, and a ZE8 extender]
29
Store: store OpA (OpB)
• does: mem[OpB] = OpA
• Inst: OpA in bits i7-i6, OpB in bits i5-i4, opcode in bits i3-i0
[Figure: datapath so far, with the OpASel mux and the ALU2 mux (inputs 00, 01, 10, 11; includes IMM5 and IMM3 through ZE8)]
30
Load: load OpA (OpB)
• does: OpA = mem[OpB]
• Inst: OpA in bits i7-i6, OpB in bits i5-i4, opcode in bits i3-i0
[Figure: datapath so far, with DataMEM (addr, Din, MEMread, MEMwrite) added]
31
Final Datapath!
[Figure: final single-cycle datapath: PC, INSTMEM, REGFILE, ALU, DataMEM, SE8 and ZE8 extenders, and muxes controlled by PCsel, OpASel, ALU2 and RFin]
32
DESIGNING THE CONTROL UNIT
• CONTROL SIGNALS TO GENERATE:
– PCsel, PCwrite, REGwrite, MEMread, MEMwrite, OpASel, ALUop, ALU2, RFin
[Figure: CTRL block with inputs opcode, Z and N, and outputs PCsel, …]
33
Control Signals: load OpA (OpB)
INPUTS: INST = LOAD, inst bits 3-0 = 0000, N = X, Z = X
OUTPUTS (to be determined from the datapath): PCSel, PCWrite, RegWrite, MemRead, OpASel, MemWrite, ALU2, RFin, ALUop
[Figure: final single-cycle datapath]
34
Control Signals: store OpA (OpB)
INPUTS: INST = STORE, inst bits 3-0 = 0010, N = X, Z = X
OUTPUTS (to be determined from the datapath): PCSel, PCWrite, RegWrite, MemRead, OpASel, MemWrite, ALU2, RFin, ALUop
[Figure: final single-cycle datapath]
35
Control Signals: add OpA OpB
INPUTS: INST = ADD, inst bits 3-0 = 0100, N = X, Z = X
OUTPUTS (to be determined from the datapath): PCSel, PCWrite, RegWrite, MemRead, OpASel, MemWrite, ALU2, RFin, ALUop
[Figure: final single-cycle datapath]
36
Control Signals: sub OpA OpB
INPUTS: INST = SUB, inst bits 3-0 = 0110, N = X, Z = X
OUTPUTS (to be determined from the datapath): PCSel, PCWrite, RegWrite, MemRead, OpASel, MemWrite, ALU2, RFin, ALUop
[Figure: final single-cycle datapath]
37
Control Signals: nand OpA OpB
INPUTS: INST = NAND, inst bits 3-0 = 1000, N = X, Z = X
OUTPUTS (to be determined from the datapath): PCSel, PCWrite, RegWrite, MemRead, OpASel, MemWrite, ALU2, RFin, ALUop
[Figure: final single-cycle datapath]
38
Control Signals: ori IMM5
INPUTS: INST = ORI, inst bits 3-0 = X111, N = X, Z = X
OUTPUTS (to be determined from the datapath): PCSel, PCWrite, RegWrite, MemRead, OpASel, MemWrite, ALU2, RFin, ALUop
[Figure: final single-cycle datapath]
39
Control Signals: shift OpA IMM3
INPUTS: INST = SHIFT, inst bits 3-0 = X011, N = X, Z = X
OUTPUTS (to be determined from the datapath): PCSel, PCWrite, RegWrite, MemRead, OpASel, MemWrite, ALU2, RFin, ALUop
[Figure: final single-cycle datapath]
40
Control Signals: bz IMM4
INPUTS: INST = BZ, inst bits 3-0 = 0101; two cases: (N = X, Z = 0) and (N = X, Z = 1)
OUTPUTS (to be determined from the datapath): PCSel, PCWrite, RegWrite, MemRead, OpASel, MemWrite, ALU2, RFin, ALUop
[Figure: final single-cycle datapath]
41
Control Signals: bnz IMM4
INPUTS: INST = BNZ, inst bits 3-0 = 1001; two cases: (N = X, Z = 0) and (N = X, Z = 1)
OUTPUTS (to be determined from the datapath): PCSel, PCWrite, RegWrite, MemRead, OpASel, MemWrite, ALU2, RFin, ALUop
[Figure: final single-cycle datapath]
42
Control Signals: bpz IMM4
INPUTS: INST = BPZ, inst bits 3-0 = 1101; two cases: (N = 0, Z = X) and (N = 1, Z = X)
OUTPUTS (to be determined from the datapath): PCSel, PCWrite, RegWrite, MemRead, OpASel, MemWrite, ALU2, RFin, ALUop
[Figure: final single-cycle datapath]
43
All Control Signals
(INPUTS: INST, inst bits 3-0, N, Z; OUTPUTS: all remaining columns)
INST   bits 3-0  N  Z  PCSel  PCWrite  RegWrite  MemRead  OpASel  MemWrite  ALU2  RFin  ALUop
LOAD   0000      X  X  0      1        1         1        X       0         X     1     XXX
STORE  0010      X  X  0      1        0         0        0       1         X     X     XXX
ADD    0100      X  X  0      1        1         0        0       0         00    0     000
SUB    0110      X  X  0      1        1         0        0       0         00    0     001
NAND   1000      X  X  0      1        1         0        0       0         00    0     011
44
All Control Signals
(INPUTS: INST, inst bits 3-0, N, Z; OUTPUTS: all remaining columns)
INST   bits 3-0  N  Z  PCSel  PCWrite  RegWrite  MemRead  OpASel  MemWrite  ALU2  RFin  ALUop
ORI    X111      X  X  0      1        1         0        1       0         01    0     010
SHIFT  X011      X  X  0      1        1         0        0       0         10    0     100
BZ     0101      X  0  0      1        0         0        X       0         X     X     XXX
       0101      X  1  1      1        0         0        X       0         X     X     XXX
BNZ    1001      X  0  1      1        0         0        X       0         X     X     XXX
       1001      X  1  0      1        0         0        X       0         X     X     XXX
BPZ    1101      0  X  1      1        0         0        X       0         X     X     XXX
       1101      1  X  0      1        0         0        X       0         X     X     XXX
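One way to read the full table is as a lookup keyed by instruction, with PCSel as the only output that depends on N and Z. The sketch below encodes it that way; the names `ROWS`, `pc_sel` and `control` are invented for this example, and `None` stands in for the don't-care entries (X).

```python
X = None  # don't care

COLUMNS = ("PCWrite", "RegWrite", "MemRead", "OpASel",
           "MemWrite", "ALU2", "RFin", "ALUop")

ROWS = {
    "load":  (1, 1, 1, X, 0, X,    1, X),
    "store": (1, 0, 0, 0, 1, X,    X, X),
    "add":   (1, 1, 0, 0, 0, 0b00, 0, 0b000),
    "sub":   (1, 1, 0, 0, 0, 0b00, 0, 0b001),
    "nand":  (1, 1, 0, 0, 0, 0b00, 0, 0b011),
    "ori":   (1, 1, 0, 1, 0, 0b01, 0, 0b010),
    "shift": (1, 1, 0, 0, 0, 0b10, 0, 0b100),
    "bz":    (1, 0, 0, X, 0, X,    X, X),
    "bnz":   (1, 0, 0, X, 0, X,    X, X),
    "bpz":   (1, 0, 0, X, 0, X,    X, X),
}


def pc_sel(inst, n, z):
    """PCSel is 1 only when a branch is taken."""
    if inst == "bz":
        return int(z == 1)
    if inst == "bnz":
        return int(z == 0)
    if inst == "bpz":
        return int(n == 0)
    return 0


def control(inst, n, z):
    """Return all control signals for one instruction and flag values."""
    signals = dict(zip(COLUMNS, ROWS[inst]))
    signals["PCSel"] = pc_sel(inst, n, z)
    return signals


print(control("ori", n=0, z=0))
```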
45
Building Control Logic: MemRead
INST   inst bits i3-i0  N  Z  MemRead
Load   0000             X  X  1
Store  0010             X  X  0
Add    0100             X  X  0
Sub    0110             X  X  0
Nand   1000             X  X  0
Ori    X111             X  X  0
Shift  X011             X  X  0
Bz     0101             X  0  0
Bz     0101             X  1  0
Bnz    1001             X  0  0
Bnz    1001             X  1  0
BPZ    1101             0  X  0
BPZ    1101             1  X  0
46
Building Control Logic: PCSel
INST   inst bits i3-i0  N  Z  PCSel
Load   0000             X  X  0
Store  0010             X  X  0
Add    0100             X  X  0
Sub    0110             X  X  0
Nand   1000             X  X  0
Ori    X111             X  X  0
Shift  X011             X  X  0
Bz     0101             X  0  0
Bz     0101             X  1  1
Bnz    1001             X  0  1
Bnz    1001             X  1  0
BPZ    1101             0  X  1
BPZ    1101             1  X  0
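Reading the two truth tables as sum-of-products logic gives the expressions sketched below. This is one possible realization, not necessarily the minimized form intended in lecture; the function names are illustrative.

```python
def mem_read(i3, i2, i1, i0):
    """MemRead is 1 only for load (opcode 0000)."""
    return int(not (i3 or i2 or i1 or i0))


def pc_sel(i3, i2, i1, i0, n, z):
    """PCSel is 1 when a branch opcode matches and its condition holds."""
    bz = (not i3) and i2 and (not i1) and i0      # 0101
    bnz = i3 and (not i2) and (not i1) and i0     # 1001
    bpz = i3 and i2 and (not i1) and i0           # 1101
    return int(bool((bz and z) or (bnz and not z) or (bpz and not n)))


# bnz with Z = 0 is taken, so PCSel = 1
print(pc_sel(1, 0, 0, 1, n=0, z=0))
```

Each product term is an opcode match ANDed with the relevant flag, so N and Z never affect the non-branch instructions.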
47
ECE243
CPU: Multicycle Implementation
48
A Multicycle Datapath
[Figure: multicycle datapath]
49
Key Difference #1: Only 1 Memory
50
Key Difference #2: Only 1 ALU
51
Key Difference #3: Temp Regs
• what benefit are tmp regs / multicycle?
52
Key Difference #3: Temp Regs
• critical path is long => large clock period
53
Key Difference #3: Temp Regs
• smaller critical paths => shorter clock period
54
Key Difference #3: Temp Regs
• let’s examine these one at a time
55
IR: Instruction Register
• holds inst encoding
56
MDR: Memory Data Register
• holds the value returned from Memory
57
OpA and OpB
• hold values from the register file
58
ALUout
• holds the result calculated by the ALU
59
Cycle by Cycle Operation
[Figure: multicycle datapath]
60
All Insts Cycle1: Fetch and Increment PC
• fetch next inst into the IR
• increment PC
IR ← mem[PC]; PC ← PC + 1;
61
All Insts Cycle2: Decoding Inst & Reading Reg File
• Note: not all insts need OpA and OpB
OpA ← rx; OpB ← ry
62
Add, Sub, Nand Cycle3: Calculate
ALUout ← OpA op OpB
63
Add, Sub, Nand Cycle4: Write to Reg File
rx ← ALUout
64
Shift Cycle3: Calculate
ALUout ← OpA shift IMM3
65
Shift Cycle4: Write to Reg File
rx ← ALUout
66
ORI Cycle3: Read r1 from Reg File
OpA ← r1
67
ORI Cycle4: Calculate
ALUout ← OpA OR IMM5
68
ORI Cycle5: Write to Reg File
r1 ← ALUout
69
Load Cycle3: addr to Mem, value into MDR
MDR ← mem[OpB]
70
Load Cycle4: write value into reg file
rx ← MDR
71
Store Cycle3: addr to Mem, value to Mem
mem[OpB] ← OpA
72
Branches Cycle3
PC ← PC + SE8(IMM4) (only if the branch is taken)
73
Summary
Instructions           Single Cycle (Eg: 1 MHz)  Multicycle (Eg: 4 MHz)
Store, BZ, BNZ, BPZ    1 cycle                   3 cycles
Add, Sub, Nand, Load   1 cycle                   4 cycles
ORI                    1 cycle                   5 cycles
Example: total time to execute one of each instruction listed above:
Single cycle: 4*1 + 4*1 + 1*1 = 9 cycles; 9 cycles / 1 MHz = 9 us
Multicycle: 4*3 + 4*4 + 1*5 = 33 cycles; 33 cycles / 4 MHz = 8.25 us
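The arithmetic in this example can be reproduced with a few lines. The sketch uses only the nine instructions listed in the summary table (shift does not appear there); the function name `total_time_us` and the data layout are made up for illustration.

```python
def total_time_us(groups, clock_mhz):
    """groups: list of (instruction count, cycles per instruction)."""
    cycles = sum(count * cpi for count, cpi in groups)
    return cycles, cycles / clock_mhz   # cycles / MHz gives microseconds


single_cycle = [(4, 1), (4, 1), (1, 1)]   # every instruction takes 1 cycle
multicycle = [(4, 3), (4, 4), (1, 5)]     # 3-, 4- and 5-cycle instructions

print(total_time_us(single_cycle, clock_mhz=1))
print(total_time_us(multicycle, clock_mhz=4))
```

The multicycle design needs more cycles but each one is shorter, so the total comes out slightly lower for this particular mix and these example clock rates.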
74
Implementing Multicycle Control
Cycle 1 (all insts): IR = mem[PC]; PC = PC + 1
Cycle 2 (all insts): OpA = RF[rx]; OpB = RF[ry]
Cycle 3:
– Add, Sub, Nand: ALUout = OpA op OpB
– Shift: ALUout = OpA shift Imm3
– Ori: OpA = RF[1]
– Load: MDR = mem[OpB]
– Store: mem[OpB] = OpA
– Bnz, Bz, Bpz: PC = PC + SE(Imm4)
Cycle 4:
– Add, Sub, Nand, Shift: RF[rx] = ALUout
– Ori: ALUout = OpA OR Imm5
– Load: RF[rx] = MDR
– Store, Bnz, Bz, Bpz: X
Cycle 5:
– Ori: RF[1] = ALUout
– all others: X
75
Control: An FSM
• need a state transition diagram
• how many states are there?
• how many bits to represent state?
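One way to approach the two questions above is to list each instruction's per-cycle actions and count the distinct (cycle, action) pairs. The sketch below does that under the assumption that identical actions in the same cycle share a state; a real FSM could split or merge states differently, and the names used here are illustrative.

```python
FETCH = "IR = mem[PC]; PC = PC + 1"
READ = "OpA = RF[rx]; OpB = RF[ry]"

STEPS = {
    "add/sub/nand": [FETCH, READ, "ALUout = OpA op OpB", "RF[rx] = ALUout"],
    "shift": [FETCH, READ, "ALUout = OpA shift Imm3", "RF[rx] = ALUout"],
    "ori": [FETCH, READ, "OpA = RF[1]", "ALUout = OpA OR Imm5",
            "RF[1] = ALUout"],
    "load": [FETCH, READ, "MDR = mem[OpB]", "RF[rx] = MDR"],
    "store": [FETCH, READ, "mem[OpB] = OpA"],
    "bz/bnz/bpz": [FETCH, READ, "PC = PC + SE(Imm4)"],
}

# A state is identified by the cycle number and the action performed in it.
states = {(cycle, action)
          for steps in STEPS.values()
          for cycle, action in enumerate(steps, start=1)}

num_states = len(states)
state_bits = (num_states - 1).bit_length()   # bits needed to encode them

print(num_states, state_bits)
```

Under this counting there are 12 states, which fit in 4 bits.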
76
Multicycle Control as an FSM
77
Multicycle Control Hardware
[Figure: Ctrl logic takes IR:3..0, N, Z and Current_state from the State Register (4 bits), and produces Next_state plus the control signals Pcwrite, Pcsel, ALUop, …]
78
ECE243
CPU: Adding a New Instruction
79
EXAMPLE QUESTION: ADDING A NEW INSTRUCTION
• Implement a post-increment load:
• Load rx, (ry)+
Does: RF[rx] = MEM[RF[ry]]
RF[ry] = RF[ry] + 1
ry is permanently changed to be ry+1
80
Implementing: RF[rx] = MEM[RF[ry]]; RF[ry] = RF[ry] + 1
Recall: load rx, (ry)
IR = mem[PC], PC = PC + 1
OpA = RF[rx], OpB = RF[ry]
MDR = mem[OpB]
RF[rx] = MDR
81
Modifying the Datapath
• still needed: RF[ry] = RF[ry] + 1
[Figure: multicycle datapath]
82
ECE243
CPU: Pipelining
83
A Fast-Food Sandwich Shop
[Figure: one cook works five stations in order: take order, select bun, add ingredients, wrap and bag, cash and change]
84
With One Cook
[Figure: the single cook takes customer1 through all five stations: take order, select bun, add ingredients, wrap and bag, cash and change]
• one customer is serviced at a time
85
Like the single-cycle CPU
[Figure: single-cycle datapath executing Add r1, r2]
• one instruction flows through at a time
86
With Two Cooks?
[Figure: the same five stations, now shared between two cooks]
87
Pipelining
• Like an assembly line
• Doesn’t change the interface or result
– improves performance
88
Pipelining a CPU (rough idea)
[Figure: single-cycle datapath]
89
Pipelining Details:
[Figure: single-cycle datapath]
90
With Three Cooks?
[Figure: the same five stations, now shared between three cooks]
91
Pipelining a CPU (rough idea)
[Figure: single-cycle datapath]
92
Visualizing Pipelining
Stages: Fetch (inst mem), Decode (reg file), Execute (ALU and data mem)
[Table to fill in: cycles 1-4 as rows; Fetch, Decode, Execute as columns]
93
Visualizing Pipelining (again)
Stages: Fetch (inst mem), Decode (reg file), Execute (ALU and data mem)
[Table to fill in: inst1 to inst4 as rows; cycles 1-5 as columns]
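For the ideal case with no hazards, the instruction-by-cycle chart can be generated mechanically: instruction k (counting from 0) is in stage s during cycle k + s + 1. The sketch below assumes the three stages named on the slide and one instruction entering per cycle; the function name `pipeline_chart` is invented.

```python
STAGES = ["Fetch", "Decode", "Execute"]


def pipeline_chart(insts):
    """Return [(inst, [stage or '' for each cycle])] for an ideal pipeline."""
    total_cycles = len(insts) + len(STAGES) - 1
    chart = []
    for k, name in enumerate(insts):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[k + s] = stage          # list index 0 is cycle 1
        chart.append((name, row))
    return chart


for name, row in pipeline_chart(["inst1", "inst2", "inst3", "inst4"]):
    print(f"{name:6}" + "".join(f"{cell:9}" for cell in row))
```

With n instructions and 3 stages the last instruction finishes in cycle n + 2, so four instructions need six cycles in total.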
94
Fast Food Hazards
[Figure: three cooks at the five stations, serving customer3, customer2 and customer1]
What if: c1 and c2 are friends, c2 has no money, and c2 needs to know how much change c1 will get before ordering (to ensure c2 can afford his order)?
95
Fast Food Hazards
[Figure: three cooks at the five stations, serving customer2 and customer1]
96
CPU Hazards
• called a data hazard
• must be observed to ensure correct execution
• there are two solutions to data hazards
Stages: Fetch (inst mem), Decode (reg file), Execute (ALU and data mem)
97
Solution1: Stalling
Stages: Fetch (inst mem), Decode (reg file), Execute (ALU and data mem)
Example code:
add r1,r2
add r3,r1
sub r0,r2
add r2,r2
[Table to fill in: the four instructions as rows; cycles 1-5 as columns]
98
How to insert bubbles
• option1: hardware stalls the pipeline
– need extra logic to do so
– happens ‘automatically’ for any code
• option2: compiler inserts “no-ops”
– a no-op is an instruction that does nothing
– ex: add r0,r0,r0 (NIOS)
– compiler must do it right, or the results are wrong!
• example: inserting a bubble with a no-op:
add r1, r2
noop
add r3, r1
99
Solution2: Forwarding Lines
• add “forwarding” logic to pass values directly between stages
Stages: Fetch (inst mem), Decode (reg file), Execute (ALU and data mem)
Example code:
add r1,r2
add r3,r1
sub r0,r2
add r2,r2
[Table to fill in: the four instructions as rows; cycles 1-5 as columns]
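The effect of stalling versus forwarding on this code can be sketched with a tiny scheduler. The model is an assumption, not taken from the slides' diagrams: registers are read in Decode, results are written at the end of Execute, and without forwarding a consumer may only decode in a cycle after its producer's Execute cycle. The function name `decode_cycles` is illustrative.

```python
def decode_cycles(program, forwarding=False):
    """program: list of (dest_reg, source_regs). Returns each Decode cycle."""
    cycles = []
    last_write = {}      # register -> Execute cycle of its latest producer
    prev_decode = 1      # first inst is fetched in cycle 1, decoded in cycle 2
    for dest, sources in program:
        decode = prev_decode + 1
        if not forwarding:
            for reg in sources:
                if reg in last_write:
                    # stall until the producer has finished Execute
                    decode = max(decode, last_write[reg] + 1)
        cycles.append(decode)
        last_write[dest] = decode + 1    # Execute follows Decode
        prev_decode = decode
    return cycles


# add r1,r2 / add r3,r1 / sub r0,r2 / add r2,r2 (OpA is both source and dest)
program = [(1, (1, 2)), (3, (3, 1)), (0, (0, 2)), (2, (2, 2))]

print(decode_cycles(program))                    # with stalls
print(decode_cycles(program, forwarding=True))   # with forwarding
```

Under these assumptions the second add is held back by one cycle (one bubble) when stalling, and not at all when the result is forwarded.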
100
Control Hazards
Example code:
add r1,r2
bnz -2
add r3,r1
add r2,r2
[Table to fill in: the four instructions as rows; cycles as columns]
• cpu predicts each branch is not taken
• Better: predict taken
– why? loops are common, and loop branches are usually taken
• More advanced: remember what each branch did last time
• “branch predictor”:
– a table that remembers what each branch did the last time
– uses this to make a prediction next time
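A last-outcome predictor of the kind described here can be sketched as a table indexed by the branch's PC. The class and method names are invented for this example, and the default prediction for a branch that has not been seen yet is an assumption.

```python
class LastOutcomePredictor:
    """Predict that each branch does what it did the last time."""

    def __init__(self, default_taken=False):
        self.table = {}                  # branch PC -> last outcome
        self.default_taken = default_taken

    def predict(self, pc):
        return self.table.get(pc, self.default_taken)

    def update(self, pc, taken):
        self.table[pc] = taken


def count_mispredictions(predictor, pc, outcomes):
    """Run one branch through a sequence of actual outcomes."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong


# A loop branch at a hypothetical PC: taken three times, then falls through.
print(count_mispredictions(LastOutcomePredictor(), 0x85,
                           [True, True, True, False]))
```

For a loop, this scheme is wrong on the first iteration (when the default is not-taken) and again on loop exit.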
101
Some Real CPU Pipelines
Microprocessor Report 10/28/96
21264 Pipeline (Alpha)
[Figure: Alpha 21264 pipeline]
Pentium IV’s Pipeline:
TC nxt IP, TC fetch, Drv, Alloc, Rename, Que, Sch, Sch, Sch, Disp, Disp, RF, RF, Ex, Flgs, BrCk, Drv
102
ECE243
CPU: Alternate Architectures
103
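ANOTHER MULTICYCLE CPU
[Figure: bus-based CPU. CONTROL sends control signals to all components. An internal bus connects IR, PC, MDR, MAR, Regs r0..r3, Y, Z and the ALU. Other labels: a Select mux (111 … 000), the constant 1, Imm3,4,5, ALUop, and MEM with addr, Din, Dout, MEMRead, MEMWrite]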
104
SOME CONTROL SIGNALS
• PCout:
– write PC value to bus
• PCin:
– read bus value into PC
• MDRinBus:
– read value from bus into MDR
• MDRinMem:
– write value from Dout of MEM into MDR
• MDRoutBus:
– write value from MDR onto bus
105
Ex: Ctrl: Add r1, r2 # r1 = r1 + r2
[Figure: bus-based multicycle CPU]
107
CHARACTERIZATION OF ISAs
• Attribute #1:
– number of explicit operands
• Attribute #2:
– are registers general purpose?
• Attribute #3:
– Can an operand be a memory location?
• Attribute #4:
– RISC vs CISC
• Attribute #5:
– Relation between instructions and data
108
att1: num of explicit operands
• focus on calculation instructions (add, sub…)
• running example: A = B + C (C-code)
– assume A, B, C are memory locations
• 0 operands:
– eg., stack based (like first calculator CPUs)
– push and pop operations, refer to top of stack
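For the running example A = B + C, a 0-operand machine would push B, push C, add, and pop A. The toy interpreter below illustrates this; the instruction names, the tuple format, and the use of a Python dict for memory are all assumptions made for this sketch.

```python
def run_stack_machine(program, mem):
    """Execute push/pop/add on an operand stack; add names no operands."""
    stack = []
    for op, *args in program:
        if op == "push":                 # memory -> top of stack
            stack.append(mem[args[0]])
        elif op == "pop":                # top of stack -> memory
            mem[args[0]] = stack.pop()
        elif op == "add":                # operands are implicit
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown op {op!r}")
    return mem


program = [("push", "B"), ("push", "C"), ("add",), ("pop", "A")]
print(run_stack_machine(program, {"A": 0, "B": 2, "C": 3}))
```

Only push and pop name a memory location; the calculation instruction itself has zero explicit operands.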
109
att1: num of explicit operands
• 1 operand:
– eg., accumulator based
– accumulator is a reg inside cpu
– instructions use accum as destination
110
att1: num of explicit operands
• 2 operands:
– eg: 68k, ia32
111
att1: num of explicit operands
• 3 operands:
– eg: MIPS, SPARC, PowerPC
• How many operands is NIOS?
112
Att2: are regs general purpose?
• if yes:
– you can use any register for any purpose
– special registers are by convention only
• if no:
– some registers have hardwired purposes
– ex: in 68k, A7 is hardwired to be stack pointer
– used implicitly for jsr, rts, link instructions
• Are NIOS registers general purpose?
113
Att3: operand = mem location?
• with respect to calculation insts (add, sub)
• if yes:
– one operand can be in memory, the other in a register
– maybe: can also write result to memory
• if no:
– called a load/store architecture
– only load/store insts can get/put memory values to/from regs
• Can a NIOS operand be a mem location?
114
Att4: RISC vs CISC
• Are there instructions with many steps?
– a vague and debatable question
• CISC: complex instruction set computer
– Many, complex instructions
– can be hard to pipeline!
– ex: 68k, x86, PowerPC?
• RISC: reduced instruction set computer
– Fewer, simple instructions
– easy to pipeline
– ex: MIPS, Alpha, PowerPC?
• Which is NIOS?
• Quandary: x86 is a CISC
– but Pentium IV has a 20-stage pipeline!
– How’d they do it?
115
Att5: Relation bet. insts & data
• SISD: single instruction, single data
– everything we have seen so far
– an inst only writes one reg/memory location
• SIMD: single instruction, multiple data
– one instruction tells CPU to operate on an array of regs or memory locations
– ex: multimedia extensions: MMX, SSE (Intel), 3DNow! (AMD); AltiVec (PowerPC)
– ex: IBM/Sony/Toshiba Cell processor (vector processor)
• MIMD: multiple instruction, multiple data
– ex: Cluster of workstations, SMP servers, multicores, hyperthreading
• Which is NIOS?