Lecture 9. MIPS Processor Design – Instruction Fetch
Prof. Taeweon Suh
Computer Science Education
Korea University
2010 R&E Computer System Education & Research
Korea Univ
Introduction
Abstraction level       Examples
Physics                 electrons
Devices                 transistors, diodes
Analog Circuits         amplifiers, filters
Digital Circuits        AND gates, NOT gates
Logic                   adders, memories
Micro-architecture      datapaths, controllers
Architecture            instructions, registers
Operating Systems       device drivers
Application Software    programs

• Microarchitecture: how to implement an architecture in hardware
• Multiple implementations exist for a single architecture
  • Single-cycle
    • Each instruction executes in a single cycle
  • Multicycle
    • Each instruction is broken up into a series of shorter steps
    • We don’t cover this in this class
  • Pipeline
    • Each instruction is broken up into a series of steps
    • Multiple instructions execute simultaneously
Processor Performance
• Program execution time
Execution Time = (#instructions)(cycles/instruction)(seconds/cycle)
• The challenge in designing a microarchitecture is to satisfy constraints on cost, power, and performance
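As a worked example of the execution time equation, assume a hypothetical program of 10^9 instructions running on a single-cycle processor (1 cycle per instruction) with an assumed clock period of 1 ns. These numbers are made up for illustration and are not measurements of any real design.

```latex
\begin{align*}
\text{Execution Time}
  &= (\#\text{instructions}) \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{cycle}} \\
  &= 10^{9} \times 1 \times 10^{-9}\,\text{s} \\
  &= 1\,\text{s}
\end{align*}
```

Under the same assumptions, halving the clock period or halving the instruction count would each halve the execution time, which is why all three factors matter to the designer.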
Overview
• In chapter 4, we are going to implement (design) a MIPS CPU
  • The implemented CPU should be able to execute the machine code we have discussed so far
• For the sake of your understanding, we simplify the processor system structure

[Figure: Real PC system with CPU, North Bridge, South Bridge, and Main Memory (DDR), connected by the FSB (Front-Side Bus) and DMI (Direct Media I/F)]
[Figure: Simplified system with a MIPS CPU connected to one Memory (instruction, data) through an Address Bus and a Data Bus]
Our MIPS Model
• Our MIPS CPU model has separate connections to the instruction memory and the data memory
  • Actually, this structure is more realistic, as we will see in chapter 5

[Figure: MIPS CPU connected to Instruction Memory and to Data Memory, each through its own Address Bus and Data Bus]
Processor
• Our MIPS implementation is simplified by implementing only
  • Memory-reference instructions: lw, sw
  • Arithmetic-logical instructions: add, sub, and, or, slt
  • Control flow instructions: beq, j
• Generic implementation steps
  • Fetch: use the program counter (PC) to supply the instruction address and fetch the instruction from memory (and update the PC)
  • Decoding: decode the instruction (and read registers)
  • Execution: execute the instruction

[Figure: MIPS CPU with Fetch (PC = PC + 4), Decode, and Execute blocks, connected to Instruction Memory and Data Memory, each through an Address Bus and a Data Bus]
Instruction Execution in CPU
• Fetch
  • Fetch the instruction by accessing memory with the PC
• Decoding
  • Extract opcode: determine what operation should be done
  • Extract operands: register numbers or immediate from the fetched instruction
    • Read registers from the register file
• Execution
  • Use the ALU to calculate (depending on instruction class)
    • Arithmetic result
    • Memory address for load/store
    • Branch target address
  • Access data memory for load/store
• Next Fetch
  • PC = target address or PC + 4

[Figure: MIPS CPU with Fetch (PC = PC + 4), Decode, and Execute blocks, connected to Instruction Memory and Data Memory, each through an Address Bus and a Data Bus]
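The "next fetch" step chooses between a target address and PC + 4. A minimal sketch of that choice in Verilog might look like the following. The module name next_pc_sel and the signals target and use_target are hypothetical names introduced only for this illustration, and how use_target would be generated (for example from a beq comparison or a j instruction) is left out.

```verilog
// Sketch only: next_pc_sel, target, and use_target are hypothetical names.
module next_pc_sel (input  [31:0] pc,          // current PC
                    input  [31:0] target,      // branch or jump target address
                    input         use_target,  // 1: fetch from target, 0: fall through
                    output [31:0] pcnext);     // address for the next fetch

  wire [31:0] pcplus4;

  assign pcplus4 = pc + 32'd4;                     // next sequential instruction
  assign pcnext  = use_target ? target : pcplus4;  // 2-to-1 multiplexer

endmodule
```

With use_target tied to 0, this reduces to the PC + 4 behavior used in the instruction fetch design of this lecture.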
Revisiting Logic Design Basics
• Combinational logic
  • Output is directly determined by input
• Sequential logic
  • Output is determined not only by input, but also by internal state
  • Sequential logic needs state elements to store information
    • Flip-flops and latches are used to store the state information
    • But avoid using latches in digital design
Combinational Logic Examples
• AND gate: Y = A & B
• Multiplexer: Y = S ? I1 : I0
• Adder: Y = A + B
• Arithmetic Logic Unit (ALU): Y = F(A, B)
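The four combinational examples can be sketched in Verilog roughly as follows. The module names, the 32-bit widths, and especially the 2-bit encoding of the ALU function input f are assumptions made for this illustration. They are not the encoding used by any particular MIPS design.

```verilog
// Sketch only: module names, widths, and the encoding of f are assumptions.
module and_gate (input a, b, output y);
  assign y = a & b;                    // Y = A & B
endmodule

module mux2 (input [31:0] i0, i1, input s, output [31:0] y);
  assign y = s ? i1 : i0;              // Y = S ? I1 : I0
endmodule

module adder32 (input [31:0] a, b, output [31:0] y);
  assign y = a + b;                    // Y = A + B
endmodule

module alu_sketch (input      [31:0] a, b,
                   input      [1:0]  f,   // hypothetical function select
                   output reg [31:0] y);
  // Y = F(A, B): every value of f is covered, so no latch is inferred
  always @(*)
    case (f)
      2'b00: y = a & b;
      2'b01: y = a | b;
      2'b10: y = a + b;
      2'b11: y = a - b;
    endcase
endmodule
```

In each module the output depends only on the current inputs, which is what makes the logic combinational.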
State Element (Register)
• Register (flip-flop): stores data in a circuit
  • The clock signal determines when to update the stored value
    • Edge-triggered
    • Rising-edge triggered: update when the clock changes from 0 to 1
    • Falling-edge triggered: update when the clock changes from 1 to 0
  • The data input determines what value (0 or 1) the output is updated to

[Figure: Flip-flop (register) symbol with inputs D and Clk and output Q, and a timing diagram of Clk, D, and Q]
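A rising-edge triggered register can be sketched in Verilog as below. The module name dff32 and the 32-bit width are assumptions chosen for this illustration.

```verilog
// Sketch only: dff32 and its 32-bit width are assumptions.
module dff32 (input             clk,
              input      [31:0] d,
              output reg [31:0] q);

  // q takes the value of d only when clk changes from 0 to 1
  always @(posedge clk)
    q <= d;

endmodule
```

Replacing posedge with negedge would give the falling-edge triggered version described above.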
State Element (Register)
• Register with write control
  • Only updates on a clock edge when the write control input is 1

[Figure: Register symbol with inputs D, Clk, and Write and output Q, and a timing diagram of Clk, Write, D, and Q]
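A register with write control can be sketched by adding an enable condition inside the clocked block. The module name reg_we, the port name we, and the 32-bit width are assumptions for this illustration.

```verilog
// Sketch only: reg_we, we, and the 32-bit width are assumptions.
module reg_we (input             clk,
               input             we,   // write control input
               input      [31:0] d,
               output reg [31:0] q);

  // On a rising clock edge, update q only if the write control is 1.
  // Otherwise q keeps its stored value.
  always @(posedge clk)
    if (we) q <= d;

endmodule
```

Because the block is clocked, leaving out the else branch describes a flip-flop that holds its value, not a latch.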
Clocking Methodology
• Virtually all digital systems are essentially synchronous to the clock
• Combinational logic sits between state elements (registers)
• Combinational logic transforms data during clock cycles
  • Between clock edges
  • Input from state elements
  • Output to the next state elements
  • Longest delay determines clock period (frequency)
Building a Datapath
• A processor is composed of datapath and control
  • Datapath
    • Elements that process data and addresses in the CPU: registers, ALUs, muxes, memories, …
  • Control
    • Logic that controls operations
      • When to write to a register
      • What kind of operation the ALU should do (addition, subtraction, exclusive OR, and so on)
• We will build a MIPS datapath incrementally and provide Verilog code
  • We adopt both structural and behavioral modeling
    • Behavioral modeling describes what a module does
      • For example, the lowest modules (such as ALU and register files) will be designed with behavioral modeling
    • Structural modeling describes a module from simpler modules via instantiations
      • For example, the top module (such as MIPS_CPU) will be designed with structural modeling
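To make the two modeling styles concrete, here is a small sketch that is unrelated to the MIPS modules themselves. The module names mux2_beh and mux4_str and the 8-bit width are assumptions chosen only for this illustration.

```verilog
// Sketch only: mux2_beh, mux4_str, and the 8-bit width are assumptions.

// Behavioral modeling: describes what the module does
module mux2_beh (input [7:0] i0, i1, input s, output [7:0] y);
  assign y = s ? i1 : i0;
endmodule

// Structural modeling: builds a 4-to-1 mux by instantiating simpler modules
module mux4_str (input  [7:0] i0, i1, i2, i3,
                 input  [1:0] s,
                 output [7:0] y);

  wire [7:0] lo, hi;

  mux2_beh lomux  (i0, i1, s[0], lo);   // selects between i0 and i1
  mux2_beh himux  (i2, i3, s[0], hi);   // selects between i2 and i3
  mux2_beh outmux (lo, hi, s[1], y);    // selects between the two results

endmodule
```

The structural module contains no logic expressions of its own. It only wires instances together, which is the style the top-level CPU module follows.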
Overview of CPU Design
[Figure: Design files and how they nest]
mips_tb.v (testbench): drives clock and reset
  mips_cpu_mem.v
    mips_cpu.v (MIPS CPU): fetch, pc; decoding; register file; ALU; memory access
    imem.v (Instruction Memory): holds the binary (machine code); signals Address, Instruction
    dmem.v (Data Memory): holds the data in your program, stack, heap; signals Address, DataOut, DataIn
Instruction Fetch
[Figure: Instruction fetch datapath inside the MIPS CPU. The PC is a 32-bit register (flip-flops) with clock and reset inputs. Its 32-bit output goes to the Address input of the Instruction Memory, whose Out port delivers the instruction. An Add block increments the PC by 4 for the next instruction.]

• What is the PC on reset?
  • MIPS initializes the PC to 0xBFC0_0000
  • For the sake of simplicity, let’s initialize the PC to 0x0000_0000 in our design
• How about x86 and ARM?
  • The x86 reset vector is 0xFFFF_FFF0; the BIOS ROM is located there
  • The ARM reset vector is 0x0000_0000
Instruction Fetch Verilog Model
`include "delay.v"

module pc (input             clk, reset,
           output reg [31:0] pc,
           input      [31:0] pcnext);

  // 32-bit PC register with asynchronous reset to address 0
  always @(posedge clk, posedge reset)
  begin
    if (reset) pc <= #`mydelay 32'h00000000;
    else       pc <= #`mydelay pcnext;
  end

endmodule

`include "delay.v"

module adder(input [31:0] a, b,
             output [31:0] y);

  assign #`mydelay y = a + b;

endmodule

`include "delay.v"

module mips_cpu(input         clk, reset,
                output [31:0] pc,
                input  [31:0] instr);

  wire [31:0] pcnext;

  // instantiate pc and adder modules
  pc    pcreg  (clk, reset, pc, pcnext);
  adder pcadd4 (pc, 32'b100, pcnext);   // pcnext = pc + 4

endmodule
Memory
• As studied in Computer Logic Design, memory is classified into RAM (Random Access Memory) and ROM (Read-Only Memory)
  • RAM is classified into DRAM (Dynamic RAM) and SRAM (Static RAM)
  • DDR is a DRAM
    • DDR is short for DDR (Double Data Rate) SDRAM (Synchronous DRAM)
  • DDR is used as main memory in modern computers
• We use a simple Verilog memory model that stores your program, since our focus is on how the CPU works
Instruction Memory Verilog Model
module imem(input  [6:0]  a,
            output [31:0] rd);

  reg [31:0] RAM[127:0];   // 128 words of 32 bits each

  // load the compiled binary file into the memory
  initial
  begin
    $readmemh("memfile.dat", RAM);
  end

  assign #1 rd = RAM[a];   // word aligned

endmodule

[Figure: Instruction Memory holding the compiled binary file: 128 words, each word 32 bits wide, with a 7-bit address input a[6:0] and a 32-bit output rd[31:0]. Data comes out from the address a.]

memfile.dat:
20020005
2003000c
2067fff7
00e22025
00642824
00a42820
10a7000a
0064202a
10800001
20050000
00e2202a
00853820
00e23822
ac670044
8c020050
08000011
20020001
ac020054

• Depending on your needs, you can increase or decrease the memory size
  • Examples
    • For 1KB word-addressable memory: reg [31:0] RAM[255:0]
    • For 16KB byte-addressable memory: reg [7:0] RAM[16*1024-1:0]
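As a rough sketch of the 16KB byte-addressable example, a read port could assemble one 32-bit word from four consecutive bytes. The module name imem_byte, the 14-bit address width, the forced word alignment, and the big-endian byte order are all assumptions made for this illustration. Loading the memory (for example with $readmemh) is left out.

```verilog
// Sketch only: imem_byte, the 14-bit address, the forced alignment,
// and the big-endian byte order are assumptions.
module imem_byte (input  [13:0] a,     // byte address into the 16KB memory
                  output [31:0] rd);   // 32-bit word read from the memory

  reg [7:0] RAM[16*1024-1:0];          // 16K entries of 1 byte each

  wire [13:0] base;

  assign base = {a[13:2], 2'b00};      // ignore the two low bits (word aligned)

  // Assemble one word from four bytes, most significant byte first
  assign rd = {RAM[base], RAM[base + 14'd1], RAM[base + 14'd2], RAM[base + 14'd3]};

endmodule
```

With this organization the full byte address can be connected directly, whereas the word-addressable model expects the address with its two low bits already dropped.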
MIPS CPU with imem and Testbench
module mips_cpu_mem(input clk, reset);

  wire [31:0] pc, instr;

  // instantiate processor and memories
  mips_cpu imips_cpu  (clk, reset, pc, instr);
  imem     imips_imem (pc[8:2], instr);   // 7-bit word address selects one of 128 words

endmodule

module mips_tb();

  reg clk;
  reg reset;

  // instantiate device to be tested
  mips_cpu_mem imips_cpu_mem(clk, reset);

  // initialize test
  initial
  begin
    reset <= 1;
    # 32;
    reset <= 0;
  end

  // generate clock to sequence tests
  initial
  begin
    clk <= 0;
    forever #10 clk <= ~clk;
  end

endmodule