
Electrical and Computer Engineering, University of Cyprus

22-10-2014

LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING

Previous Labs: MIPS single-cycle with Memory System


In this Lab: Pipelining

You are expected to:
- Understand the concept
- Be familiar with the 5 MIPS pipeline stages
- Understand the three pipeline hazards and their solutions

PIPELINING

Technique in which the execution of several instructions is overlapped.

- Each instruction is broken into several stages.
- Stages can operate concurrently PROVIDED WE HAVE SEPARATE RESOURCES FOR EACH STAGE! => each step uses a different functional unit.

Note: execution time for a single instruction is NOT improved. Throughput of several instructions is improved.

Major Pipeline Benefit = Performance

Ideal Performance
- Time/instruction = non-piped-time / #stages
- This is an asymptote of course, but +10% is commonly achieved

Two ways to view the performance mechanism
- Reduced CPI (i.e. non-piped to piped change)
  - Close to 1 instruction/cycle if you’re lucky
- Reduced cycle-time (i.e. increasing pipeline depth)
  - Work split into more stages
  - Simpler stages result in faster clock cycles
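The ideal figure above is an asymptote because the pipeline first has to fill. A minimal sketch of that arithmetic follows, assuming perfectly balanced stages and no hazards; the function names and the example numbers (5 stages, 10 time units unpipelined) are illustrative assumptions, not values from the lab.

```python
def pipelined_time(n_instructions, n_stages, unpiped_time):
    """Total time to run n instructions on an ideal pipeline (no hazards)."""
    cycle = unpiped_time / n_stages  # assumption: perfectly balanced stages
    # The first instruction needs n_stages cycles to leave the pipe;
    # every later instruction completes one cycle after its predecessor.
    return (n_stages + n_instructions - 1) * cycle


def speedup(n_instructions, n_stages, unpiped_time):
    """Speedup of the ideal pipeline over running instructions one at a time."""
    unpiped_total = n_instructions * unpiped_time
    return unpiped_total / pipelined_time(n_instructions, n_stages, unpiped_time)


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, round(speedup(n, n_stages=5, unpiped_time=10.0), 3))
```

With a single instruction there is no gain at all; as the instruction count grows, the speedup approaches the number of stages without ever reaching it.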

Other Pipeline Benefits

- Completely hardware mechanism
- All modern machines are pipelined
  - This was the key technique to advancing performance in the 80’s
  - In the 90’s the move was to multiple pipelines
- Beware, no benefit is totally free/good

Laundry analogy to pipelining

Implementation of a RISC (Unpipelined, Multicycle)

Implementation of an integer subset of a RISC architecture in which each instruction takes at most 5 clock cycles:
1. Instruction Fetch (IF)
2. Instruction Decode/Register Fetch (ID)
3. Execution/Effective Address Calculation (EX)
4. Memory Access (MEM)
5. Write-Back (WB)

Review: Single-cycle Datapath for MIPS

[Figure: single-cycle datapath with PC, Instruction Memory (Imem), Registers, ALU and Data Memory (Dmem), divided into Stage 1 to Stage 5: IFtch, Dcd, Exec, Mem, WB.]

Use datapath figure to represent pipeline: IM, Reg, ALU, DM, Reg

Classic 5 Stage Pipeline for a RISC Processor


Pipelined Execution Representation

- To simplify the pipeline, every instruction takes the same number of steps, called stages
- One clock cycle per stage

[Figure: pipeline diagram. Vertical axis: program flow; horizontal axis: time.]
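The pipeline diagram described above (program flow downwards, time to the right) can be reproduced as text. This is a small sketch under the assumptions that every instruction passes through all five stages and that no hazards occur; the helper name `pipeline_diagram` and the sample instruction list are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instructions):
    """Return one text row per instruction; column c is clock cycle c + 1."""
    n_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(instructions):
        cells = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage  # instruction i enters IF in cycle i + 1
        body = "".join(f"{c:<4}" for c in cells).rstrip()
        rows.append(f"{name:<4}" + body)
    return rows


if __name__ == "__main__":
    for row in pipeline_diagram(["lw", "sw", "add", "sub"]):
        print(row)
```

Each row is shifted one cycle to the right of the row above it, so in any given column (clock cycle) no two instructions occupy the same stage.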

Important Pipeline Characteristics

Latency
- Time it takes an instruction to go through the pipe
- Latency = # stages * stage-delay
- Dominant feature if there are a lot of exceptions…

Throughput
- Determined by the rate at which instructions can start/finish
- Dominant feature if there are few exceptions

Pipelining Lessons

- Pipelining doesn’t help latency (execution time) of a single task; it helps throughput of the entire workload
- Multiple tasks operating simultaneously using different resources
- Potential speedup = Number of pipe stages
- Time to “fill” the pipeline and time to “drain” it reduce speedup
- Pipeline rate is limited by the slowest pipeline stage
- Unbalanced lengths of pipe stages also reduce speedup
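The effect of unbalanced stages can be checked with a few lines of arithmetic. The sketch below assumes the unpipelined design takes the sum of the stage delays per instruction and that the pipelined clock period equals the slowest stage; the delays used (in picoseconds) are made-up example values, not measurements from the lab.

```python
def clock_period(stage_delays):
    """The pipelined clock must be long enough for the slowest stage."""
    return max(stage_delays)


def steady_state_speedup(stage_delays):
    """Speedup once the pipeline is full (fill and drain time ignored)."""
    unpiped_time = sum(stage_delays)  # one instruction, all stages in sequence
    return unpiped_time / clock_period(stage_delays)


if __name__ == "__main__":
    balanced = [200, 200, 200, 200, 200]    # hypothetical delays in ps
    unbalanced = [200, 100, 200, 200, 100]  # hypothetical delays in ps
    print(steady_state_speedup(balanced))
    print(steady_state_speedup(unbalanced))
```

Only the balanced case reaches a speedup equal to the number of stages; with unbalanced stages the faster stages sit idle for part of every cycle.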

Single Cycle Datapath

[Figure: single-cycle datapath. Components: PC, Imem, Regs, ALU, Dmem, sign extend, two adders, shift-left-2 and muxes. Control signals: RegDst, RegWrite, ALUSrc, ALUOp, ALU control, MemRead, MemWrite, MemtoReg, PCSrc. Instruction fields: 25:21, 20:16, 15:11, 15:0.]

Required Changes to Datapath

- Introduce registers to separate the 5 stages by putting IF/ID, ID/EX, EX/MEM, and MEM/WB registers in the datapath.
- Next PC value is computed in the 3rd step, but we need to bring in the next instruction in the next cycle.
- Branch address is computed in the 3rd stage. With the pipeline, the PC value has changed! We must carry the PC value along with the instruction.

Changes to Datapath (cont.)

- For the lw instruction, we need the write register address at stage 5. But the IR is now occupied by another instruction! So we must carry the IR destination field along as the instruction moves through the stages. See the connection in the figure.
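The idea of carrying the PC value and the destination field along with the instruction can be sketched with one small record per pipeline register. This is a heavily simplified illustration: the class and field names are assumptions, only the fields mentioned above are modelled, and `decode` handles only the lw case, where the destination is the rt field (bits 20:16).

```python
from dataclasses import dataclass


@dataclass
class IFID:
    pc_plus_4: int    # carried so the EX stage can compute the branch target
    instruction: int  # 32-bit instruction word


@dataclass
class IDEX:
    pc_plus_4: int
    imm: int          # sign-extended 16-bit immediate
    write_reg: int    # destination field, carried towards WB


@dataclass
class EXMEM:
    branch_target: int
    write_reg: int


@dataclass
class MEMWB:
    write_reg: int


def decode(r: IFID) -> IDEX:
    imm = r.instruction & 0xFFFF
    if imm & 0x8000:  # sign-extend the 16-bit immediate
        imm -= 0x10000
    rt = (r.instruction >> 16) & 0x1F  # lw writes rt (bits 20:16)
    return IDEX(r.pc_plus_4, imm, rt)


def execute(r: IDEX) -> EXMEM:
    # Branch target uses the PC value that travelled with the instruction.
    return EXMEM(r.pc_plus_4 + (r.imm << 2), r.write_reg)


def memory(r: EXMEM) -> MEMWB:
    return MEMWB(r.write_reg)


if __name__ == "__main__":
    lw_t0_4_sp = 0x8FA80004  # lw $t0, 4($sp)
    wb = memory(execute(decode(IFID(0x104, lw_t0_4_sp))))
    print(wb.write_reg)
```

By the time the record reaches MEM/WB, the IF/ID register already holds a later instruction, yet the write register number is still available because it moved along with its own instruction.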

Pipelined Datapath (with Pipeline Regs)

[Figure: pipelined datapath with Imem, Regs, ALU and Dmem separated by pipeline registers across the Fetch, Decode, Execute, Memory and Write Back stages.]

Pipeline register widths:
- IF/ID: 64 bits
- ID/EX: 133 bits
- EX/MEM: 102 bits
- MEM/WB: 69 bits
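The register widths quoted on the slide can be cross-checked by adding up the fields each register has to hold. The field breakdown below is an assumption: it is one plausible split that reproduces the quoted widths (32-bit values plus a 5-bit write register number, with control bits not counted), not a list taken from the lab material.

```python
# Hypothetical field breakdown; widths are in bits.
PIPELINE_REG_FIELDS = {
    "IF/ID": {"pc_plus_4": 32, "instruction": 32},
    "ID/EX": {"pc_plus_4": 32, "read_data_1": 32, "read_data_2": 32,
              "sign_ext_imm": 32, "write_reg": 5},
    "EX/MEM": {"branch_target": 32, "zero": 1, "alu_result": 32,
               "write_data": 32, "write_reg": 5},
    "MEM/WB": {"mem_read_data": 32, "alu_result": 32, "write_reg": 5},
}

# Total width of each pipeline register is the sum of its field widths.
widths = {name: sum(fields.values())
          for name, fields in PIPELINE_REG_FIELDS.items()}

if __name__ == "__main__":
    for name, bits in widths.items():
        print(f"{name}: {bits} bits")
```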

Pipelined Datapath (with Control Signals)

[Figure: pipelined datapath with the IF/ID, ID/EX, EX/MEM and MEM/WB registers and the control signals RegDst, ALUSrc, ALUOp, ALU control, Branch, MemRead, MemWrite, RegWrite, MemtoReg and PCSrc. Instruction fields shown: rs, rt, rd, immed.]

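Control for Pipelined Datapath

Basic approach: build on single-cycle control
- Place control unit in ID stage
- Pass control signals to following stages

Control signals grouped by the stage that uses them:
- EX: RegDst, ALUOp[1:0], ALUSrc
- M: MemRead, MemWrite, Branch
- WB: RegWrite, MemtoReg

[Figure: the Control unit writes the EX, M and WB groups into ID/EX; M and WB are passed on to EX/MEM; WB is passed on to MEM/WB.]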

Control for Pipelined Datapath

              Execution/Address Calculation     Memory access               Write-back
              stage control lines               stage control lines         stage control lines
Instruction   RegDst  ALUOp1  ALUOp0  ALUSrc    Branch  MemRead  MemWrite   RegWrite  MemtoReg
R-format      1       1       0       0         0       0        0          1         0
lw            0       0       0       1         0       1        0          1         1
sw            X       0       0       1         0       0        1          0         X
beq           X       0       1       0         1       0        0          0         X

[Figure: the Control unit writes the EX, M and WB groups into ID/EX; M and WB are passed on to EX/MEM; WB is passed on to MEM/WB.]
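The table above can be written down as a lookup structure, with the three groups handed from one pipeline register to the next. This is a sketch only: the dictionary layout and the helper name `control_in_pipeline_regs` are illustrative choices, and don't-care entries (X) are represented as `None`.

```python
X = None  # don't-care entry from the table

CONTROL = {
    "R-format": {
        "EX": {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
        "M": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 0},
    },
    "lw": {
        "EX": {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "M": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 1},
    },
    "sw": {
        "EX": {"RegDst": X, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "M": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
        "WB": {"RegWrite": 0, "MemtoReg": X},
    },
    "beq": {
        "EX": {"RegDst": X, "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
        "M": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 0, "MemtoReg": X},
    },
}


def control_in_pipeline_regs(instr):
    """Show which control groups each pipeline register holds for instr."""
    c = CONTROL[instr]
    return {
        "ID/EX": {"EX": c["EX"], "M": c["M"], "WB": c["WB"]},
        "EX/MEM": {"M": c["M"], "WB": c["WB"]},  # EX bits already consumed
        "MEM/WB": {"WB": c["WB"]},               # M bits already consumed
    }


if __name__ == "__main__":
    for reg, groups in control_in_pipeline_regs("lw").items():
        print(reg, groups)
```

Each group is dropped once its stage has used it, which is why the later pipeline registers need fewer control bits.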

Datapath and Control Unit

[Figure: pipelined datapath combined with the control unit; the control bits travel through the pipeline registers alongside the instruction.]

Tracking Control Signals - Cycle 1

[Figure: datapath and control unit. Instruction in the pipeline: LW.]


Tracking Control Signals - Cycle 2

[Figure: datapath and control unit. Instructions in the pipeline: SW, LW.]


Tracking Control Signals - Cycle 3

[Figure: datapath and control unit. Instructions in the pipeline: ADD, SW, LW.]


Tracking Control Signals - Cycle 4

[Figure: datapath and control unit. Instructions in the pipeline: SUB, ADD, SW, LW.]

Tracking Control Signals - Cycle 5

[Figure: datapath and control unit. Instruction labels shown: ADD, SUB, SW, LW.]

Can pipelining get us into trouble?

Yes: Pipeline Hazards
- Hazards occur because data required for executing the current instruction may not be available.
- Structural hazards: attempt to use the same resource in two different ways at the same time
- Control hazards: attempt to make a decision before the condition is evaluated
  - branch instructions
  - interrupts

Pipelining troubles (cont.)

- Data hazards occur when an instruction needs register contents for an arithmetic/logical/memory instruction before they are ready.
  - instruction depends on the result of a prior instruction still in the pipeline
- Can always resolve hazards by waiting:
  - pipeline control must detect the hazard
  - take action (or delay action) to resolve hazards

Hazard detection & Forwarding Units


Forwarding: A solution to Data Hazards
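The slide gives only the title, so the following is a sketch of the conventional textbook forwarding check rather than this lab's exact design. For one ALU source register it decides whether the operand should come from the register file, from the EX/MEM register or from the MEM/WB register. The function name, parameter names and the 2-bit select encoding are assumptions.

```python
def forward_select(src_reg, exmem_regwrite, exmem_rd, memwb_regwrite, memwb_rd):
    """Return the ALU-input mux select for one source register.

    0b00: value read from the register file (no forwarding)
    0b10: forward the ALU result held in EX/MEM
    0b01: forward the value held in MEM/WB
    """
    # The most recent producer wins, so EX/MEM is checked first.
    if exmem_regwrite and exmem_rd != 0 and exmem_rd == src_reg:
        return 0b10
    if memwb_regwrite and memwb_rd != 0 and memwb_rd == src_reg:
        return 0b01
    return 0b00


if __name__ == "__main__":
    # add $2, $1, $3 followed directly by sub $4, $2, $5:
    # when sub is in EX, add's result sits in EX/MEM with rd = 2.
    print(forward_select(2, True, 2, False, 0))
```

Register 0 is excluded because in MIPS it always reads as zero, so a result "written" to it must never be forwarded.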


Stalls

Forwarding will not always solve the problems of data hazards.

For example, suppose an add instruction follows a load word (lw), and the add uses the register that receives the memory data.

In this case, forwarding alone will not work.

The reason is that the data must be read from memory, so it is not available until the end of the MEM cycle. The required data is therefore not yet available to forward, and the add instruction, if it proceeds, will process the wrong data.

A solution to this problem is the stall.

A stall halts the instruction awaiting data while the key instruction (a lw in this case) proceeds to the end of the MEM cycle, after which the desired data is available to the add.
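The lw-followed-by-add case described above is usually detected while the dependent instruction is still in ID. The sketch below uses the conventional textbook load-use condition as an assumption about how the hazard detection unit works; the function and parameter names are illustrative.

```python
def must_stall(idex_memread, idex_rt, ifid_rs, ifid_rt):
    """True if the instruction in ID must wait one cycle for a load in EX.

    idex_memread: the instruction in EX is a load (MemRead asserted)
    idex_rt:      register the load will write
    ifid_rs/rt:   source register numbers of the instruction in ID
    """
    return bool(idex_memread and idex_rt in (ifid_rs, ifid_rt))


if __name__ == "__main__":
    # lw $2, 20($1) followed directly by add $4, $2, $5
    print(must_stall(True, 2, 2, 5))
```

When the check fires, the instruction in ID is held for a cycle and a bubble is sent down the pipe, so the loaded value is ready by the time the add needs it.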

Other Problems With Branches

- A remaining problem is what to do about instructions following a branch.
- Even assuming forwarding and stalls, the branch/no branch decision is not made until the third stage.
- This means that in the MIPS pipeline, two following instructions will enter the pipe before the branch/no branch decision is made.
- What if:
  - The following instructions were for the case of “branch taken” and the branch was not taken.
  - The following instructions were for “branch not taken” and it was taken.
- In either case, the wrong instructions are in the pipe and they must be eliminated (“flushed”).
- How can this problem be prevented?

Control Hazard Approach

- One approach is to always assume the branch is (or is not) taken:
  - Say we assume the branch is never taken. Then if the instruction in ALU/EX is a branch, the instructions in IF and ID/RF will be those in the “not taken” program line (branch determination is made in ALU/EX).
  - If this assumption is correct, the pipeline will continue to flow without delay.
  - When the branch is taken, instructions in IF and ID/RF must be “flushed,” usually by changing the “op” code of those instructions to a “nop” and letting them proceed to the end of the pipe.
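The cost of the assume-not-taken approach can be estimated with simple cycle counting. The sketch below assumes a 5-stage pipeline, two flushed instructions per taken branch (those in IF and ID/RF, as described above) and no other stalls; the function names and example numbers are illustrative.

```python
def total_cycles(n_instructions, taken_branches, n_stages=5, flush_penalty=2):
    """Cycles to run a program when every taken branch flushes two slots."""
    fill_and_run = n_stages + n_instructions - 1  # hazard-free pipeline
    return fill_and_run + flush_penalty * taken_branches


def cpi(n_instructions, taken_branches):
    """Average cycles per instruction, including the flush penalty."""
    return total_cycles(n_instructions, taken_branches) / n_instructions


if __name__ == "__main__":
    print(total_cycles(100, 0), total_cycles(100, 10))
    print(cpi(100, 10))
```

Branches that fall through cost nothing extra under this assumption; only taken branches add wasted cycles.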

Summary

- Pipelining is a fundamental concept in computers/nature
  - Multiple instructions in flight
  - Limited by length of longest stage
  - Latency vs. Throughput
- Hazards gum up the works
