cs 61c: great ideas in computer architecture control and...

CS 61C: Great Ideas in Computer Architecture

Control and Pipelining

Instructors: Vladimir Stojanovic and Nicholas Weaver
http://inst.eecs.Berkeley.edu/~cs61c/sp16

Datapath Control Signals
• ExtOp: “zero”, “sign”
• ALUsrc: 0 ⇒ regB; 1 ⇒ immed
• ALUctr: “ADD”, “SUB”, “OR”
• MemWr: 1 ⇒ write memory
• MemtoReg: 0 ⇒ ALU; 1 ⇒ Mem
• RegDst: 0 ⇒ “rt”; 1 ⇒ “rd”
• RegWr: 1 ⇒ write register

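[Figure: single-cycle datapath with its control signals. Labeled parts: instruction address, two adders with nPC_sel & Equal for the next PC; RegFile (Rw, Ra, Rb) with RegDst selecting Rd; Extender (Imm16, 16 to 32 bits) with ExtOp; ALUSrc mux; ALU with ALUctr (32-bit busA); Data Memory (WrEn, Adr, Data In) with MemWr; MemtoReg mux.]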

Summary of the Control Signals (1/2)

inst   Register Transfer

add    R[rd] ← R[rs] + R[rt]; PC ← PC + 4
       ALUsrc=RegB, ALUctr=“ADD”, RegDst=rd, RegWr, nPC_sel=“+4”

sub    R[rd] ← R[rs] – R[rt]; PC ← PC + 4
       ALUsrc=RegB, ALUctr=“SUB”, RegDst=rd, RegWr, nPC_sel=“+4”

ori    R[rt] ← R[rs] | zero_ext(Imm16); PC ← PC + 4
       ALUsrc=Im, ExtOp=“Z”, ALUctr=“OR”, RegDst=rt, RegWr, nPC_sel=“+4”

lw     R[rt] ← MEM[R[rs] + sign_ext(Imm16)]; PC ← PC + 4
       ALUsrc=Im, ExtOp=“sn”, ALUctr=“ADD”, MemtoReg, RegDst=rt, RegWr, nPC_sel=“+4”

sw     MEM[R[rs] + sign_ext(Imm16)] ← R[rt]; PC ← PC + 4
       ALUsrc=Im, ExtOp=“sn”, ALUctr=“ADD”, MemWr, nPC_sel=“+4”

beq    if (R[rs] == R[rt]) then PC ← PC + 4 + (sign_ext(Imm16) || 00) else PC ← PC + 4
       nPC_sel=“br”, ALUctr=“SUB”
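The register transfers in this summary can be traced with a small software model. The following is a minimal Python sketch, not a description of the hardware: the `execute` function, the list-based register file, the dictionary memory, and the instruction field names are all assumptions made for illustration. It follows standard MIPS semantics, so `ori` is a bitwise OR, `sw` stores R[rt], and the branch target is relative to PC + 4. It does not model register 0 being hardwired to zero.

```python
MASK32 = 0xFFFFFFFF  # keep every value within 32 bits


def sign_ext(imm16):
    """Sign-extend a 16-bit immediate to a signed Python int."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def zero_ext(imm16):
    """Zero-extend a 16-bit immediate."""
    return imm16 & 0xFFFF


def execute(inst, regs, mem, pc):
    """Apply one instruction's register transfer and return the next PC.

    inst: dict with hypothetical keys "op", "rs", "rt", "rd", "imm".
    regs: list of 32 register values; mem: dict from byte address to word.
    """
    op = inst["op"]
    rs, rt = inst.get("rs", 0), inst.get("rt", 0)
    rd, imm = inst.get("rd", 0), inst.get("imm", 0)
    next_pc = (pc + 4) & MASK32  # default: nPC_sel = "+4"

    if op == "add":
        regs[rd] = (regs[rs] + regs[rt]) & MASK32
    elif op == "sub":
        regs[rd] = (regs[rs] - regs[rt]) & MASK32
    elif op == "ori":
        regs[rt] = regs[rs] | zero_ext(imm)
    elif op == "lw":
        regs[rt] = mem[(regs[rs] + sign_ext(imm)) & MASK32]
    elif op == "sw":
        mem[(regs[rs] + sign_ext(imm)) & MASK32] = regs[rt]
    elif op == "beq":
        if regs[rs] == regs[rt]:
            # nPC_sel = "br": word offset shifted left by 2, relative to PC + 4
            next_pc = (pc + 4 + (sign_ext(imm) << 2)) & MASK32
    else:
        raise ValueError(f"unsupported instruction: {op}")
    return next_pc
```

For example, with `regs[1] = 5` and `regs[2] = 7`, executing `{"op": "add", "rs": 1, "rt": 2, "rd": 3}` at PC 0 leaves 12 in `regs[3]` and returns 4 as the next PC.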

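Summary of the Control Signals (2/2)

See Appendix A for the func and op encodings.

             add       sub       ori       lw        sw        beq       jump
func         100000    100010    (We Don’t Care :-) for the remaining columns)
op           000000    000000    001101    100011    101011    000100    000010
RegDst       1         1         0         0         x         x         x
ALUSrc       0         0         1         1         1         0         x
MemtoReg     0         0         0         1         x         x         x
RegWrite     1         1         1         1         0         0         0
MemWrite     0         0         0         0         1         0         0
nPCsel       0         0         0         0         0         1         ?
Jump         0         0         0         0         0         0         1
ExtOp        x         x         0         1         1         x         x
ALUctr<2:0>  Add       Subtract  Or        Add       Add       Subtract  x

Instruction formats (bit 31 on the left, bit 0 on the right):

R-type   op [31:26]   rs [25:21]   rt [20:16]   rd [15:11]   shamt [10:6]   funct [5:0]     add, sub
I-type   op [31:26]   rs [25:21]   rt [20:16]   immediate [15:0]                            ori, lw, sw, beq
J-type   op [31:26]   target address [25:0]                                                 jump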

Boolean Expressions for Controller

RegDst    = add + sub
ALUSrc    = ori + lw + sw
MemtoReg  = lw
RegWrite  = add + sub + ori + lw
MemWrite  = sw
nPCsel    = beq
Jump      = jump
ExtOp     = lw + sw
ALUctr[0] = sub + beq    (assume ALUctr is 00 ADD, 01 SUB, 10 OR)
ALUctr[1] = ori

Where:

rtype = ~op5 • ~op4 • ~op3 • ~op2 • ~op1 • ~op0
ori   = ~op5 • ~op4 •  op3 •  op2 • ~op1 •  op0
lw    =  op5 • ~op4 • ~op3 • ~op2 •  op1 •  op0
sw    =  op5 • ~op4 •  op3 • ~op2 •  op1 •  op0
beq   = ~op5 • ~op4 • ~op3 •  op2 • ~op1 • ~op0
jump  = ~op5 • ~op4 • ~op3 • ~op2 •  op1 • ~op0

add = rtype • func5 • ~func4 • ~func3 • ~func2 • ~func1 • ~func0
sub = rtype • func5 • ~func4 • ~func3 • ~func2 •  func1 • ~func0
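These equations can be checked in software before building them in gates. The sketch below is one possible Python rendering, with a hypothetical function name `decode`: each instruction term is written as an equality test on the 6-bit field, which is equivalent to the AND of the six literals, and each control signal is the OR of its terms. ALUctr[1] is driven by `ori`. Don't-care entries in the control table come out as 0 here, which is just one valid choice.

```python
def decode(op, func=0):
    """Return control signals for a 6-bit opcode and 6-bit func field."""
    # "AND" plane: one term per instruction
    rtype = op == 0b000000
    ori = op == 0b001101
    lw = op == 0b100011
    sw = op == 0b101011
    beq = op == 0b000100
    jump = op == 0b000010
    add = rtype and func == 0b100000
    sub = rtype and func == 0b100010

    # "OR" plane: one sum per control signal
    return {
        "RegDst": int(add or sub),
        "ALUSrc": int(ori or lw or sw),
        "MemtoReg": int(lw),
        "RegWrite": int(add or sub or ori or lw),
        "MemWrite": int(sw),
        "nPCsel": int(beq),
        "Jump": int(jump),
        "ExtOp": int(lw or sw),
        # ALUctr encoding assumed as in the slide: 00 ADD, 01 SUB, 10 OR
        "ALUctr": (int(ori) << 1) | int(sub or beq),
    }
```

For instance, `decode(0b100011)` (lw) sets ALUSrc, MemtoReg, RegWrite, and ExtOp to 1 and leaves ALUctr at 0 (ADD).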

How do we implement this in gates?

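Controller Implementation

[Figure: the opcode and func bits feed “AND” logic that produces add, sub, ori, lw, sw, beq, and jump; these feed “OR” logic that produces RegDst, ALUSrc, MemtoReg, RegWrite, MemWrite, nPCsel, Jump, ExtOp, ALUctr[0], and ALUctr[1].]

P&H Figure 4.17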

Summary: Single-cycle Processor
• Five steps to design a processor:
1. Analyze instruction set → datapath requirements
2. Select set of datapath components & establish clock methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
5. Assemble the control logic
   • Formulate Logic Equations
   • Design Circuits

[Figure: computer block diagram: Processor (Control, Datapath), Memory, Input, Output.]

Single Cycle Performance
• Assume the time for actions is
  – 100 ps for register read or write; 200 ps for other events
• Clock period is?

Instr      Instr fetch   Register read   ALU op   Memory access   Register write   Total time
lw         200 ps        100 ps          200 ps   200 ps          100 ps           800 ps
sw         200 ps        100 ps          200 ps   200 ps                           700 ps
R-format   200 ps        100 ps          200 ps                   100 ps           600 ps
beq        200 ps        100 ps          200 ps                                    500 ps

• Clock rate (cycles/second = Hz) = 1/Period (seconds/cycle)
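The clock period question can be worked through numerically. This is a small Python sketch using the delays from the table; the names `STAGE_PS`, `USES`, and the three helper functions are made up for illustration. In a single-cycle design every instruction must fit in one cycle, so the period is set by the slowest instruction.

```python
# Delay of each action in picoseconds, as given on the slide
STAGE_PS = {
    "fetch": 200,
    "reg_read": 100,
    "alu": 200,
    "mem": 200,
    "reg_write": 100,
}

# Which actions each instruction class performs (from the table rows)
USES = {
    "lw": ["fetch", "reg_read", "alu", "mem", "reg_write"],
    "sw": ["fetch", "reg_read", "alu", "mem"],
    "R-format": ["fetch", "reg_read", "alu", "reg_write"],
    "beq": ["fetch", "reg_read", "alu"],
}


def total_time_ps(instr):
    """Sum the delays of the actions an instruction uses."""
    return sum(STAGE_PS[action] for action in USES[instr])


def clock_period_ps():
    """Single-cycle period: the slowest instruction sets the clock."""
    return max(total_time_ps(instr) for instr in USES)


def clock_rate_ghz(period_ps):
    """Clock rate = 1 / period; 1000 ps corresponds to 1 GHz."""
    return 1000.0 / period_ps
```

With these numbers `clock_period_ps()` is 800 (set by lw), and `clock_rate_ghz(800)` is 1.25.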

• What can we do to improve clock rate?
• Will this improve performance as well?

Want increased clock rate to mean faster programs

Levels of Representation/Interpretation

High Level Language Program (e.g., C)
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;

  ↓ Compiler

Assembly Language Program (e.g., MIPS)
    lw $t0, 0($2)
    lw $t1, 4($2)
    sw $t1, 0($2)
    sw $t0, 4($2)

  ↓ Assembler

Machine Language Program (MIPS)
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111

  ↓ Machine Interpretation

Hardware Architecture Description (e.g., block diagrams)

  ↓ Architecture Implementation

Logic Circuit Description (Circuit Schematic Diagrams)

Anything can be represented as a number, i.e., data or instructions

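No More Magic!

[Figure: the layers of a computer system. Software: Application (ex: browser), Compiler, Assembler, Operating System (Mac OSX). Interface: Instruction Set Architecture. Hardware: Processor, Memory, I/O system, Datapath & Control, Digital Design, Circuit Design, transistors. Course labels alongside the layers: CS61A, CS61B, CS61C (checked off for the layers already covered), EE40, Phys 7B.]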

Administrivia
• Project 2-2 due 3/8 @ 23:59:59 (Tue)
• Guerrilla Sessions: MIPS CPU
  – Wed 3/09 3-5 PM @ 241 Cory
  – Sat 3/12 1-3 PM @ 651 @ 611 Soda

Gotta Do Laundry
• Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, fold, and put away
  – Washer takes 30 minutes
  – Dryer takes 30 minutes
  – “Folder” takes 30 minutes
  – “Stasher” takes 30 minutes to put clothes into drawers

Sequential Laundry
• Sequential laundry takes 8 hours for 4 loads

[Figure: timeline from 6 PM to 2 AM; loads A, B, C, D each pass through four 30-minute stages, one load after another.]

Pipelined Laundry
• Pipelined laundry takes 3.5 hours for 4 loads!

[Figure: timeline starting at 6 PM; loads A, B, C, D overlap, each starting 30 minutes after the previous one.]

Pipelining Lessons (1/2)
• Pipelining doesn’t help latency of single task, it helps throughput of entire workload
• Multiple tasks operating simultaneously using different resources
• Potential speedup = Number of pipe stages
• Time to “fill” pipeline and time to “drain” it reduces speedup: 2.3x (8/3.5) v. 4x (8/2) in this example

[Figure: pipelined laundry timeline starting at 6 PM, 30-minute stages.]
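The fill and drain effect in the laundry example can be reproduced with a few lines of arithmetic. This Python sketch assumes balanced stages of equal length; the function names are hypothetical. Sequential time is loads × stages × stage time, while the pipelined time is (stages + loads − 1) × stage time, because the first load takes a full pass to fill the pipeline and each later load finishes one stage time after the previous one.

```python
def sequential_minutes(loads, stages, stage_min):
    """Each load runs through every stage before the next load starts."""
    return loads * stages * stage_min


def pipelined_minutes(loads, stages, stage_min):
    """First load fills the pipeline; each later load adds one stage time."""
    return (stages + loads - 1) * stage_min


def speedup(loads, stages, stage_min):
    """Ratio of sequential time to pipelined time."""
    return sequential_minutes(loads, stages, stage_min) / pipelined_minutes(
        loads, stages, stage_min
    )
```

For 4 loads, 4 stages, and 30 minutes per stage this gives 480 minutes (8 hours) sequentially and 210 minutes (3.5 hours) pipelined, a speedup of about 2.3x. As the number of loads grows, the speedup approaches the number of stages, 4.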

Pipelining Lessons (2/2)
• Suppose new Washer takes 20 minutes, new Stasher takes 20 minutes. How much faster is pipeline?
• Pipeline rate limited by slowest pipeline stage
• Unbalanced lengths of pipe stages reduces speedup

[Figure: pipelined laundry timeline starting at 6 PM, 30-minute stages.]
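The faster washer and stasher question can be explored with a small scheduling model. The Python sketch below is an illustration under stated assumptions, not part of the lecture: all loads are ready at time 0, each machine handles one load at a time, and a load may wait in a basket between machines. The function name `finish_times` is hypothetical.

```python
def finish_times(stage_minutes, loads):
    """Return the time (in minutes) at which each load leaves the last stage.

    stage_minutes: duration of each stage, in pipeline order.
    loads: number of loads, all available at time 0.
    """
    free_at = [0] * len(stage_minutes)  # when each machine is next free
    done = []
    for _ in range(loads):
        t = 0  # this load is ready to start at time 0
        for stage, minutes in enumerate(stage_minutes):
            start = max(t, free_at[stage])  # wait for the load and the machine
            t = start + minutes
            free_at[stage] = t
        done.append(t)
    return done
```

With the original stages, `finish_times([30, 30, 30, 30], 4)` gives `[120, 150, 180, 210]`. With the faster washer and stasher, `finish_times([20, 30, 30, 20], 4)` gives `[100, 130, 160, 190]`: each load's latency drops by 20 minutes, but loads still complete 30 minutes apart, because the 30-minute dryer and folder limit the rate.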

Execution Steps in MIPS Datapath
1) IFtch: Instruction Fetch, Increment PC
2) Dcd: Instruction Decode, Read Registers
3) Exec:
   Mem-ref: Calculate Address
   Arith-log: Perform Operation
4) Mem:
   Load: Read Data from Memory
   Store: Write Data to Memory
5) WB: Write Data Back to Register

Single Cycle Datapath

[Figure: single-cycle datapath (register fields rs, rt, rd) divided into five stages: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back.]

Pipeline registers
• Need registers between stages
  – To hold information produced in previous cycle

[Figure: datapath (register fields rs, rt, rd) with the five stages marked: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back.]

More Detailed Pipeline

IF for Load, Store, …

ID for Load, Store, …

EX for Load

MEM for Load

WB for Load – Oops!
Wrong register number!

Corrected Datapath for Load

Pipelined Execution Representation
• Every instruction must take same number of steps, so some stages will idle
  – e.g. MEM stage for any arithmetic instruction

IF  ID  EX  MEM WB
    IF  ID  EX  MEM WB
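The staggered representation can be generated for any number of instructions. This Python sketch is only an illustration, with hypothetical names `pipeline_diagram` and `total_cycles`; it assumes an ideal five-stage pipeline in which one instruction enters per cycle and nothing stalls.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(n_instructions, width=4):
    """Return one text row per instruction, each shifted right by one cycle."""
    rows = []
    for i in range(n_instructions):
        cells = [""] * i + STAGES  # instruction i starts in cycle i
        rows.append("".join(cell.ljust(width) for cell in cells).rstrip())
    return rows


def total_cycles(n_instructions):
    """Cycles until the last instruction leaves WB in an ideal pipeline."""
    return len(STAGES) + n_instructions - 1
```

Printing `"\n".join(pipeline_diagram(3))` shows three rows, each starting one column later than the one above; `total_cycles(3)` is 7.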

Graphical Pipeline Diagrams
• Use datapath figure below to represent pipeline:

IF    ID    EX    Mem   WB
I$    Reg   ALU   D$    Reg

[Figure: datapath with the Register File (rs, rt, rd) and the five stages marked: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back.]

Graphical Pipeline Representation
• RegFile: left half is write, right half is read

[Figure: instructions plotted against time (clock cycles), each drawn as I$, Reg, ALU, D$, Reg.]

Pipelining Performance (1/3)
• Use Tc (“time between completion of instructions”) to measure speedup
  – Tc,pipelined ≥ Tc,single-cycle / Number of stages
  – Equality only achieved if stages are balanced (i.e. take the same amount of time)
• If not balanced, speedup is reduced
• Speedup due to increased throughput
  – Latency for each instruction does not decrease

• Assume time for stages is
  – 100 ps for register read or write
  – 200 ps for other stages
• What is pipelined clock rate?
  – Compare pipelined datapath with single-cycle datapath

Instr   Instr fetch   Register read   ALU op   Memory access   Register write   Total time
lw      200 ps        100 ps          200 ps   200 ps          100 ps           800 ps
sw      200 ps        100 ps          200 ps   200 ps                           700 ps
beq     200 ps        100 ps          200 ps                                    500 ps

Single-cycle: Tc = 800 ps, f = 1.25 GHz
Pipelined: Tc = 200 ps, f = 5 GHz
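To see what these two clock periods mean for a whole program, the following Python sketch compares total execution time for n instructions. It is a rough model under stated assumptions: an ideal five-stage pipeline with no stalls, and hypothetical function names. The single-cycle design takes n × 800 ps; the pipelined design takes (5 + n − 1) × 200 ps, since the pipeline must fill before the first instruction completes.

```python
def single_cycle_time_ps(n, tc_ps=800):
    """Total time for n instructions, one full clock period each."""
    return n * tc_ps


def pipelined_time_ps(n, tc_ps=200, stages=5):
    """Total time for n instructions in an ideal pipeline with no stalls."""
    return (stages + n - 1) * tc_ps


def pipeline_speedup(n):
    """Single-cycle time divided by pipelined time for n instructions."""
    return single_cycle_time_ps(n) / pipelined_time_ps(n)
```

For 3 instructions this gives 2400 ps versus 1400 ps. For long programs the speedup approaches 800/200 = 4 rather than 5, because the stages are not balanced: the two 100 ps register stages are padded to the 200 ps clock.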

Clicker/Peer Instruction

Logic in some stages takes 200 ps and in some 100 ps. Clk-Q delay is 30 ps and setup-time is 20 ps. What is the maximum clock frequency at which a pipelined design can operate?
• A: 10 GHz
• B: 5 GHz
• C: 6.7 GHz
• D: 4.35 GHz
• E: 4 GHz
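One way to reason about this question is to write the timing constraint down explicitly. The Python sketch below assumes the usual register-to-register constraint, period ≥ clk-to-Q delay + slowest stage logic + setup time, with no clock skew; the function name `max_clock_ghz` is hypothetical.

```python
def max_clock_ghz(stage_logic_ps, clk_to_q_ps, setup_ps):
    """Maximum clock frequency in GHz for a pipeline.

    stage_logic_ps: logic delay of each stage in picoseconds.
    The slowest stage, plus register overhead, sets the minimum period.
    """
    period_ps = clk_to_q_ps + max(stage_logic_ps) + setup_ps
    return 1000.0 / period_ps  # 1000 ps period corresponds to 1 GHz
```

With the numbers in the question, `max_clock_ghz([200, 100], 30, 20)` uses a period of 30 + 200 + 20 = 250 ps and returns 4.0.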
