December 8, 2009 L28-1 http://csg.csail.mit/ korea Revisiting the Processor Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology

Page 1: Revisiting the Processor Arvind Computer Science & Artificial Intelligence Lab

Revisiting the Processor

Arvind
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology

Page 2: Revisiting the Processor Arvind Computer Science & Artificial Intelligence Lab

An unpipelined multicycle architecture

[Figure: block diagram with pc, PCGen, Exec, WB, ICache, DCache, and RFile]

Only one instruction at a time

Page 3: Revisiting the Processor Arvind Computer Science & Artificial Intelligence Lab

The Multi-stage Design

3 rules (one for each stage)
1 instruction active at a time
2-3 stages per instruction
Little inter-stage buffering
Multi-cycle memory interface

Page 4: Revisiting the Processor Arvind Computer Science & Artificial Intelligence Lab

Rules for multistage/multicycle design

rule pcgen (stage == PC);
   imemReq.enq(Rd{a:pc});
   stage <= EX;
endrule

rule exec (stage == EX);
   let inst = imemResp.first();
   imemResp.deq();
   match {.nextpc, .rf_cmd, .mem_cmd} = doInst(pc,inst,rf);
   pc <= nextpc;
   case (rf_cmd) matches
      tagged RF {.dst,.val}: rf.upd(dst,val);
   endcase
   case (mem_cmd) matches
      tagged Ld {.dst,.addr}: begin
         dmemReq.enq(Rd{a:addr});
         dstReg <= dst;
      end
      tagged St {.addr,.val}: dmemReq.enq(Wr{a:addr,v:val});
   endcase
   stage <= (mem_cmd matches tagged Ld .*) ? WB : PC;
endrule

rule writeback (stage == WB);
   let dresp = dmemResp.first();
   dmemResp.deq();
   case (dresp) matches
      tagged RdResp {.val}: rf.upd(dstReg, val);
   endcase
   stage <= PC;
endrule

case (inst) …

Page 5: Revisiting the Processor Arvind Computer Science & Artificial Intelligence Lab

Discussion Points

Can these 3 rules be combined into one rule?
   No, memory takes multiple cycles.

What will happen if we forget to dequeue (imemResp)?
   The system will get stuck.
   Eventually, pcgen can no longer fire.
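
The second discussion point can be tried out with a small standalone sketch. It is not the processor from the slides: the memory path is collapsed into a single two-element FIFO, and the module name mkForgottenDeq, the cycle counter, and the 10-cycle cutoff are assumptions made for illustration only.

```bsv
import FIFO::*;

// Hypothetical toy module: a fetch rule feeding a FIFO whose consumer
// forgets to dequeue. Not the design from the slides.
module mkForgottenDeq(Empty);
   FIFO#(Bit#(32)) imemResp <- mkFIFO;    // mkFIFO holds two elements
   Reg#(Bit#(32))  pc       <- mkReg(0);
   Reg#(Bit#(8))   cycle    <- mkReg(0);

   // Stop the simulation after a few cycles
   rule tick;
      cycle <= cycle + 1;
      if (cycle == 10) $finish;
   endrule

   // Implicit guard: imemResp must not be full
   rule pcgen;
      imemResp.enq(pc);
      pc <= pc + 4;
      $display("cycle %0d: pcgen enqueued pc %0d", cycle, pc);
   endrule

   // Implicit guard: imemResp must not be empty
   rule exec;
      let inst = imemResp.first();
      // imemResp.deq();   // the forgotten dequeue
      $display("cycle %0d: exec sees %0d", cycle, inst);
   endrule
endmodule
```

Once the FIFO is full, the implicit guard of enq keeps pcgen from firing again, while exec keeps seeing the same head element. Uncommenting the deq call lets both rules continue to fire.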

Page 6: Revisiting the Processor Arvind Computer Science & Artificial Intelligence Lab

Pipelining the Design

[Figure: block diagram with pc, PCGen, Exec, WB, ICache, DCache, and RFile]

Step 1: Insert buffers

Page 7: Revisiting the Processor Arvind Computer Science & Artificial Intelligence Lab

Pipelining the Design 2

[Figure: block diagram with pc, PCGen, Exec, WB, ICache, DCache, and RFile]

It is problematic to write RFile from two stages: structural hazards and possible out-of-order writes (and reads) to RFile.

Step 2: Delay all RFile writes to the WB stage. (Requires passing dst-val pairs to the WB stage. Subsumes dstReg.)

Page 8: Revisiting the Processor Arvind Computer Science & Artificial Intelligence Lab

Pipelining the Design 3

[Figure: block diagram with pc, epoch, PCGen, Exec, WB, ICache, DCache, and RFile]

PC is read by PCGen and written by Exec. There is no parallelism without PC speculation.

If speculation fails, we must reset the PC and discard the false-path instructions (this may take many cycles).

Epochs identify which speculative path an instruction belongs to.

Page 9: Revisiting the Processor Arvind Computer Science & Artificial Intelligence Lab

Pipelining the Design 4

[Figure: block diagram with pc, epoch, PCGen, Exec, WB, ICache, DCache, and RFile]

Final concern: Data hazards
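
A data hazard check amounts to comparing an instruction's source registers against the destinations of instructions that have not yet written back. The slides do not define this check, so the following is only a sketch under assumptions: the pending destinations are presented as a Vector of Maybe values, and the names RName and stallOn are hypothetical.

```bsv
import Vector::*;

typedef Bit#(5) RName;   // assumed register-name width

// Hypothetical helper: True if either source register matches a
// destination that is still waiting to be written back.
function Bool stallOn(Vector#(n, Maybe#(RName)) pending,
                      RName src1, RName src2);
   function Bool hits(Maybe#(RName) p);
      if (p matches tagged Valid .d)
         return (d == src1 || d == src2);
      else
         return False;
   endfunction
   return any(hits, pending);
endfunction
```

An instruction for which stallOn returns True would wait in place until the matching entry has been written back and removed from the pending set.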

Page 10: Revisiting the Processor Arvind Computer Science & Artificial Intelligence Lab

Isolating RFile Port Usage

rule pcgen (stage == PC);
   imemReq.enq(Rd{a:pc});
   stage <= EX;
endrule

rule exec (stage == EX);
   let inst = imemResp.first();
   imemResp.deq();
   match {.nextpc, .rf_cmd, .mem_cmd} = doInst(pc,inst,rf);
   pc <= nextpc;
   case (rf_cmd) matches
      tagged RF {.dst,.val}: rf.upd(dst,val);
   endcase
   case (mem_cmd) matches
      tagged Ld {.dst,.addr}: begin
         dmemReq.enq(Rd{a:addr});
         dstReg <= dst;
      end
      tagged St {.addr,.val}: dmemReq.enq(Wr{a:addr,v:val});
   endcase
   stage <= (mem_cmd matches tagged Ld .*) ? WB : PC;
endrule

rule writeback (stage == WB);
   let dresp = dmemResp.first();
   dmemResp.deq();
   case (dresp) matches
      tagged RdResp {.val}: rf.upd(dstReg, val);
   endcase
   stage <= PC;
endrule

Change in exec:

case (inst) …
   …
   Reg2RegOp: … wbData <= RF{dst,val}; …
   LoadOp: … wbData <= Ld{dst}; …

Revised writeback:

rule writeback (stage == WB);
   case (wbData) matches
      tagged RF {.dst,.val}: rf.upd(dst,val);
      tagged Ld {.dst}: begin
         let dresp = dmemResp.first();
         dmemResp.deq();
         rf.upd(dst,memVal(dresp));
      end
   endcase
   stage <= PC;
endrule

Suppose we use a wbData register to pass information to the WB stage about updating the RFile
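
The slides do not show the type held in wbData. One plausible declaration is a tagged union with one tag per kind of writeback. The sketch below is an assumption: the names RName, Value, WBResult, and destOf are hypothetical, and the St tag is included only because a later slide enqueues St. Because the members are declared as structs here, patterns name their fields, whereas the slides use a shorter tuple-like pattern.

```bsv
typedef Bit#(5)  RName;   // assumed register-name width
typedef Bit#(32) Value;   // assumed data width

// Hypothetical writeback record: one tag per kind of pending update
typedef union tagged {
   struct { RName dst; Value val; } RF;   // value already computed in Exec
   struct { RName dst; }            Ld;   // value arrives from the data memory
   void                             St;   // store: nothing to write to RFile
} WBResult deriving (Bits, Eq);

// Destination register of a pending writeback, if there is one
function Maybe#(RName) destOf(WBResult r);
   case (r) matches
      tagged RF {dst: .d, val: .*}: return tagged Valid d;
      tagged Ld {dst: .d}:          return tagged Valid d;
      default:                      return tagged Invalid;
   endcase
endfunction
```

With such a type, wbData could be declared as Reg#(WBResult), and a value built as tagged RF {dst: d, val: v}.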

Page 11: Revisiting the Processor Arvind Computer Science & Artificial Intelligence Lab

Rules for the Pipelined machine

rule pcgen;
   imemReq.enq(Rd{a:pc});
   pc <= predPC;
   pcQ.enq(tuple2(pc,epoch));
endrule

rule discard (epoch != eEpoch);
   pcQ.deq();
   imemResp.deq();
endrule

rule exec ((epoch == eEpoch) && !(stall(inst,wbQ)));
   case based on the fetched instruction

rule writeback (True);
   wbQ.deq();
   case (wbQ.first()) matches
      tagged RF {.dst,.val}: rf.upd(dst,val);
      tagged Ld {.dst}: begin
         let dresp = dmemResp.first();
         dmemResp.deq();
         rf.upd(dst, memVal(dresp));
      end
   endcase
endrule

Page 12: Revisiting the Processor Arvind Computer Science & Artificial Intelligence Lab

The Execute Rule

let inst = imemResp.first();
match {.predPC,.eEpoch} = pcQ.first();

rule exec (epoch == eEpoch && !stall(inst, wbQ));
   pcQ.deq();
   imemResp.deq();
   match {.nextPC, .rf_cmd, .mem_cmd} = doInst(pc,inst,rf);
   if (predPC != nextPC) begin
      pc <= nextPC;
      epoch <= epoch + 1;
   end
   case (tuple2(rf_cmd, mem_cmd)) matches
      {tagged RF {.dst,.val}, .*}: wbQ.enq(RF{dst,val});
      {.*, tagged Ld {.dst,.addr}}: begin
         wbQ.enq(Ld{dst});
         dmemReq.enq(Rd{a:addr});
      end
      {.*, tagged St {.addr,.val}}: begin
         wbQ.enq(St);
         dmemReq.enq(Wr{a:addr,v:val});
      end
   endcase
endrule

Page 13: Revisiting the Processor Arvind Computer Science & Artificial Intelligence Lab

Design Flow

Are these rules correct? I.e., do they produce the correct results regardless of the order in which they are executed?
   test - fix - test - fix ...

Run the design and look at the traces to understand concurrency
   traces tell you what is happening at each cycle but not why something is not happening

Does your design permit “concurrent firings”, i.e., multiple instructions in the pipeline?
   Compiler output can tell you
      Can multiple guards be true simultaneously?
      Structural conflicts?
      Permitted rule orderings within a cycle
   You may want to split a rule into multiple rules

Page 14: Revisiting the Processor Arvind Computer Science & Artificial Intelligence Lab

Top-down concurrency analysis

Determine the concurrent rule firings and rule ordering you want.

Do hand analysis to determine the required concurrent behavior of the methods of submodules. If this behavior is prohibited by a submodule, create a submodule with the desired behavior.
   This may require RWires and ConfigRegs.
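
For readers who have not met these two primitives, here is a minimal sketch of each, separate from the processor design. The module name mkConcurrencySketch and all rule and state names are hypothetical. An RWire carries a value from one rule to another within a single cycle, and a ConfigReg is a register whose reads and writes the scheduler treats as conflict-free (a read in the same cycle as a write returns the old value).

```bsv
import ConfigReg::*;

// Hypothetical toy module showing the two primitives in isolation
module mkConcurrencySketch(Empty);
   RWire#(Bit#(32)) bypass <- mkRWire;          // carries a value within one cycle
   Reg#(Bit#(32))   count  <- mkConfigReg(0);   // no read/write ordering constraint
   Reg#(Bit#(32))   seen   <- mkReg(0);

   // Drives the wire; must be scheduled before any rule that reads it
   rule produce;
      bypass.wset(count);
   endrule

   // Fires only in cycles where the wire has been driven
   rule consume (bypass.wget() matches tagged Valid .v);
      seen <= v;
      if (v == 5) $finish;
   endrule

   // Writes the register that produce reads
   rule bump;
      count <= count + 1;
   endrule
endmodule
```

Both primitives relax the scheduling constraints of ordinary registers, so whether the resulting behavior is still correct has to be established by the hand analysis described above.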

Page 15: Revisiting the Processor Arvind Computer Science & Artificial Intelligence Lab

Branch Prediction

Pipeline has simple speculation

rule pcGen (True);
   pc <= pc + 4;
   otherActions;
endrule

Simplest prediction: always not-taken

Page 16: Revisiting the Processor Arvind Computer Science & Artificial Intelligence Lab

Branch Predictors

[Figure: block diagram with pc, epoch, pred, PCGen, Exec, WB, ICache, DCache, and RFile]

Page 17: Revisiting the Processor Arvind Computer Science & Artificial Intelligence Lab

Branch Prediction

interface BranchPredictor;
   method Addr getNextPC(Addr pc);
   method Action update(Addr pc, Addr correct_next_pc);
endinterface

rule pcGen (True);
   pc <= pred.getNextPC(pc);   // make prediction
   otherActions;
endrule

rule execute …
   if (nextPC != correctPC) pred.update(curPc, correctPC);   // update predictions
   case (instr) matches …
      BzTaken: if (mispredicted) …
endrule
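
Two possible implementations of the BranchPredictor interface are sketched below. Neither comes from the slides: the 32-bit Addr, the module names mkNotTaken and mkSmallBTB, the BTBEntry struct, and the 16-entry direct-mapped table indexed by pc[5:2] are all assumptions chosen to keep the example small.

```bsv
import Vector::*;

typedef Bit#(32) Addr;   // assumed address width

interface BranchPredictor;
   method Addr   getNextPC(Addr pc);
   method Action update(Addr pc, Addr correct_next_pc);
endinterface

// Always not-taken: predicts pc + 4 and ignores updates
module mkNotTaken(BranchPredictor);
   method Addr getNextPC(Addr pc) = pc + 4;

   method Action update(Addr pc, Addr correct_next_pc);
      noAction;
   endmethod
endmodule

// Hypothetical table entry: the branch's own pc and its last known target
typedef struct { Addr pc; Addr target; } BTBEntry deriving (Bits, Eq);

// Small direct-mapped table that remembers the last correct next pc
module mkSmallBTB(BranchPredictor);
   Vector#(16, Reg#(Maybe#(BTBEntry))) entries <- replicateM(mkReg(tagged Invalid));

   function Bit#(4) indexOf(Addr pc) = pc[5:2];

   method Addr getNextPC(Addr pc);
      Maybe#(BTBEntry) entry = entries[indexOf(pc)];
      if (entry matches tagged Valid .e &&& e.pc == pc)
         return e.target;    // hit: reuse the remembered target
      else
         return pc + 4;      // miss: fall back to not-taken
   endmethod

   method Action update(Addr pc, Addr correct_next_pc);
      entries[indexOf(pc)] <= tagged Valid BTBEntry { pc: pc, target: correct_next_pc };
   endmethod
endmodule
```

Either module could be instantiated as pred in the pcGen and execute rules. Since both provide the same interface, changing predictors would not require touching the rules themselves.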