December 8, 2009 L28-1 http://csg.csail.mit/korea
Revisiting the Processor
Arvind
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology
An unpipelined multicycle architecture
[Figure: datapath with pc, PCGen, Exec, WB, ICache, DCache, RFile]
Only one instruction at a time
The Multi-stage Design
3 Rules (one for each stage)
1 instruction active at a time
2-3 stages / instruction
Little inter-stage buffering
Multi-cycle memory interface
Rules for multistage/multicycle design

rule pcgen (stage == PC);
   imemReq.enq(Rd{a:pc});
   stage <= EX;
endrule

rule exec (stage == EX);
   let inst = imemResp.first();
   imemResp.deq();
   match {.nextpc, .rf_cmd, .mem_cmd} = doInst(pc, inst, rf);
   pc <= nextpc;
   case (rf_cmd) matches
      tagged RF {.dst, .val}: rf.upd(dst, val);
   endcase
   case (mem_cmd) matches
      tagged Ld {.dst, .addr}: begin
         dmemReq.enq(Rd{a:addr});
         dstReg <= dst;
      end
      tagged St {.addr, .val}: dmemReq.enq(Wr{a:addr, v:val});
   endcase
   stage <= (mem_cmd matches tagged Ld .*) ? WB : PC;
endrule

rule writeback (stage == WB);
   let dresp = dmemResp.first();
   dmemResp.deq();
   case (dresp) matches
      tagged RdResp {.val}: rf.upd(dstReg, val);
   endcase
   stage <= PC;
endrule
case (inst) …
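The three rules above act as a small state machine in which `stage` selects the one rule that may fire. The following is a minimal behavioral sketch in Python, not Bluespec, meant only to show the 2-3 stages per instruction. The toy instruction set (`addi`, `ld`), the name `run_multicycle`, and the single-slot variables standing in for the memory request/response FIFOs are all assumptions made for illustration.

```python
def run_multicycle(program, dmem, num_regs=4):
    """Step a toy PC -> EX -> (WB) machine; returns (register file, stage trace)."""
    rf = [0] * num_regs
    pc, stage = 0, "PC"
    inst = None        # hypothetical stand-in for the imemResp FIFO
    load_addr = None   # hypothetical stand-in for the dmemReq/dmemResp FIFOs
    dst_reg = None     # plays the role of the dstReg register
    trace = []
    while stage != "PC" or pc < len(program):
        trace.append(stage)
        if stage == "PC":            # rule pcgen: request the instruction
            inst = program[pc]
            stage = "EX"
        elif stage == "EX":          # rule exec: execute, maybe start a load
            pc += 1
            if inst[0] == "addi":
                _, dst, src, imm = inst
                rf[dst] = rf[src] + imm
                stage = "PC"
            else:                    # "ld": remember dst, wait for memory
                _, dst_reg, load_addr = inst
                stage = "WB"
        else:                        # rule writeback: load data arrives
            rf[dst_reg] = dmem[load_addr]
            stage = "PC"
    return rf, trace


regs, trace = run_multicycle(
    [("addi", 1, 0, 5), ("ld", 2, 8), ("addi", 3, 2, 1)], {8: 40})
print(regs)
print(trace)
```

In this sketch only the load visits WB, so it takes three steps while each `addi` takes two, and no two instructions are ever active together.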
Discussion Points
Can these 3 rules be combined into one rule?
   No, memory takes multiple cycles
What will happen if we forget to dequeue (imemResp)?
   The system will get stuck: eventually pcgen can no longer fire
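The second discussion point can be illustrated with a bounded queue. This is a rough Python sketch under simplifying assumptions: the instruction memory request and response paths are collapsed into one queue of depth `fifo_depth`, and the name `cycles_until_stuck` is hypothetical. It reports the cycle at which the fetch step finds the queue full and can no longer fire.

```python
from collections import deque


def cycles_until_stuck(fifo_depth, dequeue_in_exec, max_cycles=20):
    """Return the cycle at which fetch is blocked, or None if it never is."""
    imem_resp = deque()   # bounded response queue (assumed depth: fifo_depth)
    stage = "PC"
    for cycle in range(max_cycles):
        if stage == "PC":
            if len(imem_resp) == fifo_depth:
                return cycle              # queue full: fetch cannot fire
            imem_resp.append(cycle)       # a response eventually lands here
            stage = "EX"
        else:
            _inst = imem_resp[0]          # exec reads the head of the queue
            if dequeue_in_exec:
                imem_resp.popleft()       # the dequeue that must not be forgotten
            stage = "PC"
    return None


print(cycles_until_stuck(2, dequeue_in_exec=False))
print(cycles_until_stuck(2, dequeue_in_exec=True))
```

Without the dequeue, the exec step also keeps reading the same stale head entry, so the machine computes wrong results even before it stalls.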
Pipelining the Design
[Figure: datapath with pc, PCGen, Exec, WB, ICache, DCache, RFile]
Step 1: Insert buffers
Pipelining the Design 2
[Figure: datapath with pc, PCGen, Exec, WB, ICache, DCache, RFile]
It is problematic to write RFile from two stages: structural hazards and possible out-of-order writes (and reads) to RFile
Step 2: Delay all RFile writes to the WB stage (requires passing dst-val pairs to the WB stage; subsumes dstReg)
Pipelining the Design 3
[Figure: datapath with pc, epoch, PCGen, Exec, WB, ICache, DCache, RFile]
PC is read by PCGen and written by Exec. No parallelism without PC speculation
If speculation fails, we must reset the PC and discard false-path instructions (may take many cycles)
Epochs identify the speculative path to which an instruction belongs
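The epoch idea can be sketched as follows: fetch tags every instruction with the current epoch, and execute bumps the epoch on a misprediction so that older, wrong-path instructions are recognized and dropped. This is a simplified Python model, not the lecture's Bluespec. The toy program encoding (`"nop"`, `("jmp", target)`, `"halt"`), the always-`pc + 1` prediction, and the name `run_with_epochs` are assumptions made for illustration.

```python
from collections import deque


def run_with_epochs(program, max_cycles=50):
    """Return (executed PCs, discarded PCs) for a toy speculative fetch."""
    pc, epoch = 0, 0
    pc_q = deque()                 # (pc, epoch) tags travelling with instructions
    executed, discarded = [], []
    for _ in range(max_cycles):
        head = pc_q.popleft() if pc_q else None
        # Fetch uses the pc and epoch as they were at the start of the cycle.
        if pc < len(program):
            pc_q.append((pc, epoch))
        next_fetch_pc = pc + 1     # speculate: fall through
        if head is not None:
            ipc, i_epoch = head
            if i_epoch != epoch:
                discarded.append(ipc)          # wrong-path instruction
            else:
                executed.append(ipc)
                op = program[ipc]
                if op == "halt":
                    break
                target = op[1] if isinstance(op, tuple) else ipc + 1
                if target != ipc + 1:          # speculation failed
                    next_fetch_pc = target     # reset the PC
                    epoch += 1                 # start a new speculative path
        pc = next_fetch_pc
    return executed, discarded


print(run_with_epochs(["nop", ("jmp", 4), "nop", "nop", "halt"]))
```

With only one instruction in flight between fetch and execute, a single wrong-path instruction is discarded here; a deeper pipeline would discard more.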
Pipelining the Design 4
[Figure: datapath with pc, epoch, PCGen, Exec, WB, ICache, DCache, RFile]
Final concern: Data hazards
Isolating RFile Port Usage

rule pcgen (stage == PC);
   imemReq.enq(Rd{a:pc});
   stage <= EX;
endrule

rule exec (stage == EX);
   let inst = imemResp.first();
   imemResp.deq();
   match {.nextpc, .rf_cmd, .mem_cmd} = doInst(pc, inst, rf);
   pc <= nextpc;
   case (rf_cmd) matches
      tagged RF {.dst, .val}: rf.upd(dst, val);
   endcase
   case (mem_cmd) matches
      tagged Ld {.dst, .addr}: begin
         dmemReq.enq(Rd{a:addr});
         dstReg <= dst;
      end
      tagged St {.addr, .val}: dmemReq.enq(Wr{a:addr, v:val});
   endcase
   stage <= (mem_cmd matches tagged Ld .*) ? WB : PC;
endrule

rule writeback (stage == WB);
   let dresp = dmemResp.first();
   dmemResp.deq();
   case (dresp) matches
      tagged RdResp {.val}: rf.upd(dstReg, val);
   endcase
   stage <= PC;
endrule
case (inst) …
   …
   Reg2RegOp: … wbData <= RF{dst,val}; …
   LoadOp: … wbData <= Ld{dst}; …

rule writeback (stage == WB);
   case (wbData) matches
      tagged RF {.dst, .val}: rf.upd(dst, val);
      tagged Ld {.dst}: begin
         let dresp = dmemResp.first();
         dmemResp.deq();
         rf.upd(dst, memVal(dresp));
      end
   endcase
   stage <= PC;
endrule
Suppose we use a wbData register to pass information to the WB stage about updating the RFile
Rules for the Pipelined machine

rule pcgen;
   imemReq.enq(Rd{a:pc});
   pc <= predPC;
   pcQ.enq(tuple2(pc, epoch));
endrule

rule discard (epoch != eEpoch);
   pcQ.deq();
   imemResp.deq();
endrule

rule exec ((epoch == eEpoch) && !(stall(inst, wbQ)));
   // case based on the fetched instruction
endrule

rule writeback (True);
   wbQ.deq();
   case (wbQ.first()) matches
      tagged RF {.dst, .val}: rf.upd(dst, val);
      tagged Ld {.dst}: begin
         let dresp = dmemResp.first();
         dmemResp.deq();
         rf.upd(dst, memVal(dresp));
      end
   endcase
endrule
The Execute Rule

let inst = imemResp.first();
match {.predPC, .eEpoch} = pcQ.first();

rule exec (epoch == eEpoch && !stall(inst, wbQ));
   pcQ.deq();
   imemResp.deq();
   match {.nextPC, .rf_cmd, .mem_cmd} = doInst(pc, inst, rf);
   if (predPC != nextPC) begin
      pc <= nextPC;
      epoch <= epoch + 1;
   end
   case (tuple2(rf_cmd, mem_cmd)) matches
      {tagged RF {.dst, .val}, .*}: wbQ.enq(RF{dst, val});
      {.*, tagged Ld {.dst, .addr}}: begin
         wbQ.enq(Ld{dst, addr});
         dmemReq.enq(Rd{a:addr});
      end
      {.*, tagged St {.addr, .val}}: begin
         wbQ.enq(St);
         dmemReq.enq(Wr{a:addr, v:val});
      end
   endcase
endrule
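The execute rule is guarded by `stall(inst, wbQ)`, whose definition the slides do not show. One plausible reading, sketched here in Python as an assumption rather than the lecture's actual function, is a read-after-write check: stall when any source register of the instruction is the destination of an entry still waiting in the writeback queue. The `Inst` and `WbEntry` record types are hypothetical.

```python
from collections import deque, namedtuple

# Hypothetical record types; the lecture's real types are not shown.
Inst = namedtuple("Inst", ["op", "dst", "srcs"])
WbEntry = namedtuple("WbEntry", ["kind", "dst"])   # dst is None for stores


def stall(inst, wb_q):
    """True if inst reads a register that a pending writeback will update."""
    pending = {entry.dst for entry in wb_q if entry.dst is not None}
    return any(src in pending for src in inst.srcs)


wb_q = deque([WbEntry("Ld", 2), WbEntry("St", None)])
print(stall(Inst("add", 3, (1, 2)), wb_q))   # reads r2, which a load will write
print(stall(Inst("add", 3, (1, 4)), wb_q))   # no pending writer for r1 or r4
```

Because register file writes happen only in the WB stage, checking the writeback queue alone is enough in this model to cover every in-flight writer.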
Design Flow
Are these rules correct? I.e., do they produce the correct results regardless of the order in which they are executed?
   test – fix – test – fix …
Run the design and look at the traces to understand concurrency
   traces tell you what is happening at each cycle but not why something is not happening
Does your design permit “concurrent firings”, i.e., multiple instructions in the pipeline?
   Compiler output can tell you:
      Can multiple guards be true simultaneously?
      Structural conflicts?
      Permitted rule orderings within a cycle
   You may want to split a rule into multiple rules
Top down concurrency analysis
Determine the concurrent rule firings and rule ordering you want
Do hand analysis to determine the required concurrent behavior of the methods of submodules. If this behavior is prohibited by a submodule, create a submodule with the desired behavior
   this may require RWires and ConfigRegs
Branch Prediction
Pipeline has simple speculation
rule pcGen (True);
pc <= pc + 4;
otherActions;
endrule
Simplest prediction: always not-taken
Branch Predictors
[Figure: datapath with pc, epoch, pred, PCGen, Exec, WB, ICache, DCache, RFile]
Branch Prediction

interface BranchPredictor;
   method Addr getNextPC(Addr pc);
   method Action update(Addr pc, Addr correct_next_pc);
endinterface
// Make prediction
rule pcGen (True);
   pc <= pred.getNextPC(pc);
   otherActions;
endrule

// Update predictions
rule execute …
   if (nextPC != correctPC) pred.update(curPc, correctPC);
   case (instr) matches …
      BzTaken: if (mispredicted) …
endrule
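One simple predictor that fits the `BranchPredictor` interface is a small direct-mapped table of previously corrected targets that falls back to `pc + 4` (always not-taken) on a miss. The Python sketch below is an assumption for illustration, not the predictor used in the lecture. The class name `TablePredictor`, the table size, and the indexing by word address are all hypothetical choices.

```python
class TablePredictor:
    """Direct-mapped next-PC table; predicts pc + 4 when there is no entry."""

    def __init__(self, entries=16):
        self.entries = entries
        self.table = [None] * entries      # each slot holds (pc, next_pc)

    def _index(self, pc):
        return (pc >> 2) % self.entries    # drop byte offset, wrap to table size

    def get_next_pc(self, pc):             # mirrors getNextPC
        slot = self.table[self._index(pc)]
        if slot is not None and slot[0] == pc:
            return slot[1]                 # hit: reuse the last corrected target
        return pc + 4                      # miss: always not-taken

    def update(self, pc, correct_next_pc): # mirrors update
        self.table[self._index(pc)] = (pc, correct_next_pc)


pred = TablePredictor()
print(hex(pred.get_next_pc(0x100)))   # no entry yet
pred.update(0x100, 0x80)              # execute reports the real target
print(hex(pred.get_next_pc(0x100)))
```

Storing the full `pc` as a tag keeps a different instruction that maps to the same slot from inheriting the wrong target, at the cost of a wider table entry.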