Realistic Memories and Caches
Arvind
Computer Science & Artificial Intelligence Lab.
Massachusetts Institute of Technology
January 19, 2012 L9-1 http://csg.csail.mit.edu/SNU
Three-Stage SMIPS
[Figure: three-stage SMIPS datapath with PC, +4, Epoch, Inst Memory, Decode, Register File, Execute, Data Memory, pipeline registers fr and wbr, and the stall? signal]
The use of magic memories makes
this design unrealistic
A Simple Memory Model
Reads and writes are always completed in one cycle
  a Read can be done any time (i.e., combinational)
  If enabled, a Write is performed at the rising clock edge (the write address and data must be stable at the clock edge)
[Figure: MAGIC RAM block with ports Address, WriteData, WriteEnable, Clock, and ReadData]
In a real DRAM the data will be available several cycles after the address is supplied
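The magic-memory behavior described above can be sketched with the standard RegFile library module, whose sub method is a combinational read and whose upd method writes at the clock edge. This is only an illustrative sketch: the module name mkMagicMemTest, the 8-bit address width, and the test values are assumptions, not part of the lecture code.

```bsv
import RegFile::*;

typedef Bit#(8)  MAddr;   // assumed address width, for illustration only
typedef Bit#(32) MData;

module mkMagicMemTest(Empty);
   // RegFile: sub() reads combinationally, upd() writes at the clock edge
   RegFile#(MAddr, MData) mem <- mkRegFileFull;
   Reg#(Bit#(2)) step <- mkReg(0);

   rule doWrite (step == 0);
      mem.upd(8'h10, 32'hDEADBEEF);   // visible from the next cycle on
      step <= 1;
   endrule

   rule doRead (step == 1);
      // the data is available in the same cycle the address is supplied
      $display("mem[0x10] = %h", mem.sub(8'h10));
      step <= 2;
   endrule

   rule done (step == 2);
      $finish;
   endrule
endmodule
```

A realistic memory cannot offer the same-cycle read used in doRead, which is why the rest of the lecture moves to a request/response interface.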
Memory Hierarchy
size: RegFile << SRAM << DRAM (why?)
latency: RegFile << SRAM << DRAM (why?)
bandwidth: on-chip >> off-chip (why?)
On a data access:
  hit (data ∈ fast memory): low latency access
  miss (data ∉ fast memory): long latency access (DRAM)
[Figure: CPU (RegFile) <-> Small, Fast Memory (SRAM), which holds frequently used data <-> Big, Slow Memory (DRAM); the two connections are labeled A and B]
Plan
What do simple caches look like?
Incorporating caches in the processor pipeline
Data Cache - Interface
interface DCache;
  method Action req(MemReq r);
  method ActionValue#(MemResp) resp;
  method ActionValue#(MemReq) memReq;
  method Action memResp(MemResp r);
endinterface
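[Figure: the cache sits between the Processor and DRAM. Processor side: req and resp, each with a guard. DRAM side: memReq and memResp, each with a guard. Internal state: hitQ, missReq, mReqQ, mRespQ]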
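The interface refers to MemReq and MemResp without defining them on this slide. A minimal sketch of definitions compatible with how the later slides use them (fields op, addr, and data; a response that is stored directly into the data array) might look as follows. The 32-bit widths, the name MemOp, and the helper loadReq are assumptions for illustration.

```bsv
typedef Bit#(32) Addr;   // assumed 32-bit byte address
typedef Bit#(32) Data;   // assumed 32-bit data word

// memory operation: load or store
typedef enum { Ld, St } MemOp deriving (Bits, Eq);

// a request carries the operation, the address, and (for stores) the data
typedef struct {
   MemOp op;
   Addr  addr;
   Data  data;
} MemReq deriving (Bits, Eq);

// a response is just the data word
typedef Data MemResp;

// hypothetical helper: build a load request (the data field is a don't-care)
function MemReq loadReq(Addr a) = MemReq { op: Ld, addr: a, data: ? };
```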
Direct-Mapped Cache
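[Figure: direct-mapped cache. The req address is split into Tag (t bits), Index (k bits), and Offset (b bits); Tag and Index together form the block number, and Offset is the block offset. The index selects one of 2^k lines, each holding V, Tag, and Data Block. The stored t-bit tag is compared (=) with the request tag to produce HIT, and the offset selects the Data Word or Byte.]
What is a bad reference pattern? Strided at size of cache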
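The address split and the bad stride pattern can be made concrete with a small sketch. It assumes a 32-bit address, one 4-byte word per line (b = 2), and k = 10 index bits; these widths and the names getIndex, getTag, and mkAddrSplitTest are illustrative assumptions, chosen to match the truncate(addr>>2) and truncateLSB(addr) expressions used in the cache code of this lecture.

```bsv
typedef Bit#(32) Addr;
typedef Bit#(10) Index;   // assumed k = 10, i.e. 2^10 lines
typedef Bit#(20) Tag;     // t = 32 - k - b = 20

// drop the 2 offset bits, keep the low k bits
function Index getIndex(Addr a) = truncate(a >> 2);
// keep the top t bits
function Tag getTag(Addr a) = truncateLSB(a);

module mkAddrSplitTest(Empty);
   rule show;
      Addr a = 32'h0000_1040;
      // exactly one cache size (2^10 lines * 4 bytes = 4096 bytes) later
      Addr b = a + 32'h0000_1000;
      $display("a: tag=%h idx=%h", getTag(a), getIndex(a));
      $display("b: tag=%h idx=%h", getTag(b), getIndex(b));
      $finish;
   endrule
endmodule
```

Both addresses map to the same index but carry different tags, so alternating accesses to a and b evict each other on every reference.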
Data Cache – code structure
module mkDCache(DCache);
  // state declarations
  Vector#(Rows, Reg#(Bool)) vArray <- replicateM(mkReg(False));
  …
  rule doMiss … endrule
  method Action req(MemReq r) … endmethod
  method ActionValue#(MemResp) resp … endmethod
  method ActionValue#(MemReq) memReq … endmethod
  method Action memResp(MemResp r) … endmethod
endmodule
Data Cache state declarations
Vector#(Rows, Reg#(Bool)) vArray <- replicateM(mkReg(False));
Vector#(Rows, Reg#(Tag)) tagArray <- replicateM(mkRegU);
Vector#(Rows, Reg#(Data)) dataArray <- replicateM(mkRegU);
FIFOF#(MemReq) mReqQ <- mkUGFIFOF1;
FIFOF#(MemResp) mRespQ <- mkUGFIFOF1;
PipeReg#(MemReq) hitQ <- mkPipeReg;
Reg#(MemReq) missReq <- mkRegU;
Reg#(Bit#(2)) status <- mkReg(0);
Data Cache processor-side methods
method Action req(MemReq req) if (status==0);
  Index idx = truncate(req.addr>>2);
  Tag tag = truncateLSB(req.addr);
  Bool valid = vArray[idx];
  Bool tagMatch = tagArray[idx]==tag;
  // hit: queue the request; otherwise save it and start miss handling
  if(valid && tagMatch && hitQ.notFull)
    hitQ.enq(req);
  else begin
    missReq <= req;
    status <= 1;
  end
endmethod

method ActionValue#(MemResp) resp if(hitQ.notEmpty && status==0);
  hitQ.deq;
  let r = hitQ.first;
  Index idx = truncate(r.addr>>2);
  // a store updates the line; the value read is the one held before the update
  if(r.op==St) dataArray[idx] <= r.data;
  return dataArray[idx];
endmethod
Data Cache memory-side methods
method ActionValue#(MemReq) memReq if (mReqQ.notEmpty);
  mReqQ.deq;
  return mReqQ.first;
endmethod

method Action memResp(MemResp res) if (mRespQ.notFull);
  mRespQ.enq(res);
endmethod
Data Cache
Rule to process a cache-miss
rule doMiss (status!=0);
  Index idx = truncate(missReq.addr>>2);
  // status 1: if the line being replaced is valid, write it back to memory
  if(status==1 && mReqQ.notFull) begin
    if(vArray[idx])
      mReqQ.enq(MemReq{op:St, addr:{tagArray[idx],idx,2'b00}, data:dataArray[idx]});
    status <= 2;
  end
  // status 2: consume the write-back response (if any), then request the missing line
  if(status==2 && mReqQ.notFull && (!vArray[idx] || mRespQ.notEmpty)) begin
    if(vArray[idx]) mRespQ.deq;
    mReqQ.enq(MemReq{op:Ld, addr:missReq.addr, data:?});
    status <= 3;
  end
Data Cache
Rule to process a cache-miss
rule doMiss (status!=0);
  …
  // status 3: fill the line with the returned data and replay the request as a hit
  if(status==3 && mRespQ.notEmpty && hitQ.notFull) begin
    let data = mRespQ.first;
    mRespQ.deq;
    Tag tag = truncateLSB(missReq.addr);
    vArray[idx] <= True;
    tagArray[idx] <= tag;
    dataArray[idx] <= data;
    hitQ.enq(missReq);
    status <= 0;
  end
endrule
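The status register acts as a four-state machine encoded as the numbers 0 to 3. One way to make the steps easier to read is to name them with an enum; the names below (Ready, WriteBack, FillReq, FillResp) and the demo module are hypothetical and not part of the lecture code. With deriving Bits the enum is encoded as 0, 1, 2, 3 in declaration order, matching the numbers used in doMiss.

```bsv
// hypothetical names for status values 0, 1, 2, 3
typedef enum { Ready, WriteBack, FillReq, FillResp } CacheStatus
   deriving (Bits, Eq, FShow);

module mkStatusDemo(Empty);
   Reg#(CacheStatus) status <- mkReg(Ready);

   // walk through the miss-handling sequence once, printing each state
   rule step;
      $display("status = ", fshow(status));
      case (status)
         Ready:     status <= WriteBack;  // a miss is detected in req
         WriteBack: status <= FillReq;    // victim line written back (if valid)
         FillReq:   status <= FillResp;   // load request sent to memory
         FillResp:  $finish;              // the real rule returns to Ready here
      endcase
   endrule
endmodule
```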
Five-Stage SMIPS
[Figure: five-stage SMIPS datapath with PC, +4, Epoch, Inst Memory, Decode, Register File, Execute, Data Memory, pipeline registers fr, dr, er, and wbr, and the stall? signal]
In this organization memory can take any
amount of time
Five-Stage SMIPS state elements
module mkProc(Proc);
  Reg#(Addr) pc <- mkRegU;
  Reg#(Bool) epoch <- mkRegU;
  RFile rf <- mkRFile;

  Memory mem <- mkTwoPortedMemory;
  let iMem = mem.iport;
  let dMem = mem.dport;

  PipeReg#(FBundle) fr <- mkPipeReg;
  PipeReg#(DBundle) dr <- mkPipeReg;
  PipeReg#(EBundle) er <- mkPipeReg;
  PipeReg#(WBBundle) wbr <- mkPipeReg;

  rule doProc;
  …
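PipeReg and mkPipeReg are not defined on these slides. Assuming a PipeReg behaves like a one-element pipeline FIFO (a stage can be refilled in the same cycle it is drained), the standard library's mkPipelineFIFOF from the SpecialFIFOs package can serve as a stand-in for experimentation. The sketch below is only that stand-in, with a hypothetical module name and rules that rely on the FIFO's implicit guards rather than the explicit notFull/notEmpty checks used in the lecture code.

```bsv
import FIFOF::*;
import SpecialFIFOs::*;

module mkPipeRegDemo(Empty);
   // one-element FIFO that allows deq and enq in the same cycle when full
   FIFOF#(Bit#(8)) stage <- mkPipelineFIFOF;
   Reg#(Bit#(8)) n <- mkReg(0);

   // upstream stage: pushes a new value whenever the register can accept one
   rule produce;
      stage.enq(n);
      n <= n + 1;
   endrule

   // downstream stage: drains the register and stops after a few values
   rule consume;
      $display("got %d", stage.first);
      stage.deq;
      if (stage.first == 3) $finish;
   endrule
endmodule
```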
Five-Stage SMIPS instruction fetch
rule doProc;
  Bool iAcc = False;
  if(fr.notFull && iMem.notFull) begin
    iMem.req(MemReq{op:Ld, addr:pc, data:?});
    iAcc = True;
    fr.enq(FBundle{pc:pc, epoch:epoch});
  end
Enqueue the instruction fetch request.
If the request cannot be enqueued, we must remember not to change the pc to pc+4 (iAcc).
Five-Stage SMIPS decode
  if(fr.notEmpty && dr.notFull && iMem.notEmpty) begin
    let dInst = decode(iMem.resp);
    dr.enq(DBundle{pc:fr.first.pc, epoch:fr.first.epoch, dInst:dInst});
    fr.deq;
    iMem.deq;
  end
Decode the fetched instruction.
Five-Stage SMIPS code structure for killing wrongly fetched instructions
  Addr redirPC = ?;
  Bool redirPCvalid = False;
  if(dr.notEmpty && er.notFull && (!memType(dr.first.dInst.iType) || dMem.notFull)) begin
    // the epoch to check is that of the instruction about to execute (dr.first)
    if(dr.first.epoch==epoch) begin
      Bool eStall = …
      Bool wbStall = …
      if(!eStall && !wbStall) begin   // successful
        let eInst = exec(dInst, …);
        if(memType(eInst.iType)) dMem.req(…);
        if(eInst.brTaken) begin redirPC … end
        er.enq(EBundle{…});
        dr.deq;
      end
    end
    else dr.deq;   // kill
  end
Five-Stage SMIPS stall signal
  Addr redirPC = ?;
  Bool redirPCvalid = False;
  if(dr.notEmpty && er.notFull && (!memType(dr.first.dInst.iType) || dMem.notFull)) begin
    if(dr.first.epoch==epoch) begin
      let dInst = dr.first.dInst;
      // stall if the instruction in er will write a register that dInst reads
      Bool eStall = er.notEmpty && er.first.rDstValid &&
                    ((dInst.rSrc1Valid && dInst.rSrc1==er.first.rDst) ||
                     (dInst.rSrc2Valid && dInst.rSrc2==er.first.rDst));
      // same check against the instruction in wbr
      Bool wbStall = wbr.notEmpty && wbr.first.rDstValid &&
                     ((dInst.rSrc1Valid && dInst.rSrc1==wbr.first.rDst) ||
                      (dInst.rSrc2Valid && dInst.rSrc2==wbr.first.rDst));
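The eStall and wbStall expressions apply the same read-after-write check against two different pipeline registers. A small sketch of that check as a pure function is shown below; the type SrcRegs, the 5-bit register index RIndx, and the function name readsReg are assumptions introduced for illustration and do not appear in the lecture code.

```bsv
typedef Bit#(5) RIndx;   // assumed 5-bit register index

// hypothetical bundle of the source-register fields of a decoded instruction
typedef struct {
   Bool  rSrc1Valid;
   RIndx rSrc1;
   Bool  rSrc2Valid;
   RIndx rSrc2;
} SrcRegs deriving (Bits, Eq);

// True when a valid source register matches a valid pending destination
function Bool readsReg(SrcRegs s, Bool dstValid, RIndx dst);
   return dstValid &&
          ((s.rSrc1Valid && s.rSrc1 == dst) ||
           (s.rSrc2Valid && s.rSrc2 == dst));
endfunction

module mkHazardDemo(Empty);
   rule check;
      // instruction reads r3 only; rSrc2 is marked invalid
      SrcRegs s = SrcRegs { rSrc1Valid: True, rSrc1: 3,
                            rSrc2Valid: False, rSrc2: 7 };
      $display("pending dst r3: ", fshow(readsReg(s, True, 3)));
      $display("pending dst r7: ", fshow(readsReg(s, True, 7)));
      $display("no pending dst: ", fshow(readsReg(s, False, 3)));
      $finish;
   endrule
endmodule
```

Under these assumptions, eStall would correspond to er.notEmpty combined with readsReg applied to er.first, and wbStall to the same check applied to wbr.first.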
Five-Stage SMIPS Execute if not stall
      if(!eStall && !wbStall) begin   // successful execution
        Data rVal1 = rf.rd1(dInst.rSrc1);
        Data rVal2 = rf.rd2(dInst.rSrc2);
        let eInst = exec(dInst, rVal1, rVal2, dr.first.pc);
        if(memType(eInst.iType))
          dMem.req(MemReq{op:eInst.iType==Ld ? Ld : St, addr:eInst.addr, data:eInst.data});
        if(eInst.brTaken) begin
          redirPC = eInst.addr;
          redirPCvalid = True;
        end
        er.enq(EBundle{iType:eInst.iType, rDst:eInst.rDst, data:eInst.data});
        dr.deq;
      end
    end
    else dr.deq;
  end
Five-Stage SMIPS execute and memory responses to WB
  if(er.notEmpty && wbr.notFull && (!memType(er.first.iType) || dMem.notEmpty)) begin
    wbr.enq(WBBundle{iType:er.first.iType, rDst:er.first.rDst,
                     data:er.first.iType==Ld ? dMem.resp : er.first.data});
    er.deq;
    if(dMem.notEmpty) dMem.deq;
  end
Five-Stage SMIPS writeback
  if(wbr.notEmpty) begin
    if(regWriteType(wbr.first.iType)) rf.wr(wbr.first.rDst, wbr.first.data);
    wbr.deq;
  end
  pc <= redirPCvalid ? redirPC : iAcc ? pc + 4 : pc;
  epoch <= redirPCvalid ? !epoch : epoch;
endrule
endmodule
Summary
There is a lot of room for making errors: verification and testing are essential
Memory systems, or dealing with load latencies, are a major aspect of computer design
The 5-stage design presented here is different from H&P and is much more realistic
Next: Branch prediction