caches and memory - cornell university · caches and memory anne bracy cs 3410 computer science...
Post on 08-Feb-2021
-
Caches and Memory
Anne Bracy
CS 3410
Computer Science
Cornell University
See P&H Chapter: 5.1-5.4, 5.8, 5.10, 5.13, 5.15, 5.17
Slides by Anne Bracy with 3410 slides by Professors Weatherspoon, Bala, McKee, and Sirer.
-
Programs 101
Load/Store Architectures:
• Read data from memory (put in registers)
• Manipulate it
• Store it back to memory
int main (int argc, char* argv[ ]) {
  int i;
  int m = n;
  int sum = 0;
  for (i = 1; i
-
1 Cycle Per Stage: the Biggest Lie (So Far)
[Figure: five-stage pipelined datapath (Instruction Fetch, Instruction Decode, Execute, Memory, Write-Back) with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB, the PC and +4 adder, register file, immediate extend, control, ALU, jump/branch target computation, forward unit, hazard detection, and two memory blocks (instruction memory; data memory with addr, din, dout)]
Stack, Data, Code Stored in Memory
Code Stored in Memory (also, data and stack)
-
What's the problem?
CPU
Main Memory
+ big
– slow
– far away
[Figure: Sandy Bridge Motherboard, 2011. Source: http://news.softpedia.com]
-
The Need for Speed
[Figure: CPU Pipeline]
Instruction speeds:
• add, sub, shift: 1 cycle
• mult: 3 cycles
• load/store: 100 cycles
  off-chip 50(-70) ns
  2(-3) GHz processor → 0.5 ns clock
-
What's the solution?
Caches!
What lucky data gets to go here?
[Figure: Intel Pentium 3, 1999, with Level 2 $, Level 1 Data $, and Level 1 Insn $ labeled]
-
Locality Locality Locality
If you ask for something, you're likely to ask for:
• the same thing again soon → Temporal Locality
• something near that thing, soon → Spatial Locality

total = 0;
for (i = 0; i < n; i++)
  total += a[i];
return total;
-
Your life is full of Locality
Last Called, Speed Dial, Favorites, Contacts, Google/Facebook/email
-
The Memory Hierarchy
Intel Haswell Processor, 2013
(Small, Fast at the top; Big, Slow at the bottom)
Registers:    1 cycle, 128 bytes
L1 Caches:    4 cycles, 64 KB
L2 Cache:     12 cycles, 256 KB
L3 Cache:     36 cycles, 2-20 MB
Main Memory:  50-70 ns, 512 MB – 4 GB
Disk:         5-20 ms, 16 GB – 4 TB
-
Some Terminology
Cache hit
• data is in the Cache
• t_hit: time it takes to access the cache
• Hit rate (%hit): # cache hits / # cache accesses
Cache miss
• data is not in the Cache
• t_miss: time it takes to get the data from below the $
• Miss rate (%miss): # cache misses / # cache accesses
-
The Memory Hierarchy
Intel Haswell Processor, 2013
Registers:    1 cycle, 128 bytes
L1 Caches:    4 cycles, 64 KB
L2 Cache:     12 cycles, 256 KB
L3 Cache:     36 cycles, 2-20 MB
Main Memory:  50-70 ns, 512 MB – 4 GB
Disk:         5-20 ms, 16 GB – 4 TB
average access time
t_avg = t_hit + %miss * t_miss
      = 4 + 5% x 100
      = 9 cycles
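The average access time formula above can be sketched as a one-line C function. The function name `amat` and its parameter names are illustrative, not from the slides.

```c
/* Average memory access time: t_avg = t_hit + miss_rate * t_miss.
 * All times must use the same unit (cycles here); miss_rate is a fraction. */
double amat(double t_hit, double miss_rate, double t_miss) {
    return t_hit + miss_rate * t_miss;
}
```

With the slide's numbers, `amat(4, 0.05, 100)` evaluates 4 + 0.05 x 100, i.e. 9 cycles.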
-
Single Core Memory Hierarchy
Registers, L1 Caches, L2 Cache, L3 Cache, Main Memory, Disk
[Figure: one Processor containing Regs, I$, D$, and L2 (ON CHIP), backed by Main Memory and Disk]
-
Multi-Core Memory Hierarchy
Registers, L1 Caches, L2 Cache, L3 Cache, Main Memory, Disk
[Figure: four Processors, each with its own Regs, I$, D$, and L2, plus an L3 (ON CHIP), backed by Main Memory and Disk]
-
Memory Hierarchy by the Numbers
CPU clock rates ~0.33 ns – 2 ns (3 GHz - 500 MHz)

Memory technology | Transistor count*             | Access time | Access time in cycles | $ per GiB in 2012 | Capacity
SRAM (on chip)    | 6-8 transistors               | 0.5-2.5 ns  | 1-3 cycles            | $4k               | 256 KB
SRAM (off chip)   |                               | 1.5-30 ns   | 5-15 cycles           | $4k               | 32 MB
DRAM              | 1 transistor (needs refresh)  | 50-70 ns    | 150-200 cycles        | $10-$20           | 8 GB
SSD (Flash)       |                               | 5k-50k ns   | Tens of thousands     | $0.75-$1          | 512 GB
Disk              |                               | 5M-20M ns   | Millions              | $0.05-$0.1        | 4 TB

* Registers, D-Flip Flops: 10-100's of registers
-
Basic Cache Design
Direct Mapped Caches
-
16 Byte Memory
MEMORY
addr data   addr data   addr data   addr data
0000 A      0100 E      1000 J      1100 N
0001 B      0101 F      1001 K      1101 O
0010 C      0110 G      1010 L      1110 P
0011 D      0111 H      1011 M      1111 Q
• Byte-addressable memory
• 4 address bits → 16 bytes total
• b addr bits → 2^b bytes in memory
load 0x1100 → r1   (here 1100 is the 4-bit binary address)
-
4-Byte, Direct Mapped Cache
MEMORY: the 16-byte memory (addresses 0000-1111 hold A-Q, with no I)
CACHE
index  data
00     A
01     B
10     C
11     D
← Cache entry = row = (cache) line = (cache) block
Block Size: 1 byte
Address: XXXX, where the low 2 bits are the index
Direct mapped:
• Each address maps to 1 cache block
• 4 entries → 2 index bits (2^n entries → n bits)
Index with LSB:
• Supports spatial locality
-
Analogy to a Spice Rack
Spice Wall (Memory): A, B, C, D, E, F, … Z
Spice Rack (Cache): index | spice
• Compared to your spice wall
  – Smaller
  – Faster
  – More costly (per oz.)
[Image source: http://www.bedbathandbeyond.com]
-
Analogy to a Spice Rack
Spice Wall (Memory): A, B, C, D, E, F, … Z
Spice Rack (Cache): index | tag | spice
[Figure: a jar of Cinnamon]
• How do you know what's in the jar?
• Need labels
Tag = Ultra-minimalist label
-
4-Byte, Direct Mapped Cache
MEMORY: the 16-byte memory (addresses 0000-1111 hold A-Q, with no I)
CACHE
index  tag  data
00     00   A
01     00   B
10     00   C
11     00   D
Address: XXXX = tag | index
Tag: minimalist label/address
address = tag + index
-
4-Byte, Direct Mapped Cache
MEMORY: the 16-byte memory (addresses 0000-1111 hold A-Q, with no I)
CACHE
index  V  tag  data
00     0  00   X
01     0  00   X
10     0  00   X
11     0  00   X
One last tweak: valid bit
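The tag, index, and valid bit fit together as in the following minimal sketch of a lookup for the 4-entry, 1-byte-block cache. The type `Line` and the helpers `addr_index`, `addr_tag`, and `is_hit` are hypothetical names chosen for illustration.

```c
#include <stdbool.h>

/* 4 lines -> 2 index bits; the remaining address bits form the tag. */
enum { INDEX_BITS = 2, NUM_LINES = 1 << INDEX_BITS };

typedef struct {
    bool valid;      /* V: does this line hold real data? */
    unsigned tag;    /* high address bits of the cached byte */
    char data;       /* 1-byte block */
} Line;

/* Low INDEX_BITS bits of the address select the line. */
unsigned addr_index(unsigned addr) { return addr & (NUM_LINES - 1); }

/* The bits above the index are the tag. */
unsigned addr_tag(unsigned addr) { return addr >> INDEX_BITS; }

/* A lookup hits only if the indexed line is valid and its tag matches. */
bool is_hit(const Line *cache, unsigned addr) {
    const Line *line = &cache[addr_index(addr)];
    return line->valid && line->tag == addr_tag(addr);
}
```

For the binary address 1100 this yields index 00 and tag 11, so the lookup hits only after line 00 has been filled with tag 11.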
-
Simulation #1 of a 4-byte, DM Cache
MEMORY: the 16-byte memory (addresses 0000-1111 hold A-Q, with no I)
Address: XXXX = tag | index
Lookup:
• Index into $
• Check tag
• Check valid bit
CACHE (initially): lines 00, 01, 10, 11 all have V = 0
load 0x1100   Miss   (line 00 becomes V = 1, tag = 11, data = N)
...
load 0x1100   Hit!
Awesome!
-
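Block Diagram: 4-entry, direct mapped Cache
Address: tag | index = 11 | 01 (2 bits each)
CACHE
index  V  tag  data
00     1  00   1111 0000
01     1  11   1010 0101
10     0  01   1010 1010
11     1  11   0000 0000
[Figure: the 2-bit index selects a line; its 2-bit tag is compared (=) with the address tag and combined with V to produce Hit!; the 8-bit data 1010 0101 is read out]
Great! Are we done?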
-
Simulation #2: 4-byte, DM Cache
MEMORY: the 16-byte memory (addresses 0000-1111 hold A-Q, with no I)
Address: XXXX = tag | index
Lookup:
• Index into $
• Check tag
• Check valid bit
CACHE (initially): lines 00, 01, 10, 11 all have V = 0
load 0x1100   Miss (cold)   line 00 becomes V = 1, tag = 11, data = N
load 0x1101   Miss (cold)   line 01 becomes V = 1, tag = 11, data = O
load 0x0100   Miss (cold)   line 00 becomes tag = 01, data = E (N is evicted)
load 0x1100   Miss          line 00 no longer holds tag 11
Disappointed! :(
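The miss count in Simulation #2 can be reproduced with a small direct-mapped simulator. This is a sketch under stated assumptions: the function name `dm_count_hits` and the `MAX_LINES` limit are hypothetical, only valid bits and tags are modeled (no data), and `offset_bits` is included so the same function also covers larger blocks.

```c
#include <stdbool.h>
#include <stddef.h>

enum { MAX_LINES = 16 };   /* sketch limit: index_bits must be <= 4 */

/* Count hits for a direct-mapped cache with 2^index_bits lines and
 * 2^offset_bits bytes per block. */
int dm_count_hits(const unsigned *addrs, size_t n,
                  unsigned index_bits, unsigned offset_bits) {
    bool valid[MAX_LINES] = { false };
    unsigned tags[MAX_LINES] = { 0 };
    int hits = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned block = addrs[i] >> offset_bits;           /* drop the block offset */
        unsigned index = block & ((1u << index_bits) - 1);  /* low bits pick the line */
        unsigned tag   = block >> index_bits;               /* remaining high bits */
        if (valid[index] && tags[index] == tag) {
            hits++;
        } else {
            /* Miss: fill the line, evicting whatever was there. */
            valid[index] = true;
            tags[index] = tag;
        }
    }
    return hits;
}
```

For the trace 1100, 1101, 0100, 1100 (binary), calling it with `index_bits = 2` and `offset_bits = 0` gives 0 hits, matching the four misses above; `offset_bits = 1` models 2-byte blocks.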
-
Reducing Cold Misses by Increasing Block Size
Leveraging Spatial Locality
-
Increasing Block Size
MEMORY: the 16-byte memory (addresses 0000-1111 hold A-Q, with no I)
CACHE
index  V  tag  data
00     0  x    A|B
01     0  x    C|D
10     0  x    E|F
11     0  x    G|H
Address: XXXX, where the lowest bit is the offset
• Block Size: 2 bytes
• Block Offset: least significant bits indicate where you live in the block
• Which bits are the index? tag?
-
Simulation #3: 8-byte, DM Cache
MEMORY: the 16-byte memory (addresses 0000-1111 hold A-Q, with no I)
Address: XXXX = tag | index | offset (1 bit, 2 bits, 1 bit)
Lookup:
• Index into $
• Check tag
• Check valid bit
CACHE (initially): lines 00, 01, 10, 11 all have V = 0, data = X|X
load 0x1100   Miss (cold)       line 10 becomes V = 1, tag = 1, data = N|O
load 0x1101   Hit!
load 0x0100   Miss (cold)       line 10 becomes tag = 0, data = E|F (N|O is evicted)
load 0x1100   Miss (conflict)
1 hit, 3 misses
3 bytes don't fit in an 8 byte cache?
-
Removing Conflict Misses with Fully-Associative Caches
-
8 byte, fully-associative Cache
MEMORY: the 16-byte memory (addresses 0000-1111 hold A-Q, with no I)
CACHE: 4 lines, each with V, tag, data (initially V = 0, tag = xxx, data = X|X)
What should the offset be?
What should the index be?
What should the tag be?
Address: XXXX = tag | offset (3 bits, 1 bit)
-
Simulation #4: 8-byte, FA Cache
MEMORY: the 16-byte memory (addresses 0000-1111 hold A-Q, with no I)
Address: XXXX = tag | offset
Lookup:
• Index into $
• Check tags
• Check valid bits
CACHE (initially): 4 lines, all V = 0, tag = xxx, data = X|X; an LRU Pointer marks the line to fill next
load 0x1100   Miss   first line becomes V = 1, tag = 110, data = N|O
load 0x1101   Hit!
load 0x0100   Miss   second line becomes V = 1, tag = 010, data = E|F
load 0x1100   Hit!
-
Pros and Cons of Full Associativity
+ No more conflicts!
+ Excellent utilization!
But either:
Parallel Reads
– lots of reading!
Serial Reads
– lots of waiting
t_avg = t_hit + %miss * t_miss
      = 4 + 5% x 100 = 9 cycles
      = 6 + 3% x 100 = 9 cycles
-
Pros & Cons
                       Direct Mapped   Fully Associative
Tag Size               Smaller         Larger
SRAM Overhead          Less            More
Controller Logic       Less            More
Speed                  Faster          Slower
Price                  Less            More
Scalability            Very            Not Very
# of conflict misses   Lots            Zero
Hit Rate               Low             High
Pathological Cases     Common          ?
-
Reducing Conflict Misses with Set-Associative Caches
Not too conflict-y. Not too slow. … Just Right!
-
8 byte, 2-way set associative Cache
MEMORY: the 16-byte memory (addresses 0000-1111 hold A-Q, with no I)
CACHE
index  way 0: V tag data   way 1: V tag data
0             0 xx  E|F           0 xx  N|O
1             0 xx  C|D           0 xx  P|Q
What should the offset be?
What should the index be?
What should the tag be?
Address: XXXX = tag | index | offset (2 bits, 1 bit, 1 bit)
-
8 byte, 2-way set associative Cache
MEMORY: the 16-byte memory (addresses 0000-1111 hold A-Q, with no I)
Address: XXXX = tag | index | offset
Lookup:
• Index into $
• Check tag
• Check valid bit
CACHE (initially): sets 0 and 1, two ways each, all V = 0, data = X|X; an LRU Pointer marks the way to fill next
load 0x1100   Miss   set 0, way 0 becomes V = 1, tag = 11, data = N|O
load 0x1101   Hit!
load 0x0100   Miss   set 0, way 1 becomes V = 1, tag = 01, data = E|F
load 0x1100   Hit!
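Direct-mapped, set-associative, and fully-associative lookups differ only in how many sets and ways there are, so one sketch can cover all three. The names `Way` and `sa_count_hits`, the size limits, and the timestamp-based LRU bookkeeping are assumptions for illustration; the slides only show an LRU pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { MAX_SETS = 8, MAX_WAYS = 8 };   /* sketch limits */

typedef struct {
    bool valid;
    unsigned tag;
    unsigned long last_used;   /* 0 = never used; larger = more recent */
} Way;

/* Count hits for a cache with 2^index_bits sets, `ways` lines per set,
 * 2^offset_bits bytes per block, and LRU replacement within each set.
 * ways = 1 is direct mapped; index_bits = 0 is fully associative. */
int sa_count_hits(const unsigned *addrs, size_t n,
                  unsigned index_bits, unsigned ways, unsigned offset_bits) {
    Way cache[MAX_SETS][MAX_WAYS];
    memset(cache, 0, sizeof cache);
    int hits = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned block = addrs[i] >> offset_bits;
        unsigned set   = block & ((1u << index_bits) - 1);
        unsigned tag   = block >> index_bits;
        Way *victim = &cache[set][0];
        bool hit = false;
        for (unsigned w = 0; w < ways; w++) {
            Way *way = &cache[set][w];
            if (way->valid && way->tag == tag) {
                way->last_used = i + 1;   /* refresh recency on a hit */
                hit = true;
                break;
            }
            /* Track the least recently used way; unused ways (0) win first. */
            if (way->last_used < victim->last_used)
                victim = way;
        }
        if (hit) {
            hits++;
        } else {
            victim->valid = true;
            victim->tag = tag;
            victim->last_used = i + 1;
        }
    }
    return hits;
}
```

On the trace 1100, 1101, 0100, 1100 with 2-byte blocks, both the fully-associative configuration (`index_bits = 0, ways = 4`) and the 2-way configuration (`index_bits = 1, ways = 2`) give 2 hits, while the direct-mapped one (`index_bits = 2, ways = 1`) gives 1.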
-
Eviction Policies
Which cache line should be evicted from the cache to make room for a new line?
• Direct-mapped: no choice, must evict line selected by index
• Associative caches
  – Random: select one of the lines at random
  – Round-Robin: similar to random
  – FIFO: replace oldest line
  – LRU: replace line that has not been used in the longest time
-
Misses: the Three C's
• Cold (compulsory) Miss: never seen this address before
• Conflict Miss: cache associativity is too low
• Capacity Miss: cache is too small
-
Miss Rate vs. Block Size
[Figure: plot of miss rate vs. block size]
-
Block Size Tradeoffs
• For a given total cache size, larger block sizes mean…
  – fewer lines
  – so fewer tags, less overhead
  – and fewer cold misses (within-block "prefetching")
• But also…
  – fewer blocks available (for scattered accesses!)
  – so more conflicts
  – can decrease performance if working set can't fit in $
  – and larger miss penalty (time to fetch block)
-
Miss Rate vs. Associativity
[Figure: plot of miss rate vs. associativity]
-
ABCs of Caches
t_avg = t_hit + %miss * t_miss
+ Associativity: ⬇ conflict misses :)  ⬆ hit time :(
+ Block Size: ⬇ cold misses :)  ⬆ conflict misses :(
+ Capacity: ⬇ capacity misses :)  ⬆ hit time :(
-
Which caches get what properties?
t_avg = t_hit + %miss * t_miss
L1 Caches: Fast. Design with speed in mind.
L2 Cache
L3 Cache: Big. Design with miss rate in mind.
From L1 toward L3: More Associative, Bigger Block Sizes, Larger Capacity
-
Roadmap
• Things we have covered:
  – The Need for Speed
  – Locality to the Rescue!
  – Calculating average memory access time
  – $ Misses: Cold, Conflict, Capacity
  – $ Characteristics: Associativity, Block Size, Capacity
• Things we will now cover:
  – Cache Figures
  – Cache Performance Examples
  – Writes
-
More Slides Coming…
-
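2-Way Set Associative Cache (Reading)
[Figure: the address is split into Tag, Index, Offset; the index selects a set, two tag comparators (= =) produce hit? and line select, and the offset drives word select to pick 32 bits of data out of a 64-byte line]
-
3-Way Set Associative Cache (Reading)
[Figure: same structure with three tag comparators (= = =); Tag, Index, Offset address fields; 64-byte lines; 32 bits of data out]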
-
How Big is the Cache?
Address fields: Tag | Index | Offset
n bit index, m bit offset, N-way Set Associative
Question: How big is cache?
• Data only? (what we usually mean when we ask "how big" is the cache)
• Data + overhead?
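One way to work the sizing question is sketched below. The function names are hypothetical, and the overhead count assumes one valid bit plus the tag per line and nothing else (no dirty or LRU bits), which the slide does not specify.

```c
/* Data capacity in bytes: 2^n sets * N ways * 2^m bytes per block. */
unsigned long cache_data_bytes(unsigned n, unsigned m, unsigned ways) {
    return (1ul << n) * ways * (1ul << m);
}

/* Overhead in bits, ASSUMING per line: 1 valid bit + a tag of
 * (addr_bits - n - m) bits. Dirty and LRU bits are ignored here. */
unsigned long cache_overhead_bits(unsigned addr_bits, unsigned n,
                                  unsigned m, unsigned ways) {
    unsigned long lines = (1ul << n) * ways;
    unsigned tag_bits = addr_bits - n - m;
    return lines * (tag_bits + 1);
}
```

For example, a direct-mapped cache with 512 lines of 64 bytes (n = 9, m = 6, N = 1) holds 32768 bytes of data; with 32-bit addresses the assumed overhead is 512 x (17 + 1) = 9216 bits.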
-
Cache Performance Example
t_avg = t_hit + %miss * t_miss
t_avg = for accessing 16 words?
Memory Parameters (very simplified):
• Main Memory: 4 GB
  – Data cost: 50 cycles for first word, plus 3 cycles per subsequent word
• L1: 512 x 64 byte cache lines, direct mapped
  – Data cost: 3 cycles per word access
  – Lookup cost: 2 cycles
Performance if %hit = 90%?
Performance if %hit = 95%?
Note: here t_hit splits up lookup vs. data cost. Why are there two ways?
-
Performance Calculation with $ Hierarchy
t_avg = t_hit + %miss * t_miss
• Parameters
  – Reference stream: all loads
  – D$: t_hit = 1 ns, %miss = 5%
  – L2: t_hit = 10 ns, %miss = 20% (local miss rate)
  – Main memory: t_hit = 50 ns
• What is t_avg(D$) without an L2?
  – t_miss(D$) =
  – t_avg(D$) =
• What is t_avg(D$) with an L2?
  – t_miss(D$) =
  – t_avg(L2) =
  – t_avg(D$) =
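A sketch of how the blanks could be filled in, assuming the usual reading that each level's miss penalty is the average access time of the level below it and that main memory always hits. The function names are illustrative, not from the slides.

```c
/* t_avg = t_hit + miss_rate * t_miss, all in the same time unit. */
double amat(double t_hit, double miss_rate, double t_miss) {
    return t_hit + miss_rate * t_miss;
}

/* Two-level hierarchy: the D$ miss penalty is the L2's own average
 * access time, and the L2 miss penalty is the main memory time. */
double amat_two_level(double t_hit_l1, double miss_l1,
                      double t_hit_l2, double miss_l2, double t_mem) {
    double t_avg_l2 = amat(t_hit_l2, miss_l2, t_mem);
    return amat(t_hit_l1, miss_l1, t_avg_l2);
}
```

Under these assumptions, without an L2 the D$ sees 1 + 0.05 x 50 = 3.5 ns; with the L2, t_avg(L2) = 10 + 0.20 x 50 = 20 ns and t_avg(D$) = 1 + 0.05 x 20 = 2 ns.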
-
Performance Summary
Average memory access time (AMAT) depends on:
• cache architecture and size
• Hit and miss rates
• Access times and miss penalty

Cache design is a very complex problem:
• Cache size, block size (aka line size)
• Number of ways of set-associativity (1, N, ∞)
• Eviction policy
• Number of levels of caching, parameters for each
• Separate I-cache from D-cache, or Unified cache
• Prefetching policies / instructions
• Write policy
-
Takeaway
Direct Mapped → fast, but low hit rate
Fully Associative → higher hit cost, higher hit rate
Set Associative → middle ground

Line size matters. Larger cache lines can increase performance due to prefetching. BUT, they can also decrease performance if the working set size cannot fit in cache.

Cache performance is measured by the average memory access time (AMAT), which depends on cache architecture and size, but also on the access time for a hit, the miss penalty, and the hit rate.
-
What about Stores?
Where should you write the result of a store?
• If that memory location is in the cache?
  – Send it to the cache
  – Should we also send it to memory right away? (write-through policy)
  – Wait until we evict the block (write-back policy)
• If it is not in the cache?
  – Allocate the line (put it in the cache)? (write allocate policy)
  – Write it directly to memory without allocation? (no write allocate policy)
-
Cache Write Policies
Q: How to write data?
[Figure: CPU sends addr and data to the Cache (SRAM), which sits in front of Memory (DRAM)]
If data is already in the cache…
No-Write
• writes invalidate the cache and go directly to memory
Write-Through
• writes go to main memory and cache
Write-Back
• CPU writes only to cache
• cache writes to main memory later (when block is evicted)
-
Write Allocation Policies
Q: How to write data?
[Figure: CPU sends addr and data to the Cache (SRAM), which sits in front of Memory (DRAM)]
If data is not in the cache…
Write-Allocate
• allocate a cache line for new data (and maybe write-through)
No-Write-Allocate
• ignore cache, just go to main memory
-
Write-Through Stores
16 byte, byte-addressed memory
4 byte, fully-associative cache: 2-byte blocks, write-allocate
4 bit addresses: 3 bit tag, 1 bit offset
Register File: $0, $1, $2, $3
Cache: 2 lines, each with lru, V, tag, data
Memory (addr: value):
 0: 78    1: 29    2: 120   3: 123
 4: 71    5: 150   6: 162   7: 173
 8: 18    9: 21   10: 33   11: 28
12: 19   13: 200  14: 210  15: 225
Instructions:
LB $1 ← M[1]
LB $2 ← M[7]
SB $2 → M[0]
SB $1 → M[5]
LB $2 ← M[10]
SB $1 → M[5]
SB $1 → M[10]
Walkthrough (Write-Through, REF 1-7):
REF 1  LB $1 ← M[1]    Addr 0001  Miss  a line is filled with tag 000 (M[0..1] = 78, 29); $1 = 29
REF 2  LB $2 ← M[7]    Addr 0111  Miss  the other line is filled with tag 011 (M[6..7] = 162, 173); $2 = 173
REF 3  SB $2 → M[0]    Addr 0000  Hit   173 is written to the cache and to M[0]
REF 4  SB $1 → M[5]    Addr 0101  Miss  the LRU line (tag 011) is replaced by tag 010 (M[4..5] = 71, 150); 29 is written to the cache and to M[5]
REF 5  LB $2 ← M[10]   Addr 1010  Miss  the LRU line (tag 000) is replaced by tag 101 (M[10..11] = 33, 28); $2 = 33
REF 6  SB $1 → M[5]    Addr 0101  Hit   29 is written to the cache and to M[5]
REF 7  SB $1 → M[10]   Addr 1010  Hit   29 is written to the cache and to M[10]
Misses: 4  Hits: 3
-
How Many Memory References?
Write-through performance
• How many memory reads?
• How many memory writes?
• Overhead? Do we need a dirty bit?
-
Write-Through (REF 8, 9)
Instructions: ... SB $1 → M[5], LB $2 ← M[10], SB $1 → M[5], SB $1 → M[10], SB $1 → M[5], SB $1 → M[10]
REF 8  SB $1 → M[5]    Hit   29 is written to the cache and to M[5] again
REF 9  SB $1 → M[10]   Hit   29 is written to the cache and to M[10] again
Misses: 4  Hits: 5
-
Summary: Write Through
Write-through policy with write allocate
• Cache miss: read entire block from memory
• Write: write only updated item to memory
• Eviction: no need to write to memory
-
Next Goal: Write-Through vs. Write-Back
Can we also design the cache NOT to write all stores immediately to memory?
– Keep the current copy in cache, and update memory when data is evicted (write-back policy)
– Write-back all evicted lines?
  • No, only written-to blocks
-
Write-Back Meta-Data (Valid, Dirty Bits)
Line layout: V | D | Tag | Byte 1 | Byte 2 | … | Byte N
• V = 1 means the line has valid data
• D = 1 means the bytes are newer than main memory
• When allocating line:
  – Set V = 1, D = 0, fill in Tag and Data
• When writing line:
  – Set D = 1
• When evicting line:
  – If D = 0: just set V = 0
  – If D = 1: write-back Data, then set D = 0, V = 0
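The allocate, write, and evict rules above can be sketched as three small C functions. The struct `Line`, the function names, the 2-byte block size, and the mapping "block address = tag x block size" (true for a fully-associative cache, where the tag is the block number) are assumptions for illustration.

```c
#include <stdbool.h>
#include <string.h>

enum { BLOCK_BYTES = 2 };

typedef struct {
    bool valid;                        /* V */
    bool dirty;                        /* D */
    unsigned tag;                      /* block number (fully associative) */
    unsigned char data[BLOCK_BYTES];
} Line;

/* Allocate: set V = 1, D = 0, fill in tag and data from memory. */
void line_allocate(Line *line, unsigned tag, const unsigned char *mem) {
    line->valid = true;
    line->dirty = false;
    line->tag = tag;
    memcpy(line->data, mem + tag * BLOCK_BYTES, BLOCK_BYTES);
}

/* Write: update the cached byte only and set D = 1. */
void line_write(Line *line, unsigned offset, unsigned char value) {
    line->data[offset] = value;
    line->dirty = true;
}

/* Evict: write the block back only if it is dirty, then clear D and V.
 * Returns true if a write-back to memory happened. */
bool line_evict(Line *line, unsigned char *mem) {
    bool wrote_back = line->valid && line->dirty;
    if (wrote_back)
        memcpy(mem + line->tag * BLOCK_BYTES, line->data, BLOCK_BYTES);
    line->dirty = false;
    line->valid = false;
    return wrote_back;
}
```

Note that `line_write` never touches memory; memory only sees the new value when `line_evict` finds D = 1.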
-
Write-back Example
• Example: How does a write-back cache work?
• Assume write-allocate
-
Handling Stores (Write-Back)
16 byte, byte-addressed memory
4 byte, fully-associative cache: 2-byte blocks, write-allocate
4 bit addresses: 3 bit tag, 1 bit offset
Register File: $0, $1, $2, $3
Cache: 2 lines, each with lru, V, d, tag, data
Memory (addr: value):
 0: 78    1: 29    2: 120   3: 123
 4: 71    5: 150   6: 162   7: 173
 8: 18    9: 21   10: 33   11: 28
12: 19   13: 200  14: 210  15: 225
Instructions:
LB $1 ← M[1]
LB $2 ← M[7]
SB $2 → M[0]
SB $1 → M[5]
LB $2 ← M[10]
SB $1 → M[5]
SB $1 → M[10]
Walkthrough (Write-Back, REF 1-7):
REF 1  LB $1 ← M[1]    Addr 0001  Miss  a line is filled: V = 1, d = 0, tag 000; $1 = 29
REF 2  LB $2 ← M[7]    Addr 0111  Miss  the other line is filled: V = 1, d = 0, tag 011; $2 = 173
REF 3  SB $2 → M[0]    Addr 0000  Hit   173 is written to the cache only; the line with tag 000 gets d = 1
REF 4  SB $1 → M[5]    Addr 0101  Miss  the LRU line (tag 011, clean) is replaced by tag 010; 29 is written to the cache only; d = 1
REF 5  LB $2 ← M[10]   Addr 1010  Miss  the LRU line (tag 000, dirty) is written back to memory, then replaced by tag 101 with d = 0; $2 = 33
REF 6  SB $1 → M[5]    Addr 0101  Hit   the line with tag 010 stays dirty
REF 7  SB $1 → M[10]   Addr 1010  Hit   29 is written to the cache only; the line with tag 101 gets d = 1
Misses: 4  Hits: 3
-
Write-Back (REF 8, 9)
Instructions: ... SB $1 → M[5], LB $2 ← M[10], SB $1 → M[5], SB $1 → M[10], SB $1 → M[5], SB $1 → M[10]
REF 8  SB $1 → M[5]    Hit   cache only
REF 9  SB $1 → M[10]   Hit   cache only
Misses: 4  Hits: 5
-
How Many Memory References?
Write-back performance
• How many reads?
• How many writes?
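To explore these questions, the memory traffic of both policies can be counted with a small simulator of the example's 2-line, fully-associative, write-allocate cache with LRU replacement. This is a sketch: the types `Access` and `Stats`, the function `simulate`, and the choice to count write-through traffic in bytes and write-back traffic in blocks are assumptions, not the slides' accounting.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { bool is_store; unsigned addr; } Access;

typedef struct {
    int hits, misses;
    int block_reads;       /* blocks fetched from memory on misses */
    int byte_writes;       /* write-through: one memory write per store */
    int block_writebacks;  /* write-back: dirty blocks written on eviction */
} Stats;

enum { LINES = 2, OFFSET_BITS = 1 };   /* 2 lines of 2-byte blocks */

Stats simulate(const Access *trace, size_t n, bool write_back) {
    struct { bool valid, dirty; unsigned tag; size_t last_used; } c[LINES] = {{0}};
    Stats s = { 0, 0, 0, 0, 0 };
    for (size_t i = 0; i < n; i++) {
        unsigned tag = trace[i].addr >> OFFSET_BITS;
        int line = -1, victim = 0;
        for (int k = 0; k < LINES; k++) {
            if (c[k].valid && c[k].tag == tag) line = k;
            if (c[k].last_used < c[victim].last_used) victim = k;  /* LRU */
        }
        if (line >= 0) {
            s.hits++;
        } else {
            s.misses++;
            line = victim;
            if (c[line].valid && c[line].dirty) s.block_writebacks++;
            s.block_reads++;               /* write-allocate: fetch on any miss */
            c[line].valid = true;
            c[line].dirty = false;
            c[line].tag = tag;
        }
        c[line].last_used = i + 1;
        if (trace[i].is_store) {
            if (write_back) c[line].dirty = true;   /* defer the memory update */
            else s.byte_writes++;                   /* update memory right away */
        }
    }
    return s;
}
```

Dirty lines still in the cache when the trace ends are not counted as write-backs here; they would be written only when eventually evicted.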
-
Write-back vs. Write-through Example
Assume: large associative cache, 16-byte lines
for (i=1; i
-
So is write back just better?
Short Answer: Yes (fewer writes is a good thing)
Long Answer: It's complicated.
• Evictions require entire line be written back to memory (vs. just the data that was written)
• Write-back can lead to incoherent caches on multi-core processors (later lecture)
-
Cache Conscious Programming
// H = 12, W = 10
int A[H][W];

for(x=0; x < W; x++)
  for(y=0; y < H; y++)
    sum += A[y][x];

[Figure: access order 1, 2, 3, … 12 runs down a column of the matrix]
• Every access a cache miss!
• (unless entire matrix fits in cache)
-
Cache Conscious Programming
// H = 12, W = 10
int A[H][W];

for(y=0; y < H; y++)
  for(x=0; x < W; x++)
    sum += A[y][x];

[Figure: access order 1, 2, 3, … 13 … runs along each row of the matrix]
• Block size = 4 → 75% hit rate
• Block size = 8 → 87.5% hit rate
• Block size = 16 → 93.75% hit rate
• And you can easily prefetch to warm the cache
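The effect of loop order can be checked with a deliberately tiny model: a hypothetical cache that remembers only the most recently touched block. The function `traversal_hit_rate` is illustrative, and it assumes "block size" counts array elements and that the array starts on a block boundary.

```c
enum { H = 12, W = 10 };

/* Hit rate of visiting every element of A[H][W] once, with a cache that
 * holds only the last block touched. block_elems = elements per block;
 * row_major != 0 means y is the outer loop (x varies fastest). */
double traversal_hit_rate(int block_elems, int row_major) {
    long last_block = -1;   /* no block cached yet */
    int hits = 0;
    int outer_n = row_major ? H : W;
    int inner_n = row_major ? W : H;
    for (int outer = 0; outer < outer_n; outer++) {
        for (int inner = 0; inner < inner_n; inner++) {
            int y = row_major ? outer : inner;
            int x = row_major ? inner : outer;
            /* C stores A[y][x] at element offset y * W + x. */
            long block = (long)(y * W + x) / block_elems;
            if (block == last_block)
                hits++;
            last_block = block;
        }
    }
    return (double)hits / (H * W);
}
```

In this model the row-order loop gives 0.75 for 4-element blocks and 0.875 for 8-element blocks, while the column-order loop with 4-element blocks gives 0. With 16-element blocks the 120-element array is not a whole number of blocks, so this sketch yields 112/120 rather than the idealized 93.75%.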
-
By the end of the cache lectures…
-
Summary
• Memory performance matters!
  – often more than CPU performance
  – … because it is the bottleneck, and not improving much
  – … because most programs move a LOT of data
• Design space is huge
  – Gambling against program behavior
  – Cuts across all layers: users → programs → os → hardware
• NEXT: Multi-core processors are complicated
  – Inconsistent views of memory
  – Extremely complex protocols, very hard to get right