CS 240 Stage 3 Abstractions for Practical Systems
(cs240/f19/slides/cache.pdf)

Caching and the memory hierarchy
Operating systems and the process model
Virtual memory
Dynamic memory allocation
Victory lap

Memory Hierarchy: Cache

Memory hierarchy
Cache basics
Locality
Cache organization
Cache-aware programming

How does execution time grow with SIZE?

    int array[SIZE];
    fillArrayRandomly(array);
    int s = 0;
    for (int i = 0; i < 200000; i++) {
        for (int j = 0; j < SIZE; j++) {
            s += array[j];
        }
    }

[Plot: TIME (y-axis) versus SIZE (x-axis)]

Reality

[Plot: Time (y-axis, ticks 0 to 45) versus SIZE (x-axis, ticks 0 to 9000)]

Processor-Memory Bottleneck

[Diagram: CPU with registers, a cache, and main memory]

Processor performance doubled about every 18 months.
Bus bandwidth evolved much slower.

Example:
Processor side: bandwidth 256 bytes/cycle, latency 1 to a few cycles
Main memory side: bandwidth 2 bytes/cycle, latency 100 cycles

Solution: caches

Cache

English:
n. a hidden storage space for provisions, weapons, or treasures
v. to store away in hiding for future use

Computer Science:
n. a computer memory with short access time used to store frequently or recently used instructions or data
v. to store [data/instructions] temporarily for later quick retrieval

Also used more broadly in CS: software caches, file caches, etc.

General Cache Mechanics

[Diagram: CPU, a cache holding blocks 8, 9, 14, 3, and a memory holding blocks 0 to 15]

Block: unit of data in cache and memory (a.k.a. line).
Cache: smaller, faster, more expensive. Stores subset of memory blocks (lines).
Memory: larger, slower, cheaper. Partitioned into blocks (lines).
Data is moved in block units.

Cache Hit

[Diagram: CPU requests block 14; cache holds blocks 8, 9, 14, 3; memory holds blocks 0 to 15]

1. Request data in block b. (Request: 14)
2. Cache hit: block b is in cache.

Cache Miss

[Diagram: CPU requests block 12, which is not among the cached blocks 8, 9, 14, 3; it is fetched from memory (blocks 0 to 15)]

1. Request data in block b. (Request: 12)
2. Cache miss: block is not in cache.
3. Cache eviction: evict a block to make room, maybe store to memory.
4. Cache fill: fetch block from memory, store in cache.

Placement Policy: where to put block in cache
Replacement Policy: which block to evict

Locality: why caches work

Programs tend to use data and instructions at addresses near or equal to those they have used recently.

Temporal locality: Recently referenced items are likely to be referenced again in the near future.
Spatial locality: Items with nearby addresses are likely to be referenced close together in time.

How do caches exploit temporal and spatial locality?

Locality #1

    sum = 0;
    for (i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;

What is stored in memory?

Data:
Temporal: sum referenced in each iteration
Spatial: array a[] accessed in stride-1 pattern

Instructions:
Temporal: execute loop repeatedly
Spatial: execute instructions in sequence

Assessing locality in code is an important programming skill.

Locality #2

    int sum_array_rows(int a[M][N]) {
        int sum = 0;
        for (int i = 0; i < M; i++) {
            for (int j = 0; j < N; j++) {
                sum += a[i][j];
            }
        }
        return sum;
    }

Row-major M x N 2D array in C:

    a[0][0] a[0][1] a[0][2] a[0][3]
    a[1][0] a[1][1] a[1][2] a[1][3]
    a[2][0] a[2][1] a[2][2] a[2][3]

Access order (stride 1):

    1: a[0][0]   2: a[0][1]   3: a[0][2]   4: a[0][3]
    5: a[1][0]   6: a[1][1]   7: a[1][2]   8: a[1][3]
    9: a[2][0]  10: a[2][1]  11: a[2][2]  12: a[2][3]

Locality #3

    int sum_array_cols(int a[M][N]) {
        int sum = 0;
        for (int j = 0; j < N; j++) {
            for (int i = 0; i < M; i++) {
                sum += a[i][j];
            }
        }
        return sum;
    }

Row-major M x N 2D array in C:

    a[0][0] a[0][1] a[0][2] a[0][3]
    a[1][0] a[1][1] a[1][2] a[1][3]
    a[2][0] a[2][1] a[2][2] a[2][3]

Access order (stride N):

    1: a[0][0]   2: a[1][0]   3: a[2][0]
    4: a[0][1]   5: a[1][1]   6: a[2][1]
    7: a[0][2]   8: a[1][2]   9: a[2][2]
   10: a[0][3]  11: a[1][3]  12: a[2][3]

Locality #4

What is "wrong" with this code? How can it be fixed?

    int sum_array_3d(int a[M][N][N]) {
        int sum = 0;
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                for (int k = 0; k < M; k++) {
                    sum += a[k][i][j];
                }
            }
        }
        return sum;
    }
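
One possible fix, sketched under the assumption that only the traversal order matters: the innermost loop of sum_array_3d varies k, the first index, so consecutive accesses are N*N ints apart. Reordering the loops to match the index order a[k][i][j] gives a stride-1 inner loop. The name sum_array_3d_fixed and the sizes M = 4, N = 8 are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define M 4   /* hypothetical size, for illustration only */
#define N 8   /* hypothetical size, for illustration only */

/* Same sum as sum_array_3d, but the loop nest follows the index order
 * a[k][i][j], so the innermost loop walks consecutive ints in memory. */
int sum_array_3d_fixed(int a[M][N][N]) {
    assert(a != NULL);
    int sum = 0;
    for (int k = 0; k < M; k++) {
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                sum += a[k][i][j];   /* stride 1 */
            }
        }
    }
    return sum;
}
```

Integer addition is order-independent here, so the result is unchanged; only the memory access pattern differs.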

Cost of Cache Misses

Miss cost could be 100 × hit cost.

99% hits could be twice as good as 97%. How?
Assume cache hit time of 1 cycle, miss penalty of 100 cycles.

Mean access time:
97% hits: 1 cycle + 0.03 * 100 cycles = 4 cycles
99% hits: 1 cycle + 0.01 * 100 cycles = 2 cycles

Here 97% and 99% are hit rates; 0.03 and 0.01 are the corresponding miss rates.
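
The mean access time arithmetic above can be written as a small function. This is a minimal sketch; the name mean_access_time and its parameter names are illustrative, not from the slides.

```c
#include <assert.h>

/* Mean access time in cycles: every access pays the hit time, and the
 * fraction of accesses that miss additionally pays the miss penalty. */
double mean_access_time(double hit_time, double miss_rate, double miss_penalty) {
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    return hit_time + miss_rate * miss_penalty;
}
```

With a hit time of 1 cycle and a miss penalty of 100 cycles, miss rates of 0.03 and 0.01 give about 4 and 2 cycles respectively, matching the figures above.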

Cache Performance Metrics

Miss Rate
Fraction of memory accesses to data not in cache (misses / accesses).
Typically: 3% - 10% for L1; maybe < 1% for L2, depending on size, etc.

Hit Time
Time to find and deliver a block in the cache to the processor.
Typically: 1 - 2 clock cycles for L1; 5 - 20 clock cycles for L2.

Miss Penalty
Additional time required on cache miss = main memory access time.
Typically 50 - 200 cycles for L2 (trend: increasing!).

Memory hierarchy: why does it work?

From small, fast, power-hungry, expensive (top) to large, slow, power-efficient, cheap (bottom):

registers
L1 cache (SRAM, on-chip)
L2 cache (SRAM, on-chip)
L3 cache (SRAM, off-chip)
main memory (DRAM)
persistent storage (hard disk, flash, over network, cloud, etc.)

Diagram annotation: explicitly program-controlled

Cache Organization: Key Points

Block
Fixed-size unit of data in memory/cache

Placement Policy
Where in the cache should a given block be stored?
§ direct-mapped, set associative

Replacement Policy
What if there is no room in the cache for requested data?
§ least recently used, most recently used

Write Policy
When should writes update lower levels of memory hierarchy?
§ write back, write through, write allocate, no write allocate

Blocks

Divide address space into fixed-size aligned blocks (size is a power of 2).

A full byte address splits into a Block ID followed by an offset within the block.
Block ID: address bits - offset bits
Offset within block: log2(block size) bits

Example: block size = 8
Full byte address 00010010: Block ID 00010, offset 010

    Memory (byte) address   Block
    00000000                block 0
    00001000                block 1
    00010000                block 2
    00011000                block 3
    ...

Block 2 holds addresses 00010000, 00010001, 00010010, 00010011, 00010100, 00010101, 00010110, 00010111.

Remember withinSameBlock? (Pointers Lab)

Note: drawing address order differently from here on!
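
The Block ID and offset split can be sketched with integer arithmetic. The function names below are hypothetical (within_same_block is not the Pointers Lab's withinSameBlock, whose signature is not shown here), and 32-bit addresses are an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero if x is a power of two. */
static int is_power_of_two(uint32_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}

/* Block ID: the address with its low log2(block_size) bits dropped. */
uint32_t block_id(uint32_t addr, uint32_t block_size) {
    assert(is_power_of_two(block_size));
    return addr / block_size;
}

/* Offset within block: the low log2(block_size) bits of the address. */
uint32_t block_offset(uint32_t addr, uint32_t block_size) {
    assert(is_power_of_two(block_size));
    return addr & (block_size - 1);
}

/* Two addresses share a block exactly when their Block IDs are equal. */
int within_same_block(uint32_t a, uint32_t b, uint32_t block_size) {
    return block_id(a, block_size) == block_id(b, block_size);
}
```

For the example address 00010010 (0x12) with block size 8, block_id yields 2 and block_offset yields 2.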

Placement Policy

Memory: large, fixed number of block slots (Block IDs 0000 to 1111).
Cache: small, fixed number of block slots (S = # slots = 4; indices 00, 01, 10, 11).

Mapping: index(Block ID) = ???

Placement: Direct-Mapped

Memory: Block IDs 0000 to 1111.
Cache: S = # slots = 4; indices 00, 01, 10, 11.

Mapping: index(Block ID) = Block ID mod S
(easy for power-of-2 block sizes...)

Placement: mapping ambiguity

Memory: Block IDs 0000 to 1111.
Cache: S = # slots = 4; indices 00, 01, 10, 11.

Mapping: index(Block ID) = Block ID mod S

Which block is in slot 2?

Placement: Tags resolve ambiguity

Memory: Block IDs 0000 to 1111.
Cache: S slots; indices 00, 01, 10, 11. Each slot stores a Tag alongside its Data.
Tags shown for slots 00 through 11: 00, 11, 01, 01.

Mapping: index(Block ID) = Block ID mod S

Tag: Block ID bits not used for index.

Address = Tag, Index, Offset

An a-bit address (a = # address bits) splits into three fields, most significant first:

    Tag: (a-s-b) bits | Index: s bits | Offset: b bits

Example full byte address: 00010010

Block ID: address bits - offset bits
Offset: offset within block, log2(block size) = b bits. Where within a block?
Index: log2(# cache slots) = s bits. What slot in the cache?
Tag: Block ID bits - index bits. Disambiguates slot contents.
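
The three-field split can be sketched with shifts and masks. The struct AddrFields and the function split_address are hypothetical names; addresses are assumed to fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t tag;
    uint32_t index;
    uint32_t offset;
} AddrFields;

/* s = number of index bits, b = number of offset bits. */
AddrFields split_address(uint32_t addr, unsigned s, unsigned b) {
    assert(s + b < 32);
    AddrFields f;
    f.offset = addr & ((1u << b) - 1);         /* low b bits */
    f.index  = (addr >> b) & ((1u << s) - 1);  /* next s bits */
    f.tag    = addr >> (s + b);                /* all remaining high bits */
    return f;
}
```

For instance, with 8-byte blocks (b = 3) and 4 slots (s = 2), split_address(0x12, 2, 3) gives tag 0, index 2, offset 2 for the address 00010010.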

Placement: Direct-Mapped

Memory: Block IDs 0000 to 1111.
Cache: indices 00, 01, 10, 11.

Why not this mapping? index(Block ID) = Block ID / S
(still easy for power-of-2 block sizes...)

A puzzle.

Cache starts empty. Access (address, hit/miss) stream:

(10, miss), (11, hit), (12, miss)

What could the block size be?

block size >= 2 bytes (11 hits, so it shares a block with 10)
block size < 8 bytes (12 misses, so it is not in the same aligned block as 10)

Placement: direct mapping conflicts

What happens when accessing in repeated pattern: 0010, 0110, 0010, 0110, 0010...?

Memory: Block IDs 0000 to 1111.
Cache: indices 00, 01, 10, 11.

Cache conflict: every access suffers a miss, evicts cache line needed by next access.
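
The conflict can be reproduced with a tiny simulation. This is a sketch that tracks only which block each slot holds (no data), assuming S = 4 slots and the mapping Block ID mod S; count_misses_direct_mapped is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SLOTS 4   /* S = 4, as in the diagram */

/* Counts misses for a stream of Block IDs in a direct-mapped cache
 * that starts empty. Each slot remembers which block it holds. */
int count_misses_direct_mapped(const unsigned *block_ids, size_t n) {
    assert(block_ids != NULL || n == 0);
    int valid[NUM_SLOTS] = {0};
    unsigned stored[NUM_SLOTS] = {0};
    int misses = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned index = block_ids[i] % NUM_SLOTS;   /* Block ID mod S */
        if (!valid[index] || stored[index] != block_ids[i]) {
            misses++;                  /* miss: replace the slot's block */
            valid[index] = 1;
            stored[index] = block_ids[i];
        }
    }
    return misses;
}
```

Blocks 0010 and 0110 both map to index 10, so the stream 2, 6, 2, 6, 2 misses on every access, while a stream such as 2, 7, 2, 7 (indices 10 and 11) misses only twice.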

Placement: Set Associative

One index per set of block slots. Store block in any slot within set.

Mapping: index(Block ID) = Block ID mod S
S = # sets (previously: # slots in cache)

Configurations of an 8-block cache:
1-way: 8 sets, 1 block each (direct mapped)
2-way: 4 sets, 2 blocks each
4-way: 2 sets, 4 blocks each
8-way: 1 set, 8 blocks (fully associative)

Replacement policy: if set is full, what block should be replaced?
Common: least recently used (LRU), but hardware usually implements “not most recently used”.

Example: Tag, Index, Offset?

Direct-mapped, 4 slots, 2-byte blocks
4-bit Address: Tag | Index | Offset

tag bits ____
set index bits ____
block offset bits ____

index(1101) = ____

Example: Tag, Index, Offset?

E-way set-associative, S slots, 16-byte blocks
16-bit Address: Tag | Index | Offset

E = 1-way, S = 8 sets (sets 0 to 7):
tag bits ____  set index bits ____  block offset bits ____  index(0x1833) ____

E = 2-way, S = 4 sets (sets 0 to 3):
tag bits ____  set index bits ____  block offset bits ____  index(0x1833) ____

E = 4-way, S = 2 sets (sets 0 to 1):
tag bits ____  set index bits ____  block offset bits ____  index(0x1833) ____

Replacement Policy

If set is full, what block should be replaced?
Common: least recently used (LRU) (but hardware usually implements “not most recently used”).

Another puzzle: Cache starts empty, uses LRU. Access (address, hit/miss) stream:

(10, miss); (12, miss); (10, miss)

Associativity of cache?

12 is not in the same block as 10.
12’s block replaced 10’s block.
Direct-mapped cache.

General Cache Organization (S, E, B)

S sets
E lines per set (“E-way”)
Each block/line: valid bit v, tag, and B = 2^b bytes of data per cache line (the data block), bytes 0, 1, 2, ..., B-1

S, E, and B are powers of 2.

cache capacity: S x E x B data bytes
address size: t + s + b address bits
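
Given the capacity formula, the number of sets follows from the other three quantities. The helpers below are a sketch with hypothetical names (num_sets, log2_exact); they assume the capacity is an exact multiple of E x B.

```c
#include <assert.h>

/* Number of sets S, given capacity C = S x E x B data bytes. */
unsigned long num_sets(unsigned long capacity_bytes,
                       unsigned long lines_per_set,
                       unsigned long block_bytes) {
    assert(lines_per_set > 0 && block_bytes > 0);
    assert(capacity_bytes % (lines_per_set * block_bytes) == 0);
    return capacity_bytes / (lines_per_set * block_bytes);
}

/* log2 of a power of two: the number of address bits for that field. */
unsigned log2_exact(unsigned long x) {
    assert(x != 0 && (x & (x - 1)) == 0);
    unsigned bits = 0;
    while (x > 1) {
        x >>= 1;
        bits++;
    }
    return bits;
}
```

As an illustration, a 32 KB, 8-way cache with 64-byte blocks has num_sets(32 * 1024, 8, 64) = 64 sets, so s = 6 index bits and b = 6 offset bits.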

Cache Read

E = 2^e lines per set
S = 2^s sets
Each line: valid bit, tag, and B = 2^b bytes of data per cache line (the data block)

Address of byte in memory:

    tag: t bits | set index: s bits | block offset: b bits

1. Locate set by index.
2. Hit if any block in set is valid and has matching tag.
3. Get data at offset in block (data begins at this offset).
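
The read steps can be sketched as a lookup over the lines of one set. This simulation tracks only valid bits and tags, not data, and it assumes true LRU replacement implemented with timestamps, which is a simplification rather than what hardware does. The names Line, Cache, cache_init, and cache_access, and the limits MAX_SETS and MAX_WAYS, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_SETS 64   /* hypothetical upper bounds for this sketch */
#define MAX_WAYS 8

typedef struct {
    int valid;
    uint32_t tag;
    unsigned long last_used;   /* 0 means never used */
} Line;

typedef struct {
    unsigned s, b, ways;       /* index bits, offset bits, E */
    unsigned long now;         /* access counter used as an LRU clock */
    Line lines[MAX_SETS][MAX_WAYS];
} Cache;

void cache_init(Cache *c, unsigned s, unsigned b, unsigned ways) {
    assert(s <= 6 && s + b < 32 && ways >= 1 && ways <= MAX_WAYS);
    memset(c, 0, sizeof *c);
    c->s = s;
    c->b = b;
    c->ways = ways;
}

/* Returns 1 on a hit, 0 on a miss (after filling a line in the set). */
int cache_access(Cache *c, uint32_t addr) {
    uint32_t index = (addr >> c->b) & ((1u << c->s) - 1);  /* locate set */
    uint32_t tag = addr >> (c->s + c->b);
    Line *set = c->lines[index];
    unsigned victim = 0;
    c->now++;
    for (unsigned w = 0; w < c->ways; w++) {
        if (set[w].valid && set[w].tag == tag) {           /* valid + tag match */
            set[w].last_used = c->now;
            return 1;
        }
        /* Track the smallest timestamp: an unused line (0) if any, else the LRU line. */
        if (set[w].last_used < set[victim].last_used) {
            victim = w;
        }
    }
    set[victim].valid = 1;
    set[victim].tag = tag;
    set[victim].last_used = c->now;
    return 0;
}
```

With one set and 2-byte blocks (s = 0, b = 1), the stream 10, 12, 10 misses three times when ways is 1 but hits on the final access when ways is 2.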

Cache Read: Direct-Mapped (E = 1)

This cache:
• Block size: 8 bytes
• Associativity: 1 block per set (direct mapped)

S = 2^s sets; each line holds a valid bit v, a tag, and data bytes 0 to 7.

Address of int: t bits | 0…01 | 100

Find set.

Cache Read: Direct-Mapped (E = 1)

This cache:
• Block size: 8 bytes
• Associativity: 1 block per set (direct mapped)

Address of int: t bits | 0…01 | 100

valid? + match?: yes = hit
Block offset 100: int (4 Bytes) is here, in bytes 4 to 7 of the line.

If no match: old line is evicted and replaced.

Direct-Mapped Cache Practice

12-bit address
16 lines, 4-byte block size
Direct mapped

Address bits: 11 10 9 8 7 6 5 4 3 2 1 0
Offset bits? Index bits? Tag bits?

    Index  Tag  Valid  B0  B1  B2  B3
    0      19   1      99  11  23  11
    1      15   0      --  --  --  --
    2      1B   1      00  02  04  08
    3      36   0      --  --  --  --
    4      32   1      43  6D  8F  09
    5      0D   1      36  72  F0  1D
    6      31   0      --  --  --  --
    7      16   1      11  C2  DF  03
    8      24   1      3A  00  51  89
    9      2D   0      --  --  --  --
    A      2D   1      93  15  DA  3B
    B      0B   0      --  --  --  --
    C      12   0      --  --  --  --
    D      16   1      04  96  34  15
    E      13   1      83  77  1B  D3
    F      14   0      --  --  --  --

Addresses to practice with: 0x354, 0xA20

Example (E = 1)

Assume: cold (empty) cache, 3-bit set index, 5-bit offset. Block: 32 bytes = 4 doubles.
Locals in registers. Assume a is aligned such that &a[r][c] is aa...a rrrr cccc 000.
Address fields: tag aa...a rrr | set index r cc | offset cc 000

    int sum_array_rows(double a[16][16]) {
        double sum = 0;
        for (int r = 0; r < 16; r++) {
            for (int c = 0; c < 16; c++) {
                sum += a[r][c];
            }
        }
        return sum;
    }

Cache contents (one 32-byte block = 4 doubles per line):

    0,0 0,1 0,2 0,3
    0,4 0,5 0,6 0,7
    0,8 0,9 0,a 0,b
    0,c 0,d 0,e 0,f
    1,0 1,1 1,2 1,3
    1,4 1,5 1,6 1,7
    1,8 1,9 1,a 1,b
    1,c 1,d 1,e 1,f

4 misses per row of array
4 * 16 = 64 misses

    int sum_array_cols(double a[16][16]) {
        double sum = 0;
        for (int c = 0; c < 16; c++) {
            for (int r = 0; r < 16; r++) {
                sum += a[r][c];
            }
        }
        return sum;
    }

Blocks touched (32 bytes = 4 doubles each):

    0,0 0,1 0,2 0,3
    1,0 1,1 1,2 1,3
    2,0 2,1 2,2 2,3
    3,0 3,1 3,2 3,3
    4,0 4,1 4,2 4,3

every access a miss
16 * 16 = 256 misses

Addresses (tag | set index | offset):

    0,0: aa...a000 000 00000
    0,4: aa...a000 001 00000
    1,0: aa...a000 100 00000
    2,0: aa...a001 000 00000
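
The two miss counts can be checked with a short simulation of this direct-mapped cache. It is a sketch under stated assumptions: the array starts at address 0 (consistent with the alignment assumed above), doubles are 8 bytes, and only accesses to a are modeled. The function name count_misses is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SETS   8    /* 3-bit set index */
#define BLOCK_BITS 5    /* 5-bit offset: 32-byte blocks */
#define DIM        16

/* Counts misses in a cold direct-mapped cache while visiting every
 * element of a 16x16 array of doubles. by_rows != 0 visits in row-major
 * order (like sum_array_rows); by_rows == 0 visits column by column. */
int count_misses(int by_rows) {
    assert(sizeof(double) == 8);      /* assumption used for addresses */
    int valid[NUM_SETS] = {0};
    uint32_t tags[NUM_SETS] = {0};
    int misses = 0;
    for (int outer = 0; outer < DIM; outer++) {
        for (int inner = 0; inner < DIM; inner++) {
            int r = by_rows ? outer : inner;
            int c = by_rows ? inner : outer;
            uint32_t addr = (uint32_t)(r * DIM + c) * 8;  /* base address 0 */
            uint32_t block = addr >> BLOCK_BITS;
            uint32_t index = block % NUM_SETS;
            uint32_t tag = block / NUM_SETS;
            if (!valid[index] || tags[index] != tag) {
                misses++;
                valid[index] = 1;
                tags[index] = tag;
            }
        }
    }
    return misses;
}
```

Under these assumptions count_misses(1) reproduces the 64 misses of the row traversal and count_misses(0) the 256 misses of the column traversal.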

Example (E = 1)

    int dotprod(int x[8], int y[8]) {
        int sum = 0;
        for (int i = 0; i < 8; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

block = 16 bytes = 4 ints; 8 sets in cache
How many block offset bits? How many set index bits?

Address bits: ttt....t sss bbbb
B = 16 = 2^b: b = 4 offset bits
S = 8 = 2^s: s = 3 index bits

Addresses as bits:

    0x00000000: 000....0 000 0000
    0x00000080: 000....1 000 0000
    0x000000A0: 000....1 010 0000

If x and y are mutually aligned, e.g., 0x00, 0x80, one cache line holds in turn:

    x[0] x[1] x[2] x[3]
    y[0] y[1] y[2] y[3]
    x[0] x[1] x[2] x[3]
    y[0] y[1] y[2] y[3]
    x[0] x[1] x[2] x[3]

If x and y are mutually unaligned, e.g., 0x00, 0xA0, four different lines hold:

    x[0] x[1] x[2] x[3]
    y[0] y[1] y[2] y[3]
    x[4] x[5] x[6] x[7]
    y[4] y[5] y[6] y[7]

Cache Read: Set-Associative (Example: E = 2)

This cache:
• Block size: 8 bytes
• Associativity: 2 blocks per set

Each set holds two lines, each with a valid bit v, a tag, and data bytes 0 to 7.

Address of int: t bits | 0…01 | 100

Find set.

Cache Read: Set-Associative (Example: E = 2)

This cache:
• Block size: 8 bytes
• Associativity: 2 blocks per set

Address of int: t bits | 0…01 | 100

Compare both lines in the set: valid? + match: yes = hit
Block offset 100: int (4 Bytes) is here, in bytes 4 to 7 of the matching line.

If no match: evict and replace one line in set.

Example (E = 2)

    float dotprod(float x[8], float y[8]) {
        float sum = 0;
        for (int i = 0; i < 8; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

4 sets, 2 blocks/lines per set

If x and y aligned, e.g. &x[0] = 0, &y[0] = 128, can still fit both because each set has space for two blocks/lines:

    x[0] x[1] x[2] x[3] | y[0] y[1] y[2] y[3]
    x[4] x[5] x[6] x[7] | y[4] y[5] y[6] y[7]

Types of Cache Misses

Cold (compulsory) miss
Conflict miss
Capacity miss

Which ones can we mitigate/eliminate? How?

Writing to cache

Multiple copies of data exist, must be kept in sync.

Write-hit policy
Write-through:
Write-back: needs a dirty bit

Write-miss policy
Write-allocate:
No-write-allocate:

Typical caches:
Write-back + Write-allocate, usually
Write-through + No-write-allocate, occasionally

Write-back, write-allocate example

Cache: one line with tag U, data 0xCAFE, dirty bit 0
Memory: T: 0xFACE, U: 0xCAFE
Registers: eax = 0xCAFE, ecx = T, edx = U

1. mov $T, %ecx
2. mov $U, %edx
   Cache/memory not involved
3. mov $0xFEED, (%ecx)
   a. Miss on T.

Write-back, write-allocate example

Cache: one line with tag T; filled with 0xFACE (dirty bit 0), then written with 0xFEED (dirty bit 1)
Memory: T: 0xFACE, U: 0xCAFE
Registers: eax = 0xCAFE, ecx = T, edx = U

1. mov $T, %ecx
2. mov $U, %edx
3. mov $0xFEED, (%ecx)
   a. Miss on T.
   b. Evict U (clean: discard).
   c. Fill T (write-allocate).
   d. Write T in cache (dirty).
4. mov (%edx), %eax
   a. Miss on U.

Write-back, write-allocate example

Cache: one line with tag U, data 0xCAFE, dirty bit 0
Memory: T: 0xFEED (previously 0xFACE), U: 0xCAFE
Registers: eax = 0xCAFE, ecx = T, edx = U

1. mov $T, %ecx
2. mov $U, %edx
3. mov $0xFEED, (%ecx)
   a. Miss on T.
   b. Evict U (clean: discard).
   c. Fill T (write-allocate).
   d. Write T in cache (dirty).
4. mov (%edx), %eax
   a. Miss on U.
   b. Evict T (dirty: write back).
   c. Fill U.
   d. Set %eax.
5. DONE.
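
The sequence above can be replayed in a small model: a two-word memory and a cache with a single one-word line. This is a sketch, not the hardware mechanism; the names System, CacheLine, store, load, and run_example are hypothetical, and ADDR_T and ADDR_U stand in for the symbolic addresses T and U.

```c
#include <assert.h>
#include <stdint.h>

enum { ADDR_T = 0, ADDR_U = 1 };   /* hypothetical stand-ins for T and U */

typedef struct {
    int valid, dirty;
    int tag;                       /* which address the line holds */
    uint16_t data;
} CacheLine;

typedef struct {
    uint16_t memory[2];
    CacheLine line;                /* a cache with a single one-word line */
} System;

/* On a miss: evict (writing back only if dirty), then fill from memory. */
static void ensure_cached(System *sys, int addr) {
    assert(addr == ADDR_T || addr == ADDR_U);
    CacheLine *l = &sys->line;
    if (l->valid && l->tag == addr) {
        return;                                  /* hit */
    }
    if (l->valid && l->dirty) {
        sys->memory[l->tag] = l->data;           /* dirty: write back */
    }
    l->valid = 1;
    l->dirty = 0;
    l->tag = addr;
    l->data = sys->memory[addr];                 /* fill */
}

void store(System *sys, int addr, uint16_t value) {
    ensure_cached(sys, addr);      /* write-allocate: bring the block in first */
    sys->line.data = value;        /* write-back: update only the cache ... */
    sys->line.dirty = 1;           /* ... and mark the line dirty */
}

uint16_t load(System *sys, int addr) {
    ensure_cached(sys, addr);
    return sys->line.data;
}

/* Replays steps 3 and 4 from the initial state; returns the final %eax. */
uint16_t run_example(System *sys) {
    sys->memory[ADDR_T] = 0xFACE;
    sys->memory[ADDR_U] = 0xCAFE;
    sys->line = (CacheLine){ .valid = 1, .dirty = 0, .tag = ADDR_U, .data = 0xCAFE };
    store(sys, ADDR_T, 0xFEED);    /* 3. mov $0xFEED, (%ecx) */
    return load(sys, ADDR_U);      /* 4. mov (%edx), %eax */
}
```

After run_example, memory at T holds 0xFEED only because evicting the dirty line wrote it back; between steps 3 and 4 memory still held 0xFACE.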

Example Memory Hierarchy

Typical laptop/desktop processor (c.a. 201_)

Processor package:
    Core 0 … Core 3, each with: Regs, L1 d-cache, L1 i-cache, L2 unified cache
    L3 unified cache (shared by all cores)
Main memory

L1 i-cache and d-cache: 32 KB, 8-way, Access: 4 cycles
L2 unified cache: 256 KB, 8-way, Access: 11 cycles
L3 unified cache: 8 MB, 16-way, Access: 30-40 cycles
Block size: 64 bytes for all caches.

Going down the hierarchy: slower, but more likely to hit.

Aside: software caches

Examples
File system buffer caches, web browser caches, database caches, network CDN caches, etc.

Some design differences
Almost always fully-associative
Often use complex replacement policies
Not necessarily constrained to single “block” transfers

Cache-Friendly Code

Locality, locality, locality.

Programmer can optimize for cache performance
Data structure layout
Data access patterns
    Nested loops
    Blocking (see CSAPP 6.5)

All systems favor “cache-friendly code”
Performance is hardware-specific
Generic rules capture most advantages
    Keep working set small (temporal locality)
    Use small strides (spatial locality)
    Focus on inner loop code
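
As an illustration of blocking, here is a minimal sketch of a tiled matrix multiply. It is not taken from the slides or from CSAPP; the sizes N = 8 and BLOCK = 4 and the name matmul_blocked are hypothetical, and a useful tile size in practice depends on the cache.

```c
#include <assert.h>

#define N     8   /* hypothetical matrix dimension */
#define BLOCK 4   /* hypothetical tile size; must divide N */

/* Computes c += a * b tile by tile, so each BLOCK x BLOCK tile of a, b,
 * and c is reused while it is likely still in the cache (a smaller
 * working set per inner loop nest). */
void matmul_blocked(double a[N][N], double b[N][N], double c[N][N]) {
    assert(N % BLOCK == 0);
    for (int ii = 0; ii < N; ii += BLOCK) {
        for (int jj = 0; jj < N; jj += BLOCK) {
            for (int kk = 0; kk < N; kk += BLOCK) {
                /* Multiply one tile of a by one tile of b into a tile of c. */
                for (int i = ii; i < ii + BLOCK; i++) {
                    for (int j = jj; j < jj + BLOCK; j++) {
                        double sum = c[i][j];
                        for (int k = kk; k < kk + BLOCK; k++) {
                            sum += a[i][k] * b[k][j];
                        }
                        c[i][j] = sum;
                    }
                }
            }
        }
    }
}
```

The caller is expected to zero c first if a plain product is wanted. The arithmetic is the same as in the usual triple loop; only the order of the accesses changes.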