Storage Hierarchy CS510 Computer Architecture Lecture 12 - 1
Lecture 12: Storage Hierarchy
Who Cares about Memory Hierarchy?
• Processor Only Thus Far in Course
– CPU cost/performance, ISA, Pipelined Execution
• 1980: no cache in the microprocessor
• 1995: 2-level cache, 60% of the transistors on the Alpha 21164 microprocessor
[Figure: CPU-DRAM performance gap. Performance (log scale, 1 to 1000) versus year (1980 to 2000) for CPU and DRAM; growth-rate labels on the curves: 35%/year, 55%/year, 7%/year.]
General Principles
• Locality
– Temporal locality: an item just referenced is likely to be referenced again soon
– Spatial locality: items near a referenced item are likely to be referenced soon
• Locality + "smaller HW is faster" = memory hierarchy
– Levels: each level is smaller, faster, and more expensive per byte than the level below
– Inclusive: data found in an upper level is also found in the levels below it
• Definitions
– Upper level is the one closer to the processor
– Block: minimum unit that is either present or not present in the upper level
– Address = block frame address + block offset address
– Hit time: time to access the upper level, including hit determination
Measures
• Hit rate: fraction of accesses found in that level
– Usually so high that we talk about the miss rate (or fault rate) instead
– Miss rate fallacy: miss rate is to average memory access time what MIPS is to CPU performance, a proxy that can mislead
• Average memory access time = Hit time + Miss rate x Miss penalty (in ns or clock cycles)
• Miss penalty: time to replace a block in the upper level with one from the lower level, including the time to deliver it to the CPU
– Access time: time to access the lower level, a function of the lower-level latency
– Transfer time: time to transfer the block, a function of the bandwidth between upper and lower levels and of the block size
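The average memory access time formula and the miss rate fallacy can be illustrated with a minimal sketch. The function name `amat` and all numbers below are hypothetical, chosen only to show that a design with a lower miss rate can still have a worse average access time if its hit time is longer.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate x miss penalty.

    hit_time and miss_penalty must use the same unit (ns or clock cycles);
    miss_rate is a fraction between 0 and 1.
    """
    return hit_time + miss_rate * miss_penalty


# Hypothetical design A: 1-cycle hit, 5% miss rate, 40-cycle miss penalty.
design_a = amat(hit_time=1, miss_rate=0.05, miss_penalty=40)

# Hypothetical design B: lower miss rate (4%) but a slower, 2-cycle hit.
design_b = amat(hit_time=2, miss_rate=0.04, miss_penalty=40)

print(f"A: {design_a:.2f} cycles, B: {design_b:.2f} cycles")
```

With these assumed inputs, design B has the lower miss rate but the higher average access time, which is why miss rate alone is not a reliable measure.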
Block Size vs. Measures
Increasing block size generally increases the miss penalty and decreases the miss rate.
Average memory access time (AMAT) = Hit time + Miss rate x Miss penalty, so the two effects pull AMAT in opposite directions.
[Figure: three plots against block size: miss penalty (access time + transfer time), miss rate, and average memory access time.]
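The trade-off between block size, miss penalty, and miss rate can be sketched numerically. Everything below is an assumption made for illustration: the linear transfer-time model (block size divided by bandwidth), the latency and bandwidth figures, and above all the table `ASSUMED_MISS_RATES`, which is invented and not measured data.

```python
def miss_penalty(access_time, block_size, bandwidth):
    """Miss penalty = access time + transfer time (block_size / bandwidth)."""
    return access_time + block_size / bandwidth


def amat(hit_time, miss_rate, penalty):
    """Average memory access time = hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * penalty


# Invented miss rates per block size in bytes (illustration only).
ASSUMED_MISS_RATES = {16: 0.080, 32: 0.055, 64: 0.040, 128: 0.035, 256: 0.034}


def sweep(miss_rates, hit_time=1, access_time=40, bandwidth=8):
    """Return {block_size: AMAT in cycles} under the assumed parameters."""
    return {
        size: amat(hit_time, rate, miss_penalty(access_time, size, bandwidth))
        for size, rate in miss_rates.items()
    }


results = sweep(ASSUMED_MISS_RATES)
for size, cycles in results.items():
    print(f"{size:4d} B: {cycles:.3f} cycles")
print("lowest AMAT at block size", min(results, key=results.get))
```

Under these made-up inputs the lowest AMAT falls at an intermediate block size: beyond it, the growing transfer time outweighs the shrinking miss rate.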
Implications for CPU
• Fast hit check, since every memory access needs it
– Hit is the common case
• Unpredictable memory access time
– 10s of clock cycles: wait
– 1000s of clock cycles:
  • Interrupt, switch, and do something else
  • New style: multithreaded execution
• How to handle a miss (10s => HW, 1000s => SW)?
4 Questions for Memory Hierarchy Designers
• Q1: Where can a block be placed in the upper level? (Block placement)
• Q2: How is a block found if it is in the upper level? (Block identification)
• Q3: Which block should be replaced on a miss? (Block replacement)
• Q4: What happens on a write? (Write strategy)
Q1: Block Placement: Where can a Block be Placed in the Upper Level?
• Example: block 12 placed in an 8-block cache
– Fully associative (FA), direct mapped (DM), 2-way set associative (SA)
– SA mapping: (Block #) modulo (# of sets)
• Fully associative: block 12 can go anywhere (block frames 0 to 7)
• Direct mapped: block 12 can go only into block frame 4 (12 mod 8 = 4)
• 2-way set associative (sets 0 to 3): block 12 can go anywhere in set 0 (12 mod 4 = 0)
[Figure: an 8-frame cache (block numbers 0 to 7) drawn for each of the three organizations, above a memory whose block frame addresses are numbered 0, 1, 2, ...]
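The three placement rules differ only in the number of ways per set, so one function can cover them. This is a minimal sketch: the name `candidate_frames` is hypothetical, and it assumes the frames of a set are numbered contiguously (set 0 holds frames 0 and 1 in the 2-way case).

```python
def candidate_frames(block_number, num_frames, ways):
    """Return the cache block frames in which a memory block may be placed.

    ways == 1           -> direct mapped
    ways == num_frames  -> fully associative
    anything in between -> set associative
    """
    if ways < 1 or num_frames % ways != 0:
        raise ValueError("ways must be positive and divide num_frames")
    num_sets = num_frames // ways
    set_index = block_number % num_sets      # (Block #) modulo (# of sets)
    first = set_index * ways                 # assumed contiguous numbering
    return list(range(first, first + ways))


print(candidate_frames(12, 8, ways=1))  # direct mapped
print(candidate_frames(12, 8, ways=2))  # 2-way set associative
print(candidate_frames(12, 8, ways=8))  # fully associative
```

For block 12 in an 8-frame cache this yields frame 4 for direct mapped, the two frames of set 0 for 2-way set associative, and all eight frames for fully associative.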
Q2: Block Identification: How to Find a Block in the Upper Level?
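• Tag on each block
– Only the tag is compared; no need to check the index or block offset
• Increasing associativity shrinks the index and expands the tag
– FAM (fully associative mapping): no index
– DM (direct mapped): large index
• Address layout: Tag | Index | Block offset, where Tag and Index together form the block address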
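How the tag, index, and block offset fields change with associativity can be shown with a small sketch. The function `split_address` and the example address are hypothetical, and the code assumes the block size and the number of sets are powers of two so that the fields are plain bit ranges.

```python
def split_address(address, block_size, num_frames, ways):
    """Split an address into (tag, index, block offset).

    Assumes block_size and the number of sets are powers of two.
    """
    num_sets = num_frames // ways
    for value in (block_size, num_sets):
        if value < 1 or value & (value - 1):
            raise ValueError("block_size and number of sets must be powers of two")
    offset_bits = block_size.bit_length() - 1   # log2(block_size)
    index_bits = num_sets.bit_length() - 1      # log2(num_sets); 0 if one set
    offset = address & (block_size - 1)
    index = (address >> offset_bits) & (num_sets - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset


# Same address and 8-frame cache with 16-byte blocks, three organizations.
for ways in (1, 2, 8):
    print(ways, split_address(0x1234, block_size=16, num_frames=8, ways=ways))
```

Going from direct mapped (ways=1) to fully associative (ways=8), the index field shrinks to nothing and the tag absorbs the freed bits, while the block offset stays the same.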
Q3: Block Replacement: Which Block Should be Replaced on a Miss?
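• Easy for direct mapped: there is only one candidate block
• SAM or FAM (set associative or fully associative mapping):
– Random
– LRU (least recently used)

Miss rates by cache size, associativity, and replacement policy:

| Cache size | 2-way LRU | 2-way Random | 4-way LRU | 4-way Random | 8-way LRU | 8-way Random |
|---|---|---|---|---|---|---|
| 16 KB | 5.18% | 5.69% | 4.67% | 5.29% | 4.39% | 4.96% |
| 64 KB | 1.88% | 2.01% | 1.54% | 1.66% | 1.39% | 1.53% |
| 256 KB | 1.15% | 1.17% | 1.13% | 1.13% | 1.12% | 1.12% |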
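The two replacement policies can be compared on a toy trace with a sketch like the following. It models a single set of `ways` frames; the function names, the trace, and the seed are all hypothetical, and the result says nothing about the measured miss rates in the table above.

```python
import random
from collections import OrderedDict


def count_misses_lru(trace, ways):
    """Count misses for one set of `ways` frames under LRU replacement."""
    frames = OrderedDict()            # ordered from least to most recently used
    misses = 0
    for block in trace:
        if block in frames:
            frames.move_to_end(block)          # hit: mark as most recently used
        else:
            misses += 1
            if len(frames) == ways:
                frames.popitem(last=False)     # evict the least recently used
            frames[block] = True
    return misses


def count_misses_random(trace, ways, seed=0):
    """Count misses for one set of `ways` frames under random replacement."""
    rng = random.Random(seed)         # seeded so runs are repeatable
    frames = []
    misses = 0
    for block in trace:
        if block not in frames:
            misses += 1
            if len(frames) == ways:
                frames[rng.randrange(ways)] = block   # evict a random victim
            else:
                frames.append(block)
    return misses


trace = [1, 2, 3, 1, 4, 1, 2]
print("LRU misses:   ", count_misses_lru(trace, ways=3))
print("Random misses:", count_misses_random(trace, ways=3))
```

LRU needs bookkeeping on every hit (the `move_to_end` call), whereas random replacement needs none, which is the hardware cost side of the comparison.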
Q4: Write Strategy: What Happens on a Write?
• DLX: stores are 9% and loads 26% of instructions in integer programs
– STORE (counting instruction fetches as 100%):
  • 9%/(100%+26%+9%) ≈ 7% of the overall memory traffic
  • 9%/(26%+9%) ≈ 26% of the data cache traffic, about a quarter
– READ accesses are the majority, so make the common case fast: optimize caches for reads
– High-performance designs still cannot neglect the speed of WRITEs
Q4: What Happens on a Write?
• Write Through (WT): the information is written to both the block in the cache and the block in the lower-level memory.
• Write Back (WB): the information is written only to the block in the cache. The modified cache block (dirty block) is written to main memory only when it is replaced.
– Requires tracking whether each block is clean or dirty
• Pros and cons of each:
– WT: read misses cannot result in writes, because a replaced block is never dirty
– WB: repeated writes to the same block do not each go to the lower level
• WT needs to be combined with write buffers so that the CPU does not wait for the lower-level memory
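The difference between the two policies shows up in how many writes reach the lower level. The following is a minimal sketch under stated assumptions: a direct-mapped cache, write allocate for both policies, no write buffer, and no final flush of dirty blocks. The function name `memory_writes` and the trace are hypothetical.

```python
def memory_writes(trace, policy, num_frames=4):
    """Count writes that reach the lower level of the hierarchy.

    trace  : iterable of (op, block_number) with op "R" or "W"
    policy : "WT" (write through) or "WB" (write back)
    """
    if policy not in ("WT", "WB"):
        raise ValueError("policy must be 'WT' or 'WB'")
    frames = {}                       # frame index -> (block_number, dirty)
    writes = 0
    for op, block in trace:
        idx = block % num_frames
        resident = frames.get(idx)
        if resident is None or resident[0] != block:
            # Miss: a dirty victim must be written back before replacement.
            if resident is not None and resident[1]:
                writes += 1
            frames[idx] = (block, False)
        if op == "W":
            if policy == "WT":
                writes += 1                    # every store goes to memory
            else:
                frames[idx] = (block, True)    # just mark the block dirty
    return writes


# Three stores to block 0, a read of block 4 (same frame), a store to block 1.
trace = [("W", 0), ("W", 0), ("W", 0), ("R", 4), ("W", 1)]
print("WT:", memory_writes(trace, "WT"))
print("WB:", memory_writes(trace, "WB"))
```

In this trace the write-back cache merges the three stores to block 0 into one write, but that write is triggered by the read miss on block 4, which is the case write through avoids.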
Q4: What Happens on a Write?
• Write Miss
– Write Allocate (fetch on write): the block is loaded into the cache on a write miss
– No-Write Allocate (write around): the block is modified in the lower level and not loaded into the cache
• WB caches generally use Write Allocate, while WT caches often use No-Write Allocate
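The effect of the two write-miss options on later accesses can be sketched by counting hits and misses. The sketch assumes an initially empty cache that is large enough never to evict; the function name `hits_and_misses` and the trace of block addresses are hypothetical.

```python
def hits_and_misses(trace, write_allocate):
    """Return (hits, misses) for a trace of (op, block) accesses.

    op is "R" or "W". Read misses always load the block; write misses load
    it only when write_allocate is True. No evictions are modeled.
    """
    cached = set()
    hits = misses = 0
    for op, block in trace:
        if block in cached:
            hits += 1
        else:
            misses += 1
            if op == "R" or write_allocate:
                cached.add(block)
    return hits, misses


trace = [("W", 100), ("W", 100), ("R", 200), ("W", 200), ("W", 100)]
print("write allocate:   ", hits_and_misses(trace, write_allocate=True))
print("no-write allocate:", hits_and_misses(trace, write_allocate=False))
```

With write allocate the repeated stores to block 100 hit after the first miss; with no-write allocate every store to block 100 misses, because only the read of block 200 brings a block into the cache.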
Summary
• The CPU-memory gap is a major obstacle to performance, for both HW and SW
• Take advantage of program behavior: locality
• Execution time of the program is still the only reliable performance measure
• 4 Qs of memory hierarchy
– Block Placement
– Block Identification
– Block Replacement
– Write Strategy