Lecture 14: Fundamental Memory Concepts (Part 2)
TRANSCRIPT
Xuan ‘Silvia’ Zhang Washington University in St. Louis
http://classes.engineering.wustl.edu/ese566/
Memory Structure and Technology
• Register Files
Memory Structure and Technology
• SRAM (cache, on-chip)
Memory Structure and Technology
• DRAM
Memory Structure and Technology
• DRAM
Memory Structure and Technology
• Disk – magnetic hard drives require rotating platters, resulting in long random access times that have hardly improved over several decades
• Flash – solid-state drives using flash have roughly 100x lower latencies, but also lower density and higher cost
Memory Technology Trade-offs
Latency Numbers Every Programmer (and Architect) Should Know
Find updated numbers at https://people.eecs.berkeley.edu/~rcs/research/interactive_latency.html
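A few of these numbers can be kept in a small lookup table for back-of-the-envelope comparisons. The figures below are approximate 2020-era values taken from the interactive page linked above; treat them as orders of magnitude, not measurements.

```python
# Approximate latencies in nanoseconds (rough 2020-era orders of magnitude).
LATENCY_NS = {
    "L1 cache reference": 1,
    "L2 cache reference": 4,
    "Main memory reference": 100,
    "SSD random read": 16_000,
    "Disk seek": 2_000_000,
}

def ratio(a, b):
    """How many times slower access `a` is than access `b`."""
    return LATENCY_NS[a] / LATENCY_NS[b]

print(ratio("Main memory reference", "L1 cache reference"))  # 100.0
```

The ratios, not the absolute values, are what motivate caching: main memory is about two orders of magnitude slower than L1, and disk is several orders slower still.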
Cache Memories in Computer Architecture
• Three key questions
  – how much data is aggregated in a cache line?
  – how do we organize multiple lines in the cache?
  – what data is replaced to make room for new data when the cache is full?
• Categorizing misses
• Write policies
• Multi-level caches
Typical Access Patterns: instruction vs. data access, temporal vs. spatial locality
Understand Locality
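The two kinds of locality can be illustrated with a toy address trace for a loop such as `for i: sum += a[i]`. The layout below is hypothetical: array `a` at address 0x100 with 4-byte elements, and the accumulator `sum` at address 0x50.

```python
# Build the address trace the loop would generate (hypothetical layout).
trace = []
for i in range(4):
    trace.append(0x100 + 4 * i)  # a[i]: spatial locality (unit stride)
    trace.append(0x50)           # sum: temporal locality (reused address)

# Consecutive a[i] accesses differ by a constant small stride...
strides = [b - a for a, b in zip(trace[::2], trace[2::2])]
print(strides)        # [4, 4, 4]
# ...while sum's address recurs on every iteration.
print(trace.count(0x50))  # 4
```

Spatial locality is why fetching a whole cache line pays off; temporal locality is why keeping recently used data in the cache pays off.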
Q1: How Much Data is Stored in a Cache Line?
Q2: How to Organize Multiple Lines in the Cache?
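One common organization can be sketched numerically. With hypothetical parameters (a 4 KB direct-mapped cache with 64-byte lines), an address splits into tag, index, and offset fields:

```python
# Hypothetical parameters: 4 KB direct-mapped cache with 64-byte lines.
CACHE_BYTES = 4096
LINE_BYTES = 64
NUM_LINES = CACHE_BYTES // LINE_BYTES          # 64 lines
OFFSET_BITS = LINE_BYTES.bit_length() - 1      # 6 bits select a byte in a line
INDEX_BITS = NUM_LINES.bit_length() - 1        # 6 bits select the line (set)

def split(addr: int) -> tuple[int, int, int]:
    """Return (tag, index, offset) for a byte address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print(split(0x1234))  # (1, 8, 52)
```

Two addresses that fall in the same 64-byte line share the same tag and index and differ only in the offset, which is exactly why one refill serves many nearby accesses.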
Increase Cache Associativity
Increase Cache Line Size
Q3: What Data Is Replaced When the Cache Is Full?
• No choice in a direct-mapped cache
• Random
  – good average-case performance, but difficult to implement
• Least recently used (LRU)
  – replace the cache line that has not been accessed recently
  – exploits temporal locality
  – LRU cache state must be updated on every access
  – a two-way cache can use a single "last used" bit
  – pseudo-LRU uses a binary tree to approximate LRU at higher associativity
• First-in, first-out (FIFO, round-robin)
  – simpler implementation that does not exploit temporal locality
  – potentially useful in large fully-associative caches
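As a sketch, LRU replacement for a single set can be modeled with an ordered map; this is illustrative only, not the hardware implementation the slide describes.

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with LRU replacement (toy model)."""

    def __init__(self, ways: int):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> None, least recently used first

    def access(self, tag) -> bool:
        """Return True on hit; on miss, evict the least recently used line."""
        hit = tag in self.lines
        if hit:
            self.lines.move_to_end(tag)         # update LRU state on every access
        else:
            if len(self.lines) == self.ways:
                self.lines.popitem(last=False)  # evict the LRU victim
            self.lines[tag] = None
        return hit

s = LRUSet(2)  # a hypothetical two-way set
hits = [s.access(t) for t in ["A", "B", "A", "C", "B"]]
print(hits)  # [False, False, True, False, False]
```

Note the cost the slide mentions: every access, hit or miss, touches the LRU ordering, which is why real hardware at high associativity falls back to pseudo-LRU approximations.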
Categorize Misses: The Three C’s
• Compulsory – first reference to a block
• Capacity – cache is too small to hold all of the data
• Conflict – collisions in a specific set
Classify Misses as a Sequence of Three Questions
• Q1: would this miss occur in a cache with infinite capacity? – if “yes”, then this is a compulsory miss – if “no”, consider Q2
• Q2: would this miss occur in a fully-associate cache with the desired capacity? – if “yes”, then this is a capacity miss – if “no”, then consider Q3
• Q3: would this miss occur in a cache with the desired capacity and associativity? – if “yes”, then this is a conflict miss – if “no”, then this is not a miss—it is a hit!
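The three questions can be turned into a toy classifier that runs the same trace through three model caches. The parameters below are hypothetical: a 2-line capacity, with the real cache direct-mapped using index = line mod 2.

```python
def classify(trace):
    """Classify each access in `trace` (a list of line numbers)."""
    infinite = set()   # Q1 model: cache with infinite capacity
    full_lru = []      # Q2 model: fully-associative LRU cache, 2 lines
    direct = {}        # Q3 model: direct-mapped cache, index = line % 2
    kinds = []
    for line in trace:
        if line not in infinite:
            kinds.append("compulsory")
        elif line not in full_lru:
            kinds.append("capacity")
        elif direct.get(line % 2) != line:
            kinds.append("conflict")
        else:
            kinds.append("hit")
        # Update all three model caches.
        infinite.add(line)
        if line in full_lru:
            full_lru.remove(line)
        full_lru.append(line)     # most recently used at the end
        if len(full_lru) > 2:
            full_lru.pop(0)       # evict the LRU line
        direct[line % 2] = line
    return kinds

print(classify([0, 2, 0, 2, 1, 0]))
```

In this trace, lines 0 and 2 map to the same direct-mapped slot, so their repeated misses are conflict misses: a fully-associative cache of the same capacity would have hit.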
Examples
Write Policies
• Write-through with no write allocate
  – on a write miss, write memory but do not bring the line into the cache
  – on a write hit, write both cache and memory
  – requires more memory bandwidth, but simpler implementation
• Write-back with write allocate
  – on a write miss, bring the line into the cache, then write it
  – on a write hit, write only the cache, not memory
  – update memory only when a dirty cache line is evicted
  – more efficient, but more complicated to implement
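The bandwidth difference can be seen in a toy count of memory writes for a write stream. This is a deliberately minimal sketch: a hypothetical 1-line cache, with each trace entry naming the cache line being written.

```python
def write_through(trace):
    """Write-through: every write goes to memory."""
    return len(trace)

def write_back(trace):
    """Write-back: memory is written only when a dirty line is evicted."""
    mem_writes, dirty = 0, None
    for line in trace:
        if dirty is not None and dirty != line:
            mem_writes += 1            # evict the dirty victim, write it back
        dirty = line                   # line is (re)written in the cache
    return mem_writes + (1 if dirty is not None else 0)  # final flush

trace = ["A", "A", "A", "B"]
print(write_through(trace), write_back(trace))  # 4 vs 2 memory writes
```

Repeated writes to the same line coalesce into a single write-back, which is exactly where write-back saves memory bandwidth over write-through.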
Translation, Protection, and Virtualization
• Translation – mapping of virtual address to physical address
• Protection – permission to access address in memory
• Virtualization – transparent extension of memory space using disk
Most modern systems support all three functions with a single page-based memory management unit (MMU).
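Translation with 4 KB pages can be sketched as a page-table lookup; the mappings below are hypothetical, and a real MMU would also check the protection bits covered above.

```python
# One-level page-table sketch: 4 KB pages, so the low 12 address bits are
# the page offset and the remaining bits select the page-table entry.
PAGE_BITS = 12
PAGE_TABLE = {
    0x0: 0x2A000,  # virtual page 0 -> physical frame base (hypothetical)
    0x1: 0x07000,  # virtual page 1 -> physical frame base (hypothetical)
}

def translate(vaddr: int) -> int:
    """Translate a virtual byte address to a physical one."""
    vpn = vaddr >> PAGE_BITS
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    frame = PAGE_TABLE[vpn]  # a real MMU would also check permissions here
    return frame | offset

print(hex(translate(0x0123)))  # 0x2a123: offset preserved, page remapped
```

The key property is that only the page number is remapped; the offset within the page passes through unchanged.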
Analyze Memory Performance
• memory accesses per sequence depend on the program and the address translation
• time per cycle depends on the microarchitecture and implementation
• also known as average memory access latency (AMAL)
Analyze Memory Performance
• average cycles per hit is called the hit latency; it depends on the microarchitecture
• misses divided by accesses is called the miss rate; it depends on the microarchitecture
• average extra cycles per miss is called the miss penalty; it depends on the microarchitecture and the rest of the memory system
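Putting the three quantities together gives AMAL = hit latency + miss rate x miss penalty; the values below are hypothetical.

```python
def amal(hit_latency, miss_rate, miss_penalty):
    """Average memory access latency in cycles."""
    return hit_latency + miss_rate * miss_penalty

# E.g., a 1-cycle hit, 5% miss rate, and 20-cycle miss penalty:
print(amal(hit_latency=1, miss_rate=0.05, miss_penalty=20))  # 2.0 cycles
```

Note how a small miss rate still doubles the average latency here: the miss penalty dominates, which is the usual argument for multi-level caches.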
Analyze Memory Performance
Transactions and Steps, Now for Memory
• Executing a memory access involves a sequence of steps
  – check tag: check one or more tags in the cache
  – select victim: select a victim line from the cache using the replacement policy
  – evict victim: evict the victim line from the cache and write it to memory
  – refill: refill the requested line by reading it from memory
  – write mem: write the requested word to memory
  – access data: read or write the requested word in the cache
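For a write-back, write-allocate design, these steps compose per transaction roughly as follows. This is a sketch using the step names above; the exact sequencing depends on the microarchitecture.

```python
def steps(transaction: str, victim_dirty: bool = False) -> list[str]:
    """Step sequence per transaction in a write-back, write-allocate cache."""
    if transaction in ("read hit", "write hit"):
        return ["check tag", "access data"]
    # Misses must find room for the refill first.
    seq = ["check tag", "select victim"]
    if victim_dirty:
        seq.append("evict victim")   # dirty victim written back to memory
    return seq + ["refill", "access data"]

print(steps("read miss", victim_dirty=True))
```

Hits take the short path through the datapath; misses with a dirty victim take the longest path, which is what the FSM and pipelined cache designs on the following slides have to handle.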
Memory Microarchitecture Overview
High-level Idea for FSM Cache
FSM Cache Datapath
High-level Idea for Pipelined Cache
Pipelined Cache Datapath
Questions?
Comments?
Discussion?
Acknowledgement
Cornell University, ECE 4750